Some Linux server admins are comfortable with
wading through text logfiles, but why wade when you can create charts
and graphs that highlight trouble spots? Try the excellent CairoPlot for
beautiful, informative visualizations of your server logs.
CairoPlot isn't packaged for most distros, but
it's an easy install. The current release is version 1.1, available from the CairoPlot Launchpad page. You can
download cairoplot-1.1.tar.gz from there, or check it out with bzr if you
prefer. (Once 1.2 is ready, the project may move to SourceForge.)
Extract the tarball:
$ tar xvf cairoplot-1.1.tar.gz
Then copy one file, cairoplot-1.1/CairoPlot.py, to the directory where you'll be developing your Python script.
Pie Charts: Who's sending spam?
When playing with plotting, finding a good
source of data is always the first step. For this project, let's analyze a
Postfix log file, /var/log/mail.info, to look at the sources of one
class of spam.
A casual glimpse through the file reveals that
we're getting a lot of mail delivery attempts where the sending host's HELO greeting claims a hostname
that doesn't really exist, like this one:
Mar 5 15:05:45 mailserver postfix/smtpd[29764]: NOQUEUE: reject: RCPT from 212.199.94.45.static.012.net.il[212.199.94.45]: 450 4.7.1 <ex02.maccabiworld.org>: Helo command rejected: Host not found; from=<> to= proto=ESMTP helo=
Our Postfix server rejects mail like this,
because it's usually spam. Properly configured mail servers shouldn't make up
bogus hostnames, though a few misconfigured ones do.
But where do these bogus requests come from? Do
they come from specific countries? How many from .com or .org versus from
specific country domains?
To find out, I'll create a Python dictionary,
then use CairoPlot to plot a pie chart. Each key in the dictionary will be a
top-level domain, e.g. "com"; the value will be the number of rejected messages
seen from that domain.
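The dictionary is just a tally of top-level domains. As a rough sketch of that data structure (not the code this article uses, which builds the dictionary by hand), Python's standard collections.Counter can produce the same kind of tally from a list of hostnames. The hostnames and the helper name tld_of are hypothetical, chosen only for illustration.

```python
from collections import Counter

def tld_of(hostname):
    """Hypothetical helper: return the part after the last '.',
    or "Unknown" if there's no usable hostname or no dot in it."""
    if not hostname or hostname == "unknown":
        return "Unknown"
    dot = hostname.rfind(".")
    return hostname[dot + 1:] if dot >= 0 else "Unknown"

# Hypothetical reverse-DNS names, as they might appear in mail.info
hostnames = [
    "212.199.94.45.static.012.net.il",
    "mail.example.com",
    "unknown",
    "host.example.com.br",
]

# Counter tallies each TLD; dict() turns it into a plain dictionary
# of the same (domain : number of rejects) shape.
rejected = dict(Counter(tld_of(h) for h in hostnames))
print(rejected)
```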
Parsing the Log File
Filling out the dictionary means
parsing /var/log/mail.info. The address each message really came from
shows up in the RCPT from; get it using Python's re module. Since this
is an article about CairoPlot, not Python regular expressions,
just take my word for the code that follows.
#! /usr/bin/env python

import CairoPlot, re

MAIL_INFO = "/var/log/mail.info"

# Dictionary to store the results as (domain : number of rejects)
rejected = {}

# Parse mail.info to find all the 'NOQUEUE: reject' lines and
# figure out what top-level domains (TLDs) they're coming from.
with open(MAIL_INFO) as f :
    for line in f :
        if line.find('status=sent') > 0 :
            # Successfully delivered mail; not interesting here.
            pass
        elif line.find('NOQUEUE: reject') > 0 :
            # An attempt we rejected. Look for a pattern like
            # RCPT from foo.example.com[nnn.nnn.nnn.nnn]
            rcpt = re.search(r"RCPT from ([^[]*)\[([0-9\.]+)\]", line)
            if not rcpt :
                continue
            # Now rcpt.group(1) is the reverse-DNS hostname (if any)
            # from the log file, rcpt.group(2) is the IP address.
            if rcpt.group(1) and rcpt.group(1) != 'unknown' :
                hostname = rcpt.group(1)
            else :
                hostname = None

            # Find the part after the last "."
            tld = "Unknown"    # default if there's no hostname or no "."
            if hostname :
                dot = hostname.rfind(".")
                if dot >= 0 :
                    tld = hostname[dot+1:]

            if tld in rejected :
                # We've seen this TLD before; add 1.
                rejected[tld] += 1
            else :
                # First time we've seen this TLD.
                rejected[tld] = 1
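If you'd like to see what that regular expression actually pulls out, here's a small sketch that runs the same pattern against the sample log line shown earlier (reassembled on one line, with the trailing fields trimmed), plus a made-up line for a client with no reverse DNS, which Postfix logs as "unknown". It's only a demonstration of the pattern, not part of the script.

```python
import re

# Same pattern as in the script: group(1) is everything up to the '['
# (the reverse-DNS name), group(2) is the dotted IP address inside [...].
PATTERN = r"RCPT from ([^[]*)\[([0-9\.]+)\]"

# The sample Postfix reject line from above, trailing fields trimmed
line = ("Mar 5 15:05:45 mailserver postfix/smtpd[29764]: NOQUEUE: reject: "
        "RCPT from 212.199.94.45.static.012.net.il[212.199.94.45]: "
        "450 4.7.1 <ex02.maccabiworld.org>: Helo command rejected: "
        "Host not found; from=<>")

rcpt = re.search(PATTERN, line)
hostname, ip = rcpt.group(1), rcpt.group(2)
tld = hostname[hostname.rfind(".") + 1:]   # part after the last "."
print(hostname, ip, tld)

# Hypothetical line for a client with no reverse DNS entry
unknown = re.search(PATTERN, "NOQUEUE: reject: RCPT from unknown[192.0.2.7]: 450")
print(unknown.group(1), unknown.group(2))
```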
At the end of this, rejected is a
dictionary suitable for passing to CairoPlot, like this:
{'ru': 3, 'ch': 1, 'ma': 2, 'rs': 2, 'it': 4, 'hu': 1, 'cz': 1, 'ar': 2, 'il': 35, 'br': 16, 'es': 1, 'co': 2, 'net': 4, 'com': 24, 'pl': 7, 'at': 2}
Generating a Pie Chart
How do you generate
a pie chart from a dictionary? It only takes one line:
CairoPlot.pie_plot("piechart", rejected, 500, 500, None, True, False, None)
CairoPlot will produce a graphics file named
piechart.svg (Figure 1).
figure 1
The arguments are:
pie_plot(name, data, width, height, background=None, gradient=False, shadow=False, colors=None)
name is the filename: if you include an
extension such as .png, CairoPlot will use that format instead of SVG format, in
case you need a graphic that even IE users can view on a website.
data, of course, is the dictionary of
values.
width and height are the
desired size of the plot. Notice that CairoPlot leaves quite a bit of extra
space around the outside of the pie, so plan accordingly.
background lets you specify a
background color as a tuple of red, green and blue values from 0 to 1, so background=(0, 1,
0) would give a solid green background. You can also pass a Cairo gradient
here. gradient specifies whether the pie slices themselves should show a
gradient, which makes the plot prettier. shadow lets you add a drop
shadow on the whole pie chart, and colors lets you pass a list of
custom colors (again, tuples or gradients) if you don't like the
default colors. The colors list must have exactly the same number of
entries as the data dictionary.
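Since the colors list has to match the number of entries in the data dictionary, it can be handy to generate it rather than type it out. The sketch below is an assumption, not part of CairoPlot: a hypothetical helper, make_colors, that spaces hues evenly around the color wheel using the standard colorsys module and returns (red, green, blue) tuples in the 0 to 1 range, the same tuple form the background argument takes.

```python
import colorsys

def make_colors(n, saturation=0.6, value=0.9):
    """Hypothetical helper: return n (r, g, b) tuples with components
    in 0..1, hues spaced evenly around the color wheel."""
    return [colorsys.hsv_to_rgb(float(i) / n, saturation, value)
            for i in range(n)]

# Sample counts taken from the rejected dictionary shown earlier
rejected = {'il': 35, 'com': 24, 'br': 16, 'pl': 7}

# One color per dictionary entry, so the lengths always match
colors = make_colors(len(rejected))
print(colors)
```

You could then pass colors=colors to pie_plot along with the dictionary; how CairoPlot pairs each color with each slice isn't covered here, so check the result before relying on a particular slice getting a particular color.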
A minor problem with the chart in Figure 1: it
turns out most hosts with invalid HELO hostnames aren't resolvable at all, and
the rest of the chart gets all squinched into a tiny piece of pie. What happens
if you toss out all those unknowns? You can do that by adding
an else clause to the if hostname: test, so any reject without a resolvable hostname is skipped:
if hostname :
    dot = hostname.rfind(".")
    if dot >= 0 :
        tld = hostname[dot+1:]
else :
    # No resolvable hostname: skip this reject entirely.
    continue
Run that, and the pie chart looks like Figure 2.
Quite interesting! I had no idea, before writing this example, that I got so
much spam from Israel and Brazil compared to other countries. Sometimes a
picture really is worth a thousand words.
figure 2
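If you've already built the dictionary, an alternative to re-parsing the log is to drop the "Unknown" entry from it before plotting. Here's a minimal sketch using a dictionary comprehension; the counts are hypothetical, for illustration only.

```python
# Hypothetical counts, including the "Unknown" bucket for unresolvable hosts
rejected = {'Unknown': 412, 'il': 35, 'com': 24, 'br': 16, 'pl': 7}

# Keep every TLD except the catch-all "Unknown" entry
known = {tld: count for tld, count in rejected.items() if tld != "Unknown"}

# Report how much of the total was thrown away
total = sum(rejected.values())
print("%d of %d rejects had no resolvable hostname"
      % (rejected.get("Unknown", 0), total))
print(known)
```

This keeps the full tally around, so you can still report how many rejects came from unresolvable hosts while plotting only the rest.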
Bar Charts
CairoPlot makes pretty bar charts, too.
Unfortunately, CairoPlot's various methods aren't consistent about their input,
and bar_plot wants a list, not a dictionary.
No problem! Just convert that dictionary to two
lists, one for the labels and one for the data, and
call bar_plot (Figure 3):
h_labels = [ k for k in rejected.keys() ]
rejlist = [ rejected[k] for k in rejected.keys() ]
CairoPlot.bar_plot ('bars', rejlist, 500, 400, border=5,
                    three_dimension=True, h_labels=h_labels)
figure 3
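The label and data lists come out in whatever order the dictionary iterates. If you'd rather have the bars ordered from most to fewest rejects, a small variation (an assumption about what reads best, not something bar_plot requires) is to sort the dictionary items by count first and then split them into two parallel lists:

```python
# Sample counts, a subset of the rejected dictionary shown earlier
rejected = {'il': 35, 'br': 16, 'com': 24, 'pl': 7, 'net': 4}

# Sort (tld, count) pairs by count, largest first
pairs = sorted(rejected.items(), key=lambda item: item[1], reverse=True)

# Split the pairs into two parallel lists: labels and values
h_labels = [tld for tld, count in pairs]
rejlist = [count for tld, count in pairs]
print(h_labels)
print(rejlist)
```

The two lists can then be passed to bar_plot exactly as in the call above; because both come from the same sorted pairs, each label stays lined up with its value.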
Again, you can pass a list of colors if you want
custom colors, and there are a few other options available,
like background, grid, rounded_corners, h_bounds and v_bounds,
and of course v_labels as well as h_labels.
Of course, CairoPlot can do other types of
graphs as well. There's some documentation
online, or you can start the interactive Python interpreter and
type:
>>> import CairoPlot
>>> help(CairoPlot.pie_plot)
Eventually CairoPlot may move to SourceForge and have a more organized website. But in the meantime, if you experiment a bit, you'll find it's one of the best packages around for making pretty, colorful graphs.
Akkana Peck is a longtime Linux programmer,
and the author of Beginning GIMP: From Novice to
Professional.