Sunday, January 16, 2011

Book review: Pro Python System Administration

Summary: Pro Python System Administration is a comprehensive book showing how Python can be used effectively to perform a variety of system administration tasks.  I would recommend it highly to anyone having to do system administration work.  For more information, please consult the author's web site.
===

There is a saying that "no good deed goes unpunished".  I feel that a counterpart should be "no bad talk goes unrewarded".  At PyCon 2009, I gave a talk on plugins that has to be among the worst presentations I have ever given. Yet, as an unexpected result of that talk, I received a free copy of Pro Python System Administration, written by Rytis Sileika. This blog entry is a review of that book.

This book is written for system administrators, a field in which I have no experience; therefore, this review will definitely not have the depth that an expert might have given it.

Four general areas of system administration are covered: network management, web server and web application management, database system management, and system monitoring.  Examples are given on a Linux system, with no indication as to whether a given example is applicable to other operating systems. Given that Python works with all major operating systems, and that the book focuses on using Python packages, I suspect that the content would be easily adaptable to other environments.

While the book is classified, on the cover, as being addressed to advanced Python programmers, the author indirectly suggests in the book's introduction that it would be appropriate for people who have some minimal experience with Python.  I suspect that the classification on the book cover was done (wrongly) by the editor, as I found the examples very readable and I would not claim to be an advanced Python programmer.

The book is divided into 13 chapters, each focused on one or a few well-defined tasks. While the tasks in a given chapter are relatively independent of those of other chapters, there is a natural progression in terms of topics introduced and it is probably better to read the book in the natural sequence rather than reading chapters randomly - in other words, this book is not simply a random collection of recipes for a series of tasks.

    1. Reading and Collecting Performance Data Using SNMP

In this chapter, Sileika introduces the Simple Network Management Protocol, or SNMP, after which he shows how one can query SNMP devices using Python and the PySNMP library. In the second part of that chapter, he introduces RRDTool, an application for graphing monitoring data, and shows how to interact with it using the rrdtool module. In the last section of this first chapter, he shows how to create web pages using the Jinja2 templating system, with the eventual goal of displaying monitoring data on the web.

    2. Managing Devices Using the SOAP API 

In this chapter, Sileika introduces the Simple Object Access Protocol or SOAP and gives examples based on using the Zolera SOAP Infrastructure (ZSI) package.  The bulk of the chapter focuses on explaining how to manage and monitor Citrix Netscaler load balancers. The Python logging module is introduced and used.


    3. Creating a Web Application for IP Address Accountancy  

In this chapter, the Django framework is introduced and used to build a web application that maintains IP address allocations on an internal network.  Rather than using the web server included with Django, Sileika shows how to use Django with the Apache web server.

    4. Integrating the IP Address Application with DHCP

This chapter is a continuation of the previous one, where the application previously developed is enhanced with the addition of Dynamic Host Configuration Protocol (DHCP) services as well as a few others. More advanced Django methods are included as well as some AJAX calls.

    5. Maintaining a List of Virtual Hosts in an Apache Configuration File

Another Django-based application is introduced, this time with a focus on the administration side.
  
    6. Gathering and Presenting Statistical Data from Apache Log Files

This chapter focuses on building a plugin-based modular framework to analyze log files.  The content of this chapter is the reason why I received a free copy of the book in the first place: Sileika mentioned to me in an email that the architecture described was mostly inspired by the presentation I gave at PyCon, with a few modifications that allow for information exchange between the plugin modules.  When I got the original email, I was really surprised, given that I had tried to forget about the talk I had given on plugins.

In my opinion, Sileika does an excellent job in explaining how plugins can be easily created with Python and used effectively to design modular applications.

Dynamic discovery and loading of plugins is illustrated, and the GeoIP Python library is used to find the physical location corresponding to a given IP address.
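To give a flavour of what dynamic discovery and loading can look like, here is a minimal sketch of my own; it is not the code from the book. It assumes a hypothetical convention in which every plugin is a plain .py file in one directory and exposes a top-level function named process; the names discover_plugins and run_plugins are also made up for this illustration. It uses Python 3 standard library modules only.

```python
import importlib.util
from pathlib import Path


def discover_plugins(directory):
    """Import every *.py file in `directory` and return {name: module}
    for the modules that expose a callable named `process`.

    The `process` convention is an assumption made for this sketch.
    """
    plugins = {}
    for path in sorted(Path(directory).glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helper modules
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the plugin's top-level code
        if callable(getattr(module, "process", None)):
            plugins[path.stem] = module
    return plugins


def run_plugins(plugins, record):
    """Pass one parsed log record to every plugin and collect the results."""
    return {name: module.process(record) for name, module in plugins.items()}
```

With this arrangement, adding a new analysis means dropping a new file into the plugin directory; the main program does not need to be edited.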
 
    7. Performing Complex Searches and Reporting on Application Log Files

This chapter shows how to use Exctractor, an open source log file parser tool, to parse log files that are more complex than the Apache server logs covered in the previous chapter. Of note are the introduction and usage of Python generators, as well as an introduction to parsing XML files with Python.
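Generators are a natural fit for this kind of work because a log file can be processed one entry at a time without loading it entirely into memory. The following is a small sketch of my own, not taken from the book or from Exctractor. It assumes a hypothetical log format in which each entry starts with a date such as 2011-01-16 and any other line (a stack trace, for example) continues the previous entry.

```python
import re

# Assumed format for this sketch: a new entry starts with "YYYY-MM-DD ".
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def entries(lines):
    """Yield complete log entries, joining continuation lines (such as
    stack-trace lines) onto the entry that precedes them."""
    current = []
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line) and current:
            yield "\n".join(current)
            current = []
        current.append(line)
    if current:
        yield "\n".join(current)


def matching(pattern, items):
    """Lazily keep only the items in which the regular expression is found."""
    regex = re.compile(pattern)
    return (item for item in items if regex.search(item))
```

Since an open file is itself an iterable of lines, something like matching(r" ERROR ", entries(open(path))) would chain the two steps, and nothing is read until the result is iterated over.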
 
    8. A Web Site Availability Check Script for Nagios 

This chapter shows how to use Python with Nagios, a network monitoring system. Python packages/modules illustrated include BeautifulSoup, urllib and urllib2.  The monitoring scripts developed do more than simply check for site availability: they actually test the web application logic.
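For readers who have never seen a Nagios check, here is a rough sketch of my own showing the general shape of one; it is not the script from the book. Nagios plugins report their result through the process exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) and a line of text. The sketch uses the Python 3 module urllib.request instead of urllib2, and the function names, the choice of thresholds and the "expected text" test are all assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Exit codes used by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(status, body, expected_text):
    """Turn an HTTP status and page body into (exit_code, message)."""
    if status != 200:
        return CRITICAL, "CRITICAL - HTTP status %d" % status
    if expected_text not in body:
        # The site answers, but the application did not produce the
        # content we expected (assumed here to deserve a warning).
        return WARNING, "WARNING - page loaded but %r not found" % expected_text
    return OK, "OK - page loaded and contains %r" % expected_text


def check_site(url, expected_text, timeout=10):
    """Fetch `url` and evaluate the response; network errors are CRITICAL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return evaluate(response.status, body, expected_text)
    except urllib.error.HTTPError as exc:
        return CRITICAL, "CRITICAL - HTTP status %d" % exc.code
    except (urllib.error.URLError, OSError) as exc:
        return CRITICAL, "CRITICAL - %s" % exc


def main(argv):
    """Print the status line and return the exit code for Nagios."""
    if len(argv) != 3:
        print("UNKNOWN - usage: %s URL EXPECTED_TEXT" % argv[0])
        return UNKNOWN
    code, message = check_site(argv[1], argv[2])
    print(message)
    return code
```

A real check script would end with sys.exit(main(sys.argv)) so that Nagios sees the exit code. Keeping evaluate separate from the network call makes the decision logic easy to test without a live web site.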

    9. Management and Monitoring Subsystem 
    10. Remote Monitoring Agents
    11. Statistics Gathering and Reporting  

Chapters 9, 10 and 11 explain how to build a "simple" distributed monitoring system.  A number of Python modules and packages are used, including xmlrpclib, SimpleXMLRPCServer, CherryPy, sqlite3, multiprocessing, ConfigParser, subprocess, NumPy, matplotlib and others.  The application developed is a good example of using Python as a "glue language" to tie together third-party modules and packages.  One weakness is that the introduction to statistics included is rather elementary and, I believe, could have been shortened considerably given the intended audience.

    12. Automatic MySQL Database Performance Tuning 

This chapter revisits the use of plugins, with a slightly more advanced application, where information can be exchanged between the different plugins.

    13. Using Amazon EC2/S3 as a Data Warehouse Solution

This last chapter gives a crash course on Amazon's "cloud computing" offerings.  It is a good final note to the book, and a good starting point for future explorations.


Overall, I found the book quite readable even though it is outside of my area of expertise.  Occasionally, I had the feeling that there were a few "fillers" (e.g. overlong log listings) that could have been shortened without losing anything of real value.  This is very much an applied book, with real-life examples that could be either used "as-is" or used as starting points for more extended applications.

I would recommend this book highly to anyone who has to perform any one of the system administration tasks mentioned above. It is also a good source of non-trivial examples demonstrating the use of a number of Python modules and packages.  The code that is given would likely save many hours of development.   As is becoming the norm, the source code included in the book is available from the publisher.  Also, many of the prototypes covered in the book are available as open source projects at the author's web site, http://www.sysadminpy.com, where a discussion forum is also available.