Writing Apache Modules with Perl and C
Writing Apache Modules with Perl and C | |
author | Li |
pages | 724 |
publisher | O'Reilly, ISBN: 156592567X |
rating | 9.5/10 |
reviewer | darren chamberlain |
summary | Absolutely essential for anyone who is considering using Apache and mod_perl. C programmers may need more. |
The Scenario
If you're like me, your first introduction to Perl was in the form of CGI scripts. A few years ago, I inherited a few dozen ancient CGI scripts (Perl and otherwise) that required Immediate Attention. CGI led to Perl, and to Apache; Perl and Apache led, naturally enough, to mod_perl, once I started hitting the performance bottlenecks inherent in CGI programming. After researching mod_perl, building a mod_perl-enabled Apache, and reading all the available online documentation, I got it up and running--and I was suitably impressed.
So, when O'Reilly announced a book devoted to programming Apache with Perl, I was extremely excited. The book starts with an introduction and history of web programming, introduces CGI and other types of web programming (server APIs, such as ISAPI and NSAPI; embedded processors, such as mod_perl, mod_dtcl, and mod_pyapache; FastCGI; Java servlets; ActiveX; and client-side scripting languages, such as VBScript and JavaScript), and then describes the Apache module architecture, using some simple examples ("Hello, World" in Perl and in C). Then it gets good, covering dynamically generated content; the hobgoblin of HTTP, state; and all the other stuff that gives CGI programmers nightmares (like authentication and authorization).
What's Bad?
Although the title reads '... with Perl and C', the emphasis is very obviously on Perl. The C API reference chapters (chapters 10 and 11, pages 505 through 631) are very thorough, but almost all the examples are in Perl only. In fact, the authors go so far as to recommend that almost all Apache modules be written in Perl, and not C, except for very small modules or modules that need that extra speed boost or small memory footprint of being compiled into the server (page 13: "Anything you can do with the C API you can do with mod_perl with less fuss and bother."). Their reasoning is sound: mod_perl modules and scripts require a server restart at most, and often not even that, while for C modules, Apache itself must be recompiled; but I was expecting more in this area, perhaps a larger section on using DSO. After the book was published, however, several of the Perl-only examples were ported to the C API, and are available for download.
A few of these examples have already been published, and in these cases the book is mostly redundant. Notably, the Apache::NavBar module (which Lincoln uses on the server in his lab) and the Apache::AdBlocker module (chapters 4 and 7), appeared in The Perl Journal last year (issues 12 and 11). This is not that big a deal, since both of these modules are incredibly useful and probably deserve to be published in a few more places, but two brand new modules would have been most welcome, especially since the book's target audience probably also reads The Perl Journal.
What's Good?
There's a lot to like here. Since I'm a Perl programmer by trade and disposition, I personally liked the fact that 99.9% of the examples were written in Perl. With only a few exceptions, the modules could be copied into the right locations and run immediately; the exceptions were the modules that made use of either other programs (Chapter 5's Hangman program which uses a relational database to store state information) or specialized Apache features (Chapter 7's Apache::AdBlocker module, which requires proxy functionality).
Much of the text and all of the source code is available on the web at www.modperl.com. Chapters 6, 7, 8, and 9 can be found on the web site for the book, as can all the Perl modules and some of the examples in functional form (Apache::Magic and hangman).
Chapter 9 is the key chapter, and the heart of the book. It describes in great detail all the Apache:: modules. If you use mod_perl at all, download and print this chapter. Memorize it. Use your favorite indexing script to make it searchable. Everything you need to know about mod_perl is here in this chapter.
The appendices are also excellent, although, because it is an Apache book, I would have figured that several of the sections would be regular chapters, and not relegated to the end. The appendices are divided pretty evenly between concentrating on Perl and on C, unlike most of the rest of the book.
So What's In It For Me?
Fortunately for people like me, there is a lot of information about mod_perl on the web; The Perl Journal has had several articles on it, WebMonkey has had an article or two, and so on. There is a comprehensive mod_perl developer's guide on the official Apache/Perl site. Lincoln Stein uses it a lot on his site and in his software. And, of course, we have the man pages and perldocs. So why do we need a book?
A few reasons. First and foremost, few of those sources go into the kind of detail that this book does, while still being approachable. Second, the book focuses on Apache, programming Apache, and (to a lesser extent) programming applications on the web; Perl and C are the means here, not the end. The in-depth technical discussions are about Apache: how it translates URIs to filenames, how it handles subrequests and internal redirects, how it maps files to MIME types. The book then presents techniques for usurping these functions and customizing each phase of the response process, and explains when and why you would want to do this instead of letting Apache do its own thing. Creating checksums on the fly, compressing and decompressing data, creating extremely flexible HTML preprocessors, and modifying outgoing and incoming headers are just some of the given examples.
The reference chapters are probably the single most valuable thing about the book. If you are a Perl programmer on a budget, you can download chapter 9 from the web site, but the C programmers out there have to buy the book to get the C API reference. The C reference is 2 chapters (126 pages) long, and covers all the functions in precise detail.
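To make the URI-to-filename step a little more concrete, here is a minimal sketch in C of the general idea. It is not Apache's actual code and not an example from the book: the function name `translate_uri` and the document root used below are hypothetical, and it ignores aliases, URL-encoded characters, and per-directory configuration, all of which a real server has to deal with.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, NOT part of the Apache API: map a request URI onto
 * a path under doc_root. Returns 0 on success, -1 if the URI is rejected
 * or the result does not fit in out. */
int translate_uri(const char *doc_root, const char *uri,
                  char *out, size_t out_size)
{
    size_t path_len;
    int written;

    /* Only absolute URI paths are handled in this sketch. */
    if (uri == NULL || uri[0] != '/')
        return -1;

    /* Crude safety check: refuse any URI containing "..", so the result
     * cannot climb out of doc_root (this also rejects harmless names
     * such as "a..b"). */
    if (strstr(uri, "..") != NULL)
        return -1;

    /* The path part ends where the query string, if any, begins. */
    path_len = strcspn(uri, "?");

    /* Join document root and URI path; "%.*s" copies at most path_len
     * characters of uri. */
    written = snprintf(out, out_size, "%s%.*s", doc_root, (int)path_len, uri);
    if (written < 0 || (size_t)written >= out_size)
        return -1;

    return 0;
}
```

Under these assumptions, translating `/docs/index.html?x=1` against a root of `/var/www/htdocs` yields `/var/www/htdocs/docs/index.html`, while `/../etc/passwd` is rejected. A custom translation handler of the kind the book describes would replace exactly this sort of default mapping with its own rules.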
For those among you who are using Microsoft operating systems, the book pays special attention to building, installing, and configuring mod_perl and Apache on Win32 systems, where it is different from Unix and Unix-like systems. Most of the actual modules are very similar (except for the obvious ones, such as scripts that call sendmail and the scripts that access MySQL), but the installation and building of mod_perl (or ApacheModulePerl.dll) are very different. The process is described in enough detail to make it possible, without boring those readers to whom it is irrelevant.
Conclusion
Programming Apache/mod_perl without this book is like writing Perl without the camel book. It can be done, but it is much easier and more enjoyable with the book. The writing is clear, informative, straight-forward, and, at times, amusing. The authors are the definitive sources for information on mod_perl and CGI programming, and this is reflected in every aspect of the book. While not as definitive for C programmers, it is still the best Apache API reference out there, other than the actual source code itself.
Purchase this book at Amazon.
Errata
Table of Contents
- Server-Side Programming with Apache
- A First Module
- The Apache Module Architecture and API
- Content Handlers
- Maintaining State
- Authentication and Authorization
- Other Request Phases
- Customizing the Apache Configuration Process
- Perl API Reference Guide
- C API Reference Guide, Part I
- C API Reference Guide, Part II
- Standard Noncore Modules
- Building and Installing mod_perl
- Building Multifile C API Modules
- Apache:: Modules Available on CPAN
- Third-Party C Modules
- HTML::Embperl--Embedding Perl Code in HTML
hmmm.... (Score:1)
Book schmook! (Score:2)
----
Dave
All hail Discordia!
Re:Book schmook! (Score:4)
I think the idea is that the Perl interpreter is loaded at startup as part of the Apache process. The Perl programs are also compiled just once at startup. Once you've done this, running modules written in Perl simply involves interpreting bytecode, which although not as fast as C, is probably fast enough for most applications. Process creation overhead and loading / compiling scripts is usually the real killer for performance, not executing them.
Besides, how much time does the machine spend in the Perl script, and how much calling Apache API functions? And how relevant is any of this, given that the biggest bottleneck is often bandwidth, not CPU time?
It was a good introduction (Score:3)
Conflict of interest? (Score:2)
Has anyone else noticed that O'Reilly have their own web server [oreilly.com] which competes with Apache?
I expect the publishing and software divisions are kept separate, to avoid the IBM syndrome of products being squashed / crippled to avoid 'cannibalizing' sales of products from another division. But it still seems a bit strange.
Great Book (Score:2)
I knew many of the things discussed in it, but the added detail of the chapters taught me many new things. If you have access to a mod_perl server to develop on, this book will fill your head with great ideas for features, design strategies, and even does a great job of cataloging "fun" CPAN modules out there for the taking.
Re:hmmm.... (Score:1)
DBI is a system independent interface (Score:1)
I'm not sure how the book examples access MySQL, but I use DBI. Scripts run on NT or unix without modification. Otherwise, DBI would be pointless.
heh (Score:2)
This summer I was an intern at Cold Spring Harbor Biological Labs where Dr. Stein works as a bioinformatician! I got some help from him a bunch of times and worked with some of his postdocs.
We also heard a presentation from him regarding his internet interface to the DB of the C. elegans genome. He's a nice guy and something of an interesting character, and definitely knows his perl!
Respectfully,
Kevin Christie
kwchri@maila.wm.edu
PS - Perl rules!!!
To moderate, or not to moderate ... (Score:2)
Perl is also a no-no (in mod_perl or straightforward standalone guise) for very heavily loaded sites. At Yahoo!, Perl is considered too resource hungry for use on the frontline webservers.
This leaves you in the unenviable situation of writing leakless, bugless C or C++ code. Catch 22 time
Chris Wareham
Yeah really. (Score:1)
What kind of hand holding is next? Apache Module Wizard integrated into bash?
Let's face it; most computer books are written purely for profit. Particularly ones about dreary, passionless, narrowly-defined topics like writing extensions to a particular application.
This is an excellent book for C module writers (Score:1)
This book makes a great complement to online docs for C module writers. I'm also on the Apache module writers mailing list and I happen to know that most of the other people on that list refer to this book often -- it is the de facto bible for Apache module writers who use C.
A vote against the Everything links (Score:1)
Just my $0.02
Help the authors make money (Score:3)
There are links to Amazon.com and O'Reilly.
Cheers,
-jwb
Re:It was a good introduction (Score:1)
Nevertheless, I plan on picking up a copy of this book. 8)
--jeddz
p.s. every time I think about moderating a topic, I end up posting to it!
Re:Conflict of interest? (Score:2)
Note that WebSite Pro only runs on NT/98/95, whereas Apache runs on whatever you can build it on. And O'Reilly use Solaris for www.ora.com, linux.ora.com (the latter is definitely running Apache for the webserver, and the former cannot be running WebSite) and others; check out the Netcraft details for the Ora sites [netcraft.com].
WebSite Pro does look to be quite a nice product, and should displace IIS as a good server for these platforms (NT etc).
Re:Book schmook! (Score:1)
Then could you share with us where on the web you have found useful documentation for the C API? I sure would like to find such a thing, and do not believe that it exists until someone proves otherwise. The parts of the API spec [apache.org] in the online manual [apache.org] that are finished are fine; but there are very large and important areas that are not covered at all. In some places, the author evidently did not get past his outline (here's [apache.org] an example).
This is a gap that evidently needs to be filled in by the book.
Why anyone would want to burden their server with modules written in Perl is beyond me though.
I have written an Apache module in C and quite a few handlers in mod_perl, and the advantage of mod_perl is just the same as that of any Perl programming over C -- you get a lot more done a lot more quickly. I can get a handler done in mod_perl in an hour that would probably take me all day if I wrote it in C. A C module forces you to spend too much time on memory management, string twiddling and core-dumping, and Perl is a great relief from all that.
To be sure, sometimes you need to squeeze every last bit of speed out of your software, and that's when you probably need C instead of Perl. That's why I wrote the Apache module. But if you need to program to the Apache API in a hurry, mod_perl's the thing.
A great Apache book in general... (Score:1)
Re:DBI is a system independent interface (Score:1)
The book uses DBI in their database examples. I'm sure everyone is also aware of the Apache::DBI module, which keeps persistent database handles available for each child process of apache.
Care to enlighten us with your recommendations (Score:1)
How do you suggest people generate dynamic web pages?
Re: (Score:1)
Care to enlighten us with your recommendations (Score:1)
How do you recommend people generate dynamic web pages.
Apache Modules (Score:1)
I *REALLY* disagree with the author's assertion that apache modules should be written in perl. Many apache modules end up being glue into an existing system. Most of the benefit of being an apache module goes away if it consists of perl code that calls 'system' on existing programs. For peak performance, the existing code must be glued directly into apache, which means using C.
Re:Care to enlighten us with your recommendations (Score:1)
What, me? You're not asking me, are you?
Well, anyway, at the moment I'm developing a web application using CGI and Perl, together with a handful of useful libraries such as CGI.pm. Later, I can move it to something like mod_perl (or in this case, PerlIIS) to increase performance.
I think the most important decision is how you will store your data. Will you use an SQL database, a flat file, serialization (something like the Perl Storable module) or even something funky like OpenLDAP?
If you want SQL, it might be good to go for something like PHP or ASP that has extensive SQL support 'built-in'. Of course Perl has SQL modules too, but it's probably not quite as easy (I haven't used PHP / ASP). If you don't, you have a much freer choice. For groups I really cannot say what to use, but if you're working on your own, just use your favourite high-level language. It probably isn't worth learning a new scripting language just for Web development - there are too many already.
If you don't already know a scripting language, go out and learn Perl at once. Yes, I know Python and many others are a lot cleaner, but Perl is fun to learn, there are lots of good books on it, and you'll probably end up having to use it someday anyway.
definitely recommend this book (Score:1)
My recommendations (Score:2)
The actual web pages tend to be HTML hardcoded into C and C++ programs, with the dynamic stuff coming from the database or memmapped files. For instance, I am currently writing a reporting system. This is a C++ database load program that uploads the tables once every 24 hours. The searching is done by several C programs tailored to the individual search being performed - in other words one program for editors, another for authors. The nearest thing to 'templates' that it uses is a static library that has output routines for various headers, footers and standard menus.
This is a little bit more laborious than using say PHP3, or mod_perl. However, it is blisteringly fast and efficient.
One reason I tend to shy away from Perl besides the performance or resources issue, is the question of maintainability. It is very easy to get the job done quickly in Perl. It's also easy to write terribly unreadable code. One of the systems that I am replacing is simply line noise and a bunch of cron jobs. The other does absolutely no error checking, and has been missing many errors in the data feed for the last two years.
You may argue that the issue of Perl code maintainability is down to the authors of the original systems, but Perl encourages quick hacks. When these hacks go into production they end up being a nightmare to maintain or enhance.
Chris Wareham
Re: Agreed, C is lighter on the httpd daemon (Score:1)
Amazon uses CGI programs written in C, too. (Score:1)
"Amazon.com has a market capitalization of $5.75 billion (August 10, 1998). They built their site with compiled C CGI scripts connecting to a relational database. You could not pick a tool with a less convenient development cycle. You could not pick a tool with lower performance (forking CGI then opening a connection to the RDBMS). They worked around the slow development cycle by hiring very talented programmers. They worked around the inefficiencies of CGI by purchasing massive Unix boxes ten times larger than necessary. Wasteful? Sure. But insignificant compared to the value of the company that they built by focusing on the application and not fighting bugs in some award-winning Web connectivity tool programmed by idiots and tested by no one."
Re:Care to enlighten us with your recommendations (Score:2)
Re:Care to enlighten us with your recommendations (Score:1)
Re:Care to enlighten us with your recommendations (Score:1)
Re:OT: Apache mod_jserv (Score:1)
So... if you know Java, but not Perl, you should use mod_jserv.
--