
High Performance Web Sites

Posted by samzenpus
from the heavy-duty-net dept.
Michael J. Ross writes "Every Internet user's impression of a Web site is greatly affected by how quickly that site's pages are presented to the user, relative to their expectations, regardless of whether they have a broadband or narrowband connection. Web developers often assume that most page-loading performance problems originate on the back-end, and thus the developers have little control over performance on the front-end, i.e., directly in the visitor's browser. But Steve Souders, head of site performance at Yahoo, argues otherwise in his book, High Performance Web Sites: Essential Knowledge for Frontend Engineers." Read on for the rest of Michael's review.
High Performance Web Sites
Author: Steve Souders
Pages: 168
Publisher: O'Reilly Media
Rating: 9/10
Reviewer: Michael J. Ross
ISBN: 0596529309
Summary: 14 rules for faster Web pages
The typical Web developer, particularly one well-versed in database programming, might believe that the bulk of a Web page's response time is consumed in delivering the HTML document from the Web server, and in performing other back-end tasks, such as querying a database for the values presented in the page. But the author quantitatively demonstrates that, at least for what are arguably the top 10 sites, less than 20 percent of the total response time is consumed by downloading the HTML document. Consequently, more than 80 percent of the response time is spent on front-end processing, specifically, downloading all of the components other than the HTML document itself. In turn, cutting that front-end load in half would improve the total response time by more than 40 percent. At first glance, this may seem insignificant, given how few seconds or even tenths of a second it takes for the typical Web page to appear using broadband. But any delays, even a fraction of a second, add up and reduce the satisfaction of the user. Improved site performance thus benefits not only the site visitor, in terms of faster page loading, but also the site owner, with reduced bandwidth costs and happier visitors.
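
The arithmetic behind that 40 percent figure can be checked with a short sketch. The function name and the round 80/50 inputs are illustrative assumptions based on the review's numbers, not measurements or code from the book.

```python
def total_improvement(front_end_share: float, front_end_reduction: float) -> float:
    """Return the fraction by which total response time shrinks.

    front_end_share: fraction of total response time spent on the front end
    front_end_reduction: fraction of that front-end time that is eliminated
    """
    for value in (front_end_share, front_end_reduction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("both arguments must be fractions between 0 and 1")
    # Only the front-end portion shrinks; the back-end portion is unchanged.
    return front_end_share * front_end_reduction


# Assumed round numbers from the review: 80% front end, cut in half.
saving = total_improvement(0.80, 0.50)
print(f"Total response time improves by {saving:.0%}")
```

Because the reviewed sites spend more than 80 percent of their time on the front end, the same halving yields more than 40 percent; halving a 20 percent back-end share, by contrast, could save at most 10 percent.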

Creators and maintainers of Web sites of all sizes should thus take a strong interest in the advice provided by "Chief Performance Yahoo!," in the 14 rules for improving Web site performance that he has learned in the trenches. High Performance Web Sites was published on 11 September 2007, by O'Reilly Media, under the ISBNs 0596529309 and 978-0596529307. As with all of their other titles, the publisher provides a page for the book, where visitors can purchase or register a copy of the book, or read online versions of its table of contents, index, and a sample chapter, "Rule 4: Gzip Components" (Chapter 4), as a PDF file. In addition, visitors can read or contribute reviews of the book, as well as errata — of which there are none, as of this writing. O'Reilly's site also hosts a video titled "High Performance Web Sites: 14 Rules for Faster Pages," in which the author talks about his site performance best practices.

The bulk of the book's information is contained in 14 chapters, with each one corresponding to one of the performance rules. Preceding this material are two chapters on the importance of front-end performance, and an overview of HTTP. Together these form a well-chosen springboard for launching into the performance rules. In an additional and last chapter, "Deconstructing 10 Top Sites," the author analyzes the performance of 10 major Web sites, including his own, Yahoo, to provide real-world examples of how the implementation of his performance rules could make a dramatic difference in the response times of those sites. These test results and his analysis are preceded by a discussion of page weight, response times, YSlow grading, and details on how he performed the testing. Naturally, if and when a reader peruses those sites, checking their performance at the time, the owners of those sites may have fixed most if not all of the performance problems pointed out by Steve Souders. If they have not, then they have no excuse, if only because of the publication of this book.

Each chapter begins with a brief introduction to whatever particular performance problem is addressed by that chapter's rule. Subsequent sections provide more technical detail, including the extent of the problem found on the previously mentioned 10 top Web sites. The author then explains how the rule in question solves the problem, with test results to back up the claims. For some of the rules, alternative solutions are presented, as well as the pros and cons of implementing his suggestions. For instance, in his coverage of JavaScript minification, he examines the potential downsides to this practice, including increased code maintenance costs. Every chapter ends with a restatement of the rule.

The book is a quick read compared to most technical books, not just because of its relatively small size (168 pages), but also because of its writing style. Admittedly, this may be partly the result of O'Reilly's in-house and perhaps outsourced editors, oftentimes the unsung heroes of publishing enterprises. This book is also valuable in that it offers the candid perspective of a Web performance expert, who never loses sight of the importance of the end-user experience. (My favorite phrase in the book, on page 38, is: "...the HTML page is the progress indicator.")

The ease of implementing the rules varies greatly. Most developers would have no difficulty putting into practice the admonition to make CSS and JavaScript files external, but would likely find it far more challenging, for instance, to use a content delivery network, if their budget puts it out of reach. In fact, differences in difficulty levels will be most apparent to the reader when he or she finishes Chapter 1 (on making fewer HTTP requests, which is straightforward) and begins reading Chapter 2 (content delivery networks).

In the book's final chapter, Steve Souders critiques the top 10 sites used as examples throughout the book, evaluating their performance and, specifically, how they could improve it by implementing his 14 rules. In critiquing the Web site of his employer, he apparently pulls no punches, though few are needed, because the site ranks high in performance versus the others, as does Google. Such objectivity is appreciated.

For Web developers who would like to test the performance of the Web sites for which they are responsible, the author mentions in his final chapter the five primary tools that he used for evaluating the top 10 Web sites for the book, and, presumably, used for the work that he and his team do at Yahoo. These include YSlow, a tool that he created himself. Also, in Chapter 5, he briefly mentions another of his tools, sleep.cgi, a freely available Perl script that tests how delayed components affect Web pages.

Like any work, this book is not perfect. In Chapter 1, the author could make clearer the distinction between function and file modularization, as otherwise his discussion could confuse inexperienced programmers. In Chapter 10, the author explores the gains to be made from minifying JavaScript code, but fails to do the same for HTML files, or even explain the absence of this coverage, though he does briefly discuss minifying CSS. Lastly, the redundant restatement of the rule at the end of every chapter could be eliminated, if only in keeping with the spirit of improving performance and efficiency by reducing reader workload.

Yet these weaknesses are inconsequential and easily fixable. The author's core ideas are clearly explained; the performance improvements are demonstrated; the book's production is excellent. High Performance Web Sites is highly recommended to all Web developers seriously interested in improving their site visitors' experiences.

Michael J. Ross is a Web developer, freelance writer, and the editor of PristinePlanet.com's free newsletter.

You can purchase High Performance Web Sites from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

Comments:
  • by Anonymous Coward on Wednesday October 10, 2007 @02:57PM (#20930429)
    Everyone knows that Rule 34 on the internet is "If it exists, there is porn of it".
  • by NickFitz (5849) <slashdot@nickfiH ... minus herbivore> on Wednesday October 10, 2007 @02:58PM (#20930441) Homepage

    does this sound suspiciously like an advertisement for YSlow in book form?

    What's suspicious about the fact that a book written by the creator of YSlow addresses the very issues that YSlow, a free open source Firefox extension, addresses? It would be pretty strange if it didn't.

    If you want to be so paranoid about the intentions of an author, at least find one it's reasonable to be suspicious about in the first place.

  • by QuietLagoon (813062) on Wednesday October 10, 2007 @02:59PM (#20930453)
    ... if Yahoo's website were not dog slow all the time.
  • by redelm (54142) on Wednesday October 10, 2007 @03:02PM (#20930493) Homepage
    If you're responsible for the response time of some webpages, then you've got to do your job! First test a simple static webpage for a baseline.

    Then every added feature has to be justified -- perceived added value versus cost-to-load. Sure, the artsies won't like you. But it isn't your decision or theirs. Management must decide.

    For greater sophistication, you can measure your download rates by file to see how much is in users' caches. And decide whether these are also not a cause of slowness!

  • by morari (1080535) on Wednesday October 10, 2007 @03:08PM (#20930585) Journal
    I hate that the typical webpage assumes that everyone has broadband these days. The finesse and minimalist approach of yesteryear no longer applies. Even with broadband at 100%, smaller is always better. No one wants to put the effort in that would go toward efficiency though.
  • by Dekortage (697532) on Wednesday October 10, 2007 @03:11PM (#20930633) Homepage

    All my pages are static HTML. Not a web application in site, not even PHP. Yes, it's a drag when I need to do some kind of sitewide update, like adding a navigation item.

    Umm... there are plenty of content management systems (say, Cascade [hannonhill.com]) that manage content and publish it out to HTML. Even Dreamweaver's templating system will do this. Just because you use pure HTML, doesn't mean you have to lose out on sitewide management control.

  • Odd Summary (Score:4, Insightful)

    by hellfire (86129) <`moc.liamg' `ta' `vdalived'> on Wednesday October 10, 2007 @03:24PM (#20930815) Homepage
    Web developers often assume that most page-loading performance problems originate on the back-end, and thus the developers have little control over performance on the front-end, i.e., directly in the visitor's browser. But Steve Souders, head of site performance at Yahoo, argues otherwise in his book, High Performance Web Sites: Essential Knowledge for Frontend Engineers."

    Let's correct this summary a little bit. First, it's NOVICE Web developers who would think this. Any web developer worth their weight knows the basic idea that java, flash, and other things like it make a PC work hard. The website sends code, but the PC has to execute the code, rather than the website pushing static or dynamic HTML and having it simply render. We bitch and moan enough here on slashdot about flash/java heavy pages, I feel this summary is misdirected as if web developers here didn't know this.

    Secondly, there's no argument, so Steve doesn't have to argue with anyone. It's a commonly accepted principle. If someone didn't learn it yet, they simply haven't learned it yet.

    Now, I welcome a book like this because #1 it's a great tool for novices to understand the principle of optimization on both the server and the PC, and #2 because it hopefully has tips that even the above average admin will learn from. But I scratch my head when the summary makes it sound like it's a new concept.

    Pardon me for nitpicking.
  • Re:gzip (Score:4, Insightful)

    by cperciva (102828) on Wednesday October 10, 2007 @03:28PM (#20930875) Homepage
    Unlike bzip2, gzip is a streaming compression format; so the web browser can start parsing the first part of a page while the rest is still being downloaded.
  • by lgordon (103004) <larry.gordon@ g m a i l .com> on Wednesday October 10, 2007 @03:28PM (#20930885) Journal
    Getting rid of banner ads at the source is what causes most page loading time, and it's usually a fault of the browser renderer than anything else. A lot of times these javascript ad servers are horrible performance wise. It can also be the fault of the ad networking company when their servers get overloaded, causing undue delay before the ad is served to the client. Something to think about when choosing ad placement on a site.

    Putting an adblocker of some sort or Mozilla Adblock Plus is a great way to speed up any page (from the user's point of view, of course).
  • by spikeham (324079) on Wednesday October 10, 2007 @03:39PM (#20931109)
    In the mid-90s Yahoo! pared down every variable and path in their HTML to get the minimum document size and thus fastest loading. You'd see stuff in their HTML like img src=a/b.gif and a minimum of spaces and newlines. However, back then most people had dialup Internet access and a few KB made a noticeable difference. In the past few years, mainstream Web sites pretty much assume broadband. Don't bother visiting YouTube or MySpace if you're still on a modem. Aside from graphics and videos, one of the main sources of bloat is Web 2.0. Look at the source of a Web 2.0 site, even Yahoo!, and often you see 4 times as many bytes of Javascript as HTML. All that script content not only has to be retrieved from the server, but also takes time to evaluate on the client. Google is one of the few heavily visited sites that has kept their main page to a bare minimum of plain HTML, and it is reflected in their popularity. If you visit a page 10 times a day you don't want to be slowed down by fancy shmancy embedded dynamic AJAX controls.

    - Spike
    Freeware OpenGL arcade game SOL, competitor in the 2008 Independent Games Festival: http://www.mounthamill.com/sol.html [mounthamill.com]
  • Sort of... (Score:3, Insightful)

    by Roadkills-R-Us (122219) on Wednesday October 10, 2007 @03:41PM (#20931149) Homepage
    It's really irrelevant whether they actually understand the real problem or not when what they do is broken. I don't care if they really don't know or just have a mandate from someone who doesn't know or if they're just too clueless to realize that what happens on their high end system on their high speed LAN has little to do with what Jenny and Joey Average see at home on their cheap Compaq from WalMart with about half the RAM it should have for their current version of Bloated OS. The end result is the same.

    And, in fact, a lot of web site developers fit one, two or three of the above categories. It's not just novices. A ridiculous percentage of websites suck performance-wise, and it's not just the myspaces and hacked up CMSes and such; a lot of corporate sites fall into this category as well, from financial institutions to ebay to auto manufacturers and dealers to swimming pool installation companies.
  • "Web developers often assume that most page-loading performance problems originate on the back-end, and thus the developers have little control over performance on the front-end,"

    Those Web designers should be called "Unemployed"
  • by arete (170676) <areteslashdot2@x[ ]net ['ig.' in gap]> on Wednesday October 10, 2007 @04:02PM (#20931433) Homepage
    As a solution to speed alone, the right answer (as some other posts mentioned) is a CMS/publishing solution that makes static HTML pages once on a change. The most braindead way to do this is to put an aggressive squid/apache cache in front of your server, and only refresh the cache every half-hour or on demand; nobody gets to go directly to the dynamic site and you have a minimal investment in the conversion. But certainly just using an automated system to write-out HTML files works too.

    Using AJAX you have to also remember that you're giving away all your code - and that any user with GreaseMonkey can REWRITE your code to do whatever they want. So your scenario only works out if 100% of your application data for all time is supposed to be viewable (at least) by all users. (Which is not to mention a significant number of other AJAX security potholes.)

    Use AJAX to save page refreshes (eg Google Maps) - and only that. For any real world app, your server needs to control your data.

    And if you need help implementing this, drop me a reply ;)
  • by guaigean (867316) on Wednesday October 10, 2007 @04:05PM (#20931471)

    No one wants to put the effort in that would go toward efficiency though.
    That's not an accurate statement. A LARGE amount of time is spent on the very big sites to maximize efficiency. It is the largest of sites that truly see the benefits of optimization, as it can mean very large savings in fewer servers, bandwidth fees, etc. A better statement might be "People with low traffic sites don't want to put the effort in that would go toward efficiency though."
  • by SirJorgelOfBorgel (897488) * on Wednesday October 10, 2007 @04:40PM (#20931979)
    I have read a large number of excerpts (one for every paragraph) of this book in response to a mention of this book in the #jquery IRC channel. A few people were very much anticipating this book. A lot of discussion followed on some of the subjects. Of course, this book makes some very good points, like how the front-end speed is important and only partially dependent on server response times. I will not go into the specifics (I could write a book myself :D), but on some things, you might think the author is smoking crack.

    I have looked at the book again now, and there seem to have been some changes. For example, there were only 13 rules when I was reviewing those before. Now there are 14. As one example, ETags were advised to not be used at all (IIRC, my biggest WTF about the book - if used correctly, ETags are marvellous things and complement 'expires' very nicely), instead of the current 'only use if done correctly'. Some other things are nigh impossible to do correctly crossbrowser (think ETag + GZIP combo in IE6, AJAX caching in IE7, etc). To be honest, I found pretty much all of this stuff being WebDevelopment 101. If you're not at the level that you should be able to figure most of these things out for yourself, you probably won't be able to put them into practise anyway, and you should not be in a place where you are responsible for these things.

    I might pick up this book just to read it again, see about the changes and read the full chapters, just to hear the 'other side of the story', but IMHO this book isn't worth it. In all honesty, the only thing I got out of it so far that I didn't know is the performance toll CSS expressions take (all expressions are literally re-evaluated at every mouse move), but I hardly used those anyways (only to fix IE6 bugs), and in response have written a jQuery plugin that does the required work at only the wanted times (and I've told you this now, so no need to buy the book).

    My conclusion, based solely on the fairly large number of excerpts I've read, is: if you're a beginner, keep this book off for a while. If you're past the beginner stage but your pages are strangely sluggish, this book is for you. If you've been around, you already know all this stuff.
  • by shmlco (594907) on Wednesday October 10, 2007 @05:12PM (#20932477) Homepage
    Not to mention that that particular approach is probably a huge no-no when it comes to accessibility and search indexing. I mean, do you really expect Google to run all of your scripts when it spiders your page?
  • by shmlco (594907) on Wednesday October 10, 2007 @05:24PM (#20932661) Homepage
    Ads from third-party sites. Scripts and trackers from third-party sites (like Google Analytics or page counters). Scripted web page widgets from third-party sites.

    Basically anything that's not under your control can slow your site down significantly.
  • Re:Solution (Score:3, Insightful)

    by dsginter (104154) on Wednesday October 10, 2007 @06:56PM (#20933735)
    Wow - this is wonderful, constructive feedback. But allow me to make some suggestions on your wording. For example, the following statement:

    sounds good, except you may or may not know that a lot of javascript implementations are sloooow. not to mention you usually have to set the no cache headers for everything in the page so your javascript works right.

    I find that sites built with the method you describe are the asshole sites that fuck with browser history, disable the back button, try to disable the context menu, and those dumb ass tricks to get around the fact they don't know how to write proper server side code.


    Could be reworded as follows:

    AJAX isn't quite mature and it is still slow on those Walmart PCs, so I suggest that, in lieu of the AJAX client, you simply apply a stylesheet to the XML with XSLT to provide the best of both worlds. But mature AJAX toolkits (such as GWT) are improving and do a speedy rendering job while adequately managing the browser history and other nuances of the UI.

    ...And the following...

    There's no reason you can't make a fast serverside site (with ajax too, that works without the stupid tricks I described above), if you can't I suggest you educate yourself, or don't use a wallmart PC for production use.

    ...could be reworded as...

    I do most of my work in server-side work. I will disregard the evidence that was provided about the test environment from "years ago" and instead insult you as if you were my nemesis, instead of someone that I have never met prior to this discussion.

    Fixed that for you.
  • by kyofunikushimi (769712) on Wednesday October 10, 2007 @07:02PM (#20933815) Homepage
    And the ads don't slow things down at all?

    Also, don't the ads call some sort of script? I wouldn't call that static.
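
The point made in the gzip comment above, that a gzip stream can be consumed while it is still arriving, can be illustrated with a minimal Python sketch. The page content, the 16-byte chunk size, and the function name `stream_decompress` are all assumptions made for illustration; this shows incremental decompression in general, not how any particular browser implements it.

```python
import gzip
import zlib


def stream_decompress(chunks):
    """Yield decompressed bytes as each compressed chunk arrives."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decoder.decompress(chunk)
        if data:
            yield data
    # Emit anything still buffered once the input is exhausted.
    tail = decoder.flush()
    if tail:
        yield tail


# Hypothetical page body, compressed once and then "sent" in small pieces.
page = b"<html><body>" + b"<p>hello</p>" * 200 + b"</body></html>"
compressed = gzip.compress(page)
pieces = [compressed[i:i + 16] for i in range(0, len(compressed), 16)]

received = b""
for part in stream_decompress(pieces):
    received += part  # a consumer could begin parsing `part` at this point

assert received == page
```

The decoder never needs the whole compressed body in memory at once, which is the property the commenter relies on.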
