
High Performance MySQL

Posted by timothy
from the turning-the-tables dept.
swsuehr (Steve Suehring) writes "Knowing that Jeremy Zawodny works with MySQL in a highly demanding environment at Yahoo! Finance, I have to admit that I was somewhat anxious to crack open a copy of the book High Performance MySQL published by O'Reilly and written by Jeremy and Derek J. Balling. With such high expectations, it would be easy to be disappointed with the content or level of the material in the book. Far from disappointing, High Performance MySQL was exactly what I had hoped it would be and more." Read on for the rest of Suehring's review.
High Performance MySQL
author: Zawodny & Balling
pages: 294
publisher: O'Reilly
rating: Excellent
reviewer: Steve Suehring
ISBN: 0596003064
summary: Optimization, Backups, Replication, Load Balancing & More

With a title like High Performance MySQL, the reader should not expect assistance with tasks such as installation or the basics of SQL. The authors acknowledge that the book isn't meant for the MySQL beginner. Even so, the book does spend a little time going over a refresher of MySQL. If the reader is looking for a book on the basics of MySQL, though, I'd recommend picking up a different title. I might be so brazen as to suggest looking at the book on MySQL that I wrote a couple years ago, but I'll leave it for the reader to find an appropriate title for their level of knowledge with MySQL.

The book is broken out into ten chapters, beginning with a chapter on the basics of MySQL as seen from someone with some level of MySQL experience. (And since I am someone with some level of MySQL knowledge, this chapter didn't seem to be too painful a read.) That statement calls for further explanation. I find it painful to read books that claim to have advanced topics but find it necessary to spend (or waste) time giving yet another explanation of how to type ./configure; make; make install in the beginning chapter. In this regard, I find High Performance MySQL to be quite good at providing the appropriate level of explanation for its audience.

The first pages alone are a great starting point for those wondering about performance as it relates to the different versions of MySQL. A discussion of the binary versions of MySQL versus the source-code versions gives the reader a good understanding of the implications of this important decision. Choosing correctly between the versions of MySQL can help maximize performance.

Though only the first chapter is titled "Back to Basics," most chapters include some basic or background information pertinent to that chapter. For example, both the query and index performance-related chapters include sections with the word 'Basic' in their titles. These two chapters, "Index Performance" and "Query Performance," cover topics often misunderstood when considering performance on a database. Naturally, any database administrator worth their pay (insert your own joke here) will tell you that an index can help to improve performance. But High Performance MySQL takes it a step further by diving into why you might use a particular index given your database needs. This type of practical information is common throughout the book. Rather than just give the reader a listing of the indexes (which are mostly common among RDBMS anyway), the authors choose to tell you why you might choose one method over another.

Other chapters within High Performance MySQL cover server performance tuning, replication, load balancing and high availability, backup and recovery, and security. I've followed MySQL forums and mailing lists over the years, and some of the more common questions relate to MySQL security. The chapter on security gives the reader a great explanation of the MySQL GRANT tables and includes a sidebar on the significance of 'localhost' in MySQL, which nearly always seems to be a source of confusion, though maybe not for the reader of this particular book. Like other chapters, the security chapter includes tips for the underlying environment upon which MySQL operates. For example, regardless of how well the MySQL database permissions are configured, if another piece of software on the server isn't kept up to date, then the entire server, including the MySQL data, could be compromised.
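The 'localhost' wrinkle comes down to MySQL treating user@host as the unit of identity in the grant tables, so 'dbuser'@'localhost' and 'dbuser'@'%' are two distinct accounts that can carry different privileges. A sketch of what that looks like (account and database names are invented; the IDENTIFIED BY form shown is the MySQL 4.x-era syntax contemporary with the book):

```sql
-- Connecting over the local Unix socket matches the 'localhost' account;
-- a TCP connection from another machine matches the '%' account instead.
GRANT SELECT, INSERT ON appdb.* TO 'dbuser'@'localhost' IDENTIFIED BY 'secret';
GRANT SELECT         ON appdb.* TO 'dbuser'@'%'         IDENTIFIED BY 'secret';
FLUSH PRIVILEGES;
```

The usual confusion is that a user who can do everything from the shell on the server suddenly loses privileges when connecting remotely, because an entirely different grant row is in effect.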

Looking back at the book, I might have laid the chapters out in a different order or provided more structure through sections of the book. However, this can be dismissed as personal opinion and is non-substantive anyway. I believe High Performance MySQL is an excellent choice for the MySQL administrator or developer looking not only to improve their database performance but also to improve their knowledge of MySQL.


You can purchase High Performance MySQL from bn.com. (And the book that Suehring alludes to is the MySQL Bible.) Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

  • Re:No Thanks (Score:3, Insightful)

    by turboflux (781551) on Wednesday October 20, 2004 @02:39PM (#10578151)
    You would almost think that data integrity would come before tuning for high performance, wouldn't you?
  • by cowgoesmoo2004 (762366) on Wednesday October 20, 2004 @02:41PM (#10578174) Journal
    Okay, as an industry professional, I tire of books calling themselves "for professionals". Don't most professionals already know about index types and choosing their usage? If they don't, why are they calling themselves professionals?

    The important thing is, does it tell you how to set up and manage a highly distributed system managing gigabytes of data under high transaction volumes?

    If it doesn't, and just discusses the general issues to consider under such conditions, it is worthless for those that are already professionals. Professionals know how to consider these things, they just want concrete examples so they don't have to spend months of reading and tinkering to figure it out on their own.

    Unfortunately, reading the review above doesn't even let me know whether or not the book has anything in it at a level that I might want. However, if I was writing the review I'd probably order the paragraphs differently and use some different words. That might just be personal preference.... :p

  • by Atzanteol (99067) on Wednesday October 20, 2004 @02:41PM (#10578175) Homepage
    That's true of all databases (not just MySQL). Inserting into a table that has an index carries a cost to index the new data.
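That per-insert index cost is easy to observe directly. A rough illustration with SQLite (table names are invented; the printed timings vary by machine, so treat them as indicative only):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE plain (v INTEGER)")
cur.execute("CREATE TABLE indexed (v INTEGER)")
cur.execute("CREATE INDEX idx_v ON indexed (v)")

rows = [(i,) for i in range(50_000)]

# Inserts into the unindexed table only write the table itself.
t0 = time.perf_counter()
cur.executemany("INSERT INTO plain (v) VALUES (?)", rows)
t_plain = time.perf_counter() - t0

# The same inserts into the indexed table also maintain the B-tree index.
t0 = time.perf_counter()
cur.executemany("INSERT INTO indexed (v) VALUES (?)", rows)
t_indexed = time.perf_counter() - t0

print(f"no index: {t_plain:.3f}s, with index: {t_indexed:.3f}s")
```

This is also why bulk-loading tools often drop indexes first and rebuild them after the load.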
  • by JAgostoni (685117) on Wednesday October 20, 2004 @02:43PM (#10578201) Homepage Journal
    I find MySQL to be very high in performance when queries are written concisely and correctly. Heck, I have seen (and written) some Oracle PL/SQL with horrific performance, only to rewrite it concisely and correctly and have it perform very well indeed.

    The engine is only going to run as well as the crap you (developers) put into it. I would go so far as to say that, simply due to the pervasiveness of MySQL, there are far more horrible script examples out there than for PostgreSQL.

    Personally, I use the correct tool for the job. Blog? Hell yeah, use MySQL. OLAP? Hell no. But you could make it work, and work well, if you wanted to.
  • Apples and oranges (Score:5, Insightful)

    by Pan T. Hose (707794) on Wednesday October 20, 2004 @03:00PM (#10578417) Homepage Journal

    You can keep "High Performance MySQL". I'm holding out for "Incredible Data Integrity Management with MySQL". :-D

    High Performance and Incredible Data Integrity? You are comparing apples and oranges. Make up your mind. It's either "High Performance with MySQL" or "Incredible Data Integrity with PostgreSQL." You can't have your cake and eat it too. If you need real ACID (Atomicity, Consistency, Isolation, and Durability) then by all means have it, but realize that it isn't free. Please read about the relational model [wikipedia.org] and ACID [wikipedia.org]. Read about set theory [wikipedia.org] and predicate calculus [wikipedia.org], about tuple calculus [wikipedia.org] and relational algebra [wikipedia.org]. This is complicated stuff, and it is not cheap algorithmically. Most people don't need ACID and should not be penalized by the algorithms that implement it. Those who need it will have to pay with performance, because there is only so much you can do in one cycle. A pure ACID relational model is not for everyone; some people will only need a persistent object store instead of a relational database. The most important thing to realize is that databases are not magic. They implement complicated algorithms in code that has to be run by the CPU, and by definition cannot be a one-size-fits-all solution. So please don't oversimplify it by saying "I want everything without paying for anything, and also a pony." Use the right tool for the right job.

  • by esconsult1 (203878) on Wednesday October 20, 2004 @03:03PM (#10578452) Homepage Journal
    Is this high performance read only MySQL installations?

    The authors certainly could not be referring to a system that has medium to high levels of inserts or updates.

    While MySQL is great for reads, it sucks balls with inserts and updates... even with InnoDB, even with gobs of RAM and 15K SCSI hard drives in RAID 0 configurations on quad Opteron systems.

    And what a sucky review too... Nothing about what the book addresses.

  • by AKAImBatman (238306) * <akaimbatmanNO@SPAMgmail.com> on Wednesday October 20, 2004 @03:07PM (#10578500) Homepage Journal
    Touchy, aren't we? I think that most of us here understand core database concepts (including ACID) just fine. (At least, I hope most of us do.) The problem is that it has become all too common to use MySQL in situations where data loss and data integrity can be measured in real dollars. MySQL is completely the wrong tool for the job, and as such has been heavily criticized.

    In fact, it is very difficult to understand where MySQL fits in today's technological environment. On one hand you have truly powerful database servers that guarantee data safety. On the other hand you have simple embedded databases that run much faster than MySQL. Where does this leave MySQL? With nothing more than a lot of "it's Open Source!" momentum.

    High Performance and Incredible Data Integrity?

    I never said that. Please don't act like I did.
  • by Anonymous Coward on Wednesday October 20, 2004 @03:15PM (#10578578)
    It's because it has proven to be very competitive compared to text files and Berkeley DB. MySQL is great at exactly this: replacing simple text-based data management with basically the same thing, only with an SQL interface. It doesn't compare to actual databases, though, like Oracle, DB2, MSSQL, and PgSQL.
  • by cft_128 (650084) on Wednesday October 20, 2004 @03:28PM (#10578725)
    Those who have no choice and use open source projects that rely on it (e.g. www.ensembl.org), and/or those who deal with mostly read-only databases. It's not as bad as you make it out to be, and is good for some projects.

    It is generally faster than Postgres.

    It handles large datatypes more easily than Postgres (although Postgres, while odd, has some more flexibility).

    Postgres has an odd 8k limit per row (probably fixed by now, though).

    It supports SELECTs across databases.

    It has good replication support (with 4.x and later).

    It comes with some good admin tools in the normal distro.

    It has decent full-text indexing out of the box.

    It is really about choosing the right tool for the job, each has their advantages and disadvantages.

  • by Lazy Jones (8403) on Wednesday October 20, 2004 @03:31PM (#10578762) Homepage Journal
    One of our servers handled over a quarter of a billion queries in a month-and-a-half, and it still has capacity to spare.

    OK, that explains a lot. 250 million queries in 45 days is normal for a small company like us and definitely peanuts for Yahoo - i.e. I'd say this allows the conclusion that nothing particularly important or demanding (i.e. heavily used compared to Yahoo's overall traffic) is built using MySQL at Yahoo. Just to put this into perspective, a little dual Xeon box here does more than 5 million *INSERTs* every day and a large number of SELECTs (we update each of our currently 2.4 million product prices between 1 and 20 times per day - the exact number of INSERTs per day is currently unknown to me). We use PostgreSQL 7.4.5 though, so we don't consider this a particularly impressive feat ;-)

  • by fatboy (6851) * on Wednesday October 20, 2004 @03:36PM (#10578805)
    As a result, REAL alternatives like PostgreSQL were just ignored when they came on the scene. After all, you have to perform *gasp* maintenance on Postgres! *sigh*

    When PostgreSQL came on the scene I attempted to use it, but there was little in the way of documentation. I went the MySQL route for my little database projects because it had better docs at the time. I never decided MySQL was "1337" and that all other databases were "evil".

    It had good docs, was cheap, and was good enough to get the job done.

    MySQL, if you can read this, you're using it right now.
  • Re:No Thanks (Score:3, Insightful)

    by AKAImBatman (238306) * <akaimbatmanNO@SPAMgmail.com> on Wednesday October 20, 2004 @03:40PM (#10578835) Homepage Journal
    For a moment there, I thought you meant that Yahoo was processing company financial data with MySQL. Then I realized [mysql.com] that they use it just for displaying market data and the like.

    Give me a heart attack, why don't you?!
  • Re:No Thanks (Score:2, Insightful)

    by fitten (521191) on Wednesday October 20, 2004 @03:40PM (#10578836)
    Unfortunately, a lot of folks who think they know a lot about databases really don't. They don't know when MySQL is sufficient or not because they don't understand what databases need to do and/or provide. Most wouldn't know the difference between MySQL, a big honkin' text file, and a real RDBMS like Oracle. To them, they are all just places to fling a bunch of data into and query back later. Most wouldn't know what transactions really are or when you should use them, although if you ask them, they'll assure you that they don't need that "overhead" in their problem.
  • by borg1238 (692335) on Wednesday October 20, 2004 @03:55PM (#10578979)
    Damn it, where's the "no sense of humor" mod option?
  • Tripping on ACID (Score:4, Insightful)

    by yintercept (517362) on Wednesday October 20, 2004 @04:13PM (#10579159) Homepage Journal
    Truth is, most people want ACID (and transaction processing) because it saves them from having to think too much. You perform a whole bunch of data operations; if it doesn't work, you roll back. If you are just dropping things into a MySQL structure, you have to think harder about what will happen if an operation crashes midway.

    As for the parent post, he's just bought into the illusion that database gurus think at a transcendental level. Object gurus, business gurus, religious gurus, and political gurus are all treading on the same clouds. It is a common ailment that usually cures itself with time.
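The rollback convenience described above, letting the engine undo a half-finished batch instead of reasoning through every crash point yourself, looks like this in practice (SQLite as a stand-in for a transactional engine like InnoDB; names and amounts are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'alice'")
        raise RuntimeError("simulated crash midway through the transfer")
        conn.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'bob'")  # never reached
except RuntimeError:
    pass

# The debit was rolled back along with everything else in the transaction.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

Without the transaction, the half-applied debit would persist, and the application would have to detect and repair it on its own.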
  • by Gorath99 (746654) on Wednesday October 20, 2004 @04:29PM (#10579352)
    There is, of course, no tool that meets the needs of every possible scenario. Instead, there are tools that meet the needs of common scenarios. Most of what MySQL is being (improperly) used for could easily be replaced with PostgreSQL. Postgres offers similar performance, but takes a minor hit for data integrity. As a bonus, you gain sub-selects, stored procedures, SQL-92 compliance, and other features of a "real" SQL database.

    Actually, MySQL has been moving in that direction for quite some time now. MySQL 4.1 has subqueries [mysql.com] and MySQL 5.0 has stored procedures [mysql.com].

    Of course, MySQL still doesn't quite have the functionality of Postgres, especially in the data integrity department, so there's certainly truth to your claims. However, my point is that it's not standing still. It's certainly improving, so we can't just keep pointing out its past mistakes.
  • by jadavis (473492) on Wednesday October 20, 2004 @04:29PM (#10579358)
    I encourage you to examine PostgreSQL 8.0 upon its release, which will most likely be in November or December of this year.

    It fits your requirements perfectly without requiring the commercial license. Of course the installer bit depends also on your application, but it shouldn't be too hard to do. You may want to check out the pginstaller project here: http://pgfoundry.org/projects/pginstaller/ [pgfoundry.org].
  • by Not The Real Me (538784) on Wednesday October 20, 2004 @04:31PM (#10579393)
    Where does MySQL fit? Storing discussion forums, news sites, caching mailing lists, storing information, etc.

    I would never use MySQL for financial transactions, but for non-monetary usage it's a good general-purpose database, as long as you don't require complex business rules filled with lots of cascading actions.
  • by sql*kitten (1359) * on Wednesday October 20, 2004 @04:39PM (#10579490)
    If you need extreme performance on a small commercial site, why do you need great and CPU-expensive referential integrity? In my opinion, in this case, MySQL is better.

    OK, MySQL can be used as a cache. It's like Squid, where Oracle is like Apache. You can use Oracle to handle your actual transactions, integrity constraints, etc., then periodically dump out the data into MySQL and generate the web pages from there.

    I have Oracle databases with thousands of connected users, all doing both queries and transactions. Could you read the data quicker from MySQL? Probably. Can MySQL manage tens - sometimes hundreds - of thousands of locks at a time? Not a chance.
  • by tzanger (1575) on Wednesday October 20, 2004 @05:00PM (#10579760) Homepage

    You PgSQL nazis are just being ignorant. Do you think because MyISAM isn't transactional that the whole database isn't?

    I never said that; I have not seen a decent benchmark showing InnoDB having *any* kind of decent performance though. And it still does nothing to address the scalability issues of MySQL with hundreds of users, nor does it address the speed issues of MySQL in general when performing complex queries.

    MySQL's InnoDB offers the same level of data integrity as PgSQL does, and about the same performance.

    Show me. And I'm not talking "how fast can I do a thousand SELECT * FROM mytable;" -- good, real-life queries with WHERE clauses, maybe cursors and ordering.

    Most of PostgreSQL's gains are erased by MySQL 4.1's excellent code when it is combined with those OS's proper threading.

    Again, show me. Postgres' performance increases in 7.4.x are nothing to sneeze at, and certainly have nothing to do with the threading changes between 2.4 and 2.6.

    PgSQL will be doomed to a BSDesque life. Touted by too many people with no social skills, nobody will ever be able to see how good it is, because they just get shouted down when asking questions.

    That's funny; I see something similar with the general MySQL crowd. It's like watching a bunch of pizza-faced 13-year-olds drool over the latest and greatest nVidia card and try to convince their parents to buy the $500 video card to replace the $250 one they bought only 6 months ago so they can play Doom 3, when all the parents want is to balance the checkbook and get some real work done.

  • by harlows_monkeys (106428) on Wednesday October 20, 2004 @06:30PM (#10580693) Homepage
    Postgres offers similar performance, but takes a minor hit for data integrity

    I've got a few million reports that basically look like this: some header information of the form "variable=value" (about a dozen possible variables), and then one or more sections. Each section has a header in the same format as the report header, and from 0 to 5000 entries of the form "item:status:file", where item is one of about 5000 items, status is one of 5 status values, and file is the full path of a file containing information about the item.

    I want these in a database. So, I've got a table that represents reports, which gets the information from the header. Another table represents report sections, and has the information from the section headers, and references the report table. A third table represents the "item:status:file" lines. I've actually got a bunch more tables, because the database is in third normal form, but the above is the gist of it.

    With MySQL, I write the straightforward code to parse reports and insert into the database. It takes about a day to import all my reports, and querying them runs at decent speed.

    With PostgreSQL, it starts out about an order of magnitude slower at importing, and after four days, when it isn't anywhere near done, it is running even slower.

    PostgreSQL does seem to query a bit faster than MySQL (but see below), but that slowness in getting the data in is a killer.

    As far as queries go, PostgreSQL seems fast, if you've done "VACUUM ANALYZE" on the database. If you don't do that, it is incredibly slow. For example, if I have a simple table, indexed on an integer field, and do a select with "WHERE that_field = 123", it is fast and uses the index; but if I do "WHERE that_field IN (123, 456)", it doesn't use the index unless you've analyzed the table--it scans the whole freaking table. MySQL is smart enough to actually use an index without you having to run a special command first. Geesh!

    So, here's what it comes down to, and why sooooo many people choose MySQL over PostgreSQL: PostgreSQL seems to require that you know a fair amount about database stuff to actually make it work well, even if you aren't trying to do fancy stuff. MySQL, when you are doing simple stuff, just works, and works well.

    Oh, one more thing. When I was importing my few million records into MySQL, it barely loaded down my machine. With PostgreSQL, pretty early on, the disk light came on and stayed on, and doing things on the machine became painful. If that were a shared machine, say, a web server with shared hosting, replacing the MySQL in LAMP with PostgreSQL would be a disaster.
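The report/section/entry schema described at the top of this comment might be sketched roughly like this (a reconstruction from the comment's description only; all table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE report (
    id         INTEGER PRIMARY KEY
    -- plus one column per "variable=value" header variable in practice
);
CREATE TABLE section (
    id        INTEGER PRIMARY KEY,
    report_id INTEGER NOT NULL REFERENCES report(id)
    -- plus the section-header variables
);
CREATE TABLE entry (
    section_id INTEGER NOT NULL REFERENCES section(id),
    item       TEXT NOT NULL,   -- one of ~5000 items
    status     TEXT NOT NULL,   -- one of 5 status values
    file       TEXT NOT NULL    -- full path of the per-item detail file
);
""")
tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
print(tables)
```

With millions of reports feeding the entry table, the import path (and whether each row pays its own commit) dominates, which is exactly the issue the follow-up comments dig into.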

  • by wximagery (820020) on Wednesday October 20, 2004 @07:19PM (#10581120)

    "You can't have your cake and eat it too"

    Sure you can. It's called Oracle ... but that's a damn expensive cake.
  • by harlows_monkeys (106428) on Wednesday October 20, 2004 @08:28PM (#10581600) Homepage
    I'm going to forgo the opportunity to mod this fascinating debate to discuss this. You mentioned that PostgreSQL was "an order of magnitude" slower on imports -- my guess is that your import tool simply connected and started running INSERTs? It's a common-ish trap for new users. If there is no transaction running, each statement is its own transaction, with all the BEGIN and COMMIT overhead that goes with it. Wrapping your import in a BEGIN; ... COMMIT; pair would probably have sped it up considerably.

    Actually, I did have the complete processing for each report in a begin/commit pair. That was one of the reasons I was experimenting with PostgreSQL...with MySQL there was the annoyance of having to make sure the whole report was going to be OK before inserting any of it, so I couldn't just add to each table as I processed the report, because I might find out near the end that the report is bogus and want to discard it.

    In short, pgsql does require somewhat more effort to initially configure and maintain. On the other hand, for a lot of people it's very much worth that effort.

    Right... and for a lot of people, they need something that works in the common case without that effort, and so there's always going to be an important place for MySQL.

    What I'd really like to see is a good document included with PostgreSQL that says "So... you've got a big application that works just fine in MySQL, but you want to convert? Here's what you do" and covers all these things.
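The one-transaction-per-statement trap described above is easy to reproduce. A sketch with SQLite in autocommit mode (the same principle applies to PostgreSQL; timings vary by machine and are indicative only):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
cur = conn.cursor()
cur.execute("CREATE TABLE t (v INTEGER)")

rows = [(i,) for i in range(10_000)]

# One implicit transaction per INSERT: every statement pays commit overhead.
t0 = time.perf_counter()
for r in rows:
    cur.execute("INSERT INTO t (v) VALUES (?)", r)
t_auto = time.perf_counter() - t0

# One explicit transaction around the whole batch: the overhead is paid once.
t0 = time.perf_counter()
cur.execute("BEGIN")
for r in rows:
    cur.execute("INSERT INTO t (v) VALUES (?)", r)
cur.execute("COMMIT")
t_batch = time.perf_counter() - t0

print(f"autocommit: {t_auto:.3f}s, batched: {t_batch:.3f}s")
```

On a disk-backed database the gap is far larger than in memory, since each autocommit forces its own flush to durable storage.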

  • no (Score:2, Insightful)

    by ashpool7 (18172) on Wednesday October 20, 2004 @10:16PM (#10582233) Homepage Journal
    It seems to go like this...

    DBAs and those who actually know how to use all the features in the high-end database exploit the features in the database. PostgreSQL supports the most features of the free DBs and it's pretty fast when you use connection pooling compared to MySQL, so they go with it.

    People see this and immediately declare MySQL crap, dredge up all those reasons not to use it, and declare PostgreSQL the winnar! [sic] People love conflict....

    So, you can use whatever you want. I'm sure Yahoo! picked MySQL because of fast SELECTs and that's all that is really important to them. I pick PostgreSQL because I don't want to have to convert from MySQL to PostgreSQL because I need some feature.
  • by JamieF (16832) on Thursday October 21, 2004 @07:02AM (#10584661) Homepage
    > Prior to its switch to MySQL in 2000, Yahoo! Finance managed its database
    > needs through homegrown flat files and Berkeley DB databases. But those solutions
    > proved to be inflexible and not scalable enough for their needs, with the
    > volume of data nearly doubling every few months.

    So, a SQL database is better than flat files. Ergo, MySQL rules...?

    I'm glad that Yahoo has found a solution that works (or at least that the person who said that thinks it does) but that doesn't say anything about why MySQL is better or worse than another SQL database, or another kind of DBMS for that matter. Did they even do an evaluation, or did they just upgrade from DBM to MySQL and see no reason to consider other options?

    If Yahoo Finance loses a story, or a stock chart, or a user account, or an email, big whoop, it's a huge free service. If they can run 30% fewer database servers, that's more important. So it probably is a good choice for them. But if there were actual money in transit they'd probably want to use something else for that (there's no reason the content management couldn't stay in MySQL, of course).
  • by JamieF (16832) on Thursday October 21, 2004 @07:44AM (#10584803) Homepage
    >> Weird handling of '0' vs. null
    > The special handling of null forces you to write slightly less braindead code.

    Way to spin it, dude. Here's another way to look at it: The retardo handling of null in MySQL makes your work as a programmer harder. It's just a pissing match argument: only a wuss needs those training-wheels features. Real Programmers never make mistakes.

    create table foo (bar integer not null, biz integer not null);
    insert into foo (bar) values (1);
    select * from foo;
    +-----+-----+
    | bar | biz |
    +-----+-----+
    |   1 |   0 |
    +-----+-----+

    That's just the MySQL dudes deciding that silently doing the wrong thing is better than giving you an error message. Hmm, 0 isn't null, so technically the NOT NULL is not violated, but that's not what it does on other databases, and in SQL books, etc.

    This sort of thing is consistent with MySQL AB's attitude toward product design and standards compliance: we don't feel like implementing it because it's hard, and/or the people who are asking for it don't really need it, and/or we don't really understand the issue, so we'll just document it and that makes the problem go away.

    So, you have to learn all sorts of oddities about the way that MySQL differs from everything else, and that makes your job as a programmer harder, which means more bugs and slower coding.
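For contrast, here is how a stricter engine handles the same INSERT from the example above (SQLite shown as a convenient stand-in; PostgreSQL rejects it the same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (bar INTEGER NOT NULL, biz INTEGER NOT NULL)")

# The INSERT omits biz, which is NOT NULL and has no default; a strict
# engine refuses the row instead of silently storing 0 the way MySQL does.
try:
    conn.execute("INSERT INTO foo (bar) VALUES (1)")
    outcome = "accepted"
except sqlite3.IntegrityError as exc:
    outcome = f"rejected: {exc}"

print(outcome)
```

The error surfaces the bug at write time, which is the behavior the commenter is arguing for.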
  • by tgrigsby (164308) on Thursday October 21, 2004 @01:16PM (#10589185) Homepage Journal
    In short, pgsql does require somewhat more effort to initially configure and maintain.

    That extra effort to configure and maintain is why we went with MySQL. It's hella fast (and yes, that *is* a technical term), isn't plagued by the insecurities of MS SQL Server, and it's a breeze to install and configure -- don't even get me started on PostgreSQL's requirement that it be run from a non-Administrator account under Windows when EVERYONE with a default installation of Windows from NT 4.0 through XP is running as an Administrator.

    At the end of the day, the time and aggravation saved make MySQL the hands-down winner over PostgreSQL, its price/performance makes it the smack-down winner over Oracle, and the security, stability, and speed make it the no-lie DB God over MS SQL Server.

    Now if it just had Booleans as a native datatype...

  • by ttfkam (37064) on Sunday October 24, 2004 @04:10PM (#10615459) Homepage Journal
    This would be all well and good, but PostgreSQL isn't standing still either. By the time MySQL 5.0 stabilizes, PostgreSQL will have a host of new features that MySQL lacks.

    But that's fine. Not everyone needs all of the current features of PostgreSQL let alone all of the potential new ones.

    HOWEVER, MySQL needs to work on data integrity post haste. "my new purple crayon" being logically equal to 0 is not a missing feature; it's a data loss bug. No error. Not even a warning. A bug in software that tries putting a string into an integer field will have no notification that anything went wrong until after the data is retrieved later. It's things like this that keep me from liking MySQL, not its lack of features.

    Until then, when I need SQL and something small, I'll be using SQLite.
