Forgot your password?
typodupeerror
Data Storage Books Media Databases Programming Software Security Book Reviews

High Performance MySQL 307

Posted by timothy
from the turning-the-tables dept.
swsuehr (Steve Suehring) writes "Knowing that Jeremy Zawodny works with MySQL in a highly demanding environment at Yahoo! Finance, I have to admit that I was somewhat anxious to crack open a copy of the book High Performance MySQL published by O'Reilly and written by Jeremy and Derek J. Balling. With such high expectations, it would be easy to be disappointed with the content or level of the material in the book. Far from disappointing, High Performance MySQL was exactly what I had hoped it would be and more." Read on for the rest of Suehring's review.
High Performance MySQL
author Zawodny & Balling
pages 294
publisher O'Reilly
rating Excellent
reviewer Steve Suehring
ISBN 0596003064
summary Optimization, Backups, Replication, Load Balancing & More

With a title like High Performance MySQL, the reader should not expect assistance with tasks such as installation or the basics of SQL. The authors acknowledge that the book isn't meant for the MySQL beginner. Even so, the book does spend a little time going over a refresher of MySQL. If the reader is looking for a book on the basics of MySQL, though, I'd recommend picking up a different title. I might be so brazen as to suggest looking at the book on MySQL that I wrote a couple years ago, but I'll leave it for the reader to find an appropriate title for their level of knowledge with MySQL.

The book is broken out into ten chapters, beginning with a chapter on the basics of MySQL as seen from someone with some level of MySQL experience. (And since I am someone with some level of MySQL knowledge, this chapter didn't seem to be too painful of a read.) That statement begs for a further explanation. I find it painful to read books that claim to have advanced topics but find it necessary to spend (or waste) time by giving yet another explanation of how to type ./configure; make; make install in the beginning chapter. In this regard, I find High Performance MySQL to be quite good at providing the appropriate level of explanation for the audience.

The first pages alone are a great starting point for those who are wondering about the performance as it relates to the different versions of MySQL. A discussion of the binary versions of MySQL versus the source code versions gives the reader a good understanding of the implications of this important decision. Choosing correctly between the versions of MySQL can help to maximize performance.

Though only the first chapter is titled "Back to Basics," most chapters include some basic or background information pertinent to that chapter. For example, both the query and index performance-related chapters include sections with the word 'Basic' in their titles. These two chapters, "Index Performance" and "Query Performance," cover topics often misunderstood when considering performance on a database. Naturally, any database administrator worth their pay (insert your own joke here) will tell you that an index can help to improve performance. But High Performance MySQL takes it a step further by diving into why you might use a particular index given your database needs. This type of practical information is common throughout the book. Rather than just give the reader a listing of the indexes (which are mostly common among RDBMS anyway), the authors choose to tell you why you might choose one method over another.

Other chapters within High Performance MySQL include chapters on server performance tuning, replication, load balancing, and high-availability, backup and recovery, and security. I've followed MySQL forums and mailing lists over the years and some of the more common questions relate to MySQL security. The chapter on security gives the reader a great explanation of the MySQL GRANT tables and includes a sidebar on the significance of 'localhost' in MySQL which nearly always seems to be a source of confusion, though maybe not for the reader of this particular book. Like other chapters, the security chapter includes tips for the underlying environment upon which MySQL operates. For example, regardless of how good the MySQL database permissions are configured, if another piece of software on the server isn't kept up to date then the entire server, including the MySQL data, could be compromised.

Looking back at the book, I might have laid the chapters out in a different order or provided more structure through sections of the book. However, this can be dismissed as personal opinion and is non-substantive anyway. I believe High Performance MySQL is an excellent choice for the MySQL administrator or developer not only looking to improve their database performance but also to improve his knowledge of MySQL.


You can purchase High Performance MySQL from bn.com. (And the book that Suehring alludes to is the MySQL Bible . Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

This discussion has been archived. No new comments can be posted.

High Performance MySQL

Comments Filter:
  • Check your SQL.... (Score:5, Informative)

    by tcopeland (32225) * <{moc.dnalepoceelsamoht} {ta} {mot}> on Wednesday October 20, 2004 @02:43PM (#10578207) Homepage
    ...before blaming MySQL! Ha ha!

    Er, anyhow, as an apology, here's an open source SQL analyzer [postgresql.org].
  • by stevewz (192317) on Wednesday October 20, 2004 @02:49PM (#10578277) Homepage
    This review didn't really tell me anything that I couldn't have gleened by simply reading the Table of Contents. I get the impression the author either didn't actually read the book or he was too lazy to write more detail.

    Having said that, I'm intrigued by the title and will probably investigate the book anyway simply because it's a topic that directly pertains to some upcoming projects on my calendar.
  • by ttys00 (235472) on Wednesday October 20, 2004 @02:57PM (#10578378)
    ...and was surprised at a two things:

    1) The number of typos in the book. Didn't anyone proofread it before it went to print?
    2) How many features MySQL lacks. Don't get me wrong, I like it as its fast and easy to work with for web stuff, but I was surprised at how often I read "MySQL doesn't do x yet, but will in version y".

    The chapter on replication was worth the cost of the book for me, but I suggest people considering buying the book should wait until the 2nd edition.
  • by Anonymous Coward on Wednesday October 20, 2004 @02:58PM (#10578394)
    Why not? Yahoo finance has been using MySQL for the last four years [mysql.com].

    "Yahoo! uses the MySQL database to power many of the services on Yahoo! Finance (finance.yahoo.com), a popular web site that provides the full spectrum of financial services and information. MySQL is behind Yahoo! Finance's high-traffic area of news headlines, which is aggregated from over 100 sources worldwide, as well as its stock charts, insider trading, SEC filings, conference calls and earnings reports, among others. "

    "Prior to its switch to MySQL in 2000, Yahoo! Finance managed its database needs through homegrown flat files and Berkeley DB databases. But those solutions proved to be inflexible and not scalable enough for their needs, with the volume of data nearly doubling every few months. "Our needs vary considerably, and that is one reason that MySQL works well with us. It handles the high-volume, 260 million record tables just as well as the small, low-volume ones. One of our servers handled over a quarter of a billion queries in a month-and-a-half, and it still has capacity to spare.""

    "Since Yahoo! Finance started using MySQL, the database's popularity has grown within the company, and many of Yahoo's other world-wide properties are now using MySQL too. "We have used MySQL far more than anyone expected," stated Zawodny. "We went from experimental to mission-critical in a couple of months. Once others saw it, they jumped on board.""
    ---

    I wonder why many people here hate MySQL even when it has been proven to be a very competitive database. Is it because MySQL is a foreign product?
  • Re:mysql backup (Score:3, Informative)

    by smclean (521851) on Wednesday October 20, 2004 @02:59PM (#10578412) Homepage
    I'm stepping out on a limb here, but I think you will end up having to lock all tables to make a real consistent backup of your database where all tables relational data is gauranteed intact.

    I've never had to do this but your problem interested me. Think about it, how could you possibly make the database spit out a perfectly frozen-in-time set of tables without actually freezing in order to make the set?

    It's not like you'd be able to tell the database "Make me a backup of the database as it was at is right now, at 12:45:23.183848!" and the database could continue modifying the content of the tables while still producing for you a perfect backup. Even things programmed into the database to perform an action like this would take as much overhead as locking the tables first anyway, right?

    The replication slave would work, though.

  • by shirai (42309) * on Wednesday October 20, 2004 @02:59PM (#10578413) Homepage
    The answer is yes it does show you how to setup massive high transaction systems.

    Up until I picked up this book, I was not comfortable with using MySQL for the extremely high performance apps I have to write. Yes, they are web-based (meaning they fit into the MySQL appropriateness factor) but things like backups and scalability really scared me off of it.

    This book talks about indexes but doesn't focus on them as the magic bullet. Rather it is very specific with lots of need to know items of importance.

    Some things that it covers are:

    1. How to setup replication with different architectures including master/slave, master/slave/slave (another level), replication rings, dual master replication with slaves and more.

    2. RAID (software vs. hardware), RAID types, IDE vs. SCSI. I know. I'm sure you've thought of these things already but most people will find something new here because there is a lot on it. It is quite dense.

    3. Selection of an OS. Seriously. It actually compares threading models on OSes compared to performance with MySQL. This is great stuff.

    4. Selection of a filesystem on an OS. Yes, nitty gritty. Includes talk about journaling vs non-journaling benefits and warnings.

    5. Load balancing mySQL. How to do this through a load balancer. Why and how it's different than load balancing web servers.

    6. How to backup live using replication.

    7. I love this: Detailed information on how the replication system actually works. I love understanding the inner workings because then I know how/why things went wrong.

    8. The benefits of myISAM/InnoDB table types including how they store/retrieve data, the features available, etc.

    I've been optimizing databases for web use for some time and this isn't a book you should be tired of because it claims "high performance." Okay, maybe I'm a geek, but this book kept me up at night figuring out how I should be architecting a system.
  • by InsaneCreator (209742) on Wednesday October 20, 2004 @03:02PM (#10578444)
    Umm, why would you do this?

    Because using an index requires random data access which is more expensive than sequential access. There is a point where doing an index scan takes more time than just going over all the records in a table (example: small tables, condition matches many rows, etc). Any sane query planner should/will choose the faster method.
  • by tcopeland (32225) * <{moc.dnalepoceelsamoht} {ta} {mot}> on Wednesday October 20, 2004 @03:05PM (#10578488) Homepage
    > People can write some scary code and then
    > just blame it on the engine.

    Yup. There's a really good article by Stephane Faroult on OnLamp about writing better SQL... it's right here [onlamp.com].

    After reading that article, I went through some code I'd written and found some places where I was using DISTINCT incorrectly in exactly the way he described.
  • by ViolentGreen (704134) on Wednesday October 20, 2004 @03:08PM (#10578504)
    No subselects

    That is huge. I used MySQL about 6 months ago (though it wasn't the most recent version.) I was very disappointed. I use subqueries extensively. Since these were not available, I ended up with messy and slow joins.

    The non-standard SQL isn't too bad to figure out as long as you have google. There needs to be a SQL translator java applet or something.

    In addition, the version that I used didn't support views. Is this fixed now?

    Regardless, I gave it a chance. The next time I need database services for personal use I'm going with PostgreSQL.
  • Re:mysql backup (Score:1, Informative)

    by Anonymous Coward on Wednesday October 20, 2004 @03:10PM (#10578520)
    It's called Multi Version Concurrency Control. It's what databases like PostgreSQL and Oracle use to do just this.

    Most databases do this one way or another. Those that don't have MVCC use the transaction log to achieve the same result, though you can (AFAIK) onlyl do a binary dump of the pages in the database in that case. Wherease MVCC lets you do a logiacal dump.
  • Safari Rules (Score:2, Informative)

    by NardofDoom (821951) on Wednesday October 20, 2004 @03:12PM (#10578542)
    I was just about to run out and buy this, but then I realized that my safari subscription will get me the book for only $20/month.

    Best $120/year I've ever spent, that. And it's a tax write-off.

  • Re:mysql backup (Score:4, Informative)

    by fingon (114710) on Wednesday October 20, 2004 @03:12PM (#10578544)
    Check out LVM2 and snapshots.

    In principle, using LVM 2 system it's easy. Get database to consistent state on disk, take LVM snapshot of the logical volume involved, resume database, run dump on the snapshot, take out the snapshot when no longer needed.

    Beauty is that the size of snapshot is size of delta that is caused during the backup; therefore, this is practical for almost any sized database.
  • Re:mysql backup (Score:5, Informative)

    by shirai (42309) * on Wednesday October 20, 2004 @03:14PM (#10578561) Homepage
    Actually, lack of live backups was one of my biggest fears with using MySQL as well and the book actually does address this.

    The caveat: You need to set up replication. This is a good idea anyways for a high performance system in case the master crashes and you need another system available quickly.

    It works like this:

    1. Set up replication.
    2. Do a locked table backup on the slave.

    The nice things are the details like how the new version of MySQL uses two threads on the slave. One to copy the transactions to the local drive and the other to apply the transactions. Because of this, your local transaction log is up to date even if the transactions haven't been applied which subsequently means you don't have missing transactions if there is a crash on the master during the backup.

    This is seriously a good book. Okay, you might want to use PostgreSQL for its feature set but if you are doing write few read mostly with the web as the application, this book seriously gives you the confidence to use MySQL. Up until I read this book, I didn't know enough about MySQL performance implications to trust it for scalability. Now I know what I need to do to get it.
  • by dprust (316840) * on Wednesday October 20, 2004 @03:17PM (#10578602)
    Let's be fair -- technology is a very broad subject and nobody can know everything. A lot has to do with how a professional programs with a database. For example, a person may write applications that use an existing database for years, learning all kinds of complex queries and techniques without ever having the access to create an index. A professional is really one who gets paid to do something. I think a person who knows at least 80% of everything there is to know about a specific topic like MySQL would be called an "expert" instead.

    I totally agree on those books. When I'm going to look for a book and see the word "For Dummies", I think, "Hey, I'd be a dummy if I /wasn't/ looking to learn!"
  • by wowbagger (69688) on Wednesday October 20, 2004 @03:21PM (#10578653) Homepage Journal
    Just as in maths transforming the problem to a new domain might make it easier to solve, perhaps there is an easier way for you to solve this problem:

    Put your MySQL data onto a Logical Volume Manager volume. Use the LVM snapshot mechanism to insure a consistent view of the data during backup. Here's how:

    Create an LVM physical group with enough space to contain all your database, plus enough overcapacity to store any changes during the backup proceedure. So, let us say your database contains 10G of data, will take 10 hours to back up, and will, during that time, undergo revision to about 1G of that data - you create a PV of 11G or more.

    Next, create a logical volume big enough to store your data - in the case of the numbers above, create a 10G logical volume. In that volume create your file system, and set up your database.

    Now, when you wish to do a backup, lock out access to the database, and do a snapshot of the logical volume it is on. Then restore access to the database. This won't take very long at all.

    Mount the snapshot read-only and back it up. If you need to back up through the MySQL, bind a server onto the read-only data.

    Then release the snapshot.

    During the time the snapshot exists (in this example, the 10 hours to do the backup), the real, live read/write file system may be updated as desired, as long as the total differences between it and the snapshot do not exceed the reserve capacity of the physical group - in the case of the example numbers above that would be 1G of total differences (NOT 1G of writes - change the same 1k record a billion times and it is still a 1K change).

    That's one of the reasons the Big Boys (like Sun) pooh-poohed Linux - it did not have LVM. Now it does.
  • by ca1v1n (135902) <snook AT guanotronic DOT com> on Wednesday October 20, 2004 @03:24PM (#10578688)
    Sounds like you've been using PostgreSQL for quite a while, since your criticisms generally apply to rather old versions of MySQL. I hate all databases with equal passion, so I figured I ought to set the record straight.

    No data integrity

    MySQL has a modular design, allowing you to use any of several database backends, each with different design priorities. If data integrity is a priority for you, RTFM and pick the right backend.

    Completely non-standard SQL

    This is true of pretty much all DBMS's. MySQL's development cycle is quite delightfully quick, meaning that if people want certain features added, it doesn't take long to get them added.

    No extensibility in the engine (functions, stored procedures, etc.)

    Ummm... MySQL has those.

    No subselects

    Yes it does, as of version 4.1. It's not their fault if you're more than a full release behind.

    Weird handling of '0' vs. null

    The special handling of null forces you to write slightly less braindead code. This kind of handholding is becoming popular in the design of modern programming languages like Java and C#, with the recognition that the later in your development cycle you find a bug, the more it costs to fix it. Of course, there are still times when the programmer needs very explicit control, like when writing an OS kernel or a device driver. If you're writing an OS kernel or a device driver in MySQL, you've got bigger problems.
  • by tzanger (1575) on Wednesday October 20, 2004 @03:34PM (#10578788) Homepage

    I guess it's kind of humorous then that PostgreSQL beats the pants off of MySQL in any kind of moderate DB usage. MySQL can do a damn fine job for simple INSERTs and SELECTs but throw a hundred users at it or a few WHERE or ORDER BY clauses (or all of the above) and MySQL shows its true colours: Made by Fischer-Price. Throw in complex queries including subselects or try using views and... oh right, I forgot... It's not got any kind of real relational power behind it... my mistake!

  • by AKAImBatman (238306) * <akaimbatman@NosPAm.gmail.com> on Wednesday October 20, 2004 @03:36PM (#10578809) Homepage Journal
    But what are these alternatives you mention that are so much better suited for every possible scenario?

    There is, of course, no tool that meets the needs of every possible scenario. Instead, there are tools that meet the needs of common scenarios. Most of what MySQL is being (improperly) used for could be easily replaced with PostgreSQL. Postgres offer similar performance, but takes a minor hit for data integrity. As a bonus, you gain sub-selects, stored procedures, SQL-92 compliance, and other features of a "real" SQL database.

    MySQL is quite decent for something like a blog, but why even bother with a complete server in those cases? An embedded database such as Berkeley DB [sleepycat.com] or HSQL [sourceforge.net] (formerly Hypersonic) would provide better performance and would get rid of security issues inherent in running a complete database server.

    In still other instances, SQL databases are misused for large object data storage. In many of these instances, an Object Database such as ObjectStore [objectstore.com] (or your favorite open source choice of the 1000+ options) will provide better performance, without sacrificing much in the way of database management. (Standard database management tools are usually insufficient for dealing with databases containing large amounts of LOBs.)

    Basically, the choice in database and database technology should be carefully weighed against the application instead of saying "I know SQL and MySQL is 1337!"

    Some other database options include:

    DaffodilDB [daffodildb.com]
    SAP DB [sapdb.org]
    FireBird DB [sourceforge.net]
    Cloudscape [cloudscape.com] (Soon to be open source)
    Xindice [apache.org] (XML Database)
    ObjectDB [objectdb.com]
    DB4O [db4o.com]
    Prevayler [prevayler.org]
  • by AKAImBatman (238306) * <akaimbatman@NosPAm.gmail.com> on Wednesday October 20, 2004 @03:48PM (#10578916) Homepage Journal
    MySQL has a modular design, allowing you to use any of several database backends, each with different design priorities. If data integrity is a priority for you, RTFM and pick the right backend.

    As I understand it, InnoDB (which I assume is what you're referring to) costs money to acquire and as such is NOT part of the standard database engine. It's really just another database engine using MySQL as a facade.

    This is true of pretty much all DBMS's. MySQL's development cycle is quite delightfully quick, meaning that if people want certain features added, it doesn't take long to get them added.

    MySQL is the ONLY database that won't execute:
    select * from "My Table"
    Instead, you have to execute:
    select * from `My Table`
    Do you have ANY idea how much that screws up those of us in the database tools business?

    No extensibility in the engine (functions, stored procedures, etc.)

    Ummm... MySQL has those.


    In 5.0 Alpha. i.e. Not here yet.

    No subselects

    Yes it does, as of version 4.1. It's not their fault if you're more than a full release behind


    Again, 4.1 is a non-production release. The latest stable is 4.0.

    The special handling of null forces you to write slightly less braindead code.

    So you're saying that inserting zeros instead of nulls (or no value at all) for auto-increment columns leads to less brain-dead code? All it does for me is make my database tools that much more complicated to code.
  • Re:mysql backup (Score:3, Informative)

    by pHDNgell (410691) on Wednesday October 20, 2004 @03:59PM (#10579010)
    It's not like you'd be able to tell the database "Make me a backup of the database as it was at is right now, at 12:45:23.183848!" and the database could continue modifying the content of the tables while still producing for you a perfect backup.

    This is an example of why people who have used real databases are annoyed by mysql's undeserved popularity. People who have never seen a DB before build applications and processes around mysql's limitations and think that's the way things are.

    Why would you store data in something that can't do this?

    Sure Oracle (which does this among other things) is expensive, but people are pretty sure that if they put something in it, they can get it back out.

    Postgres (which does this among other things) is indisputably more free than mysql and outscales it without making it harder on the low end. (And I don't buy the embedded mysql thing...if you're embedding, embed sqlite which is still yet more free, smaller, and easier).
  • Re:mysql backup (Score:2, Informative)

    by Anonymous Coward on Wednesday October 20, 2004 @04:06PM (#10579087)
    % man pg_dump
    ...
    Description

    pg_dump is a utility for backing up a PostgreSQL database. It makes consistent backups even if the database is being used concurrently. pg_dump does not block other users accessing the database (readers or writers).

    ...
  • by ashpool7 (18172) on Wednesday October 20, 2004 @04:11PM (#10579137) Homepage Journal
    (from what I've read and observed)

    Because it's not a "real" database. Sacrifices too many features to be fast.

    Because of this list: http://sql-info.de/mysql/gotchas.html

    Because it's pawned off as an Oracle or DB2 replacement when it's not. There are instances where a database isn't using all the features of DB2 and in that case, MySQL or Postgres could be used, but that does not make it a "DB2 replacement."

    Because data corruption used to be the nom de guerre until InnoDB. There are still skeptics.

    Because there are no views.

    Somewhere (yes, I know not providing it makes this argument looks stupid. I honestly couldn't find it, since I didn't know which one said it or what precisely it said) there's a good quote about one of the MySQL founders saying something like "PRIMARY KEY is the same as a UNIQUE." This lack of understanding is not appreciated by actual database administrators... See http://www.phpfreaks.com/postgresqlmanual/page/key s.html
  • by smurfi (91140) <smurf@noris.de> on Wednesday October 20, 2004 @04:13PM (#10579147) Homepage
    InnoDB is free, it comes with the basic MySQL distribution and is GPLv2'd.

    You can change compatibilit settings, so you're mistaken on the ` vs. " problem.

    4.1 is in "gamma". Good enough if you ask me.

    Autoincrement is non-standard SQL anyway, so what do you expect?

    True -- if you absolutely cannot live without stored procedures, triggers, views and all the rest then MySQL may not (yet) be the right database for you. But nobody suggested otherwise.
  • by ajs (35943) <ajs@ a j s . c om> on Wednesday October 20, 2004 @04:23PM (#10579282) Homepage Journal
    The non-standard SQL isn't too bad

    MySQL's SQL is quite standard, and while there are differences [mysql.com], they are either the lack of a small number of features (like sub-selects, which you go into below) or very baroque trade-offs, which most users will not encounter.

    I suspect you're instead running into assumptions about what is and is not "standard" based on what you've used in the past. MySQL follows the ANSI SQL standard [mysql.com] as closely as all of the other databases I've used, having its own small quirks and LOTS of extensions, but basically doing as good a job of standards compliance as, say, gcc does with the ANSI C standard.

    I understand your frustration, though. I'm an old Sybase user, and I miss the Sybase extensions in many places.

    No subselects [...] That is huge

    It is?! Why? All a sub-select does is instantiate a temporary table on the fly. You can, of course, instantiate a temporary table on the fly, so what is a sub-select buying you?

    The only possible advantage would be the possibility of lazily evaluating a very large subselect, but I've come across that need in perhaps 2 or 3 situations in the course of 12 years of working with databases in production environments... how is this "huge"?

    In addition, the version that I used didn't support views.

    You're showing your lack of depth here. By definition a system that lacks sub-selects lacks views (as views are just named sub-selects).

    All that said, sub-selects are being added to MySQL, but people who pay for work on the core server don't ask for these features as much as certain others, and that's how the folks at MySQL decide what to put effort into.
  • by SpamapS (70953) on Wednesday October 20, 2004 @04:30PM (#10579366) Homepage
    You PgSQL nazis are just being ignorant. Do you think because MyISAM isn't transactional that the whole database isn't?

    MySQL's InnoDB offers the same level of data integrity as PgSQL does, and about the same performance. Where it wins is in the flexibility to choose table-by-table whether you want InnoDB, MyISAM, or the new NDBCLUSTER. There's also the universal availability and familiarity when it comes to the mysql protocol and libraries.

    As far as performance.. you might want to look at numbers on well designed threading systems such as Solaris or Linux 2.6. Most of PostgreSQL's gains are erased by MySQL 4.1's excellent code when it is combined with those OS's proper threading.

    The only place I see PgSQL winning the performance war is the query optimizer.

    PgSQL will be doomed to a BSDesque life. Touted by too many people with no social skills, nobody will ever be able to see how good it is, because they just get shouted down when asking questions.
  • by ajs (35943) <ajs@ a j s . c om> on Wednesday October 20, 2004 @04:59PM (#10579749) Homepage Journal
    Yeah, sub-selects are mostly just temporary tables.

    That is to say, if you:
    create temporary table foo select f.id from fiz f, fam m where f.name = m.name;
    select b.stuff from bar b, foo f where b.id = f.id;
    Then that's the same thing as:
    select b.stuff from bar b where b.id in (select f.id from fiz f, fam m where f.name = m.name)
    The only difference being that for sub-selects, since it's done server-side, you can optimize by evlauating the sub-select lazily. Some databases that support sub-selects do not do this optimization, but most do and in some cases it's a big win. It's just a rare enough case that it's not that big a deal for the general discussion of the value of sub-selects vs other features that people want.
  • by samjam (256347) on Wednesday October 20, 2004 @05:15PM (#10579906) Homepage Journal
    By the tone of your post you seem angry about this book, and scared to read it in case you learn something new and doubt what you have done in the past.

    1. How to setup replication with different architectures including master/slave, master/slave/slave (another level), replication rings, dual master replication with slaves and more.

    Simple, just read a few pages on the MySQL documentation. [mysql.com]

    Some people prefer printed book form, as some publishers are aware.

    2. RAID (software vs. hardware), RAID types, IDE vs. SCSI. I know. I'm sure you've thought of these things already but most people will find something new here because there is a lot on it. It is quite dense.

    What on earth does this have to do with MySQL specifically?

    You might benefit from learning about it if you are trying to optimise an SQL server?

    And furthermore, if you dont know about _RAID_, and need to be told the benefits of SCSI/Raid, do you really think you need MySQL clustering ?

    Well, once the reader has read these chapters they can make this judgement. You've manage to make this judgement for EVERYONE without reading the chapter for which I congratulate you.

    Any PC/Server with enough ram will push queries like a maniac. Also, check #4...

    queries... I think this book isn't just a book about getting fast queries.

    3. Selection of an OS. Seriously. It actually compares threading models on OSes compared to performance with MySQL. This is great stuff.

    Great, now that i've bought this book, i'm supposed to _CHANGE_ OS in my server farm, just to gain a few more queries/sec ?

    Steady there! Whoa! This isn't advertised as a book for people who are happy with what they have and know they did it right by instinct. This could just be a book for people who want to learn something before they decide what to do NEXT.

    4. Selection of a filesystem on an OS. Yes, nitty gritty. Includes talk about journaling vs non-journaling benefits and warnings.

    Filesystems? Oboy, anyone who knows about SQL performance knows that if you access _DISK_ during execution, you are screwed anyway.

    And whats going to happen to your data when you power down? I suppose you've been committing all your data to files on the disk? It might just be useful to know about the benefits and dangers of the various filesystems.

    Anyone who knows anything about indexing knows that btrees and rtrees are optimised for pulling data off random access media. There's better index forms if you are going to be ram based. I forget what they are because all my data is disk based.

    5. Load balancing mySQL. How to do this through a load balancer. Why and how it's different than load balancing web servers.

    This seems like a decent chapter.

    6. How to backup live using replication.

    DOH! We are talking backup here, right? Backup means archiving a copy of the database. MySQL allows you to use the magic command "cp" (or scp, or rsync, or..) Best # so far ...

    Totally wrong. If you want to back up a database that is being modified, you need to lock it and that would stop updates. This certainly needs a chapter.

    You could read a few pages on mysql.com about this instead if this book offends you. It sounds like you might need to. Some people prefer web sites to books.

    7. I love this: Detailed information on how the replication system actually works. I love understanding the inner workings because then I know how/why things went wrong.

    The updated node sends commands to the other machines to update as well. If something goes wrong, you could rollback the transaction.

    If there are transactions? I can use mysqlbinlog and convert to ascii logs, edit the logs, and replay the edited logs. If you didn't know about this, maybe you acknowledge the need for a chap
  • by AKAImBatman (238306) * <akaimbatman@NosPAm.gmail.com> on Wednesday October 20, 2004 @05:24PM (#10580024) Homepage Journal
    Close, but not quite.

    Well, HSQLDB is embedded, but only inside a java application

    True, but that's why I said I'd need to know more about his application.

    It persists data in a flat file that it loads on start.

    Not quite. HSQLDB uses an SQL script of the transactions to recreate the database. However, this script is significantly reduced in size if you use CACHED tables.

    If you've got more than 1000 records

    Again, depends on the data and the application. 1000 records is usually not a whole lot to cache in memory. And with CACHED table types, you could easily go up to millions of records without stressing the user's machine.

    you might want to go with something more robust, like MS Access.

    That's what he was trying to get away from.

    Berkeley DB isn't a "database" in the sense most people think.

    It's not an "SQL" database, but it is a database. (Technically, even a filesystem is a type of database, but we won't go there.) Again, it depends on his application. He gave no requirement that the database support SQL, so I went with the most natural assumption: It doesn't need it.

    It's not relational, and there's no SQL (there may be a SQL driver, but it'd be totally inefficient.

    That's an amusing statement, because MySQL does exactly that [mysql.com]. :-)
  • by Cajal (154122) on Wednesday October 20, 2004 @05:49PM (#10580291)
    MySQL's InnoDB offers the same level of data integrity as PgSQL...
    No, it doesn't. [sql-info.de]
  • No brainer. (Score:3, Informative)

    by DogDude (805747) on Wednesday October 20, 2004 @05:50PM (#10580301) Homepage
    Uh, MSDE? [microsoft.com]
  • Re:Tripping on ACID (Score:4, Informative)

    by yintercept (517362) on Wednesday October 20, 2004 @05:57PM (#10580366) Homepage Journal

    Personally, I don't just stop with ACID, I make sure I have an audit trail from start to finish and I make a barage of tests that show that the audit trails add up. Even with ACID I never buy the illusion that there will never be a problem with the transaction.

    ACID may be about data integrity. My point is that people who have an ACID database tend to stop thinking about data integrity.

    It's about making it impossible for unexpected bugs to corrupt data

    Sorry, but I don't buy into the illusion that anyone is creating perfect programs. I have had to deal with people who thought they wrote perfect programs. My experience is that the programs written by buzz word spouting gurus is generally very poor quality.

  • by harlows_monkeys (106428) on Wednesday October 20, 2004 @06:33PM (#10580722) Homepage
    MySQL can do a damn fine job for simple INSERTs and SELECTs but throw a hundred users at it or a few WHERE or ORDER BY clauses (or all of the above) and MySQL shows its true colours: Made by Fischer-Price

    Last time I checked, Dark Age of Camelot usually has more than a few hundred users, and they are using MySQL. So much for your amusing theory.

  • by boneshintai (112283) <ojacobson@@@lionsanctuary...net> on Wednesday October 20, 2004 @07:25PM (#10581171) Homepage
    I'm going to forgo the opportunity to mod this fascinating debate to discuss this. You mentionned that PostgreSQL was "an order of magnitude" slower on imports -- my guess is that your import tool simply connected and started running INSERTs? It's a common-ish trap for new users. If there is no transaction running, each statement is its own transaction, with all the BEGIN and COMMIT overhead that goes with it. Wrapping your import in a BEGIN; ... COMMIT; pair would probably have sped it up considerably.

    There is an option to change the transactionality of standalone commands, too, so that they implicitly start a transaction that doesn't end until you COMMIT it.

    VACUUM is something of a contentious point among PostgreSQL users, certainly. On the one hand, I can't think of an algorithmic way to automatically schedule statistic updates, and apparently tracking them in real time causes more of a performance hit than is needed. On the other, having to set up an external task (either a cron task or vacuumd) rather than being able to configure postmaster to vacuum itself on a schedule or other parameters is kind of a pain.

    PostgreSQL's default memory cache size is exceedingly small -- this is an issue with the default configuration that can lead to excessive disk reads and writes.

    In short, pgsql does require somewhat more effort to initially configure and maintain. On the other hand, for a lot of people it's very much worth that effort.
  • by Dirtside (91468) on Wednesday October 20, 2004 @08:41PM (#10581690) Journal
    My company uses MySQL for most of its databases (we're still in the lengthy process of converting everything over from Oracle, but, ah, we have a handful of critical tables with over 100 million rows each). We do several thousand queries per second, non-stop, and have minimal problems with MySQL. (We have fewer problems with MySQL than we did with Oracle!)

    I'm sure Postgre is great and all, but the evidence does seem to indicate that MySQL is suited for at least some intense applications.
  • by _xeno_ (155264) on Wednesday October 20, 2004 @08:58PM (#10581767) Homepage Journal
    You're forgetting the other great thing about MySQL: it doesn't bother reporting data errors!

    A recent example involves BLOBs and MySQL. The basic feature request is to be able to attach various binary files to other data. One of the ways to do that is with a BLOB. So I create a simple test so we can compare this approach with various other approaches.

    Code up the test, and attach a file. Pull the file back out, and try and read it back in. Doesn't work, the file is corrupted. At first I figure I'm interfacing with the database wrong, since I've never used BLOBs before. So I muck around with the API some, still no go.

    Then I realize I should check how much data is going in and how much is going out. Turns out I'm trying to attach a 70KB file, and I'm getting 64KB back. A little bit of research tells me that MySQL BLOB types only support up to 64KB. Oops. Change to a MEDIUMBLOB (I think) and then it works.

    Turns out instead of flagging it as an error to insert more data into the field than it can contain, MySQL just trunucates it. (So if you try and do something like insert "MYSQL SUCKS" into a CHAR(5) column, it'll say it successfully inserted one row. A select will then get "MYSQL" back.)

    Some page out there has a really nice list of things that MySQL will do with bad data. Besides trunucating values, it has some interesting ways of handling bad numeric values.

  • by Dirtside (91468) on Thursday October 21, 2004 @03:37AM (#10583921) Journal
    you're saying that you prefer to try and keep all of these in mind when you write apps that use MySQL,
    It probably says more about you than about me that you think that that list of "gotchas" (many of which are no longer valid, or are otherwise insignificant) is somehow difficult to keep in mind. None of the professional programmers I work with have any difficulty dealing with the quirks of the MySQL database environment, especially considering that that list of gotchas is tiny compared to the list of design quirks our site code has built up over the years. Hell, it's tiny compared to the quirks in any desktop GUI environment.
    What company do you own, again?
    I work at Neopets.com. 250 million pages a day, mean of 9 queries per page, which averages out to about 26,000 queries per second. And we're having many fewer DB integrity problems since we started switching back to MySQL from Oracle. What problems we do have arise when people do things like enable code that lets users do fulltext field searches 20 times a second on tables that have 200,000 rows. Our biggest tables have over 100 million rows.

    What company do you own?

  • by _xeno_ (155264) on Thursday October 21, 2004 @12:30PM (#10588376) Homepage Journal
    Oh please. Even if I had read the documentation, I probably wouldn't have known the file was too large at first anyway. I just picked a fairly small binary file to use. It turns out it was over the MySQL BLOB size.

    The proper response when receiving bad data is to throw an error. My program should not have to check its data to make sure it's valid for MySQL. Well, that's not true, it should do some checks, but bugs still exist. Eventually, some wrong data will go through to the database server.

    The DB should never "decide what's best" and do that. It should always raise an error. Yes, the client programs should try and always send valid data. But sometimes, things get messed up, and bad data gets sent. The proper response shouldn't be to just mangle the data, it should be to raise an error.

    Example:

    mysql> create table testusers (name varchar(32), uid integer primary key);
    Query OK, 0 rows affected (0.06 sec)

    mysql> insert into testusers values ( 1, 'user');
    Query OK, 1 row affected (0.00 sec)

    mysql> insert into testusers values ( 2, '2user');
    Query OK, 1 row affected (0.00 sec)

    mysql> select * from testusers;
    +------+-----+
    | name | uid |
    +------+-----+
    | 1 | 0 |
    | 2 | 2 |
    +------+-----+
    2 rows in set (0.00 sec)
    That's just not right!
  • Re:OLAP? (Score:1, Informative)

    by Anonymous Coward on Thursday October 21, 2004 @03:12PM (#10590840)
    ROLLUP in MySQL [mysql.com]
  • by mikis (53466) on Thursday October 21, 2004 @06:28PM (#10593040) Homepage
    Check out Gaia Online [gaiaonline.com]: phpBB forum with 190,595,085 articles posted, 1,042,339 registered users -- and 10,386 of them online at this very moment. MySQL db running on one dual Opteron. Here is an interview [big-boards.com] with guy that manages it.

Their idea of an offer you can't refuse is an offer... and you'd better not refuse.

Working...