High Performance MySQL 307
High Performance MySQL | |
author | Zawodny & Balling |
pages | 294 |
publisher | O'Reilly |
rating | Excellent |
reviewer | Steve Suehring |
ISBN | 0596003064 |
summary | Optimization, Backups, Replication, Load Balancing & More |
With a title like High Performance MySQL, the reader should not expect assistance with tasks such as installation or the basics of SQL. The authors acknowledge that the book isn't meant for the MySQL beginner. Even so, the book does spend a little time going over a refresher of MySQL. If the reader is looking for a book on the basics of MySQL, though, I'd recommend picking up a different title. I might be so brazen as to suggest looking at the book on MySQL that I wrote a couple years ago, but I'll leave it for the reader to find an appropriate title for their level of knowledge with MySQL.
The book is broken out into ten chapters, beginning with a chapter on the basics of MySQL as seen from someone with some level of MySQL experience. (And since I am someone with some level of MySQL knowledge, this chapter didn't seem to be too painful of a read.) That statement begs for a further explanation. I find it painful to read books that claim to have advanced topics but find it necessary to spend (or waste) time by giving yet another explanation of how to type ./configure; make; make install in the beginning chapter. In this regard, I find High Performance MySQL to be quite good at providing the appropriate level of explanation for the audience.
The first pages alone are a great starting point for those who are wondering about the performance as it relates to the different versions of MySQL. A discussion of the binary versions of MySQL versus the source code versions gives the reader a good understanding of the implications of this important decision. Choosing correctly between the versions of MySQL can help to maximize performance.
Though only the first chapter is titled "Back to Basics," most chapters include some basic or background information pertinent to that chapter. For example, both the query and index performance-related chapters include sections with the word 'Basic' in their titles. These two chapters, "Index Performance" and "Query Performance," cover topics often misunderstood when considering performance on a database. Naturally, any database administrator worth their pay (insert your own joke here) will tell you that an index can help to improve performance. But High Performance MySQL takes it a step further by diving into why you might use a particular index given your database needs. This type of practical information is common throughout the book. Rather than just give the reader a listing of the indexes (which are mostly common among RDBMS anyway), the authors choose to tell you why you might choose one method over another.
Other chapters within High Performance MySQL include chapters on server performance tuning, replication, load balancing, and high-availability, backup and recovery, and security. I've followed MySQL forums and mailing lists over the years and some of the more common questions relate to MySQL security. The chapter on security gives the reader a great explanation of the MySQL GRANT tables and includes a sidebar on the significance of 'localhost' in MySQL which nearly always seems to be a source of confusion, though maybe not for the reader of this particular book. Like other chapters, the security chapter includes tips for the underlying environment upon which MySQL operates. For example, regardless of how good the MySQL database permissions are configured, if another piece of software on the server isn't kept up to date then the entire server, including the MySQL data, could be compromised.
Looking back at the book, I might have laid the chapters out in a different order or provided more structure through sections of the book. However, this can be dismissed as personal opinion and is non-substantive anyway. I believe High Performance MySQL is an excellent choice for the MySQL administrator or developer not only looking to improve their database performance but also to improve his knowledge of MySQL.
You can purchase High Performance MySQL from bn.com. (And the book that Suehring alludes to is the MySQL Bible . Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
Re:No Thanks (Score:3, Insightful)
It Sounds Pretty Basic (Score:3, Insightful)
The important thing is, does it tell you how to set up and manage highly distributed system managing gigabytes of data under high transaction volumes?
If it doesn't, and just discusses the general issues to consider under such conditions, it is worthless for those that are already professionals. Professionals know how to consider these things, they just want concrete examples so they don't have to spend months of reading and tinkering to figure it out on their own.
Unfortunately, reading the review above doesn't even let me know whether or not the book has anything in it at a level that I might want. However, if I was writing the review I'd probably order the paragraphs differently and use some different words. That might just be personal preference.... :p
Re:Sometimes MySQL indexes are bad... (Score:5, Insightful)
Re:High Performance? (Score:4, Insightful)
The engine is only going to run as good as the crap you (developers) put into it. I would go so far as to say the simply due to the pervasiveness of MySql there are for more horrible script examples out there than for PostgreSQL.
Personally, I use to correct tool for the job. BLOG? Hell ya use MySql. OLAP? Hell no. But you could make it work and work well if you wanted to.
Apples and oranges (Score:5, Insightful)
High Performance and Incredible Data Integrity? You are comparing apples and oranges. Make up your mind. It's either "High Performance with MySQL" or "Incredible Data Integrity with PostgreSQL." You can't have your cake and eat it too. If you need real ACID (Atomicity, Consistency, Isolation, and Durability) then by all means have it, but realize that it isn't free. Please read about the relational model [wikipedia.org] and ACID [wikipedia.org]. Read about the set theory [wikipedia.org] and predicate calculus [wikipedia.org], about tuple calculus [wikipedia.org] and relational algebra [wikipedia.org]. This is complicated stuff which it is not cheap algorithmically. Most of people don't need ACID and should not be penalized by the algorithms that implement it. Those who need it will have to pay with performance, because there is only this much you can do in one cycle. A pure ACID relational model is not for everyone, some people will only need a persistent object store instead of a relational database. The most important thing to realize is that databases are not magic. They implement complicated algorithms in code that have to be run by the CPU and by definition cannot be a one-size-fits-all solution. So please don't oversimplify it by saying "I want everything with no paying for anything, and also a pony." Use the right tool for the right job.
Read only MySQL, right? (Score:4, Insightful)
The authors certainly could not be referring to a system that has medium to high levels of inserts or updates.
While MySQL is great for reads, it sucks balls with inserts and updates... even with InnoDB, even with gobs of RAM and 15K SCSI hard drives in Raid 0 configurations on Quad Opteron systems.
And what a sucky review too... Nothing about what the book addresses.
Re:Apples and oranges (Score:5, Insightful)
In fact, it is very difficult to understand where MySQL fits in today's technological environment. On one hand you have truly powerful database servers that guarantee data safety. On the other hand you have simple embedded databases that run much faster than MySQL. Where does this leave MySQL? With nothing more than a lot of "it's Open Source!" momentum.
High Performance and Incredible Data Integrity?
I never said that. Please don't act like I did.
Re:How does Yahoo! Finance use MySQL? (Score:1, Insightful)
Re:It Sounds Pretty Basic (Score:3, Insightful)
It is generally faster than Postgres,
It handles large datatypes more easily than Postgres (although Postgres, while odd, has some more flexibility)
Postgres has an odd 8k limit per row (probably fixed by now though)
Supports selects across databases.
It has good replication support (with 4.x and later)
Comes with some good admin tools in the normal distro.
Decent full text indexing out of the box
It is really about choosing the right tool for the job, each has their advantages and disadvantages.
Re:How does Yahoo! Finance use MySQL? (Score:4, Insightful)
OK, that explains a lot. 250 mil. queries in 45 days is normal for a small company like us and definetely peanuts for Yahoo - i.e. I'd say that this allows the conclusion that nothing particularly important or demanding (i.e. heavily used compared to Yahoo's overall traffic) is built using MySQL at Yahoo. Just to put this into perspective, a little dual Xeon box here does more than 5 million *INSERTs* every day and a large number of SELECTs (we update each of our currently 2.4 million product prices at between 1 and 20 times per day - the exact number of INSERTs per day is currently unknown to me). We use PostgreSQL 7.4.5 though, so we don't consider this a particularly impressive feat ;-)
Re:It Sounds Pretty Basic (Score:3, Insightful)
When PostreSQL came on the scene I attempted to use it, but there was little in the way of documentation. I went the MySQL route for my little database projects because it had better docs at the time. I never decided MySQL was "1337" and that all other databases were "evil".
It had good docs, was cheap, and was good enough to get the job done.
MySQL, if you can read this, your using it right now.
Re:No Thanks (Score:3, Insightful)
Give me a heart attack, why don't you?!
Re:No Thanks (Score:2, Insightful)
Re:Apples and oranges (Score:2, Insightful)
Tripping on ACID (Score:4, Insightful)
As for the parent post, he's just bought into the illusion that database gurus think at a transcendental level. It is a common ailment. Object gurus, business gurus, religious gurus, political gurus are all treading on the same clouds. It is a common ailment that usually cures itself with time.
Re:Apples and oranges (Score:4, Insightful)
Actually, MySQL has been moving in that direction for quite some time now. MySQL 4.1 has subqueries [mysql.com] and MySQL 5.0 has stored procedures [mysql.com].
Of course, MySQL still doesn't quite have the functionality of Postgress, especially in the data integrity department, so there's certainly truth to your claims. However, my point is that it's not standing still. It's certainly improving, so we can't just keep pointing out its past mistakes.
Re:Apples and oranges (Score:3, Insightful)
It fits your requirements perfectly without requiring the commercial license. Of course the installer bit depends also on your application, but it shouldn't be too hard to do. You may want to check out the pginstaller project here: http://pgfoundry.org/projects/pginstaller/ [pgfoundry.org].
Re:Apples and oranges (Score:4, Insightful)
I would never use MySQL for financial transactions but for non-monetary usage it's a good general purpose database as long as your don't require complex business rules filled with lots of cascading actions.
Re:MySQL Performance (Score:4, Insightful)
OK, MySQL can be used as a cache. It's like Squid and Oracle is like Apache. You can use Oracle to handle your actual transactions, integrity constraints, etc etc, then periodically dump out the data into MySQL and generate the web pages from there.
I have Oracle databases with thousands of connected users, all doing both queries and transactions. Could you read the data quicker from MySQL? Probably. Can MySQL manage tens - sometimes hundreds - of thousands of locks at a time? Not a chance.
Re:Apples and oranges (Score:2, Insightful)
You PgSQL nazis are just being ignorant. Do you think because MyISAM isn't transactional that the whole database isn't?
I never said that; I have not seen a decent benchmark showing InnoDB having *any* kind of decent performance though. And it still does nothing to address the scalability issues of MySQL with hundreds of users, nor does it address the speed issues of MySQL in general when performing complex queries.
MySQL's InnoDB offers the same level of data integrity as PgSQL does, and about the same performance.
Show me. And I'm not talking "how fast can I do a thousand SELECT * FROM mytable;" -- good, real-life queries with WHERE clauses, maybe cursors and ordering.
Most of PostgreSQL's gains are erased by MySQL 4.1's excellent code when it is combined with those OS's proper threading.
Again, show me. Postgres' performance increases in 7.4.x. are nothing to sneeze at, and certainly have nothing to do with the threading changes between 2.4 and 2.6.
PgSQL will be doomed to a BSDesque life. Touted by too many people with no social skills, nobody will ever be able to see how good it is, because they just get shouted down when asking questions.
That's funny; I see something similar with the general MySQL crowd; it's like watching a bunch of pizza-faced 13 year olds drool over the latest and greatest nVidia card and trying to convince their parents to buy the $500 video card to replace the $250 they bought only 6 months ago so they can play Doom3, when all the parents want to be able to do is balance the checkbook and get some real work done.
Re:Apples and oranges (Score:5, Insightful)
I've got a few million reports that basically look like this: some header information of the form "variable=value" (about a dozen possible values of variable), and then one or more sections. Each section has a header in the same format as the report header, and from 0 to 5000 entries of the form "item:status:file", where name is one of about 5000 items, status is one of 5 status values, and file is the full path of a file containing information about the item.
I want these in a database. So, I've got a table that represents reports, which gets the information from the header. Another table represents report sections, and has the information from the section headers, and references the report table. A third table represents the "item:status:file" lines. I've actually got a bunch more tables, because the database is in third normal form, but the above is the gist of it.
With MySQL, I write the straightforward code to parse reports and insert into the database. It takes about a day to import all my reports, and querying them runs at decent speed.
With PostgreSQL, it starts out about an order of magnitude slower at importing, and after four days, when it is isn't anywhere near done, it is running even slower.
PostgreSQL does seem to query a bit faster than MySQL (but see below), but that slowness in getting the data in is a killer.
As far as queries go, PostgreSQL seems fast, if you've done "VACUUM ANALYZE" on the database. If you don't do that, it is incredibly slow. For example, if I have a simple table, indexed on an integer field, and do a select that was "WHERE that_field = 123", it is fast, and uses the index, but if I do "WHERE that_field IN (123, 456)", it doesn't use the index, unless you analyzed the table--it scans the whole freaking table. MySQL is smart enough to actually use an index without you having to run a special command first. Geesh!
So, here's what it comes down to, and why sooooo many people choose MySQL over PostgreSQL: PostgreSQL seems to require that you know a fair amount about database stuff to actually make it work well, even if you aren't trying to do fancy stuff. MySQL, when you are doing simple stuff, just works, and works well.
Oh, one more thing. When I was importing my few million records into MySQL, it barely loaded down my machine. With PostgreSQL, pretty early on, the disk light came on and stayed on, and doing things on the machine became painful. If that were a shared machine, say, a web server with shared hosting, replacing the MySQL in LAMP with PostgreSQL would be a disaster.
Re:Apples and oranges (Score:1, Insightful)
"You can't have your cake and eat it too"
Sure you can. It's called Oracle
Re:Apples and oranges (Score:4, Insightful)
Actually, I did have the complete processing for each report in a begin/commit pair. That was one of the reasons I was experimenting with PostgreSQL...with MySQL there was the annoyance of having to make sure the whole report was going to be OK before inserting any of it, so I couldn't just add to each table as I processed the report, because I might find out near the end that the report is bogus and want to discard it.
In short, pgsql does require somewhat more effort to initially configure and maintain. On the other hand, for a lot of people it's very much worth that effort
Right...and for a lot of people, they need something the works in the common case without that effort, and so there's always going to be an important place for MySQL.
What I'd really like to see is a good document included with PostgreSQL that says "So..you've got a big application that just works fine in MySQL, but you want to convert? Here's what you do" that covers all these things.
no (Score:2, Insightful)
DBAs and those who actually know how to use all the features in the high-end database exploit the features in the database. PostgreSQL supports the most features of the free DBs and it's pretty fast when you use connection pooling compared to MySQL, so they go with it.
People see this and immediately declare MySQL crap, dredge up all those reasons not to use it, and declare PostgreSQL the winnar! [sic] People love conflict....
So, you can use whatever you want. I'm sure Yahoo! picked MySQL because of fast SELECTs and that's all that is really important to them. I pick PostgreSQL because I don't want to have to convert from MySQL to PostgreSQL because I need some feature.
Re:How does Yahoo! Finance use MySQL? (Score:3, Insightful)
> needs through homegrown flat files and Berkeley DB databases. But those solutions
> proved to be inflexible and not scalable enough for their needs, with the
>volume of data nearly doubling every few months
So, a SQL database is better than flat files. Ergo, MySQL rules...?
I'm glad that Yahoo has found a solution that works (or at least that the person who said that thinks it does) but that doesn't say anything about why MySQL is better or worse than another SQL database, or another kind of DBMS for that matter. Did they even do an evaluation, or did they just upgrade from DBM to MySQL and see no reason to consider other options?
If Yahoo Finance loses a story, or a stock chart, or a user account, or an email, big whoop, it's a huge free service. If they can run 30% fewer database servers, that's more important. So it probably is a good choice for them. But if there were actual money in transit they'd probably want to use something else for that (there's no reason the content management couldn't stay in MySQL, of course).
Re:It Sounds Pretty Basic (Score:3, Insightful)
>The special handling of null forces you to write slightly less braindead code.
Way to spin it, dude. Here's another way to look at it: The retardo handling of null in MySQL makes your work as a programmer harder. It's just a pissing match argument: only a wuss needs those training-wheels features. Real Programmers never make mistakes.
create table foo (bar integer not null, biz integer not null);
insert into foo (bar) values (1);
select * from foo;
+-----+-----+
| bar | biz |
+-----+-----+
| 1 | 0 |
+-----+-----+
That's just the MySQL dudes deciding that silently doing the wrong thing is better than giving you an error message. Hmm, 0 isn't null, so technically the NOT NULL is not violated, but that's not what it does on other databases, and in SQL books, etc.
This sort of thing is consistent with MySQL AB's attitude toward product design and standards compliance: we don't feel like implementing it because it's hard, and/or the people who are asking for it don't really need it, and/or we don't really understand the issue, so we'll just document it and that makes the problem go away.
So, you have to learn all sorts of oddities about the way that MySQL differs from everything else, and that makes your job as a programmer harder, which means more bugs and slower coding.
Re:Apples and oranges (Score:3, Insightful)
That extra effort to configure and maintain is why we went with MySQL. It's hella fast (and yes, that *is* a technical term), isn't plagued by the insecurities of MS SQL Server, and it's a breeze to install and configure -- don't even get me started on PostgreSQL's requirement that it be run from a non-Administrator account under Windows when EVERYONE with a default installation of Windows from NT 4.0 through XP is running as an Administrator.
At the end of the day, the time and aggravation saved makes MySQL the hands-down winner over PostgreSQL, it's price/performance make it the smack-down winner over Oracle, and the security, stability, and speed make it the no-lie DB God over MS SQL Server.
Now if it just had Booleans as a native datatype...
Re:Apples and oranges (Score:3, Insightful)
But that's fine. Not everyone needs all of the current features of PostgreSQL let alone all of the potential new ones.
HOWEVER, MySQL needs to work on data integrity post haste. "my new purple crayon" being logically equal to 0 is not a missing feature; it's a data loss bug. No error. Not even a warning. A bug in software that tries putting a string into an integer field will have no notification that anything went wrong until after the data is retrieved later. It's things like this that keep me from liking MySQL, not its lack of features.
Until then, when I need SQL and something small, I'll be using SQLite.