Catch up on stories from the past week (and beyond) at the Slashdot story archive

 



Forgot your password?
typodupeerror
×
Image

Refactoring SQL Applications 159

stoolpigeon writes "My past as a DBA probably makes me a bit biased, but the reaction I've seen the most when a database application isn't performing as well as would be liked seems to focus on the database side of things. The search for a solution usually seems to center around tuning db parameters, the building (or removal) of indexes and, if the budget allows, throwing hardware at the problem. In their new work, Refactoring SQL Applications, Faroult and L'Hermite bring a much wider range of options to the table. There is a lot in this little book for the developer charged with fixing an existing application and I think a lot of good information that could save one from making a number of headache-inducing mistakes on a new application." Keep reading for the rest of JR's review.
Refactoring SQL Applications
author Stephane Faroult with Pascal L'Hermite
pages 293
publisher O'Reilly Media, Inc.
rating 9/10
reviewer JR Peck
ISBN 978-0-596-51497-6
summary Good for a developer charged with fixing an existing application.
The book is divided into eight chapters; the first two deal with how to approach a problematic application in general. In the preface the authors say, "This book tries to take a realistic and honest view of the improvement of applications with a strong SQL component, and to define a rational framework for tactical maneuvers." I found this to be true throughout the entire book and was impressed by how well the examples, suggestions and problems echoed my real-life experience. This book is first and foremost practical. There is really almost nothing in the book that does not come immediately to bear upon the problem at hand. I've seen others do a lot less with many more pages.

The examples and benchmarks are compared across three different popular relational database management systems. They are MySQL, Oracle RDBMS and Microsoft SQL Server. I thought that this brought up a couple interesting issues that are not directly addressed in the book. First is that the authors are talking about how to improve performance, not comparing platforms, but the numbers are there and may be of some interest to people who would like to compare them. Secondly, I've met a number of people over the years who get quite animated about insisting that a good DBA does not need to know any certain solution, but rather just the fundamentals. I think Faroult and L'Hermite put this idea to rest, though unintentionally. In order to discuss how to best understand what exactly is happening and how best remedy issues, they show that it is necessary to have an understanding of platform specific issues and tools. This is true on two levels. The first is that the location of use of the built in tools for each platform are different. The second is that what works for one platform does not necessarily work for another.

For example, Chapter Two "Sanity Checks" contains a section on parsing and bind variables. The authors compare performance when queries are hard coded, with new prepared statements on each iteration (firm coded) and with one prepared statement and changing the parameter value on each iteration in a loop (soft coded). On Oracle and SQL Server the performance was poorest with hard coded, better with firm coded and best with soft coded. MySQL did best with soft coded as well but actually took a performance hit moving from hard coded to firm coded. This had to do with differences in how MySQL server caches statements. The authors took the time to rewrite their code from java to C in order to ensure that the issue was not related to language or driver issues. This is not to say that one can ignore RDBMS and SQL fundamentals, but rather that to get top performance requires knowledge of platform specific issues. This also comes out again when dealing with optimizers.

With that in mind, the authors recommend that readers have a solid understanding of SQL and some programming language. Most examples are SQL and code is given in Java and PHP. There are also examples that illustrate SQL extensions showing procedures, functions, etc. written for all three RDBMS products covered. The authors stick primarily to standard SQL but do make note and at times show examples of how things will look in each of the other databases. This information is current and reflects the most recent versions of the each product.

The fourth chapter, "Testing Framework" is incredibly useful. The authors cover generating test data and then checking correctness of outcomes through comparison. This is really useful information for anyone working to improve an application, or writing one for the first time. I think it also a large part of why this book could really appeal to new and experienced developers as well as the developer working on existing or brand new applications. I think there is a good chance that only the most extremely experienced developer would find nothing new here, or at least some new way to approach a problem. New developers can learn quite a bit and avoid some bad habits and assumptions without having to gain that information the hard way. And then the tools for generating random data, large amounts of data and comparing results will provide excellent opportunities for learning and real world application.

The next three chapters cover dealing with specific types of issues and how to improve performance. The last chapter then quickly describes a scenario of just how the authors step into real world situations and start to attack a problem. This is followed with two appendices. The first is scripts and samples, the second tools that are available to help in finding issues and resolving them. Some of the authors tools use SQLite, which is discussed briefly in the chapter on creating test data as some of the tools depend upon it.

I think that it has been a while since I've read a book that could have such a rapid return on investment. There are many suggestions and insights that should enable anyone to squeeze better performance out of just about any database application. While the focus is on the application side, there is plenty that requires understanding and work on the database side as well. There is discussion of the parameters and hardware I mentioned at the start of this review. But rather than the only options, they are one part in a much larger and systematic approach.

The authors relate that often refactoring for this type of application comes into play when something that used to work does not work any more. This can often lead to an environment of high pressure and emotion. The desire for a rapid resolution can lead to casting about in the dark for a quick fix or a feeling that cost is no longer as significant since a fix must be had now. The authors argue, and I agree, that this is exactly when a rational, disciplined process of tracking down and fixing issues is the most valuable. I agree. The issue is of course that someone in a position to do something must have the ability to take that approach. This book will get one well on the way to being in that place. Of course it can't take a brand new developer or DBA an expert. Much like a degree it can give them some fundamental tools that will allow them to take full advantage of experience as it comes rather than just crashing and burning.

If I could I'd have any developer on a database centric application read this, and DBAs as well. There is a lot here for both sides to learn about just how much they depend upon and impact one another. This may be an idealistic dream, especially for larger shops where often the relationship between those two groups is adversarial, but I think that such an approach could only make life much better for everyone involved. For anyone looking to enter this world on either side of the DBA or developer equation, this may make a nice addition to their education. For that individual wearing both hats this could be a life saver. In this small book they will learn many things to look out for as well as gain exposure to some of the similarities and differences in what are arguably the top three relational database management systems right now.

You can purchase Refactoring SQL Applications from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

*

This discussion has been archived. No new comments can be posted.

Refactoring SQL Applications

Comments Filter:
  • by MBGMorden ( 803437 ) on Wednesday March 11, 2009 @02:22PM (#27154571)

    I have to agree. Maybe I'm unfairly biased and it's just past performance (both relayed by others and experienced by myself), but I don't trust MySQL for anything more complex than a backend for a simple website. If I want a reliable open source database for a serious project, I'd go for PostgreSQL in a heartbeat.

    I actually was looking at KnowledgeTree recently as a possible solution for a document management system for our organization (we have a clunky old system and some others here are pushing SharePoint as a replacment . . .), but it's use of MySQL pretty much stopped that dead in it's tracks. I know they'd prefer MS SQL Server as an option here. I could *probably* talk with them if it supported PgSQL. But MySQL isn't even an option to discuss for something this important.

  • by Samalie ( 1016193 ) on Wednesday March 11, 2009 @02:26PM (#27154663)

    Agreed COMPLETELY.

    I work as a DBA as well, and the moment the prototype produces reliable data, its immediately off to the next project. Only time I ever get to go back and tweak code is if some random variable that was not thought of was missed in the original design, or a bug, forces me back into the code.

    I've got some code out there that I know beyond a shadow of a doubt is horribly inefficient...but I'm not given the time and opprotunity to correct that.

  • Shoot the developers (Score:2, Interesting)

    by kafros ( 896657 ) on Wednesday March 11, 2009 @02:27PM (#27154687)
    I am a developer, and my experience has shown that if you use one of: Oracle, SQLServer,
    PostgreSQL, DB2 and application performance is poor, 99% of the time it is poor design from our
    (developer's) side.

    Developers without good understanding of Relational Databases and SQL often produce problems that
    cannot be solved by indexes, or throwing transistors at them.

    It is so nice to see a "custom" made map implemented in the database using temporary tables instead of
    using the language's built-in map functionality :-)
    sorting arrays using the database gets extra points (no kidding, I have seen this!)
  • Well... I have been using Postgresql since back WHEN MySQL didn't do transactions.... I still don't trust MySQL's transactions or the new strict mode. At the same time, I have watched PostgreSQL do an absolutely terrific job of running horrendously written queries optimally. Here are two criticisms I have about using MySQL for real application work, especially when you are distributing that application (and thus have little control over how users set up their db's):

    1) MySQL transactions are built into the table engines, and by default (last I checked, and meaning you don't install innodb, etc), the tables will not be transactional. This means that if you are building an inhouse app, you can trust it more than you can if you are distributing your software. In short, if you are distributing software you can't guarantee that it is running on a system with transactions without a great deal of headache........ The same goes for referential integrity enforcement.

    2) Strict mode can be turned off by any application. This means that the more recent data integrity checks cannot be relied upon. This is an issue on both inhouse and distributed software because it adds quite a bit of overhead to the QA process internally, and can add support headaches in software for distribution.

    MySQL is a good db for single-app databases, where data integrity is not a tremendous issue or where you are deploying a separate MySQL instance on a different port. It is quite a bit worse than PostgreSQL for anything else.

  • by MBGMorden ( 803437 ) on Wednesday March 11, 2009 @03:37PM (#27155783)

    I'm not holding anything against it that regard. The simple fact is that I've had two fairly low traffic MySQL databases become corrupted beyond the point of being usable within the last 3 years. The hardware wasn't at fault here (nor was it old or outdated). Now luckily, this was for something that while important, wasn't "OMG somebody's head's gonna roll!" critical (namely, it was the quarantine database for amavisd-new on a mail filter, and then later an internal message/call tracking system that we'd wrote).

    For stuff like that, where you can stand to lose the data, or at worst, roll to a backup, then MySQL has it's uses. However, our document management system for example contains tons of documents that we must legally keep archived and available (Government institution - we have to have it available for FOIA requests). We also have for instance land appraisal software keeping databases of property taxing information that we need to bill at the end of the year (with about $50 million annually riding on that - if we don't get those bills out our whole budget shuts down). I just don't trust that type of thing to MySQL. Not to mention that the "nobody ever got fired for buying Microsoft" mentality does kick in. If the database fails and I have to restore from backup, then if it's MS SQL Server or Oracle then your bosses will usually not fault me(as long as I have good backups in place, which I do). If something that critical fails and I used MySQL on the project, I very well might be looking for a new job.

  • by einhverfr ( 238914 ) <chris@travers.gmail@com> on Wednesday March 11, 2009 @03:43PM (#27155895) Homepage Journal

    You throw transistors at your developers? ;-)

    Actually I agree with you. One of the big wins on the LedgerSMB project was the approach of moving the all main queries into user defined functions (and then using them as named queries). One of the nice things about this approach is that dba-types (like myself) can address performance when it crops up rather than dealing with the ABOLUTELY horrendous approaches found in the SQL-Ledger codebase..... (Assembling queries as little bits of text strings, throwing them all together, and doing REALLY braindead things like simulating HAVING clauses in the middleware...)

    For example, we had an issue where one large user (thousands of invoices per week) was having issues paying them without spending 8 hours waiting for the page to load. We were able to get this down to something like 10 minutes across 5 pages. No added hardware, just reviewing why the old code sucked, moving in new code, and then optimizing it when its performance sucked.

  • Re:Use views (Score:3, Interesting)

    by he-sk ( 103163 ) on Wednesday March 11, 2009 @04:15PM (#27156389)

    I have to agree. In DB theory we have learned that you should normalize your data for a good database design. However, materialized views can give HUGE performance gains, by eliminating multistep joins between tables. You can't built customs indexes for queries that have these joins when the index condition is not in adjacent tables and you always have to deal with large intermediate results.

    If the app is read-only and performance is critical the best strategy is to used materialized views built from normalized source tables. If the database doesn't have materialized views (PostgreSQL) you can simulate with CREATE TABLE foo AS SELECT * FROM bar WHERE ...

  • by Leolo ( 568145 ) on Wednesday March 11, 2009 @07:16PM (#27159067) Homepage

    I write applications that use MySQL that get installed on servers at the cient's premises. I'm also the one doing with the installation and MySQL config.

    Reponding to your points :

    1. If the client were to insist on handling the MySQL part, and screwed it up, it would cease to be my problem. Or rather, I would point at the instalation and tell them were they fucked up;
    2. About turning off strict-mode. If your applications are turning off strict-mode, then don't be supprised if you break data integrity. If your clients are writing apps that turn off strict-mode and mess up data integrity, that isn't really your problem, is it?

    Or how about we put it another way: it will always be possible for someone to mess up the data. These are human issue, not a software issue. Deal with them that way.

  • If the client were to insist on handling the MySQL part, and screwed it up, it would cease to be my problem. Or rather, I would point at the instalation and tell them were they fucked up;

    Ok, so your point is that this is fine as long as you install MySQL, make sure that Innodb, etc. is installed, etc. Fine. I don't want that responsibility.

    About turning off strict-mode. If your applications are turning off strict-mode, then don't be supprised if you break data integrity. If your clients are writing apps that turn off strict-mode and mess up data integrity, that isn't really your problem, is it?

    On the other hand, it isn't a support headache I want, since I might be hired to come in and fix it. I would rather be spending my paid development time elsewhere.

    However, on that latter point, for inhouse apps where several apps share a db, you have a bigger issue, and that is that you really need to have someone auditing every app to ensure it DOESN'T turn off strict mode. Why bother with that issue? Why not use a db which ALWAYS performs these sorts of checks and where they can't be turned off?

  • by einhverfr ( 238914 ) <chris@travers.gmail@com> on Wednesday March 11, 2009 @08:24PM (#27159993) Homepage Journal

    MySQL is a good db for single-app databases, where data integrity is not a tremendous issue or where you are deploying a separate MySQL instance on a different port. It is quite a bit worse than PostgreSQL for anything else.

    From your description you are using MySQL for a single-app database where you run a dedicated instance of MySQL for your app. That is not the usage case I was describing, which is a central RDBMS serving out the same data to a myriad of different applications. If you are trying to go beyond ONE app for your data, you should look to PostgreSQL instead of MySQL.

  • by Anonymous Coward on Thursday March 12, 2009 @01:31AM (#27162617)

    Maybe when PgSQL becomes CASE-INSENSITIVE on WHERE clauses. I mean really, am I gonna have to fork it, give it a better, catchier name, and get rid of case sensitivity just to get a solid open source DB that doesn't drive me nuts??

    PoStGrEsQl = PostGreSQL UNLESS I EXPLICITLY SAY IT DOESN'T. Not the other way around. </rant>

  • by julesh ( 229690 ) on Thursday March 12, 2009 @05:36AM (#27163909)

    1) MySQL transactions are built into the table engines, and by default (last I checked, and meaning you don't install innodb, etc), the tables will not be transactional. This means that if you are building an inhouse app, you can trust it more than you can if you are distributing your software. In short, if you are distributing software you can't guarantee that it is running on a system with transactions without a great deal of headache........ The same goes for referential integrity enforcement.

    It's easy enough to set up a database creation script that ensures the tables have the necessary support. Sure, if you're working on tables that somebody else has created you can have issues, but in your experience how often does that happen? And if it does happen, how often do you have a choice as to what dbms is in use? In my 10+ years as a consultant I've only ever had to do that once, and never with mysql (because the client's existing system I had to integrate with used informix).

    2) Strict mode can be turned off by any application. This means that the more recent data integrity checks cannot be relied upon. This is an issue on both inhouse and distributed software because it adds quite a bit of overhead to the QA process internally, and can add support headaches in software for distribution.

    On the whole, this is probably a good thing. If the application is under your control, you can use whichever mode you want. If you're relying on somebody else's application, forcing it to use strict mode when it wasn't written for this environment could introduce subtle bugs. Now, if you were to argue that the _existence_ of these different modes of operation was an issue, then I'd probably agree. But given the existence of the modes (and that's unfortunately a necessity for backwards compatibility reasons) the ability of the application to change the mode to the one it expects is crucial.

    MySQL is a good db for single-app databases, where data integrity is not a tremendous issue or where you are deploying a separate MySQL instance on a different port.

    Of course, you are describing something like 90%+ of all database applications there. Yes, this book should cover MySQL: it's a very popular database that is perfectly adequate for most uses. Sure, there are applications where it shouldn't be used, but that hasn't stopped it from becoming extremely widely deployed and being used as the database of choice by web developers everywhere (which is probably the target market for this book).

  • by julesh ( 229690 ) on Thursday March 12, 2009 @05:47AM (#27163961)

    I'm not holding anything against it that regard. The simple fact is that I've had two fairly low traffic MySQL databases become corrupted beyond the point of being usable within the last 3 years. The hardware wasn't at fault here (nor was it old or outdated).

    I'm not sure what you're doing wrong here, but I think many of us have been running a lot more MySQL databases than that and never experienced corruption. Myself, I have been maintaining on average about 20 MySQL instances spread across 3 different servers for the last 10 years, and have never[1] experienced corruption that wasn't tied to a hardware failure. Sometimes after a system crash I need to do a myisamchk --recover, but even that's rare. I've never needed to do myisamchk --sort-recover, although I understand there are circumstances where this is necessary.

    So my experience has been that mysql is, essentially, rock solid. It could be that we're using it for different applications and it works for mine but not yours, but I've also never seen any reports of problems with mysql corruption that cannot be fixed automatically by the tools mysql provides.

    [1]: OK, this isn't quite true. There was one instance of corruption that was tied to a linux kernel version that had a bug where dirty buffers were sometimes corrupted prior to being flushed to disk. Hard to pin that one on MySQL either, though.

  • On the whole, this is probably a good thing. If the application is under your control, you can use whichever mode you want. If you're relying on somebody else's application, forcing it to use strict mode when it wasn't written for this environment could introduce subtle bugs. Now, if you were to argue that the _existence_ of these different modes of operation was an issue, then I'd probably agree. But given the existence of the modes (and that's unfortunately a necessity for backwards compatibility reasons) the ability of the application to change the mode to the one it expects is crucial.

    It is a good thing in some, limited circumstances (where data integrity is not critical). Where data integrity is critical, this feature means you need to have careful policies of auditing ALL applications that hit the db to make sure they never, under any circumstance, turn off strict mode. This really requires a code audit.

    You say I am describing 90% of all database applications. Fair enough, if you are looking at this from the application, rather than the information, perspective. For a single-app db it isn't bad and it scales reasonably well for simple queries. However, where it breaks down is for real enterprise information management. Say what you will, the criteria I mentioned don't apply to that environment.

    One of the things we have worked hard to do for the LSMB project is to ensure other applications can safely access the database with appropriate data integrity constraints. Given that we came from the SQL-Ledger codebase, that has been a real headache, and will probably take us another year to make a reality. However, it is a powerful feature if your customer can write add-ins to the db app in a different language without having to worry as much about data integrity issues even where it is an accounting app.

Mystics always hope that science will some day overtake them. -- Booth Tarkington

Working...