Stories
Slash Boxes
Comments

News for nerds, stuff that matters

Slashdot Log In

Log In

Create Account  |  Retrieve Password

Refactoring SQL Applications

Posted by samzenpus on Wed Mar 11, 2009 01:04 PM
from the read-all-about-it dept.
thumbnail
stoolpigeon writes "My past as a DBA probably makes me a bit biased, but the reaction I've seen the most when a database application isn't performing as well as would be liked seems to focus on the database side of things. The search for a solution usually seems to center around tuning db parameters, the building (or removal) of indexes and, if the budget allows, throwing hardware at the problem. In their new work, Refactoring SQL Applications, Faroult and L'Hermite bring a much wider range of options to the table. There is a lot in this little book for the developer charged with fixing an existing application and I think a lot of good information that could save one from making a number of headache-inducing mistakes on a new application." Keep reading for the rest of JR's review.
The book is divided into eight chapters; the first two deal with how to approach a problematic application in general. In the preface the authors say, "This book tries to take a realistic and honest view of the improvement of applications with a strong SQL component, and to define a rational framework for tactical maneuvers." I found this to be true throughout the entire book and was impressed by how well the examples, suggestions and problems echoed my real-life experience. This book is first and foremost practical. There is really almost nothing in the book that does not come immediately to bear upon the problem at hand. I've seen others do a lot less with many more pages.

The examples and benchmarks are compared across three different popular relational database management systems. They are MySQL, Oracle RDBMS and Microsoft SQL Server. I thought that this brought up a couple interesting issues that are not directly addressed in the book. First is that the authors are talking about how to improve performance, not comparing platforms, but the numbers are there and may be of some interest to people who would like to compare them. Secondly, I've met a number of people over the years who get quite animated about insisting that a good DBA does not need to know any certain solution, but rather just the fundamentals. I think Faroult and L'Hermite put this idea to rest, though unintentionally. In order to discuss how to best understand what exactly is happening and how best remedy issues, they show that it is necessary to have an understanding of platform specific issues and tools. This is true on two levels. The first is that the location of use of the built in tools for each platform are different. The second is that what works for one platform does not necessarily work for another.

For example, Chapter Two "Sanity Checks" contains a section on parsing and bind variables. The authors compare performance when queries are hard coded, with new prepared statements on each iteration (firm coded) and with one prepared statement and changing the parameter value on each iteration in a loop (soft coded). On Oracle and SQL Server the performance was poorest with hard coded, better with firm coded and best with soft coded. MySQL did best with soft coded as well but actually took a performance hit moving from hard coded to firm coded. This had to do with differences in how MySQL server caches statements. The authors took the time to rewrite their code from java to C in order to ensure that the issue was not related to language or driver issues. This is not to say that one can ignore RDBMS and SQL fundamentals, but rather that to get top performance requires knowledge of platform specific issues. This also comes out again when dealing with optimizers.

With that in mind, the authors recommend that readers have a solid understanding of SQL and some programming language. Most examples are SQL and code is given in Java and PHP. There are also examples that illustrate SQL extensions showing procedures, functions, etc. written for all three RDBMS products covered. The authors stick primarily to standard SQL but do make note and at times show examples of how things will look in each of the other databases. This information is current and reflects the most recent versions of the each product.

The fourth chapter, "Testing Framework" is incredibly useful. The authors cover generating test data and then checking correctness of outcomes through comparison. This is really useful information for anyone working to improve an application, or writing one for the first time. I think it also a large part of why this book could really appeal to new and experienced developers as well as the developer working on existing or brand new applications. I think there is a good chance that only the most extremely experienced developer would find nothing new here, or at least some new way to approach a problem. New developers can learn quite a bit and avoid some bad habits and assumptions without having to gain that information the hard way. And then the tools for generating random data, large amounts of data and comparing results will provide excellent opportunities for learning and real world application.

The next three chapters cover dealing with specific types of issues and how to improve performance. The last chapter then quickly describes a scenario of just how the authors step into real world situations and start to attack a problem. This is followed with two appendices. The first is scripts and samples, the second tools that are available to help in finding issues and resolving them. Some of the authors tools use SQLite, which is discussed briefly in the chapter on creating test data as some of the tools depend upon it.

I think that it has been a while since I've read a book that could have such a rapid return on investment. There are many suggestions and insights that should enable anyone to squeeze better performance out of just about any database application. While the focus is on the application side, there is plenty that requires understanding and work on the database side as well. There is discussion of the parameters and hardware I mentioned at the start of this review. But rather than the only options, they are one part in a much larger and systematic approach.

The authors relate that often refactoring for this type of application comes into play when something that used to work does not work any more. This can often lead to an environment of high pressure and emotion. The desire for a rapid resolution can lead to casting about in the dark for a quick fix or a feeling that cost is no longer as significant since a fix must be had now. The authors argue, and I agree, that this is exactly when a rational, disciplined process of tracking down and fixing issues is the most valuable. I agree. The issue is of course that someone in a position to do something must have the ability to take that approach. This book will get one well on the way to being in that place. Of course it can't take a brand new developer or DBA an expert. Much like a degree it can give them some fundamental tools that will allow them to take full advantage of experience as it comes rather than just crashing and burning.

If I could I'd have any developer on a database centric application read this, and DBAs as well. There is a lot here for both sides to learn about just how much they depend upon and impact one another. This may be an idealistic dream, especially for larger shops where often the relationship between those two groups is adversarial, but I think that such an approach could only make life much better for everyone involved. For anyone looking to enter this world on either side of the DBA or developer equation, this may make a nice addition to their education. For that individual wearing both hats this could be a life saver. In this small book they will learn many things to look out for as well as gain exposure to some of the similarities and differences in what are arguably the top three relational database management systems right now.

You can purchase Refactoring SQL Applications from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
+ -
story
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More
Loading... please wait.
  • by bogaboga (793279) on Wednesday March 11 2009, @01:11PM (#27154387)

    This study would have carried more weight if it had included PostgreSQL and IBM's DB2. These two databases do more serious work than MySQL although many believe MySQL is more widely deployed.

    • by MBGMorden (803437) on Wednesday March 11 2009, @01:22PM (#27154571)

      I have to agree. Maybe I'm unfairly biased and it's just past performance (both relayed by others and experienced by myself), but I don't trust MySQL for anything more complex than a backend for a simple website. If I want a reliable open source database for a serious project, I'd go for PostgreSQL in a heartbeat.

      I actually was looking at KnowledgeTree recently as a possible solution for a document management system for our organization (we have a clunky old system and some others here are pushing SharePoint as a replacment . . .), but it's use of MySQL pretty much stopped that dead in it's tracks. I know they'd prefer MS SQL Server as an option here. I could *probably* talk with them if it supported PgSQL. But MySQL isn't even an option to discuss for something this important.

      • by vlm (69642) on Wednesday March 11 2009, @02:10PM (#27155363)

        You forgot the other often repeated/never researched traditional slashdot claims seen in every mysql comment section, such as mysql doesn't do transactions and doesn't do replication, both of which are necessary for each and every database install ever done past present or future... (Of course it has those features for about half a decade, maybe more, certainly since around the 4.0 range, but that never slows down the folks that repeat those claims)

        Then there are the re-occurring claims that mysql is useless because it doesn't have some bizarre feature that you might personally think is useful, therefore any database without it is useless for everyone doing anything, like perhaps direct file importation of COBOL ISAM punch cards, or an internal database representation for complex four dimensional vectors. You know, the stuff everyone uses.

        Then there are the posts explaining how a failing hard drive on an old gateway 2000 vaporized the filesystem and/or bad ram caused endless kernel lockups, and the mysql software was running on that bad hardware, and correlation always implies causation, so mysql must be bad too.

        Finally I expect several posts about how they found an obscure bug in the beta 3.23 version back around eight years therefore they'll never use it again because that is the only software that has ever had a bug.

        • Well... I have been using Postgresql since back WHEN MySQL didn't do transactions.... I still don't trust MySQL's transactions or the new strict mode. At the same time, I have watched PostgreSQL do an absolutely terrific job of running horrendously written queries optimally. Here are two criticisms I have about using MySQL for real application work, especially when you are distributing that application (and thus have little control over how users set up their db's):

          1) MySQL transactions are built into the table engines, and by default (last I checked, and meaning you don't install innodb, etc), the tables will not be transactional. This means that if you are building an inhouse app, you can trust it more than you can if you are distributing your software. In short, if you are distributing software you can't guarantee that it is running on a system with transactions without a great deal of headache........ The same goes for referential integrity enforcement.

          2) Strict mode can be turned off by any application. This means that the more recent data integrity checks cannot be relied upon. This is an issue on both inhouse and distributed software because it adds quite a bit of overhead to the QA process internally, and can add support headaches in software for distribution.

          MySQL is a good db for single-app databases, where data integrity is not a tremendous issue or where you are deploying a separate MySQL instance on a different port. It is quite a bit worse than PostgreSQL for anything else.

          • Re: (Score:3, Interesting)

            by julesh (229690)

            1) MySQL transactions are built into the table engines, and by default (last I checked, and meaning you don't install innodb, etc), the tables will not be transactional. This means that if you are building an inhouse app, you can trust it more than you can if you are distributing your software. In short, if you are distributing software you can't guarantee that it is running on a system with transactions without a great deal of headache........ The same goes for referential integrity enforcement.

            It's easy e

            • Re: (Score:3, Interesting)

              by einhverfr (238914)

              On the whole, this is probably a good thing. If the application is under your control, you can use whichever mode you want. If you're relying on somebody else's application, forcing it to use strict mode when it wasn't written for this environment could introduce subtle bugs. Now, if you were to argue that the _existence_ of these different modes of operation was an issue, then I'd probably agree. But given the existence of the modes (and that's unfortunately a necessity for backwards compatibility reasons)

            • Re: (Score:3, Interesting)

              by einhverfr (238914)

              If the client were to insist on handling the MySQL part, and screwed it up, it would cease to be my problem. Or rather, I would point at the instalation and tell them were they fucked up;

              Ok, so your point is that this is fine as long as you install MySQL, make sure that Innodb, etc. is installed, etc. Fine. I don't want that responsibility.

              About turning off strict-mode. If your applications are turning off strict-mode, then don't be supprised if you break data integrity. If your clients are writing apps t

            • MySQL is a good db for single-app databases, where data integrity is not a tremendous issue or where you are deploying a separate MySQL instance on a different port. It is quite a bit worse than PostgreSQL for anything else.

              From your description you are using MySQL for a single-app database where you run a dedicated instance of MySQL for your app. That is not the usage case I was describing, which is a central RDBMS serving out the same data to a myriad of different applications. If you are trying to go

        • by MBGMorden (803437) on Wednesday March 11 2009, @02:37PM (#27155783)

          I'm not holding anything against it that regard. The simple fact is that I've had two fairly low traffic MySQL databases become corrupted beyond the point of being usable within the last 3 years. The hardware wasn't at fault here (nor was it old or outdated). Now luckily, this was for something that while important, wasn't "OMG somebody's head's gonna roll!" critical (namely, it was the quarantine database for amavisd-new on a mail filter, and then later an internal message/call tracking system that we'd wrote).

          For stuff like that, where you can stand to lose the data, or at worst, roll to a backup, then MySQL has it's uses. However, our document management system for example contains tons of documents that we must legally keep archived and available (Government institution - we have to have it available for FOIA requests). We also have for instance land appraisal software keeping databases of property taxing information that we need to bill at the end of the year (with about $50 million annually riding on that - if we don't get those bills out our whole budget shuts down). I just don't trust that type of thing to MySQL. Not to mention that the "nobody ever got fired for buying Microsoft" mentality does kick in. If the database fails and I have to restore from backup, then if it's MS SQL Server or Oracle then your bosses will usually not fault me(as long as I have good backups in place, which I do). If something that critical fails and I used MySQL on the project, I very well might be looking for a new job.

          • Re: (Score:3, Interesting)

            by julesh (229690)

            I'm not holding anything against it that regard. The simple fact is that I've had two fairly low traffic MySQL databases become corrupted beyond the point of being usable within the last 3 years. The hardware wasn't at fault here (nor was it old or outdated).

            I'm not sure what you're doing wrong here, but I think many of us have been running a lot more MySQL databases than that and never experienced corruption. Myself, I have been maintaining on average about 20 MySQL instances spread across 3 different ser

        • Re: (Score:2, Funny)

          Exactly, MySQL is nothing but a toy database.

          You're right! I wanted to catalog all my LEGO sets and G.I.Joes and it was just useful enough.

        • by bogaboga (793279) on Wednesday March 11 2009, @01:34PM (#27154793)

          Exactly, MySQL is nothing but a toy database.

          This is the problem with most slashdotters. Most of them put up unsupported comments. What I would like you to do is to support your claims by pointing us to websites that have made the "mistake" of first running MySQL and later discovering the "light" in adopting PostgreSQL or otherwise.

          Alternatively, you could websites that use MySQL; which websites can be branded as "toy websites" by extension.

          • by not already in use (972294) on Wednesday March 11 2009, @01:49PM (#27155033)
            Puh-leez. You don't come to slashdot for objective discussion. You come here to mingle with folks who enforce your inherently held "truths." You come here to post the same Microsoft jokes that haven't been funny for 10 years and get modded +1, Funny. You come here to ridicule the less technically inclined in an attempt to inflate your hollow ego. You come here to partake in a circle-jerk of technical non-conformance. You come here for many things, none of which involve objective, reasonable discussion.

            As an aside, some of the toy websites that use mySQL include Flickr, Facebook, Wikipedia, Google, Nokia and YouTube.
            • wait - most of those sites also have memcached frontend for fetching data, custom message passing apis for balancing load, bigtables of google, and a lot of other cruft to make them run reasonably.

              I like mySQL, it's great for low to huge sized sites, but a database - any database - would get you only so far.

              still, I'd prefer not to handle mysql structural data, as it mangles text removing spaces, changing nulls and making unexpected modification to your data. It thinks it's helping me, but that's good on
            • by mcrbids (148650) on Wednesday March 11 2009, @03:31PM (#27156611) Journal

              The typical argument goes something like: 'MySQL suxorz - nobody uses it for serios work' followed by: 'Yeah? well explain that to =HIGH VOLUME SITE=!'

              Such responses show a misunderstanding of what serious work is being discussed.

              MySQL does a fabulous job with simple, high-volume transactions, exactly the type seen by Yahoogle/Flicker/Blogsites. They need to sore simple data (EG text) and be able to retrieve it quickly, and for these uses, MySQL is probably a better bet than Postgres or DB2.

              But 'serios work' means thing like strong, ACID compliant transactions, row-level locking, strong integrity of field types, and a query scheduler that holds its own when you combine inner, outer, nested, subqueries mashing together a dozen or more tables with millions/billions of records/combinations.

              Postgres will do this, MySQL won't. MySQL isn't bad because of this, it's just a tool not well suited to this specific job. I use MySQL for website CMS, I use Postgres for financial applications.

              Does your dishwasher suck because it does a piss-poor job cleaning your socks? Use the right tool for the job.

                • Re: (Score:3, Insightful)

                  by mcrbids (148650)

                  PS: Your company is pissing away tens of thousands of dollars on Oracle, when you could use PostgreSQL for free!

                  And no, I haven't read your requirements, but I'd be intrigued to find out what needs Oracle answers that PostgreSQL can't!

                  • by Alpha830RulZ (939527) on Thursday March 12 2009, @12:18AM (#27162525)

                    And no, I haven't read your requirements, but I'd be intrigued to find out what needs Oracle answers that PostgreSQL can't!

                    See, I have this budget that I need to use up, or I lose the budget, and then my pay grade goes down, and I don't get to keep my secretary and this office with the nice window...

                    Or, I have this Oracle DBA, and I can't convince him to learn any other platform, because he sez it's bad for his career, and he's my brother in law...

                    Other than that, I like PostgreSQL real well, too. MS SQL Server is a pretty good low cost solution for a lot of smaller uses, too, if your company insists on spending money.

                  • Re: (Score:3, Informative)

                    Well, if parent is using fuzzy full-text searches he's well better off with Oracle.

                    Even non-fuzzy full-text searches on Postgresql are a pain. Yes, they do work great, but the syntax is an abomination.

                    I loathe Oracle as much as the next guy, but even MySQL does a better job at fuzzy string matching! Really.
          • by Splab (574204) on Wednesday March 11 2009, @03:08PM (#27156317)

            GP is right, MySQL is a toy database, advanced toy but still a toy database.

            The absolutely most important thing for a database is data integrity, the ability to trust in your system - when it says "Yeah I saved that for you", it should take catastrophic events to lose it again.

            MySQL treats data in a best effort way, if what you asked it to do doesn't fly with current config, it reverts to something that looks right enough and go with that.

            Consider a database setup, admin installs MySQL with default creates some tables, runs it for a while, decides he needs more log space (transaction), he adjusts the settings and restart MySQL. It starts, everything is peachy. Transactions are running, being committed, he adds more tables, and then suddenly shit hits the fan, he does a rollback, MySQL says ok, but lo and behold, the data is still there...

            So what went wrong? When he changed the transaction log size MySQL during start up realized an inconsistency between the actual log file size and the wanted, MySQL can't expand this file on the fly so InnoDB is disabled, MySQL now reverts to MyISAM (I am not kidding, this is what MySQL will do). Any subsequent calls to begin and commit transaction will be accepted with an OK. Any tables created afterward will be accepted, even with explicit engine syntax MySQL will just issue query ok, 1 warning.

            Now the warning will tell you that the InnoDB engine wasn't available, so MySQL chose MyISAM instead - however, most aren't aware of this behavior, especially since most programming languages does not support this.

            A database should at no point _ever!_ say "OK" to a request for something that can't be handled. If I say begin transaction and something isn't right I want my database to shout on top of its binary lungs that something is wrong and my data isn't safe.

          • It isn't quite that simple, but I suppose one of my earlier (and later abandoned) projects qualifies.

            I set up HERMES (a CRM suite written in PHP4) originally on MySQL and eventually discovered that the lack of transactions, etc. were a serious problem (this was back in 1999). I tried to move it over to PostgreSQL and discovered that PostgreSQL was really hard to administer (this was back in 1999). I ended up doing all my prototyping on MySQL, then converting the schemas to PostgreSQL using mysql2pgsql.pl because this was the only way I could get the data protections I needed (back in 1999).

            Now, both MySQL and PostgreSQL have come a long way in the nearly-a-decade since then. MySQL has added transactions (for some table types not installed by default), foreign keys (for some table types not installed by default), strict mode (which can be circumvented on the app level), and a the planner has gotten much better. On the other hand, nearly every one of my issues with PostgreSQL has been resolved too. 8.3 has some really impressive new features from a developer perspective, and 8.4 will have even more. I haven't had to do prototyping on MySQL since PostgreSQL 7.3 came out.

            I still stand by the statement that "compared to PostgreSQL, MySQL is a toy," and I would expect the gap between them to continue to widen. However what was limited to light content management db's in 1999 (MySQL), has become better able to handle a wider range of single-app dbs. MySQL is still no reasonable choice for an enterprise-wide database management solution especially where critical data is involved, but there are an increasing number of special cases where it is an option, in particular when compared to Firebird's embedded version, SQLite, and stuff like Sybase's SQL Anywhere. Comparing MySQL to MS SQL though only comes out favorably for MySQL where MS SQL is quite a bit more than is needed. PostgreSQL OTOH can in most cases compare favorably to Oracle, DB2, and MS SQL.

            So the other half of the statement needs to be "but there are some cool things you can do with a toy db...."

          • by Fulcrum of Evil (560260) on Wednesday March 11 2009, @03:14PM (#27156379)

            What I would like you to do is to support your claims by pointing us to websites that have made the "mistake" of first running MySQL and later discovering the "light" in adopting PostgreSQL or otherwise.

            It's a toy database because when things aren't set up properly, they don't fail. Instead, they succeed silently and corrupt data (see using the wrong file format for your tablespace). Also, the developers are a treat - "we don't need transactions, do integrity checks in the app", followed by "we now have transactions, aren't we cool". Do they have triggers yet? Meanwhile, I have postgres, which works just fine.

          • They don't focus on MySQL - and I don't think the gp (or whatever it was) said they did - but just so it's clear. They do everything evenly between the three. But no - they do not do this for DB2 or PostgreSQL. I don't know how or why they chose those 3 - but that is what they cover. Though I have to imagine much of this will, as in The Art of SQL carry over to any RDBMS.

    • The book is not a study, it is trying to teach refactoring concepts with the idea that you can take and apply them to any SQL project. Surely you can figure out how to apply an Oracle or MySQL example to PostgreSQL, yes?
  • by Lumpy (12016) on Wednesday March 11 2009, @01:16PM (#27154471) Homepage

    But with management.

    when I spent a few years as a DBA it was common to be told to not work on that project any more as soon as it produced usable data. That means as soon as you have a working prototype you are required to drop it and start the next project. Many times after you get a working prototype you then go back and refine it so that it's faster and uses less resources.

    Management is the blame. Unrealistic deadlines for DBA's and if you are honest with them and give a report that you have data they think it's good to go. I actually got wrote up once for taking one of the old procedures we had and rewriting it so that it worked much faster and the resource hog it was was reduced to the point that others could use the DB while it ran. I was told I was wasting time.

    • by Samalie (1016193) on Wednesday March 11 2009, @01:26PM (#27154663)

      Agreed COMPLETELY.

      I work as a DBA as well, and the moment the prototype produces reliable data, its immediately off to the next project. Only time I ever get to go back and tweak code is if some random variable that was not thought of was missed in the original design, or a bug, forces me back into the code.

      I've got some code out there that I know beyond a shadow of a doubt is horribly inefficient...but I'm not given the time and opprotunity to correct that.

    • by CodeBuster (516420) on Wednesday March 11 2009, @02:43PM (#27155897)
      This experience speaks to a more general issue that I have with non-technical MBA types who tend to reduce everything to a dollars and cents issue without fully appreciating or even being able to fully appreciate either the technical OR the financial consequences of their decisions. They assume that their MBA piece-of-paper mail-order diploma makes them oh-so-much smarter than anyone else who doesn't have one, when in fact the smartest people tend to study mathematics, physics, engineering, other hard science, or even philosophy while the intellectual light-weights study social science and get their MBA. If anyone is actually a waste of time and resources then it is the middle management social climbers who produce a lot of hot air using the latest "management techniques" that they read about in a trade magazine on an airline flight or heard about at a conference held in a cheap hotel ballroom.
  • by trybywrench (584843) on Wednesday March 11 2009, @01:20PM (#27154541)
    I'd like to see some work done on the balancing act of how much to do in code and how much to do in SQL. My coworker can put SQL statements together that if printed on an 8.5x11 would fill the whole sheet if not run over. Me, on the other hand, I tend to break up huge sql statements into a set of smaller ones and then use code to do some of the work that could possible have been done in SQL. I don't have the time to find out what works best on my own but I do have the time to read about it.

    btw, how come tech books don't come on tape/cd?
    • by Saint Stephen (19450) on Wednesday March 11 2009, @01:42PM (#27154945) Homepage Journal

      On a REAL database, like Oracle, the query optimizer will factor common expressions, eliminate unused branches, and in general execute your SQL in completely different manner than what you write.

      Doing things in a "relational calculus" way, where you specify what to be done (i.e., with SQL) is superior to doing things in a "relational algebra" way (individual statements correlated by procedure code).

      I've written some queries that were a dozen pages long for a individual statement, mostly because I use a python-like style where the indentation specifies the structure and thus you can string together monstrous subexpressions and not get confused. The DBA was like "you're not running that on MY box," but it ran super fast because of the query optimizer.

      That's what I mean when I say MySql is a Toy, compared to DB/2, Oracle, or SQL Server. The query optimizer.

    • by OG (15008)

      btw, how come tech books don't come on tape/cd?

      Only on Slashdot do you find someone who wants to listen to Natalie Portman talk SQL.

      Joking aside, I doubt I'd find tech books on tape all that useful. Without diagrams, code examples, etc., you lose quite a bit of the value, IMO.

      • by RulerOf (975607) on Wednesday March 11 2009, @02:07PM (#27155325)

        Only on Slashdot do you find someone who wants to listen to Natalie Portman talk SQL.

        SELECT * FROM Memes WHERE Reference LIKE '%Portman%' AND LIKE '%naked%' AND LIKE '%petrified%' ORDER BY SlashdotCommentScore, HotGrits;

        27,154,947 Rows Returned.

    • I think the emphasis here is on writing the best sql so you can write the best code. Removing unneeded iteration on either side can be a huge benefit. Repeated calls to a database can be expensive - in numerous ways - so I think they aim the reader towards a state where more work is done with less trips.

      I think that it is also safe to say that many of the tools they give for testing performance would be very useful in nailing down just where the issue is. It's not an issue of finding what works b

    • Re: (Score:3, Insightful)

      by he-sk (103163)

      That's possibly a VERY bad idea. Even with small queries it's possible to create huge intermediate result tables and loading all that data into your application will make it crash. And if that doesn't happen, breaking a complex SQL statement into separate parts robs the SQL query optimizer of useful information. Your code limits the choices for an optimum evaluation plan, but how close is your code to the optimum plan that can be achieved?

      Having said that, the optimizers can't work magic. I sometimes split

  • by puppetman (131489) on Wednesday March 11 2009, @01:23PM (#27154589) Homepage

    I have the misfortune of working with a database that is primarily a couple of tables with key-value pairs (not a traditional database model).

    There is only one column that can be indexed, and it has to be done with a full text index.

    Every once in a while, there is a discussion about moving this mess to something more traditional. I was excited to read the review on this book, but as I read through the review, it seemed like this was more of a "performance tuning guide".

    Re-factoring a database is a lot more involved - changing tables, stored procedures, maybe even the underlying database.

    The term Database Application is fuzzy and poorly defined. Is it the front end? The stored procedures? The database tables? I would consider a database application to be any part of the code that stores, retrieves, removes or modifies data stored in a database, and the entities that have been defined to store that data.

    Using that definition, this book is about tuning, not refactoring.

    • It discusses how to change client code - which is definitely not database tuning. There are database tuning techniques involved, but really it is much more than that. I tried to express that in the review but maybe I didn't do as well as I would have liked.

      Here is how Wikipedia defines refactoring, "Code refactoring is the process of changing a computer program's internal structure without modifying its external functional behavior or existing functionality. This is usually done to improve externa

      • Re: (Score:3, Informative)

        by stoolpigeon (454276) *

        Let me explain what I meant by "doesn't work any more". For example a query that originally took 30 seconds now take 3 hours. It still 'works' from a functional perspective but from a business perspective may have become completely useless. Refactoring can make it work again. I should have been more clear on what I meant there.

  • by Ukab the Great (87152) on Wednesday March 11 2009, @01:25PM (#27154655)

    I've found that the biggest issues with SQL applications (writing rich clients) is not in performance turning of server/sql but in dealing with ORM issues, where to draw the line between how much work the client does vs. how much the server does, reconciling changes made in memory with data in tables, concurrency, database architecture designed to cope in advance with poorly thought-out requirements you're given, etc. I'd hope that book on refactoring SQL *applications* would touch on these issues.

  • Shoot the developers (Score:2, Interesting)

    by kafros (896657)
    I am a developer, and my experience has shown that if you use one of: Oracle, SQLServer,
    PostgreSQL, DB2 and application performance is poor, 99% of the time it is poor design from our
    (developer's) side.

    Developers without good understanding of Relational Databases and SQL often produce problems that
    cannot be solved by indexes, or throwing transistors at them.

    It is so nice to see a "custom" made map implemented in the database using temporary tables instead of
    using the language's built-in map function
    • Re: (Score:2, Informative)

      I would cautiously agree with the developers being less educated on SQL than they should be. The trouble seems to be in the different mind-sets required to solve the application problems. SQL is a declarative language, and it operates on your set of data as described. The host language, for what of a better term, is C or Java or some other iterative language, and it operates on each individual member of the set, stepwise. If you primarily think "stepwise" or algorithmically, you're already framing your


    • Custom made map implementation? You mean a view? :)

      This is why I like our new company architecture model. Essentially the application developers only see a view. The view should closely match the "screen" of the application. The view is designed, and optimized by architects, and then, of course, all these views are available in our DBMS, managed by our DBA's. Never should developers make complex queries; they just query the view. Although some definitely have the touch, the majority routinely do th
    • Re: (Score:3, Interesting)

      by einhverfr (238914)

      You throw transistors at your developers? ;-)

      Actually I agree with you. One of the big wins on the LedgerSMB project was the approach of moving the all main queries into user defined functions (and then using them as named queries). One of the nice things about this approach is that dba-types (like myself) can address performance when it crops up rather than dealing with the ABOLUTELY horrendous approaches found in the SQL-Ledger codebase..... (Assembling queries as little bits of text strings, throwing

  • Use views (Score:3, Insightful)

    by bytesex (112972) on Wednesday March 11 2009, @01:27PM (#27154697) Homepage

    I think it's usually best to have views (whether with rows that are the result of code, or with a pure 'select' definition, or materialized ones) define what your application 'sees', so that you can always change the underlying datastructure. That way refactoring becomes a bit more easy.

    • I do something similar by using a 'facade' pattern to assemble several modules' output into one stored procedure.

    • Re: (Score:3, Informative)

      by bjourne (1034822)
      That's silly. When you change the data model you must change the views too. Then you could as well have changed how the application uses the database instead and avoid a whole layer of indirection. Plus, views are read-only so the client application still needs direct access to the tables to update data. Views are useful and very under appreciated, but not in the way you suggest.
        • Re: (Score:3, Interesting)

          by he-sk (103163)

          I have to agree. In DB theory we have learned that you should normalize your data for a good database design. However, materialized views can give HUGE performance gains, by eliminating multistep joins between tables. You can't built customs indexes for queries that have these joins when the index condition is not in adjacent tables and you always have to deal with large intermediate results.

          If the app is read-only and performance is critical the best strategy is to used materialized views built from norma

  • ...is probably a good read. He has a lively writing style; it kind of reminds me of Bertrand Meyer's "Object Oriented Software Construction". Anyhow, I've got both this book and Faroult's "The Art of SQL"... both are excellent.

  • In cases where the problem is query performance, I've had pretty good results with the techniques in "SQL Tuning" by Dan Tow.

    This, of course, only works if the rest of the database setup is more or less ok ;-)

  • by avandesande (143899) on Wednesday March 11 2009, @01:58PM (#27155181) Journal

    I have developed a design pattern using in memory data table objects that can satisfy the most complex requirements without using any dynamic code.
    It also allows the queries and business logic to be broken into discreet chunks that are easily optimized and debugged.

  • I Only Know Oracle (Score:5, Informative)

    by bloobamator (939353) on Wednesday March 11 2009, @09:59PM (#27161495)
    I only know Oracle but I've known it since version 5.0. Intimately. I haven't read the book but I read the review. Here are a few tips I've learned over the decades that you might find useful, just in case they aren't covered in the book:

    1) You have to establish a general rule of thumb for each production db whereby any one sql that consumes more than x% of the db resources needs to be tuned. The value of x varies from db to db. If it cannot be tuned below x% then it needs to be refactored.
    2) Learn to use stored outlines. If you can get them to work they will save your ass and make you look like a total hero.
    3) Never turn your back on the optimizer. Really. Even for simple queries, even with the deepest stats.
    4) Bind variables are a necessity for high-repetition sql. Bind variables are something you might want to avoid for reports queries for which the optimal plans depend on the user input values. This is because a sql's plan is cached along with it the first time it is parsed, and if you use bind variables then the first plan you get is the plan you will always get so long as the sql remains in the shared pool.
    (You can sometimes work around this issue by turning off bind variable peeking, but consider doing it on a per-session basis instead of changing it system-wide. Scary!)
    5) Nowadays a 32GB SGA is no big thing. Get yourselves a ton o' RAM and set up a keep pool in the buffer cache to pin your most important reporting indexes and tables. Partition your big reporting tables and maintain a sliding window of the most recent partition(s) in the keep pool.
    6) No sorting to-disk. Ever. If you cannot let the session have the PGA it needs to sort the query in memory then the SQL needs to be "refactored".
    7) Once you have eliminated most of the physical disk reads it then becomes all about the buffer gets (BG's). When disk reads are low the high-logical-BG queries immediately become the new top SQL. This is because logical BG's are all CPU and your db is now cpu-bound, which is right where you want it. So from this point it's back to item #1 and we prune and tune (thanks KG!)

    I could go on all day. Perhaps I should write a book?