Open Source Databases Revisited (160 comments)
pusakat writes "If you've been following performance comparisons of the different Open Source databases, Tim Perdue revisits PostgreSQL v. MySQL with production data from SourceForge and comes up with interesting results. This may be fodder for yet another 'my database is better than your database' exchange from both camps but the results are interesting anyway."
free software trickles up (Score:1)
Indeed, I'm working on a crisis for a customer who runs a large web site with three Linux/Apache/PHP app servers hitting one database server (via ODBC) running our DBMS.
Guess which component is failing?
Here we go... why? (Score:1)
Re:Low end machine? (Score:1)
designing someone's page, setting up a router,
or fixing some lemming's (like yourself) machine because
"The browser crashes when they get online" bullshit.
My machine stays online all day. I don't bother hanging up the modem.
You are a complete bastard child, I would assume you are no
older than 13, living with your mommie. She fucked
some bum, and you have no idea who the hell your father is.
You know what? I'm gonna go dig up your grandmother's grave
and let SIGNAL 11 have his way with her. In fact, I'll make sure
he digs up your grandfather too.
ETRN x
Re:Low end machine? (Score:1)
your whole body.
ETRN x
Re:linux in manufacturing (Score:1)
"Fat, drunk, and stupid is no way to go through life."
Re:Your Mileage May Vary (Score:2)
Quite. Let's see how well these "open source" databases do in TPC-C [tpc.org] and -H [tpc.org].
Re:Additional questions: (Score:1)
"Fat, drunk, and stupid is no way to go through life."
Re:16 queries on one page? (Score:1)
On another subject, did you drop all the users from blacknova.net? My user (tolket) is gone....
Interesting Info (Score:1)
Re:Interbase (Score:1)
The open-source Interbase is alive and well, at:
http://firebird.sourceforge.net
http://www.ibphoenix.com
http://www.interbase2000.com
And in newslists at http://www.mers.com
Generally it's called Firebird, and Inprise doesn't want to know or link to us.
Re:16 queries on one page? (Score:2)
Laugh, I'm not in charge of that, though I'm one of the open-source developers... Yes, the DB was completely flushed; you need to create a new character. Read the forums.
-Michael
Re:16 queries on one page? (Score:1)
Your comment minimizes the performance issues. You can create a beautiful, third-normal-form database and find that it is slow as hell when you try to read or write to the schema.
Most real-world databases require a whole lot more tables.
For a web site, pages need to return in 1-2 seconds; if your 16-table join alone takes 1-2 seconds, you're in trouble, because the join has consumed the entire budget and your response time now depends entirely on network conditions.
Theoretical ideals of relational theory applied to databases are fun to talk about and design to, but the real world dictates some compromises. Often, denormalization is the first compromise: it reduces the number of joins required to read the database, and it speeds up writes because you have to issue fewer SQL statements.
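To make that concrete, here is a minimal sketch (table and column names are invented for illustration):

  -- Normalized: reading a message requires a join to recover the author's name.
  SELECT m.subject, u.user_name
  FROM messages m, users u
  WHERE m.user_id = u.user_id;

  -- Denormalized: the name is copied into messages at write time,
  -- so reads need no join and fewer statements run per page.
  ALTER TABLE messages ADD COLUMN user_name VARCHAR(32);
  SELECT subject, user_name FROM messages;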
Later.
"Fat, drunk, and stupid is no way to go through life."
Re:real-world choosings.... (Score:2)
Anyway, I'm not saying that Interbase is an Oracle or MS SQL killer. But for a lot of clients and a lot of applications it's a suitable choice, and the client saves a stack of money to boot. Add to that the fact that you can deploy it on Linux, and for cost conscious customers it's essentially a no-cost solution. Just buy the box, the rest is free.
If it goes beyond tinkering: Yes! (Score:1)
Your database software failed. I lost all my data, but at least I can sue you and get some money. My business might be ruined, but hey, I can cash out by winning a lawsuit against you. Of course, I have to pay the lawyers and might have no income while the suit goes on, but who cares.
Trying to be cynical? Probably. But your logic fails utterly: in a mission-critical environment you sure as hell want a database vendor who:
is likely to provide professional, contractual support on various levels
is not likely to go away next week
can provide an annual report indicating the amount of $ invested in R&D
is, ultimately, accountable.
I'm aware that you can purchase professional support for PostgreSQL (which is likely better than what the big 5 have to offer), but frankly, being responsible for a project, I wouldn't bet my career on it.
I never said that I necessarily agree with this mindset, but from a (project) management perspective it's understandable. If I invest $200 000 000 in a mission-critical system, the possible savings from using open-source products in core elements (database, middleware) are probably not worth considering.
Now for the rest:
Reliable database: How do you determine "reliable"? For millions, M$ Access is the idea of a reliable database. Tell you what: it's not! Further, disks will inevitably go bad. Even if you have redundant disk subsystems, a logical error or a pointer gone haywire may have been written to disk.
Good backups: We certainly don't have an argument here. However, if you can't afford downtime, or can't afford even five minutes of lost data, this might not be enough.
Security: Of course, but it's incredibly difficult to provide good security, especially in a multi-tiered environment. Granted, that has nothing to do with whether a database is open source or not.
Re:16 queries on one page? (Score:1)
On the other hand, a quick query on a single table might spend a fairly high percentage of the execution time in the parser, because the optimizer has few options to explore and the resulting execution is likely to be fast.
Preliminary tests by the person who put together the current test implementation of the query cache code report about a 3x improvement in the execution of simple queries.
Here's a reason why your test of precompiled functions might not help as much as you'd expected: though the function itself is precompiled, the query you use to reference the function is not. The time required to parse and optimize this query might be enough to overcome the fact that the function itself is pre-compiled.
Also, you talk of "stored procedures". I don't use PG's stored SQL functions, only PL/pgSQL. Stored SQL functions might not be pre-compiled; I can't say one way or the other.
I'm not sure exactly what you mean by "raw calls to table operations". I do know, though, that a PG plan consists of a tree of operations like "merge join", "indexed scan", etc. These are "table operations" and they're "raw", but they're complex table operations.
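To illustrate the point about the invoking query (a hypothetical sketch using invented names and 7.x-era syntax): even if the function body below were fully pre-compiled, the SELECT that calls it still has to be parsed and planned.

  -- A stored SQL function:
  CREATE FUNCTION get_balance(integer) RETURNS numeric
      AS 'SELECT balance FROM accounts WHERE account_id = $1'
      LANGUAGE 'sql';

  -- The invoking query is itself parsed and optimized on every call:
  SELECT get_balance(42);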
Re:large? (Score:1)
Re:yeah baby (Score:1)
like your post couldn't be dismissed as a troll from the title alone...
linux in manufacturing (Score:4)
you have a budget? what's that? *smirk* I'm lucky if I get $50 for a stack of CD-Rs.
Re:Insertion of large text objects (Score:1)
There used to be an 8K limit on row sizes.
The author of the article says it's going away next release.
Re:Why the partisan fuss? (Score:3)
I think the lack of tools, row-level locking, transaction support, etc. in MySQL is a shame. Yet people talk like it's the best thing since sliced bread. This has led me to discover two kinds of MySQL users:
Re:Insertion of large text objects (Score:1)
I noticed, though, that unlike MySQL, it would not accept insertion of large text data within an SQL INSERT command.
Postgres supports Large Objects but I've never had to use them. In fact I'm not even sure how to use them. :-)
Re:Why even bother? (Score:1)
You're forgetting Informix which runs many enterprise-class installations, 24/7, OLTP, Tbytes of data, all that rot.
Re:Test Methods Alert (Score:2)
I didn't notice that. In fact, I noticed the opposite (which is exactly what was expected.)
What I noticed was that as you increased the number of clients, the total number of requests served increased, while the number of requests per client was lower - which is exactly what should happen - the performance decreases as the number of clients increases.
The problem is that you don't know how to read a graph.
Re:Insertion of large text objects (Score:2)
Re:large? (Score:1)
Secondly, Oracle IS object-oriented. You can use either a relational or an object-oriented structure in Oracle.
Thirdly, I've never even heard of these DB's that you mention. Do they even support databases of this size?
Then use Interbase (Score:2)
Re:16 queries on one page? (Score:3)
Let me provide a real-world example. Some years ago, it was my task to roll out RADIUS for an ISP's dialup network. The stock setup for the USR RADIUS server -- it was the only one I could get to talk to a SecurID server correctly -- was, shall we say, non-optimal. USR's stats said one (1) authentication per second when using Postgres. Let's ignore the way they had Postgres set up (fsync enabled) for the moment. The reason everything was so bloody slow was the database schema. Even though there was only one query against the USERS table, it had to return 5K. Each row of the USERS table contained everything you could ever imagine (well, that USR could imagine) and consumed almost 5K of space.
After three months of testing, data collection, and analysis, I redesigned the mess. USR literally laughed at my design because it made between three and six queries depending on the user. They stopped laughing when I showed one of their engineers the setup actively handling over 100 authentications per second. The USERS table, holding 10,000 entries, was about 2M.
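A toy sketch of the redesign's flavor (the real schema isn't reproduced here; all names are invented): keep one narrow row per user and push the bulky, rarely used attributes into side tables queried only when needed.

  -- Narrow main table: the common case touches only this.
  CREATE TABLE users (
      user_name  VARCHAR(32) PRIMARY KEY,
      password   VARCHAR(32)
  );

  -- Optional per-user RADIUS attributes, one row each, fetched on demand.
  CREATE TABLE user_attrs (
      user_name   VARCHAR(32),
      attr_name   VARCHAR(32),
      attr_value  VARCHAR(255)
  );

  -- Three to six small queries per authentication beat one 5K-wide row:
  SELECT password FROM users WHERE user_name = 'bob';
  SELECT attr_name, attr_value FROM user_attrs WHERE user_name = 'bob';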
Re:oracle (Score:2)
But
The resulting lawsuit might not be terribly pleasant.
While you're free to do what you want, and while Oracle's not known for going after small fry and seems to be much less obnoxious than Microsoft in terms of license enforcement, folks should at least know that what you are proposing is a violation of your Oracle license.
At the risk of being redundant - deployment or other non-development, non-evaluation use = $$$.
Re:Your Mileage May Vary (Score:3)
Actually, in the MySQL fulltext search docs [mysql.com], the MySQL guys do a decent job of detailing the _two_ options for querying the fulltext index. Putting the "match (a,b) against ('words')" in the "select" list returns the relevance rank against all rows in the table, including those with zero relevance. If you put the match clause in the "where" clause, then MySQL returns only those rows with non-zero relevance, ordered by the relevance score (but the relevance score is not one of the returned columns).
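In other words (a sketch; table and column names are invented, and a FULLTEXT index on (title, body) is assumed):

  -- MATCH in the select list: a relevance score for every row,
  -- including rows with zero relevance.
  SELECT id, MATCH (title, body) AGAINST ('words') AS score
  FROM articles;

  -- MATCH in the WHERE clause: only rows with non-zero relevance,
  -- ordered by relevance (the score is not a returned column).
  SELECT id, title FROM articles
  WHERE MATCH (title, body) AGAINST ('words');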
- mark
Re:real-world choosings.... (Score:2)
1. PostgreSQL does run under NT. It runs under cygwin; I have no personal experience running it under NT, but it does run there (and, yes, I do mean the server, not just the various clients).
2. In PostgreSQL, writers never block readers, so your statement to the contrary is a factual error. InterBase and PostgreSQL use a very similar paradigm for storing the actual data, and InterBase's "Multi-Generational" architecture is equivalent to PostgreSQL's "Multi-Version Concurrency Control" ("MVCC" for short).
Again, I'm not disparaging your use of InterBase, a fine product that hopefully will survive the current political machinations that surround it. But I can't let your misconceptions about PostgreSQL pass without comment.
Re:Your Mileage May Vary (Score:2)
The only use I've found for the full text search is in ordering items found by other means. Marginal utility, at best.
It would also be nice if they gave a little formula somewhere for the "relevance". What's it really measuring?
--
PostgreSQL Beta (Score:1)
Re:real-world choosings.... (Score:1)
I had not considered the cygwin option, and I concluded that PG does not run on NT natively when I went to download a binary distribution. Have you tried it under cygwin? Is there a performance hit, since PG would run under cygwin emulation rather than natively?
I was unaware of PG's MVCC because the NT question was the first issue, so I did not explore PG further after failing to find a binary NT version.
Thanks for the corrections!
"Fat, drunk, and stupid is no way to go through life."
Recent MySQL News at Boston Perl Mongers (Score:2)
The big news was that a re-worked version of the *Progress* database back-end would be turned into an open-source MySQL-compatible back-end called Genie. This would be pretty huge, as Progress is everything that people complain MySQL isn't. Also, MySQL is adding the following (either in their next release or the one after that, depending on the feature; a rough SQL sketch follows the list):
* Subqueries
* A boolean type
* Foreign key constraints
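Roughly what those features look like in SQL (a sketch; the exact MySQL syntax wasn't final at the time, and the table names are invented):

  -- Subquery:
  SELECT name FROM projects
  WHERE project_id IN (SELECT project_id FROM releases);

  -- Boolean type:
  CREATE TABLE flags (is_public BOOL);

  -- Foreign key constraint:
  CREATE TABLE releases (
      project_id INT,
      FOREIGN KEY (project_id) REFERENCES projects (project_id)
  );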
Great (Score:1)
BDB tables? (Score:2)
Re:Additional questions: (Score:3)
What does HUGE mean? Gigabytes? Terabytes? Exebytes? Also, size is not the only metric involved in stability: stability with a given number of records or fields, for instance. A 100-gigabyte database could have 10 records or a million records, depending on how big each record is. Another metric is the number of simultaneous requests it can handle. One might work fine if it gets 100 requests at a time but choke if it gets 1,000. A million-record database might be accessed by only 2 clients at a time... size doesn't equal number of users.
Which of them is most likely to be able to recover without a glitch from an "impure" shutdown?
Good point, but they're probably about the same in that regard (just taking an educated guess here).
Which will benefit more from a HUGE RAM quantity or multiple CPUs?
Most well-written software will take advantage of as much RAM as it can get. The question is which program is better at managing its own memory.
As far as multiple CPUs go, programs have to be written specifically to support multiprocessing. On Linux or Windows NT, for the most part, this means writing the code to use multiple threads. I'm not sure whether either of them does, but that would be a very good point.
Re:July Article (Score:1)
Maybe I'm ignorant... (Score:1)
As I said, maybe I'm ignorant and missing something as I am new to this type of development...
Re:MySQL wins hands down. (Score:2)
It's not all myth, folks. Table locking sucks; page-level locking (Sybase) is much better, and row-level locking (Oracle, Interbase, Postgres) much better still in high-concurrency systems.
Re:What about other Database Tools? (Score:3)
What's www.phpbuilder.com Run? (Score:1)
interbase "open source" db (Score:1)
I haven't had time to evaluate it myself, either. Anybody out there used it?
Re:real-world choosings.... (Score:3)
Interbase has been released under a variant of the MPL that "protects" the Interbase name (IOW, only Interbase may release certified Interbase builds). Anyone may grab the code and use it whichever way they like.
Somebody mod this up (Score:2)
Another real-world observation (Score:2)
We switched over to Postgres in one swift maneuver, and it's been running great ever since. Speed was never in question; we just needed a database that wouldn't drop all its data one night while no one was looking.
Besides which, the ultra-cool pg_dump command makes backups a no-brainer.
Kudos to the Postgres team. And my company won't be dumb enough to try to use proprietary software again in the future, that's for sure...
Re:Here we go... why? (Score:1)
There's still the question of scalability. How does it fare in that category?
Pg 7.x kicks ass! Re:linux in manufacturing (Score:2)
I've been using Pg since the 6.4 series of builds, and about a month and a half ago switched to 7.0.2. It is worth the upgrade IMHO, but of course since you're relying on it for critical business data you'll probably want to test it some first... ;-)
Oh, and if you don't have a budget, how come you have a 600MHz Alpha? 600 AMD MHz is pretty cheap, but my impression is that 600 Alpha MHz very much isn't... :-)
--
Re:Why even bother? (Score:1)
View the PRINTABLE page - one page (Score:1)
--
Re:open source MOBILE database (Score:2)
Re:open source MOBILE database (Score:3)
I'd love to use db.linux on a couple of projects but this limitation is a killer. The complete license is at License Terms [dbstar.org].
Your Mileage May Vary (Score:5)
As an example, I recently ran some tests that came up with the exact opposite results. My application is a large message tracking database. Query speed is secondary; insertion speed is critical. MySQL handles my test data set in 20 minutes; PostgreSQL takes over 3 hours (!). For this application, PostgreSQL is Right Straight Out.
Other notes:
1. PostgreSQL's tables took up roughly twice the space of MySQL's.
2. MySQL's lack of transactions is a real pain. I can, however, work around it (in this particular application, at least).
3. MySQL's "text indexing" is useless. The evaluation function returns every record that contains any of the search terms; there is no way I've found to require all search terms. No documentation, of course.
4. The latest beta of MySQL can use Berkeley DB [sleepycat.com] tables to get real transaction handling. Unfortunately, this is even slower than PostgreSQL.
Obvious conclusion: Run your own tests and draw your own conclusions.
--
Re:Why the partisan fuss? (Score:2)
I'm pretty certain few people actually think MySQL is a wonderful DBMS that's the best thing since sliced bread. Most are keenly aware of its deficiencies but defend it because of two features that are invaluable to a successful open-source product:
There's no doubt that PostgreSQL is a good DBMS, but MySQL will remain more popular until the support for it arrives.
Re:MySQL wins hands down. (Score:2)
thank you.
Re:16 queries on one page? (Score:3)
My site [half-empty.org] runs about 10-20, but that's only after caching lots of data in memory.
Re:Aditional questions: (Score:2)
Postgres uses multi-processing (each connection gets its own process) with shared memory segments and a controlling/monitoring process (postmaster).
Thus, if MySQL could do a better job of managing resources under heavy loads, it would waste less physical memory. Additionally, I don't know how good Linux is at multi-threading under heavy loads. Can it faithfully handle hundreds or potentially thousands of concurrent/competing threads? I've never seen any tests to show this... I'm aware that the default max processes/threads was hard-coded in the kernel to 512 at one point (I believe it requires a recompile to change). This might have something to do with MySQL's crashing; you'd have to go to single-user mode to max this puppy out. Additionally, I don't know if it's possible for an app to determine how many more threads can be spawned by the OS. If a thread fails to spawn, you can corrupt the entire process; in MP, if you fail to spawn a process, you simply lose that connection.
In general, I'd prefer MP on a Linux system. It's a similar story with Apache: you get a lot of reliability, and not _that_ much memory waste when compiled correctly. Additionally, MP has the potential to span computers (as in a Slashdot article on a multi-hosted extension to Linux).
-Michael
Re:16 queries on one page? (Score:2)
Re:What about other Database Tools? (Score:2)
Re:PostgreSQL Beta (Score:2)
Re:Your Mileage May Vary (Score:3)
Well now, that information is completely useless, because you didn't mention anything that I would need to replicate your result. You didn't mention the versions (or the identities, really) of the databases you used, the hardware, the operating system, anything very interesting about the application...
So what exactly were we supposed to glean from this? In particular, the point of the target article, as I understand it, was really "PostgreSQL used to be horribly slow for our application, but when we had some issues with our current setup, we re-tested the very newest version of PostgreSQL and the latest MySQL we could deal with, and, wow, things have changed."
Now, I don't personally either believe or disbelieve what you posted, but only because I have no idea what proposition I'm supposed to be interested in here, because you didn't tell us.
oracle (Score:2)
Re:Additional questions: (Score:2)
Ah, "exebytes". That's the unit of choice for measuring the size of executables produced by that big & rich [microsoft.com] company, right?
Re:Additional questions: (Score:2)
Stability from an impure shutdown? This is definitely a win for MySQL. Postgres has completely self-destructed for me a handful of times when the machine it was on hard-locked. The only recovery method is to create a fresh database from a backup dump.
Huge RAM and multiple CPUs? This is probably a function of the OS more than anything. MySQL is limited more simply because of its locking problems, right? So no matter how many CPUs you have - only one can update a given table at a time. That's a pretty important limitation if you have a database that has any updates simultaneously with selects.
load control (Score:4)
Why? The answer is simple: load control. Basically, as concurrency increases, the cost of locking and of buffer-pool misses becomes prohibitive and transactions begin to starve. So most "smart" DBMSes will queue transactions for later processing after a certain point.
Because MySQL uses such coarse-grained locks (at the table level), the opportunity for concurrency is very low to start with; hence the decreased throughput and starvation.
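A concrete (hypothetical) illustration of the granularity difference, with invented names:

  -- Session A:
  UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;

  -- Session B, issued at the same time:
  UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

  -- With table-level locks (MySQL), B queues behind A even though the
  -- rows are disjoint. With row-level locks or MVCC, both proceed at once.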
If you're interested in this stuff, look at Jim Gray's seminal paper "Granularity of Locks and Degrees of Consistency in a Shared Database" and Chou and DeWitt's "An Evaluation of Buffer Management Strategies for Relational Database Systems." You should be able to find either in the ACM Digital Library or in an anthology (like Stonebraker's _Readings in Database Systems_).
~wog
Re:16 queries on one page? (Score:3)
I remember reading on the topic of excessive joins in a Sybase admin book once (an excellent book on "practical" databases, since it shows you exactly how one company solved all its DB problems). It essentially said that determining join order is actually a very slow process -- exponential, in fact -- especially if your DB tries to determine which indexes to use based on statistics: the selection of indexes increases the number of possible variations incredibly. Sybase's solution was to compare only a select few of the possible join methods, which is a trade-off. Alternatively, if you create a stored procedure, then whichever method was best at creation time will be used from there on out...
Stored procedures -- in Sybase at least -- reduce a great deal of the overhead, especially in a 16-way join. Sadly, I don't know if Postgres does ANY pre-compiling of stored procedures. It has two main methods: straight stored 'sql' functions, and PL/pgSQL (a true stored-procedure 'language'). My guess is that they do little more than pre-parse the statement (if even that). If that's the case, then using a stored procedure might even be slower than raw SQL, since a call requires parsing the function name, then loading the function into memory from a table, then parsing that statement (two more steps than raw processing). It is, however, necessary for triggers and rules.
Postgres (and I believe MySQL) has raw C extensions, but I really dislike these for general DBA operations (not to mention I haven't spent the time learning them).
As for the 16-table join, a general web page probably shouldn't do this (especially a heavily used page), but a database SHOULD at least let the DBA perform the ad-hoc query when trying to extract information. The only real solution is summary tables, which often tend to be more trouble than they're worth... Another solution is denormalization (which makes me feel dirty)... basically what Blacknova.net uses: if you have 1-to-N data, just pick some upper bound N and include that many extra columns directly in your main table. Fast and efficient, provided you choose the right N.
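A sketch of that last trick, with invented names:

  -- Normalized 1-to-N: one row per phone number, joined when needed.
  CREATE TABLE user_phones (user_id INT, phone VARCHAR(20));

  -- Denormalized with N = 3: no join, at the cost of a hard upper bound.
  CREATE TABLE users (
      user_id INT,
      phone1  VARCHAR(20),
      phone2  VARCHAR(20),
      phone3  VARCHAR(20)
  );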
-Michael
Re:Test Methods Alert (Score:2)
What *I* noticed is that in one graph the number of requests *per*second* increased for PostgreSQL. And this does not look right, so the original poster is quite right.
> The problem is that you don't know how to read a graph.
Compare the graphs on page 3 and page 4, then come back and explain to us how *you* read those graphs.
Interpreting profiling results is a very difficult task. In this case, I suspect it is impossible without getting one's hands on the test configuration to try various alternative loads. The results are (IMHO) too strange to be accepted without further investigation.
Cheers,
--fred
Re:Maybe I'm ignorant... (Score:4)
Anyway, the number of tables you have depends greatly on the problem you are trying to tackle. If you've used SourceForge, you'll have seen that there is a whole lot of data displayed on each page. I would be more worried if he had been able to get all of this from a single database query (could you imagine a table that would do that -- yuch).
Re:BDB tables? (Score:2)
If I'm running a query that is updating a number of financial records to place orders and process information, I don't want that process to bomb out half-finished; I want it either to complete or to fail to complete. I want the security of knowing that there isn't random orphaned data hiding within the system.
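That is precisely what a transaction guarantees; a minimal sketch (table names invented):

  BEGIN;
  INSERT INTO orders (account_id, amount) VALUES (1, 100);
  UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
  -- If anything fails before this point, ROLLBACK undoes both statements;
  -- otherwise COMMIT makes them visible atomically.
  COMMIT;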
MySQL is a great, quick, small-end database for web applications, and I really hope they find a way to optimize their transaction work, because I really liked working with it. Until then, we will continue in our transaction-dependent world.
Re:Why the partisan fuss? (Score:2)
The online PostgreSQL book is ~490 pages and the latest online MySQL manual I have is ~460, which is comparable. But MySQL also has a 700+ page "New Riders" book and a 460+ page O'Reilly book. Other than the online text, I haven't found any other PostgreSQL books, which is unfortunate.
Python, in particular, has PostgreSQL bindings but they don't yet conform to the 2.0 DB spec, which is a pain. In general, the PostgreSQL modules seem older and less maintained than the MySQL modules, which is both a cause and a symptom of support problems.
Again, I see nothing technically wrong with PostgreSQL. But if it's going to get more popular, I still believe it needs more support in terms of documentation and languages.
Windows version? && PostgreSQL vs. MSSQL (Score:2)
Thanks in advance!
Banner ads! (Score:2)
Re:Why the partisan fuss? (Score:2)
This is absolutely not true. There are three Python DB API 2.0 compliant PostgreSQL drivers: PyGreSQL has just added 2.0 compliance, PoPy is 2.0 compliant, and there's another one that also purports to be 2.0 compliant.
Apart from Momjian's excellent online manual, there is still the existing PostgreSQL documentation. Everything from basic queries, to creating your own types, to hacking out extensions in C is covered by this documentation. MySQL may have more books about it, but most of the information in those books will be duplicated.
The fact of the matter is that PostgreSQL has more features than MySQL, is more standards-compliant (SQL-92), is released under a more liberal license (BSD-style versus GPL), and is now even starting to outperform MySQL for simple selects and joins.
The reasons for staying within MySQL's limited abilities are getting smaller and smaller. Why not just use PostgreSQL in the first place? Eventually every successful database project is going to want the features that PostgreSQL has; why start with MySQL and face the prospect of migrating later?
Threads vs processes (Score:4)
One way to think of threads is as a solution to bloated processes. In operating systems like Solaris, which are designed to scale to 32 or 64 (or more) processors, the number of locks in the kernel is enormous. As a result, context switches and process creation time tend to be higher. In gross terms, you pay for the concurrency at high numbers of processors with poor performance at lower numbers of processors. This is not necessarily bad (especially if you plan to use 16 or more processors at some point) but it is just the way things are.
Linux, on the other hand, penalizes scalability for large numbers of processors in order to get much better performance with smaller numbers of processors. Linux does this because Linus doesn't believe that 16+ processor machines are common or sensical and that the kernel should be optimized for common (and sensical) cases.
Why do I mention all of this? Because Linux process creation times are slightly faster than Solaris thread creation times. Anyone who fetishizes multi-threading versus multi-processing doesn't really understand the difference between the two and when it really matters.
Processes are contexts of execution. Threads are contexts of execution that may share a memory space with other contexts of execution. These are fundamentally different things in some operating systems. In Linux, a thread is just a process that shares a memory space with another process. This is because processes are *fast* to create and switch to in Linux (optimized for the common case, remember?).
Anyway, I don't mean to bash maraist at all, just to point out a common set of misconceptions about these things. When considering these architectures, each database system must try to optimize across all of the operating systems they plan to be deployed on (just like the apache project, which i think was an excellent analogy).
real-world choosings.... (Score:4)
My manager liked the idea, but wanted info on other open source databases... I pointed him towards Tim's article (this was last week) and we've been on our merry way since. We now have both MySQL and PostgreSQL boxes in production for testing, and will make our determination later on!
Tim's article played a HUGE role in forming our initial impression, and I would recommend it to any and all who are looking for info on open source databases!
-C
So much for MySQL (Score:2)
Re:Why even bother? (Score:3)
Your comments remind me of similar ones a few years back, when we started using Linux and people said "serious companies don't use Linux." Nobody (except a small-minded minority of change-haters) is saying that now.
Test Methods Alert (Score:5)
The only explanation here is the caching behavior of the database. However, this also indicates that benchmarking databases is not a trivial task, because exactly such effects must be accounted for in database benchmarks.
Further, beyond a few graphs, the test says nothing more than: PostgreSQL improved strongly, while MySQL is a dog. Actually, it says nothing at all.
Not that I mind the results. I have worked with both databases briefly and believe that PostgreSQL is far closer to an industrial-strength database (lack of transaction control disqualifies MySQL for that in the first place). Nevertheless, I think the results lack any real significance.
Re:Too many for your simple mind? (Score:2)
Not so. You say "you are uninformed and hasty in your conclusions" and then you say stuff like I quoted above. I have extensive experience in large-scale web site back-end design and have been a lead engineer on several sites with very heavy loads (such as large dot-coms and large e-commerce sites for more traditional "chimney" companies). I do know what I'm talking about here.
I know quite well that a site might need to pull data from a large number of data sources to build a page (especially on personalizable portal-type front pages). This doesn't mean you have to do a large number of database queries, however. Caching is one crucial strategy if you're going for scalability and speed. Planning the site and page layout so that you don't need so much data for one page is also crucial. If there is no clear, simple way to cache the data you need to produce something more complex, then cache the result (yes, the HTML!) and only update the HTML as needed, based on some flag; read the flag together with some other data that you must get from the database anyway. This is what I'm talking about.
I'm honestly majorly surprised that the performance test mentioned that the site used a page requiring 16 queries. When I saw the performance results, I wasn't so surprised they were so poor. I'd like to see the source code for that page...
Additional questions: (Score:4)
- Which of them is most likely to be able to recover without a glitch from an "impure" shutdown?
- How much disk space can they waste in tablespaces under frequent updates and deletes?
- Which will benefit more from a HUGE RAM quantity or multiple CPUs?
--
Re:16 queries on one page? (Score:2)
I don't agree with that statement. In cases where you need data from 16 different queries, no doubt many of those results could have been cached for a huge speedup. I've written back-end code for several major sites, and I've seen very complex functionality indeed where only a few queries were needed per page -- most data was pulled from caches. Also, OO helps you write code that fetches data lazily. For instance, in a discussion forum you might keep all root-level message headers in a cache, the 1,000 most recently accessed bodies in a cache, the 10,000 most recently accessed headers in a cache, and parent-child relationships in a cache. That way, you don't have to go to the database 16 times every time users come to the front page of the discussion forum and are served the exact same page -- 30 times per second.
Like you said, computer time is now significantly cheaper than programmer time. Well, memory is also cheap. Keep stuff in memory and use good old abstract data types to get at your data fast. RDBMSes are very overused these days.
Re:16 queries on one page? (Score:2)
Re:16 queries on one page? (Score:2)
Commentary like this disgusts me, because the writer makes it clear he doesn't know anything about Postgres but merely presumes it's a poor implementation.
For the record, PL/pgSQL queries are pre-compiled, which is why there's no dynamic SQL capability in the language. You can create dynamic SQL from one of the other languages (pltcl, plperl) which exposes the SPI interface to the query engine to the user.
Work is under way to allow optional caching of individual query plans (i.e. by explicit statements), which allows one to avoid parsing and optimization overhead. When completed (7.2?) this will be great in the web environment.
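For example, a trivial PL/pgSQL function in the 7.x-era quoted-body syntax (names invented):

  CREATE FUNCTION add_one(integer) RETURNS integer AS '
  BEGIN
      -- The body is compiled to an internal form and its plans reused
      -- within a session; no dynamic SQL is possible in this language.
      RETURN $1 + 1;
  END;
  ' LANGUAGE 'plpgsql';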
I know little about Sybase, so I will avoid speculation about that RDBMS; might you be so kind as to avoid speculating about Postgres as well?
Re:16 queries on one page? (Score:3)
First, cache all the smaller tables, and do the translation (from, say, user_id to user_name) in an external hash (I know Perl can easily do this; I don't know about PHP). For static fields such as user names, email addresses, etc., this is easy. Unfortunately, this doesn't help when trying to find all users with email addresses going to AOL and source code that's overdue, etc. But it does solve the summary-report problem.
Second, break the query up into multiple operations. You need an incredible amount of memory to do certain types of joins, and sometimes the SQL processor guesses the wrong method. Even though Postgres's "explain" command told me it was doing hash lookups for a user-name summary, I found that it was faster to first get all the data with user IDs, then perform a second search with
"select user_name from users where user_id in (x,y,z,....)"
So long as you don't have more than a couple dozen entries, this works nicely, ESPECIALLY if you have thousands of rows of real data with only one or two unique user names. This was a nice short-term fix, but "clean" SQL should all be done in a single statement. So, on to the third:
Denormalize back to fourth or third normal form. Namely, whenever you have a user name, don't use a user ID. Sure, you consume 8 more bytes per row, but it greatly speeds up searches. The method is to ONLY use IDs when you cannot uniquely represent a primary key with a single column (such as first-name, last-name, ...). But in our computer world, "user-name" tends to be unique, "project-name" should be unique, "working-branch" should be unique; in those cases, at the design level, it's worth the extra space. Now, I know MySQL advocates (including me a year ago) cringe at the idea of keying off variable-length strings. For me, it was the idea that you can compare integers much faster than strings. BUT if the string is an index, your sorting is going to be MUCH faster, and strings distribute much better than auto-incremented integers (which tend to be linear). Additionally, the fact that you don't have to perform a join at all removes almost all of the performance problems.
When you don't use metadata as primary keys, you find that you only perform joins when you actually have relational data; joins based on summary data almost always go away.
Additionally, if your DB ever gets corrupted (say you lose a table), you don't have to fret over "who the hell was user-id 155?". Given that MySQL doesn't have rollbacks, the potential for data corruption with metadata is enormous (when you do both reads and writes).
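A sketch of the natural-key idea (invented names):

  -- With a surrogate key, every summary joins back to users for the name:
  SELECT u.user_name, COUNT(*)
  FROM bugs b, users u
  WHERE b.user_id = u.user_id
  GROUP BY u.user_name;

  -- Keyed on the unique user name itself, the summary needs no join:
  SELECT user_name, COUNT(*) FROM bugs GROUP BY user_name;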
-Michael
Re:Why the partisan fuss? (Score:3)
Re:16 queries on one page? (Score:3)
As for the other languages -- Perl, Tcl, etc. -- those can only be less optimal forms, and are more for procedural operations, which has nothing to do with overcoming the 16-join problem (with the exception, maybe, of hashed lookups, but that's dangerous).
It's great to hear about cached query plans (I'd forgotten the term).
As for the problem with speculation: if nobody were to speculate, and those in the know didn't impart their knowledge, would anyone have gained anything? 'Qualified' speculation does exactly what it did here: it prompts those who know the specific facts to correct the errors. My speculation _was_ based on empirical analysis, however, so it wasn't as if I was blindly stating something.
But now, as a question: is the compiled SQL stored in any more optimal fashion? Or does Postgres use an interpreter (which does little more than tokenize)? To me, a compiled SQL statement means that it's making raw calls to table operations...
Simple benchmarks that I've done haven't shown any performance increase from using stored procedures, but that might just be because the compilation/interpretation stages were insignificant relative to the overall operation. The other interpretation is that little is actually gained by compiling the SQL statement (at least until cached query plans come around).
-Michael
Re:linux in manufacturing (Score:2)
What about other Database Tools? (Score:4)
In the open-source world you've got... Well, you've got to custom-code in some non-database-centric language, e.g., C, C++, Java, Perl, etc.
Anyone want to start up an open-source alternative for an ETL tool? Target open-source databases as source and target systems. A usable open-source system would really raise the visibility of open source in the (hitherto) commercial-only data warehousing market.
16 queries on one page? (Score:3)
open source MOBILE database (Score:3)
Centura Software makes open-source mobile database software called db.star and db.linux.
db.star [centurasoftware.com] is the first open source, embeddable data management solution for Information Appliances (IAs). "It is a high performance, small footprint, developer friendly database engine which facilitates the development of rich and powerful applications for popular IA platforms, such as the Palm® Computing Platform and Windows®-powered Pocket PC. "
Open source but not free.
db.linux [centurasoftware.com] is OSS Data Management for Information Appliances
Open source and royalty free.
Article summary: (Score:3)
But, as always with database server choices, nothing replaces a real-world test. You should test the various servers on *your* data.
Frankly, I found the numbers quoted in the article ridiculously low. But as we don't know the volume of the data, the hardware used, the database cache size, etc., etc., this boils down to 'someone found PostgreSQL better than MySQL for his usage'.
It is not uncommon to see order-of-magnitude performance differences between databases. I would love to know what a tuned Oracle would get.
Cheers,
--fred
Yup... (Score:2)
Re:Additional questions: (Score:2)
First you say that programs do not need to be specifically written to support multiprocessing and then you say that using multiple threads is the path of least resistance [to support multiprocessing.]
Actually, applications can run on SMP machines with no problem, but if they use only ONE process or thread (about the same thing on Linux, but very different things on Solaris, as a previous poster mentioned), then they will not be running on more than one processor, and thus not taking advantage of multiprocessing. Yes, there will be SOME benefit when running other processes, but the application taken by itself (not counting other processes) will have no added advantage other than the fact that the processor it is running on will have more cycles free.
Re:real-world choosings.... (Score:2)
I could not use PostgreSQL because it does not run on NT (the client has only NT servers and does not want a Linux box for this system). I don't know much about PG, so I don't know how good it is.
I did not use MySQL even though I have run it on Linux and NT for these reasons:
Interbase may not have industrial strength for massive numbers of concurrent users, but this particular implementation is a low-volume data-entry system. I cannot personally attest to the viability of Interbase with a large number of users.
Concerning the 'openness' of Interbase: the source is released, there are no restrictions, and this database is not going anywhere soon. The main problem is that Inprise's handling of the source release left a bad taste in the mouths of free software advocates. These issues have been remedied, and they have delivered on their promise to release the source.
Check Interbase out, you'll be surprised.
"Fat, drunk, and stupid is no way to go through life."
Re:Your Mileage May Vary (Score:2)
Question is moot, anyway. We'll probably just stuff everything into a flat file and forget about "online response" (:-(.
--
Re:MySQL wins hands down. (Score:2)
Insertion of large text objects (Score:2)
Re:real-world choosings.... (Score:3)
I'm a bit confused about the whole deal. Interbase is now open source, which is nice, but is it free beer? The website (here [interbase.com]) says it is freely downloadable and open source, but is it free as in beer?
Interbase is a nice database for small to mid range deployments. Fast, small, and rock solid. But nobody is getting behind it - the Interbase section on sourceforge [sourceforge.net] has no files released yet.
Any other
Strong data typing is for those with weak minds.
Why the partisan fuss? (Score:4)
I'm not saying the standard "MySQL's not a database because it doesn't support transactions and databases have to. ACID!" (although I'm sympathetic to that point of view, I don't think individual words like "database" are worth fighting over -- if MySQL wants to call itself a database, fine). What I am saying is that Postgres was designed to be a full-fledged SQL92-compliant database with transactions and triggers and foreign keys and the whole lot. MySQL was designed to be a SQL-based front-end to a very fast database file format. These products are not the same, and comparing them without acknowledging that seems foolish.
I've been very pleased by the speed improvements in Postgres recently (partisan testing aside, Postgres 7 really is much faster). I've also been impressed by feature additions in MySQL (although it still isn't close to what you would expect coming from the Oracle or DB2 world). But both remain inherently tied to their design goals. This isn't a bad thing at all, because different projects need different products with different design goals.
I compare this to the (often senseless) comparisons of NetBSD, OpenBSD and FreeBSD. One is designed for portability, another for security (features be damned!) and the last for performance, features and multi-processing on (primarily) Intel. Although they come from a common code base, they have obviously diverged in design goals. Instead of bashing one product or another (all of the *BSDs and the two databases discussed have *major* problems that are bashable), use the one that seems most appropriate to your needs.
Re:MySQL wins hands down. (Score:4)
Well, that depends on what you mean by "win." Open source is not a popularity contest. In the past, it may have been true that to ensure the continued existence of a cool project, it needed to have a chance of dominating the market at large. But nowadays, as long as there are enough developers interested in a project--oftentimes only because of its "technical superiority"--it can stay alive, grow and improve. If a project can attain real technical superiority in this way, I call this winning.
That being said, I think it should be remembered that for many simple web sites, MySQL was the right choice because of its superior performance in a read-oriented setup. Of course, PostgreSQL people claim a read-oriented setup isn't a real database application at all, but having designed several such small sites, I really wouldn't use anything else. (Flat files?? What an unnecessary pain!)
Just my two cents . . .