Open Source Databases Revisited (160 comments)

pusakat writes "If you've been following performance comparisons of the different Open Source databases, Tim Perdue revisits PostgreSQL v. MySQL with production data from SourceForge and comes up with interesting results. This may be fodder for yet another 'my database is better than your database' exchange from both camps but the results are interesting anyway."
  • ppl said "serious companies don't use linux"

    Indeed, I'm working on a crisis for a customer who runs a large web site with three Linux/Apache/PHP app. servers hitting one database server (via ODBC) running our DBMS.

    Guess which component is failing?

  • Oh man, yet another case of Slashdot putting up a comparison that can't possibly be made, but it will generate a lot of traffic and bring down servers to make it look like this is the happening place when it's just a geek version of Jerry Springer. This survey has been an old story on the Oracle newsgroups for a week and it's not resolvable, just "We're better than you" followed by "no, we're better, biatch."

    If you use these systems, you know what happens: different products work better under different circumstances, nothing new. You are starting up a website, have no money but lots of know-how, and want to get something up and running until you get loaded with cash (okay, dream). Right, you download MySQL for nothing and get up and running. You are getting a huge number of hits from tons of users (row locks become a little bit of an issue, right?) and you have some money, so you get set up with Oracle. Which is better?

    I hit a website (a newcomer site, not up and going for very long) Friday night to place an order, and for the second weekend in a row (okay, it was a food delivery place - I have no life and order in, what's the surprise?) I was returned a MySQL error. But you know what? A week ago I got multiple ORA-00020 and ORA-00018 errors trying to access my work late at night (and guess whose product that is). Lots of factors come in, not the least of which is the talent running the whole show.

    It seems to me like this is a big-picture issue, and isn't it good for us all that there are free alternatives that get us through testing and maybe even up and running? But realistically, how far can something free take us? Then if our idea works, and the users come, we pony up the dough for something else. That seems to work. So what does everyone think?
  • Either I'm in class, at home studying, restoring my truck,
    designing someone's page, setting up a router,
    or fixing some lemming's (like yourself) machine because
    "The browser crashes when they get online" bullshit.

    My machine stays online all day. I don't bother hanging up the modem.

    You are a complete bastard child, I would assume you are no
    older than 13, living with your mommie. She fucked
    some bum, and you have no idea who the hell your father is.

    You know what? I'm gonna go dig up your grandmother's grave
    and let SIGNAL 11 have his way with her. In fact, I'll make sure
    he digs up your grandfather too.


    ETRN x
  • Sorry kat, I have more brains in my pinky than you do in
    your whole body.


    ETRN x
  • You can compile postgres yourself and the compiler will optimize for your processor.

    "Fat, drunk, and stupid is no way to go through life."
  • Please note that this benchmark (and any benchmark, for that matter) applies only to the system and application that it's testing.

    Quite. Let's see how well these "open source" databases do in TPC-C [tpc.org] and -H [tpc.org].

  • Here's the specifications for Interbase: http://www.interbase.com/open/research/tech_specs.html [interbase.com]

    Maximum size of database: 32TB using multiple files; largest recorded InterBase database in production is over 200GB
    Maximum size of one file: 4GB on most platforms; 2GB on some platforms
    Maximum number of tables: 64K tables
    Maximum size of one table: 32TB
    Maximum number of rows per table: 4G rows
    Maximum row size: 64KB
    Maximum number of columns per table: Depends on the datatypes you use. (Example: 16,384 INTEGER (4 byte) values per row.)
    Maximum number of indexes per table: 64K indexes
    Maximum number of indexes per database: 4G indexes

    "Fat, drunk, and stupid is no way to go through life."
  • I remember reading on the topic of excessive joins in a Sybase Admin book once (excellent book on "practical" databases, since it shows you exactly how one company solved all their DB problems).
    And what book was this? (I really want to read it).

    On another subject, did you drop all the users from blacknova.net? My user (tolket) is gone....

  • Is that the official release or is it still in beta? I heard they planned to do it, but the versions I worked with didn't support it.
  • I copied the following message from http://www.phpbuilder.com/annotate/message.php3?id=1001760:
    The open-source Interbase is alive and well at: http://firebird.sourceforge.net
    http://www.ibphoenix.com
    http://www.interbase2000.com
    And in newslists at http://www.mers.com
    Generally it's called Firebird, and Inprise doesn't want to know or link to us.
  • On another subject, did you drop all the users from blacknova.net? My user (tolket) is gone....


    Laugh, I'm not in charge of that, though I'm one of the OpenSource developers.. Yes, the DB was completely flushed... You need to create a new character.. Read the forums.

    -Michael
  • Actually, this probably means that the database was very well-designed.

    Your comment minimizes the performance issues. You can create a beautiful, 3rd normal form database and find that it is slow as hell when you try to read or write to the schema.

    Most real-world databases require a whole lot more [tables].

    For a web site, pages need to return in 1-2 seconds, but if your 16 table join takes 1-2 seconds, you're in trouble because your response time now depends entirely on the network conditions.

    Theoretical ideals of relational theory applied to databases are fun to talk about and design, but the real world dictates some compromises. Often, denormalization is the first compromise. This reduces the number of joins required to read the database. It also speeds up writes because you have to issue fewer SQL statements. (A small sketch of this follows at the end of this comment.)

    Later.

    "Fat, drunk, and stupid is no way to go through life."

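    A minimal sketch of that trade-off, using hypothetical tables (none of this is from the article): copy a frequently joined value into the main table so the common read needs no join.

        -- Normalized: reading a story means joining to its author row.
        SELECT s.title, a.author_name
          FROM stories s, authors a
         WHERE s.story_id = 42
           AND s.author_id = a.author_id;

        -- Denormalized: author_name is copied into stories, so the same read is a
        -- single-table lookup (at the cost of keeping the copy in sync).
        SELECT title, author_name
          FROM stories
         WHERE story_id = 42;
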
  • I've evaluated several free (as in I can make money with it without paying anyone a dime) DB servers for a small startup company, and Interbase so far looks by far the best. Not only are the features attractive, it also has more name recognition than MySQL or Postgres, and for some people that's important. We're using it particularly from Delphi apps, and support for that is great. It's amazing that the entire server can be configured to take less space than the BDE itself.

    Anyway, I'm not saying that Interbase is an Oracle or MS SQL killer. But for a lot of clients and a lot of applications it's a suitable choice, and the client saves a stack of money to boot. Add to that the fact that you can deploy it on Linux, and for cost conscious customers it's essentially a no-cost solution. Just buy the box, the rest is free.
  • OK, the pissing contest is on. Yet I'll try to keep this civilized:

    Your database software failed. I lost all my data, but I can at least sue you and get some money. My business might be ruined, but hey, I can cash out from winning a lawsuit against you. Of course I have to pay the lawyers and might have no income while the suit goes on, but who cares.

    Trying to be cynical? Probably. But your logic fails utterly: in a mission-critical environment you sure as hell want a database vendor who:

    is likely to provide professional, contractual support on various levels

    is not likely to go away next week

    can provide an annual report indicating the amount of $ invested in R&D

    is ultimately accountable.

    I'm aware that you can purchase professional support for PostgreSQL (which is likely better than what the big 5 have to offer), but frankly, being responsible for a project, I wouldn't bet my career on it.

    I never said that I necessarily agree with this mindset, but from a (project) management perspective it's understandable. If I'm investing $200,000,000 in a mission-critical system, chasing the possible savings from using open source products in core elements (database, middleware) is probably not an option.

    Now for the rest:

    Reliable database: How do you determine "reliable"? For millions, M$ Access is the idea of a reliable database. Tell you what: it's not! Further, disks will inevitably go bad. Even if you have redundant disk subsystems, a logical error or a pointer gone haywire may already have been written to disk.

    Good backups: We certainly don't have an argument here. However, if you can't afford downtime, or can't afford even five minutes of lost data, this might not be enough.

    Security: Of course, but it's incredibly difficult to provide good security, especially in a multi-tiered environment. Granted, that has nothing to do with whether a database is open source or not.

  • Well, the effect of caching query plans, or pre-parsing or pre-compilation will depend heavily on the query. For a query doing many joins that return little data using tables with appropriate indices, optimization time most likely will be an expensive component of the total time of query execution.

    On the other hand, a quick query on a single table might spend a fairly high percentage of the execution time in the parser, because the optimizer has few options to explore and the resulting execution is likely to be fast.

    Preliminary tests by the person who put together the current test implementation of the query cache code report about a 3x improvement in the execution of simple queries.

    Here's a reason why your test of precompiled functions might not help as much as you'd expected: though the function itself is precompiled, the query you use to reference the function is not. The time required to parse and optimize this query might be enough to overcome the fact that the function itself is pre-compiled.

    Also, you talk of "stored procedures". I don't use PG's stored SQL functions, only PL/pgSQL. Stored SQL functions might not be pre-compiled, I can't say one way or another.

    I'm not sure exactly what you mean by "raw calls to table operations". I do know, though, that a PG plan consists of a tree of operations like "merge join", "indexed scan", etc. These are "table operations" and they're "raw", but they're complex table operations.
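
    Following up with a minimal sketch (hypothetical table and function names, 7.0-era single-quoted function body): the PL/pgSQL body gets prepared, but the query that calls it is still parsed and planned on every execution, which is the overhead described above.

        -- A trivial PL/pgSQL function; its body is prepared when first invoked.
        -- (Assumes the plpgsql language has been installed in the database.)
        CREATE FUNCTION open_bug_count(int4) RETURNS int4 AS '
        DECLARE
            cnt int4;
        BEGIN
            SELECT count(*) INTO cnt FROM bugs
             WHERE project_id = $1 AND status = ''open'';
            RETURN cnt;
        END;
        ' LANGUAGE 'plpgsql';

        -- The statement that invokes it still goes through the parser and
        -- optimizer each time it is issued.
        SELECT open_bug_count(42);
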
  • Sorry, Mr. AC, but Oracle 8i definitely, and without a doubt, is Object Oriented. Read up, bucko.

  • This way, Slashdot can within hours be infected by thousands of lengthy fps (although obviously not all of them are an fp in the original meaning of the word), none of which could easily be dismissed as a troll without first reading the entire crappy post.

    like your post couldn't be dismissed as a troll from the title alone...

  • by SuperQ ( 431 ) on Monday November 13, 2000 @04:44AM (#627816) Homepage
    Linux is great for manufacturing.. I work for a mid-sized auto remanufacturer, and we use PostgreSQL all over the place, to do inventory and core check-in. Postgres has made huge strides in the last few years.. we started out with an old Pentium running 6.3 postgres.. it was slow as dirt.. we are currently running 6.5.3 on a 600MHz Alpha system.. and will probably move to 7 one of these days.

    you have a budget? what's that? *smirk* I'm lucky if I get $50 for a stack of CD-Rs.
  • There used to be an 8K limit on row-sizes.

    The author of the article says it's going away next release.

  • by Trinition ( 114758 ) on Monday November 13, 2000 @04:51AM (#627818) Homepage
    Don't forget row-level locking. Last I heard, MySQL only supported table-level locking. Someone was supposed to be adding it to their own flavor of MySQL, but not the main package.

    I think the lack of tools, row-level locking, transaction support, etc. in MySQL is a shame. Yet people talk like it's the best thing since sliced bread. This has led me to discover two kinds of MySQL users:

    1. The first-time DB user who jumped into MySQL because its free. They sing the praises because they like OpenSource, the popularity of MySQL and they don't know any better.
    2. The experienced DB user who knows of MySQL's shortcomings and has determined that those shortcomings are not significant compared to the cost of a DB that does have those features -- at least for their purposes.
  • I noticed, though, that unlike MySQL it would not accept insertion of large text object data within an SQL INSERT command.

    Postgres supports Large Objects but I've never had to use them. In fact I'm not even sure how to use them. :-)

  • Of course, any serious, enterprise-class projects use Oracle or DB2. I don't think that that's even a question.

    You're forgetting Informix which runs many enterprise-class installations, 24/7, OLTP, Tbytes of data, all that rot.

  • by Anonymous Coward
    Did anybody else notice that PostgreSQL actually improved performance with more concurrent users ?

    I didn't notice that. In fact, I noticed the opposite (which is exactly what was expected.)

    What I noticed was that as you increased the number of clients, the total number of requests served increased, while the number of requests per client was lower - which is exactly what should happen - the performance decreases as the number of clients increases.

    The problem is that you don't know how to read a graph.
  • Try looking here [postgresql.org]. Version 7.1 should allow you to insert large (>8kB) amounts of text without having to use the large object interface (i.e. a simple INSERT/SELECT/UPDATE will do). If you're planning on implementing large object storage under PHP, read here [php.net] about the pg_lo* functions, it has some useful examples.
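
    As a small sketch of the difference (hypothetical table; under 7.0 and earlier the ~8kB row limit forces the large object route instead):

        CREATE TABLE messages (
            msg_id int4,
            body   text    -- with 7.1's TOAST storage this column can exceed 8kB
        );

        -- From 7.1 on, a plain INSERT/SELECT works even for very large text values.
        INSERT INTO messages VALUES (1, '... tens of kilobytes of text here ...');
        SELECT body FROM messages WHERE msg_id = 1;
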
  • That's bullshit. First of all, just because you're doing open source stuff and have never even seen a real enterprise app, don't tell me that one terabyte is 'enormous'. It's fairly standard. I've worked on several projects that had databases this big.

    Secondly, Oracle IS object oriented. You can use either a relational or object-oriented structure in Oracle.

    Thirdly, I've never even heard of these DB's that you mention. Do they even support databases of this size?

  • It's got both stored procedures and triggers. Some people I know that moved from Interbase to MSSQL have liked both features better on Interbase. Of course, MSSQL comes with a lot of eye candy such as Enterprise Mangler. Also, a lot of people consider Interbase one of the most rock solid db servers out there. I'm running one setup at home using Interbase 6 on RedHat 6.2 on a 486DX2/66 with 32MB RAM. Performance is quite decent for 3-5 users. The limiting factor is the slow-ass drive in there (250MB IDE). More RAM and a faster drive would grease it up.
  • by Cramer ( 69040 ) on Monday November 13, 2000 @09:28AM (#627825) Homepage
    Not necessarily. Database design is one of a very few true black arts (read: serious voodoo.) The number of queries alone isn't a problem. The layout of the data, indexing, number of queries, complexity of queries, and the volume of returned data all factor in.

    Let me provide a real-world example. Some years ago, it was my task to roll out RADIUS for an ISP's dialup network. The stock setup for the USR RADIUS server -- it was the only one I could get to talk to a SecurID server correctly -- was, shall we say, non-optimal. USR's stats said one (1) authentication per second when using Postgres. Let's ignore the way they had Postgres set up (fsync enabled) for the moment. The reason everything was so bloody slow was the database schema. Even though there was only 1 query against the USERS table, it had to return 5k. Each row of the USERS table contained everything you could ever imagine (well, that USR could imagine.) Each row consumed almost 5k of space.

    After three months of testing, data collection, and analysis, I redesigned the mess. USR literally laughed at my design because it made between three and six queries depending on the user. They stopped laughing when I showed one of their engineers the setup actively handling over 100 authentications per second. The USERS table, holding 10,000 entries, was about 2M.
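
    Purely as an illustrative sketch (the actual USR schema and the redesign are not shown above, so every name here is hypothetical), the shape of the change was roughly this:

        -- Stock layout (roughly): one very wide row per user, ~5k returned per
        -- authentication, most of it unused.
        CREATE TABLE radius_users_wide (
            user_name varchar(32),
            password  varchar(32),
            framed_ip varchar(15),
            filter_id varchar(64)
            -- ...plus dozens more attribute columns, mostly NULL or defaulted
        );

        -- Redesign (hypothetical): a narrow credentials table plus per-user
        -- attribute rows, fetched with a few small queries instead of one 5k row.
        CREATE TABLE radius_users (user_name varchar(32) PRIMARY KEY, password varchar(32));
        CREATE TABLE radius_attrs (user_name varchar(32), attribute varchar(32), attr_value varchar(64));

        SELECT password FROM radius_users WHERE user_name = 'jdoe';
        SELECT attribute, attr_value FROM radius_attrs WHERE user_name = 'jdoe';
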
  • Uhhh...you can use Oracle 8i for DEVELOPMENT for free for as long as you want, or use an EVALUATION copy for 30 days.

    But ... when you deploy, you must pay. Read your licence, please. While this wouldn't bother me a lot, it might bother Larry's lawyers, and if your resulting site's visible I suspect they'll notice. Since you've posted your URL on Slashdot, the odds that they'll notice may be substantially greater than they were before you posted :)

    The resulting lawsuit might not be terribly pleasant.

    While you're free to do what you want, and while Oracle's not known for going after small fry and seems to be much less obnoxious than Microsoft in terms of license enforcement, folks should at least know that what you are proposing is a violation of your Oracle license.

    At the risk of being redundant - deployment or other non-development, non-evaluation use = $$$.
  • by bracher ( 33965 ) on Monday November 13, 2000 @09:44AM (#627827)
    3. MySQL's "text indexing" is useless. The evaluation function returns every record that contains any of the search terms; there is no way I've found to require all search terms. No documentation, of course.

    actually, in the mysql fulltext search docs [mysql.com] the mysql guys do a decent job of detailing the _two_ options for querying the fulltext index. putting the "match (a,b) against ('words')" in the "select" list returns the relevance rank against all rows in the table, including those with zero relevance. if you put the match clause as part of the "where" clause then mysql returns only those rows with non-zero relevance, ordered by the relevance score (but the relevance score is not one of the return columns).

    - mark
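
    For reference, a minimal sketch of the two forms being described (hypothetical table and columns; assumes a MyISAM table with a FULLTEXT index, as in MySQL 3.23):

        CREATE TABLE articles (
            id    int NOT NULL PRIMARY KEY,
            title varchar(200),
            body  text,
            FULLTEXT (title, body)
        ) TYPE=MyISAM;

        -- Form 1: MATCH in the select list returns every row with its relevance
        -- score, zero-relevance rows included.
        SELECT id, MATCH (title, body) AGAINST ('database locking') AS score
          FROM articles;

        -- Form 2: MATCH in the WHERE clause returns only rows with non-zero
        -- relevance, ordered by relevance (the score itself is not returned).
        SELECT id, title
          FROM articles
         WHERE MATCH (title, body) AGAINST ('database locking');
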

  • While InterBase is a fine choice (the OpenACS [openacs.org] project intends to port to InterBase if the politics surrounding it clear up, as well as continue to support PostgreSQL), let me clear up two misperceptions about PostgreSQL that you apparently hold:

    1. PostgreSQL does run under NT. It runs under cygwin, and I have no personal experience running it under NT, but it does run under NT (and, yes, I do mean the server, not just various clients).

    2. In PostgreSQL, writers never block readers, so your statement to the contrary is a factual error. InterBase and PostgreSQL use a very similar paradigm for storing the actual data (InterBase's "Multi-Generational" architecture is equivalent to PostgreSQL's "Multi-Version Concurrency Control", or "MVCC" for short).

    Again, I'm not disparaging your use of InterBase, a fine product that hopefully will survive the current political machinations that surround it. But I can't let your misconceptions about PostgreSQL pass without comment.
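
    A minimal two-session sketch of the "writers never block readers" point (hypothetical table; the two sessions are interleaved in the comments):

        -- Session A: update a row but do not commit yet.
        BEGIN;
        UPDATE projects SET status = 'closed' WHERE project_id = 7;

        -- Session B, at the same time: this SELECT does not block; under MVCC it
        -- simply sees the last committed version of the row.
        SELECT status FROM projects WHERE project_id = 7;

        -- Session A:
        COMMIT;    -- new transactions now see status = 'closed'
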

  • But "nonzero relevance" doesn't mean "contains all terms"! Go back to the docs and take a careful look at the examples they give.

    The only use I've found for the full text search is in ordering items found by other means. Marginal utility, at best.

    It would also be nice if they gave a little formula somewhere for the "relevance". What's it really measuring?

    --
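
    One hedged workaround sketch (hypothetical table and columns, reusing the fulltext example above): combine the MATCH with explicit LIKE tests. It is not fast, but it does force every term to be present.

        SELECT id, title
          FROM articles
         WHERE MATCH (title, body) AGAINST ('database locking')
           AND (title LIKE '%database%' OR body LIKE '%database%')
           AND (title LIKE '%locking%'  OR body LIKE '%locking%');
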
  • Does anyone know when the PostgreSQL 7.1 beta is supposed to be released?
  • PostgreSQL does run under NT.

    I had not considered the cygwin option and I concluded that PG does not run on NT natively when I went to download a binary distribution. Have you tried it under cygwin? Is there a performance hit because PG would run in the cygwin emulation, rather than native?

    I was unaware of the MVCC of PG because the NT part was the first issue, so I did not explore use of PG after I didn't find a binary NT version.

    Thanks for the corrections!

    "Fat, drunk, and stupid is no way to go through life."

  • Well, I just recently attended the Boston.pm group meeting which this time around was hosted by NuSphere. These are the folks that are doing MySQL modifications for their release of MySQL/Perl/Apache and PHP.

    The big news was that a re-worked version of the *Progress* database back-end would be turned into an open-source MySQL-compatible back-end called Genie. This would be pretty huge, as Progress is everything that people complain that MySQL isn't. Also, MySQL is adding (either in their next release or the one after that, depending on the feature):

    * Subqueries
    * A boolean type
    * Foreign key constraints

  • by djweis ( 4792 )
    It's good to see that Postgres has shown you can include real RDBMS features and still be faster than MySQL.
  • Was the author using BDB tables? If so, that's probably the problem. MySQL allows you to use BDB tables, which support transactions, but they are much slower. It is also possible that transactions, which are somewhat new to MySQL, are not yet well optimized. Either optimize your database to make transactions unnecessary, or use postgresql, at least for now. Why MySQL made the mistake of supporting transactions, I will never understand.
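
    For context, a rough sketch of what the BDB table type looks like in the 3.23 betas (hypothetical table; the exact transaction statements vary by 3.23.x release, and plain MyISAM tables ignore them entirely):

        -- A BDB table supports transactions; the default MyISAM type does not.
        CREATE TABLE orders (
            order_id int NOT NULL PRIMARY KEY,
            user_id  int,
            total    decimal(10,2)
        ) TYPE=BDB;

        BEGIN;
        INSERT INTO orders VALUES (1001, 42, 19.95);
        UPDATE orders SET total = 24.95 WHERE order_id = 1001;
        COMMIT;    -- or ROLLBACK; to undo both statements together
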
  • Which of them is most stable (with HUGE databases) ?

    What does HUGE mean? Gigabytes? Terabytes? Exebytes? Also, size is not the only metric involved in stability. Stability with a given number of records or fields, for instance. A 100-gigabyte database could have 10 records or a million records; it depends on how big each record is. Another metric is the number of simultaneous requests it can handle. One might work fine if it gets 100 requests at a time, but choke if it gets 1,000 requests at a time. A million-record database might be accessed by only 2 clients at a time... size doesn't equal number of users.

    Which of them is most likely to be able to recover without a glitch from an "impure" shutdown?

    Good point, but they're probably about the same in that regard (just taking an educated guess here).

    Which will benefit more from a HUGE ram quantity or multiple cpus ?

    Most well-written software will take advantage of as much RAM as it can get. The question would be which program is better at managing its own memory?

    As far as multiple CPUs go, programs have to be written specifically to support multiprocessing. On Linux or Windows NT, for the most part, this means writing the code to use multiple threads. I'm not sure whether either of them does, but that would be a very good point.
  • What are you talking about? The linked article has a date of 20001112. At the top of this (new) article, the guy refers to a previous article he wrote back in July...
  • but how does SourceForge function when it has pages with 16 queries, crashes with 30 simultaneous clients, and serves .77 pages/s with only 5 clients? I'm no web/database guru but that kind of performance seems crazy. I guess those projects can't be all that busy or the site would be locked-up all the time? or what?? Judging by the performance of this column at phpbuilder.com maybe that site has some performance issues as well. I guess my question is, should I be using these sites to learn PHP/Database programming? Do these performance problems show actual problems with the database server or with the database schema and program designer(s)?

    As I said, maybe I'm ignorant and missing something as I am new to this type of development...
  • The link to the supposed IBM study returns an error. Which is too bad, because I'd love to critique the study - Tim Perdue's benchmarking work isn't the first to show that MySQL's table locking paradigm breaks down under high load. Remember that he's been studying alternatives because SourceForge, using MySQL, has had problems in this area. And remember that Slashdot has helped fund the integration of the Berkeley DB backend and MySQL for the same reason.

    It's not all myth, folks. Table locking sucks, page-level locking (Sybase) is much better and row-level locking (Oracle, Interbase, Postgres) much better still in high concurrency systems.
  • by ichimunki ( 194887 ) on Monday November 13, 2000 @05:20AM (#627839)
    I don't know about MySQL, but from what I can tell psql for PostgreSQL is fairly full-featured in terms of wrapping SQL in a functional language. Comparing the minimal work I've done in GUI DB tools to the experience I've had working on the command line or in Perl with pgsql, I'd take the non-GUI process any day: it's much more direct and flexible, and doesn't suffer from the same tool-specific learning curve (i.e. solid Perl + solid SQL = extremely portable skill, whereas any of the listed commercial tools may or may not be portable -- this is important both for ramp-up time on new experiences/jobs/whatever, and for personal development or career building). As for the tools you mention, do any of these function via ODBC, and therefore have the capability to connect to any database with drivers? Not to say that you don't have a valid point, I'm just wondering if there isn't a good reason why the Free Software types haven't embraced GUI and ungeek-friendly tools...
  • I wonder what www.phpbuilder.com uses, Postgres or MySQL, because right now it's slow as molasses.
  • A while ago, interbase [interbase.com] was brought to my attention as a commercial-grade, free, "OS" database. Their licence, the IPL [interbase.com], is pretty non-restrictive, but I don't know how compatible it is with the GNU GPL. Any opinions on this?

    I haven't had time to evaluate it myself, either. Anybody out there used it?

  • by bugger ( 101595 ) on Monday November 13, 2000 @05:23AM (#627842)
    Interbase is free, it is not restricted by the GPL.

    Interbase has been released under a variant of the MPL that "protects" the Interbase name (IOW, only Interbase may release certified Interbase builds). Anyone may grab the code and use it whichever way they like.
  • This gets right at the heart of why good Oracle DBAs often pull salaries in excess of $150K. At the end of the day, speed and reliability count for a lot more than other peoples' preconceptions.
  • We were using Informix at my workplace for a while; in less than a month the thing completely "melted down" and dropped all of our data. The suits were in a panic.

    We switched over to Postgres in one swift maneuver, and it's been running great ever since. Speed was never in question; we just needed a database which wouldn't drop all its data one night while no one was looking. :)

    Besides which, the ultra-cool pg_dump command makes backups a no-brainer.

    Kudos to the Postgres team. And my company won't be dumb enough to try to use proprietary software again in the future, that's for sure...
  • I stand corrected. That's one of the things I didn't like about it when checking things out earlier and must have missed the subsequent announcement. Thanks.

    There's still the question of scalability. How does it fare in that category?

  • I've been using Pg since the 6.4 series of builds, and about a month and a half ago switched to 7.0.2. It is worth the upgrade IMHO, but of course since you're relying on it for critical business data you'll probably want to test it some first... ;-)

    Oh, and if you don't have a budget how come you have a 600mhz alpha? 600 amd mhz is pretty cheap, but my impression is that 600 Alpha mhz are very much not... :-)


    --

  • That may be correct. I don't remember off-hand which product it sprang from but they've taken vastly divergent paths since then.
  • Well, the printable version [phpbuilder.com] is one page, read it.

    -- .sig --
  • <rant mode="zealot">That license is the most insanely tortured piece of legalese I've seen in a long time. How is it useful to call something Open Source when, after reading the license, I have almost no clue what I'm allowed to do with the software? Is this not proof that Open Source is a phrase that is too vague to be useful, since so many seem to feel the need to write these kind of "Open Source" licenses that confuse the issue of user freedom to a point beyond which it becomes almost unrecognizable?</rant>
  • by RocketJeff ( 46275 ) on Monday November 13, 2000 @05:31AM (#627851) Homepage
    db.linux is OSS Data Management for Information Appliances
    Open source and royalty free.
    Except that this is only true when using it on an 'Open Source Operating System.' This means that your project can't use db.linux if you ever want it ported to Solaris, AIX, or Windows - even if your software is Free/Open.

    I'd love to use db.linux on a couple of projects but this limitation is a killer. The complete license is at License Terms [dbstar.org].

  • by StormyMonday ( 163372 ) on Monday November 13, 2000 @05:32AM (#627852) Homepage
    Please note that this benchmark (and any benchmark, for that matter) applies only to the system and application that it's testing. Database applications vary so widely that it's very difficult to get any meaningful numbers outside of very specific areas.

    As an example, I recently ran some tests that came up with the exact opposite results. My application is a large message tracking database. Query speed is secondary; insertion speed is critical. MySQL handles my test data set in 20 minutes; PostgreSQL takes over 3 hours (!). For this application, PostgreSQL is Right Straight Out.

    Other notes:

    1. PostgreSQL's tables took up roughly twice the space of MySQL's.

    2. MySQL's lack of transactions is a real pain. I can, however, work around it (in this particular application, at least).

    3. MySQL's "text indexing" is useless. The evaluation function returns every record that contains any of the search terms; there is no way I've found to require all search terms. No documentation, of course.

    4. The latest beta of MySQL can use Berkeley DB [sleepycat.com] tables to get real transaction handling. Unfortunately, this is even slower than PostgreSQL.

    Obvious conclusion: Run your own tests and draw your own conclusions.

    --
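
    As an aside, and only as a sketch (hypothetical table and file path; it may or may not apply to the message-tracking workload above): PostgreSQL insert speed is very sensitive to per-statement commit overhead, so batching rows into a single transaction, or using COPY for bulk loads, can change the picture dramatically.

        -- One transaction around many INSERTs avoids a commit (and fsync) per row.
        BEGIN;
        INSERT INTO messages VALUES (1, 'first message');
        INSERT INTO messages VALUES (2, 'second message');
        -- ... thousands more ...
        COMMIT;

        -- For bulk loads, COPY skips per-statement parsing entirely
        -- (hypothetical server-side file, default tab-delimited format).
        COPY messages FROM '/tmp/messages.dat';
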
  • I think the lack of tools, row-level locking, transaction support, etc. in MySQL is a shame. Yet people talk like it's the best thing since sliced bread.

    I'm pretty certain few people actually think MySQL is a wonderful DBMS that's the best thing since sliced bread. Most are keenly aware of its deficiencies but defend it because of two features that are invaluable to a successful open-source product:

    1. It is well-documented. MySQL has copious amounts of online documentation (much more than PostgreSQL, as I recall) and at least two large books (one of which is quite informative). The importance of good documentation cannot be overstated.
    2. It interoperates well. MySQL has bindings for about a zillion different languages, whereas PostgreSQL is still catching up. This gives MySQL the appearance of being better supported, and in the world of open-source, the better supported product will win out over the technically superior one.

    There's no doubt that PostgreSQL is a good DBMS, but MySQL will remain more popular until the support for it arrives.

  • The URL you posted is wrong; please post the correct URL, I would like to read IBM's report.
    Thank you.

  • by nebby ( 11637 ) on Monday November 13, 2000 @05:53AM (#627856) Homepage
    I'm pretty sure k5 sets a limit of <100 queries on a page before they start worrying about performance. At least that's what Rusty said if I remember correctly. I'm guessing they probably run about 10-20 queries on most pages, however.

    My site [half-empty.org] runs about 10-20, but thats only after caching lots of data in memory.
  • MySQL uses multi-threading (each connection gets its own thread), but I've read Slashdot-posted benchmarks that show MySQL failing after so many threads are spawned via concurrent connections (I think 60). Perhaps they're using some finite resources and the thread spawner was not using semaphores or some other such resource control.

    Postgres uses multi-processing (each connection gets its own process) with shared memory segments and a controlling/monitoring process (postmaster).

    Thus if MySQL could do a better job of managing resources under heavy loads, then it would waste less physical memory. Additionally, I don't know how good Linux is with multi-threading under heavy loads. Can it faithfully handle hundreds or potentially thousands of concurrent/competing threads? I've never seen any tests to show this... I'm aware that the default max processes/threads was hard-coded in the kernel to 512 at one point (I believe it requires a recompile to change). This might have something to do with MySQL's crashing.. You'd have to go to single-user mode to max this puppy out. Additionally, I don't know if it's possible for an app to determine how many more threads can be spawned by the OS.. If a thread fails to spawn, then you can corrupt the entire process.. In MP, if you fail to spawn a process, then you simply lose that connection.

    In general, I'd prefer MP on a Linux system. It's a similar story with Apache. You get a lot of reliability, and not _that_ much memory waste when compiled correctly. Additionally, MP has the potential to span computers (as in a Slashdot article on a multi-hosted extension to Linux)

    -Michael
  • Actually, this probably means that the database was very well-designed. I have seen way too many databases that consisted of one to three tables. Most real-world databases require a whole lot more. Maybe you should also look at the page he is talking about before you comment. That page has a lot of functionality.
  • Since you bring this up, you seem to bring it up as if you would like to use it. Open and free software comes as a result of programmers wanting to scratch their itch. If you have this itch, why don't you start the project? What are you waiting for? You can do it, even if you are not a great coder. Start something. Look at Slashdot, look at where it is; it didn't come from a great codebase, but it is here because it was started. Scratch that itch! :)

  • We hope to go beta in the next month, and final around January/February.
  • by King Babar ( 19862 ) on Monday November 13, 2000 @06:07AM (#627879) Homepage
    Please note that this benchmark (and any benchmark, for that matter) applies only to the system and application that it's testing. Database applications vary so widely that it's very difficult to get any meaningful numbers outside of very specific areas.

    As an example, I recently ran some tests that came up with the exact opposite results. My application is a large message tracking database. Query speed is secondary; insertion speed is critical. MySQL handles my test data set in 20 minutes; PostgreSQL takes over 3 hours (!). For this application, PostgreSQL is Right Straight Out.

    Well now, that information is completely useless, because you didn't mention anything that I would need to replicate your result. You didn't mention the versions (or the identities, really) of the databases you used, the hardware, the operating system, anything very interesting about the application...

    So what exactly were we supposed to glean from this? In particular, the point of the target article, as I understand it, was really "PostgreSQL used to be horribly slow for our application, but when we had some issues with our current set-up, we re-tested the very newest version of PostgreSQL and the latest MySQL we could deal with, and, wow, things have changed."

    Now, I don't personally either believe or disbelieve what you posted, but only because I have no idea what proposition I'm supposed to be interested in here, because you didn't tell us.

  • You can get Oracle 8i for free from Oracle, and while it's a bear to set up, I've found it to be faster than PostgreSQL 7.0 in a lot of ways ... of course, you'll make up for that with all the money you'll spend in training costs ;).
  • What does HUGE mean? Gigabytes? Terabytes? Exebytes?
    Ah, "exebytes". That's the unit of choice for measuring the size of executables produced by that big & rich [microsoft.com] company, right? :^) The SI prefix you're looking for is exa. On the other hand, perhaps exe is handy as a "computerization", along the lines of the "K" for 1024 in KB (SI uses "k" for 10^3)... More information about SI prefixes is available here [nist.gov]. Or, maybe your fingers slipped and I'm just being more nitpicky than usual. ;^)
  • Stable on huge databases? Geocrawler has 10GB of text in a giant table and runs with only 750MB RAM on a single server. Since updates/inserts/deletes can happen simultaneously, geocrawler does not shut down when new emails arrive.

    Stability from an impure shutdown? This is definitely a win for MySQL. Postgres has completely self-destructed for me a handful of times when the machine it was on hard-locked. The only recovery method is to create a fresh database from a backup dump.

    Huge RAM and multiple CPUs? This is probably a function of the OS more than anything. MySQL is limited more simply because of its locking problems, right? So no matter how many CPUs you have - only one can update a given table at a time. That's a pretty important limitation if you have a database that has any updates simultaneously with selects.

  • by woggo ( 11781 ) on Monday November 13, 2000 @06:10AM (#627895) Journal
    PostgreSQL increases total throughput with an increased number of clients, although it does not do so linearly. That's not the same as "increased performance". Think "scale-up", not "speed-up".

    Why? The answer is simple: load controlling. Basically, as concurrency increases, the cost of locking and that of buffer pool misses become prohibitive and transactions begin to starve. So most "smart" DBMSes will queue a few transactions for later processing after a certain point.

    Because MySQL uses such coarsely-grained locks (at the table level), the opportunity for concurrency is very low to start with; hence the decreased throughput and starvation.

    If you're interested in this stuff, look at Jim Gray's seminal paper "Granularity of Locks and Degrees of Consistency in a Shared Database" and Chou and DeWitt's "An Evaluation of Buffer Management Strategies for Relational Database Systems." You should be able to find either from the ACM Digital Library or in an anthology (like Stonebraker's _Readings in Database Systems_).

    ~wog

  • Essentially what that test showed was that mysql had to lock all those tables in order to do the join.. And that with Postgres's versioning system, it was a trivial matter.

    I remember reading on the topic of excessive joins in a Sybase Admin book once (excellent book on "practical" databases, since it shows you exactly how one company solved all their DB problems). It essentially said that determining join order is actually a very slow process.. Exponential, in fact.. Especially if your DB tries to determine which indexes to use based on statistics... The selection of indexes increases the number of possible variations incredibly. Sybase's solution was to only compare a select few of the possible join methods - which is a trade-off.. Alternatively, if you were to create a stored procedure, then whichever method was best at creation time will be used from then on...

    Stored procedures - in Sybase at least - reduce a great deal of the overhead, especially in a 16-way join. Sadly, I don't know if Postgres does ANY pre-compiling of stored procedures. They have two main methods: straight named 'sql' statements, and 'plpgsql' (a true stored procedure 'language'). My guess is that they do little more than pre-parse the statement (if even that). If this is the case, then using a stored procedure might even be slower than raw SQL, since a call requires the parsing of the function name, then loading of the function into memory from a table, then the parsing of that statement (two more steps than raw processing). It is, however, necessary for triggers and rules.

    Postgres (and I believe MySQL) has raw C extensions, but I really dislike these for general DBA operations (not to mention I haven't spent the time learning them).

    As for the 16-table join, a general web page probably shouldn't do this (especially a heavily used page), but a database SHOULD at least let the DBA perform the ad-hoc query when trying to extract information. The only real solution is summary information, which often tends to be more trouble than it's worth... Another solution is de-normalization (which makes me feel dirty)... Basically like what Blacknova.net uses... If you have 1-to-N data, then just pick some upper bound N and include that many extra columns directly in your main table. Fast and efficient, provided that you choose the right N (a sketch of this follows at the end of this comment).

    -Michael
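
    A minimal sketch of the "pick an upper bound N" idea (hypothetical names, loosely in the spirit of the game example; nothing here is taken from the actual Blacknova schema):

        -- Normalized 1-to-N: each cargo item is its own row, so displaying a
        -- ship means a join (or a second query) every time.
        CREATE TABLE ships      (ship_id int4 PRIMARY KEY, ship_name text);
        CREATE TABLE ship_cargo (ship_id int4, slot int4, cargo text);

        -- Denormalized with N = 3: the cargo lives in the ship row itself, so
        -- the common read is one single-table select. Only works if N really
        -- is a sane upper bound.
        CREATE TABLE ships_denorm (
            ship_id   int4 PRIMARY KEY,
            ship_name text,
            cargo1    text,
            cargo2    text,
            cargo3    text
        );
        SELECT ship_name, cargo1, cargo2, cargo3 FROM ships_denorm WHERE ship_id = 7;
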
  • > What I noticed was that as you increased the number of clients, the total number of requests served increased, while the number of requests per client was lower - which is exactly what should happen - the performance decreases as the number of clients increases.
    What *I* noticed is that in one graph the number of requests *per second* increased for PostgreSQL. And this does not look right, so the original poster is quite right.

    > The problem is that you don't know how to read a graph.

    Compare the graphs on page 3 and page 4. Then come back and explain to us how *you* read those graphs.

    Interpreting profiling results is a very difficult task. In that case, I suspect that it is impossible without getting the hands on the test configuration to try various alternative loads. Results are (IMHO) too strange to be accepted without further investigation.

    Cheers,

    --fred
  • by johnnyb ( 4816 ) <jonathan@bartlettpublishing.com> on Monday November 13, 2000 @06:13AM (#627898) Homepage
    I think you're missing how many clients 30 is. As far as hard-hitting clients go, 30 is a _huge_ number to have simultaneously. If you remember MathWorld [wolfram.com], it served up 2 Gig worth of data per day, yet only had between 5 and 20 clients at any one time, most of which were downloading pictures, not pages. I'd say that we were probably getting about 3 page connections per second (I could be wrong, my data is obtained from watching a tail -f on a log file, and remembering the results).


    Anyway, the number of tables you have depends greatly on the problem you are trying to tackle. If you've used SourceForge, you'll see that there is a whole lot of data displayed on each page. I would be more worried if he had been able to get all of this from a single database query (could you imagine a table that would do that - yuch)

  • MySQL made the "mistake", as you call it, of supporting transactions for the simple reason that in the kind of web environment MySQL is often run in, transactions become especially important.
    If I'm running a query that is updating a number of financial records to place orders and process information, I don't want that process to bomb out half finished; I want it to either complete or fail to complete. I want the security of knowing that there isn't random orphaned data hiding within the system (a minimal example of such a transaction follows below).
    MySQL is a great, quick, small-end database for web applications, and I really hope they find a way to optimize their transaction work, because I really liked working with it. Until then we will continue in our transaction-dependent world.
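
    A minimal sketch of the all-or-nothing update being described (hypothetical tables; the only point is that both statements commit together or not at all):

        BEGIN;
        INSERT INTO orders (order_id, customer_id, total) VALUES (5001, 42, 99.00);
        UPDATE accounts SET balance = balance - 99.00 WHERE customer_id = 42;
        COMMIT;    -- if anything fails in between, ROLLBACK leaves no orphaned data
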
  • Actually, you are quite wrong. PostgreSQL has more documentation

    The online PostgreSQL book is ~490 pages and the latest online MySQL manual I have is ~460, which is comparable. But MySQL also has a >700-page "New Riders" book and a >460-page O'Reilly book. Other than the online text, I haven't found any other PostgreSQL books, which is unfortunate.

    Python, in particular, has PostgreSQL bindings but they don't yet conform to the 2.0 DB spec, which is a pain. In general, the PostgreSQL modules seem older and less maintained than the MySQL modules, which is both a cause and a symptom of support problems.

    Again, I see nothing technically wrong with PostgreSQL. But if it's going to get more popular, I still believe it needs more support in terms of documentation and languages.

  • I have been unable to locate binaries for Windows NT/2000. Does anyone know where/if such a thing exists? I would like to evaluate PostgreSQL for our project instead of SQL Server 7.0 or 2000. I would also appreciate comments anyone has on such a move.

    Thanks in advance!

  • Subject says it all.
  • This is absolutely not true. There are three Python DB API 2.0 compliant PostgreSQL drivers. PyGreSQL has just added 2.0 compliance, PoPy is 2.0 compliant, and there's another one that also purports to be 2.0 compliant.

    Apart from Momjian's excellent online manual there is still the existing PostgreSQL documentation. Everything from basic queries to creating your own types, to hacking out extensions in C, is covered by this documentation. MySQL may have more books about it, but most of the information in those books will be duplicated.

    The fact of the matter is that PostgreSQL has more features than MySQL, is more standards compliant (SQL-92), is released under a more liberal license (BSD style versus GPL), and it is now even starting to outperform MySQL for simple selects and joins.

    The reasons for staying within MySQL's limited abilities are getting smaller and smaller. Why not just use PostgreSQL in the first place? Eventually every successful database project is going to want the features that PostgreSQL has, so why start with MySQL and face the prospect of migration later?

  • by tmu ( 107089 ) <todd-slashdot@renesys . c om> on Monday November 13, 2000 @07:31AM (#627911) Homepage
    I'm sorry, but I think this grossly misses several points and muddles the real difference between threads and processes.

    One way to think of threads is as a solution to bloated processes. In operating systems like Solaris, which are designed to scale to 32 or 64 (or more) processors, the number of locks in the kernel is enormous. As a result, context switches and process creation time tend to be higher. In gross terms, you pay for the concurrency at high numbers of processors with poor performance at lower numbers of processors. This is not necessarily bad (especially if you plan to use 16 or more processors at some point) but it is just the way things are.

    Linux, on the other hand, penalizes scalability for large numbers of processors in order to get much better performance with smaller numbers of processors. Linux does this because Linus doesn't believe that 16+ processor machines are common or sensical and that the kernel should be optimized for common (and sensical) cases.

    Why do I mention all of this? Because Linux process creation times are slightly faster than Solaris thread creation times. Anyone who fetishizes multi-threading versus multi-processing doesn't really understand the difference between the two and when it really matters.

    Processes are contexts of execution. Threads are contexts of execution that may share a memory space with other contexts of execution. These are fundamentally different things in some operating systems. In Linux, a thread is just a process that shares a memory space with another process. This is because processes are *fast* to create and switch to in Linux (optimized for the common case, remember?).

    Anyway, I don't mean to bash maraist at all, just to point out a common set of misconceptions about these things. When considering these architectures, each database system must try to optimize across all of the operating systems it plans to be deployed on (just like the Apache project, which I think was an excellent analogy).

  • by cjsteele ( 27556 ) <coreyjsteele AT yahoo DOT com> on Monday November 13, 2000 @04:09AM (#627912) Homepage
    I work for a large manufacturing company, and recently we submitted our budget for next year (2001)... well, on there we had SQLServer, and a bunch of other commercial software. We got our budget back in its final form, and it did NOT include our SQLServers... we needed a database system, so I threw together a proposal for some smaller MySQL servers.

    My manager liked the idea, but wanted info on other open source databases... I pointed him towards Tim's article (this was last week) and we've been on our merry way since. We now have both MySQL and PostgreSQL boxes in production for testing, and will make our determination later on!

    Tim's article played a HUGE role in forming our initial impressions, and I would recommend it to any and all who are looking for info on open source databases!
    -C
  • Which of the two is the more widely used? All you hear is that MySQL is the best. Seems that this is not the case...
  • by Anonymous Coward on Monday November 13, 2000 @04:14AM (#627916)
    We are a medium sized company developing web apps. We are not poor, but we would be if we ruled out open source software. We use db2 for most work, but postgres is being investigated and we have found it quite workable for our many needs.
    Your comments remind me of similar ones a few years back when we started using Linux and ppl said "serious companies don't use linux". Nobody (except a small-minded minority of change-haters) is saying that now.

  • by CaptainZapp ( 182233 ) on Monday November 13, 2000 @04:18AM (#627919) Homepage
    Did anybody else notice that PostgreSQL actually improved performance with more concurrent users ?

    The only explanation here is the caching behavior of the database. However, this also indicates that benchmarking databases is not a trivial task, because exactly such effects must be considered for database benchmarks.

    Further, besides a few graphs, the test says nothing more than: PostgreSQL improved strongly, while MySQL is a dog. Actually, it says nothing at all.

    Not that I mind the results. I worked with both databases briefly and believe that PostgreSQL is far closer to an industrial strength database (Lack of transaction control disqualifies MySQL for that in the first place). Nevertheless, I think the results really lack any significance.

  • You are way out of your league here.

    Not so. You say "you are uninformed and hasty in your conclusions" and then you say stuff like I quoted above. I have extensive experience in large-scale web site back end design and have been a lead engineer on several sites that have very heavy loads (such as large dot-coms and large e-commerce sites for more traditional "chimney" companies). I do know what I'm talking about here.

    I know quite well that a site might need to pull data from a large number of data sources to build up a page (especially in personalizable portal-type front pages). This doesn't mean that you have to do a large number of database queries, however. Caching is one crucial strategy if you're going for scalability and speed, for instance. Planning the site and page layout so that you don't need so much data for one page is also crucial. If there is no clear, simple way to cache the data you need to produce something more complex, then cache the result (yes, the HTML!) and only update the HTML every so often - as needed - based on some flag. Read the flag together with some other data that you must get from the database anyway. This is what I'm talking about.

    I'm honestly majorly surprised that the performance test mentioned that the site used a page that required 16 queries. When I saw the performance results, I wasn't so surprised they were so poor. I'd like to see the source code for that page..

  • by psergiu ( 67614 ) on Monday November 13, 2000 @04:19AM (#627921)
    - Which of them is most stable (with HUGE databases)?

    - Which of them is most likely to be able to recover without a glitch from an "impure" shutdown?

    - How much disk space can they waste in tablespaces with frequent spaces?

    - Which will benefit more from a HUGE RAM quantity or multiple CPUs?

    --
  • One page may well be doing some extremely complicated processing, which has evolved over many iterations of the code - it's not hard to imagine that such a page could have 16 queries in it, especially if it's being written in an oo style.

    I don't agree with that statement. In cases where you need data using 16 different queries, no doubt many of those could have been cached for a huge speedup. I've written code for back ends for several major sites and I've seen very complex functionality indeed where only a few queries were needed per page - most data was pulled from caches. Also, OO helps you write code where you can fetch data lazily. For instance, in a discussion forum, you might keep all root-level message headers in a cache, the 1000 last accessed bodies in a cache, the 10000 last accessed headers in a cache, parent-child relationships in a cache.. In this way, you don't have to go to the database 16 times every time users come to the front page of the discussion forum and are served the exact same page - 30 times per second.

    Like you said, computer time is now significantly cheaper than programmer time. Well, memory is also cheap. Keep stuff in memory and use good old abstract data types to get your data fast.. RDBMS's are very overused these days.

  • Perhaps the database was well designed from a data-modelling point of view but not from a performance point of view. A good database design for a web site that needs to scale is one that allows you to cache the data that is used most often. On a web site that gets 30 hits per second but where the data only changes about once per 30 seconds, one should start asking why the data is pulled from the database every single time. User profiles on personalized pages can be cached, data can be pre-fetched with one query, cached, and then accessed from the cache when needed instead of doing several queries for a smaller result set, etc. etc.. 16 queries is insanity!
    Sadly, I don't know if Postgres does ANY pre-compiling of stored procedures. They have two main methods: straight named 'sql' statements, and 'plpgsql' (a true stored procedure 'language'). My guess is that they do little more than pre-parse the statement (if even that).

    Commentary like this disgusts me, because the writer makes it clear he doesn't know anything about Postgres but merely presumes it's a poor implementation.


    For the record, PL/pgSQL queries are pre-compiled, which is why there's no dynamic SQL capability in the language. You can create dynamic SQL from one of the other languages (pltcl, plperl) which exposes the SPI interface to the query engine to the user.


    Work is under way to allow optional caching of individual query plans (i.e. by explicit statements), which allows one to avoid parsing and optimization overhead. When completed (7.2?) this will be great in the web environment.

    I know little about Sybase so will avoid speculation about that RDBMS, might you be so kind as to avoid speculating about Postgres as well?

  • As a followup idea.. 16 joins reminds me of a project that I did once where we used IDs for every table.. Every table stored metadata and a single piece of actual data, so if you had 17 pieces of data to display, then you'd have to do 16 joins. There were three solutions to this:

    First, cache all the smaller tables, and do the translation (from say user_id to user_name) in an external hash (I know Perl can easily do this; don't know about PHP). For static fields such as user names, email addresses, etc., this is easy. Unfortunately this doesn't help when trying to find all users with email addresses going to AOL and source code that's overdue, etc. But it does solve the summary report problem.

    Second is to break up the query into multiple operations.. You need an incredible amount of memory to do certain types of joins, and sometimes the SQL processor guesses the wrong method. Even though Postgres's "explain" command told me it was doing hash lookups for a user-name summary, I found that it was faster to first get all the data with user IDs, then perform a second search with
    "select user_name from users where user_id in (x,y,z,....)"

    So long as you don't have more than a couple dozen entries, this works nicely, ESPECIALLY if you have thousands of rows of real data with only one or two unique user names. It was a nice short-term fix, but "clean" SQL should all be done in a single statement. So, on to the third option.

    Third, denormalize back to fourth or third normal form. Namely, wherever you have a user name, don't use a user_id. Sure, you consume 8 more bytes per row, but it greatly speeds up searches. The method is to ONLY use IDs when you cannot uniquely represent a primary key with a single column (such as first-name plus last-name). But in our computer world, "user-name" tends to be unique, "project-name" should be unique, and "working-branch" should be unique - in which case, at the design level, it's worth the extra space. Now, I know MySQL advocates (including me, a year ago) cringe at the idea of keying off variable-length strings. For me, it was the idea that you can compare integers much faster than strings. BUT, if the string is an index, your sorting is going to be MUCH faster, and strings distribute much better than auto-incremented integers (which tend to be linear). Additionally, the fact that you don't have to perform a join at all removes almost all of the performance problems.
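
    A before-and-after sketch of that idea (tables invented for illustration; you would of course keep only one of the two definitions):

        -- Before: every summary report needs a join against users.
        CREATE TABLE messages (
            message_id  serial PRIMARY KEY,
            user_id     integer NOT NULL,   -- join key into users
            body        text
        );

        -- After: the unique user name is stored directly, so summaries
        -- and searches on it need no join at all.
        CREATE TABLE messages (
            message_id  serial PRIMARY KEY,
            user_name   varchar(32) NOT NULL,
            body        text
        );
        CREATE INDEX messages_user_name_idx ON messages (user_name);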

    When you don't use metadata as primary keys, you find that you only perform joins when you actually have relational data - and joins based on summary data almost always go away.

    Additionally, if your DB ever gets corrupted (say you lose a table), you don't have to fret over "who the hell was user-id 155?". Given that MySQL doesn't have rollbacks, the potential for data corruption with metadata is enormous (when you do both reads and writes).

    -Michael
  • by segmond ( 34052 ) on Monday November 13, 2000 @06:32AM (#627930)
    Actually, you are quite wrong. PostgreSQL has more documentation; I know this because I have combed every possible link on both sites. Go back to the PostgreSQL page - there is a full, free book on PostgreSQL on their documentation page. PostgreSQL also has bindings for a zillion languages. What do you want? Perl? C? Python? C++? More? Please update yourself before you post misinformation. I like both databases, and I use both. :)

    First, I started off with the words "Sadly, I don't know", thereby removing myself as an expert in that area. It was more a question than anything else.

    As for the other languages (Perl, Tcl, etc.), those can only be less optimal forms, and they're more for procedural operations, which has nothing to do with overcoming the 16-join problem. (With the possible exception of hashed lookups, but that's dangerous.)

    It's great to hear about cached query plans (I'd forgotten the term).

    As for the problem with speculation: if nobody were to speculate, and those in the know didn't impart their knowledge, would anyone have gained anything? 'Qualified' speculation does exactly what happened here - it incites those who know the specific facts to correct the errors. My speculation _was_ based on empirical analysis, however, so it wasn't as if I was blindly stating something.

    But now, as a question: is the compiled SQL stored in any more optimal fashion, or does Postgres use an interpreter (which does little more than tokenize)? To me, a compiled SQL statement means that it's making raw calls to table operations.

    Simple benchmarks that I've done haven't shown any performance increase from using stored procedures, but that might just be because the compilation/interpretation stages were insignificant with respect to the overall operation. The other interpretation is that little is actually gained by compiling the SQL statement (at least until the cached query plans come around).

    -Michael
  • As I recall, Postgres is designed (optimized) for a PPro. In my testing of Postgres (it was the only DB the RADIUS application knew how to talk to), a 486DX50 was faster than a 533 MHz Alpha -- running 6.2.1 (as you can see, that was a while back). I've not done any tests since "time travel" was removed -- and boy did that make a difference.
  • by smoon ( 16873 ) on Monday November 13, 2000 @04:22AM (#627937) Homepage
    Open source databases -- great. But how about tools to facilitate database development? In commercial terms you've got modeling systems (e.g.: Erwin), Extract-Transform-Load (ETL) tools (e.g.: Informatica, Sagent), Reporting tools (e.g.: Brio, Business Objects, etc.). GUI managers, GUI drag-n-drop development tools, etc.

    In the open-source world you've got... Well you've got to custom code in some non-database-centric language, e.g.: C, C++, Java, Perl, etc.

    Anyone want to start up an open-source alternative for an ETL tool? Target open-source databases as source and target systems. A usable open-source system would really raise the visibility of open source in the (hitherto) commercial-only data warehousing market.
  • by macpeep ( 36699 ) on Monday November 13, 2000 @04:27AM (#627940)
    Anyone who has designed a site and its database in such a way that generating a page requires 16 queries should not be taken seriously when conducting a database performance test. Still, it does seem to suggest that MySQL does not perform too well (which doesn't come as a surprise to me, really). All I'm saying is that a site with a page that requires 16 queries is *VERY BADLY DESIGNED* and will not scale very well at all!
  • by kootch ( 81702 ) on Monday November 13, 2000 @04:29AM (#627941) Homepage
    I've posted this before, but I'll post it again.

    Centura Software makes an open source mobile database software called db.star and db.linux

    db.star [centurasoftware.com] is the first open source, embeddable data management solution for Information Appliances (IAs). "It is a high performance, small footprint, developer friendly database engine which facilitates the development of rich and powerful applications for popular IA platforms, such as the Palm® Computing Platform and Windows®-powered Pocket PC. "

    Open source but not free.

    db.linux [centurasoftware.com] is OSS Data Management for Information Appliances

    Open source and royalty free.

  • by f5426 ( 144654 ) on Monday November 13, 2000 @04:31AM (#627943)
    PostgreSQL is faster than MySQL for this guy.

    But, as always with database server choices, nothing replaces real-world tests. You should test the various servers on *your* data.

    Frankly, I found the numbers quoted in the article ridiculously low. But since we don't know the volume of the data, the hardware used, the database cache size, etc., this boils down to 'someone found PostgreSQL better than MySQL for his usage'.

    It is not uncommon to see order-of-magnitude performance differences between databases. I would love to know what a tuned Oracle would get.

    Cheers,

    --fred
  • Thanks for your experiences. I may yet collapse and simply use Linux for a test RDBMS server, but I suspect my client will opt for the perceived safety of SQL Server 2000. That's OK, it'll look good on my resume! ;+)
  • Erhm... you just contradicted yourself.

    First you say that programs do not need to be specifically written to support multiprocessing and then you say that using multiple threads is the path of least resistance [to support multiprocessing.]

    Actually, applications can run on SMP machines with no problem, but if they use only ONE process or thread (about the same thing on Linux, but very different things on Solaris, as a previous poster mentioned), then they will not be running on more than one processor and thus not taking advantage of multiprocessing. Yes, there will be SOME benefit from other processes being scheduled elsewhere, but the application taken by itself will have no added advantage other than the fact that the processor it is running on has more cycles free.

  • I am building a data entry site in PHP + Interbase and Interbase has shown itself to be rock solid (both on NT and Linux).

    I could not use PostgreSQL because it does not run on NT (the client only has NT servers and does not want a Linux box for this system). I don't know much about PG, so I don't know how good it is.

    I did not use MySQL even though I have run it on Linux and NT for these reasons:

    • No transactions (could not wait for MaxSQL before implementation).
    • No integrity constraints (see the sketch after this list). Unfortunately, data entry systems require integrity constraints, and the attitude [mysql.com] of the MySQL developers is IMHO asinine. I have worked on databases with and without integrity constraints, and in my experience the data is always worse in databases where the philosophy is to 'let the database developers deal with it'.
    • Unlike MySQL, PG, and most databases, in Interbase, writers never block readers. Check the product overview [interbase.com] for details.
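
    To make the integrity-constraint point concrete (the tables below are made up for illustration): this is the kind of declarative rule that PostgreSQL 7.0 and Interbase enforce inside the server, while MySQL at the time accepted the FOREIGN KEY syntax but did not enforce it.

        CREATE TABLE customers (
            customer_id serial PRIMARY KEY,
            name        varchar(80) NOT NULL
        );

        CREATE TABLE orders (
            order_id    serial PRIMARY KEY,
            -- the server rejects rows that point at a non-existent customer
            customer_id integer NOT NULL REFERENCES customers(customer_id),
            quantity    integer NOT NULL CHECK (quantity > 0)
        );

        -- Fails instead of silently storing an orphan row:
        INSERT INTO orders (customer_id, quantity) VALUES (99999, 1);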

    Interbase may not have industrial strength for a massive number of concurrent users, but this particular implementation is a low-volume data entry system. I cannot personally attest to the viability of Interbase with a large number of users.

    Concerning the 'openness' of Interbase: the source is released, there are no restrictions, and this database is not going anywhere soon. The main problem is that the way Inprise handled releasing the source left a bad taste in the mouths of free software advocates. These issues have been remedied, and they have delivered on their promise to release the source.

    Check Interbase out, you'll be surprised.

    "Fat, drunk, and stupid is no way to go through life."

  • Yup. The results have the BEGIN/COMMIT brackets in there. You're right; it does make a really big difference.
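
    For anyone wondering what the brackets refer to: without them, PostgreSQL commits (and by default syncs) every INSERT as its own transaction, so batching the rows looks roughly like this (table name made up):

        BEGIN;
        INSERT INTO samples (stamp, value) VALUES (now(), 1.23);
        INSERT INTO samples (stamp, value) VALUES (now(), 4.56);
        -- ... thousands more rows ...
        COMMIT;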

    Question is moot, anyway. We'll probably just stuff everything into a flat file and forget about "online response" (:-(.

    --
  • I think he is a lying schmuck; many people have replied to his post requesting the URL, and he hasn't responded. I checked Google's cache and it returned nothing. He is just on a karma rampage.

  • After having used MySQL for quite a while, I played with PostgreSQL and was quite impressed. I noticed, though, that unlike MySQL it would not accept insertion of large text data within an SQL INSERT command. I can't find comments about this anywhere... Does anybody know if it is something you wouldn't (shouldn't) do for some reason? Or is it another limitation of PostgreSQL?
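
    (If the problem is PostgreSQL's default 8K row-size limit, one possible workaround is the large-object interface. The table below is invented for illustration, but lo_import()/lo_export() are real built-ins; note that lo_import() runs in the backend, so the path refers to a file on the server machine.)

        CREATE TABLE documents (
            doc_id   serial PRIMARY KEY,
            title    varchar(200),
            body_oid oid
        );

        INSERT INTO documents (title, body_oid)
            VALUES ('release notes', lo_import('/tmp/relnotes.txt'));

        -- Read it back out to a file, or use the client-side lo_* API instead.
        SELECT lo_export(body_oid, '/tmp/relnotes_copy.txt')
          FROM documents
         WHERE title = 'release notes';
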
  • by deefer ( 82630 ) on Monday November 13, 2000 @04:34AM (#627955) Homepage
    What about Interbase?
    I'm a bit confused about the whole deal. Interbase is now Open Source, which is nice, but the website (here [interbase.com]) says it is freely downloadable and open source - is it also free as in beer?
    Interbase is a nice database for small to mid range deployments. Fast, small, and rock solid. But nobody is getting behind it - the Interbase section on sourceforge [sourceforge.net] has no files released yet.
    Any other /.'ers out there with links to what is really going on? I gather there was some grief and a splinter group has been formed (Firebird?).

    Strong data typing is for those with weak minds.

  • by tmu ( 107089 ) <todd-slashdot@renesys . c om> on Monday November 13, 2000 @04:39AM (#627957) Homepage
    I should say at the outset: I've used both Postgres and MySQL on production projects and like both for very different reasons. But here's the thing (and this is a common problem with technology product evaluations): these products have very different design goals and aren't really that comparable.

    I'm not saying the standard "MySQL's not a database because it doesn't support transactions and databases have to. ACID!" (although I'm sympathetic to that point of view, I don't think individual words like "database" are worth fighting over--if MySQL wants to call itself a database, fine). What I am saying is that Postgres was designed to be a full-fledged SQL92-compliant database with transactions and triggers and foreign keys and the whole lot. MySQL was designed to be a SQL-based front end to a very fast database file format. These products are not the same, and comparing them without acknowledging that seems foolish.

    I've been very pleased by the speed improvements in Postgres recently (partisan testing aside, Postgres 7 really is much faster). I've also been impressed by feature additions in MySQL (although it still isn't close to what you would expect if you're coming from the Oracle or DB2 world). But both remain inherently tied to their design goals. This isn't a bad thing at all, because different projects need different products with different design goals.

    I compare this to the (often senseless) comparisons of NetBSD, OpenBSD and FreeBSD. One is designed for portability, another for security (features be damned!) and the last for performance, features and multi-processing on (primarily) Intel. Although they come from a common code base, they have obviously diverged in design goals. Instead of bashing one product or another (all of the *BSDs and the two databases discussed have *major* problems that are bashable), use the one that seems most appropriate to your needs.
  • by tommyq ( 183576 ) on Monday November 13, 2000 @04:40AM (#627958)
    Lest we forget, technical superiority is only a very small part of the battle; you must also win mind share

    Well, that depends on what you mean by "win." Open source is not a popularity contest. In the past, it may have been true that to ensure the continued existence of a cool project, it needed to have a chance of dominating the market at large. But nowadays, as long as there are enough developers interested in a project--oftentimes only because of its "technical superiority"--it can stay alive, grow and improve. If a project can attain real technical superiority in this way, I call this winning.

    That being said, I think it should be remembered that for many simple web sites, MySQL was the right choice because of its superior performance with a read-oriented setup. Of course, PostgreSQL people claim a read-oriented setup isn't a real database application at all, but having designed several such small sites, I really wouldn't use anything else. (Flat files?? What an unnecessary pain!)

    Just my two cents . . .

