Counting the World's Books
The Google Books blog has an explanation of how they attempt to answer a difficult but commonly asked question: how many different books are there? Various cataloging systems are fraught with duplicates and input errors, and only encompass a fraction of the total distinct titles. They also vary widely by region, and they haven't been around nearly as long as humanity has been writing books. "When evaluating record similarity, not all attributes are created equal. For example, when two records contain the same ISBN this is a very strong (but not absolute) signal that they describe the same book, but if they contain different ISBNs, then they definitely describe different books. We trust OCLC and LCCN number similarity slightly less, both because of the inconsistencies noted above and because these numbers do not have checksums, so catalogers have a tendency to mistype them." After refining the data as much as they could, they estimated there are 129,864,880 different books in the world.
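For anyone wondering why the checksum matters so much in that quote: an ISBN carries a check digit, so most single-digit typos produce a number that fails validation and can be caught at entry time. A minimal sketch of the standard ISBN-10 and ISBN-13 check-digit rules follows. The function names are made up for illustration, and this is not Google's code, just the published checksum arithmetic.

```python
DIGITS = "0123456789"


def isbn10_is_valid(isbn: str) -> bool:
    """Check an ISBN-10: weights 10 down to 1, weighted sum divisible by 11."""
    chars = [c for c in isbn.upper() if c not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # 'X' stands for 10 and is only legal as the check digit
        elif char in DIGITS:
            value = int(char)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def isbn13_is_valid(isbn: str) -> bool:
    """Check an ISBN-13: weights alternate 1, 3, weighted sum divisible by 10."""
    chars = [c for c in isbn if c not in "- "]
    if len(chars) != 13 or any(c not in DIGITS for c in chars):
        return False
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(chars))
    return total % 10 == 0


print(isbn13_is_valid("978-0-306-40615-7"))  # a well-formed ISBN-13
print(isbn13_is_valid("9780306416157"))      # same number with one digit mistyped
print(isbn10_is_valid("0-306-40615-2"))      # the ISBN-10 form of the same book
```

A number with no check digit gives the cataloger no such safety net, which fits the summary's reason for trusting OCLC and LCCN matches slightly less.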
Re:How do you define "different book"? (Score:2, Informative)
Re:How do you define "different version"? (Score:4, Informative)
Look at textbooks - new editions that are almost indistinguishable from the previous editions have new ISBNs. Do we count every single one as a different book?
From TFS : if they contain different ISBNs, then they definitely describe different books
If they're using this method, GP's point is valid. The books are not really new books, they're essentially the same as previous editions but have different ISBNs. In essence, these new editions with new ISBNs are being counted twice (or more) for very small revisions to the same book.
Re:How do you define "different version"? (Score:2, Informative)
From TFA: Well, it all depends on what exactly you mean by a “book.” We’re not going to count what library scientists call “works,” those elusive "distinct intellectual or artistic creations.” It makes sense to consider all editions of “Hamlet” separately, as we would like to distinguish between -- and scan -- books containing, for example, different forewords and commentaries. (emphasis mine)
For Google's definition of what constitutes a unique work as used to derive the stated quantity, the use of ISBN as described is perfectly valid. They are OK with "almost the same work" != "the same work".
So their counting methodology would consider "Fundamentals of Math 3rd Ed by I. M. Counting" to be a distinct work from "Fundamentals of Math 4th Ed by I. M. Counting".
In fact, if the publisher released a paperback version, it would be considered another separate work, because the typesetting and page layouts may differ, and might include different forewords, different pages on the index, etc.
It's a separate and distinct work, from Google's point of view, where they are trying to index the works that they want to scan.
Remember, their goal is to capture as much as possible of the entire sum of human writing. A different foreword is a unique work to them.
Of course, you can then disagree with Google's counting methodology, which is fine. If you do, then the number they have reached for their purposes is meaningless to you and you'd better start counting based on your own definition.
It'll take a while, good luck, and let us know what you come up with. :)
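To make the two definitions concrete, here is a rough sketch of how the same catalog gives different totals depending on whether you count editions (one per ISBN) or "works" (one per title and author). The records and the ISBN strings are placeholders invented for illustration, not real catalog data, and the grouping keys are an assumption about how one might count, not Google's actual pipeline.

```python
# Hypothetical catalog records; the "isbn" values are placeholders, not real ISBNs.
records = [
    {"isbn": "isbn-3rd-hardcover", "title": "Fundamentals of Math",
     "author": "I. M. Counting", "edition": "3rd"},
    {"isbn": "isbn-4th-hardcover", "title": "Fundamentals of Math",
     "author": "I. M. Counting", "edition": "4th"},
    {"isbn": "isbn-4th-paperback", "title": "Fundamentals of Math",
     "author": "I. M. Counting", "edition": "4th"},
    # A duplicate catalog entry for the paperback; it should not be counted again.
    {"isbn": "isbn-4th-paperback", "title": "Fundamentals of Math",
     "author": "I. M. Counting", "edition": "4th"},
]

# Edition-level count: every distinct ISBN is its own book.
editions = {r["isbn"] for r in records}

# Work-level count: every distinct (title, author) pair is one book.
works = {(r["title"], r["author"]) for r in records}

print(len(editions), "editions,", len(works), "work")
```

Same four records, three books by one definition and one by the other, so the headline number depends entirely on which definition you pick.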
Re:1 in 50 people wrote a book (Score:4, Informative)
Isaac Asimov wrote over 500 books. I don't know how many Terry Pratchett has written but the number is in the dozens. There's Clarke, Heinlein, Niven... and those are just a few science fiction writers (yes, Asimov also wrote nonfiction and Pratchett is known mainly for fantasy). Serious authors write more than one book each.
So your average is a little meaningless.
No, averages are very meaningful. Extremely meaningful. They are the AVERAGE (usually the mean), which means that some values will be above, and some values will be below. The idiocy comes in when people mistakenly jump to the conclusion that just because an average exists, it means that every value must be exactly the same as the average. Or that, just because you can find extreme values far away from the average, the average is not meaningful.
If the average states that 1 in 50 people have written a book, then, by gum, it will be easy to find plenty of people who have written zero books, somewhat fewer who have written exactly one (something below 1 in 50), far fewer who have written exactly two, even fewer who have written exactly three, etc. That does not mean that example authors with hundreds of books cannot exist; it only bounds how frequent they can be.
Of the myriad ideas that the academic community has utterly failed to teach the general public, the relationship between averages and distributions is near the top. One more time: just because an average exists, it does not mean that every datum has the same value as the average. As an example, just because the average male in the US is 5' 9", it does not mean that every single male is that tall, nor that you will not find ones that are shorter, taller, or even much shorter or much taller. The tallest man (according to my 20 seconds of research through Google) was 8' 11", and the shortest was 1' 10" ... does that lessen the meaningfulness or utility of the average male height? Rather the contrary: it provides important information as to the extent of the distribution of heights.
Now, I suspect that the parent poster is trying to say that because -- by loosely founded speculation -- most authors are professional authors ("serious authors") and therefore will have more than one book to their name, the classification of people into authors and non-authors will be skewed against 1:50. I would not argue against that (in fact, I indirectly argued for it above). Nevertheless, using the utterly non-scientific sample of the books above my desk, most authors have only one book to their name, so the number isn't going to be much worse than 1:50, perhaps 1:55 or 1:60. That kind of pure, unadulterated speculation is exactly the sort I would love to see proved wrong with hard data.
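For what it's worth, the arithmetic behind those guesses is easy to sketch. If there is one book per 50 people and authors average m books each, then there is one author per 50 × m people. The two distributions below are invented purely to show what 1:55 and 1:60 would imply, they are not measured data, and the function names are hypothetical.

```python
from fractions import Fraction


def mean_books_per_author(distribution):
    """Mean of a {books_written: share_of_authors} distribution."""
    return sum(books * share for books, share in distribution.items())


def people_per_author(people_per_book, mean_books):
    """If there is 1 book per N people, there is 1 author per N * mean people."""
    return Fraction(people_per_book) * Fraction(mean_books)


# Hypothetical distributions (assumptions for illustration only):
# 90% of authors wrote one book and 10% wrote two; then 80% and 20%.
mostly_one_book = {1: Fraction(9, 10), 2: Fraction(1, 10)}
more_repeat_authors = {1: Fraction(8, 10), 2: Fraction(2, 10)}

for dist in (mostly_one_book, more_repeat_authors):
    mean = mean_books_per_author(dist)
    print(f"mean {float(mean):.1f} books/author -> 1 author per "
          f"{people_per_author(50, mean)} people")
```

So 1:55 corresponds to authors averaging 1.1 books each and 1:60 to 1.2 books each. Whether the real mean is anywhere near that is exactly the hard data that is missing.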