Books Google Math

Counting the World's Books

Posted by Soulskill
from the not-goin'-anywhere-for-a-while? dept.
The Google Books blog has an explanation of how they attempt to answer a difficult but commonly asked question: how many different books are there? Various cataloging systems are fraught with duplicates and input errors, and only encompass a fraction of the total distinct titles. They also vary widely by region, and they haven't been around nearly as long as humanity has been writing books. "When evaluating record similarity, not all attributes are created equal. For example, when two records contain the same ISBN this is a very strong (but not absolute) signal that they describe the same book, but if they contain different ISBNs, then they definitely describe different books. We trust OCLC and LCCN number similarity slightly less, both because of the inconsistencies noted above and because these numbers do not have checksums, so catalogers have a tendency to mistype them." After refining the data as much as they could, they estimated there are 129,864,880 different books in the world.
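
The quoted passage says OCLC and LCCN numbers lack checksums, which implies that ISBNs carry one. As a rough illustration of what that check digit catches, here is a minimal Python sketch of the standard ISBN-10 (mod 11) and ISBN-13 (mod 10) checks. The function names are made up for this example, and nothing here is meant to represent how Google's pipeline actually validates records.

```python
DIGITS = "0123456789"


def _strip(isbn: str) -> str:
    """Drop the hyphens and spaces that catalogs often include."""
    return isbn.replace("-", "").replace(" ", "").upper()


def isbn10_is_valid(isbn: str) -> bool:
    """ISBN-10: weights 10..1, the weighted sum must be divisible by 11.

    The last character may be 'X', which stands for the value 10.
    """
    s = _strip(isbn)
    if len(s) != 10:
        return False
    total = 0
    for i, ch in enumerate(s):
        if ch in DIGITS:
            value = int(ch)
        elif ch == "X" and i == 9:
            value = 10
        else:
            return False
        total += (10 - i) * value
    return total % 11 == 0


def isbn13_is_valid(isbn: str) -> bool:
    """ISBN-13: alternating weights 1 and 3, sum must be divisible by 10."""
    s = _strip(isbn)
    if len(s) != 13 or any(ch not in DIGITS for ch in s):
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(s))
    return total % 10 == 0


if __name__ == "__main__":
    print(isbn10_is_valid("0-306-40615-2"))      # a well-formed ISBN-10
    print(isbn10_is_valid("0-306-40651-2"))      # same digits, two transposed
    print(isbn13_is_valid("978-0-306-40615-7"))  # the ISBN-13 form
```

A transposed pair of digits, the classic typing slip, fails the ISBN-10 check: `0-306-40615-2` passes while `0-306-40651-2` does not. The ISBN-13 scheme is somewhat weaker, since swapping two adjacent digits that differ by 5 leaves the sum unchanged. An identifier with no check digit at all gives a cataloger no such warning, which fits the summary's point about trusting OCLC and LCCN matches slightly less.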

  • by Anonymous Coward on Friday August 06 2010, @01:38PM (#33164782)

    The estimate would be about 130 million, not 129,864,880.

  • by SomeJoel (1061138) on Friday August 06 2010, @01:42PM (#33164862)
    But 130 million can't possibly be right! We better assign some false precision to make our estimate believable. Significant digits are for science teachers and marriage counselors!
  • Wow (Score:3, Insightful)

    by demonbug (309515) on Friday August 06 2010, @02:11PM (#33165438) Journal

    They should write a book!

  • Re:Seriously... (Score:4, Insightful)

    by SomeJoel (1061138) on Friday August 06 2010, @02:15PM (#33165504)

    "Who cares? Does it matter?"

    Does anything?

  • by SomeJoel (1061138) on Friday August 06 2010, @02:18PM (#33165546)

    "I suspect that most books have been written recently and their writers are still alive."

    And I suspect that you are full of crap.

  • by andrewagill (700624) on Friday August 06 2010, @03:36PM (#33166920) Homepage
    How about the books that people write and spread around to friends or books published by small in-house printshops, often as promotional material? Books written before ISBN that are still in libraries but no longer published (Bodoni's type specimens come to mind, though it looks like some of these are indeed catalogued by WorldCat)? Books that were printed years ago that we know we lost to the ages (the lost Gospel of Barnabas--not the forged Gospel of Barnabas--comes to mind). What about the books that we never knew existed?

    This estimate isn't bad for published works, but it does not adequately answer the question posed, ``Just how many books are out there?''
  • by bcrowell (177657) on Friday August 06 2010, @05:17PM (#33168536) Homepage

    ISBNs suck as identifiers for digital books, especially digital books that are free. There are two problems.

    Problem number one is that they cost money. Let's say someone writes up a really nice manual documenting some open-source software. He wants the manual to be free, just like the software. But now if he wants an ISBN, he has to pay money to get the ISBN, which means expending dollars on a book that is not going to be bringing in any dollars. The fact that ISBNs cost money is out of step with the fact that we have this thing called the World Wide Web, which is basically a huge machine for letting people do publishing without the per-copy costs that are associated with print publishing.

    The other problem is that ISBNs are supposed to uniquely identify an edition of the book. This makes sense for traditional print publishing, where the economics of production forced people to make discrete editions widely spaced in time. It makes no sense for print on demand or for pure digital publishing. I've written some CC-licensed textbooks. When someone emails me to let me know about a typo or a factual error, I fix it right away in the digital version, and I usually update the print-on-demand version within about 6 months. No way am I going to assign a different ISBN every 6 months.

    We can say that ISBNs are for printed books, not for ephemeral web pages, but that doesn't really work. The two overlap. My textbooks exist simultaneously as web pages, PDF files, and printed books. Amazon sells a book for the Kindle using one ISBN and assigns a different ISBN to the printed version. Print-on-demand books share some characteristics with printed books (e.g., they're physical objects) and some with the web (they can be updated continuously).

    By the way, why do you think library catalogs don't show ISBNs? It's because ISBNs are meant as commercial tools, like the barcode on a box of cereal. If Google finds ISBNs useful for purposes other than selling copies of books, it's probably because Google is trying to deal with a massive number of books using a minimum amount of human labor.
