Books Google Technology

Google Books Makes a Word Cloud of Human History 127

An anonymous reader writes "From Ed Yong at the Not Exactly Rocket Science blog: 'Just as petrified fossils tell us about the evolution of life on earth, the words written in books narrate the history of humanity. The words tell a story, not just through the sentences they form, but in how often they occur. Uncovering those tales isn't easy — you'd need to convert books into a digital format so that their text can be analyzed and compared. And you'd need to do that for millions of books. Fortunately, that's exactly what Google have been doing since 2004.' Yong goes on to explain that the astounding record of human culture found in Google Books offers new research paths to social scientists, linguists, and humanities scholars. Some of the early findings (abstract), based on an analysis of 5 million books containing 500 billion words: English is still adding words at a breathtaking pace; grammar is evolving and often becoming more regular; we're forgetting our history more quickly; and celebrities are younger than they used to be. You can also play with the Google Books search tool yourself. For example, here's a neat comparison of how often the words Britannica and Wikipedia have appeared."
This discussion has been archived. No new comments can be posted.

  • OCR errors (Score:5, Interesting)

    by SputnikPanic ( 927985 ) on Friday December 17, 2010 @03:25PM (#34591090)

    AFAIK, Google Books doesn't do the sort of methodical OCR clean-up that Project Gutenberg does, so a lot of Google's digitized books have a fair number of errors. It'd be funny to see what kind of blips this might create in our extracted cultural history!

  • by alcourt ( 198386 ) on Friday December 17, 2010 @03:37PM (#34591282)

    I wish the article had gone into more depth about grammar changes, rather than just word forms. For example, sentence ordering, comma usage, and various other grammar items would be more intriguing. I found burnt/burned the most interesting comparison because it showed an example of two competing versions of a word.

    Interesting idea, but as was stated in the article, there are definite limits to what this technique can study, and many are unconvinced of its value for more than highly limited problems.
