Google Books Makes a Word Cloud of Human History

An anonymous reader writes "From Ed Yong at the Not Exactly Rocket Science blog: 'Just as petrified fossils tell us about the evolution of life on earth, the words written in books narrate the history of humanity. The words tell a story, not just through the sentences they form, but in how often they occur. Uncovering those tales isn't easy — you'd need to convert books into a digital format so that their text can be analyzed and compared. And you'd need to do that for millions of books. Fortunately, that's exactly what Google have been doing since 2004.' Yong goes on to explain that the astounding record of human culture found in Google Books offers new research paths to social scientists, linguists, and humanities scholars. Some of the early findings (abstract), based on an analysis of 5 million books containing 500 billion words: English is still adding words at a breathtaking pace; grammar is evolving and often becoming more regular; we're forgetting our history more quickly; and celebrities are younger than they used to be. You can also play with the Google Books search tool yourself. For example, here's a neat comparison of how often the words Britannica and Wikipedia have appeared."
  • Case sensitive? (Score:5, Informative)

    by IWannaBeAnAC ( 653701 ) on Friday December 17, 2010 @03:25PM (#34591092)

    Interesting that it is case sensitive. Searching for "britannica,wikipedia" in lowercase produces, for the present day, close to zero for britannica and 0.00005% for wikipedia, which is not far off the result for Wikipedia (with a capital).

    Putting these together, the case-insensitive comparison has wikipedia already well ahead of britannica, at around 0.00010% for britannica vs. 0.00013% for wikipedia.
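
The case-insensitive figure above comes from adding the lowercase and capitalized results together. A minimal Python sketch of that arithmetic follows, assuming each series maps a year to a relative frequency drawn from the same corpus, so that every series shares the same per-year denominator and the shares can simply be summed. The function name `merge_case_variants` and the toy numbers are hypothetical; they are not part of the Google Books tool or its data.

```python
from collections import defaultdict


def merge_case_variants(
    series_by_form: dict[str, dict[int, float]],
) -> dict[str, dict[int, float]]:
    """Fold spellings that differ only in letter case into one series.

    Each input series maps year -> relative frequency (share of all words
    printed that year). Assumption: all series come from the same corpus,
    so for a given year they share a denominator and can be added.
    """
    merged: dict[str, dict[int, float]] = defaultdict(lambda: defaultdict(float))
    for form, by_year in series_by_form.items():
        key = form.casefold()  # "Wikipedia" and "wikipedia" share one key
        for year, freq in by_year.items():
            merged[key][year] += freq
    return {key: dict(by_year) for key, by_year in merged.items()}


# Made-up numbers for illustration only (not values from the viewer).
toy = {
    "Wikipedia": {2008: 3.0},
    "wikipedia": {2008: 2.0},
    "Britannica": {2008: 4.0},
}
print(merge_case_variants(toy))
# {'wikipedia': {2008: 5.0}, 'britannica': {2008: 4.0}}
```

Summing is only valid for relative frequencies taken from the same corpus and year; mixing corpora would mix denominators.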

  • by jfengel ( 409917 ) on Friday December 17, 2010 @03:48PM (#34591450) Homepage Journal

    Most of the actual hits there appear to be OCR errors for the words "suck" and "such", often because the medial (long) "s" resembles an "f". The "such" hit came from a badly speckled page.

    Given that the word "suck" was often used in the expression "to give suck", many of those pages are quite hilarious ("she would not suffer the strange lamb to fuck"). I didn't see any actual "fucks" in the first few pages of hits.

    I know that the word was known. Shakespeare made a sly reference to it in Merry Wives of Windsor. But I suspect it wasn't often set down on paper, at least not in the kinds of books that got preserved.
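
The OCR misreadings described in this comment, where a long (medial) "s" is read as "f", can be screened for mechanically. The Python sketch below is a hypothetical helper, not a feature of Google Books: it generates every spelling obtained by reading some "f" in an OCR'd token as a long s, then keeps those found in a word list. It makes the simplifying assumption that a long s can stand anywhere except as the last letter of a word, and the small lexicon is made up for the example.

```python
from itertools import product


def long_s_readings(token: str) -> set[str]:
    """Spellings obtainable by reading some non-final 'f' as a long s.

    Simplifying assumption: a long s can stand anywhere in a word except
    the last letter, where printers used the round "s".
    """
    last = len(token) - 1
    options = [
        ("f", "s") if ch == "f" and i != last else (ch,)
        for i, ch in enumerate(token)
    ]
    readings = {"".join(chars) for chars in product(*options)}
    readings.discard(token)  # keep only readings that differ from the OCR text
    return readings


def plausible_sources(token: str, lexicon: set[str]) -> list[str]:
    """Long-s readings of an OCR'd token that are real words in `lexicon`."""
    return sorted(r for r in long_s_readings(token) if r in lexicon)


# A tiny hypothetical word list; a real check would use a period dictionary.
lexicon = {"suck", "such", "suffer", "strange", "lamb"}
print(plausible_sources("fuck", lexicon))    # ['suck']
print(plausible_sources("fuffer", lexicon))  # ['suffer']
```

This only models the f/s confusion; damage like the speckled "such" page would need a broader check, such as edit distance against the lexicon.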

  • by CorpusProf ( 1961030 ) on Friday December 17, 2010 @05:50PM (#34593228) Homepage
    http://corpus.byu.edu/coha
    Corpus of Historical American English.

    -- 400 million words, 1810s-2000s.
    -- Allows for many types of searches that Google Books can't:
    * accurate frequency of words and phrases by decade and year
    * changes in word forms (via wildcard searches)
    * grammatical changes (because corpus is "tagged" for part of speech)
    * changes in meaning (via collocates; "nearby words")
    * show all words that are more common in one set of decades than others
    * integrate synonyms and customized word lists into queries
    * etc etc etc
    -- Funded by the National Endowment for the Humanities (NEH), 2009-2011.

    Take a look at the "Compare to Google/Archives" link off the first page.
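
COHA's collocate searches look at which words occur near a search term, which is one way a shift in meaning shows up over time. The Python sketch below illustrates the general idea with a plain window-based count and a crude lowercase tokenizer. It is an assumption for illustration, not COHA's own query interface or scoring; collocate tools commonly also rank results by an association measure such as mutual information. The function name `collocates` and the sample sentences are hypothetical.

```python
import re
from collections import Counter


def collocates(text: str, node: str, window: int = 4) -> Counter[str]:
    """Count words within `window` tokens on either side of each `node` hit."""
    tokens = re.findall(r"[a-z']+", text.lower())  # crude lowercase tokenizer
    node = node.lower()
    counts: Counter[str] = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        # Every token in the span except this occurrence of the node itself.
        counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


# Hypothetical snippets standing in for two decades of a corpus.
by_decade = {
    "1810s": "farmers broadcast the seed by hand across the field",
    "1930s": "stations broadcast the news and broadcast music nightly",
}
for decade, text in by_decade.items():
    print(decade, collocates(text, "broadcast", window=2).most_common(3))
```

Comparing the top collocates decade by decade is the basic move; a real study would normalize by corpus size and use far more text.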