Boiling Down Books, Algorithmically
destinyland writes "A year ago, Aaron Stanton harangued Google over his new project, a web site analyzing patterns in books to generate infallible recommendations. In March he finally finished a prototype which he showed to Google, Yahoo, and Amazon, and he's just announced that he's finally received a big contract which 'gives us a great deal of potential data to work with.' The 25-year-old's original prototype examined over 200 books, plotting 729,000 data points across 30,293 scenes — but its universe of analyzed novels is about to become much, much bigger."
Just one more errosion.... (Score:5, Insightful)
The difference between now and 100 years ago becomes more apparent each day. Then, owning books was a sign of affluence, of intelligence. Now? Everything is open to question, and should be. Analyzing books and other public material is just another step in putting intelligence out there for everyone, not just those who can afford it. I applaud it, and all the dangers it brings. Such hurdles are necessary, but we must assault them to overcome barriers that should no longer exist.
If you already read, you don't need this... (Score:5, Insightful)
...and if you do not read, you won't want this.
Re:Just one more errosion.... (Score:5, Insightful)
Knowledge, not intelligence.
I'll believe it when I see it (Score:5, Insightful)
Yet Another Pointless Dot-Com (Score:4, Insightful)
algorithm bombing (Score:4, Insightful)
How long before someone figures out how to fool the algorithm, and we all start reading books about enlarging our genitalia, but in a classy way?
Re:Just one more errosion.... (Score:5, Insightful)
What really hits a nerve with me is why the scientific community hasn't opened up all their journals for others to read. I imagine many retired and amateur scientists, engineers, hobbyists, etc, would have a lot of insight into many engineering and scientific problems and also make many discoveries as well. Intelligence is not limited to the credentialed, those of high status, or the currently employed; many discoveries happen simply through exposure to as many minds as possible, and through finding connections and errors in others' work.
Re:I'll believe it when I see it (Score:3, Insightful)
I wholeheartedly agree! Take for example two phrases which are equivalent...
"Eighty seven years ago our ancestors ..."
and
"Four score and seven years ago our forefathers ..."
They say the same thing but what a difference in eloquence.
Re:a tool (Score:4, Insightful)
A different book (Score:1, Insightful)
Re:Just one more errosion.... (Score:5, Insightful)
Re:Just one more errosion.... (Score:5, Insightful)
Re:Yet Another Pointless Dot-Com (Score:4, Insightful)
Of course all the problems you listed apply anyway. It's very easy to have a work with all the same pieces as a great work of art, but assembled in such a way that the derivative work is completely unsatisfying.
A great example of something similar that you can try today and watch as it fails miserably is Pandora.com. They categorize music by a number of different elements so they can recommend similar pieces. And their categorization is quite accurate; they correctly surmised after about five minutes that I enjoy symphonic rock with a mix of acoustic and electric guitars, obscure lyrics, complex themes, unusual rhythm patterns and interesting chord changes. They then proceeded to present me with one shitty Coldplay or Radiohead rip-off band after another, each of which had every element down perfectly but still managed to make amazingly bad music. I fully expect this product to be the same thing but with books.
"If you like A Deepness In The Sky, why not try An Ewok Christmas: The Novel! They're both about humans meeting strange aliens, and spaceships, and computers, and why are you giving me the finger?"
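To make the failure mode concrete, here is a rough sketch of attribute-based matching. I have no idea how Pandora or Stanton's prototype actually score things; the attribute names, the numbers, and the use of cosine similarity are all my own assumptions for illustration. The point is just that two works can line up almost perfectly on the checklist while the one thing you care about (whether it's any good) never appears in the vector.

```python
from math import sqrt

# Hypothetical attribute scores (0.0 to 1.0). The attribute names and values
# are invented for illustration; they come from no real catalog or service.
ATTRIBUTES = ["aliens", "spaceships", "computers", "complex_themes"]

BOOKS = {
    "A Deepness In The Sky": [1.0, 1.0, 1.0, 1.0],
    "An Ewok Christmas: The Novel": [1.0, 1.0, 1.0, 0.2],
    "Pride and Prejudice": [0.0, 0.0, 0.0, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def recommend(title, catalog):
    """Rank every other title by attribute similarity to `title`, best first."""
    target = catalog[title]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in catalog.items()
        if other != title
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in recommend("A Deepness In The Sky", BOOKS):
        print(f"{score:.2f}  {other}")
```

With these made-up numbers the Ewok book comes out as the top match, because nothing in the attribute list measures quality. Adding more attributes narrows the gap but doesn't obviously close it.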
Re:Just one more errosion.... (Score:5, Insightful)
It's getting to the point that you need a 2+ year filter just to dampen the noise in the signal.
And let's give a shout out to all of the library homiez. While I'm affluent enough to afford the occasional impulse book at the store with the built-in coffee shop, I do recall many an hour of random wandering in the public library in my youth.
Re:Newspeak (Score:3, Insightful)
I'd do anything to get a decent government again.
"Be thankful we're not getting all the government we're paying for." --Will Rogers
Re:Just one more errosion.... (Score:5, Insightful)
Referees and Peer-Review: Referees are invaluable because someone has to objectively assess articles for basic scientific merit and rigor. The better journals can recruit referees for each submission who truly grok the subject matter and can often work very productively with an author. Quite a number of important advances are made and pitfalls avoided because a referee insisted that a researcher cover her bases before submission. Of course, nobody claims that PR journals are bullshit-free, but they are certainly far better than un-reviewed sources like arXiv.
This function is especially important for readers in multidisciplinary fields (myself included) who often read papers on subjects in which we are not expert enough to know what constitutes sound science. When I read about some group that has extracted and crystallized some protein, I'd like to know that someone competent at the relevant techniques has scrutinized their methods because I haven't the faintest clue (I'm a physicist by training, a biophysicist by necessity).
Prestige and Selection: Another important function of the journals is to select articles by importance. If a paper makes Nature or Science, that's usually a good indicator that the authors have made an important advance. The benefits of this selection are twofold. First, readers can keep tabs on work at the forefront without wading through lots of papers. It sounds lazy, but most of us cannot read every paper that is published and are quite glad to outsource some filtering to the journals.
Second, it allows authors to demonstrate to people outside their immediate field what caliber of work they've done. Even among people in the same department, it's not immediately clear what qualifies as a breakthrough work (as opposed to incremental work, which I don't trash in the least bit, but it's not really the same hat). Prestigious journal cites are a good substitute, especially when the alternative is to either become an expert in the field or find one and ask.
Review Articles: Most journals have an in-house staff to write articles reviewing the state of a particular field/technique/whatever. This is also an invaluable service because sometimes one needs a broad, textbook-level summary instead of a large number of discrete, deep papers on a topic. Given that science is done in small, insular little bits, it's natural that there is room for someone to aggregate and summarize those bits and put them into a larger perspective.
Editing: Another thankless job (the snarky comments about the /. eds belie the fact that editing is hard work). Dupes are weeded out and researchers with poor language skills (especially when writing in an adopted language) are given help communicating their ideas. Confusing or unclear language is massaged back into form, figures are well-presented and well-labeled, text is formatted to be easy on the eyes, references are given in a standard form. These things count more than most /.ers realize (Knuth was on to something, guys...)
Access: In brutal honesty, we don't really care about the access restrictions. Every university has a license to pretty much all the major journals. We can get them from wherever with a quick login and so can everyone we know. Sorry, but that's the truth.
Re:Just one more errosion.... (Score:2, Insightful)
What really hits a nerve with me is why the scientific community hasn't opened up all their journals for others to read. I imagine many retired and amateur scientists, engineers, hobbyists, etc, would have a lot of insight into many engineering and scientific problems and also make many discoveries as well.
I like your spirit and agree that there are a lot of really smart, creative people who aren't scientists. However...
One of the dispiriting things about science is how specialized most subjects have gotten. If you're not an expert in a field, it's almost impossible to do anything. Even being an expert in a closely related field often isn't good enough. I don't think this is anyone's fault; it's just the natural course of development. So I think the days of amateurs accomplishing very much are behind us in a great many fields.
Another issue is that the people who fund scientists often aren't sufficiently literate to distinguish good science from phony science. Scientists are threatened professionally by people who peddle counterfeit knowledge, which has an effect on their fields similar to the effect a flood of counterfeit currency would have on an economy. So they try to protect themselves by controlling the validation of information and the supply of scientists. This legitimate desire is of course twisted by other less honorable inclinations. But there's a legitimate motivation there also.
Re:Just one more errosion.... (Score:5, Insightful)
I dunno, man. Pretty much every point you covered is Wiki-able.
Re:Just one more errosion.... (Score:4, Insightful)
...will encourage people to stay with the safety of yet another rehash of something they've already read.
Like most people since mass printing became possible. There are many authors whose work would give you great satisfaction, but whom you will never read. Perhaps giving people a good selection of tailored recommendations would at least improve the quality of that selection.
The span of taste is wide and varied, more so than any bookstore could provide for (unless it is online). However, when you take things online you encounter another problem: there is a truly vast (and growing) number of books available for purchase, so trying to create a system for automated recommendation is a logical goal, even if such a system doesn't encourage reading things outside your established field of interest. If you arrive at a point where you need something different, a good system should be able to let you browse the top sellers, best-reviewed titles, and established classics of any genre. I have no doubt that after various tries, failures, and breakthroughs, and as technology improves, consumers of literature will be given a good online, digital tool for searching through databases and lists of material they might enjoy.
Re:Just one more errosion.... (Score:4, Insightful)
...they don't have to struggle to learn something...
Not even Mensa is that arrogant. If it's easy to learn and/or comes via the ether, it's probably trivial. Intelligent people have to work every bit as hard; we just expect a whole lot more from them.
Historic records, yes. (Score:3, Insightful)
Other old journals will likewise have a lot of valuable information in them. Archaeologists discover a lot through searching their own journals, discovering lost and forgotten reports of discoveries. Mathematicians routinely publish in arcane and super-obscure journals, making what is known far more extensive than what is known to be known.
Re:Just one more errosion.... (Score:2, Insightful)
Or wisdom, for that matter.
What about insight?
Re:I'll believe it when I see it (Score:2, Insightful)
Exactly ... which is why I read the summary as "Fast-talking kid talks fools out of their money."
Re:Just one more errosion.... (Score:5, Insightful)
Much better to blame the researchers for not publishing in a more open medium. They're the ones who might actually change their habits, after all.
Re:Just one more errosion.... (Score:2, Insightful)
That is fascinating: somebody came up with another way to dig for patterns in mountains of data, thus creating even more data to dig into, and people claim it is intelligence, wisdom, or knowledge and that everything has changed. It is true, of course. One big change between now and then (whenever that would be) is that today any ignoramus connected to the internet and equipped with basic reading skills can claim he possesses all the knowledge of the world. Sadly, the fact that more people have more and easier access to information has changed one thing: they do not have to think for themselves; they can read the answers on the web. In a sense nothing has changed, though: reading with understanding is limited to the very few intelligent, knowledgeable, and (an even bigger rarity) wise people. As for the author of the software in question, maybe "knowledgeable" fits him to some extent.
Re:Just one more errosion.... (Score:4, Insightful)
Things become easier to learn when you have more context. If I tell someone about a chemical reaction, they may struggle to remember or understand it. If they are a chemist or have at least a grounding in the subject, they'll be able to slot the knowledge in with associated information and thus understand it and recall it more easily. An "intelligent" person has a wider array of contexts or the ability to quickly find an appropriate context. I am fairly intelligent, but I still don't recall who scored what goal in what match the way some people do. These people absorb such information because they have context as much as for any other reason.
Re:Just one more errosion.... (Score:3, Insightful)
Charging people to read the journal creates only a false sense of prestige. Genuine prestige arises from the quality of the articles in the journal.
Quick addendum: I don't think you understood my meaning. The prestige comes from the quality of the articles that you've written, not acceptance into an elite journal. The point is that the elite journal provides a very useful proxy for assessing the quality of the work. A researcher's publication history provides a very good measure of the quality and focus of his work, which would otherwise be difficult to gauge (it would require becoming expert in the field(s), reading all the articles, and then judging their merit, which is way too much work).
Re:Just one more errosion.... (Score:2, Insightful)
If you wish to spend your nights reading information from 2+ years ago, that is your problem. The rest of us want today's information, and now.
First: there is absolutely nothing wrong with reading information that is 2+ years old. Information itself does not become outdated. News may, but that's only a small portion of all available information. Unless we get some stunning development in calculus, for example, a student in 10 years will be just as well off reading a calculus book published today as a calculus book published 9 years from now. Furthermore, not all books are nonfiction books. Fiction is just as good whenever you read it... or are you saying that all those poor saps who still enjoy Shakespeare's work are just deluding themselves?
Second: as useful as the Internet is for rapid dissemination of information, printed media still has its place. It's generally far easier to organize and reference a personal-sized collection of books than it would be with electronic versions. Large collections are easier to manage electronically, but the average individual's collection is probably better off on dead trees. Books have the advantage of being more tactilely pleasing, as well: it feels a lot nicer to turn physical pages than to scroll down with your mouse wheel.
There's a place for both media; it isn't as simple as you paint it, where the enlightened use electronic media, and only Luddites use print.
Re:Just one more errosion.... (Score:3, Insightful)
I hear you, and I know it's difficult. If it were easy, we wouldn't be talking about it; it would be done already.
I'm just saying that the change must come from the researchers, not the journals. The traditional journals have nothing to gain and everything to lose from going to a more open system, so looking to them for change is the wrong thing. The researchers are ultimately the ones who decide what's reputable and what's not. It will surely take a long time, but if it's going to happen at all then that's where the change will come from.