Google To Digitize Much of Harvard's Library
FJCsar writes "According to an e-mail sent today to Harvard students, Google will collaborate with Harvard's libraries on a pilot project to digitize a substantial number of the 15 million volumes held in the University's extensive library system, which is second only to the Library of Congress in the number of volumes it contains. Google will provide online access to the full text of those works that are in the public domain. In related agreements, Google will launch similar projects with Oxford, Stanford, the University of Michigan, and the New York Public Library. As of 9 am on December 14, a FAQ detailing the Harvard pilot program with Google will be available at hul.harvard.edu."
Will it be like google scholar? (Score:5, Interesting)
I think this is a great start. There's incredible profit here too: universities spend millions on catalogue systems. If I could use one interface to search for books, chapters, and articles on a subject, I could spend more time actually learning and less time staring at the same damn "no results" page on GeoWeb. Grrrr.
The Fight against Plagiarism (Score:5, Interesting)
Re:Will it be like google scholar? (Score:2, Interesting)
-Stormy
Re:Are these volumes stored as text or pictures? (Score:4, Interesting)
Re:Will it be like google scholar? (Score:5, Interesting)
Or finding that perfect article in the MLA database, only to find out that nobody in Canada subscribes to the journal and nobody has it in full text either. I'd rather have a more comprehensive full-text database in plain text than scanned copies of everything anyway; it makes searching a hell of a lot easier.
How will the books be scanned? (Score:2, Interesting)
Re:Flipside: The false positive problem (Score:2, Interesting)
Using other people's work in academic writing is not only possible but greatly encouraged. Just make sure it is very clear what comes from whom.
In many ways, science is done exactly like Open Source software: take what you need, modify and improve it where appropriate, and give full credit where it is due.
As a teacher, I have given full points to a paper that had hardly any text of the student's own, as long as the sources were properly referenced and used together to make a valid point that none of them made individually.
So I do not think students should bother staying below the radar. Just reference everything, and voilà, you are doing science.
It's about Time! (Score:2, Interesting)
Someone hurry up with nanostorage so I can store the entire content of human knowledge on a postage stamp (with nanosecond seek times and gigabyte-per-second transfer speeds, of course).
Mailing Lists (Score:2, Interesting)
Re:Nice! (Score:1, Interesting)
Re:University of California is anti-digital (Score:3, Interesting)
Ever tried a Freedom of Information Act (FOIA) request? Strange as it may seem, that apparently works in the State of Washington.
Re:U of Michigan (Score:3, Interesting)
The size of the U-M undertaking is staggering. It involves the use of new technology developed by Google that greatly speeds the digitizing process. Without that technology -- which Google won't discuss in detail -- the task would be impossible, says John Wilkin, the U-M associate librarian who is heading the project.
"Going as fast as we can with the traditional means of doing this, it would take us about 1,600 years to do all 7 million volumes," he said. "Google will do it in six years."
Under the agreement, the library will get a digital copy of every book scanned. With those copies, the library can prepare special research projects, virtual exhibitions and more relevant scholarly and academic material for its students and faculty.
"If we were to do this job ourselves, it would probably cost us $600 million," Wilkin said. "That's just the human cost of preparing the material for scanning, packing it up and sending it out to vendors and then quality-control checking of the results. This is easily a billion-dollar effort."
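For scale, the figures Wilkin quotes (7 million volumes; about 1,600 years the traditional way versus six years with Google; roughly $600 million in human cost if done in-house) imply approximately the following, assuming a uniform pace and 365-day years:

```latex
\begin{aligned}
\text{traditional: } & \frac{7\,000\,000}{1600\ \text{yr}} \approx 4\,375\ \text{volumes/yr} \approx 12\ \text{volumes/day} \\
\text{Google: } & \frac{7\,000\,000}{6\ \text{yr}} \approx 1\,166\,667\ \text{volumes/yr} \approx 3\,200\ \text{volumes/day} \\
\text{speed-up: } & \frac{1600}{6} \approx 267\times \\
\text{in-house cost: } & \frac{\$600\,000\,000}{7\,000\,000\ \text{volumes}} \approx \$86\ \text{per volume}
\end{aligned}
```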
Items will start appearing in 2005, with completion predicted for 2010. Can you imagine how many libraries there are out there? The information that could be gathered seems endless. I'm guessing they'll come up with a good way to detect duplicates as more libraries are added, but as anyone who has wandered through a university library knows, there are a LOT of obscure books that seem never to have been widely published, and a LOT of material self-published by academics at the university itself (theses, postdoc research, etc.).
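One common way to flag near-duplicate texts (not necessarily what Google does; nothing here says how they handle it) is to compare each text's set of overlapping word sequences ("shingles") using Jaccard similarity. A minimal sketch, assuming plain-text input, 3-word shingles, and an arbitrary 0.8 threshold; the function names are hypothetical:

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in `text` (lower-cased)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Texts shorter than k words produce an empty set.
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def likely_duplicates(text_a: str, text_b: str, threshold: float = 0.8, k: int = 3) -> bool:
    """Flag two texts as probable duplicates when their shingle sets overlap heavily.
    The 0.8 threshold is an arbitrary illustrative choice."""
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold


if __name__ == "__main__":
    a = "Proceedings of the Annual Meeting on Bumblebee Migration"
    b = "PROCEEDINGS of the annual meeting on bumblebee migration."
    c = "A History of Harvard Library Cataloguing"
    print(likely_duplicates(a, b))  # True: same words, different case and punctuation
    print(likely_duplicates(a, c))  # False: no shared 3-word sequences
```

Shingling tolerates case and punctuation differences between catalogue copies, but it will not catch the same work in a different edition or language.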
Re:Will it be like google scholar? (Score:3, Interesting)
Why journals are expensive. (Score:5, Interesting)
No; the reason there are so few copies is that so few people want to read specialized journals. And the small audience only accounts for a small part of what many academic journals charge.
No; the problem is not overhead costs or small audiences. The problem is that the owners of much of that kind of content are greedy bastards. There is no reason for the outrageous price of some journals. Some scientific journal subscriptions run to tens of thousands of dollars, and even many liberal arts journals are far from cheap. And if you want to copy an article for your students to buy at Kinko's, expect them to pay 35 cents a page or more in copyright fees alone.
And many of them are worse than the RIAA when it comes to electronic access. Journal articles are included in databases sold to some universities. In some of these databases you can read articles only by loading a .gif of every page, one at a time: no copy and paste, no text access at all. So much technology goes into preventing copying that the online version is actually less useful than the dead-tree version rotting on the shelf.
I think this is a great move by Google and Harvard, and I like the idea behind Google Scholar, but I expect this kind of work to be resisted by many journals and professional organizations, to the extent that they have a say in it. This will be a huge boon for the availability of public-domain resources, but outdated perspectives on intellectual property are unfortunately likely to hold back real progress toward something systematically useful to scholars, at least until those perspectives change significantly.
Re:Oxford University gets every UK book published (Score:2, Interesting)
Re:Will it be like google scholar? (Score:3, Interesting)
Good professional search engines have many features that Google lacks. You could search for two words located within 3 words of each other. You could search for those two words within 3 words of each other while two other words do not occur within 6 words of each other. Indexes are generally well thought out, and vocabularies are sometimes controlled.
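The proximity operators described above can be illustrated with a toy sketch. It assumes "within k words" means the two words' positions differ by at most k, uses a naive regex tokenizer, and the query terms are hypothetical; real bibliographic databases use positional indexes rather than rescanning each document, and their exact operator semantics vary.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(tokens: list[str], a: str, b: str, k: int) -> bool:
    """True if some occurrence of `a` is at most `k` positions away from some occurrence of `b`."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= k for i in pos_a for j in pos_b)


def matches(text: str) -> bool:
    """Hypothetical query: 'bumblebee' within 3 words of 'migration',
    AND NOT 'honey' within 6 words of 'farm'."""
    tokens = tokenize(text)
    return near(tokens, "bumblebee", "migration", 3) and not near(tokens, "honey", "farm", 6)


if __name__ == "__main__":
    print(matches("A study of bumblebee migration patterns"))                  # True
    print(matches("Bumblebee migration near the honey farm"))                  # False: excluded pair is close
    print(matches("Bumblebee colonies show a seasonal pattern of migration"))  # False: terms 7 words apart
```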
Google allows many of these features, but they're cumbersome to use. If I run two searches and want to merge the results, I have to copy down everything I did and try to concoct an advanced search that combines the two sets of parameters. In a decent professional search tool, you just ask it to return "set 1 or set 2", giving you a set 3 that contains every item that appeared in either. This is powerful and easy to use, and Google doesn't compare.
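The numbered-set workflow can be sketched the same way. `SearchSession`, `search`, and `combine` are hypothetical names, and each search here is a plain single-word match over an in-memory corpus; the point is only that saved result sets can be combined with set union (OR), intersection (AND), and difference (NOT) without retyping either query.

```python
import operator
import re

# Map query-language keywords to the corresponding set operations.
OPS = {"OR": operator.or_, "AND": operator.and_, "NOT": operator.sub}


class SearchSession:
    """Keeps a numbered history of result sets: set 1, set 2, ..."""

    def __init__(self, corpus: dict[str, str]):
        self.corpus = corpus
        self.sets: list[frozenset[str]] = []

    def _save(self, ids) -> int:
        """Store a result set and return its 1-based set number."""
        self.sets.append(frozenset(ids))
        return len(self.sets)

    def search(self, word: str) -> int:
        """Naive single-word search; returns the number of the new set."""
        word = word.lower()
        hits = {doc_id for doc_id, text in self.corpus.items()
                if word in re.findall(r"[a-z0-9]+", text.lower())}
        return self._save(hits)

    def combine(self, a: int, op: str, b: int) -> int:
        """Combine two earlier sets with OR, AND or NOT; returns the new set's number."""
        return self._save(OPS[op](self.sets[a - 1], self.sets[b - 1]))

    def results(self, n: int) -> list[str]:
        """Document IDs in set n, sorted for stable display."""
        return sorted(self.sets[n - 1])


if __name__ == "__main__":
    session = SearchSession({
        "doc1": "Bumblebee migration patterns",
        "doc2": "Honey bee farming",
        "doc3": "Bumblebee foraging behaviour",
    })
    s1 = session.search("migration")    # set 1: doc1
    s2 = session.search("honey")        # set 2: doc2
    s3 = session.combine(s1, "OR", s2)  # set 3: "set 1 or set 2"
    print(s3, session.results(s3))      # 3 ['doc1', 'doc2']
```

Storing every result set, rather than only the latest one, is what lets any earlier search be reused later in the session.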
Don't get me wrong, I'm glad Google is going into this business. I no longer have free access to just browse the literature any time I feel like it, and this tool would provide that. I just don't think they'll put the commercial services out of business anytime soon.
Personally, I think that all articles written using federal funding should be released into the public domain. The NIH could sponsor journals if none of the commercial journals are willing to publish works that have no copyright. If my tax dollars were used to pay for a study on bumblebee migration patterns, then I should be able to thumb through the report whether or not some bureaucrat thinks that I have a need to know the results. And doing so should not require a trip to some non-public library halfway around the country...
Re:Nice! (Score:0, Interesting)