Proposal: Put Library of Congress' Contents Online
Mark_Uplanguage writes "The idea to scan in all materials available at the U.S. Library of Congress was presented at the Web 2.0 conference this week (as just one of many ideas presented). The proposed cost of $260 million would create a huge benefit to society (well, at least to those who can read English)."
Er (Score:5, Insightful)
Can't have that, now can we?
Re:Er (Score:2)
I can't see how this could be done with any kind of public access to most of the content.
Re:Er (Score:5, Insightful)
Re:Er (Score:5, Insightful)
And that's exactly the biggest mistake people keep making; analogies don't work. The stuff we are dealing with is *new*. A library != Internet. There is no analogy.
I'm not saying that I have a solution to any of this, but I think the first thing people will have to realize is that things have changed in a dramatic way. The traditional way of thinking about IP (or really, information) no longer works.
There is no simple answer to any of this, and it makes no sense to come up with analogies and try to justify or make judgement based on that.
Fact of the matter is, all of a sudden it is possible for people to view/copy information pretty much instantly. What we need to realize is that _we_ are the ones that can/will put together the foundation of how to deal with this. No current laws really are suitable. Look at the mess with P2P networks and the music industry. Surely P2P networks _should_ be perfectly legal, but on the other hand if copying music would become so easy that you could listen to any song you'd like, at any given time without paying for it, it's hard to imagine how artists will be paid (and please don't give me the "they'll have to do live performances to make money" bs).
The people that will be able to figure out what the _real_ answers are to these issues are the ones that will do really well. Think about it.
Re:Er (Score:2)
BTW, check out half.com (shipping is a bit better, usually) or your local goodwill. I don't go there much, but I've found hardcovers that were out for less than a week on the shelves in mine for around $3. (not quite sure how this works, but whatever)
Missing something? (Score:5, Funny)
1. walk in and pick up a book
2. strike the author's name from it and replace it with your own
3. replace the copyright notice with your own
4. Make one thousand perfect copies
5. Offer it for sale, start taking orders, and PROFIT!
I could easily do that on the internet.
Re:Missing something? (Score:2)
Re:Missing something? (Score:2)
Re:Missing something? (Score:3, Insightful)
Re:Missing something? (Score:2)
Ok I'll stop now. I do see your point. It might work, but I think the "visionary" skimped on the details a little bit.
Re:Er (Score:3, Informative)
We're specifically talking about the Library of Congress, which has millions of books, not your local library with maybe 100k or so (I remember my university had about 800k books, probably a million by now). The idea is not to give access to the NYT bestsellers, but rare books that you would have a hard time finding anywhere else.
Re:Er (Score:5, Insightful)
The chief benefit? Even if the original is lost or destroyed, the digital version lives on - a big issue, assuming that ANY item ever enters the public domain from now on, the way that they were supposed to. Hell, I'd lay out money for a copy of the Library of Congress on a set of Blu-ray discs, and so would many large corporations (those that still have research labs, that is), universities and colleges, as well as other organizations and governmental entities around the world.
Re:Er (Score:2, Informative)
I can't see publishers liking the idea of an online Library of Congress at all. Viewers would be able to make their own e-books at a whim. Not that *I* would mind, but . . .
Re:Er (Score:5, Insightful)
See, the ancient world had many items of great wisdom, and many of the only copies of these works were contained there. The burning of the great library was the end for countless such works.
Today, however, our knowledge is much more widely spread. We all owe a tremendous debt to Gutenberg, whose printing press (movable type press, 1436) made this possible.
It's quite arguable that the dawn of the Renaissance stemmed not from Galileo, or Kepler, but from the widespread nature of books in general after the movable type printing press made this possible.
How many of these works are unique or very rare? I'd consider that a large percentage of these works fall into this category - in which case it would be a wonderful thing to build some redundancy into the preservation of not only these works, but the wisdom, insight, and humor contained therein!
Warm up the scanner, says I!
Re:Er (Score:5, Insightful)
Assuming a large sum of money is spent maintaining the digital versions. Computers lose and destroy data, even good computers fail. So it would require good backups done on a regular basis. File formats tend to change too.
Re:Er (Score:3, Funny)
The ghost of Sonny Bono will haunt you forever, being sure that you know nothing will ever reach the public domain again...
Re:Er (Score:2, Troll)
Who said anything about "free"?
Although this would potentially take dialog about the public domain out of obscurity and into the LoC mainstream, and the LoC does have some influence in the copyright debate. Certainly once the data exists, anywhere, it is going to be harder to make the argument that we should just throw it away, no matter what the reason.
Re:Er (Score:2)
Merely that the contents that could be made free would be, the rest would be up for a price (based on what you need) -- something like a huge searchable database, where you pay for what you access.
I can't bother looking for it, but I think it's more likely that is the case. Especially considering the point that you brought up.
Re:Er (Score:5, Interesting)
Creating primarily for money is shortsighted when a work has the chance to impact the larger culture. Just look at Michael Moore (ooh, isn't he ugly, but that's not the point), he's more interested in people seeing and being influenced by his movies than in getting richer off them. Enough money to be comfortable is great, but then, barriers to free movement of ideas should be relaxed.
Re:Er (Score:5, Interesting)
Can't have that, now can we?
No, we can't... it would not be fair to lots of people whose copyrights haven't yet lapsed.
But scanning the materials is _still_ a good idea. It allows for automated OCR that allows searching for text _within_ a book (like A9.com does, and as Google plans to do.) The difference is that all books published in the US could be searched.
It would also make this scenario possible:
Since this process is handled by people trained to respect copyright (i.e. the librarians), it is a win-win for everyone.
Pilot Program (Score:3, Insightful)
No, we can't... it would not be fair to lots of people whose copyrights haven't yet lapsed.
Let us scan only things for which the copyright has lapsed. This has several advantages.
Re:Er (Score:3, Interesting)
About funding being scarce: after initial seed funding by the government, a library should easily be able to fund infras
Re:Er (Score:3, Interesting)
> with government-subsidized pseudo-businesses.
Sure they do. But I imagine Kinkos and B&N would have access to the same LoC book database, and would be able to print books for purchase too (otherwise it would be unfair to them). This just puts libraries on an even footing - libraries that don't want to sell books to the public could stay that way.
> If you want information or to read a book, go to the library.
> If you wan
Scupper copyrights (Score:2, Interesting)
I would say, scupper copyrights for all volumes owned by LoC. Scan and put every volume on the internet.
Within a few years we would witness a Renaissance of sorts once again in human knowledge and education.
Can't do that. (Score:5, Funny)
Re:Can't do that. (Score:3, Interesting)
Re:Can't do that. (Score:2)
Information is a commodity just like any other good or service. I can only imagine your surprise when you realize that the internet hasn't changed that fact.
Re:Can't do that. (Score:2, Interesting)
Re:Can't do that. (Score:3, Insightful)
death+70 years would actually be about 4 generations (if you include the author as the first).
Re:Can't do that. (Score:3, Insightful)
Their children can go out and get their own damned jobs. They would then be making a productive contribution to the economy.
My grandpa was a farmer who died over 50 years ago. Since I don't get to collect royalties on the corn he grew in the 1930s, I've had to work to produce my own income. Imagine that.
Re:Can't do that-Inheritance. (Score:5, Insightful)
I might inherit a portion of his farm. But that's a result of money that he saved at the time. I do not collect royalties on the *work* that he did 70 years ago.
If an author or musician wants to leave an inheritance, then they should save the money they make during a reasonable copyright term, and give that to their children. They can leave their typewriters, musical instruments, and other tools of the trade (analogous to a farm) as well.
They might have to actually forgo blowing everything they earn on cocaine and refrain from signing away most of their income on bad contracts to achieve this, but then so do the rest of us.
Re:Can't do that. (Score:5, Interesting)
You must mean currently. But we all know that as soon as anything major (like Steamboat Willie) comes close to coming out of copyright, we'll see Congress extend the term of copyright yet again, thanks to 'encouragement' from Disney.
Copyright terms are nigh on infinite in fact, if not in law.
Re:Can't do that. (Score:3, Funny)
The internet has stripped away the convenient medium(s) that used to contain an inherently scarce message that could physically command the price you asked for. The new reality of the situation is that either you think DRM + DMCA can and should be used to keep doing things the old way, by keeping a decades-old instance of information artificially scarce, or you think-- like millions already do --that information is cheap, and the value lies i
Re:Can't do that. (Score:2)
Re:Can't do that. (Score:5, Funny)
Hey, it's still a finite time
- Walt Disney
Re:Can't do that. (Score:2)
well, at least to those who can read English (Score:3, Funny)
So yes, it would benefit society as a whole.
Grump.
I disagree (Score:2)
So yes, it would benefit society as a whole.
Subjecting the world to the Babelfish translator would actually detract from knowledge considering the horrible linguistic bastardizations that people would then take as fact.
Re:I disagree (Score:2)
Re:well, at least to those who can read English (Score:2)
Or a speech synthesizer (assuming the text is available as text and not images) such as festival, if your vision isn't very good - also something you can't do with the dead tree version.
-jim
obligatory babelfish translation (Score:2)
With that who can't, they can reproduce and stick inside their language teacher.
For once (Score:2, Funny)
Storage (Score:5, Funny)
Re:Storage (Score:5, Funny)
Re:Storage (Score:4, Funny)
Re:Storage (Score:3, Funny)
Re:Storage (Score:5, Funny)
Re:Storage (Score:4, Informative)
Those first two estimates are based on the text content alone. If the graphical contents of those books were rendered into digital format. The third one assumes maps, photographs, sound recordings, etc.
Re:Storage (Score:2)
Err
Re:Storage (Score:3, Interesting)
Heh. Whichever it turns out to be, the LoC, being yet another part of the federal government, will probably make it available for viewing/downloading as a single PDF file.
PDF sucks [useit.com].
Re:Storage (Score:3, Funny)
Re:Storage (Score:2)
Then multiply that by the number of pages in a book. I don't know what the average is but just taking 200 pages as a WAG that would mean 1 MB per book. 26 million books would then be 26 million MB == 26 TB.
If on the other hand the pages were stored as images, I think you can safely assume a 10-20x increase. If each page was say a 100kb JPEG, then that would
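The arithmetic in this comment can be checked with a quick sketch. The 5 KB-per-page text figure is an assumption implied by "1 MB per book" at 200 pages; the 100 KB JPEG size is the comment's own guess, and terabytes here are decimal (10^12 bytes).

```python
# Back-of-the-envelope storage estimate using the guesses from the comment above.
BOOKS = 26_000_000
PAGES_PER_BOOK = 200            # the comment's wild guess
TEXT_BYTES_PER_PAGE = 5_000     # assumed: 1 MB per book / 200 pages
JPEG_BYTES_PER_PAGE = 100_000   # the comment's "say a 100kb JPEG"


def total_terabytes(bytes_per_page: int) -> float:
    """Total collection size in decimal terabytes (1 TB = 10**12 bytes)."""
    return BOOKS * PAGES_PER_BOOK * bytes_per_page / 1e12


text_tb = total_terabytes(TEXT_BYTES_PER_PAGE)
image_tb = total_terabytes(JPEG_BYTES_PER_PAGE)
print(f"text only: {text_tb:.0f} TB, page images: {image_tb:.0f} TB")
```

Under these assumptions the page-image figure is 20 times the text-only one, which sits at the top of the 10-20x range guessed above.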
We need to get our priorities straight (Score:5, Insightful)
Re:We need to get our priorities straight (Score:5, Funny)
I'm sorry, I don't get it. How does your proposal bomb anybody?
Are you suggesting we should bomb libraries?
I mean, I see libraries, I see money, but I'm missing the bombs.
Tell you what, rewrite your proposal with bombs and maybe some cool submunitions and make sure they're Furin libraries, and we'll talk.
Units?! (Score:2, Funny)
Someone, please.. how much is that in LOC?
Re:Units?! (Score:2)
One of the More interesting projects (Score:5, Interesting)
Re:One of the More interesting projects (Score:3, Informative)
All material is copyrighted at the instant of creation. All of it. You write a love letter to your girlfriend and it's copyrighted. It's all copyrighted! Beyond that, you're requiring them to *present* copies. I'm assuming this is to the LoC.
You could make a case for this when a copyright is *registered*, but please don't make a blanket statement like that without first engaging brain.
Only English? (Score:4, Informative)
Correct me if I'm wrong, but doesn't the LOC contain all materials registered with the US copyright office? In which case it would have any foreign materials registered for copyright protection.
Re:Only English? (Score:2, Interesting)
Their collections policy statement [loc.gov] states that they only keep material specific to their very broad mission statement. This means that they will not keep a copy of a laundry list they received through the copyright office.
Re:Only English? (Score:2)
Re:Only English? (Score:5, Informative)
[T]he Library assumed a role as a legal repository to guarantee copyright protection. All authors seeking American copyright had to submit two copies of the work to the Library. This requirement is no longer enforced, but copies of many books published in the US still arrive at the Library regularly.
Damn trolls.
Re:Damn YOU, idiot. (Score:2)
If you had actually read my post, you would have realized that I was referring to any non-english works (foreign written or US written) that specifically sought out US copyright protection. Because, you know, it's not like foreign authors never register their copyrights in other countries.
<sarcasm>You may now begin hailing my amazing wisdom and knowledge.</sarcasm> OR, you could just be a nice guy and
Re:Eat a bag of dick, moron. (Score:2)
Look up the Berne convention [wikipedia.org]. The US didn't join it until 1989. That leaves over a hundred years of work that had to be explicitly copyrighted in the US.
And FYI, non-English works account for less than 2% of the total volume of the Library of Congress
Which is neither here nor there. And by making this statement, you are agreeing with me that the LOC has a substantial selection of non-english works.
unlike your delusional and paranoid rantings and ravi
Homeland Security Savings (Score:3, Funny)
It would probably pay for itself too since FBI agents would no longer have to travel to libraries to secretly gather records of who borrowed what. They can just use Carnivore to do it instead.
Only 260 Million? (Score:2)
Re:Agreed, what about labour? (Score:2, Informative)
Ametrica! (Score:5, Funny)
1 Library of Congress = $260M
And the 2004 US Federal budget can be spec'd at 0.000243754522 [google.com] LoC:s (Libraries of Congress per second).
Scanning is nice but.... (Score:2)
"Brewster Kahle's idea is to scan as many books as possible and put them online so everyone has access to that huge amount of knowledge."
The plan IS to put it online, after all...
search me (Score:2)
The Whole World (Score:2)
How about the whole world who can find any online translation service that goes from English to Local Dialect.
The theory of everything (Score:3, Funny)
Only the first step (Score:4, Insightful)
Re:Only the first step (Score:2)
Halfbaked (Score:3, Funny)
a real application for internet2 (Score:4, Interesting)
I'm not aware of any PIAA for publishers, but somebody is going to have a problem with this. And by the time this actually happens, I bet there will be an Internet4 that can do it all in 20ms.
More dotcom hype... (Score:3, Insightful)
If this is such a wonderful idea why doesn't he get a bunch of artists, musicians and writers to donate their own work to this project and actually prove the concept works?
I'm tired of all the rhetoric about business models failing and how the web is going to transform the way society learns, works, and entertains themselves. The dotcom era should have taught these so called visionaries one thing, you actually have to have a business plan before you can transform business models.
If these business models are so full of potential he should start one, with his own intellectual property, and prove that the old-economy intellectual property businesses are extinct. If his ideas work then the dinosaurs of the MPAA and RIAA will either have to adapt to the new economy or die. Forcing them to risk their entire business on a gamble like this is wrong from any perspective.
Re:More dotcom hype... (Score:5, Insightful)
Work for who? I think you are still confused from the dotcom era. You must be thinking that "change society and business" means that scanning the entire LoC can make someone money (advertising??)
The important part in this case is the changing society part of the statement, which is what the vast potential of the net is capable of doing. It won't help you make money based on a bad idea (in fact, it may only help you lose money faster!) but it does have the potential to change the way a society views and deals with information.
Right now there is a vast amount of knowledge in the LoC that is effectively out of the ordinary citizen's hands. That is not how it should be. If knowledge is power, there is a storehouse of power waiting to be unleashed by giving everyone access to what is being stockpiled. It won't happen overnight, or in a few years, but eventually it will have a ripple effect. Historians lament the loss of the Great Library of Alexandria, but what difference would it have made if only a few could actually use the information that was contained?
Fuzzy math on storage reqs (Score:4, Informative)
That's wildly underestimated IMO. The LOC has 26 million books. If we conservatively assume that they each have at least 100 pages, that is 2.6 billion images. That equals 0.03 kb per image. That's some REAL good compression for an image as large as a full page of text.
What, you want me to starve to death? (Score:3, Funny)
Great (Score:3, Funny)
Now we'll be able to test their notions!
Library of Congress Transfer Rates (Score:3, Funny)
this makes news? (Score:3, Insightful)
So the only surprise to me is that we're just now hearing a proposal to do this??? Sheesh, if I hadn't thought it so completely obvious to every netizen at those old public library terminals, I woulda lost so much sleep making it happen!!!
So now who's going to do it? And while it's limboing through Congress, can we just put together a consortium to visit this library we already own with our digital cameras and OCR the thing into existence... how many of us would need to donate our 1 GB Gmail accounts to store it all?
Now I have a real problem... (Score:3, Funny)
"260 million" (Score:2, Flamebait)
Human's Book Pool (Score:4, Insightful)
Not only the Library of Congress of the United States of America, we should also scan every big library in the world to create a pool of human work to freely share and preserve.
hmmm.... (Score:3, Funny)
So that leaves out most Americans. Thanks from the rest of the world!
(tongue firmly in cheek)
Better Access for Everyone (Score:3, Interesting)
What a cool idea and, even "if" the dollar estimate is too low, who cares? $260M is chump change for our gov't.
Right now, the only way to access the stuff in LoC is to go there in person. Anyone can do it but you have to travel to WashDC and pass through security and so forth to get into the LoC public reading room. Then you have to ask the librarian to pretty-please bring you the book that you want.
Now imagine that you can access any item in the LoC by simply entering the building and using a public kiosk with a browser. LoC's software would only permit use within the copyright so that is OK. But you don't have to mess with as much security because LoC isn't handing over the physical book.
Now imagine that, from any web browser, you can access any book in the LoC for which the copyright has expired. I like that idea!
My opinion... skip the buy on the next couple of cruise missiles and digitize LoC's books instead.
Oh yeah, before I forget, LoC already has tons of seriously neat stuff online. My favorite is this collection [loc.gov] of tons of photos from Russia. These were taken between about 1907 and 1915! I don't know about you, but I never dreamed that I would see color photos that are almost 100 years old.
Cheers,
-- Art Z.
Project Gutenberg (Score:3, Interesting)
Now imagine that, from any web browser, you can access any book in the LoC for which the copyright has expired. I like that idea!
That's the idea of Project Gutenberg [gutenberg.net]. It's been around for quite some time now, and everybody is free to join their distributed proofreading network!
Does the money estimate add up? (Score:3, Interesting)
They tell us that there are:
4.5 million maps.
14 million 'images'
So in round numbers, let's say there are 50 million books and 50 million newspapers, periodicals, comic books, etc.
$260 million to scan all that stuff? $2.60 per book or newspaper? That seems a little unlikely. The book would have to be carried off the shelf to the scanning machine, mounted in the machine (which would clearly have to turn the pages and scan and index them 100% automatically), the title and such would probably have to be typed in manually, then the book carried back to the shelf and placed back in the correct place.
I find it hard to believe that a machine for scanning newspapers could be devised that could turn the pages automatically...but even without that, the project is still possible. At minimum wage, you'd need to pay people to scan a complete newspaper in maybe 20 minutes.
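As a rough check on the per-item budget, here is a small sketch. The item count is the round 100 million used above; the $5.15 hourly rate is an assumption (the US federal minimum wage around the time of this thread), and the calculation ignores equipment, storage and overhead entirely.

```python
# How much labour does $260M buy per item? All inputs are rough guesses.
BUDGET_USD = 260_000_000
ITEMS = 100_000_000          # ~50M books + ~50M newspapers etc., per the comment
MIN_WAGE_USD_PER_HOUR = 5.15  # assumed hourly minimum wage

cost_per_item = BUDGET_USD / ITEMS
labour_minutes_per_item = cost_per_item / MIN_WAGE_USD_PER_HOUR * 60

print(f"${cost_per_item:.2f} per item buys about "
      f"{labour_minutes_per_item:.0f} minutes of minimum-wage labour")
```

So if every dollar went to wages, each item would get roughly half an hour of handling, which leaves little slack once anything else has to be paid for.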
Then some significant fraction of the collection would probably be too fragile for the automatic page turning machines...the cost of hand-scanning those would be FAR more than the bulk of the books. Some books would be *so* fragile and valuable that scanning them would be a considerable expense.
Then there is the cost of the storage media. Suppose those 100 million books and newspapers had just 100 pages each on average. To get a readable image of the page you're going to need to scan at maybe 2000 x 2000 resolution. So we'll have something like 10^16 pixels, let's be generous and allow 100:1 compression ratios - and one byte per pixel. So we have 1000 terabytes. That's a lot - but to put it in context, it's only about a fifth of the amount that Google is estimated to have in their main cluster. Google spent $250 mil to buy that - so maybe only 20% of the LOC's budget needs to be for storage.
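Working the stated assumptions through exactly, as a sketch (every input below is one of the comment's own guesses; terabytes are decimal):

```python
# Scanned-image storage estimate from the comment's assumptions.
ITEMS = 100_000_000             # books plus newspapers, periodicals, etc.
PAGES_PER_ITEM = 100
PIXELS_PER_PAGE = 2000 * 2000   # 2000 x 2000 scan
BYTES_PER_PIXEL = 1
COMPRESSION_RATIO = 100         # the "generous" 100:1

total_pixels = ITEMS * PAGES_PER_ITEM * PIXELS_PER_PAGE
compressed_bytes = total_pixels * BYTES_PER_PIXEL // COMPRESSION_RATIO
terabytes = compressed_bytes / 1e12

print(f"{total_pixels:.1e} pixels -> {terabytes:.0f} TB after compression")
```

With these inputs the total comes to about 4 x 10^16 pixels and roughly 400 TB, somewhat below the rounded 1000 TB figure, which if anything makes the storage share of the budget smaller.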
OCR'ing and indexing all that data would be an incredibly valuable thing - the extra storage is trivial and the cost can be low if you aren't in a hurry to get the project done. Just stick a few thousand PC's in a room and wait!
Dunno - $260 mil sounds like a low end estimate to me - but it seems do-able.
www.loc.gov (Score:4, Informative)
If you go to the LOC's site, you'll notice American Memory on the front page.
American Memory is where you can get a good portion of the public domain stuff (books, letters from immigrants to their families back home, photos of civil war enlistees, audio, Edison-era short movies) for free in a low-quality format. Archival quality copies and custom scans/recordings are available for $$$. Almost any work in the LOC can be scanned on request (3 week waiting time or so); this is how they manage to continue adding scans to their collection without requiring public or private funding. It's underfunded as it is and needs more bandwidth.
This idiot in the article's proposal is completely unrealistic. Books can contain 100,000 to 5,000,000 characters. That's 100 KB-5 MB per book, times 26,000,000 books. That's not including the images and illustrations in some of these works. Many of the texts have value beyond the words they contain. We may be talking about image scanning the pages to preserve the look of the type, paper, and images. Archival TIFFs, since that's what the LOC uses.
The article also mentions $60 thousand to 'store' this data (per month?, per year?, just once???, what about access?, searching?, redundant backups?). Another unrealistic number, even working off of the 1TB estimate.
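For scale, the text-only range implied by those per-book figures can be sketched as follows. One byte per character is an assumption (plain, uncompressed single-byte text); images, TIFF scans and backups are not included.

```python
# Text-only size range for 26 million books, per the character counts above.
BOOKS = 26_000_000
MIN_CHARS_PER_BOOK = 100_000
MAX_CHARS_PER_BOOK = 5_000_000
BYTES_PER_CHAR = 1  # assumed: plain single-byte text, no compression

low_tb = BOOKS * MIN_CHARS_PER_BOOK * BYTES_PER_CHAR / 1e12
high_tb = BOOKS * MAX_CHARS_PER_BOOK * BYTES_PER_CHAR / 1e12

print(f"text only: between {low_tb:.1f} TB and {high_tb:.0f} TB")
```

Even the low end of that range is above the 1 TB estimate, before any page images are counted.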
Re:If Bill Gates (Score:2, Insightful)
>contributive, he would fund this.
Yeah, not like funding the B&M Gates foundation is doing anything worthwhile with all that immunization, AIDS research and anti-poverty work.
Darned, useless Microsoft profits. Helping people. Imagine that!
Re:If Bill Gates -- Not So Fast! (Score:2, Interesting)
I would reserve that honor for Andrew Carnegie, who basically sold his empire for $485M and spent the rest of his life giving away all his money to good causes. Bill Gates is a far cry from that so far.
Just as a point of /. interest, what is the conversion factor between ACMs (Andrew Carnegie Millions) and BGBs (Bill Gates Billions)?
Re:If Bill Gates (Score:2, Insightful)
Bill Gates does plenty of worthy things with the PHAT $$$ h
Re:I can see it now.... (Score:2)
Re:No habla francais. (Score:2)
I said could understand - not necessarily write.
DAMN YOU WORD PROCESSING FOR MAKING IT SO EASY TO REVISE A STATEMENT!
Re:As an author... (Score:2)
Re:As an author... (Score:3, Insightful)
A 0-rated post noted that this type of free access is a big deal to people who make an honest living publishing their creations.
This invokes a big, important question. The rise and flourish of the information age has and will continue to provide unbelievable freedom of access to unbelievable amounts of information. Where and how do we draw the line between the freedom of the consumers and the rights of the creators?
I'm a software developer who loves movies: I'm a creator and a consumer, so I see both
I have to ask... (Score:5, Insightful)
Knowledge, even the limited knowledge of an author, does not exist in a vacuum. You read, you learn, you practice, then you create. You could not have done this without the beneficence of others who aren't making a dime off the education they provided you.
To unleash the vast amounts of knowledge stored up in the LOC to the world would be one of the single best things this country could do for mankind. One book, one reader my hairy ass. Why not open the floodgates so everyone can benefit?
I understand the motivation of monetary incentives, but I also know a lot of great authors who died penniless. And they were at least brave enough to sign their names to their ideas.
Re:Government Spending (Score:4, Insightful)
The LOC doesn't just contain nice black and white typed texts. There are hand written documents in organic inks on animal hide and poorly constructed paper. There are paintings in every medium you can imagine and there are sound recordings on just about every media ever used: wax tubes, glass disks, wire spools, open reel, 8-track, cassette, CD, DVD, etc.
Each of these things needs to be digitized, categorized, indexed and offered in a searchable manner. A printed page, for example, will need to be photographed and transcribed/OCRed.
Much of the work needs to be done on delicate objects that may be destroyed if not handled correctly. If you were to play a wax recording disk with too much pressure, or under the wrong environmental conditions, the disk would shatter into an irreparable pile of small bits.
What formats will you store them in? What formats will you make them available in?