Ask Slashdot: State-of-the-Art In Amateur Book Scanning?
An anonymous reader writes: I have a shelf full of books and other book-like things ranging from old to very old that I would like to turn into PDFs (or other similarly portable format), and have been on a slow-burn quest for the right hardware and method to do so on a budget. These are mostly sentimental — things handed down over generations, and they include family bibles, notebooks, and photo albums, as well as some conventional — published, bound — books from the late 19th and early 20th Century. None of them are especially valuable as antiques, as far as I know; my goals in preserving them are a) to make them available to other people in my family who are into genealogy or just nostalgia, and b) so I can read some of those old, interesting books (et cetera) without endangering them any more than it takes to scan them once. I was intrigued by the (funded, but not yet available) scanner mentioned earlier this year on Slashdot; it seems to do a lot of things right, but like any crowdfunded project, the proof is in the pudding, and the pudding hasn't yet arrived. It's also cheap, and that fits my household budget. What methods and hardware are you using to scan old documents? Any tips you have from a similar project, with regard to hardware, treatment of the materials being scanned, light sources, file formats, clean-up and editing tools, file-size-vs-resolution tradeoffs? In the end, I'm likely to err toward high-resolution scans, since they can be knocked down to size later if need be, but I'd be interested in hearing about what tradeoffs you've found to work for you.
One big question that I'd like to have answered: Is there stand-alone Free / Open Source software, or even just cheap software (I am mostly on Linux, by choice, but won't leap onto a sword to keep my Free Software purity) that makes for easy correction of the distortion introduced by camera-based imaging? If I could easily uncurl and keystone-correct pages, then a lot of input methods (even my phone) are suddenly much more attractive. My old Casio camera could do this 10 years ago, but I haven't found a free software desktop utility that lets me turn photos into nicely squared-up pages.
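For anyone wondering what keystone correction actually involves: it comes down to a perspective transform (homography) fitted to the four corners of the page. Below is a minimal pure-Python sketch, assuming the corners have already been located by hand. The function names are made up for illustration, and this handles keystoning only, not page curl.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("corner points are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src, dst):
    """Return the 9 coefficients mapping four src corners onto four dst corners.

    src: corners of the page as photographed (hypothetical, hand-picked).
    dst: corners of the squared-up page you want.
    """
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b) + [1.0]


def warp_point(h, x, y):
    """Map one photo coordinate to its squared-up page coordinate."""
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)
```

For example, `h = homography([(120, 80), (880, 110), (940, 1300), (60, 1260)], [(0, 0), (850, 0), (850, 1100), (0, 1100)])` gives a transform under which `warp_point(h, 120, 80)` lands on (0, 0) up to rounding. A real tool would then resample every output pixel through the inverse mapping.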
Don't! (Score:1, Funny)
Scanning is stealing from GOD
Re: (Score:2, Funny)
Why is that being modded down? The copyright gods demand blood. This person is in violation.. Just read that little notice on the first or second page... "All rights reserved. No part of this book may be reproduced in any form..."
Re: (Score:3, Informative)
Why is that being modded down? The copyright gods demand blood. This person is in violation.. Just read that little notice on the first or second page... "All rights reserved. No part of this book may be reproduced in any form..."
Nobody modded him down. He's from Gay Nigger Association of America. I'm not joking, or trolling, or trying to be racist, that's the actual name of the group he represents, more details about them here:
https://en.wikipedia.org/wiki/... [wikipedia.org]
He gets down modded so much that all of his posts are -1 right out the gate.
Re: (Score:2)
It says that, but it's actually a lie. In the US, publishers cannot restrict fair use.
Re: (Score:2)
Remember though that "fair use" (in the US) is not a right, it's a defense. You're still infringing, you're just doing so in a way that results in no damages. If you can sustain the case long enough to convince the judge that the use is indeed fair, that is.
Re: (Score:2)
Keep your books (Score:1)
Re:Keep your books (Score:5, Interesting)
I collected paperback and hard cover books for almost 5 decades. I had storage bins in the attic and garage full of them. They all went to a "friends of the library" benefit sale. About two thousand books gone freeing up space and now I have more than that amount on a hard drive and about 200 on my phone. No more dead tree books and magazines for me. I can pull up something to read any time and any place. Technology is fabulous.
Re: (Score:2)
Dead tree technology (Score:2)
I am an aficionado of dead tree technology. I find reading long documents online is very tiring. That is why I prefer dead tree technology.
Dead tree technology has many benefits:
It never needs to be recharged.
It is very portable. Just toss it into your bag. No cords or power supply.
It is very easy to share with some one. Just hand the book to them.
It has a very user friendly user indexing system called "dog ear".
Simply fold a corner of a page over and you can find your place again.
It is
Re: (Score:2)
Ereader pros:
Never have to find my place
Searchable
All my books on an SD card
No light needed
Page turn with a touch
Ultimately portable
Never wears out
Change fonts type and size with ease
Read on any device
I have to carry a phone anyway
Re: (Score:1)
I can pick up a book printed 150 years ago and as long as I understand the language it was written in I can read it. Do you honestly think that in 150 years someone will be able to do the same with today's ebooks given that even the publishing industry can't agree on a standard file format?
Re: (Score:1)
To many people it doesn't matter. That book will have been filtered through the memory hole at the Ministry of Truth a dozen times before 150 years has passed.
Re: Dead tree technology (Score:2)
Re: (Score:1)
In 150 years, paper books will likely be a tiny niche market, at best, as people with sentimental and institutional attachments to paper die off and people get acclimated to whatever form of digital media consumption is most convenient at the time.
And while I don't own an 8-track player, a VCR, or a punch card reader but if I had media in
Re: (Score:2)
I don't have to carry a phone, so all the other advantages you list cease to be advantages.
Re: (Score:2)
Well then, there's an exception to every rule.
Re: (Score:2)
E.g., the next time you try to pitch something as "the solution to every [BLAH] user's needs", your audience is unlikely to believe you, whereas if you say "85% of users of X-DEVICE will find this useful," they are more likely to find your presentation "credible". (The "85%" may be as crude an estimate as you like; but you do estimate your market, no?)
"Credible" is not the same as "invest-worthy", but it's an appreciable step that way.
Re: Dead tree technology (Score:1)
Explain to the kids that books are like television for really smart people.
Re: (Score:2)
Re: (Score:2)
The IndieGogo project is just a clone of something that already exists:
http://scanners.fcpa.fujitsu.com/scansnap11/features_sv600.html
The ScanSnap SV600 looks good, but you'll have to keep something in mind: The web page for this link says, "* Maximum document scanning thickness is 30 mm (1.18 in.)"
A maximum thickness of 1.18 inches would limit the books you can scan, unless you break apart a thick book, and scan it in sections.
Re: (Score:3)
Project? (Score:5, Informative)
If you're looking for a project, what we use at my university library to scan some of the rarest and most delicate books on the planet, is definitely achievable at home. It's simply a table with interchangeable wedge shaped foam pieces, and a rack above with two cameras pointing down. Since the book is on a v cradle, the pages lay flat. You can change the angle and position of the cameras to point squarely at the pages. There's a pedal that will snap a picture with both cameras at once, so once you've got it set up, all you need to do is flip the pages and hit the pedal. You might need to readjust if the book is particularly thick, but that's all pretty intuitive once you're used to the setup.
Re: (Score:2, Informative)
If you're looking for a project, what we use at my university library to scan some of the rarest and most delicate books on the planet, is definitely achievable at home. It's simply a table with interchangeable wedge shaped foam pieces, and a rack above with two cameras pointing down. Since the book is on a v cradle, the pages lay flat. You can change the angle and position of the cameras to point squarely at the pages. There's a pedal that will snap a picture with both cameras at once, so once you've got it set up, all you need to do is flip the pages and hit the pedal. You might need to readjust if the book is particularly thick, but that's all pretty intuitive once you're used to the setup.
Probably an Atiz BookDrive [atiz.com]. Yes, it is possible to homebrew one; Instructables [instructables.com] has directions but it's a lot of work.
The OP's best bet, really, is to Google for "non-destructive book scanning service" and find one to do it for him professionally.
Re: (Score:1)
That's why I said "If you're looking for a project." If the OP wasn't looking for a project, then a service would be a better fit than a project. The OP's focus on DIY methodologies indicated he was looking for a project.
We have a number of commercial ones, including a few dozen BookEye units for quick scan-and-deliver jobs, and some BookDrives for more delicate work, but one of the preservation guys made a couple by hand that are still regularly used.
Re: (Score:2)
The. Very. First. Link. On. Google. http://pro.atiz.com/ [atiz.com]
Re: (Score:2, Funny)
I know some people have already criticized the obtuseness of your comment, so I'm going to try and turn this around to make it a more positive experience by saying some positive things about your contribution to the conversation.
1) You figured out what search engine to use to get more information without me having to specify it, and that's pretty great.
2) You read at least some of my comment, because you picked up on key concepts like "V-Cradle", which was given that rather obvious name both by the shape of tw
diybookscanner.org forum (Score:5, Informative)
I would suggest you look here http://www.diybookscanner.org/... [diybookscanner.org]
I'm planning to do much the same thing as you myself, but I've still not decided how to do it and other things have been occupying my attention recently, so I've not kept up with developments for a year or so.
There are plenty of ideas there and suggestions for software and workflows that will do what you want.
Re: (Score:2)
The real problem isn't the hardware though, it's the multiple programs needed to process the images and get everything into a small, text-searchable PDF file afterwards.
To give you an idea, my workflow usually starts by importing all the left pages into Lightroom and processing them for things like correcting blacks and whites, keystoning, skewing, and cropping; then I export everything as jpg files.
Re: (Score:1)
Is this primitive enough? : )
http://byfai.com/content/diy-b... [byfai.com]
Re: (Score:1)
Yes, definitely, it took me quite some time to set up the software environment. But that was a few years ago.
If anyone is interested in the story, it's here:
http://diybookscanner.org/foru... [diybookscanner.org]
The hardware setup:
http://byfai.com/content/diy-b... [byfai.com]
Camera alignment (Score:2)
If you have to correct for keystoning, your cameras aren't aligned well.
You want to use a mirror for alignment, as it allows you to verify that the camera is in the correct place -- a non-reflective target only ensures that you're pointed at the correct place.
The Czur has no platen, and therefore there will be distortion due to curved pages which would have to be corrected for. It also won't be able to image as well closer into the binding -- if you have to spread the book flat, you're going to end up dama
Re: (Score:2)
I built a primitive single-camera scanner using a cardboard box, a piece of glass, and a point-and-shoot camera and tripod I had handy, after reading that site. It was a pain lifting the glass to turn the page, and I spent a lot of time trying different lights (and locations of lights), but it worked well enough that I decided to pursue it seriously.
I started looking at building one of the better scanners using plans on that site. But after a lot of time thinking about it, and reading about the many decis
At home is as at home does (Score:3)
Re:At home is as at home does (Score:5, Funny)
Re: (Score:3)
Sit down with transcription software and read those books aloud. Done.
"Low texting then that we will require what what was asking but our answer will be okay." - Charles Dickens, A Tale of Two Cities
Re: (Score:2)
I've not heard of that one, only the Edmund Welles version.
K.I.S.S : unbind the book, scan the pages (Score:1)
Mod parent up (Score:2)
Agreed. I scanned a bunch of books that way, using a commercial-grade Fujitsu scanner, capable of scanning about 60ppm - both sides. I got a little over 20,000 pages in, and I had to quit, because the work was so intense. That was more than 10 years ago, and I still haven't been able to get back to it.
There's more to scanning a book than just scanning. Between preparing the book for scanning and making sure it scanned correctly, there's a lot of work involved.
do it right (Score:3)
Is this out of your budget? Buy one, sell it on eBay when you're done. Anything else, you'll just be wasting huge amounts of your time. [fujitsu.com]
Re: (Score:2)
Does that matter?
The pictures only have to be high enough res to reliably enable OCR, anything above that is irrelevant.
Re: (Score:2)
It might not matter if you are scanning novels to produce ebooks, but for technical works with equations you want to see the actual text layout in a pdf, and for small subscripts less than 300dpi (preferably 400) is a no-go.
Re: (Score:2)
It might not matter if you are scanning novels to produce ebooks, but for technical works with equations you want to see the actual text layout in a pdf, and for small subscripts less than 300dpi (preferably 400) is a no-go.
That's true. (Heck, I scan everything at 600.) But 1) this scanner is not less than 300dpi, and 2) 10MP is barely over 300dpi. So the grandparent post was still completely silly ;-)
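The resolution arithmetic in this sub-thread is easy to check. Here is a rough sketch, assuming a US letter page (8.5 x 11 in) that fills the whole camera frame with a matching aspect ratio; the thread does not state a page size, so that figure and the function names are assumptions.

```python
def megapixels_needed(width_in, height_in, dpi):
    """Megapixels required to capture a page of the given size at the given dpi."""
    return (width_in * dpi) * (height_in * dpi) / 1e6


def effective_dpi(megapixels, width_in, height_in):
    """Best-case dpi when a sensor of this many megapixels exactly covers the page."""
    return (megapixels * 1e6 / (width_in * height_in)) ** 0.5


if __name__ == "__main__":
    print(megapixels_needed(8.5, 11, 300))   # letter page at 300 dpi
    print(megapixels_needed(8.5, 11, 600))   # letter page at 600 dpi
    print(effective_dpi(10, 8.5, 11))        # what a 10MP camera manages
```

Under those assumptions a letter page at 300dpi needs about 8.4MP and at 600dpi about 33.7MP, while a 10MP camera tops out at roughly 327dpi, which matches the "barely over 300dpi" remark. In practice the page rarely fills the frame, so real figures come out lower.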
Re: (Score:2)
The Fujitsu SV600 is low res compared to a modern flatbed scanner. He'd be better off using a high quality 10MP+ digital camera to take the scans and post-processing them using software that's dedicated to that kind of thing.
I see simple math is beyond the grasp of this AC ;-)
No TWAIN driver (Score:2)
Re: (Score:2)
We have ScanSnap scanners at work and one of the biggest pains is they do NOT support the TWAIN/ISIS driver standard.
True. You want TWAIN or ISIS, you have to move up to fi series scanners. I personally don't care about using standalone scanning software--it gets me what I want.
pdf (Score:1)
"I would like to turn into PDFs (or other similarly portable format)"
What is it about PDF files which you think makes them portable? You'd be better off with PNG format.
Re: (Score:3)
If he wants to put it in a reader he'll need to use a book format such as epub or mobi. PDF would work as well but I think there are better choices nowadays.
Re: (Score:1)
If he wants to put it in a reader he'll need to use a book format such as epub or mobi. PDF would work as well but I think there are better choices nowadays.
PDF would NOT work as well. PDFs do not scale well. You end up either having to scroll through the "pages," which distracts from the reading, or you end up trying to read stuff sized for "bigger than your tablet" in text shrunk to fit the page.
"Oh you need little teeny eyes
For reading little teeny print
Like you need little teeny hands
For milking mice!"
Re: (Score:2)
I've managed to read PDFs in an eReader but it is not the best choice obviously. I have converted all my books to ePub but magazines have pictures which complicates things. Usually I read the magazine PDFs on a laptop. A 15" screen is best for those.
Re: (Score:1)
It's kind of because I use a kindle all the time now that I mentioned the subject. Yes, they support PDF format but they're horrific and almost unusable. There's no reason I can think of to want to create them. Reading them is exactly the same as reading an overly large PNG file; you have to zoom in/out, scroll around etc. It's just a UI disaster.
The proof is not in the pudding (Score:4, Interesting)
Re: (Score:3)
I'm not much of a grammar Nazi, but I'm seeing this error everywhere now and I'm afraid it'll become the norm. The saying is, "The proof of the pudding is in the eating," which makes a lot more sense when you think about it.
Too late. It originated in the 20's and became common in the 50's.
https://en.wiktionary.org/wiki... [wiktionary.org]
Re: (Score:2)
unpaper is the GPL software for curls, etc (Score:5, Informative)
The software piece you mentioned for turning scans into nice clean rectangles exists as "unpaper". Here's one fork: https://www.flameeyes.eu/proje... [flameeyes.eu]
The people who have bothered to fork and improve unpaper probably did so because they did a project similar to yours, so you might ask them about other tips and resources.
As someone else said, while pdf is convenient for READING a book, it's not a particularly great format for archiving a collection of images which you may want to convert to another format later. There are several good grayscale image formats to choose from. To order those images into a cohesive document, perhaps with separate chapters, one could produce html via a tiny Perl or shell script. That would preserve the images in their native format for later conversion as needed in the future.
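The "tiny script" idea might look something like the following, sketched here in Python rather than Perl or shell. It assumes the page images sit in one folder with zero-padded names (page_001.png, page_002.png, ...) so that a plain sort gives page order; the function names and the default file pattern are illustrative, not taken from any existing tool.

```python
from html import escape
from pathlib import Path


def build_index(image_names, title="Scanned book"):
    """Return an HTML page showing the images in sorted (page) order."""
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        '<head><meta charset="utf-8">',
        f"<title>{escape(title)}</title></head>",
        "<body>",
        f"<h1>{escape(title)}</h1>",
    ]
    for number, name in enumerate(sorted(image_names), start=1):
        safe = escape(name, quote=True)
        lines.append(
            f'<p><img src="{safe}" alt="Page {number}" style="max-width:100%"></p>'
        )
    lines += ["</body>", "</html>"]
    return "\n".join(lines) + "\n"


def write_index(folder, pattern="*.png", title="Scanned book"):
    """Write index.html next to the images; the images themselves are untouched."""
    folder = Path(folder)
    names = [p.name for p in folder.glob(pattern)]
    (folder / "index.html").write_text(build_index(names, title), encoding="utf-8")
```

Calling `write_index("scans/family_bible", title="Family Bible")` on a hypothetical folder would drop an index.html beside the originals. Chapters could be handled by writing one such page per subfolder.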
best grayscale formats (Score:2)
I scan B&W pages of historical manuals I have (SunOS 1.1, not Solaris). What would you recommend for grayscale and why?
RLH
Re:best grayscale formats (Score:4, Interesting)
Re: (Score:1)
What is the current status of djvu viewing software? I toyed with it a little years ago but never did anything serious with it because finding software was a hassle on some platforms, whereas pdf seems more or less ubiquitous.
easy to convert from djvu to pdf or anything (Score:2)
Although I don't have the answer to the exact question you asked, I can point out one thing. It's easy to convert from djvu to pdf if, at some point in time, you want a pdf copy for some reason. The reverse isn't so true. If you archive it as pdf, you can't readily convert to anything else without losing information.
Overall, pdf is reasonable for viewing (right now), but not good for editing, manipulating, and archiving. Even for viewing, pdf at its heart assumes it is being printed on letter-sized paper
buy a scanner (Score:4, Interesting)
Keystoning is easy to correct in Gimp. But that's going to be pretty labor intensive, and you really would want something automatic. I'd follow what others have said and buy one of the better products, like a professional scanner, and re-sell it once you're done. You can buy the ScanSnap SV600 (which everyone else seems to be recommending) for under $600 -- is that budget-friendly for you? If not, have you looked into renting such a device, or using one at a local library?
As an analogy, if you wanted to scan the old family slides, then the way to go is to buy a used Nikon pro-level slide scanner, do your stuff, and re-sell it with nearly zero loss, with the understanding that you're putting the couple of thousand dollars of purchase price at risk. I'm in the midst of doing exactly that, although given the number of slides I have to scan, I bought the scanner with the expectation that it will be a full write-off, and that's the price of not risking loss of family heirlooms by shipping the slides somewhere to have a minimum-wage flunky do the scanning.
Squaring up photos of pages (Score:1)
ImageMagick can do (most of) what you want for squaring up photos of pages. It is free/open source software. I'm not sure that I'd describe it as "easy" though: you would have to manually mark out the fixes required for each page.
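The manual marking-out could be scripted along these lines. This is a sketch that only builds the command line for ImageMagick's `-distort Perspective` operator from four hand-picked page corners. The helper name and example coordinates are hypothetical, the binary is `convert` on ImageMagick 6 (on version 7 it is `magick`), and it assumes the target page size is no larger than the photo.

```python
def perspective_command(src_path, dst_path, corners, width, height,
                        binary="convert"):
    """Build an ImageMagick command that squares up one photographed page.

    corners: page corners in the photo, ordered top-left, top-right,
             bottom-right, bottom-left (picked by hand).
    width, height: size in pixels of the flattened page you want.
    """
    targets = [(0, 0), (width, 0), (width, height), (0, height)]
    # -distort Perspective takes "src_x,src_y dst_x,dst_y" pairs in one string.
    pairs = " ".join(
        f"{sx},{sy} {dx},{dy}"
        for (sx, sy), (dx, dy) in zip(corners, targets)
    )
    return [
        binary, src_path,
        "-distort", "Perspective", pairs,
        "-crop", f"{width}x{height}+0+0", "+repage",
        dst_path,
    ]
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Keeping the corner coordinates in a small text file per batch, as the reply below describes, would make the corrections repeatable without overwriting the originals.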
Re: (Score:2)
I wrote a script to do this exact thing. You put the crop, rotate, etc. parameters into a text file which you can save along with the batch of images. It also supports multicropping each image so that you can use it to build indexes of negative and slide pag
Re: (Score:2)
Doing a quick review of the resulting output images allows you to correct any anomalies because the original files don't have to be overwritten. Very convenient and useful for a ~$50 software package.
Bittorrent (Score:2)
Seriously. Most popular books are already in electronic formats. While there may be some solid questions about whether it's legal, I think it's a perfectly ethical move. You're going from print to print, not audiobook, play, or movie version of a printed book that you own.
And scanning it is probably illegal anyway so it's not like all the extra work will protect you. Yeah, Google got away with it but they've got millions of dollars worth of lawyers who argued that their work was done for research purpos
Scan Tailor (Score:2)
I often scan in music from bulky books. I find Scan Tailor (http://scantailor.org/) works pretty well. It lets you crop, unbend, despeckle etc. in a wizard-like way. The drawback is that it weirdly insists on TIFF format input and output. So you have to be handy with tools like pdf2pnm, pnmtotiff, tiffcp and tiff2pdf, etc.
Works really well apart from that.
1080 webcam... (Score:3)
Donate to Internet Archive (Score:2)
I also hear that you can pay the Scanning Service close to your location to scan your books, but you will need to check on that.
Check if by any chance your books are already digitized on OpenLibrary.org
The Internet Archive Book Drive - https://openlibrary.org/bookdr... [openlibrary.org]
Scanning Services
It is not the scanner - it is the software. (Score:2)
OK - open source has a really good OCR engine - tesseract.
But that is only one part - you need software that can recognize layout - differentiate pictures from text etc.
There are two approaches - put a text layer under a bitmap (searchable image) - or make a real document with fonts and pictures where needed (clear-scan), hopefully as an ODT file.
Even in Windows clear-scan is iffy - diagrams with text confuse the software. Clear-scan to ODT is what we want - but can't have yet.
Notes and links on this: htt [xtronics.com]
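The first approach (a text layer under the page image) is what tesseract produces when asked for its `pdf` output. Here is a minimal sketch of batching that, assuming tesseract is installed and on the PATH and that the pages are already cleaned-up image files; the function names are illustrative and the language code defaults to English.

```python
import shutil
import subprocess
from pathlib import Path


def ocr_command(image_path, language="eng"):
    """Build the tesseract call that writes a searchable PDF beside the image."""
    image_path = Path(image_path)
    out_base = image_path.with_suffix("")  # tesseract appends ".pdf" itself
    return ["tesseract", str(image_path), str(out_base), "-l", language, "pdf"]


def ocr_pages(image_paths, language="eng"):
    """Run OCR on each page image in turn, stopping on the first failure."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    for path in image_paths:
        subprocess.run(ocr_command(path, language), check=True)
```

This yields one PDF per page, so a separate tool is still needed to join them into a single book. It does nothing for the second, clear-scan style approach.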
Scan Tailor is Opensource, Fopydo for capture (Score:1)
On the path to essential we all take a few detours to learn things.. one of my favorite 'sayings'.
Scan Tailor fits your original description and price range.
http://scantailor.org/ [scantailor.org]
There is a GitHub site for downloading the installer, works on Windows 7 for me, but I see no limitations to prevent it from working on OSX or Linux.
The documentation isn't great, but the software is very good, quite on par with most of the BookDrive or BookScanner types of programs.
Digital Book Collecting, or Scanning or Ripping d
Project Gado -- Open source Document Scanning (Score:2)
You might investigate Project Gado.
A free open source robot for taking pictures of documents without exposing them to danger.
Not sure if it has all the software you want, but there is an open source community developing for it, the Univ of Finland seems to be the hub.
http://projectgado.org/2015/07... [projectgado.org]
linearbookscanner (Score:2)
If you are able to build this thing, look at linearbookscanner [linearbookscanner.org]. This would be my preferred method of digitizing but to build it is above my ability :-(.
Check your local library (Score:2)
Before building something yourself, make sure you don't have access to better equipment locally. The main library here in Cleveland has what they call a Preservation Lab that has library-grade equipment available for public use.
http://cpl.org/clevdpl/ [cpl.org]
Separate digitizing from correction (Score:2)
First digitize using the best solution which is easily available to you now, like a good flat bed scanner [amazon.com], and then look for correcting software later. So long as you have the original JPEGs/PDFs, you can continue enhancing them without putting your documents in danger.
Seems preferable to waiting for perfect hardware/software while your archive deteriorates further.
Re: (Score:2)
Flatbed book scanning? In 2015? 7 seconds per page for 300dpi in greyscale?
Wow, that's terrible.