Amazon Launches Full Text Book Search 241
m00nun1t writes "Amazon have launched a new service that allows you to search the full text of books. This sounds like an incredibly useful function as well as technically impressive at this scale. I wonder if a patent is in the works." Or if a patent is already owned.
Yeah, but.. (Score:4, Funny)
Re:Yeah, but.. (Score:5, Interesting)
Daniel
Re:Indexing mechanism (Score:2)
Daniel
Re:Indexing mechanism (Score:3, Interesting)
Google Catalogs (Beta) [google.com]
It's very probable that they licensed the Catalog Search technology from Google.
Re:A4 pages? (Score:2)
Daniel
Amazon... (Score:1, Insightful)
Sure, you can search for some random phrase. But who's to say it's not out of context, or there's nothing more that's relevant in the book?
Re:Amazon... (Score:5, Interesting)
I was searching for books on Object Role Modeling (ORM). I had first done a search for ORM and did not find anything of interest. They then switched it on while I did a search of 'Object Role Modeling'; this popped up a few books with the text where it was being used.
You can see whole pages (Score:5, Informative)
This is the equivalent of the facility you have in a physical bookstore to open a book and browse a few pages before purchasing. I can see it might be very useful, if they get the majority of books in a field accessible like this.
I wanted a PHP book the other day, and it is very difficult to decide which one of the plethora available I wanted. So I went to my physical bookstore. Smaller choice, but I could open each and get an impression of whether they were slow, detail-by-detail dummies books or the sort of high-speed summary I wanted.
Free Books? (Score:3, Interesting)
Re:You can see whole pages (Score:2)
The difference being that you can't write a computer program to go to the bookstore and systematically find the contents of certain books for you while you sleep. You can't root a bunch of Windows boxes and have those zombies doing the whole thing to a physical store's collection.
I'm not saying that Amazon's book text search feature is bad. I'm just saying it is different from
Re:Amazon... (Score:3, Funny)
abuse (Score:4, Interesting)
How easily can this service be abused, with automatic webbots doing the searching?
I can imagine there might be filters, time limits, and max searches/day limits for something of this scale, no?
Re:abuse (Score:5, Interesting)
You can only browse two pages in either direction per search. You also have to be logged in. I suppose someone could script a system to create thousands of accounts, then use an army of zombie machines to OCR the pages from a variety of different IPs. That is assuming that Amazon has EVERY page of every book available to the service, which I doubt.
It would probably be easier to cobble together a robot built around a laptop with an OCR-equipped camera and book manipulation software and set it loose in a big library at night. For 50 years.
Re:abuse (Score:5, Informative)
ebooks are a pretty healthy alternative to normal books, but I don't see the publishers worrying too much about piracy. Perhaps it's because the average script kiddie who will spend 2 days downloading Matrix Reloaded from Usenet is just not the type to try and crack open a book, much less crack an ebook.
Re:abuse (Score:2, Funny)
Re:abuse (Score:4, Insightful)
Not so easily. It's easy to see why. The books will be scanned in using OCR. These days a fast and convenient and almost error-free process. But not entirely error-free. Good enough to find documents that are highly relevant to a particular keyword (if "hydraulics" occurs 9 times, what are the odds of OCR getting it wrong all 9 times?) but not good enough for entirely automated book-to-text.
If Amazon were to display highlighted portions of the book's contents, it would probably not exceed a few lines (just like Google doesn't present entire webpages in its result screen). If they did want to show more, they'd have to show an image of the scanned-in page anyway, since OCR errors would not be very pretty. (A lot of digital archiving products use a similar approach: they index PDF files that contain the OCR'ed text, invisible to the end-user, and the scanned pages as content which the end-user looks at.)
Besides, to search for each page of a book, you'd have to search for a keyword on each page of that book. Such keywords would most easily be extracted by scanning in the book via OCR anyway!
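To put rough numbers on the "9 times" argument above, here is a minimal sketch. The 2% per-word error rate and the 100,000-word book length are assumptions picked for illustration, not measured figures, and the errors are assumed to be independent of each other.

```python
def miss_probability(per_word_error: float, occurrences: int) -> float:
    """Chance that OCR garbles every single occurrence of a keyword,
    assuming each occurrence fails independently (an assumption)."""
    return per_word_error ** occurrences


def expected_bad_words(per_word_error: float, words_in_book: int) -> float:
    """Expected number of garbled words in a whole book at the same error rate."""
    return per_word_error * words_in_book


if __name__ == "__main__":
    error_rate = 0.02  # assumed 2% per-word OCR error rate
    print(f"all 9 occurrences wrong: {miss_probability(error_rate, 9):.2e}")
    print(f"garbled words in a 100,000-word book: {expected_bad_words(error_rate, 100_000):.0f}")
```

Under those assumptions the keyword is almost never lost to the index (on the order of 5 chances in 10^16), while the same book would still contain about 2,000 garbled words, which is why the text is fine for searching but not for republishing.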
From their FAQ... (Score:2)
Our Search Inside the Book feature is designed to help our customers discover new books and ensure that they'll be satisfied with their purchases. To be fair to the publishers and authors who participate in our program, we only allow Search Inside the Book users to read a portion of the book.
Re:abuse - I've abused it. Sort of. (Score:3, Informative)
That,
Re:abuse (Score:2)
um...spell "launches" correctly please (Score:1, Insightful)
It works!!! (Score:5, Funny)
2) It returned a lot of results
Conclusion: It works!!!
Hmmm... (Score:1)
Re:Hmmm... (Score:2)
Daniel
Fine grain searches take the adventure away (Score:4, Insightful)
You really never knew what would turn up as you traversed the Yahoo directory structure. You start searching for blues music and you'd end up with a list of 15 or so good links with
As search techniques are becoming more refined, we are now able to do specific word searches on websites and now books. That's fine if you know exactly what you are looking for. For example if you want to get that book about 'replicants' you'll find Blade Runner, but you won't find anything else. You won't get any information except exactly the thing you are looking for.
And I think that that is where the problem with this kind of search lies for books/music/etc. If you want to find a song or a book, it most likely isn't going to be a specific word you remember, it will be the tune or the plot, both of which are not searchable.
I don't see this improvement in Amazon's search system as that much of an improvement. A better improvement could be made to the 'We thought you'd like' feature. Instead of finding only what I'm looking for, I'd like to find other things I might also be interested in.
Re:Fine grain searches take the adventure away (Score:2)
It'd be okay so long as you could turn it off - search in "pure" mode or "I don't know what I want but I'll recognise it when I see it" mode.
Re:Fine grain searches take the adventure away (Score:2)
There was a chap, at our University possibly, that started a PhD in Computer Science. The subject matter isn't that important, but the method with which he conducted his research was truly novel.
One of the first things he did was go to the library and start by going at random to bookshelves and picking out a book. He would then flick to a random page and start reading. After a week he had amassed a set of random ideas for random subject areas, cho
Re:Fine grain searches take the adventure away (Score:2)
And if you haven't noticed, Amazon has been doing the "here's things you might be interested in" thing for a long, long time. If anything, that's the novel and useful technology for which they would deserve a patent.
Re:Fine grain searches take the adventure away (Score:3, Informative)
1. After a search, it gives you a list of "Customers who bought this also bought:". For instance, see this [amazon.ca].
2. They have the concept of "Listmania" which allows every user to create a list of their own recommended products. If your search aligns with their list, Amazon will suggest that you look at it. Search for something you want and keep an eye open for the listmania section.
Doesn't this meet your criteria for
Re:Fine grain searches take the adventure away (Score:2)
Just like search parties come home from the woods with the missing person, instead of ten random other people who also look like great guys. Searching is done for specific things; browsing might be more what you're after.
it most likely isn't going to be a specific word you remember, it will be the tune or the plot
I've search
Re:Fine grain searches take the adventure away (Score:2)
As far as music goes, you can search for Parsons code [name-this-tune.com].
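For anyone who hasn't met it, Parsons code describes a tune only by whether each note goes up, down, or repeats relative to the one before it. A minimal sketch follows; the function name and the use of MIDI note numbers as input are assumptions for illustration, and this is not how the linked site necessarily does it.

```python
def parsons_code(pitches):
    """Return the Parsons code of a melody given as a list of pitch numbers.

    The first note is written as '*'; each later note is 'U' (up),
    'D' (down) or 'R' (repeat) relative to the note before it.
    """
    if not pitches:
        return ""
    code = ["*"]
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            code.append("U")
        elif current < previous:
            code.append("D")
        else:
            code.append("R")
    return "".join(code)


if __name__ == "__main__":
    # Opening of "Twinkle, Twinkle, Little Star" as MIDI notes: C C G G A A G
    print(parsons_code([60, 60, 67, 67, 69, 69, 67]))  # *RURURD
```

Because only the contour is kept, you can hum a tune badly out of key and still produce the same string, which is what makes it searchable.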
Potential tool for discovering plagiarism? (Score:4, Insightful)
Granted, Amazon.com's feature will only (for now) include 150,000 books, but this may very well be another way to catch plagiarizers. Just type in a suspicious phrase and see if there are any 'hits'.
Re:Potential tool for discovering plagiarism? (Score:2)
Re:Potential tool for discovering plagiarism? (Score:2)
Re:Potential tool for discovering plagiarism? (Score:2)
Re:Potential tool for discovering plagiarism? (Score:2)
Granted, Amazon.com's feature will only (for now) include 150,000 books, but this may very well be another way to catch plagiarizers. Just type in a suspicious phrase and see if there are any 'hits'.
yeah, because no two people ever come up with the same turn of phrase for the same thing...
Re:Potential tool for discovering plagiarism? (Score:2)
Don't have a cow, man.
No Searching Inside O'Reilly Books (Score:5, Interesting)
Re:No Searching Inside O'Reilly Books (Score:5, Interesting)
Safari is more of a "service" (i.e. renting access to book content) than a "feature" of a retail website, which is all Amazon's "innovation" seems to be.
Basically the only real difference between the two (aside from what is cited above) is that Amazon just lets you know the content is mentioned, and shows you a page or two. Safari gives you the entire book. That and that Amazon has a much wider range of books in non-tech genres
In othr news (Score:2, Funny)
no more staying awake! (Score:2, Funny)
No more out-of-print books (Score:4, Insightful)
I'd love to be able to browse a giant back catalog, knowing that an original or facsimile copy could definitely be delivered to me.
One click search. (Score:4, Funny)
When questioned for comment Google CEO Eric Schmidt said "ug".
Re:One click search. (Score:3, Insightful)
Anyways, stay tuned, I believe the Patent Office takes about a year these days to issue a patent?
The story will of course run here on Slashdot.
Re:One click search. (Score:2)
Welcome! You must be new to patenting.
Re:One click search. (Score:2)
Actually, make that closer to three years.
New age youth (Score:3, Funny)
Youth nowadays: lookup 'vagina' in all books on this planet.
Wow! (Score:5, Informative)
I tried the search again today and got nearly 5,000 results, with the capability to actually look inside the book and see if the reference is useful to me. Very impressive indeed, patent or no patent.
Re:Wow! (Score:2, Interesting)
Google Catalogs [google.com]
Various worthwhile uses (Score:5, Informative)
Bash Amazon all you want, but this is a very useful technology.
In five minutes I was able to find three books that talked about findings first listed in two of my own published scientific papers, yet these books did not cite me, or anyone else, as the source of that information. My lawyer is currently preparing three letters.
I also found two other books in which the author used verbatim quotes and original theories from various interviews I have given, yet both authors passed off the statements as their own. My lawyer is now preparing five letters.
Aside from being used to protect my own research rights, I have found the search system useful for finding topics of interest discussed in certain books which are not referenced in any of the descriptions about the books. I just ordered three books I would not otherwise have ever purchased.
While I don't think highly of all of Amazon's practices, I must hand it to them for whatever technical undertaking created this search feature.
Has it occurred to you... (Score:3, Interesting)
Re:Worthwhile uses: Finding defendants? (Score:2, Interesting)
When you pass off someone else's ideas as your own it's called plagiarism.
I'm not suing them for any monetary damages. Just a requirement that my own work be attributed to me.
Re:Worthwhile uses: Finding defendants? (Score:2, Insightful)
How is this an Abuse of the legal system???
Just tried it.... (Score:2)
1. Rincewind (Character in Terry Pratchett books)
The only references I could find were books about fantasy/sci-fi fiction, and no book extracts from TP's books.
2. Various other British and US authors, again no extracts.
3. The first line of 1984. Success: I found extracts, but only the first page and the cover. Also the extract seemed only to exist for one version of the book, not all copies.
This seems to be a good idea for academia/study books but, at the moment at least
Re:Just tried it.... (Score:2)
I think you're spot on with the observation that there is a lack of participation by fiction authors.
Not impressed (Score:2)
ebooks vs CD/DVD (Score:3)
I used to think, like many people, that ebooks just didn't work because 'I like the feel of paper under my fingers'. Since I bought a PDA and discovered the joys of Fictionwise [fictionwise.com], I just can't go back to these clumsy wood pulp apparels.
Amazon is pretty progressive in this regard, making a great number of their collection available electronically. It was probably fairly easy from there to make their stock searchable. And how I wish the MPAA and RIAA could work like the publishing industry...
The existence of ebooks is NOT threatening traditional books, because people see more value in a printed book over an electronic copy. This is clearly not the case with a CD and a DVD, since most people couldn't care less about the jacket if they have the goods on the CD/DVD. I wish the MPAA and RIAA would understand how to make traditional CDs and DVDs "value-added", and make people less inclined to getting a computer file instead of shelling out the money.
Then again, I guess the case with ebooks is that your typical DVD or CD pirate is just not interested in swapping files to get the latest Stephen King and read it on screen. Not only that, but most of history's greatest books are available for free, and one could probably read free books for the rest of their lives if they so chose.
Re:ebooks vs CD/DVD (Score:2, Insightful)
Re:ebooks vs CD/DVD (Score:3, Interesting)
Whereas if I want H.G. Wells' The Time Machine, I am one click away [ebooks3.com] from a quality MS Reader version of it.
Re:ebooks vs CD/DVD (Score:2)
Yes, same here. But that was until I actually used an e-book
And how I wish the MPAA and RIAA could work like the publishing industry.
Well, seeing how r
Why do I need to enter a credit card number? (Score:2, Interesting)
Why do I need to enter a credit card number?
We require credit card information for security purposes only. We will not charge your credit card account any fees for using the Search Inside the Book feature.
Uhuh. Security. Whose?
Yeah, I want to be financially secure too !
whats the point you're making in the first place? (Score:2, Insightful)
You could have robots trolling this section all day.
Uhuh. Security. Whose?
What's your point? You think Amazon is a dishonest porn site that takes your credit card information and disappears the next day?
If that's your mentality, how are you surfing the web?
Yeah, I want to be financially secure too !
What the fsck's your point man? What does amazon demanding your credit card number for security have to do
Re:whats the point you're making in the first plac (Score:2)
Do you really believe that large companies do no wrong, and make no mistakes? Are you really such a bunny that you give your credit card to anyone with a brand name who asks for it?
That's my point, "man". Wake up to the real world. Look around you. Observe corporate malfeasance. Now wonder i
Re:whats the point you're making in the first plac (Score:2)
Re:Why do I need to enter a credit card number? (Score:3, Insightful)
Re:Why do I need to enter a credit card number? (Score:2)
Creating new, unique credit cards associated with those accounts: hard
Seems like a mechanism for keeping any one entity from being able to view all pages from a single book.
Anyone else notice this? (Score:4, Insightful)
By publishers' agreement, we are pleased to offer Amazon.com customers with a valid credit card the ability to view copyrighted pages.
Your account will not be charged.
This one-time process enables you to view limited copyrighted material through our Search Inside the Book feature.
So they'll let you browse the search pages, if you can prove your identity on record and provide them with financial information. No thanks.
Re:Anyone else notice this? (Score:3, Insightful)
I plan to check it out for engineering texts, but I suspect my local (relative to Am
How are they indexing and scanning all the books? (Score:2)
But now, this seems to be something that would require an army of people to handle. And that's not to mention the hardware/facilities needed to create this database.
So how
Re:How are they indexing and scanning all the book (Score:2)
Amazon : Hey Mr Publisher, I'm setting up this database of book texts so the mort..err.customers can search for books.
Publisher : Cool, I'll email the [postscript/tex/...other source] right over.
Every one of those books has to have an electronic version somewhere, most likely in a machine-readable (rather than vector/bitmap) format.
Re:How are they indexing and scanning all the book (Score:2)
Why did Amazon take this route? (Score:2, Informative)
Even with a full text search facility I doubt very much if it can come close to matching the experience of flipping through a book at the local book store no matter how effective the searching facility.
I can think of one reason and that has already been mentioned by a few
Re:Why did Amazon take this route? (Score:2)
Scanner problems (Score:4, Interesting)
See this for example... [amazon.com]
Mass-OCR'ing has its drawbacks..
This technology is called. . . (Score:3, Informative)
I believe there is a body of prior art for scanning in books and grepping them. Is that not one of the oft-repeated benefits of ebooks?
Whether or not Amazon can get a patent on a shell script to serve up the results . . . on the web oooooooo, remains to be seen I suppose.
They managed to get one on "Give me one of those, put it on my account and drop it by my house" a "technology" my grocer has been offering over the phone for 40 years that I'm personally aware of.
However, since this sort of "technology" is exactly the sort of thing that the web, and the internet itself for that matter, was invented for I'd have to guess there's a lot of prior art. It's certainly obvious and trivial, but that doesn't seem to count for much these days.
The problem with things that are so obvious and trivial that "everyone" has been doing it for decades is that it's hard to demonstrate in court because no one bothers to document it.
Can you prove your grandfather put his pants on one leg at a time?
Common sense tells you he did, but common sense no longer applies in an age that grants patents to perpetual motion machines and peanut butter sandwiches.
KFG
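For a sense of how little is needed to "scan in books and grep them", here is a toy sketch of a full-text index over pages. Everything in it (the function names, the data layout, the sample text) is hypothetical and for illustration only; nothing here is known about how Amazon's service is actually built.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_index(books):
    """Map each word to the set of (title, page_number) pairs it appears on.

    `books` is {title: [page_text, ...]}; pages are numbered from 1.
    """
    index = defaultdict(set)
    for title, pages in books.items():
        for page_no, text in enumerate(pages, start=1):
            for word in WORD.findall(text.lower()):
                index[word].add((title, page_no))
    return index


def search(index, query):
    """Return the sorted pages that contain every word of the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical two-book "library"
    library = {
        "Book A": ["the clocks were striking thirteen", "nothing to see here"],
        "Book B": ["clocks and more clocks"],
    }
    idx = build_index(library)
    print(search(idx, "clocks striking"))
```

This matches pages that contain all the query words, not exact phrases; a real system would also need phrase matching, ranking, and tolerance for OCR errors, which is where the actual work is.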
Re:This technology is called. . . (Score:3, Informative)
Prior art on any patent (Score:2)
Searching multiple books- a card catalog
Not to mention a dozen or so library cataloging systems, at least- especially research quality ones. That being said, this sounds like an awesome feature, and I applaud Amazon for putting it together.
Patent (Score:2)
Like that would stop them from trying to patent it again.
Search Dictionary? (Score:2)
I mean, c'mon, it's the first thing I noticed when I fired up slashdot this morning. Have people just stopped paying attention to spelling around here?
Wired article: "The Great Library of Amazonia" (Score:5, Informative)
Now we just need... (Score:5, Funny)
Scott
Biased? (Score:2, Interesting)
Thus, your searches will tend to return more results from books that are fully indexed.
Now that I think about it - this is a major incentive for publishers to get their books indexed.
How do we know _which_ books... (Score:3, Interesting)
A check on "the clocks were striking thirteen" yields seventeen hits, including the Cliff's Notes to Nineteen Eighty-Four and a reference in the Oxford Dictionary of Modern Quotations...
but none to Orwell's Nineteen Eighty-Four itself.
We must conclude that the coverage is spotty.
Re:How do we know _which_ books... (Score:2)
Why..This would be like searching through the LOC! (Score:4, Funny)
Unfair Use? Amazon's Free Book Giveaway Service (Score:2, Informative)
Read all about it here: http://www.nettle.com/archives/000062.html [nettle.com]
Patent Bashing Du Jour (Score:3, Insightful)
This type of editorializing is pathetic in that its only purpose is to stir up the masses. Gee...now let's take a look shall we? 20% of the comments are "patents suck" or "isn't this an example of prior art?"
This story is about a new feature people...it's not about a patent. Wipe the froth from your mouths and comment on the merits (or lack of) the feature...not on a completely fabricated hypothetical comment meant to incite you into a frenzy.
Useful, yes. Technically impressive/patentable, no (Score:2)
But technically this isn't impressive. I worked on programs that did full-text document searches about 20 years ago, and they weren't new then. So simply doing full-text searches in documents is just no big deal. But what about the large number of books, you say? Actually, that's nothing more than what they already do. I believe the scale of website text far exceeds the scale of the book text that they can search. The Wikipedia [wikipedia.org] is simply one of millions o
AMZN's FAQ on this feature (Score:2)
In particular:
Why won't Search Inside the Book let me see more pages from a specific book?
Our Search Inside the Book feature is designed to help our customers discover new books and ensure that they'll be satisfied with their purchases. To be fair to the publishers and authors who participate in our program, we only allow Search Inside the Book users to read a portion of the book.
So, you can only see some % of the book's pages as
Here's a quote relevant to the parent post (Score:3, Interesting)
Encyclopedia of New Media : An Essential Reference to Communication and Technology -- Steve Jones (Editor); Hardcover
Excerpt from page 0: ". . . post-ranking system used by members the of Web message board Slashdot.org, began as a result of community self- restraint in the face of unrelenting trolls (pointlessly hostile posters). In addition, some cyberspace forums now require . .
See more references to slashdot troll in this book.
Re:Amazon have? (Score:3, Informative)
"Congress have failed to agree..." because you are talking about a load of squabbling politicians who are definitely plural.
"Congress has passed a bill..." because those politicians have managed to achieve a consensus and act as a single entity.
In this case the singular is correct, because Amazon as an entity is offering a new service. But you could use the ter
Re:Heh (Score:2)
Re:Those crazy Brits (Score:2)
Re:Those crazy Brits (Score:3, Funny)
Re:Those crazy Brits (Score:2)
Re:Those crazy Brits (Score:2)
My take on it is that generally, yeah, organisations, groups, etc. should be treated as singular nouns, and those who don't are woefully ignorant and fair game to be laughed at by those of us who know better, but that "The Police" is an illogical exception.
It's English, it doesn't have to obey logical rules.
Re:No limits on pages viewed/searched? (Score:2, Insightful)
Re:No limits on pages viewed/searched? (Score:2, Informative)
You've reached the page-view limit for this book or you've reached the monthly page-view limit for the Search Inside the Book feature. Feel free to return to the pages you've previously viewed. If you want to see more of this copyrighted material, you can purchase this book. You can also search inside other books. Click here for more information or continue shopping.
So evidently they ar
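The message quoted above describes two caps (per book and per month) plus free access to pages already seen. A minimal sketch of that kind of rule is below; the class name, the limit values, and the choice to reset the per-book count every month are all assumptions, not Amazon's actual numbers or logic.

```python
from collections import defaultdict


class PageViewLimiter:
    """Toy per-account limiter with a per-book cap and a monthly cap."""

    def __init__(self, per_book_limit=20, monthly_limit=200):
        # Both limits are made-up values for illustration.
        self.per_book_limit = per_book_limit
        self.monthly_limit = monthly_limit
        # (account, month) -> set of (book, page) already viewed
        self.seen = defaultdict(set)

    def may_view(self, account, month, book, page):
        viewed = self.seen[(account, month)]
        if (book, page) in viewed:
            return True  # returning to a previously viewed page is always allowed
        if len(viewed) >= self.monthly_limit:
            return False  # monthly page-view limit reached
        pages_in_book = sum(1 for b, _ in viewed if b == book)
        if pages_in_book >= self.per_book_limit:
            return False  # page-view limit for this book reached
        viewed.add((book, page))
        return True


if __name__ == "__main__":
    limiter = PageViewLimiter(per_book_limit=2, monthly_limit=3)
    for book, page in [("X", 1), ("X", 2), ("X", 3), ("X", 1), ("Y", 1), ("Z", 1)]:
        print(book, page, limiter.may_view("alice", "2003-10", book, page))
```

A check like this only slows down a single account, which is presumably why tying each account to a credit card matters: it makes creating many accounts expensive.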
Re:No limits on pages viewed/searched? (Score:2)
It doesn't need to be impossible to read the book without paying for it. It just needs to be enough of a hassle that most people won't bother. If you have so much free time and so little money that you would actually consider reading a book by searching for the individual pages on Amazon, you're probably getting books by going to the library or borrowi
Re:But Will They Make Them Available a eBooks? (Score:2)
As a matter of fact, when I considered buying a PDA because it was just too damn expensive for me to order real books to Shanghai, the availability of ebooks on Amazon convinced me it was the right thing to do.
Re:Already been done (Score:2)
I don't think they'd want the Tommyknockers or the Langoliers at their doorstep