Books Media The Internet

Amazon Launches Full Text Book Search 241

m00nun1t writes "Amazon have launched a new service that allows you to search the full text of books. This sounds like an incredibly useful function as well as technically impressive at this scale. I wonder if a patent is in the works." Or if a patent is already owned.
This discussion has been archived. No new comments can be posted.

  • Yeah, but.. (Score:4, Funny)

    by michaelhood ( 667393 ) on Friday October 24, 2003 @03:35AM (#7298147)
    can you do it with one click?
  • Amazon... (Score:1, Insightful)

    by Ianoo ( 711633 )
    How useful is this, considering that we can't see what's in the books before buying?

    Sure, you can search for some random phrase. But who's to say it's not out of context, or there's nothing more that's relivent in the book?

    • Re:Amazon... (Score:5, Interesting)

      by will_die ( 586523 ) on Friday October 24, 2003 @03:50AM (#7298196) Homepage
      It is really nice, I was using amazon right as they switched it on.
      I was searching for books on Object Role Modeling (ORM). I had first done a search for ORM and did not find anything of interest. They then switched it on while I did a search of 'Object Role Modeling'; this popped up a few books with the text where it was being used.
    • by AlecC ( 512609 ) <aleccawley@gmail.com> on Friday October 24, 2003 @04:23AM (#7298290)
      You can read the page it is on and +/- two pages.

      This is equivalent of the facility you have in a physical bookstore to open a book and browse a few pages before purchasing. I can see it might be very useful, if they get the majority of books in a field accessible like this.

      I wanted a PHP book the other day, and it is very difficult to decide which one of the plethora available I wanted. So I went to my physical bookstore. Smaller choice, but I could open each and get an impression of whether they were slow, detail by detail, dummies books or the sort of high-speed summary I wanted.
      • Free Books? (Score:3, Interesting)

        by Angram ( 517383 )
        If you can read 5 pages of text per search, couldn't you just continually search for a phrase on the 5th page, allowing you to read any book for free with a decent amount of effort?
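The "search for a phrase on the last visible page" idea can be put into rough numbers. The sketch below is only a simulation under stated assumptions: each search is assumed to reveal the hit page plus two pages on either side, every page is assumed to be indexed, and `searches_to_read` is a hypothetical helper, not anything Amazon exposes.

```python
def searches_to_read(total_pages: int, window: int = 2) -> int:
    """Count searches needed to see every page, starting from a hit on page 1.

    Assumption: one search shows the hit page and `window` pages on each side.
    The reader then searches for a phrase found on the last visible page.
    """
    if window < 1:
        raise ValueError("window must be at least 1 or the reader never advances")
    searches = 0
    hit = 1          # page where the current search phrase occurs
    last_seen = 0    # highest page number viewed so far
    while last_seen < total_pages:
        searches += 1
        last_seen = min(total_pages, hit + window)
        hit = last_seen  # next phrase comes from the last visible page
    return searches


if __name__ == "__main__":
    print(searches_to_read(300))  # 150 searches for a 300-page book
```

Under these assumptions each search only advances the reader by two new pages, so "a decent amount of effort" works out to roughly half as many searches as the book has pages.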
      • "This is equivalent of the facility you have in a physical bookstore to open a book and browse a few pages before purchasing."

        The difference being that you can't write a computer program to go to the bookstore and systematically find the contents of certain books for you while you sleep. You can't root a bunch of windows boxes and have those zombies doing the whole thing to a physical store's collection.

        I'm not saying that amazon's book text search feature is bad. I'm just saying it is different from

    • by hdparm ( 575302 )
      Don't know but your search for relivent will return zero results every time.
  • abuse (Score:4, Interesting)

    by technix4beos ( 471838 ) <cshaiku@gmail.com> on Friday October 24, 2003 @03:36AM (#7298153) Homepage Journal
    I can almost hear the screams of joy from the underground book pirates.

    How easy can this service be abused, with automatic webbots doing the searching?

    I can imagine there might be filters, time limits, and max searches/day limits for something of this scale, no?
    • Re:abuse (Score:5, Interesting)

      by Maskirovka ( 255712 ) on Friday October 24, 2003 @03:49AM (#7298192)
      How easy can this service be abused, with automatic webbots doing the searching?
      You can only browse two pages in either direction per search. You also have to be logged in. I suppose someone could script a system to create thousands of accounts, then use an army of zombie machines to OCR the pages from a variety of different IPs. That is assuming that Amazon has EVERY page of every book available to the service, which I doubt.

      It would probably be easier to cobble together a robot built around a laptop with an OCR-equipped camera and book manipulation software and set it loose in a big library at night. For 50 years.

    • Re:abuse (Score:5, Informative)

      by Enoch Root ( 57473 ) on Friday October 24, 2003 @03:59AM (#7298222)
      You 'almost', but not quite, hear the book pirates, most probably because they don't formally exist. ebooks are widely available in unencrypted format, and the latest releases, while in secure formats such as Secure MS Reader or Adobe, are probably much easier to crack than creating a bot to collect a book online page by page.

      ebooks are a pretty healthy alternative to normal books, but I don't see the publishers worrying too much about piracy. Perhaps it's because the average script kiddie who will spend 2 days downloading Matrix Reloaded from Usenet is just not the type to try and crack open a book, much less crack an ebook.
    • uhm, you must log-in with credit card data... ...will pirates risk so much for a book?
    • Re:abuse (Score:4, Insightful)

      by wfberg ( 24378 ) on Friday October 24, 2003 @04:47AM (#7298336)
      How easy can this service be abused, with automatic webbots doing the searching?

      Not so easily. It's easy to see why. The books will be scanned in using OCR. These days a fast and convenient and almost error-free process. But not entirely error-free. Good enough to find documents that are highly relevant to a particular keyword (if "hydraulics" occurs 9 times, what are the odds of OCR getting it wrong all 9 times?) but not good enough for entirely automated book-to-text.

      If amazon would display highlighted portions of the book's contents it would probably not exceed a few lines (just like google doesn't present entire webpages in its result screen). If they did want to show more, they'd have to show an image of the scanned in page anyway, since OCR errors would not be very pretty. (A lot of digital archiving products use a similar approach; they index PDF files that contain the OCR'ed text, invisible to the end-user, and the scanned pages as content which the end-user looks at).

      Besides, to search for each page of a book, you'd have to search for a keyword on each page of that book. Such keywords would most easily be extracted by scanning in the book via OCR anyway!
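The "what are the odds of OCR getting it wrong all 9 times?" argument can be worked through with a little arithmetic. This is a sketch under loud assumptions: OCR errors are treated as independent per word, and the 2% per-word error rate and 400-word page are illustrative figures, not numbers from Amazon or from the thread.

```python
def prob_all_missed(word_error_rate: float, occurrences: int) -> float:
    """Probability that every occurrence of a keyword is misread."""
    return word_error_rate ** occurrences


def prob_page_clean(word_error_rate: float, words_per_page: int) -> float:
    """Probability that a whole page comes through OCR with no errors."""
    return (1.0 - word_error_rate) ** words_per_page


if __name__ == "__main__":
    rate = 0.02  # assumed per-word OCR error rate
    print(f"keyword lost entirely: {prob_all_missed(rate, 9):.3g}")
    print(f"error-free page:       {prob_page_clean(rate, 400):.3g}")
```

With these assumed figures the keyword is essentially never lost, while an error-free page is rare, which is the asymmetry the comment relies on: good enough for indexing, not good enough for automated book-to-text.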
    • "Why won't Search Inside the Book let me see more pages from a specific book?

      Our Search Inside the Book feature is designed to help our customers discover new books and ensure that they'll be satisfied with their purchases. To be fair to the publishers and authors who participate in our program, we only allow Search Inside the Book users to read a portion of the book."
    • I was stuck when working on a problem set; I Googled for a while and found out that there's a bunch of helpful info in one particular problems and solutions book. Curious about the book, I went on Amazon, and lo and behold, I can actually read the book. So, I look at the table of contents, find the relevant section, and search for the heading of that section. I can now read two pages from it. Not a problem; just pick a phrase on the second page and use it as a search query. Lather, rinse, repeat.

      That,
  • please /. spell your titles right...! launches with an e!
  • It works!!! (Score:5, Funny)

    by jabbadabbadoo ( 599681 ) on Friday October 24, 2003 @03:38AM (#7298159)
    1) I typed 'porn'
    2) It returned a lot of results

    Conclusion: It works!!!

  • It looks like you can see the full page when you do a search. I wonder if searching for (Book Name) 1, (Book Name) 2..., would let you read the book page by page.
  • by Dancin_Santa ( 265275 ) <DancinSanta@gmail.com> on Friday October 24, 2003 @03:41AM (#7298169) Journal
    Back in the early days of the web, when Yahoo was still a catalog of links and not some super news/search/auction/ebusiness/do-it-all website that it is now, searches were much more fun.

    You really never knew what would turn up as you traversed the Yahoo directory structure. You start searching for blues music and you'd end up with a list of 15 or so good links with .wav samples and more than likely an artist you'd never heard of before. That was the best part, getting introduced to things you hadn't even thought to look for.

    As search techniques are becoming more refined, we are now able to do specific word searches on websites and now books. That's fine if you know exactly what you are looking for. For example if you want to get that book about 'replicants' you'll find Blade Runner, but you won't find anything else. You won't get any information except exactly the thing you are looking for.

    And I think that that is where the problem with this kind of search lies for books/music/etc. If you want to find a song or a book, it most likely isn't going to be a specific word you remember, it will be the tune or the plot, both of which are not searchable.

    I don't see this improvement in Amazon's search system as that much of an improvement. A better improvement could be made to the 'We thought you'd like' feature. Instead of finding only what I'm looking for, I'd like to find other things I might also be interested in.
    • all that "related links", "we thought you'd like", "here's what other people searched for" BS just gets in the way.

      It'd be okay so long as you could turn it off - search in "pure" mode or "I don't know what I want but I'll recognise it when I see it" mode.
    • This is a third hand story, but I am going to tell it anyway.

      There was a chap, at our University possibly, that started a PhD in Computer Science. The subject matter isn't that important, but the method with which he conducted his research was truly novel.

      One of the first things he did was go to the library and start by going at random to bookshelves and picking out a book. He would then flick to a random page and start reading. After a week he had amassed a set of random ideas for random subject areas, cho
    • Sounds like a modern version of "back in my day..."

      And if you haven't noticed, Amazon has been doing the "here's things you might be interested in" thing for a long, long time. If anything, that's the novel and useful technology for which they would deserve a patent.
    • Have you actually ever been to Amazon? What you say it lacks is what I like best about it.

      1. After a search, it gives you a list of "Customers who bought this also bought:". For instance, see this [amazon.ca].

      2. They have the concept of "Listmania" which allows every user to create a list of their own recommended products. If your search aligns with their list, Amazon will suggest that you look at it. Search for something you want and keep an eye open for the listmania section.

      Doesn't this meet your criteria for

    • That's fine if you know exactly what you are looking for. For example if you want to get that book about 'replicants' you'll find Blade Runner, but you won't find anything else.

      Just like search parties come home from the woods with the missing person, instead of ten random other people who also look like great guys. Searching is done for specific things, browsing might be more what you're after.

      it most likely isn't going to be a specific word you remember, it will be the tune or the plot

      I've search

    • If you want to find a song or a book, it most likely isn't going to be a specific word you remember, it will be the tune or the plot, both of which are not searchable.

      As far as music goes, you can search for Parsons code [name-this-tune.com].
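For readers who have not met Parsons code: it reduces a melody to its contour, writing `*` for the first note and then `U`, `D` or `R` for each note that goes up, down, or repeats. A minimal sketch follows; using MIDI note numbers as input is an assumption made for the example, and `parsons_code` is a hypothetical helper name.

```python
def parsons_code(pitches):
    """Return the Parsons code (melodic contour) for a sequence of pitches."""
    if not pitches:
        return ""
    code = ["*"]  # the first note has no predecessor to compare with
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            code.append("U")   # up
        elif cur < prev:
            code.append("D")   # down
        else:
            code.append("R")   # repeat
    return "".join(code)


if __name__ == "__main__":
    # Opening of "Twinkle, Twinkle, Little Star" as MIDI note numbers.
    twinkle = [60, 60, 67, 67, 69, 69, 67, 65, 65, 64, 64, 62, 62, 60]
    print(parsons_code(twinkle))  # *RURURDDRDRDRD
```

Because only the direction of each step is kept, the same string comes out regardless of key or exact intervals, which is what makes a half-remembered tune searchable.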
  • by Anonymous Coward on Friday October 24, 2003 @03:42AM (#7298170)
    I remember a teacher once telling a class I was in that our essays may be compared to other essays published online to check for plagiarism.

    Granted, Amazon.com's feature will only (for now) include 150,000 books, but this may very well be another way to catch plagiarizers. Just type in a suspicious phrase and see if there are any 'hits'.
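The "type in a suspicious phrase and see if there are any hits" check amounts to exact-phrase matching after normalizing case and punctuation. Here is a small sketch over an in-memory corpus; the corpus dictionary and the `find_phrase` helper are hypothetical stand-ins, since the thread describes no programmatic interface to Amazon's search.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and collapse it to space-separated words."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def find_phrase(phrase: str, corpus: dict) -> list:
    """Return titles whose text contains the phrase as whole consecutive words."""
    needle = normalize(phrase)
    if not needle:
        return []
    padded = f" {needle} "  # padding prevents matches inside longer words
    return [title for title, text in corpus.items()
            if padded in f" {normalize(text)} "]


if __name__ == "__main__":
    corpus = {
        "Book A": "It was a bright cold day in April, and the clocks were striking thirteen.",
        "Book B": "The clock struck one.",
    }
    print(find_phrase("The clocks were striking thirteen", corpus))  # ['Book A']
```

A real plagiarism check would also need to tolerate small rewordings, which exact matching like this does not attempt.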
  • by theodp ( 442580 ) on Friday October 24, 2003 @03:43AM (#7298174)
    Even though he said he was 'blown away' by Amazon's new Search Inside the Book feature, Tim O'Reilly has decided not to participate in the program [wsj.com] for now. 'If they end up being a Google for published content...we need to think better about what publishers get out of it,' he said.
    • by Zeddicus_Z ( 214454 ) on Friday October 24, 2003 @05:18AM (#7298436) Homepage
      As a Safari subscriber, I'd say it's probably because Full Text Search of online book content is also present at O'Reilly's own Safari [oreilly.com] online tech book site. You've been able to do the same thing Amazon is now crowing about, on every book Safari has, since launch quite some time ago (year or two perhaps?)

      Safari is more of a "service" (i.e. renting access to book content) than a "feature" of a retail website, which is all Amazon's "innovation" seems to be.

      Basically the only real difference between the two (aside from what is cited above) is that Amazon just lets you know the content is mentioned, and shows you a page or two. Safari gives you the entire book. That and that Amazon has a much wider range of books in non-tech genres
  • The editors of the world reknowned Slashdot has recently proven to the wrld that they are unable to correct small spelling mistackes and grammar issues.
  • Now I can cut+paste my homeworks! yay!
  • by Bushcat ( 615449 ) on Friday October 24, 2003 @03:55AM (#7298211)
    As the digital index builds up, we will rapidly come across the situation where the electronic book is searchable, but the printed form is out of print. If this service ultimately allows single copies to be printed for delivery, it will be an outstanding demonstration of print-on-demand technology as advocated by the Print On Demand Initiative [podi.org] and others.

    I'd love to be able to browse a giant back catalog, knowing that an original or facsimile copy could definitely be delivered to me.

  • by burtonator ( 70115 ) on Friday October 24, 2003 @03:56AM (#7298212)
    In other news... Amazon announced that the USPTO has granted them a patent on their proprietary "one click search" technology.

    When questioned for comment Google CEO Eric Schmidt said "ug".

    • by dracocat ( 554744 )
      Yeah, funny. But, I would bet that there will be a patent on this. I would also bet it has already been applied for. I mean really, this is actually really innovative for them, there must be something patentable in this.

      Anyways, stay tuned, I believe the Patent Office takes about a year these days to issue a patent?

      The story will of course run here on slashdot.
  • by Anonymous Coward on Friday October 24, 2003 @04:00AM (#7298226)
    Youth in the old days: lookup 'vagina' in a dictionary.
    Youth nowadays: lookup 'vagina' in all books on this planet.
  • Wow! (Score:5, Informative)

    by plasticmillion ( 649623 ) <matthew@allpeers.com> on Friday October 24, 2003 @04:01AM (#7298227) Homepage
    I'm impressed. A couple of days ago I went onto Amazon to find books about Singular Value Decompositions (a mathematical technique that can be used for efficient statistical analysis of large groups of documents, among other things). I wasn't particularly surprised when it returned 0 results, since anyone who puts the term "Singular Value Decomposition" in their book's title obviously doesn't know much about marketing. Of course I don't actually give a damn if the term is in the title or not; I just want to know if the book talks about this technique.

    I tried the search again today and got nearly 5,000 results, with the capability to actually look inside the book and see if the reference is useful to me. Very impressive indeed, patent or no patent.

  • by emcron ( 455054 ) * on Friday October 24, 2003 @04:07AM (#7298246)

    Bash Amazon all you want, but this is a very useful technology.

    In five minutes I was able to find three books that talked about findings first listed in two of my own published scientific papers, yet these books did not cite me, or anyone else, as the source of that information. My lawyer is currently preparing three letters.

    I also found two other books in which the author used verbatim quotes and original theories from various interviews I have given, yet both authors passed off the statements as their own. My lawyer is now preparing five letters.

    Aside from being used to protect my own research rights, I have found the search system useful for finding topics of interest discussed in certain books which are not referenced in any of the descriptions about the books. I just ordered three books I would not otherwise have ever purchased.

    While I don't think highly of all of Amazon's practices, I must hand it to them for whatever technical undertaking created this search feature.
    • In five minutes I was able to find three books that talked about findings first listed in two of my own published scientific papers, yet these books did not cite me, or anyone else, as the source of that information. My lawyer is currently preparing three letters.
      ...that not everybody reads your papers? You should definitely contact the authors first. They might have had the same ideas by themselves.
  • I've tried searching for the following.
    1. Rincewind (Character in Terry Pratchett books)
    The only references I could find were books about fantasy/sci fi fiction and no book extracts from TP's books.
    2. Various other British and US authors, again no extracts.
    3. The first line of 1984. Success I found extracts, but only the first page and the cover. Also the extract seemed only to exist for one version of the book, not all copies.

    This seems to be a good idea for academia/study books but, at the moment at least
    • Funny you chose the first line of 1984... that was the very first search I did there! The second was to look for the phrase 'color of a television, tuned to a dead channel', which is part of the first sentence of Gibson's Neuromancer. No luck. Of course, search for Neuromancer, and you're taken right to it.

      I think you're spot on with the observation that there is a lack of participation by fiction authors.
  • I tried searching for an exact phrase but it doesn't work - you just get lots of matches on the individual words. It is also very broken if you use quotation marks.
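The difference between matching the individual words and matching the exact phrase can be shown with a small positional index. This is a generic sketch of the technique, not a description of how Amazon's search is built; the function names are hypothetical and tokenization is a plain whitespace split with no punctuation handling.

```python
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Map each word to {doc_id: [positions]} so word order can be checked."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def match_words(index: dict, query: str) -> set:
    """Loose query: documents containing every query word, in any order."""
    words = query.lower().split()
    if not words:
        return set()
    per_word = [set(index[w]) if w in index else set() for w in words]
    return set.intersection(*per_word)


def match_phrase(index: dict, query: str) -> set:
    """Strict query: documents containing the words consecutively, in order."""
    words = query.lower().split()
    hits = set()
    for doc_id in match_words(index, query):
        for start in index[words[0]][doc_id]:
            if all(start + i in index[w][doc_id] for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits


if __name__ == "__main__":
    docs = {
        "a": "object role modeling in practice",
        "b": "the role of an object in modeling",
    }
    idx = build_index(docs)
    print(sorted(match_words(idx, "object role modeling")))   # ['a', 'b']
    print(sorted(match_phrase(idx, "object role modeling")))  # ['a']
```

A search that only runs the loose query returns both documents, which would look like the "lots of matches on the individual words" behaviour described above.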
  • by Enoch Root ( 57473 ) on Friday October 24, 2003 @04:09AM (#7298255)
    I warmly welcome any initiative that makes more and more books, or parts thereof, available online.

    I used to think, like many people, that ebooks just didn't work because 'I like the feel of paper under my fingers'. Since I bought a PDA and discovered the joys of Fictionwise [fictionwise.com], I just can't go back to these clumsy wood pulp apparels.

    Amazon is pretty progressive in this regard, making a great number of their collection available electronically. It was probably fairly easy from there to make their stock searchable. And how I wish the MPAA and RIAA could work like the publishing industry...

    The existence of ebooks is NOT threatening traditional books, because people see more value in a printed book over an electronic copy. This is clearly not the case with a CD and a DVD, since most people couldn't care less about the jacket if they have the goods on the CD/DVD. I wish the MPAA and RIAA would understand how to make traditional CDs and DVDs "value-added", and make people less inclined to getting a computer file instead of shelling out the money.

    Then again, I guess the case with ebooks is that your typical DVD or CD pirate is just not interested in swapping files to get the latest Stephen King and read it on screen. Not only that, but most of History's greatest books are available for free, and one could probably read free books for the rest of their lives if they chose so.
    • by Knetzar ( 698216 )
      Some people would say that "most of History's greatest" music is also available for free. I for one prefer modern music over Bach, but the classics are free.
      • Re:ebooks vs CD/DVD (Score:3, Interesting)

        by Enoch Root ( 57473 )
        The music itself is free, but quality recordings on MP3s are still being sold only as CDs, and are NOT available online for free. Thus, if I want to find quality classical music, I need to find a good recording, and that implies either piracy or buying a CD.

        Whereas if I want H.G. Wells' The Time Machine, I am one click away [ebooks3.com] from a quality MS Reader version of it.
    • I used to think, like many people, that ebooks just didn't work

      Yes, same here. But that was until I actually used an e-book :) Granted, e-books are not perfect for all situations, and there are times when a printed version is better. I prefer print version when I have to mark the text and reuse it (citing etc.), since I find it easier to skim through the printed pages than to look things up in the PDF reader.

      And how I wish the MPAA and RIAA could work like the publishing industry.

      Well, seeing how r
  • From the FAQ:
    Why do I need to enter a credit card number?

    We require credit card information for security purposes only. We will not charge your credit card account any fees for using the Search Inside the Book feature.

    Uhuh. Security. Whose?

    Yeah, I want to be financially secure too !

    • When you have such a vast array of information at your fingertips, there is a potential for mischief.
      You could have robots trolling this section all day.
      Uhuh. Security. Whose?

      What's your point? You think Amazon is a dishonest porn site that takes your credit card information and disappears the next day?
      If that's your mentality, how are you surfing the web?

      Yeah, I want to be financially secure too !

      What the fsck's your point man? What does amazon demanding your credit card number for security have to do

    • Obviously Amazon is aware of all of the "Mickey Mouse" and "Slashdot" type accounts that the New York Times garners. I would assume that Amazon's intent is that by requesting some information that you would not be prepared to share with others they can avoid this, and thus prevent some abuse. Let's face it, Amazon isn't some dodgy peddler of porn and pills that trades from a different URL each week, plus if you have got an account, it was probably to order something, which means they already *have* your c
    • Scripting/Automating creation of new accounts: easy
      Creating new, unique credit cards associated with those accounts: hard

      Seems like a mechanism for keeping any one entity from being able to view all pages from a single book.
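Tying page views to a hard-to-forge identity only helps if views are counted per identity. A minimal sketch of such a cap is below; the `PageQuota` class and the limit of 20 pages are assumptions for illustration, since the thread only says users may read "a portion of the book".

```python
from collections import defaultdict


class PageQuota:
    """Track distinct pages viewed per account and enforce a per-book cap."""

    def __init__(self, max_pages: int):
        self.max_pages = max_pages
        self.seen = defaultdict(set)  # account -> set of page numbers viewed

    def request(self, account: str, page: int) -> bool:
        """Return True if the account may view the page, recording the view."""
        pages = self.seen[account]
        if page in pages:
            return True   # re-reading an already viewed page is free
        if len(pages) >= self.max_pages:
            return False  # cap reached, refuse new pages
        pages.add(page)
        return True


if __name__ == "__main__":
    quota = PageQuota(max_pages=20)
    allowed = [quota.request("alice", p) for p in range(1, 26)]
    print(allowed.count(True))  # 20 pages allowed, the remaining 5 refused
```

With a cap like this, walking a whole book requires many distinct accounts, and each account requires its own credit card, which is the "hard" step noted above.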
  • by mike_lynn ( 463952 ) on Friday October 24, 2003 @04:34AM (#7298315)
    You have to have an account to view the pages. Fine, great. But then it brought up this screen:

    By publishers' agreement, we are pleased to offer Amazon.com customers with a valid credit card the ability to view copyrighted pages.
    Your account will not be charged.
    This one-time process enables you to view limited copyrighted material through our Search Inside the Book feature.


    So they'll let you browse the search pages, if you can prove your identity on record and provide them with financial information. No thanks.
    • It sounds more like they want to limit it to paying customers, and give non-customers at least one hoop through which to jump. If you've ordered even once with Amazon, you're set. If everyone in the world could do the text search without any requirement, their servers would melt down. In fact, I doubt they really expect many to provide a CC just to do a search. It seems more like an expanded feature for previous customers.

      I plan to check it out for engineering texts, but I suspect my local (relative to Am

  • I was always under the impression that Amazon was so successful because most of the business is handled by the computers running the site and sales, and they only needed people to work in their warehouses, on their website/software, and some to handle customer support (as opposed to maintaining a chain of retail stores around the world).

    But now, this seems to be something that would require an army of people to handle. And that's not to mention the hardware/facilities needed to create this database.

    So how
  • As I read this rather interesting post, I am trying to figure out why Amazon took this route rather than the many many routes available to them to publicise or provide a richer experience to the average Joe buyer...

    Even with a full text search facility I doubt very much if it can come close to matching the experience of flipping through a book at the local book store no matter how effective the searching facility.

    I can think of one reason and that has already been mentioned by a few /.ers here i.e., the a
    • I can't recite the numbers, but I know that far more non-fiction books are published each year. This new searching technique is wonderful for spot-checking specific topics that you need a book to cover. I think it's a wonderful service that has already affected my purchases.
  • Scanner problems (Score:4, Interesting)

    by thrill12 ( 711899 ) on Friday October 24, 2003 @05:13AM (#7298421) Journal
    Neat idea, but some excerpts come out all wrong:
    See this for example... [amazon.com]
    Mass-OCR'ing has its drawbacks..
  • by kfg ( 145172 ) on Friday October 24, 2003 @05:16AM (#7298429)
    "grep."

    I believe there is a body of prior art for scanning in books and grepping them. Is that not one of the oft repeated benefits of ebooks?

    Whether or not Amazon can get a patent on a shell script to serve up the results . . . on the web oooooooo, remains to be seen I suppose.

    They managed to get one on "Give me one of those, put it on my account and drop it by my house" a "technology" my grocer has been offering over the phone for 40 years that I'm personally aware of.

    However, since this sort of "technology" is exactly the sort of thing that the web, and the internet itself for that matter, was invented for I'd have to guess there's a lot of prior art. It's certainly obvious and trivial, but that doesn't seem to count for much these days.

    The problem with things that are so obvious and trivial that "everyone" has been doing it for decades is that it's hard to demonstrate in court because no one bothers to document it.

    Can you prove your grandfather put his pants on one leg at a time?

    Common sense tells you he did, but common sense no longer applies in an age that grants patents to perpetual motion machines and peanut butter sandwiches.

    KFG
  • Searching a single book- an index

    Searching multiple books- a card catalog

    Not to mention a dozen or so library cataloging systems, at least- especially research quality ones. That being said, this sounds like an awesome feature, and I applaud Amazon for putting it together.
  • I wonder if a patent is in the works." Or if a patent is already owned.


    Like that would stop them from trying to patent it again.
  • Or maybe a first grade spelling book, which says that the third-person verb conjugation of "launch" is spelled "launches", not "launchs."

    I mean, c'mon, it's the first thing I noticed when I fired up slashdot this morning. Have people just stopped paying attention to spelling around here?
  • by Enigmia Man ( 320896 ) on Friday October 24, 2003 @06:31AM (#7298674)
    Article [wired.com] in December Wired talks about Amazon's book scanning, how they legally do it, who does it, how many books so far, and protections.
  • by s88 ( 255181 ) on Friday October 24, 2003 @07:57AM (#7298966) Homepage
    A full text search of slashdot, so the editors can search for duplicate articles before they post.

    Scott
  • Biased? (Score:2, Interesting)

    by nimrod_me ( 650667 )
    Since this feature is not available for all books (and, in fact, wasn't available for any book in my wishlist) the results are necessarily very biased.

    Thus, your searches will tend to return more results from books that are fully indexed.

    Now that I think about it - this is a major incentive for publishers to get their books indexed.

  • by dpbsmith ( 263124 ) on Friday October 24, 2003 @08:24AM (#7299058) Homepage
    ...are included in the search?

    A check on "the clocks were striking thirteen" yields seventeen hits, including the Cliff's Notes to Nineteen Eighty-Four and a reference in the Oxford Dictionary of Modern Quotations...

    but none to Orwell's Nineteen Eighty-Four itself.

    We must conclude that the coverage is spotty.
  • by op00to ( 219949 ) on Friday October 24, 2003 @09:12AM (#7299358)
    What a feat of computing genius! Using computers to search through large bodies of text!!!! Has ANYONE ever done this before?!
  • I blogged this yesterday. I found that it's relatively easy to copy entire books (time-consuming, but easy), using this new service.

    Read all about it here: http://www.nettle.com/archives/000062.html [nettle.com]

  • by saddino ( 183491 ) on Friday October 24, 2003 @11:21AM (#7300575)
    Or if a patent is already owned.

    This type of editorializing is pathetic in that its only purpose is to stir up the masses. Gee...now let's take a look shall we? 20% of the comments are "patents suck" or "isn't this some example of prior art"?

    This story is about a new feature people...it's not about a patent. Wipe the froth from your mouths and comment on the merits (or lack of) the feature...not on a completely fabricated hypothetical comment meant to incite you into a frenzy.
  • This does sound like a useful service. Hooray!

    But technically this isn't impressive. I worked on programs that did full-text document searches about 20 years ago, and they weren't new then. So simply doing full-text searches in documents is just no big deal. But what about the large number of books, you say? Actually, that's nothing more than what they already do. I believe the scale of website text far exceeds the scale of the book text that they can search. The Wikipedia [wikipedia.org] is simply one of millions o

  • http://www.amazon.com/exec/obidos/tg/browse/-/10197041/104-8011833-2636716

    In particular:

    Why won't Search Inside the Book let me see more pages from a specific book?

    Our Search Inside the Book feature is designed to help our customers discover new books and ensure that they'll be satisfied with their purchases. To be fair to the publishers and authors who participate in our program, we only allow Search Inside the Book users to read a portion of the book.

    So, you can only see some % of the book's pages as

