Lucene and SOLR Get Commercial Support

Posted by ScuttleMonkey
from the foss-going-mainstream dept.
ruphus13 writes "Two of the technical leads and core committers of the Lucene Project have launched Lucid Imagination, a venture-backed company now offering commercial versions of Lucene and SOLR in the hopes of making it the de facto choice of search technologies used by companies within their products. 'The Lucene search library ranks amongst the top 5 Apache projects, installed at over 4,000 global companies. Although OStatic is primarily Drupal-based, our site's search is based on Lucene. According to Lucid Imagination officials, the Solr search server, which transforms the Lucene search library into a ready-to-use search platform for building applications, is the fastest growing Lucene sub-project...Lucid's business model is roughly comparable to Red Hat's very successful model, in that it centers on support and services for free, open source software.'"
This discussion has been archived. No new comments can be posted.

  • oookay. (Score:5, Insightful)

    by girlintraining (1395911) on Friday January 30, 2009 @06:32PM (#26673057)

    Nice press release but.. what does it do? O_o Five million dollars and they couldn't even buy a one-sentence description of their product. Standards are slipping.

    • Re:oookay. (Score:4, Insightful)

      by Azar (56604) on Friday January 30, 2009 @06:40PM (#26673153) Homepage

      "...in the hopes of making it the de facto choice of search technologies used by companies within their products. 'The Lucene search library ranks amongst the top 5 Apache projects... According to Lucid Imagination officials, the Solr search server, which transforms the Lucene search library into a ready-to-use search platform for building applications...'"

      I agree, it could have been more explicit in giving a brief description, but was it really that difficult to glean what it does from the summary?

      • Re: (Score:1, Insightful)

        by Chabo (880571)

        I read the summary twice and it just made my head spin.

        There's a big presumption in the summary that we've heard of Lucene before. I don't even know what they do. Do they search... the web? ...your LAN? ...your desktop?

        • by sonsonete (473442)

          > I don't even know what they do. Do they search... the web? ...your LAN? ...your desktop?

          In short: yes.

          1. Lucene can be set up to search just about anything—the web, a network, your desktop, a database, or anything else you can tell it to read.
          2. Solr provides a web interface to Lucene.
          3. Lucid Imagination contributes to the Lucene and Solr projects and provides commercial support for users of the software.
    • Re: (Score:3, Informative)

      by FooBarWidget (556006)

      Lucene is a full-text indexer and search library. Solr is a full-text indexer and search server, based on Lucene.

    • full-text search (Score:5, Informative)

      by CarpetShark (865376) on Friday January 30, 2009 @07:18PM (#26673493)

      > Nice press release but.. what does it do?

      You mentioned SQL SELECTs elsewhere. Full-text search isn't like a SELECT. It's more like what happens when you google something: many documents are searched in a split second, and complex queries are possible, such as documents containing one phrase but not another, documents that mention X with Y within a few sentences of it, or documents that mention X and Y but not Z. Yes, SQL lets you do that, but not for text, except in very inefficient ways.

      From what I've seen of it (which is very little), Lucene lets you, as a programmer, index data using your own field names. So, say you're indexing Word documents and HTML documents. You can extract most of the text and index it as "maincontent", but separately extract the author, title and subtitle, indexing those individually. This lets you query attributes, like: "space nasa and not genre:sci-fi". Full-text search also does ranking based on the occurrences of the different words you query by, etc. Presumably Lucene would let you specify which fields/attributes are included in a search, and which ones have the highest scores in search results, for instance.
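
      To make the fielded-index idea concrete, here is a minimal sketch in Python. It is a toy inverted index, not Lucene's API: the `TinyIndex` class, its `add` and `search` methods, and the sample documents are all made up for illustration, and it does no ranking at all.

```python
from collections import defaultdict


class TinyIndex:
    """Toy fielded inverted index (hypothetical; not Lucene's API)."""

    def __init__(self):
        # Maps (field, term) -> set of document ids containing that term.
        self.postings = defaultdict(set)
        self.doc_ids = set()

    def add(self, doc_id, **fields):
        """Index each field's text separately, split on whitespace."""
        self.doc_ids.add(doc_id)
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[(field, term)].add(doc_id)

    def search(self, must=(), must_not=(), default_field="maincontent"):
        """Return sorted ids matching every `must` clause and no `must_not` clause.

        A clause is either "term" (looked up in default_field) or "field:term".
        """
        def lookup(clause):
            field, _, term = clause.rpartition(":")
            return self.postings.get((field or default_field, term.lower()), set())

        hits = set(self.doc_ids)
        for clause in must:
            hits &= lookup(clause)
        for clause in must_not:
            hits -= lookup(clause)
        return sorted(hits)


index = TinyIndex()
index.add(1, maincontent="NASA launches new space probe", genre="news")
index.add(2, maincontent="Space pirates attack NASA base", genre="sci-fi")
index.add(3, maincontent="Cooking in space", genre="news")

# Roughly "space nasa and not genre:sci-fi"
result = index.search(must=["space", "nasa"], must_not=["genre:sci-fi"])
print(result)
```

      Each lookup is a dictionary hit plus set operations, which is why this kind of query stays fast as the collection grows, unlike scanning every row's text.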

      Yeah, I don't get where $5m USD went on that either. I didn't think it was THAT big a problem. But maybe it is. Personally, I'm holding out for a decent Triple API, which will hopefully make all but the indexer of this obsolete.

      • Yeah, I don't get where $5m USD went on that either. I didn't think it was THAT big a problem.

        Getting it right, and doing it as well as Lucene does (which is spectacularly well), really is THAT big a problem.

      • Re: (Score:2, Informative)

        by Wokan (14062)

        > Nice press release but.. what does it do?

        > From what I've seen of it (which is very little), Lucene lets you, as a programmer, index data using your own field names. So, say you're indexing Word documents and HTML documents. You can extract most of the text and index it as "maincontent", but separately extract the author, title and subtitle, indexing those individually. This lets you query attributes, like: "space nasa and not genre:sci-fi". Full-text search also does ranking based on the occurrences of the different words you query by, etc. Presumably Lucene would let you specify which fields/attributes are included in a search, and which ones have the highest scores in search results, for instance.

        You've certainly hit close to the mark. I work on a site that uses Solr and it does work just as incredibly as others have said. You can tell it what fields you want to search. You can tell it what order you want results sorted in (and you can sort on more than one column in cases of relevancy ties). You can tell it you want matches in one column weighted more than another. You can tell it you want the terms to be within X words of each other. And you can tell it what words should not be in the result

    • One of the most interesting fields where Lucene is useful (probably also for you) is Wikipedia. Remember how painful it was to search for something on Wikipedia some months ago?

      Well now, thanks to Lucene, Wikipedia (and its sister projects) doesn't have to use the built-in MediaWiki search engine (which really is crappy). Probably the best feature Lucene brings is "Did you mean ...". Google is still better, but Lucene was a big step for Wikipedia.

      • Wikipedia has been using Lucene for a few years by now. The recent changes were improvements to how it was used, but it was being used the whole time. Out of the box, MediaWiki uses whatever fulltext search is available from the DBMS being used -- in MySQL's case, that means using MyISAM, which is impossible for a site the size of Wikipedia (all selects, updates, deletes, etc. take out table-level locks).

  • Talk at the water cooler was that Sun was taking an interest in them to expand their open source catalog. All in all, they're probably a lot better off going it alone in the current market. With companies looking to save money by going open source, it's a great time for OS support.
    • That and possible large government projects. With Obama wanting more government projects and more transparency while also saving money, OSS is a great way to do it, and I believe Sun has already written to Obama about switching to all OSS. So Sun wanting to acquire more OSS vendors certainly makes sense.
  • Nice going for Lucene (LGPL?), although I've preferred Xapian (GPL) in the past (with Python bindings).

    Good to have choice, i guess.
    • by Krischi (61667) on Saturday January 31, 2009 @01:20AM (#26675349) Homepage

      I agree, Xapian is nice, and we considered it for a while. However, in the end, the decision was made to use SOLR because of one overriding factor in its favor: it takes care of all the nasty details to enable concurrent access, which makes developing web applications just so much easier. With SOLR you just don't have to worry about who might currently be reading or writing to the index, and the index replication features are very powerful, too.

      That, and facet searches are very nice, too (e.g., searching for a keyword and then automatically displaying the # of hits per category, and refining per category).
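
      For anyone who hasn't seen faceting, the idea can be sketched in a few lines of Python. This is only an illustration of the concept, not Solr's implementation or API; the `facet_counts` and `refine` helpers and the sample documents are hypothetical.

```python
from collections import Counter


def facet_counts(docs, keyword, facet_field):
    """Return (matching docs, hits per facet value) for a keyword search."""
    matches = [d for d in docs if keyword.lower() in d["text"].lower().split()]
    counts = Counter(d[facet_field] for d in matches)
    return matches, counts


def refine(matches, facet_field, value):
    """Narrow an existing result set to one facet value."""
    return [d for d in matches if d[facet_field] == value]


docs = [
    {"text": "lucene search library", "category": "software"},
    {"text": "solr search server", "category": "software"},
    {"text": "search and rescue dogs", "category": "pets"},
    {"text": "carpet cleaning tips", "category": "home"},
]

matches, counts = facet_counts(docs, "search", "category")
print(dict(counts))        # hits per category for the keyword
pets_only = refine(matches, "category", "pets")
print(pets_only)
```

      The counts are what you show next to each category link; clicking a category is just the `refine` step on the same result set.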

      SOLR has Python bindings, too, by the way. They currently are not in the official repository, but recently maintenance on them has picked up, and they work in a very Pythonic way.

  • by merreborn (853723) on Friday January 30, 2009 @07:01PM (#26673347) Journal

    We're currently using the Zend PHP port of Lucene. It was nice, because we were able to use all our existing code for loading our PHP objects from the database for indexing. It worked fine, as long as our indexes stayed small.

    Now we have several indexes weighing in at around 300+ megabytes, and Zend Lucene has proven to be absolute crap. It takes seconds of CPU time, and hundreds of megs of RAM to process simple queries against these indexes. When tested in Luke [getopt.org], the same queries against the same indexes finish in milliseconds with minimal memory usage. Either the Zend port, or PHP itself is clearly unsuitable for production use on large indexes.

    Either way, we're going to switch it out for Solr ASAP, and we anticipate the development overhead should be minimal -- we'll keep using the same code to load our objects, and pass them to Solr via JSON.

    • by Sentry21 (8183)

      > Either the Zend port, or PHP itself is clearly unsuitable for production use on large indexes.

      You phrase this in such a way as to imply an exclusion, when really both are often true. We've ported our PHP application to Rails (which provides a different, but workable, set of problems), and we've rid ourselves of the Zend engine in return for Ferret; I'm a proponent of replacing that with SOLR, but we've yet to go down that path.

    • by WoLpH (699064) on Friday January 30, 2009 @07:32PM (#26673581)

      That's because the Zend Lucene library is written in pure PHP, ergo... _really_ slow. Either use a C module or switch to SOLR to make it fast. In my simple tests the Python Lucene libraries were about 100-500 times faster than the Zend PHP version; it's really one of the worst Lucene libraries around (in terms of speed).

      • Re: (Score:1, Informative)

        by Anonymous Coward

        I found the original Java libraries to be plenty fast as well. We index millions of records, and it's always been plenty fast returning even the most complex queries. Granted, it probably isn't as fast as the C library, but it is the most updated and feature rich. And many of the later features that the C library lacks make it COMPLETELY worth it.

      • Yes, the PHP version of Lucene is very, very slow (very).

        I recently switched to Sphinx (http://www.sphinxsearch.com/). It's written in C, compiles nicely on my Linux servers, indexes documents at crazy speeds, and there are piles of options.

        I highly recommend it (we use it on a vertical search engine for one of our sites that handles 200,000 queries a day).

        • by tcopeland (32225)

          > I recently switched to Sphinx (http://www.sphinxsearch.com/). It's written in C

          Minor nit - it's in C++. But yeah, it's totally awesome - fast when indexing, easy to scale horizontally, powerful query language, custom stop word lists, etc, etc. The APIs (I use the Ruby one, Riddle) make it easy to do nifty excerpt formatting (for example, note the highlighting around the word 'battle' [militarypr...glists.com]), and there are a couple of different ways to integrate it into a Ruby on Rails app.

          Speaking of Sphinx and Rails, here's

        • by Ythan (525808)
          As a satisfied user I just wanted to give another shoutout to Sphinx. It really is fantastic, better than Lucene if you want something lightweight and easy to configure, and the speed and relevance of search results are excellent. Commercial support is available and it's being used on Craigslist and The Pirate Bay among other notable sites. Anyone who's struggling with MySQL's anemic fulltext search would do well to give it a look.
    • What I ended up doing for various webapps (PHP and Python, although Python's port of Lucene actually loads the Java runtime, and is fairly fast) is create a simple local server that a PHP script can communicate with over sockets and a trivial protocol.

      This is fairly straightforward for me since most of the time I just want Lucene to return a list of document IDs. I use those IDs to create a temp table that I can do additional queries against in SQL.
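
      The ID-to-temp-table step might look roughly like the sketch below, written in Python with SQLite purely for illustration. The `documents` table, its columns, the `search_hits` temp table, and the `rows_for_hits` helper are all assumptions, and the list of IDs stands in for whatever the search server returns.

```python
import sqlite3


def rows_for_hits(conn, hit_ids, min_year):
    """Load search hits into a temp table, then filter and order them in SQL."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS search_hits "
        "(doc_id INTEGER PRIMARY KEY, rank INTEGER)"
    )
    cur.execute("DELETE FROM search_hits")  # clear hits from any earlier query
    cur.executemany(
        "INSERT INTO search_hits (doc_id, rank) VALUES (?, ?)",
        [(doc_id, rank) for rank, doc_id in enumerate(hit_ids)],
    )
    # Additional SQL-side filtering, keeping the search engine's ordering.
    cur.execute(
        "SELECT d.id, d.title FROM documents d "
        "JOIN search_hits h ON h.doc_id = d.id "
        "WHERE d.year >= ? ORDER BY h.rank",
        (min_year,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [(1, "Lucene intro", 2007), (2, "Solr faceting", 2008), (3, "Sphinx setup", 2009)],
)

hit_ids = [3, 1, 2]  # stand-in for IDs returned by the search server, best match first
rows = rows_for_hits(conn, hit_ids, min_year=2008)
print(rows)
```

      Keeping a rank column in the temp table preserves the relevance order from the search engine while the database handles the rest of the filtering.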

      Running it as a separate server allows me to use the origin

  • I've heard great things about Lucene (a guy at the company I used to work for swears by it; he used it for everything from searching B2B stores to biological indexing). Both Hibernate [hibernate.org] and Spring [java.net] have support for this library.

    I'm looking into adding search on my site so I should probably check it out. There's a new "In Action" [amazon.com] book out for using the Hibernate Lucene add-on -- I might have to pick that up.
