Microsoft Open-Sources a Crucial Algorithm Behind Its Bing Search Services (techcrunch.com) 55
An anonymous reader quotes a report from TechCrunch: Microsoft today announced that it has open-sourced a key piece of what makes its Bing search services able to quickly return search results to its users. By making this technology open, the company hopes that developers will be able to build similar experiences for their users in other domains where users search through vast data troves, including in retail, though in this age of abundant data, chances are developers will find plenty of other enterprise and consumer use cases, too. The piece of software the company open-sourced today is a library Microsoft developed to make better use of all the data it collected and AI models it built for Bing.
With the Space Partition Tree and Graph (SPTAG) algorithm that is at the core of the open-sourced Python library, Microsoft is able to search through billions of pieces of information in milliseconds. Vector search itself isn't a new idea, of course. What Microsoft has done, though, is apply this concept to working with deep learning models. First, the team takes a pre-trained model and encodes that data into vectors, where every vector represents a word or pixel. Using the new SPTAG library, it then generates a vector index. As queries come in, the deep learning model translates that text or image into a vector and the library finds the most related vectors in that index. The library is now available under the MIT license and provides all of the tools to build and search these distributed vector indexes. You can find more details about how to get started with using this library -- as well as application samples -- here.
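To make the encode / index / query flow in the summary concrete, here is a minimal brute-force sketch in plain Python. It is not SPTAG and does not use its API: the toy three-dimensional vectors, the `index` dictionary, and the `nearest` helper are all assumptions made up for illustration. A real system would get its vectors from a trained model and would use an approximate index rather than scoring every entry.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)

def nearest(index, query, k=2):
    """Score every stored vector against the query and keep the k best.

    This is an exhaustive scan, so it costs O(n) per query. An index such
    as the one the article describes exists to avoid exactly this scan.
    """
    scored = [(cosine_similarity(vec, query), key) for key, vec in index.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]

# Hypothetical embeddings: in practice a model would produce these.
index = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

# The query vector stands in for "the model's encoding of the user's query".
result = nearest(index, [1.0, 0.0, 0.0], k=2)
print(result)
```

The exhaustive scan is the baseline that an approximate nearest-neighbor index trades a little accuracy to beat.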
Re: (Score:1)
24 comments as I type this. Find me a single useful comment in the lot that isn't racist or douchebaggy. This site is 1 step up from 4chan.
Re: (Score:2)
at least 4chan is usually pretty funny.
Re:I don't believe it. (Score:4, Funny)
They "hope developers will be able to build similar experience". Trouble is... similar to Bing. I don't know, it's like Yugo open-sourced the design of their cars or something.
I'd say it hurts the open-source community more than anything else. As in perpetuating the "open-source software is free but it's always kind of meh" stigma.
Re: (Score:1)
Bing is hands down the best search engine for porn.
Microsoft is hoping developers can use the technology to accelerate porn searches on other platforms
Re: (Score:2)
Considering every single upper-echelon position has turned over since then, sometimes many times over... yes.
Re: (Score:2)
That view emanates directly from the snowflake-culture internal narrator where FOMO prevails above all things.
I'm here to learn you up on the millennial stigma stigma, concerning people who sigh "meh" over powerful algorithms whose existence was barely suspected on their own date of birth.
I listened to a podcast recently interviewing American historian Jill Lepore. She described her millennial students as follows: able
Re:I don't believe it. (Score:4, Funny)
Re: (Score:2)
Search Your Harddisk Porn Quicker... (Score:1)
SPTAG sounds like a SPAM knock off (Score:2)
Actually sounds useful (Score:3)
I'm not a Microsoft fan in general but code is code and this sounds like a solid contribution and with a bit of modification could be useful in certain deep learning applications.
All the usual disclaimers apply: beware any Azure or other MS service, platform hooks, or back doors. In general, kudos MS.
Re: (Score:2)
I'm not a Microsoft fan in general but code is code and this sounds like a solid contribution and with a bit of modification could be useful in certain deep learning applications.
Really? What kind of deep learning applications can you think of, here?
Re:Actually sounds useful (Score:4, Funny)
Re: (Score:2)
Anything that is storing and indexing a large volume of arbitrary data really could potentially make use of this algorithm. A number of deep learning techniques are starting to make use of a memory of previous inputs. For instance if you are training on a massive catalog of the Gutenberg works etc. This isn't just pigeon-holed to the web itself.
Though that works as well. IBM's AI won jeopardy utilizing a technique that indexed and looked up results relating to words in the questions. This provides an effic
Annoying Language (Score:3)
It might seem a bit nitpicky to bring this up, but quotes such as:
.. able to search through billions of pieces of information in milliseconds ...
should be responded to.
Uh, no. What you might be able to do in milliseconds is search through an index (excuse me, a vector index; it sounds more science-y) that represents billions of "pieces of information," not the information itself. What that means is that if the overall data set wasn't inverted with the search terms you wanted to use in mind, you aren't finding what you want out of billions of records in milliseconds.
Having said that it is amazing how far search engines have come. But I just find over-hype tiresome.
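The index-versus-data distinction in the comment above can be sketched with a toy inverted index. This is a rough illustration only, assuming whitespace tokenization and a made-up three-document corpus; `build_inverted_index` and `search` are hypothetical helper names, not part of any library mentioned in the story.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every term in the query.

    Only the index is consulted; the documents themselves are never
    rescanned. A term that was not indexed simply yields no results.
    """
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical corpus used only for this example.
docs = {
    1: "fast vector search",
    2: "slow full scan",
    3: "vector index search",
}
index = build_inverted_index(docs)
print(search(index, "vector search"))
print(search(index, "graph"))
```

Lookups are fast because the work was done when the index was built, which is also why a query on a term nobody indexed comes back empty.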
Is it similar to this? (Score:2)
Re: (Score:1)
You've got some real balls, you know that? Do you know how many software patents that code you just posted violates?
I hope you've got a good lawyer.
I don't mind using Bing search (Score:1)
Never been a fan of Cortana, but Bing works just fine for most search uses. Occasionally I go back to Google search for comparison. I definitely do not see much advantage in DuckDuckGo; unless you're a privacy-focused person it's really not great, and I would use Bing before DDG. Google stuff is sort of habitual for most people; it's what many got used to.
Re: (Score:2)
Very fast Bing library (Score:1)
Great! (Score:2)
As much as i dislike MS (i'm an old guy, i've seen their most ugly side), i can't complain about any company making software available under an approved open source license, even though i will probably never use it myself.