Yandex Search Algorithm Palekh Announced Today

Earlier today in Russia, Yandex, the country’s leading search engine, announced a new search algorithm called “Palekh.” The algorithm is aimed at improving how Yandex handles long-tail search queries by better understanding the meaning behind each query. Yandex names its search algorithms after Russian cities, and Palekh was chosen because the firebird on its coat of arms has a long tail.

Long-tail search queries are multi-word searches, often ones that describe something when the user doesn’t know the exact word or phrase but still wants the search engine to find it. For instance, a user might describe a movie without knowing its title: “a movie about a guy growing potatoes on some planet.”

Yandex handles over 100 million long-tail search queries per day, which is roughly 40% of all its queries. Palekh will improve how Yandex produces search results by using neural networks to understand what a query means instead of only looking for similar words. The neural network’s output will serve as one of 1,500 ranking factors. Yandex taught its neural networks to see the connections between a query and a document even if they have no words in common.

This is done by converting the words from billions of search queries into groups of 300 numbers each, which places them in a 300-dimensional space; every document likewise has its own vector in that space. If the numbers of a query and the numbers of a document are near each other in that space, the result is considered relevant. Yandex calls this technology a “semantic vector”; it helps the search engine understand the meaning behind a query rather than just look for similar words. In addition, the neural network is trained on other targets, such as long-click prediction, CTR, and “click or not click” models.
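The idea of a query and a document being "near each other" in a 300-dimensional space can be illustrated with a small sketch. This is not Yandex's implementation: the announcement does not say how nearness is measured, so cosine similarity is an assumption here, and the vectors are random placeholders rather than the output of a trained neural network. The names `cosine_similarity`, `rank_documents`, `related_doc`, and `unrelated_doc` are hypothetical.

```python
import math
import random

DIM = 300  # vector size mentioned in the announcement


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumed nearness measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors: in a real system a trained model would produce these.
rng = random.Random(0)
query_vec = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
doc_vecs = {
    # A document whose vector lies close to the query vector (small noise added).
    "related_doc": [x + rng.gauss(0.0, 0.3) for x in query_vec],
    # A document with an independent random vector.
    "unrelated_doc": [rng.gauss(0.0, 1.0) for _ in range(DIM)],
}

ranking = rank_documents(query_vec, doc_vecs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Note that nothing in this comparison looks at the words themselves. Once a query and a document are both represented as vectors, a document can score highly even if it shares no words with the query, which is the behavior the announcement describes.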

Yandex plans to keep using this technology and, in the future, to teach it to look not only at the headlines of documents but also at their full text.