Today, Yandex released a new neural network-based search algorithm to better understand users’ intent and handle long-tail queries. The new algorithm is named after Korolyov, a Russian satellite town northeast of Moscow that has long served as the center of Russia’s space exploration.

Korolyov significantly improves upon its predecessor, Palekh, the neural network-based algorithm Yandex released in late 2016. Korolyov matches the meaning of a search query to all of the content of a web page, whereas Palekh only looked at headlines. Yandex also applies Korolyov to a far greater number of the top relevant pages than Palekh: 200,000 versus 150 per search query. Like all modern AI-based systems, Korolyov improves as it receives more data from queries. Because Yandex is the largest search engine in Russia, its users help the algorithms continuously provide a high-quality search experience.

How does Korolyov work?

  • To better understand users’ intent, the Yandex search team trained neural networks on information from billions of search queries and crawled pages, which it converted into numerical representations.
  • Korolyov creates a semantic map: it assesses the proximity of the numbers that represent the meanings of words on web pages in its index and then matches those to the numbers that represent the search queries.
  • This algorithm then feeds into MatrixNet, Yandex’s proprietary machine learning ranking algorithm, which considers results from Korolyov and a number of other ranking factors before search results are returned to the user.
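
The steps above can be illustrated with a minimal sketch. It is not Yandex's implementation: the announcement only says that queries and pages are reduced to numbers whose proximity is compared, so the use of cosine similarity, the toy three-dimensional vectors, the function names, and the weighted sum standing in for the MatrixNet ranking stage are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Proximity of two numeric vectors (assumed measure; 1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no meaning to compare
    return dot / (norm_a * norm_b)


def rank_pages(query_vector, page_vectors, other_scores, semantic_weight=0.5):
    """Combine the semantic score with other ranking factors.

    The weighted sum is a hypothetical stand-in for the final ranking
    stage; the real ranker is proprietary and not described in the text.
    """
    ranked = []
    for page_id, page_vector in page_vectors.items():
        semantic = cosine_similarity(query_vector, page_vector)
        other = other_scores.get(page_id, 0.0)
        combined = semantic_weight * semantic + (1 - semantic_weight) * other
        ranked.append((page_id, combined))
    # Highest combined score first
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Toy data: in practice a trained neural network would produce these vectors.
query = [0.9, 0.1, 0.0]
pages = {
    "recipe_page": [0.8, 0.2, 0.1],
    "news_page": [0.0, 0.1, 0.9],
}
other = {"recipe_page": 0.4, "news_page": 0.6}

results = rank_pages(query, pages, other)
for page_id, score in results:
    print(page_id, round(score, 3))
```

In this toy example the recipe page ranks first even though its other factors score lower, because its vector lies much closer to the query vector.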

How do Yandex users benefit from this new algorithm?

The algorithm is meant to better understand user intent and handle long-tail queries. The search engine will have an improved understanding of what users mean, allowing it to better process queries with missing information. For example, a user can list a few ingredients of a recipe whose name they do not know, or ask who proved that the Earth is round. The more searches the engine receives, the better it gets, as it learns every second from millions of searches.