6 Common Pitfalls In Building And Maintaining Search Engines

We actively seek out information for all sorts of reasons. The role of a good search engine is to understand intent on the fly and serve up information that aligns with that intent. The approach to deciding which results to bring back (i.e., retrieval) and the order in which they should appear (i.e., ranking) varies depending on the intent. On one end of the spectrum, if the intent is to obtain answers to Wh-questions such as “date of birth of Usain Bolt”, the focus is on generating a short paragraph from the most precise documents as the result (i.e., question answering). At the other end, if the intent requires a high level of certainty that all relevant information appears in the results, then recall is paramount (e.g., patent search).


Broadly speaking, the pitfalls in search can be grouped into two categories. The first is the failure to recognise that search is about the users and their intent. The second is the lack of appreciation that search is an empirical field defined by gradual progression. This second group of pitfalls often manifests as a search team that does not have a proper tracking, measurement and evaluation framework in place, and whose ideas for moving forward come from the highest-paid person. This prevents the hypothesis-driven thinking and experimentation needed to continuously deliver on user satisfaction. As a side effect, the misconception that improvement in search comes from one or two silver bullets can be rife. Think about it. If silver bullets existed in search, why would online experimentation be front and centre of everything the major Web search engines do?

“At any given time, we run anywhere from 50 to 200 experiments on Google sites all over the world.”

Let us drill into the two categories of pitfalls mentioned above:

Not having tracking of how users behave: A search engine without tracking is one that is not set up for growth. Growth implies trajectory, and trajectory requires start and end points. Tracking provides these data points. More importantly, the tracked data in the form of queries and clicks provide insights into what’s working and what’s not. These in turn can be used to identify weak areas as opportunities for growth. I explained that tracking is something that has to be put in place at the get-go and illustrated some of the many uses of tracked data in Search Engineering 101.
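To make this concrete, here is a minimal sketch of what tracking search behaviour can look like. The event types and field names are illustrative assumptions, not a standard schema; in practice these events would go to a log pipeline rather than an in-memory list.

```python
import json
import time

def track_event(log, event_type, session_id, query, **extra):
    """Append a structured search or click event to an event log."""
    event = {
        "ts": time.time(),
        "type": event_type,      # e.g. "search" or "click"
        "session": session_id,
        "query": query,
        **extra,                 # e.g. result_count, rank, doc_id
    }
    log.append(json.dumps(event))
    return event

log = []
# A search and the click it attracted, tied together by session and query.
track_event(log, "search", "s1", "usain bolt date of birth", result_count=12)
track_event(log, "click", "s1", "usain bolt date of birth", rank=1, doc_id="d42")
```

Even this much lets you ask basic questions later: which queries return few results, which searches get no clicks, and at what rank users click.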

Not testing the ideas that really matter: Everyone has an opinion about the ways a product can improve. The challenge is knowing which idea might yield the highest return given the effort. Making changes to the search UI or algorithm without a clue about what they address and what success looks like is wasteful. The recommended approach is to identify the areas of opportunity based on data from tracking or customer feedback, and to form a view about the things that can be done. The answer to whether the changes to the search engine yield the desired outcome will come out of experimentation. This hypothesis-driven thinking and testing is explained in If You’re Not Keeping Score, You’re Only Practising.
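As a sketch of what "the answer will come out of experimentation" means in practice, here is one common way to check whether a change moved click-through rate: a two-proportion z-test over control and treatment. The counts below are made up, and a real experiment framework would also handle ramp-up, guardrail metrics and multiple comparisons.

```python
from math import sqrt, erf

def ctr_z_test(clicks_a, searches_a, clicks_b, searches_b):
    """Two-sided z-test for a difference in click-through rate."""
    p_a = clicks_a / searches_a
    p_b = clicks_b / searches_b
    # Pooled CTR under the null hypothesis of no difference.
    p_pool = (clicks_a + clicks_b) / (searches_a + searches_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / searches_a + 1 / searches_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: control vs. a treatment with the change applied.
z, p = ctr_z_test(clicks_a=480, searches_a=10_000,
                  clicks_b=540, searches_b=10_000)
```

The point is not the statistics per se, but that "did it work?" becomes a question the data can answer, rather than a matter of opinion.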

Not looking at the right metrics during experimentation: Having a view of what success looks like is part and parcel of experimentation. The metrics that are used, and the segmentation of those metrics, should reflect the aspects of search that a change is attempting to improve. For instance, if the change introduces synonym expansion, what metrics would best capture whether the change is having the intended effect? Looking at visit or search volume is pointless, for example. Clicks per search may be more appropriate. But even then, when we look at clicks per search, we need to know the segment that the synonym expansion will have the most impact on, such as searches that are traditionally low in result count. The importance of looking at the right metrics at the right level of granularity is discussed in Not Everything That Can Be Counted Counts.
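A small sketch of that segmentation idea, using made-up per-query records: a synonym-expansion change should move clicks per search most in the low-result-count segment, so that is the slice to watch rather than the overall average.

```python
def clicks_per_search(records, segment):
    """Mean clicks over the records matching the segment predicate."""
    clicks = [r["clicks"] for r in records if segment(r)]
    return sum(clicks) / len(clicks) if clicks else 0.0

# Illustrative tracked data: query, how many results it returned, clicks.
records = [
    {"query": "q1", "result_count": 2,   "clicks": 0},
    {"query": "q2", "result_count": 3,   "clicks": 1},
    {"query": "q3", "result_count": 150, "clicks": 1},
    {"query": "q4", "result_count": 200, "clicks": 2},
]

# Segment by result count: the low segment is where synonym expansion
# should make a difference; the high segment is mostly noise for this change.
low  = clicks_per_search(records, lambda r: r["result_count"] < 10)   # 0.5
high = clicks_per_search(records, lambda r: r["result_count"] >= 10)  # 1.5
```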

Not being able to differentiate the roles of recall and precision: Recall is not retrieval and retrieval is not recall. While they do overlap, pun intended, they are very different concepts. Recall is about retrieving everything that is relevant. Precision is about ensuring that the things retrieved are relevant. A good ranker ensures that the documents that are more likely to be relevant appear at the top, to offer high precision. In applications where the users are not provided with the tools to interfere with the order, and the results in the first few pages are all they need, it is a precision game. Web search is a good example. In other cases, such as certain vertical search engines, recall is equally if not more important than precision. The understanding of this distinction underpins the design of retrieval and ranking algorithms. These key concepts in search are discussed in Precise Retrieval For Tuning Ranking and Faceted Search Needs Precise Retrieval.
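The two definitions above translate directly into code. For a single query, with illustrative document IDs: precision is the fraction of retrieved documents that are relevant, recall the fraction of relevant documents that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query's result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# 3 of the 4 retrieved docs are relevant, but only 3 of the 6 relevant
# docs were found: high precision, mediocre recall.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d7"],
    relevant=["d1", "d2", "d3", "d4", "d5", "d6"],
)
# p == 0.75, r == 0.5
```

A patent searcher would be unhappy with that recall of 0.5 no matter how clean the first page looks; a Web searcher likely would not notice.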

Not being prudent when relying on machine learning for ranking: Despite the hype around machine learning, it is nothing more than a toolbox of techniques for detecting patterns and learning from examples. In the context of search, clicks and other signals of engagement provide attractive data to train machines on, e.g., if a user clicks on a result, it is (perhaps) relevant. It is tempting for businesses to view this combination of big data and machine learning as the silver bullet for search. I am often asked why we can’t just model the types of documents that attract clicks for different queries and use that to rank results. The reason is that clicks embody a whole bunch of biases, and with machine learning, the models are only as good as the training data. If the click data come from a search engine that was poorly set up to begin with, the machine-learnt ranker will be no better. Unless clicks are a very good predictor of relevance in a domain, machines should rely on editorial assessment for training. More about the use of clicks in ranking and machine learning in search in Using Clicks In Ranking and Artificial Intelligence In Search: Part 1.
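To illustrate one of those biases: position bias means top-ranked results attract clicks regardless of relevance. A common (though not the only) correction is inverse-propensity weighting, where each click is divided by the estimated probability that its rank was examined at all. The propensities below are assumed for illustration; in practice they are estimated, e.g. from randomised interleaving.

```python
# Assumed examination probabilities by rank (illustrative, not measured).
EXAMINE_PROB = {1: 1.0, 2: 0.6, 3: 0.4, 4: 0.25, 5: 0.15}

def debiased_click_weight(rank, clicked):
    """Weight a click by 1 / P(examined at rank); no click contributes 0."""
    if not clicked:
        return 0.0
    return 1.0 / EXAMINE_PROB.get(rank, 0.1)

# A click at rank 4 is stronger evidence of relevance than one at rank 1,
# because the user had to look past three other results to give it.
w_top = debiased_click_weight(1, clicked=True)   # 1.0
w_low = debiased_click_weight(4, clicked=True)   # 4.0
```

Training a ranker on raw clicks without a correction like this simply teaches it to reproduce the old ranking, biases and all.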

Not recognising the logical sequence of improvement in search: The likes of Google spent many years improving the fundamental aspects of their search engine, including the evaluation framework, retrieval, indexing of non-English content, query expansion, ranking with beyond-topical relevance signals such as recency and source quality, better snippets, etc. These basic building blocks are necessary for any search engine to be able to serve decent results for a wide range of scenarios. At the same time, queries, clicks and other engagement data with the search results are collected for evaluation and improvement. The large volume of data collected, when coupled with machine learning, allows search teams to understand the users better and to continuously improve the search experience, even past the point where manual tuning yields diminishing marginal returns.
