I visited a cat shelter one day intending to adopt a few cats. The shelter had 50 cats: 25 were black, 10 were brown and 15 were white. The shelter worker was colour-blind, but he had other techniques for distinguishing the cats' colours. These techniques, however, were not perfect.
I told the worker that I wanted to adopt 10 brown cats. He knew there were exactly 10, so he had to find all 10 for me to achieve 100% recall. I told him that I was in a hurry and needed very high precision too: all 10 cats he retrieved on his first attempt had to be relevant*.
precision = # of relevant (brown cats) retrieved by the worker / # of cats retrieved by the worker
recall = # of relevant (brown cats) retrieved by the worker / # of relevant (brown cats) in total
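The two definitions above translate directly into code. Here is a minimal Python sketch that works on counts. The function names `precision` and `recall` are our own. Raising an error on a zero denominator is only one possible choice; some tools return 0 instead.

```python
def precision(relevant_retrieved: int, retrieved: int) -> float:
    """Fraction of the retrieved cats that are relevant (brown)."""
    if retrieved == 0:
        # Nothing retrieved yet: precision is undefined in this sketch.
        raise ValueError("precision is undefined when nothing has been retrieved")
    return relevant_retrieved / retrieved


def recall(relevant_retrieved: int, total_relevant: int) -> float:
    """Fraction of all relevant (brown) cats that have been retrieved."""
    if total_relevant == 0:
        # No relevant items exist: recall is undefined in this sketch.
        raise ValueError("recall is undefined when there are no relevant items")
    return relevant_retrieved / total_relevant


# First attempt from the story: 10 cats retrieved, 5 of them brown,
# out of 10 brown cats in the shelter.
print(precision(5, 10))
print(recall(5, 10))
```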
Find me the brown cats
The worker went off to retrieve the 10 cats he thought were brown. When I looked at what he had retrieved, only 5 of the 10 cats were actually brown. I told the worker that the precision of his retrieval was just 5/10 = 50%, because only 5 of the 10 cats retrieved were relevant. The recall was also 5/10 = 50%, because we knew there were 10 brown cats in the shelter in total and only 5 had been retrieved. We were both disappointed with these metrics, because the aim of 100% recall with high precision had not been achieved.
He left the 10 cats with me and told me to hang on. He came back with 5 more, which he said should be the remaining 5 brown cats I had asked for. This time, 3 of the 5 were brown. Altogether, the precision of his retrieval increased slightly to 8/15 ≈ 53%, while the recall increased substantially to 8/10 = 80%.
Recall stood at 8 out of 10 cats
He went back and retrieved another 2. This time, neither was brown. He had now fetched 17 cats, and only 8 of them were relevant to my request, so the precision dropped to 8/17 ≈ 47%. No new brown cats had been retrieved, and we knew the shelter had 10 in total, so we agreed that the recall still stood at 8/10 = 80%.
He repeated this 7 times, each time thinking that he had brought back the remaining 2 brown cats. As a result, 14 more cats were retrieved, and none of them was brown. The precision at that stage dropped further to 8/(17+14) = 8/31 ≈ 26%. The recall remained at 8 out of the 10 brown cats.
Give me all the cats
Finally, I told the worker that the more he retrieved, the less precise his retrieval became, and that the process was taking longer than I had anticipated. I had given up on 100% precision long ago. At that stage, all I wanted was 100% recall. I suggested that he bring me all the remaining cats in the shelter, so that we could be sure recall would reach 100%. He agreed and came back with the remaining 19. Of course, my last 2 brown cats were amongst these final 19. Recall finally reached 100%, but precision ended at 10/50 = 20%. That low figure reflected the fact that he had to retrieve every cat in the shelter for me to find the ones I wanted.
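The following sketch replays every round from the story and prints the running totals, which shows how the two metrics moved as the worker kept fetching cats. It assumes that each of the 7 repeated attempts brought back 2 cats (14 in total). The names `rounds` and `trace` are hypothetical and used only for illustration.

```python
TOTAL_BROWN = 10  # brown (relevant) cats in the shelter

# (cats retrieved in this round, brown cats among them), in story order.
# Assumption: the 7 repeated attempts were 2 cats each, 14 cats in total.
rounds = [(10, 5), (5, 3), (2, 0)] + [(2, 0)] * 7 + [(19, 2)]


def trace(rounds, total_relevant):
    """Return cumulative (retrieved, relevant_retrieved, precision, recall) after each round."""
    retrieved = relevant = 0
    rows = []
    for n, brown in rounds:
        retrieved += n
        relevant += brown
        rows.append((retrieved, relevant, relevant / retrieved, relevant / total_relevant))
    return rows


for retrieved, relevant, p, r in trace(rounds, TOTAL_BROWN):
    print(f"retrieved={retrieved:2d}  brown={relevant:2d}  precision={p:.0%}  recall={r:.0%}")
```

From the second round to the tenth, recall stays at 80% while precision keeps falling. The last row shows 50 cats retrieved, 10 brown, precision 20% and recall 100%.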
Retrieval vs recall
I thanked the shelter worker. As I walked away with the 10 brown cats, I thought about how poor his cat retrieval skill was. In the end, 100% recall was a must, and high precision had become an unrealistic goal. Retrieval was simply the process that got us to the outcome we wanted. Recall and precision are metrics that help us gauge the shelter worker's retrieval ability. A higher retrieval volume does not guarantee higher recall: we saw recall stand still at 80% several times as the worker retrieved more and more cats. Had the first 10 cats he retrieved all been brown, recall would have reached 100% (with 100% precision) at a retrieval volume of just 10. Compare this with the 100% recall that was only achieved after the worker had retrieved all 50 cats.
How does this apply to search?
It is not hard to see how the concepts of retrieval and recall originated in search. Together with precision, they are some of the most fundamental concepts in the field. We can easily draw a parallel between the brown cats and the documents that meet a user's needs. In the world of search, of course, users do not express their needs as plainly as saying “I want 10 brown cats”. Nor do documents fall neatly into clear categories the way colours do. For these reasons, measuring precision and recall is rarely straightforward in real-world search products.
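In a search setting, the cats become document IDs, and "brown" becomes a relevance judgement for a query. The following is a minimal sketch that assumes those judgements are already available as a set. In real products, as noted above, obtaining them is the hard part. The document IDs are hypothetical, and returning 0.0 for an empty set is a convention chosen for this sketch, not a standard.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Set-based precision and recall for one query."""
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    # Convention in this sketch: report 0.0 instead of dividing by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical relevance judgements and search results for a single query.
relevant = {"doc1", "doc4", "doc7", "doc9"}
retrieved = {"doc1", "doc2", "doc4", "doc5", "doc6"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.0%}  recall={r:.0%}")
```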
*No cats were harmed in the process of writing this article.