
Python Book of Recipes #4 - TURF Analysis: A Gastronomic Journey into Total Unduplicated Reach and Frequency

Boško Savić, Senior Software Developer

TURF stands for Total Unduplicated Reach and Frequency.

 Just as a skilled chef meticulously selects ingredients to create a harmonious dish, we'll employ Python's magic to explore the delectable realm of TURF analysis, guiding us towards the perfect combination of ingredients.

Imagine a company that could make many possible products but can only support a small number of them. Which ones should it make? Examples include different flavors of beer, heated tobacco, soft drinks, bubble gum, or ice cream.

In our scenario, an ice cream company inspired by Quattro ice cream can produce various flavors but can support only a limited number. Which combination of four flavors is the “best”, meaning the one that maximizes the proportion of people who like at least one of the flavors?
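To get a feel for the size of the search, here is a small sketch that counts how many four-flavor combinations exist. The catalogue size of 10 is an assumption made only for illustration; the article does not state how many flavors were tested.

```python
from math import comb

# Hypothetical catalogue size (assumption, not taken from the article).
n_flavors = 10
portfolio_size = 4

# Number of distinct four-flavor line-ups that TURF has to evaluate.
n_combinations = comb(n_flavors, portfolio_size)
print(n_combinations)  # 210
```

The count grows quickly with the catalogue size, which is why an exhaustive search is practical only for a moderate number of alternatives.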


  • Data from at least 100 people, ideally many more (our synthetic dataset has 1000)
  • Pandas – a powerful data manipulation library in Python.
  • itertools – a module that provides various functions for efficient iteration and combination of elements.
  • Ipywidgets – a library for creating interactive widgets in Jupyter notebooks.

Data preparation includes converting each measurement to a binary value, where 1 indicates that the person likes the product (or content) and 0 indicates the opposite. TURF cannot handle missing data, and imputation can be problematic.

Our example uses purchase intention data, where 1 represents “Definitely will buy” or “Probably will buy” and 0 represents everyone else.

In the prepared data table:
  • Each row represents an observation (person)
  • Each column represents an alternative (flavors liked)
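The binary coding and table layout described above could be produced roughly as follows. This is a minimal sketch: the flavor names, the answer labels other than the two quoted in the article, and the variable names are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical raw survey answers: one row per person, one column per flavor.
raw = pd.DataFrame({
    "Vanilla": ["Definitely will buy", "Might or might not buy", "Probably will buy"],
    "Mint": ["Probably will not buy", "Definitely will buy", "Definitely will not buy"],
})

# The two answers that count as "likes the flavor" (coded as 1).
TOP_TWO_BOX = ["Definitely will buy", "Probably will buy"]

# Rows with missing answers are dropped first, since TURF cannot handle them.
complete = raw.dropna()

# isin() yields booleans; astype(int) converts them to the 1/0 coding.
binary = complete.isin(TOP_TWO_BOX).astype(int)
print(binary)
```

Dropping incomplete rows is only one possible choice; it shrinks the sample, so it is worth checking how many respondents are lost.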

Apply the ‘calculate_combination_per_alternative’ function to the prepared data, and voilà!
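For readers who want to see the idea without opening the full code, here is a minimal sketch of an exhaustive TURF search. It is not the article's function: `turf_sketch`, its parameters, and the toy data are hypothetical and only illustrate the technique, assuming a 0/1 table with one row per person and one column per flavor.

```python
from itertools import combinations

import pandas as pd


def turf_sketch(data: pd.DataFrame, size: int) -> pd.DataFrame:
    """Compute reach and frequency for every combination of `size` columns."""
    rows = []
    for combo in combinations(data.columns, size):
        # Number of flavors in this combination that each person likes.
        liked_per_person = data[list(combo)].sum(axis=1)
        rows.append({
            "Combination": " + ".join(combo),
            # Share of people who like at least one flavor, in percent.
            "Reach (%)": (liked_per_person > 0).mean() * 100,
            # Total number of likes across the combination.
            "Frequency": int(liked_per_person.sum()),
        })
    result = pd.DataFrame(rows)
    # Best reach first; frequency breaks ties.
    return result.sort_values(
        ["Reach (%)", "Frequency"], ascending=False, ignore_index=True
    )


# Toy data (hypothetical): five people, four flavors.
toy = pd.DataFrame({
    "Vanilla":    [1, 1, 0, 0, 0],
    "Chocolate":  [1, 1, 0, 0, 0],
    "Mint":       [0, 0, 1, 0, 0],
    "Strawberry": [0, 0, 0, 1, 0],
})
top = turf_sketch(toy, size=2)
print(top)
```

In the toy data, Vanilla and Chocolate are the two most liked flavors, yet the pair reaches fewer people than Vanilla with Mint, because the same two people like both. That overlap is exactly what "unduplicated" reach accounts for.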

Full code can be found on the link.


What we got:

A pandas DataFrame with three columns: the combination of flavors, reach, and frequency.

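  • Reach (%) – Total Unduplicated Reach: the percentage of respondents who like at least one flavor in the combination.
  • Frequency – the total number of likes across the flavors in the combination. A value of 1363 tells us that if we offer four flavors, the total number of flavors liked across the total sample of 1000 is 1363.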

After analyzing the results, we found that the winning combination with the highest reach was:

                White Chocolate + Strawberry + Mint + Coffee = 82.6%.

The second-best combination was:

                White Chocolate + Mint + Coffee + Pistachio = 82.5%.

However, a 0.1% difference corresponds to a single respondent in a sample of 1000, so it may not be significant. It would be bold to declare a winner on that basis alone, and further analysis is necessary.
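One way to check whether such a small gap is meaningful is a paired bootstrap over respondents. The sketch below is an assumption about how this could be done, not part of the original analysis: `bootstrap_reach_diff`, the flavor names, and the counts in the synthetic table are all hypothetical, chosen so that the two options differ by exactly 0.1 percentage points.

```python
import numpy as np
import pandas as pd


def bootstrap_reach_diff(data, combo_a, combo_b, n_boot=2000, seed=0):
    """Observed reach(A) - reach(B) in percentage points, with a 95% percentile interval."""
    rng = np.random.default_rng(seed)
    reached_a = data[list(combo_a)].any(axis=1).to_numpy()
    reached_b = data[list(combo_b)].any(axis=1).to_numpy()
    # Per person: +1 if reached only by A, -1 if only by B, 0 otherwise.
    diff = reached_a.astype(int) - reached_b.astype(int)
    n = len(diff)
    # Resample people with replacement, keeping each person's answers paired.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot = diff[idx].mean(axis=1) * 100
    low, high = np.percentile(boot, [2.5, 97.5])
    return diff.mean() * 100, low, high


# Synthetic data (hypothetical counts): 1000 people, two competing flavors.
only_x, only_y, both, neither = 50, 49, 700, 201
toy = pd.DataFrame({
    "flavor_x": [1] * only_x + [0] * only_y + [1] * both + [0] * neither,
    "flavor_y": [0] * only_x + [1] * only_y + [1] * both + [0] * neither,
})

observed, low, high = bootstrap_reach_diff(toy, ["flavor_x"], ["flavor_y"])
print(f"difference: {observed:.1f} pp, 95% interval: [{low:.1f}, {high:.1f}]")
```

If the interval comfortably contains zero, as it does for this synthetic table, the data give no real basis for preferring one option over the other.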

To gain deeper insights and make a more informed decision, additional techniques can be employed. These include analyzing the duplication matrix, conducting correspondence analysis, applying Principal Component Analysis (PCA) and visualizing data using techniques like Venn diagrams. These approaches can help determine which combinations are truly the best.
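As an illustration of the first of these techniques, a duplication matrix can be sketched as a table of pairwise overlaps between flavors. The version below uses raw counts and toy data; both the data and the variable names are assumptions, and other definitions (for example, percentages instead of counts) are equally common.

```python
import pandas as pd

# Toy 0/1 data (hypothetical): five people, three flavors.
toy = pd.DataFrame({
    "Vanilla":   [1, 1, 0, 0, 0],
    "Chocolate": [1, 1, 0, 0, 0],
    "Mint":      [0, 0, 1, 0, 0],
})

# Entry (i, j) counts the people who like both flavor i and flavor j.
# The diagonal holds the number of people who like each single flavor.
duplication = toy.T @ toy
print(duplication)
```

Large off-diagonal values point to flavors that appeal to the same people, so adding both to the line-up contributes little extra reach.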


In summary, TURF analysis combined with Python's capabilities and relevant analytical techniques allows us to identify the optimal combination of alternatives, such as flavors in our ice cream example, based on reach and frequency metrics. This enables businesses to make data-driven decisions and maximize their target audience's satisfaction.

