Transforming Data into Art: The Evolution of Cover Selection at Netflix
I started out thinking this would be a post about Causal Machine Learning, a subject I began studying thanks to Erick Farias, who has been pushing the topic at iFood. However, I ended up diving into a series of posts on the Netflix Blog about choosing covers for shows, and how Netflix uses a LOT of Machine Learning in the process while keeping artists and specialists in the loop to ensure quality and creativity.
There are three links in total, and I think it is important to highlight the date of each one, as the dates clearly show how this topic has evolved over time.
1. Selecting the best artwork for videos through A/B testing (2016)
2. AVA: The Art and Science of Image Discovery at Netflix (2018)
3. Causal Machine Learning for Creative Insights (2023)
I won’t be able to go into detail on every post, and I recommend reading them if the topic interests you, but I will briefly describe each one; I think the path Netflix has been following will become obvious.
Selecting the best artwork for videos through A/B testing (2016)
This is one of their most famous posts, and it shows how important artwork selection is for the product: depending on the image, a user may or may not click and watch. They explain the importance of A/B testing and present the results of tests with different shows’ artwork in a very simple way.
Additionally, the post addresses the concepts of “explore” and “exploit.” The conclusion is that the “best” image for each show not only generated more clicks but also increased overall user engagement on the platform. A crucial point, which will matter for the later posts, is that the way images and their variations were compared was still simple and direct.
We can summarize that in 2016 Netflix discovered how important cover images are and that these covers can be chosen automatically through A/B tests or Reinforcement Learning (this is indeed a great use case for that type of model). But we can also notice that there wasn’t yet much work on the representation and metadata of the images, and that the insights from the tests were framed more in terms of correlation than actual causality.
“Images that feature expressive facial expressions that convey the tone of the title perform particularly well.”
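Neither the post nor this summary goes into implementation details, but to make the explore/exploit idea concrete, here is a minimal sketch of an epsilon-greedy bandit choosing among candidate covers. All cover names and click-through rates below are hypothetical:

```python
import random

def choose_artwork(ctr_estimates, epsilon=0.1):
    """Epsilon-greedy: mostly exploit the best-known artwork, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(list(ctr_estimates))     # explore a random option
    return max(ctr_estimates, key=ctr_estimates.get)  # exploit the current best

# Hypothetical candidate covers with hidden "true" click-through rates.
true_ctr = {"cover_a": 0.05, "cover_b": 0.08, "cover_c": 0.03}
estimates = {art: 0.0 for art in true_ctr}
impressions = {art: 0 for art in true_ctr}

for _ in range(10_000):
    art = choose_artwork(estimates)
    clicked = random.random() < true_ctr[art]
    impressions[art] += 1
    # Incremental mean: update the CTR estimate for the shown artwork.
    estimates[art] += (clicked - estimates[art]) / impressions[art]

print(estimates)  # cover_b should end up with the highest estimate and the most impressions
```

The epsilon parameter controls the trade-off: most impressions go to the current best cover (exploit), while a small fraction keeps testing the alternatives (explore).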
AVA: The Art and Science of Image Discovery at Netflix (2018)
With the exponential increase in available content and the need to automate and scale cover selection, Netflix created AVA.
“AVA is a collection of tools and algorithms designed to highlight high-quality images from the videos on our service.”
In this case, the post focuses on the entire pipeline and the models that Netflix had to create to:
- Create features, metadata, and representations of the show’s frames.
- Process these frames and create a way to filter, select, and rank them.
- Show these “good options” to the artists so they can make the final choice and work the frames into cover options.
As a cinema and series lover, I found this post very interesting, because it is a great AI problem that depends on some domain knowledge. Some examples of information that is annotated, with models then built to extract it:
- Face Detection
- Motion estimation (to avoid images blurred by camera or character movement)
- Identification of camera type — close-up shot or dolly shot, for example.
- Object detection
- Composition metadata — Heuristics to try to capture some of the fundamental principles of photography, cinematography, and visual aesthetic design. Some examples of composition are the rule of thirds, depth of field, and symmetry.
- Actors/Actresses — Models that try to automatically identify whether the actors in the images are the leads of that series or movie
On top of all this frame-level information, there is the job of ranking the best frames and grouping and selecting a diverse set, which is later shown to the specialists who do the final cover work.
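AVA’s internals are not public, but a toy version of this annotate-and-rank flow might look like the sketch below, using OpenCV’s stock face detector as a stand-in for Netflix’s models, Laplacian variance as a motion-blur signal, and a crude rule-of-thirds bonus. The weights, thresholds, and file name are all made up:

```python
import cv2
import numpy as np

# OpenCV's bundled Haar cascade, a crude stand-in for Netflix's face detection models.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def score_frame(frame):
    """Combine a few simple signals into one heuristic quality score."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape

    # Sharpness: variance of the Laplacian is low for motion-blurred frames.
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()

    # Faces detected in the frame.
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)

    # Rule of thirds: reward faces whose center lands near a thirds intersection.
    thirds = [(w / 3, h / 3), (2 * w / 3, h / 3),
              (w / 3, 2 * h / 3), (2 * w / 3, 2 * h / 3)]
    composition = 0.0
    for (x, y, fw, fh) in faces:
        cx, cy = x + fw / 2, y + fh / 2
        dist = min(np.hypot(cx - tx, cy - ty) for tx, ty in thirds)
        composition = max(composition, 1.0 - dist / np.hypot(w, h))

    # Arbitrary illustrative weights; a real system would learn these.
    return (0.5 * min(sharpness / 100.0, 1.0)
            + 0.3 * min(len(faces), 1)
            + 0.2 * composition)

def rank_frames(video_path, sample_every=24):
    """Sample frames from a video and return (score, frame_index), best first."""
    cap = cv2.VideoCapture(video_path)
    scored, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            scored.append((score_frame(frame), idx))
        idx += 1
    cap.release()
    return sorted(scored, reverse=True)

# Usage with a hypothetical file: rank_frames("episode_01.mp4")[:10]
```

The real system obviously goes far beyond this: learned models instead of heuristics, diversity-aware selection instead of a plain sort, and infrastructure to run it across an entire catalog.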
I don’t know if this comes across, but there is a huge amount of work here, with various AI models and pipelines. We have examples of classification, clustering, and regression, all in the complex context of videos and images, which depends on a lot of engineering to scale and on many computer vision models.
We can see, then, that up to 2018 the focus was on building a good representation of the frames and a scalable pipeline that makes the artists’ lives easier and improves the result for the product.
Causal Machine Learning for Creative Insights (2023)
Finally, we arrive at the topic that originally motivated this post: causality and machine learning. I intend to write a dedicated post about it later, but I think it is important to show the example from Netflix’s own post about AI algorithms, correlation, and causality.
“Predictive Machine Learning (ML) models are great for finding patterns and associations to predict outcomes, however, they are not good at explaining cause-and-effect relationships, as their model structure does not reflect causality (the relationship between cause and effect).”
The example given in the post is a model that finds a correlation between the price of a Broadway ticket and the number of tickets sold. A naive reading might conclude that the higher the price, the more tickets will be sold. In reality, the correlation exists because famous, popular shows tend to charge higher prices.
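This confounding effect is easy to reproduce with a tiny synthetic simulation. In the sketch below (all numbers invented), popularity drives both price and ticket sales, so the raw correlation between price and sales comes out positive even though the true price effect is negative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Confounder: how popular a show is.
popularity = rng.normal(0, 1, n)

# Popular shows charge more, AND popular shows sell more tickets.
price = 50 + 30 * popularity + rng.normal(0, 5, n)
tickets = 1000 + 400 * popularity - 5 * price + rng.normal(0, 50, n)

# Naive view: price and tickets are positively correlated.
print(np.corrcoef(price, tickets)[0, 1])  # > 0

# Regressing tickets on price alone gives a misleading positive slope...
X_naive = np.column_stack([np.ones(n), price])
print(np.linalg.lstsq(X_naive, tickets, rcond=None)[0][1])  # positive

# ...while adjusting for the confounder recovers the true effect of -5.
X_adj = np.column_stack([np.ones(n), price, popularity])
print(np.linalg.lstsq(X_adj, tickets, rcond=None)[0][1])  # close to -5
```

A predictive model trained on this data would happily use the positive association, which is exactly why it cannot answer the causal question “what happens to sales if we raise the price?”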
Netflix then decided to focus on causal models to find the factors that lead a show’s artwork to be clicked more often. With these insights, artists can work in a more focused way, creating fewer options in less time while maintaining or even increasing quality.
An interesting point is that this work is only possible because of the A/B test data and all the features created previously. In other words, this final stage was reached through the work presented earlier.
The post explains, step by step, the flow used to establish the causal effect of the features, starting from an example hypothesis:
“Images with faces perform better.”
The math is a bit more complex, but the post is very didactic in walking through it step by step.
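The post’s actual methodology is more sophisticated, but the core move, estimating the effect of a binary image feature such as the presence of a face on click-through rate while adjusting for other annotated features, can be sketched with a simple regression adjustment. Everything below is synthetic, and the feature names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000

# Synthetic image features; in AVA these would come from the annotation models.
brightness = rng.uniform(0, 1, n)
is_closeup = rng.integers(0, 2, n)

# Confounded "treatment": close-up shots are more likely to contain a face.
has_face = (rng.uniform(0, 1, n) < 0.3 + 0.4 * is_closeup).astype(float)

# Synthetic CTR with a true effect of +0.02 for having a face.
ctr = (0.05 + 0.02 * has_face + 0.03 * is_closeup
       + 0.01 * brightness + rng.normal(0, 0.01, n))

# Naive comparison of means is biased upward by the close-up confounder.
print(ctr[has_face == 1].mean() - ctr[has_face == 0].mean())  # ~0.032

# Regression adjustment: the has_face coefficient, holding the other
# annotated features fixed, recovers the true effect.
X = np.column_stack([np.ones(n), has_face, is_closeup, brightness])
print(np.linalg.lstsq(X, ctr, rcond=None)[0][1])  # ~0.02
```

This is only the simplest possible adjustment; the Netflix post works through a proper causal framework, along with the assumptions that make this kind of estimate valid.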
Conclusions
These three posts illustrate well the flow of an AI project, where:
- We start with something simple (A/B testing with different covers).
- We scale this solution, even knowing there is room for improvement.
- We study the subject and collaborate with specialists to create better features and representations.
- We use these features and representations to improve our solution.
- Throughout this process, we use data, features, and models, along with specialists, to improve quality and generate valuable insights for the product/business.
It is important to remember that it took years of work to reach this final solution, involving many systems, a lot of engineering, and many people. Just consider the scalability challenge of evaluating every frame in Netflix’s catalog and you can already see the complexity of this problem, which involves not only machine learning but also computing in general.