Explainable Oscar Predictions

Oscars x XAI

Today AI is being used to predict all kinds of events of public interest, such as the outcome of elections, college basketball games and even who would die in Game of Thrones. Machine learning algorithms also exist to predict the winners of the Academy Awards, and I recently put one such model, which I have been fine-tuning and running over the past few years, to the test. The winners of the 94th Academy Awards were announced on March 27th, 2022, so let’s take a deep dive into the results and evaluate the model’s performance.

The model can be used to predict the winner of six Academy Award categories: Best Picture, Best Director, Best Actress, Best Actor, Best Supporting Actress and Best Supporting Actor.

In 2022, it correctly predicted the winner in five of these six categories, including Will Smith’s win for King Richard.

Oscar predictions 2022

It can be insightful and interesting to focus on the instances that an AI model did not classify correctly. In the case of the 2022 Oscars, this is the Best Picture category, where the model predicted The Power of the Dog to win. In the end it was Apple’s CODA that won the prize for Best Picture.

Predictions coming from machine learning algorithms of course aren’t correct all the time. When they make mistakes it is good to get an understanding of what went wrong. Does the mistake mean that we cannot use the same model next year? Do we need to re-calibrate something?

Explainable AI

In the case of complicated AI models, these types of questions can be hard to answer. Luckily, at DAIN we are experts at explainable AI, which is a family of methods that can make the outcomes of black-box machine learning models understandable to humans. The visualizations to follow are created by DAIN’s own XAI library.

Before we dive into the details, let me discuss the basics about the machine learning model that was used to predict Oscar winners. To train the model, I used IMDB and Rotten Tomatoes to gather information about movies between 1960 and 2022. The features of the model can be grouped into three categories: characteristics of the film itself, Oscar statistics, and the results of other ceremonies of the awards season.

Film characteristics include genre, popular acclaim (as measured by IMDB rating and Rotten Tomatoes’ Audience score), critical acclaim (as measured by Rotten Tomatoes’ Critics score), release quarter and MPAA rating.

I call the second category of features Oscar statistics. These are pieces of information we can gather simply from looking at (historic) nominations. For example, the award for Best Picture is rarely given to a movie whose director was not nominated for Best Director. The last time this happened was with Green Book in 2019, and it only happened three times before that in the history of the Oscars. The opposite relationship is even stronger: the last time Best Director was awarded to the director of a movie without a Best Picture nomination was in 1929. Films with a higher total number of nominations also have a higher chance of winning in the six categories I examined. This year The Power of the Dog had the most nominations (12), followed by 10 nominations for Dune and 7 for Belfast.

The results (and nominations) of other award ceremonies also play an important role. The Academy Awards is the last show of the awards season, and is therefore preceded by similar ceremonies such as the Golden Globes and the BAFTAs, and also by some category-specific awards such as the Screen Actors Guild Awards. Very often the same movies, directors and actors receive these awards, so the results of these Oscar precursors are very important predictors in the model.

Fundamental XAI plots

To find out exactly how important each of these variables is, we can take a look at one of the most fundamental XAI plots: the feature importance plot. This plot shows how much each variable contributes to predictions on average. We can see that winning the DGA Award is by far the most important variable. DGA stands for Directors Guild of America, and the DGA Award is given out annually to a director. The second most important feature is the Best Drama category of the Golden Globe awards.
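The plot above comes from DAIN’s own library, which I do not reproduce here. As a rough sketch of the general idea, the snippet below ranks features with permutation importance from scikit-learn, a different but related technique: shuffle one column at a time and measure how much the model’s accuracy drops. The data is synthetic and the column names (won_dga, won_globe_drama, total_nominations, noise) are hypothetical stand-ins, not the actual features or data of the Oscar model.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-in data; the column names are hypothetical.
feature_names = ["won_dga", "won_globe_drama", "total_nominations", "noise"]
X = np.column_stack([
    rng.integers(0, 2, n),   # binary: won the precursor award or not
    rng.integers(0, 2, n),   # binary
    rng.integers(1, 14, n),  # count of nominations, 1..13
    rng.normal(size=n),      # pure noise, should rank last or near last
])

# Assumed ground truth: the first column drives the outcome most strongly.
logit = 4.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * X[:, 2] - 3.5
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Shuffle each column several times and record the mean drop in accuracy.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name:>18}: {score:.3f}")
```

Plotting the ranking as a horizontal bar chart gives a feature importance plot in the same spirit as the one discussed above, although the exact numbers depend on the importance method used.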

The top two variables are binary: a film either wins an award or it doesn’t. However, the third most important variable, the total number of Oscar nominations, is numeric. Importance alone doesn’t tell the full story here. Are more Oscar nominations always better? Or does the response have some sort of U-shape? Partial dependence plots are well suited to answering this question.

As we can see, the answer is pretty much “the more the better”, but there seems to be a jump in the probability of winning at around 8 nominations. Partial dependence plots are useful, but they can be unreliable when the feature in question is correlated with other features. Movie ratings, for example, are correlated with each other, so looking at the partial dependence plots for the Rotten Tomatoes audience score and critics score one by one wouldn’t give us the entire story. In this case, it’s better to look at 2D partial dependencies. The next plot shows such a 2D partial dependence plot, where the vertical and horizontal axes represent the two Rotten Tomatoes scores and the colors show the probability of winning given these scores.

What we can read off this plot is that movies with both a very strong audience and critic score, and movies with a lower critics score but a very high audience score have the highest chance of winning a Best Picture Oscar.
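To make the mechanics concrete, here is a minimal sketch of how 1D and 2D partial dependence can be computed by hand: force one (or two) columns to a fixed value for every film, average the predicted win probability, and repeat over a grid. The data is synthetic, the random forest is only a placeholder model, and the helper names partial_dependence_1d and partial_dependence_2d are my own for this illustration, not part of any library.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def partial_dependence_1d(model, X, col, grid):
    """Average predicted probability with column `col` forced to each grid value."""
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, col] = value
        curve.append(model.predict_proba(X_mod)[:, 1].mean())
    return np.array(curve)


def partial_dependence_2d(model, X, col_a, col_b, grid_a, grid_b):
    """Same idea, but two columns are forced to every pair of grid values."""
    surface = np.empty((len(grid_a), len(grid_b)))
    for i, a in enumerate(grid_a):
        for j, b in enumerate(grid_b):
            X_mod = X.copy()
            X_mod[:, col_a] = a
            X_mod[:, col_b] = b
            surface[i, j] = model.predict_proba(X_mod)[:, 1].mean()
    return surface


# Synthetic stand-in data: nominations plus two correlated rating scores.
rng = np.random.default_rng(1)
n = 500
nominations = rng.integers(1, 14, n).astype(float)
critics = rng.uniform(40, 100, n)
audience = np.clip(critics + rng.normal(0, 10, n), 0, 100)  # correlated with critics
X = np.column_stack([nominations, critics, audience])

# Assumed ground truth, for illustration only.
logit = 0.4 * nominations + 0.03 * audience - 6.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

nom_grid = np.arange(1, 14)
curve = partial_dependence_1d(model, X, 0, nom_grid)

score_grid = np.linspace(40, 100, 7)
surface = partial_dependence_2d(model, X, 1, 2, score_grid, score_grid)

print(np.round(curve, 2))  # one averaged probability per nomination count
print(surface.shape)       # rows: critics grid, columns: audience grid
```

Note that the 2D grid also evaluates score combinations that rarely occur together in the data (for example a very low critics score with a very high audience score), so the corners of such a surface deserve extra caution.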

We also saw that genre can play an important role in whether a film will win Best Picture or not. But which genres make winning more likely, and which ones should Oscar-aspiring filmmakers stay away from? The next plot tells us that biographies and crime movies tend to be associated with a higher chance of winning, unlike horror and action films.

Finally, let’s go into a specific prediction. What went wrong with the model’s prediction this year? Why did the algorithm predict The Power of the Dog to win and not CODA?

Looking at how the individual feature values of The Power of the Dog contribute to the final prediction, we see that winning both the Golden Globe for Best Drama and the Directors Guild of America Award made this film a very strong contender. Its middling Rotten Tomatoes audience score somewhat lowers the final prediction, but otherwise this could have been an easy win for The Power of the Dog.

CODA, on the other hand, did not win any major awards besides the SAG Award for Outstanding Performance by a Cast, is a comedy, and has far fewer Oscar nominations in total than The Power of the Dog. Furthermore, CODA was not nominated for Best Director. As mentioned previously, the link between a Best Picture win and a Best Director nomination is a very strong signal in the data: 2022 is only the fifth time this exception has occurred in the history of the Oscars.
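For readers who want to see what a per-film breakdown looks like in code, here is a simplified sketch. It assumes a plain logistic regression, where each feature’s contribution to the log-odds can be written exactly as coefficient times the feature’s deviation from the dataset average. The actual Oscar model and DAIN’s library may work differently; the data is synthetic, and the feature names and the two example films are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature names and synthetic training data.
feature_names = ["won_dga", "won_globe_drama", "best_director_nominee", "total_nominations"]
rng = np.random.default_rng(2)
n = 800
X = np.column_stack([
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
    rng.integers(1, 14, n),
]).astype(float)

# Assumed ground truth, for illustration only.
logit = 2.5 * X[:, 0] + 1.5 * X[:, 1] + 1.0 * X[:, 2] + 0.15 * X[:, 3] - 4.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)


def local_contributions(model, X_background, x):
    """Per-feature shift in log-odds relative to the average film in X_background."""
    return model.coef_[0] * (x - X_background.mean(axis=0))


# Two hypothetical films: a precursor favourite and an underdog.
favourite = np.array([1.0, 1.0, 1.0, 12.0])
underdog = np.array([0.0, 0.0, 0.0, 3.0])

# Log-odds the model assigns to the "average film".
baseline = model.decision_function(X.mean(axis=0, keepdims=True))[0]

for label, film in [("favourite", favourite), ("underdog", underdog)]:
    contrib = local_contributions(model, X, film)
    total = baseline + contrib.sum()  # equals the model's log-odds for this film
    print(label)
    for name, value in zip(feature_names, contrib):
        print(f"  {name:>22}: {value:+.2f}")
    print(f"  {'log-odds':>22}: {total:+.2f}")
```

The contributions plus the baseline add up exactly to the model’s log-odds for the film, which is what makes this kind of breakdown easy to read: positive bars push a film towards a win, negative bars pull it away. For non-linear models, attribution methods such as Shapley values play the same role, but they are more involved to compute.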

After looking at all the model interpretability plots and being able to look into individual predictions we now have a better understanding of how our model is making predictions and what we need to look out for in the future. Hopefully, we also have a better understanding of the factors that influence a film’s chances of winning an Academy Award.

