Friday, March 9, 2018

Analyse emotions in the 2018 OSCAR movies with Google Apps Script and Google Natural Language API

And the OSCAR goes to...

There are many indicators that can help predict the winning movie. One of them, no doubt, is emotion. When a screenplay is written so that the emotion swings from positive to negative and vice versa, it has a big impact on people's minds.

Nowadays you can use Natural Language Processing without deep knowledge of things like syntax analysis, grammar or external libraries. Big companies such as Google, Amazon and Microsoft invest huge amounts of money to improve their algorithms and offer them as a service/API.

I am always interested in quick and easy proofs of concept, so today I will show you how I put these pieces together.

I chose the Best Picture category (90th Academy Awards) with these nominated movies:
  • Call Me by Your Name – Peter Spears, Luca Guadagnino, Emilie Georges, and Marco Morabito
  • Darkest Hour – Tim Bevan, Eric Fellner, Lisa Bruce, Anthony McCarten, and Douglas Urbanski
  • Dunkirk – Emma Thomas and Christopher Nolan
  • Get Out – Sean McKittrick, Jason Blum, Edward H. Hamm Jr., and Jordan Peele
  • Lady Bird – Scott Rudin, Eli Bush, and Evelyn O'Neill
  • Phantom Thread – JoAnne Sellar, Paul Thomas Anderson, Megan Ellison and Daniel Lupi
  • The Post – Amy Pascal, Steven Spielberg, and Kristie Macosko Krieger
  • The Shape of Water – Guillermo del Toro and J. Miles Dale
  • Three Billboards Outside Ebbing, Missouri – Graham Broadbent, Pete Czernin, and Martin McDonagh

1. I downloaded the subtitles from the OpenSubtitles server and saved the files (.srt) to my Google Drive.
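An .srt file is plain text made of numbered blocks separated by blank lines; each block has a timing line such as `00:00:20,000 --> 00:00:24,400` followed by one or more lines of dialogue. The sketch below, in plain ES5-style JavaScript that Google Apps Script can run, parses such a file into cues with start and end times in seconds. The function names, the stripping of formatting tags and the Drive lookup by file name are illustrative assumptions, not part of the original project.

```javascript
// Converts an SRT timestamp such as "00:01:02,500" into seconds (62.5).
function parseSrtTime(stamp) {
  var m = /(\d+):(\d{2}):(\d{2})[,.](\d{3})/.exec(stamp);
  if (!m) {
    throw new Error('Invalid SRT timestamp: ' + stamp);
  }
  return Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]) + Number(m[4]) / 1000;
}

// Parses the content of an .srt file into cues {start, end, text}:
// start/end are in seconds, text is the cue's dialogue joined into one line.
function parseSrt(content) {
  var cues = [];
  // Normalise Windows/old Mac line endings, then split on blank lines.
  var blocks = content.replace(/\r\n?/g, '\n').split(/\n\s*\n/);
  for (var b = 0; b < blocks.length; b++) {
    var lines = blocks[b].trim().split('\n');
    // Find the timing line, e.g. "00:00:20,000 --> 00:00:24,400".
    var timing = -1;
    for (var i = 0; i < lines.length; i++) {
      if (lines[i].indexOf('-->') !== -1) {
        timing = i;
        break;
      }
    }
    if (timing === -1) {
      continue; // not a subtitle block
    }
    var times = lines[timing].split('-->');
    var text = lines.slice(timing + 1).join(' ')
      .replace(/<[^>]+>/g, '') // drop formatting tags such as <i>...</i>
      .trim();
    if (text) {
      cues.push({ start: parseSrtTime(times[0]), end: parseSrtTime(times[1]), text: text });
    }
  }
  return cues;
}

// Apps Script only: read a subtitle file from Google Drive by its name
// (hypothetical example: 'Dunkirk.srt'). getDataAsString() decodes as UTF-8.
function loadSrtFromDrive(fileName) {
  var files = DriveApp.getFilesByName(fileName);
  if (!files.hasNext()) {
    throw new Error('File not found on Drive: ' + fileName);
  }
  return files.next().getBlob().getDataAsString();
}
```

Subtitles saved in a different encoding would need `getDataAsString(charset)` instead of the UTF-8 default.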

2. I created a new project in Google Apps Script and enabled the Cloud Natural Language API via the menu Resources -> Cloud Platform -> (left menu) APIs & Services -> Library -> Cloud Natural Language API.
Note: Google Apps Script creates a new Google Cloud project for you, so you don't have to create it yourself.

In the Google Cloud Platform dashboard I also created an API key, which identifies the application making the requests:

APIs & Services -> Credentials -> Create credentials -> API key
(As a minimum, you should set up application restrictions for HTTP referrers.)

3. Next, I preprocessed the subtitles, borrowing a term from machine learning: bucketing. I aggregated multiple subtitle lines into time-framed text "windows" of 2 minutes each.
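A minimal sketch of the bucketing step, assuming the subtitles are already parsed into cues of the form `{start, text}` with `start` in seconds. Each cue goes into the 2-minute window in which it starts; `bucketCues` and `BUCKET_SECONDS` are hypothetical names, and windows without any dialogue are simply skipped.

```javascript
var BUCKET_SECONDS = 120; // 2-minute windows

// Groups subtitle cues into fixed time windows ("buckets").
//   cues:    [{start: seconds, text: string}, ...]
//   returns: [{minute: window start in minutes, text: joined dialogue}, ...]
function bucketCues(cues, bucketSeconds) {
  bucketSeconds = bucketSeconds || BUCKET_SECONDS;
  var buckets = {};
  cues.forEach(function (cue) {
    // A cue belongs to the window in which it starts.
    var index = Math.floor(cue.start / bucketSeconds);
    if (!buckets[index]) {
      buckets[index] = [];
    }
    buckets[index].push(cue.text);
  });
  // Return the windows in chronological order.
  return Object.keys(buckets)
    .map(Number)
    .sort(function (a, b) { return a - b; })
    .map(function (index) {
      return {
        minute: index * bucketSeconds / 60,
        text: buckets[index].join(' ')
      };
    });
}
```

Assigning a cue by its start time is a simplification: a line that straddles a window boundary is counted once, in the earlier window.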

4. The rest is simple: iterate over all the "buckets" and send each grouped text to the Google Natural Language API. The API responds with two numbers for the text: a sentiment score and a magnitude. In my case, I used the sentiment score, which lies in the range <-1;1>. Everything is saved to a Google Spreadsheet and the charts are rendered directly from there.
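One way to send each bucket to the API is a plain REST call to the `documents:analyzeSentiment` method of the Cloud Natural Language API v1, authenticated with the API key from step 2. The sketch below builds the request options for `UrlFetchApp.fetch` and reads `documentSentiment.score` and `documentSentiment.magnitude` from the response. The helper names and the injected `fetchJson` function, which keeps the logic testable outside Apps Script, are assumptions for illustration.

```javascript
var NL_ENDPOINT = 'https://language.googleapis.com/v1/documents:analyzeSentiment';

// Builds the URL and UrlFetchApp options for one bucket of text.
function buildSentimentRequest(text, apiKey) {
  return {
    url: NL_ENDPOINT + '?key=' + encodeURIComponent(apiKey),
    options: {
      method: 'post',
      contentType: 'application/json',
      muteHttpExceptions: true, // return error bodies instead of throwing
      payload: JSON.stringify({
        document: { type: 'PLAIN_TEXT', content: text },
        encodingType: 'UTF8'
      })
    }
  };
}

// Sends every bucket to the API and collects the document-level sentiment.
// fetchJson(url, options) must return the parsed JSON response body.
function analyzeBuckets(buckets, apiKey, fetchJson) {
  return buckets.map(function (bucket) {
    var req = buildSentimentRequest(bucket.text, apiKey);
    var response = fetchJson(req.url, req.options);
    if (response.error) {
      throw new Error('Natural Language API error: ' + JSON.stringify(response.error));
    }
    // Fields equal to 0 may be omitted from the JSON response, hence the defaults.
    var s = response.documentSentiment || {};
    return {
      minute: bucket.minute,
      score: s.score || 0,        // -1 (negative) .. 1 (positive)
      magnitude: s.magnitude || 0 // overall strength of emotion, >= 0
    };
  });
}
```

In Apps Script, `fetchJson` can be `function (url, options) { return JSON.parse(UrlFetchApp.fetch(url, options).getContentText()); }`. Because `muteHttpExceptions` is set, an HTTP error comes back as a JSON body with an `error` field, which the sketch turns into an exception.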
5. Finally, I put the two functions together:
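A sketch of how the two steps might be chained for one movie and written to the spreadsheet. To keep it self-contained, the preprocessing step and the API step are passed in as functions (`toBuckets`, `toSentiment`); these names, the sheet-per-movie layout and the chart placement are assumptions. `writeRowsWithChart` uses only the SpreadsheetApp and Charts services and assumes the script is bound to the spreadsheet (a standalone script would open it with `SpreadsheetApp.openById` instead).

```javascript
// Chains the two steps for one movie and returns rows for the spreadsheet:
// a header row followed by one [minute, sentiment score] row per bucket.
//   toBuckets(srtText)   -> [{minute, text}, ...]   (preprocessing / bucketing)
//   toSentiment(buckets) -> [{minute, score}, ...]  (Natural Language API calls)
function sentimentRows(movieName, srtText, toBuckets, toSentiment) {
  var results = toSentiment(toBuckets(srtText));
  var rows = [['Minute', movieName]];
  results.forEach(function (r) {
    rows.push([r.minute, r.score]);
  });
  return rows;
}

// Apps Script only: write the rows into a sheet named after the movie and
// add a line chart (time on the x-axis, sentiment score on the y-axis).
function writeRowsWithChart(rows, sheetName) {
  var spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = spreadsheet.getSheetByName(sheetName) || spreadsheet.insertSheet(sheetName);
  sheet.clearContents();
  // Remove charts from a previous run so they don't pile up.
  sheet.getCharts().forEach(function (c) { sheet.removeChart(c); });
  var range = sheet.getRange(1, 1, rows.length, rows[0].length);
  range.setValues(rows);
  var chart = sheet.newChart()
    .setChartType(Charts.ChartType.LINE)
    .addRange(range)
    .setPosition(1, 4, 0, 0) // anchor the chart at row 1, column D
    .build();
  sheet.insertChart(chart);
  return sheet;
}
```

Calling `sentimentRows` once per nominee and passing each result to `writeRowsWithChart` would give one sheet and one chart per movie.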
Here you can see how the sentiment changes during the movie (time is on the x-axis) for the 2018 Best Picture winner:

The rest of the nominated movies: