Analyzing 10 Million Inauguration Emoji

During Donald Trump’s inauguration ceremony, we asked our readers to answer a series of questions on BuzzFeed using only emoji. Over the period of a few hours, we received hundreds of thousands of responses, tens of thousands of distinct emoji sequences, and a whopping 10 million emojis.
The Official TRUMP Emoji
The data is loud and clear: You chose the pouting face emoji as the official Donald J. Trump emoji.

Graph Analysis
One way for us to gauge the diversity of sentiment among all responses was to model the emoji sequences as a network graph. Graphs give us a great way to understand relationships between items and identify clusters — regions of dense connectivity.
The graph above represents the relationships between all emojis submitted in response to the question “What do you think about Trump’s speech?” Each circle (node) represents an emoji, each line between two circles (edge) reflects the number of times those two emojis appeared together in a sequence, and the colors represent clusters: groups of emojis that appeared with each other far more often than with the rest.
The green region includes reactions that are clearly supportive of the speech (smiling face, thumbs up, grinning, etc.), while the blue, orange, and purple reactions represent clearly negative sentiment.

What’s especially interesting here is the central position that the US flag holds, effectively acting as a bridge between the two sides of the network. It is the emoji most often used alongside both the emojis supporting Trump’s speech and those opposing it.
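As a rough illustration of how such a co-occurrence graph can be built (a minimal sketch, not necessarily the code behind the graph above), the snippet below counts how often each pair of emojis shows up in the same response. The sample responses and the emoji names are made up, and counting a repeated emoji only once per response is an assumption.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sequences):
    """Return a Counter mapping each unordered emoji pair to the number
    of sequences in which both emojis appear."""
    edges = Counter()
    for seq in sequences:
        # Assumption: an emoji repeated within one response counts once.
        for a, b in combinations(set(seq), 2):
            edges[frozenset((a, b))] += 1
    return edges


# Hypothetical responses; each emoji is written out by name.
responses = [
    ["thumbs_up", "us_flag", "grinning"],
    ["thumbs_up", "us_flag"],
    ["thumbs_down", "us_flag", "pouting"],
    ["thumbs_down", "pouting"],
]

edges = cooccurrence_edges(responses)

# Adjacency view: node -> {neighbor: edge weight}.
neighbors = {}
for pair, weight in edges.items():
    a, b = tuple(pair)
    neighbors.setdefault(a, {})[b] = weight
    neighbors.setdefault(b, {})[a] = weight

print(sorted(neighbors["us_flag"]))
```

In this toy data the flag is connected to emojis on both sides, while thumbs up and thumbs down never share an edge. A community-detection algorithm run on the weighted edges would then assign the cluster colors; the post does not say which algorithm was used.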

Pointwise Mutual Information
Using a measure called pointwise mutual information (PMI), we can start to gauge the relationship between items that appear in sequences. PMI is a measure of association used in information theory and statistics. It is a great way to find collocations and associations between words in sentences — and also emojis in sequences.
A high PMI score between two items means that they appear together far more often than would be expected if they occurred independently: the probability of co-occurrence is only slightly lower than the probability of each item occurring on its own. For example, word pairs such as “puerto” + “rico,” “pay” + “attention,” and “nobel” + “prize” have high PMI values. These are combinations of words that are closely affiliated with each other.
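For reference, the standard definition of PMI for two items x and y is shown below, where p(x, y) is the probability that both appear in the same sequence and p(x) and p(y) are the probabilities of each appearing at all. A score of zero means the two items co-occur exactly as often as independence would predict; positive scores mean they co-occur more often, negative scores less often.

```latex
\operatorname{pmi}(x; y) = \log \frac{p(x, y)}{p(x)\, p(y)}
```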
By computing the PMI scores for both the thumbs-up and thumbs-down characters in relation to all other emojis, we can effectively organize the range of relationships (similarities and differences) between each sentiment (pro/con) and the rest of the emojis.

The Russian flag, interestingly enough, has low PMI values, likely because it doesn’t consistently appear with either side (content vs. disappointed). With that, you can see how this plot helps us organize the range of emotions and emojis, from the happy, smiling, and joyous to the saddened, disappointed, and confused faces.
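The PMI comparison described in this section can be sketched in a few lines of Python. This is an illustrative version under stated assumptions, not the exact pipeline used for the plot: probabilities are estimated as the share of responses containing an emoji, the log is base 2, and the sample responses and emoji names are made up.

```python
import math
from collections import Counter


def pmi_scores(sequences, target):
    """PMI between `target` and every emoji that co-occurs with it.

    Probabilities are estimated from per-response presence. Emojis that
    never appear with `target` are omitted (their PMI would be -infinity).
    """
    n = len(sequences)
    single = Counter()  # number of responses containing each emoji
    joint = Counter()   # number of responses containing emoji and target
    for seq in sequences:
        items = set(seq)
        single.update(items)
        if target in items:
            for other in items - {target}:
                joint[other] += 1

    p_target = single[target] / n
    scores = {}
    for other, count in joint.items():
        p_joint = count / n
        p_other = single[other] / n
        scores[other] = math.log2(p_joint / (p_target * p_other))
    return scores


# Hypothetical responses; each emoji is written out by name.
responses = [
    ["thumbs_up", "grinning"],
    ["thumbs_up", "grinning", "us_flag"],
    ["thumbs_down", "pouting"],
    ["thumbs_down", "pouting", "us_flag"],
]

up = pmi_scores(responses, "thumbs_up")
down = pmi_scores(responses, "thumbs_down")
print(up)
print(down)
```

In this toy data, "grinning" scores high with thumbs up and "pouting" scores high with thumbs down, while "us_flag" scores zero with both because it appears equally on each side. An emoji that sides with neither sentiment ends up with low scores against both, which is the pattern described for the Russian flag.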