Emotions predicted by examining the correlation between tweets and environmental factors
External factors such as weather, news exposure, social network emotion charge, timing, and mood predisposition may have a bearing on one’s emotion level throughout the day.
Twitter, one of the most popular social media platforms, allows users to share their thoughts with the world and interact with other users by posting, responding to, liking, and reposting tweets. Michigan researchers have found that a Twitter user’s emotions can be predicted more accurately when external environmental factors are considered alongside their tweets.
Traditional natural language processing techniques mainly focus on textual context to predict one’s emotions. This approach has found success, but it does not account for environmental factors that may affect a person’s emotions.
Research fellow Carmen Banea, alumna Vicki Liu, and Prof. Rada Mihalcea explored the concept of grounded emotions, focusing on how external factors such as weather, news exposure, social network emotion charge, timing, and mood predisposition may have a bearing on one’s emotion level throughout the day.
By testing the correlation between certain external factors and Twitter sentiment, they explored which factors are most significant in grounding emotions, and thereby gained a deeper understanding of the connections that exist between external factors and one’s internal emotional state.
The researchers collected from Twitter a set of tweets published between January 18, 2017 and April 14, 2017 that were self-tagged by their authors with a #happy or #sad hashtag. For each tweet, they considered the hashtag to represent the label, capturing the instantaneous emotional state of its author. They also collected the tweet’s remaining content, as well as metadata such as the time it was published, its author, and its location. The set was filtered by location so that the collected tweets originated from 20 large US metropolitan areas, with no more than three cities located in the same state, which allowed them to obtain a large representative sample.
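The labeling step described above can be sketched in a few lines of Python. This is only an illustration: the function name `label_tweet` is hypothetical, and the rule of discarding tweets that carry both hashtags is an assumption, since the article does not say how such tweets were handled.

```python
import re

# Hypothetical mapping from the self-tagged hashtag to an emotion label.
LABEL_TAGS = {"#happy": "happy", "#sad": "sad"}
TAG_PATTERN = re.compile(r"#(?:happy|sad)\b", re.IGNORECASE)


def label_tweet(text):
    """Return (label, remaining_text), or None if there is no single clear label."""
    found = {LABEL_TAGS[tag.lower()] for tag in TAG_PATTERN.findall(text)}
    if len(found) != 1:
        # Untagged, or tagged with both hashtags (assumed here to be discarded).
        return None
    # Remove the label hashtag so it cannot leak into the text features.
    remaining = " ".join(TAG_PATTERN.sub("", text).split())
    return found.pop(), remaining
```

Removing the hashtag from the remaining content matters in a setup like this, because otherwise any text-based feature would trivially contain the label.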
Data from External Factors
To collect weather data, they used Weather Underground and were able to obtain information such as temperature, humidity, and precipitation, as well as a descriptive phrase summarizing atmospheric conditions, such as “heavy thunderstorms and rain.”
They obtained national news data using the New York Times API by searching for the stories that appeared on the front page of the New York Times in the 24 hours before a user’s tweet.
Based on the tweet timestamp, they were able to determine the season, month, day of the week, and hour at which a tweet was authored; they also mapped the hours to seven time intervals: early morning, morning, afternoon, late afternoon, evening, late evening, and night.
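As a rough sketch, the timing features above could be derived from a timestamp as follows. The interval names come from the article, but the hour boundaries, the season definition (meteorological, northern hemisphere), and all function names are assumptions made for illustration.

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Assumed start hours; the article names the intervals but not their ranges.
INTERVAL_STARTS = [
    (5, "early morning"),
    (8, "morning"),
    (12, "afternoon"),
    (15, "late afternoon"),
    (18, "evening"),
    (21, "late evening"),
]


def time_interval(hour):
    """Map an hour (0-23) to one of the seven named intervals."""
    label = "night"  # hours before the first assumed start (00:00-04:59)
    for start, name in INTERVAL_STARTS:
        if hour >= start:
            label = name
    return label


def season(month):
    """Meteorological season for the northern hemisphere (an assumption)."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "fall"


def time_features(ts):
    """Build the timing features for one tweet timestamp."""
    return {
        "season": season(ts.month),
        "month": ts.month,
        "day_of_week": WEEKDAYS[ts.weekday()],
        "hour": ts.hour,
        "interval": time_interval(ts.hour),
    }
```

For example, `time_features(datetime(2017, 1, 18, 9, 30))` yields a winter Wednesday at hour 9, which falls in the "morning" interval under the boundaries assumed here.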
For each tweet, they queried all followees, or people that the given user follows, and gathered tweets posted in the 24 hours prior to the original tweet’s timestamp. Each followee tweet was then tagged as positive or negative.
The researchers speculated that users have a relatively consistent emotional state, where fast changes between happiness and sadness are unlikely. For this reason, they also collected the tweets that the user posted within 12 hours prior to the target tweet.
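Both look-back windows described above (24 hours for followee tweets, 12 hours for the user’s own earlier tweets) amount to the same filtering step. A minimal sketch, assuming tweets are held as `(timestamp, text)` pairs and using a hypothetical helper name:

```python
from datetime import datetime, timedelta


def tweets_in_window(tweets, target_time, hours):
    """Keep (timestamp, text) pairs posted in the `hours` before target_time."""
    start = target_time - timedelta(hours=hours)
    # The target tweet itself and anything posted after it are excluded.
    return [(ts, text) for ts, text in tweets if start <= ts < target_time]
```

Calling it with `hours=24` on followee tweets and `hours=12` on the author’s own tweets would produce the two context sets; how the sentiment of those tweets is then aggregated is not specified in the article.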
At an individual feature type level, they discovered that the sentiment extracted from a user’s prior textual content exhibits a high correlation with an emotional response experienced twelve hours later, showing that users are consistent in their emotional states. They also found that the cumulative sentiment expressed in news is the second-best predictor of user emotion. By combining all grounding signals together, they were able to obtain an emotion prediction accuracy of 66.9%, surpassing the majority class baseline of 59% by 7.9 percentage points.
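For readers unfamiliar with the term, a majority class baseline is the accuracy obtained by always predicting the most frequent label. The sketch below uses toy labels, not the study’s data, and the article does not say which of the two classes was the majority.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(labels)


# Toy example: 59 items of one class and 41 of the other.
toy_labels = ["class_a"] * 59 + ["class_b"] * 41
baseline = majority_baseline_accuracy(toy_labels)  # 0.59
```

A model is only informative if it beats this figure, which is why the reported 66.9% is compared against 59% and not against 50%.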
This study not only shows that external factors do prime us toward emotional responses, but also that the performance of such external features in predicting emotion can surpass the predictive accuracy of natural language processing tools that look at text alone.