How Do People Feel About Saving Sea Turtles?

Sentiment Analysis of #savetheturtles tweets using VADER

Photo by Jeremy Bishop on Unsplash

Like many of my peers, I had always known and cared about climate change and environmental sustainability, but I failed to act on it until I saw a particular viral video of a plastic straw being extracted from a sea turtle’s nostril.

(It’s as horrifying as it sounds.)

Since then, I’ve been forced to think more about my plastic consumption and have avidly adopted metal straws to reduce my waste footprint.

Straws are particularly dangerous because of their small, lightweight structure, which not only makes them more likely to get caught in unfortunate turtle nostrils but also makes them more likely to escape recycling.

In fact…

“Because they’re made of relatively thin material, straws break down into smaller plastic particles known as microplastics more quickly. They’re also not easily recyclable in most facilities. According to EcoCycle, roughly 500 million disposable straws are used by Americans daily.”

Thankfully, the viral video has significantly impacted the #StopSucking movement already. Back in July, Seattle became the first U.S. city to ban plastic utensils and straws and Starbucks announced its plan to phase out plastic straws at all its stores by 2020!

High five! // Photo by Tanguy Sauvin on Unsplash

But it also got me wondering — 6 months post-virality, how much impact does the #SavetheTurtles movement still have? What’s the general sentiment around the straw movement? I decided to scrape the Twitter hashtag #savetheturtles and see what I could find.

All code can be found on my GitHub page.

helenashi95/savetheturtles_sentimentanalysis

Hypothesis

Overall, I expected to see interest decline over time, based on a Google Trends search for “plastic straws” and “sea turtles” (screenshot below). However, I expected the overall sentiment within the tweets to be positive and supportive of the movement.

As you can see, “sea turtles” generated fairly consistent searches across the year, but “plastic straws” spiked around June to July, which is when the video went viral.

Data & Exploratory Analysis

I used the extremely helpful TwitterScraper to scrape the hashtag #savetheturtles. I was able to get around 1500 tweets from January 2018 to January 2019.

I began my analysis in a Jupyter notebook by preprocessing the text. Using NLTK, I converted the text to lowercase, stripped punctuation, and removed stopwords.
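The preprocessing steps above can be sketched as follows. This is a minimal, standard-library-only illustration, not the notebook's actual code: the `preprocess` helper is hypothetical, tokenization is a plain whitespace split, and the tiny `STOPWORDS` set is a stand-in for NLTK's English stopword list, which you would use in practice.

```python
import string

# Hypothetical stand-in for NLTK's English stopword list
# (in practice: set(nltk.corpus.stopwords.words('english')))
STOPWORDS = {"a", "an", "and", "the", "to", "of", "is", "are", "in", "for", "our", "we", "this"}

def preprocess(text):
    """Lowercase, strip punctuation, and drop stopwords (assumed pipeline)."""
    lowered = text.lower()
    # The third argument of str.maketrans maps every punctuation character to None
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # Simple whitespace tokenization; nltk.word_tokenize would be a closer match
    tokens = no_punct.split()
    return [tok for tok in tokens if tok not in STOPWORDS]

print(preprocess("Save the turtles! Say NO to plastic straws. #savetheturtles"))
```

Note that stripping punctuation also removes the `#`, so the hashtag itself survives as the ordinary token `savetheturtles`.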

From there, I found the most frequent words used in these tweets.

import nltk

# Count word frequencies in the preprocessed (stopword-filtered) tokens
fdist = nltk.FreqDist(filtered_stopwords)
fdist.most_common(10)
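`nltk.FreqDist` is a subclass of Python's `collections.Counter`, so the counting step behaves like the standard-library sketch below. The token list is invented for illustration; in the notebook it would be `filtered_stopwords`.

```python
from collections import Counter

# Hypothetical tokens standing in for filtered_stopwords
filtered_tokens = ["straw", "turtles", "save", "straws", "straw", "save", "ocean", "straw"]

# most_common(n) returns the n (word, count) pairs with the highest counts,
# the same method FreqDist inherits from Counter
fdist = Counter(filtered_tokens)
print(fdist.most_common(2))  # [('straw', 3), ('save', 2)]
print(fdist["straws"])       # 1: "straws" is still counted separately from "straw"
```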

However, there are some redundancies, such as “straw” and “straws”. Therefore, I lemmatized the words to find the root of the word (e.g. “running” and “runs” both get reduced to “run”).

# Try again with lemmatized words
import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

# Map a Penn Treebank POS tag to the WordNet POS constant (a, v, n, r)
# that WordNetLemmatizer expects
def get_wordnet_pos(treebank_tag):
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    elif treebank_tag.startswith('R'):
        return wordnet.ADV
    else:
        # The lemmatizer's default POS is noun
        return wordnet.NOUN

wnl = WordNetLemmatizer()

# Lemmatize each (word, POS tag) pair, presumably from nltk.pos_tag(filtered_stopwords)
def wn_pos(filtered_pos):
    des_lem = []  # fresh list per call, so repeated calls don't accumulate duplicates
    for word, pos in filtered_pos:
        des_lem.append(wnl.lemmatize(word, get_wordnet_pos(pos)))
    return des_lem

# Get the 10 most common lemmas
fdist_2 = nltk.FreqDist(wn_pos(filtered_pos))
fdist_2.most_common(10)

As we can see, lemmatizing reduces words to their root form, so variants like “straw” and “straws” are no longer counted separately. This also reveals more of the general themes in these tweets, which include “save” and “help.”

# Count the most common pairs of adjacent words (bigrams)
bigrm = nltk.bigrams(filtered_stopwords)
fdist = nltk.FreqDist(bigrm)
fdist.most_common(10)
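`nltk.bigrams` yields each pair of adjacent tokens. A standard-library sketch of the same count pairs the token list with itself shifted by one. The `count_bigrams` helper and the tokens are invented for illustration:

```python
from collections import Counter

def count_bigrams(tokens):
    """Count adjacent word pairs, similar to FreqDist(nltk.bigrams(tokens))."""
    # zip stops at the shorter sequence, so the last token starts no pair
    return Counter(zip(tokens, tokens[1:]))

tokens = ["save", "turtles", "plastic", "straws", "save", "turtles"]
print(count_bigrams(tokens).most_common(1))  # [(('save', 'turtles'), 2)]
```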

The most common bigrams even reveal a small market of businesses that support the movement, such as Deep Blue Decals and Salty Girl!

In addition, I wanted to see whether the video’s viral impact was still affecting tweets 6 months later. To do this, I plotted the number of tweets per day from the beginning of the year to the end.
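Counting tweets per day amounts to grouping timestamps by calendar date. The sketch below assumes each scraped tweet carries a Python `datetime` timestamp; the `tweets_per_day` helper and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

def tweets_per_day(timestamps):
    """Return (date, count) pairs sorted by date (hypothetical helper)."""
    # Truncate each timestamp to its calendar date, then count occurrences
    counts = Counter(ts.date() for ts in timestamps)
    return sorted(counts.items())

# Invented timestamps for illustration
sample = [
    datetime(2018, 7, 1, 9, 30),
    datetime(2018, 7, 1, 18, 5),
    datetime(2018, 7, 2, 12, 0),
]
for day, n in tweets_per_day(sample):
    print(day, n)
```

Days with no tweets are simply absent from the result, so you may want to fill them in with zero before plotting a continuous daily axis.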

The number of tweets spikes significantly in June, July, and August, reflecting the impact of the viral video. However, there is also a clear trend afterwards: the overall number of tweets increases in the later months, up to today!

Now onto the Sentiment Analysis…

Based on various articles, I decided to try VADER, the rule-based sentiment analyzer available in NLTK, to score the positive, negative, and neutral sentiment of each individual tweet.

I found these articles particularly helpful:

import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

# Show the sentiment scores for each tweet (df holds the scraped tweets)
for tweet in df['text']:
    print(tweet)
    ss = sid.polarity_scores(tweet)
    # Print all four scores on one line, then a blank line between tweets
    for k in sorted(ss):
        print('{0}: {1}, '.format(k, ss[k]), end='')
    print("\n")

The resulting output looks like this.

It’s really cool to see this output. The VADER sentiment analyzer outputs four scores:

  • neg: Negative
  • neu: Neutral
  • pos: Positive
  • compound: Compound (i.e. aggregated score)

The neg, neu, and pos scores are floats giving the proportion of the text that falls into each category, so together they sum to roughly 1. VADER also returns a compound sentiment score for each individual tweet in the range -1 to 1, from most negative to most positive.
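For intuition about the -1 to 1 range: VADER's reference implementation sums the rule-adjusted valence scores of the words it recognizes, then squashes that sum with x / sqrt(x² + α), where α = 15 by default. The sketch below reproduces only that final normalization step. The raw sums passed in are made-up numbers, not VADER output.

```python
import math

def normalize(score, alpha=15):
    """Map an unbounded valence sum into [-1, 1], as VADER's compound score does."""
    norm_score = score / math.sqrt(score * score + alpha)
    # Clamp for safety; mathematically |norm_score| < 1 already
    return max(-1.0, min(1.0, norm_score))

for raw in (-4.0, 0.0, 1.0, 4.0):
    print(raw, round(normalize(raw), 4))
```

Because the curve flattens as |x| grows, a handful of strongly positive words already pushes the compound score close to 1.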

By looking at the compound scores, we can classify each tweet as “positive”, “negative”, or “neutral” (> 0.0, < 0.0, and == 0.0 respectively). We can see the overall distribution of scores with the following code.

import matplotlib.pyplot as plt

# Classify each tweet by the sign of its compound score
summary = {"positive": 0, "neutral": 0, "negative": 0}
for tweet in df['text']:
    ss = sid.polarity_scores(tweet)
    if ss["compound"] == 0.0:
        summary["neutral"] += 1
    elif ss["compound"] > 0.0:
        summary["positive"] += 1
    else:
        summary["negative"] += 1

# Convert the dict views to lists so matplotlib receives plain sequences
labels = list(summary.keys())
values = list(summary.values())

# Colors in the same order as the labels: positive, neutral, negative
colors = ['#99ff99', '#66b3ff', '#ff9999']
plt.axis("equal")  # Equal aspect ratio ensures that the pie is drawn as a circle
plt.pie(values, labels=labels, colors=colors, autopct='%1.1f%%', shadow=True, startangle=90)
plt.show()

Positive: 788; Neutral: 413; Negative: 287

Looking at the distribution of compound scores across all 1,488 tweets, just over half (about 53%) are positive, 28% are neutral, and 19% are negative.
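The labeling rule and the percentages can be checked with a small standard-library sketch. `label` and `shares` are hypothetical helpers. With the default `threshold=0.0`, `label` matches the sign rule used here. VADER's README instead suggests treating a band of roughly ±0.05 around zero as neutral, which you can try by raising the threshold. The counts are the ones reported above.

```python
def label(compound, threshold=0.0):
    """Classify a VADER compound score (hypothetical helper)."""
    if compound > threshold:
        return "positive"
    if compound < -threshold:
        return "negative"
    return "neutral"

def shares(counts):
    """Convert label counts into percentages of the total, rounded to 0.1."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}

counts = {"positive": 788, "neutral": 413, "negative": 287}
print(shares(counts))  # {'positive': 53.0, 'neutral': 27.8, 'negative': 19.3}
```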

Although VADER is a powerful package, it is not perfect: there are still classification mistakes when you look closely. I believe this is because of the sentiment dictionaries (lexicons) that many analyzers use to score positivity and negativity, and the subtleties of the English language.

VADER has many useful features for analyzing reviews. For example, it captures the intensity of a positive sentiment, treating “excellent” as more positive than just “good”. However, these tweets are not explicitly reviews, and positive tweets may not use such words. For example:

The misspelling of “MAJOR” and slang such as “props” mean that the excited approval in this tweet was lost on our analyzer. Or in this case…

Sarcasm unfortunately not detected!

In conclusion…

Overall, this project showed that the #savetheturtles movement is still going strong! The overall amount of awareness and tweets has increased over time, and a majority of the tweets are positive.

Admittedly, there are a couple of caveats. Scraping only the #savetheturtles hashtag biases the sample toward support: most people using it are tweeting or retweeting to show support for sustainability, and someone opposed to the straw movement is unlikely to use this hashtag at all. In retrospect, it might have been better practice to combine tweets from several hashtags, or to look at occurrences of the hashtag on the main Twitter page.

More to try for next time!

Perhaps in the future, I could also incorporate the retweet and like counts into the analysis as a measure of positive support. I would greatly appreciate any input or advice on how to do so, or any other suggestions you may have!

Liked this article? Please leave a comment and let me know what you think about straws, text analysis, and otherwise!


How Do People Feel About Saving Sea Turtles? was originally published in Towards Data Science on Medium.
