Imagine always knowing what to sell, what clothes to wear, what items to stock, where to eat, who to hire, what features to use in your design, etc. Sentiment analysis is an invaluable tool for determining such things. With sentiment analysis, anyone can use statistics and a little programming to gain an edge over the competition. The internet is being populated with unstructured data every second, so why not take advantage of it?
I am willing to bet that you already use sentiment analysis without knowing it. Before buying anything, what do you do? Do you look at reviews? Well, there you go: you are the system that is finding the best thing to buy. Now imagine reading ten thousand reviews. In this post, we will explore a simple way to automate the sentiment analysis process.
Why is all of this important? Let's say that you want to sell something. Would you want to sell something that people generally dislike? By choosing to sell items that are generally valued, you position yourself for the best possible return. Additionally, sentiment analysis gives you insight into the way customers perceive your brand. By regularly monitoring sentiment and adjusting operations, you can keep strengthening your strategy.
In this post I will show you how to use the VADER (Valence Aware Dictionary and sEntiment Reasoner) lexicon for sentiment analysis. A few things to note about this method:
1. Lexical approach. This method rates words like "great" as having positive sentiment and words like "horrible" as having negative sentiment. Additionally, this method takes capitalization and punctuation into account. For example: "VERY GOOD!" would rank higher in positive sentiment than "very good". This approach does not use any machine learning techniques and does not require any training data, making it extremely simple to use. However, there are several downsides to this non-machine-learning approach.
2. Context and Humor. The VADER lexicon can be inaccurate in certain situations. For example: "My phone is so thin!" would ideally rank as positive, whereas "The hotel walls are so thin!" would ideally rank as negative. This is not the case when using the VADER lexicon. Context and humor are not built into this lexicon, so many comments in a data set may produce inaccurate results.
3. Objective vs Subjective. Subjective statements tend to produce strong sentiment scores with the VADER lexicon because of the emotional words they contain, whereas objective statements are likely to produce much weaker scores. This is a drawback, because objective statements are often the more useful ones: they are based on facts. For example: "I love this phone" will produce more positive sentiment than "This phone has a fast cpu". Both of these statements should ideally produce positive sentiment, but the objective statement will be labeled as neutral using the VADER lexicon approach.
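If you want to check these three notes yourself once NLTK is set up (installation is covered next), here is a minimal sketch that scores each pair of example statements side by side. The helper names `make_vader_scorer`, `compound_pair`, and `print_comparisons` are made up for this illustration and are not part of NLTK; only `SentimentIntensityAnalyzer` and its `polarity_scores` method come from the library.

```python
# The example pairs come from the three notes above.
PAIRS = [
    ("very good", "VERY GOOD!"),
    ("My phone is so thin!", "The hotel walls are so thin!"),
    ("I love this phone", "This phone has a fast cpu"),
]


def make_vader_scorer():
    """Return VADER's polarity_scores method (needs nltk and the lexicon)."""
    # Imported lazily so the helpers below can be used with any scorer.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer().polarity_scores


def compound_pair(score_fn, first, second):
    """Return the compound scores of two statements as a tuple."""
    return score_fn(first)["compound"], score_fn(second)["compound"]


def print_comparisons(score_fn, pairs=PAIRS):
    """Print each pair of statements next to its compound scores."""
    for first, second in pairs:
        a, b = compound_pair(score_fn, first, second)
        print(f"{first!r}: {a:+.3f}   {second!r}: {b:+.3f}")
```

Calling `print_comparisons(make_vader_scorer())` prints one line per pair, so you can see how far apart (or how close together) the two statements in each pair land.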
Okay, now that you have a basic understanding of the use cases and drawbacks of the VADER lexicon, let's talk about how to actually use it. You will need to install nltk and download the VADER lexicon; after that, you will be ready. See the code below.
pip install nltk
# now enter the Python interpreter
>>> import nltk
>>> nltk.download('vader_lexicon')
>>> exit()
# you are ready!! :)
# the lexicon will download into a default location
Now let's practice with a super simple example. Let's make a variable called "statement" and use the VADER lexicon to measure its sentiment score. View the code below and notice how the statement is subjective and straightforward. This statement produces a negative sentiment score as shown in the output below. "neg" is greater than zero and "compound" is less than -0.05, making this a negative statement according to the VADER lexicon. This blog post provides a great description of the values returned by the VADER lexicon for reference.
from nltk.sentiment.vader import SentimentIntensityAnalyzer
statement = "the movie was awful"
# polarity_scores returns a dict with "neg", "neu", "pos", and "compound" keys
score = SentimentIntensityAnalyzer().polarity_scores(statement)
print(score)
As you can see, these few lines of code are very handy for analyzing statements from any source. You can use this code to gain insight into how customers perceive your brand or how customers perceive a product.
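To turn the compound score into a label automatically, here is a minimal sketch of the threshold rule described above. It assumes the commonly used cutoffs of 0.05 and -0.05, with anything in between treated as neutral. The function names `label_from_compound` and `classify` are hypothetical helpers for this post, not part of NLTK.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score (-1 to 1) to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def classify(statement):
    """Score one statement with VADER and return (label, scores)."""
    # Imported here so label_from_compound can be reused without NLTK.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    scores = SentimentIntensityAnalyzer().polarity_scores(statement)
    return label_from_compound(scores["compound"]), scores
```

With nltk and the lexicon installed, `classify("the movie was awful")` should come back labeled "negative", matching the output discussed above. The threshold is a parameter so you can widen the neutral band if your data calls for it.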
Now let's go over the drawbacks of using the VADER lexicon with an example. Let's make a negative statement that contains positive words to see if we can trick the VADER lexicon. View the code and output below and notice how the statement should ideally rank as negative, but the output shows otherwise.
from nltk.sentiment.vader import SentimentIntensityAnalyzer
statement = "WOW! The sushi was not good!"
score = SentimentIntensityAnalyzer().polarity_scores(statement)
print(score)
As you can see, the output shows that the "pos" score is greater than the "neg" score. (These values are the proportions of the text rated as positive and negative, not probabilities.) Also, the compound score is greater than 0.05, making this a positive statement according to the VADER lexicon. Keep this in mind when analyzing large data sets, because too many statements like this can badly skew your results.
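One way to stay cautious on a large data set is to set aside statements that carry both positive and negative signals and read those by hand. The sketch below is only a heuristic of my own, not something VADER or NLTK provides: the helper names and the 0.1 floor are assumptions you should tune, and whether a given statement gets flagged depends on the scores VADER actually returns for it.

```python
def has_mixed_signals(scores, floor=0.1):
    """True when both the "pos" and "neg" scores reach the floor."""
    return scores["pos"] >= floor and scores["neg"] >= floor


def flag_for_review(statements, score_fn, floor=0.1):
    """Return the statements whose scores show mixed signals."""
    return [s for s in statements if has_mixed_signals(score_fn(s), floor)]


def make_vader_scorer():
    """Return VADER's polarity_scores method (needs nltk and the lexicon)."""
    # Imported lazily so the helpers above work with any scoring function.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    # One analyzer instance is created and reused for every statement.
    return SentimentIntensityAnalyzer().polarity_scores
```

A call such as `flag_for_review(reviews, make_vader_scorer())`, where `reviews` is your own list of strings, returns the subset worth a manual look before you trust the overall numbers.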
I hope you enjoyed this post and learned something. Just remember to be very meticulous when using the VADER lexicon for sentiment analysis and don't forget to check back for more content.