
SENTIMENT ANALYSIS USING NATURAL LANGUAGE PROCESSING

Sentiment Analysis – Understanding and Analyzing Opinions Using Natural Language Processing

Sentiment analysis is one of the most widely used applications of natural language processing (NLP). It identifies the opinion expressed in a piece of text and categorizes it as positive, negative, or neutral. Companies use it to understand how customers feel about their products and services, which in turn informs decisions and strategy. This article explains how sentiment analysis works and walks through a Python example built with NLTK.

What is Sentiment Analysis?

Sentiment analysis is an automated process that uses NLP to identify and categorize the opinions expressed in text. It is applied to customer feedback, product reviews, and other text-based data to gain insight into customer sentiment. The goal is to determine the overall opinion that a piece of text expresses.

Sentiment analysis involves two main steps:

  1. Identifying the sentiment: The text is analyzed for words and phrases that signal opinion or emotion.
  2. Categorizing the sentiment: Once the sentiment has been identified, the text is assigned to one of three categories (positive, negative, or neutral).

How Does Sentiment Analysis Work?

Sentiment analysis rests on the idea that words convey sentiment and emotion. NLP techniques are used to detect those signals and classify the sentiment of a piece of text.

The first step is to pre-process the text. This typically involves removing stop words, punctuation, and other tokens that carry little sentiment. The sentiment of the pre-processed text is then identified using NLP algorithms.
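
As a rough illustration of this pre-processing step, the sketch below lowercases a text, strips punctuation, and drops stop words. The STOP_WORDS set and the preprocess function are hypothetical stand-ins chosen for this example; a real pipeline would normally use a fuller stop word list, such as the one shipped with NLTK.

```python
import string

# A tiny, hand-picked stop word list (an assumption for this sketch only)
STOP_WORDS = {"the", "is", "a", "an", "and", "i", "it", "this", "to", "of"}

def preprocess(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    # Remove every ASCII punctuation character
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace and keep only tokens that are not stop words
    return [token for token in no_punct.split() if token not in STOP_WORDS]

print(preprocess("This is a great phone, and I love the camera!"))
# ['great', 'phone', 'love', 'camera']
```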

Once the sentiment is identified, the text is assigned to one of the three categories: positive, negative, or neutral. This can be done with a set of rules or with a classifier trained on labeled examples. For example, a rule-based approach might label a text as positive if it contains words such as “happy”, “joyful”, or “excited”.
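
A rule of this kind can be sketched in a few lines. The word lists and the classify_by_rules function below are illustrative assumptions, not a standard lexicon; the rule simply compares how many positive and negative words a text contains.

```python
# Small illustrative word lists (assumptions for this sketch, not a real lexicon)
POSITIVE_WORDS = {"happy", "joyful", "excited"}
NEGATIVE_WORDS = {"sad", "angry", "disappointed"}

def classify_by_rules(text):
    """Label a text by counting positive and negative words."""
    tokens = text.lower().split()
    pos = sum(token in POSITIVE_WORDS for token in tokens)
    neg = sum(token in NEGATIVE_WORDS for token in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(classify_by_rules("I am so happy and excited today"))    # positive
print(classify_by_rules("The delivery left me disappointed"))  # negative
print(classify_by_rules("The package arrived on Tuesday"))     # neutral
```

Rules like this are easy to read, but they miss negation: this sketch would label "not happy" as positive, which is one reason trained classifiers are often preferred.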

Sentiment Analysis Using NLTK

Python’s Natural Language Toolkit (NLTK) is a widely used library for text processing. It provides tools for tokenization, part-of-speech tagging, and classification. In this section, we use NLTK to perform sentiment analysis on a dataset of tweets labeled with their sentiment.

The first step is to import NLTK and the other libraries used in this walkthrough.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import nltk
from nltk.corpus import stopwords
from nltk.classify import SklearnClassifier
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
# Jupyter/IPython magic; omit this line when running as a plain script
%matplotlib inline

Once the packages are imported, load the input data. The data is expected to be a CSV file, which pandas can read into a DataFrame.

data = pd.read_csv('../input/sentiment.csv')
# Keep only the tweet text and its sentiment label
data = data[['text', 'sentiment']]

Next, split the data into training and test sets using the train_test_split function from scikit-learn. For this problem, the neutral tweets are also dropped from the training set, so the classifier only learns to separate positive from negative.

# Hold out 10% of the rows for testing
train, test = train_test_split(data, test_size=0.1)
# Drop neutral tweets from the training set
train = train[train.sentiment != "Neutral"]
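
To see what the split and the neutral filter do, the sketch below runs the same two steps on a small in-memory DataFrame. The six rows are made up for illustration, and random_state is set only so that the split is repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows standing in for the CSV file (an assumption for this sketch)
data = pd.DataFrame({
    "text": ["great debate", "terrible answer", "it happened",
             "love it", "so bad", "watching now"],
    "sentiment": ["Positive", "Negative", "Neutral",
                  "Positive", "Negative", "Neutral"],
})

# Hold out half of the rows; random_state makes the shuffle repeatable
train, test = train_test_split(data, test_size=0.5, random_state=0)
rows_before_filter = len(train)

# Keep only positive and negative tweets for training
train = train[train.sentiment != "Neutral"]

print(rows_before_filter, len(test), len(train))
```

Note that only the training set is filtered at this point; the test set still contains neutral rows until the positive and negative subsets are selected from it.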

Next, separate the positive and negative tweets in the training set so that the words in each class can be visualized. Before the word clouds are drawn, links, mentions, and hashtags are stripped from the text:

train_pos = train[train['sentiment'] == 'Positive']
train_pos = train_pos['text']
train_neg = train[train['sentiment'] == 'Negative']
train_neg = train_neg['text']

def wordcloud_draw(data, color='black'):
    # Join all tweets, then drop links, mentions, hashtags, and the retweet marker
    words = ' '.join(data)
    cleaned_word = " ".join(word for word in words.split()
                            if 'http' not in word
                            and not word.startswith('@')
                            and not word.startswith('#')
                            and word != 'RT')
    wordcloud = WordCloud(stopwords=STOPWORDS,
                          background_color=color,
                          width=2500,
                          height=200
                          ).generate(cleaned_word)
    plt.figure(1, figsize=(13, 13))
    plt.imshow(wordcloud)
    plt.axis('off')
    plt.show()

print("Positive words")
wordcloud_draw(train_pos, 'white')
print("Negative words")
wordcloud_draw(train_neg)

After the visualization, remove the mentions, links, hashtags, and stop words from the training set, and select the positive and negative tweets of the test set:

tweets = []
# Requires the NLTK stop word list: nltk.download('stopwords')
stopwords_set = set(stopwords.words("english"))
for index, row in train.iterrows():
    # Lowercase each token and drop tokens shorter than three characters
    words_filtered = [e.lower() for e in row.text.split() if len(e) >= 3]
    # Drop links, mentions, and hashtags
    words_cleaned = [word for word in words_filtered
                     if 'http' not in word and not word.startswith('@')
                     and not word.startswith('#') and word != 'rt']
    words_without_stopwords = [word for word in words_cleaned if word not in stopwords_set]
    tweets.append((words_without_stopwords, row.sentiment))

test_pos = test[test['sentiment'] == 'Positive']
test_pos = test_pos['text']
test_neg = test[test['sentiment'] == 'Negative']
test_neg = test_neg['text']
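
The effect of this filtering is easier to see on a single tweet. The sketch below applies the same rules to one made-up string; the clean_tweet function and the short stop word set are assumptions for illustration, standing in for the loop above and for NLTK's English stop word list.

```python
# Short stand-in for NLTK's English stop word list (an assumption for this sketch)
STOP_WORDS = {"the", "was", "and", "for", "with"}

def clean_tweet(text):
    """Lowercase, then drop short tokens, links, mentions, hashtags, and stop words."""
    tokens = [t.lower() for t in text.split() if len(t) >= 3]
    tokens = [t for t in tokens
              if "http" not in t and not t.startswith("@") and not t.startswith("#")]
    return [t for t in tokens if t not in STOP_WORDS]

tweet = "RT @user The debate was great and informative #politics http://example.com"
print(clean_tweet(tweet))
# ['debate', 'great', 'informative']
```

The two-character retweet marker is removed by the length filter before any other rule is applied.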

Extracting Features

The next step is to extract features using NLTK. Every distinct word in the training tweets becomes a binary feature that records whether a tweet contains that word:

def get_words_in_tweets(tweets):
    # Flatten the cleaned tweets into a single list of words
    all_words = []
    for (words, sentiment) in tweets:
        all_words.extend(words)
    return all_words

def get_word_features(wordlist):
    # The keys of the frequency distribution are the distinct words
    wordlist = nltk.FreqDist(wordlist)
    features = wordlist.keys()
    return features

w_features = get_word_features(get_words_in_tweets(tweets))

def extract_features(document):
    # Map every known word to True or False depending on whether the document contains it
    document_words = set(document)
    features = {}
    for word in w_features:
        features['contains(%s)' % word] = (word in document_words)
    return features
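
To make the feature format concrete, the sketch below builds the same kind of dictionary for a three-word vocabulary. The vocabulary, the sample tweet, and the extract_features_demo function are made up for illustration; in the article's code the vocabulary comes from w_features.

```python
# A made-up vocabulary standing in for w_features (an assumption for this sketch)
vocabulary = ["great", "terrible", "debate"]

def extract_features_demo(document, vocabulary):
    """Return one boolean feature per vocabulary word."""
    document_words = set(document)
    return {"contains(%s)" % word: (word in document_words) for word in vocabulary}

features = extract_features_demo(["great", "debate", "tonight"], vocabulary)
print(features)
# {'contains(great)': True, 'contains(terrible)': False, 'contains(debate)': True}
```

Words outside the vocabulary, such as "tonight" here, are ignored, so every tweet is described by the same fixed set of features.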

Once the features have been extracted, you can draw a word cloud of the words used as features:

wordcloud_draw(w_features)

Classifying the Text

NLTK’s Naive Bayes classifier can be trained on the extracted features. The apply_features function builds the training set lazily, so the feature dictionaries are not all held in memory at once:

training_set = nltk.classify.apply_features(extract_features, tweets)
classifier = nltk.NaiveBayesClassifier.train(training_set)
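
As a minimal, self-contained sketch of the training step, the example below trains NLTK's Naive Bayes classifier on four hand-written feature dictionaries and classifies a new one. The toy data is an assumption for illustration; a real training set would contain far more examples.

```python
import nltk

# Hand-written (features, label) pairs standing in for training_set
toy_training_set = [
    ({"contains(great)": True, "contains(terrible)": False}, "Positive"),
    ({"contains(great)": True, "contains(terrible)": False}, "Positive"),
    ({"contains(great)": False, "contains(terrible)": True}, "Negative"),
    ({"contains(great)": False, "contains(terrible)": True}, "Negative"),
]

toy_classifier = nltk.NaiveBayesClassifier.train(toy_training_set)

# Classify an unseen feature dictionary
label = toy_classifier.classify({"contains(great)": True, "contains(terrible)": False})
print(label)  # Positive
```

Calling show_most_informative_features on a trained classifier lists the features that separate the classes most strongly, which is a quick way to check what the model has learned.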

To measure how well the classifier performs, test it on data it has not seen during training. The code below counts how many negative and positive test tweets are classified correctly:

neg_cnt = 0
pos_cnt = 0
for obj in test_neg:
    # Lowercase the tweet to match the lowercased training vocabulary
    res = classifier.classify(extract_features(obj.lower().split()))
    if res == 'Negative':
        neg_cnt = neg_cnt + 1
for obj in test_pos:
    res = classifier.classify(extract_features(obj.lower().split()))
    if res == 'Positive':
        pos_cnt = pos_cnt + 1

# Report correctly classified tweets out of the total for each class
print('[Negative]: %s/%s ' % (neg_cnt, len(test_neg)))
print('[Positive]: %s/%s ' % (pos_cnt, len(test_pos)))
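
The two counts can be turned into per-class and overall accuracy. The summarize function and the numbers below are hypothetical, chosen only to show the arithmetic; they are not results from the dataset.

```python
def summarize(neg_correct, neg_total, pos_correct, pos_total):
    """Return per-class recall and overall accuracy as fractions."""
    return {
        "negative_recall": neg_correct / neg_total,
        "positive_recall": pos_correct / pos_total,
        "accuracy": (neg_correct + pos_correct) / (neg_total + pos_total),
    }

# Hypothetical counts, chosen only to illustrate the calculation
summary = summarize(neg_correct=80, neg_total=100, pos_correct=15, pos_total=25)
print(summary)
```

Reporting recall per class is useful when the classes differ in size, because overall accuracy alone can hide a weak class.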

Conclusion

Sentiment analysis is a powerful tool for gaining insight into customer sentiment. In this article, we discussed what sentiment analysis is and how it works, and used NLTK to classify the sentiment of tweets. The same steps can be applied to other datasets, such as movie reviews, product reviews, or news comments.
