Sentiment Analysis with Machine Learning: A Comprehensive Tutorial

Sentiment analysis, also known as opinion mining, is a natural language processing (NLP) technique for determining the emotional tone or subjective information expressed in a piece of text. It is the automated process of tagging data according to its sentiment: positive, negative, or neutral. Because it is automated, sentiment analysis lets you process data at scale and in real time, whether you want to analyze thousands of tweets, product reviews, or support tickets. To see why this is harder than it sounds, consider the sentence "You're so smart!" On the surface it reads as quite a compliment, with the speaker raining praise on someone with next-level intelligence. But delivered with the right tone, the very same words become sarcasm: "Wow, did you think of that all by yourself, Sherlock?" This is one of the reasons why detecting sentiment from natural language is a surprisingly complex task. In fact, when presented with a piece of text, sometimes even humans disagree about its tonality, especially if there isn't enough informative context to rule out incorrect interpretations.

Why Sentiment Analysis Matters

Sentiment analysis is a vital topic in NLP, and it has become one of the hottest areas in the field because of the range of business problems it can address. Sentiment analysis with Python has never been easier: tools such as 🤗Transformers and the 🤗Hub make it accessible to all developers.

Here are some key benefits of sentiment analysis:

  • Customer Feedback Analysis: Analyze reviews, comments, and survey responses to identify issues and improve satisfaction.
  • Brand Reputation Management: Monitor mentions on social media, forums, and review platforms in real-time.
  • Product Development and Innovation: Understand which features are well-received and which need improvement.
  • Competitor Analysis: Compare sentiment around your products with competitors’ products.
  • Marketing Campaign Evaluation: Measure the effectiveness of campaigns based on positive or negative reactions.

Types of Sentiment Analysis

There are different flavors of sentiment analysis, but the most widely used technique labels data as positive, negative, or neutral. Like any area of NLP, sentiment analysis can get complex. In this blog post, we'll explore some of the most popular Python packages for analyzing sentiment, how they work, and how you can train your own sentiment analysis model using state-of-the-art techniques.

Here are some types of sentiment analysis:


  1. Binary: This is where the valence of a document is divided into two categories, either positive or negative, as with the SST-2 dataset.

  2. Fine-Grained Sentiment Analysis: Fine-grained sentiment analysis rates sentiment on multiple levels rather than just positive, negative, or neutral. It can categorize text as very positive, positive, neutral, negative, or very negative, often using a numerical scale such as 1-5 stars, as with product review ratings on e-commerce platforms.

  3. Continuous: The valence of a piece of text can also be measured continuously, with scores indicating how positive or negative the sentiment of the writer was.

  4. Emotion-Based: Also known as emotion detection or emotion identification, this approach goes beyond polarity and attempts to detect the specific emotion expressed in a piece of text, such as joy, sadness, anger, or fear. Categorical emotion detection classifies the sentiment expressed by a text into one of a handful of discrete emotions, usually based on the Ekman model, which includes anger, disgust, fear, joy, sadness, and surprise. A number of datasets exist for this type of emotion detection, and it helps uncover deeper emotional context than polarity alone.

  5. Aspect-Based Sentiment Analysis: Aspect-based sentiment analysis focuses on specific features or attributes of a product or service. For a smartphone review, it separately analyzes the battery, screen, camera, and performance to understand customer sentiment for each aspect.

  6. Multilingual Sentiment Analysis: Multilingual sentiment analysis works on text written in multiple languages. It is highly challenging due to variations in grammar, syntax, and cultural expression across languages, but it is essential for global applications.

  7. Intent-Based Sentiment Analysis: Intent-based sentiment analysis identifies the underlying intention behind the text in addition to its sentiment, for example, detecting purchase intent from reviews mentioning discounts, deals, or offers in e-commerce.

Levels of Text Analysis

We can also consider the different levels at which we can analyze a piece of text. As a running example, imagine a coffee machine review that reads: "I love how this machine looks, but the noise it makes when grinding beans is unbearable."

  • Document-level: This is the most basic level of analysis, where one sentiment for an entire piece of text will be returned. Document-level analysis might be fine for very short pieces of text, such as Tweets, but can give misleading answers if there is any mixed sentiment.
  • Sentence-level: This is where the sentiment for each sentence is predicted separately. For the coffee machine review, sentence-level analysis would tell us that the reviewer felt positively about some parts of the product but negatively about others.
  • Aspect-based: This type of sentiment analysis dives deeper into a piece of text and tries to understand the sentiment of users about specific aspects. For our review of the coffee maker, the reviewer mentioned two aspects: appearance and noise. By extracting these aspects, we have more information about what the user specifically did and did not like.
  • Intent-based: In this final type of sentiment analysis, the text is classified in two ways: in terms of the sentiment being expressed, and the topic of the text. For example, if a telecommunication company receives a ticket complaining about how often their service goes down, they could classify the text intent or topic as service reliability and the sentiment as negative.

Use Cases for Sentiment Analysis

By now, you can probably already think of some potential use cases for sentiment analysis. Basically, it can be used anywhere that you could get text feedback or opinions about a topic. Customer feedback analysis can be used to find out the sentiments expressed in feedback or tickets. Product reviews can be analyzed to see how satisfied or dissatisfied people are with a company’s products.


Approaches to Sentiment Analysis

At a general level, sentiment analysis operates by linking words (or, in more sophisticated models, the overall tone of a text) to an emotion. There are multiple ways of extracting emotional information from text.

  1. Rule-Based Approach: The rule-based approach relies on predefined lexicons and rules to classify text as positive, negative, or neutral. A lexicon assigns sentiment scores to a range of words; those scores are counted or combined using a set of rules, including handling for some simple negations like "not bad", to get the overall sentiment for a piece of text. These methods tend to be very fast, easy to implement, and interpretable, require no training, and have the advantage of yielding more fine-grained sentiment scores. However, they are hard to scale, have limited accuracy for complex sentences, and require continuous lexicon updates.

  2. Machine Learning Approach: The machine learning (ML) approach trains a model, most commonly a Naive Bayes classifier, though Support Vector Machines (SVM), Random Forest, and other algorithms are also used, to learn sentiment patterns automatically from labeled data such as movie reviews. Text is converted into numeric features using Bag-of-Words or TF-IDF, and texts are generally classified as positive, negative, and sometimes neutral. These methods can handle large datasets and capture complex patterns and relationships, but they require large labeled datasets, domain-specific models, and retraining for new domains.

  3. Neural Network / Deep Learning Approach: This approach uses neural networks to capture contextual and sequential information in text. Common architectures include RNNs, LSTMs, GRUs, and Transformers. It excels at handling long sentences and context-aware sentiment, captures nuance, and achieves state-of-the-art accuracy. However, it is computationally expensive and requires significant training data. In practice, these methods often fine-tune a pre-trained transformer-based language model on the same datasets used to train the machine learning classifiers mentioned above.

  4. Hybrid Approach: The hybrid approach combines rule-based and ML/deep learning methods to improve both speed and accuracy. It uses the lexicon-based rules for quick initial classification. It uses ML or deep learning to refine predictions and handle complex sentences. It provides better accuracy than individual approaches and is adaptable. However, it is complex to implement and requires the integration of multiple systems.
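To make the contrast between the first two approaches concrete, here is a minimal sketch. The tiny lexicon and training set are purely illustrative stand-ins (real systems use far larger resources), and scikit-learn is assumed to be installed:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# --- Rule-based: look up words in a hand-made lexicon and sum their scores ---
LEXICON = {"great": 2.0, "love": 2.0, "good": 1.0,
           "bad": -1.0, "terrible": -2.0, "hate": -2.0}
NEGATORS = {"not", "never", "no"}

def rule_based_sentiment(text: str) -> str:
    score, negate = 0.0, False
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in NEGATORS:
            negate = True                      # flip the next sentiment word
            continue
        if word in LEXICON:
            score += -LEXICON[word] if negate else LEXICON[word]
        negate = False                         # negation applies to one word only
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(rule_based_sentiment("The coffee was not bad"))  # → positive

# --- Machine learning: TF-IDF features + Naive Bayes on labeled examples ---
train_texts = ["great product, loved it", "excellent quality, works perfectly",
               "absolutely wonderful experience", "terrible product, broke quickly",
               "awful quality, very disappointed", "horrible experience, do not buy"]
train_labels = [1, 1, 1, 0, 0, 0]              # 1 = positive, 0 = negative

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)
print(model.predict(["excellent product, wonderful quality"]))  # → [1]
```

The trade-off is visible even at this scale: the rule-based scorer needs no training data but only knows the words in its lexicon, while the classifier generalizes from labeled examples but must be retrained for each new domain.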

Popular Packages for Sentiment Analysis in Python

Let's look at some of the most popular Python packages for sentiment analysis, starting with two lexicon-based analyzers, VADER and TextBlob. Their multiple sentiment scores offer us a few different perspectives on our data.

  1. VADER (Valence Aware Dictionary and Sentiment Reasoner): VADER is a popular lexicon-based sentiment analyzer. Built into the powerful NLTK package, this analyzer returns four sentiment scores: the degree to which the text was positive, neutral, or negative, as well as a compound sentiment score. The positive, neutral, and negative scores range from 0 to 1 and indicate the proportion of the text that was positive, neutral, or negative. VADER works by looking up the sentiment scores for each word in its lexicon and combining them using a nuanced set of rules. VADER’s lexicon includes abbreviations such as “smh” (shaking my head) and emojis, making it particularly suitable for social media text. VADER’s main limitation is that it doesn’t work for languages other than English, but you can use projects such as vader-multi as an alternative.

  2. TextBlob: TextBlob provides another lexicon-based approach to analyzing sentiment, built on top of the Pattern package. Pattern uses the SentiWordNet lexicon, where each synonym group (synset) from WordNet is assigned a score for positivity, negativity, and objectivity. The positive and negative scores for each word are combined using a series of rules to give a final polarity score. As WordNet contains part-of-speech information, the rules can take into account whether adjectives or adverbs preceding a word modify its sentiment. However, Pattern as a standalone library is only compatible with Python up to 3.6, so the most common way to use it is through TextBlob.

  3. spaCy: Another option is to use spaCy for sentiment analysis, and there are a few ways to do so. The first is the spacytextblob plugin, which runs the TextBlob sentiment analyzer as part of your spaCy pipeline. Because it plugs into spaCy's pipe method, you can include it in a more complex text processing pipeline alongside preprocessing steps such as part-of-speech tagging, lemmatization, and named-entity recognition. A second way is to train your own model using the TextCategorizer class, which lets you train a range of spaCy-created models on a sentiment analysis training set. Finally, you can use large language models through spacy-llm. This approach works slightly differently from the other methods we've discussed: instead of training a model, you use generalist models like GPT-4 to predict the sentiment of a text.

How Sentiment Analysis Works

Step 1: Preprocessing ensures text is clean and standardized for analysis:

  • Text Cleaning: Remove HTML tags, special characters, numbers, and emojis.
  • Tokenization: Split sentences into words or tokens.
  • Stop-word Removal: Filter out common words like "and", "the", and "is".
  • Stemming/Lemmatization: Reduce words to root forms.
  • Handling Emojis and Slang: Convert emojis or slang to standard words for analysis.
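The cleaning, tokenization, stop-word, and emoji steps above can be sketched with the standard library alone; the stop-word list and emoji map here are tiny illustrative stand-ins for the much larger resources real pipelines use, and stemming/lemmatization would typically come from a library such as NLTK:

```python
import re

STOP_WORDS = {"and", "the", "is", "a", "an", "it", "this"}   # illustrative subset
EMOJI_MAP = {"🙂": "happy", "☹️": "sad"}                      # illustrative subset

def preprocess(text: str) -> list[str]:
    for emoji, word in EMOJI_MAP.items():                # convert emojis to words
        text = text.replace(emoji, f" {word} ")
    text = re.sub(r"<[^>]+>", " ", text)                 # strip HTML tags
    text = re.sub(r"[^a-zA-Z\s]", " ", text.lower())     # drop special chars/numbers
    tokens = text.split()                                # tokenize on whitespace
    return [t for t in tokens if t not in STOP_WORDS]    # remove stop words

print(preprocess("<b>This</b> coffee machine is LOUD!!! 🙂"))
# → ['coffee', 'machine', 'loud', 'happy']
```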

Step 2: Feature extraction converts text to a numeric representation using:

  • Bag of Words: Converts text into word-count vectors based on vocabulary.
  • TF-IDF (Term Frequency-Inverse Document Frequency): Gives higher weight to important words and lower weight to common ones.
  • Word Embeddings: Represent words as dense vectors that capture meaning and relationships.

Using Pre-trained Sentiment Analysis Models with Python

Now that we have covered what sentiment analysis is, we are ready to play with some sentiment analysis models! Using pre-trained models publicly available on the Hub is a great way to get started right away with sentiment analysis. These models use deep learning architectures such as transformers that achieve state-of-the-art performance on sentiment analysis and other machine learning tasks.

On the Hugging Face Hub, we are building the largest collection of models and datasets publicly available in order to democratize machine learning 🚀. In the Hub, you can find more than 27,000 models shared by the AI community with state-of-the-art performances on tasks such as sentiment analysis, object detection, text generation, speech recognition, and more. The pipeline class can be used to make predictions from models available in the Hub, and you can use a specific sentiment analysis model that is better suited to your language or use case by providing the name of the model. Are you interested in doing sentiment analysis in languages such as Spanish, French, Italian, or German? On the Hub, you will find many models fine-tuned for different use cases and ~28 languages.

Building Your Own Sentiment Analysis Model

In this section, we'll go over two approaches to fine-tuning a model for sentiment analysis with your own data and criteria. The first approach uses the Trainer API from 🤗Transformers, an open-source library with 50K stars and 1K+ contributors, and requires a bit more coding and experience; the second uses AutoNLP and requires no code at all.

Fine-tuning model with Python

Fine-tuning is the process of taking a pre-trained large language model (e.g., RoBERTa) and then tweaking it with additional training data to make it perform a second, similar task (e.g., sentiment analysis).

In this tutorial, you'll use the IMDB dataset to fine-tune a DistilBERT model for sentiment analysis. The IMDB dataset contains 25,000 movie reviews labeled by sentiment for training a model and 25,000 movie reviews for testing it. DistilBERT is a smaller, faster, and cheaper version of BERT. It is 40% smaller than BERT and runs 60% faster while preserving over 95% of BERT’s performance. You'll use the IMDB dataset to fine-tune a DistilBERT model that is able to classify whether a movie review is positive or negative. Once you train the model, you will use it to analyze new data!

  1. As a first step, let's set up Google Colab to use a GPU (instead of CPU) to train the model much faster. You can do this by going to the menu, clicking on 'Runtime' > 'Change runtime type', and selecting 'GPU' as the Hardware accelerator.
  2. You need data to fine-tune DistilBERT for sentiment analysis.
  3. You will be throwing away the pretraining head of the DistilBERT model and replacing it with a classification head fine-tuned for sentiment analysis. Next, log in to your Hugging Face account so you can manage your model repositories, and then fine-tune the model on the sentiment analysis dataset. And voila! Training time depends on the hardware you use and the number of samples in the dataset; in our case, it took almost 10 minutes using a GPU and fine-tuning the model with 3,000 samples, and we got 88% accuracy and an 89% F1 score.
  4. Now that you have trained a model for sentiment analysis, let's use it to analyze new data and get 🤖 predictions! In the IMDB dataset, Label 1 means positive and Label 0 is negative. Quite good!

Training a sentiment model with AutoNLP

AutoNLP is a tool to train state-of-the-art machine learning models without code. It provides a friendly and easy-to-use user interface, where you can train custom models by simply uploading your data. AutoNLP will automatically fine-tune various pre-trained models with your data, take care of the hyperparameter tuning and find the best model for your use case.

Training a sentiment analysis model using AutoNLP is super easy and it just takes a few clicks 🤯. As a first step, let's get some data! The dataset is quite big; it contains 1,600,000 tweets. As you don't need this amount of data to get your feet wet with AutoNLP and train your first models, we have prepared a smaller version of the Sentiment140 dataset with 3,000 samples that you can download from here. Once you add your dataset, go to the "Trainings" tab and accept the pricing to start training your models. The best model has 77.87% accuracy 🔥 Pretty good for a sentiment analysis model for tweets trained with just 3,000 samples! All these models are automatically uploaded to the Hub and deployed for production.

Analyzing Tweets with Sentiment Analysis and Python

In this last section, you'll take what you have learned so far in this post and put it into practice with a fun little project: analyzing tweets about NFTs with sentiment analysis! Then, you will use a sentiment analysis model from the 🤗Hub to analyze these tweets. Finally, you will create some visualizations to explore the results and find some interesting insights. You can use this notebook to follow this tutorial.

  1. Install dependencies
  2. Search for tweets using Tweepy
  3. Handle the Twitter API rate limit while collecting tweets, pausing and retrying when the limit is reached
  4. Now you can put our new skills to work and run sentiment analysis on your data! You will use one of the models available on the Hub fine-tuned for sentiment analysis of tweets.
  5. Are they talking mostly positively or negatively? And that is it! With just a few lines of Python code, you were able to collect tweets, analyze them with sentiment analysis, and create some cool visualizations to explore the results!

Overcoming Challenges in Sentiment Analysis

One of the biggest considerations is the language of the texts you're trying to analyze: many lexicons and pre-trained models support only English or a handful of high-resource languages. As texts increase in complexity, it can also be difficult for lexicon-based analyzers and bag-of-words models to correctly detect sentiment. Sarcasm and more subtle contextual cues are hard for simpler models to pick up, and such texts may be misclassified.

Finally, when doing sentiment analysis, the same issues also come up as when dealing with any machine learning problem. Your models will only be as good as the training data you use. You should also make sure that your targets are appropriate for your business problem.

PyCharm for Sentiment Analysis

PyCharm Professional is a powerful Python IDE for data science that supports advanced Python code completion, inspections and debugging, database tools, Jupyter, Git, Conda, and more - all out of the box. In addition, you get incredibly useful features like DataFrame Column Statistics and Chart View, as well as Hugging Face integrations that make working with LLMs much quicker and easier.

If you’re now ready to get started on your own sentiment analysis project, you can activate your free three-month subscription to PyCharm. Click on the link below, and enter this promo code: PCSA24.

The first thing we need to do is load the data. The training dataset has 3.6 million reviews, and the test dataset contains 400,000. Let's now get the VADER and TextBlob scores for each of these reviews.

Typically, this would be the point where we’d start creating a bunch of code for exploratory data analysis. This might be done using pandas’ describe method to get summary statistics over our columns, and writing Matplotlib or seaborn code to visualize our results. We can see a button in the top right-hand corner, called Show Column Statistics. Clicking this gives us two different options: Compact and Detailed. Now we have summary statistics provided as part of our column headers! This result indicates that, on average, VADER tends to estimate the same set of reviews more positively than TextBlob does.

Another PyCharm feature we can use is the DataFrame Chart View. When we click on the button, we switch over to the chart editor. Let's start with VADER's compound score: remove the default values for the X and Y axes, replace both with vader_compound, and select Histogram from the chart icons just under Series Settings. We likely have a bimodal distribution for the VADER compound score, with a slight peak around -0.8 and a much larger one around 0.9. These peaks likely represent the split of negative and positive reviews. In contrast, TextBlob tends to rate most reviews as neutral, with very few reviews being strongly positive or negative.
