Twitter Sentiment Analysis with Tensorflow.js
Sentiment analysis is the process of determining whether a piece of online writing (social media posts, comments) is positive, negative or neutral.
Significant progress has been made in the field in the past few years, and the technique is now widely used in business and politics.
In this post, we’ll connect to the Twitter API, gather tweets by hashtag, compute the sentiment of each tweet, and build a real-time dashboard to show the results.
Twitter Sentiment Analysis Demo
Since Twitter shut down its free API in April 2023, this demo can no longer fetch tweets from Twitter. 😢
Tweet search is only available in Twitter API v2 on the “Basic” or “Pro” plans, which cost at least $100/month.
You can visit the related article below for a demo of YouTube Comment Sentiment Analysis ↓
GitHub repository
You can download the complete code of the above demo in the link below:
Connect to Twitter API, gather tweets by hashtag, compute the sentiment of each tweet, and build a real-time dashboard to show the result.
Implementation
This tool can be pretty useful for businesses to monitor and understand the social sentiment around their brand, product or services. I built the demo using the TensorFlow.js Sentiment model. If you are curious about how it is built, follow along below and I will walk you through it step by step.
# Step 1 : Create a Twitter App
Since we want to pull tweets from Twitter to analyze their sentiment, we first need to create an app on Twitter’s developer platform.
- Login/Register a Twitter account
- Go to https://developer.twitter.com/en/apps
- Click Create an app
- Fill in the app information form and Create
- Once the app is created, navigate to the Keys and tokens tab
- Generate Consumer API keys and Access token & access token secret
# Step 2 : Get Tweets from Twitter
Once you have created the Twitter app and generated the API keys and tokens, it’s time to use Twitter’s search API to pull down some tweets that match the hashtag you are searching for.
PHP
To make things easier, I am using a PHP wrapper for Twitter API v1.1 calls, twitter-api-php. Download TwitterAPIExchange.php from that repository and create a queryTwitter.php with the code below. Make sure you replace the Twitter API keys and tokens with the ones you generated in Step 1.
<?php
require_once('TwitterAPIExchange.php');

// Read the hashtag from the query string.
$hashtag = $_GET["q"];

$settings = array(
    'oauth_access_token' => "YOUR-ACCESS-TOKEN",
    'oauth_access_token_secret' => "YOUR-ACCESS-TOKEN-SECRET",
    'consumer_key' => "YOUR-CONSUMER-KEY",
    'consumer_secret' => "YOUR-CONSUMER-SECRET"
);

// Search for the hashtag, excluding retweets and replies: English only,
// latest 20 tweets, with the full (untruncated) tweet text.
$url = 'https://api.twitter.com/1.1/search/tweets.json';
$getfield = '?q=#'.$hashtag.' AND -filter:retweets AND -filter:replies&lang=en&count=20&tweet_mode=extended';
$requestMethod = 'GET';

$twitter = new TwitterAPIExchange($settings);
$response = $twitter->setGetfield($getfield)
                    ->buildOauth($url, $requestMethod)
                    ->performRequest();

echo $response;
?>
JavaScript
Then we write a JavaScript function to send a request to the PHP page above and retrieve the tweets.
function getTwitterHashTagData(query, callback) {
  $.getJSON(urls.queryTwitter + query, function(result) {
    console.log(result);
    if (result !== null && result.statuses !== null) {
      callback(result.statuses);
    }
  });
}
Each tweet is an object with properties like id, created_at, user and so on; the one we are interested in is the "full_text" property.
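For reference, here is a rough, abbreviated sketch of the JSON that queryTwitter.php returns (urls.queryTwitter is assumed to point at that PHP endpoint, and the field values below are placeholders, not real tweets):

// Rough, abbreviated shape of the response from queryTwitter.php.
const exampleResponse = {
  statuses: [
    {
      id_str: '1234567890',
      created_at: 'Mon Apr 26 06:01:55 +0000 2021',
      full_text: 'Loving the new features in this release! #example',
      user: { screen_name: 'someuser' }
    }
    // ...up to `count` tweets
  ],
  search_metadata: { query: '%23example', count: 20 }
};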
# Step 3 : Load the Sentiment model
Simply include the script for tfjs in the <head> section of the HTML file.
<html>
  <head>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
  </head>
Or you can install it via npm for use in a TypeScript / ES6 project:
npm install @tensorflow/tfjs
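If you go the npm route, you would then import the library in your code, e.g.:

// ES module import when @tensorflow/tfjs is installed via npm
import * as tf from '@tensorflow/tfjs';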
In order to perform sentiment analysis, we first need to load the pre-trained sentiment model and its metadata, by calling tf.loadLayersModel(url).
const urls = {
  model: 'https://storage.googleapis.com/tfjs-models/tfjs/sentiment_cnn_v1/model.json',
  metadata: 'https://storage.googleapis.com/tfjs-models/tfjs/sentiment_cnn_v1/metadata.json'
};

async function loadModel(url) {
  try {
    const model = await tf.loadLayersModel(url);
    return model;
  } catch (err) {
    console.log(err);
  }
}

async function loadMetadata(url) {
  try {
    const metadataJson = await fetch(url);
    const metadata = await metadataJson.json();
    return metadata;
  } catch (err) {
    console.log(err);
  }
}
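In Step 4 below, the demo calls a setupSentimentModel() helper that is not shown in the post. Its job is simply to load the model and metadata once and cache them in module-level model and metadata variables, which getSentimentScore relies on. A minimal sketch under that assumption:

let model;
let metadata;

// Load the model and metadata once and cache them for later predictions.
async function setupSentimentModel() {
  if (typeof model === 'undefined') {
    model = await loadModel(urls.model);
  }
  if (typeof metadata === 'undefined') {
    metadata = await loadMetadata(urls.metadata);
  }
}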
The model was trained on a set of 25,000 movie reviews from IMDB, labelled as having positive or negative sentiment. The dataset is provided by Python Keras, and the model was trained in Keras as well, based on the imdb_cnn example.
# Step 4 : Sentiment Analysis Tweet text
For each tweet, we call the model.predict(input) API in TensorFlow.js. This performs sentiment analysis on the tweet text and returns a score between 0 and 1, which indicates whether the sentiment is positive, neutral or negative.
function processTwitterData(tweets) {
  setupSentimentModel().then(result => {
    const twitterData = [];
    $.each(tweets, function(index, tweet) {
      // Strip URLs from the tweet text before scoring it.
      const tweet_text = tweet.full_text.replace(/(?:https?|ftp):\/\/[\n\S]+/g, '');
      const sentiment_score = getSentimentScore(tweet_text);
      let tweet_sentiment = '';
      if (sentiment_score > SentimentThreshold.Positive) {
        tweet_sentiment = 'positive';
      } else if (sentiment_score > SentimentThreshold.Neutral) {
        tweet_sentiment = 'neutral';
      } else if (sentiment_score >= SentimentThreshold.Negative) {
        tweet_sentiment = 'negative';
      }
      twitterData.push({
        sentiment: tweet_sentiment,
        score: sentiment_score.toFixed(4),
        tweet: tweet_text
      });
    });
    console.log(twitterData);

    // Hide the spinner and render the tweets grouped by sentiment.
    $('.spinner-border').addClass('d-none');
    displayTweets(twitterData.filter(t => t.sentiment == 'positive'), 'positive');
    displayTweets(twitterData.filter(t => t.sentiment == 'neutral'), 'neutral');
    displayTweets(twitterData.filter(t => t.sentiment == 'negative'), 'negative');
    $('#tweet-list').removeClass('d-none');
    displayPieChart(twitterData);
  });
}
function getSentimentScore(text) {
  const inputText = text.trim().toLowerCase().replace(/(\.|\,|\!)/g, '').split(' ');
  // Convert the words to a sequence of word indices.
  const sequence = inputText.map(word => {
    let wordIndex = metadata.word_index[word] + metadata.index_from;
    // Words missing from the vocabulary, or beyond the model's vocabulary size,
    // are mapped to the out-of-vocabulary index.
    if (isNaN(wordIndex) || wordIndex > metadata.vocabulary_size) {
      wordIndex = OOV_INDEX;
    }
    return wordIndex;
  });
  // Perform truncation and padding.
  const paddedSequence = padSequences([sequence], metadata.max_len);
  const input = tf.tensor2d(paddedSequence, [1, metadata.max_len]);
  const predictOut = model.predict(input);
  const score = predictOut.dataSync()[0];
  // Free the tensors to avoid leaking GPU/CPU memory.
  input.dispose();
  predictOut.dispose();
  return score;
}
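The code above also relies on a few pieces that are not shown in the post: the OOV_INDEX constant, the SentimentThreshold object used in processTwitterData, and the padSequences helper from the tfjs-examples sentiment utilities. Here is a minimal sketch of what they could look like; the exact values and padding behaviour in the demo may differ, so treat these as illustrative assumptions:

// Illustrative constants assumed by the code above (values are assumptions).
const PAD_INDEX = 0;   // index used for padding
const OOV_INDEX = 2;   // index used for out-of-vocabulary words

const SentimentThreshold = {
  Positive: 0.66,  // score above this is treated as positive
  Neutral: 0.33,   // score above this (and below Positive) is neutral
  Negative: 0      // anything else down to 0 is negative
};

// Minimal padSequences sketch: left-truncate sequences longer than maxLen
// and left-pad shorter ones with PAD_INDEX.
function padSequences(sequences, maxLen) {
  return sequences.map(seq => {
    if (seq.length > maxLen) {
      return seq.slice(seq.length - maxLen);
    }
    return new Array(maxLen - seq.length).fill(PAD_INDEX).concat(seq);
  });
}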
# Step 5 : Display result in table and chart
Now we have the sentiment result for each tweet. To make the information easy to read, we put it into a table and a pie chart.
function displayTweets(twitterData, sentiment) {
  var tbl = document.createElement('table');

  // Header row: one column per property except 'sentiment'.
  var tr = tbl.insertRow();
  for (var j in twitterData[0]) {
    if (j !== 'sentiment') {
      var td = tr.insertCell();
      td.appendChild(document.createTextNode(j));
    }
  }

  // One row per tweet.
  for (var i = 0; i < twitterData.length; i++) {
    var tr = tbl.insertRow();
    for (var j in twitterData[i]) {
      if (j !== 'sentiment') {
        var td = tr.insertCell();
        var text = twitterData[i][j];
        td.appendChild(document.createTextNode(text));
      }
    }
  }

  tbl.setAttribute('class', 'tweet-table');
  $('#' + sentiment).append(tbl);
  $('#' + sentiment + '-counter').html('(' + twitterData.length + ')');
}
For the pie chart, I am using the CanvasJS charting library.
function displayPieChart(twitterData) {
  // Count how many tweets fall into each sentiment bucket.
  var sentimentsCounter = { "Negative": 0, "Neutral": 0, "Positive": 0 };
  for (var i = 0; i < twitterData.length; i++) {
    switch (twitterData[i].sentiment) {
      case 'positive':
        sentimentsCounter["Positive"] += 1;
        break;
      case 'negative':
        sentimentsCounter["Negative"] += 1;
        break;
      case 'neutral':
        sentimentsCounter["Neutral"] += 1;
        break;
    }
  }

  var chart = new CanvasJS.Chart("chartContainer", {
    theme: "light2", // "light1", "light2", "dark1", "dark2"
    exportEnabled: true,
    animationEnabled: true,
    data: [{
      type: "pie",
      startAngle: 25,
      toolTipContent: "<b>{label}</b>: {y}%",
      showInLegend: "true",
      legendText: "{label}",
      indexLabelFontSize: 16,
      indexLabel: "{label} - {y}%",
      dataPoints: [
        { y: (sentimentsCounter["Positive"] * 100.00 / twitterData.length).toFixed(2), label: "Positive" },
        { y: (sentimentsCounter["Neutral"] * 100.00 / twitterData.length).toFixed(2), label: "Neutral" },
        { y: (sentimentsCounter["Negative"] * 100.00 / twitterData.length).toFixed(2), label: "Negative" }
      ]
    }]
  });
  chart.render();
}
When Enter is pressed or the search button is clicked, the functions above are called (the selectors assume the page contains matching #tag-input, .btn-search, #tweet-list, #positive / #neutral / #negative, #chartContainer and .spinner-border elements):
$("#tag-input").on('keyup', function (e) {
if (e.keyCode === 13) {
twitterSentiment();
}
});
$(".btn-search").click(function () {
twitterSentiment();
});
function twitterSentiment(){
$('#tweet-list').addClass('d-none');
$('#positive').empty();
$('#neutral').empty();
$('#negative').empty();
$('#chartContainer').empty();
$('.spinner-border').removeClass('d-none');
getTwitterHashTagData($("#tag-input").val(), processTwitterData);
}
That’s it for the code. Congratulations, you have built your own Twitter Sentiment Analysis app!
Conclusion
Using the TensorFlow.js Sentiment CNN model is a simple way to do sentiment analysis, but its accuracy is limited, only around 70%. To improve accuracy, you could look into more sophisticated models such as an LSTM. Still, I think it is a nice and handy model, and it does give a reasonable indication of whether the sentiment is positive or negative.
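For the curious, here is a rough sketch of what an LSTM-based classifier could look like with the TensorFlow.js layers API. This is not the model used in the demo: you would still need to train it (for example on the same IMDB dataset) and reuse the same word-index preprocessing, and vocabularySize / maxLen here are assumed placeholder values:

// A minimal, untrained LSTM sentiment classifier (illustrative only).
const vocabularySize = 20000;  // assumed, matching the CNN model's vocabulary
const maxLen = 100;            // assumed maximum sequence length

const lstmModel = tf.sequential();
lstmModel.add(tf.layers.embedding({ inputDim: vocabularySize, outputDim: 128, inputLength: maxLen }));
lstmModel.add(tf.layers.lstm({ units: 64 }));
lstmModel.add(tf.layers.dense({ units: 1, activation: 'sigmoid' }));
lstmModel.compile({ optimizer: 'adam', loss: 'binaryCrossentropy', metrics: ['accuracy'] });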
Thank you for reading. If you like this article, please share it on Facebook or Twitter. Let me know in the comments if you have any questions. Follow me on Medium, GitHub and LinkedIn. Support me on Ko-fi.
2 Comments
Hey, I noticed that the metadata.json has a vocabulary_size of 20000. If any word index goes above that, it throws an error saying
“Error: GatherV2: the index value 54485 is not in [0, 19999]”
Thus your method keeps the word indexes in check. However, is there a way to increase this cap to allow word indexes up to the full length of the word index in metadata.json (70000+)?
I have tried to raise the number in the file, however it still throws an error. Where is the 20000 index limit being set, and why?
Thanks and great article!
Hi there! Thank you for your comment and for bringing this issue to our attention. The vocabulary size of 20000 is recorded in metadata.json, but the real limit comes from the pre-trained model itself: its embedding layer was trained on only the 20000 most frequent words, which keeps the model size manageable and helps prevent overfitting. That is why the GatherV2 error appears whenever a word index above 19999 reaches the embedding layer.
Raising the number in metadata.json alone will not work, because the embedding matrix stored in model.json still only has 20000 rows. To support a larger vocabulary, you would need to retrain the model in Keras with a bigger vocabulary_size and export a new model and metadata, keeping in mind that this increases the model size. Otherwise, out-of-vocabulary words simply have to be mapped to the OOV index, which is what getSentimentScore does.
I hope this helps! Let me know if you have any further questions.