Using LIWC for Document-Level Sentiment Analysis

Text Analysis

Author: Jeff Jacobs
Published: December 4, 2023

The Text Files

You can download the .txt files for positive and negative sentiment (031-posemo.txt and 032-negemo.txt) at the following links (click them to view the contents, or right-click and choose “Save Link As…” to download)[1]:

Converting Into Python Regular Expressions

Using them in this raw format is a bit tricky, however, since the entries are not all plain words: many are wildcard patterns that match entire families of positive or negative words. For example, 032-negemo.txt contains the entry troubl*, which matches the words trouble, troubles, troubling, and so on.
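
To make the wildcard behavior concrete, here is a minimal sketch that uses only Python's built-in re module. The pattern is written by hand for the single entry troubl*, and the sample sentence is made up for illustration; neither comes from the LIWC files themselves.

```python
import re

# Hand-written regex for the LIWC entry "troubl*":
# the trailing * becomes "zero or more non-whitespace characters".
pattern = r'\b(troubl[^\s]*)\b'

# Made-up sample sentence (an assumption, not taken from any dataset)
sample = "the trouble with troubles is that troubling times are untroubled"

matches = re.findall(pattern, sample)
print(matches)
```

Note that untroubled is not matched: the leading \b requires the match to start at the beginning of a word.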

So, to work with these files in Python, we’ll need to load the .txt files and then combine the entries of each file into a single regular expression string. This can be done using the following collection of functions:

import re
def load_liwc_list(filepath):
    """
    :return: A list of words loaded from the file at `filepath`
    """
    with open(filepath, 'r', encoding='utf-8') as infile:
        words = infile.read().split()
    return words

def liwc_to_regex(liwc_list):
    """
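    Converts a list of LIWC entries into a single Python regular expression string
    """
    # A LIWC * becomes "zero or more non-whitespace characters"
    wildcard_reg = [w.replace('*', r'[^\s]*') for w in liwc_list]
    # Join all entries into one alternation, anchored at word boundaries
    reg_str = r'\b(' + '|'.join(wildcard_reg) + r')\b'
    return reg_str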

def num_matches(reg_str, test_str):
    """
    Counts the non-overlapping matches of `reg_str` in `test_str`
    """
    return len(re.findall(reg_str, test_str))

def file_to_regex(filepath):
    liwc_list = load_liwc_list(filepath)
    liwc_regex = liwc_to_regex(liwc_list)
    return liwc_regex

# You can call the following helper function if
# you'd like to see the full regular expression
def print_regex(regex_str, wrap_col=70):
    import textwrap
    print(textwrap.fill(regex_str, wrap_col))

We can use these functions to load the .txt files and create a regex (regular expression) string from each of them:

pos_fpath = "./assets/liwc/031-posemo.txt"
pos_regex = file_to_regex(pos_fpath)
neg_fpath = "./assets/liwc/032-negemo.txt"
neg_regex = file_to_regex(neg_fpath)
# Uncomment this line to see the full regular expression
#print_regex(neg_regex)
# Here we print only the first 140 characters
print_regex(neg_regex[:140])
\b(dismay[^\s]*|ignorant|poorest|tragic|disreput[^\s]*|ignore|poorly|t
rauma[^\s]*|abandon[^\s]*|diss|ignored|poorness[^\s]*|trembl[^\s]*|abu
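
One thing to keep in mind is that Python matches these patterns case-sensitively by default, and the entries shown above are all lowercase. The following sketch illustrates the difference on a hypothetical three-entry list (toy_entries is an assumption standing in for the real file contents), and shows one possible way to handle it by compiling the pattern once with re.IGNORECASE.

```python
import re

# Hypothetical three-entry list standing in for a real LIWC file
toy_entries = ['terribl*', 'hate', 'despis*']
toy_regex = (
    r'\b('
    + '|'.join(w.replace('*', r'[^\s]*') for w in toy_entries)
    + r')\b'
)

text = "Terrible. I HATE it, and I despise it"

# Default (case-sensitive) matching only finds the lowercase word
case_sensitive = len(re.findall(toy_regex, text))

# Compiling once with re.IGNORECASE also finds "Terrible" and "HATE"
compiled = re.compile(toy_regex, flags=re.IGNORECASE)
case_insensitive = len(compiled.findall(text))

print(case_sensitive, case_insensitive)
```

Lowercasing the text before matching (text.lower()) would be an alternative with the same effect on these counts.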

Using the Regular Expressions to Generate Sentiment Scores

And now we can use these generated regular expressions to count the number of times “positive” and “negative” words appear in a given string. Here we provide two final helper functions for accomplishing this:

def extract_sentiment_data(text):
    # First compute positive sentiment using pos_regex
    pos_count = num_matches(pos_regex, text)
    # Then negative sentiment using neg_regex
    neg_count = num_matches(neg_regex, text)
    # And finally the overall sentiment score as the difference
    sentiment = pos_count - neg_count
    return {
        'pos': pos_count,
        'neg': neg_count,
        'sentiment': sentiment
    }

def compute_sentiment(text):
    full_results = extract_sentiment_data(text)
    # Return just the overall sentiment score
    return full_results['sentiment']
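
Because this score is a raw difference of counts, longer documents will tend to receive larger absolute scores. One possible variation, which is not part of the code above, is to divide by the number of whitespace-separated tokens. In the sketch below, normalized_sentiment is a hypothetical helper, and the counts are hard-coded illustrative values so that the block runs without the LIWC files.

```python
def normalized_sentiment(pos_count, neg_count, text):
    """Sentiment difference divided by the number of whitespace-separated tokens."""
    n_tokens = len(text.split())
    if n_tokens == 0:
        return 0.0  # avoid dividing by zero on empty documents
    return (pos_count - neg_count) / n_tokens

# Illustrative input: 0 positive and 3 negative matches in a 9-token string
example_text = "Python is terrible, I hate Python, I despise Python"
score = normalized_sentiment(0, 3, example_text)
print(score)
```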

And here we test these helper functions out by creating positive, negative, and neutral test strings and checking the results for these strings:

neg_test_str = "Python is terrible, I hate Python, I despise Python"
neg_str_results = extract_sentiment_data(neg_test_str)
print(f"{neg_test_str}\n{neg_str_results}")
pos_test_str = "Python is wonderful, I love Python, I adore Python"
pos_str_results = extract_sentiment_data(pos_test_str)
print(f"{pos_test_str}\n{pos_str_results}")
neutral_test_str = "Python is ok, Python is mid, I guess I can do Python maybe"
neutral_str_results = extract_sentiment_data(neutral_test_str)
print(f"{neutral_test_str}\n{neutral_str_results}")
Python is terrible, I hate Python, I despise Python
{'pos': 0, 'neg': 3, 'sentiment': -3}
Python is wonderful, I love Python, I adore Python
{'pos': 3, 'neg': 0, 'sentiment': 3}
Python is ok, Python is mid, I guess I can do Python maybe
{'pos': 1, 'neg': 0, 'sentiment': 1}
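
Note that the "neutral" string still receives one positive match. When a count looks surprising, it can help to list the matched words rather than just counting them. The sketch below uses a hypothetical helper, matched_words, and a hand-written toy pattern; that the toy pattern contains ok is purely an assumption for illustration, so with the real lists you would pass pos_regex or neg_regex to find out which entry actually matched.

```python
import re

def matched_words(reg_str, text):
    """Return the matched substrings instead of just their count."""
    # group(0) is the full match, regardless of capture groups in reg_str
    return [m.group(0) for m in re.finditer(reg_str, text)]

# Toy pattern (an assumption); replace with pos_regex or neg_regex in practice
toy_pos_regex = r'\b(ok|love|wonderful[^\s]*)\b'
neutral_test_str = "Python is ok, Python is mid, I guess I can do Python maybe"

found = matched_words(toy_pos_regex, neutral_test_str)
print(found)
```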

Computing Sentiment Scores for a DataFrame Column

Above, we printed the full results of each sentiment computation by calling extract_sentiment_data(), which returns a dictionary. If you have a DataFrame with a text column that you’d like to perform sentiment analysis on, however, you can use the simpler compute_sentiment() function to obtain a single number per row, as in the following code:

import pandas as pd
text_df = pd.DataFrame({
    'text_id': [1,2,3],
    'text': [neg_test_str, pos_test_str, neutral_test_str]
})
text_df
   text_id                                               text
0        1  Python is terrible, I hate Python, I despise P...
1        2  Python is wonderful, I love Python, I adore Py...
2        3  Python is ok, Python is mid, I guess I can do ...
text_df['sentiment'] = text_df['text'].apply(compute_sentiment)
text_df
   text_id                                               text  sentiment
0        1  Python is terrible, I hate Python, I despise P...         -3
1        2  Python is wonderful, I love Python, I adore Py...          3
2        3  Python is ok, Python is mid, I guess I can do ...          1

Footnotes

  1. And, in case you want to use it in the future, you can download the entire set of word lists as a zip file here.