Installation Guide for Homework Environment

Prerequisites:

Ensure that you're using Python version 3.12. Check your Python version by running:

    python --version
    

or

    python3 --version
    

Installing uv (Recommended Python Package Manager):

We recommend using uv as it's much faster than pip and conda for managing Python environments and packages.

What is uv?

uv is a modern, Rust-based package + project manager for Python. It keeps the familiar pip workflow but re-implements the engine for speed and reliability. Concretely: it creates a venv, resolves and installs dependencies with its own fast installer, and deduplicates files via a global cache (copy-on-write on macOS, hardlinks on Linux/Windows). It can also manage Python versions per project (e.g., pin 3.12) so each assignment uses a clean, reproducible interpreter. Think "pip + virtualenv + pip-tools + pyenv/pipx".

Installing uv:

Please refer to the official uv installation documentation for the most up-to-date installation instructions for your platform.

Setting Up the Homework Environment with uv:

Create a virtual environment with the required dependencies (with uv, you don't need to activate it manually; `uv run` executes commands inside the project's environment):

macOS/Linux:

# Install uv once
curl -LsSf https://astral.sh/uv/install.sh | sh

# Optional: the `uv` binary is installed to `$HOME/.local/bin` on Linux/macOS by default,
# so you may need to add it to your PATH (the installer may have already done this for you):
export PATH="$HOME/.local/bin:$PATH"

Windows:

# Install uv once
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

All platforms:

# Download the homework zip and unzip into `hw2_sentiment/`
# In your hw directory
uv init .                     # Initialize project (creates pyproject.toml)
uv python pin 3.12            # Pin Python version
uv add numpy einops torch     # Add dependencies
uv run python grader.py       # Run the local grader
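
To double-check that everything installed correctly, you can run a quick sanity check like the one below. This is only a suggestion; `check_env.py` is a hypothetical file name, and the versions printed on your machine will differ.

    # check_env.py -- hypothetical sanity-check script (run it with: uv run python check_env.py)
    import sys
    from importlib.metadata import version

    # These imports fail loudly if a dependency is missing from the environment.
    import einops
    import numpy
    import torch

    print("Python:", sys.version.split()[0])   # should report a 3.12.x version
    for package in ("numpy", "einops", "torch"):
        print(package, version(package))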

Running on Stanford FarmShare (Optional)

If you cannot run the assignment on your laptop or need additional computing resources, Stanford provides FarmShare, a community computing environment for coursework and unsponsored research. Please follow the instructions at https://docs.farmshare.stanford.edu/ to get started with the computing environment.


Advice for this homework:
  1. Words are simply strings separated by whitespace.
  2. You might find some useful functions in util.py. Have a look around in there before you start coding.

We've created a LaTeX template here for you to use that contains the prompts for each question.

Problem 1: Building Intuition for Bag-of-Words Features and Linear Classification

Social media platforms like Twitter are rich sources of emotional expression. In this homework, you'll build classifiers to detect emotions in tweets using linear classification with cross-entropy loss and softmax activation. Consider the following dataset of 6 tweets, each labeled with exactly one emotion: joy (J), anger (A), or fear (F):

Tweet | Emotion | One-hot encoding
"amazing day" | Joy | [1, 0, 0]
"scared of spiders" | Fear | [0, 0, 1]
"love this" | Joy | [1, 0, 0]
"so angry" | Anger | [0, 1, 0]
"so so worried about tomorrow" | Fear | [0, 0, 1]
"hate waiting" | Anger | [0, 1, 0]

Each tweet $x$ is mapped to a feature vector $f(x)$ using a bag-of-words representation (you can think of it as word counts). Machine learning models don't directly take a string as input; instead, as you have learned, they work with tensors (e.g., expressed in NumPy or PyTorch). One way to convert strings to tensors is "bag-of-words" features, which treat the input text as a "bag of words" and represent it using a vector with the same length as our vocabulary. The resulting representation is a vector where each position corresponds to a word in the vocabulary, and each value is how many times that word appears in the input text. If we have a very large vocabulary, we end up with a very sparse vector containing many 0's.
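
As a tiny, self-contained illustration of the idea (using a made-up three-word vocabulary, not the vocabulary defined for this problem), counting word occurrences in Python might look like this:

    # Toy bag-of-words example with a hypothetical three-word vocabulary.
    from collections import Counter

    vocab = ["cat", "dog", "the"]                 # hypothetical vocabulary, in a fixed order
    text = "the dog chased the cat"

    counts = Counter(text.split())                # words are simply strings separated by whitespace
    features = [counts[word] for word in vocab]   # "chased" is out-of-vocabulary, so it is ignored
    print(features)                               # [1, 1, 2] -> one count per vocabulary word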

For this problem, let's assume our vocabulary consists of all unique words in the tweets above: {about, amazing, angry, day, hate, love, of, scared, so, spiders, this, tomorrow, waiting, worried}.

To build our multi-class classifier, we use a weight matrix $\mathbf{W} \in \mathbb{R}^{d \times 3}$ where $d$ is the feature dimension (vocabulary size) and 3 is the number of classes. The model computes logits as $\mathbf{z} = \mathbf{W}^T f(x)$ and applies softmax to convert logits into probability values that sum to 1. Let $\mathbf{p}$ be the 3-dimensional vector of output probabilities, and $p_k$ be the probability that the input belongs to class $k$. The softmax probabilities are computed in the following way: $$p_k = \frac{e^{z_k}}{\sum_{j=1}^{3} e^{z_j}}$$
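
For intuition, here is a minimal NumPy sketch of the softmax formula above, evaluated on a made-up logit vector (not the one used in the questions below). Subtracting the maximum logit before exponentiating is a common numerical-stability trick and does not change the result.

    import numpy as np

    z = np.array([1.0, 0.0, -1.0])            # hypothetical logits, one per class
    shifted = z - np.max(z)                   # softmax is invariant to shifting all logits
    p = np.exp(shifted) / np.sum(np.exp(shifted))

    print(np.round(p, 3))                     # approximately [0.665, 0.245, 0.090]
    print(np.sum(p))                          # very close to 1.0, as probabilities should be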

Then, given the softmax probabilities, the classifier uses argmax to choose the predicted class, i.e., the class with the highest $p_k$.

The cross-entropy loss measures the difference between a model's predicted probabilities and the true probability distribution of the data. For a single example (with three output classes), it can be computed as follows: $$\text{Loss}_{\text{CE}}(x, \mathbf{y}, \mathbf{W}) = -\sum_{k=1}^{3} y_k \log p_k$$ where $\mathbf{y}$ is the one-hot encoded true label vector.
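
As a quick worked example with made-up numbers (different from the ones in the questions below): if the true label is Joy, so $\mathbf{y} = [1, 0, 0]$, and the model predicts $\mathbf{p} = [0.5, 0.3, 0.2]$, then only the term for the correct class survives: $$\text{Loss}_{\text{CE}} = -\left(1 \cdot \log 0.5 + 0 \cdot \log 0.3 + 0 \cdot \log 0.2\right) = -\log 0.5 \approx 0.693$$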

  1. Feature representation: Using the vocabulary given above, write out the bag-of-words feature vector $f(x)$ for the tweet "so so worried about tomorrow".
    A feature vector corresponding to the vocabulary: {about, amazing, angry, day, hate, love, of, scared, so, spiders, this, tomorrow, waiting, worried}.
  2. Softmax computation: Given logits $\mathbf{z} = [2.0, 1.0, -1.0]$ for the three classes [Joy, Anger, Fear], compute the softmax probabilities. Show your work.
    The softmax probability vector $[P(\text{Joy}), P(\text{Anger}), P(\text{Fear})]$ with values rounded to 3 decimal places.
  3. Cross-entropy Loss: For the tweet "so angry" with true label Anger (one-hot: [0, 1, 0]), suppose your model outputs probabilities $[0.2, 0.7, 0.1]$. Calculate the cross-entropy loss for this example. Then explain what happens to the loss as the predicted probability for the correct class approaches 1.
    1. The numerical cross-entropy loss value
    2. A brief explanation (2-3 sentences) of the loss behavior
  4. Gradient analysis: Derive $\frac{\partial \text{Loss}_{\text{CE}}}{\partial z_k}$, the gradient of the cross-entropy loss with respect to the logit $z_k$. Show the mathematical steps to find $\frac{\partial \text{Loss}_{\text{CE}}}{\partial z_k}$ and express your final answer in terms of $p_k$ and $y_k$ only. Then, explain intuitively why this expression makes sense for gradient-based learning.
    1. Mathematical derivation showing the steps to compute $\frac{\partial \text{Loss}_{\text{CE}}}{\partial z_k}$. (Hint: you may need to use the chain rule.)
    2. Final gradient expression in terms of $p_k$ and $y_k$
    3. A brief intuitive explanation of why this gradient expression makes sense for learning (2-3 sentences)
Problem 2: Building Intuition for Embeddings and Multilayer Perceptron

In the previous problem, we used bag-of-words features for sentiment classification. Bag-of-words is now considered an outdated way of representing text in machine learning because of its many limitations. For example, it doesn't capture word relationships (e.g., "amazing" and "wonderful" are similar but treated as completely different features), and it creates very sparse, high-dimensional vectors. In this problem, we'll explore how neural networks with word embeddings can address these issues.

We'll use the same set of tweets from Problem 1, but now represent each word with a dense 2-dimensional embedding vector. For simplicity, assume we have the following pre-trained word embeddings:

Word | Embedding
amazing | [0.8, 0.6]
day | [0.2, 0.1]
scared | [-0.5, -0.7]
of | [0.0, 0.0]
spiders | [-0.3, -0.4]
love | [0.9, 0.4]
this | [0.0, 0.0]
so | [0.1, 0.0]
angry | [-0.6, -0.8]
worried | [-0.3, -0.6]
about | [0.0, -0.1]
tomorrow | [0.2, -0.2]
hate | [-0.8, -0.5]
waiting | [-0.1, -0.3]

For simplicity, we will focus on binary sentiment classification (positive or negative) instead of three-class classification for this problem only.

  1. Embeddings via averaging: One simple approach is to represent each tweet as the average of its word embeddings. Compute the averaged embedding representation for "so angry". Comment on one advantage and one disadvantage of this approach.
    1. A 2-dimensional averaged embedding vector
    2. Discuss one benefit and one limitation of creating an embedding for a text by averaging the embeddings of its component words (2-3 sentences)
  2. Forward pass: Using the averaged representation for "so angry" from part (a), perform a forward pass step-by-step to find the predicted label $\hat{y}$. Show your intermediate calculations by filling in the table below.

    For the sake of simplicity, we'll use a 2-input, 2-hidden neuron, 1-output architecture consisting of:

    1. Input layer: Word embeddings $\mathbf{x}$
    2. Hidden layer: 2 neurons with ReLU activation, where $\text{ReLU}(\mathbf{x}) = \max(0, \mathbf{x})$
    3. Output layer: 1 neuron (binary sentiment classification) with sigmoid activation

    The forward pass equations are: $$\mathbf{h} = \text{ReLU}(\mathbf{W}^{(1)} \mathbf{x} + \mathbf{b}^{(1)})$$ $$z = \mathbf{W}^{(2)} \mathbf{h} + b^{(2)}$$ $$\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}$$ where $\mathbf{x} = [x_1, x_2]^T$ is the 2D input representation, $\hat{y}$ is the predicted label, $\mathbf{W}^{(1)} \in \mathbb{R}^{2 \times 2}$, $\mathbf{b}^{(1)} \in \mathbb{R}^{2}$, $\mathbf{W}^{(2)} \in \mathbb{R}^{1 \times 2}$, and $b^{(2)} \in \mathbb{R}$.

    Network parameters:
    $$\mathbf{W}^{(1)} = \begin{bmatrix} 1.0 & 0.5 \\ -0.5 & 1.0 \end{bmatrix}, \quad \mathbf{b}^{(1)} = \begin{bmatrix} 0.1 \\ -0.2 \end{bmatrix}$$ $$\mathbf{W}^{(2)} = \begin{bmatrix} 0.8 & -0.6 \end{bmatrix}, \quad b^{(2)} = 0.3$$

    You should write your answers by copying and pasting the table below and filling in its cells, or by using bullet points to clearly state the value of each cell.

    Node/Variable | Formula/Computation | Value
    $\mathbf{x}$ | NA |
    $\mathbf{h}$ | |
    $z$ | |
    $\hat{y}$ | |
    1. Compute forward pass values at each node. Show your calculations for each step by filling in the table above
  3. Backpropagation: Using the same weights, bias and input vector as in part (b), first compute the loss value $L$. Then, perform backpropagation to compute gradients $\frac{\partial L}{\partial \mathbf{W}^{(1)}}$, $\frac{\partial L}{\partial \mathbf{b}^{(1)}}$, $\frac{\partial L}{\partial \mathbf{W}^{(2)}}$, and $\frac{\partial L}{\partial b^{(2)}}$. Assume the true label is $y_{\text{true}} = 0$ (negative sentiment) and we're using binary cross-entropy loss: $$L = -[y_{\text{true}} \log(\hat{y}) + (1 - y_{\text{true}}) \log(1 - \hat{y})]$$

    You should write your answers by copying and pasting the table below and filling in its cells, or by using bullet points to clearly state the value of each cell.

    Variable | Formula/Computation | Value
    $L$ | |

    Gradient | Formula (Chain Rule) | Value
    $\frac{\partial L}{\partial \hat{y}}$ | |
    $\frac{\partial L}{\partial z}$ | |
    $\frac{\partial L}{\partial \mathbf{h}}$ | |
    $\frac{\partial L}{\partial \mathbf{W}^{(2)}}$ | |
    $\frac{\partial L}{\partial b^{(2)}}$ | |
    $\frac{\partial L}{\partial \mathbf{W}^{(1)}}$ | |
    $\frac{\partial L}{\partial \mathbf{b}^{(1)}}$ | |
    1. Compute the loss value $L$
    2. Starting from $\frac{\partial L}{\partial \hat{y}}$, compute gradients for each node working backwards and fill out the table given above. For the formula column, you should express each partial in terms of partials above it in the table (recall backpropagation from lecture!)
    3. Feel free to reuse values you already computed in part b
  4. Embedding analysis: Looking at the provided word embeddings, identify patterns in how positive emotion words (amazing, love), negative emotion words (scared, hate, angry), and neutral words (this, about, of) are positioned in the 2D embedding space. You don't have to plot anything for this problem; simply observe the patterns. What does this suggest about how word embeddings capture semantic relationships? Explain why embeddings might be more meaningful than the bag-of-words approach for capturing semantic relationships.
    1. Describe the spatial clustering of different word types (1-2 sentences)
    2. Explain how embeddings capture semantic similarity and its advantage over the bag-of-words approach (1-2 sentences)
Problem 3: Linear Classifier

Now, we will implement the first classifier that can classify tweet sentiments. Recall that in our data, each tweet is labeled with one of three emotions (joy, anger, or fear) in one-hot encoding. For example:

text | one-hot label
i feel very happy and excited since i learned so many things | [1,0,0]
i feel angered and firey | [0,1,0]
i remember feeling acutely distressed for a few days | [0,0,1]
  1. Build vocabulary: Implement build_vocabulary(examples), which builds a vocabulary using words seen in the training examples. Your implementation should populate the Vocabulary class provided in util.py, which will be used in part b to build a sparse feature representation for any input text.
    Implement build_vocabulary(examples) using the Vocabulary class from util.py.
  2. Text to features: Implement text_to_features(text, vocab), which converts a text string into a sparse feature vector. Refer back to Problem 1(a) if you want to revisit the steps for this process.
  3. NumPy softmax: Implement numpy_softmax(logits) using einops and NumPy library functions only.
  4. NumPy cross-entropy loss: Implement numpy_cross_entropy_loss(predictions, targets, epsilon), which returns a floating point number measuring how far the predicted probabilities are from the true one-hot labels, on average.

  5. NumPy gradient computation: Implement numpy_compute_gradients(features, predictions, targets), which returns the gradients for the weights and the bias, respectively. These gradients are later used to update the model's weights and bias during training.

  6. Linear classifier predictor: Implement predict_linear_classifier(features, labels, weights, bias), which makes predictions on input features using the trained weights and bias, and computes the accuracy of those predictions. (A generic sketch of this computation, using made-up shapes and values, appears after this problem's questions.)

    Your implementation should:
    1. Compute logits by multiplying features with weights and adding bias
    2. Apply softmax to convert logits to probabilities
    3. Get predicted class labels by taking the argmax of probabilities
    4. Compare with true labels to compute accuracy
  7. Train linear classifier: Implement train_linear_classifier(...), which integrates all previous components into a complete training loop.

    Your implementation should:
    1. Initialize weights randomly and bias as a 0-vector
    2. For each epoch: compute forward pass, calculate loss, compute gradients, and update parameters correspondingly. At the end of the epoch, print out the training loss and validation accuracy
    3. Use your implemented softmax, cross-entropy loss, gradient, and predictor functions
    4. You should train the model on training data only; do not train the model on validation data.
    Finally, run python submission.py --model linear. The autograder requires that you achieve at least a 0.4 accuracy using default hyperparameters.
  8. If you adjust lr, the learning rate, how do you expect loss and validation accuracy to change during training? Run the terminal command python submission.py --model linear --lr [your_learning_rate] to experiment with three different learning rates. Report what they are and what training behaviors you observed. Why did the learning rate affect loss and accuracy this way?

    For example, you can run: python submission.py --model linear --lr 0.1. If you don't include an --lr argument, the default learning rate is 0.2.
    In 2-3 sentences, report three different lr's you experimented with, and describe how they affected training behaviors. Explain why the learning rate affects loss and accuracy this way.
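
As referenced in item 6 above, here is a minimal NumPy sketch of the four prediction steps, using random stand-in data. The shapes, values, and variable names are hypothetical; your implementation must follow the signatures in submission.py.

    # Generic sketch of predict-style logic: logits -> softmax -> argmax -> accuracy.
    # Everything here (data, shapes, names) is a made-up placeholder.
    import numpy as np

    rng = np.random.default_rng(0)
    num_examples, vocab_size, num_classes = 5, 14, 3
    features = rng.random((num_examples, vocab_size))                         # stand-in feature matrix
    labels = np.eye(num_classes)[rng.integers(0, num_classes, num_examples)]  # one-hot labels
    weights = rng.standard_normal((vocab_size, num_classes))
    bias = np.zeros(num_classes)

    logits = features @ weights + bias                              # 1. linear scores
    shifted = logits - logits.max(axis=1, keepdims=True)            # 2. softmax (numerically stable)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    predicted = probs.argmax(axis=1)                                # 3. predicted class ids
    accuracy = float((predicted == labels.argmax(axis=1)).mean())   # 4. compare with true labels
    print(f"accuracy on random data: {accuracy:.2f}")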
Problem 4: Multilayer Perceptron and Embeddings

Linear classification was a simple attempt at detecting tweet sentiments. In class, we learned that word embeddings and neural networks are powerful tools in natural language processing, and we will now implement them.

To build embeddings, we convert each word into a small, dense "embedding" vector (such as a 32-dimensional array of numbers) that captures the word's meaning. To obtain the embedding for a text, we simply average all of its word embeddings to get one fixed-size vector per document: a 500-word document and a 10-word document both become vectors of the same size, making them convenient inputs for your multilayer perceptron classifier while preserving much more semantic information than word counts alone.
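
As a rough illustration of this averaging step (the embedding table, word indices, and dimensions below are made up, and the helper you will implement in this problem takes a text string and a vocabulary rather than pre-computed indices):

    # Generic sketch: look up word embeddings and average them into one vector.
    import torch
    import torch.nn as nn
    from einops import reduce

    embedding_layer = nn.Embedding(num_embeddings=100, embedding_dim=32)  # toy embedding table
    word_indices = torch.tensor([4, 17, 17, 62], dtype=torch.long)        # pretend 4-word text

    word_vectors = embedding_layer(word_indices)                    # shape: (num_words, embedding_dim)
    text_vector = reduce(word_vectors, "words dim -> dim", "mean")  # average across the word dimension
    print(text_vector.shape)                                        # torch.Size([32])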

We will implement a simple 2-layer neural network consisting of:

  1. An input layer that takes averaged embeddings of size embedding_dim
  2. A hidden layer that uses ReLU activation
  3. An output layer that produces raw scores (logits) for each sentiment class
Note that this is different from the architecture described in problem 2.
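
For orientation only, a generic 2-layer network of this shape might look roughly like the sketch below in PyTorch. The layer sizes and the class name are made up, and the MLPClassifier in the starter code may be organized differently.

    # Generic 2-layer MLP sketch: averaged embedding in, raw class scores (logits) out.
    import torch
    import torch.nn as nn

    class TinyMLP(nn.Module):   # hypothetical class, not the starter code's MLPClassifier
        def __init__(self, embedding_dim: int = 32, hidden_dim: int = 16, num_classes: int = 3):
            super().__init__()
            self.hidden = nn.Linear(embedding_dim, hidden_dim)   # input layer -> hidden layer
            self.output = nn.Linear(hidden_dim, num_classes)     # hidden layer -> logits

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            h = torch.relu(self.hidden(x))   # ReLU activation on the hidden layer
            return self.output(h)            # no softmax here; these are raw scores

    batch = torch.randn(8, 32)               # a pretend batch of averaged embeddings
    print(TinyMLP()(batch).shape)            # torch.Size([8, 3])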

  1. Text to average embedding: Implement text_to_average_embedding(text, vocab, embedding_layer), which converts a text string into a single embedding vector by averaging word embeddings. This creates a fixed-size representation for variable-length tweets.

    Your implementation should:
    1. Create a list of indices, where each element is the vocabulary index of the corresponding word in the input text.
    2. Create a PyTorch tensor from the list of indices (hint: use dtype=torch.long).
    3. Pass the tensor of indices through the embedding layer to get word embeddings
    4. Average the embeddings across the word dimension to get a single vector (hint: use einops)
    5. Return the averaged embedding as a tensor
  2. Extract averaged features: Implement extract_averaged_features(texts, vocab, embedding_layer), which processes a list of text strings and converts them all into a single tensor of averaged embeddings. This function builds on your text_to_average_embedding implementation to handle entire datasets at once.

    Your implementation should:
    1. Initialize an empty list to collect the embedding vectors
    2. Loop through each text string in the input list
    3. For each text, call your text_to_average_embedding function to get its averaged embedding
    4. Append each averaged embedding to your collection list
    5. Combine all individual embeddings into a single tensor
    6. Return a tensor of shape (num_texts, embedding_dim) where each row is one text's embedding
    Hint: This function essentially applies your single-text embedding function to every text in a batch, then organizes the results into the matrix format that your MLP classifier expects.
  3. MLP classifier: Complete the MLPClassifier class.
  4. Utility functions: Implement the helper functions tailored to our three-class prediction task.
  5. MLP predictor function: Implement predict_mlp(texts, labels, classifier, embedding_layer, vocab), which evaluates a trained model on new text data.

    Your implementation should:
    1. Set models to evaluation mode (no training)
    2. Extract averaged embeddings for the input texts
    3. Pass embeddings through the classifier to get predictions
    4. Compare predictions with true labels to compute accuracy
  6. Train MLP classifier: Implement train_mlp_classifier(...).

    This is the main training function that integrates everything together. Your implementation should:
    1. Build vocabulary from training texts
    2. Create and initialize embedding layer and MLP classifier
    3. Set up training loop with batch processing
    4. For each epoch: extract features, forward pass, compute loss, backpropagation, update parameters. At the start of each epoch, you should shuffle training data before processing them in batches. At the end of the epoch, print out the training loss and validation accuracy
    5. Use stochastic gradient descent to update all parameters; do not use optimizers (if you don't know what they are, that's perfectly fine!). A short sketch of what a manual parameter update can look like appears after this problem.
    6. You should train the model on training data only; do not train the model on validation data.
    You can test your implementation with this terminal command: python submission.py --model mlp. The autograder requires that you achieve at least a 0.55 accuracy using default hyperparameters.
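
Since item 6 asks you to update parameters without an optimizer, here is a minimal, self-contained sketch of what one manual gradient-descent step can look like in PyTorch. The model, data, loss function, and learning rate are all hypothetical stand-ins; this shows the general pattern, not the training loop you are asked to write.

    # Minimal illustration of a manual SGD step in PyTorch (no optimizer object).
    import torch
    import torch.nn as nn

    model = nn.Linear(32, 3)                 # stand-in for your classifier
    inputs = torch.randn(8, 32)              # pretend batch of averaged embeddings
    labels = torch.randint(0, 3, (8,))       # pretend class indices
    lr = 0.1

    loss = nn.functional.cross_entropy(model(inputs), labels)
    loss.backward()                          # fills in .grad for every parameter

    with torch.no_grad():                    # update parameters without tracking gradients
        for param in model.parameters():
            param -= lr * param.grad         # plain gradient-descent step
            param.grad.zero_()               # clear gradients before the next step

    print(f"loss before the update: {loss.item():.3f}")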

Submission

Submission is done on Gradescope.

Written: When submitting the written parts, make sure to select all the pages that contain part of your answer for that problem, or else you will not get credit. To double check after submission, you can click on each problem link on the right side, and it should show the pages that are selected for that problem.

Programming: After you submit, the autograder will take a few minutes to run. Check back after it runs to make sure that your submission succeeded. If the autograder crashes on your submission, you will receive a 0 on the programming part of the assignment. Note: the only file to be submitted to Gradescope is submission.py.

More details can be found in the Submission section on the course website.