- Load pre-trained word vectors, and measure similarity using cosine similarity.
- Use word embeddings to solve word analogy problems such as Man is to Woman as King is to ______.
- Modify word embeddings to reduce their gender bias.
We will use 50-dimensional GloVe vectors to represent words.
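GloVe vectors are commonly distributed as plain-text files where each line holds a word followed by its vector components, separated by whitespace. The loader below is a minimal sketch that assumes exactly that format. The function name `read_glove_vecs` is a hypothetical helper chosen for illustration, not an API defined in this section.

```python
import numpy as np


def read_glove_vecs(path):
    """Parse a GloVe-style text file into a {word: vector} dict.

    Hypothetical helper. Assumes each line looks like: word v1 v2 ... vN
    """
    word_to_vec = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            # First token is the word; the rest are its vector components.
            word_to_vec[parts[0]] = np.asarray(parts[1:], dtype=np.float64)
    return word_to_vec
```

For example, `read_glove_vecs("glove.6B.50d.txt")` would map each vocabulary word to a NumPy array of shape `(50,)`, assuming that 50-dimensional file from the GloVe release is available locally.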
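To measure how similar two words are, we need a way to measure the degree of similarity between their embedding vectors. Given two vectors u and v, cosine similarity is defined as:

$$\text{CosineSimilarity}(u, v) = \frac{u \cdot v}{\|u\|_2 \, \|v\|_2} = \cos(\theta)$$

where $u \cdot v$ is the dot product of u and v, $\|u\|_2$ is the Euclidean (L2) norm of u, and $\theta$ is the angle between u and v. This similarity therefore depends only on the angle between u and v, not on their lengths. If u and v are very similar, their cosine similarity will be close to 1; if they are dissimilar, it will take a smaller value, down to -1 for vectors pointing in opposite directions.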
Figure 1: The cosine of the angle between two vectors is a measure of how similar they are.
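The cosine-similarity formula translates directly into a few lines of NumPy. This is a minimal sketch; the function name `cosine_similarity` and the toy 2-d vectors are illustrative choices, not real GloVe embeddings.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: (u . v) / (||u||_2 * ||v||_2)."""
    norm_u = np.linalg.norm(u)
    norm_v = np.linalg.norm(v)
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return float(np.dot(u, v) / (norm_u * norm_v))


# Toy vectors, chosen to show the extremes of the range.
print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 0.0])))   # 1.0 (same direction)
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))   # 0.0 (orthogonal)
print(cosine_similarity(np.array([1.0, 0.0]), np.array([-1.0, 0.0])))  # -1.0 (opposite)
```

Because both norms divide out, scaling either vector by a positive constant leaves the result unchanged, which is why only the angle matters.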
In the word analogy task, we complete the sentence "a is to b as c is to ____". An example is "man is to woman as king is to queen". In detail, we are trying to find a word d such that the associated word vectors $e_a, e_b, e_c, e_d$ are related by $e_b - e_a \approx e_d - e_c$. We will measure the similarity between $e_b - e_a$ and $e_d - e_c$ using cosine similarity, and pick the word d that maximizes it.
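The search described above can be sketched as a brute-force scan over the vocabulary, assuming `word_to_vec` is a `{word: vector}` dict such as one loaded from GloVe. The function name `complete_analogy` and the toy 3-d vectors are illustrative assumptions.

```python
import numpy as np


def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def complete_analogy(word_a, word_b, word_c, word_to_vec):
    """Return the word d that maximizes cos(e_b - e_a, e_d - e_c)."""
    e_a, e_b, e_c = (word_to_vec[w] for w in (word_a, word_b, word_c))
    best_word, best_sim = None, -np.inf
    for word, e_d in word_to_vec.items():
        # Skip the input words; d == c would also give a zero difference vector.
        if word in (word_a, word_b, word_c):
            continue
        sim = cosine_similarity(e_b - e_a, e_d - e_c)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word


# Toy 3-d vectors for illustration only (dimension 0 loosely encodes gender).
toy = {
    "man": np.array([1.0, 0.0, 0.2]),
    "woman": np.array([-1.0, 0.0, 0.2]),
    "king": np.array([1.0, 1.0, 0.1]),
    "queen": np.array([-1.0, 1.0, 0.1]),
    "apple": np.array([0.0, -1.0, 1.0]),
}
print(complete_analogy("man", "woman", "king", toy))  # queen
```

With a real 400k-word vocabulary this linear scan is slow but simple; a vectorized version would stack all vectors into one matrix.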
We will examine gender biases that can be reflected in a word embedding, and explore algorithms for reducing the bias.
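The bias direction g is not defined explicitly in this section. One simple estimate, used here as an assumption, is the difference $e_{woman} - e_{man}$, optionally averaged over several (female, male) pairs; Bolukbasi et al. instead take principal components of several such pair differences. The helper names `bias_direction` and `bias_score`, and the toy vectors, are hypothetical.

```python
import numpy as np


def bias_direction(pairs, word_to_vec):
    """Assumed estimate of g: mean of e_female - e_male over the given word pairs."""
    diffs = [word_to_vec[f] - word_to_vec[m] for f, m in pairs]
    return np.mean(diffs, axis=0)


def bias_score(word, g, word_to_vec):
    """Cosine similarity between a word vector and g; the sign shows which end it leans to."""
    e = word_to_vec[word]
    return float(np.dot(e, g) / (np.linalg.norm(e) * np.linalg.norm(g)))


# Toy vectors for illustration only.
toy = {
    "woman": np.array([-1.0, 0.0, 0.2]),
    "man": np.array([1.0, 0.0, 0.2]),
    "girl": np.array([-0.8, -0.5, 0.1]),
    "boy": np.array([0.8, -0.5, 0.1]),
    "receptionist": np.array([-0.3, 0.2, 0.9]),
}
g = bias_direction([("woman", "man"), ("girl", "boy")], toy)
# Positive here means the word leans toward the "woman" end of g in this toy data.
print(bias_score("receptionist", g, toy))
```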
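The figure below should help you visualize what neutralizing does. If you're using a 50-dimensional word embedding, the 50-dimensional space can be split into two parts: the bias direction g, and the remaining 49 dimensions, which we'll call g⊥. Neutralizing removes the component of a word vector along g, so that a word that should be gender-neutral, such as "receptionist", keeps only its projection onto g⊥: $e^{debiased} = e - \frac{e \cdot g}{\|g\|_2^2}\, g$.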
Figure 2: The word vector for "receptionist" represented before and after applying the neutralize operation.
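Neutralizing is an orthogonal projection, so it fits in a few lines. This is a minimal sketch assuming a single bias direction g; the name `neutralize` and the toy vectors are illustrative.

```python
import numpy as np


def neutralize(e, g):
    """Remove the component of e along g, keeping only its projection onto g-perp.

    e_bias = (e . g / ||g||_2^2) * g ;  e_debiased = e - e_bias
    """
    e_bias = (np.dot(e, g) / np.dot(g, g)) * g
    return e - e_bias


g = np.array([-1.8, 0.0, 0.0])               # toy bias direction
e_receptionist = np.array([-0.3, 0.2, 0.9])  # toy vector, not a real embedding
e_debiased = neutralize(e_receptionist, g)
print(e_debiased)             # approximately [0, 0.2, 0.9]
print(np.dot(e_debiased, g))  # approximately 0: nothing left along g
```

Only the coordinate along g changes; the components in g⊥ are left untouched.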
Next, let's see how debiasing can also be applied to word pairs such as "actress" and "actor". Equalization is applied to pairs of words that you might want to differ only in the gender property. As a concrete example, suppose that "actress" is closer to "babysit" than "actor" is. Neutralizing "babysit" reduces the gender stereotype associated with babysitting, but it still does not guarantee that "actor" and "actress" are equidistant from "babysit". The equalization algorithm takes care of this.
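A toy numerical illustration of this point, using made-up 3-d vectors rather than real GloVe embeddings: "babysit" already has no component along g, yet "actor" and "actress" still score differently against it, because their components in g⊥ differ.

```python
import numpy as np


def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


g = np.array([1.0, 0.0, 0.0])        # toy bias direction
babysit = np.array([0.0, 1.0, 0.5])  # toy vector, already neutralized (zero along g)
actor = np.array([0.6, 0.2, 0.7])    # toy vectors
actress = np.array([-0.6, 0.6, 0.5])

print(cosine_similarity(babysit, actor))
print(cosine_similarity(babysit, actress))  # larger: "actress" is still closer
```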
The key idea behind equalization is to make sure that a particular pair of words are equidistant from the 49-dimensional g⊥; that is, the two vectors share the same component in g⊥ and differ only along g. The equalization step also ensures that the two equalized words are then the same distance from $e_{receptionist}^{debiased}$, or from any other word that has been neutralized.
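The equalize step can be sketched as follows. This follows our reading of the equalize step in Bolukbasi et al. (2016), and assumes unit-length input vectors and a single bias direction g. The helper names `project` and `equalize`, and the random test vectors, are illustrative assumptions: the pair is moved to a shared g⊥ component ν (the g⊥ part of their mean), and each word keeps only a signed offset along g, scaled so the result has unit length.

```python
import numpy as np


def project(v, g):
    """Component of v along the bias direction g."""
    return (np.dot(v, g) / np.dot(g, g)) * g


def equalize(e_w1, e_w2, g):
    """Sketch of equalization for one word pair.

    Assumes unit-length inputs, a one-dimensional bias direction g, and that
    the two words differ along g (otherwise the offset below is zero).
    """
    mu = (e_w1 + e_w2) / 2
    mu_b = project(mu, g)
    nu = mu - mu_b  # shared component in g-perp
    scale = np.sqrt(max(1.0 - np.dot(nu, nu), 0.0))
    out = []
    for e in (e_w1, e_w2):
        diff = project(e, g) - mu_b  # this word's offset along g from the pair's mean
        out.append(nu + scale * diff / np.linalg.norm(diff))
    return out[0], out[1]


rng = np.random.default_rng(0)
g = rng.normal(size=50)  # random stand-ins, not GloVe vectors
actor, actress = (v / np.linalg.norm(v) for v in rng.normal(size=(2, 50)))
e1, e2 = equalize(actor, actress, g)

babysit = rng.normal(size=50)
babysit -= project(babysit, g)  # a neutralized toy word
print(np.dot(e1, babysit), np.dot(e2, babysit))  # equal up to rounding
```

Since both outputs share ν and differ only along g, any neutralized vector (which has no component along g) sees the same dot product with each of them.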
The debiasing algorithm is from Bolukbasi et al., 2016, "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings".
The GloVe word embeddings were due to Jeffrey Pennington, Richard Socher, and Christopher D. Manning.