EMBEDDED lets you explore how a machine learning model interprets language. Using the motion sensors on your phone, you can move around in a space of 10 000 words. When visiting a word, you are surrounded by its 50 closest neighbors, and these 50 surrounding words are updated each time you move to a new word. Neighboring words are words that often co-occurred in the dataset the machine learning model was trained on, which causes the algorithm to categorize them as similar.
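For readers who want to see what such a neighborhood lookup looks like in code, here is a minimal sketch using a publicly available pre-trained GloVe model loaded through gensim. The model name, the example word and the use of gensim are illustrative assumptions, not EMBEDDED's actual code or vectors.

```python
# Minimal sketch of a 50-nearest-neighbor lookup, assuming a pre-trained
# GloVe model fetched via gensim's downloader (not EMBEDDED's own model).
import gensim.downloader as api

# The vocabulary is ordered by frequency, so restrict_vocab=10000
# approximates a space of the 10 000 most common words.
model = api.load("glove-wiki-gigaword-100")

# The 50 closest neighbors of the word you are currently "visiting".
neighbors = model.most_similar("music", topn=50, restrict_vocab=10000)
for word, similarity in neighbors[:5]:
    print(f"{word}\t{similarity:.3f}")
```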
We think that exploring the space in this way can foster new means of reflecting upon how a machine learning model interprets language.
Keep your phone parallel to the ground and rotate to explore the words around you. To move to a new word, either step towards it or motion forwards with your hand. Make sure your movements are deliberate and precise.
You can choose to explore the space as it is. If you’re looking for a challenge, you can try playing – either by yourself or in competition with a friend. The aim of the game is to reach a given word in as few steps as possible.
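One way to make the "fewest steps" goal concrete is to treat the word map as a graph in which every word is linked to its 50 nearest neighbors; an optimal route is then a shortest path in that graph. The sketch below illustrates this idea with a breadth-first search over the gensim model from the previous sketch; it is an illustration of the game's logic, not the app's implementation.

```python
# Sketch of the game's goal as a breadth-first search over the neighbor graph
# (each word linked to its 50 nearest neighbors). Illustration only; `model`
# is the hypothetical gensim model from the earlier sketch.
from collections import deque

def fewest_steps(model, start, target, topn=50, restrict_vocab=10000):
    """Return one shortest chain of neighbor hops from start to target."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == target:
            return path
        for word, _ in model.most_similar(current, topn=topn,
                                          restrict_vocab=restrict_vocab):
            if word not in visited:
                visited.add(word)
                queue.append(path + [word])
    return None  # target unreachable within the restricted vocabulary

# e.g. fewest_steps(model, "music", "physics") -> ['music', ..., 'physics']
```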
EMBEDDED is based on a machine learning generated word map. The map is made by taking a large corpus of text gathered from the internet and processing it with an algorithm that calculates word vectors. The algorithm looks at the position of every word in every sentence and keeps a running count of which words appear close to one another. Based on these counts, each word gets a position – a vector – in relation to every other word.
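The running count described above can be illustrated with a toy example. The snippet below only tallies which words appear near each other within a small window; a real word-vector algorithm then turns such counts (or equivalent predictions) into dense vectors, for example by factorizing the co-occurrence matrix or training a small neural network.

```python
# Toy illustration of the running co-occurrence count described above.
from collections import Counter

sentences = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]

window = 2  # words within this distance of each other count as co-occurring
counts = Counter()
for sentence in sentences:
    for i, word in enumerate(sentence):
        for neighbor in sentence[max(0, i - window):i]:
            counts[(neighbor, word)] += 1
            counts[(word, neighbor)] += 1

print(counts.most_common(5))
```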
In EMBEDDED, the distance between the current word and its surrounding words represents their relation: the closer a word is to the current word, the more semantically similar it is. The distances between the surrounding words themselves are arbitrary.
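Closeness between word vectors is typically measured with cosine similarity, which is also the score gensim's most_similar reports. A short sketch, assuming plain numpy arrays as word vectors:

```python
# Cosine similarity: 1.0 means the vectors point in the same direction,
# values near 0 mean the words rarely share contexts.
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# With the gensim model from the earlier sketch, model["music"] returns such
# an array, so cosine_similarity(model["music"], model["guitar"]) is likely
# to be noticeably higher than cosine_similarity(model["music"], model["tax"]).
```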
If you want to read more about how the model was generated, you can find more in-depth information here.
As the space contains a dataset of 10 000 words, it also includes words that some users may find profane, vulgar or offensive. Some of these words or relations can be strongly tied to racism, homophobia or sexism. We have chosen not to censor or exclude these words, as we want to represent the original dataset.
The relations between words that EMBEDDED visualizes represent the machine learning model’s understanding of language, not ours.
The model that EMBEDDED is based on is an example of the machine learning models incorporated into many of the digital tools and services we use daily. Similar models are used to recommend products, and for text prediction, translation, and speech and sentiment analysis.
Reflecting upon the biases, norms and logic that these systems are built on is important, but doing so is often difficult because it requires specific technical knowledge. By allowing for a playful and intriguing interaction with a machine learning model, EMBEDDED makes that reflection more accessible. Using the tool does not require any prior experience with machine learning models.
Moving around in the space and observing the words, you might ask yourself: Why does the algorithm consider certain words to be similar to ‘female’? What about ‘male’? Which words appear when you type in a certain nationality? How does it interpret specific names? What might have caused certain relations?
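If you want to ask the same kinds of questions of a model programmatically, queries along these lines (again using the illustrative gensim model from the earlier sketches, not EMBEDDED's interface) make the learned relations, and with them the learned biases, visible:

```python
# Probe the neighborhoods the questions above refer to.
for probe in ("female", "male"):
    print(probe, [w for w, _ in model.most_similar(probe, topn=10,
                                                   restrict_vocab=10000)])

# Analogy-style queries ("man is to doctor as woman is to ?") expose relations
# the model has absorbed from its training data, including biased ones.
print(model.most_similar(positive=["doctor", "woman"], negative=["man"], topn=5))
```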
A project by Eirunn Kvalnes and Noah von Stietencron