Lettuce Evaluate Some Recipe Word Embeddings

A Present for Tasty Fans
BuzzFeed Tech is bringing some presents to Tasty’s second anniversary. Instead of showing up with the second-cheapest bottle of champagne available, we’ll be releasing the Tasty App, so our fans can enjoy a seamless Tasty cooking experience.
We want to bring the best possible Tasty experience to our loyal users, and that means helping them find recipes they’ll like more easily. Over the past two years, Tasty has produced more than 1,700 cooking videos, and sifting through them can be a chore when you’re looking for that perfect one-pan chicken dish. To improve that process for our fans, we will employ a variety of machine learning techniques to recommend recipes our users will love. But first, we needed an effective way of describing recipes so that even a computer can understand how delicious Avocado Carbonara is.
Enter Word2Vec by Mikolov et al.
(There are already some fantastic resources for learning about Word2Vec conveniently collected here. Plus, check out more papers by the creator Mikolov to learn more about the technique, and I will restrain myself from trotting out the olde king + woman = queen example.)
Why We Need Word Embeddings
Word2Vec, developed at Google, is a model used to learn vector representations of words, otherwise known as word embeddings. Why would we care about word embeddings when dealing with recipes? Well, we need some way to convert text and categorical data into numeric, machine-readable variables if we want to compare one recipe with another. Traditionally, this has been accomplished in a few ways. A popular technique is turning each potential category into a boolean variable, otherwise known as “dummy coding”. If we wanted to describe that a recipe has chicken in it, we would include a column for chicken with a 1 to indicate chicken was in the recipe. A fish recipe would have a 0 instead of a 1 in the chicken column. This technique works up to a point, but each new ingredient or attribute grows your matrix by another column, adding more and more sparsity. Additionally, creating these boolean variables can be cumbersome, as we have to make sure to map things like 2% Greek yogurt, Greek yogurt, and 2% yoghurt all back to the same representative variable.
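To make that concrete, here’s a minimal sketch of dummy coding with pandas; the recipes, ingredients, and column names are hypothetical and just for illustration:

import pandas as pd

# A few hypothetical recipes with their ingredient lists
recipes = pd.DataFrame({
    "recipe": ["one-pan chicken", "avocado carbonara", "salmon bowl"],
    "ingredients": [["chicken", "garlic"], ["avocado", "pasta", "eggs"], ["salmon", "rice"]],
})

# Dummy coding: one boolean column per ingredient, 1 if the recipe contains it
dummies = recipes["ingredients"].str.join("|").str.get_dummies()
print(pd.concat([recipes["recipe"], dummies], axis=1))

Every new ingredient adds another column to that table, which is exactly where the sparsity problem comes from.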
Another method is to encode your categorical variables as numbers. Unfortunately, this is arbitrary: who is to say “chicken” is 1 but “steak” is 7? You could end up with “tuna” being much closer numerically to “marshmallow” than to “salmon”, since the encoding doesn’t attempt to understand the relationships of the variables being encoded. Word embeddings solve these issues by limiting the number of required features to the number of dimensions in the vector and by retaining information about the relationships between words.
Training Word2Vec
Training a Word2Vec model requires phrases or sentences. In our case, instead of passing in ingredient lists, we’ll use our recipe preparation steps. This means we aren’t just crafting features based on a recipe’s ingredients but on its entire preparation process, which means we’ll end up with more information on the techniques and methods employed to make the magic happen.
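As a rough sketch of what that preprocessing might look like (the tokenizer and example steps here are hypothetical; gensim just needs a list of token lists):

import re

def tokenize(step):
    """Lowercase a preparation step and split it into simple word tokens."""
    return re.findall(r"[a-z0-9%]+", step.lower())

# Hypothetical preparation steps; gensim expects a list of token lists
steps = [
    "Preheat the oven to 350 degrees.",
    "Whisk the eggs with the parmesan and black pepper.",
    "Toss the pasta with the sauce and serve immediately.",
]
preparation_steps = [tokenize(step) for step in steps]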
While most of the code is cleaning and formatting the data, the actual training of the model is quite simple. Although Word2Vec is implemented in many languages, I decided to use gensim’s Python implementation.
from gensim.models import word2vec

# Set parameters
num_features = 70    # dimensionality of the word vectors
min_word_count = 5   # ignore words that appear fewer than 5 times
context = 10         # context window size
downsampling = 1e-3  # downsample very frequent words
num_workers = 4      # number of worker threads

# Initialize and train the model (sg=1 selects the skip-gram architecture)
model = word2vec.Word2Vec(preparation_steps, workers=num_workers,
                          size=num_features, min_count=min_word_count,
                          window=context, sample=downsampling, sg=1)
model.init_sims(replace=True)

model_name = "prep2vec"
model.save(model_name)
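When we want to query the trained model later, we can load it back from disk (a one-liner, assuming the same gensim version):

# Reload the trained model from disk
model = word2vec.Word2Vec.load("prep2vec")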
Evaluating Word2Vec
Visualizing more than three dimensions is hard, and we’ve just created 70-dimensional embeddings with Word2Vec. Luckily, when we want to visualize our high-dimensional word embeddings, we can employ a dimensionality reduction technique.
Below, we can see some of the vector embeddings for common ingredients projected onto two dimensions by t-SNE. t-SNE, or t-Distributed Stochastic Neighbor Embedding, is a dimensionality reduction method that models pairwise similarities as probability distributions, so the absolute positions and distances in the plot shouldn’t be read too literally. t-SNE plots can be difficult to interpret because the hyperparameter perplexity can drastically change the size of and distance between clusters. However, we aren’t trying to interpret clusters, but rather hoping to evaluate whether or not our model learned something useful about our recipes.
(See this post by Wattenberg for more on how to use t-SNE effectively.)
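As a rough sketch, a projection like this could be produced with scikit-learn; the ingredient words below are assumptions about what’s in our vocabulary, and the perplexity value is just an example:

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# A handful of ingredient words assumed to be in the trained vocabulary
ingredients = ["espresso", "coffee", "pasta", "linguine", "macaroni",
               "tequila", "vodka", "gin", "rosemary", "thyme", "sage"]
vectors = [model.wv[word] for word in ingredients]

# Project the 70-dimensional embeddings down to two dimensions
coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(vectors)

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), word in zip(coords, ingredients):
    plt.annotate(word, (x, y))
plt.show()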
Espresso and coffee are right next to each other in red. Fettuccini, macaroni, pasta, and linguine are also closely clustered in teal. Likewise, tequila, vodka, gin, and champagne are tightly knit together in purple. Rosemary, thyme, and sage cluster in green, narrowly escaping the perfect Simon and Garfunkel reference.
There may be some strange relationships in this plot, and every run of t-SNE will return a slightly different layout. That said, it does appear our model is learning quite a bit about the food space.
Another way to evaluate word embeddings is to check the similarities of words whose relationships we already understand well. Below we can see the vectors for torte and cake are very similar (similarity scores range from 0 to 1) but not so similar to salad (sadly). Likewise, chocolate is very similar to ganache, but not so much to guacamole.
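As a sketch, checking those pairs with gensim might look something like this (assuming all of these words made it into the vocabulary):

# Pairwise similarity checks against our trained word vectors
print(model.wv.similarity("torte", "cake"))
print(model.wv.similarity("torte", "salad"))
print(model.wv.similarity("chocolate", "ganache"))
print(model.wv.similarity("chocolate", "guacamole"))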


Creating Recipe2Vec
The beautiful thing about word embeddings is they are composable. Just like when you string words together to form a sentence, combining word embeddings produces a meaningful new embedding. We’ve created these ingredient and preparation embeddings, but we really want recipe embeddings. We can sum, average, or concatenate our word embeddings to create our recipe embeddings. Summing or averaging vectors like that may sound counterintuitive, as if we’re losing a lot of detail about our subjects, but in practice it’s been found to be very effective. Below we’ve plotted all our recipe embeddings after summing them up.
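Here’s a minimal sketch of how a single recipe vector might be composed by averaging, assuming the recipe’s preparation steps have already been tokenized:

import numpy as np

def recipe_embedding(tokens, model):
    """Average the word vectors of every token found in the vocabulary."""
    vectors = [model.wv[token] for token in tokens if token in model.wv]
    return np.mean(vectors, axis=0)

# Hypothetical tokenized preparation steps for one recipe
recipe_tokens = ["boil", "pasta", "whisk", "eggs", "mash", "avocado", "toss", "serve"]
recipe_vector = recipe_embedding(recipe_tokens, model)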

Bon Appétit
Now that we have recipe embeddings, we’ll use them as features in content-based recommendation systems, related content modules, and user preference clustering. A content-based recommendation system will let us make personalized recommendations for users based on their past views and likes. Here’s a sneak peek at what personalized recipe recommendations might look like in the Tasty App soon!
Happy Cooking!

To keep in touch with us here and find out what’s going on at BuzzFeed Tech, be sure to follow us on Twitter @BuzzFeedExp where a member of our Tech team takes over the handle for a week!