Visualizing embeddings


I’ve pushed several files to the class repo, including two programs to help you visualize the embeddings in your corpus: interact_cooccur.py, which we played with in class on Tuesday, and visualize_cooccur.py, which can produce 2-d (and even 3-d) plots like this one, showing the embeddings in a reduced-dimensional space:





The next homework assignment (coming soon) will have you running and configuring these programs to help you analyze your own corpus’s embeddings. Stay tuned for that.
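The course’s visualize_cooccur.py isn’t reproduced in this post. Below is a minimal sketch of the general idea, under stated assumptions: it builds raw co-occurrence count vectors from a toy corpus and projects them to 2-d with PCA, computed via numpy’s SVD. The helper names (cooccurrence_vectors, project_pca), the window size, and the choice of PCA are all hypothetical. The actual class programs may weight the counts differently or use another dimensionality-reduction method.

```python
import numpy as np

# Hypothetical helpers for illustration; NOT the course's visualize_cooccur.py.

def cooccurrence_vectors(sentences, window=2):
    """Count how often each word appears within `window` positions of each
    other word. Row i of the returned matrix is word i's raw-count embedding."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    counts[index[w], index[s[j]]] += 1
    return vocab, counts

def project_pca(X, dims=2):
    """Project the rows of X onto their top `dims` principal components."""
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dims].T

# Toy corpus, made up for illustration.
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vocab, counts = cooccurrence_vectors(sentences, window=1)
points = project_pca(counts, dims=2)
for word, (x, y) in zip(vocab, points):
    print(f"{word:>4}: ({x:+.2f}, {y:+.2f})")
```

In this toy corpus, "cat" and "dog" occur in identical contexts. They therefore get identical count vectors and land on the same 2-d point, which is the distributional intuition these plots make visible. To draw the plot itself, you could pass the two columns of `points` to matplotlib’s `scatter` and label each point with `annotate`. Passing `dims=3` gives the coordinates for a 3-d plot.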

Also, I have posted the code we used to play around with standard pre-trained embedding collections (like word2vec and GloVe). You’ll need to first run download_embeddings.py (while connected to a good network). Then run sim_emb_play.py to find the similarity of a pair of words, or closest_emb_play.py to find the top-10 closest embeddings to a given word.
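The two “play” scripts aren’t reproduced here either. As a hedged illustration of what “similarity” and “closest” usually mean for embeddings, the sketch below scores a pair of words by cosine similarity and then ranks every other word by that score to get the top k. The toy three-dimensional vectors and the helper names (cosine_similarity, closest, EMBEDDINGS) are made up for illustration. Real word2vec or GloVe vectors typically have 50 to 300 dimensions, and the course scripts may compute things differently.

```python
import numpy as np

# Hypothetical toy embeddings, made up for illustration; not real word2vec/GloVe vectors.
EMBEDDINGS = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.82, 0.15]),
    "apple": np.array([0.10, 0.20, 0.95]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def closest(word, embeddings, k=10):
    """Return up to k (other_word, similarity) pairs, most similar first,
    excluding `word` itself (hypothetical helper)."""
    target = embeddings[word]
    scored = [(other, cosine_similarity(target, vec))
              for other, vec in embeddings.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"]))
print(closest("king", EMBEDDINGS))
```

Scoring every word in a Python loop works for a toy vocabulary. For a full pre-trained collection, it is much faster to normalize all the vectors once and compute every similarity with a single matrix-vector product. If the downloaded vectors are loaded with gensim, its `KeyedVectors.similarity()` and `most_similar(topn=10)` methods cover both tasks.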



DATA 470D3 – Natural Language Processing

stephendavies.org