- Create a dictionary called bigrams. Its keys are (unique) words, and
each of its values is itself a dictionary of words and counts. Each key of
bigrams is the first word in a bigram. Its value will be a
dictionary whose keys are the second words of bigrams that start with
that first word, and whose
values are the number of times that bigram appeared in the corpus.
- Go through the word pairs in the list, in order. (The only way I
know how to do this is with the "for i in range(len(...))"
pattern instead of the "for word in ..." pattern, using "i" and "i+1"
inside the loop as indices into the list to
retrieve the first and second words of each pair.) Warning: make sure your
loop doesn't go one-too-far! (If there are 1000 words, there are only 999
pairs of words.)
- If you haven't yet seen a bigram whose first word is the first word of
this pair, enter a new dictionary into bigrams
under the key of that first word. This new dictionary should
have just one key/value pair at the moment: the second word
in the bigram, with the value 1.
If you do already have an entry in bigrams for the first
word in the pair, update that dictionary to create or increment the value for
the second word in the pair, just like you did when you encountered
a word in the unigram model.
- Don't forget to add bigrams with <s> and
</s>, as appropriate.
{'i': {'can': 1, 'think': 1}, 'think': {'i': 1},
 '<s>': {'i': 1}, 'can': {'</s>': 1}}
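
The steps above can be sketched roughly as follows. This is only one possible shape for the loop, not the required solution. The function name count_bigrams and the token list are assumptions made for illustration; the tokens were chosen to be consistent with the example dictionary above, with <s> and </s> already added to the list.

```python
def count_bigrams(words):
    """Return a dictionary of dictionaries holding bigram counts.

    `words` is assumed to be a list of tokens that already includes
    the <s> and </s> markers.
    """
    bigrams = {}
    # Stop one short of the end: n words yield only n - 1 pairs.
    for i in range(len(words) - 1):
        first = words[i]
        second = words[i + 1]
        if first not in bigrams:
            # First time this word starts a bigram: new inner dictionary.
            bigrams[first] = {second: 1}
        elif second not in bigrams[first]:
            # Known first word, new second word: create the count.
            bigrams[first][second] = 1
        else:
            # Bigram seen before: increment the count.
            bigrams[first][second] += 1
    return bigrams


# Hypothetical token list, assumed for illustration.
tokens = ["<s>", "i", "think", "i", "can", "</s>"]
example = count_bigrams(tokens)
print(example)
```

Note that range(len(words) - 1) is what keeps "i+1" from running past the end of the list. A list with fewer than two words simply produces an empty dictionary.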