<:> N-gram hierarchy
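To obtain n-gram probabilities in a way that exploits the chain
rule, work through the sequence, building up an n-gram tree
structure that records how often each n-gram is observed.
Once the whole sequence has been processed, convert the observed
frequencies into the appropriate conditional probabilities, that
is, the probability of each symbol given the symbols that precede
it.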
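
The procedure can be sketched roughly as follows. This is only an illustration under stated assumptions, not necessarily the intended implementation: the tree is taken to be a trie of counts keyed by symbol, the maximum order `n` is fixed in advance, and the names `Node`, `build_tree` and `conditional_prob` are hypothetical. A conditional probability is estimated as a child's count divided by the total count of its siblings, so the estimates for each context sum to one.

```python
class Node:
    """One node of the n-gram tree: a count plus children keyed by symbol."""
    def __init__(self):
        self.count = 0
        self.children = {}


def build_tree(seq, n):
    """Walk the sequence once, recording every k-gram (k <= n) that starts
    at each position as a path of counts from the root."""
    root = Node()
    for i in range(len(seq)):
        node = root
        for sym in seq[i:i + n]:
            node = node.children.setdefault(sym, Node())
            node.count += 1
    return root


def conditional_prob(root, context, sym):
    """Estimate P(sym | context) from the counts stored in the tree.
    Returns 0.0 if the context or the symbol was never observed."""
    node = root
    for s in context:
        node = node.children.get(s)
        if node is None:
            return 0.0
    child = node.children.get(sym)
    if child is None:
        return 0.0
    # Normalise over the symbols actually seen after this context.
    total = sum(c.count for c in node.children.values())
    return child.count / total


tree = build_tree("abac", 2)
p_a = conditional_prob(tree, "", "a")           # unigram P(a)
p_b_given_a = conditional_prob(tree, "a", "b")  # bigram P(b | a)
p_ab = p_a * p_b_given_a                        # chain rule: P(a, b)
```

Because each level of the tree conditions on the path above it, the probability of a whole n-gram is the product of the conditional probabilities along its path, which is the chain rule the text refers to.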