It finds two probabilities:
1) If the first word of the sentence appears in the training data, it finds the probability of that word having each tag. For example, if the first word is 'The', the most probable tag for it would be an article ('DD'), as seen from the training data.
2) How likely each tag is to appear on the first word of a sentence. It looks through all sentences in the training data and counts how often each tag occurs on the first word. For example, for a specific training data set, the most common first tag might be an article.
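A minimal sketch of how the two probabilities could be estimated by counting, assuming the training data is a list of sentences where each sentence is a list of `(word, tag)` pairs. The function names `word_tag_probs` and `first_tag_probs` are hypothetical, and words are matched exactly as written (no lowercasing).

```python
from collections import Counter, defaultdict

# Assumed (hypothetical) training format: list of sentences, each a list of (word, tag) pairs.

def word_tag_probs(sentences):
    """Probability 1: for each word, the fraction of its occurrences carrying each tag."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {
        word: {tag: n / sum(tags.values()) for tag, n in tags.items()}
        for word, tags in counts.items()
    }


def first_tag_probs(sentences):
    """Probability 2: the distribution of tags on the first word of each sentence."""
    counts = Counter(sentence[0][1] for sentence in sentences if sentence)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


train = [
    [("The", "DD"), ("dog", "NN"), ("runs", "VB")],
    [("The", "DD"), ("cat", "NN")],
    [("Dogs", "NN"), ("run", "VB")],
]
print(word_tag_probs(train)["The"])  # {'DD': 1.0}
print(first_tag_probs(train))        # 'DD' on 2 of 3 first words, 'NN' on 1 of 3
```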
For each candidate tag, it multiplies these two probabilities and picks the tag with the highest product as the most probable tag for the first word.
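The combination step can be sketched as below. This is an illustration, not the original code: `best_first_tag` is a hypothetical name, the two probabilities are assumed to be passed in as plain dictionaries, and the fallback for a word missing from the training data (using the first-word tag distribution alone) is an assumption, since the description only covers words that were seen.

```python
def best_first_tag(word, word_tag_p, first_tag_p):
    """Return the tag maximising P(tag | word) * P(tag on first word)."""
    if word in word_tag_p:
        # Only tags seen with this word can get a non-zero product.
        scores = {
            tag: p * first_tag_p.get(tag, 0.0)
            for tag, p in word_tag_p[word].items()
        }
    else:
        # Assumption: unseen word, so rely on the first-word tag distribution alone.
        scores = dict(first_tag_p)
    # max() keeps the first maximal key, so ties go to the first tag encountered.
    return max(scores, key=scores.get)


word_tag_p = {"The": {"DD": 0.9, "NN": 0.1}, "run": {"VB": 0.7, "NN": 0.3}}
first_tag_p = {"DD": 0.6, "NN": 0.3, "VB": 0.1}

print(best_first_tag("The", word_tag_p, first_tag_p))    # DD (0.54 vs 0.03)
print(best_first_tag("run", word_tag_p, first_tag_p))    # NN (0.09 vs 0.07)
print(best_first_tag("Zebra", word_tag_p, first_tag_p))  # DD (fallback)
```

The "run" case shows why the product matters: 'VB' is the more likely tag for the word on its own, but 'NN' is much more common at the start of a sentence, which tips the result.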