Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
Abstract:
Systems and processes are disclosed for predicting words using a categorical stem and suffix word n-gram language model. A word prediction includes determining a stem probability using a stem language model. The word prediction also includes determining a suffix probability using suffix language model decoupled from the stem model, in view of one or more stem categories. The word prediction also includes determine a probability of the stem belonging to the stem category. A joint probability is determined based on the foregoing, and one or more word predictions having sufficient likelihood. In this way, the categorical stem and suffix language model constraints predicted suffixes to those that would be grammatically valid with predicted stems, thereby producing word predictions with grammatically valid stem and suffix combinations.
Information query
Patent Agency Ranking
0/0