METHOD AND SYSTEM FOR UNSUPERVISED DISCOVERY OF UNIGRAMS IN SPEECH RECOGNITION SYSTEMS
Abstract:
A system and method of automatically discovering unigrams in a speech data element may include receiving a language model that includes a plurality of n-grams, where each n-gram includes one or more unigrams; applying an acoustic machine-learning (ML) model to one or more speech data elements to obtain a character distribution function; applying a greedy decoder to the character distribution function to predict an initial corpus of unigrams; filtering out one or more unigrams of the initial corpus to obtain a corpus of candidate unigrams, where the candidate unigrams are not included in the language model; analyzing the one or more speech data elements to extract at least one n-gram that includes a candidate unigram; and updating the language model to include the extracted at least one n-gram.
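The steps recited in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation, and it rests on several assumptions that the abstract does not state: the character distribution is taken to be a per-frame list of probabilities over a small alphabet, the greedy decoder is taken to be CTC-style (argmax per frame, collapse repeats, drop a blank symbol), the language model is reduced to a plain set of word tuples, and n-grams are extracted from the greedy transcripts themselves. The names `BLANK`, `greedy_decode`, `candidate_unigrams`, `extract_ngrams`, and `update_language_model` are hypothetical.

```python
BLANK = "_"  # hypothetical blank symbol; assumes a CTC-style acoustic model


def greedy_decode(char_probs, alphabet):
    """Take the most likely character per frame, collapse repeats, drop blanks."""
    best = [alphabet[max(range(len(frame)), key=frame.__getitem__)]
            for frame in char_probs]
    out, prev = [], None
    for ch in best:
        if ch != prev and ch != BLANK:
            out.append(ch)
        prev = ch
    return "".join(out)


def candidate_unigrams(transcripts, known_unigrams):
    """Keep only decoded words that the language model does not already contain."""
    return {w for t in transcripts for w in t.split() if w not in known_unigrams}


def extract_ngrams(transcripts, candidates, n=2):
    """Collect every n-gram (as a tuple of words) that contains a candidate unigram."""
    found = set()
    for t in transcripts:
        words = t.split()
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            if any(w in candidates for w in gram):
                found.add(gram)
    return found


def update_language_model(language_model, transcripts, n=2):
    """Return the language model extended with n-grams around newly found unigrams."""
    known = {w for gram in language_model for w in gram}
    candidates = candidate_unigrams(transcripts, known)
    return language_model | extract_ngrams(transcripts, candidates, n), candidates


# Usage with toy data: "zorp" is absent from the language model, so the
# bigrams surrounding it are added.
lm = {("hello",), ("world",), ("hello", "world")}
updated, new_words = update_language_model(lm, ["hello zorp world"])
```

A production system would more likely store n-gram probabilities (for example in an ARPA-format model) and might re-decode the audio with a stronger decoder before extracting n-grams; the set-based model here only shows the data flow.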