Abstract:
In a speech recognition platform, a masking unit 17 can be utilized to mask noisy content within an audio sample. By masking such noise in a dynamic but predictable manner, valid content can be preserved while largely overcoming the random and detrimental presence of noise. In one embodiment, speech recognition features are extracted pursuant to a hierarchical process that localizes, at least to some extent, some of the resultant features from other resultant features. As a result, noisy or otherwise unreliable information corresponding to the audio sample will not be leveraged unduly across the entire feature set. In another embodiment, an average energy value for processed samples is calculated with individual energy values that are downwardly weighted when such individual energy values are likely representative of noise.
Abstract:
A dictionary is comprised of a dendroid hierarchy of branches and nodes, wherein each node represents no more than one symbol (which symbol is to be converted to a corresponding sound) and wherein each such symbol as is represented at a given node has only one corresponding sound associated with that symbol at that node. In addition, many of the branches include a plurality of nodes representing a string of the symbols in a particular sequence. The dictionary is used to translate an input comprising a given integral sequence of the symbols into a corresponding integral sequence of sounds. This permits both method and apparatus to convert, for example, text to representative phonemes. Such phonemes can be used, amongst other purposes, to support synthesized speech production.