Abstract:
The present invention is a novel system and method for overcoming shortcomings of existing speech-to-text systems that relate to the processing of unrecognized words. On encountering words that it cannot decipher, the preferred embodiment of the present invention analyzes the syllables that make up these words and translates them into the appropriate phonetic representations. The method described by the present invention ensures that words which were not uttered clearly are not lost or distorted in the process of transcription. Additionally, it allows the use of smaller and simpler speech-to-text applications, which are suitable for mobile devices with limited storage and processing resources, since these applications may use smaller dictionaries and may be designed to identify only commonly used words. Also disclosed are several examples of possible implementations of the described system and method.
Abstract:
A method for verifying at least one sound sample to be used in generating a sound detection model in an electronic device includes receiving a first sound sample; extracting a first acoustic feature from the first sound sample; receiving a second sound sample; extracting a second acoustic feature from the second sound sample; and determining whether the second acoustic feature is similar to the first acoustic feature.
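The final determining step can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how acoustic features are represented or compared, so the sketch assumes each feature is a fixed-length numeric vector, uses cosine similarity as the measure, and picks an arbitrary threshold of 0.9. The names `cosine_similarity` and `is_similar` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (assumed representation)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_similar(first: Sequence[float], second: Sequence[float],
               threshold: float = 0.9) -> bool:
    """Decide whether the second acoustic feature is similar to the first.

    The threshold is an assumed value, not one given in the abstract.
    """
    return cosine_similarity(first, second) >= threshold


first_feature = [1.0, 2.0, 3.0]    # feature from the first sound sample
second_feature = [1.1, 2.0, 2.9]   # feature from the second sound sample
print(is_similar(first_feature, second_feature))
```

A sample whose feature falls below the threshold could then be rejected before it is used to generate the sound detection model.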
Abstract:
Speech is recognized using ACF factors extracted from running autocorrelation functions (ACFs) calculated from the speech. The extracted ACF factors are Wφ(0) (the width of the ACF amplitude around the zero-delay origin), Wφ(0)max (the maximum value of Wφ(0)), τ1 (the pitch period), φ1 (the pitch strength), and Δφ1/Δt (the rate of change of the pitch strength). Syllables in the speech are identified by comparing the ACF factors with templates stored in a database.
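Two of the listed factors, the pitch period and the pitch strength, can be illustrated for a single analysis frame. This is a minimal sketch under assumptions not stated in the abstract: the ACF is normalized by its zero-delay value, the pitch period is taken as the delay of the largest ACF value inside an assumed lag search range (a simplification of locating the first major peak), and the pitch strength is the ACF amplitude at that delay. The function names and the frame and lag parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalized_acf(frame: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelation of one frame for lags 0..max_lag, normalized so acf[0] == 1."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return [0.0] * (max_lag + 1)
    return [
        sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy
        for lag in range(max_lag + 1)
    ]


def pitch_factors(frame: Sequence[float], sample_rate: int,
                  min_lag: int, max_lag: int) -> Tuple[float, float]:
    """Return (pitch period in seconds, pitch strength) for one frame.

    The lag search range is an assumption; it must satisfy max_lag < len(frame).
    """
    acf = normalized_acf(frame, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])
    return best_lag / sample_rate, acf[best_lag]


# Synthetic 100 Hz tone sampled at 8 kHz, standing in for a voiced speech frame.
rate = 8000
frame = [math.sin(2 * math.pi * 100 * n / rate) for n in range(800)]
period, strength = pitch_factors(frame, rate, min_lag=20, max_lag=200)
print(period, strength)
```

Repeating the calculation on successive overlapping frames gives a running ACF, from which the rate of change of the pitch strength could be estimated as a difference between consecutive frames.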
Abstract:
The present invention relates to a method and apparatus for producing script data for audio data. The method for producing the script data includes: obtaining total time information for an actual sound section of the audio data; obtaining total syllable count information for the sound section on the basis of text data; calculating unit syllable time information, corresponding to one syllable, on the basis of the total time information and the total syllable count information; obtaining predicted playback position information for the corresponding sound section of the audio data on the basis of the unit syllable time information and the sound section occupied by the word or paragraph in the text data for which prediction is required; and recording, as actual playback position information, the mute section closest to the predicted playback position among the mute sections of the audio data located before or after the predicted playback position.
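The arithmetic behind these steps can be sketched as follows. The sketch assumes that times are in seconds, that each mute section is represented by a single time stamp, and that the playback position of a word is estimated from the number of syllables preceding it; none of these details are given in the abstract, and all function names are hypothetical.

```python
from typing import Sequence


def unit_syllable_time(total_sound_seconds: float, total_syllables: int) -> float:
    """Average time per syllable: total sound time divided by total syllable count."""
    if total_syllables <= 0:
        raise ValueError("total_syllables must be positive")
    return total_sound_seconds / total_syllables


def predicted_position(section_start: float, syllables_before: int,
                       unit_time: float) -> float:
    """Estimate where a word starts from the syllables that precede it (assumed model)."""
    return section_start + syllables_before * unit_time


def snap_to_mute(predicted: float, mute_positions: Sequence[float]) -> float:
    """Pick the mute section closest to the prediction, whether before or after it."""
    return min(mute_positions, key=lambda m: abs(m - predicted))


unit = unit_syllable_time(12.0, 48)          # 12 s of sound, 48 syllables
guess = predicted_position(0.0, 10, unit)    # word preceded by 10 syllables
actual = snap_to_mute(guess, [1.0, 2.7, 4.0])
print(unit, guess, actual)
```

Snapping to a mute section corrects the error that the uniform per-syllable estimate accumulates, since word boundaries tend to coincide with short silences.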
Abstract:
The present invention, a method for speech recognition, comprises receiving a digital representation of speech, grouping the digital representation of speech into subsets, mapping each subset of the digital representation of speech into a character representation of speech (38), grouping the character representations of speech into words, determining the number of syllables in the digital representation of each word, and searching a library (44) containing words arranged according to the number of syllables and finding at least one closest match to each word.
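The library search in the last step can be sketched as follows. The abstract determines the syllable count from the digital representation of the word; as a stand-in, this sketch counts vowel groups in the spelling, which is a rough heuristic and an assumption. The closest match is found with `difflib.get_close_matches` from the Python standard library, which is likewise only one possible similarity measure. The helper names are hypothetical.

```python
import difflib
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of vowel groups in the spelling (assumed heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def build_library(words: Iterable[str]) -> Dict[int, List[str]]:
    """Arrange the library's words according to their number of syllables."""
    library: Dict[int, List[str]] = defaultdict(list)
    for word in words:
        library[count_syllables(word)].append(word)
    return library


def closest_matches(candidate: str, library: Dict[int, List[str]],
                    n: int = 1) -> List[str]:
    """Search only the words with the same syllable count as the candidate."""
    bucket = library.get(count_syllables(candidate), [])
    # cutoff=0.0 returns the best n entries even when the match is weak.
    return difflib.get_close_matches(candidate, bucket, n=n, cutoff=0.0)


library = build_library(["speech", "recognition", "syllable", "library", "word"])
print(closest_matches("sylable", library))
```

Restricting the search to one syllable-count bucket keeps the comparison set small, which is the practical benefit of arranging the library this way.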
Abstract:
The present invention discloses a portable digital mobile communication apparatus with a voice operation system and a method of controlling voice operation. When speech is recognized, the feature vector sequences of the speech are quantized and encoded; during decoding, the observation probability of each speech feature code on the search path is looked up directly from a probability table. With the present invention, full-syllable speech recognition can be achieved on a mobile telephone without the need for training, as can speech input of Chinese characters and full-syllable speech prompting. The system comprises semantic analysis, dialogue management, and language generation modules, and it can also process complicated dialogue procedures and feed flexible prompting messages back to the user. The present invention also allows the user to customize speech commands and prompting content.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating word pronunciations. One of the methods includes determining, by one or more computers, spelling data that indicates the spelling of a word, providing the spelling data as input to a trained recurrent neural network, the trained recurrent neural network being trained to indicate characteristics of word pronunciations based at least on data indicating the spelling of words, receiving output indicating a stress pattern for pronunciation of the word generated by the trained recurrent neural network in response to providing the spelling data as input, using the output of the trained recurrent neural network to generate pronunciation data indicating the stress pattern for a pronunciation of the word, and providing, by the one or more computers, the pronunciation data to a text-to-speech system or an automatic speech recognition system.
Abstract:
A method of recognizing speech. The method includes the steps of obtaining an auditory signal; processing the auditory signal into a plurality of frequency components; processing the plurality of frequency components using a plurality of feature detectors, each feature detector producing a feature detector response; generating a spike for each instance in which a feature detector response identifies a characteristic auditory feature to produce a test spike pattern for the auditory signal; and comparing the test spike pattern to a plurality of predetermined spike patterns corresponding to a plurality of speech elements to determine whether the auditory signal includes one of the plurality of speech elements.