Abstract:
A method and system are provided by which a wireless mobile device takes a vocally entered query and transmits it in a text message format over a wireless network to a search engine; receives search results based on the query from the search engine over the wireless network; and displays the search results.
Abstract:
One aspect of the invention involves word recognition that uses scrollable choice lists in which choices are listed in character-order. Another aspect relates to a scrollable, visually-displayed word recognition choice list, where the recognition candidates on the choice list are each associated with a choice-selecting symbol the user can use to select a desired recognition candidate by pressing an associated button, and where the same choice-selecting symbol is used for different choices displayed on the display at different times as a result of scrolling. Another aspect of the invention relates to providing a choice list of best scoring characters for a particular character position in the spelling of a filter that is used to filter word recognition. Another aspect of the invention relates to a choice list used in word recognition in which the choice list can be scrolled horizontally.
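The reuse of choice-selecting symbols across scroll positions can be illustrated with a minimal Python sketch. The class name `ChoiceList`, the window size, and the use of digit symbols are assumptions made for illustration only; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class ChoiceList:
    """Hypothetical scrollable choice list (illustrative, not from the source)."""
    candidates: list
    page_size: int = 3   # assumed number of choices visible at once
    offset: int = 0      # index of the first visible choice

    def __post_init__(self):
        # List the recognition candidates in character (alphabetical) order.
        self.candidates = sorted(self.candidates)

    def visible(self):
        """Return (symbol, candidate) pairs for the choices currently shown."""
        window = self.candidates[self.offset:self.offset + self.page_size]
        # The symbols "1", "2", ... are reused for whatever is on screen.
        return [(str(i + 1), word) for i, word in enumerate(window)]

    def scroll(self, delta):
        """Move the window by delta entries, clamped to the list bounds."""
        max_offset = max(0, len(self.candidates) - self.page_size)
        self.offset = min(max(self.offset + delta, 0), max_offset)

    def select(self, symbol):
        """Return the candidate currently associated with the pressed symbol."""
        return dict(self.visible())[symbol]


choices = ChoiceList(["their", "there", "the", "then", "them"])
first = choices.select("1")    # symbol "1" before scrolling
choices.scroll(2)
second = choices.select("1")   # same symbol, different candidate after scrolling
```

In this sketch the symbol is bound to a screen position rather than to a word, which is one simple way to let the same button select different candidates at different times.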
Abstract:
A method for speech recognition. The method uses a single pronunciation estimator to train acoustic phoneme models and recognize utterances from multiple languages. The method includes accepting text spellings of training words in a plurality of sets of training words, each set corresponding to a different one of a plurality of languages. The method also includes, for each of the sets of training words in the plurality, receiving pronunciations for the training words in the set, the pronunciations being characteristic of native speakers of the language of the set, the pronunciations also being in terms of subword units at least some of which are common to two or more of the languages. The method also includes training a single pronunciation estimator using data comprising the text spellings and the pronunciations of the training words.
Abstract:
A method implemented on a mobile device that includes speech recognition functionality involves: receiving an utterance that includes a search request from a user of the device; recognizing that the utterance includes a search request; sending a representation of the search request to a remote server over a wireless data connection; receiving information over the wireless data connection that is responsive to the search request; presenting the information on the mobile device; receiving an input from the user selecting an item present in the received information, the item identifying a remote resource; using the selected item to connect to the remote resource, the connection to the remote resource not involving the remote server; and sending to the remote server an indication that a connection was made to the resource identified by the selected item. The method further involves storing a log of the user's connection to remote resources and sending the log to the server.
Abstract:
In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method.
Abstract:
A method of extracting a subset of speech units from a larger set of speech units for use by a speech synthesizer in synthesizing speech, wherein the speech units are stored in a compressed encoded representation that was generated by a codec, the method comprising: selecting members of the subset of speech units based on an overall cost associated with using the speech synthesizer to synthesize a test set of speech, wherein the overall cost includes at least one error introduced by using the codec to decode the stored representations of the speech units; and storing the selected subset of speech units on a speech-enabled device.
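As a rough illustration of selecting units by an overall cost that includes codec error, the following Python sketch uses greedy forward selection. The abstract does not name a search strategy, so the greedy loop, the additive cost model, and all names (`overall_cost`, `select_units`, `match_cost`, `codec_error`) are assumptions.

```python
def overall_cost(subset, test_targets, match_cost, codec_error):
    """Total cost of synthesizing the test set with the given subset.

    Assumed cost model: each test target uses the cheapest unit in the
    subset, and a unit's cost is its match cost plus its codec decode error.
    """
    return sum(
        min(match_cost[(target, unit)] + codec_error[unit] for unit in subset)
        for target in test_targets
    )


def select_units(units, test_targets, match_cost, codec_error, k):
    """Greedily pick up to k units that minimize the overall cost."""
    selected = []
    remaining = list(units)
    while remaining and len(selected) < k:
        best = min(
            remaining,
            key=lambda u: overall_cost(
                selected + [u], test_targets, match_cost, codec_error
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Illustrative data: unit "a" matches target "t1" best, but decodes poorly.
units = ["a", "b", "c"]
test_targets = ["t1", "t2"]
match_cost = {
    ("t1", "a"): 1.0, ("t1", "b"): 1.5, ("t1", "c"): 5.0,
    ("t2", "a"): 5.0, ("t2", "b"): 4.0, ("t2", "c"): 1.0,
}
codec_error = {"a": 2.0, "b": 0.1, "c": 0.1}

subset = select_units(units, test_targets, match_cost, codec_error, k=2)
```

With these made-up numbers, unit `a` is left out even though it has the lowest match cost for `t1`, because its codec error outweighs that advantage.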
Abstract:
A method of operating a device that includes speech recognition capabilities includes implementing on the device a plurality of user interfaces, wherein at least one of said user interfaces is a voice interface. The method also includes launching a first application and, as part of launching the first application, launching a second application, the second application optionally presenting to a user at least one query using the voice interface and populating an address field in the first application in response to the query using the speech recognition capabilities. The second application is launched either simultaneously with or subsequent to the launching of the first application. Populating the address field comprises accessing address information from a plurality of databases resident in the device.
Abstract:
Large vocabulary speech recognition can automatically turn recognition off in one or more ways. A user command can turn on recognition that is automatically turned off after the next end of utterance. A plurality of buttons can each be associated with a different speech mode and the touch of a given button can turn on, and then automatically turn off, the given button's associated speech recognition mode. These selectable modes can include large vocabulary and alphabetic entry modes, or continuous and discrete modes. A first user input can start recognition that allows a sequence of vocabulary words to be recognized and a second user input can start recognition that turns off after one word has been recognized. A first user input can start recognition that allows a sequence of utterances to be recognized and a second user input can start recognition that allows only a single utterance to be recognized.
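The button-triggered modes that switch themselves off can be pictured as a small state machine. The Python sketch below is only an illustration under assumed names (`RecognitionController`, `press`, `on_end_of_utterance`); it models the single-utterance versus multi-utterance behavior and nothing about the recognizer itself.

```python
class RecognitionController:
    """Hypothetical controller that tracks which recognition mode is on."""

    def __init__(self):
        self.active_mode = None    # None means recognition is off
        self.single_shot = False   # turn off after the next utterance?

    def press(self, mode, single_shot):
        """A button press turns on the mode associated with that button."""
        self.active_mode = mode
        self.single_shot = single_shot

    def on_end_of_utterance(self, text):
        """Handle a finished utterance; return (mode, text) or None if off."""
        if self.active_mode is None:
            return None  # recognition is off, so the utterance is ignored
        result = (self.active_mode, text)
        if self.single_shot:
            # Automatically turn recognition off after this utterance.
            self.active_mode = None
        return result


controller = RecognitionController()
controller.press("alphabetic", single_shot=True)
one = controller.on_end_of_utterance("b")         # recognized, then turned off
ignored = controller.on_end_of_utterance("c")     # recognition is already off

controller.press("large_vocabulary", single_shot=False)
two = controller.on_end_of_utterance("hello world")
three = controller.on_end_of_utterance("second utterance")  # still on
```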
Abstract:
A method of constructing a list of alternate transcripts from a recognized transcript includes generating a list of close call records, matching partial sub-histories from the recognized transcript with one of the history pairs stored in each of the records, and substituting the other of the history pairs for the partial sub-history of the recognized transcript. A close call record is generated each time a pair of partial hypotheses attempts to seed a common word. Each close call record includes history information and scoring information associated with a particular pair of partial hypotheses seeding a common word. Alternate transcripts are constructed by substituting close call histories for partial histories of the recognized transcript, and also by substituting close call histories for partial histories of other alternate transcripts.
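The substitution step can be sketched in Python as follows. This is a simplified illustration, not the patented procedure: histories are modeled as contiguous word tuples, scoring information is stored but not used for ranking, and the names `CloseCall` and `alternates` along with the `max_alternates` cap are assumptions.

```python
from typing import NamedTuple


class CloseCall(NamedTuple):
    """Hypothetical close call record: two competing histories and a score gap."""
    first: tuple
    second: tuple
    score_gap: float


def alternates(recognized, records, max_alternates=10):
    """Build alternate transcripts by swapping one history of a pair for the other.

    Substitutions are applied to the recognized transcript and then to the
    alternates already produced, breadth first.
    """
    start = tuple(recognized)
    seen = {start}
    queue = [start]
    results = []
    while queue:
        current = queue.pop(0)
        for rec in records:
            # Either history of the pair may match; substitute the other one.
            for src, dst in ((rec.first, rec.second), (rec.second, rec.first)):
                n = len(src)
                for i in range(len(current) - n + 1):
                    if current[i:i + n] != src:
                        continue
                    alt = current[:i] + dst + current[i + n:]
                    if alt in seen:
                        continue
                    seen.add(alt)
                    results.append(alt)
                    queue.append(alt)
                    if len(results) >= max_alternates:
                        return results
    return results


records = [
    CloseCall(("recognize", "speech"), ("wreck", "a", "nice", "beach"), 1.5),
    CloseCall(("today",), ("to", "day"), 0.7),
]
alts = alternates(("recognize", "speech", "today"), records)
```

Here the third alternate is produced by applying a close call substitution to an earlier alternate rather than to the recognized transcript, mirroring the last sentence of the abstract.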
Abstract:
A method implemented on a mobile device that includes speech recognition functionality involves presenting to a user of the mobile device a voice-control interface that supports two types of commands at a common level of the interface, the two types of commands including a first type and a second type, the first type being command and control commands and the second type being search request commands. The method further involves: receiving an utterance from the user that corresponds to a command of either of the first type or the second type; recognizing the utterance; if the received utterance is a command of the first type, performing a corresponding command and control function; and if the received utterance is a command of the second type, generating a representation of a corresponding search request and then using the representation to request a search that is responsive to the search request.
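A minimal Python sketch of handling both command types at one level of the interface might look like this. The command phrases, the search prefixes, and the dictionary used as the "representation" of a search request are all hypothetical; the recognizer is assumed to have already produced text.

```python
# Hypothetical command-and-control phrases mapped to device actions.
COMMAND_ACTIONS = {
    "open calendar": lambda: "calendar opened",
    "call voicemail": lambda: "dialing voicemail",
}

# Hypothetical phrases that mark an utterance as a search request.
SEARCH_PREFIXES = ("search for ", "find ")


def handle_utterance(recognized_text):
    """Dispatch a recognized utterance to a command or a search request."""
    text = recognized_text.strip().lower()

    # First type: command and control.
    if text in COMMAND_ACTIONS:
        return ("command", COMMAND_ACTIONS[text]())

    # Second type: search request. Build a representation that could be
    # handed to a search service.
    for prefix in SEARCH_PREFIXES:
        if text.startswith(prefix):
            return ("search", {"query": text[len(prefix):]})

    return ("unrecognized", text)


command_result = handle_utterance("Open Calendar")
search_result = handle_utterance("search for pizza near me")
```

Both kinds of utterance go through the same entry point, so the user does not have to switch modes before speaking a search request.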