Abstract:
An automatic speech recognition engine may generate text or tokens that correspond to audio data. For example, the automatic speech recognition engine may generate first text or first speech tokens corresponding to a first portion of audio data. The automatic speech recognition engine may further generate second text or second speech tokens that correspond to the first portion of the audio data and a second portion of the audio data. The text or speech tokens generated by the automatic speech recognition engine may be provided to a device for presentation thereon. In some embodiments, the automatic speech recognition engine generates the second text or second speech tokens substantially while the first text or first speech tokens are presented on the device.
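The incremental behavior described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: `fake_recognize` is a hypothetical stand-in for an ASR engine, and each "audio portion" is modeled as an already-transcribed string so the example stays self-contained. The point shown is only that the second result covers both the first and the second portion.

```python
from typing import Iterator, List, Sequence


def fake_recognize(portions: Sequence[str]) -> str:
    """Hypothetical stand-in for an ASR engine.

    Assumption: each audio portion is represented by the text it contains,
    so "recognition" is just joining the portions received so far.
    """
    return " ".join(portions)


def incremental_results(portions: Sequence[str]) -> Iterator[str]:
    """Yield one result per received portion; each covers all audio so far."""
    received: List[str] = []
    for portion in portions:
        received.append(portion)
        # The result for portion N is regenerated from portions 1..N,
        # so later results can supersede earlier ones on the device.
        yield fake_recognize(received)


results = list(incremental_results(["turn on", "the lights"]))
for text in results:
    print(text)
```

In a real system the first result would be sent to the device for display while the second is still being computed; this sketch runs the steps sequentially for clarity.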
Abstract:
Provided are systems and methods for using hierarchical networks for recognition, such as speech recognition. Conventional automatic recognition systems may not be both efficient and flexible. Recognition systems are disclosed that may achieve efficiency and flexibility by employing hierarchical networks, prefix consolidation of networks, and future consolidation of networks. The disclosed networks may be associated with a network model and the associated network model may be modified during recognition to achieve greater flexibility.
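The abstract does not define prefix consolidation. One common reading, assumed here, is that entries sharing an initial sequence of units share the corresponding arcs in the network. The sketch below builds such a prefix tree over a toy lexicon; the lexicon, the unit symbols, and the function names are all hypothetical.

```python
from typing import Dict, List


def build_prefix_tree(lexicon: Dict[str, List[str]]) -> dict:
    """Merge unit sequences that share a prefix into a single tree."""
    root: dict = {"children": {}, "words": []}
    for word, units in lexicon.items():
        node = root
        for unit in units:
            # Reuse an existing arc when the prefix was already inserted.
            node = node["children"].setdefault(
                unit, {"children": {}, "words": []}
            )
        node["words"].append(word)
    return root


def count_arcs(node: dict) -> int:
    """Count arcs in the consolidated tree."""
    return sum(1 + count_arcs(child) for child in node["children"].values())


# Toy lexicon (assumed for illustration only).
lexicon = {
    "cat": ["k", "ae", "t"],
    "cap": ["k", "ae", "p"],
    "can": ["k", "ae", "n"],
}
tree = build_prefix_tree(lexicon)
flat_arcs = sum(len(units) for units in lexicon.values())
shared_arcs = count_arcs(tree)
print(flat_arcs, shared_arcs)
```

With this lexicon the three entries share the arcs for `k` and `ae`, so the consolidated tree needs fewer arcs than three separate linear paths.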
Abstract:
A method of providing speech transcription performance indication includes receiving, at a user device, data representing text transcribed from an audio stream by an ASR system, and data representing a metric associated with the audio stream; displaying, via the user device, said text; and via the user device, providing, in user-perceptible form, an indicator of said metric. Another method includes displaying, by a user device, text transcribed from an audio stream by an ASR system; and via the user device, providing, in user-perceptible form, an indicator of a level of background noise of the audio stream. Another method includes receiving data representing an audio stream; converting said data representing an audio stream to text via an ASR system; determining a metric associated with the audio stream; transmitting data representing said text to a user device; and transmitting data representing said metric to the user device.
Abstract:
Audio data that includes speech may be transcribed to text by a speech recognition engine. One or more metrics associated with the audio data and/or the text may be determined. An indicator related to a metric may be provided for a portion of the audio data or the text for which the metric was determined. The indicator may be presented in a user-perceptible format.
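As a rough illustration of attaching an indicator to the portion of text for which a metric was determined, the sketch below assumes the metric is a per-portion confidence score in the range 0 to 1 and that the indicator is a bracketed label. Both the thresholds and the label format are assumptions, not details from the abstract.

```python
from typing import List, Tuple


def indicator_for(confidence: float) -> str:
    """Map a confidence metric to a coarse label (thresholds are assumed)."""
    if confidence >= 0.8:
        return "high"
    if confidence >= 0.5:
        return "medium"
    return "low"


def annotate(portions: List[Tuple[str, float]]) -> str:
    """Append an indicator to each transcribed portion."""
    return " ".join(
        f"{text}[{indicator_for(confidence)}]" for text, confidence in portions
    )


annotated = annotate([("meet me at", 0.92), ("noon", 0.45)])
print(annotated)
```

A device could render the same information as color, underlining, or an audible cue; the bracketed label is simply the easiest form to show in text.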
Abstract:
Audio data that includes speech may be transcribed by a speech recognition engine to generate speech recognition results, such as a transcription. One or more filters may be selected and applied to the speech recognition results to generate filtered speech recognition results. The one or more filters may be selected based at least in part on a characteristic of the speech recognition results, a characteristic of the audio data, or any other characteristic.
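A minimal sketch of selecting filters from characteristics of the results and of the audio follows. The two filters, the `noise_level` characteristic, and the thresholds are hypothetical choices made for illustration; the abstract does not name specific filters.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Word:
    text: str
    confidence: float


Filter = Callable[[List[Word]], List[Word]]
FILLERS = {"um", "uh"}


def drop_low_confidence(words: List[Word]) -> List[Word]:
    """Remove words the recognizer was unsure about (threshold assumed)."""
    return [w for w in words if w.confidence >= 0.5]


def strip_fillers(words: List[Word]) -> List[Word]:
    """Remove filler words such as 'um' and 'uh'."""
    return [w for w in words if w.text.lower() not in FILLERS]


def select_filters(words: List[Word], noise_level: float) -> List[Filter]:
    """Choose filters from an audio characteristic and a result characteristic."""
    selected: List[Filter] = []
    if noise_level > 0.7:  # characteristic of the audio data
        selected.append(drop_low_confidence)
    if any(w.text.lower() in FILLERS for w in words):  # characteristic of the results
        selected.append(strip_fillers)
    return selected


def apply_filters(words: List[Word], filters: List[Filter]) -> List[Word]:
    """Apply the selected filters in order."""
    for apply_one in filters:
        words = apply_one(words)
    return words


words = [Word("um", 0.9), Word("call", 0.95), Word("mom", 0.9), Word("zzz", 0.2)]
filtered = " ".join(
    w.text for w in apply_filters(words, select_filters(words, noise_level=0.8))
)
print(filtered)
```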
Abstract:
A communication system includes at least one transmitting device and at least one receiving device, one or more network systems for connecting the transmitting device to the receiving device, and an automatic speech recognition (“ASR”) system, including an ASR engine. A user speaks an utterance into the transmitting device, and the recorded speech audio is sent to the ASR engine. The ASR engine returns intermediate transcription results to the transmitting device, which displays the intermediate transcription results in real time to the user. The intermediate transcription results are also correlated by utterance fragment to final transcription results and displayed to the user. The user may use the information thus presented to make decisions as to whether to edit the final transcription results or to speak the utterance again, thereby repeating the process. The intermediate transcription results may also be used by the user to edit the final transcription results.
Abstract:
A method is provided for providing cues from an electronic communication device to a user while capturing an utterance. A plurality of cues associated with the user utterance are provided by the device to the user in at least near real time. For each of a plurality of portions of the utterance, data representative of the respective portion of the user utterance is communicated from the electronic communication device to a remote electronic device. In response to this communication, data representative of at least one parameter associated with the respective portion of the user utterance is received at the electronic communication device. The electronic communication device provides one or more cues to the user based on the at least one parameter. At least one of the cues is provided by the electronic communication device to the user prior to completion of the step of capturing the user utterance.
Abstract:
A method for presenting additional content for a word that is part of a message, and that is presented by a mobile communication device, includes the steps of: presenting the message, including emphasizing one or more words for which respective additional content is available for presenting by the mobile communication device; receiving an utterance that includes an emphasized word for which additional content is available for presenting by the mobile communication device; and presenting the additional content for the emphasized word included in the utterance received by the mobile communication device. These steps are performed by the mobile communication device.
Abstract:
A system and method of validating an advertisement presented to an advertisement recipient via a mobile communication device includes presenting an advertisement for a product or service to a recipient via a mobile communication device, monitoring the geospatial location of the mobile communication device relative to some predetermined criteria, and inferring information about the reaction of the advertisement recipient to the advertisement on the basis of the monitored geospatial location information.
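One way to picture the location-based inference is a simple geofence check. The sketch below assumes the "predetermined criteria" is coming within a fixed radius of an advertised location, and it uses the haversine formula for great-circle distance. The coordinates, the 100 m radius, and the function names are illustrative assumptions.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def visited(
    store: Tuple[float, float],
    track: List[Tuple[float, float]],
    radius_m: float = 100.0,
) -> bool:
    """Infer a visit if any monitored position falls within the radius."""
    return any(
        haversine_m(lat, lon, store[0], store[1]) <= radius_m
        for lat, lon in track
    )


store = (40.0, -75.0)  # hypothetical advertised location
track = [(40.01, -75.0), (40.0005, -75.0)]  # hypothetical device positions
came_near = visited(store, track)
print(came_near)
```

A production system would also consider timing relative to when the advertisement was presented; that is omitted here to keep the sketch short.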