Abstract:
The disclosed subject matter provides a system, a computer-readable storage medium, and a method for providing an audio and textual transcript of a communication. A conferencing service may receive audio or audiovisual signals from a plurality of different devices that receive voice communications from participants in a communication, such as a chat or teleconference. The audio signals represent voice (speech) communications input into the respective devices by the participants. A translation services server may receive the audio signals over a separate communication channel for translation into a second language. As managed by the translation services server, the audio signals may be converted into textual data. The textual data may be translated into text of different languages based on the language preferences of the end user devices in the teleconference. The translated text may then be converted into audio signals.
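The per-device translation step described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the device-to-language preference map, and the `toy_translate` placeholder are hypothetical and do not come from the disclosure. Speech recognition and speech synthesis are left out.

```python
from typing import Callable, Dict


def fan_out_transcript(
    text: str,
    source_lang: str,
    preferences: Dict[str, str],
    translate: Callable[[str, str, str], str],
) -> Dict[str, str]:
    """Return the text each device should receive, keyed by device id.

    `preferences` maps a (hypothetical) device id to its preferred language.
    Each target language is translated once and reused for every device
    that asked for it.
    """
    # Devices that prefer the source language get the original text.
    cache: Dict[str, str] = {source_lang: text}
    delivered: Dict[str, str] = {}
    for device_id, lang in preferences.items():
        if lang not in cache:
            cache[lang] = translate(text, source_lang, lang)
        delivered[device_id] = cache[lang]
    return delivered


def toy_translate(text: str, src: str, dst: str) -> str:
    # Placeholder standing in for a real translation engine (assumption).
    return f"[{src}->{dst}] {text}"
```

Calling `fan_out_transcript("hello", "en", {"a": "en", "b": "fr"}, toy_translate)` leaves the text for device `a` untouched and translates once for device `b`.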
Abstract:
A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.
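A minimal sketch of the chain from orientation to operating mode to speech detection parameters is given here. The mode names, the pitch thresholds, and the start and end events are all illustrative assumptions; the abstract does not specify any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechDetectionParams:
    start_on: str  # event that begins speech detection
    end_on: str    # event that ends speech detection


def operating_mode(pitch_degrees: float) -> str:
    """Map a device pitch angle to a hypothetical operating mode."""
    if pitch_degrees >= 60:
        return "upright"   # assumed: device held up against the head
    if pitch_degrees >= 20:
        return "tilted"    # assumed: device held in front of the face
    return "flat"          # assumed: device held roughly level


# Hypothetical parameter table, one entry per assumed mode.
PARAMS_BY_MODE = {
    "upright": SpeechDetectionParams("device_raised", "device_lowered"),
    "tilted": SpeechDetectionParams("energy_above_threshold", "silence_timeout"),
    "flat": SpeechDetectionParams("button_pressed", "button_released"),
}


def detection_params_for(pitch_degrees: float) -> SpeechDetectionParams:
    """Pick speech detection parameters from the orientation-derived mode."""
    return PARAMS_BY_MODE[operating_mode(pitch_degrees)]
```

Keeping the mode decision separate from the parameter table means either can change without touching the other.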
Abstract:
The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.
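The correction flow above can be illustrated with a simplified lattice. As an assumption made for brevity, the word lattice is flattened here into a list of slots, each holding the top hypothesis followed by its alternates; a real lattice is a graph with scores. All function names are hypothetical.

```python
from typing import List

# Each slot lists the best word first, then its alternates (assumed layout).
Lattice = List[List[str]]


def best_path(lattice: Lattice) -> List[str]:
    """Words to present initially: the first hypothesis of every slot."""
    return [slot[0] for slot in lattice]


def alternates(lattice: Lattice, index: int) -> List[str]:
    """Alternate words to offer when the user selects the word at `index`."""
    return lattice[index][1:]


def replace_word(words: List[str], lattice: Lattice, index: int, choice: str) -> List[str]:
    """Return a copy of `words` with the selected word replaced by `choice`."""
    if choice not in lattice[index]:
        raise ValueError(f"{choice!r} is not a hypothesis for position {index}")
    updated = list(words)
    updated[index] = choice
    return updated
```

For example, with `lattice = [["we're", "we", "were"], ["coming", "come"]]`, selecting the first word offers `["we", "were"]`, and choosing `"we"` yields `["we", "coming"]`.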
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving an audio signal that corresponds to an utterance recorded by a mobile device, determining a geographic location associated with the mobile device, adapting one or more acoustic models for the geographic location, and performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing predictive pre-recording of audio for voice input. In one aspect, a method includes establishing, as input data, state data that references a state of a mobile device and sensor data that is sensed by one or more sensors of the mobile device, applying a rule or a probabilistic model to the input data, inferring, based on applying the rule or the probabilistic model to the input data, that a user of the mobile device is likely to initiate voice input, and invoking one or more functionalities of the mobile device in response to inferring that the user is likely to initiate voice input.
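One way to picture the inference step is a small logistic model over state and sensor features. This is a sketch under assumptions: the feature names, weights, bias, and threshold are invented for illustration, and the abstract does not say which probabilistic model is used.

```python
import math
from typing import Callable, Dict

# Hypothetical feature weights; a deployed model would be trained, not hand-set.
WEIGHTS = {"screen_on": 1.0, "raised_to_mouth": 2.5, "search_box_focused": 1.5}
BIAS = -3.0
THRESHOLD = 0.5


def voice_input_probability(features: Dict[str, bool]) -> float:
    """Logistic estimate that the user is about to start voice input."""
    score = BIAS + sum(
        weight * float(features.get(name, False)) for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


def maybe_prerecord(features: Dict[str, bool], start_recording: Callable[[], None]) -> bool:
    """Invoke the recording functionality when voice input looks likely."""
    likely = voice_input_probability(features) >= THRESHOLD
    if likely:
        start_recording()
    return likely
```

A rule-based variant would replace `voice_input_probability` with a plain boolean test on the same features.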
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing predictive pre-recording of audio for voice input. In one aspect, a method includes obtaining sensor data from one or more sensors of a mobile device while the mobile device is operating in an inactive state, determining that a user of the mobile device is interacting with the mobile device based on the sensor data, invoking voice input functionality of the mobile device in response to determining that the user of the mobile device is interacting with the mobile device, detecting a voice input, and activating the mobile device in response to detecting the voice input.
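The sequence in this abstract reads naturally as a three-state machine. The sketch below assumes the state names and the two boolean inputs; how interaction or voice is actually detected is outside its scope.

```python
from enum import Enum, auto


class State(Enum):
    INACTIVE = auto()   # device idle, sensors still sampled
    LISTENING = auto()  # voice input functionality invoked
    ACTIVE = auto()     # device activated after a voice input


def step(state: State, user_interacting: bool, voice_detected: bool) -> State:
    """Advance the (hypothetical) state machine by one observation."""
    if state is State.INACTIVE and user_interacting:
        return State.LISTENING
    if state is State.LISTENING and voice_detected:
        return State.ACTIVE
    return state
```

Note that a voice input alone does not wake an inactive device in this sketch; sensor data must first indicate that the user is interacting with it.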
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations, receiving an audio signal that corresponds to an utterance recorded by a particular mobile device, determining a particular geographic location associated with the particular mobile device, generating a noise model for the particular geographic location using a subset of the geotagged audio signals, where noise compensation is performed on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
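A rough sketch of the location-specific noise model follows. It makes several assumptions that are not in the abstract: each geotagged recording is already reduced to a magnitude spectrum, the subset is chosen by a fixed radius around the device, the model is the per-bin average, and compensation uses basic spectral subtraction, which is only one possible technique.

```python
import math
from typing import List, Optional, Tuple

# (latitude, longitude, magnitude spectrum) for one environmental recording
Sample = Tuple[float, float, List[float]]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def noise_model(samples: List[Sample], lat: float, lon: float,
                radius_km: float) -> Optional[List[float]]:
    """Average the spectra of the geotagged samples within `radius_km`."""
    nearby = [spec for (slat, slon, spec) in samples
              if haversine_km(lat, lon, slat, slon) <= radius_km]
    if not nearby:
        return None  # no environmental audio for this location
    return [sum(bins) / len(nearby) for bins in zip(*nearby)]


def compensate(utterance: List[float], noise: List[float]) -> List[float]:
    """Spectral subtraction, with each bin floored at zero."""
    return [max(u - n, 0.0) for u, n in zip(utterance, noise)]
```

The radius test is the "subset of the geotagged audio signals" in this sketch; other selection criteria, such as time of day, could be added to the same filter.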
Abstract:
An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed, and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
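The gating logic, adapting the user speech model only when background audio is low, can be sketched as below. The model is assumed to be a plain feature vector updated by linear interpolation, and `background_level`, `threshold`, and `rate` are hypothetical parameters; the abstract does not describe the model's form or the adaptation rule.

```python
from typing import List, Tuple


def adapt_if_clean(
    model: List[float],
    user_features: List[float],
    background_level: float,
    threshold: float,
    rate: float = 0.1,
) -> Tuple[List[float], bool]:
    """Return (model, adapted_flag).

    The model is moved toward the new user features only when the
    background level is below the threshold; otherwise it is returned
    unchanged so that noisy audio does not contaminate it.
    """
    if background_level >= threshold:
        return model, False
    adapted = [(1.0 - rate) * m + rate * f for m, f in zip(model, user_features)]
    return adapted, True
```

The adapted model would then feed the noise compensation step, which is not shown here.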