Abstract:
A language model is modified for a local speech recognition system using remote speech recognition sources. In one example, a speech utterance is received. The speech utterance is sent to at least one remote speech recognition system. Text results corresponding to the utterance are received from the remote speech recognition system. A local text result is generated using local vocabulary. The received text results and the generated text result are compared to determine words that are out of the local vocabulary, and the local vocabulary is updated using the out-of-vocabulary words.
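The comparison step can be illustrated with a minimal sketch. It assumes that results are plain text strings and that a word counts as out of vocabulary when it appears in a remote result but in neither the local result nor the local vocabulary. The function names `tokenize`, `find_oov_words`, and `update_vocabulary` are hypothetical and do not come from the abstract.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_oov_words(remote_results, local_result, local_vocabulary):
    """Return words in the remote results that the local side does not know."""
    local_words = set(tokenize(local_result))
    oov = set()
    for text in remote_results:
        for word in tokenize(text):
            if word not in local_words and word not in local_vocabulary:
                oov.add(word)
    return oov


def update_vocabulary(local_vocabulary, oov_words):
    """Add the out-of-vocabulary words to the local vocabulary in place."""
    local_vocabulary.update(oov_words)
    return local_vocabulary


# Illustrative data only (assumed, not taken from the abstract).
vocabulary = {"play", "some", "music", "by", "the"}
remote = ["play some music by Radiohead", "play some music by radio head"]
local = "play some music by the"

new_words = find_oov_words(remote, local, vocabulary)
update_vocabulary(vocabulary, new_words)
```

A real system would presumably also update language-model statistics for the new words; the sketch covers only the vocabulary set.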
Abstract:
A method for context-aware query recognition in an electronic device includes receiving user speech from an input device. A speech signal is generated from the user speech. It is determined whether the speech signal includes an action to be performed and whether the electronic device is the intended recipient of the user speech. If the recognized speech signal includes the action and the intended recipient of the user speech is the electronic device, a command is generated for the electronic device to perform the action.
Abstract:
Techniques related to implementing neural networks for speech recognition systems are discussed. Such techniques may include implementing frame skipping with approximated skip frames and/or distances on demand such that only those outputs needed by a speech decoder are provided via the neural network or approximation techniques.
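Frame skipping can be sketched as follows. The abstract does not say how skipped frames are approximated, so this example assumes the simplest option: reuse the most recent network output. `score_frames` and `toy_network` are hypothetical names, and the toy network stands in for a real acoustic model.

```python
def score_frames(frames, network, skip=2):
    """Evaluate `network` on one frame, then reuse its output for `skip` frames.

    Copying the last computed output is an assumed approximation, not
    necessarily the one used in the disclosure.
    """
    outputs = []
    evaluations = 0
    last_output = None
    for index, frame in enumerate(frames):
        if index % (skip + 1) == 0:
            # Full evaluation on a non-skipped frame.
            last_output = network(frame)
            evaluations += 1
        # Skipped frames receive the most recent computed output.
        outputs.append(last_output)
    return outputs, evaluations


def toy_network(frame):
    """Stand-in for a neural network: doubles every input value."""
    return [2.0 * value for value in frame]


frames = [[float(i)] for i in range(6)]
outputs, evaluations = score_frames(frames, toy_network, skip=2)
```

With `skip=2`, six frames need only two full evaluations, which is where the compute saving comes from.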
Abstract:
Methods, apparatus, systems and articles of manufacture are disclosed for distributed automatic speech recognition. An example apparatus includes a detector to process an input audio signal and identify a portion of the input audio signal including a sound to be evaluated, the sound to be evaluated organized into a plurality of audio features representing the sound. The example apparatus includes a quantizer to process the audio features using a quantization process to reduce the audio features to generate a reduced set of audio features for transmission. The example apparatus includes a transmitter to transmit the reduced set of audio features over a low-energy communication channel for processing.
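One way to picture the quantizer is as a uniform scalar quantizer that maps each floating-point feature to a small integer code before transmission. This is an assumed interpretation: the abstract does not name the quantization process, and the reduction could also involve dropping features. The names `quantize` and `dequantize` are hypothetical.

```python
def quantize(features, bits=8):
    """Map float features to integer codes in the range [0, 2**bits - 1]."""
    low, high = min(features), max(features)
    levels = (1 << bits) - 1
    # Guard against a zero range when all features are equal.
    scale = (high - low) / levels if high > low else 1.0
    codes = [round((value - low) / scale) for value in features]
    return codes, low, scale


def dequantize(codes, low, scale):
    """Recover approximate feature values on the receiving side."""
    return [low + code * scale for code in codes]


# Illustrative feature values (assumed).
features = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, low, scale = quantize(features, bits=8)
restored = dequantize(codes, low, scale)
```

Each 8-bit code replaces a 32-bit or 64-bit float, so the payload sent over the low-energy channel shrinks accordingly, at the cost of an error of at most half a quantization step per feature.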
Abstract:
This disclosure describes systems, methods, and devices related to automatic removal of personally identifiable information (PII). A system may detect a sound signal received from a vicinity of a machine during the operation of the machine. The system may perform speech detection to detect a segment of the sound signal that comprises a speech signal. The system may modify the sound signal at the segment of the sound signal by performing a segment replacement mechanism. The system may generate a filtered sound signal to be used for monitoring the operation of the machine.
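The segment replacement step might look like the sketch below. It assumes that speech detection has already produced a list of `(start, end)` sample ranges and that replacement means overwriting those samples with a constant fill value. Both are assumptions; the abstract does not specify the replacement mechanism, and `replace_segments` is a hypothetical name.

```python
def replace_segments(samples, segments, fill=0.0):
    """Return a copy of `samples` with each (start, end) range overwritten.

    `end` is exclusive. Ranges are clipped to the signal length.
    """
    filtered = list(samples)
    for start, end in segments:
        start = max(0, start)
        end = min(len(filtered), end)
        for index in range(start, end):
            filtered[index] = fill
    return filtered


# Illustrative signal: samples 2..4 are assumed to contain speech.
signal = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.2]
speech_segments = [(2, 5)]
filtered = replace_segments(signal, speech_segments)
```

The original signal is left untouched, so the unfiltered audio can be discarded once the filtered copy has been produced.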
Abstract:
An example apparatus for concealing phrases in audio includes a receiver to receive a detected phrase via a network. The detected phrase is based on audio captured near a source of an audio stream. The apparatus also includes a speech recognizer to generate a trigger in response to detecting that a section of the audio stream contains a confirmed phrase. The apparatus further includes a phrase concealer to conceal the section of the audio stream in response to the trigger.
Abstract:
Embodiments of a system and method for adapting a phase difference-based noise reduction system are generally described herein. In some embodiments, spatial information associated with a first and a second audio signal is determined, wherein the first and second audio signals include a target audio inside a beam and noise from outside the beam. A signal-to-noise ratio (SNR) associated with the audio signals is estimated. A mapping of phase differences to gain factors is adapted for determination of attenuation factors for attenuating frequency bins associated with noise outside the beam. Spectral subtraction is performed to remove estimated noise from the single-channel signal based on a weighting that affects frequencies associated with a target signal less. Frequency-dependent attenuation factors are applied to attenuate frequency bins outside the beam to produce a target signal with reduced noise.
Abstract:
This disclosure describes systems, methods, and devices related to presenting video conferencing virtual seating arrangements. A method may include generating a first similarity score indicative of a first similarity between a first voice of a first virtual meeting user and a second voice of a second virtual meeting user; generating a second similarity score indicative of a second similarity between the first voice of the first virtual meeting user and a third voice of a third virtual meeting user; determining, based on the first similarity score and the second similarity score, a similarity loss for a virtual seating arrangement; determining that the similarity loss is a minimum similarity loss of respective similarity losses for different virtual seating arrangements; generating presentation data, for the virtual meeting, including virtual representations of the virtual meeting users arranged based on the virtual seating arrangement; and presenting the presentation data.
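The selection of a minimum-loss arrangement can be sketched with a brute-force search. The abstract does not define the similarity loss, so this example assumes it is the sum of voice-similarity scores between adjacent seats, which places similar-sounding users apart. The names `similarity_loss` and `best_arrangement`, the user names, and the scores are all hypothetical.

```python
from itertools import permutations


def similarity_loss(arrangement, similarity):
    """Sum the similarity scores of every pair of neighbouring seats."""
    return sum(
        similarity[frozenset((left, right))]
        for left, right in zip(arrangement, arrangement[1:])
    )


def best_arrangement(users, similarity):
    """Try every seating order and return the one with the lowest loss."""
    return min(
        permutations(users),
        key=lambda arrangement: similarity_loss(arrangement, similarity),
    )


# Illustrative pairwise voice-similarity scores (assumed values).
similarity = {
    frozenset(("alice", "bob")): 0.9,
    frozenset(("alice", "carol")): 0.2,
    frozenset(("bob", "carol")): 0.3,
}
users = ["alice", "bob", "carol"]
arrangement = best_arrangement(users, similarity)
```

Exhaustive search grows factorially with the number of users, so it is practical only for small meetings; a larger meeting would need a heuristic search.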