Abstract:
A method for context-aware query recognition in an electronic device includes receiving user speech from an input device. A speech signal is generated from the user speech. It is determined whether the speech signal includes an action to be performed and whether the electronic device is the intended recipient of the user speech. If the recognized speech signal includes the action and the intended recipient of the user speech is the electronic device, a command is generated for the electronic device to perform the action.
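The two checks in this abstract (does the speech name an action, and is the device the intended recipient) can be illustrated with a minimal sketch that works on an already-transcribed utterance. The abstract does not say how either check is made, so the keyword sets `KNOWN_ACTIONS` and `WAKE_NAMES`, the `Command` type, and the function name are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies (assumptions): the abstract does not specify
# how actions or the intended recipient are detected.
KNOWN_ACTIONS = {"play", "stop", "call", "open"}
WAKE_NAMES = {"computer", "assistant"}


@dataclass
class Command:
    action: str
    argument: str


def recognize_query(transcript: str) -> Optional[Command]:
    """Return a Command only if the text names an action AND addresses the device."""
    words = transcript.lower().replace(",", " ").split()
    # Check 1: is the device the intended recipient? (toy rule: it is named)
    addressed = any(w in WAKE_NAMES for w in words)
    # Check 2: does the utterance contain an action to perform?
    action_idx = next((i for i, w in enumerate(words) if w in KNOWN_ACTIONS), None)
    if not addressed or action_idx is None:
        return None  # one of the two conditions failed: generate no command
    return Command(words[action_idx], " ".join(words[action_idx + 1:]))
```

An utterance such as "I told him to stop yesterday" contains an action word but does not address the device, so no command is generated.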
Abstract:
A language model is modified for a local speech recognition system using remote speech recognition sources. In one example, a speech utterance is received. The speech utterance is sent to at least one remote speech recognition system. Text results corresponding to the utterance are received from the remote speech recognition system. A local text result is generated using local vocabulary. The received text results and the generated text result are compared to determine words that are out of the local vocabulary, and the local vocabulary is updated using the out-of-vocabulary words.
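The comparison step can be sketched as a set difference over whitespace-separated words. This is only an illustration under simplifying assumptions: the function name is hypothetical, the vocabulary is a plain Python set, and real systems would align hypotheses rather than compare bags of words.

```python
def update_local_vocabulary(local_vocab, local_text, remote_texts):
    """Add words that remote recognizers produced but the local system lacks.

    local_vocab  -- set of words known locally (updated in place)
    local_text   -- text result generated with the local vocabulary
    remote_texts -- text results received from remote recognizers
    Returns the set of out-of-vocabulary words that were added.
    """
    local_words = set(local_text.lower().split())
    remote_words = set()
    for text in remote_texts:
        remote_words.update(text.lower().split())
    # Words seen remotely that appear neither in the local result
    # nor in the local vocabulary are treated as out of vocabulary.
    oov = remote_words - local_words - local_vocab
    local_vocab.update(oov)
    return oov
```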
Abstract:
Techniques related to implementing neural networks for speech recognition systems are discussed. Such techniques may include implementing frame skipping with approximated skip frames and/or distances on demand such that only those outputs needed by a speech decoder are provided via the neural network or approximation techniques.
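Both ideas can be sketched in a few lines. The abstract does not say how skipped frames are approximated, so reusing the most recent fully evaluated output is an assumption here, as are the function names and the plain-list representation of the output layer.

```python
def score_frames(frames, network, skip=1):
    """Run `network` on every (skip + 1)-th frame only.

    Skipped frames reuse the last evaluated output, which is one simple
    approximation (an assumption, not necessarily the disclosed one).
    """
    outputs, last = [], None
    for i, frame in enumerate(frames):
        if i % (skip + 1) == 0:
            last = network(frame)  # full network evaluation
        outputs.append(last)       # skipped frame: approximate by reuse
    return outputs


def outputs_on_demand(hidden, weights, biases, requested):
    """Compute only the output-layer units the decoder asked for.

    hidden    -- activations of the last hidden layer
    weights   -- weights[j] is the weight row of output unit j
    requested -- indices of the output units needed by the decoder
    """
    return {
        j: sum(w * h for w, h in zip(weights[j], hidden)) + biases[j]
        for j in requested
    }
```

With `skip=1` the network runs on half of the frames, and `outputs_on_demand` avoids the full output-layer matrix product when the decoder needs only a few scores.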
Abstract:
Methods, apparatus, systems and articles of manufacture are disclosed for distributed automatic speech recognition. An example apparatus includes a detector to process an input audio signal and identify a portion of the input audio signal including a sound to be evaluated, the sound to be evaluated organized into a plurality of audio features representing the sound. The example apparatus includes a quantizer to process the audio features using a quantization process, generating a reduced set of audio features for transmission. The example apparatus includes a transmitter to transmit the reduced set of audio features over a low-energy communication channel for processing.
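One way to picture the quantizer is a sketch that drops trailing coefficients and maps the rest to 8-bit codes. The abstract does not specify the quantization process; the choice of 13 coefficients, uniform 8-bit scaling, and both function names are assumptions, and the input is assumed to be non-empty.

```python
def quantize_features(features, keep=13):
    """Keep the first `keep` coefficients and encode each as one byte.

    Returns (codes, lo, scale); `lo` and `scale` let the receiver
    reconstruct approximate values. Uniform 8-bit quantization is an
    assumed scheme chosen for illustration.
    """
    kept = features[:keep]
    lo, hi = min(kept), max(kept)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = bytes(round((x - lo) / scale) for x in kept)
    return codes, lo, scale


def dequantize_features(codes, lo, scale):
    """Receiver side: map the byte codes back to approximate feature values."""
    return [lo + c * scale for c in codes]
```

A frame of 23 floating-point features shrinks to 13 bytes plus two scalars, which is the kind of reduction that suits a low-energy channel.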
Abstract:
This disclosure describes systems, methods, and devices related to presenting video conferencing virtual seating arrangements. A method may include generating a first similarity score indicative of a first similarity between a first voice of a first virtual meeting user and a second voice of a second virtual meeting user; generating a second similarity score indicative of a second similarity between the first voice of the first virtual meeting user and a third voice of a third virtual meeting user; determining, based on the first similarity score and the second similarity score, a similarity loss for a virtual seating arrangement; determining that the similarity loss is a minimum similarity loss of respective similarity losses for different virtual seating arrangements; generating presentation data, for the virtual meeting, including virtual representations of the virtual meeting users arranged based on the virtual seating arrangement; and presenting the presentation data.
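The selection of a minimum-loss arrangement can be sketched as a brute-force search. The abstract does not define the similarity loss, so this sketch assumes it is the sum of pairwise voice similarities between neighboring seats in a single row; the function names and the `frozenset`-keyed score table are likewise hypothetical.

```python
from itertools import permutations


def similarity_loss(arrangement, similarity):
    """Sum of voice-similarity scores between neighboring seats (assumed loss)."""
    return sum(
        similarity[frozenset((a, b))]
        for a, b in zip(arrangement, arrangement[1:])
    )


def best_arrangement(users, similarity):
    """Try every seating order and return one with the minimum similarity loss."""
    return min(permutations(users), key=lambda arr: similarity_loss(arr, similarity))
```

Under this assumed loss, users with similar voices end up seated apart. Exhaustive search grows factorially with the number of users, so it is practical only for small meetings.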
Abstract:
Technologies for identifying sounds are disclosed. A sound identification device may capture sound data, and split the sound data into frames. The sound identification device may then determine an acoustic feature vector for each frame, and determine parameters based on how each acoustic feature varies over the duration of time corresponding to the frames. The sound identification device may then determine if the sound matches a pre-defined sound based on the parameters. In one embodiment, the sound identification device may be a baby monitor, and the pre-defined sound may be a baby crying.
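The pipeline described here (frames, per-frame feature vectors, parameters describing variation over time, then a match decision) can be sketched as follows. The specific features (RMS energy and zero-crossing rate), the mean and standard deviation as variation parameters, and the distance-threshold match are all assumptions; the abstract names none of them.

```python
import math
from statistics import mean, pstdev


def split_frames(samples, frame_len):
    """Split the sound data into non-overlapping frames of `frame_len` samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_features(frame):
    """Toy acoustic feature vector: RMS energy and zero-crossing rate (assumed)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [rms, crossings / (len(frame) - 1)]


def variation_parameters(feature_vectors):
    """Describe how each feature varies over time: its mean and std deviation."""
    params = []
    for series in zip(*feature_vectors):  # one time series per feature
        params += [mean(series), pstdev(series)]
    return params


def matches(params, template, tolerance):
    """Declare a match if the parameters lie close to a pre-defined template."""
    return math.dist(params, template) <= tolerance
```

In a baby-monitor setting, `template` would hold parameters measured from recorded crying, and `tolerance` would be tuned on held-out recordings.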
Abstract:
This disclosure describes systems, methods, and devices related to automatic removal of personally identifiable information (PII). A system may detect a sound signal received from a vicinity of a machine during the operation of the machine. The system may perform speech detection to detect a segment of the sound signal that comprises a speech signal. The system may modify the sound signal at that segment by applying a segment replacement mechanism. The system may generate a filtered sound signal to be used for monitoring the operation of the machine.
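The detect-then-replace flow can be sketched with the speech detector passed in as a callable, since the abstract does not describe how speech is detected or what replaces it. The function name, the fixed-length segmentation, and replacement with a constant value are assumptions made for illustration.

```python
def remove_speech(signal, frame_len, is_speech, replacement=0.0):
    """Return a filtered copy of `signal` with speech segments replaced.

    is_speech   -- hypothetical detector: takes a segment, returns True/False
    replacement -- value written over detected segments (assumed mechanism)
    """
    filtered = list(signal)
    for start in range(0, len(signal), frame_len):
        segment = signal[start:start + frame_len]
        if is_speech(segment):
            # Segment replacement: overwrite the detected speech samples
            filtered[start:start + len(segment)] = [replacement] * len(segment)
    return filtered
```

A practical system might replace the segment with neighboring machine sound instead of a constant, so that downstream monitoring does not see an artificial gap.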