Abstract:
A computer-implemented method of speech recognition comprises forming a weighted finite state transducer (WFST) having nodes associated with states and interconnected by arcs and, to identify at least one word or word-sequence hypothesis, identifying multiple sub-graphs on the WFST, each sub-graph having the same arrangement of multiple states and at least one arc, and propagating tokens in parallel through the sub-graphs, where each sub-graph is stored as a supertoken having an array of tokens.
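The sketch below shows why identical sub-graph topology helps with parallel propagation. It is a minimal illustration, and several parts of it are assumptions, not the claimed method:

- the 3-state template `TEMPLATE_ARCS`;
- the `SuperToken` class;
- the per-frame `acoustic` cost table;
- the use of Viterbi costs (negative log scores).

Each supertoken stores one token cost per local state. Because every sub-graph shares the same arcs, each arc is applied to all supertokens in the same way.

```python
import math
from typing import List

INF = math.inf

# Hypothetical shared topology: a 3-state left-to-right sub-graph.
# Each arc is (source local state, destination local state, arc weight).
TEMPLATE_ARCS = [(0, 0, 0.5), (0, 1, 0.7), (1, 1, 0.4), (1, 2, 0.9), (2, 2, 0.3)]
NUM_STATES = 3


class SuperToken:
    """One sub-graph instance stored as an array of token costs, one per local state."""

    def __init__(self, subgraph_id: int, costs: List[float]):
        assert len(costs) == NUM_STATES
        self.subgraph_id = subgraph_id
        self.costs = list(costs)


def propagate(supertokens: List[SuperToken], acoustic: List[List[float]]) -> None:
    """Advance every supertoken by one frame (Viterbi; costs are negative log scores).

    acoustic[subgraph_id][state] is an assumed per-frame acoustic cost.
    The outer loop walks the shared arcs once; the inner loop applies the same
    arc to every supertoken, which is the data-parallel part that SIMD lanes
    or GPU threads could execute in lockstep.
    """
    new = [[INF] * NUM_STATES for _ in supertokens]
    for src, dst, weight in TEMPLATE_ARCS:
        for k, st in enumerate(supertokens):  # same operation, different data
            cand = st.costs[src] + weight + acoustic[st.subgraph_id][dst]
            if cand < new[k][dst]:  # keep the best token entering each state
                new[k][dst] = cand
    for k, st in enumerate(supertokens):
        st.costs = new[k]
```

In pure Python the parallelism is expressed only by the loop structure. A vectorized or GPU implementation would store all supertokens as one 2-D array and apply each arc as a single vector operation.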
Abstract:
Techniques related to implementing neural networks for speech recognition systems are discussed. Such techniques may include processing a node of the neural network by determining a score for the node as a product of weights and inputs such that the weights are fixed-point integer values, applying a correction to the score based on a correction value associated with at least one of the weights, and generating an output from the node based on the corrected score.
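The abstract does not say where the correction value comes from. The following is one plausible sketch, assuming outlier clipping: weights are scaled to an int8-style range, and weights too large for that range are clipped. The clipped-off remainder is kept as a sparse per-weight correction that is added back to the integer score.

Hypothetical elements:

- the names `quantize_weights` and `node_output`;
- the `scale` parameter;
- the ReLU activation.

```python
from typing import Dict, List, Sequence, Tuple


def quantize_weights(weights: Sequence[float], scale: float,
                     qmax: int = 127) -> Tuple[List[int], Dict[int, int]]:
    """Quantize float weights to fixed-point integers in [-qmax, qmax].

    Weights whose scaled value does not fit are clipped, and the clipped-off
    remainder is stored as an integer correction keyed by weight index.
    """
    q_weights: List[int] = []
    corrections: Dict[int, int] = {}
    for i, w in enumerate(weights):
        full = round(w / scale)                # ideal fixed-point value
        clipped = max(-qmax, min(qmax, full))  # value that fits the integer range
        q_weights.append(clipped)
        if full != clipped:
            corrections[i] = full - clipped    # residual the clip discarded
    return q_weights, corrections


def node_output(q_weights: Sequence[int], corrections: Dict[int, int],
                inputs: Sequence[int], scale: float) -> float:
    # Integer multiply-accumulate over the fixed-point weights.
    score = sum(q * x for q, x in zip(q_weights, inputs))
    # Correction step: add back the residual only for the outlier weights.
    score += sum(r * inputs[i] for i, r in corrections.items())
    # Rescale to real units and apply a ReLU as an illustrative activation.
    return max(0.0, score * scale)
```

For example, with `scale = 1/128`, the weight `2.0` maps to 256. It is clipped to 127 with a correction of 129, so the corrected score matches the unclipped computation while the main loop stays in the int8 range.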
Abstract:
An apparatus to facilitate deepfake detection models utilizing subject-specific libraries is disclosed. The apparatus includes one or more processors to store a plurality of deepfake detection models corresponding to a plurality of subjects of interest; receive a query to identify whether data pertaining to a target subject of interest is a deepfake, the target subject of interest comprised in the plurality of subjects of interest and associated with a subject identifier (ID); identify a deepfake detection model corresponding to the subject ID; extract features for deepfake detection from the data; input the extracted features to the identified deepfake detection model corresponding to the subject ID; and responsive to an output of the deepfake detection model exceeding a determined deepfake threshold, generate a notification, in response to the query, indicating a possible deepfake attack corresponding to the target subject of interest.
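The query flow above (look up the model by subject ID, extract features, score, compare against a threshold, notify) can be sketched as follows. The class name, the `register` helper, the string notification, and the toy models in the usage example are all hypothetical. Real per-subject detectors and feature extractors would replace the plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Sequence

Features = Sequence[float]


@dataclass
class DeepfakeModelLibrary:
    """Per-subject deepfake detectors keyed by subject ID (hypothetical interface)."""
    extract_features: Callable[[object], Features]
    threshold: float = 0.5
    models: Dict[str, Callable[[Features], float]] = field(default_factory=dict)

    def register(self, subject_id: str, model: Callable[[Features], float]) -> None:
        """Store the detection model for one subject of interest."""
        self.models[subject_id] = model

    def query(self, subject_id: str, data: object) -> Optional[str]:
        """Return a notification if `data` looks like a deepfake of `subject_id`."""
        model = self.models.get(subject_id)     # model identified by subject ID
        if model is None:
            raise KeyError(f"no deepfake detection model for subject {subject_id!r}")
        features = self.extract_features(data)  # features extracted from the data
        score = model(features)                  # subject-specific model output
        if score > self.threshold:               # exceeds the deepfake threshold
            return f"possible deepfake attack targeting subject {subject_id}"
        return None
```

Usage, with toy models standing in for trained detectors:

```python
lib = DeepfakeModelLibrary(extract_features=lambda d: [d["artifact_level"]])
lib.register("alice", lambda f: f[0])
print(lib.query("alice", {"artifact_level": 0.9}))
```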
Abstract:
An example apparatus for recognizing speech includes an audio receiver to receive a stream of audio. The apparatus also includes a key phrase detector to detect a key phrase in the stream of audio. The apparatus further includes a model adapter to dynamically adapt a model based on the detected key phrase. The apparatus also includes a query recognizer to detect, via the adapted model, a voice query following the key phrase in the stream of audio.
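The three components (key phrase detector, model adapter, query recognizer) can be illustrated with a toy pipeline. This sketch makes several assumptions:

- The audio front end has already produced word tokens for the key-phrase search and an n-best hypothesis list for the query.
- "Adapting the model" means boosting the domain vocabulary of a unigram language model and renormalizing. This is only a stand-in for whatever adaptation the apparatus actually performs.

`DOMAIN_WORDS`, `BASE_MODEL`, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical key phrases, each mapped to the domain vocabulary it signals.
DOMAIN_WORDS = {
    ("hey", "car"): {"navigate", "temperature", "radio"},
    ("hey", "tv"): {"channel", "volume", "record"},
}

# Hypothetical base unigram language model (word -> probability).
BASE_MODEL = {"navigate": 0.01, "never": 0.05, "home": 0.10,
              "channel": 0.01, "volume": 0.01, "um": 0.82}


def detect_key_phrase(words: Sequence[str]) -> Optional[Tuple[Tuple[str, ...], int]]:
    """Return (key phrase, index just after it) for the first match, else None."""
    for i in range(len(words)):
        for phrase in DOMAIN_WORDS:
            if tuple(words[i:i + len(phrase)]) == phrase:
                return phrase, i + len(phrase)
    return None


def adapt_model(base: Dict[str, float], phrase: Tuple[str, ...],
                boost: float = 10.0) -> Dict[str, float]:
    """Scale up the detected phrase's domain words, then renormalize to sum to 1."""
    domain = DOMAIN_WORDS[phrase]
    scaled = {w: p * (boost if w in domain else 1.0) for w, p in base.items()}
    total = sum(scaled.values())
    return {w: p / total for w, p in scaled.items()}


def recognize_query(nbest: List[List[str]], model: Dict[str, float],
                    floor: float = 1e-6) -> List[str]:
    """Pick the n-best hypothesis with the highest log-probability under `model`."""
    return max(nbest, key=lambda hyp: sum(math.log(model.get(w, floor)) for w in hyp))
```

Suppose the base model is given the hypotheses `["never", "home"]` and `["navigate", "home"]`. It prefers "never home". After "hey car" is detected, the adapted model prefers "navigate home".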
Abstract:
A method in a computing device for decoding a weighted finite state transducer (WFST) for automatic speech recognition is described. The method includes sorting a set of one or more WFST arcs based on their arc weight in ascending order. The method further includes iterating through each arc in the sorted set of arcs according to the ascending order until the score of the generated token corresponding to an arc exceeds a score threshold. The method further includes discarding any remaining arcs in the set of arcs that have yet to be considered.
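The sorted-arc early exit can be sketched as below. It treats arc weights as costs (lower is better) and assumes a single `acoustic_cost` shared by all arcs expanded from the token. Under that assumption, the first arc over the threshold guarantees that every later arc is over it too. If the acoustic cost differs per arc, the early exit becomes a pruning approximation rather than an exact cut. `Arc` and `expand_token` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Arc:
    dest: int       # destination state
    weight: float   # arc weight as a cost (lower is better)


def expand_token(token_score: float, arcs: List[Arc], acoustic_cost: float,
                 threshold: float) -> List[Tuple[int, float]]:
    """Expand one token over its outgoing arcs in ascending weight order.

    Stops at the first arc whose generated token score exceeds `threshold`
    and discards every arc not yet considered.
    """
    generated = []
    for arc in sorted(arcs, key=lambda a: a.weight):   # ascending arc weight
        score = token_score + arc.weight + acoustic_cost
        if score > threshold:
            break   # with a shared acoustic_cost, every later arc scores at least as high
        generated.append((arc.dest, score))
    return generated
```

In a real decoder, the arcs would typically be sorted once when the graph is built, not on every expansion. The per-token work then reduces to the loop and its early `break`.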
Abstract:
Techniques related to implementing neural networks for speech recognition systems are discussed. Such techniques may include implementing frame skipping with approximated skip frames and/or computing distances on demand, such that only those outputs needed by a speech decoder are provided via the neural network or via approximation techniques.
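The following is a minimal sketch of combining frame skipping with on-demand outputs. It makes these assumptions:

- a toy one-hidden-layer network;
- hidden activations computed only on every `skip`-th frame;
- skipped-frame scores approximated by linear interpolation between neighboring evaluated frames;
- output rows computed only for (frame, state) pairs the decoder actually requests.

Interpolation is one possible approximation; the abstract does not specify which is used. `OnDemandScorer` and its parameters are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


class OnDemandScorer:
    """Acoustic scores with frame skipping and on-demand output rows (toy network)."""

    def __init__(self, w_hidden: Sequence[Sequence[float]],
                 w_out: Sequence[Sequence[float]],
                 features: Sequence[Sequence[float]], skip: int = 2):
        self.w_hidden = w_hidden      # [hidden_units][feature_dim]
        self.w_out = w_out            # [num_states][hidden_units]
        self.features = features      # [num_frames][feature_dim]
        self.skip = skip
        self._hidden_cache: Dict[int, List[float]] = {}
        self._score_cache: Dict[Tuple[int, int], float] = {}
        self.output_evaluations = 0   # number of output rows actually computed

    def _hidden(self, frame: int) -> List[float]:
        # ReLU hidden layer, computed at most once per evaluated frame.
        if frame not in self._hidden_cache:
            x = self.features[frame]
            self._hidden_cache[frame] = [
                max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w_hidden]
        return self._hidden_cache[frame]

    def _network_score(self, frame: int, state: int) -> float:
        # Output row for one state only, computed on demand and cached.
        key = (frame, state)
        if key not in self._score_cache:
            h = self._hidden(frame)
            self._score_cache[key] = sum(w * hi for w, hi in zip(self.w_out[state], h))
            self.output_evaluations += 1
        return self._score_cache[key]

    def score(self, frame: int, state: int) -> float:
        """Score requested by the decoder for one (frame, state) pair."""
        if frame % self.skip == 0:
            return self._network_score(frame, state)
        # Skipped frame: approximate from the neighboring evaluated frames.
        prev = frame - frame % self.skip
        nxt = prev + self.skip
        if nxt >= len(self.features):
            return self._network_score(prev, state)  # no right neighbor: reuse left
        t = (frame - prev) / self.skip
        return (1 - t) * self._network_score(prev, state) + t * self._network_score(nxt, state)
```

Suppose the decoder requests only a few (frame, state) pairs. Then the network runs only on the evaluated frames, and only for the requested states. Every other output is either never computed or approximated.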