Abstract:
A matrix is generated that stores sinusoidal components evaluated for a given sample rate corresponding to the matrix. The matrix is then used to convert an audio signal to chroma vectors representing a set of “chromae” (frequencies of interest). Converting a portion of an audio signal into its chromae enables more meaningful analysis than would be possible using the signal data alone. The chroma vectors of the audio signal can then be used for analyses such as comparison with the chroma vectors obtained from other audio signals in order to identify audio matches.
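A minimal sketch of the idea follows. It assumes that the chromae are the 12 equal-tempered pitch classes (A4 = 440 Hz), and that a chroma vector is obtained by projecting one frame onto precomputed cosine and sine rows and summing the resulting energy across octaves. The function names, pitch range, and normalization are illustrative assumptions, not the disclosed method.

```python
import numpy as np

def build_chroma_matrix(sample_rate, frame_len, min_midi=36, max_midi=96):
    """Hypothetical helper: precompute cosine and sine components for each
    equal-tempered pitch in [min_midi, max_midi), evaluated at sample_rate."""
    n = np.arange(frame_len)
    midi = np.arange(min_midi, max_midi)
    freqs = 440.0 * 2.0 ** ((midi - 69) / 12.0)            # pitch frequencies in Hz
    phase = 2.0 * np.pi * np.outer(freqs, n) / sample_rate  # shape (pitches, frame_len)
    return np.cos(phase), np.sin(phase), midi % 12          # pitch class: C=0 ... B=11

def frame_to_chroma(frame, cos_m, sin_m, pitch_class):
    # Project the frame onto each sinusoid; squared magnitude approximates pitch energy
    re = cos_m @ frame
    im = sin_m @ frame
    energy = re ** 2 + im ** 2
    # Fold octaves together by summing energy per pitch class
    chroma = np.bincount(pitch_class, weights=energy, minlength=12)
    total = chroma.sum()
    return chroma / total if total > 0 else chroma
```

Because the matrix depends only on the sample rate and frame length, it can be built once and reused for every frame of the signal.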
Abstract:
The present disclosure provides systems and methods that leverage machine learning to refine and/or predict sensor outputs for multiple sensors. In particular, systems and methods of the present disclosure can include and use a machine-learned virtual sensor model that has been trained to receive sensor data from multiple sensors indicative of one or more measured parameters in each sensor's physical environment, recognize correlations among the outputs of the multiple sensors, and, in response to receipt of the sensor data from the multiple sensors, output one or more virtual sensor output values. The one or more virtual sensor output values can include one or more refined sensor output values, one or more predicted future sensor output values, or both.
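As a deliberately simple stand-in for the machine-learned virtual sensor model, the sketch below fits an ordinary least-squares linear model that predicts every sensor's next reading from the current readings of all sensors, so cross-sensor correlations are captured in the weight matrix. The linear model, the function names, and the one-step-ahead setup are assumptions for illustration only.

```python
import numpy as np

def fit_virtual_sensor(history):
    """history: array of shape (T, S), readings from S sensors over T steps.
    Fit coefficients so that readings[t + 1] ~= [readings[t], 1] @ coef."""
    X = history[:-1]
    Y = history[1:]
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return coef  # shape (S + 1, S): every sensor's input feeds every output

def predict_next(coef, readings):
    # Predicted future output of all sensors from the current readings of all sensors
    return np.append(readings, 1.0) @ coef
```

A richer learned model (for example, a neural network) could replace the least-squares fit without changing the input/output shape, which is the part the sketch is meant to show.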
Abstract:
A method of identifying similar media items is described. The method includes identifying a first multiplicity of fingerprints representative of content segments of variable duration for a first media item and a second multiplicity of fingerprints representative of content segments of variable duration for a second media item. The method further includes comparing, by a processing device, a first group of the first multiplicity of fingerprints to a second group of the second multiplicity of fingerprints to generate a first similarity score indicative of a similarity between the first group of fingerprints and the second group of fingerprints. The method also includes determining an alignment score for the first multiplicity of fingerprints and the second multiplicity of fingerprints using the first similarity score.
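The comparison and alignment steps can be sketched as follows. The sketch assumes that each fingerprint is an integer hash, that fingerprints are compared in fixed-size consecutive groups using Jaccard similarity, and that the alignment score is a gapless local alignment (Smith-Waterman style, with a fixed per-step penalty) over the group similarity scores. All of these choices, including the helper names, are hypothetical, and the sketch ignores the variable segment durations described above.

```python
from typing import List, Sequence, Set

def group_fingerprints(fps: Sequence[int], size: int) -> List[Set[int]]:
    # Hypothetical grouping: consecutive runs of `size` fingerprints
    return [set(fps[i:i + size]) for i in range(0, len(fps), size)]

def jaccard(a: Set[int], b: Set[int]) -> float:
    # Group similarity score: shared fingerprints over all distinct fingerprints
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def alignment_score(fps_a: Sequence[int], fps_b: Sequence[int],
                    size: int = 4, penalty: float = 0.5) -> float:
    ga = group_fingerprints(fps_a, size)
    gb = group_fingerprints(fps_b, size)
    best = 0.0
    prev = [0.0] * (len(gb) + 1)
    for i in range(1, len(ga) + 1):
        cur = [0.0] * (len(gb) + 1)
        for j in range(1, len(gb) + 1):
            sim = jaccard(ga[i - 1], gb[j - 1])
            # Extend the diagonal run of matching groups; reset to 0 when the
            # run stops paying off (local alignment without gaps)
            cur[j] = max(0.0, prev[j - 1] + sim - penalty)
            best = max(best, cur[j])
        prev = cur
    return best
```

The local formulation lets a matching stretch be found even when one media item contains extra content before or after it.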
Abstract:
A match score provides a semantically meaningful quantification of the aural similarity of two chromae from two corresponding audio sequences. The match score can be applied to the chroma pairs of two corresponding audio sequences and is independent of the lengths of the sequences, thereby permitting comparisons of matches across subsequences of different lengths. Accordingly, a single cutoff match score for identifying “good” audio subsequence matches can be determined that achieves both good precision and good recall. The function for determining the match score is obtained by establishing a function PM indicating probabilities that chroma correspondence scores indicate semantic correspondences and a function PR indicating probabilities that chroma correspondence scores indicate random correspondences, and then repeatedly updating PM and the match function based on their existing values as applied to audio subsequences with known semantic correspondences.
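One plausible reading of the match function is a log-likelihood ratio, log(PM(s) / PR(s)), where s is a chroma correspondence score. The sketch below estimates PM and PR as smoothed histograms from matched and random chroma pairs, assuming that scores lie in [0, 1] (for example, a cosine similarity between non-negative chroma vectors). It performs a single estimation pass and does not reproduce the iterative refinement of PM described in the abstract. The binning, smoothing constant, and function names are assumptions.

```python
import numpy as np

# Hypothetical binning: chroma correspondence scores are assumed to lie in [0, 1]
BINS = np.linspace(0.0, 1.0, 21)

def _pdf(scores, eps=1e-6):
    # Normalized histogram with additive smoothing so no bin has zero probability
    hist, _ = np.histogram(scores, bins=BINS)
    hist = hist.astype(float) + eps
    return hist / hist.sum()

def _bin(scores):
    # Map each score to its histogram bin; a score of 1.0 falls in the last bin
    return np.clip(np.digitize(scores, BINS) - 1, 0, len(BINS) - 2)

def fit_match_function(matched_scores, random_scores):
    """Estimate PM from pairs with known semantic correspondence and PR from
    randomly paired chromae; return the per-bin log-likelihood ratio."""
    pm = _pdf(matched_scores)
    pr = _pdf(random_scores)
    return np.log(pm / pr)

def pair_match_score(match_fn, score):
    # Positive: more consistent with a semantic match; negative: with chance
    return float(match_fn[_bin(score)])

def subsequence_match_score(match_fn, pair_scores):
    # Mean over chroma pairs, so the value does not grow with subsequence length
    return float(np.mean(match_fn[_bin(np.asarray(pair_scores))]))
```

Because the subsequence score is a mean over chroma pairs, a single cutoff (for example, 0, the point at which PM and PR are equal for a single pair) can be applied to subsequences of different lengths.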
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for collaboration between multiple voice controlled devices are disclosed. In one aspect, a method includes the actions of identifying, by a first computing device, a second computing device that is configured to respond to a particular, predefined hotword; receiving audio data that corresponds to an utterance; receiving a transcription of additional audio data outputted by the second computing device in response to the utterance; based on the transcription of the additional audio data and based on the utterance, generating a transcription that corresponds to a response to the additional audio data; and providing, for output, the transcription that corresponds to the response.
Abstract:
The present disclosure provides systems and methods that leverage machine-learned models (e.g., neural networks) to provide enhanced communication assistance. In particular, the systems and methods of the present disclosure can include or otherwise leverage a machine-learned communication assistance model to detect problematic statements included in a communication and/or provide suggested replacement statements to respectively replace the problematic statements. In one particular example, the communication assistance model can include a long short-term memory recurrent neural network that detects an inappropriate tone or unintended meaning within a user-composed communication and provides one or more suggested replacement statements to replace the problematic statements.