Unsupervised learning of semantic audio representations

Invention Grant

US11335328B2 Unsupervised learning of semantic audio representations (In force)

Patent Title: Unsupervised learning of semantic audio representations
Application No.: US16758564

Application Date: 2018-10-26
Publication No.: US11335328B2

Publication Date: 2022-05-17
Inventor: Aren Jansen, Manoj Plakal, Richard Channing Moore, Shawn Hershey, Ratheet Pandya, Ryan Rifkin, Jiayang Liu, Daniel Ellis
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: McDonnell Boehnen Hulbert & Berghoff LLP
International Application: PCT/US2018/057734 (WO, 2018-10-26)
International Announcement: WO2019/084419 (WO, 2019-05-02)
Main IPC: G10L15/06
IPC: G10L15/06; G10L15/16; G10L15/02; G10L25/30; G06N3/04; G06N3/08; G10L25/18; G10L25/51

Abstract:

Methods are provided for generating training triplets that can be used to train multidimensional embeddings to represent the semantic content of non-speech sounds present in a corpus of audio recordings. These training triplets can be used with a triplet loss function to train the multidimensional embeddings such that the embeddings can be used to cluster the contents of a corpus of audio recordings, to facilitate a query-by-example lookup from the corpus, to allow a small number of manually labeled audio recordings to be generalized, or to facilitate some other audio classification task. The triplet sampling methods may be used individually or collectively, and each represents a respective heuristic about the semantic structure of audio recordings.
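The triplet loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the loss, so this assumes the common hinge formulation over squared Euclidean distances; the function names, the `margin` parameter, and the toy two-dimensional embeddings are all hypothetical and chosen only for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimensionality")
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss (assumed form, not taken from the patent).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`, measured in squared distance.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings: the anchor sits at the origin.
anchor = [0.0, 0.0]

# Satisfied triplet: the positive is close, the negative is far away.
satisfied = triplet_loss(anchor, [0.0, 1.0], [3.0, 0.0])

# Violated triplet: the negative is closer to the anchor than the positive.
violated = triplet_loss(anchor, [2.0, 0.0], [1.0, 0.0])

print(satisfied, violated)
```

In training, a loss of this kind would be averaged over many sampled triplets and minimized with respect to the parameters of the embedding model, so that each anchor ends up closer to its positive than to its negative.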

Public/Granted literature

US20200349921A1 Unsupervised Learning of Semantic Audio Representations, Public/Granted day: 2020-11-05

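IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/06	. Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice (G10L15/14 takes precedence)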