Self-supervised pitch estimation

Invention Grant

US11756530B2 Self-supervised pitch estimation (In force)

Patent Title: Self-supervised pitch estimation
Application No.: US17640579

Application Date: 2020-09-25
Publication No.: US11756530B2

Publication Date: 2023-09-12
Inventor: Marco Tagliasacchi, Mihajlo Velimirovic, Matthew Sharifi, Dominik Roblek, Christian Frank, Beat Gfeller
Applicant: GOOGLE LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: McDonnell Boehnen Hulbert & Berghoff LLP
International Application: PCT/US2020/052722 2020.09.25
International Announcement: WO2021/076297A 2021.04.22
Date entered country: 2022-03-04
Main IPC: G10L15/06
IPC: G10L15/06 ; G10L21/013 ; G10L25/30 ; G10L25/90

Abstract:

Example embodiments relate to techniques for training artificial neural networks or other machine-learning encoders to accurately predict the pitch of input audio samples in a semitone or otherwise logarithmically-scaled pitch space. An example method may include generating, from a sample of audio data, two training samples by applying two different pitch shifts to the sample of audio training data. This can be done by converting the sample of audio data into the frequency domain and then shifting the transformed data. These known shifts are then compared to the predicted pitches generated by applying the two training samples to the encoder. The encoder is then updated based on the comparison, such that the relative pitch output by the encoder is improved with respect to accuracy. One or more audio samples, labeled with absolute pitch values, can then be used to calibrate the relative pitch values generated by the trained encoder.
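
The comparison step and the calibration step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the patented implementation. It assumes that the frequency-domain data is a single log-frequency frame, that a pitch shift is applied by cropping the frame at two different starting bins, and that the trained encoder can be stood in for by a spectral centroid. The names shifted_crops, relative_pitch_error, centroid_encoder, calibrate and the scale factor sigma are all hypothetical.

```python
import numpy as np

def shifted_crops(frame, k1, k2, width):
    """Return two windows of a log-frequency frame, starting at bins k1 and k2."""
    return frame[k1:k1 + width], frame[k2:k2 + width]

def relative_pitch_error(encoder, frame, k1, k2, width, sigma=1.0):
    """Absolute gap between the predicted pitch difference and the known shift."""
    x1, x2 = shifted_crops(frame, k1, k2, width)
    predicted_diff = encoder(x1) - encoder(x2)
    # A window that starts k bins later sees every peak k bins lower,
    # so the expected difference is proportional to (k2 - k1).
    known_diff = sigma * (k2 - k1)
    return abs(predicted_diff - known_diff)

def centroid_encoder(x):
    """Stand-in for a trained encoder (assumption): spectral centroid in bins."""
    return float(np.sum(np.arange(len(x)) * x) / np.sum(x))

def calibrate(relative, absolute):
    """Least-squares line mapping relative outputs to labeled absolute pitch."""
    slope, intercept = np.polyfit(relative, absolute, 1)
    return slope, intercept

# Synthetic frame (assumption): one peak at bin 40 of a 128-bin log-frequency axis.
bins = np.arange(128)
frame = np.exp(-0.5 * ((bins - 40) / 2.0) ** 2)
error = relative_pitch_error(centroid_encoder, frame, k1=5, k2=12, width=96)
```

In a training loop, an error of this kind would serve as the loss that drives the encoder update, and calibrate would be fitted once afterwards on the few samples that carry absolute pitch labels.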

Public/Granted literature

US20220343896A1 SELF-SUPERVISED PITCH ESTIMATION Public/Granted day: 2022-10-27

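IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/06	.Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice (G10L15/14 takes precedence)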