Publication No.: US20220343896A1
Publication Date: 2022-10-27
Application No.: US17640579
Filing Date: 2020-09-25
Applicant: GOOGLE LLC
Inventor: Marco TAGLIASACCHI, Mihajlo VELIMIROVIC, Matthew SHARIFI, Dominik ROBLEK, Christian FRANK, Beat GFELLER
IPC: G10L15/06, G10L25/30, G10L25/90, G10L21/013
Abstract: Example embodiments relate to techniques for training artificial neural networks or other machine-learning encoders to accurately predict the pitch of input audio samples in a semitone or otherwise logarithmically-scaled pitch space. An example method may include generating two training samples from a sample of audio data by applying two different pitch shifts to that sample. This can be done by converting the sample of audio data into the frequency domain and then shifting the transformed data. The known shifts are then compared with the pitches that the encoder predicts when the two training samples are applied to it. The encoder is then updated based on the comparison, so that the relative pitch output by the encoder becomes more accurate. One or more audio samples, labeled with absolute pitch values, can then be used to calibrate the relative pitch values generated by the trained encoder.
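The steps in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the method claimed in the application: the frame is assumed to lie on a logarithmic frequency axis (so a pitch shift becomes a translation in bins), the shift is implemented as a crop at a different offset, `toy_encoder` is a hypothetical stand-in (a spectral centroid) for the trained encoder, and calibration is reduced to a single additive offset. The function names `make_shifted_pair`, `relative_pitch_error`, and `calibrate_offset` are likewise hypothetical.

```python
import numpy as np


def make_shifted_pair(frame, k1, k2, width):
    """Crop two windows from one log-frequency frame.

    Assumption: on a logarithmic frequency axis, cropping at offsets k1 and k2
    acts as two different, known pitch shifts (measured in bins).
    """
    return frame[k1:k1 + width], frame[k2:k2 + width]


def toy_encoder(window):
    """Hypothetical stand-in for the encoder: spectral centroid in bins."""
    bins = np.arange(len(window))
    return float(np.sum(bins * window) / np.sum(window))


def relative_pitch_error(pred1, pred2, k1, k2):
    """Predicted pitch difference minus the known shift difference.

    Cropping at a larger offset moves content to lower bin indices,
    so the expected difference pred1 - pred2 is k2 - k1.
    """
    return (pred1 - pred2) - (k2 - k1)


def calibrate_offset(relative_preds, absolute_labels):
    """Assumed calibration: one additive offset fitted on labeled samples."""
    preds = np.asarray(relative_preds, dtype=float)
    labels = np.asarray(absolute_labels, dtype=float)
    return float(np.mean(labels - preds))


# Usage on a synthetic frame with a single spectral peak.
frame = np.zeros(64)
frame[30] = 1.0
w1, w2 = make_shifted_pair(frame, k1=3, k2=8, width=50)
err = relative_pitch_error(toy_encoder(w1), toy_encoder(w2), k1=3, k2=8)
print("relative pitch error:", err)
```

In a training loop, a loss computed from this error would drive the encoder update. The offset returned by `calibrate_offset` would then be added to the relative outputs of the trained encoder to obtain absolute pitch values.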