-
Publication number: US20210217431A1
Publication date: 2021-07-15
Application number: US16740440
Application date: 2020-01-11
Applicant: SoundHound, Inc.
Inventor: Steve PEARSON
IPC: G10L21/013, G10L21/0208, G06N3/08, G06N20/00
Abstract: A voice morphing apparatus having adjustable parameters is described. The disclosed system and method include a voice morphing apparatus that morphs input audio to mask a speaker's identity. Parameter adjustment uses evaluation of an objective function that is based on the input audio and output of the voice morphing apparatus. The voice morphing apparatus includes objectives that are based adversarially on speaker identification and positively on audio fidelity. Thus, the voice morphing apparatus is adjusted to reduce identifiability of speakers while maintaining fidelity of the morphed audio. The voice morphing apparatus may be used as part of an automatic speech recognition system.
-
Publication number: US20240144910A1
Publication date: 2024-05-02
Application number: US18051507
Application date: 2022-10-31
Applicant: SoundHound, Inc.
Inventor: Steve PEARSON, Jon GROSSMAN
IPC: G10L13/047, G10L13/06
CPC classification number: G10L13/047, G10L13/06
Abstract: A neural TTS system is trained to generate key acoustic frames at variable rates while omitting other frames. The frame skipping depends on the acoustic features to be generated for the input text. The TTS system can interpolate frames between the key frames at a target rate for a vocoder to synthesize audio samples.
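The interpolation step described in this abstract might look roughly like the sketch below, which fills in frames between key frames on a fixed-rate grid. The abstract does not say which interpolation method is used, so linear interpolation is an assumption here, and the function name `interpolate_frames` and its parameters are hypothetical.

```python
from bisect import bisect_right


def interpolate_frames(key_times, key_frames, frame_period):
    """Resample variable-rate key frames onto a fixed-rate grid.

    key_times: strictly increasing timestamps of the key frames.
    key_frames: one feature vector (list of floats) per timestamp.
    frame_period: spacing of the output grid, in the same time unit.
    Linear interpolation is an assumption, not the patented method.
    """
    if len(key_times) != len(key_frames) or len(key_times) < 2:
        raise ValueError("need at least two key frames, one per timestamp")

    start, end = key_times[0], key_times[-1]
    # Number of whole frame periods that fit between first and last key frame.
    steps = int((end - start) / frame_period + 1e-9)

    frames = []
    for i in range(steps + 1):
        t = start + i * frame_period
        # Index of the key frame to the right of t, clamped to a valid pair.
        right = min(max(bisect_right(key_times, t), 1), len(key_times) - 1)
        left = right - 1
        span = key_times[right] - key_times[left]
        weight = min(max((t - key_times[left]) / span, 0.0), 1.0)
        frames.append([
            (1.0 - weight) * a + weight * b
            for a, b in zip(key_frames[left], key_frames[right])
        ])
    return frames
```

With key frames at 0, 40, and 50 ms and a 10 ms target period, the sketch yields six frames, four of which are interpolated inside the sparse 0 to 40 ms stretch.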
-
Publication number: US20210193159A1
Publication date: 2021-06-24
Application number: US16740378
Application date: 2020-01-10
Applicant: SoundHound, Inc.
Inventor: Steve PEARSON
Abstract: Systems and methods for training a voice morphing apparatus are described. The voice morphing apparatus is trained to morph input audio data to mask an identity of a speaker. Training is performed by evaluating an objective function that is a function of the input audio data and an output of the voice morphing apparatus. The objective function may have a first term that is based on speaker identification and a second term that is based on audio fidelity. By optimizing the objective function, parameters of the voice morphing apparatus may be adjusted so as to reduce a confidence of speaker identification and maintain an audio fidelity of the morphed audio data. The voice morphing apparatus, once trained, may be used as part of an automatic speech recognition system.
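As a rough illustration of the two-term objective described in this abstract, the sketch below adds a speaker-identification term to an audio-fidelity term, so that minimizing the sum lowers identification confidence while keeping the morphed audio close to the input. The abstract does not give the actual form of either term: the mean squared error used for fidelity, the additive weighting, and the names `morphing_objective` and `fidelity_weight` are all assumptions.

```python
def morphing_objective(speaker_confidence, original, morphed, fidelity_weight=1.0):
    """Value to minimize when adjusting the morphing parameters.

    speaker_confidence: score in [0, 1] from a speaker identifier run on the
        morphed audio (hypothetical input; lower means better masking).
    original, morphed: equal-length sequences of audio features or samples.
    fidelity_weight: hypothetical trade-off between the two terms.
    """
    if len(original) != len(morphed) or not original:
        raise ValueError("original and morphed must be non-empty and equal length")

    # Fidelity term: mean squared error, an assumed stand-in for whatever
    # fidelity measure the patent uses.
    distortion = sum((a - b) ** 2 for a, b in zip(original, morphed)) / len(original)

    # Identification term: the confidence itself, so minimizing the total
    # reduces identifiability.
    return speaker_confidence + fidelity_weight * distortion
```

A larger `fidelity_weight` favors audio that stays close to the input, while a smaller one favors stronger masking.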
-
-