Patent search ap:("Google LLC") AND inv:"Ravi Ganti" Page 1

1.

Invention publication
Diffusion Models for Generation of Audio Data Based on Descriptive Textual Prompts (Status: under examination, published)

Publication (announcement) number: US20240282294A1

Publication (announcement) date: 2024-08-22

Application number: US18651296

Application date: 2024-04-30

Applicant: Google LLC

Inventors: Qingqing Huang, Daniel Sung-Joon Park, Aren Jansen, Timo Immanuel Denk, Yue Li, Ravi Ganti, Dan Ellis, Tao Wang, Wei Han, Joonseok Lee

IPC: G10L15/06, G10L15/16

CPC classification number: G10L15/063, G10L15/16

Abstract: A corpus of textual data is generated with a machine-learned text generation model. The corpus of textual data includes a plurality of sentences. Each sentence is descriptive of a type of audio. For each of a plurality of audio recordings, the audio recording is processed with a machine-learned audio classification model to obtain training data including the audio recording and one or more sentences of the plurality of sentences closest to the audio recording within a joint audio-text embedding space of the machine-learned audio classification model. The sentence(s) are processed with a machine-learned generation model to obtain an intermediate representation of the one or more sentences. The intermediate representation is processed with a machine-learned cascaded diffusion model to obtain audio data. The machine-learned cascaded diffusion model is trained based on a difference between the audio data and the audio recording.
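The pairing step in the abstract (matching each audio recording to the sentence or sentences closest to it in a joint audio-text embedding space) can be illustrated with a minimal sketch. Everything below is hypothetical and is not taken from the patent: the function names, the toy three-dimensional embeddings, and the use of cosine similarity as the closeness measure are all assumptions, since the abstract does not say which metric or which models are used. In practice the embeddings would come from the machine-learned audio classification model rather than being written by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def closest_sentences(audio_embedding, sentence_embeddings, k=1):
    """Return the k sentences whose embeddings are closest to the audio embedding."""
    scored = [
        (cosine_similarity(audio_embedding, embedding), sentence)
        for sentence, embedding in sentence_embeddings.items()
    ]
    # Highest similarity first; Python's sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:k]]


def build_training_pairs(audio_embeddings, sentence_embeddings, k=1):
    """Pair every audio recording with its k closest descriptive sentences."""
    return [
        (audio_id, closest_sentences(embedding, sentence_embeddings, k))
        for audio_id, embedding in audio_embeddings.items()
    ]


# Toy data: hand-written vectors standing in for joint-space embeddings.
sentence_embeddings = {
    "a slow piano melody": [1.0, 0.0, 0.0],
    "a fast drum beat": [0.0, 1.0, 0.0],
    "a loud electric guitar solo": [0.0, 0.0, 1.0],
}
audio_embeddings = {
    "clip_001": [0.9, 0.1, 0.0],
    "clip_002": [0.1, 0.2, 0.95],
}

pairs = build_training_pairs(audio_embeddings, sentence_embeddings, k=1)
for audio_id, sentences in pairs:
    print(audio_id, sentences)
```

Each resulting (recording, sentences) pair corresponds to one training example as described in the abstract: the sentences are the conditioning input, and the recording is the target against which the generated audio is compared.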
