Publication number: US20250104693A1
Publication date: 2025-03-27
Application number: US18474484
Filing date: 2023-09-26
Applicant: Amazon Technologies, Inc.
Inventor: Constantinos Papayiannis, Roberto Barra Chicote, Trevor Michael Wood, James Garnet Droppo
IPC: G10L13/10, G10L13/047, G10L25/18
Abstract: Techniques are described for using a language model (e.g., a large language model (LLM)) to generate both a natural language response to a user input and prosody information (e.g., voice characteristics of the synthetic voice used to output the natural language response to the user). The prosody information may correspond to a natural language (e.g., text or tokenized) description, a spectrogram, and/or a latent representation of the voice characteristic(s) associated with the natural language response. In some embodiments, the natural language response and the prosody information may be generated by different portions of the language model's layers. In such embodiments, the output of the layer(s) configured to generate the natural language response may be provided to the layer(s) configured to generate the prosody information and used in generating the prosody information, and vice versa.
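
The layer arrangement described in the abstract can be illustrated with a toy sketch. This is not the applicant's implementation: the class `ToyModel`, the helper names, the layer sizes, the four-word vocabulary, and the three prosody dimensions are all hypothetical, and the weights are random. It shows only one direction of the data flow (response layers feeding prosody layers); the abstract also allows the reverse.

```python
import random
from dataclasses import dataclass

# Hypothetical toy sizes; the abstract does not specify any dimensions.
HIDDEN = 8
VOCAB = ["hello", "there", "friend", "!"]
PROSODY_DIMS = 3  # assumed to stand for e.g. pitch, rate, energy


def make_matrix(rows, cols, rng):
    """Random weight matrix stored as a list of rows."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


@dataclass
class ToyModel:
    shared: list        # layers common to both outputs
    text_head: list     # layers that produce the natural language response
    prosody_head: list  # layers that produce the prosody information

    def forward(self, x):
        # Shared layers with a ReLU nonlinearity.
        h = [max(0.0, a) for a in matvec(self.shared, x)]
        # Response layers: one score per vocabulary entry.
        text_logits = matvec(self.text_head, h)
        # Prosody layers receive the shared state concatenated with the
        # output of the response layers.
        prosody = matvec(self.prosody_head, h + text_logits)
        best = max(range(len(VOCAB)), key=text_logits.__getitem__)
        return VOCAB[best], prosody


def build(seed=0):
    """Construct a model with reproducible random weights."""
    rng = random.Random(seed)
    return ToyModel(
        shared=make_matrix(HIDDEN, HIDDEN, rng),
        text_head=make_matrix(len(VOCAB), HIDDEN, rng),
        prosody_head=make_matrix(PROSODY_DIMS, HIDDEN + len(VOCAB), rng),
    )
```

Calling `build().forward([1.0] * HIDDEN)` returns one vocabulary token and a list of three prosody values. In a real system the prosody output might instead be a text description, a spectrogram, or a latent vector, as the abstract lists.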