NATURAL LANGUAGE GENERATION
    5.
    Invention application

    Publication No.: US20250104693A1

    Publication date: 2025-03-27

    Application No.: US18474484

    Filing date: 2023-09-26

    Abstract: Techniques for using a language model (e.g., a large language model (LLM)) to generate a natural language response to a user input and prosody information (e.g., voice characteristics associated with a synthetic voice to output the natural language response to the user) are described. The prosody information may correspond to a natural language (e.g., text or tokenized) description, a spectrogram, and/or a latent representation of the voice characteristic(s) associated with the natural language response. In some embodiments, the natural language response and the prosody information may be generated by different portions of layers of the language model. In such embodiments, the output of the layer(s) of the language model configured to generate the natural language response may be provided to the layer(s) of the language model configured to generate the prosody information and the output may be used to generate the prosody information, and vice versa.

    Endpointing in speech processing
    6.
    Granted invention patent

    Publication No.: US12211517B1

    Publication date: 2025-01-28

    Application No.: US17475699

    Filing date: 2021-09-15

    Abstract: A speech-processing system may determine potential endpoints in a user's speech. Such endpoint prediction may include determining a potential endpoint in a stream of audio data, and may additionally include determining an endpoint score representing a likelihood that the potential endpoint represents an end of speech representing a complete user input. When the potential endpoint has been determined, the system may publish a transcript of speech that preceded the potential endpoint, and send it to downstream components. The system may continue to transcribe audio data and determine additional potential endpoints while the downstream components process the transcript. The downstream components may determine whether the transcript is complete (e.g., represents the entirety of the user input). Final endpoint determinations may be made based on the results of the downstream processing including automatic speech recognition, natural language understanding, etc.
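
The flow in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the frame format (a recognized word, or `None` for a silent frame), the pause threshold `min_pause_frames`, and the names `PotentialEndpoint`, `find_final_endpoint`, and `toy_is_complete` are all assumptions made for illustration. The sketch runs sequentially, whereas the abstract describes transcription continuing while downstream components work, and it omits the optional endpoint score.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class PotentialEndpoint:
    frame_index: int  # frame at which the pause threshold was reached
    transcript: str   # speech transcribed before this point


def find_final_endpoint(
    frames: Iterable[Optional[str]],
    is_complete: Callable[[str], bool],
    min_pause_frames: int = 2,
) -> Optional[PotentialEndpoint]:
    """Return the first potential endpoint that the downstream check accepts.

    Each frame is a recognized word, or None for silence (an assumed,
    simplified stand-in for streaming speech recognition output).
    """
    words: List[str] = []
    silence_run = 0
    for i, frame in enumerate(frames):
        if frame is not None:
            words.append(frame)
            silence_run = 0
            continue
        silence_run += 1
        # Declare one potential endpoint per pause, when it reaches the threshold.
        if silence_run == min_pause_frames and words:
            transcript = " ".join(words)
            # "Publish" the transcript to the downstream check.
            if is_complete(transcript):
                return PotentialEndpoint(i, transcript)
            # Otherwise keep transcribing and look for a later endpoint.
    return None


def toy_is_complete(transcript: str) -> bool:
    # Hypothetical stand-in for downstream language understanding: a trailing
    # connective suggests the user is still mid-sentence.
    return transcript.split()[-1] not in {"and", "to", "the", "for"}


frames = ["play", "jazz", "and", None, None, "blues", None, None, None]
result = find_final_endpoint(frames, toy_is_complete)
print(result)
```

In this example the pause after "and" produces a potential endpoint that the downstream check rejects, so processing continues and the pause after "blues" becomes the final endpoint.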
