-
Publication number: US20230074406A1
Publication date: 2023-03-09
Application number: US17532794
Filing date: 2021-11-22
Applicant: GOOGLE LLC
Inventor: Martin Baeuml, Thushan Amarasiriwardena, Roberto Pieraccini, Vikram Sridar, Daniel De Freitas Adiwardana, Noam M. Shazeer, Quoc Le
IPC: G10L15/183, G10L15/22
Abstract: As part of a dialog session between a user and an automated assistant, implementations can receive a stream of audio data that captures a spoken utterance including an assistant query, determine, based on processing the stream of audio data, a set of assistant outputs that are each predicted to be responsive to the assistant query, process, using large language model (LLM) output(s), the assistant outputs and context of the dialog session to generate a set of modified assistant outputs, and cause given modified assistant output, from among the set of modified assistant outputs, to be provided for presentation to the user in response to the spoken utterance. In some implementations, the LLM output(s) can be generated in an offline manner for subsequent use in an online manner. In additional or alternative implementations, the LLM output(s) can be generated in an online manner when the spoken utterance is received.
-
12.
Publication number: US20210193146A1
Publication date: 2021-06-24
Application number: US17192230
Filing date: 2021-03-04
Applicant: Google LLC
Inventor: Ulas Kirazci, Adam Coimbra, Abraham Lee, Wei Dong, Thushan Amarasiriwardena
IPC: G10L15/22, G06F9/448, G06F3/16, G10L13/027
Abstract: Techniques are described herein for multi-modal interaction between users, automated assistants, and other computing services. In various implementations, a user may engage with the automated assistant in order to further engage with a third party computing service. In some implementations, the user may advance through dialog state machines associated with third party computing service using both verbal input modalities and input modalities other than verbal modalities, such as visual/tactile modalities.
-
13.
Publication number: US20200294497A1
Publication date: 2020-09-17
Application number: US15774950
Filing date: 2018-05-07
Applicant: Google LLC
Inventor: Ulas Kirazci, Adam Coimbra, Abraham Lee, Wei Dong, Thushan Amarasiriwardena
IPC: G10L15/22, G10L13/027, G06F3/16, G06F9/448
Abstract: Techniques are described herein for multi-modal interaction between users, automated assistants, and other computing services. In various implementations, a user may engage with the automated assistant in order to further engage with a third party computing service. In some implementations, the user may advance through dialog state machines associated with third party computing service using both verbal input modalities and input modalities other than verbal modalities, such as visual/tactile modalities.
-
Publication number: US20250037711A1
Publication date: 2025-01-30
Application number: US18912175
Filing date: 2024-10-10
Applicant: GOOGLE LLC
Inventor: Martin Baeuml, Thushan Amarasiriwardena, Roberto Pieraccini, Vikram Sridar, Daniel De Freitas Adiwardana, Noam M. Shazeer, Quoc Le
IPC: G10L15/183, G06F16/9032, G10L15/22
Abstract: As part of a dialog session between a user and an automated assistant, implementations can receive a stream of audio data that captures a spoken utterance including an assistant query, determine, based on processing the stream of audio data, a set of assistant outputs that are each predicted to be responsive to the assistant query, process, using large language model (LLM) output(s), the assistant outputs and context of the dialog session to generate a set of modified assistant outputs, and cause given modified assistant output, from among the set of modified assistant outputs, to be provided for presentation to the user in response to the spoken utterance. In some implementations, the LLM output(s) can be generated in an offline manner for subsequent use in an online manner. In additional or alternative implementations, the LLM output(s) can be generated in an online manner when the spoken utterance is received.
-
15.
Publication number: US20240428793A1
Publication date: 2024-12-26
Application number: US18827655
Filing date: 2024-09-06
Applicant: GOOGLE LLC
Inventor: Ulas Kirazci, Adam Coimbra, Abraham Lee, Wei Dong, Thushan Amarasiriwardena
IPC: G10L15/22, G06F3/16, G06F9/448, G10L13/027
Abstract: Techniques are described herein for multi-modal interaction between users, automated assistants, and other computing services. In various implementations, a user may engage with the automated assistant in order to further engage with a third party computing service. In some implementations, the user may advance through dialog state machines associated with third party computing service using both verbal input modalities and input modalities other than verbal modalities, such as visual/tactile modalities.
-
Publication number: US12148421B2
Publication date: 2024-11-19
Application number: US17532794
Filing date: 2021-11-22
Applicant: GOOGLE LLC
Inventor: Martin Baeuml, Thushan Amarasiriwardena, Roberto Pieraccini, Vikram Sridar, Daniel De Freitas Adiwardana, Noam M. Shazeer, Quoc Le
IPC: G10L15/22, G06F16/9032, G10L15/183
Abstract: As part of a dialog session between a user and an automated assistant, implementations can receive a stream of audio data that captures a spoken utterance including an assistant query, determine, based on processing the stream of audio data, a set of assistant outputs that are each predicted to be responsive to the assistant query, process, using large language model (LLM) output(s), the assistant outputs and context of the dialog session to generate a set of modified assistant outputs, and cause given modified assistant output, from among the set of modified assistant outputs, to be provided for presentation to the user in response to the spoken utterance. In some implementations, the LLM output(s) can be generated in an offline manner for subsequent use in an online manner. In additional or alternative implementations, the LLM output(s) can be generated in an online manner when the spoken utterance is received.
-
17.
Publication number: US11347801B2
Publication date: 2022-05-31
Application number: US16240609
Filing date: 2019-01-04
Applicant: Google LLC
Inventor: Adam Coimbra, Ulas Kirazci, Abraham Lee, Wei Dong, Thushan Amarasiriwardena
IPC: G06F16/9032, G06F3/0485, G10L13/02, G10L15/22
Abstract: Techniques are described herein for multi-modal interaction between users, automated assistants, and other computing services. In various implementations, a user may engage with the automated assistant in order to further engage with a third party computing service. In some implementations, the user may advance through dialog state machines associated with third party computing service using both verbal input modalities and input modalities other than verbal modalities, such as visual/tactile modalities.
-
18.
Publication number: US20190341040A1
Publication date: 2019-11-07
Application number: US16269275
Filing date: 2019-02-06
Applicant: Google LLC
Inventor: Ulas Kirazci, Adam Coimbra, Abraham Lee, Wei Dong, Thushan Amarasiriwardena, Yudong Sun, Xiao Gao
IPC: G10L15/22, G10L13/02, G06F3/16, G06F3/0485
Abstract: Techniques are described herein for multi-modal interaction between users, automated assistants, and other computing services. In various implementations, a user may engage with the automated assistant in order to further engage with a third party computing service. In some implementations, the user may advance through dialog state machines associated with third party computing service using both verbal input modalities and input modalities other than verbal modalities, such as visual/tactile modalities.
-