-
Publication number: US20230343323A1
Publication date: 2023-10-26
Application number: US17726244
Filing date: 2022-04-21
Applicant: GOOGLE LLC
Inventor: Martin Baeuml, Thushan Amarasiriwardena, Roberto Pieraccini, Gianluca Martini
CPC classification number: G10L13/10, G10L15/22, G10L15/1815, G10L2015/223
Abstract: Implementations relate to dynamically adapting a given assistant output based on a given persona, from among a plurality of disparate personas, assigned to an automated assistant. In some implementations, the given assistant output can be generated and subsequently adapted based on the given persona assigned to the automated assistant. In other implementations, the given assistant output can be generated specific to the given persona and without having to subsequently adapt the given assistant output to the given persona. Notably, the given assistant output can include a stream of textual content to be synthesized for audible presentation to the user, and a stream of visual cues utilized in controlling a display of a client device and/or in controlling a visualized representation of the automated assistant. Various implementations utilize large language models (LLMs), or output previously generated utilizing LLMs, to reflect the given persona in the given assistant output.
-
Publication number: US20230274729A1
Publication date: 2023-08-31
Application number: US18312587
Filing date: 2023-05-04
Applicant: Google LLC
Inventor: Olga Kapralova, Evgeny A. Cherepanov, Dmitry Osmakov, Martin Baeuml, Gleb Skobeltsyn
CPC classification number: G10L15/063, G10L15/06, G10L15/22, G10L15/32, G10L15/01, G10L15/10, G10L2015/0635, G10L2015/0638
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving first audio data corresponding to an utterance; obtaining a first transcription of the first audio data; receiving data indicating (i) a selection of one or more terms of the first transcription and (ii) one or more of replacement terms; determining that one or more of the replacement terms are classified as a correction of one or more of the selected terms; in response to determining that the one or more of the replacement terms are classified as a correction of the one or more of the selected terms, obtaining a first portion of the first audio data that corresponds to one or more terms of the first transcription; and using the first portion of the first audio data that is associated with the one or more terms of the first transcription to train an acoustic model for recognizing the one or more of the replacement terms.
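The flow in this abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say how a replacement is classified as a correction, so the string-similarity check below is an assumption, as are the names `TimedWord`, `is_correction`, `training_example`, and the `threshold` value. Audio is modeled as a plain list of samples with hypothetical per-word sample offsets.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class TimedWord:
    """A transcribed term with hypothetical sample offsets into the audio."""
    text: str
    start: int  # index of the first audio sample for this term
    end: int    # index one past the last audio sample for this term


def is_correction(selected: str, replacement: str, threshold: float = 0.6) -> bool:
    """Assumed classifier: treat similar-looking terms as a correction.

    A real system would likely use a different signal; this is a stand-in.
    """
    ratio = SequenceMatcher(None, selected.lower(), replacement.lower()).ratio()
    return ratio >= threshold


def training_example(
    audio: Sequence[int],
    words: Sequence[TimedWord],
    selected_index: int,
    replacement: str,
) -> Optional[Tuple[Sequence[int], str]]:
    """Return (audio portion, corrected term) when the edit looks like a correction."""
    word = words[selected_index]
    if not is_correction(word.text, replacement):
        return None  # e.g. the user rewrote the message rather than fixing a misrecognition
    # The audio portion for the selected term is paired with the replacement term,
    # which is the pair an acoustic model could later be trained on.
    return audio[word.start:word.end], replacement
```

In this sketch, replacing "their" with "there" yields a training pair, while replacing "their" with "pizza" is treated as a content edit and yields nothing.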
-
Publication number: US10482882B2
Publication date: 2019-11-19
Application number: US15825919
Filing date: 2017-11-29
Applicant: Google LLC
Inventor: Vladimir Vuskovic, Stephan Wenger, Zineb Ait Bahajji, Martin Baeuml, Alexandru Dovlecel, Gleb Skobeltsyn
Abstract: Methods, apparatus, and computer readable media are described related to automated assistants that proactively incorporate, into human-to-computer dialog sessions, unsolicited content of potential interest to a user. In various implementations, based on content of an existing human-to-computer dialog session between a user and an automated assistant, an entity mentioned by the user or automated assistant may be identified. Fact(s) related to the entity or to another entity that is related to the entity may be identified based on entity data contained in database(s). For each of the fact(s), a corresponding measure of potential interest to the user may be determined. Unsolicited natural language content may then be generated that includes one or more of the facts selected based on the corresponding measure(s) of potential interest. The automated assistant may then incorporate the unsolicited content into the existing human-to-computer dialog session or a subsequent human-to-computer dialog session.
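The selection step described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the `Fact` type, the precomputed `interest` score, the `related` mapping, the `min_interest` cutoff, and the "By the way" phrasing are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Fact:
    entity: str
    text: str
    interest: float  # hypothetical measure of potential interest to the user


def select_facts(
    facts: Sequence[Fact],
    mentioned_entity: str,
    related: Dict[str, Sequence[str]],
    min_interest: float = 0.5,
    limit: int = 1,
) -> List[Fact]:
    """Pick the most interesting facts about the entity or its related entities."""
    relevant = {mentioned_entity} | set(related.get(mentioned_entity, ()))
    candidates = [
        f for f in facts if f.entity in relevant and f.interest >= min_interest
    ]
    # Highest measure of potential interest first, truncated to the limit.
    return sorted(candidates, key=lambda f: f.interest, reverse=True)[:limit]


def unsolicited_content(selected: Sequence[Fact]) -> str:
    """Render the selected facts as natural language; empty if nothing qualifies."""
    if not selected:
        return ""
    return "By the way, " + " Also, ".join(f.text for f in selected)
```

Returning an empty string when no fact clears the cutoff reflects one possible design choice: the assistant says nothing extra rather than forcing low-interest content into the dialog.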
-
Publication number: US12183342B2
Publication date: 2024-12-31
Application number: US18230581
Filing date: 2023-08-04
Applicant: GOOGLE LLC
Inventor: Vladimir Vuskovic, Stephan Wenger, Zineb Ait Bahajji, Martin Baeuml, Alexandru Dovlecel, Gleb Skobeltsyn
IPC: G10L15/22, G06F40/295, G06F40/35, G06F40/56, G10L15/18
Abstract: Methods, apparatus, and computer readable media are described related to automated assistants that proactively incorporate, into human-to-computer dialog sessions, unsolicited content of potential interest to a user. In various implementations, based on content of an existing human-to-computer dialog session between a user and an automated assistant, an entity mentioned by the user or automated assistant may be identified. Fact(s) related to the entity or to another entity that is related to the entity may be identified based on entity data contained in database(s). For each of the fact(s), a corresponding measure of potential interest to the user may be determined. Unsolicited natural language content may then be generated that includes one or more of the facts selected based on the corresponding measure(s) of potential interest. The automated assistant may then incorporate the unsolicited content into the existing human-to-computer dialog session or a subsequent human-to-computer dialog session.
-
Publication number: US20240311405A1
Publication date: 2024-09-19
Application number: US18337316
Filing date: 2023-06-19
Applicant: GOOGLE LLC
Inventor: Seungyeon Kim, Ankit Singh Rawat, Wittawat Jitkrittum, Hari Narasimhan, Sashank Reddi, Neha Gupta, Srinadh Bhojanapalli, Aditya Menon, Manzil Zaheer, Tal Schuster, Sanjiv Kumar, Toby Boyd, Zhifeng Chen, Emanuel Taropa, Vikram Kasivajhula, Trevor Strohman, Martin Baeuml, Leif Schelin, Yanping Huang
IPC: G06F16/332
CPC classification number: G06F16/3329
Abstract: Implementations disclose selecting, in response to receiving a request and from among multiple candidate generative models (e.g., multiple candidate large language models (LLMs)) with differing computational efficiencies, a particular generative model to utilize in generating a response to the request. Those implementations reduce latency and/or conserve computational resource(s) through selection, for various requests, of a more computationally efficient generative model for utilization in lieu of a less computationally efficient generative model. Further, those implementations seek to achieve such benefits, through utilization of more computationally efficient generative models, while also still selectively utilizing less computationally efficient generative models for certain requests to mitigate occurrences of a generated response being inaccurate and/or under-specified. This, in turn, can mitigate occurrences of computational and/or network inefficiencies that result from a user issuing a follow-up request to cure the inaccuracies and/or under-specification of a generated response.
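A minimal sketch of the routing idea in this abstract is shown below. The abstract does not state how the particular model is chosen, so everything here is an assumption for illustration: the `CandidateModel` fields, the word-count difficulty heuristic in `estimate_difficulty`, and the fallback rule in `select_model`.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class CandidateModel:
    name: str
    relative_cost: float   # hypothetical: higher means less computationally efficient
    max_difficulty: float  # hypothetical: hardest request this model is trusted with


def estimate_difficulty(request: str) -> float:
    """Toy stand-in for a learned scorer: longer requests score as harder (cap 1.0)."""
    return min(len(request.split()) / 50.0, 1.0)


def select_model(
    request: str,
    candidates: Sequence[CandidateModel],
    scorer: Callable[[str], float] = estimate_difficulty,
) -> CandidateModel:
    """Choose the cheapest candidate judged capable of handling the request."""
    difficulty = scorer(request)
    eligible = [m for m in candidates if m.max_difficulty >= difficulty]
    if not eligible:
        # No candidate is rated for this request; fall back to the most capable one.
        return max(candidates, key=lambda m: m.max_difficulty)
    return min(eligible, key=lambda m: m.relative_cost)
```

With a small and a large candidate, a short request is routed to the small model, and a long request goes to the large one, which is the trade-off the abstract describes: lower latency and cost where possible, with the more expensive model kept for requests that would otherwise risk an inaccurate or under-specified response.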
-
Publication number: US11887592B2
Publication date: 2024-01-30
Application number: US17411532
Filing date: 2021-08-25
Applicant: Google LLC
Inventor: Vladimir Vuskovic, Stephan Wenger, Zineb Ait Bahajji, Martin Baeuml, Alexandru Dovlecel, Gleb Skobeltsyn
IPC: G10L15/22, G06F40/35, G06F40/56, G06F40/295, G10L15/18
CPC classification number: G10L15/22, G06F40/295, G06F40/35, G06F40/56, G10L15/1815, G10L15/222, G10L2015/227
Abstract: Methods, apparatus, and computer readable media are described related to automated assistants that proactively incorporate, into human-to-computer dialog sessions, unsolicited content of potential interest to a user. In various implementations, based on content of an existing human-to-computer dialog session between a user and an automated assistant, an entity mentioned by the user or automated assistant may be identified. Fact(s) related to the entity or to another entity that is related to the entity may be identified based on entity data contained in database(s). For each of the fact(s), a corresponding measure of potential interest to the user may be determined. Unsolicited natural language content may then be generated that includes one or more of the facts selected based on the corresponding measure(s) of potential interest. The automated assistant may then incorporate the unsolicited content into the existing human-to-computer dialog session or a subsequent human-to-computer dialog session.
-
Publication number: US11682381B2
Publication date: 2023-06-20
Application number: US17457421
Filing date: 2021-12-02
Applicant: Google LLC
Inventor: Olga Kapralova, Evgeny A. Cherepanov, Dmitry Osmakov, Martin Baeuml, Gleb Skobeltsyn
CPC classification number: G10L15/063, G10L15/01, G10L15/06, G10L15/10, G10L15/22, G10L15/32, G10L2015/0635, G10L2015/0638
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving first audio data corresponding to an utterance; obtaining a first transcription of the first audio data; receiving data indicating (i) a selection of one or more terms of the first transcription and (ii) one or more of replacement terms; determining that one or more of the replacement terms are classified as a correction of one or more of the selected terms; in response to determining that the one or more of the replacement terms are classified as a correction of the one or more of the selected terms, obtaining a first portion of the first audio data that corresponds to one or more terms of the first transcription; and using the first portion of the first audio data that is associated with the one or more terms of the first transcription to train an acoustic model for recognizing the one or more of the replacement terms.
-
Publication number: US20230074406A1
Publication date: 2023-03-09
Application number: US17532794
Filing date: 2021-11-22
Applicant: GOOGLE LLC
Inventor: Martin Baeuml, Thushan Amarasiriwardena, Roberto Pieraccini, Vikram Sridar, Daniel De Freitas Adiwardana, Noam M. Shazeer, Quoc Le
IPC: G10L15/183, G10L15/22
Abstract: As part of a dialog session between a user and an automated assistant, implementations can receive a stream of audio data that captures a spoken utterance including an assistant query, determine, based on processing the stream of audio data, a set of assistant outputs that are each predicted to be responsive to the assistant query, process, using large language model (LLM) output(s), the assistant outputs and context of the dialog session to generate a set of modified assistant outputs, and cause given modified assistant output, from among the set of modified assistant outputs, to be provided for presentation to the user in response to the spoken utterance. In some implementations, the LLM output(s) can be generated in an offline manner for subsequent use in an online manner. In additional or alternative implementations, the LLM output(s) can be generated in an online manner when the spoken utterance is received.
-
Publication number: US20230343324A1
Publication date: 2023-10-26
Application number: US17744440
Filing date: 2022-05-13
Applicant: GOOGLE LLC
Inventor: Martin Baeuml, Thushan Amarasiriwardena, Roberto Pieraccini, Gianluca Martini
IPC: G06V40/20, G10L25/57, G10L15/06, H04N5/04, G06F40/169, G10L15/183, G06T7/20, G10L13/08, G10L15/22, G06V20/40, G10L13/02
CPC classification number: G10L15/22, G06F40/169, G06T7/20, G06V20/40, G06V40/20, G10L13/02, G10L13/08, G10L15/063, G10L15/183, G10L25/57, H04N5/04, G06T2207/10016, G06T2207/30196
Abstract: Implementations relate to dynamically adapting a given assistant output based on a given persona, from among a plurality of disparate personas, assigned to an automated assistant. In some implementations, the given assistant output can be generated and subsequently adapted based on the given persona assigned to the automated assistant. In other implementations, the given assistant output can be generated specific to the given persona and without having to subsequently adapt the given assistant output to the given persona. Notably, the given assistant output can include a stream of textual content to be synthesized for audible presentation to the user, and a stream of visual cues utilized in controlling a display of a client device and/or in controlling a visualized representation of the automated assistant. Various implementations utilize large language models (LLMs), or output previously generated utilizing LLMs, to reflect the given persona in the given assistant output.
-
Publication number: US11200887B2
Publication date: 2021-12-14
Application number: US16837393
Filing date: 2020-04-01
Applicant: Google LLC
Inventor: Olga Kapralova, Evgeny A. Cherepanov, Dmitry Osmakov, Martin Baeuml, Gleb Skobeltsyn
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving first audio data corresponding to an utterance; obtaining a first transcription of the first audio data; receiving data indicating (i) a selection of one or more terms of the first transcription and (ii) one or more of replacement terms; determining that one or more of the replacement terms are classified as a correction of one or more of the selected terms; in response to determining that the one or more of the replacement terms are classified as a correction of the one or more of the selected terms, obtaining a first portion of the first audio data that corresponds to one or more terms of the first transcription; and using the first portion of the first audio data that is associated with the one or more terms of the first transcription to train an acoustic model for recognizing the one or more of the replacement terms.
-