Patent search ap:("Google LLC") AND inv:"Agoston Weisz" Page 1

1.

Invention publication
Voice-history Based Speech Biasing (Status: Under examination, published)

Publication (announcement) number: US20240194188A1

Publication (announcement) date: 2024-06-13

Application number: US18063118

Application date: 2022-12-08

Applicant: Google LLC

Inventor: Agoston Weisz, Mikhail Dektiarev

IPC: G10L15/07, G10L15/30

CPC classification number: G10L15/07, G10L15/30

Abstract: A method of using voice query history to improve speech recognition includes receiving audio data corresponding to a current query spoken by a user and processing the audio data to generate a lattice of candidate hypotheses. The method also includes obtaining voice query history data associated with the user that includes n-grams extracted from transcriptions of previous queries spoken by the user, and generating, using a biasing context model configured to receive the voice query history data, a biasing context vector. The biasing context vector indicates a likelihood that each n-gram from the n-grams extracted from the transcriptions of the previous queries spoken by the user will appear in the current query. The method also includes augmenting the lattice of candidate hypotheses based on the biasing context vector and determining a transcription for the current query based on the augmented lattice of candidate hypotheses.
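
The biasing step described in this abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not in the patent record: an n-best list of `(text, score)` pairs stands in for the lattice, the biasing context vector is given directly as a dictionary mapping each history n-gram to a likelihood (rather than produced by a trained biasing context model), and the additive score boost with a `weight` parameter is only one plausible way to augment the candidates. All function and parameter names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def extract_ngrams(text: str, max_n: int = 2) -> Set[str]:
    """Return all word n-grams of length 1..max_n found in text."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def transcribe_with_bias(
    hypotheses: List[Tuple[str, float]],
    bias: Dict[str, float],
    weight: float = 1.0,
) -> str:
    """Boost each hypothesis by the likelihoods of the history n-grams it
    contains, then return the highest-scoring text.

    hypotheses: (text, log-score) pairs standing in for the lattice.
    bias: hypothetical biasing context vector, n-gram -> likelihood.
    """
    rescored = []
    for text, score in hypotheses:
        boost = sum(bias.get(gram, 0.0) for gram in extract_ngrams(text))
        rescored.append((text, score + weight * boost))
    return max(rescored, key=lambda pair: pair[1])[0]


# Two acoustically similar candidates; the user's history favors "john smyth".
candidates = [("call jon smith", -1.0), ("call john smyth", -1.2)]
history_bias = {"john smyth": 0.9}
best = transcribe_with_bias(candidates, history_bias)
```

With an empty `bias` dictionary the function falls back to the unbiased top candidate, which makes the effect of the voice query history easy to isolate.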

2.

Invention application
GENERATING MULTI-MODAL RESPONSE(S) THROUGH UTILIZATION OF LARGE LANGUAGE MODEL(S) AND OTHER GENERATIVE MODEL(S) (Status: In force)

Publication (announcement) number: US20250139379A1

Publication (announcement) date: 2025-05-01

Application number: US18385270

Application date: 2023-10-30

Applicant: GOOGLE LLC

Inventor: Sanil Jain, Wei Yu, Alessandro Agostini, Agoston Weisz, Michael Andrew Goodman, Attila Dankovics, Elle Chae, Evgeny Sluzhaev, Amin Ghafouri, Golnaz Ghiasi, Igor Petrovski, Konstantin Shagin, Marcelo Menegali, Oscar Akerlund, Rakesh Shivanna, Thang Luong, Tiffany Chen, Vikas Peswani, Yifeng Lu

IPC: G06F40/40, G06F16/483

Abstract: Implementations relate to generating multi-modal response(s) through utilization of large language model(s) (LLM(s)) and other generative model(s). Processor(s) of a system can: receive natural language (NL) based input, generate a multi-modal response that is responsive to the NL based input, and cause the multi-modal response to be rendered. In some implementations, and in generating the multi-modal response, the processor(s) can process, using an LLM, LLM input to generate LLM output, and determine, based on the LLM output, textual content and generative multimedia content for inclusion in the multi-modal response. In some implementations, the generative multimedia content can be generated by another generative model (e.g., an image generator, a video generator, an audio generator, etc.) based on generative multimedia content prompt(s) included in the LLM output and that is indicative of the generative multimedia content. In various implementations, the generative multimedia content can be interleaved between segments of the textual content.

3.

Invention application
GENERATING MULTI-MODAL RESPONSE(S) THROUGH UTILIZATION OF LARGE LANGUAGE MODEL(S) (Status: In force)

Publication (announcement) number: US20250053751A1

Publication (announcement) date: 2025-02-13

Application number: US18413495

Application date: 2024-01-16

Applicant: GOOGLE LLC

Inventor: Oscar Akerlund, Evgeny Sluzhaev, Golnaz Ghiasi, Thang Luong, Yifeng Lu, Igor Petrovski, Agoston Weisz, Wei Yu, Rakesh Shivanna, Michael Andrew Goodman, Apoorv Kulshreshtha, Yu Du, Amin Ghafouri, Sanil Jain, Dustin Tran, Vikas Peswani, YaGuang Li

IPC: G06F40/40

Abstract: Implementations relate to generating multi-modal response(s) through utilization of large language model(s) (LLM(s)). Processor(s) of a system can: receive natural language (NL) based input, generate a multi-modal response that is responsive to the NL based input, and cause the multi-modal response to be rendered. In some implementations, and in generating the multi-modal response, the processor(s) can process, using an LLM, LLM input (e.g., that includes at least the NL based input) to generate LLM output, and determine, based on the LLM output, textual content for inclusion in the multi-modal response and multimedia content for inclusion in the multi-modal response. In some implementations, the multimedia content can be obtained based on a multimedia content tag that is included in the LLM output and that is indicative of the multimedia content. In various implementations, the multimedia content can be interleaved between segments of the textual content.
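
The tag-based interleaving described in this abstract can be sketched roughly as follows. The record does not say what a multimedia content tag looks like, so the `[MEDIA:...]` syntax here is purely hypothetical, as are the function names and the `fetch` callback that stands in for whatever retrieval step obtains the multimedia content.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical tag syntax: [MEDIA:<description of the content>]
MEDIA_TAG = re.compile(r"\[MEDIA:([^\]]+)\]")


def build_multimodal_response(
    llm_output: str,
    fetch: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Split LLM output into text segments and media items, in order.

    re.split with one capture group returns alternating pieces:
    even indices are plain text, odd indices are the captured tag bodies.
    """
    response: List[Tuple[str, str]] = []
    for index, part in enumerate(MEDIA_TAG.split(llm_output)):
        if index % 2 == 0:
            text = part.strip()
            if text:  # skip empty segments between or around tags
                response.append(("text", text))
        else:
            response.append(("media", fetch(part.strip())))
    return response


# A stand-in retrieval function; a real system would look up an image,
# video, or audio item indicated by the tag.
def fake_fetch(description: str) -> str:
    return f"media://{description}"


output = "Here is a cat. [MEDIA:cat photo] Cats sleep a lot."
segments = build_multimodal_response(output, fake_fetch)
```

The resulting list keeps the media item between the two text segments, which is the interleaving the abstract refers to.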

4.

Invention publication
Training Speech Recognizers Based On Biased Transcriptions (Status: Under examination, published)

Publication (announcement) number: US20240257799A1

Publication (announcement) date: 2024-08-01

Application number: US18161608

Application date: 2023-01-30

Applicant: Google LLC

Inventor: Dragan Zivkovic, Agoston Weisz

IPC: G10L15/06, G10L15/08, G10L15/22

CPC classification number: G10L15/063, G10L15/08, G10L15/22, G10L2015/0636, G10L2015/088, G10L2015/223

Abstract: A method includes receiving a biased transcription for a voice command spoken by a user and captured by a user device, the biased transcription biased to include a biasing phrase from a set of biasing phrases specific to the user. The method also includes instructing an application executing on the user device to perform an action specified by the biased transcription for the voice command, and receiving one or more user behavior signals responsive to the application performing the action specified by the biased transcription. The method further includes generating, as output from a confidence model, a confidence score of the biased transcription based on the one or more user behavior signals input to the confidence model and, based on the confidence score output from the confidence model, training a speech recognizer on the biased transcription.
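
The confidence-gated training described in this abstract might look something like the sketch below. Everything concrete in it is an assumption: the abstract names neither the user behavior signals nor the form of the confidence model, so the three signal fields, the hand-set logistic weights, and the 0.8 threshold are hypothetical placeholders for a learned model.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BehaviorSignals:
    """Hypothetical signals observed after the action was performed."""
    user_cancelled: bool       # user aborted the action
    user_repeated_query: bool  # user immediately re-issued the command
    seconds_engaged: float     # time spent with the result


def confidence_score(signals: BehaviorSignals) -> float:
    """Toy confidence model: logistic function over hand-set weights."""
    z = (
        1.0
        - 3.0 * signals.user_cancelled
        - 2.0 * signals.user_repeated_query
        + 0.1 * min(signals.seconds_engaged, 30.0)
    )
    return 1.0 / (1.0 + math.exp(-z))


def select_training_examples(
    examples: List[Tuple[str, str, BehaviorSignals]],
    threshold: float = 0.8,
) -> List[Tuple[str, str]]:
    """Keep (audio_id, biased_transcription) pairs whose confidence
    score meets the threshold; these would be used for training."""
    return [
        (audio_id, transcription)
        for audio_id, transcription, signals in examples
        if confidence_score(signals) >= threshold
    ]


examples = [
    ("utt-1", "call john smyth", BehaviorSignals(False, False, 20.0)),
    ("utt-2", "play jon smith", BehaviorSignals(True, False, 1.0)),
]
selected = select_training_examples(examples)
```

Here the first utterance is kept because the user stayed engaged, while the second is dropped because the user cancelled the action.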
