-
Publication Number: US20240153297A1
Publication Date: 2024-05-09
Application Number: US18501982
Filing Date: 2023-11-03
Applicant: Google LLC
Inventor: Zizhao Zhang , Zifeng Wang , Vincent Perot , Jacob Devlin , Chen-Yu Lee , Guolong Su , Hao Zhang , Tomas Jon Pfister
IPC: G06V30/24 , G06F16/21 , G06V30/19 , G06V30/412
CPC classification number: G06V30/24 , G06F16/211 , G06V30/19147 , G06V30/412
Abstract: A method for extracting entities comprises obtaining a document that includes a series of textual fields that includes a plurality of entities. Each entity represents information associated with a predefined category. The method includes generating, using the document, a series of tokens representing the series of textual fields. The method includes generating an entity prompt that includes the series of tokens and one of the plurality of entities and generating a schema prompt that includes a schema associated with the document. The method includes generating a model query that includes the entity prompt and the schema prompt and determining, using an entity extraction model and the model query, a location of the one of the plurality of entities among the series of tokens. The method includes extracting, from the document, the one of the plurality of entities using the location of the one of the plurality of entities.
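Below is a minimal sketch of how a model query combining an entity prompt and a schema prompt might be assembled and used to locate an entity span among the document tokens. The prompt format, the `[ENTITY]` marker, and the naive span-matching stand-in for the entity extraction model are illustrative assumptions, not the patented implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ModelQuery:
    entity_prompt: str   # serialized document tokens plus the entity of interest
    schema_prompt: str   # schema associated with the document

def build_model_query(tokens: List[str], entity: str, schema: dict) -> ModelQuery:
    # Entity prompt: the token series followed by the entity category to locate.
    entity_prompt = " ".join(tokens) + f" [ENTITY] {entity}"
    # Schema prompt: a flat rendering of the document schema.
    schema_prompt = " ".join(f"{name}:{kind}" for name, kind in schema.items())
    return ModelQuery(entity_prompt, schema_prompt)

def locate_entity(query: ModelQuery, tokens: List[str]) -> Tuple[int, int]:
    """Stand-in for the entity extraction model: returns (start, end) token
    indices for the requested entity.  A real model would score spans given
    the full model query; a naive keyword match is used here only so the
    sketch runs end to end."""
    entity = query.entity_prompt.split("[ENTITY]")[-1].strip().lower()
    for i, token in enumerate(tokens):
        if entity in token.lower():
            return i, i + 1
    return 0, 0

tokens = ["Invoice", "Date:", "2023-11-03", "Total:", "$120.00"]
schema = {"invoice_date": "date", "total_amount": "currency"}
query = build_model_query(tokens, "total", schema)
start, end = locate_entity(query, tokens)
print(tokens[start:end])  # the entity tokens extracted at the predicted location
```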
-
Publication Number: US20240290320A1
Publication Date: 2024-08-29
Application Number: US18585020
Filing Date: 2024-02-22
Applicant: Google LLC
Inventor: Wenqian Huang , Hao Zhang , Shankar Kumar , Shuo-yiin Chang , Tara N. Sainath
CPC classification number: G10L15/063 , G06F40/30 , G10L15/26
Abstract: A joint segmenting and ASR model includes an encoder to receive a sequence of acoustic frames and generate, at each of a plurality of output steps, a higher order feature representation for a corresponding acoustic frame. The model also includes a decoder to generate, based on the higher order feature representation at each of the plurality of output steps, a probability distribution over possible speech recognition hypotheses and an indication of whether the corresponding output step corresponds to an end of segment (EOS). The model is trained on a set of training samples, each training sample including audio data characterizing multiple segments of long-form speech and a corresponding transcription of the long-form speech, the corresponding transcription annotated with ground-truth EOS labels obtained via distillation from a language model teacher that receives the corresponding transcription as input and injects the ground-truth EOS labels into the corresponding transcription between semantically complete segments.
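A minimal sketch of a decoder head that emits both a token distribution and an end-of-segment probability at each output step. The layer sizes, the sigmoid EOS head, and the `JointSegmentingDecoder` name are assumptions for illustration, not the model described in the filing.

```python
import torch
import torch.nn as nn

class JointSegmentingDecoder(nn.Module):
    """Toy decoder head: given a higher-order feature representation per
    output step, emit a distribution over speech recognition hypotheses and
    an end-of-segment (EOS) probability for the same step."""
    def __init__(self, feature_dim: int = 256, vocab_size: int = 4096):
        super().__init__()
        self.token_head = nn.Linear(feature_dim, vocab_size)
        self.eos_head = nn.Linear(feature_dim, 1)

    def forward(self, features: torch.Tensor):
        # features: (batch, output_steps, feature_dim) from the encoder
        token_probs = torch.softmax(self.token_head(features), dim=-1)
        eos_probs = torch.sigmoid(self.eos_head(features)).squeeze(-1)
        return token_probs, eos_probs

decoder = JointSegmentingDecoder()
features = torch.randn(1, 10, 256)        # 10 encoder output steps
token_probs, eos_probs = decoder(features)
segment_ends = eos_probs > 0.5            # steps flagged as segment boundaries
print(token_probs.shape, segment_ends.shape)
```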
-
Publication Number: US20240153495A1
Publication Date: 2024-05-09
Application Number: US18494984
Filing Date: 2023-10-26
Applicant: Google LLC
Inventor: Weiran Wang , Ding Zhao , Shaojin Ding , Hao Zhang , Shuo-yiin Chang , David Johannes Rybach , Tara N. Sainath , Yanzhang He , Ian McGraw , Shankar Kumar
IPC: G10L15/06 , G06F40/284 , G10L15/26
CPC classification number: G10L15/063 , G06F40/284 , G10L15/26
Abstract: A method includes receiving a training dataset that includes one or more spoken training utterances for training an automatic speech recognition (ASR) model. Each spoken training utterance in the training dataset is paired with a corresponding transcription and a corresponding target sequence of auxiliary tokens. For each spoken training utterance, the method includes generating a speech recognition hypothesis for a corresponding spoken training utterance, determining a speech recognition loss based on the speech recognition hypothesis and the corresponding transcription, generating a predicted auxiliary token for the corresponding spoken training utterance, and determining an auxiliary task loss based on the predicted auxiliary token and the corresponding target sequence of auxiliary tokens. The method also includes training the ASR model jointly on the speech recognition loss and the auxiliary task loss determined for each spoken training utterance.
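A minimal sketch of the joint objective: a weighted sum of the speech recognition loss and the auxiliary task loss. The cross-entropy criteria and the `aux_weight` factor are assumptions kept only to make the sketch self-contained; real ASR systems typically use an RNN-T or CTC criterion for the recognition term.

```python
import torch
import torch.nn.functional as F

def joint_loss(asr_logits, transcript_ids, aux_logits, aux_target_ids, aux_weight=0.3):
    """Toy joint objective: speech recognition loss plus a weighted
    auxiliary-token loss, both computed as token-level cross-entropy."""
    asr_loss = F.cross_entropy(asr_logits.transpose(1, 2), transcript_ids)
    aux_loss = F.cross_entropy(aux_logits.transpose(1, 2), aux_target_ids)
    return asr_loss + aux_weight * aux_loss

# Shapes: (batch, sequence_length, vocab) for logits, (batch, sequence_length) for targets.
asr_logits = torch.randn(2, 12, 500)
transcripts = torch.randint(0, 500, (2, 12))
aux_logits = torch.randn(2, 12, 8)
aux_targets = torch.randint(0, 8, (2, 12))
print(joint_loss(asr_logits, transcripts, aux_logits, aux_targets))
```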
-
Publication Number: US20220335049A1
Publication Date: 2022-10-20
Application Number: US17720862
Filing Date: 2022-04-14
Applicant: Google LLC
Inventor: Vahit Hakan Hacigumus , Ankur Agiwal , Kevin I. Lai , Gokulnath Babu Manoharan , Indrajit Roy , Jagan Sankaranarayanan , Hao Zhang , Tao Zou , Rajesh Sambavarvadakarai Rajagopalan
IPC: G06F16/2457 , G06F16/2455 , G06F16/23 , G06F16/22 , G06F9/46
Abstract: The present disclosure describes an analytical data management system (ADMS) that serves critical dashboards, applications, and internal users. The ADMS provides high scalability, availability through replication and failover, and support for high user query loads and large data volumes. It provides continuous ingestion and high-performance querying with tunable freshness. It further advances the idea of disaggregation by decoupling its architectural components: ingestion, indexing, and querying. As a result, the impact of a slowdown in indexing on query performance is minimized by either trading off data freshness or incurring higher costs.
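A minimal sketch of how tunable freshness might be expressed at query time in a design where ingestion, indexing, and querying are decoupled. The class names, the `max_staleness_s` parameter, and the delta/base split are illustrative assumptions, not the ADMS interface.

```python
import time
from dataclasses import dataclass, field
from typing import List

@dataclass
class Delta:
    """A batch of continuously ingested rows not yet merged into the indexed base."""
    rows: List[dict]
    ingest_time: float

@dataclass
class DisaggregatedStore:
    base: List[dict] = field(default_factory=list)      # indexed, query-optimized data
    deltas: List[Delta] = field(default_factory=list)   # freshly ingested, unindexed data

    def ingest(self, rows: List[dict]) -> None:
        # Ingestion proceeds independently of indexing and querying.
        self.deltas.append(Delta(rows, time.time()))

    def index_step(self) -> None:
        # Indexing merges the oldest delta into the base when it gets around to it.
        if self.deltas:
            self.base.extend(self.deltas.pop(0).rows)

    def query(self, predicate, max_staleness_s: float = 60.0) -> List[dict]:
        # Tunable freshness: if the indexed base is fresh enough, serve it alone;
        # otherwise also scan unindexed deltas, trading query cost for freshness.
        indexed_lag = (time.time() - self.deltas[0].ingest_time) if self.deltas else 0.0
        rows = list(self.base)
        if indexed_lag > max_staleness_s:
            rows += [r for d in self.deltas for r in d.rows]
        return [r for r in rows if predicate(r)]

store = DisaggregatedStore()
store.ingest([{"user": "a", "clicks": 3}])
print(store.query(lambda r: r["clicks"] > 0, max_staleness_s=0.0))   # fresh: scans deltas
print(store.query(lambda r: r["clicks"] > 0, max_staleness_s=60.0))  # cheaper: base only, may be stale
```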
-
Publication Number: US11367432B2
Publication Date: 2022-06-21
Application Number: US16830996
Filing Date: 2020-03-26
Applicant: Google LLC
Inventor: Charles Caleb Peyser , Hao Zhang , Tara N. Sainath , Zelin Wu
Abstract: A method for generating final transcriptions representing numerical sequences of utterances in a written domain includes receiving audio data for an utterance containing a numeric sequence, and decoding, using a sequence-to-sequence speech recognition model, the audio data for the utterance to generate, as output from the sequence-to-sequence speech recognition model, an intermediate transcription of the utterance. The method also includes processing, using a neural corrector/denormer, the intermediate transcription to generate a final transcription that represents the numeric sequence of the utterance in a written domain. The neural corrector/denormer is trained on a set of training samples, where each training sample includes a speech recognition hypothesis for a training utterance and a ground-truth transcription of the training utterance. The ground-truth transcription of the training utterance is in the written domain. The method also includes providing the final transcription representing the numeric sequence of the utterance in the written domain for output.
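A minimal sketch of the two-stage flow: a sequence-to-sequence recognizer yields an intermediate (spoken-domain) transcription, and a corrector/denormer maps it into the written domain. The rule-based digit rewriting below merely stands in for the trained neural corrector/denormer, and the placeholder recognizer output is an assumption.

```python
def asr_decode(audio) -> str:
    """Placeholder for the sequence-to-sequence speech recognizer, which
    would emit a spoken-domain intermediate transcription."""
    return "my number is one two three four five"

SPOKEN_DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
                 "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}

def denorm(intermediate: str) -> str:
    """Stand-in for the neural corrector/denormer: rewrites runs of spoken
    digit words into a written-domain numeric sequence."""
    out, digits = [], []
    for word in intermediate.split():
        if word in SPOKEN_DIGITS:
            digits.append(SPOKEN_DIGITS[word])
        else:
            if digits:
                out.append("".join(digits))
                digits = []
            out.append(word)
    if digits:
        out.append("".join(digits))
    return " ".join(out)

final = denorm(asr_decode(audio=None))
print(final)  # "my number is 12345"
```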
-
Publication Number: US20250068847A1
Publication Date: 2025-02-27
Application Number: US18453236
Filing Date: 2023-08-21
Applicant: Google LLC
Inventor: Vincent Perot , Florian Luisier , Kai Kang , Ramya Sree Boppana , Jiaqi Mu , Xiaoyu Sun , Carl Elie Saroufim , Guolong Su , Hao Zhang , Nikolay Alexeevich Glushnev , Nan Hua , Yun-Hsuan Sung , Michael Yiupun Kwong
IPC: G06F40/295 , G06V30/19
Abstract: Systems and methods for performing document entity extraction are described herein. The method can include receiving an inference document and a target schema. The method can also include generating one or more document inputs from the inference document and one or more schema inputs from the target schema. The method can further include, for each combination of the document input and schema input, obtaining one or more extraction inputs by generating a respective extraction input based on the combination, providing the respective extraction input to the machine-learned model, and receiving a respective output of the machine-learned model based on the respective extraction input, the respective output including extracted entity data. The method can also include validating the extracted entity data based on reference spatial locations and inference spatial locations and outputting the validated extracted entity data.
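A minimal sketch of validating an extracted entity by comparing its inferred spatial location against a reference location on the page. The bounding-box representation and the intersection-over-union threshold are assumptions for illustration, not the validation rule described in the filing.

```python
from dataclasses import dataclass

@dataclass
class Box:
    # Normalized page coordinates: (left, top, right, bottom).
    x0: float
    y0: float
    x1: float
    y1: float

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two bounding boxes."""
    ix0, iy0 = max(a.x0, b.x0), max(a.y0, b.y0)
    ix1, iy1 = min(a.x1, b.x1), min(a.y1, b.y1)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a.x1 - a.x0) * (a.y1 - a.y0)
    area_b = (b.x1 - b.x0) * (b.y1 - b.y0)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def validate_entity(reference: Box, inferred: Box, threshold: float = 0.5) -> bool:
    # Keep the extracted entity only when its inferred location overlaps the
    # reference location in the source document well enough.
    return iou(reference, inferred) >= threshold

ref = Box(0.10, 0.20, 0.35, 0.24)      # where the text actually sits on the page
hyp = Box(0.11, 0.20, 0.34, 0.25)      # where the model says the entity came from
print(validate_entity(ref, hyp))       # True -> entity passes spatial validation
```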
-
Publication Number: US20200349922A1
Publication Date: 2020-11-05
Application Number: US16830996
Filing Date: 2020-03-26
Applicant: Google LLC
Inventor: Charles Caleb Peyser , Hao Zhang , Tara N. Sainath , Zelin Wu
Abstract: A method for generating final transcriptions representing numerical sequences of utterances in a written domain includes receiving audio data for an utterance containing a numeric sequence, and decoding, using a sequence-to-sequence speech recognition model, the audio data for the utterance to generate, as output from the sequence-to-sequence speech recognition model, an intermediate transcription of the utterance. The method also includes processing, using a neural corrector/denormer, the intermediate transcription to generate a final transcription that represents the numeric sequence of the utterance in a written domain. The neural corrector/denormer is trained on a set of training samples, where each training sample includes a speech recognition hypothesis for a training utterance and a ground-truth transcription of the training utterance. The ground-truth transcription of the training utterance is in the written domain. The method also includes providing the final transcription representing the numeric sequence of the utterance in the written domain for output.