ZERO-SHOT FORM ENTITY QUERY FRAMEWORK
    Invention Publication

    Publication Number: US20240153297A1

    Publication Date: 2024-05-09

    Application Number: US18501982

    Application Date: 2023-11-03

    Applicant: Google LLC

    CPC classification number: G06V30/24 G06F16/211 G06V30/19147 G06V30/412

    Abstract: A method for extracting entities comprises obtaining a document that includes a series of textual fields that includes a plurality of entities. Each entity represents information associated with a predefined category. The method includes generating, using the document, a series of tokens representing the series of textual fields. The method includes generating an entity prompt that includes the series of tokens and one of the plurality of entities and generating a schema prompt that includes a schema associated with the document. The method includes generating a model query that includes the entity prompt and the schema prompt and determining, using an entity extraction model and the model query, a location of the one of the plurality of entities among the series of tokens. The method includes extracting, from the document, the one of the plurality of entities using the location of the one of the plurality of entities.
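
    Below is a minimal Python sketch of the query-assembly and extraction steps the abstract describes: an entity prompt built from the token sequence and a target entity, a schema prompt built from the document schema, and extraction of the entity text once a model has predicted its location. All names, the prompt serialization, and the hard-coded span are illustrative assumptions; the patent does not specify an API.

```python
# Hypothetical sketch of the entity-prompt / schema-prompt query described above.
# The prompt format, helper names, and the stand-in span prediction are assumptions.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ModelQuery:
    entity_prompt: str   # serialized tokens plus the target entity name
    schema_prompt: str   # schema associated with the document


def build_model_query(tokens: List[str], entity: str, schema: dict) -> ModelQuery:
    """Combine an entity prompt and a schema prompt into one model query."""
    entity_prompt = f"entity: {entity} | tokens: {' '.join(tokens)}"
    schema_prompt = "schema: " + ", ".join(f"{k}:{v}" for k, v in schema.items())
    return ModelQuery(entity_prompt, schema_prompt)


def extract_entity(tokens: List[str], span: Tuple[int, int]) -> str:
    """Pull the entity text out of the token sequence given a predicted location."""
    start, end = span
    return " ".join(tokens[start:end])


if __name__ == "__main__":
    tokens = ["Invoice", "Date", ":", "2023", "-", "11", "-", "03"]
    schema = {"invoice_date": "date", "total": "currency"}
    query = build_model_query(tokens, "invoice_date", schema)
    # A trained entity extraction model would map `query` to a token span;
    # a hard-coded span stands in here purely to illustrate the extraction step.
    predicted_span = (3, 8)
    print(extract_entity(tokens, predicted_span))  # "2023 - 11 - 03"
```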

    Semantic Segmentation With Language Models For Long-Form Automatic Speech Recognition

    Publication Number: US20240290320A1

    Publication Date: 2024-08-29

    Application Number: US18585020

    Application Date: 2024-02-22

    Applicant: Google LLC

    CPC classification number: G10L15/063 G06F40/30 G10L15/26

    Abstract: A joint segmenting and ASR model includes an encoder to receive a sequence of acoustic frames and generate, at each of a plurality of output steps, a higher order feature representation for a corresponding acoustic frame. The model also includes a decoder to generate, based on the higher order feature representation at each of the plurality of output steps, a probability distribution over possible speech recognition hypotheses and an indication of whether the corresponding output step corresponds to an end of segment (EOS). The model is trained on a set of training samples, each training sample including audio data characterizing multiple segments of long-form speech and a corresponding transcription of the long-form speech, the corresponding transcription annotated with ground-truth EOS labels obtained via distillation from a language model teacher that receives the corresponding transcription as input and injects the ground-truth EOS labels into the corresponding transcription between semantically complete segments.
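
    As a rough illustration of the decoder's two outputs described above, the sketch below shows a hypothetical PyTorch head (not the patented architecture) that emits a probability distribution over recognition hypotheses and a per-step end-of-segment (EOS) indicator from encoder features; layer sizes and names are assumptions.

```python
# Hypothetical two-output decoder head for joint ASR and segmentation.
# Dimensions and module names are illustrative assumptions, not the patented design.

import torch
import torch.nn as nn


class JointSegmentingDecoderHead(nn.Module):
    def __init__(self, feature_dim: int = 256, vocab_size: int = 1000):
        super().__init__()
        # Distribution over possible speech recognition hypotheses (vocabulary units).
        self.token_head = nn.Linear(feature_dim, vocab_size)
        # Binary indicator: does this output step correspond to an end of segment?
        self.eos_head = nn.Linear(feature_dim, 1)

    def forward(self, higher_order_features: torch.Tensor):
        # higher_order_features: (batch, output_steps, feature_dim) from the encoder
        token_logits = self.token_head(higher_order_features)
        eos_logits = self.eos_head(higher_order_features).squeeze(-1)
        return token_logits.log_softmax(dim=-1), torch.sigmoid(eos_logits)


if __name__ == "__main__":
    head = JointSegmentingDecoderHead()
    features = torch.randn(2, 50, 256)  # stand-in for encoder outputs
    token_log_probs, eos_probs = head(features)
    print(token_log_probs.shape, eos_probs.shape)
    # torch.Size([2, 50, 1000]) torch.Size([2, 50])
```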

    End-to-end automated speech recognition on numeric sequences

    Publication Number: US11367432B2

    Publication Date: 2022-06-21

    Application Number: US16830996

    Application Date: 2020-03-26

    Applicant: Google LLC

    Abstract: A method for generating final transcriptions representing numerical sequences of utterances in a written domain includes receiving audio data for an utterance containing a numeric sequence, and decoding, using a sequence-to-sequence speech recognition model, the audio data for the utterance to generate, as output from the sequence-to-sequence speech recognition model, an intermediate transcription of the utterance. The method also includes processing, using a neural corrector/denormer, the intermediate transcription to generate a final transcription that represents the numeric sequence of the utterance in a written domain. The neural corrector/denormer is trained on a set of training samples, where each training sample includes a speech recognition hypothesis for a training utterance and a ground-truth transcription of the training utterance. The ground-truth transcription of the training utterance is in the written domain. The method also includes providing the final transcription representing the numeric sequence of the utterance in the written domain for output.
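
    The toy sketch below illustrates the corrector/denormalizer's input-output contract and the training-pair structure the abstract describes: a spoken-domain hypothesis in, a written-domain transcription out. The patent trains a neural model for this mapping; the dictionary lookup is only a stand-in to show the behavior, and the example strings are invented.

```python
# Toy stand-in for the neural corrector/denormalizer: maps spoken-domain number
# words to written-domain digits. The real system is a trained neural model; this
# lookup only illustrates the input/output behavior. Example data is invented.

from typing import NamedTuple


class TrainingSample(NamedTuple):
    hypothesis: str       # speech recognition hypothesis (spoken domain)
    ground_truth: str     # ground-truth transcription (written domain)


SPOKEN_TO_WRITTEN = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def denormalize(intermediate_transcription: str) -> str:
    """Join runs of spoken number words into written-domain digit strings."""
    out, digit_run = [], []
    for word in intermediate_transcription.split():
        if word in SPOKEN_TO_WRITTEN:
            digit_run.append(SPOKEN_TO_WRITTEN[word])
        else:
            if digit_run:
                out.append("".join(digit_run))
                digit_run = []
            out.append(word)
    if digit_run:
        out.append("".join(digit_run))
    return " ".join(out)


if __name__ == "__main__":
    # A training pair of the kind described in the abstract (values invented):
    sample = TrainingSample(hypothesis="the code is four two seven",
                            ground_truth="the code is 427")
    print(denormalize(sample.hypothesis))  # "the code is 427"
```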

    End-To-End Automated Speech Recognition on Numeric Sequences

    Publication Number: US20200349922A1

    Publication Date: 2020-11-05

    Application Number: US16830996

    Application Date: 2020-03-26

    Applicant: Google LLC

    Abstract: A method for generating final transcriptions representing numerical sequences of utterances in a written domain includes receiving audio data for an utterance containing a numeric sequence, and decoding, using a sequence-to-sequence speech recognition model, the audio data for the utterance to generate, as output from the sequence-to-sequence speech recognition model, an intermediate transcription of the utterance. The method also includes processing, using a neural corrector/denormer, the intermediate transcription to generate a final transcription that represents the numeric sequence of the utterance in a written domain. The neural corrector/denormer is trained on a set of training samples, where each training sample includes a speech recognition hypothesis for a training utterance and a ground-truth transcription of the training utterance. The ground-truth transcription of the training utterance is in the written domain. The method also includes providing the final transcription representing the numeric sequence of the utterance in the written domain for output.
