Semantic Segmentation With Language Models For Long-Form Automatic Speech Recognition

    Publication number: US20240290320A1

    Publication date: 2024-08-29

    Application number: US18585020

    Filing date: 2024-02-22

    Applicant: Google LLC

    CPC classification number: G10L15/063 G06F40/30 G10L15/26

    Abstract: A joint segmenting and automatic speech recognition (ASR) model includes an encoder that receives a sequence of acoustic frames and generates, at each of a plurality of output steps, a higher order feature representation for a corresponding acoustic frame. The model also includes a decoder that generates, based on the higher order feature representation at each of the plurality of output steps, a probability distribution over possible speech recognition hypotheses and an indication of whether the corresponding output step corresponds to an end of segment (EOS). The model is trained on a set of training samples, each training sample including audio data characterizing multiple segments of long-form speech and a corresponding transcription of the long-form speech. The corresponding transcription is annotated with ground-truth EOS labels obtained via distillation from a language model teacher that receives the corresponding transcription as input and injects the ground-truth EOS labels into the corresponding transcription between semantically complete segments.
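
    The annotation step described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the abstract specifies a language model teacher, whereas this sketch substitutes a simple punctuation rule so that it runs without any model. The names `EOS`, `punctuation_teacher`, `annotate_with_eos`, and `split_segments` are hypothetical and do not appear in the publication.

```python
from typing import Callable, List

# Hypothetical token name; the abstract only refers to "EOS labels".
EOS = "<eos>"


def punctuation_teacher(words: List[str]) -> List[bool]:
    """Stand-in for the language model teacher (an assumption).

    Returns one flag per word: True if a segment boundary follows it.
    Here a boundary is placed after sentence-final punctuation.
    """
    return [w.endswith((".", "?", "!")) for w in words]


def annotate_with_eos(
    transcript: str,
    teacher: Callable[[List[str]], List[bool]] = punctuation_teacher,
) -> List[str]:
    """Inject EOS labels into a transcript at teacher-chosen boundaries."""
    words = transcript.split()
    boundaries = teacher(words)
    annotated: List[str] = []
    for word, is_boundary in zip(words, boundaries):
        annotated.append(word)
        if is_boundary:
            annotated.append(EOS)
    return annotated


def split_segments(tokens: List[str]) -> List[List[str]]:
    """Split an EOS-annotated token sequence back into segments."""
    segments: List[List[str]] = []
    current: List[str] = []
    for token in tokens:
        if token == EOS:
            if current:
                segments.append(current)
            current = []
        else:
            current.append(token)
    if current:  # trailing words with no closing EOS label
        segments.append(current)
    return segments


if __name__ == "__main__":
    labels = annotate_with_eos("hello there. how are you? fine")
    print(labels)
    print(split_segments(labels))
```

    Passing the teacher as a callable keeps the annotation logic independent of how boundaries are chosen, so a model-based teacher could in principle replace the punctuation rule without changing the rest of the sketch.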
