SYSTEMS AND METHODS FOR CONFIGURING AND USING AN AUDIO TRANSCRIPT CORRECTION MACHINE LEARNING MODEL

    Publication No.: US20230360652A1

    Publication date: 2023-11-09

    Application No.: US18214336

    Filing date: 2023-06-26

    Abstract: A system, method, and computer-program product includes constructing a transcript correction training data corpus that includes a plurality of labeled audio transcription training data samples, wherein each of the plurality of labeled audio transcription training data samples includes: an incorrect audio transcription of a target piece of audio data; a correct audio transcription of the target piece of audio data; and a transcript correction identifier that, when applied to a model input that includes a likely incorrect audio transcript, defines a text-to-text transformation objective causing an audio transcript correction machine learning model to predict a corrected audio transcript based on the likely incorrect audio transcript; configuring the audio transcript correction machine learning model based on a training of a machine learning text-to-text transformer model using the transcript correction training data corpus; and executing the audio transcript correction machine learning model within a speech-to-text post-processing sequence of a speech-to-text service.
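As a rough illustration of the labeled samples this abstract describes, the sketch below pairs an incorrect transcription with its correction and prepends an identifier string to form a text-to-text training pair. The prefix text, the class and function names, and the sample sentences are all assumptions made for illustration; the abstract does not specify any of them, and no particular transformer library is implied.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical identifier; the abstract only states that a
# "transcript correction identifier" is part of each sample.
CORRECTION_PREFIX = "correct transcript: "


@dataclass(frozen=True)
class LabeledSample:
    """One labeled audio transcription training data sample."""
    incorrect: str                        # incorrect transcription of the audio
    correct: str                          # correct transcription of the same audio
    identifier: str = CORRECTION_PREFIX   # marks the text-to-text objective


def build_corpus(pairs: Iterable[Tuple[str, str]]) -> List[LabeledSample]:
    """Build a corpus from (incorrect, correct) transcription pairs."""
    return [LabeledSample(incorrect=bad, correct=good) for bad, good in pairs]


def to_text_to_text_pair(sample: LabeledSample) -> Tuple[str, str]:
    """Return (model input, training target) for a text-to-text model."""
    return sample.identifier + sample.incorrect, sample.correct


# Made-up example data, for illustration only.
corpus = build_corpus([
    ("their going home", "They're going home."),
    ("recognize speech wreck a nice beach", "Recognize speech, wreck a nice beach."),
])
training_pairs = [to_text_to_text_pair(s) for s in corpus]
```

At inference time the same identifier would be prepended to a likely incorrect transcript so that the trained model treats the input as a correction task.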

    Optimal view selection in a teleconferencing system with cascaded cameras

    Publication No.: US11803984B2

    Publication date: 2023-10-31

    Application No.: US17310571

    Filing date: 2020-06-04

    Applicant: PLANTRONICS, INC.

    IPC classification: G06T7/70 G10L25/78 H04L65/403

    Abstract: A method (1000) for operating cameras (202) in a cascaded network (100), comprising: capturing a first view (1200) with a first lens (326) having a first focal point (328) and a first centroid (352), the first view (1200) depicting a subject (1106); capturing a second view (1202) with a second lens (326) having a second focal point (328) and a second centroid (352); detecting a first location of the subject (1106), relative to the first lens (326), wherein detecting the first location of the subject (1106), relative to the first lens (326), is based on audio captured by a plurality of microphones (204); estimating a second location of the subject (1106), relative to the second lens (326), based on the first location of the subject (1106) relative to the first lens (326); and selecting a portion (1206) of the second view (1202) as depicting the subject (1106) based on the estimate of the second location of the subject (1106) relative to the second lens (326).
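The estimation and selection steps in this abstract can be pictured with a simple planar model. The sketch below assumes, purely for illustration, that the audio-based detection yields a bearing and range relative to the first lens, that the second lens's position and yaw relative to the first are known, and that the second camera behaves like an ideal pinhole camera. None of these assumptions, nor any of the function names, come from the patent.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]  # (x to the right, y along the optical axis), metres


def polar_to_cartesian(bearing_rad: float, range_m: float) -> Point:
    """Convert an assumed audio-derived bearing and range into lens-1 coordinates."""
    return (range_m * math.sin(bearing_rad), range_m * math.cos(bearing_rad))


def locate_in_second_frame(p_first: Point, offset: Point, yaw_rad: float) -> Point:
    """Express a lens-1 point in the lens-2 frame.

    offset:  position of lens 2 in the lens-1 frame (assumed known).
    yaw_rad: counterclockwise rotation of the lens-2 frame relative to lens 1.
    """
    dx, dy = p_first[0] - offset[0], p_first[1] - offset[1]
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    # Translate, then rotate by -yaw to change coordinate frames.
    return (c * dx + s * dy, -s * dx + c * dy)


def select_portion(p_second: Point, image_width: int, hfov_rad: float,
                   crop_width: int) -> Optional[Tuple[int, int]]:
    """Pick a horizontal pixel window of the second view around the subject."""
    x, y = p_second
    if y <= 0:
        return None  # subject is behind the second lens
    focal_px = (image_width / 2) / math.tan(hfov_rad / 2)
    center = image_width / 2 + focal_px * x / y  # pinhole projection
    if not 0 <= center < image_width:
        return None  # subject falls outside the second view
    left = int(round(center - crop_width / 2))
    left = max(0, min(image_width - crop_width, left))  # keep window in frame
    return (left, left + crop_width)


# Illustrative values: subject 2 m ahead and 1 m right of lens 1,
# lens 2 mounted 1 m to the right of lens 1 with the same orientation.
p2 = locate_in_second_frame((1.0, 2.0), offset=(1.0, 0.0), yaw_rad=0.0)
window = select_portion(p2, image_width=1920, hfov_rad=math.radians(90), crop_width=640)
```

A real system would also need lens calibration and vertical placement, which this planar sketch leaves out.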