Ensemble of machine learning models for automatic scene change detection

    Publication Number: US11776273B1

    Publication Date: 2023-10-03

    Application Number: US17107514

    Filing Date: 2020-11-30

    CPC classification number: G06V20/49 G06F18/213 G06N5/04 G06N20/20 G10L25/78

    Abstract: Techniques for automatic scene change detection are described. As one example, a computer-implemented method includes: receiving a request to train, on a training dataset of videos whose labels indicate scene changes, an ensemble of machine learning models to detect a scene change in a video; partitioning each video file of the training dataset into a plurality of shots; training the ensemble of machine learning models into a trained ensemble based at least in part on the plurality of shots of the training dataset and the labels that indicate scene changes; receiving an inference request for an input video; partitioning the input video into a plurality of shots; generating, by the trained ensemble of machine learning models, an inference of one or more scene changes in the input video based at least in part on the plurality of shots of the input video; and transmitting the inference to a client application or to a storage location.
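The claimed pipeline (partition a video into shots, then infer scene changes with a trained ensemble) can be sketched at a high level. This is a minimal, hypothetical illustration; the shot-boundary heuristic, the toy "models", and all thresholds are assumptions for illustration, not details from the patent:

```python
def partition_into_shots(frame_diffs, threshold=0.5):
    """Split a video into shots, given per-frame difference scores.

    A new shot starts wherever the inter-frame difference exceeds the
    threshold (an assumed, simplistic boundary cue). Returns a list of
    (start, end) frame-index pairs, end-exclusive.
    """
    boundaries = [0]
    for i, d in enumerate(frame_diffs, start=1):
        if d > threshold:
            boundaries.append(i)
    boundaries.append(len(frame_diffs) + 1)  # one more frame than diffs
    return [(boundaries[i], boundaries[i + 1])
            for i in range(len(boundaries) - 1)]

def ensemble_scene_changes(shot_features, models, vote_threshold=0.5):
    """Average per-boundary scores from several models; a scene change
    is inferred wherever the mean score exceeds the vote threshold."""
    changes = []
    for i in range(len(shot_features) - 1):
        scores = [m(shot_features[i], shot_features[i + 1]) for m in models]
        if sum(scores) / len(scores) > vote_threshold:
            changes.append(i + 1)  # scene change begins at shot i + 1
    return changes
```

Here each "model" is simply a callable scoring the dissimilarity of adjacent shot features; in the patented method the ensemble members would be learners trained on the labeled shots.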

    Language agnostic drift correction

    Publication Number: US20230282006A1

    Publication Date: 2023-09-07

    Application Number: US18175044

    Filing Date: 2023-02-27

    Abstract: Systems, methods, and computer-readable media are disclosed for language-agnostic subtitle drift detection and correction. A method may include determining subtitles and/or captions from media content (e.g., videos), the subtitles and/or captions corresponding to dialog in the media content. The subtitles may be broken into segments, which may be analyzed to determine a likelihood of drift (e.g., a likelihood that the subtitles are out of synchronization with the dialog in the media content) for each segment. For segments with a high likelihood of drift, the subtitles may be incrementally adjusted to determine an adjustment that eliminates and/or reduces the drift, and the drift in the segment may be corrected based on the drift amount detected. A linear regression model and/or blocks annotated by human operators may be used to further optimize drift correction.
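The incremental-adjustment step described above can be sketched as a simple offset search: shift the subtitle timestamps over a range of candidate offsets and keep the one that best aligns them with detected speech times. The cost function, search range, and step size below are assumptions, not values from the patent:

```python
def best_offset(subtitle_times, speech_times,
                search_range=(-5.0, 5.0), step=0.5):
    """Incrementally shift subtitle timestamps (seconds) and return the
    offset that minimizes total distance to the nearest detected speech
    time -- a crude stand-in for the patent's drift-reduction search."""
    def cost(offset):
        return sum(min(abs(t + offset - s) for s in speech_times)
                   for t in subtitle_times)

    offsets = []
    o = search_range[0]
    while o <= search_range[1] + 1e-9:
        offsets.append(round(o, 6))
        o += step
    return min(offsets, key=cost)  # first minimum wins on ties
```

For subtitles that lag the audio by two seconds, `best_offset([1.0, 3.0, 5.0], [3.0, 5.0, 7.0])` recovers the `2.0`-second correction. Because the alignment uses only timestamps, not transcript text, the approach is language agnostic, which matches the spirit of the claim.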

    Language agnostic automated voice activity detection

    Publication Number: US11869537B1

    Publication Date: 2024-01-09

    Application Number: US17523777

    Filing Date: 2021-11-10

    CPC classification number: G10L25/84 G10L15/063 G10L15/16 G10L15/22 G10L25/18

    Abstract: Systems, methods, and computer-readable media are disclosed for language agnostic automated voice activity detection. Example methods may include determining an audio file associated with video content, generating audio segments using the audio file, the audio segments including a first segment and a second segment, and determining that the first segment includes first voice activity. Methods may include determining that the second segment comprises second voice activity, determining that voice activity is present between a first timestamp associated with the first segment and a second timestamp associated with the second segment, and generating text data representing the voice activity that is present between the first timestamp and the second timestamp.
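The segment-then-detect flow in the abstract can be sketched with a toy energy-based detector: split raw samples into fixed-length segments, flag segments whose energy exceeds a threshold as voice activity, and merge adjacent voiced segments into timestamped spans. The energy heuristic and all parameters are illustrative assumptions; the patent's detector may be a trained model (per the G10L15/16 classification):

```python
def segment_energies(samples, segment_len):
    """Split raw samples into fixed-length segments and compute each
    segment's mean absolute amplitude (a crude, language-agnostic
    energy cue)."""
    return [sum(abs(x) for x in samples[i:i + segment_len]) / segment_len
            for i in range(0, len(samples) - segment_len + 1, segment_len)]

def voiced_spans(energies, threshold, segment_dur):
    """Flag segments whose energy exceeds the threshold as voice
    activity and merge adjacent voiced segments into (start, end)
    timestamp spans, in seconds."""
    spans = []
    for i, e in enumerate(energies):
        if e > threshold:
            start, end = i * segment_dur, (i + 1) * segment_dur
            if spans and abs(spans[-1][1] - start) < 1e-9:
                spans[-1] = (spans[-1][0], end)  # contiguous: extend span
            else:
                spans.append((start, end))
    return spans
```

The merged spans correspond to the abstract's "voice activity ... between a first timestamp and a second timestamp"; the spans would then be handed to a transcription step to generate the text data.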

    Media classification using local and global audio features

    Publication Number: US11617008B1

    Publication Date: 2023-03-28

    Application Number: US17218009

    Filing Date: 2021-03-30

    Abstract: Methods, systems, and computer-readable media for media classification using local and global audio features are disclosed. A media classification system determines local features of an audio input using an audio event detector model that is trained to detect a plurality of audio event classes descriptive of objectionable content. The local features are extracted using maximum values of audio event scores for individual audio event classes. The media classification system determines one or more global features of the audio input using the audio event detector model. The global feature(s) are extracted using averaging of clip-level descriptors of a plurality of clips of the audio input. The media classification system determines a content-based rating for media comprising the audio input based (at least in part) on the local features of the audio input and based (at least in part) on the global feature(s) of the audio input.
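The local/global feature extraction described here (per-class maxima for local features, averaged clip-level descriptors for global features) can be sketched directly. The event-class names, weights, and the thresholded rating rule below are hypothetical stand-ins for the trained audio event detector and rating logic the patent describes:

```python
def local_features(clip_scores):
    """Local features: the maximum score per audio event class across
    all clips (clip_scores is a list of {class: score} dicts, one per
    clip, as an assumed detector output shape)."""
    classes = clip_scores[0].keys()
    return {c: max(clip[c] for clip in clip_scores) for c in classes}

def global_features(clip_descriptors):
    """Global features: element-wise mean of clip-level descriptor
    vectors over the whole audio input."""
    n = len(clip_descriptors)
    return [sum(d[i] for d in clip_descriptors) / n
            for i in range(len(clip_descriptors[0]))]

def content_rating(local, global_feat, weights, threshold):
    """Toy rating rule (assumption): a weighted sum of the per-class
    event maxima plus the mean of the global descriptor, thresholded
    into a coarse content-based rating."""
    score = sum(weights[c] * v for c, v in local.items())
    score += sum(global_feat) / len(global_feat)
    return "mature" if score > threshold else "general"
```

The max-pooling captures whether an objectionable event occurs anywhere in the audio, while the averaged descriptors summarize the overall character of the input; combining both is the core idea the claim recites.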
