AUTOMATIC SMOOTHED CAPTIONING OF NON-SPEECH SOUNDS FROM AUDIO

    Publication number: US20170278525A1

    Publication date: 2017-09-28

    Application number: US15245152

    Filing date: 2016-08-23

    Applicant: Google Inc.

    Abstract: A content server accesses an audio stream and inputs portions of the audio stream into one or more non-speech classifiers for classification. The non-speech classifiers generate, for portions of the audio stream, a set of raw scores representing likelihoods that the respective portion of the audio stream includes an occurrence of a particular class of non-speech sounds associated with each of the non-speech classifiers. The content server generates binary scores for the sets of raw scores, the binary scores being generated based on a smoothing of a respective set of raw scores. The content server applies a set of non-speech captions to portions of the audio stream in time, each of the sets of non-speech captions based on a different one of the sets of binary scores of the corresponding portion of the audio stream.
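
The abstract above does not say how the raw scores are smoothed or how captions are placed, so the following is only a minimal sketch under stated assumptions: a centered moving average followed by a fixed threshold stands in for the smoothing step, and each run of consecutive 1s becomes one caption span. The function names, the window size, the 0.5 threshold, and the one-second portion length are all hypothetical and are not taken from the patent.

```python
from typing import List, Tuple


def smooth_scores(raw: List[float], window: int = 3) -> List[float]:
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(raw)):
        lo = max(0, i - half)
        hi = min(len(raw), i + half + 1)
        smoothed.append(sum(raw[lo:hi]) / (hi - lo))
    return smoothed


def binarize(raw: List[float], window: int = 3, threshold: float = 0.5) -> List[int]:
    """Assumed smoothing rule: threshold the moving average of the raw scores."""
    return [1 if s >= threshold else 0 for s in smooth_scores(raw, window)]


def caption_spans(
    binary: List[int], label: str, portion_seconds: float = 1.0
) -> List[Tuple[float, float, str]]:
    """Turn each run of consecutive 1s into a (start, end, caption) tuple."""
    spans = []
    start = None
    # A trailing 0 closes any run that reaches the end of the stream.
    for i, b in enumerate(binary + [0]):
        if b and start is None:
            start = i
        elif not b and start is not None:
            spans.append((start * portion_seconds, i * portion_seconds, label))
            start = None
    return spans


# Hypothetical raw scores from one classifier (e.g. an applause classifier).
raw = [0.0, 0.9, 0.0, 0.0, 0.8, 0.9, 0.7, 0.0, 0.0]
binary = binarize(raw)
spans = caption_spans(binary, "[Applause]")
```

With these example scores, the isolated spike at index 1 is averaged away, while the sustained run at indices 4 to 6 survives as a single caption span. This illustrates why smoothing before binarization can avoid captions that flicker on and off.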

Filtering wind noises in video content

    Publication number: US10356469B2

    Publication date: 2019-07-16

    Application number: US15826622

    Filing date: 2017-11-29

    Applicant: Google Inc.

    Abstract: Implementations disclose filtering wind noises in video content. A method includes receiving video content comprising an audio component and a video component, detecting, by a processing device, occurrence of a wind noise artifact in a segment of the audio component, identifying an intensity of the wind noise artifact, wherein the intensity is based on a signal-to-noise ratio of the wind noise artifact, selecting, by the processing device, a wind noise replacement operation based on the identified intensity of the wind noise artifact, and applying, by the processing device, the selected wind noise replacement operation to the segment of the audio component to remove the wind noise artifact from the segment.
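
The selection step described in the abstract above can be sketched as a mapping from a signal-to-noise ratio to an operation. This is an illustration only: the abstract names neither the replacement operations nor any thresholds, so the operation names (`"filter"`, `"interpolate"`, `"replace_with_silence"`), the 10 dB and 0 dB cut-offs, and the helper functions are hypothetical placeholders.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average power of a block of audio samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels; both powers must be positive."""
    return 10.0 * math.log10(signal_power / noise_power)


def select_replacement_operation(snr: float) -> str:
    """Pick an operation from the SNR. Names and thresholds are hypothetical."""
    if snr >= 10.0:
        return "filter"  # mild wind: assume attenuation is enough
    if snr >= 0.0:
        return "interpolate"  # moderate wind: assume rebuilding from neighbours
    return "replace_with_silence"  # severe wind: assume the segment is unusable


# Hypothetical power estimates for the wanted audio and the wind noise.
mild = select_replacement_operation(snr_db(100.0, 1.0))
severe = select_replacement_operation(snr_db(1.0, 10.0))
```

A lower SNR means the wind dominates the segment, so this sketch escalates from a light-touch operation to a full replacement as the ratio falls.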

    AUTOMATIC DETERMINATION OF TIMING WINDOWS FOR SPEECH CAPTIONS IN AN AUDIO STREAM

    Publication number: US20170316792A1

    Publication date: 2017-11-02

    Application number: US15225513

    Filing date: 2016-08-01

    Applicant: Google Inc.

    CPC classification number: G10L25/27 G10L25/48 G10L25/87 G11B27/031

    Abstract: A content system accesses an audio stream. The content system inputs segments of the audio stream into a speech classifier for classification, the speech classifier generating, for the segments of the audio stream, raw scores representing likelihoods that the respective segment of the audio stream includes an occurrence of a speech sound. The content system generates binary scores for the audio stream based on the set of raw scores, each binary score generated based on an aggregation of raw scores from a consecutive series of the segments of the audio stream. The content system generates one or more timing windows for the speech sounds in the audio stream based on the binary scores, each timing window indicating an estimate of the beginning and ending timestamps of one or more speech sounds in the audio stream.
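
As a rough illustration of the pipeline in the abstract above, the sketch below aggregates raw scores over non-overlapping series of consecutive segments and then derives timing windows from the resulting binary scores. The abstract does not specify the aggregation function, the series length, or the threshold, so the mean, the series length of 2, the 0.5 threshold, and the 0.5-second segment duration are assumptions, and all function names are hypothetical.

```python
from typing import List, Tuple


def aggregate_binary(
    raw: List[float], series_len: int = 2, threshold: float = 0.5
) -> List[int]:
    """One binary score per series of consecutive segments (assumed: mean >= threshold)."""
    binary = []
    for start in range(0, len(raw), series_len):
        series = raw[start:start + series_len]
        binary.append(1 if sum(series) / len(series) >= threshold else 0)
    return binary


def timing_windows(
    binary: List[int], seconds_per_score: float
) -> List[Tuple[float, float]]:
    """Estimate (begin, end) timestamps for each run of consecutive 1s."""
    windows = []
    begin = None
    # A trailing 0 closes a window that runs to the end of the stream.
    for i, b in enumerate(binary + [0]):
        if b and begin is None:
            begin = i
        elif not b and begin is not None:
            windows.append((begin * seconds_per_score, i * seconds_per_score))
            begin = None
    return windows


# Hypothetical speech-classifier scores, one per 0.5-second segment.
raw = [0.1, 0.2, 0.9, 0.8, 0.7, 0.9, 0.1, 0.0, 0.6, 0.9]
binary = aggregate_binary(raw, series_len=2)
# Each binary score covers 2 segments of 0.5 s, i.e. 1.0 s of audio.
windows = timing_windows(binary, seconds_per_score=1.0)
```

Aggregating over a series before thresholding trades timestamp resolution for stability: each window boundary here can only fall on a one-second grid, but a single noisy segment is less likely to open or close a window.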
