-
Publication number: US10037313B2
Publication date: 2018-07-31
Application number: US15245152
Application date: 2016-08-23
Applicant: Google Inc.
Inventor: Fangzhou Wang , Sourish Chaudhuri , Daniel Ellis , Nathan Reale
CPC classification number: G06F17/241 , G10L15/20 , G10L25/78 , G10L25/84 , G10L2021/065 , G10L2025/783
Abstract: A content server accesses an audio stream and inputs portions of the audio stream into one or more non-speech classifiers for classification. For portions of the audio stream, the non-speech classifiers generate a set of raw scores representing likelihoods that the respective portion of the audio stream includes an occurrence of the particular class of non-speech sounds associated with each of the non-speech classifiers. The content server generates binary scores for the sets of raw scores, the binary scores generated based on a smoothing of a respective set of raw scores. The content server applies a set of non-speech captions to portions of the audio stream in time, each of the sets of non-speech captions based on a different one of the sets of binary scores of the corresponding portion of the audio stream.
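Illustrative sketch (not taken from the patent): the smoothing and binarization step described in this abstract could look roughly like the following. The moving-average smoother, the 0.5 threshold, the one-second hop, and every function name here are assumptions chosen for illustration; the abstract does not specify them.

```python
from typing import List, Tuple


def smooth(raw: List[float], window: int = 3) -> List[float]:
    """Centered moving average; the window shrinks at the edges (assumed smoother)."""
    half = window // 2
    out = []
    for i in range(len(raw)):
        lo, hi = max(0, i - half), min(len(raw), i + half + 1)
        out.append(sum(raw[lo:hi]) / (hi - lo))
    return out


def binarize(raw: List[float], window: int = 3, threshold: float = 0.5) -> List[int]:
    """Turn raw classifier scores into binary scores after smoothing."""
    return [1 if s >= threshold else 0 for s in smooth(raw, window)]


def caption_spans(binary: List[int], label: str, hop_s: float = 1.0) -> List[Tuple[float, float, str]]:
    """Convert runs of 1s into (start_s, end_s, caption) tuples."""
    spans = []
    start = None
    for i, b in enumerate(binary + [0]):  # trailing 0 closes an open run
        if b and start is None:
            start = i
        elif not b and start is not None:
            spans.append((start * hop_s, i * hop_s, f"[{label}]"))
            start = None
    return spans


# Hypothetical raw scores from one classifier, one score per second of audio.
raw_scores = [0.0, 0.9, 0.0, 0.8, 0.9, 0.9, 0.0, 0.0]
print(caption_spans(binarize(raw_scores), "Applause"))
```

In this sketch the isolated spike at the second portion is averaged away, so only the sustained run produces a caption. Running one such pass per classifier would yield one set of captions per sound class.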
-
Publication number: US20180174600A1
Publication date: 2018-06-21
Application number: US15497497
Application date: 2017-04-26
Applicant: Google Inc.
Inventor: Sourish Chaudhuri , Kenneth Hoover
IPC: G10L25/57 , G11B27/10 , G10L17/00 , G06K9/00 , G10L25/21 , G10L15/26 , G10L25/93 , G06K9/66 , G10L15/06 , H04N21/488
CPC classification number: G10L25/57 , G06K9/00288 , G06K9/00744 , G06K9/00765 , G06K9/66 , G10L15/063 , G10L15/265 , G10L17/005 , G10L17/04 , G10L17/10 , G10L21/0272 , G10L25/21 , G10L25/30 , G10L25/78 , G10L25/93 , G11B27/031 , G11B27/10 , G11B27/28 , H04N21/233 , H04N21/23418 , H04N21/4394 , H04N21/44008 , H04N21/4666 , H04N21/4884 , H04N21/8549
Abstract: A computer-implemented method for speech diarization is described. The method comprises determining temporal positions of separate faces in a video using face detection and clustering. Voice features are detected in the speech sections of the video. The method further includes generating a correlation between the determined separate faces and separate voices based at least on the temporal positions of the separate faces and the separate voices in the video. This correlation is stored in a content store with the video.
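Illustrative sketch (not taken from the patent): one simple way to correlate separate faces with separate voices from their temporal positions is to sum the time each face is on screen while each voice is heard, then pair every voice with the face it overlaps most. The interval representation, the cluster identifiers, and the greedy maximum-overlap rule are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s)


def overlap(a: Interval, b: Interval) -> float:
    """Length in seconds of the intersection of two intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def correlate(faces: Dict[str, List[Interval]],
              voices: Dict[str, List[Interval]]) -> Dict[str, str]:
    """Map each voice cluster to the face cluster it co-occurs with most."""
    mapping = {}
    for voice_id, voice_ivs in voices.items():
        totals = {
            face_id: sum(overlap(f, v) for f in face_ivs for v in voice_ivs)
            for face_id, face_ivs in faces.items()
        }
        best = max(totals, key=totals.get, default=None)
        if best is not None and totals[best] > 0:
            mapping[voice_id] = best
    return mapping


# Hypothetical output of face clustering and voice clustering.
faces = {"face_A": [(0.0, 10.0)], "face_B": [(8.0, 20.0)]}
voices = {"voice_1": [(1.0, 5.0)], "voice_2": [(12.0, 18.0)]}
print(correlate(faces, voices))
```

A voice that never overlaps any face is left unmapped in this sketch, which corresponds to an off-screen speaker.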
-
Publication number: US20180084301A1
Publication date: 2018-03-22
Application number: US15826622
Application date: 2017-11-29
Applicant: Google Inc.
Inventor: Elad Eban , Aren Jansen , Sourish Chaudhuri
IPC: H04N21/439 , H04N21/44 , H04N21/233 , G10L21/0208 , H04N5/60 , H04H60/58 , G10L25/57 , H04N5/911
CPC classification number: H04N21/4398 , G10L21/0208 , G10L25/57 , H04H60/12 , H04H60/58 , H04H60/65 , H04N5/602 , H04N5/911 , H04N9/802 , H04N21/233 , H04N21/4394 , H04N21/44016
Abstract: Implementations disclose filtering wind noises in video content. A method includes receiving video content comprising an audio component and a video component, detecting, by a processing device, occurrence of a wind noise artifact in a segment of the audio component, identifying an intensity of the wind noise artifact, wherein the intensity is based on a signal-to-noise ratio of the wind noise artifact, selecting, by the processing device, a wind noise replacement operation based on the identified intensity of the wind noise artifact, and applying, by the processing device, the selected wind noise replacement operation to the segment of the audio component to remove the wind noise artifact from the segment.
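Illustrative sketch (not taken from the patent): the abstract ties the intensity of a wind noise artifact to a signal-to-noise ratio and selects a replacement operation from it. The decibel formula below is the standard power-ratio definition; the 10 dB threshold and the operation names are hypothetical placeholders, since the abstract names neither.

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels from average powers."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def select_operation(snr: float, threshold_db: float = 10.0) -> str:
    """Pick a replacement operation; names and threshold are hypothetical."""
    # A high SNR means the wind is weak relative to the wanted audio.
    if snr >= threshold_db:
        return "filter_segment"
    return "replace_segment"


print(select_operation(snr_db(100.0, 1.0)))  # mild wind
print(select_operation(snr_db(1.0, 1.0)))    # wind as loud as the signal
```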
-
Publication number: US20170324990A1
Publication date: 2017-11-09
Application number: US15147040
Application date: 2016-05-05
Applicant: Google Inc.
Inventor: Elad Eban , Aren Jansen , Sourish Chaudhuri
IPC: H04N21/439 , H04N5/60
CPC classification number: H04N21/4398 , G10L21/0208 , G10L25/57 , H04H60/12 , H04H60/58 , H04H60/65 , H04N5/602 , H04N5/911 , H04N9/802 , H04N21/233 , H04N21/4394 , H04N21/44016
Abstract: Implementations disclose filtering wind noises in video content. A method includes receiving video content comprising an audio component and a video component, detecting, by a processing device, occurrence of a wind noise artifact in a segment of the audio component, identifying duration of the wind noise artifact and intensity of the wind noise artifact, selecting, by the processing device, a wind noise replacement operation based on the identified duration and intensity of the wind noise artifact, and applying, by the processing device, the selected wind noise replacement operation to the segment of the audio component to remove the wind noise artifact from the segment.
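Illustrative sketch (not taken from the patent): selecting a replacement operation from both duration and intensity can be expressed as a small decision rule. The per-frame detection flags, the 0-to-1 intensity scale, the thresholds, and the operation names are all hypothetical; the abstract states only that the choice depends on the identified duration and intensity.

```python
from typing import List


def longest_run_seconds(flags: List[int], frame_s: float) -> float:
    """Duration of the longest run of consecutive wind-flagged frames."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best * frame_s


def select_operation(duration_s: float, intensity: float,
                     max_short_s: float = 2.0, strong: float = 0.5) -> str:
    """Choose an operation name (hypothetical) from duration and intensity."""
    if duration_s > max_short_s:
        return "replace_with_estimated_background"
    if intensity >= strong:
        return "interpolate_from_neighbors"
    return "filter_segment"


# Hypothetical per-frame detector output, half a second per frame.
flags = [0, 1, 1, 1, 0, 1]
duration = longest_run_seconds(flags, frame_s=0.5)
print(duration, select_operation(duration, intensity=0.2))
```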
-
Publication number: US20170278525A1
Publication date: 2017-09-28
Application number: US15245152
Application date: 2016-08-23
Applicant: Google Inc.
Inventor: Fangzhou Wang , Sourish Chaudhuri , Daniel Ellis , Nathan Reale
CPC classification number: G06F17/241 , G10L15/20 , G10L25/78 , G10L25/84 , G10L2021/065 , G10L2025/783
Abstract: A content server accesses an audio stream and inputs portions of the audio stream into one or more non-speech classifiers for classification. For portions of the audio stream, the non-speech classifiers generate a set of raw scores representing likelihoods that the respective portion of the audio stream includes an occurrence of the particular class of non-speech sounds associated with each of the non-speech classifiers. The content server generates binary scores for the sets of raw scores, the binary scores generated based on a smoothing of a respective set of raw scores. The content server applies a set of non-speech captions to portions of the audio stream in time, each of the sets of non-speech captions based on a different one of the sets of binary scores of the corresponding portion of the audio stream.
-
Publication number: US10356469B2
Publication date: 2019-07-16
Application number: US15826622
Application date: 2017-11-29
Applicant: Google Inc.
Inventor: Elad Eban , Aren Jansen , Sourish Chaudhuri
IPC: H04N5/60 , G10L25/57 , H04H60/12 , H04H60/58 , H04H60/65 , H04N21/44 , H04N5/911 , H04N9/802 , H04N21/233 , H04N21/439 , G10L21/0208
Abstract: Implementations disclose filtering wind noises in video content. A method includes receiving video content comprising an audio component and a video component, detecting, by a processing device, occurrence of a wind noise artifact in a segment of the audio component, identifying an intensity of the wind noise artifact, wherein the intensity is based on a signal-to-noise ratio of the wind noise artifact, selecting, by the processing device, a wind noise replacement operation based on the identified intensity of the wind noise artifact, and applying, by the processing device, the selected wind noise replacement operation to the segment of the audio component to remove the wind noise artifact from the segment.
-
Publication number: US09838737B2
Publication date: 2017-12-05
Application number: US15147040
Application date: 2016-05-05
Applicant: Google Inc.
Inventor: Elad Eban , Aren Jansen , Sourish Chaudhuri
IPC: H04N5/60 , H04N21/439
CPC classification number: H04N21/4398 , G10L21/0208 , G10L25/57 , H04H60/12 , H04H60/58 , H04H60/65 , H04N5/602 , H04N5/911 , H04N9/802 , H04N21/233 , H04N21/4394 , H04N21/44016
Abstract: Implementations disclose filtering wind noises in video content. A method includes receiving video content comprising an audio component and a video component, detecting, by a processing device, occurrence of a wind noise artifact in a segment of the audio component, identifying duration of the wind noise artifact and intensity of the wind noise artifact, selecting, by the processing device, a wind noise replacement operation based on the identified duration and intensity of the wind noise artifact, and applying, by the processing device, the selected wind noise replacement operation to the segment of the audio component to remove the wind noise artifact from the segment.
-
Publication number: US20170316792A1
Publication date: 2017-11-02
Application number: US15225513
Application date: 2016-08-01
Applicant: Google Inc.
Inventor: Sourish Chaudhuri , Nebojsa Ciric , Khiem Pham
IPC: G10L25/93 , G10L19/022 , G10L21/055 , G10L25/27
CPC classification number: G10L25/27 , G10L25/48 , G10L25/87 , G11B27/031
Abstract: A content system accesses an audio stream. The content system inputs segments of the audio stream into a speech classifier for classification, the speech classifier generating, for the segments of the audio stream, raw scores representing likelihoods that the respective segment of the audio stream includes an occurrence of a speech sound. The content system generates binary scores for the audio stream based on the set of raw scores, each binary score generated based on an aggregation of raw scores from a consecutive series of the segments of the audio stream. The content system generates one or more timing windows for the speech sounds in the audio stream based on the binary scores, each timing window indicating an estimate of the beginning and ending timestamps of one or more speech sounds in the audio stream.
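Illustrative sketch (not taken from the patent): the aggregation of raw scores over consecutive segments and the derivation of timing windows could be approximated as below. Averaging non-overlapping groups of three segments, the 0.5 threshold, and the function names are assumptions; the abstract does not say which aggregation is used.

```python
from typing import List, Tuple


def aggregate_binary(raw: List[float], group: int = 3, threshold: float = 0.5) -> List[int]:
    """One binary score per group of consecutive segments (mean, then threshold)."""
    out = []
    for i in range(0, len(raw), group):
        chunk = raw[i:i + group]
        out.append(1 if sum(chunk) / len(chunk) >= threshold else 0)
    return out


def timing_windows(binary: List[int], group_s: float) -> List[Tuple[float, float]]:
    """Estimated (begin_s, end_s) timestamps for each run of speech groups."""
    windows = []
    start = None
    for i, b in enumerate(binary + [0]):  # trailing 0 closes an open run
        if b and start is None:
            start = i
        elif not b and start is not None:
            windows.append((start * group_s, i * group_s))
            start = None
    return windows


# Hypothetical raw speech scores, one per 0.5 s segment (so 1.5 s per group).
raw = [0.1, 0.2, 0.1, 0.9, 0.8, 0.7, 0.9, 0.9, 0.6, 0.2, 0.1, 0.0]
binary = aggregate_binary(raw)
print(binary, timing_windows(binary, group_s=1.5))
```

Larger groups give steadier binary scores at the cost of coarser window boundaries, which is the trade-off any aggregation of this kind has to balance.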
-