-
Publication (Announcement) No.: US10037313B2
Publication (Announcement) Date: 2018-07-31
Application No.: US15245152
Filing Date: 2016-08-23
Applicant: Google Inc.
Inventor: Fangzhou Wang, Sourish Chaudhuri, Daniel Ellis, Nathan Reale
CPC classification number: G06F17/241, G10L15/20, G10L25/78, G10L25/84, G10L2021/065, G10L2025/783
Abstract: A content server accesses an audio stream and inputs portions of the audio stream into one or more non-speech classifiers for classification. The non-speech classifiers generate, for portions of the audio stream, sets of raw scores, each representing likelihoods that the respective portion of the audio stream includes an occurrence of the particular class of non-speech sounds associated with the respective non-speech classifier. The content server generates sets of binary scores from the sets of raw scores, each set of binary scores generated based on a smoothing of the respective set of raw scores. The content server applies sets of non-speech captions to portions of the audio stream in time, each of the sets of non-speech captions based on a different one of the sets of binary scores of the corresponding portion of the audio stream.
-
Publication (Announcement) No.: US20170278525A1
Publication (Announcement) Date: 2017-09-28
Application No.: US15245152
Filing Date: 2016-08-23
Applicant: Google Inc.
Inventor: Fangzhou Wang, Sourish Chaudhuri, Daniel Ellis, Nathan Reale
CPC classification number: G06F17/241, G10L15/20, G10L25/78, G10L25/84, G10L2021/065, G10L2025/783
Abstract: A content server accesses an audio stream and inputs portions of the audio stream into one or more non-speech classifiers for classification. The non-speech classifiers generate, for portions of the audio stream, sets of raw scores, each representing likelihoods that the respective portion of the audio stream includes an occurrence of the particular class of non-speech sounds associated with the respective non-speech classifier. The content server generates sets of binary scores from the sets of raw scores, each set of binary scores generated based on a smoothing of the respective set of raw scores. The content server applies sets of non-speech captions to portions of the audio stream in time, each of the sets of non-speech captions based on a different one of the sets of binary scores of the corresponding portion of the audio stream.
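
Illustrative sketch (not part of the patent record): the pipeline in the abstract, from raw classifier scores through smoothing and binary scores to timed captions, might be approximated as below. The abstract does not specify the smoothing method, the threshold, or the portion length, so the centered moving average, the 0.5 threshold, the one-second hop, the "[Applause]" label, and all function names are assumptions chosen only for illustration.

```python
from typing import List, Tuple


def smooth(raw_scores: List[float], window: int = 3) -> List[float]:
    """Centered moving average (an assumed smoothing); the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(raw_scores)):
        lo = max(0, i - half)
        hi = min(len(raw_scores), i + half + 1)
        smoothed.append(sum(raw_scores[lo:hi]) / (hi - lo))
    return smoothed


def binarize(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Map each smoothed score to 1 (sound present) or 0 (absent)."""
    return [1 if s >= threshold else 0 for s in scores]


def captions_from_binary(
    binary: List[int], label: str, hop_seconds: float = 1.0
) -> List[Tuple[float, float, str]]:
    """Merge consecutive 1s into (start_time, end_time, label) caption spans."""
    segments = []
    start = None
    for i, flag in enumerate(binary):
        if flag and start is None:
            start = i  # a run of positive portions begins
        elif not flag and start is not None:
            segments.append((start * hop_seconds, i * hop_seconds, label))
            start = None
    if start is not None:  # the run extends to the end of the stream
        segments.append((start * hop_seconds, len(binary) * hop_seconds, label))
    return segments


# Hypothetical raw scores from one classifier, one score per one-second portion.
raw = [0.0, 0.1, 0.9, 0.2, 0.8, 0.9, 0.1, 0.0]
binary = binarize(smooth(raw))
print(captions_from_binary(binary, "[Applause]"))
# prints [(3.0, 6.0, '[Applause]')]
```

With several classifiers, the same three steps would be run once per sound class, giving one set of binary scores and one set of caption spans per class.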
-