-
1.
Publication No.: US20230412760A1
Publication date: 2023-12-21
Application No.: US17841564
Filing date: 2022-06-15
Applicant: Netflix, Inc.
Inventors: Yadong Wang, Shilpa Jois Rao
IPC classification: H04N5/93, G10L15/00, G10L15/04, G10L15/26, G10L25/57, G10L25/81, G10L15/22, H04N5/278
CPC classification: H04N5/9305, G10L15/005, G10L15/04, G10L15/26, G10L25/57, G10L25/81, G10L15/22, H04N5/278
Abstract: The disclosed computer-implemented method may include systems and methods for automatically generating sound event subtitles for digital videos. For example, the systems and methods described herein can automatically generate subtitles for sound events within a digital video soundtrack that includes sounds other than speech. Additionally, the systems and methods described herein can automatically generate sound event subtitles as part of an automatic and comprehensive approach that generates subtitles for all sounds within a soundtrack of a digital video, thereby avoiding the need for any manual inputs as part of the subtitling process.
-
2.
Publication No.: US20230306943A1
Publication date: 2023-09-28
Application No.: US18249913
Filing date: 2020-10-22
Inventors: Jianwen ZHENG, Shao-Fu SHIH, Kai LI, Cheng CHI
IPC classification: G10H1/36, G10L21/028, G10L25/81, G10L21/06, G10L25/30
CPC classification: G10H1/366, G10L21/028, G10L25/81, G10L21/06, G10L25/30
Abstract: A vocal removal method and a system thereof are provided. In the vocal removal method, a voice separation model is generated and trained to process real-time input music to separate the voice and the accompaniment. The vocal removal method further comprises the steps of feature extraction and reconstruction to obtain the voice-minimized music.
-
3.
Publication No.: US11756576B2
Publication date: 2023-09-12
Application No.: US17692640
Filing date: 2022-03-11
Inventor: Zhe Wang
Abstract: An audio signal classification method includes determining, according to voice activity of a current audio frame, whether to obtain a frequency spectrum fluctuation of the current audio frame and store the frequency spectrum fluctuation in a frequency spectrum fluctuation memory, and updating, according to whether the audio frame is percussive music or activity of a historical audio frame, frequency spectrum fluctuations stored in the frequency spectrum fluctuation memory, and classifying the current audio frame as a speech frame or a music frame according to statistics of a part or all of effective data of the frequency spectrum fluctuations stored in the frequency spectrum fluctuation memory.
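Illustrative note: the abstract above describes storing per-frame spectrum fluctuations in a memory only for active frames and classifying from statistics of the stored values. The following is a minimal sketch of that general idea, not the patented method. The class name, the memory size, the threshold, the use of a plain mean as the statistic, and the rule that higher fluctuation indicates speech are all assumptions made for illustration; the percussive-music update step mentioned in the abstract is omitted.

```python
from collections import deque
from statistics import mean


class FluctuationClassifier:
    """Toy speech/music classifier driven by a bounded fluctuation memory."""

    def __init__(self, memory_size=8, threshold=0.5):
        # memory_size and threshold are illustrative assumptions,
        # not values taken from the patent.
        self.memory = deque(maxlen=memory_size)
        self.threshold = threshold
        self.prev_spectrum = None

    def update(self, spectrum, is_active):
        """Store the frame's fluctuation only when the frame is active."""
        if is_active and self.prev_spectrum is not None:
            # Fluctuation here is the summed absolute bin-wise difference
            # from the previous frame (an assumed definition).
            flux = sum(abs(a - b) for a, b in zip(spectrum, self.prev_spectrum))
            self.memory.append(flux)
        self.prev_spectrum = list(spectrum)

    def classify(self):
        """Return 'speech', 'music', or None when nothing is stored yet."""
        if not self.memory:
            return None
        # Assumption: speech shows larger frame-to-frame fluctuation.
        return "speech" if mean(self.memory) > self.threshold else "music"
```

Calling `update` once per frame and `classify` afterwards gives a label that changes gradually, because the decision depends on the whole stored history rather than on a single frame.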
-
4.
Publication No.: US20230196809A1
Publication date: 2023-06-22
Application No.: US18172755
Filing date: 2023-02-22
Applicant: Roku, Inc.
Inventors: Jose Pio PEREIRA, Sunil Suresh KULKARNI, Mihailo M. STOJANCIC, Shashank MERCHANT, Peter WENDT
IPC classification: G06V30/18, G06T7/246, G06T7/215, G06F16/00, G06T7/254, G06F16/45, G06F16/48, G06V20/40, G06F18/22, G06V20/62, G10L15/02, G10L15/06, G10L15/10, G10L15/14, G10L15/20, G10L21/0232, G10L25/81
CPC classification: G06V30/18086, G06T7/248, G06T7/215, G06F16/00, G06T7/254, G06F16/45, G06F16/48, G06V20/41, G06F18/22, G06V20/46, G06V20/49, G06V20/635, G10L15/02, G10L15/063, G10L15/10, G10L15/142, G10L15/20, G10L21/0232, G10L25/81, G06T2207/20004, G06T2207/10016, G06T2207/20224, G06F16/906
Abstract: Audio distortion compensation methods to improve accuracy and efficiency of audio content identification are described. The method is also applicable to speech recognition. Methods to detect the interference from speakers and sources, and distortion to audio from environment and devices, are discussed. Additional methods to detect distortion to the content after performing search and correlation are illustrated. The causes of actual distortion at each client are measured and registered and learnt to generate rules for determining likely distortion and interference sources. The learnt rules are applied at the client, and likely distortions that are detected are compensated or heavily distorted sections are ignored at audio level or signature and feature level based on compute resources available. Further methods to subtract the likely distortions in the query at both audio level and after processing at signature and feature level are described.
-
5.
Publication No.: US20180025732A1
Publication date: 2018-01-25
Application No.: US15215259
Filing date: 2016-07-20
Applicant: NXP B.V.
Abstract: The disclosure relates to an audio classifier comprising: a first processor having hard-wired logic configured to receive an audio signal and detect audio activity from the audio signal; and a second processor having reconfigurable logic configured to classify the audio signal as a type of audio signal in response to the first processor detecting audio activity.
-
6.
Publication No.: US09842605B2
Publication date: 2017-12-12
Application No.: US14779322
Filing date: 2014-03-25
Inventors: Lie Lu, Alan J. Seefeldt, Jun Wang
Abstract: Apparatus and methods for audio classifying and processing are disclosed. In one embodiment, an audio processing apparatus includes an audio classifier for classifying an audio signal into at least one audio type in real time; an audio improving device for improving experience of audience; and an adjusting unit for adjusting at least one parameter of the audio improving device in a continuous manner based on the confidence value of the at least one audio type.
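Illustrative note: adjusting a parameter "in a continuous manner based on the confidence value" can be pictured as interpolating between two settings instead of switching between them. The sketch below assumes a confidence in the range 0 to 1 and a linear mapping; the function name `adjust_parameter` and the `low` and `high` endpoints are hypothetical and are not taken from the patent.

```python
def adjust_parameter(confidence, low=0.0, high=1.0):
    """Map a classifier confidence in [0, 1] linearly onto [low, high].

    Out-of-range confidences are clamped, so the result never leaves
    the interval between the two endpoint settings.
    """
    c = min(max(confidence, 0.0), 1.0)
    return low + c * (high - low)
```

For example, with `low=0.0` and `high=6.0` (say, a gain in dB), a confidence of 0.5 yields 3.0, so a classifier that is only half sure applies half of the effect rather than all or none of it.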
-
7.
Publication No.: US20170352349A1
Publication date: 2017-12-07
Application No.: US15536827
Filing date: 2015-12-24
Inventor: Sacha VRAZIC
IPC classification: G10L15/20, H04R3/00, G10L15/28, G10L25/84, G10L25/81, G10L21/0232, H04R1/40, G10L21/0216
CPC classification: G10L15/20, G10L15/00, G10L15/28, G10L21/0208, G10L21/0232, G10L25/81, G10L25/84, G10L2021/02085, G10L2021/02087, G10L2021/02166, H04R1/406, H04R3/005, H04R2499/13
Abstract: A voice processing device includes plural microphones 22 disposed in a vehicle, a voice source direction determination portion 16 determining a direction of a voice source by handling a sound reception signal as a spherical wave in a case where the voice source serving as a source of a voice included in the sound reception signal obtained by each of the plural microphones is disposed at a near field, the voice source direction determination portion determining the direction of the voice source by handling the sound reception signal as a plane wave in a case where the voice source is disposed at the far field, and a beamforming processing portion 12 performing beamforming so as to suppress a sound arriving from a direction range other than a direction range including the direction of the voice source.
-
8.
Publication No.: US20170316791A1
Publication date: 2017-11-02
Application No.: US15603922
Filing date: 2017-05-24
Applicant: Yobe, Inc
IPC classification: G10L21/0364, G10L21/0208, G10L25/81, H03G5/00
CPC classification: G10L21/0364, G06F21/32, G10L17/02, G10L21/0208, G10L25/81, H03G3/32, H03G5/005, H03G5/22, H04R1/1008, H04R1/1041, H04R5/033
Abstract: Systems and methods for isolating audio content and biometric authentication include receiving, with an audio receiver, an audio signal spanning a plurality of frequency bands, identifying a speech signal carried by a voice frequency band selected from the plurality of frequency bands, enhancing the speech signal relative to other audio content within the audio signal, and extracting a voice profile key that uniquely identifies the speech signal.
-
9.
Publication No.: US20170229114A1
Publication date: 2017-08-10
Application No.: US15491468
Filing date: 2017-04-19
Applicant: Sony Corporation
Inventors: Tetsuo Ikeda, Ken Miyashita, Tatsushi Nashida
IPC classification: G10L13/08, G10L21/055, G10L13/04
CPC classification: G10L13/08, G10L13/043, G10L21/02, G10L21/055, G10L25/81
Abstract: There is provided a speech processing apparatus including: a data obtaining unit which obtains music progression data defining a property of one or more time points or one or more time periods along progression of music; a determining unit which determines an output time point at which a speech is to be output during reproducing the music by utilizing the music progression data obtained by the data obtaining unit; and an audio output unit which outputs the speech at the output time point determined by the determining unit during reproducing the music.
-
10.
Publication No.: US09672841B2
Publication date: 2017-06-06
Application No.: US14754714
Filing date: 2015-06-30
Applicant: ZTE Corporation
Inventors: Dongping Jiang, Hao Yuan, Changbao Zhu
IPC classification: G10L15/00, G10L21/00, G10L21/02, G10L25/84, G10L25/81, G10L25/18, G10L15/02, G10L25/21, G10L25/48, G10L21/0224, G10L21/0232
CPC classification: G10L21/0205, G10L15/02, G10L21/0224, G10L21/0232, G10L25/18, G10L25/21, G10L25/48, G10L25/78, G10L25/81, G10L25/84
Abstract: The present document relates to a voice activity detection (VAD) method, methods used for voice activity detection, and apparatus thereof. The VAD method includes: obtaining sub-band signals and spectrum amplitudes of a current frame; computing values of an energy feature and a spectral centroid feature of the current frame according to the sub-band signals; computing a signal to noise ratio parameter of the current frame according to a background noise energy estimated from a previous frame, an energy of SNR sub-bands and an energy feature of the current frame; computing a VAD decision result according to a tonality signal flag, a signal to noise ratio parameter, a spectral centroid feature, and a frame energy feature. The methods and apparatus of the present document can improve the accuracy of non-stationary noise (such as office noise) and music detection.
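Illustrative note: the abstract above names three per-frame quantities (an energy feature, a spectral centroid feature, and a signal to noise ratio parameter). The sketch below uses the common textbook definitions of these quantities, which may differ from the formulas in the patent itself. All function names are hypothetical, and the tonality flag and the final VAD decision logic are not modelled.

```python
import math


def frame_energy(samples):
    """Sum of squared sample values for one frame."""
    return sum(x * x for x in samples)


def spectral_centroid(amplitudes, frequencies):
    """Amplitude-weighted mean frequency; 0.0 for an all-zero spectrum."""
    total = sum(amplitudes)
    if total == 0:
        return 0.0
    return sum(f * a for f, a in zip(frequencies, amplitudes)) / total


def snr_db(signal_energy, noise_energy):
    """Ratio of frame energy to estimated background noise energy, in dB.

    Assumes both energies are strictly positive.
    """
    return 10.0 * math.log10(signal_energy / noise_energy)
```

In a complete detector, the noise energy passed to `snr_db` would come from an estimate carried over from earlier frames, as the abstract describes; here it is simply an argument supplied by the caller.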
-