-
Publication No.: US20230412760A1
Publication date: 2023-12-21
Application No.: US17841564
Filing date: 2022-06-15
Applicant: Netflix, Inc.
Inventors: Yadong Wang, Shilpa Jois Rao
IPC classification: H04N5/93, G10L15/00, G10L15/04, G10L15/26, G10L25/57, G10L25/81, G10L15/22, H04N5/278
CPC classification: H04N5/9305, G10L15/005, G10L15/04, G10L15/26, G10L25/57, G10L25/81, G10L15/22, H04N5/278
Abstract: The disclosed computer-implemented method may include systems and methods for automatically generating sound event subtitles for digital videos. For example, the systems and methods described herein can automatically generate subtitles for sound events within a digital video soundtrack that includes sounds other than speech. Additionally, the systems and methods described herein can automatically generate sound event subtitles as part of an automatic and comprehensive approach that generates subtitles for all sounds within a soundtrack of a digital video, thereby avoiding the need for any manual inputs as part of the subtitling process.
-
2.
Publication No.: US20230306943A1
Publication date: 2023-09-28
Application No.: US18249913
Filing date: 2020-10-22
Inventors: Jianwen ZHENG, Shao-Fu SHIH, Kai LI, Cheng CHI
IPC classification: G10H1/36, G10L21/028, G10L25/81, G10L21/06, G10L25/30
CPC classification: G10H1/366, G10L21/028, G10L25/81, G10L21/06, G10L25/30
Abstract: A vocal removal method and a system thereof are provided. In the vocal removal method, a voice separation model is generated and trained to process real-time input music to separate the voice and the accompaniment. The vocal removal method further comprises the steps of feature extraction and reconstruction to obtain the voice-minimized music.
-
3.
Publication No.: US11756576B2
Publication date: 2023-09-12
Application No.: US17692640
Filing date: 2022-03-11
Inventor: Zhe Wang
Abstract: An audio signal classification method includes determining, according to voice activity of a current audio frame, whether to obtain a frequency spectrum fluctuation of the current audio frame and store the frequency spectrum fluctuation in a frequency spectrum fluctuation memory, and updating, according to whether the audio frame is percussive music or activity of a historical audio frame, frequency spectrum fluctuations stored in the frequency spectrum fluctuation memory, and classifying the current audio frame as a speech frame or a music frame according to statistics of a part or all of effective data of the frequency spectrum fluctuations stored in the frequency spectrum fluctuation memory.
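Illustrative sketch (not taken from the patent): the abstract above describes storing a per-frame spectrum fluctuation in a memory only for active frames, then classifying from statistics of the stored values. The Python below is a minimal sketch of that idea under stated assumptions. The class name `FluctuationClassifier`, the mean-absolute-difference fluctuation measure, the buffer capacity, the threshold, and the rule "low average fluctuation means music" are all hypothetical choices made for illustration. The percussive-music update step mentioned in the abstract is omitted.

```python
from collections import deque


class FluctuationClassifier:
    """Toy speech/music classifier driven by a bounded fluctuation memory."""

    def __init__(self, capacity=60, threshold=1.0):
        # Bounded memory: the oldest fluctuation is dropped when full (assumed policy).
        self.memory = deque(maxlen=capacity)
        self.threshold = threshold  # hypothetical decision threshold
        self.prev_spectrum = None

    def update(self, spectrum, is_active):
        """Process one frame; return 'speech', 'music', or None if undecided."""
        if self.prev_spectrum is not None and is_active:
            # Fluctuation = mean absolute change between consecutive spectra (assumption).
            flux = sum(abs(a - b) for a, b in zip(spectrum, self.prev_spectrum)) / len(spectrum)
            self.memory.append(flux)
        self.prev_spectrum = spectrum
        if not self.memory:
            return None
        average = sum(self.memory) / len(self.memory)
        # Assumed rule: steady spectra suggest music, strongly varying spectra suggest speech.
        return "music" if average < self.threshold else "speech"


steady = FluctuationClassifier(threshold=0.5)
steady.update([1.0, 1.0], True)
steady_label = steady.update([1.0, 1.0], True)

varying = FluctuationClassifier(threshold=0.5)
varying.update([0.0, 0.0], True)
varying_label = varying.update([2.0, 2.0], True)
```

Inactive frames (`is_active=False`) update the previous spectrum but never enter the memory, which mirrors the abstract's "whether to obtain ... and store" decision.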
-
Publication No.: US20230196809A1
Publication date: 2023-06-22
Application No.: US18172755
Filing date: 2023-02-22
Applicant: Roku, Inc.
Inventors: Jose Pio PEREIRA, Sunil Suresh KULKARNI, Mihailo M. STOJANCIC, Shashank MERCHANT, Peter WENDT
IPC classification: G06V30/18, G06T7/246, G06T7/215, G06F16/00, G06T7/254, G06F16/45, G06F16/48, G06V20/40, G06F18/22, G06V20/62, G10L15/02, G10L15/06, G10L15/10, G10L15/14, G10L15/20, G10L21/0232, G10L25/81
CPC classification: G06V30/18086, G06T7/248, G06T7/215, G06F16/00, G06T7/254, G06F16/45, G06F16/48, G06V20/41, G06F18/22, G06V20/46, G06V20/49, G06V20/635, G10L15/02, G10L15/063, G10L15/10, G10L15/142, G10L15/20, G10L21/0232, G10L25/81, G06T2207/20004, G06T2207/10016, G06T2207/20224, G06F16/906
Abstract: Audio distortion compensation methods to improve accuracy and efficiency of audio content identification are described. The method is also applicable to speech recognition. Methods to detect the interference from speakers and sources, and distortion to audio from environment and devices, are discussed. Additional methods to detect distortion to the content after performing search and correlation are illustrated. The causes of actual distortion at each client are measured and registered and learnt to generate rules for determining likely distortion and interference sources. The learnt rules are applied at the client, and likely distortions that are detected are compensated or heavily distorted sections are ignored at audio level or signature and feature level based on compute resources available. Further methods to subtract the likely distortions in the query at both audio level and after processing at signature and feature level are described.
-
Publication No.: US11574633B1
Publication date: 2023-02-07
Application No.: US16586457
Filing date: 2019-09-27
Inventors: Sandra Lemon, Nancy Yi Liang
IPC classification: G10L15/22, G06F3/04817, G06F3/0482, G06F3/0488, G10L25/81, G10L25/90, H04N7/18, G10L15/26, G06V40/20, G06F40/47, G06F40/40, G06F40/58
Abstract: Enhanced graphical user interfaces for transcription of audio and video messages are disclosed. Audio data may be transcribed, and the transcription may include emphasized words and/or punctuation corresponding to emphasis of user speech. Additionally, the transcription may be translated into a second language. A message spoken by a user depicted in one or more images of video data may also be transcribed and provided to one or more devices.
-
Publication No.: US11532319B2
Publication date: 2022-12-20
Application No.: US16820533
Filing date: 2020-03-16
Inventors: Stefan Marti, Joseph Verbeke
Abstract: One or more embodiments include an emotion analysis system for computing and analyzing the emotional state of a user. The emotion analysis system acquires, via at least one sensor, sensor data associated with a user. The emotion analysis system determines, based on the sensor data, an emotional state associated with the user. The emotion analysis system determines a first component of the emotional state that corresponds to media content being accessed by the user. The emotion analysis system applies a first function to the emotional state to remove the first component from the emotional state.
-
Publication No.: US11514923B2
Publication date: 2022-11-29
Application No.: US17494655
Filing date: 2021-10-05
Inventor: Hequn Bai
IPC classification: G10L21/0272, G10H1/00, G10L25/81
Abstract: Provided are a method and device for processing a music file, a terminal and a storage medium. The method comprises: in response to a received sound effect adjustment instruction, acquiring the music file whose adjustment is indicated by the sound effect adjustment instruction; carrying out vocals and accompaniment separation on the music file to obtain vocal data and accompaniment data in the music file; carrying out first sound effect processing on the vocal data to obtain target vocal data, and carrying out second sound effect processing on the accompaniment data to obtain target accompaniment data; and synthesizing the target vocal data and the target accompaniment data to obtain a target music file.
-
Publication No.: US20220199074A1
Publication date: 2022-06-23
Application No.: US17604379
Filing date: 2020-04-13
Abstract: The present application relates to a method of extracting audio features in a dialog detector in response to an input audio signal, the method comprising dividing the input audio signal into a plurality of frames, extracting frame audio features from each frame, determining a set of context windows, each context window including a number of frames surrounding a current frame, deriving, for each context window, a relevant context audio feature for the current frame based on the frame audio features of the frames in each respective context, and concatenating each context audio feature to form a combined feature vector to represent the current frame. Context windows of different lengths can improve response speed and robustness.
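Illustrative sketch (not taken from the patent): the steps in the abstract above (frame the signal, extract a per-frame feature, summarize that feature over several context windows around the current frame, concatenate the summaries) can be sketched in Python as follows. The function names, the use of frame energy as the frame feature, the symmetric windows, and the mean and standard deviation as context statistics are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def frame_features(signal, frame_len):
    """Split the signal into non-overlapping frames and return one energy value per frame."""
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    return [sum(x * x for x in signal[s:s + frame_len]) / frame_len for s in starts]


def context_features(feats, index, half_widths=(1, 3)):
    """Concatenate mean and standard deviation over several windows around feats[index]."""
    vector = []
    for w in half_widths:
        # Clip each window at the edges of the feature sequence.
        lo, hi = max(0, index - w), min(len(feats), index + w + 1)
        window = feats[lo:hi]
        vector.extend([mean(window), pstdev(window)])
    return vector


signal = [1.0] * 4 + [0.0] * 4 + [2.0] * 4
feats = frame_features(signal, frame_len=2)          # six frame energies
combined = context_features(feats, index=2, half_widths=(1, 2))
```

The short window reacts quickly to changes near the current frame, while the longer window smooths over more history, which is one way to read the abstract's remark about response speed and robustness.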
-
Publication No.: US11328741B2
Publication date: 2022-05-10
Application No.: US16962987
Filing date: 2018-01-18
Applicant: ASK INDUSTRIES GMBH
Inventor: Daniel Kotulla
Abstract: Method for outputting an audio signal reproducing at least part of a piece of music containing part of at least one main voice, in particular a singing voice, into an interior forming part of a passenger compartment of a motor vehicle via an audio output device having a left and a right audio output channel. The method includes providing an audio signal reproducing at least part of a piece of music containing at least one main voice, extracting an audio signal component, containing the at least one main voice, of the audio signal from the audio signal, attenuating the audio signal component containing the at least one main voice, and outputting the audio signal via the left and right audio output channels of the audio output device, wherein the audio signal component containing the at least one main voice is output in attenuated fashion.
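Illustrative sketch (not taken from the patent): the abstract does not say how the main-voice component is extracted. The Python below swaps in a well-known stand-in, mid/side processing, which assumes the main voice is panned to the center of the stereo image. The function name `attenuate_center` and the gain value are hypothetical.

```python
def attenuate_center(left, right, gain=0.25):
    """Attenuate the center (mid) component of a stereo signal by `gain`.

    Assumption: the main voice sits in the mid signal, (L + R) / 2.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = (l + r) / 2.0    # component common to both channels
        side = (l - r) / 2.0   # component that differs between channels
        out_left.append(gain * mid + side)
        out_right.append(gain * mid - side)
    return out_left, out_right


new_left, new_right = attenuate_center([1.0, 0.5], [1.0, -0.5], gain=0.25)
```

With `gain=1.0` the input is returned unchanged, and with `gain=0.0` the center component is removed entirely, so the parameter maps directly to "output in attenuated fashion".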
-
Publication No.: US20210256994A1
Publication date: 2021-08-19
Application No.: US17135119
Filing date: 2020-12-28
Applicant: Spotify AB
Abstract: A system, method and computer product for training a neural network system. The method comprises applying an audio signal to the neural network system, the audio signal including a vocal component and a non-vocal component. The method also comprises comparing an output of the neural network system to a target signal, and adjusting at least one parameter of the neural network system to reduce a result of the comparing, for training the neural network system to estimate one of the vocal component and the non-vocal component. In one example embodiment, the system comprises a U-Net architecture. After training, the system can estimate vocal or instrumental components of an audio signal, depending on which type of component the system is trained to estimate.
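Illustrative sketch (not taken from the patent): the training loop in the abstract (apply the mixture, compare the output to a target, adjust a parameter to reduce the difference) can be shown with a deliberately tiny stand-in model. The Python below replaces the U-Net with a single learnable gain, uses mean squared error as the comparison, and uses plain gradient descent. The function name `train_gain`, the learning rate, and the step count are assumptions.

```python
def train_gain(mixture, target, steps=200, lr=0.1):
    """Fit one gain w so that w * mixture approximates target (toy stand-in for a network)."""
    w = 0.0
    n = len(mixture)
    for _ in range(steps):
        # Gradient of the mean squared error between the output w*m and the target t.
        grad = sum(2.0 * (w * m - t) * m for m, t in zip(mixture, target)) / n
        w -= lr * grad  # adjust the parameter to reduce the comparison result
    return w


vocal = [0.5, -0.25, 0.75, 0.0]
accompaniment = [0.5, -0.25, 0.75, 0.0]           # contrived so the mixture is 2 * vocal
mixture = [v + a for v, a in zip(vocal, accompaniment)]
learned_gain = train_gain(mixture, vocal)         # converges toward 0.5
```

Passing the accompaniment as `target` instead would train the same model to estimate the non-vocal component, which matches the abstract's note that the estimate depends on the training target.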