Patent search: ipc:"G10L25/81", page 1

1.

Published invention application
SYSTEMS AND METHODS FOR AUTOMATICALLY GENERATING SOUND EVENT SUBTITLES (pending, published)

Publication No.: US20230412760A1

Publication date: 2023-12-21

Application No.: US17841564

Filing date: 2022-06-15

Applicant: Netflix, Inc.

Inventors: Yadong Wang, Shilpa Jois Rao

IPC classification: H04N5/93, G10L15/00, G10L15/04, G10L15/26, G10L25/57, G10L25/81, G10L15/22, H04N5/278

CPC classification: H04N5/9305, G10L15/005, G10L15/04, G10L15/26, G10L25/57, G10L25/81, G10L15/22, H04N5/278

Abstract: The disclosed computer-implemented method may include systems and methods for automatically generating sound event subtitles for digital videos. For example, the systems and methods described herein can automatically generate subtitles for sound events within a digital video soundtrack that includes sounds other than speech. Additionally, the systems and methods described herein can automatically generate sound event subtitles as part of an automatic and comprehensive approach that generates subtitles for all sounds within a soundtrack of a digital video, thereby avoiding the need for any manual inputs as part of the subtitling process.

2.

Published invention application
VOCAL TRACK REMOVAL BY CONVOLUTIONAL NEURAL NETWORK EMBEDDED VOICE FINGER PRINTING ON STANDARD ARM EMBEDDED PLATFORM (pending, published)

Publication No.: US20230306943A1

Publication date: 2023-09-28

Application No.: US18249913

Filing date: 2020-10-22

Applicant: HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED

Inventors: Jianwen ZHENG, Shao-Fu SHIH, Kai LI, Cheng CHI

IPC classification: G10H1/36, G10L21/028, G10L25/81, G10L21/06, G10L25/30

CPC classification: G10H1/366, G10L21/028, G10L25/81, G10L21/06, G10L25/30

Abstract: A vocal removal method and a system thereof are provided. In the vocal removal method, a voice separation model is generated and trained to process real-time input music to separate the voice and the accompaniment. The vocal removal method further comprises the steps of feature extraction and reconstruction to obtain the voice-minimized music.

3.

Granted invention patent
Classification of audio signal as speech or music based on energy fluctuation of frequency spectrum (in force)

Publication No.: US11756576B2

Publication date: 2023-09-12

Application No.: US17692640

Filing date: 2022-03-11

Applicant: Huawei Technologies Co., Ltd.

Inventor: Zhe Wang

IPC classification: G10L25/81, G10L25/78, G10L25/18, G10L19/06, G10L19/12

CPC classification: G10L25/81, G10L19/06, G10L19/12, G10L25/18, G10L25/78, G10L2025/783

Abstract: An audio signal classification method includes determining, according to voice activity of a current audio frame, whether to obtain a frequency spectrum fluctuation of the current audio frame and store the frequency spectrum fluctuation in a frequency spectrum fluctuation memory, and updating, according to whether the audio frame is percussive music or activity of a historical audio frame, frequency spectrum fluctuations stored in the frequency spectrum fluctuation memory, and classifying the current audio frame as a speech frame or a music frame according to statistics of a part or all of effective data of the frequency spectrum fluctuations stored in the frequency spectrum fluctuation memory.
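
The store-then-classify flow in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented method: the fluctuation measure (mean absolute change of the log-magnitude spectrum), the fixed-length memory, the mean statistic, the threshold value, and the rule that a higher mean indicates speech are all assumptions, and the percussive-music update step is omitted. The class name `FluctuationClassifier` is hypothetical.

```python
from collections import deque

import numpy as np


class FluctuationClassifier:
    """Toy speech/music classifier driven by stored spectral fluctuations."""

    def __init__(self, memory_len=60, threshold=0.05):
        # Fixed-length memory of recent fluctuation values (assumed size).
        self.memory = deque(maxlen=memory_len)
        self.prev_spectrum = None
        self.threshold = threshold  # illustrative value, not from the patent

    def update(self, frame, voice_active):
        """Process one frame; return 'speech', 'music', or None if no data yet."""
        spectrum = np.log1p(np.abs(np.fft.rfft(frame)))
        # Only frames flagged as active contribute a fluctuation value.
        if voice_active and self.prev_spectrum is not None:
            flux = float(np.mean(np.abs(spectrum - self.prev_spectrum)))
            self.memory.append(flux)
        self.prev_spectrum = spectrum
        if not self.memory:
            return None
        # Assumed decision rule: a large average fluctuation suggests speech.
        return "speech" if np.mean(self.memory) > self.threshold else "music"
```

With a steady tone repeated frame after frame the stored fluctuation stays at zero and the sketch reports music; frames whose spectra keep changing push the mean above the threshold and it reports speech.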

4.

Published invention application
ROBUST AUDIO IDENTIFICATION WITH INTERFERENCE CANCELLATION (pending, published)

Publication No.: US20230196809A1

Publication date: 2023-06-22

Application No.: US18172755

Filing date: 2023-02-22

Applicant: Roku, Inc.

Inventors: Jose Pio PEREIRA, Sunil Suresh KULKARNI, Mihailo M. STOJANCIC, Shashank MERCHANT, Peter WENDT

IPC classification: G06V30/18, G06T7/246, G06T7/215, G06F16/00, G06T7/254, G06F16/45, G06F16/48, G06V20/40, G06F18/22, G06V20/62, G10L15/02, G10L15/06, G10L15/10, G10L15/14, G10L15/20, G10L21/0232, G10L25/81

CPC classification: G06V30/18086, G06T7/248, G06T7/215, G06F16/00, G06T7/254, G06F16/45, G06F16/48, G06V20/41, G06F18/22, G06V20/46, G06V20/49, G06V20/635, G10L15/02, G10L15/063, G10L15/10, G10L15/142, G10L15/20, G10L21/0232, G10L25/81, G06T2207/20004, G06T2207/10016, G06T2207/20224, G06F16/906

Abstract: Audio distortion compensation methods to improve accuracy and efficiency of audio content identification are described. The method is also applicable to speech recognition. Methods to detect the interference from speakers and sources, and distortion to audio from environment and devices, are discussed. Additional methods to detect distortion to the content after performing search and correlation are illustrated. The causes of actual distortion at each client are measured and registered and learnt to generate rules for determining likely distortion and interference sources. The learnt rules are applied at the client, and likely distortions that are detected are compensated or heavily distorted sections are ignored at audio level or signature and feature level based on compute resources available. Further methods to subtract the likely distortions in the query at both audio level and after processing at signature and feature level are described.

5.

Granted invention patent
Enhanced graphical user interface for voice communications (in force)

Publication No.: US11574633B1

Publication date: 2023-02-07

Application No.: US16586457

Filing date: 2019-09-27

Applicant: Amazon Technologies, Inc.

Inventors: Sandra Lemon, Nancy Yi Liang

IPC classification: G10L15/22, G06F3/04817, G06F3/0482, G06F3/0488, G10L25/81, G10L25/90, H04N7/18, G10L15/26, G06V40/20, G06F40/47, G06F40/40, G06F40/58

Abstract: Enhanced graphical user interfaces for transcription of audio and video messages are disclosed. Audio data may be transcribed, and the transcription may include emphasized words and/or punctuation corresponding to emphasis of user speech. Additionally, the transcription may be translated into a second language. A message spoken by a user depicted in one or more images of video data may also be transcribed and provided to one or more devices.

6.

Granted invention patent
Techniques for separating driving emotion from media induced emotion using an additive/subtractive, conjunctive, disjunctive, or Bayesian technique in a driver monitoring system (in force)

Publication No.: US11532319B2

Publication date: 2022-12-20

Application No.: US16820533

Filing date: 2020-03-16

Applicant: HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED

Inventors: Stefan Marti, Joseph Verbeke

IPC classification: G10L25/63, G06F3/01, G06N3/04, H04N21/454, G10L25/81, G06V20/59

Abstract: One or more embodiments include an emotion analysis system for computing and analyzing emotional state of a user. The emotion analysis system acquires, via at least one sensor, sensor data associated with a user. The emotion analysis system determines, based on the sensor data, an emotional state associated with a user. The emotion analysis system determines a first component of the emotional state that corresponds to media content being accessed by the user. The emotion analysis system applies a first function to the emotional state to remove the first component from the emotional state.

7.

Granted invention patent
Method and device for processing music file, terminal and storage medium (in force)

Publication No.: US11514923B2

Publication date: 2022-11-29

Application No.: US17494655

Filing date: 2021-10-05

Applicant: BEIJING BYTEDANCE NETWORK TECHNOLOGY CO., LTD.

Inventor: Hequn Bai

IPC classification: G10L21/0272, G10H1/00, G10L25/81

Abstract: Provided are a method and device for processing a music file, a terminal and a storage medium. The method comprises: in response to a received sound effect adjustment instruction, acquiring a music file, the adjustment of which is indicated by the sound effect adjustment instruction; carrying out vocals and accompaniment separation on the music file to obtain vocal data and accompaniment data in the music file; carrying out first sound effect processing on the vocal data to obtain target vocal data, and carrying out second sound effect processing on the accompaniment data to obtain target accompaniment data; and synthesizing the target vocal data and the target accompaniment data to obtain a target music file.

8.

Invention application
A DIALOG DETECTOR (in force)

Publication No.: US20220199074A1

Publication date: 2022-06-23

Application No.: US17604379

Filing date: 2020-04-13

Applicant: Dolby Laboratories Licensing Corporation

Inventors: Lie LU, Xin LIU

IPC classification: G10L15/197, G10L15/04, G10L15/02, G10L15/22, G10L25/18, H04S1/00, G10L25/81

Abstract: The present application relates to a method of extracting audio features in a dialog detector in response to an input audio signal, the method comprising dividing the input audio signal into a plurality of frames, extracting frame audio features from each frame, determining a set of context windows, each context window including a number of frames surrounding a current frame, deriving, for each context window, a relevant context audio feature for the current frame based on the frame audio features of the frames in each respective context, and concatenating each context audio feature to form a combined feature vector to represent the current frame. Context windows of different lengths can improve response speed and robustness.
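
The framing, per-window summarising, and concatenation steps in this abstract can be sketched as below. This is a minimal sketch under stated assumptions, not the applicant's implementation: the frame features (log energy and zero-crossing rate), the use of a mean as the context feature, and the `(frames_back, frames_ahead)` window encoding are all illustrative choices, and the function names are hypothetical.

```python
import numpy as np


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of frame_len samples."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])


def frame_features(frames):
    """Per-frame features (assumed here: log energy and zero-crossing rate)."""
    energy = np.log(np.mean(frames ** 2, axis=1) + 1e-12)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.column_stack([energy, zcr])


def context_vector(feats, t, windows):
    """Concatenate one summary feature per context window around frame t.

    windows is a list of (frames_back, frames_ahead) pairs; each window is
    summarised by the mean of its frame features (an assumed statistic).
    """
    parts = []
    for back, ahead in windows:
        lo = max(0, t - back)
        hi = min(len(feats), t + ahead + 1)
        parts.append(feats[lo:hi].mean(axis=0))
    return np.concatenate(parts)
```

For example, `context_vector(feats, t, [(2, 0), (0, 2), (8, 8)])` combines a short look-back window, a short look-ahead window, and a longer centred window into one vector for frame `t`, which is one way to mix fast-reacting and more stable context.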

9.

Granted invention patent
Method for outputting an audio signal reproducing a piece of music into an interior via an output device (in force)

Publication No.: US11328741B2

Publication date: 2022-05-10

Application No.: US16962987

Filing date: 2018-01-18

Applicant: ASK INDUSTRIES GMBH

Inventor: Daniel Kotulla

IPC classification: G10L25/81, G10L25/48, H04R5/04

Abstract: Method for outputting an audio signal reproducing at least part of a piece of music containing part of at least one main voice, in particular a singing voice, into an interior forming part of a passenger compartment of a motor vehicle via an audio output device having a left and a right audio output channel. The method includes providing an audio signal reproducing at least part of a piece of music containing at least one main voice, extracting an audio signal component, containing the at least one main voice, of the audio signal from the audio signal, attenuating the audio signal component containing the at least one main voice, and outputting the audio signal via the left and right audio output channels, of the audio output device, wherein the audio signal component containing the at least one main voice is output in attenuated fashion.
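
The abstract does not say how the main-voice component is extracted. As a rough illustration of the extract, attenuate, and output sequence, the sketch below swaps in a classic mid/side decomposition, which assumes the main voice is mixed equally into both channels. That assumption, the function name `attenuate_center`, and the default gain are all hypothetical and not taken from the patent.

```python
import numpy as np


def attenuate_center(left, right, gain=0.25):
    """Attenuate the centre-panned (mid) component of a stereo signal.

    gain=1.0 returns the input unchanged; gain=0.0 removes the mid
    component entirely. Returns the new (left, right) channels.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    mid = 0.5 * (left + right)   # content common to both channels
    side = 0.5 * (left - right)  # content that differs between channels
    return gain * mid + side, gain * mid - side
```

A limitation of this stand-in is that it attenuates everything panned to the centre, not only the voice, so bass or kick drum in the middle of the mix would be reduced as well.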

10.

Invention application
SINGING VOICE SEPARATION WITH DEEP U-NET CONVOLUTIONAL NETWORKS (in force)

Publication No.: US20210256994A1

Publication date: 2021-08-19

Application No.: US17135119

Filing date: 2020-12-28

Applicant: Spotify AB

Inventors: Andreas Simon Thore JANSSON, Angus William Sackfield, Ching Chuan Sung

IPC classification: G10L25/81, G06N3/08, G06N5/04, G10L15/16, G10L21/10

Abstract: A system, method and computer product for training a neural network system. The method comprises applying an audio signal to the neural network system, the audio signal including a vocal component and a non-vocal component. The method also comprises comparing an output of the neural network system to a target signal, and adjusting at least one parameter of the neural network system to reduce a result of the comparing, for training the neural network system to estimate one of the vocal component and the non-vocal component. In one example embodiment, the system comprises a U-Net architecture. After training, the system can estimate vocal or instrumental components of an audio signal, depending on which type of component the system is trained to estimate.
