-
1.
Publication No.: US20240355351A1
Publication Date: 2024-10-24
Application No.: US18302079
Filing Date: 2023-04-18
Inventors: Moshe Tzur, Elior Hadad
Abstract: The single-channel, Speech Features-Based Voice Activity Detection (SFVAD) system is a robust, low-latency system that generates per-frame speech and noise indications, along with calculating a pair of speech and noise time-frequency masks. The SFVAD system controls an adaptation mechanism for a Beam-Forming system control module and improves the speech quality and noise reduction capabilities of Automatic Speech Recognition applications, such as Virtual Assistance (VA) and Hands-Free (HF) calls, by robustly handling transient noises. The system extracts speech-like patterns from an input audio signal and is invariant to the power level of the input audio signal. Noise calculation is controlled by a pair of speech features-based detectors (voiced and unvoiced). A Cepstral-based pitch detector and a Centrum calculation method are used to prevent contamination of the calculated noise by speech content. The SFVAD system robustly handles instant changes of background noise level and has dramatically lower false detection rates.
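The cepstral pitch detection the abstract relies on can be sketched in a few lines. This is a minimal illustration of the standard technique (the cepstrum as the inverse FFT of the log magnitude spectrum, with a peak at the pitch period for voiced frames), not the patented SFVAD implementation; all function names and parameters are our assumptions.

```python
import numpy as np

def cepstral_pitch(frame, sr=16000, fmin=60.0, fmax=400.0):
    """Estimate (pitch_hz, peak_strength) for one audio frame via the cepstrum.

    A voiced frame's log magnitude spectrum ripples with a period equal to
    the fundamental frequency, so the cepstrum shows a peak at the
    quefrency (in samples) equal to the pitch period.
    """
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    cepstrum = np.fft.irfft(np.log(spectrum + 1e-12))
    # Search only quefrencies corresponding to plausible pitch frequencies.
    qmin, qmax = int(sr / fmax), int(sr / fmin)
    peak = qmin + int(np.argmax(cepstrum[qmin:qmax]))
    return sr / peak, float(cepstrum[peak])
```

A per-frame voiced/unvoiced decision could then threshold `peak_strength`, which is the kind of signal a speech-features-based detector can use to gate noise estimation.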
-
2.
Publication No.: US20240321267A1
Publication Date: 2024-09-26
Application No.: US18573542
Filing Date: 2022-06-23
Inventors: Christopher VOIGT, Alam Aseequl KHAN, Kai Samuel David Erik KARREN, Michael SCHMITS, Mohammad Fayadan HOSSAIN
CPC Classes: G10L15/1815, G10L15/22, G10L25/93
Abstract: A system and method are described for dialogue management by determining a transition from one domain or use case to another and initiating a switch suggestion in a human-computer conversation for rendering services to a user based on the updated domain or use case for a dialogue. The method comprises determining a first use-case in the human-computer conversation and assigning a first use-case score to the first use-case. The method then comprises determining a second use-case in the human-computer conversation and assigning a second use-case score to the second use-case. Further, the method comprises determining whether to make the use-case switch suggestion for the second use-case based on the first use-case score and the second use-case score. The invention thereby enables the system to extract the user's intended information, making it more efficient, approachable, and user-friendly while saving time and cost.
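The score-based switch decision can be illustrated with a small sketch. The margin-based hysteresis (requiring the candidate to beat the current use case by a fixed margin, so noisy scores do not cause oscillation) is our assumption, not a detail stated in the abstract:

```python
from dataclasses import dataclass

@dataclass
class UseCaseScore:
    name: str
    score: float

def should_suggest_switch(current: UseCaseScore, candidate: UseCaseScore,
                          margin: float = 0.2) -> bool:
    """Suggest switching use cases only when the candidate's score exceeds
    the current use case's score by a margin; otherwise stay in the
    current domain."""
    return candidate.score > current.score + margin
```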
-
3.
Publication No.: US12046255B2
Publication Date: 2024-07-23
Application No.: US17572100
Filing Date: 2022-01-10
Inventors: Fu-En Tsai, Feng Wen Hung, Chao-I Li
IPC Classes: G10L25/93, G10L25/78, H04N7/15, H04N23/695
CPC Classes: G10L25/93, G10L25/78, H04N7/15, H04N23/695, G10L2025/783
Abstract: A sound source tracking method adapted to an ongoing video conference comprises: obtaining a streaming signal of the video conference from the internet; performing a video conference procedure to obtain an audio signal from the streaming signal and send the audio signal to a speaker; performing an audio tracking procedure to obtain the audio signal outputted from the video conference procedure to the communication device and send the audio signal to a sound source tracking camera; playing the audio signal to generate a far-end sound; recording a field sound comprising at least one of the far-end sound and a local-end sound; and performing a comparing procedure to determine a shooting direction of the sound source tracking camera, wherein the shooting direction is adjusted so as not to shoot the speaker when a similarity of the far-end sound and the audio signal is greater than a threshold.
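The comparing procedure — deciding whether the recorded field sound is dominated by the known far-end audio, so the camera should not track the loudspeaker — can be approximated with a normalized cross-correlation. The similarity measure, threshold value, and function names below are our assumptions, not the patent's actual comparing procedure:

```python
import numpy as np

def normalized_xcorr_peak(recorded, farend):
    """Peak of the normalized cross-correlation between the recorded field
    sound and the known far-end signal; a value near 1 means the microphone
    is mostly hearing the loudspeaker playing far-end audio."""
    r = (recorded - recorded.mean()) / (recorded.std() + 1e-12)
    f = (farend - farend.mean()) / (farend.std() + 1e-12)
    corr = np.correlate(r, f, mode="full") / len(f)
    return float(np.abs(corr).max())

def camera_may_track(recorded, farend, threshold=0.7):
    """Return True when the field sound does NOT match the far-end audio,
    i.e. a local talker is likely present and the sound source tracking
    camera may steer toward the sound."""
    return normalized_xcorr_peak(recorded, farend) < threshold
```

In practice an acoustic-echo-aware similarity measure would be used instead of a raw cross-correlation, since the loudspeaker-to-microphone path filters the far-end signal.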
-
4.
Publication No.: US20240087591A1
Publication Date: 2024-03-14
Application No.: US18451736
Filing Date: 2023-08-17
Applicant: SomniQ, Inc.
Inventors: Rikko Sakaguchi, Hidenori Ishikawa
Abstract: Methods, systems, and apparatuses for computer-generated visualization of speech are described herein. An example method of computer-generated visualization of speech including at least one segment includes: generating a graphical representation of an object corresponding to a segment of the speech; and displaying the graphical representation of the object on a screen of a computing device. Generating the graphical representation includes: representing a duration of the respective segment by a length of the object and representing an intensity of the respective segment by a width of the object; and placing, in the graphical representation, a space between adjacent objects.
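The duration-to-length and intensity-to-width mapping can be sketched as a simple layout routine. The pixel scales and gap value are illustrative assumptions, not values from the patent:

```python
def layout_segments(segments, gap=4.0, px_per_sec=100.0, px_per_db=2.0):
    """Map (duration_s, intensity_db) speech segments to drawable objects:
    duration -> object length, intensity -> object width, with a fixed
    space (gap) placed between adjacent objects."""
    x = 0.0
    objects = []
    for duration, intensity in segments:
        length = duration * px_per_sec
        width = intensity * px_per_db
        objects.append({"x": x, "length": length, "width": width})
        x += length + gap  # leave a space before the next object
    return objects
```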
-
5.
Publication No.: US11922953B2
Publication Date: 2024-03-05
Application No.: US17414194
Filing Date: 2018-12-18
Inventor: Hideo Omura
Abstract: A voice analyzer analyzes whether a voice signal input into a voice input unit includes a specific characteristic component. A voice recognizer recognizes a voice represented by the voice signal input into the voice input unit. A response instruction unit instructs a response to a response operation unit that operates in response to the voice recognized by the voice recognizer. When the voice analyzer determines that the voice signal includes the specific characteristic component, a controller controls the voice recognizer not to execute voice recognition processing, or controls the response instruction unit not to instruct the response operation unit about an instruction content based on the recognized voice.
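The control flow the abstract describes — suppress recognition or the response when the analyzer flags the specific characteristic component — reduces to a small gate. The callables below are hypothetical stand-ins for the patent's analyzer, recognizer, and response units:

```python
def handle_voice(signal, has_characteristic, recognize, respond):
    """Gate the recognize/respond pipeline: when the analyzer reports that
    the signal contains the specific characteristic component, neither
    recognition nor the response instruction is executed."""
    if has_characteristic(signal):
        return None  # controller suppresses recognition and response
    text = recognize(signal)
    respond(text)  # response instruction unit drives the operation unit
    return text
```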
-
6.
Publication No.: US20230419967A1
Publication Date: 2023-12-28
Application No.: US17889110
Filing Date: 2022-08-16
Applicant: Apple Inc.
Inventors: Erik D. HORNBERGER, James A. FORREST, Christopher M. GARRIDO, Patrick MIAUTON, Bradley F. PATTERSON, Karthick SANTHANAM, Luciano M. VERGER
Abstract: Systems and processes for providing textual representations for a communication session are provided. For example, at least one audio input is received at an electronic device, wherein each audio input is associated with a respective priority level. A priority level of an audio input detected at a microphone of the electronic device is determined, and the highest priority level among the determined priority level and each received priority level corresponding to the at least one audio input is identified. A textual representation of the audio input corresponding to the identified highest priority level is obtained, and the obtained textual representation is displayed on a display of the electronic device.
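The selection step — transcribe the audio input with the highest priority among the local microphone and each received stream — is a straightforward maximum. The source identifiers and numeric priority scale are illustrative assumptions:

```python
def select_transcription_source(mic_priority, received):
    """Pick the audio source whose textual representation should be shown.

    received: list of (source_id, priority) pairs for incoming audio
    inputs; the local microphone competes with them at mic_priority.
    """
    best_id, best_priority = "local_mic", mic_priority
    for source_id, priority in received:
        if priority > best_priority:
            best_id, best_priority = source_id, priority
    return best_id
```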
-
7.
Publication No.: US20230386458A1
Publication Date: 2023-11-30
Application No.: US17804544
Filing Date: 2022-05-27
Applicant: SoundHound, Inc.
Inventors: Karl STAHL, Bernard MONT-REYNAUD
CPC Classes: G10L15/22, G10L15/08, G10L25/93, G10L2015/088
Abstract: Methods and systems for pre-wakeword speech processing are disclosed. Speech audio, comprising command speech spoken before a wakeword, may be stored in a buffer in oldest-to-newest order. Upon detection of the wakeword, reverse acoustic models and language models, such as reverse automatic speech recognition (R-ASR), can be applied to the buffered audio in newest-to-oldest order, starting from before the wakeword. The speech is converted into a sequence of words. Natural language grammar models, such as natural language understanding (NLU), can be applied to match the sequence of words to a complete command, the complete command being associated with invoking a computer operation.
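The buffering side of this scheme can be sketched with a bounded ring buffer that hands frames to a reverse recognizer in newest-to-oldest order, starting just before the wakeword. The class and method names are our assumptions; the reverse ASR models themselves are not modeled here:

```python
from collections import deque

class PreWakewordBuffer:
    """Ring buffer of audio frames kept in oldest-to-newest order; old
    frames are evicted once max_frames is reached."""

    def __init__(self, max_frames):
        self.frames = deque(maxlen=max_frames)

    def push(self, frame):
        self.frames.append(frame)

    def frames_before_wakeword(self, wakeword_len):
        """Frames preceding the wakeword, returned newest first, which is
        the order a reverse (R-ASR) recognizer would consume them in."""
        history = list(self.frames)[:-wakeword_len]
        return list(reversed(history))
```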
-
8.
Publication No.: US20230352042A1
Publication Date: 2023-11-02
Application No.: US17806565
Filing Date: 2022-06-13
Abstract: Systems and methods are provided for a transcription system with voice activity detection (VAD). The system includes a VAD module to receive incoming audio and generate an audio segment, and a speech decoder with a split predictor to perform, in a first pass, a decode operation to transcribe text from the audio segment into a message. In the first pass, if the message is determined not to contain a split point based on a content-based analysis performed by the split predictor, the speech decoder forwards the message for display; if the message is determined, based on the content-based analysis, to contain the split point, the speech decoder performs, in a second pass, a re-decode operation to transcribe text from the audio segment based on the split point, wherein the split point is configured within an audio domain of the audio segment by the split predictor, and forwards the message for display.
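The two-pass flow — decode once, consult the split predictor, and re-decode each side only when a split point is found — can be sketched as follows. `decode` and `find_split` are hypothetical stand-ins for the speech decoder and the content-based split predictor:

```python
def transcribe_with_split(segment, decode, find_split):
    """First pass: decode the whole audio segment into a message. If the
    split predictor locates a split point (an index into the audio),
    perform a second-pass re-decode of each side; otherwise the
    first-pass message is forwarded for display as-is."""
    message = decode(segment)
    split = find_split(message, segment)
    if split is None:
        return [message]
    return [decode(segment[:split]), decode(segment[split:])]
```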
-
9.
Publication No.: US20230306975A1
Publication Date: 2023-09-28
Application No.: US18160894
Filing Date: 2023-01-27
Inventors: Guillaume FUCHS, Archit TAMARAPU, Andrea EICHENSEER, Srikanth KORSE, Stefan DOEHLA, Markus MULTRUS
IPC Classes: G10L19/032, G10L19/012, G10L25/78, G10L25/93
CPC Classes: G10L19/032, G10L19/012, G10L25/78, G10L25/93
Abstract: Disclosed are an apparatus for generating an encoded audio scene, an apparatus for decoding and/or processing an encoded audio scene, related methods, and non-transitory storage units storing instructions which, when executed by a processor, cause the processor to perform a related method. The encoded audio scene may include, in a first frame, a first soundfield parameter representation and an encoded audio signal, wherein a second frame is an inactive frame. An apparatus for processing the encoded audio scene may include: an activity detector for detecting that the second frame is the inactive frame; a synthetic signal synthesizer for synthesizing a synthetic audio signal for the second frame using the parametric description for the second frame; an audio decoder for decoding the encoded audio signal for the first frame; and either a spatial renderer for spatially rendering the audio signal for the first frame using the first soundfield parameter representation and using the synthetic audio signal for the second frame, or a transcoder for generating a metadata-assisted output format including the audio signal for the first frame, the first soundfield parameter representation for the first frame, the synthetic audio signal for the second frame, and a second soundfield parameter representation for the second frame.
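The per-frame dispatch the abstract describes — decode active frames, synthesize a substitute signal for inactive frames from their parametric description — can be sketched as a loop. The callables are hypothetical stand-ins for the activity detector, audio decoder, and synthetic signal synthesizer:

```python
def process_scene_frames(frames, is_active, decode, synthesize):
    """For each frame of the encoded audio scene: active frames are fully
    decoded; inactive frames are replaced by a synthetic audio signal
    built from the frame's parametric description (comfort-noise style),
    so the renderer downstream always receives an audio signal."""
    out = []
    for frame in frames:
        if is_active(frame):
            out.append(decode(frame))
        else:
            out.append(synthesize(frame))
    return out
```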
-
10.
Publication No.: US11763099B1
Publication Date: 2023-09-19
Application No.: US18126341
Filing Date: 2023-03-24
Applicant: VoyagerX, Inc.
Inventors: Hyeonsoo Oh, Sedong Nam
Abstract: The present disclosure relates to systems and methods for providing subtitles for a video. The video's audio is transcribed to obtain caption text for the video. A first machine-trained model identifies sentences in the caption text. A second model identifies intra-sentence breaks within the sentences identified by the first machine-trained model. Based on the identified sentences and intra-sentence breaks, one or more words in the caption text are grouped into a clip caption to be displayed for a corresponding clip of the video.
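The grouping step can be sketched once the two models' outputs are given as word indices. The rule that intra-sentence breaks are used only when a caption would otherwise grow too long is our assumption; the abstract only says both kinds of break inform the grouping:

```python
def group_caption_clips(words, sentence_ends, intra_breaks, max_words=6):
    """Group caption words into per-clip captions: always break after the
    last word of a sentence; break at an intra-sentence break only when
    the running caption has already reached max_words. sentence_ends and
    intra_breaks are sets of word indices marking break positions."""
    clips, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        if i in sentence_ends or (i in intra_breaks and len(current) >= max_words):
            clips.append(" ".join(current))
            current = []
    if current:  # flush any trailing words as a final clip caption
        clips.append(" ".join(current))
    return clips
```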