-
Publication No.: US20240321273A1
Publication date: 2024-09-26
Application No.: US18575327
Filing date: 2021-07-02
Inventors: Hiroshi SATO, Tatsuya KAKO
Abstract: A signal processing device includes circuitry configured to receive, together with a voice recognition result of an utterance section of an utterance input to each of a plurality of microphones, an input of time information of a start time and an end time of each utterance and information regarding an appearance time of each word in the voice recognition result; detect whether there is an overlap in time of utterance sections in each pair of voice recognition results by combining voice recognition results of two utterances from utterance sections of utterances input to each of the microphones; calculate a similarity of voice recognition result for each pair having an overlap in time of utterance sections; compare the similarity with a predetermined threshold; and reject an utterance having a shorter length of the voice recognition result as a wraparound utterance for a pair in which the similarity exceeds the threshold.
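The steps in this abstract (overlap check, pairwise similarity, threshold, reject the shorter result) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Utterance` type, the use of `difflib.SequenceMatcher` as the similarity measure, the 0.6 threshold, and the tie-breaking rule are all assumptions, and the per-word appearance times mentioned in the abstract are not used here.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Utterance:
    mic: str            # microphone identifier (hypothetical field)
    start: float        # utterance start time in seconds
    end: float          # utterance end time in seconds
    words: list[str]    # recognized words


def overlaps(a: Utterance, b: Utterance) -> bool:
    """True if the two utterance sections share any time span."""
    return a.start < b.end and b.start < a.end


def similarity(a: Utterance, b: Utterance) -> float:
    """Word-level similarity in [0, 1]; the measure itself is an assumption."""
    return SequenceMatcher(None, a.words, b.words).ratio()


def reject_wraparound(utterances: list[Utterance], threshold: float = 0.6) -> list[Utterance]:
    """Drop the shorter utterance of every overlapping, similar pair."""
    rejected = set()
    for i in range(len(utterances)):
        for j in range(i + 1, len(utterances)):
            a, b = utterances[i], utterances[j]
            if not overlaps(a, b):
                continue
            if similarity(a, b) > threshold:
                # Assumption: on equal length, the later-listed utterance is rejected.
                rejected.add(i if len(a.words) < len(b.words) else j)
    return [u for k, u in enumerate(utterances) if k not in rejected]


if __name__ == "__main__":
    a = Utterance("mic1", 0.0, 2.0, "please open the window now".split())
    b = Utterance("mic2", 0.1, 1.8, "please open the window".split())
    c = Utterance("mic2", 5.0, 6.0, "thank you".split())
    for u in reject_wraparound([a, b, c]):
        print(u.mic, " ".join(u.words))
```

In this example the second utterance overlaps the first in time and is nearly identical but shorter, so it is treated as the wraparound copy and dropped.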
-
Publication No.: US12087297B2
Publication date: 2024-09-10
Application No.: US17930822
Filing date: 2022-09-09
Applicant: Google LLC
Inventors: Matthew Sharifi, Victor Carbune
IPC classification: G10L15/00, G10L15/02, G10L15/22, G10L21/0208, G10L21/0272, G10L25/78, G10L25/87
CPC classification: G10L15/22, G10L15/02, G10L21/0208, G10L21/0272, G10L25/78, G10L25/87
Abstract: A method includes receiving a first instance of raw audio data corresponding to a voice-based command and receiving a second instance of the raw audio data corresponding to an utterance of audible contents for an audio-based communication spoken by a user. When a voice filtering recognition routine determines to activate voice filtering for at least the voice of the user, the method also includes obtaining a respective speaker embedding of the user and processing, using the respective speaker embedding, the second instance of the raw audio data to generate enhanced audio data for the audio-based communication that isolates the utterance of the audible contents spoken by the user and excludes at least a portion of the one or more additional sounds that are not spoken by the user. The method also includes executing.
-
Publication No.: US12033628B2
Publication date: 2024-07-09
Application No.: US17545257
Filing date: 2021-12-08
Inventors: Hoseon Shin, Chulmin Lee, Youngwoo Lee
CPC classification: G10L15/22, G10L25/87, H04R3/00, H04R2420/07
Abstract: A wireless audio device is provided. The wireless audio device includes an audio receiving circuit, an audio output circuit, an acceleration sensor, a communication circuit, a processor, and a memory. The memory may store instructions that, when executed by the processor, cause the wireless audio device to detect an utterance of a user of the wireless audio device by using the acceleration sensor, enter a dialog mode in which at least some of ambient sounds received by the audio receiving circuit are output through the audio output circuit, in response to detecting the utterance of the user, and end the dialog mode if no voice is detected for a specified time or longer by using the audio receiving circuit in the dialog mode.
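The enter/exit behavior of the dialog mode described above amounts to a small state machine with a silence timeout. The sketch below is only an illustration under stated assumptions: the class name `DialogModeController`, the two boolean inputs (one derived from the acceleration sensor, one from the microphone), and the 5-second timeout are hypothetical and do not come from the patent.

```python
class DialogModeController:
    """Tracks whether ambient sound should be passed through to the wearer."""

    def __init__(self, silence_timeout_s: float = 5.0):
        self.silence_timeout_s = silence_timeout_s  # assumed value
        self.dialog_mode = False
        self._last_voice_time = 0.0

    def update(self, now: float, wearer_speaking: bool, voice_detected: bool) -> bool:
        """Advance the state machine by one observation.

        wearer_speaking: utterance detected via the acceleration sensor.
        voice_detected:  any voice detected via the audio receiving circuit.
        Returns True while dialog mode is active.
        """
        if not self.dialog_mode:
            if wearer_speaking:
                # The wearer started talking: enter dialog mode.
                self.dialog_mode = True
                self._last_voice_time = now
        elif voice_detected:
            # Someone is still talking: restart the silence timer.
            self._last_voice_time = now
        elif now - self._last_voice_time >= self.silence_timeout_s:
            # No voice for the specified time: leave dialog mode.
            self.dialog_mode = False
        return self.dialog_mode


if __name__ == "__main__":
    ctrl = DialogModeController(silence_timeout_s=5.0)
    print(ctrl.update(0.0, wearer_speaking=True, voice_detected=True))
    print(ctrl.update(3.0, wearer_speaking=False, voice_detected=False))
    print(ctrl.update(5.0, wearer_speaking=False, voice_detected=False))
```

Note that only the wearer's own utterance opens dialog mode in this sketch, while any detected voice keeps it open.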
-
Publication No.: US12001808B2
Publication date: 2024-06-04
Application No.: US17435995
Filing date: 2021-08-23
Inventors: Yoonjung Choi, Sangha Kim, Hakjung Kim, Yoonjin Yoon, Seokchan Ahn
IPC classification: G06F40/58, G06F40/44, G10L15/183, G10L21/04, G10L25/87
CPC classification: G06F40/58, G10L21/04, G10L25/87, G06F40/44, G10L15/183
Abstract: A method is provided. The method includes receiving a speech input in a first language from a first device; obtaining, by using an artificial intelligence (AI) model, an estimated interpretation time that indicates a time expected to be required to interpret the speech input in the first language into a second language; transmitting, based on the estimated interpretation time, interpretation situation information to at least one of the first device or a second device; interpreting the speech input in the first language into the second language; and transmitting, to the second device, a result of the interpreting of the speech input into the second language.
-
Publication No.: US20240087593A1
Publication date: 2024-03-14
Application No.: US17942289
Filing date: 2022-09-12
Inventors: Dushyant Sharma, Uwe Helmut JOST, Patrick Aubrey NAYLOR, Ljubomir MILANOVIC, William Francis GANONG, III
CPC classification: G10L25/51, G10L25/87, H04S7/302, H04S2420/01
Abstract: A method, computer program product, and computing system for determining a plurality of transfer functions for a plurality of corresponding segments from a reference recording and a suspect recording. A delta transfer function between the plurality of transfer functions of a pair of corresponding segments of the plurality of corresponding segments is determined. A recording comparison confidence score is generated for the pair of corresponding segments based upon, at least in part, the delta transfer function. The suspect recording is verified based upon, at least in part, the plurality of recording comparison confidence scores.
-
Publication No.: US11900962B2
Publication date: 2024-02-13
Application No.: US17876017
Filing date: 2022-07-28
Inventor: Martin Sehlstedt
IPC classification: G10L25/78, G10L25/87, G10L19/00, G10L21/02, G10L19/012
CPC classification: G10L25/87, G10L19/00, G10L19/012, G10L21/02, G10L25/78
Abstract: In accordance with an example embodiment of the present invention, disclosed is a method and an apparatus for voice activity detection (VAD). The VAD comprises creating a signal indicative of a primary VAD decision and determining hangover addition. The determination on hangover addition is made in dependence of a short term activity measure and/or a long term activity measure. A signal indicative of a final VAD decision is then created.
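One way to picture hangover addition is as a short extension of the "active" decision after the primary detector falls silent, granted only when recent activity has been high enough. The sketch below assumes a specific form that the abstract does not give: activity is measured as the fraction of active primary decisions within a short and a long window, and the window sizes, thresholds, and hangover length are hypothetical.

```python
from collections import deque


def final_vad(primary, short_win=4, long_win=16,
              short_thr=0.75, long_thr=0.5, hangover_frames=3):
    """Turn per-frame primary VAD decisions into final decisions with hangover.

    Hangover is armed only while the short-term or long-term activity
    measure is at or above its threshold (an assumed rule).
    """
    short_hist = deque(maxlen=short_win)
    long_hist = deque(maxlen=long_win)
    remaining = 0  # hangover frames still available
    final = []
    for active in primary:
        short_hist.append(bool(active))
        long_hist.append(bool(active))
        if active:
            # Dividing by the window size keeps activity low during start-up.
            short_act = sum(short_hist) / short_win
            long_act = sum(long_hist) / long_win
            enough = short_act >= short_thr or long_act >= long_thr
            remaining = hangover_frames if enough else 0
            final.append(True)
        elif remaining > 0:
            remaining -= 1
            final.append(True)   # hangover frame: still reported as active
        else:
            final.append(False)
    return final


if __name__ == "__main__":
    burst = [True, True, True, True, False, False, False, False, False]
    print(final_vad(burst))
    print(final_vad([True, False, False]))
```

With these settings a sustained burst of four active frames earns three hangover frames, while an isolated single active frame earns none.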
-
Publication No.: US11875819B2
Publication date: 2024-01-16
Application No.: US17447628
Filing date: 2021-09-14
Inventor: Ravi Kappagantu
CPC classification: G10L25/87, G06F21/6245, G06Q20/4018, G10L25/51
Abstract: A method for redacting sensitive information from an audio stream, such as a voice signal in a telephone call, in real time is provided. The method includes: receiving an audio stream; conveying the audio stream through a channel that includes a valve; detecting, from within the audio stream, a first event that indicates an onset of sensitive information; closing the valve so that the conveying of the audio stream through the channel is temporarily stopped; detecting, from within the audio stream, a second event that indicates an ending of the sensitive information; and reopening the valve so that the conveying of the audio stream through the channel is resumed. The sensitive information may include payment card industry (PCI) information, such as a card number and/or a card verification value (CVV).
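The valve behavior described above can be sketched as a generator that forwards chunks until an onset event closes it and an ending event reopens it. Everything specific here is an assumption for illustration: the function name `redact_stream`, the two caller-supplied detector callbacks, and the choice to withhold the onset and ending chunks themselves. Text tokens stand in for audio frames so the example stays self-contained.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def redact_stream(chunks: Iterable[T],
                  is_onset: Callable[[T], bool],
                  is_ending: Callable[[T], bool]) -> Iterator[T]:
    """Yield chunks while the valve is open; withhold them while it is closed."""
    valve_open = True
    for chunk in chunks:
        if valve_open and is_onset(chunk):
            valve_open = False       # first event: close the valve
        if valve_open:
            yield chunk              # normal conveyance through the channel
        elif is_ending(chunk):
            valve_open = True        # second event: reopen for later chunks


if __name__ == "__main__":
    # Hypothetical markers standing in for detected audio events.
    call = ["hello", "<card>", "4111", "1111", "</card>", "thanks"]
    safe = list(redact_stream(call,
                              is_onset=lambda c: c == "<card>",
                              is_ending=lambda c: c == "</card>"))
    print(safe)
```

Because the function is a generator, chunks are forwarded as they arrive, which matches the real-time framing of the abstract.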
-
Publication No.: US20240007512A1
Publication date: 2024-01-04
Application No.: US17813340
Filing date: 2022-07-19
Applicant: CITRIX SYSTEMS, INC.
Inventors: Zongpeng QIAO, Dan HU, Ke XU, Jia YIN
IPC classification: H04L65/4038, H04L65/1069, H04L43/0852, G08B5/36, G10L25/87
CPC classification: H04L65/4038, H04L65/1069, G10L25/87, G08B5/36, H04L43/0852
Abstract: A computing system includes first and second client computing devices accessing a communications network to establish a communications session. The first client computing device operates an audio analysis agent to determine network latency within the communications session based on communications with an audio analysis agent in the second client computing device. In response to the network latency exceeding a latency threshold, audio input from a user of the first client computing device is analyzed to determine a speaking status of the user. The audio analysis agent generates an indicator command message for the second client computing device based on the determined speaking status of the user. The second client computing device displays an indicator based on the indicator command message indicating when a user of the second client computing device can speak to avoid speech confliction with the user of said first client computing device.
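The decision logic on the first client can be reduced to a small function: do nothing while latency is acceptable, otherwise send a command that reflects whether the local user is speaking. The function name, the 300 ms threshold, and the message strings `"WAIT"` and `"CLEAR"` below are hypothetical; the abstract does not specify a message format.

```python
from typing import Optional


def indicator_command(latency_ms: float,
                      local_user_speaking: bool,
                      latency_threshold_ms: float = 300.0) -> Optional[str]:
    """Return the command to send to the remote client, or None if none is needed."""
    if latency_ms <= latency_threshold_ms:
        # Latency is within bounds: no speaking analysis, no indicator.
        return None
    # High latency: tell the remote side whether it is safe to start talking.
    return "WAIT" if local_user_speaking else "CLEAR"


if __name__ == "__main__":
    print(indicator_command(120.0, local_user_speaking=True))
    print(indicator_command(450.0, local_user_speaking=True))
    print(indicator_command(450.0, local_user_speaking=False))
```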
-
Publication No.: US11848027B2
Publication date: 2023-12-19
Application No.: US17306004
Filing date: 2021-05-03
Applicant: SAP SE
CPC classification: G10L25/30, G06N3/04, G06N3/08, G10L19/0212, G10L25/87
Abstract: In some example embodiments, there may be provided a method that includes receiving a machine learning model provided by a central server configured to provide federated learning; receiving first audio data obtained from at least one audio sensor monitoring at least one machine located at the first edge node; training, based on the first audio data, the machine learning model; providing parameter information to the central server in order to enable the federated learning among a plurality of edge nodes; receiving an aggregate machine learning model provided by the central server; and detecting an anomalous state of the at least one machine. Related systems, methods, and articles of manufacture are also described.
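The abstract does not say how the central server combines the parameter information from the edge nodes. A common choice in federated learning is a sample-weighted average of the parameters (often called federated averaging); the sketch below shows that technique as an assumption, with a hypothetical `aggregate` function and models represented as flat lists of floats.

```python
def aggregate(updates):
    """Combine edge-node parameters into one aggregate parameter vector.

    updates: list of (params, n_samples) pairs, one per edge node, where
    params is a list of floats and n_samples is the number of local
    training examples. Returns the sample-weighted mean of the parameters.
    """
    if not updates:
        raise ValueError("at least one edge-node update is required")
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(params[i] * n for params, n in updates) / total
            for i in range(dim)]


if __name__ == "__main__":
    node_a = ([1.0, 2.0], 1)   # trained on 1 audio clip
    node_b = ([3.0, 4.0], 3)   # trained on 3 audio clips
    print(aggregate([node_a, node_b]))
```

Weighting by sample count lets nodes with more local audio data influence the aggregate model more; an unweighted mean would be an equally valid sketch.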
-
Publication No.: US20230395095A1
Publication date: 2023-12-07
Application No.: US18182811
Filing date: 2023-03-13
Abstract: A speech recognition system utilizing automatic speech recognition techniques such as end-pointing techniques in conjunction with beamforming and/or signal processing to isolate speech from one or more speaking users from multiple received audio signals and to detect the beginning and/or end of the speech based at least in part on the isolation. Audio capture devices such as microphones may be arranged in a beamforming array to receive the multiple audio signals. Multiple audio sources including speech may be identified in different beams and processed.