-
Publication No.: US20220172735A1
Publication date: 2022-06-02
Application No.: US17436050
Filing date: 2019-03-07
Inventors: Xiangru BI, Qingshan ZHANG
IPC classification: G10L21/0272, G10L25/45, G10L25/87, G10L25/90
Abstract: The present disclosure is directed to a speech separation method and system using a sliding window. The method comprises: acquiring at least one speech from at least one user by at least one microphone and storing the at least one speech as a speech signal in a sound recording module; extracting the speech signal from the sound recording module and processing the extracted speech signal through a sliding window; and transmitting the processed speech signal to a Degenerate Unmixing Estimation Technique (DUET) module for speech separation.
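As a rough illustration of the sliding-window step only, the sketch below splits a signal into overlapping frames. The function name, the window length, and the hop length are assumptions made for this example; the abstract does not give them, and the DUET separation stage is not shown.

```python
def sliding_windows(signal, window_len, hop_len):
    """Split `signal` into overlapping frames of `window_len` samples,
    advancing `hop_len` samples between frames.

    Trailing samples that do not fill a whole window are dropped in this
    sketch; a real system might pad them instead (assumption).
    """
    if window_len <= 0 or hop_len <= 0:
        raise ValueError("window_len and hop_len must be positive")
    return [
        signal[start:start + window_len]
        for start in range(0, len(signal) - window_len + 1, hop_len)
    ]


# Ten toy samples, windows of 4 samples, moved forward 2 samples at a time.
frames = sliding_windows(list(range(10)), window_len=4, hop_len=2)
```

Each frame would then be handed to the separation stage in turn.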
-
Publication No.: US20220165250A1
Publication date: 2022-05-26
Application No.: US17380426
Filing date: 2021-07-20
Applicant: Xinapse Co., Ltd.
Inventors: Jinbeom Kang, Dong Won Joo, Yongwook Nam, Seung Jae Lee
IPC classification: G10L13/033, G10L21/013, G10L21/14, G10L21/043, G10L25/45
Abstract: This application relates to a method of synthesizing a speech of which a speed and a pitch are changed. In one aspect, a spectrogram may be generated by performing a short-time Fourier transformation on a first speech signal based on a first hop length and a first window length, and speech signals of sections having a second window length may be generated at the interval of a second hop length from the spectrogram. A ratio between the first hop length and the second hop length may be set to be equal to the value of a playback rate and a ratio between the first window length and the second window length may be set to be equal to the value of a pitch change rate, thereby generating a second speech signal of which the speed and the pitch are changed.
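The two ratios in this abstract can be turned into a small calculation. The sketch below assumes each ratio is read as first length divided by second length; the abstract does not state the direction, so that reading, the function name, and the rounding to whole samples are all assumptions.

```python
def second_stft_lengths(first_hop, first_window, playback_rate, pitch_rate):
    """Derive the second hop and window lengths from the first ones.

    Assumed reading of the abstract:
        first_hop / second_hop       == playback_rate
        first_window / second_window == pitch_rate
    Results are rounded to whole samples.
    """
    if playback_rate <= 0 or pitch_rate <= 0:
        raise ValueError("rates must be positive")
    second_hop = round(first_hop / playback_rate)
    second_window = round(first_window / pitch_rate)
    return second_hop, second_window


# Analysis with hop 256 and window 1024; playback rate 2.0, pitch rate 0.5.
second_hop, second_window = second_stft_lengths(256, 1024, 2.0, 0.5)
```

Under this reading, a playback rate of 2.0 halves the hop between resynthesized sections, so the output is about half as long as the input.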
-
Publication No.: US11295272B2
Publication date: 2022-04-05
Application No.: US16271329
Filing date: 2019-02-08
Inventors: Daniel Paulino Almendro Barreda, Dushyant Sharma, Joel Praveen Pinto, Uwe Helmut Jost, Patrick A. Naylor
IPC classification: G06Q10/10, G10L25/51, G06F3/16, G16H15/00, G16H10/20, G16H10/60, G06F40/117, G10L25/45, G06T7/20, H04R1/40, H04R3/00, G10L15/22, G10L15/30, G16H10/40, G16H50/70, G10L15/26, G06F40/30
Abstract: A method, computer program product, and computing system for obtaining encounter information of a patient encounter, wherein the encounter information includes machine vision encounter information; and processing the encounter information to generate an encounter transcript.
-
Publication No.: US11270699B2
Publication date: 2022-03-08
Application No.: US16732069
Filing date: 2019-12-31
IPC classification: G10L15/22, B60R16/037, G01C21/36, G06K9/00, G10L15/06, G10L17/04, G10L17/06, G10L15/25, G10L21/0208, G10L25/45, G10L25/57, H04L67/306, H04L67/1097, G10L25/63, H04L67/12, G10L15/30, G10L15/00, G10L15/02, G10L25/84, H04L67/10, G06F3/01, G10L25/90
Abstract: Methods and systems for determining an emotion of a human driver of a vehicle and using the emotion for generating a vehicle response are provided. One example method includes capturing, by a camera of the vehicle, a face of the human driver. The capturing is configured to capture a plurality of images over a period of time, and the plurality of images are analyzed to identify a facial expression and changes in the facial expression of the human driver over the period of time. The method further includes capturing, by a microphone of the vehicle, voice input of the human driver. The voice input is captured over the period of time. The voice input is analyzed to identify a voice profile and changes in the voice profile of the human driver over the period of time. The method processes, by a processor of the vehicle, a combination of the facial expression and the voice profile captured during the period of time to predict the emotion of the human driver. The method generates the vehicle response that is responsive to the emotion of the human driver. The vehicle response is configured to make at least one adjustment to a setting of the vehicle. The adjustment is selected based on the emotion of the human driver. The vehicle response can be used to make the driver more calm and/or assist in reducing distracted driving. The prediction of the emotion may be additionally increased by capturing and analyzing touch and/or gesture characteristics of the human driver when interfacing with a graphical user interface or surfaces of the vehicle or systems of the vehicle.
-
Publication No.: US11133023B1
Publication date: 2021-09-28
Application No.: US17197539
Filing date: 2021-03-10
Applicant: V5 Systems, Inc.
Inventor: Will Hedgecock
Abstract: This disclosure sets forth a system for detecting and determining the onset times of one or more impulsive acoustic events across multiple channels of audio. Audio is segmented into chunks of predefined length and then processed for detecting acoustic onsets, including cross-validating and refining the estimated acoustic onsets to the level of an audio sample. The output of the system is a list of channel-specific timestamped indices corresponding to the audio samples of the onsets of each impulsive acoustic event in the current segment of audio.
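The chunk-then-detect flow can be sketched with a deliberately simple stand-in detector. The abstract does not say how onsets are detected, cross-validated, or refined, so the amplitude threshold used here is a substitute technique chosen for illustration, and the function name and parameters are hypothetical.

```python
def find_onsets(channels, chunk_len, threshold):
    """Return (channel_index, sample_index) pairs for candidate onsets.

    Each channel is cut into chunks of `chunk_len` samples. Within a
    chunk, the first sample whose absolute amplitude reaches `threshold`
    is reported. This plain threshold is a stand-in, not the disclosed
    detector, and no cross-channel validation is performed.
    """
    onsets = []
    for ch_idx, samples in enumerate(channels):
        for start in range(0, len(samples), chunk_len):
            chunk = samples[start:start + chunk_len]
            for offset, value in enumerate(chunk):
                if abs(value) >= threshold:
                    onsets.append((ch_idx, start + offset))
                    break  # at most one onset per chunk in this sketch
    return onsets


# Two toy channels in which the same impulse arrives one sample apart.
channels = [
    [0.0, 0.0, 0.0, 0.9, 0.2, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0],
]
onsets = find_onsets(channels, chunk_len=4, threshold=0.5)
```

The result has the same shape as the output the abstract describes: per-channel sample indices for each detected event.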
-
Publication No.: US20210056983A1
Publication date: 2021-02-25
Application No.: US17074653
Filing date: 2020-10-20
Applicant: Cordio Medical Ltd.
Inventor: Ilan D. Shallom
Abstract: Described embodiments include an apparatus that includes a network interface and a processor. The processor is configured to receive, via the network interface, a speech signal that represents speech uttered by a subject, the speech including one or more speech segments, divide the speech signal into multiple frames, such that one or more sequences of the frames represent the speech segments, respectively, compute respective estimated total volumes of air exhaled by the subject while the speech segments were uttered, by, for each of the sequences, computing respective estimated flow rates of air exhaled by the subject during the frames belonging to the sequence and, based on the estimated flow rates, computing a respective one of the estimated total volumes of air, and, in response to the estimated total volumes of air, generate an alert. Other embodiments are also described.
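One plausible way to go from per-frame flow rates to a total volume per speech segment is to sum flow rate times frame duration. The sketch below assumes equal-length frames and flow rates in litres per second; how the flow rates are estimated from speech, and the rule that triggers an alert, are not given in the abstract and are not modelled here.

```python
def estimated_volumes(flow_rates_per_segment, frame_duration_s):
    """Estimate the exhaled volume for each speech segment.

    `flow_rates_per_segment` holds one list of per-frame flow rates
    (assumed unit: litres per second) for each segment. Every frame is
    assumed to last `frame_duration_s` seconds, so the volume of a
    segment is the sum of its flow rates times that duration.
    """
    return [sum(rates) * frame_duration_s for rates in flow_rates_per_segment]


# Two toy segments made of 20 ms frames.
volumes = estimated_volumes([[0.5, 0.5, 1.0], [0.25, 0.25]], frame_duration_s=0.02)
```

For the first toy segment this gives (0.5 + 0.5 + 1.0) × 0.02 = 0.04 litres.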
-
Publication No.: US10818305B2
Publication date: 2020-10-27
Application No.: US15967119
Filing date: 2018-04-30
Applicant: DTS, Inc.
Inventors: Michael M. Goodwin, Antonius Kalker, Albert Chau
IPC classification: G10L19/022, G10L19/008, G10L19/26, G10L25/18, G10L25/45, G10L19/02, G10L19/22
Abstract: A method of encoding an audio signal is provided comprising: applying multiple different time-frequency transformations to an audio signal frame; computing measures of coding efficiency across multiple frequency bands for multiple time-frequency resolutions; selecting a combination of time-frequency resolutions to represent the frame at each of the multiple frequency bands based at least in part upon the computed measures of coding efficiency; determining a window size and a corresponding transform size; determining a modification transformation; windowing the frame using the determined window size; transforming the windowed frame using the determined transform size; modifying a time-frequency resolution within a frequency band of the transform of the windowed frame using the determined modification transformation.
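The per-band selection step can be pictured as picking, for every frequency band, the candidate resolution with the best efficiency measure. The sketch below assumes the measure is a cost where lower means more efficient, and it chooses each band independently; the abstract does not define the measure or say whether bands are chosen jointly, and the band and resolution names are made up for the example.

```python
def select_resolutions(cost_by_band):
    """Pick the lowest-cost time-frequency resolution for each band.

    `cost_by_band` maps a band name to a dict of
    {resolution_name: cost}. Lower cost is treated as better coding
    efficiency (assumption), and bands are chosen independently.
    """
    return {
        band: min(costs, key=costs.get)
        for band, costs in cost_by_band.items()
    }


# Hypothetical costs for two bands and three candidate resolutions.
costs = {
    "low": {"long": 120.0, "medium": 150.0, "short": 200.0},
    "high": {"long": 310.0, "medium": 260.0, "short": 240.0},
}
choice = select_resolutions(costs)
```

In this toy input the low band favours the long window, while the high band favours the short one.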
-
Publication No.: US10809970B2
Publication date: 2020-10-20
Application No.: US16271616
Filing date: 2019-02-08
Inventors: Daniel Paulino Almendro Barreda, Dushyant Sharma, Joel Praveen Pinto, Uwe Helmut Jost, Patrick A. Naylor
IPC classification: G06F3/16, G16H10/60, G16H15/00, H04R3/00, H04R1/40, G10L25/51, G06Q10/10, G10L15/26, G10L25/45, G16H10/20, G06T7/20, G10L15/22, G10L15/30, G16H10/40, G16H50/70, G06F40/20, G06F40/30, G06F40/117
Abstract: A method, computer program product, and computing system for obtaining encounter information of a patient encounter, wherein the encounter information includes machine vision encounter information; processing the machine vision encounter information to identify one or more humanoid shapes; and steering one or more audio recording beams toward the one or more humanoid shapes to capture audio encounter information.
-
Publication No.: US10755718B2
Publication date: 2020-08-25
Application No.: US15835318
Filing date: 2017-12-07
Abstract: A method for classifying speakers includes: receiving, by a speaker recognition system including a processor and memory, input audio including speech from a speaker; extracting, by the speaker recognition system, a plurality of speech frames containing voiced speech from the input audio; computing, by the speaker recognition system, a plurality of features for each of the speech frames of the input audio; computing, by the speaker recognition system, a plurality of recognition scores for the plurality of features; computing, by the speaker recognition system, a speaker classification result in accordance with the recognition scores; and outputting, by the speaker recognition system, the speaker classification result.
-
Publication No.: US10559314B2
Publication date: 2020-02-11
Application No.: US16407307
Filing date: 2019-05-09
Inventors: Stefan Bruhn, Jonas Svedberg
IPC classification: G10L19/005, G10L19/00, G10L19/02, G10L19/025, G10L25/45
Abstract: In accordance with an example embodiment of the present invention, disclosed is a method and an apparatus thereof for controlling a concealment method for a lost audio frame of a received audio signal. A method for a decoder of concealing a lost audio frame comprises detecting in a property of the previously received and reconstructed audio signal, or in a statistical property of observed frame losses, a condition for which the substitution of a lost frame provides relatively reduced quality. In case such a condition is detected, the concealment method is modified by selectively adjusting a phase or a spectrum magnitude of a substitution frame spectrum.
-