METHOD AND SYSTEM FOR SPEECH SEPARATION

    Publication No.: US20220172735A1

    Publication date: 2022-06-02

    Application No.: US17436050

    Filing date: 2019-03-07

    Abstract: The present disclosure is directed to a speech separation method and system using a sliding window. The method comprises: acquiring at least one speech from at least one user with at least one microphone and storing the at least one speech as a speech signal in a sound recording module; extracting the speech signal from the sound recording module and processing the extracted speech signal through a sliding window; and transmitting the processed speech signal to a Degenerate Unmixing Estimation Technique (DUET) module for speech separation.
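
    A minimal sketch of the DUET stage, assuming a two-microphone recording and reading the abstract's "sliding window" as the Hann-windowed STFT framing. DUET estimates, for every time-frequency bin, the attenuation and delay between the two channels and groups bins that share them. The original DUET algorithm picks peaks in a weighted two-dimensional histogram; this sketch swaps in plain k-means clustering for brevity. All function names and parameter values (`win_len`, `hop`, the 1 % energy threshold) are illustrative assumptions, not taken from the patent.

```python
import numpy as np


def sliding_stft(x, win_len=512, hop=128):
    """Slide a Hann window over x; return the one-sided STFT, shape (frames, bins)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def overlap_add(S, win_len, hop, length):
    """Inverse of sliding_stft by weighted overlap-add."""
    window = np.hanning(win_len)
    out, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(np.fft.irfft(S, n=win_len, axis=1)):
        sl = slice(i * hop, i * hop + win_len)
        out[sl] += frame * window
        norm[sl] += window ** 2
    return np.divide(out, norm, out=np.zeros(length), where=norm > 1e-8)


def duet_features(X1, X2, win_len, eps=1e-10):
    """Per-bin DUET features: symmetric attenuation and relative delay in samples."""
    ratio = (X2 + eps) / (X1 + eps)
    a = np.abs(ratio)
    alpha = a - 1.0 / a
    omega = 2 * np.pi * np.arange(X1.shape[1]) / win_len   # bin frequency, rad/sample
    omega[0] = np.inf                                       # DC carries no delay information
    delta = -np.angle(ratio) / omega
    return alpha, delta


def kmeans(points, k, iters=25):
    """Plain k-means (swapped in for DUET's histogram peak picking), initialised
    deterministically at the extremes of the first feature."""
    order = np.argsort(points[:, 0])
    centroids = points[order[np.linspace(0, len(points) - 1, k).astype(int)]].copy()
    for _ in range(iters):
        dist = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids


def duet_separate(x1, x2, n_sources=2, win_len=512, hop=128):
    """Separate n_sources from a two-microphone mixture with binary time-frequency masks."""
    X1, X2 = sliding_stft(x1, win_len, hop), sliding_stft(x2, win_len, hop)
    alpha, delta = duet_features(X1, X2, win_len)
    feats = np.stack([alpha.ravel(), delta.ravel()], axis=1)
    power = (np.abs(X1) * np.abs(X2)).ravel()
    strong = power > 0.01 * power.max()          # cluster only high-energy bins
    centroids = kmeans(feats[strong], n_sources)
    centroids = centroids[np.argsort(centroids[:, 0])]
    # Assign every bin to its nearest centroid and mask the first channel
    dist = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    labels = dist.argmin(axis=1).reshape(X1.shape)
    return [overlap_add(np.where(labels == j, X1, 0), win_len, hop, len(x1))
            for j in range(n_sources)]
```

    With an instantaneous mixture such as `x2 = 0.5*s1 + 2.0*s2`, the bins dominated by `s1` cluster near a symmetric attenuation of -1.5 and those dominated by `s2` near +1.5, so each mask recovers one source.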

    METHOD FOR CHANGING SPEED AND PITCH OF SPEECH AND SPEECH SYNTHESIS SYSTEM

    Publication No.: US20220165250A1

    Publication date: 2022-05-26

    Application No.: US17380426

    Filing date: 2021-07-20

    Applicant: Xinapse Co., Ltd.

    Abstract: This application relates to a method of synthesizing speech whose speed and pitch are changed. In one aspect of the method, a spectrogram may be generated by performing a short-time Fourier transform on a first speech signal based on a first hop length and a first window length, and speech signals of sections having a second window length may be generated from the spectrogram at intervals of a second hop length. The ratio between the first hop length and the second hop length may be set equal to the playback rate, and the ratio between the first window length and the second window length may be set equal to the pitch change rate, thereby generating a second speech signal whose speed and pitch are changed.
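
    One possible reading of the abstract, sketched under assumptions: each STFT column (first window length, first hop length) is inverse-transformed at the second window length, which resamples that section, and the sections are overlap-added at the second hop length. Setting `hop1 / hop2` to the playback rate and `win1 / win2` to the pitch change rate follows the ratios stated above. The function name, default lengths and the overlap-add normalization are illustrative, not the applicant's implementation. Because sections are not phase-aligned, changing speed and pitch independently can introduce artifacts that a production system would need to handle.

```python
import numpy as np


def change_speed_pitch(x, playback_rate, pitch_rate, win1=1024, hop1=256):
    """Hypothetical sketch: STFT at (win1, hop1), resynthesis at (win2, hop2)."""
    hop2 = int(round(hop1 / playback_rate))   # hop1 / hop2 = playback rate
    win2 = int(round(win1 / pitch_rate))      # win1 / win2 = pitch change rate
    window1 = np.hanning(win1)
    # The analysis window after the same length-win2 inverse FFT; used to normalize
    window2 = np.fft.irfft(np.fft.rfft(window1), n=win2) * (win2 / win1)
    n_frames = 1 + (len(x) - win1) // hop1
    out_len = (n_frames - 1) * hop2 + win2
    out, norm = np.zeros(out_len), np.zeros(out_len)
    for i in range(n_frames):
        # One spectrogram column (first window length, first hop length)
        spectrum = np.fft.rfft(x[i * hop1:i * hop1 + win1] * window1)
        # A length-win2 inverse FFT crops or zero-pads the spectrum, resampling the
        # section from win1 to win2 samples; the factor keeps the amplitude unchanged
        section = np.fft.irfft(spectrum, n=win2) * (win2 / win1)
        # Sections are placed every hop2 samples (second hop length)
        out[i * hop2:i * hop2 + win2] += section
        norm[i * hop2:i * hop2 + win2] += window2
    return np.divide(out, norm, out=np.zeros(out_len), where=norm > 1e-3)
```

    With `playback_rate = pitch_rate = 2` the two ratios agree and the operation reduces to plain 2x resampling, so a 200 Hz tone comes out at 400 Hz in half the time; with both rates equal to 1 the signal is reconstructed unchanged away from the edges.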

    Methods and vehicles for capturing emotion of a human driver and customizing vehicle response

    Publication No.: US11270699B2

    Publication date: 2022-03-08

    Application No.: US16732069

    Filing date: 2019-12-31

    Abstract: Methods and systems are provided for determining an emotion of a human driver of a vehicle and using the emotion to generate a vehicle response. One example method includes capturing, by a camera of the vehicle, a face of the human driver. The capturing is configured to capture a plurality of images over a period of time, and the plurality of images are analyzed to identify a facial expression and changes in the facial expression of the human driver over the period of time. The method further includes capturing, by a microphone of the vehicle, voice input of the human driver. The voice input is captured over the period of time and is analyzed to identify a voice profile and changes in the voice profile of the human driver over the period of time. The method processes, by a processor of the vehicle, a combination of the facial expression and the voice profile captured during the period of time to predict the emotion of the human driver. The method then generates a vehicle response that is responsive to the emotion of the human driver. The vehicle response is configured to make at least one adjustment to a setting of the vehicle, the adjustment being selected based on the emotion of the human driver. The vehicle response can be used to calm the driver and/or help reduce distracted driving. The emotion prediction may additionally be improved by capturing and analyzing touch and/or gesture characteristics of the human driver when interfacing with a graphical user interface, surfaces of the vehicle, or systems of the vehicle.
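
    The abstract does not say how the facial-expression and voice-profile results are combined. A minimal late-fusion sketch, assuming each modality already yields per-frame class probabilities: both are averaged over the capture period with later frames weighted more (a crude stand-in for "changes over time"), blended, and the winning emotion is looked up in an adjustment table. The label set, weights and `RESPONSES` table are all hypothetical.

```python
import numpy as np

# Hypothetical label set and emotion-to-adjustment table (not from the patent)
EMOTIONS = ("calm", "happy", "angry", "stressed")
RESPONSES = {
    "calm": {},
    "happy": {},
    "angry": {"cabin_temp_delta_c": -1.0, "music": "soothing"},
    "stressed": {"ambient_light": "dim_blue", "music": "soothing"},
}


def recency_weighted_mean(probs):
    """Average per-frame class probabilities over the capture period,
    weighting later frames more so that recent changes count more."""
    probs = np.asarray(probs, dtype=float)          # shape (frames, classes)
    weights = np.linspace(1.0, 2.0, len(probs))
    return np.average(probs, axis=0, weights=weights)


def predict_emotion(face_probs, voice_probs, face_weight=0.6):
    """Late fusion of facial-expression and voice-profile classifier outputs."""
    fused = (face_weight * recency_weighted_mean(face_probs)
             + (1.0 - face_weight) * recency_weighted_mean(voice_probs))
    return EMOTIONS[int(np.argmax(fused))]


def vehicle_response(emotion):
    """Look up the (hypothetical) setting adjustments for the predicted emotion."""
    return dict(RESPONSES[emotion])


face = [[0.1, 0.1, 0.2, 0.6]] * 3      # three camera frames
voice = [[0.2, 0.1, 0.3, 0.4]] * 3     # three voice windows
emotion = predict_emotion(face, voice)
print(emotion, vehicle_response(emotion))
```

    Late fusion keeps the camera and microphone models independent, so either can be retrained without touching the other; touch or gesture scores could be added as a third weighted term in the same way.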

    Robust detection of impulsive acoustic event onsets in an audio stream

    Publication No.: US11133023B1

    Publication date: 2021-09-28

    Application No.: US17197539

    Filing date: 2021-03-10

    Applicant: V5 Systems, Inc.

    Inventor: Will Hedgecock

    IPC classification: G10L25/51 G10L25/18 G10L25/45

    Abstract: This disclosure sets forth a system for detecting and determining the onset times of one or more impulsive acoustic events across multiple channels of audio. Audio is segmented into chunks of predefined length and then processed to detect acoustic onsets, which includes cross-validating the estimated onsets and refining them to the level of an individual audio sample. The output of the system is a list of channel-specific, timestamped indices corresponding to the audio samples at the onsets of each impulsive acoustic event in the current segment of audio.
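
    A minimal sketch of the pipeline above for one chunk, under stated assumptions: a coarse onset is the first short frame whose energy jumps `jump_db` above the chunk's median (background) energy; it is refined to the first sample exceeding `k_sigma` times the background RMS; and channels are cross-validated by discarding onsets more than `max_lag` samples from the median onset. Only the first event per chunk is handled, for brevity. The thresholds and function name are hypothetical, not the patented detector.

```python
import numpy as np


def detect_onsets(chunk, frame=64, jump_db=12.0, k_sigma=6.0, max_lag=200):
    """chunk: array (channels, samples) holding one segment of predefined length.
    Returns [(channel, onset_sample_index), ...] for the first impulsive event."""
    onsets = []
    for ch, x in enumerate(chunk):
        n = len(x) // frame
        energy = (x[:n * frame].reshape(n, frame) ** 2).mean(axis=1) + 1e-12
        background = np.median(energy)
        # Coarse detection: frame-level energy jump over the background
        hits = np.flatnonzero(10 * np.log10(energy / background) > jump_db)
        if hits.size == 0:
            continue
        f = int(hits[0])
        # Sample-level refinement within the previous and current frame
        lo = max(0, (f - 1) * frame)
        seg = np.abs(x[lo:(f + 1) * frame])
        above = np.flatnonzero(seg > k_sigma * np.sqrt(background))
        onsets.append((ch, int(lo + above[0]) if above.size else f * frame))
    if not onsets:
        return []
    # Cross-validation: keep onsets consistent with the other channels
    ref = np.median([s for _, s in onsets])
    return [(ch, s) for ch, s in onsets if abs(s - ref) <= max_lag]
```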

    Estimating Lung Volume by Speech Analysis

    Publication No.: US20210056983A1

    Publication date: 2021-02-25

    Application No.: US17074653

    Filing date: 2020-10-20

    Inventor: Ilan D. Shallom

    IPC classification: G10L25/66 G06F17/18 G10L25/45

    Abstract: Described embodiments include an apparatus that includes a network interface and a processor. The processor is configured to receive, via the network interface, a speech signal that represents speech uttered by a subject, the speech including one or more speech segments; divide the speech signal into multiple frames, such that one or more sequences of the frames represent the speech segments, respectively; compute respective estimated total volumes of air exhaled by the subject while the speech segments were uttered by, for each of the sequences, computing respective estimated flow rates of air exhaled by the subject during the frames belonging to the sequence and, based on the estimated flow rates, computing a respective one of the estimated total volumes of air; and, in response to the estimated total volumes of air, generate an alert. Other embodiments are also described.
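
    The volume computation reduces to integrating a per-frame flow rate over each sequence of frames, V = Σ q_i · Δt. A minimal sketch, assuming speech segments are runs of frames above an energy threshold and using a deliberately naive, HYPOTHETICAL flow model (flow proportional to frame RMS); the abstract does not disclose how flow rates are estimated, and the alert rule (mean volume below a fraction of a baseline) is likewise an assumption.

```python
import numpy as np


def frame_signal(x, frame_len):
    """Split x into non-overlapping frames of frame_len samples (tail dropped)."""
    n = len(x) // frame_len
    return x[:n * frame_len].reshape(n, frame_len)


def speech_sequences(frames, energy_thresh):
    """Runs of consecutive frames whose mean energy exceeds energy_thresh,
    returned as (start, stop) frame indices; each run stands for one speech segment."""
    active = (frames ** 2).mean(axis=1) > energy_thresh
    runs, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(active)))
    return runs


def estimate_volumes(x, fs, frame_ms=20, energy_thresh=1e-4, flow_gain=0.5):
    """Estimated exhaled volume (litres) per speech segment.
    HYPOTHETICAL flow model: flow rate (L/s) = flow_gain * frame RMS."""
    frame_len = int(fs * frame_ms / 1000)
    frames = frame_signal(np.asarray(x, dtype=float), frame_len)
    dt = frame_len / fs                                   # frame duration in seconds
    flow = flow_gain * np.sqrt((frames ** 2).mean(axis=1))
    return [float(flow[a:b].sum() * dt) for a, b in speech_sequences(frames, energy_thresh)]


def should_alert(volumes, baseline_l, fraction=0.8):
    """Alert when the mean segment volume drops below a fraction of the subject's baseline."""
    return bool(volumes) and float(np.mean(volumes)) < fraction * baseline_l
```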

    Audio coder window sizes and time-frequency transformations

    Publication No.: US10818305B2

    Publication date: 2020-10-27

    Application No.: US15967119

    Filing date: 2018-04-30

    Applicant: DTS, Inc.

    Abstract: A method of encoding an audio signal is provided, comprising: applying multiple different time-frequency transformations to an audio signal frame; computing measures of coding efficiency across multiple frequency bands for multiple time-frequency resolutions; selecting a combination of time-frequency resolutions to represent the frame at each of the multiple frequency bands based at least in part upon the computed measures of coding efficiency; determining a window size and a corresponding transform size; determining a modification transformation; windowing the frame using the determined window size; transforming the windowed frame using the determined transform size; and modifying a time-frequency resolution within a frequency band of the transform of the windowed frame using the determined modification transformation.
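
    A minimal sketch of the first three steps, under assumptions: the "multiple time-frequency transformations" are modelled as splitting the frame into 1, 2, 4 or 8 sub-blocks with an orthonormal DFT each (real coders typically use an MDCT), and the coding-efficiency measure is the L1 norm of each band's coefficients, a common sparsity proxy standing in for the patent's unspecified measure. The window-size rule and all names are hypothetical; the modification transformation is not sketched.

```python
import numpy as np


def band_costs(frame, splits=(1, 2, 4, 8), n_bands=8):
    """Cost of coding each frequency band at each time-frequency resolution.

    Resolution r splits the frame into splits[r] equal sub-blocks, each given an
    orthonormal DFT (finer time, coarser frequency as the split grows). As a
    HYPOTHETICAL coding-efficiency measure, the cost is the L1 norm of the
    band's coefficients: sparser representations are assumed cheaper to code.
    """
    n = len(frame)
    costs = np.zeros((len(splits), n_bands))
    for r, s in enumerate(splits):
        m = n // s
        blocks = np.asarray(frame, dtype=float).reshape(s, m)
        spec = np.abs(np.fft.rfft(blocks, axis=1, norm="ortho"))[:, : m // 2]
        # Group the m/2 bins of every sub-block into n_bands equal-width bands
        costs[r] = spec.reshape(s, n_bands, (m // 2) // n_bands).sum(axis=(0, 2))
    return costs


def select_resolutions(frame, splits=(1, 2, 4, 8), n_bands=8):
    """Pick the cheapest resolution per band, then a window size for the frame.

    HYPOTHETICAL rule: the window follows the finest resolution any band needs;
    bands that chose a coarser one would then be adjusted by the modification
    transformation, which is not sketched here.
    """
    choice = band_costs(frame, splits, n_bands).argmin(axis=0)
    window = len(frame) // splits[int(choice.max())]
    return choice, window
```

    For a frame holding a steady low-frequency tone plus a short high-frequency click, the lowest band should select the long transform (split 1) and the top band the shortest (split 8), since each concentrates that band's energy in the fewest coefficients.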

    Method and apparatus for controlling audio frame loss concealment

    Publication No.: US10559314B2

    Publication date: 2020-02-11

    Application No.: US16407307

    Filing date: 2019-05-09

    Abstract: In accordance with an example embodiment of the present invention, a method and an apparatus are disclosed for controlling a concealment method for a lost audio frame of a received audio signal. A decoder method for concealing a lost audio frame comprises detecting, in a property of the previously received and reconstructed audio signal or in a statistical property of observed frame losses, a condition under which substituting a lost frame provides relatively reduced quality. If such a condition is detected, the concealment method is modified by selectively adjusting the phase or the spectral magnitude of a substitution frame spectrum.
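
    A minimal sketch of the control logic, under assumptions: the default substitution repeats the last good frame's magnitude spectrum and advances each bin's phase as a stationary sinusoid at the bin centre frequency would over the lost hops; two HYPOTHETICAL conditions (a burst of consecutive losses, or an energy onset in the last received frames) switch to a modified concealment that attenuates the magnitude and randomizes the phase. Function names, thresholds and the 3 dB-per-frame attenuation are illustrative, not the patented decoder.

```python
import numpy as np


def quality_risk(consecutive_losses, recent_energies, burst_threshold=2, transient_ratio=4.0):
    """Hypothetical detector for conditions where plain substitution sounds poor:
    a burst of losses (statistics of observed losses) or an energy onset in the
    last received frames (property of the reconstructed signal)."""
    older, newer = recent_energies
    return consecutive_losses >= burst_threshold or newer > transient_ratio * older


def substitution_spectrum(prev_spectrum, n_fft, hop, consecutive_losses, recent_energies,
                          rng, burst_threshold=2, transient_ratio=4.0, atten_db_per_frame=3.0):
    """Substitution spectrum for a lost frame, built from the last good frame's spectrum."""
    k = np.arange(len(prev_spectrum))
    # Default: keep magnitudes, advance each bin's phase over the lost hops
    mag = np.abs(prev_spectrum)
    phase = np.angle(prev_spectrum) + 2 * np.pi * k * hop * consecutive_losses / n_fft
    if quality_risk(consecutive_losses, recent_energies, burst_threshold, transient_ratio):
        # Modified concealment: attenuate the magnitude and randomize the phase
        steps = max(1, consecutive_losses - burst_threshold + 1)
        mag = mag * 10 ** (-atten_db_per_frame * steps / 20)
        phase = phase + rng.uniform(-np.pi, np.pi, size=phase.shape)
    return mag * np.exp(1j * phase)
```

    Attenuating progressively with the length of the loss burst avoids the metallic, repetitive sound of re-playing one spectrum many times, while phase randomization breaks up tonal artifacts around transients.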