-
Publication No.: US20240331716A1
Publication date: 2024-10-03
Application No.: US18611308
Filing date: 2024-03-20
IPC classification: G10L21/0224, G06F1/16, G10L21/0232, G10L21/0216
CPC classification: G10L21/0224, G06F1/163, G10L21/0232, G10L2021/02166
Abstract: A device includes one or more processors configured to obtain audio data representing one or more audio signals. The audio data includes a first segment and a second segment subsequent to the first segment. The one or more processors are configured to perform one or more transform operations on the first segment to generate frequency-domain audio data. The one or more processors are configured to provide input data based on the frequency-domain audio data as input to one or more machine-learning models to generate a noise-suppression output. The one or more processors are configured to perform one or more reverse transform operations on the noise-suppression output to generate time-domain filter coefficients. The one or more processors are configured to perform time-domain filtering of the second segment using the time-domain filter coefficients to generate a noise-suppressed output signal.
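The pipeline in this abstract (transform the first segment, derive a noise-suppression output, reverse-transform it into filter coefficients, then filter the second segment in the time domain) can be sketched as follows. This is only an illustration under stated assumptions: `suppression_gains` is a hypothetical spectral-subtraction-style stand-in for the machine-learning model, the naive DFT and all function names are invented for the example, and nothing here is taken from the application itself.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (adequate for short illustrative segments)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def suppression_gains(spectrum, noise_power):
    """Hypothetical stand-in for the ML model: one real gain in [0, 1] per bin."""
    gains = []
    for value in spectrum:
        power = abs(value) ** 2
        gains.append(max(0.0, 1.0 - noise_power / power) if power > 0 else 0.0)
    return gains


def gains_to_taps(gains):
    """Reverse-transform per-bin gains into causal time-domain filter coefficients."""
    zero_phase = [c.real for c in idft(gains)]
    half = len(zero_phase) // 2
    # Rotate so the zero-lag coefficient lands at index `half` (a delay of `half` samples).
    return zero_phase[-half:] + zero_phase[:-half]


def fir_filter(signal, taps):
    """Direct-form FIR filtering in the time domain."""
    return [sum(tap * signal[n - k] for k, tap in enumerate(taps) if n - k >= 0)
            for n in range(len(signal))]


def suppress(first_segment, second_segment, noise_power):
    """Derive coefficients from the first segment and apply them to the second."""
    spectrum = dft(first_segment)
    taps = gains_to_taps(suppression_gains(spectrum, noise_power))
    return fir_filter(second_segment, taps)
```

With all gains equal to 1 the coefficients reduce to a pure delay of half the segment length, which is a quick sanity check. Computing the coefficients from one segment and applying them to the next keeps the transform off the signal path, which is presumably the point of the time-domain filtering step.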
-
Publication No.: US20240331715A1
Publication date: 2024-10-03
Application No.: US18457921
Filing date: 2023-08-29
Inventors: Ching-Hua Lee, Chou-Chang Yang, Yilin Shen, Hongxia Jin
IPC classification: G10L21/0224
CPC classification: G10L21/0224, G10L2021/02166
Abstract: A method includes receiving, during a first time window, a set of noisy audio signals from a plurality of audio input devices. The method also includes generating a noisy time-frequency representation based on the set of noisy audio signals. The method further includes providing the noisy time-frequency representation as an input to a mask estimation model trained to output a mask used to predict a clean time-frequency representation of clean speech audio from the noisy time-frequency representation. The method also includes determining beamforming filter weights based on the mask. The method further includes applying the beamforming filter weights to the noisy time-frequency representation to isolate the clean speech audio from the set of noisy audio signals. In addition, the method includes outputting the clean speech audio.
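One way to turn a time-frequency mask into beamforming weights is sketched below for a single frequency bin. It is an assumption-laden illustration, not the method claimed in the application: the mask is taken as given (the mask estimation model is not modelled), the weights are a simple matched-filter beamformer built from a mask-weighted relative transfer function rather than, say, an MVDR design, and the names `mask_based_weights` and `apply_weights` are hypothetical.

```python
def mask_based_weights(frames, mask, ref=0):
    """Derive beamforming weights for one frequency bin.

    frames: list of frames, each a list of complex microphone values.
    mask:   per-frame speech-presence values in [0, 1] (assumed to come
            from a mask estimation model, which is not modelled here).
    ref:    index of the reference microphone.
    """
    mics = len(frames[0])
    # Mask-weighted power at the reference microphone (must be non-zero).
    ref_power = sum(m * abs(x[ref]) ** 2 for x, m in zip(frames, mask))
    # Mask-weighted relative transfer function of each microphone.
    steering = [
        sum(m * x[ch] * x[ref].conjugate() for x, m in zip(frames, mask)) / ref_power
        for ch in range(mics)
    ]
    norm = sum(abs(d) ** 2 for d in steering)
    # Matched-filter weights: unit gain toward the estimated speech direction.
    return [d / norm for d in steering]


def apply_weights(frames, weights):
    """Apply w^H x to every frame of the noisy time-frequency representation."""
    return [sum(w.conjugate() * value for w, value in zip(weights, x)) for x in frames]
```

Frames with a mask value of 0 contribute nothing to the weight estimate, so noise-dominated frames do not bias the estimated speech direction. For noise-free speech frames the output equals the speech as observed at the reference microphone.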
-
Publication No.: US20240274147A1
Publication date: 2024-08-15
Application No.: US18432572
Filing date: 2024-02-05
Inventors: Deokjun EOM, Kyungrae KIM, Woohyun NAM, Jungkyu KIM
IPC classification: G10L21/0272, G10L21/0224, G10L21/0232, G10L21/14
CPC classification: G10L21/0272, G10L21/0224, G10L21/0232, G10L21/14
Abstract: Provided is a method for separating one or more candidate audios in a sound source including the one or more candidate audios and a background sound, by using an audio separation system, the method including extracting a first audio feature from the sound source, extracting a background sound feature from the sound source, the background sound feature identifying a degree of association between the first audio feature and the background sound, generating a second audio feature based on the first audio feature, the background sound feature, and a background sound control parameter configured to control the background sound, and generating one or more separated audios based on target information corresponding to the one or more candidate audios, the first audio feature, and the second audio feature in which the background sound is adjusted.
-
Publication No.: US12002484B2
Publication date: 2024-06-04
Application No.: US17736797
Filing date: 2022-05-04
IPC classification: G10L21/0232, G10L21/0208, G10L21/0224, H04R3/00
CPC classification: G10L21/0232, G10L21/0208, G10L21/0224, H04R3/002
Abstract: This application discloses a method and an apparatus for processing an audio signal. The method includes obtaining a first speech signal acquired by a first device; performing frame blocking on the first speech signal, to obtain multiple speech signal frames; converting the multiple speech signal frames into multiple first frequency domain signal frames; performing aliasing processing on a first sub-frequency domain signal frame among the multiple first frequency domain signal frames with a frequency lower than or equal to a target frequency threshold, and retaining a second sub-frequency domain signal frame among the multiple first frequency domain signal frames with a frequency higher than the target frequency threshold, to obtain multiple second frequency domain signal frames, the target frequency threshold being related to a sampling frequency of a second device; and performing frame fusion on the multiple second frequency domain signal frames, to obtain a second speech signal.
-
Publication No.: US11996109B2
Publication date: 2024-05-28
Application No.: US18146151
Filing date: 2022-12-23
Inventors: Adriana Vasilache
IPC classification: G10L19/038, G10L19/022, G10L21/0224, G10L21/0232, G10L19/00
CPC classification: G10L19/038, G10L19/022, G10L21/0224, G10L21/0232, G10L2019/0001
Abstract: There is disclosed, inter alia, an apparatus for spatial audio signal encoding comprising means for receiving for each time frequency block of a sub band of an audio frame a spatial audio parameter comprising an azimuth and an elevation; determining a first distortion measure for the audio frame by determining a first distance measure for each time frequency block and summing the first distance measure for each time frequency block; determining a second distortion measure for the audio frame by determining a second distance measure for each time frequency block and summing the second distance measure for each time frequency block; and selecting either the first quantization scheme or the second quantization scheme for quantising the elevation and the azimuth for all time frequency blocks of the sub band of the audio frame, wherein the selecting is dependent on the first and second distortion measures.
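The selection step in this abstract (sum a per-block distance for each candidate scheme, then choose a scheme based on the two sums) can be illustrated with the sketch below. Several details are assumptions made only for the example: the distance measure is taken to be the great-circle angle between the original and quantized directions, both schemes are plain uniform quantizers, and the scheme with the lower summed distortion is selected. The abstract only says the selection depends on the two measures.

```python
import math


def angular_distance(az1, el1, az2, el2):
    """Great-circle angle in radians between two directions given in degrees."""
    a1, e1, a2, e2 = (math.radians(v) for v in (az1, el1, az2, el2))
    cos_angle = (math.sin(e1) * math.sin(e2)
                 + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    return math.acos(max(-1.0, min(1.0, cos_angle)))


def uniform_quantizer(az_step, el_step):
    """Hypothetical quantization scheme: round each angle to a fixed grid."""
    def quantize(az, el):
        return round(az / az_step) * az_step, round(el / el_step) * el_step
    return quantize


def total_distortion(blocks, quantize):
    """Sum the per-block distance over all time-frequency blocks of the sub band."""
    return sum(angular_distance(az, el, *quantize(az, el)) for az, el in blocks)


def select_scheme(blocks, first_scheme, second_scheme):
    """Pick one scheme for all blocks; lower summed distortion wins (assumed rule)."""
    d_first = total_distortion(blocks, first_scheme)
    d_second = total_distortion(blocks, second_scheme)
    return ("first", d_first) if d_first <= d_second else ("second", d_second)
```

A single decision per sub band means only one scheme flag has to be signalled for all of its time-frequency blocks, at the cost of not adapting the scheme block by block.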
-
Publication No.: US11984133B2
Publication date: 2024-05-14
Application No.: US17775274
Filing date: 2020-10-29
Inventors: Takashi Enokihara, Kojiro Matsuyama, Takafumi Ogura, Tetsuro Horikawa, Shin Kimura, Nobuyuki Kihara
IPC classification: G10L21/0224, A63F13/215, A63F13/285, A63F13/54, G06F3/01, G06F3/033, G06F3/0338, G10L21/0208, G10L21/0216, G10L21/0232, H04R1/08, H04R3/00
CPC classification: G10L21/0224, A63F13/54, G06F3/016, G10L21/0232, H04R1/08, H04R3/00, G10L2021/02082, G10L2021/02163
Abstract: Disclosed is an operation device including an interaction member, a microphone, a control circuit, and an audio signal processing circuit. The interaction member is used for interacting with a user. The control circuit periodically acquires scan data indicating the acting status of the interaction member. The audio signal processing circuit executes a noise removal process of removing noise from a collected audio signal collected by the microphone. The control circuit periodically transmits previously acquired scan data to the audio signal processing circuit. The audio signal processing circuit executes the noise removal process by using the scan data transmitted from the control circuit.
-
Publication No.: US11967316B2
Publication date: 2024-04-23
Application No.: US17183209
Filing date: 2021-02-23
Inventors: Jimeng Zheng, Ian Ernan Liu, Yi Gao, Weiwei Li
IPC classification: G10L15/22, G01S3/80, G01S3/802, G10L15/08, G10L15/20, G10L21/0224, G10L21/0232, G10L25/51, G10L21/0208, G10L21/0216
CPC classification: G10L15/20, G01S3/8006, G01S3/802, G10L15/08, G10L15/22, G10L21/0224, G10L21/0232, G10L25/51, G10L2015/088, G10L2021/02082, G10L2021/02166
Abstract: Embodiments of this application disclose a method and an apparatus for positioning a target audio signal by an audio interaction device, and an audio interaction device. The method includes: obtaining audio signals in a plurality of directions in a space, and performing echo cancellation on the audio signal, the audio signal including a target-audio direct signal; obtaining weights of a plurality of time-frequency points in the audio signals, a weight of each time-frequency point indicating, at the time-frequency point, a relative proportion of the target-audio direct signal in the audio signals; weighting time-frequency components of the audio signal at the plurality of time-frequency points separately for each of the plurality of directions by using the weights of the plurality of time-frequency points, to obtain a weighted audio signal energy distribution; and obtaining a sound source azimuth corresponding to the target-audio direct signal in the audio signals accordingly.
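The weighting and azimuth-selection steps of this abstract can be sketched as below. The example assumes that the per-direction time-frequency components and the per-point weights are already available, that the weighted energy of a direction is the weight times the squared magnitude summed over time-frequency points, and that the azimuth is taken at the energy maximum. The function name `locate_source` and the data layout are hypothetical, and echo cancellation and weight estimation are not modelled.

```python
def locate_source(direction_spectra, weights, azimuths):
    """Pick the azimuth whose weighted energy is largest.

    direction_spectra[d][p]: complex time-frequency component of the signal
                             for direction d at time-frequency point p.
    weights[p]:              assumed proportion of the target direct signal
                             at point p, in [0, 1].
    azimuths[d]:             azimuth (degrees) associated with direction d.
    """
    energies = [
        sum(w * abs(value) ** 2 for w, value in zip(weights, spectrum))
        for spectrum in direction_spectra
    ]
    best = max(range(len(energies)), key=energies.__getitem__)
    return azimuths[best], energies
```

Down-weighting points dominated by interference or reverberation keeps a loud non-target source from winning the energy comparison, which an unweighted energy map would not prevent.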
-
Publication No.: US11967305B2
Publication date: 2024-04-23
Application No.: US17840958
Filing date: 2022-06-15
IPC classification: G10L13/02, G06F3/16, G06N5/02, G06N20/00, G10K15/08, G10L13/033, G10L15/02, G10L15/06, G10L15/065, G10L21/0224, G10L25/03, H04S7/00
CPC classification: G10L13/02, G06F3/165, G06N5/02, G06N20/00, G10K15/08, G10L13/033, G10L15/02, G10L15/063, G10L15/065, G10L21/0224, G10L25/03, H04S7/30, H04S7/302, H04S7/303
Abstract: A method, computer program product, and computing system for generating a three-dimensional model of at least a portion of a three-dimensional space incorporating an ACI system via a video recording subsystem of an ACI calibration platform; and generating one or more audio calibration signals for receipt by an audio recording system included within the ACI system via an audio generation subsystem of the ACI calibration platform.
-
Publication No.: US11955240B1
Publication date: 2024-04-09
Application No.: US18465977
Filing date: 2023-09-12
IPC classification: G06T7/00, A61B3/00, A61B3/14, A61B5/00, G06T5/30, G06T5/40, G06T7/12, G06V10/77, G06V10/82, G06V20/70, G10L15/02, G10L15/05, G10L15/16, G10L15/18, G10L15/22, G10L21/0224, G10L25/18, G10L25/21, G16H50/20
CPC classification: G16H50/20, A61B3/0025, A61B3/14, A61B5/4803, G06T5/30, G06T5/40, G06T7/0012, G06T7/12, G06V10/7715, G06V10/82, G06V20/70, G10L15/02, G10L15/05, G10L15/16, G10L15/1815, G10L15/22, G10L21/0224, G10L25/18, G10L25/21, G06T2207/20084, G06T2207/30041, G06T2207/30096, G06T2207/30101, G06V2201/03, G10L2015/025
Abstract: A neural-network-based ophthalmologic intelligent consultation method includes: performing correction filtering on a consultation voice of a to-be-diagnosed patient, framing the voice into a consultation voice frame sequence, generating a consultation text corresponding to the consultation voice frame sequence based on phoneme recognition and phoneme transcoding, and extracting an ophthalmologically-described disease; performing gray-level filtering, primary picture segmentation, and size equalization operation on an eye picture set of the to-be-diagnosed patient to acquire a standard eyeball picture group; extracting eye white features, pupil features and blood vessel features from the standard eyeball picture group, performing lesion feature analysis on the eye white features, the pupil features and the blood vessel features to acquire an ophthalmologically-observed disease; and, based on the ophthalmologically-observed disease and the ophthalmologically-described disease, generating a consultation result.
-
Publication No.: US20240055012A1
Publication date: 2024-02-15
Application No.: US17819654
Filing date: 2022-08-15
Inventors: Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux
IPC classification: G10L21/0232, G10L15/06, G10L25/30, G10L21/0224
CPC classification: G10L21/0232, G10L15/063, G10L25/30, G10L21/0224, G10L2021/02082
Abstract: A system and method for reverberation reduction is disclosed. A first Deep Neural Network (DNN) produces a first estimate of a target direct-path signal from a mixture of acoustic signals that include the target direct-path signal and a reverberation of the target direct-path signal. A filter modeling a room impulse response (RIR) for the first estimate is estimated. The filter when applied to the first estimate of the target direct-path signal generates a result closest to a residual between the mixture of the acoustic signals and the first estimate of the target direct-path signal according to a distance function. The estimated filter is used for modeling the RIR.
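The filter-estimation step described here (find the filter that, applied to the direct-path estimate, comes closest to the residual between the mixture and that estimate) reduces to a linear least-squares problem if the distance function is the squared error. The sketch below makes that assumption, works on short real-valued single-channel time-domain signals, and treats the DNN estimate as given. The names `estimate_filter` and `solve` are hypothetical, and the application may use a different domain or distance function.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]  # fails if the system is singular
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def estimate_filter(mixture, direct_estimate, taps):
    """Least-squares filter h minimizing ||(mixture - direct) - h * direct||^2."""
    residual = [m - d for m, d in zip(mixture, direct_estimate)]
    n = len(direct_estimate)
    # Column k holds the direct-path estimate delayed by k samples.
    cols = [[direct_estimate[t - k] if t - k >= 0 else 0.0 for t in range(n)]
            for k in range(taps)]
    # Normal equations: (A^T A) h = A^T residual.
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    cross = [sum(a * b for a, b in zip(ci, residual)) for ci in cols]
    return solve(gram, cross)
```

When the mixture really is the direct signal plus a filtered copy of it, the estimated coefficients recover that filter, so the result can be read as a model of the reverberant part of the room impulse response.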
-