Patent search ap:("BEIJING DIDI INFINITY TECHNOLOGY AND DEVELOPMENT CO., LTD.") AND inv:"Yi ZHANG" Page 1

1.

Invention application
MULTI-TASK DEEP NETWORK FOR ECHO PATH DELAY ESTIMATION AND ECHO CANCELLATION [In force]

Publication No.: US20220277721A1

Publication date: 2022-09-01

Application No.: US17188406

Filing date: 2021-03-01

Applicant: Beijing DiDi Infinity Technology and Development Co., Ltd.

Inventors: Yi ZHANG, Chengyun DENG, Shiqian MA, Yongtao SHA, Hui SONG

IPC classification: G10K11/178, G06N3/04, G06N3/08

Abstract: A method of echo path delay estimation and echo cancellation is described in this disclosure. The method includes: obtaining a reference signal, a microphone signal, and a trained multi-task deep neural network, wherein the multi-task deep neural network comprises a first neural network and a second neural network; generating, using the first neural network of the multi-task deep neural network, an estimated echo path delay based on the reference signal and the microphone signal; updating the reference signal based on the estimated echo path delay; and generating, using the second neural network of the multi-task deep neural network, an enhanced microphone signal based on the microphone signal and the updated reference signal.
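The "estimate the delay, then update the reference" steps in this abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say how the first neural network works, so a plain cross-correlation search stands in for it here, and the function names `estimate_delay` and `align_reference`, the sample values, and the `max_delay` parameter are all hypothetical.

```python
def estimate_delay(reference, microphone, max_delay):
    """Stand-in for the first network (assumption): return the lag, in samples,
    at which the reference correlates most strongly with the microphone signal."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        # Correlate the reference with the microphone signal advanced by `lag`.
        score = sum(r * m for r, m in zip(reference, microphone[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align_reference(reference, delay):
    """Update the reference: delay it by `delay` samples, zero-padding the
    front and truncating the tail so the length is unchanged."""
    if delay <= 0:
        return list(reference)
    return [0.0] * delay + list(reference[:len(reference) - delay])


# Toy data (assumption): the echo is the reference attenuated and delayed by 3 samples.
reference = [0.0, 1.0, -2.0, 3.0, 0.5, 0.0, 0.0, 0.0]
microphone = align_reference([0.6 * x for x in reference], 3)

delay = estimate_delay(reference, microphone, max_delay=5)
updated_reference = align_reference(reference, delay)
```

In the method described by the abstract, the updated reference and the microphone signal would then be passed to the second neural network, which is not sketched here.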

2.

Invention application
SPEECH COMMUNICATION SYSTEM AND METHOD FOR IMPROVING SPEECH INTELLIGIBILITY [In force]

Publication No.: US20210225388A1

Publication date: 2021-07-22

Application No.: US16314526

Filing date: 2018-12-06

Applicant: BEIJING DIDI INFINITY TECHNOLOGY AND DEVELOPMENT CO., LTD

Inventors: Yi ZHANG, Hui SONG, Yongtao SHA, Si QIN

IPC classification: G10L21/0364, G10L21/0232, G10L15/08

Abstract: A speech communication system for improving speech intelligibility may comprise one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the system to perform: determining a cutoff frequency based on an estimation of a spectrum of noise, wherein the cutoff frequency defines a noise dominant region of frequency; lifting a spectrum of a speech above the noise dominant region of frequency, wherein a frequency range of the spectrum of the speech increases by the cutoff frequency; and applying an adaptive filter to the speech to achieve echo cancelation, wherein the adaptive filter is controlled by a volume of the noise.

3.

Invention application
CONCURRENT MULTI-PATH PROCESSING OF AUDIO SIGNALS FOR AUTOMATIC SPEECH RECOGNITION SYSTEMS [In force]

Publication No.: US20220139368A1

Publication date: 2022-05-05

Application No.: US17433868

Filing date: 2019-02-28

Applicant: BEIJING DIDI INFINITY TECHNOLOGY AND DEVELOPMENT CO., LTD.

Inventors: Yi ZHANG, Hui SONG, Yongtao SHA, Chengyun DENG

IPC classification: G10L15/00, G10L21/18, G10L19/02, G10L25/18

Abstract: A system and method for concurrent multi-path processing of audio signals for automatic speech recognition is presented. Audio information defining a set of audio signals may be obtained (502). The audio signals may convey mixed audio content produced by multiple audio sources. A set of source-specific audio signals may be determined by demixing the mixed audio content produced by the multiple audio sources. Determining the set of source-specific audio signals may comprise providing the set of audio signals to both a first signal processing path and a second signal processing path (504). The first signal processing path may determine a value of a demixing parameter for demixing the mixed audio content (506). The second signal processing path may apply the value of the demixing parameter to the individual audio signals of the set of audio signals (508) to generate the individual source-specific audio signals (510).
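The split between a path that determines the demixing parameter and a path that applies it can be sketched as follows. This is only an illustration under strong assumptions: the abstract does not say what form the demixing parameter takes, so a 2x2 demixing matrix is assumed, and the first path is replaced by inverting a known mixing matrix. The names `estimate_demixing_matrix` and `apply_demixing` and all values are hypothetical.

```python
def estimate_demixing_matrix(mixing):
    """Stand-in for the first path (assumption): invert a known 2x2 mixing
    matrix. A real system would estimate this from the audio itself."""
    (a, b), (c, d) = mixing
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def apply_demixing(demixing, signals):
    """Second path: apply the demixing matrix sample by sample to the two
    mixed signals, producing two source-specific signals."""
    x1, x2 = signals
    (w11, w12), (w21, w22) = demixing
    y1 = [w11 * p + w12 * q for p, q in zip(x1, x2)]
    y2 = [w21 * p + w22 * q for p, q in zip(x1, x2)]
    return [y1, y2]


# Toy data (assumption): two sources mixed into two microphone signals.
sources = [[1.0, 0.0, 2.0], [0.0, 4.0, -2.0]]
mixing = [[1.0, 0.5], [0.5, 1.0]]
mixed = [
    [mixing[0][0] * s1 + mixing[0][1] * s2 for s1, s2 in zip(*sources)],
    [mixing[1][0] * s1 + mixing[1][1] * s2 for s1, s2 in zip(*sources)],
]

demixing = estimate_demixing_matrix(mixing)   # first path
separated = apply_demixing(demixing, mixed)   # second path
```

Keeping the two functions separate mirrors the abstract's structure: the parameter can be computed in one path while the other path applies whatever value is currently available.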

4.

Invention application
METHOD AND SYSTEM FOR ACOUSTIC ECHO CANCELLATION [In force]

Publication No.: US20230094630A1

Publication date: 2023-03-30

Application No.: US18062556

Filing date: 2022-12-06

Applicant: BEIJING DIDI INFINITY TECHNOLOGY AND DEVELOPMENT CO., LTD.

Inventors: Yi ZHANG, Chengyun DENG, Shiqian MA, Yongtao SHA, Hui SONG

IPC classification: G06N3/08, G06V10/82

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for acoustic echo cancellation and suppression are provided. An exemplary method comprises receiving a far-end acoustic signal and a corrupted near-end acoustic signal, wherein the corrupted near-end acoustic signal is generated based on (1) an echo of the far-end acoustic signal and (2) a near-end acoustic signal; feeding the far-end acoustic signal and the corrupted near-end acoustic signal into a neural network as an input to output a time-frequency (TF) mask that suppresses the echo and retains the near-end acoustic signal; and generating an enhanced version of the corrupted near-end acoustic signal by applying the obtained TF mask to the corrupted near-end acoustic signal.
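The final step of this abstract, applying a TF mask to the corrupted near-end signal, can be sketched as an element-wise product over a time-frequency grid. In the abstract the mask is the output of a neural network; here it is written by hand, and the function name `apply_tf_mask`, the grid layout (frames by frequency bins), and the values are assumptions for illustration only.

```python
def apply_tf_mask(spectrogram, mask):
    """Multiply each time-frequency bin of a complex spectrogram by the
    matching mask gain (0.0 suppresses the bin, 1.0 keeps it unchanged)."""
    if len(spectrogram) != len(mask):
        raise ValueError("spectrogram and mask must have the same number of frames")
    enhanced = []
    for frame, gains in zip(spectrogram, mask):
        if len(frame) != len(gains):
            raise ValueError("each frame must have one gain per frequency bin")
        enhanced.append([g * x for g, x in zip(gains, frame)])
    return enhanced


# Toy data (assumption): 2 frames x 2 frequency bins of the corrupted
# near-end signal, and a hand-written mask standing in for the network output.
corrupted = [[1 + 1j, 2 + 0j], [0 + 4j, 3 - 3j]]
mask = [[1.0, 0.5], [0.0, 1.0]]

enhanced = apply_tf_mask(corrupted, mask)
```

In a complete pipeline the enhanced spectrogram would be converted back to a waveform with an inverse short-time Fourier transform, which is omitted here.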