- Patent title: Deep multi-channel acoustic modeling using frequency aligned network
- Application number: US16710811
- Filing date: 2019-12-11
- Publication number: US11495215B1
- Publication date: 2022-11-08
- Inventors: Minhua Wu, Shiva Sundaram, Tae Jin Park, Kenichi Kumatani
- Applicant: Amazon Technologies, Inc.
- Applicant address: US WA Seattle
- Patentee: Amazon Technologies, Inc.
- Current patentee: Amazon Technologies, Inc.
- Current patentee address: US WA Seattle
- Agency: Pierce Atwood LLP
- Main classification: G10L21/00
- IPC classification: G10L21/00; G10L15/16; G10L15/06; G06N3/04; G10L21/0216; G06N3/08
Abstract:
Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that includes a frequency aligned network (FAN) architecture. Thus, the first model may perform spatial filtering to generate a first feature vector by processing individual frequency bins separately, such that multiple frequency bins are not combined. The first feature vector may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in microphones.
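The per-bin processing described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the function name `frequency_aligned_layer`, the weight layout, and the log-power output are assumptions made for illustration only. The one property it aims to show is that each frequency bin combines values across microphone channels but never across other bins.

```python
import math

def frequency_aligned_layer(frame, weights):
    """Hypothetical per-bin spatial filter (illustrative, not the patent's model).

    frame[c][k]      : complex STFT value for microphone channel c, frequency bin k
    weights[k][d][c] : complex weight for bin k, output d, channel c (assumed layout)
    Returns features[k][d], the log power of each per-bin output.
    """
    num_channels = len(frame)
    num_bins = len(frame[0])
    features = []
    for k in range(num_bins):
        row = []
        for w_d in weights[k]:
            # Combine channels within bin k only; other bins are never touched.
            y = sum(w_d[c] * frame[c][k] for c in range(num_channels))
            # Log power with a small floor to avoid log(0) (an assumed choice).
            row.append(math.log(abs(y) ** 2 + 1e-10))
        features.append(row)
    return features

# Example: 2 channels, 3 bins, one output per bin with equal channel weights.
example_frame = [[1 + 0j, 2j, -1 + 1j],
                 [1 + 0j, 2j, -1 + 1j]]
example_weights = [[[0.5, 0.5]] for _ in range(3)]
example_features = frequency_aligned_layer(example_frame, example_weights)
```

In a trained model the weights would be learned rather than fixed, and the resulting feature vector would feed the feature extraction and classification models that the abstract describes.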