Deep multi-channel acoustic modeling using frequency aligned network

Invention Grant

US11495215B1 Deep multi-channel acoustic modeling using frequency aligned network (in force)

Patent Title: Deep multi-channel acoustic modeling using frequency aligned network
Application No.: US16710811

Application Date: 2019-12-11
Publication No.: US11495215B1

Publication Date: 2022-11-08
Inventor: Minhua Wu, Shiva Sundaram, Tae Jin Park, Kenichi Kumatani
Applicant: Amazon Technologies, Inc.
Applicant Address: US WA Seattle
Assignee: Amazon Technologies, Inc.
Current Assignee: Amazon Technologies, Inc.
Current Assignee Address: US WA Seattle
Agency: Pierce Atwood LLP
Main IPC: G10L21/00
IPC: G10L21/00; G10L15/16; G10L15/06; G06N3/04; G10L21/0216; G06N3/08

Abstract:

Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that includes a frequency aligned network (FAN) architecture. Thus, the first model may perform spatial filtering to generate a first feature vector by processing individual frequency bins separately, such that multiple frequency bins are not combined. The first feature vector may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in microphones.
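
The three-stage flow in the abstract can be illustrated with a small sketch. This is not the patented implementation: the function names, the steering-style weights, the toy dimensions, and the projection and classifier matrices are all assumptions chosen for illustration, and in a real system every weight would be learned. The point it shows is the frequency-aligned constraint: each output of the first stage at bin k is computed from the microphone channels at bin k only, so frequency bins are never combined during spatial filtering.

```python
import cmath
import math


def spatial_filter(frame, weights):
    """Stage 1 sketch: frequency-aligned spatial filtering.

    frame[c][k]      complex STFT value for channel c, frequency bin k
    weights[d][k][c] complex weight for look direction d, bin k, channel c
    Returns power[d][k]. Output bin k reads only input bin k.
    """
    num_channels, num_bins = len(frame), len(frame[0])
    power = []
    for w_dir in weights:
        row = []
        for k in range(num_bins):
            y = sum(w_dir[k][c].conjugate() * frame[c][k]
                    for c in range(num_channels))
            row.append(abs(y) ** 2)
        power.append(row)
    return power


def affine(vec, matrix, bias):
    """Dense layer without activation; matrix is a list of rows."""
    return [sum(m * v for m, v in zip(row, vec)) + b
            for row, b in zip(matrix, bias)]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def steering(phases, num_channels, sign=1.0):
    """Hypothetical steering-style weights: one phase step per bin."""
    return [[cmath.exp(1j * sign * phi * c) / num_channels
             for c in range(num_channels)] for phi in phases]


# Toy input (assumed): 2 microphones, 3 frequency bins, and a source whose
# inter-channel phase step at bin k is phases[k].
phases = [math.pi / 2, math.pi / 4, math.pi / 8]
frame = [[cmath.exp(1j * phi * c) for phi in phases] for c in range(2)]
weights = [steering(phases, 2), steering(phases, 2, sign=-1.0)]

# Stage 1: first feature vector, 2 look directions x 3 bins = 6 values.
first = spatial_filter(frame, weights)
flat = [p for row in first for p in row]

# Stage 2: lower-dimensional second feature vector (6 values -> 2 values).
projection = [[0.5, 0.5, 0.5, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.5, 0.5, 0.5]]
second = affine(flat, projection, [0.0, 0.0])

# Stage 3: posterior over 3 hypothetical acoustic unit classes.
classifier = [[1.0, -1.0], [-1.0, 1.0], [0.2, 0.2]]
posterior = softmax(affine(second, classifier, [0.0, 0.0, 0.0]))
```

In this toy setup the first look direction is matched to the source and the second is mirrored, so the first row of `first` carries more energy than the second. That is the sense in which the first feature vector can stand in for beamformed features.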

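IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L21/00	Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility (G10L19/00 takes precedence)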