Patent search: ap:("Zhejiang University") AND inv:"Jinsong Han" (page 1)

1.

Granted invention patent
Multimodal speech recognition method and system, and computer-readable storage medium (status: in force)

Publication No.: US12112744B2

Publication date: 2024-10-08

Application No.: US17684958

Filing date: 2022-03-02

Applicant: Zhejiang University

Inventors: Feng Lin, Tiantian Liu, Ming Gao, Chao Wang, Zhongjie Ba, Jinsong Han, Wenyao Xu, Kui Ren

IPC classification: G10L15/20, G01S13/88, G10L15/06, G10L15/18, G10L15/22, G10L15/28, G10L25/18, G10L25/78

CPC classification: G10L15/20, G01S13/88, G10L15/063, G10L15/1815, G10L15/22, G10L15/28, G10L25/18, G10L25/78

Abstract: The disclosure provides a multimodal speech recognition method and system, and a computer-readable storage medium. The method includes calculating a first logarithmic mel-frequency spectral coefficient and a second logarithmic mel-frequency spectral coefficient when a target millimeter-wave signal and a target audio signal both contain speech information corresponding to a target user; inputting the first and the second logarithmic mel-frequency spectral coefficient into a fusion network to determine a target fusion feature, where the fusion network includes at least a calibration module and a mapping module, the calibration module is configured to perform mutual feature calibration on the target audio/millimeter-wave signals, and the mapping module is configured to fuse a calibrated millimeter-wave feature and a calibrated audio feature; and inputting the target fusion feature into a semantic feature network to determine a speech recognition result corresponding to the target user. The disclosure can implement high-accuracy speech recognition.
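
Illustrative sketch (not part of the patent record): the abstract names three stages, namely log mel-frequency features for each signal, mutual calibration, and fusion. The NumPy code below is a rough sketch of that data flow only. The log-mel computation follows the standard textbook recipe (framing, Hann window, power spectrum, triangular mel filterbank, logarithm). The `toy_fusion` function is a hypothetical placeholder: the abstract does not describe the internals of the calibration or mapping modules, so the sigmoid gating and random projection here are assumptions chosen for illustration. All names, the 16 kHz sample rate, the frame sizes, and the assumption that the millimeter-wave signal has already been reduced to a 1-D waveform are likewise assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel(signal, sr, n_fft=512, hop=160, n_mels=40):
    # Frame the signal, apply a Hann window, take the power spectrum,
    # project onto the mel filterbank, then take the logarithm.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)  # shape: (n_frames, n_mels)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def toy_fusion(audio_feat, mmwave_feat, rng, out_dim=64):
    # ASSUMPTION: placeholder "mutual calibration". Each modality is
    # rescaled per mel band by a gate computed from the other modality.
    audio_cal = audio_feat * sigmoid(mmwave_feat.mean(axis=0))
    mmwave_cal = mmwave_feat * sigmoid(audio_feat.mean(axis=0))
    # ASSUMPTION: placeholder "mapping". Concatenate the calibrated
    # features and project them with random (untrained) weights.
    stacked = np.concatenate([audio_cal, mmwave_cal], axis=1)
    w = rng.standard_normal((stacked.shape[1], out_dim)) / np.sqrt(stacked.shape[1])
    return stacked @ w

# Synthetic stand-ins for one second of audio and millimeter-wave data.
rng = np.random.default_rng(0)
sr = 16000
audio = rng.standard_normal(sr)
mmwave = rng.standard_normal(sr)

audio_feat = log_mel(audio, sr)
mmwave_feat = log_mel(mmwave, sr)
fused = toy_fusion(audio_feat, mmwave_feat, rng)
print(audio_feat.shape, mmwave_feat.shape, fused.shape)
```

In a real system the gating and projection would be learned layers, and the fused feature would feed a trained semantic network, which this sketch leaves out.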
