Publication No.: US12112744B2
Publication Date: 2024-10-08
Application No.: US17684958
Filing Date: 2022-03-02
Applicant: Zhejiang University
Inventors: Feng Lin, Tiantian Liu, Ming Gao, Chao Wang, Zhongjie Ba, Jinsong Han, Wenyao Xu, Kui Ren
IPC Classification: G10L15/20, G01S13/88, G10L15/06, G10L15/18, G10L15/22, G10L15/28, G10L25/18, G10L25/78
CPC Classification: G10L15/20, G01S13/88, G10L15/063, G10L15/1815, G10L15/22, G10L15/28, G10L25/18, G10L25/78
Abstract: The disclosure provides a multimodal speech recognition method and system, and a computer-readable storage medium. The method includes calculating a first logarithmic mel-frequency spectral coefficient and a second logarithmic mel-frequency spectral coefficient when a target millimeter-wave signal and a target audio signal both contain speech information corresponding to a target user; inputting the first and second logarithmic mel-frequency spectral coefficients into a fusion network to determine a target fusion feature, where the fusion network includes at least a calibration module and a mapping module, the calibration module is configured to perform mutual feature calibration on the target audio and millimeter-wave signals, and the mapping module is configured to fuse a calibrated millimeter-wave feature and a calibrated audio feature; and inputting the target fusion feature into a semantic feature network to determine a speech recognition result corresponding to the target user. The disclosure enables high-accuracy speech recognition.
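
Illustrative sketch (not part of the patent record): the first stage of the abstract, computing logarithmic mel-frequency spectral coefficients for both modalities, can be sketched with NumPy as below. Everything beyond that stage is an assumption made for illustration only. The millimeter-wave input is assumed to be already demodulated into a one-dimensional vibration waveform at the audio sample rate, and the functions `mutual_calibrate` and `fuse` are hypothetical, hand-written stand-ins for the calibration and mapping modules, which in the disclosure belong to a fusion network whose internals the abstract does not specify.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_mel(signal, sr, n_fft=512, hop=160, n_mels=40):
    """Log-mel features of a 1-D signal, shape (n_frames, n_mels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel_energy + 1e-10)  # small offset avoids log(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mutual_calibrate(audio_feat, mmwave_feat):
    """ASSUMPTION: toy, non-learned gating. Each modality is re-weighted per
    mel band by a gate derived from the other modality's time-averaged profile."""
    gate_from_mmwave = sigmoid(mmwave_feat.mean(axis=0) - mmwave_feat.mean())
    gate_from_audio = sigmoid(audio_feat.mean(axis=0) - audio_feat.mean())
    return audio_feat * gate_from_mmwave, mmwave_feat * gate_from_audio

def fuse(audio_cal, mmwave_cal):
    """ASSUMPTION: fusion by concatenation along the feature axis."""
    return np.concatenate([audio_cal, mmwave_cal], axis=1)

# Synthetic one-second inputs standing in for the two target signals.
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
audio = np.sin(2 * np.pi * 220.0 * t) + 0.1 * rng.standard_normal(sr)
mmwave = np.sin(2 * np.pi * 220.0 * t) + 0.3 * rng.standard_normal(sr)

audio_feat = log_mel(audio, sr)    # first log-mel coefficient matrix
mmwave_feat = log_mel(mmwave, sr)  # second log-mel coefficient matrix
fused = fuse(*mutual_calibrate(audio_feat, mmwave_feat))
```

In a trained system the gates and the fusion mapping would be learned layers, and the fused feature would be passed to a semantic feature network to produce the recognition result; neither is modeled here.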