Invention Grant
- Patent Title: System for speech recognition text enhancement fusing multi-modal semantic invariance
- Application No.: US17867937
- Application Date: 2022-07-19
- Publication No.: US11488586B1
- Publication Date: 2022-11-01
- Inventor: Jianhua Tao, Shuai Zhang, Jiangyan Yi
- Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
- Applicant Address: CN Beijing
- Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
- Current Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
- Current Assignee Address: CN Beijing
- Agency: Westbridge IP LLC
- Priority: CN202110815743.4 20210719
- Main IPC: G10L15/18
- IPC: G10L15/18; G10L15/16; G10L15/06

Abstract:
Disclosed is a system for speech recognition text enhancement fusing multi-modal semantic invariance. The system includes an acoustic feature extraction module, an acoustic down-sampling module, an encoder, and a decoder fusing multi-modal semantic invariance. The acoustic feature extraction module is configured for frame-dividing processing of speech data: it divides the speech data into short-term audio frames with a fixed length, extracts fbank acoustic features from the short-term audio frames, and inputs the acoustic features into the acoustic down-sampling module for down-sampling to obtain an acoustic representation. The speech data is also input into an existing speech recognition module to obtain input text data, and the input text data is input into the encoder to obtain an input text encoded representation. The acoustic representation and the input text encoded representation are then input into the decoder for fusion.
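
The frame-dividing and down-sampling steps described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the function names `split_into_frames` and `downsample_by_stacking`, the 16 kHz sample rate, the 25 ms frame length, the 10 ms hop, and the choice of stacking adjacent feature vectors as the down-sampling method. Filter-bank feature extraction itself is not implemented; a one-value-per-frame placeholder stands in for it.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], frame_len: int, hop_len: int) -> List[List[float]]:
    """Divide a waveform into fixed-length short-term frames.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop_len
    return frames


def downsample_by_stacking(features: Sequence[Sequence[float]], factor: int) -> List[List[float]]:
    """Shorten a feature sequence by concatenating every `factor` consecutive vectors.

    This is one common down-sampling scheme, used here as an assumption;
    leftover vectors that do not fill a full group are dropped.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    stacked_sequence = []
    for i in range(0, len(features) - factor + 1, factor):
        stacked = []
        for vec in features[i:i + factor]:
            stacked.extend(vec)
        stacked_sequence.append(stacked)
    return stacked_sequence


# Assumed setup: 1 second of silence at 16 kHz, 25 ms frames (400 samples), 10 ms hop (160 samples).
samples = [0.0] * 16000
frames = split_into_frames(samples, frame_len=400, hop_len=160)

# Placeholder per-frame feature (mean absolute amplitude), standing in for real fbank features.
features = [[sum(abs(x) for x in frame) / len(frame)] for frame in frames]

# Stack groups of 3 frames to form a shorter acoustic representation.
acoustic_representation = downsample_by_stacking(features, factor=3)
print(len(frames), len(acoustic_representation))
```

In a full system, the shortened sequence would be passed to the decoder alongside the encoder's text representation; those components are outside the scope of this sketch.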
Public/Granted literature
- US2191096A Apparatus for feeding aggregate Public/Granted day: 1940-02-20