System for speech recognition text enhancement fusing multi-modal semantic invariance

Invention Grant

US11488586B1 System for speech recognition text enhancement fusing multi-modal semantic invariance (Active)

Patent Title: System for speech recognition text enhancement fusing multi-modal semantic invariance
Application No.: US17867937

Application Date: 2022-07-19
Publication No.: US11488586B1

Publication Date: 2022-11-01
Inventor: Jianhua Tao , Shuai Zhang , Jiangyan Yi
Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
Applicant Address: CN Beijing
Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
Current Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
Current Assignee Address: CN Beijing
Agency: Westbridge IP LLC
Priority: CN202110815743.4 20210719
Main IPC: G10L15/18
IPC: G10L15/18 ; G10L15/16 ; G10L15/06

Abstract:

Disclosed is a system for speech recognition text enhancement fusing multi-modal semantic invariance. The system includes an acoustic feature extraction module, an acoustic down-sampling module, an encoder, and a decoder fusing multi-modal semantic invariance. The acoustic feature extraction module is configured to divide speech data into short-term audio frames of fixed length, extract fbank acoustic features from those frames, and input the acoustic features into the acoustic down-sampling module, which down-samples them to obtain an acoustic representation. The speech data is also input into an existing speech recognition module to obtain input text data, and the input text data is input into the encoder to obtain an input text encoded representation. The acoustic representation and the input text encoded representation are then input into the decoder for fusion.
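
The framing and down-sampling steps named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The function names, the 25 ms frame and 10 ms hop sizes, the single log-energy value used as a stand-in for real filter-bank features, and the choice of frame stacking as the down-sampling method are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], frame_len: int, hop_len: int) -> List[List[float]]:
    """Divide a 1-D signal into fixed-length frames (hypothetical helper).

    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(list(samples[start:start + frame_len]))
    return frames


def log_energy(frame: Sequence[float]) -> float:
    """Placeholder per-frame feature: log of the frame energy.

    A real system would compute multi-dimensional filter-bank features here.
    """
    energy = sum(x * x for x in frame)
    return math.log(energy + 1e-10)  # small offset avoids log(0) on silence


def downsample_by_stacking(features: List[List[float]], factor: int) -> List[List[float]]:
    """Reduce the frame rate by concatenating every `factor` consecutive vectors.

    Frame stacking is an assumed stand-in for the down-sampling module.
    Leftover vectors that do not fill a whole group are dropped.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    stacked = []
    for i in range(0, len(features) - factor + 1, factor):
        merged: List[float] = []
        for vec in features[i:i + factor]:
            merged.extend(vec)
        stacked.append(merged)
    return stacked


if __name__ == "__main__":
    sample_rate = 16000  # assumed sampling rate in Hz
    # One second of a synthetic 440 Hz tone stands in for real speech data.
    signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
    frames = split_into_frames(signal, frame_len=400, hop_len=160)  # 25 ms / 10 ms
    features = [[log_energy(f)] for f in frames]
    acoustic_repr = downsample_by_stacking(features, factor=4)
    print(len(frames), len(acoustic_repr), len(acoustic_repr[0]))
```

Shortening the acoustic sequence in this way brings its length closer to that of the text sequence, which is one plausible reason for placing a down-sampling step before the acoustic and text representations are fused in the decoder.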

Public/Granted literature

US2191096A Apparatus for feeding aggregate Public/Granted day:1940-02-20

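IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	. Speech classification or search
G10L15/18	.. using natural language modelling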