Patent search ap:("INSTITUTE OF AUTOMATION Page CHINESE ACADEMY OF SCIENCES") AND inv:"Shuai Zhang"

1.

发明授权
System for speech recognition text enhancement fusing multi-modal semantic invariance 有权

公开(公告)号：US11488586B1

公开(公告)日：2022-11-01

申请号：US17867937

申请日：2022-07-19

Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Inventor： Jianhua Tao , Shuai Zhang , Jiangyan Yi

IPC: G10L15/18 , G10L15/16 , G10L15/06

Abstract: Disclosed is a system for speech recognition text enhancement fusing multi-modal semantic invariance, the system includes an acoustic feature extraction module, an acoustic down-sampling module, an acoustic feature extraction module, an acoustic down-sampling module, an encoder and a decoder fusing multi-modal semantic invariance; the acoustic feature extraction module is configured for frame-dividing processing of speech data, dividing the speech data into short-term audio frames with a fixed length, extracting thank acoustic features from the short-term audio frames, and inputting the acoustic features into the acoustic down-sampling module for down-sampling to obtain an acoustic representation; inputting the speech data into an existing speech recognition module to obtain input text data, and inputting the input text data into the encoder to obtain an input text encoded representation; inputting the acoustic representation and the input text encoded representation into the decoder to fuse.

2.

发明授权
End-to-end system for speech recognition and speech translation and device 有权

公开(公告)号：US11475877B1

公开(公告)日：2022-10-18

申请号：US17852140

申请日：2022-06-28

Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Inventor： Jianhua Tao , Shuai Zhang , Jiangyan Yi

IPC: G10L15/02 , G10L15/18 , G10L15/26 , G10L19/00

Abstract: Disclosed are an end-to-end system for speech recognition and speech translation and an electronic device. The system comprises an acoustic encoder and a multi-task decoder and a semantic invariance constraint module, and completes two tasks for speech recognition and speech translation. In addition, according to the characteristic of the semantic consistency of texts between different tasks, semantic constraints are imposed on the model to learn high-level semantic information, and the semantic information can effectively improve the performance of speech recognition and speech translation. The application has the following advantages that the error accumulation problem of serial system is avoided, and the calculation cost of the model is low and the real-time performance is very high.

Patent Agency Ranking