Patent search: ap:"Chao XING", page 1

1.

Granted invention patent
Transformer-based automatic speech recognition system incorporating time-reduction layer [In force]

Publication No.: US11715461B2

Publication date: 2023-08-01

Application No.: US17076794

Filing date: 2020-10-21

Applicants: Md Akmal Haidar, Chao Xing

Inventors: Md Akmal Haidar, Chao Xing

IPC classification: G10L15/16, G10L15/06

CPC classification: G10L15/16, G10L15/063

Abstract: Computer implemented method and system for automatic speech recognition. A first speech sequence is processed, using a time reduction operation of an encoder NN, into a second speech sequence comprising a second set of speech frame feature vectors that each concatenate information from a respective plurality of speech frame feature vectors included in the first set; the second speech sequence includes fewer speech frame feature vectors than the first speech sequence. The second speech sequence is transformed, using a self-attention operation of the encoder NN, into a third speech sequence comprising a third set of speech frame feature vectors. The third speech sequence is processed using a probability operation of the encoder NN, to predict a sequence of first labels corresponding to the third set of speech frame feature vectors, and using a decoder NN to predict a sequence of second labels corresponding to the third set of speech frame feature vectors.
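The time reduction operation named in this abstract can be illustrated with a minimal sketch. It assumes that each group of `factor` consecutive frame vectors is concatenated into one vector and that a short final group is zero-padded; the function name `time_reduce`, the `factor` parameter, and the padding rule are illustrative assumptions, not details taken from the patent.

```python
from typing import List


def time_reduce(frames: List[List[float]], factor: int = 2) -> List[List[float]]:
    """Concatenate every `factor` consecutive frame vectors into one vector.

    Hypothetical helper: the abstract only says each output vector
    concatenates information from several input vectors.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if not frames:
        return []
    dim = len(frames[0])
    # Assumption: zero-pad the tail so the length is a multiple of `factor`.
    padded = list(frames)
    while len(padded) % factor != 0:
        padded.append([0.0] * dim)
    reduced = []
    for start in range(0, len(padded), factor):
        merged: List[float] = []
        for frame in padded[start:start + factor]:
            merged.extend(frame)  # concatenate along the feature axis
        reduced.append(merged)
    return reduced


if __name__ == "__main__":
    first = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    second = time_reduce(first, factor=2)
    print(len(first), "->", len(second), "frames")
```

With `factor=2`, the output has half as many frames (rounded up) and each frame is twice as wide, so a later self-attention step would operate on a shorter sequence.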

2.

Granted invention patent
Systems and methods for video retrieval and grounding [In force]

Publication No.: US11698926B2

Publication date: 2023-07-11

Application No.: US17524862

Filing date: 2021-11-12

Applicants: Arnab Kumar Mondal, Deepak Sridhar, Niamul Quader, Juwei Lu, Peng Dai, Chao Xing

Inventors: Arnab Kumar Mondal, Deepak Sridhar, Niamul Quader, Juwei Lu, Peng Dai, Chao Xing

IPC classification: G06F16/30, G06F16/732, G06N3/04, G06F16/783, G06V20/40

CPC classification: G06F16/7343, G06F16/783, G06N3/04, G06V20/40

Abstract: Methods and systems are described for performing video retrieval together with video grounding. A word-based query for a video is received and encoded into a query representation using a trained query encoder. One or more similar video representations are identified, from a plurality of video representations, that are similar to the query representation. Each similar video representation represents a respective relevant video. A grounding is generated for each relevant video by forward propagating each respective similar video representation together with the query representation through a trained grounding module. The relevant videos or identifiers of the relevant videos are outputted together with the grounding generated for each relevant video.
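The retrieval step in this abstract (finding video representations similar to the query representation) can be sketched as a nearest-neighbour lookup. The abstract does not say which similarity measure is used; cosine similarity, the `retrieve` function, and the `top_k` parameter are assumptions made for illustration, and the trained query encoder and grounding module are not modelled here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: Sequence[float],
    video_vecs: Dict[str, List[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the `top_k` (video id, score) pairs most similar to the query.

    Hypothetical helper: the similarity measure is an assumption.
    """
    scored = [(vid, cosine(query_vec, vec)) for vid, vec in video_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    videos = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for video_id, score in retrieve([1.0, 0.0], videos, top_k=2):
        print(video_id, round(score, 3))
```

In the flow the abstract describes, each retrieved representation would then be passed, together with the query representation, through the grounding module.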

3.

Published invention application
SYSTEMS AND METHODS FOR VIDEO RETRIEVAL AND GROUNDING [Under examination (published)]

Publication No.: US20230153352A1

Publication date: 2023-05-18

Application No.: US17524862

Filing date: 2021-11-12

Applicants: Arnab Kumar MONDAL, Deepak SRIDHAR, Niamul QUADER, Juwei LU, Peng DAI, Chao XING

Inventors: Arnab Kumar MONDAL, Deepak SRIDHAR, Niamul QUADER, Juwei LU, Peng DAI, Chao XING

IPC classification: G06F16/732, G06F16/783, G06K9/00, G06N3/04

CPC classification: G06F16/7343, G06F16/783, G06K9/00711, G06N3/04

Abstract: Methods and systems are described for performing video retrieval together with video grounding. A word-based query for a video is received and encoded into a query representation using a trained query encoder. One or more similar video representations are identified, from a plurality of video representations, that are similar to the query representation. Each similar video representation represents a respective relevant video. A grounding is generated for each relevant video by forward propagating each respective similar video representation together with the query representation through a trained grounding module. The relevant videos or identifiers of the relevant videos are outputted together with the grounding generated for each relevant video.

4.

Invention application
TRANSFORMER-BASED AUTOMATIC SPEECH RECOGNITION SYSTEM INCORPORATING TIME-REDUCTION LAYER [In force]

Publication No.: US20220122590A1

Publication date: 2022-04-21

Application No.: US17076794

Filing date: 2020-10-21

Applicants: Md Akmal HAIDAR, Chao XING

Inventors: Md Akmal HAIDAR, Chao XING

IPC classification: G10L15/16, G10L15/06

Abstract: Computer implemented method and system for automatic speech recognition. A first speech sequence is processed, using a time reduction operation of an encoder NN, into a second speech sequence that comprises a second set of speech frame feature vectors that each concatenate information from a respective plurality of speech frame feature vectors included in the first set, wherein the second speech sequence includes fewer speech frame feature vectors than the first speech sequence. The second speech sequence is transformed, using a self-attention operation of the encoder NN, into a third speech sequence that comprises a third set of speech frame feature vectors. The third speech sequence is processed, using a probability operation of the encoder NN, to predict a sequence of first labels corresponding to the third set of speech frame feature vectors. The third speech sequence is also processed using a decoder NN to predict a sequence of second labels corresponding to the third set of speech frame feature vectors.

5.

Published invention application
METHODS AND SYSTEMS FOR STREAMABLE MULTIMODAL LANGUAGE UNDERSTANDING [Under examination (published)]

Publication No.: US20230223018A1

Publication date: 2023-07-13

Application No.: US17571425

Filing date: 2022-01-07

Applicants: Chao XING, Anderson AVILA

Inventors: Chao XING, Anderson AVILA

IPC classification: G10L15/197, G10L15/22, G10L15/18, G10L15/16, G10L19/00

CPC classification: G10L15/197, G10L15/22, G10L15/1815, G10L15/16, G10L19/00, G10L2015/223

Abstract: The present disclosure describes methods and systems for generating semantic predictions from an input speech signal representing a speaker's speech, and for mapping the semantic predictions to a command action that represents the speaker's intent. A streamable multimodal language understanding (MLU) system includes a machine learning-based model, such as an RNN model, that is trained to convert speech chunks and corresponding text predictions of the input speech signal into semantic predictions that represent a speaker's intent. A semantic prediction is generated and updated over a series of time steps. In each time step, a new speech chunk and corresponding text prediction of the input speech signal are obtained, encoded and fused to generate an audio-textual representation. A semantic prediction is generated by a sequence classifier by processing the audio-textual representation, and the semantic prediction is updated as new speech chunks and corresponding text predictions are obtained. Extracted semantic information contained within a sequence of semantic predictions representing a speaker's speech is acted upon through a command action performed by another computing device or computer application.
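The per-time-step loop in this abstract (obtain a chunk, fuse the audio and text encodings, update the semantic prediction) can be sketched as a generator. This is a rough illustration only: fusion by concatenation, a running mean standing in for the recurrent state, and the names `fuse`, `stream_predictions`, and `classify` are all assumptions, not the disclosed model.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Sequence, Tuple

Chunk = Tuple[Sequence[float], Sequence[float]]  # (audio encoding, text encoding)


def fuse(audio_vec: Sequence[float], text_vec: Sequence[float]) -> List[float]:
    """Assumed fusion: concatenate the audio and text encodings."""
    return list(audio_vec) + list(text_vec)


def stream_predictions(
    chunks: Iterable[Chunk],
    classify: Callable[[List[float]], str],
) -> Iterator[str]:
    """Yield an updated prediction after each incoming chunk.

    A running mean of the fused vectors stands in for a recurrent state;
    a trained RNN and sequence classifier would replace it in practice.
    """
    state: Optional[List[float]] = None
    count = 0
    for audio_vec, text_vec in chunks:
        fused = fuse(audio_vec, text_vec)
        count += 1
        if state is None:
            state = fused
        else:
            # Incremental mean update over all chunks seen so far.
            state = [s + (f - s) / count for s, f in zip(state, fused)]
        yield classify(state)


if __name__ == "__main__":
    # Hypothetical classifier: thresholds the first fused feature.
    label = lambda s: "play" if s[0] > 0.5 else "stop"
    stream = [([1.0], [0.0]), ([0.0], [1.0]), ([0.0], [0.0])]
    for step, prediction in enumerate(stream_predictions(stream, label), start=1):
        print(step, prediction)
```

Because the generator yields after every chunk, a caller could map each intermediate prediction to a command action without waiting for the utterance to end.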
