-
Publication No.: US20200335119A1
Publication Date: 2020-10-22
Application No.: US16434537
Filing Date: 2019-06-07
Inventors: Xiong XIAO , Zhuo CHEN , Takuya YOSHIOKA , Changliang LIU , Hakan ERDOGAN , Dimitrios Basile DIMITRIADIS , Yifan GONG , James Garnet Droppo, III
IPC Classes: G10L21/028 , G10L21/0208
Abstract: Embodiments are associated with determination of a first plurality of multi-dimensional vectors, each of the first plurality of multi-dimensional vectors representing speech of a target speaker, determination of a multi-dimensional vector representing a speech signal of two or more speakers, determination of a weighted vector representing speech of the target speaker based on the first plurality of multi-dimensional vectors and on similarities between the multi-dimensional vector and each of the first plurality of multi-dimensional vectors, and extraction of speech of the target speaker from the speech signal based on the weighted vector and the speech signal.
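The similarity-weighted combination the abstract describes can be sketched in a few lines. This is a minimal pure-Python illustration, not the patent's specified formulation: the function names, the use of cosine similarity, and the softmax normalization of the weights are all assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def weighted_speaker_vector(enrollment_vectors, mixture_vector):
    """Weight each enrollment vector by its (normalized) similarity to
    the mixture embedding, then combine them into one target-speaker
    vector that can condition a speech-extraction network."""
    sims = [cosine(e, mixture_vector) for e in enrollment_vectors]
    weights = softmax(sims)
    dim = len(mixture_vector)
    return [sum(w * e[d] for w, e in zip(weights, enrollment_vectors))
            for d in range(dim)]
```

Enrollment vectors that resemble the mixture embedding dominate the weighted sum, so the conditioning vector tracks the target speaker actually present in the signal.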
-
Publication No.: US20200334538A1
Publication Date: 2020-10-22
Application No.: US16410741
Filing Date: 2019-05-13
Inventors: Zhong MENG , Jinyu LI , Yong ZHAO , Yifan GONG
IPC Classes: G06N3/08 , G10L15/16 , G06N3/04 , G10L15/183
Abstract: Embodiments are associated with conditional teacher-student model training. A trained teacher model configured to perform a task may be accessed and an untrained student model may be created. A model training platform may provide training data labeled with ground truths to the teacher model to produce teacher posteriors representing the training data. When it is determined that a teacher posterior matches the associated ground truth label, the platform may conditionally use the teacher posterior to train the student model. When it is determined that a teacher posterior does not match the associated ground truth label, the platform may conditionally use the ground truth label to train the student model. The models might be associated with, for example, automatic speech recognition (e.g., in connection with domain adaptation and/or speaker adaptation).
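The target-selection rule above can be sketched directly: per frame, keep the teacher's soft posterior when its top prediction agrees with the ground truth, and fall back to the hard label otherwise. The function name and list-of-lists representation are assumptions made for illustration.

```python
def conditional_ts_targets(teacher_posteriors, ground_truth_labels):
    """Choose a per-frame training target for the student: the teacher's
    soft posterior when its top prediction matches the ground truth,
    otherwise a one-hot vector built from the ground-truth label."""
    targets = []
    for posterior, label in zip(teacher_posteriors, ground_truth_labels):
        predicted = max(range(len(posterior)), key=lambda i: posterior[i])
        if predicted == label:
            # Teacher is correct here: distill from its soft output.
            targets.append(list(posterior))
        else:
            # Teacher is wrong here: fall back to the hard label.
            one_hot = [0.0] * len(posterior)
            one_hot[label] = 1.0
            targets.append(one_hot)
    return targets
```

The student would then be trained with, for example, cross-entropy against these mixed soft/hard targets.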
-
Publication No.: US20220159047A1
Publication Date: 2022-05-19
Application No.: US17345703
Filing Date: 2021-06-11
Inventors: Akash Alok MAHAJAN , Yifan GONG
Abstract: Systems are provided for managing and coordinating STT/TTS systems and the communications between these systems when they are connected in online meetings, and for mitigating connectivity issues that may arise during the online meetings, to provide a seamless and reliable meeting experience with live captions and/or rendered audio. Initially, online meeting communications are transmitted over a lossy connectionless type protocol/channel. Then, in response to detected connectivity problems with one or more systems involved in the online meeting, which can cause jitter or packet loss, for example, an instruction is dynamically generated and processed for causing one or more of the connected systems to transmit and/or process the online meeting content with a more reliable connection/protocol, such as a connection-oriented protocol. Codecs at the systems are used, when needed, to convert speech to text with related speech attribute information and to convert text to speech.
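The fallback decision can be illustrated with a simple quality check. This is a sketch only: the threshold values and the function name are hypothetical, and a real system would drive the switch from richer telemetry than two scalars.

```python
def choose_transport(packet_loss_rate, jitter_ms,
                     loss_threshold=0.05, jitter_threshold=100.0):
    """Stay on the low-latency connectionless channel while conditions
    are good; switch meeting content to a reliable connection-oriented
    channel once packet loss or jitter crosses a threshold."""
    degraded = packet_loss_rate > loss_threshold or jitter_ms > jitter_threshold
    return "connection-oriented" if degraded else "connectionless"
```

For example, `choose_transport(0.20, 20.0)` reports degraded conditions and selects the reliable channel, mirroring the dynamic instruction the abstract describes.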
-
Publication No.: US20220157324A1
Publication Date: 2022-05-19
Application No.: US17665862
Filing Date: 2022-02-07
Inventors: Yong ZHAO , Tianyan ZHOU , Jinyu LI , Yifan GONG , Jian WU , Zhuo CHEN
Abstract: Embodiments may include determination, for each of a plurality of speech frames associated with an acoustic feature, of a phonetic feature based on the associated acoustic feature, generation of one or more two-dimensional feature maps based on the plurality of phonetic features, input of the one or more two-dimensional feature maps to a trained neural network to generate a plurality of speaker embeddings, and aggregation of the plurality of speaker embeddings into a speaker embedding based on respective weights determined for each of the plurality of speaker embeddings, wherein the speaker embedding is associated with an identity of the speaker.
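The final aggregation step can be sketched as a weighted average of the per-segment embeddings. The softmax normalization of the weights and the function names are assumptions; the abstract only states that the embeddings are combined using respective weights.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_embeddings(embeddings, scores):
    """Collapse several per-segment speaker embeddings into a single
    utterance-level embedding using softmax-normalized weights, so more
    informative segments contribute more to the speaker identity."""
    weights = softmax(scores)
    dim = len(embeddings[0])
    return [sum(w * e[d] for w, e in zip(weights, embeddings))
            for d in range(dim)]
```

With equal scores this reduces to a plain average; learned scores let the model emphasize cleaner or more speaker-discriminative segments.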
-
Publication No.: US20210065683A1
Publication Date: 2021-03-04
Application No.: US16675515
Filing Date: 2019-11-06
Inventors: Zhong MENG , Yashesh GAUR , Jinyu LI , Yifan GONG
IPC Classes: G10L15/065 , G10L15/22 , G10L19/00 , G10L15/06
Abstract: Embodiments are associated with a speaker-independent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-independent attention-based encoder-decoder model associated with a first output distribution, a speaker-dependent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-dependent attention-based encoder-decoder model associated with a second output distribution, training of the speaker-dependent attention-based encoder-decoder model to classify output tokens based on input speech frames of a target speaker while simultaneously training the speaker-dependent attention-based encoder-decoder model to maintain a similarity between the first output distribution and the second output distribution, and performing automatic speech recognition on speech frames of the target speaker using the trained speaker-dependent attention-based encoder-decoder model.
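A common way to "maintain similarity between the two output distributions" during adaptation is to interpolate the task loss with a KL-divergence term. The sketch below illustrates that idea for a single token position; the interpolation weight `rho`, the function name, and the exact loss form are assumptions, not the patent's claimed formulation.

```python
import math

def kld_adaptation_loss(sd_posterior, si_posterior, target_index, rho=0.5):
    """Interpolated adaptation loss: cross-entropy against the target
    token plus a KL term pulling the speaker-dependent (SD) output
    distribution toward the speaker-independent (SI) one."""
    cross_entropy = -math.log(sd_posterior[target_index])
    kl = sum(p * math.log(p / q)
             for p, q in zip(si_posterior, sd_posterior) if p > 0.0)
    return (1.0 - rho) * cross_entropy + rho * kl
```

The KL term vanishes when the SD model still matches the SI model, so it acts as a regularizer against overfitting to the target speaker's limited data.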
-
Publication No.: US20200334526A1
Publication Date: 2020-10-22
Application No.: US16410659
Filing Date: 2019-05-13
Inventors: Jinyu LI , Vadim MAZALOV , Changliang LIU , Liang LU , Yifan GONG
IPC Classes: G06N3/08 , G10L15/22 , G10L15/16 , G10L15/183
Abstract: According to some embodiments, a machine learning model may include an input layer to receive an input signal as a series of frames representing handwriting data, speech data, audio data, and/or textual data. A plurality of time layers may be provided, and each time layer may comprise a uni-directional recurrent neural network processing block. A depth processing block may scan hidden states of the recurrent neural network processing block of each time layer, and the depth processing block may be associated with a first frame and receive context frame information of a sequence of one or more future frames relative to the first frame. An output layer may output a final classification as a classified posterior vector of the input signal. For example, the depth processing block may receive the context frame information from an output of a time layer processing block or another depth processing block of the future frame.
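The "scan hidden states across layers" idea can be illustrated with a toy recurrent cell that runs over depth rather than time. This is a deliberately simplified sketch: the scalar weights `w_in` and `w_depth`, the `tanh` cell, and the function name are all hypothetical stand-ins for the gated depth-processing block the abstract describes.

```python
import math

def depth_scan(layer_states, w_in=0.5, w_depth=0.5):
    """Scan the hidden states emitted by each time layer at one frame
    (lowest layer first) with a tiny recurrent cell over depth,
    yielding a depth-processed vector for classification."""
    g = [0.0] * len(layer_states[0])  # depth-recurrent state
    for h in layer_states:
        g = [math.tanh(w_in * h_d + w_depth * g_d)
             for h_d, g_d in zip(h, g)]
    return g
```

In the full model, this depth scan at one frame could also take input from a depth block at a future frame, which is how the look-ahead context the abstract mentions would flow in.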
-
Publication No.: US20190341053A1
Publication Date: 2019-11-07
Application No.: US16019318
Filing Date: 2018-06-26
Inventors: Shixiong ZHANG , Lingfeng WU , Eyal KRUPKA , Xiong XIAO , Yifan GONG
Abstract: A computerized conference assistant includes a camera and a microphone. A face location machine of the computerized conference assistant finds a physical location of a human, based on a position of a candidate face in digital video captured by the camera. A beamforming machine of the computerized conference assistant outputs a beamformed signal isolating sounds originating from the physical location of the human. A diarization machine of the computerized conference assistant attributes information encoded in the beamformed signal to the human.
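The beamforming machine's role can be illustrated with the classic delay-and-sum technique: align the microphone channels toward one location, then average them so sound from that location is reinforced. This is a textbook sketch for intuition, not the patent's beamformer; the sample-delay representation and function name are assumptions.

```python
def delay_and_sum(channels, arrival_delays):
    """Minimal delay-and-sum beamformer: advance each microphone channel
    by its arrival delay (in samples) so sound from the target location
    lines up across microphones, then average the aligned channels."""
    length = len(channels[0])
    output = []
    for i in range(length):
        acc = 0.0
        for channel, delay in zip(channels, arrival_delays):
            j = i + delay
            acc += channel[j] if j < length else 0.0
        output.append(acc / len(channels))
    return output
```

Sounds arriving from other directions stay misaligned after the shifts and partially cancel in the average, which is what lets the downstream diarization machine attribute the isolated signal to the located human.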
-
Publication No.: US20190279614A1
Publication Date: 2019-09-12
Application No.: US15917082
Filing Date: 2018-03-09
Inventors: Guoli YE , James DROPPO , Jinyu LI , Rui ZHAO , Yifan GONG
IPC Classes: G10L15/187 , G10L15/16 , G10L15/06 , G10L15/22
Abstract: Non-limiting examples of the present disclosure describe advancements in acoustic-to-word modeling that improve accuracy in speech recognition processing through the replacement of out-of-vocabulary (OOV) tokens. During the decoding of speech signals, better accuracy in speech recognition processing is achieved through training and implementation of multiple different solutions that present enhanced speech recognition models. In one example, a hybrid neural network model for speech recognition processing combines a word-based neural network model as a primary model and a character-based neural network model as an auxiliary model. The primary word-based model emits a word sequence, and an output of the character-based auxiliary model is consulted at a segment where the word-based model emits an OOV token. In another example, a mixed unit speech recognition model is developed and trained to generate a mixed word and character sequence during decoding of a speech signal without requiring generation of OOV tokens.
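The consultation step of the hybrid model can be sketched as a simple substitution pass over the word model's output. The function name, the `<OOV>` token string, and the one-hypothesis-per-OOV-segment assumption are illustrative, not the patent's specified interface.

```python
def hybrid_decode(word_tokens, char_hypotheses, oov_token="<OOV>"):
    """Emit the word model's sequence, substituting the character
    model's hypothesis wherever the word model produced an OOV token."""
    result, next_seg = [], 0
    for token in word_tokens:
        if token == oov_token:
            result.append(char_hypotheses[next_seg])  # consult auxiliary model
            next_seg += 1
        else:
            result.append(token)
    return result
```

In-vocabulary words pass through untouched, so the character model only has to spell out the rare words the word model cannot emit.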