DYNAMIC CACHE MANAGEMENT IN BEAM SEARCH

    Publication number: US20220100676A1

    Publication date: 2022-03-31

    Application number: US17178385

    Application date: 2021-02-18

    Abstract: Systems and methods for dynamically modifying a cache associated with a neural network model of a natural language generator are described. In examples, a neural network model employs a beam search algorithm at a decoder when decoding output and generating predicted output candidates. The decoder utilizes caching techniques to improve the speed at which the neural network operates. When an amount of memory utilized by one or more caches of the neural network model is determined to exceed a threshold memory size, a layer-specific portion of a cache associated with a layer of the neural network model is identified. The identified layer-specific portion of the cache can be deleted when the amount of memory utilized by the cache of the neural network model exceeds the threshold memory size. In examples, data in the cache is deduplicated and/or deleted.
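The threshold-triggered, layer-specific eviction described in the abstract can be sketched as follows. This is an illustrative reading, not the patented implementation; the names (`cache_size`, `evict_if_needed`, `THRESHOLD_BYTES`) and the largest-layer-first eviction policy are assumptions.

```python
# Hypothetical sketch: per-layer decoder caches tracked by size in bytes,
# with layer-specific portions deleted once total usage exceeds a threshold.
THRESHOLD_BYTES = 1_000_000  # illustrative threshold memory size

def cache_size(caches):
    """Total bytes held across all layer-specific caches."""
    return sum(entry["bytes"] for entry in caches.values())

def evict_if_needed(caches, threshold=THRESHOLD_BYTES):
    """Delete layer-specific cache portions until usage is under threshold.

    Identifies the layer holding the largest cached portion and deletes it,
    repeating while the memory threshold is still exceeded.
    """
    while caches and cache_size(caches) > threshold:
        victim = max(caches, key=lambda layer: caches[layer]["bytes"])
        del caches[victim]
    return caches
```

For example, with layer caches of 600 KB, 500 KB, and 300 KB against a 1 MB threshold, one eviction of the largest layer brings usage back under the limit.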

    GENERATION OF DATA MODELS FOR PREDICTING DATA

    Publication number: US20240046037A1

    Publication date: 2024-02-08

    Application number: US18268699

    Application date: 2020-12-25

    CPC classification number: G06F40/284 G06F40/40

    Abstract: Systems and methods are provided for training a data model based on training data. The training includes pre-training and fine-tuning the data model based on a combination of an autoregressive (AR) model and a non-autoregressive (NAR) model. Training data may be received and encoded into streams of tokens. During decoding, a pre-trainer generates a continuum of data structures of the AR and NAR combined model, including a main stream and a series of predicting streams. Masked tokens in predicting streams reference or attend to one or more preceding tokens in the main stream or the preceding predicting streams. A fine-tuner selects streams to generate a trained model according to a target data model. The target data model is determined based on balancing an accuracy constraint and an efficiency constraint for predicting tokens. The decoder acts as a bridge between the AR and NAR models in generating a trained data model.
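The attention pattern of the predicting streams can be illustrated with a small helper that, for each stream, lists which main-stream positions a masked token may attend to. This is a minimal sketch of the constraint "attend to preceding tokens in the main stream" only; the function name and the exact offset convention are assumptions, and attention to preceding predicting streams is omitted for brevity.

```python
# Illustrative sketch (not the patent's implementation): for each of
# n_streams predicting streams, the masked token at position i may attend
# to main-stream positions 0..i (its preceding context).
def predicting_stream_mask(seq_len, n_streams):
    """Return, per predicting stream, the list of main-stream positions
    each masked token is allowed to attend to."""
    masks = []
    for _stream in range(n_streams):
        stream_mask = []
        for i in range(seq_len):
            # Masked token at position i sees main-stream tokens up to i.
            stream_mask.append(list(range(i + 1)))
        masks.append(stream_mask)
    return masks
```

Selecting one stream recovers AR-style next-token prediction, while keeping several streams allows NAR-style prediction of multiple future tokens at once, which is one way to read the accuracy/efficiency balance the abstract describes.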

    Resource-Efficient Attention in a Neural Network

    Publication number: US20220318601A1

    Publication date: 2022-10-06

    Application number: US17221791

    Application date: 2021-04-03

    Abstract: Computing technology is described herein that provides an attention mechanism, implemented by a neural network, that generates attention information based on head-specific query information and shared key and value (KV) information, without computing head-specific key information and head-specific value information, and without caching the head-specific key information and the head-specific value information in memory. This manner of operation allows the computing technology to make efficient use of processing and memory resources. In some implementations, the attention mechanism is part of decoder of an encoder-decoder system, or a standalone decoder system. In some implementations, the computing technology leverages the attention information to generate synthesized text based on input text.
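The shared key/value idea can be sketched in plain Python: each head applies its own query, but all heads score against one shared K and V, so no head-specific K/V is ever materialized or cached. This is a hedged illustration of the general technique (often called multi-query attention), not the claimed implementation; dimensions and names are made up for the example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def multi_query_attention(queries_per_head, shared_keys, shared_values):
    """queries_per_head: [n_heads][d]; shared_keys/values: [seq_len][d].

    Every head reads the same shared K and V, so only one K/V set needs
    to be computed and kept in memory, regardless of the head count.
    """
    outputs = []
    for q in queries_per_head:
        scores = softmax([dot(q, k) / math.sqrt(len(q)) for k in shared_keys])
        context = [sum(w * v[j] for w, v in zip(scores, shared_values))
                   for j in range(len(shared_values[0]))]
        outputs.append(context)
    return outputs
```

Because the K/V tensors are shared, the decoder-side cache grows with sequence length but not with the number of heads, which is the resource saving the abstract points to.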
