Patent search ap:("Intel Corporation") AND inv:"Yury Gorbachev" Page 1

1.

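Invention application
WEIGHT COMPRESSION ACCURACY ENHANCEMENTS IN LARGE LANGUAGE MODELS (legal status: in force)

Publication (announcement) number: US20250037017A1

Publication (announcement) date: 2025-01-30

Application number: US18599833

Filing date: 2024-03-08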

Applicant: Intel Corporation

Inventor: Andrei Anufriev, Alexander Kozlov, Yury Gorbachev

IPC: G06N20/00

Abstract: Systems, apparatuses and methods may provide for technology that accesses a pre-trained artificial intelligence (AI) model, quantizes a plurality of weights of the pre-trained AI model to generate a compressed AI model, and applies normalization correction to the compressed AI model to generate an output AI model.
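The quantize-then-correct flow in this abstract can be illustrated with a small sketch. The abstract does not say what the normalization correction computes, so the code below makes an assumption: each weight row is quantized with symmetric round-to-nearest, and the row's scale is then adjusted so the dequantized row has the same L2 norm as the original. The function names (`quantize_row`, `corrected_scale`, `compress`, `dequantize`) and the 4-bit default are hypothetical and are not taken from the patent.

```python
import math


def l2_norm(values):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(v * v for v in values))


def quantize_row(row, bits=4):
    """Symmetric round-to-nearest quantization of one weight row."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit codes
    max_abs = max(abs(w) for w in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in row]
    return codes, scale


def corrected_scale(row, codes, scale):
    """ASSUMED correction: rescale so the dequantized row norm matches the original."""
    deq_norm = l2_norm([c * scale for c in codes])
    if deq_norm == 0.0:
        return scale  # all-zero row, nothing to correct
    return scale * l2_norm(row) / deq_norm


def compress(weights, bits=4):
    """Quantize every row, then apply the assumed norm correction to its scale."""
    compressed = []
    for row in weights:
        codes, scale = quantize_row(row, bits)
        compressed.append((codes, corrected_scale(row, codes, scale)))
    return compressed


def dequantize(compressed):
    """Rebuild floating-point rows from integer codes and corrected scales."""
    return [[c * scale for c in codes] for codes, scale in compressed]
```

Calling `dequantize(compress(weights))` on a list of weight rows returns rows whose norms match the originals up to floating-point error. Whether the patented method corrects norms per row, per channel, or through a normalization layer is not stated in the abstract.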

2.

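Invention application
METHODS AND APPARATUS TO EVICT TOKENS FROM A KEY VALUE CACHE (legal status: in force)

Publication (announcement) number: US20250036876A1

Publication (announcement) date: 2025-01-30

Application number: US18913538

Filing date: 2024-10-11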

Applicant: Intel Corporation

Inventor: Alexander Kozlov, Liubov Talamanova, Yury Gorbachev

IPC: G06F40/284, G06F12/126

Abstract: Systems, apparatus, articles of manufacture, and methods are disclosed to evict tokens from a key value cache. An example apparatus includes interface circuitry, machine readable instructions, and programmable circuitry to at least one of instantiate or execute the machine readable instructions to: determine score history values for tokens based on attention scores associated with the tokens, wherein a token is a numerical representation of text, after a number of tokens present in the key value cache exceeds a threshold number of tokens, compute group importance scores for groups of tokens based on score history values of the tokens in the groups of tokens, identify low-ranked groups of tokens having lowest group importance scores, the low-ranked groups of tokens associated with an eviction range in the key value cache, and remove an identified low-ranked group of tokens from the eviction range of the key value cache.
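The eviction steps listed in this abstract can be sketched as follows. Several details are assumptions made only for illustration: the score history is a running sum of attention scores, groups are fixed-size contiguous blocks, group importance is the mean score history of the block, and the eviction range is everything between a protected prefix (`keep_start`) and a protected suffix of recent tokens (`keep_recent`). None of these names or choices come from the patent text.

```python
def update_score_history(history, attention_scores):
    """Add the latest attention score of each cached token to its history value."""
    return [h + a for h, a in zip(history, attention_scores)]


def evict_low_ranked_groups(tokens, history, max_tokens, group_size,
                            keep_start, keep_recent):
    """Drop the lowest-scoring token groups once the cache exceeds max_tokens."""
    if len(tokens) <= max_tokens:
        return tokens, history  # threshold not exceeded, nothing to evict

    # ASSUMED eviction range: skip the first keep_start and last keep_recent tokens.
    lo, hi = keep_start, len(tokens) - keep_recent

    # ASSUMED grouping: contiguous blocks of group_size tokens inside the range.
    groups = [(start, min(start + group_size, hi))
              for start in range(lo, hi, group_size)]

    def importance(group):
        start, end = group
        return sum(history[start:end]) / (end - start)

    # Evict lowest-importance groups first until the cache fits again.
    excess = len(tokens) - max_tokens
    evicted = set()
    for start, end in sorted(groups, key=importance):
        if len(evicted) >= excess:
            break
        evicted.update(range(start, end))

    keep = [i for i in range(len(tokens)) if i not in evicted]
    return [tokens[i] for i in keep], [history[i] for i in keep]
```

Evicting whole groups rather than single tokens keeps neighboring tokens together in the cache, which is the behavior the abstract describes. A real key value cache would remove the matching key and value tensors at the same positions, which this sketch leaves out.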

3.

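Invention application
WEIGHT QUANTIZATION ADAPTATION TECHNOLOGY (legal status: in force)

Publication (announcement) number: US20250028965A1

Publication (announcement) date: 2025-01-23

Application number: US18904364

Filing date: 2024-10-02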

Applicant: Intel Corporation

Inventor: Alexander Kozlov, Andrey Anufriev, Nikolay Lyalyushkin, Dmitry Gorokhov, Yury Gorbachev

IPC: G06N3/082

Abstract: Systems, apparatuses and methods may provide for technology that selects a subset of linear layers from a plurality of linear layers in a pre-trained artificial intelligence (AI) model, wherein a quantization error of the subset of linear layers exceeds an error threshold. For each linear layer in the subset of linear layers, the technology solves a singular value decomposition (SVD) approximation, generates a first adapter layer and a second adapter layer based on the SVD decomposition, wherein the first adapter layer and the second adapter layer include weight matrices having a first dimension that is less than a first rank threshold and a second dimension that is greater than a second rank threshold, and determines an inference output based on the linear layer, the first adapter layer and the second adapter layer.
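A minimal NumPy sketch of the adapter idea in this abstract is shown below. The abstract does not state which matrix the SVD is applied to, so the sketch assumes it is the quantization residual (original weight minus quantized weight), truncated to a small rank and split into two thin matrices. The helper names (`quantize`, `relative_error`, `select_layers`, `build_adapters`, `forward`), the 4-bit quantizer, and the relative Frobenius error metric are all hypothetical.

```python
import numpy as np


def quantize(weight, bits=4):
    """Symmetric per-row round-to-nearest quantization, returned in float form."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(weight).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0  # avoid dividing by zero on all-zero rows
    return np.clip(np.round(weight / scale), -qmax, qmax) * scale


def relative_error(weight, weight_q):
    """ASSUMED error metric: relative Frobenius norm of the quantization residual."""
    return np.linalg.norm(weight - weight_q) / np.linalg.norm(weight)


def select_layers(layers, threshold, bits=4):
    """Names of layers whose quantization error exceeds the threshold."""
    return [name for name, weight in layers.items()
            if relative_error(weight, quantize(weight, bits)) > threshold]


def build_adapters(weight, weight_q, rank):
    """Truncated SVD of the residual, split into two low-rank adapter matrices."""
    u, s, vt = np.linalg.svd(weight - weight_q, full_matrices=False)
    root = np.sqrt(s[:rank])
    first = vt[:rank] * root[:, None]   # shape (rank, in_features)
    second = u[:, :rank] * root         # shape (out_features, rank)
    return first, second


def forward(x, weight_q, first, second):
    """Output of the quantized linear layer plus the two adapter layers."""
    return x @ weight_q.T + (x @ first.T) @ second.T
```

With `rank` much smaller than the layer dimensions, the two adapters add only `rank * (in_features + out_features)` parameters per selected layer while recovering the largest components of the quantization error.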
