Publication No.: US20250021819A1
Publication Date: 2025-01-16
Application No.: US18900006
Filing Date: 2024-09-27
Applicant: Intel Corporation
Inventors: Vinay Joshi, Om Ji Omer, Prashant Laddha, Shambhavi Sinha
IPC: G06N3/086
Abstract: Systems, apparatus, articles of manufacture, and methods for quality and capacity-aware grouped query attention are disclosed. To accomplish such groupings, example instructions cause a machine to create a plurality of groups of query heads present in a key value cache using an evolutionary algorithm based on at least two objectives, quantify an amount of error introduced by a first group of query heads in the plurality of groups of query heads, and retain the query heads of the first group of query heads in a non-grouped arrangement when the error meets an error threshold.
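The last two steps of the abstract (quantifying the error a group introduces and keeping its heads non-grouped when that error meets a threshold) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the application: heads are represented as plain vectors, the error is the mean squared deviation from the group mean, "meets" is read as greater than or equal to, and the names `group_error` and `apply_error_threshold` are hypothetical. The proposed grouping is supplied as input; the evolutionary, multi-objective search that would produce it is not shown.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def group_error(heads: List[Vector]) -> float:
    """Assumed error metric: mean squared deviation of each head
    from the element-wise mean of its group."""
    dim = len(heads[0])
    mean = [sum(h[i] for h in heads) / len(heads) for i in range(dim)]
    total = sum((h[i] - mean[i]) ** 2 for h in heads for i in range(dim))
    return total / (len(heads) * dim)


def apply_error_threshold(
    heads: Dict[int, Vector],
    groups: List[List[int]],
    error_threshold: float,
) -> List[List[int]]:
    """Keep a proposed group when its error is below the threshold;
    otherwise return its heads as singletons (non-grouped)."""
    result: List[List[int]] = []
    for group in groups:
        err = group_error([heads[i] for i in group])
        if len(group) > 1 and err >= error_threshold:
            # Error meets the threshold: retain the heads ungrouped.
            result.extend([i] for i in group)
        else:
            result.append(list(group))
    return result


# Hypothetical per-head vectors and a proposed grouping (assumed to come
# from the search step, which is outside this sketch).
heads = {0: [1.0, 0.0], 1: [1.0, 0.2], 2: [0.0, 1.0], 3: [5.0, -3.0]}
proposed = [[0, 1], [2, 3]]
final_groups = apply_error_threshold(heads, proposed, error_threshold=0.5)
print(final_groups)  # [[0, 1], [2], [3]]
```

In this toy input, heads 0 and 1 are nearly identical, so their group is kept, while heads 2 and 3 differ enough that the group is dissolved and both heads stay non-grouped.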