SYSTEMS, METHOD, AND APPARATUS FOR QUALITY AND CAPACITY-AWARE GROUPED QUERY ATTENTION

    Publication No.: US20250021819A1

    Publication Date: 2025-01-16

    Application No.: US18900006

    Filing Date: 2024-09-27

    Abstract: Systems, apparatus, articles of manufacture, and methods for quality and capacity-aware grouped query attention are disclosed. Example instructions cause a machine to create a plurality of groups of query heads present in a key-value cache using an evolutionary algorithm based on at least two objectives, quantify an amount of error introduced by a first group of query heads in the plurality of groups of query heads, and retain the query heads of the first group of query heads in a non-grouped arrangement when the error meets an error threshold.
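
The error-threshold step described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names (`mean_pool`, `group_error`, `apply_grouping`), the use of mean pooling to merge the heads of a group, the mean-squared-deviation error metric, and the reading of "meets an error threshold" as "greater than or equal to". The evolutionary search that proposes the groups is not shown; the groups are simply passed in.

```python
from statistics import fmean


def mean_pool(heads):
    """Element-wise mean of equally sized head vectors (assumed merge rule)."""
    return [fmean(column) for column in zip(*heads)]


def group_error(heads):
    """Mean squared deviation of the heads from their pooled vector.

    This metric is an assumption; the abstract only says that an amount
    of error introduced by a group is quantified.
    """
    pooled = mean_pool(heads)
    return fmean((x - p) ** 2 for head in heads for x, p in zip(head, pooled))


def apply_grouping(heads, groups, error_threshold):
    """Merge each proposed group unless its error meets the threshold.

    heads:  list of per-head vectors (lists of floats).
    groups: list of lists of head indices, e.g. produced by some search.
    Returns a list of (head_indices, vector) entries. A group whose error
    is >= error_threshold is kept non-grouped, one entry per head.
    """
    entries = []
    for indices in groups:
        members = [heads[i] for i in indices]
        if group_error(members) >= error_threshold:
            # Too lossy: keep every head of this group separately.
            entries.extend(((i,), heads[i]) for i in indices)
        else:
            # Cheap to merge: one shared vector for the whole group.
            entries.append((tuple(indices), mean_pool(members)))
    return entries


# Hypothetical toy data: heads 0 and 1 are identical, heads 2 and 3 differ.
heads = [[1.0, 0.0], [1.0, 0.0], [0.0, 4.0], [4.0, 0.0]]
result = apply_grouping(heads, groups=[[0, 1], [2, 3]], error_threshold=0.5)
```

With this toy input, the first group merges into one shared entry while the second group is left non-grouped, so the result holds three entries instead of four. A lower threshold preserves more heads (favoring quality), and a higher one merges more groups (favoring capacity).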
