Resource-Efficient Attention in a Neural Network

    Publication No.: US20220318601A1

    Publication Date: 2022-10-06

    Application No.: US17221791

    Filing Date: 2021-04-03

    Abstract: Computing technology is described herein that provides an attention mechanism, implemented by a neural network, that generates attention information based on head-specific query information and shared key and value (KV) information, without computing head-specific key information and head-specific value information, and without caching the head-specific key information and the head-specific value information in memory. This manner of operation allows the computing technology to make efficient use of processing and memory resources. In some implementations, the attention mechanism is part of a decoder of an encoder-decoder system or part of a standalone decoder system. In some implementations, the computing technology leverages the attention information to generate synthesized text based on input text.
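
    Illustrative sketch (not part of the patent text): the abstract's central idea, per-head queries attending over a single shared set of keys and values, can be shown roughly as follows. This is a minimal pure-Python sketch under stated assumptions, not the claimed implementation: the function name `shared_kv_attention`, the helper names, the parameter names (`w_q_heads`, `w_k`, `w_v`), and the use of scaled dot-product scoring with a softmax are all hypothetical choices made for illustration.

```python
import math


def matvec(vec, mat):
    """Multiply a row vector (length d_in) by a matrix (d_in x d_out)."""
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shared_kv_attention(x, memory, w_q_heads, w_k, w_v):
    """Attend from one query position over `memory` with shared K and V.

    x          -- input vector for the current position
    memory     -- list of vectors to attend over
    w_q_heads  -- one query projection matrix per head (head-specific)
    w_k, w_v   -- a single key and value projection shared by all heads
    Returns one output vector per head.
    """
    # Assumption: keys and values are projected once and reused by every
    # head, so no per-head K or V is ever computed or cached.
    keys = [matvec(m, w_k) for m in memory]
    values = [matvec(m, w_v) for m in memory]
    scale = math.sqrt(len(keys[0]))

    outputs = []
    for w_q in w_q_heads:
        query = matvec(x, w_q)  # the only head-specific projection
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

    In this sketch the cached state is one key list and one value list regardless of the number of heads, whereas conventional multi-head attention would hold one pair per head. That difference is the kind of memory saving the abstract describes.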
