Apparatus and method for multicasting a cache line update using delayed refetch messages

    Publication No.: US11422809B2

    Publication date: 2022-08-23

    Application No.: US15930887

    Filing date: 2020-05-13

    Abstract: An apparatus and method for processing efficient multicast operation. For example, one embodiment of a processor comprises: a plurality of cores to execute instructions; a shared circuitry region to be shared by the plurality of cores; first cache management circuitry associated with the shared circuitry region to receive delayed prefetch messages from the cores, each delayed prefetch message comprising an address or portion thereof usable to identify a cache line; and a delayed prefetch manager comprising a plurality of entries, each entry associated with at least one of the delayed prefetch messages, the delayed prefetch manager to update one or more of the entries or generate a new entry in accordance with receipt of each new delayed prefetch message, wherein upon receiving a notification that a first cache line is being modified by a first core, the delayed prefetch manager is to transmit delayed prefetch response messages to one or more cores identified in a first entry associated with the first cache line.
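The bookkeeping described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuitry: the class name `DelayedPrefetchManager`, its method names, the choice to drop an entry once responses are sent, and the choice to skip the writing core are all hypothetical and are not taken from the patent.

```python
class DelayedPrefetchManager:
    """Toy model: records which cores asked for a cache line once it is modified."""

    def __init__(self):
        # One entry per cache-line address; each entry holds the requesting core IDs.
        self.entries = {}

    def receive_delayed_prefetch(self, core_id, line_addr):
        # Update the existing entry for this line, or generate a new entry.
        self.entries.setdefault(line_addr, set()).add(core_id)

    def notify_modified(self, line_addr, writer_core):
        # On a modification notice, build one response per core in the entry.
        # Assumption: the entry is consumed, and the writer is not sent a response.
        waiting = self.entries.pop(line_addr, set())
        return [(core, line_addr) for core in sorted(waiting) if core != writer_core]


manager = DelayedPrefetchManager()
manager.receive_delayed_prefetch(core_id=1, line_addr=0x40)
manager.receive_delayed_prefetch(core_id=2, line_addr=0x40)
responses = manager.notify_modified(line_addr=0x40, writer_core=0)
```

Here `responses` lists the (core, address) pairs that would receive the multicast update; a hardware implementation would send messages over the interconnect rather than return a list.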

    Systems for performing instructions to quickly convert and use tiles as 1D vectors

    Publication No.: US10990396B2

    Publication date: 2021-04-27

    Application No.: US16145066

    Filing date: 2018-09-27

    Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
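The move between a group of tile elements and a 1D vector can be sketched in software as follows. The function names `move_to_1d` and `move_from_1d`, the list-of-lists tile representation, and the row-major element order are assumptions made for illustration; the abstract does not specify an element ordering or any programming interface.

```python
def move_to_1d(tile, row0, col0, n_rows, n_cols):
    """Copy a rectangular group of tile elements into a flat list (row-major, assumed)."""
    return [tile[r][c]
            for r in range(row0, row0 + n_rows)
            for c in range(col0, col0 + n_cols)]


def move_from_1d(tile, vec, row0, col0, n_rows, n_cols):
    """Write a flat list back into a rectangular group of tile elements, in place."""
    if len(vec) != n_rows * n_cols:
        raise ValueError("vector length must match the size of the selected group")
    values = iter(vec)
    for r in range(row0, row0 + n_rows):
        for c in range(col0, col0 + n_cols):
            tile[r][c] = next(values)


tile = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
second_row = move_to_1d(tile, row0=1, col0=0, n_rows=1, n_cols=3)    # a full row
last_column = move_to_1d(tile, row0=0, col0=2, n_rows=3, n_cols=1)   # a full column
```

A row, part of a row, a column, or a rectangular sub-tile are all expressed here as the same rectangle with different bounds, which is one convenient way to model the groups the abstract lists.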

    Delayed prefetch manager to multicast an updated cache line to processor cores requesting the updated data

    Publication No.: US10664273B2

    Publication date: 2020-05-26

    Application No.: US15941958

    Filing date: 2018-03-30

    Abstract: An apparatus and method for processing efficient multicast operation. For example, one embodiment of a processor comprises: a plurality of cores to execute instructions; a shared circuitry region to be shared by the plurality of cores; first cache management circuitry associated with the shared circuitry region to receive delayed prefetch messages from the cores, each delayed prefetch message comprising an address or portion thereof usable to identify a cache line; and a delayed prefetch manager comprising a plurality of entries, each entry associated with at least one of the delayed prefetch messages, the delayed prefetch manager to update one or more of the entries or generate a new entry in accordance with receipt of each new delayed prefetch message, wherein upon receiving a notification that a first cache line is being modified by a first core, the delayed prefetch manager is to transmit delayed prefetch response messages to one or more cores identified in a first entry associated with the first cache line.

    Accelerator for processing data
    Granted patent

    Publication No.: US10509846B2

    Publication date: 2019-12-17

    Application No.: US15840552

    Filing date: 2017-12-13

    Inventors: Chen Koren, Dan Baum

    Abstract: An accelerator for increasing the processing speed of a processor. The accelerator operates in two distinct modes. In a first mode for dense layer processing, row data sets and column data sets are sent to a multiplier for multiplication. In a second mode for sparse layer processing compressed row data sets are received by a row multiplexer and compressed column data sets are received by a column multiplexer. Each multiplexer is configured to compare the indexes of data sets with one another to determine matching indexes. When indexes match, the matching data sets are selected and sent to the multiplier for multiplication. When indexes do not match, data sets are stored in memory devices for subsequent cycles.
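The two modes in this abstract can be modeled in software as a dense dot product and an index-matching sparse dot product. This is a rough sketch, not the hardware design: the names `dense_dot` and `sparse_dot`, the `(index, value)` pair encoding of compressed data, and the assumption that pairs are sorted by index are all hypothetical. The pointer that stays in place on a mismatch stands in for the data set that the abstract says is stored for subsequent cycles.

```python
def dense_dot(row, col):
    """Dense mode: every row element is multiplied with the matching column element."""
    return sum(r * c for r, c in zip(row, col))


def sparse_dot(row, col):
    """Sparse mode: multiply only elements whose indexes match.

    `row` and `col` are lists of (index, value) pairs sorted by index (assumed encoding).
    """
    i = j = 0
    total = 0.0
    while i < len(row) and j < len(col):
        row_index, row_value = row[i]
        col_index, col_value = col[j]
        if row_index == col_index:
            # Matching indexes: send both values to the multiplier.
            total += row_value * col_value
            i += 1
            j += 1
        elif row_index < col_index:
            # No match: hold the column element for a later comparison.
            i += 1
        else:
            # No match: hold the row element for a later comparison.
            j += 1
    return total


compressed_row = [(0, 2.0), (3, 4.0), (5, 1.0)]
compressed_col = [(3, 0.5), (4, 9.0), (5, 3.0)]
result = sparse_dot(compressed_row, compressed_col)  # indexes 3 and 5 match
```

Only the products at indexes 3 and 5 contribute to `result`; the elements at indexes 0 and 4 never reach the multiplier, which is where the sparse mode saves work.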

    ACCELERATOR FOR PROCESSING DATA
    Patent application

    Publication No.: US20190042538A1

    Publication date: 2019-02-07

    Application No.: US15840552

    Filing date: 2017-12-13

    Inventors: Chen Koren, Dan Baum

    Abstract: An accelerator for increasing the processing speed of a processor. The accelerator operates in two distinct modes. In a first mode for dense layer processing, row data sets and column data sets are sent to a multiplier for multiplication. In a second mode for sparse layer processing compressed row data sets are received by a row multiplexer and compressed column data sets are received by a column multiplexer. Each multiplexer is configured to compare the indexes of data sets with one another to determine matching indexes. When indexes match, the matching data sets are selected and sent to the multiplier for multiplication. When indexes do not match, data sets are stored in memory devices for subsequent cycles.
