METHOD AND APPARATUS FOR PERFORMING CONFLICT DETECTION

    Invention Application (Granted)

    Publication No.: US20160179528A1

    Publication Date: 2016-06-23

    Application No.: US14581607

    Application Date: 2014-12-23

    Abstract: An apparatus and method are described for performing conflict detection operations. For example, one embodiment of a processor comprises: a first source vector register to store a first set of data elements; a second source vector register to store a second set of data elements; conflict detection logic to perform a specified comparison operation comparing each of the first set of data elements with specified data elements from the second set and generating a set of comparison results, the comparison operation to be selected from a group consisting of a greater than comparison, a less than comparison, a greater than or equal to comparison, a less than or equal to comparison, and a not equal to comparison.
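
    The following is a minimal behavioral sketch, not the patented hardware: it models a vector comparison in which each element of a first source is compared against every element of a second source under a selectable predicate (greater than, less than, greater than or equal, less than or equal, not equal), packing the per-element results into bitmasks. The predicate names, the all-to-all comparison pattern, and the function name are illustrative assumptions.

    ```python
    # Software model of a selectable vector comparison; names are illustrative.
    import operator

    PREDICATES = {
        "gt": operator.gt,   # greater than
        "lt": operator.lt,   # less than
        "ge": operator.ge,   # greater than or equal to
        "le": operator.le,   # less than or equal to
        "ne": operator.ne,   # not equal to
    }

    def vector_compare(src1, src2, predicate):
        """Return one bitmask per src1 element; bit j is set when
        predicate(src1[i], src2[j]) holds."""
        cmp = PREDICATES[predicate]
        masks = []
        for a in src1:
            mask = 0
            for j, b in enumerate(src2):
                if cmp(a, b):
                    mask |= 1 << j
            masks.append(mask)
        return masks

    # Example: compare each element of src1 against all elements of src2
    # with the "not equal" predicate and print the resulting bitmasks.
    print(vector_compare([3, 7, 3, 1], [3, 5, 7, 1], "ne"))
    ```

    In a real vector ISA the bitmask per element would typically be written back as a destination vector element or a mask register; the sketch simply returns a Python list to keep the semantics visible.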

    APPARATUS AND METHOD FOR PROCESSING EFFICIENT MULTICAST OPERATION

    Publication No.: US20200272466A1

    Publication Date: 2020-08-27

    Application No.: US15930887

    Application Date: 2020-05-13

    Abstract: An apparatus and method for processing efficient multicast operation. For example, one embodiment of a processor comprises: a plurality of cores to execute instructions; a shared circuitry region to be shared by the plurality of cores; first cache management circuitry associated with the shared circuitry region to receive delayed prefetch messages from the cores, each delayed prefetch message comprising an address or portion thereof usable to identify a cache line; and a delayed prefetch manager comprising a plurality of entries, each entry associated with at least one of the delayed prefetch messages, the delayed prefetch manager to update one or more of the entries or generate a new entry in accordance with receipt of each new delayed prefetch message, wherein upon receiving a notification that a first cache line is being modified by a first core, the delayed prefetch manager is to transmit delayed prefetch response messages to one or more cores identified in a first entry associated with the first cache line.
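
    The sketch below is a simplified software model of the delayed prefetch manager described above, under the assumption that each entry simply records which cores registered interest in a cache line: cores send delayed prefetch messages keyed by line address, and when that line is modified the manager multicasts a response to the registered cores. The class and method names are illustrative, not taken from the patent.

    ```python
    # Simplified model of a delayed prefetch manager; names are illustrative.
    from collections import defaultdict

    class DelayedPrefetchManager:
        def __init__(self):
            # One entry per cache-line address: the set of cores that have
            # sent a delayed prefetch message for that line.
            self.entries = defaultdict(set)

        def receive_delayed_prefetch(self, core_id, line_addr):
            """Record that core_id wants the line at line_addr once it changes."""
            self.entries[line_addr].add(core_id)

        def notify_line_modified(self, line_addr, writer_core):
            """Return the cores that should receive a delayed prefetch response."""
            waiters = self.entries.pop(line_addr, set()) - {writer_core}
            return sorted(waiters)

    mgr = DelayedPrefetchManager()
    mgr.receive_delayed_prefetch(core_id=1, line_addr=0x1000)
    mgr.receive_delayed_prefetch(core_id=2, line_addr=0x1000)
    print(mgr.notify_line_modified(0x1000, writer_core=0))  # [1, 2]
    ```

    The multicast benefit comes from the single notification fanning out to every waiting core in the entry, instead of each core independently polling or re-fetching the modified line.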

    APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR LOADING DATA AND PADDING INTO A TILE OF A MATRIX OPERATIONS ACCELERATOR

    Publication No.: US20220100513A1

    Publication Date: 2022-03-31

    Application No.: US17134085

    Application Date: 2020-12-24

    Abstract: Systems, methods, and apparatuses relating to one or more instructions that load data into a tile register and pad a row (or column) with a pad value from a padding circuit are described. In one embodiment, a system includes a matrix operations accelerator circuit comprising a two-dimensional grid of processing elements, a tile register that represents a two-dimensional matrix coupled to the matrix operations accelerator circuit, and a coupling to a memory, a padding circuit coupled to the tile register, and a hardware processor core including a decoder, of the hardware processor core coupled to the matrix operations accelerator circuit, to decode a single instruction into a decoded single instruction, the single instruction comprising a first field that identifies the tile register, a second field that identifies data elements in the memory, and an opcode, the opcode to indicate an execution circuit of the hardware processor core is to cause a load of the data elements from the memory into the tile register and the padding circuit to pad a proper subset of elements of the tile register with a same value, and the execution circuit of the hardware processor core to execute the decoded single instruction according to the opcode.
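
    Below is a behavioral sketch of a "load tile with padding" operation under assumed semantics, not the architected instruction: a smaller block of data elements is copied from a flat memory buffer into a fixed-size two-dimensional tile, and every remaining position is filled with a single pad value. The function name and parameters are illustrative.

    ```python
    # Behavioral model of loading a tile and padding the rest; names are illustrative.
    def load_tile_with_pad(memory, base, src_rows, src_cols,
                           tile_rows, tile_cols, pad_value=0):
        # Start with a tile entirely filled with the pad value, then overwrite
        # the region covered by the source data.
        tile = [[pad_value] * tile_cols for _ in range(tile_rows)]
        for r in range(min(src_rows, tile_rows)):
            for c in range(min(src_cols, tile_cols)):
                tile[r][c] = memory[base + r * src_cols + c]
        return tile

    # Example: a 2x3 block loaded into a 4x4 tile; the remaining elements are padded with 0.
    mem = list(range(100))
    for row in load_tile_with_pad(mem, base=10, src_rows=2, src_cols=3,
                                  tile_rows=4, tile_cols=4):
        print(row)
    ```

    Performing the load and the padding in one instruction avoids a separate zeroing pass over the tile before partial rows or columns are written.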

    APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR 16-BIT FLOATING-POINT MATRIX DOT PRODUCT INSTRUCTIONS

    Publication No.: US20220100502A1

    Publication Date: 2022-03-31

    Application No.: US17134008

    Application Date: 2020-12-24

    Abstract: Systems, methods, and apparatuses relating to 16-bit floating-point matrix dot product instructions are described. In one embodiment, a processor includes fetch circuitry to fetch a single instruction having fields to specify an opcode and locations of a M by N destination matrix having single-precision elements, an M by K first source matrix, and a K by N second source matrix, the source matrices having elements that each comprise a pair of half-precision floating-point values, the opcode to indicate execution circuitry is to cause, for each element of the first source matrix and corresponding element of the second source matrix, a conversion of the half-precision floating-point values to single-precision values, a multiplication of converted single-precision values from first values of the pairs together to generate a first result, a multiplication of converted single-precision values from second values of the pairs together to generate a second result, and an accumulation of the first result and the second result with previous contents of a corresponding element of the destination matrix, decode circuitry to decode the fetched instruction, and the execution circuitry to respond to the decoded instruction as specified by the opcode.
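
    The following is a reference-model sketch of the arithmetic described above, not the micro-architecture: each source element is a pair of half-precision values that are widened to single precision, the first values of each pair are multiplied together and the second values of each pair are multiplied together, and both products are accumulated into a single-precision destination element. Python floats stand in for the FP16-to-FP32 conversion, and the function name is illustrative.

    ```python
    # Reference model of the pairwise 16-bit dot-product accumulation; names are illustrative.
    def tile_dp_fp16_pairs(dst, a, b):
        """dst: M x N accumulator; a: M x K of (lo, hi) pairs; b: K x N of (lo, hi) pairs."""
        M, K, N = len(a), len(a[0]), len(b[0])
        for i in range(M):
            for j in range(N):
                acc = dst[i][j]
                for k in range(K):
                    a_lo, a_hi = a[i][k]
                    b_lo, b_hi = b[k][j]
                    # Widen (conceptually FP16 -> FP32), multiply each half of
                    # the pair, then accumulate both products.
                    acc += float(a_lo) * float(b_lo) + float(a_hi) * float(b_hi)
                dst[i][j] = acc
        return dst

    # Example: 1x1 destination, K = 2, two half-precision values per source element.
    dst = [[0.0]]
    a = [[(1.0, 2.0), (3.0, 4.0)]]        # 1 x 2
    b = [[(0.5, 0.5)], [(1.0, 1.0)]]      # 2 x 1
    print(tile_dp_fp16_pairs(dst, a, b))  # [[8.5]]
    ```

    Packing two half-precision values per element effectively doubles the reduction depth per matrix element while keeping the accumulation in single precision.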

    APPARATUS AND METHOD FOR PROCESSING EFFICIENT MULTICAST OPERATION

    Publication No.: US20190303152A1

    Publication Date: 2019-10-03

    Application No.: US15941958

    Application Date: 2018-03-30

    Abstract: An apparatus and method for processing efficient multicast operation. For example, one embodiment of a processor comprises: a plurality of cores to execute instructions; a shared circuitry region to be shared by the plurality of cores; first cache management circuitry associated with the shared circuitry region to receive delayed prefetch messages from the cores, each delayed prefetch message comprising an address or portion thereof usable to identify a cache line; and a delayed prefetch manager comprising a plurality of entries, each entry associated with at least one of the delayed prefetch messages, the delayed prefetch manager to update one or more of the entries or generate a new entry in accordance with receipt of each new delayed prefetch message, wherein upon receiving a notification that a first cache line is being modified by a first core, the delayed prefetch manager is to transmit delayed prefetch response messages to one or more cores identified in a first entry associated with the first cache line.
