On-chip atomic transaction engine
    11.
    发明授权

    公开(公告)号:US12229422B2

    公开(公告)日:2025-02-18

    申请号:US18523335

    申请日:2023-11-29

    Abstract: A hardware-assisted Distributed Memory System may include software configurable shared memory regions in the local memory of each of multiple processor cores. Accesses to these shared memory regions may be made through a network of on-chip atomic transaction engine (ATE) instances, one per core, over a private interconnect matrix that connects them together. For example, each ATE instance may issue Remote Procedure Calls (RPCs), with or without responses, to an ATE instance associated with a remote processor core in order to perform operations that target memory locations controlled by the remote processor core. Each ATE instance may process RPCs (atomically) that are received from other ATE instances or that are generated locally. For some operation types, an ATE instance may execute the operations identified in the RPCs itself using dedicated hardware. For other operation types, the ATE instance may interrupt its local processor core to perform the operations.

    Distributed shared memory using interconnected atomic transaction engines at respective memory interfaces

    公开(公告)号:US10732865B2

    公开(公告)日:2020-08-04

    申请号:US14863354

    申请日:2015-09-23

    Abstract: A hardware-assisted Distributed Memory System may include software configurable shared memory regions in the local memory of each of multiple processor cores. Accesses to these shared memory regions may be made through a network of on-chip atomic transaction engine (ATE) instances, one per core, over a private interconnect matrix that connects them together. For example, each ATE instance may issue Remote Procedure Calls (RPCs), with or without responses, to an ATE instance associated with a remote processor core in order to perform operations that target memory locations controlled by the remote processor core. Each ATE instance may process RPCs (atomically) that are received from other ATE instances or that are generated locally. For some operation types, an ATE instance may execute the operations identified in the RPCs itself using dedicated hardware. For other operation types, the ATE instance may interrupt its local processor core to perform the operations.

    Bit vector gather row count calculation and handling in direct memory access engine

    公开(公告)号:US10725947B2

    公开(公告)日:2020-07-28

    申请号:US15364149

    申请日:2016-11-29

    Abstract: Techniques are described herein for efficient movement of data from a source memory to a destination memory. In an embodiment, in response to a particular memory location being pushed into a first register within a first register space, the first set of electronic circuits accesses a descriptor stored at the particular memory location. The descriptor indicates a width of a column of tabular data, a number of rows of tabular data, and one or more tabular data manipulation operations to perform on the column of tabular data. The descriptor also indicates a source memory location for accessing the tabular data and a destination memory location for storing data manipulation result from performing the one or more data manipulation operations on the tabular data. Based on the descriptor, the first set of electronic circuits determines control information indicating that the one or more data manipulation operations are to be performed on the tabular data and transmits the control information, using a hardware data channel, to a second set of electronic circuits to perform the one or more operations. Based on the control information, the second set of electronic circuits retrieve the tabular data from source memory location and apply the one or more data manipulation operations to generate the data manipulation result. The second set of electronic circuits cause the data manipulation result to be stored at the destination memory location. Among the data manipulation operations that can be described are a bit vector based gather operation. The second set of electronic circuits determine a row count, which is generated efficiently by the second set of electronic circuits and is used to perform a gather operation and other types of data manipulation operations more efficiently.

    Multicast copy ring for database direct memory access filtering engine

    公开(公告)号:US10459859B2

    公开(公告)日:2019-10-29

    申请号:US15362673

    申请日:2016-11-28

    Abstract: Techniques provide for hardware accelerated data movement between main memory and an on-chip data movement system that comprises multiple core processors that operate on the tabular data. The tabular data is moved to or from the scratch pad memories of the core processors. While the data is in-flight, the data may be manipulated by data manipulation operations. The data movement system includes multiple data movement engines, each dedicated to moving and transforming tabular data from main memory data to a subset of the core processors. Each data movement engine is coupled to an internal memory that stores data (e.g. a bit vector) that dictates how data manipulation operations are performed on tabular data moved from a main memory to the memories of a core processor, or to and from other memories. The internal memory of each data movement engine is private to the data movement engine. Tabular data is efficiently copied between internal memories of the data movement system via a copy ring that is coupled to the internal memories of the data movement system and/or is coupled to a data movement engine. Also, a data movement engine internally broadcasts data to other data movement engines, which then transfer the data to respective core processors. Partitioning may also be performed by the hardware of the data movement system. Techniques are used to partition data “in flight”. The data movement system also generates a column of row identifiers (RIDs). A row identifier is a number treated as identifying a row or element's position within a column. Row identifiers each identifying a row in column are also generated.

    Database tuple-encoding-aware data partitioning in a direct memory access engine

    公开(公告)号:US10061832B2

    公开(公告)日:2018-08-28

    申请号:US15362688

    申请日:2016-11-28

    Abstract: Techniques provide for hardware accelerated data movement between main memory and an on-chip data movement system that comprises multiple core processors that operate on the tabular data. The tabular data is moved to or from the scratch pad memories of the core processors. While the data is in-flight, the data may be manipulated by data manipulation operations. The data movement system includes multiple data movement engines, each dedicated to moving and transforming tabular data from main memory data to a subset of the core processors. Each data movement engine is coupled to an internal memory that stores data (e.g. a bit vector) that dictates how data manipulation operations are performed on tabular data moved from a main memory to the memories of a core processor, or to and from other memories. The internal memory of each data movement engine is private to the data movement engine. Tabular data is efficiently copied between internal memories of the data movement system via a copy ring that is coupled to the internal memories of the data movement system and/or is coupled to a data movement engine. Also, a data movement engine internally broadcasts data to other data movement engines, which then transfer the data to respective core processors. Partitioning may also be performed by the hardware of the data movement system. Techniques are used to partition data “in flight”. The data movement system also generates a column of row identifiers (RIDs). A row identifier is a number treated as identifying a row or element's position within a column. Row identifiers each identifying a row in column are also generated.

    On-chip Atomic Transaction Engine
    17.
    发明申请

    公开(公告)号:US20170083257A1

    公开(公告)日:2017-03-23

    申请号:US14863354

    申请日:2015-09-23

    Abstract: A hardware-assisted Distributed Memory System may include software configurable shared memory regions in the local memory of each of multiple processor cores. Accesses to these shared memory regions may be made through a network of on-chip atomic transaction engine (ATE) instances, one per core, over a private interconnect matrix that connects them together. For example, each ATE instance may issue Remote Procedure Calls (RPCs), with or without responses, to an ATE instance associated with a remote processor core in order to perform operations that target memory locations controlled by the remote processor core. Each ATE instance may process RPCs (atomically) that are received from other ATE instances or that are generated locally. For some operation types, an ATE instance may execute the operations identified in the RPCs itself using dedicated hardware. For other operation types, the ATE instance may interrupt its local processor core to perform the operations.

    On-Chip Atomic Transaction Engine
    18.
    发明申请

    公开(公告)号:US20250147681A1

    公开(公告)日:2025-05-08

    申请号:US19017209

    申请日:2025-01-10

    Abstract: A hardware-assisted Distributed Memory System may include software configurable shared memory regions in the local memory of each of multiple processor cores. Accesses to these shared memory regions may be made through a network of on-chip atomic transaction engine (ATE) instances, one per core, over a private interconnect matrix that connects them together. For example, each ATE instance may issue Remote Procedure Calls (RPCs), with or without responses, to an ATE instance associated with a remote processor core in order to perform operations that target memory locations controlled by the remote processor core. Each ATE instance may process RPCs (atomically) that are received from other ATE instances or that are generated locally. For some operation types, an ATE instance may execute the operations identified in the RPCs itself using dedicated hardware. For other operation types, the ATE instance may interrupt its local processor core to perform the operations.

    On-chip Atomic Transaction Engine
    19.
    发明申请

    公开(公告)号:US20220276794A1

    公开(公告)日:2022-09-01

    申请号:US17663280

    申请日:2022-05-13

    Abstract: A hardware-assisted Distributed Memory System may include software configurable shared memory regions in the local memory of each of multiple processor cores. Accesses to these shared memory regions may be made through a network of on-chip atomic transaction engine (ATE) instances, one per core, over a private interconnect matrix that connects them together. For example, each ATE instance may issue Remote Procedure Calls (RPCs), with or without responses, to an ATE instance associated with a remote processor core in order to perform operations that target memory locations controlled by the remote processor core. Each ATE instance may process RPCs (atomically) that are received from other ATE instances or that are generated locally. For some operation types, an ATE instance may execute the operations identified in the RPCs itself using dedicated hardware. For other operation types, the ATE instance may interrupt its local processor core to perform the operations.

    On-chip atomic transaction engine
    20.
    发明授权

    公开(公告)号:US11334262B2

    公开(公告)日:2022-05-17

    申请号:US16945521

    申请日:2020-07-31

    Abstract: A hardware-assisted Distributed Memory System may include software configurable shared memory regions in the local memory of each of multiple processor cores. Accesses to these shared memory regions may be made through a network of on-chip atomic transaction engine (ATE) instances, one per core, over a private interconnect matrix that connects them together. For example, each ATE instance may issue Remote Procedure Calls (RPCs), with or without responses, to an ATE instance associated with a remote processor core in order to perform operations that target memory locations controlled by the remote processor core. Each ATE instance may process RPCs (atomically) that are received from other ATE instances or that are generated locally. For some operation types, an ATE instance may execute the operations identified in the RPCs itself using dedicated hardware. For other operation types, the ATE instance may interrupt its local processor core to perform the operations.

Patent Agency Ranking