On-Chip Atomic Transaction Engine
    Published patent application

    Publication number: US20240111441A1

    Publication date: 2024-04-04

    Application number: US18523335

    Filing date: 2023-11-29

    Abstract: A hardware-assisted Distributed Memory System may include software configurable shared memory regions in the local memory of each of multiple processor cores. Accesses to these shared memory regions may be made through a network of on-chip atomic transaction engine (ATE) instances, one per core, over a private interconnect matrix that connects them together. For example, each ATE instance may issue Remote Procedure Calls (RPCs), with or without responses, to an ATE instance associated with a remote processor core in order to perform operations that target memory locations controlled by the remote processor core. Each ATE instance may process RPCs (atomically) that are received from other ATE instances or that are generated locally. For some operation types, an ATE instance may execute the operations identified in the RPCs itself using dedicated hardware. For other operation types, the ATE instance may interrupt its local processor core to perform the operations.
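The RPC flow in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `Ate`, `Rpc`, `HARDWARE_OPS`, and `core_handlers`, the specific operations (`read`, `write`, `add`, `max`), and the choice of which operations run "in hardware" are all hypothetical and do not come from the patent. Atomicity is modeled simply by finishing one queued RPC before starting the next.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Optional

# Hypothetical split: the abstract does not say which operation types run in
# dedicated hardware and which are handed to the local core.
HARDWARE_OPS = {"read", "write", "add"}


@dataclass
class Rpc:
    op: str
    addr: int
    value: int = 0
    wants_response: bool = True  # RPCs may be issued with or without a response


class Ate:
    """Toy model of one atomic transaction engine (ATE) instance."""

    def __init__(self, core_id: int, shared_words: int) -> None:
        self.core_id = core_id
        self.shared: List[int] = [0] * shared_words  # shared region of local memory
        self.queue: Deque[Rpc] = deque()             # RPCs from remote ATEs or the local core
        self.core_handlers: Dict[str, Callable[[List[int], Rpc], int]] = {}
        self.interrupts = 0                          # RPCs delegated to the core

    def submit(self, rpc: Rpc) -> None:
        self.queue.append(rpc)

    def step(self) -> Optional[int]:
        """Run exactly one queued RPC to completion before the next starts."""
        rpc = self.queue.popleft()
        if rpc.op in HARDWARE_OPS:
            result = self._execute_in_hardware(rpc)
        else:
            self.interrupts += 1  # stand-in for interrupting the local core
            result = self.core_handlers[rpc.op](self.shared, rpc)
        return result if rpc.wants_response else None

    def _execute_in_hardware(self, rpc: Rpc) -> int:
        if rpc.op == "read":
            return self.shared[rpc.addr]
        if rpc.op == "write":
            self.shared[rpc.addr] = rpc.value
            return rpc.value
        # "add": fetch-and-add that returns the previous value
        old = self.shared[rpc.addr]
        self.shared[rpc.addr] = old + rpc.value
        return old


def run_demo():
    owner = Ate(core_id=1, shared_words=4)

    def store_max(mem: List[int], rpc: Rpc) -> int:
        mem[rpc.addr] = max(mem[rpc.addr], rpc.value)
        return mem[rpc.addr]

    owner.core_handlers["max"] = store_max  # an op the core handles in software
    for rpc in (Rpc("add", 0, 5), Rpc("add", 0, 7, wants_response=False),
                Rpc("max", 0, 20), Rpc("read", 0)):
        owner.submit(rpc)
    responses = [owner.step() for _ in range(4)]
    return responses, owner.interrupts, owner.shared[0]
```

In this model a requesting core never touches the remote shared region directly; it only submits an `Rpc` to the ATE of the core that owns the target address, which mirrors the ownership rule described in the abstract.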

    PROCESSOR CORE TO COPROCESSOR INTERFACE WITH FIFO SEMANTICS

    Publication number: US20190324939A1

    Publication date: 2019-10-24

    Application number: US16457793

    Filing date: 2019-06-28

    Abstract: Techniques are provided for exchanging dedicated hardware signals to manage a first-in first-out (FIFO). In an embodiment, a first processor initiates content transfer into the FIFO. The first processor activates a first hardware signal that is reserved for indicating that content resides within the FIFO. A second processor activates a second hardware signal that is reserved for indicating that content is accepted. The second hardware signal causes the first hardware signal to be deactivated. This exchange of hardware signals demarcates a FIFO transaction, which is mediated by interface circuitry of the FIFO.
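The two-signal handshake described in this abstract can be sketched as a small state machine. The class and attribute names (`FifoInterface`, `content_present`, `content_accepted`) are hypothetical, and two behaviors are assumptions rather than statements from the patent: accepting removes the content from the FIFO, and the second signal is cleared when the next transaction begins.

```python
from collections import deque


class FifoInterface:
    """Toy model of the interface circuitry that mediates a FIFO transaction."""

    def __init__(self) -> None:
        self.slots = deque()
        self.content_present = False   # first signal: content resides in the FIFO
        self.content_accepted = False  # second signal: content was accepted

    def push(self, item) -> None:
        """First processor: transfer content, then raise the first signal."""
        if self.content_present:
            raise RuntimeError("previous transaction has not been accepted yet")
        self.slots.append(item)
        self.content_accepted = False  # assumption: cleared at the start of a transaction
        self.content_present = True

    def accept(self):
        """Second processor: raise the second signal, which drops the first."""
        if not self.content_present:
            raise RuntimeError("no content is pending")
        item = self.slots.popleft()    # assumption: accepting consumes the content
        self.content_accepted = True
        self.content_present = False   # the second signal deactivates the first
        return item
```

One transaction is therefore bracketed by `content_present` going high in `push` and going low again in `accept`.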

    On-chip atomic transaction engine

    Publication number: US11868628B2

    Publication date: 2024-01-09

    Application number: US17663280

    Filing date: 2022-05-13

    Abstract: A hardware-assisted Distributed Memory System may include software configurable shared memory regions in the local memory of each of multiple processor cores. Accesses to these shared memory regions may be made through a network of on-chip atomic transaction engine (ATE) instances, one per core, over a private interconnect matrix that connects them together. For example, each ATE instance may issue Remote Procedure Calls (RPCs), with or without responses, to an ATE instance associated with a remote processor core in order to perform operations that target memory locations controlled by the remote processor core. Each ATE instance may process RPCs (atomically) that are received from other ATE instances or that are generated locally. For some operation types, an ATE instance may execute the operations identified in the RPCs itself using dedicated hardware. For other operation types, the ATE instance may interrupt its local processor core to perform the operations.

    Run length encoding aware direct memory access filtering engine for scratchpad enabled multicore processors

    Publication number: US10055358B2

    Publication date: 2018-08-21

    Application number: US15074248

    Filing date: 2016-03-18

    Abstract: Techniques are described herein for efficient movement of data from a source memory to a destination memory. In an embodiment, in response to a particular memory location being pushed into a first register within a first register space, the first set of electronic circuits accesses a descriptor stored at the particular memory location. The descriptor indicates a width of a column of tabular data, a number of rows of tabular data, and one or more tabular data manipulation operations to perform on the column of tabular data. The descriptor also indicates a source memory location for accessing the tabular data and a destination memory location for storing data manipulation result from performing the one or more data manipulation operations on the tabular data. Based on the descriptor, the first set of electronic circuits determines control information indicating that the one or more data manipulation operations are to be performed on the tabular data and transmits the control information, using a hardware data channel, to a second set of electronic circuits to perform the one or more operations. Based on the control information, the second set of electronic circuits retrieve the tabular data from source memory location and apply the one or more data manipulation operations to generate the data manipulation result. The second set of electronic circuits cause the data manipulation result to be stored at the destination memory location.
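The descriptor-driven flow in this abstract can be approximated in software as follows. This is a minimal sketch, not the patented design: the `Descriptor` layout, the function names `first_circuit` and `second_circuit`, the little-endian encoding, and the two operations (`increment`, `keep_even`) are all hypothetical. A `SimpleQueue` stands in for the hardware data channel, and the control information is simplified to the descriptor itself.

```python
from dataclasses import dataclass
from queue import SimpleQueue
from typing import Dict, Tuple


@dataclass(frozen=True)
class Descriptor:
    column_width: int             # bytes per column value
    rows: int                     # number of rows to move
    operations: Tuple[str, ...]   # manipulation operations, applied in order
    source: int                   # source offset in memory
    destination: int              # destination offset in memory


# Hypothetical operations; the abstract does not enumerate them.
OPERATIONS = {
    "increment": lambda values: [v + 1 for v in values],
    "keep_even": lambda values: [v for v in values if v % 2 == 0],
}


def first_circuit(pushed_location: int, descriptors: Dict[int, Descriptor],
                  channel: SimpleQueue) -> None:
    """Fetch the descriptor at the pushed location and forward control info."""
    channel.put(descriptors[pushed_location])


def second_circuit(channel: SimpleQueue, memory: bytearray) -> int:
    """Read the column, apply the operations, store the result; return row count."""
    ctrl = channel.get()
    w = ctrl.column_width
    values = [
        int.from_bytes(memory[ctrl.source + i * w: ctrl.source + (i + 1) * w], "little")
        for i in range(ctrl.rows)
    ]
    for name in ctrl.operations:
        values = OPERATIONS[name](values)
    mask = (1 << (8 * w)) - 1  # wrap results to the column width
    for i, v in enumerate(values):
        start = ctrl.destination + i * w
        memory[start:start + w] = (v & mask).to_bytes(w, "little")
    return len(values)


def run_demo():
    memory = bytearray(32)
    for i, v in enumerate([1, 2, 3, 4]):          # 2-byte column at offset 0
        memory[i * 2:(i + 1) * 2] = v.to_bytes(2, "little")
    descriptors = {0x100: Descriptor(2, 4, ("increment", "keep_even"), 0, 16)}
    channel = SimpleQueue()
    first_circuit(0x100, descriptors, channel)    # "push" the descriptor location
    count = second_circuit(channel, memory)
    out = [int.from_bytes(memory[16 + i * 2:18 + i * 2], "little") for i in range(count)]
    return count, out
```

Splitting the work this way keeps descriptor parsing separate from data access, which is the division of labor the abstract draws between the first and second sets of circuits.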

    Dynamically configurable high performance database-aware hash engine

    Publication number: US10783102B2

    Publication date: 2020-09-22

    Application number: US15290357

    Filing date: 2016-10-11

    Abstract: Techniques are provided for configuring and operating hardware to sustain real-time hashing throughput. In an embodiment, during a first set of clock cycles, a particular amount of data items of a first data column are transferred into multiple hash lanes. During a second set of clock cycles, the same particular amount of data items of a second data column are transferred into the hash lanes. The transferred data items of the first and second data columns are then processed to calculate a set of hash values. When combined with techniques such as pipelining and horizontal scaling, the loading, hashing, and other processing occur in real time at the full speed of the underlying data path. For example, hashing throughput may sustainably equal or exceed the throughput of main memory.
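The two-phase loading of hash lanes can be sketched in software. The function name `hash_rows`, the default lane count, and the use of CRC-32 via `zlib.crc32` are assumptions made for illustration; the abstract does not name the hash function or the number of lanes. Each loop over a batch stands in for one "set of clock cycles".

```python
import zlib
from typing import List, Sequence


def hash_rows(col_a: Sequence[bytes], col_b: Sequence[bytes], lanes: int = 4) -> List[int]:
    """Hash rows made of two columns, loading one column at a time into lanes."""
    if len(col_a) != len(col_b):
        raise ValueError("both columns must have the same number of rows")
    hashes: List[int] = []
    for start in range(0, len(col_a), lanes):
        batch_a = col_a[start:start + lanes]
        batch_b = col_b[start:start + lanes]
        lane_buffers = [bytearray() for _ in batch_a]
        # First set of cycles: items of the first column enter the lanes.
        for lane, item in enumerate(batch_a):
            lane_buffers[lane] += item
        # Second set of cycles: the same number of items of the second column.
        for lane, item in enumerate(batch_b):
            lane_buffers[lane] += item
        # Each lane then hashes the row it has assembled.
        hashes.extend(zlib.crc32(bytes(buf)) for buf in lane_buffers)
    return hashes
```

Because each lane only ever sees its own row, the result does not depend on the lane count; adding lanes changes how many rows are processed per batch, not the hash values.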

    Row identification number generation in database direct memory access engine

    Publication number: US10176114B2

    Publication date: 2019-01-08

    Application number: US15362693

    Filing date: 2016-11-28

    Abstract: Techniques provide for hardware accelerated data movement between main memory and an on-chip data movement system that comprises multiple core processors that operate on the tabular data. The tabular data is moved to or from the scratch pad memories of the core processors. While the data is in-flight, the data may be manipulated by data manipulation operations. The data movement system includes multiple data movement engines, each dedicated to moving and transforming tabular data from main memory data to a subset of the core processors. Each data movement engine is coupled to an internal memory that stores data (e.g. a bit vector) that dictates how data manipulation operations are performed on tabular data moved from a main memory to the memories of a core processor, or to and from other memories. The internal memory of each data movement engine is private to the data movement engine. Tabular data is efficiently copied between internal memories of the data movement system via a copy ring that is coupled to the internal memories of the data movement system and/or is coupled to a data movement engine. Also, a data movement engine internally broadcasts data to other data movement engines, which then transfer the data to respective core processors. Partitioning may also be performed by the hardware of the data movement system. Techniques are used to partition data “in flight”. The data movement system also generates a column of row identifiers (RIDs). A row identifier is a number treated as identifying a row or element's position within a column. Row identifiers each identifying a row in column are also generated.
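Row identifier generation combined with in-flight partitioning can be illustrated with a short sketch. The function names and the partitioning rule (value modulo the number of cores) are hypothetical; the abstract states only that RIDs identify a row's position within a column and that data may be partitioned while it is being moved.

```python
from typing import List, Sequence, Tuple


def generate_rids(num_rows: int, start: int = 0) -> List[int]:
    """Produce a column of row identifiers: consecutive positions from `start`."""
    return list(range(start, start + num_rows))


def partition_with_rids(column: Sequence[int], num_cores: int,
                        start_rid: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Route each value to a core and keep its RID alongside it."""
    parts: List[Tuple[List[int], List[int]]] = [([], []) for _ in range(num_cores)]
    for rid, value in zip(generate_rids(len(column), start_rid), column):
        values, rids = parts[value % num_cores]  # hypothetical partitioning rule
        values.append(value)
        rids.append(rid)
    return parts
```

Carrying the RID with each value lets a core that receives only a partition map its rows back to their original positions in the source column.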

    Tuple encoding aware direct memory access engine for scratchpad enabled multicore processors

    Publication number: US10061714B2

    Publication date: 2018-08-28

    Application number: US15073905

    Filing date: 2016-03-18

    Abstract: Techniques are described herein for efficient movement of data from a source memory to a destination memory. In an embodiment, in response to a particular memory location being pushed into a first register within a first register space, the first set of electronic circuits accesses a descriptor stored at the particular memory location. The descriptor indicates a width of a column of tabular data, a number of rows of tabular data, and one or more tabular data manipulation operations to perform on the column of tabular data. The descriptor also indicates a source memory location for accessing the tabular data and a destination memory location for storing data manipulation result from performing the one or more data manipulation operations on the tabular data. Based on the descriptor, the first set of electronic circuits determines control information indicating that the one or more data manipulation operations are to be performed on the tabular data and transmits the control information, using a hardware data channel, to a second set of electronic circuits to perform the one or more operations. Based on the control information, the second set of electronic circuits retrieve the tabular data from source memory location and apply the one or more data manipulation operations to generate the data manipulation result. The second set of electronic circuits cause the data manipulation result to be stored at the destination memory location.

    Dynamically Configurable High Performance Database-Aware Hash Engine

    Publication number: US20180101530A1

    Publication date: 2018-04-12

    Application number: US15290357

    Filing date: 2016-10-11

    CPC classification numbers: G06F13/28, G06F11/1004

    Abstract: Techniques are provided for configuring and operating hardware to sustain real-time hashing throughput. In an embodiment, during a first set of clock cycles, a particular amount of data items of a first data column are transferred into multiple hash lanes. During a second set of clock cycles, the same particular amount of data items of a second data column are transferred into the hash lanes. The transferred data items of the first and second data columns are then processed to calculate a set of hash values. When combined with techniques such as pipelining and horizontal scaling, the loading, hashing, and other processing occur in real time at the full speed of the underlying data path. For example, hashing throughput may sustainably equal or exceed the throughput of main memory.
