Strong ordered transaction for DMA transfers

    Publication number: US12204757B1

    Publication date: 2025-01-21

    Application number: US18067514

    Application date: 2022-12-16

    Abstract: A technique for processing strong ordered transactions in a direct memory access engine may include retrieving a memory descriptor to perform a strong ordered transaction, and delaying the strong ordered transaction until pending write transactions associated with previous memory descriptors retrieved prior to the memory descriptor are complete. Subsequent transactions associated with memory descriptors following the memory descriptor are allowed to be issued while waiting for the pending write transactions to complete. Upon completion of the pending write transactions, the strong ordered transaction is performed.
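    A minimal Python sketch of the ordering rule described above (all names, such as Descriptor and DmaEngine, are illustrative, not the engine's actual interface): a strong ordered descriptor waits only for the writes that were pending when it was retrieved, while descriptors retrieved after it keep issuing.

```python
from dataclasses import dataclass

@dataclass
class Descriptor:
    desc_id: int
    strong_ordered: bool = False

class DmaEngine:
    def __init__(self):
        self.pending_writes = set()   # ids of writes still in flight
        self.deferred = []            # (descriptor, writes it must wait for)
        self.issued = []              # order in which transactions issued

    def retrieve(self, desc):
        if desc.strong_ordered and self.pending_writes:
            # Delay: snapshot the writes pending from earlier descriptors.
            self.deferred.append((desc, set(self.pending_writes)))
        else:
            self._issue(desc)

    def _issue(self, desc):
        self.issued.append(desc.desc_id)
        self.pending_writes.add(desc.desc_id)

    def write_complete(self, desc_id):
        self.pending_writes.discard(desc_id)
        still_waiting = []
        for desc, prereqs in self.deferred:
            prereqs.discard(desc_id)
            if prereqs:                      # some prerequisite writes remain
                still_waiting.append((desc, prereqs))
            else:                            # all prerequisites drained: issue
                self._issue(desc)
        self.deferred = still_waiting

engine = DmaEngine()
engine.retrieve(Descriptor(0))                       # ordinary write
engine.retrieve(Descriptor(1, strong_ordered=True))  # delayed behind write 0
engine.retrieve(Descriptor(2))                       # still issues meanwhile
engine.write_complete(0)                             # releases descriptor 1
print(engine.issued)                                 # [0, 2, 1]
```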

    Neural network processing based on subgraph recognition

    Publication number: US12093801B1

    Publication date: 2024-09-17

    Application number: US18142952

    Application date: 2023-05-03

    CPC classification number: G06N3/04 G06F9/30003 G06F9/4881 G06F16/9024

    Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier of the plurality of subgraph identifiers being associated with a subset of instructions of the plurality of executable instructions. The system further includes a compiler configured to: identify a computational subgraph from a computational graph of a neural network model; compute a subgraph identifier for the computational subgraph; based on whether the subgraph identifier is included in the plurality of subgraph identifiers, either obtain, from the database, first instructions associated with the subgraph identifier, or generate second instructions representing the computational subgraph; and provide the first instructions or the second instructions for execution by a neural network processor to perform computation operations for the neural network model.
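    The lookup-or-compile flow can be sketched as a small cache keyed by a deterministic subgraph identifier. The canonicalization and SHA-256 hash below are assumptions for illustration; the patent does not specify how the identifier is computed.

```python
import hashlib

instruction_db = {}   # subgraph identifier -> executable instructions

def subgraph_id(ops):
    """Deterministic identifier from a canonicalized list of operations."""
    canonical = "|".join(ops)   # assumes ops arrive in a canonical order
    return hashlib.sha256(canonical.encode()).hexdigest()

def compile_subgraph(ops):
    """Stand-in for the real instruction generator."""
    return [f"instr:{op}" for op in ops]

def instructions_for(ops):
    sid = subgraph_id(ops)
    if sid in instruction_db:
        return instruction_db[sid]   # first instructions: from the database
    instrs = compile_subgraph(ops)   # second instructions: freshly generated
    instruction_db[sid] = instrs     # cache for future models
    return instrs

a = instructions_for(["conv2d", "bias_add", "relu"])  # compiled, then stored
b = instructions_for(["conv2d", "bias_add", "relu"])  # served from the database
print(a is b)                                         # True: compilation skipped
```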

    Processing for multiple input data sets in a multi-layer neural network

    Publication number: US12067492B2

    Publication date: 2024-08-20

    Application number: US18144129

    Application date: 2023-05-05

    CPC classification number: G06N3/082 G06F3/0604 G06F3/0644 G06F3/0673 G06N3/045

    Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.
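    The point of the schedule is that a single engine is reconfigured per layer and interleaves two contexts rather than finishing one context first. A toy NumPy sketch, following the abstract's sequence literally; the layer shapes, weights, and ReLU stand-ins are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = {1: rng.standard_normal((4, 4)),   # layer -> weight matrix
           2: rng.standard_normal((4, 4)),
           3: rng.standard_normal((4, 1))}

class ComputeEngine:
    def configure(self, layer):
        self.w = weights[layer]              # reconfigure for one layer
    def run(self, x):
        return np.maximum(x @ self.w, 0.0)   # ReLU layer as a stand-in

engine = ComputeEngine()
ctx_a = rng.standard_normal((1, 4))   # first context (already past layer 1)
ctx_b = rng.standard_normal((1, 4))   # second context (raw input)

engine.configure(2)           # first configuration: layer 2
a_l2 = engine.run(ctx_a)      # first-context second-layer output
engine.configure(1)           # second configuration: layer 1
b_l1 = engine.run(ctx_b)      # second-context first-layer output
engine.configure(3)           # third configuration: layer 3
result_a, result_b = engine.run(a_l2), engine.run(b_l1)
print(result_a.shape, result_b.shape)   # (1, 1) (1, 1): one result per context
```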

    Low latency memory notification

    Publication number: US12056072B1

    Publication date: 2024-08-06

    Application number: US17457603

    Application date: 2021-12-03

    Abstract: Techniques to reduce the latency of data transfer notifications in a computing system are disclosed. The techniques can include receiving, at a memory, a first access request of a set of access requests associated with a data transfer. The first access request has a token and an access count indicating the number of access requests in the set of access requests. A counter is initiated to count the number of received access requests having the token. When additional access requests belonging to the set of access requests are received, the counter is incremented for each of the additional access requests being received. A notification is transmitted to an integrated circuit component in response to receiving the last access request of the set of access requests having the token to notify the integrated circuit component that the memory is ready for access.
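    A small sketch of the counting scheme, with illustrative names: each access carries a token plus the total access count, and the memory fires the notification as soon as the number of received accesses for that token reaches the total.

```python
class NotifyingMemory:
    def __init__(self, notify):
        self.notify = notify    # callback to the integrated circuit component
        self.cells = {}         # addr -> data
        self.counters = {}      # token -> (received, expected)

    def access(self, token, access_count, addr, data):
        self.cells[addr] = data
        received, expected = self.counters.get(token, (0, access_count))
        received += 1
        if received == expected:              # last access for this token
            self.counters.pop(token, None)
            self.notify(token)                # memory is ready for access
        else:
            self.counters[token] = (received, expected)

mem = NotifyingMemory(notify=lambda t: print(f"token {t}: data ready"))
for i in range(3):                            # one transfer, three writes
    mem.access(token=7, access_count=3, addr=0x1000 + 64 * i, data=b"...")
```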

    Reducing computations for data including padding

    Publication number: US11960566B1

    Publication date: 2024-04-16

    Application number: US17229742

    Application date: 2021-04-13

    CPC classification number: G06F17/16 G06N3/08

    Abstract: Systems and methods are provided to eliminate multiplication operations with zero padding data for convolution computations. A multiplication matrix is generated from an input feature map matrix with padding by adjusting coordinates and dimensions of the input feature map matrix to exclude padding data. The multiplication matrix is used to perform matrix multiplications with respective weight values which results in fewer computations as compared to matrix multiplications which include the zero padding data.
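    One way to picture this is a convolution loop that clips each receptive field to the valid input region and slices the kernel to match, so the zero-padding terms are never multiplied. This NumPy sketch illustrates the idea, not the patented matrix construction:

```python
import numpy as np

def conv2d_skip_padding(x, w, pad):
    h, wd = x.shape
    kh, kw = w.shape
    oh, ow = h + 2 * pad - kh + 1, wd + 2 * pad - kw + 1
    out = np.zeros((oh, ow))
    for oy in range(oh):
        for ox in range(ow):
            iy, ix = oy - pad, ox - pad           # window origin, padded coords
            y0, y1 = max(iy, 0), min(iy + kh, h)  # clip to the real input
            x0, x1 = max(ix, 0), min(ix + kw, wd)
            patch = x[y0:y1, x0:x1]
            kern = w[y0 - iy:y1 - iy, x0 - ix:x1 - ix]
            out[oy, ox] = np.sum(patch * kern)    # zero-padding terms skipped
    return out

x = np.arange(16.0).reshape(4, 4)
w = np.ones((3, 3))
padded = np.pad(x, 1)                             # conventional zero padding
ref = np.array([[np.sum(padded[i:i + 3, j:j + 3] * w) for j in range(4)]
                for i in range(4)])
print(np.allclose(conv2d_skip_padding(x, w, pad=1), ref))  # True
```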

    Input batching with serial dynamic memory access

    Publication number: US11875247B1

    Publication date: 2024-01-16

    Application number: US16905769

    Application date: 2020-06-18

    CPC classification number: G06N3/063 G06N3/08

    Abstract: An acceleration engine with multiple accelerators may share a common set of data that is used by each accelerator to perform computations on input data. The set of shared data can be loaded into the acceleration engine from an external memory. Instead of accessing the external memory multiple times to load the set of shared data into each accelerator, the external memory can be accessed once using direct memory access to load the set of shared data into the first accelerator. The set of shared data can then be serially loaded from one accelerator to the next accelerator in the acceleration engine using direct memory access. To achieve data parallelism and reduce computation time, a runtime driver may split the input data into data batches, and each accelerator can perform computations on a different batch of input data with the common set of shared data.
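    A toy model of the loading scheme, with invented names rather than the engine's real API: external memory is read once, the shared weights then hop from accelerator to accelerator, and a driver hands each accelerator its own slice of the input.

```python
import numpy as np

class Accelerator:
    def __init__(self, idx):
        self.idx = idx
        self.weights = None
    def dma_in(self, weights):
        self.weights = weights        # one DMA copy into local memory
    def compute(self, batch):
        return batch @ self.weights   # stand-in for model execution

external_memory = np.random.default_rng(0).standard_normal((8, 2))
accels = [Accelerator(i) for i in range(4)]

accels[0].dma_in(external_memory)        # single read of external memory
for prev, nxt in zip(accels, accels[1:]):
    nxt.dma_in(prev.weights)             # serial accelerator-to-accelerator DMA

inputs = np.random.default_rng(1).standard_normal((16, 8))
batches = np.split(inputs, len(accels))  # runtime driver splits the input
outputs = [a.compute(b) for a, b in zip(accels, batches)]
print(np.vstack(outputs).shape)          # (16, 2): data-parallel results
```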

    Executing sublayers of a fully-connected layer

    Publication number: US11868878B1

    Publication date: 2024-01-09

    Application number: US15934523

    Application date: 2018-03-23

    CPC classification number: G06N3/08 G06F18/2413 G06F18/2431 G06N5/046

    Abstract: Disclosed herein are techniques for implementing a large fully-connected layer in an artificial neural network. The large fully-connected layer is grouped into multiple fully-connected subnetworks. Each fully-connected subnetwork is configured to classify an object into an unknown class or a class in a subset of target classes. If the object is classified as the unknown class by a fully-connected subnetwork, a next fully-connected subnetwork may be used to further classify the object. In some embodiments, the fully-connected layer is grouped based on a ranking of target classes.
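    A sketch of the cascade with random stand-in weights: each subnetwork scores its subset of target classes plus an explicit "unknown" option, and later subnetworks run only when earlier ones decline to classify. The class names and feature size are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
FEATURES = 8

class FcSubnetwork:
    def __init__(self, class_subset):
        self.classes = class_subset + ["unknown"]
        self.w = rng.standard_normal((FEATURES, len(self.classes)))
    def classify(self, x):
        return self.classes[int(np.argmax(x @ self.w))]

# Subnetworks ordered by a ranking of target classes, most likely first.
cascade = [FcSubnetwork(["cat", "dog"]),
           FcSubnetwork(["car", "truck"]),
           FcSubnetwork(["tree", "house"])]

def classify(x):
    for stage in cascade:
        label = stage.classify(x)
        if label != "unknown":
            return label          # early exit: later sublayers never run
    return "unknown"

print(classify(rng.standard_normal(FEATURES)))
```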

    Processing for multiple input data sets

    Publication number: US11797853B2

    Publication date: 2023-10-24

    Application number: US17951084

    Application date: 2022-09-22

    CPC classification number: G06N3/082 G06F3/0604 G06F3/0644 G06F3/0673 G06N3/045

    Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.

    Error avoidance in memory device

    Publication number: US11704211B1

    Publication date: 2023-07-18

    Application number: US17643292

    Application date: 2021-12-08

    CPC classification number: G06F11/2094 G06F2201/82

    Abstract: Techniques for avoiding uncorrectable errors in a memory device can include detecting a correctable error pattern of a memory page of a memory device, and determining that the correctable error pattern of the memory page satisfies a page migration condition. Upon satisfying the page migration condition, write accesses to the memory page are prevented from reaching a memory controller of the memory device. The contents of the memory page are then migrated to a reserved page, and a mapping table is updated to replace accesses to the memory page with accesses to the reserved page.
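    The migration flow can be modeled with a remap table and a reserved-page pool. The error threshold, page size, and data structures below are assumptions for illustration only:

```python
MIGRATION_THRESHOLD = 3   # correctable errors before migrating (assumed)

class MemoryDevice:
    def __init__(self, pages, reserved):
        self.pages = {p: bytearray(64) for p in range(pages + reserved)}
        self.free_reserved = list(range(pages, pages + reserved))
        self.remap = {}            # original page -> reserved page
        self.err_count = {}        # page -> correctable errors observed
        self.write_blocked = set()

    def _resolve(self, page):
        return self.remap.get(page, page)

    def report_correctable_error(self, page):
        self.err_count[page] = self.err_count.get(page, 0) + 1
        if self.err_count[page] >= MIGRATION_THRESHOLD and page not in self.remap:
            self.write_blocked.add(page)               # hold writes off the controller
            target = self.free_reserved.pop(0)
            self.pages[target][:] = self.pages[page]   # migrate page contents
            self.remap[page] = target                  # update the mapping table
            self.write_blocked.discard(page)           # writes hit the new page now

    def write(self, page, data):
        assert page not in self.write_blocked, "write stalled during migration"
        self.pages[self._resolve(page)][:len(data)] = data

dev = MemoryDevice(pages=4, reserved=1)
dev.write(2, b"payload")
for _ in range(3):
    dev.report_correctable_error(2)   # pattern satisfies the migration condition
print(dev._resolve(2))                # 4: accesses now go to the reserved page
print(bytes(dev.pages[4][:7]))        # b'payload'
```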
