DATA PROCESSING APPARATUS AND METHODS FOR A TENSOR TRANSFORM OPERATION

    Publication No.: US20240345903A1

    Publication Date: 2024-10-17

    Application No.: US18298833

    Filing Date: 2023-04-11

    Applicant: Arm Limited

    IPC Class: G06F9/54

    CPC Class: G06F9/544

    Abstract: The present disclosure relates to a data processing apparatus for a processing resource to perform a transform operation on an input tensor for the processing resource, said input tensor being formed of a plurality of blocks, each block being a portion of said input tensor capable of being operated on independently of each other, said data processing apparatus comprising: communication circuitry to communicate with a control module and a shared storage of said processing resource; processing circuitry to perform said transform operation, said processing circuitry comprising sub-block processing circuitry and transformation circuitry; and a local storage to store transform operation output from said processing circuitry; wherein said communication circuitry is configured to: receive one or more transform parameters; read a first input sub-block from said shared storage, said first input sub-block being a portion of a first block of said input tensor corresponding to a processing unit of said processing circuitry; and write a first output sub-block to said shared storage, wherein said sub-block processing circuitry is configured to: divide said first block of said input tensor into one or more input sub-blocks capable of being operated on independently of each other based on said one or more transform parameters; and wherein said transformation circuitry is configured to: perform said transform operation on said first input sub-block based on said one or more transform parameters to generate said first output sub-block; and write said first output sub-block to said local storage.
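
    The abstract above describes dividing a block of the input tensor into independently processable sub-blocks, transforming each sub-block according to transform parameters, staging results in local storage, and writing them back to shared storage. The following Python sketch illustrates that flow under stated assumptions; the names (TransformParams, split_into_sub_blocks, process_block), the transpose used as the example transform, and the dictionaries standing in for shared and local storage are illustrative and not taken from the patent.

        # Minimal sketch of the sub-block flow in the abstract; all names are illustrative.
        import numpy as np
        from dataclasses import dataclass

        @dataclass
        class TransformParams:
            sub_block_shape: tuple   # processing-unit size used to split a block
            permute_axes: tuple      # example transform: an axis permutation

        def split_into_sub_blocks(block, sub_shape):
            """Divide a block into sub-blocks that can be operated on independently."""
            rows, cols = block.shape
            sub_rows, sub_cols = sub_shape
            for r in range(0, rows, sub_rows):
                for c in range(0, cols, sub_cols):
                    yield (r, c), block[r:r + sub_rows, c:c + sub_cols]

        def process_block(shared_storage, block_key, params, local_storage):
            """Read sub-blocks from shared storage, transform them, stage outputs locally, write back."""
            block = shared_storage[block_key]                  # read via the communication path
            for origin, sub in split_into_sub_blocks(block, params.sub_block_shape):
                out = np.transpose(sub, params.permute_axes)   # the transform operation itself
                local_storage[(block_key, origin)] = out       # write to local storage first
            for (key, origin), out in local_storage.items():   # then write results to shared storage
                shared_storage[("out", key, origin)] = out

        shared = {"block0": np.arange(16).reshape(4, 4)}
        process_block(shared, "block0", TransformParams((2, 2), (1, 0)), local_storage={})
        print(shared[("out", "block0", (0, 0))])               # transposed 2x2 sub-block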

    COMMAND PROCESSOR, NEURAL PROCESSING SYSTEM AND METHOD FOR TRANSMITTING DATA THEREOF

    Publication No.: US20240330085A1

    Publication Date: 2024-10-03

    Application No.: US18621895

    Filing Date: 2024-03-29

    Applicant: REBELLIONS INC.

    Inventor: Hongyun Kim

    IPC Class: G06F9/48 G06F9/30 G06F15/173

    Abstract: An apparatus comprising neural processors, a command processor, and a shared memory. The command processor receives, from a host system, a context start signal indicating the start of a context of a neural network model. The command processor determines, based on the context start signal, whether the neural network model data is entirely or partially updated, and updates the neural network model data in the shared memory based on that determination. The command processor then generates a plurality of task descriptors describing neural network model tasks based on the neural network model data, and distributes the plurality of task descriptors to the neural processors.
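
    As a rough illustration of the control flow described above, the sketch below models a command processor that applies a full or partial update to the model data held in shared memory when a context start signal arrives, then builds task descriptors and distributes them round-robin to the neural processors' queues. All class names, fields, and the round-robin policy are assumptions made for this example, not details from the patent.

        # Hedged sketch; ContextStartSignal, TaskDescriptor and CommandProcessor are illustrative names.
        from dataclasses import dataclass
        from itertools import cycle

        @dataclass
        class ContextStartSignal:
            model_id: str
            updated_layers: dict          # layer name -> new weights; empty means no change
            full_update: bool = False

        @dataclass
        class TaskDescriptor:
            model_id: str
            layer: str

        class CommandProcessor:
            def __init__(self, neural_processor_queues, shared_memory):
                self.neural_processor_queues = neural_processor_queues
                self.shared_memory = shared_memory            # model_id -> {layer: weights}

            def on_context_start(self, signal):
                # Decide between a full and a partial update of the model data in shared memory.
                if signal.full_update:
                    self.shared_memory[signal.model_id] = dict(signal.updated_layers)
                elif signal.updated_layers:
                    self.shared_memory.setdefault(signal.model_id, {}).update(signal.updated_layers)
                # Generate one task descriptor per layer and distribute them round-robin.
                descriptors = [TaskDescriptor(signal.model_id, layer)
                               for layer in self.shared_memory.get(signal.model_id, {})]
                for descriptor, queue in zip(descriptors, cycle(self.neural_processor_queues)):
                    queue.append(descriptor)

        queues = [[], []]                                     # two neural processors' task queues
        cp = CommandProcessor(queues, shared_memory={})
        cp.on_context_start(ContextStartSignal("model_a", {"conv1": [0.1], "fc": [0.2]}, full_update=True))
        print([len(q) for q in queues])                       # the tasks are spread across processors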

    ISOLATING COMMUNICATION STREAMS TO ACHIEVE HIGH PERFORMANCE MULTI-THREADED COMMUNICATION FOR GLOBAL ADDRESS SPACE PROGRAMS

    Publication No.: US20240330084A1

    Publication Date: 2024-10-03

    Application No.: US18525553

    Filing Date: 2023-11-30

    Applicant: Intel Corporation

    IPC Class: G06F9/54 G06F9/52

    CPC Class: G06F9/544 G06F9/52

    Abstract: Systems, apparatuses and methods may provide for detecting an outbound communication and identifying a context of the outbound communication. Additionally, a completion status of the outbound communication may be tracked relative to the context. In one example, tracking the completion status includes incrementing a sent messages counter associated with the context in response to the outbound communication, detecting an acknowledgement of the outbound communication based on a network response to the outbound communication, incrementing a received acknowledgements counter associated with the context in response to the acknowledgement, comparing the sent messages counter to the received acknowledgements counter, and triggering a per-context memory ordering operation if the sent messages counter and the received acknowledgements counter have matching values.
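
    The per-context tracking described above can be pictured as a pair of counters per context: one counting sent messages, one counting received acknowledgements, with a per-context memory ordering step triggered when the two match. The minimal sketch below illustrates that bookkeeping; the class and method names are illustrative, and the memory ordering operation is reduced to a placeholder.

        # Illustrative per-context completion tracking; names are not taken from the patent.
        from collections import defaultdict

        class ContextTracker:
            """Tracks sent messages versus received acknowledgements per communication context."""
            def __init__(self):
                self.sent = defaultdict(int)
                self.acked = defaultdict(int)

            def on_outbound(self, context):
                # Increment the sent-messages counter associated with this context.
                self.sent[context] += 1

            def on_acknowledgement(self, context):
                # Increment the received-acknowledgements counter, then compare the two counters.
                self.acked[context] += 1
                if self.sent[context] == self.acked[context]:
                    self.memory_ordering_fence(context)

            def memory_ordering_fence(self, context):
                # Placeholder for the per-context memory ordering operation.
                print(f"context {context}: all outbound messages complete, ordering enforced")

        tracker = ContextTracker()
        tracker.on_outbound("thread-0")
        tracker.on_outbound("thread-0")
        tracker.on_acknowledgement("thread-0")   # 1 of 2 acknowledged, no fence yet
        tracker.on_acknowledgement("thread-0")   # counters match, fence is triggered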

    COMMAND PROCESSOR, NEURAL CORE SOC AND METHOD FOR OBTAINING CONTEXT DATA USING THE SAME

    Publication No.: US20240330041A1

    Publication Date: 2024-10-03

    Application No.: US18621936

    Filing Date: 2024-03-29

    Applicant: REBELLIONS INC.

    IPC Class: G06F9/48 G06F9/54 G06F12/0831

    Abstract: A command processor determines whether a command descriptor describing a current command is in a first format or in a second format, wherein the first format includes a source memory address pointing to a memory area in a shared memory having a binary code to be accessed according to a direct memory access (DMA) scheme, and the second format includes one or more object indices, a respective one of the one or more object indices indicating an object in an object database. If the command descriptor describing the current command is in the second format, the command processor converts the format of the command descriptor to the first format, generates one or more task descriptors describing neural network model tasks based on the command descriptor in the first format, and distributes the one or more task descriptors to the one or more neural processors.
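
    A hedged sketch of the two descriptor formats and the conversion step described above follows; the field names, the object database layout, and the distribution policy are assumptions for illustration only.

        # Illustrative model of first-format vs. second-format command descriptors.
        from dataclasses import dataclass
        from typing import List, Optional

        @dataclass
        class CommandDescriptor:
            # First format: a source memory address pointing at a binary code in shared memory (DMA-style).
            source_address: Optional[int] = None
            # Second format: indices into an object database instead of a raw address.
            object_indices: Optional[List[int]] = None

        @dataclass
        class TaskDescriptor:
            binary_address: int

        def handle_command(descriptor, object_db, neural_processor_queues):
            # If the descriptor is in the second format, convert it to the first format
            # by resolving its object indices against the object database.
            if descriptor.source_address is None:
                addresses = [object_db[i]["address"] for i in descriptor.object_indices]
                descriptor = CommandDescriptor(source_address=addresses[0])
            # Generate task descriptors from the first-format descriptor and distribute them.
            task = TaskDescriptor(binary_address=descriptor.source_address)
            for queue in neural_processor_queues:
                queue.append(task)

        object_db = {3: {"address": 0x4000}}
        queues = [[], []]
        handle_command(CommandDescriptor(object_indices=[3]), object_db, queues)
        print(hex(queues[0][0].binary_address))   # 0x4000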

    WAVE THROTTLING BASED ON A PARAMETER BUFFER

    Publication No.: US20240320777A1

    Publication Date: 2024-09-26

    Application No.: US18434319

    Filing Date: 2024-02-06

    IPC Class: G06T1/20 G06F9/54

    CPC Class: G06T1/20 G06F9/542 G06F9/544

    Abstract: A graphics pipeline includes a first shader that generates first wave groups, a shader processor input (SPI) that launches the first wave groups for execution by shaders, and a scan converter that generates second waves for execution on the shaders based on results of processing the first wave groups on the one or more shaders. The first wave groups are selectively throttled based on a comparison of in-flight first wave groups and second waves pending execution on the at least one second shader. A cache holds information that is written to the cache in response to the first wave groups finishing execution on the shaders. Information is read from the cache in response to read requests issued by the second waves. In some cases, the first wave groups are selectively throttled by comparing how many first wave groups are in flight and how many read requests to the cache are pending.
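
    The throttling decision described above can be modeled as a comparison between the number of in-flight first wave groups and the number of cache read requests still pending from second waves, with launches of new first wave groups stopping once the gap reaches a threshold. The threshold, names, and exact comparison in the sketch below are assumptions rather than values from the patent.

        # Illustrative model of the wave throttling decision; not the actual SPI logic.
        class WaveThrottler:
            """Decides whether another first wave group may be launched."""
            def __init__(self, max_outstanding):
                self.max_outstanding = max_outstanding
                self.in_flight_first_wave_groups = 0
                self.pending_cache_reads = 0       # reads issued by second waves, not yet served

            def may_launch_first_wave_group(self):
                # Throttle when too many first wave groups are in flight relative to the
                # read requests the second waves still have pending against the cache.
                outstanding = self.in_flight_first_wave_groups - self.pending_cache_reads
                return outstanding < self.max_outstanding

            def launch_first_wave_group(self):
                self.in_flight_first_wave_groups += 1

            def second_wave_read(self):
                self.pending_cache_reads += 1

        throttler = WaveThrottler(max_outstanding=2)
        while throttler.may_launch_first_wave_group():
            throttler.launch_first_wave_group()
        print(throttler.in_flight_first_wave_groups)   # launches stop at the threshold (2)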

    Tracking of multiple objects in cooperation with multiple neural networks

    Publication No.: US12086993B2

    Publication Date: 2024-09-10

    Application No.: US17696024

    Filing Date: 2022-03-16

    Applicant: Robert Bosch GmbH

    IPC Class: G06T7/20 G06F9/54 G06N3/045

    Abstract: A method for tracking and/or characterizing multiple objects in a sequence of images. The method includes: assigning a neural network to each object to be tracked; providing a memory shared by all neural networks, and designed to map an address vector of address components, via differentiable operations, onto one or multiple memory locations, and to read data from these memory locations or write data into these memory locations; supplying images from the sequence, and/or details of these images, to each neural network; during the processing of each image and/or image detail by one of the neural networks, generating an address vector from at least one processing product of this neural network; based on this address vector, writing at least one further processing product of the neural network into the shared memory, and/or reading out data from this shared memory and further processing the data by the neural network.
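
    The shared, differentiably addressed memory described above resembles soft attention over memory slots: an address vector is mapped onto one or more locations through a softmax over key similarities rather than a hard index. The sketch below illustrates one way such reads and writes could look; the memory size, the similarity measure, and the random vectors standing in for each network's processing products are all assumptions.

        # Illustrative soft-addressed shared memory; not the patent's implementation.
        import numpy as np

        class SharedMemory:
            def __init__(self, slots, width):
                self.keys = np.random.randn(slots, width)
                self.values = np.zeros((slots, width))

            def _weights(self, address_vector):
                # Differentiable mapping of an address vector onto memory locations:
                # a softmax over key similarities instead of a hard index.
                scores = self.keys @ address_vector
                scores -= scores.max()
                weights = np.exp(scores)
                return weights / weights.sum()

            def write(self, address_vector, data):
                weights = self._weights(address_vector)
                self.values += np.outer(weights, data)     # blend the write across addressed slots

            def read(self, address_vector):
                return self._weights(address_vector) @ self.values

        memory = SharedMemory(slots=8, width=4)
        for _ in range(3):                                  # one write/read per tracked object
            address = np.random.randn(4)                    # stands in for a network's processing product
            memory.write(address, data=np.random.randn(4))
            _ = memory.read(address)                        # data read back for further processing
        print(memory.values.shape)                          # (8, 4)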