METHOD AND SYSTEM FOR KEY DISTRIBUTION AND EXCHANGE FOR DATA PROCESSING ACCELERATORS

    Publication No.: US20210176035A1

    Publication Date: 2021-06-10

    Application No.: US16315998

    Filing Date: 2019-01-04

    Abstract: According to one embodiment, a system receives, at a host system from a data processing (DP) accelerator, an accelerator identifier (ID) that uniquely identifies the DP accelerator, wherein the host system is coupled to the DP accelerator over a bus. The system transmits the accelerator ID to a predetermined trusted server over a network. The system receives a certificate from the predetermined trusted server over the network, the certificate certifying the DP accelerator. The system extracts a public root key (PK_RK) from the certificate for verification, the PK_RK corresponding to a private root key (SK_RK) associated with the DP accelerator. The system establishes a secure channel with the DP accelerator using the PK_RK based on the verification to exchange data securely between the host system and the DP accelerator.
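
    The host-side flow described above amounts to: read the accelerator ID over the bus, exchange it for a certificate from the trusted server, verify the certificate, extract PK_RK, and then bootstrap a secure channel. The Python sketch below illustrates that sequence under stated assumptions: the names (TrustedServer, host_establish_channel, etc.) are hypothetical, and an HMAC stands in for the trusted server's real asymmetric signature so the example stays self-contained; it is not the patented implementation.

    import hashlib
    import hmac
    import os
    from dataclasses import dataclass


    @dataclass
    class Certificate:
        accelerator_id: str
        pk_rk: bytes      # public root key; the private half (SK_RK) stays on the accelerator
        signature: bytes  # trusted server's signature over (accelerator_id, pk_rk)


    class TrustedServer:
        """Stand-in for the predetermined trusted server reachable over the network."""

        def __init__(self, signing_key: bytes, registry: dict):
            self._key = signing_key
            self._registry = registry  # accelerator_id -> PK_RK

        def request_certificate(self, accelerator_id: str) -> Certificate:
            pk_rk = self._registry[accelerator_id]
            mac = hmac.new(self._key, accelerator_id.encode() + pk_rk, hashlib.sha256)
            return Certificate(accelerator_id, pk_rk, mac.digest())


    def host_establish_channel(accelerator_id: str, server: TrustedServer,
                               server_key: bytes) -> bytes:
        """Host side: forward the accelerator ID, verify the returned certificate,
        and extract PK_RK for the subsequent secure-channel handshake."""
        cert = server.request_certificate(accelerator_id)
        expected = hmac.new(server_key, cert.accelerator_id.encode() + cert.pk_rk,
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, cert.signature):
            raise ValueError("certificate verification failed")
        if cert.accelerator_id != accelerator_id:
            raise ValueError("certificate does not match the reported accelerator ID")
        # A real host would now run a PK_RK-authenticated key exchange with the
        # accelerator over the bus; the sketch just returns the verified key.
        return cert.pk_rk


    if __name__ == "__main__":
        key, pk = os.urandom(32), os.urandom(32)
        server = TrustedServer(key, {"accel-001": pk})
        assert host_establish_channel("accel-001", server, key) == pk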

    METHOD AND SYSTEM FOR MANAGING MEMORY OF DATA PROCESSING ACCELERATORS

    Publication No.: US20210173934A1

    Publication Date: 2021-06-10

    Application No.: US16315957

    Filing Date: 2019-01-04

    IPC Classification: G06F21/57 G06F21/53 G06F9/50

    Abstract: According to one embodiment, a system performs a secure boot using a security module such as a trusted platform module (TPM) of a host system. The system establishes a trusted execution environment (TEE) associated with one or more processors of the host system. The system launches a memory manager within the TEE, where the memory manager is configured to manage memory resources of a data processing (DP) accelerator coupled to the host system over a bus, including maintaining memory usage information of the global memory of the DP accelerator. In response to a request received from an application running within the TEE for accessing a memory location of the DP accelerator, the system allows or denies the request based on the memory usage information.
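
    A rough sketch of the allow/deny decision described above: a memory manager (assumed here to run inside the TEE after the secure boot) records which application owns which region of the accelerator's global memory and grants access requests only to the owner. The class and method names are illustrative, and the first-fit allocator is a simplification for the example.

    from dataclasses import dataclass


    @dataclass
    class Region:
        owner: str   # application identifier inside the TEE
        start: int   # offset into the DP accelerator's global memory
        size: int


    class AcceleratorMemoryManager:
        """Tracks per-application ownership of accelerator global memory (the
        'memory usage information') and arbitrates access requests."""

        def __init__(self, global_memory_size: int):
            self.global_memory_size = global_memory_size
            self.regions = []   # list of Region records, i.e. the memory usage information

        def allocate(self, app_id: str, size: int) -> Region:
            # Naive first-fit allocation over the accelerator's global memory.
            offset = 0
            for r in sorted(self.regions, key=lambda r: r.start):
                if offset + size <= r.start:
                    break
                offset = r.start + r.size
            if offset + size > self.global_memory_size:
                raise MemoryError("accelerator global memory exhausted")
            region = Region(app_id, offset, size)
            self.regions.append(region)
            return region

        def check_access(self, app_id: str, address: int) -> bool:
            """Allow the request only if the address lies in a region owned by app_id."""
            for r in self.regions:
                if r.start <= address < r.start + r.size:
                    return r.owner == app_id
            return False  # requests to unallocated addresses are denied


    if __name__ == "__main__":
        mm = AcceleratorMemoryManager(global_memory_size=1 << 20)
        r = mm.allocate("app_a", 4096)
        assert mm.check_access("app_a", r.start)
        assert not mm.check_access("app_b", r.start)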

    A DATA PROCESSING ACCELERATOR HAVING A LOCAL TIME UNIT TO GENERATE TIMESTAMPS

    Publication No.: US20210173428A1

    Publication Date: 2021-06-10

    Application No.: US16315924

    Filing Date: 2019-01-04

    Abstract: According to one embodiment, a DP accelerator includes one or more execution units (EUs) configured to perform data processing operations in response to an instruction received from a host system coupled over a bus. The DP accelerator includes a security unit (SU) configured to establish and maintain a secure channel with the host system to exchange commands and data associated with the data processing operations. The DP accelerator includes a time unit (TU) coupled to the security unit to provide timestamp services to the security unit, where the time unit includes a clock generator to generate clock signals locally without having to derive the clock signals from an external source. The TU includes a timestamp generator coupled to the clock generator to generate a timestamp based on the clock signals, and a power supply to provide power to the clock generator and the timestamp generator.
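
    The abstract describes hardware (a clock generator, a timestamp generator, and a dedicated power supply), so code can only model the behaviour. The sketch below is a software analogue in which the host's monotonic clock stands in for the local oscillator; the tick rate, class names, and session-key expiry use case are assumptions for illustration only.

    import time
    from typing import Optional


    class LocalClockGenerator:
        """Models a free-running on-accelerator oscillator; here the host's
        monotonic clock stands in for the dedicated local clock source."""

        def __init__(self, tick_hz: int = 1_000_000):
            self.tick_hz = tick_hz
            self._origin = time.monotonic_ns()

        def ticks(self) -> int:
            elapsed_ns = time.monotonic_ns() - self._origin
            return (elapsed_ns * self.tick_hz) // 1_000_000_000


    class TimestampGenerator:
        """Derives timestamps purely from the local clock signals."""

        def __init__(self, clock: LocalClockGenerator):
            self._clock = clock

        def timestamp(self) -> int:
            return self._clock.ticks()


    class SecurityUnit:
        """Example consumer of TU timestamps: expire a session key after
        `lifetime` ticks without consulting the host's clock."""

        def __init__(self, tsg: TimestampGenerator, lifetime: int):
            self._tsg = tsg
            self._lifetime = lifetime
            self._key_issued_at: Optional[int] = None

        def issue_session_key(self) -> None:
            self._key_issued_at = self._tsg.timestamp()

        def session_key_valid(self) -> bool:
            if self._key_issued_at is None:
                return False
            return self._tsg.timestamp() - self._key_issued_at < self._lifetime


    if __name__ == "__main__":
        su = SecurityUnit(TimestampGenerator(LocalClockGenerator()), lifetime=5_000_000)
        su.issue_session_key()
        print("session key valid:", su.session_key_valid())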

    METHOD AND SYSTEM FOR PROVIDING SECURE COMMUNICATIONS BETWEEN A HOST SYSTEM AND A DATA PROCESSING ACCELERATOR

    Publication No.: US20200218821A1

    Publication Date: 2020-07-09

    Application No.: US16751665

    Filing Date: 2020-01-24

    IPC Classification: G06F21/62 H04L29/06 G06F13/16

    Abstract: According to one embodiment, a system establishes a secure connection between a host system and a data processing (DP) accelerator over a bus, the secure connection including one or more data channels. The system transmits a first instruction from the host system to the DP accelerator over a command channel, the first instruction requesting the DP accelerator to perform a data preparation operation. The system receives, from the DP accelerator over one of the data channels, a first request to read first data from a first memory location of the host system. In response to the request, the system transmits the first data to the DP accelerator over the data channel, where the first data is utilized for a computation or a configuration operation. The system transmits a second instruction from the host system to the DP accelerator over the command channel to perform the computation or the configuration operation.
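
    The sketch below mimics the message flow in the abstract with in-process queues standing in for the command channel and the data channel; the message fields ("PREPARE_DATA", "READ", "RUN", etc.) are invented for illustration and are not the protocol defined in the patent.

    import threading
    from queue import Queue


    def host_side(cmd_ch: Queue, data_req: Queue, data_resp: Queue, host_memory: dict):
        # 1. First instruction over the command channel: prepare data.
        cmd_ch.put({"op": "PREPARE_DATA", "source": "weights_region"})
        # 2. Serve the accelerator's read request arriving on the data channel.
        request = data_req.get()
        data_resp.put({"op": "DATA", "payload": host_memory[request["addr"]]})
        # 3. Second instruction: run the computation / configuration operation.
        cmd_ch.put({"op": "RUN", "kernel": "matmul"})


    def accelerator_side(cmd_ch: Queue, data_req: Queue, data_resp: Queue, results: list):
        prepare = cmd_ch.get()                      # data-preparation instruction
        data_req.put({"op": "READ", "addr": prepare["source"]})
        payload = data_resp.get()["payload"]        # first data read from host memory
        run = cmd_ch.get()                          # computation instruction
        results.append((run["kernel"], payload))


    if __name__ == "__main__":
        cmd, req, resp, out = Queue(), Queue(), Queue(), []
        accel = threading.Thread(target=accelerator_side, args=(cmd, req, resp, out))
        accel.start()
        host_side(cmd, req, resp, {"weights_region": [1.0, 2.0, 3.0]})
        accel.join()
        print(out)   # [('matmul', [1.0, 2.0, 3.0])]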

    EFFICIENT COMMUNICATIONS AMONGST COMPUTING NODES FOR OPERATING AUTONOMOUS VEHICLES

    Publication No.: US20180183873A1

    Publication Date: 2018-06-28

    Application No.: US15115249

    Filing Date: 2016-07-21

    IPC Classification: H04L29/08 G05D1/00

    Abstract: A first request is received from a first processing node to produce data blocks of a first data stream representing a first communication topic. The first processing node is one of the processing nodes handling a specific function of operating an autonomous vehicle, and each of the processing nodes is executed within a specific node container having a specific operating environment. A global memory segment is allocated from a global memory to store the data blocks of the first data stream. A first local memory segment, allocated from a first local memory of a first node container containing the first processing node, is mapped to the global memory segment. The first processing node directly accesses the data blocks of the first data stream stored in the global memory segment by accessing the mapped first local memory segment within the first node container.
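
    As a rough analogue of the mapping described above, the sketch below uses Python's multiprocessing.shared_memory as the "global memory": a producer node writes data blocks for a topic into a named segment, and a consumer maps the same segment into its own address space rather than copying messages through a broker. The topic-to-segment naming scheme and function names are assumptions for illustration.

    from multiprocessing import shared_memory


    def publish(topic: str, payload: bytes) -> shared_memory.SharedMemory:
        """Producer node: allocate the global segment for a topic and write a data block."""
        seg = shared_memory.SharedMemory(create=True, size=len(payload),
                                         name=f"topic_{topic}")
        seg.buf[:len(payload)] = payload           # write the data block in place
        return seg


    def subscribe(topic: str, size: int) -> bytes:
        """Consumer node: map the same global segment into this node's address
        space and read the data block from it (no broker in between)."""
        seg = shared_memory.SharedMemory(name=f"topic_{topic}")
        try:
            return bytes(seg.buf[:size])
        finally:
            seg.close()                            # unmap the local view only


    if __name__ == "__main__":
        block = b"lidar-frame-0001"
        producer_seg = publish("point_cloud", block)
        try:
            assert subscribe("point_cloud", len(block)) == block
        finally:
            producer_seg.close()
            producer_seg.unlink()                  # release the global segment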

    Cursor-based adaptive quantization for deep neural networks

    Publication No.: US12039427B2

    Publication Date: 2024-07-16

    Application No.: US16966834

    Filing Date: 2019-09-24

    IPC Classification: G06N3/04 G06N3/08

    CPC Classification: G06N3/04 G06N3/08

    Abstract: Deep neural network (DNN) model quantization may be used to reduce storage and computation burdens by decreasing the bit width. Presented herein are novel cursor-based adaptive quantization embodiments. In embodiments, a multiple-bit quantization mechanism is formulated as a differentiable architecture search (DAS) process with a continuous cursor that represents a possible quantization bit. In embodiments, the cursor-based DAS adaptively searches for a quantization bit for each layer. The DAS process may be accelerated via an alternative approximate optimization process, which is designed for the mixed quantization scheme of a DNN model. In embodiments, a new loss function is used in the search process to simultaneously optimize accuracy and parameter size of the model. In a quantization step, the two integers closest to the cursor may be adopted together as the bits to quantize the DNN, to reduce quantization noise and avoid the local convergence problem.
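
    A small numerical sketch of the quantization step described above: a layer's continuous cursor (say 3.4) selects its two neighbouring integers as candidate bit widths, the weights are quantized at both widths and blended by the cursor's fractional part so the search stays differentiable, and the search loss adds a parameter-size term to the task loss. The uniform quantizer and the linear blending rule are simplifications assumed for the example, not the exact formulation in the patent.

    import numpy as np


    def uniform_quantize(w: np.ndarray, bits: int) -> np.ndarray:
        """Symmetric uniform quantization of w onto 2**bits - 1 levels."""
        levels = 2 ** bits - 1
        scale = (np.max(np.abs(w)) + 1e-12) / (levels // 2)
        q = np.clip(np.round(w / scale), -(levels // 2), levels // 2)
        return q * scale


    def cursor_quantize(w: np.ndarray, cursor: float) -> np.ndarray:
        """Blend the two integer bit widths adjacent to the continuous cursor."""
        lo, hi = int(np.floor(cursor)), int(np.ceil(cursor))
        if lo == hi:
            return uniform_quantize(w, lo)
        alpha = cursor - lo                       # fractional position of the cursor
        return (1 - alpha) * uniform_quantize(w, lo) + alpha * uniform_quantize(w, hi)


    def search_loss(task_loss: float, cursors: list, layer_params: list,
                    size_weight: float = 1e-6) -> float:
        """Accuracy term plus a model-size term (cursor bits x parameter count per layer)."""
        size_bits = sum(c * n for c, n in zip(cursors, layer_params))
        return task_loss + size_weight * size_bits


    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        w = rng.normal(size=(4, 4)).astype(np.float32)
        wq = cursor_quantize(w, cursor=3.4)       # mixes 3-bit and 4-bit quantization
        print("blend MSE:", float(np.mean((w - wq) ** 2)))
        print("loss:", search_loss(task_loss=0.85, cursors=[3.4, 6.0],
                                   layer_params=[4096, 1 << 20]))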