-
Publication No.: US20210281408A1
Publication date: 2021-09-09
Application No.: US16315867
Filing date: 2019-01-04
Inventors: Yong LIU, Yueqiang CHENG, Jian OUYANG, Tao WEI
Abstract: According to one embodiment, a DP accelerator includes one or more execution units (EUs) configured to perform data processing operations in response to an instruction received from a host system coupled over a bus. The DP accelerator includes a time unit (TU) coupled to the security unit to provide timestamp services. The DP accelerator includes a security unit (SU) configured to establish and maintain a secure channel with the host system to exchange commands and data associated with the data processing operations, where the security unit includes a secure storage area to store a private root key associated with the DP accelerator, where the private root key is utilized for authentication. The SU includes a random number generator to generate a random number, and a cryptographic engine to perform cryptographic operations on data exchanged with the host system over the bus using a session key derived based on the random number.
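The session-key step in this abstract can be illustrated with a minimal sketch. The abstract only states that the session key is "derived based on the random number"; the HKDF-style construction (HMAC-SHA256, following RFC 5869), the pre-shared secret, and the names `derive_session_key`, `shared_secret`, and `nonce` are assumptions made for illustration, not details from the patent.

```python
import hashlib
import hmac
import secrets


def derive_session_key(shared_secret: bytes, nonce: bytes, length: int = 32) -> bytes:
    """HKDF-style derivation with HMAC-SHA256 (illustrative assumption)."""
    # Extract: use the random number as the salt and mix it with the secret.
    prk = hmac.new(nonce, shared_secret, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough key material is produced.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(
            prk, block + b"session" + bytes([counter]), hashlib.sha256
        ).digest()
        okm += block
        counter += 1
    return okm[:length]


# Hypothetical usage: a fresh random number per session yields a fresh key.
nonce = secrets.token_bytes(16)
shared_secret = b"example-secret-established-during-authentication"
session_key = derive_session_key(shared_secret, nonce)
```

Both sides would need the same secret and the same random number to arrive at the same key; how those are exchanged is outside what this sketch covers.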
-
Publication No.: US20210176035A1
Publication date: 2021-06-10
Application No.: US16315998
Filing date: 2019-01-04
Inventors: Yueqiang CHENG, Yong LIU, Tao WEI, Jian OUYANG
Abstract: According to one embodiment, a system receives, at a host system from a data processing (DP) accelerator, an accelerator identifier (ID) that uniquely identifies the DP accelerator, wherein the host system is coupled to the DP accelerator over a bus. The system transmits the accelerator ID to a predetermined trusted server over a network. The system receives a certificate from the predetermined trusted server over the network, the certificate certifying the DP accelerator. The system extracts a public root key (PK_RK) from the certificate for verification, the PK_RK corresponding to a private root key (SK_RK) associated with the DP accelerator. The system establishes a secure channel with the DP accelerator using the PK_RK based on the verification to exchange data securely between the host system and the DP accelerator.
-
Publication No.: US20210173934A1
Publication date: 2021-06-10
Application No.: US16315957
Filing date: 2019-01-04
Inventors: Yong LIU, Yueqiang CHENG, Jian OUYANG, Tao WEI
Abstract: According to one embodiment, a system performs a secure boot using a security module such as a trusted platform module (TPM) of a host system. The system establishes a trusted execution environment (TEE) associated with one or more processors of the host system. The system launches a memory manager within the TEE, where the memory manager is configured to manage memory resources of a data processing (DP) accelerator coupled to the host system over a bus, including maintaining memory usage information of global memory of the DP accelerator. In response to a request received from an application running within the TEE for accessing a memory location of the DP accelerator, the system allows or denies the request based on the memory usage information.
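The allow-or-deny decision described in this abstract can be sketched as a small ownership table. This is a simplified illustration: the block-granular bookkeeping, the class name `AcceleratorMemoryManager`, and the application identifiers are hypothetical, and the secure boot and TEE setup are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AcceleratorMemoryManager:
    """Tracks which application owns each block of accelerator global memory."""

    num_blocks: int
    owners: Dict[int, str] = field(default_factory=dict)  # block index -> app id

    def allocate(self, app_id: str, block: int) -> bool:
        # Refuse out-of-range blocks and blocks already owned by any application.
        if not 0 <= block < self.num_blocks or block in self.owners:
            return False
        self.owners[block] = app_id
        return True

    def check_access(self, app_id: str, block: int) -> bool:
        # Allow the request only if the usage information names this app as owner.
        return self.owners.get(block) == app_id


manager = AcceleratorMemoryManager(num_blocks=4)
manager.allocate("app_a", 0)
allowed = manager.check_access("app_a", 0)  # request from the owning application
denied = manager.check_access("app_b", 0)   # request from a different application
```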
-
Publication No.: US20210173428A1
Publication date: 2021-06-10
Application No.: US16315924
Filing date: 2019-01-04
Inventors: Yong LIU, Yueqiang CHENG, Jian OUYANG, Tao WEI
Abstract: According to one embodiment, a DP accelerator includes one or more execution units (EUs) configured to perform data processing operations in response to an instruction received from a host system coupled over a bus. The DP accelerator includes a security unit (SU) configured to establish and maintain a secure channel with the host system to exchange commands and data associated with the data processing operations. The DP accelerator includes a time unit (TU) coupled to the security unit to provide timestamp services to the security unit, where the time unit includes a clock generator to generate clock signals locally without having to derive the clock signals from an external source. The TU includes a timestamp generator coupled to the clock generator to generate a timestamp based on the clock signals, and a power supply to provide power to the clock generator and the timestamp generator.
-
Publication No.: US20200218821A1
Publication date: 2020-07-09
Application No.: US16751665
Filing date: 2020-01-24
Inventors: Yong LIU, Yueqiang CHENG, Jian OUYANG, Tao WEI
Abstract: According to one embodiment, a system establishes a secure connection between a host system and a data processing (DP) accelerator over a bus, the secure connection including one or more data channels. The system transmits a first instruction from the host system to the DP accelerator over a command channel, the first instruction requesting the DP accelerator to perform a data preparation operation. The system receives a first request to read a first data from a first memory location of the host system from the DP accelerator over one data channel. In response to the request, the system transmits the first data to the DP accelerator over the data channel, where the first data is utilized for a computation or a configuration operation. The system transmits a second instruction from the host system to the DP accelerator over the command channel to perform the computation or the configuration operation.
-
Publication No.: US20210173666A1
Publication date: 2021-06-10
Application No.: US16315890
Filing date: 2019-01-04
Inventors: Yueqiang CHENG, Yong LIU, Tao WEI, Jian OUYANG
IPC classification: G06F9/4401, G06F9/38, G06F9/30, G06F9/54
Abstract: According to one embodiment, a data processing system performs a secure boot using a security module (e.g., a trusted platform module (TPM)) of a host system. The system verifies that an operating system (OS) and one or more drivers, including an accelerator driver associated with a data processing (DP) accelerator, are provided by a trusted source. The system launches the accelerator driver within the OS. The system generates a trusted execution environment (TEE) associated with one or more processors of the host system. The system launches an application and a runtime library within the TEE, where the application communicates with the DP accelerator via the runtime library and the accelerator driver.
-
Publication No.: US20210318878A1
Publication date: 2021-10-14
Application No.: US16607087
Filing date: 2019-10-12
Inventors: Zhibiao ZHAO, Jian OUYANG, Hefei ZHU, Qingshu CHEN, Wei QI
Abstract: According to various embodiments, methods and systems are provided to accelerate artificial intelligence (AI) model training with advanced interconnect communication technologies and systematic zero-value compression over a distributed training system. According to an exemplary method, during each iteration of a Scatter-Reduce process performed on a cluster of processors arranged in a logical ring to train a neural network model, a processor receives a compressed data block from a prior processor in the logical ring, performs an operation on the received compressed data block and a compressed data block generated on the processor to obtain a calculated data block, and sends the calculated data block to a following processor in the logical ring. A compressed data block calculated from corresponding data blocks from the processors can be identified on each processor and distributed to each other processor and decompressed therein for use in the AI model training.
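The Scatter-Reduce phase described in this abstract can be simulated in a single process as a rough sketch. The sparse `(length, {index: value})` representation stands in for the zero-value compression, and element-wise addition stands in for "an operation"; both are assumptions, since the abstract does not give the compression format or the operation. The later distribution phase, in which each reduced block is sent to every other processor, is not shown.

```python
from typing import Dict, List, Tuple

# (original length, non-zero entries); a hypothetical compressed format.
Compressed = Tuple[int, Dict[int, float]]


def compress(block: List[float]) -> Compressed:
    """Keep only the non-zero values together with their positions."""
    return len(block), {i: v for i, v in enumerate(block) if v != 0}


def decompress(c: Compressed) -> List[float]:
    n, nonzero = c
    return [nonzero.get(i, 0.0) for i in range(n)]


def add_compressed(a: Compressed, b: Compressed) -> Compressed:
    """Element-wise sum computed directly on the compressed form."""
    out = dict(a[1])
    for i, v in b[1].items():
        out[i] = out.get(i, 0.0) + v
    return a[0], out


def ring_scatter_reduce(data: List[List[List[float]]]) -> List[List[float]]:
    """data[p][c] is block c on processor p; there are as many blocks as processors.

    Returns, for each processor p, the fully reduced block it holds afterwards,
    which is block (p + 1) % n.
    """
    n = len(data)
    blocks = [[compress(b) for b in proc] for proc in data]
    for step in range(n - 1):
        # Snapshot what every processor sends in this step.
        outgoing = [blocks[p][(p - step) % n] for p in range(n)]
        for p in range(n):
            nxt, c = (p + 1) % n, (p - step) % n
            blocks[nxt][c] = add_compressed(outgoing[p], blocks[nxt][c])
    return [decompress(blocks[p][(p + 1) % n]) for p in range(n)]


data = [
    [[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]],  # processor 0
    [[0.0, 3.0], [0.0, 0.0], [4.0, 0.0]],  # processor 1
    [[5.0, 0.0], [0.0, 0.0], [0.0, 6.0]],  # processor 2
]
reduced = ring_scatter_reduce(data)
```

With `n` processors the loop runs `n - 1` steps, and in each step every processor forwards exactly one compressed block to its successor in the ring.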
-
Publication No.: US20210174174A1
Publication date: 2021-06-10
Application No.: US16622789
Filing date: 2019-11-15
Inventors: Hefei ZHU, Jian OUYANG, Zhibiao ZHAO, Xiaozhang GONG, Qingshu CHEN
Abstract: A data processing system includes a central processing unit (CPU) and accelerator cards coupled to the CPU over a bus, each of the accelerator cards having a plurality of data processing (DP) accelerators to receive DP tasks from the CPU and to perform the received DP tasks. At least two of the accelerator cards are coupled to each other via an inter-card connection, and at least two of the DP accelerators are coupled to each other via an inter-chip connection. Each of the inter-card connection and the inter-chip connection is capable of being dynamically activated or deactivated, such that in response to a request received from the CPU, any one of the accelerator cards or any one of the DP accelerators within any one of the accelerator cards can be enabled or disabled to process any one of the DP tasks received from the CPU.
-
Publication No.: US20210072996A1
Publication date: 2021-03-11
Application No.: US16729989
Filing date: 2019-12-30
Inventors: Qingshu CHEN, Zhibiao ZHAO, Hefei ZHU, Xiaozhang GONG, Yong WANG, Jian OUYANG
Abstract: Methods, apparatuses, devices, and storage media for performing a processing task are provided. One of a plurality of portions of the processing task can include a group of operations to be performed at one of a plurality of processing units. The group of operations can include operations of a first type and operations of a second type. In the method, a first queue for performing the operations of the first type and a second queue for performing the operations of the second type can be built, respectively. Based on a definition of the processing task, a dependency relationship between a group of operations to be performed at the processing unit and a group of operations to be performed at other processing units in the plurality of processing units can be obtained. Operations in the first queue and operations in the second queue can be performed respectively based on the dependency relationship.
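The two-queue scheme in this abstract can be sketched as follows. The abstract does not say what the two operation types are or how dependencies are encoded, so the kind labels `"first"` and `"second"`, the `(name, kind, dependencies)` tuple, and the `done_elsewhere` set (operations already completed on other processing units) are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

Op = Tuple[str, str, Set[str]]  # (name, kind, names of operations it depends on)


def run_two_queues(ops: List[Op], done_elsewhere: Set[str]) -> List[str]:
    """Run operations from two per-type queues, respecting dependencies."""
    queues: Dict[str, Deque[Op]] = {"first": deque(), "second": deque()}
    for op in ops:
        queues[op[1]].append(op)  # each queue preserves submission order

    done = set(done_elsewhere)
    order: List[str] = []
    progressed = True
    while progressed and any(queues.values()):
        progressed = False
        for queue in queues.values():
            # The head of a queue runs only once all its dependencies are done.
            if queue and queue[0][2] <= done:
                name, _, _ = queue.popleft()
                done.add(name)
                order.append(name)
                progressed = True
    if any(queues.values()):
        raise RuntimeError("unsatisfied dependency")
    return order


ops = [
    ("copy0", "first", {"remote_acc"}),  # waits on an operation from another unit
    ("acc0", "second", {"copy0"}),
    ("copy1", "first", {"acc0"}),
]
order = run_two_queues(ops, done_elsewhere={"remote_acc"})
```

Keeping one queue per operation type lets the two types advance independently, with the dependency check as the only point where they synchronize.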
-
Publication No.: US20210173917A1
Publication date: 2021-06-10
Application No.: US16316015
Filing date: 2019-01-04
Applicant: Baidu USA LLC
Inventors: Yueqiang CHENG, Yong LIU, Tao WEI, Jian OUYANG
Abstract: According to one embodiment, a system receives, at a runtime library executed within a trusted execution environment (TEE) of a host system, a request from an application to invoke a predetermined function to perform a predefined operation. In response to the request, the system identifies a kernel object associated with the predetermined function. The system verifies an executable image of the kernel object using a public key corresponding to a private key that was used to sign the executable image of the kernel object. In response to successfully verifying the executable image of the kernel object, the system transmits the verified executable image of the kernel object to a data processing (DP) accelerator over a bus to be executed by the DP accelerator to perform the predefined operation.