Sparse machine learning acceleration

    Publication Number: US12254398B2

    Publication Date: 2025-03-18

    Application Number: US17301271

    Application Date: 2021-03-30

    Abstract: To reduce the storage size of weight tensors and speed up loading of weight tensors from system memory, a compression technique can be employed to remove zero values from a weight tensor before storing the weight tensor in system memory. A sparsity threshold can be enforced to achieve a compression ratio target by forcing small weight values to zero during training. When the weight tensor is loaded from system memory, a direct memory access (DMA) engine with an in-line decompression unit can decompress the weight tensor on-the-fly. By performing the decompression in the DMA engine, expansion of the weight values back to the original weight tensor size can be carried out in parallel while other neural network computations are being performed by the processing unit.
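The scheme described in the abstract can be sketched in a few lines. This is an illustrative model only (function names and the bitmask representation are my own, not from the patent): small weights are forced to zero to hit a sparsity target, the zeros are removed before storage, and a mask records their positions so the decompressor can expand the tensor back to its original size.

```python
import numpy as np

def compress(weights: np.ndarray, threshold: float):
    """Force |w| < threshold to zero, then keep only nonzero values
    plus a mask of their positions (one bit per element in practice)."""
    pruned = np.where(np.abs(weights) < threshold, 0.0, weights)
    mask = pruned != 0
    values = pruned[mask]            # zeros removed before storing
    return values, mask

def decompress(values: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Expand back to the original tensor size, as the DMA engine's
    in-line decompression unit would do on-the-fly."""
    out = np.zeros(mask.shape, dtype=values.dtype)
    out[mask] = values
    return out

w = np.array([0.9, 0.01, -0.5, 0.002, 0.3])
vals, mask = compress(w, threshold=0.05)
restored = decompress(vals, mask)
```

Only `vals` and `mask` would be stored in system memory; the expansion runs in the DMA engine concurrently with other computation, which is where the speedup comes from.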

    Address generation for page collision prevention

    Publication Number: US11789859B1

    Publication Date: 2023-10-17

    Application Number: US17449579

    Application Date: 2021-09-30

    Abstract: To generate sequential addresses when multiple integrated circuit (IC) devices are accessing the same memory, an address token is sent along the IC devices communicatively coupled in a ring topology. The address token is first transferred along the ring topology during a memory reservation phase in which each IC device can set a corresponding memory request bit to indicate that the IC device has data to write to the memory. The modified address token is then transferred along the ring topology again during a memory access phase. During the memory access phase, each IC device that has data to write can perform a memory write operation using a sequential address determined from the contents of the address token.
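The two-phase token protocol can be modeled behaviorally as follows (a sketch under my own naming, not the patent's circuit): the first pass around the ring collects request bits, and on the second pass each requesting device derives a unique sequential address from how many requests precede it in ring order, so no two devices collide.

```python
def reservation_phase(devices):
    """First pass of the token: each device with data to write
    sets its request bit in the token."""
    return [1 if has_data else 0 for has_data in devices]

def access_phase(token, base_address, stride):
    """Second pass: a requesting device's address is the base plus
    the count of requests granted before it in ring order."""
    addresses = {}
    slot = 0
    for device_id, bit in enumerate(token):
        if bit:
            addresses[device_id] = base_address + slot * stride
            slot += 1
    return addresses

devices = [True, False, True, True]      # devices 0, 2, 3 have data
token = reservation_phase(devices)
addrs = access_phase(token, base_address=0x1000, stride=0x40)
```

Because every device sees the same token contents, the addresses are sequential and disjoint by construction, which is what prevents page collisions.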

    Data replication for accelerator
    Invention Grant

    Publication Number: US11500802B1

    Publication Date: 2022-11-15

    Application Number: US17301344

    Application Date: 2021-03-31

    Abstract: A direct memory access (DMA) engine can be used to multicast data from system memory to a target memory for loading into an array. The DMA engine may include a controller that is configured to receive a data transfer request, and generate a set of write operations for the output interface. The set of write operations can include, for each of multiple partitions of the target memory, a write operation to write usable data from the multicast data to an address offset in the corresponding partition, and an additional write operation to write filler data from the multicast data to a null device address.
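The write-operation generation described above can be sketched as follows (the constants and tuple layout are assumptions for illustration): for each partition of the target memory, the controller emits one write of usable data at the partition's offset and one write of the remaining multicast data to a null device address, keeping the multicast stream aligned without polluting the partitions.

```python
NULL_DEVICE_ADDR = 0xFFFF_0000   # hypothetical sink address for filler data

def generate_write_ops(num_partitions, partition_size, offset,
                       usable_len, filler_len):
    """Build the per-partition (kind, address, length) write operations
    for one multicast data transfer request."""
    ops = []
    for p in range(num_partitions):
        base = p * partition_size
        ops.append(("write_usable", base + offset, usable_len))
        ops.append(("write_filler", NULL_DEVICE_ADDR, filler_len))
    return ops

ops = generate_write_ops(num_partitions=4, partition_size=0x1000,
                         offset=0x20, usable_len=64, filler_len=192)
```

The filler writes are discarded at the null address; their only purpose is to consume the portion of the multicast data that a given partition does not need.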

    Powering-down or rebooting a device in a system fabric

    Publication Number: US11321179B1

    Publication Date: 2022-05-03

    Application Number: US17001145

    Application Date: 2020-08-24

    Abstract: A circuit at an interface between a device and an interconnect fabric is configured to track outstanding transactions associated with the device and ensure the completion of the outstanding transactions before rebooting or powering down the device. In some embodiments, the circuit is also configurable to provide appropriate responses when the device is powered down or is being rebooted such that other devices in the system can still operate even without knowing that the device is inactive and would not hang because no response is received from the device.
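A minimal behavioral model of this gating logic (my own modeling, not the patent's circuit): a counter tracks outstanding transactions, power-down is only permitted once the counter drains to zero, and while the device is inactive the interface circuit answers requests on its behalf so other devices do not hang waiting for a response.

```python
class TransactionTracker:
    """Tracks outstanding transactions at a device/fabric interface."""

    def __init__(self):
        self.outstanding = 0
        self.powered_down = False

    def issue(self):
        self.outstanding += 1          # request entered the fabric

    def complete(self):
        self.outstanding -= 1          # response returned

    def try_power_down(self) -> bool:
        """Only allow power-down once all transactions have completed."""
        if self.outstanding == 0:
            self.powered_down = True
        return self.powered_down

    def respond(self, request):
        """Provide a default response while the device is inactive."""
        if self.powered_down:
            return (request, "default_response")
        raise RuntimeError("device is active and responds itself")

t = TransactionTracker()
t.issue(); t.issue()
blocked = t.try_power_down()        # still two outstanding: refused
t.complete(); t.complete()
allowed = t.try_power_down()        # drained: power-down proceeds
```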

    GRADIENT COMPRESSION FOR DISTRIBUTED TRAINING

    Publication Number: US20210295168A1

    Publication Date: 2021-09-23

    Application Number: US16827444

    Application Date: 2020-03-23

    Inventors: Kun Xu; Ron Diamant

    Abstract: Techniques for exchanging compressed gradient data within a distributed system are disclosed. A set of gradients are computed at a first worker node of the distributed system using a neural network model and a set of weights associated with the neural network model. Each of the set of gradients having a value less than a threshold is clipped, resulting in non-clipped data elements and clipped data elements. A mapping indicating which of the set of gradients correspond to non-clipped data elements and which of the set of gradients correspond to clipped data elements is generated. Compressed data is generated based on the non-clipped data elements. The mapping and the compressed data are transmitted from the first worker node to a second worker node of the distributed system.
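The clip-map-compress steps can be sketched in a few lines (worker-node transport is omitted, and interpreting "value less than a threshold" as magnitude is my assumption): gradients below the threshold are clipped, a boolean mapping records which elements survived, and only the surviving values travel with the mapping.

```python
import numpy as np

def compress_gradients(grads: np.ndarray, threshold: float):
    """Clip small gradients; return the mapping and the compressed data."""
    mapping = np.abs(grads) >= threshold   # True = non-clipped element
    compressed = grads[mapping]            # only non-clipped values are sent
    return mapping, compressed

def decompress_gradients(mapping, compressed):
    """What the receiving worker node does with the mapping + data:
    clipped positions are reconstructed as zeros."""
    grads = np.zeros(mapping.shape, dtype=compressed.dtype)
    grads[mapping] = compressed
    return grads

g = np.array([0.5, 0.001, -0.3, 0.0002])
mapping, data = compress_gradients(g, threshold=0.01)
restored = decompress_gradients(mapping, data)
```

With sparse gradients, transmitting the mapping (one bit per element) plus the compressed values is far smaller than the full gradient tensor, which is the bandwidth win in distributed training.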

    Matrix transpose hardware acceleration

    Publication Number: US12141468B1

    Publication Date: 2024-11-12

    Application Number: US17875805

    Application Date: 2022-07-28

    Abstract: In one example, an apparatus comprises: a memory array having an array of memory elements arranged in rows and columns, each memory element being configured to store a data element; and a memory access circuit configured to: perform a row write operation to store a first group of data elements at a first row of the array of memory elements; perform a column read operation at a first column of the array of memory elements to obtain a second group of data elements; and perform a column write operation to store a third group of data elements at the first column of the array of memory elements to replace the second group of data elements.
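The access pattern above yields a transpose essentially for free: write rows in, read columns out. A pure-Python behavioral sketch (not RTL; the class and method names are mine) of the three operations the memory access circuit performs:

```python
class TransposeBuffer:
    """Square memory array supporting row writes and column reads/writes."""

    def __init__(self, n):
        self.mem = [[0] * n for _ in range(n)]

    def row_write(self, r, data):
        self.mem[r] = list(data)

    def column_read(self, c):
        return [row[c] for row in self.mem]

    def column_write(self, c, data):
        for r, v in enumerate(data):
            self.mem[r][c] = v

buf = TransposeBuffer(2)
buf.row_write(0, [1, 2])        # store the input matrix row by row
buf.row_write(1, [3, 4])
col0 = buf.column_read(0)       # each column read is a row of the transpose
```

Pairing each column read with a column write of the next matrix's data lets a new tensor stream in while the transposed one streams out, so the array never sits idle between transposes.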

    Communication of data between software applications

    Publication Number: US10860397B1

    Publication Date: 2020-12-08

    Application Number: US16297467

    Application Date: 2019-03-08

    Abstract: A computer system has a memory configured for sharing data between a first application and a second application. The memory includes a metadata region and a data region. The metadata region includes metadata that indicates how data being communicated between the first application and the second application is to be interpreted. The metadata also indicates whether the data can be found in the metadata itself or in a particular location in the data region. Each application can be assigned its own memory location containing a flag that can be set in order to indicate to the other application that the memory is ready to be accessed by the other application. The memory location can be implemented using a hardware register or in memory, either the same memory that includes the metadata and data regions or on a separate memory.
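A toy model of this layout (field names and the inline-size cutoff are assumptions for illustration): the metadata region describes the payload's type and says whether it sits inline in the metadata or at an offset in the data region, and a per-application ready flag hands the buffer to the other side.

```python
class SharedRegion:
    """Shared memory with a metadata region, a data region, and
    per-application ready flags."""

    def __init__(self):
        self.metadata = {"inline": False, "type": None,
                         "payload": None, "data_offset": None, "length": 0}
        self.data_region = bytearray(256)
        self.flags = {"app1_ready": False, "app2_ready": False}

    def write(self, sender_flag, payload: bytes, dtype: str):
        if len(payload) <= 16:      # small payload: store inline in metadata
            self.metadata.update(inline=True, type=dtype, payload=payload)
        else:                       # large payload: place in the data region
            self.data_region[0:len(payload)] = payload
            self.metadata.update(inline=False, type=dtype, payload=None,
                                 data_offset=0, length=len(payload))
        self.flags[sender_flag] = True   # signal the other application

    def read(self, sender_flag):
        assert self.flags[sender_flag], "nothing to read yet"
        if self.metadata["inline"]:
            return self.metadata["payload"]
        off, n = self.metadata["data_offset"], self.metadata["length"]
        return bytes(self.data_region[off:off + n])

r = SharedRegion()
r.write("app1_ready", b"hi", "utf8")
msg = r.read("app1_ready")
```

In the patent the flag locations may be hardware registers rather than memory, but the handshake is the same: set the flag last, after the metadata and data are in place.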

    PCI-based bus system having peripheral device address translation based on base address register (BAR) index

    Publication Number: US10740265B1

    Publication Date: 2020-08-11

    Application Number: US16144910

    Application Date: 2018-09-27

    Inventors: Kun Xu; Ron Diamant

    Abstract: Methods and apparatus for performing memory access are provided. In one example, an apparatus comprises a hardware processor, a memory, and a bus interface. The hardware processor is configured to: receive, from a host device and via the bus interface, a packet including a host input address, the host input address being defined based on a first host address space operated by the host device; determine, based on the host input address, a host relative address, the host relative address being relative to a first host base address of the first host address space; determine, based on the host relative address, a target device address of the memory; and access the memory at the target device address on behalf of the host device.
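The two-step translation in the abstract reduces to simple base-and-offset arithmetic. A hypothetical sketch (the table contents are invented for illustration): the host input address is first made relative to the host-side BAR base, then rebased onto the device-side region for that BAR index.

```python
# BAR index -> (host base address, device base address); values are assumed.
BAR_TABLE = {
    0: (0x8000_0000, 0x0000_0000),
    1: (0x9000_0000, 0x0010_0000),
}

def translate(host_input_addr: int, bar_index: int) -> int:
    """Translate a host input address to a target device address."""
    host_base, device_base = BAR_TABLE[bar_index]
    host_relative = host_input_addr - host_base   # offset within the BAR
    return device_base + host_relative            # address in device memory

addr = translate(0x8000_1234, bar_index=0)
```

Keying the translation on the BAR index means the device can expose several independently sized regions to the host while keeping its internal memory map private.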
