-
Publication number: US12079722B2
Publication date: 2024-09-03
Application number: US18162871
Application date: 2023-02-01
Applicant: TuSimple, Inc. , Beijing Tusen Zhitu Technology Co., Ltd.
Inventor: Yuwei Hu , Jiangming Jin , Lei Su , Dinghua Li
CPC classification number: G06N3/08 , G06F12/0207 , G06F17/153 , G06F17/16 , G06N3/045 , G06N3/063 , G06N20/10 , H03M7/30
Abstract: The embodiments of this application provide a method and device for optimizing a neural network. The method includes: binarizing and bit-packing input data of a convolution layer along a channel direction to obtain compressed input data; binarizing and bit-packing each convolution kernel of the convolution layer along the channel direction to obtain a corresponding compressed convolution kernel; dividing the compressed input data sequentially, in convolutional computation order, into blocks of compressed input data of the same size as each compressed convolution kernel, wherein the data input to a single convolutional computation forms one data block; and performing a convolutional computation on each block of the compressed input data with each compressed convolution kernel sequentially to obtain convolutional result data, and obtaining multiple output data of the convolution layer from the convolutional result data.
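The steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the sign rule used for binarization, the LSB-first bit order, the XNOR-plus-popcount dot product, and all function names are assumptions chosen for illustration, and one packed word per pixel is assumed.

```python
def binarize(values):
    """Map each value to one bit: 1 if value >= 0, else 0 (assumed sign rule)."""
    return [1 if v >= 0 else 0 for v in values]


def bit_pack(bits):
    """Pack a list of 0/1 bits into one integer, first bit least significant."""
    word = 0
    for i, b in enumerate(bits):
        word |= b << i
    return word


def pack_channels(tensor):
    """tensor[h][w] is a list of channel values; returns one packed word per pixel."""
    return [[bit_pack(binarize(px)) for px in row] for row in tensor]


def binary_dot(a_words, b_words, n_channels):
    """Dot product of two +1/-1 vectors held as packed words (XNOR + popcount)."""
    mask = (1 << n_channels) - 1
    total = 0
    for a, b in zip(a_words, b_words):
        matches = bin(~(a ^ b) & mask).count("1")  # bits where signs agree
        total += 2 * matches - n_channels          # agreements minus disagreements
    return total


def binary_conv2d(packed_input, packed_kernel, n_channels):
    """Stride-1 convolution without padding; each kernel-sized window is one data block."""
    kh, kw = len(packed_kernel), len(packed_kernel[0])
    h, w = len(packed_input), len(packed_input[0])
    kernel_flat = [x for row in packed_kernel for x in row]
    out = []
    for i in range(h - kh + 1):
        out_row = []
        for j in range(w - kw + 1):
            block = [packed_input[i + di][j + dj]
                     for di in range(kh) for dj in range(kw)]
            out_row.append(binary_dot(block, kernel_flat, n_channels))
        out.append(out_row)
    return out
```

Packing along the channel direction means one window position needs only one word comparison per kernel element, instead of one multiply per channel.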
-
Publication number: US12007890B2
Publication date: 2024-06-11
Application number: US18139785
Application date: 2023-04-26
Applicant: The Trustees of Princeton University
Inventor: Naveen Verma , Hossein Valavi , Hongyang Jia
IPC: G06F12/06 , G06F12/02 , G06F17/16 , G06N3/065 , G11C11/4074 , G11C11/4094 , G11C11/4097 , G11C11/419 , H03K19/20
CPC classification number: G06F12/0607 , G06F12/0207 , G06F17/16 , G06N3/065 , G11C11/4074 , G11C11/4094 , G11C11/4097 , G11C11/419 , H03K19/20 , G06F2212/454
Abstract: Various embodiments comprise systems, methods, architectures, mechanisms or apparatus for providing programmable or pre-programmed in-memory computing operations.
-
Publication number: US11940907B2
Publication date: 2024-03-26
Application number: US17359217
Application date: 2021-06-25
Applicant: Intel Corporation
Inventor: Martin-Thomas Grymel , David Bernard , Niall Hanrahan , Martin Power , Kevin Brady , Gary Baugh , Cormac Brick
CPC classification number: G06F12/0207 , G06F12/0292 , G06N3/10
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map and perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
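A rough Python sketch of the two compression passes described in this abstract follows. The bit convention (1 marks a non-zero point), the fixed storage-element size, the offset list, and every function name are assumptions for illustration, not details taken from the patent.

```python
def make_sparsity_map(values):
    """One bit per data point: 1 for non-zero, 0 for zero (assumed convention)."""
    return [0 if v == 0 else 1 for v in values]


def split_storage_elements(values, size):
    """Divide the flattened tensor into fixed-size storage elements."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def compress(values, element_size):
    """Return (sparsity_map, contiguous_memory, element_offsets)."""
    sparsity_map = make_sparsity_map(values)
    elements = split_storage_elements(values, element_size)
    # First compression: remove the zero points from each storage element.
    compressed = [[v for v in element if v != 0] for element in elements]
    # Second compression: store the compressed elements back to back.
    memory, offsets = [], []
    for element in compressed:
        offsets.append(len(memory))
        memory.extend(element)
    return sparsity_map, memory, offsets


def decompress(sparsity_map, memory):
    """Rebuild the dense data by re-inserting zeros where the map holds 0."""
    it = iter(memory)
    return [next(it) if bit else 0 for bit in sparsity_map]
```

The offsets let a reader find where each compressed storage element starts after the contiguous packing, which is one plausible way to keep the elements individually addressable.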
-
Publication number: US11922306B2
Publication date: 2024-03-05
Application number: US17135521
Application date: 2020-12-28
Applicant: Meta Platforms, Inc.
Inventor: Harshit Khaitan , Ganesh Venkatesh , Simon James Hollis
CPC classification number: G06N3/08 , G06F12/0207 , G06F12/0215 , G06F12/023 , G06F2212/251
Abstract: A machine-learning accelerator system, comprising: a plurality of controllers each configured to traverse a feature map with n-dimensions according to instructions that specify, for each of the n-dimensions, a respective traversal size, wherein each controller comprises: a counter stack comprising counters each associated with a respective dimension of the n-dimensions of the feature map, wherein each counter is configured to increment a respective count from a respective initial value to the respective traversal size associated with the respective dimension associated with that counter; a plurality of address generators each configured to use the respective counts of the counters to generate at least one memory address at which a portion of the feature map is stored; and a dependency controller computing module configured to (1) track conditional statuses for incrementing the counters and (2) allow or disallow each of the counters to increment based on the conditional statuses.
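The counter stack and address generation described above can be sketched in Python as follows. This is an illustrative model only: counters are assumed to start at 0, addresses are assumed to be a base plus a stride-weighted sum of the counts, the dependency controller is omitted, and the class and function names are hypothetical.

```python
class CounterStack:
    """One counter per dimension; the last dimension is the innermost."""

    def __init__(self, sizes):
        self.sizes = list(sizes)          # traversal size per dimension
        self.counts = [0] * len(sizes)    # assumed initial value of 0

    def increment(self):
        """Advance the innermost counter, carrying outward; False when finished."""
        for dim in reversed(range(len(self.counts))):
            self.counts[dim] += 1
            if self.counts[dim] < self.sizes[dim]:
                return True
            self.counts[dim] = 0          # wrap and carry to the next dimension
        return False


def address(counts, strides, base=0):
    """Address generator: base plus the stride-weighted sum of the counts."""
    return base + sum(c * s for c, s in zip(counts, strides))


def traverse(sizes, strides, base=0):
    """List the addresses visited for one full traversal of the feature map."""
    stack = CounterStack(sizes)
    addresses = [address(stack.counts, strides, base)]
    while stack.increment():
        addresses.append(address(stack.counts, strides, base))
    return addresses
```

For example, `traverse([2, 2], [8, 2], base=100)` visits a 2x2 window of a wider feature map by stepping 2 within a row and 8 between rows.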
-
Publication number: US11907112B2
Publication date: 2024-02-20
Application number: US17765775
Application date: 2020-11-24
Inventor: Haoqian He , Weina Lu , Chao He
IPC: G06F12/02
CPC classification number: G06F12/0207
Abstract: Embodiments of the present disclosure disclose a method and apparatus for calculating tensor data based on a computer, a medium, and a device. The method includes: determining, from a second tensor, a dimension different from a dimension of a first tensor based on dimensions of the first tensor and dimensions of the second tensor; updating the stride in the different dimension to a predetermined value; reading a to-be-operated data block of the second tensor from a buffer module based on the updated stride with the predetermined value in each dimension of the second tensor, where the to-be-operated data block is a data block for which padding processing is performed; and performing a binary operation on the first tensor based on the to-be-operated data block of the second tensor. According to the present disclosure, broadcasting may be achieved conveniently without increasing the difficulty of hardware design.
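The stride update described in this abstract resembles the familiar zero-stride broadcasting trick, which the Python sketch below illustrates. It assumes that the "predetermined value" is 0, that tensors are flat row-major buffers, and that addition is the binary operation; the padding step is left out, and all names are hypothetical.

```python
from itertools import product


def broadcast_strides(first_shape, second_shape, second_strides):
    """Set the stride to 0 (assumed value) in every dimension whose size differs."""
    return [0 if s != f else stride
            for f, s, stride in zip(first_shape, second_shape, second_strides)]


def read_block(buffer, shape, strides):
    """Read a flat buffer as a tensor of `shape`, in row-major order, using `strides`."""
    return [buffer[sum(i * s for i, s in zip(index, strides))]
            for index in product(*(range(n) for n in shape))]


def add_broadcast(first, first_shape, second, second_shape, second_strides):
    """Element-wise addition with the second tensor broadcast to the first's shape."""
    strides = broadcast_strides(first_shape, second_shape, second_strides)
    expanded = read_block(second, first_shape, strides)
    return [a + b for a, b in zip(first, expanded)]
```

With a zero stride, advancing the index in the broadcast dimension re-reads the same elements, so no copy of the second tensor is needed.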
-
Publication number: US11900486B2
Publication date: 2024-02-13
Application number: US18115764
Application date: 2023-02-28
Applicant: Kenneth Page-Romer , Gregory George Page-Romer
Inventor: Kenneth Page-Romer , Gregory George Page-Romer
IPC: G06F16/28 , G06Q50/00 , G06Q10/107 , G06F16/35 , G06F16/435 , G06Q50/28 , G06Q10/04 , G06F12/02
CPC classification number: G06Q50/01 , G06F16/285 , G06F16/355 , G06F16/435 , G06Q10/107 , G06F12/0207 , G06Q10/04 , G06Q50/28
Abstract: Improved technological solutions are introduced for providing a secure and effective and enhanced clustering/grouping solution that is useful, for example, in an online dating forum as well as any number of other industries. The ability to attend live events in person or remotely is coupled with presence location and automatic verification of user devices and identities. This allows secured communication between participants without having to disclose actual contact information of the participants or their device addresses. An improved algorithm that groups members/items effectively based on a variety of matching criteria, with lowered possibilities of errors and more efficient use of processing power, is now introduced.
-
Publication number: US20240028332A1
Publication date: 2024-01-25
Application number: US18375874
Application date: 2023-10-02
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng Gu , Krishna T. Malladi , Hongzhong Zheng
CPC classification number: G06F9/3001 , G06F12/0207 , G06F17/16 , G06F7/00 , G06F7/4876 , G06F9/3004 , G06F2212/1024
Abstract: According to some example embodiments of the present disclosure, in a method for a memory lookup mechanism in a high-bandwidth memory system, the method includes: using a memory die to conduct a multiplication operation using a lookup table (LUT) methodology by accessing a LUT, which includes floating point operation results, stored on the memory die; sending, by the memory die, a result of the multiplication operation to a logic die including a processor and a buffer; and conducting, by the logic die, a matrix multiplication operation using computation units.
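As a loose illustration of the lookup-table approach in this abstract, the Python sketch below precomputes floating-point products for a small set of operand values and then builds a matrix multiplication from table lookups. The operand quantization into index levels, the table layout, and the function names are assumptions; the split between memory die and logic die is not modeled.

```python
def build_lut(levels):
    """Precompute the product of every pair of allowed operand values."""
    return [[a * b for b in levels] for a in levels]


def lut_multiply(lut, i, j):
    """'Multiply' by lookup: i and j are indices of the operands in `levels`."""
    return lut[i][j]


def lut_matmul(lut, a_idx, b_idx):
    """Matrix product where each scalar product comes from the lookup table."""
    rows, inner, cols = len(a_idx), len(b_idx), len(b_idx[0])
    return [[sum(lut_multiply(lut, a_idx[r][k], b_idx[k][c])
                 for k in range(inner))
             for c in range(cols)]
            for r in range(rows)]
```

In this sketch the matrices hold indices into `levels` rather than the values themselves, so the only arithmetic performed at run time is the accumulation of looked-up products.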
-
Publication number: US11847465B2
Publication date: 2023-12-19
Application number: US17533788
Application date: 2021-11-23
Inventor: Chun-Gi Lyuh , Hyun Mi Kim , Young-Su Kwon , Jin Ho Han
CPC classification number: G06F9/3895 , G06F9/355 , G06F12/0207
Abstract: Disclosed is a parallel processor. The parallel processor includes a processing element array including a plurality of processing elements arranged in rows and columns, a row memory group including row memories corresponding to rows of the processing elements, a column memory group including column memories corresponding to columns of the processing elements, and a controller to generate a first address and a second address, to send the first address to the row memory group, and to send the second address to the column memory group. The controller supports convolution operations having mutually different forms, by changing a scheme of generating the first address.
-
Publication number: US20230401149A1
Publication date: 2023-12-14
Application number: US18457672
Application date: 2023-08-29
Applicant: KIOXIA CORPORATION
Inventor: Yuki SASAKI , Shinichi KANNO , Takahiro KURITA
IPC: G06F12/02 , G06F12/1009
CPC classification number: G06F12/0246 , G06F12/1009 , G06F12/0207 , G06F2212/7209 , G06F2212/7201 , G06F2212/651
Abstract: According to one embodiment, a memory system includes a non-volatile memory and a data map configured to manage validity of data written in the non-volatile memory. The data map includes a plurality of first fragment tables corresponding to a first hierarchy and a second fragment table corresponding to a second hierarchy higher than the first hierarchy. Each of the first fragment tables is used to manage the validity of each data having a predetermined size written in a range of physical address in the non-volatile memory allocated to the first fragment table. The second fragment table is used for each of the first fragment tables to manage reference destination information for referencing the first fragment table.
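A two-level validity map of the kind this abstract describes might be modeled as in the Python sketch below. It is a simplification under stated assumptions: validity is one boolean per fixed-size data unit, the second-level table stores list indices as its "reference destination information", and the class and method names are hypothetical.

```python
class DataMap:
    """Two-level map: a second-level table references first-level fragment tables."""

    def __init__(self, total_units, units_per_table):
        self.units_per_table = units_per_table
        n_tables = -(-total_units // units_per_table)  # ceiling division
        # First hierarchy: each table covers one contiguous physical address range.
        self.first_tables = [[False] * units_per_table for _ in range(n_tables)]
        # Second hierarchy: entry i holds the reference to first-level table i.
        self.second_table = list(range(n_tables))

    def _locate(self, unit):
        """Resolve a data unit to (first-level table, slot inside that table)."""
        upper, lower = divmod(unit, self.units_per_table)
        return self.first_tables[self.second_table[upper]], lower

    def set_valid(self, unit, valid=True):
        table, slot = self._locate(unit)
        table[slot] = valid

    def is_valid(self, unit):
        table, slot = self._locate(unit)
        return table[slot]
```

A lookup walks from the higher hierarchy to the lower one, so only the first-level table covering the requested address range has to be touched.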
-
Publication number: US11797201B2
Publication date: 2023-10-24
Application number: US17745278
Application date: 2022-05-16
Applicant: Advanced Micro Devices, Inc.
Inventor: Mahzabeen Islam , Shaizeen Aga , Nuwan Jayasena , Jagadish B. Kotra
CPC classification number: G06F3/0631 , G06F3/0604 , G06F3/0673 , G06F12/0207 , G06F12/0223 , G06F12/0607
Abstract: Approaches are provided for implementing hardware-software collaborative address mapping schemes that enable mapping data elements which are accessed together in the same row of one bank or over the same rows of different banks to achieve higher performance by reducing row conflicts. Using an intra-bank frame striping policy (IBFS), corresponding subsets of data elements are interleaved into a single row of a bank. Using an intra-channel frame striping policy (ICFS), corresponding subsets of data elements are interleaved into a single channel row of a channel. A memory controller utilizes ICFS and/or IBFS to efficiently store and access data elements in memory, such as processing-in-memory (PIM) enabled memory.
-