-
Publication No.: US11847550B2
Publication date: 2023-12-19
Application No.: US17111875
Filing date: 2020-12-04
Applicant: NVIDIA Corporation
Inventor(s): William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
IPC classification: G06N3/04, G06N3/042, G06F17/11, G06F9/30, G06F9/38, G06N3/082, G06N3/063, G06N3/045, G06N3/048, G06F7/544, G06F9/355, G06F17/16, G06F9/28
CPC classification: G06N3/042, G06F7/5443, G06F9/3001, G06F9/30018, G06F9/30025, G06F9/30036, G06F9/3851, G06F9/3887, G06F17/11, G06N3/045, G06N3/048, G06N3/063, G06N3/082, G06F9/28, G06F9/3555, G06F17/16, G06F2207/4824
Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.
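The coordinate arithmetic in this abstract can be illustrated with a rough Python sketch. It assumes the index vectors have already been decoded into (row, column) tuples, that every first coordinate set is paired with every second coordinate set, and that the output array is row-major with a hypothetical width `out_width`. None of these details or function names come from the patent.

```python
from itertools import product

def sum_coordinate_sets(first_coords, second_coords):
    """Sum each (row, col) of the first array with each (row, col) of the second.

    Pairing every first set with every second set is an assumption made
    for illustration; the abstract does not state the pairing rule.
    """
    return [(r1 + r2, c1 + c2)
            for (r1, c1), (r2, c2) in product(first_coords, second_coords)]

def to_linear_indices(coords, out_width):
    """Convert (row, col) pairs to row-major linear indices (assumed layout)."""
    return [r * out_width + c for r, c in coords]

# Positions of non-zero elements in two small sparse arrays (made-up data).
first = [(0, 1), (2, 0)]
second = [(0, 0), (1, 1)]

output_coords = sum_coordinate_sets(first, second)
linear = to_linear_indices(output_coords, out_width=4)
print(output_coords)  # [(0, 1), (1, 2), (2, 0), (3, 1)]
print(linear)         # [1, 6, 8, 13]
```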
-
Publication No.: US20230393855A1
Publication date: 2023-12-07
Application No.: US17833504
Filing date: 2022-06-06
CPC classification: G06F9/3887, G06F9/3877, G06F9/30098, G06F9/3555
Abstract: An approach is provided for implementing register based single instruction, multiple data (SIMD) lookup table operations. According to the approach, an instruction set architecture (ISA) can support one or more SIMD instructions that enable vectors or multiple values in source data registers to be processed in parallel using a lookup table or truth table stored in one or more function registers. The SIMD instructions can be flexibly configured to support functions with inputs and outputs of various sizes and data formats. Various approaches are also described for supporting very large lookup tables that span multiple registers.
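As a loose software analogy for a register-based lookup, the sketch below packs a truth table into a single integer standing in for a function register and looks up every lane of a source vector against it. The name `simd_lut`, the bit packing order, and the `out_bits` parameter are assumptions for illustration only, and the lanes are processed sequentially here rather than in parallel.

```python
def simd_lut(source_lanes, function_register, out_bits=1):
    """Look up each lane value in a table packed into one integer.

    Entry i of the table occupies bits [i * out_bits, (i + 1) * out_bits)
    of function_register (an assumed packing, not taken from the patent).
    """
    mask = (1 << out_bits) - 1
    return [(function_register >> (lane * out_bits)) & mask
            for lane in source_lanes]

# Truth table for 2-input XOR: inputs 0b00, 0b01, 0b10, 0b11 -> 0, 1, 1, 0.
xor_table = 0b0110

lanes = [0, 1, 2, 3, 3, 1]
result = simd_lut(lanes, xor_table)
print(result)  # [0, 1, 1, 0, 0, 1]
```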
-
Publication No.: US11836489B2
Publication date: 2023-12-05
Application No.: US17973466
Filing date: 2022-10-25
Inventor(s): Fei Sun
CPC classification: G06F9/3001, G06F9/30043, G06F9/3887, G06F9/3889
Abstract: A processor for sparse matrix calculation includes an on-chip memory, a cache, a gather/scatter engine, and a core. The on-chip memory stores a first matrix or vector, and the cache stores a compressed sparse second matrix data structure. The compressed sparse second matrix data structure includes a value array including non-zero element values of the sparse second matrix, where each entry includes a given number of element values; and a column index array where each entry includes the given number of offsets matching the value array. The gather/scatter engine gathers element values of the first matrix or vector using the column index array of the sparse second matrix. In a hybrid horizontal/vertical implementation, the gather/scatter engine gathers sets of element values from sets of rows and from different sub-banks within the same rows based on the column index array of the sparse matrix.
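The value array and column index array described here resemble a fixed-width compressed row layout. A minimal Python sketch of the gather step follows, assuming two non-zero values per entry and a dense vector as the first operand. The function names and data are hypothetical, and the sub-bank behavior of the hybrid implementation is not modeled.

```python
def gather(dense_vector, column_indices):
    """Collect the dense elements addressed by one column index entry."""
    return [dense_vector[j] for j in column_indices]

def sparse_row_dot(values, column_indices, dense_vector):
    """Dot product of one compressed sparse row with the dense vector."""
    gathered = gather(dense_vector, column_indices)
    return sum(v * x for v, x in zip(values, gathered))

# Each entry holds a fixed number (here 2) of non-zero values and offsets.
value_array = [[2.0, 3.0], [1.0, 4.0]]
column_index_array = [[0, 2], [1, 3]]
x = [1.0, 10.0, 100.0, 1000.0]

y = [sparse_row_dot(v, c, x)
     for v, c in zip(value_array, column_index_array)]
print(y)  # [302.0, 4010.0]
```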
-
Publication No.: US20230376315A1
Publication date: 2023-11-23
Application No.: US18110607
Filing date: 2023-02-16
Applicant: Fujitsu Limited
Inventor(s): Koji KURIHARA, Kentaro KAWAKAMI
CPC classification: G06F9/3887, G06F9/3001
Abstract: A non-transitory computer-readable recording medium stores a program for causing a computer to execute a process, the process includes, in a search for combinations of conditions that allow extraction of sample data groups that have n or more attribute pairs whose correlation coefficients exceed a threshold value, when a number of combinations of the conditions is equal to or greater than a number capable of being parallelized, parallelizing processing for the combinations of the conditions per the number capable of being parallelized to calculate the correlation coefficients of respective attribute pairs for each of the combinations of the conditions in addition to a single instruction multiple data (SIMD) conversion process that uses predicate registers as many as the number capable of being parallelized, and searching for the combinations of conditions using the correlation coefficients of the respective attribute pairs for each of the combinations of the conditions.
-
Publication No.: US20230359565A1
Publication date: 2023-11-09
Application No.: US18357732
Filing date: 2023-07-24
Inventor(s): Joseph Zbiciak
IPC classification: G06F12/0897, G06F9/30, G06F12/0815, G06F12/0875, G06F9/345, G06F9/38, G06F12/0862, G06F12/04
CPC classification: G06F12/0897, G06F9/3013, G06F12/0815, G06F12/0875, G06F9/3001, G06F9/30036, G06F9/30047, G06F9/30072, G06F9/3012, G06F9/30145, G06F9/345, G06F9/3822, G06F9/383, G06F9/3853, G06F9/3887, G06F12/0862, G06F9/3877, G06F12/04, G06F9/30101, G06F2212/452, G06F2212/6026, G06F2212/1056, G06F2212/454, G06F9/3552
Abstract: A streaming engine employed in a digital data processor may specify a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.
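One way to picture nested-loop address generation with a per-loop addressing mode is the Python sketch below. It is a simplified model under stated assumptions: each loop is described by a hypothetical (count, stride, circular_size) tuple, and circular mode is modeled by wrapping that loop's offset modulo a block size. The actual template encoding and wrap behavior of the patented engine are not given in the abstract.

```python
from itertools import product

def stream_addresses(base, loops):
    """Yield element addresses for nested loops, outermost loop first.

    loops: list of (count, stride, circular_size) tuples, where
    circular_size is None for linear mode or a block size in bytes for
    circular mode (an assumed representation).
    """
    for indices in product(*(range(count) for count, _, _ in loops)):
        offset = 0
        for i, (_, stride, circular_size) in zip(indices, loops):
            step = i * stride
            if circular_size is not None:
                step %= circular_size  # wrap within the circular block
            offset += step
        yield base + offset

# Outer loop: 2 iterations, linear, stride 16.
# Inner loop: 4 iterations, circular over an 8-byte block, stride 4.
addresses = list(stream_addresses(100, [(2, 16, None), (4, 4, 8)]))
print(addresses)  # [100, 104, 100, 104, 116, 120, 116, 120]
```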
-
Publication No.: US11803384B2
Publication date: 2023-10-31
Application No.: US17828075
Filing date: 2022-05-31
Applicant: FUJITSU LIMITED
Inventor(s): Koji Kurihara, Kentaro Kawakami
CPC classification: G06F9/30174, G06F9/3001, G06F9/30018, G06F9/3887
Abstract: A recording medium stores a program for causing a computer to execute a process including: converting, in a first source code corresponding to a first-type processor, a first load command for a first mask register included in the first-type processor into a second load command for a second mask register included in a second-type processor; and converting, when a first SIMD command for performing an arithmetic operation using the first mask register exists after the first load command in the first source code and a state of a value of the first mask register does not coincide with a state of a value of the first mask register, the first SIMD command into a second SIMD command corresponding to the second-type processor and a change command for changing a state of a value of the second mask register to a state of a value of the second mask register.
-
Publication No.: US11748270B2
Publication date: 2023-09-05
Application No.: US17990812
Filing date: 2022-11-21
IPC classification: G06F9/30, G06F12/1045, G06F9/345, G06F9/38, G06F11/00, G06F11/10, G06F7/24, G06F7/487, G06F7/499, G06F7/53, G06F7/57, G06F9/48, G06F17/16, G06F9/32, G06F12/0875, G06F12/0897, G06F12/0862, G06F12/1009
CPC classification: G06F12/1045, G06F7/24, G06F7/487, G06F7/4876, G06F7/49915, G06F7/53, G06F7/57, G06F9/3001, G06F9/30014, G06F9/3016, G06F9/30021, G06F9/30032, G06F9/30036, G06F9/30065, G06F9/30072, G06F9/30098, G06F9/30112, G06F9/30145, G06F9/30149, G06F9/32, G06F9/345, G06F9/3802, G06F9/383, G06F9/3818, G06F9/3836, G06F9/3851, G06F9/3867, G06F9/3887, G06F9/48, G06F11/00, G06F11/1048, G06F12/0862, G06F12/0875, G06F12/0897, G06F12/1009, G06F17/16, G06F9/3822, G06F11/10, G06F2212/452, G06F2212/60, G06F2212/602, G06F2212/68
Abstract: In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
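The loop-exit rule in this abstract can be mimicked in software as follows. This is only a sketch: the function name, the padding with zeros, and the use of a summation as the loop body are assumptions, and a real processor would derive the predicate in hardware rather than from list lengths.

```python
def sum_with_vector_predicate(data, vector_width):
    """Accumulate data in fixed-width vectors, exiting on an all-false predicate."""
    total = 0
    pos = 0
    while True:
        chunk = data[pos:pos + vector_width]
        # Predicate marks which lanes hold valid elements.
        predicate = [True] * len(chunk) + [False] * (vector_width - len(chunk))
        vector = chunk + [0] * (vector_width - len(chunk))
        if not any(predicate):
            break  # no valid elements in the current vector: exit the loop
        total += sum(x for x, valid in zip(vector, predicate) if valid)
        pos += vector_width
    return total

print(sum_with_vector_predicate([1, 2, 3, 4, 5], vector_width=4))  # 15
```

The final partial vector is handled by the same loop body as full vectors, so no separate scalar tail loop is needed.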
-
Publication No.: US11727527B2
Publication date: 2023-08-15
Application No.: US17541413
Filing date: 2021-12-03
Applicant: Intel Corporation
Inventor(s): Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Altug Koker, Narayan Srinivasa, Dukhwan Kim, Sara S. Baghsorkhi, Justin E. Gottschlich, Feng Chen, Elmoustapha Ould-Ahmed-Vall, Kevin Nealis, Xiaoming Chen, Anbang Yao
IPC classification: G06T1/20, G06N3/063, G06F9/38, G06F9/30, G06N3/084, G06N3/044, G06N3/045, G06N3/04, G06N3/08
CPC classification: G06T1/20, G06F9/3001, G06F9/3017, G06F9/3851, G06F9/3887, G06F9/3895, G06N3/04, G06N3/044, G06N3/045, G06N3/063, G06N3/08, G06N3/084
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex compute operation.
-
Publication No.: US20230237097A1
Publication date: 2023-07-27
Application No.: US18010948
Filing date: 2020-06-25
Applicant: NEC Corporation
Inventor(s): Osamu DAIDO
IPC classification: G06F16/901, G06F9/38, G06F9/30
CPC classification: G06F16/9027, G06F9/3887, G06F9/30021
Abstract: An information processing device performs processing based on a decision tree that has condition determination nodes and leaf nodes. In the information processing device, an instruction unification means generates a unified instruction by unifying the instructions that the condition determination nodes in the decision tree execute, so that they are suitable for parallel processing. An acquisition means acquires a plurality of pieces of input data. A condition determination means performs, by parallel processing, a condition determination on the plurality of pieces of input data for each of the condition determination nodes.
-
Publication No.: US11709674B2
Publication date: 2023-07-25
Application No.: US17072378
Filing date: 2020-10-16
Inventor(s): David Kravitz, Manan Salvi, David A. Carlson
CPC classification: G06F9/30036, G06F9/3012, G06F9/30014, G06F9/30112, G06F9/3887
Abstract: A method of implementing a processor architecture and corresponding system includes operands of a first size and a datapath of a second size. The second size is different from the first size. Given a first array of registers and a second array of registers, each register of the first and second arrays being of the second size, selecting a first register and corresponding second register from the first array and the second array, respectively, to perform operations of the first size. This allows a user, who is interfacing with the hardware processor through software, to provide data of the datapath bit-width instead of the register bit-width. Advantageously, the user is agnostic to the size of the registers.
-