-
1.
Publication No.: US20240362180A1
Publication date: 2024-10-31
Application No.: US18647549
Filing date: 2024-04-26
Applicant: Intel Corporation
Inventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
IPC classification: G06F15/78, G06F7/544, G06F7/575, G06F7/58, G06F9/30, G06F9/38, G06F9/50, G06F12/02, G06F12/06, G06F12/0802, G06F12/0804, G06F12/0811, G06F12/0862, G06F12/0866, G06F12/0871, G06F12/0875, G06F12/0882, G06F12/0888, G06F12/0891, G06F12/0893, G06F12/0895, G06F12/0897, G06F12/1009, G06F12/128, G06F15/80, G06F17/16, G06F17/18, G06N3/08, G06T1/20, G06T1/60, G06T15/06, H03M7/46
CPC classification: G06F15/7839, G06F7/5443, G06F7/575, G06F7/588, G06F9/3001, G06F9/30014, G06F9/30036, G06F9/3004, G06F9/30043, G06F9/30047, G06F9/30065, G06F9/30079, G06F9/3887, G06F9/5011, G06F9/5077, G06F12/0215, G06F12/0238, G06F12/0246, G06F12/0607, G06F12/0802, G06F12/0804, G06F12/0811, G06F12/0862, G06F12/0866, G06F12/0871, G06F12/0875, G06F12/0882, G06F12/0888, G06F12/0891, G06F12/0893, G06F12/0895, G06F12/0897, G06F12/1009, G06F12/128, G06F15/8046, G06F17/16, G06F17/18, G06T1/20, G06T1/60, H03M7/46, G06F9/3802, G06F9/3818, G06F9/3867, G06F2212/1008, G06F2212/1021, G06F2212/1044, G06F2212/302, G06F2212/401, G06F2212/455, G06F2212/60, G06N3/08, G06T15/06
Abstract: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics multiprocessor comprises an instruction unit to dispatch instructions and a processing resource coupled to the instruction unit. The processing resource is configured to receive a dot product accumulate instruction from the instruction unit and to process the dot product accumulate instruction using a bfloat16 number (BF16) format.
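Illustrative sketch (not taken from the patent): the arithmetic behind a BF16 dot product accumulate can be emulated in software as below. The function names, the round-to-nearest-even conversion, and the use of a wider accumulator are assumptions made for illustration; the abstract does not specify them.

```python
import struct


def to_bf16_bits(x: float) -> int:
    """Round a float to bfloat16 (round-to-nearest-even) and return the 16-bit pattern.

    Assumption: the rounding mode is chosen for illustration only.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep NaNs as NaNs by forcing a mantissa bit that survives truncation.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add 0x7FFF plus the lowest kept bit so that ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def from_bf16_bits(b: int) -> float:
    """Expand a 16-bit bfloat16 pattern to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def dot_product_accumulate(acc: float, a: list, b: list) -> float:
    """Return acc + sum(a[i] * b[i]) with both inputs first rounded to BF16.

    Assumption: products are accumulated in a wider format (a Python float here).
    """
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    total = acc
    for x, y in zip(a, b):
        total += from_bf16_bits(to_bf16_bits(x)) * from_bf16_bits(to_bf16_bits(y))
    return total
```

For example, `dot_product_accumulate(1.0, [1.0, 2.0], [3.0, 4.0])` evaluates to `12.0`, since every input is exactly representable in BF16.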
-
Publication No.: US12112163B2
Publication date: 2024-10-08
Application No.: US17726224
Filing date: 2022-04-21
Inventors: Hiroki Noguchi, Yih Wang
CPC classification: G06F9/3001, G06F3/0611, G06F3/0658, G06F3/0673, G06F7/575, G06F9/3016
Abstract: A memory interface circuit includes an instruction decoder configured to receive an instruction from a processor to generate a corresponding control code. An execution circuit is configured to receive the control code from the instruction decoder and access a memory and generate an arithmetic result according to the control code.
-
3.
Publication No.: US20240329991A1
Publication date: 2024-10-03
Application No.: US18194327
Filing date: 2023-03-31
Applicant: Intel Corporation
IPC classification: G06F9/30
CPC classification: G06F9/30145, G06F9/3001, G06F9/30025
Abstract: An apparatus of an aspect includes decoder circuitry to decode an instruction. The instruction to indicate at least one source floating-point vector, a destination storage location, and at least one value. The source floating-point vector is to have floating-point data elements. The at least one value is to indicate at least one of: (a) a number of significand bits of the floating-point data elements; (b) a number of exponent bits of the floating-point data elements; (c) exponent bias information for the floating-point data elements; or (d) any combination thereof. Execution circuitry coupled with decoder circuitry is to perform operations according to the instruction. The operations include to interpret the floating-point data elements consistent with the at least one value, perform an operation specified by the instruction on the at least one source floating-point vector to generate a result vector, and store the result vector in the destination storage location.
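Illustrative sketch (not taken from the patent): interpreting floating-point elements according to a supplied number of exponent bits, number of significand bits, and exponent bias might look like the following. It assumes an IEEE-like layout (sign, then exponent, then stored fraction), counts only the stored fraction bits as significand bits, and treats every exponent code as finite (no infinity or NaN encodings). The function names are hypothetical.

```python
def decode_float(bits: int, exp_bits: int, sig_bits: int, bias: int) -> float:
    """Interpret a raw bit pattern as a float with a caller-specified format.

    Assumptions: sign | exponent | fraction layout, implicit leading one for
    normal values, subnormals when the exponent field is zero, and no special
    encodings for infinity or NaN.
    """
    sign = -1.0 if (bits >> (exp_bits + sig_bits)) & 1 else 1.0
    exponent = (bits >> sig_bits) & ((1 << exp_bits) - 1)
    fraction = (bits & ((1 << sig_bits) - 1)) / (1 << sig_bits)
    if exponent == 0:
        # Subnormal: no implicit leading one, fixed exponent of 1 - bias.
        return sign * fraction * 2.0 ** (1 - bias)
    return sign * (1.0 + fraction) * 2.0 ** (exponent - bias)


def vector_add(src1, src2, exp_bits: int, sig_bits: int, bias: int):
    """Decode two source vectors with the same format values and add them elementwise."""
    if len(src1) != len(src2):
        raise ValueError("source vectors must have the same length")
    return [
        decode_float(x, exp_bits, sig_bits, bias) + decode_float(y, exp_bits, sig_bits, bias)
        for x, y in zip(src1, src2)
    ]
```

With these assumptions, the same bit pattern yields different values depending on the format values passed in: `decode_float(0x3F80, 8, 7, 127)` reads the pattern as BF16, while `decode_float(0x3C00, 5, 10, 15)` reads a half-precision pattern; both evaluate to `1.0`.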
-
Publication No.: US12106098B2
Publication date: 2024-10-01
Application No.: US18129119
Filing date: 2023-03-31
Inventors: Hyun Pil Kim, Hyun Woo Sim, Seong Woo Ahn
IPC classification: G06F9/30, G06F1/3287, G06T1/20
CPC classification: G06F9/3001, G06F1/3287, G06F9/30101, G06T1/20, G06T2207/20024, G06T2207/20164
Abstract: A semiconductor device including a first processor having a first register, the first processor configured to perform region of interest (ROI) calculations using the first register; and a second processor having a second register, the second processor configured to perform arithmetic calculations using the second register. The first register is shared with the second processor, and the second register is shared with the first processor.
-
Publication No.: US12099848B2
Publication date: 2024-09-24
Application No.: US17389118
Filing date: 2021-07-29
Applicant: NVIDIA Corporation
CPC classification: G06F9/3877, G06F9/3001, G06F9/3555, G06F9/545
Abstract: Apparatuses, systems, and techniques to receive, by a processor of a computer system, one or more operations for a kernel; automatically generate, by the processor, one or more operators that perform the one or more operations on elements of one or more input data structures; and automatically generate, by the processor, the kernel that comprises the one or more operators.
-
Publication No.: US12086205B2
Publication date: 2024-09-10
Application No.: US17211627
Filing date: 2021-03-24
Applicant: Intel Corporation
Inventors: Chunhui Mei, Hong Jiang, Jiasheng Chen, Yongsheng Liu, Yan Li
CPC classification: G06F17/16, G06F7/5443, G06F9/3001, G06F9/30043, G06F15/8046, G06F17/11
Abstract: Matrix multiply units can take advantage of input sparsity by zero gating ALUs, which saves power consumption, but compute throughput does not increase. To improve compute throughput from sparsity, processing resources in a matrix accelerator can skip computation with zero involved in input or output. If zeros in input can be skipped, the processing units can focus calculations on generating meaningful non-zero output.
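Illustrative sketch (not taken from the patent): the idea of skipping computation on zero inputs, as opposed to merely gating it, can be shown with a software matrix multiply that never issues a multiply-accumulate when either operand is zero. The function name and the returned operation count are assumptions added so that the saved work is visible; a hardware matrix accelerator would realize the skipping very differently.

```python
def zero_skipping_matmul(a, b):
    """Multiply matrices a (rows x inner) and b (inner x cols), skipping zero operands.

    Returns (result, macs), where macs counts the multiply-accumulates that
    were actually performed. A dense multiply would perform rows * inner * cols.
    """
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("incompatible matrix shapes")
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0.0] * cols for _ in range(rows)]
    macs = 0
    for i in range(rows):
        for k in range(inner):
            a_ik = a[i][k]
            if a_ik == 0:
                continue  # a zero in a skips an entire row of b
            for j in range(cols):
                b_kj = b[k][j]
                if b_kj == 0:
                    continue  # a zero in b skips this single product
                result[i][j] += a_ik * b_kj
                macs += 1
    return result, macs
```

For two 2x2 diagonal matrices, such as `[[1, 0], [0, 2]]` and `[[3, 0], [0, 4]]`, this performs 2 multiply-accumulates instead of the 8 a dense loop would issue.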
-
Publication No.: US12073214B2
Publication date: 2024-08-27
Application No.: US17952001
Filing date: 2022-09-23
Applicant: Intel Corporation
Inventors: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson
CPC classification: G06F9/3001, G06F7/483, G06F7/5443, G06F9/30036, G06F9/30109, G06F9/30112, G06F9/3893
Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.
-
Publication No.: US12066975B2
Publication date: 2024-08-20
Application No.: US17429291
Filing date: 2020-03-14
Applicant: Intel Corporation
Inventors: Altug Koker, Lakshminarayanan Striramassarma, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Sean Coleman, Varghese George, K Pattabhiraman, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, Joydeep Ray, S Jayakrishna P, Prasoonkumar Surti
IPC classification: G06F12/00, G06F7/544, G06F7/575, G06F7/58, G06F9/30, G06F9/38, G06F9/50, G06F12/02, G06F12/06, G06F12/0802, G06F12/0804, G06F12/0811, G06F12/0862, G06F12/0866, G06F12/0871, G06F12/0875, G06F12/0882, G06F12/0888, G06F12/0891, G06F12/0893, G06F12/0895, G06F12/0897, G06F12/1009, G06F12/128, G06F15/78, G06F15/80, G06F17/16, G06F17/18, G06T1/20, G06T1/60, H03M7/46, G06N3/08, G06T15/06
CPC classification: G06F15/7839, G06F7/5443, G06F7/575, G06F7/588, G06F9/3001, G06F9/30014, G06F9/30036, G06F9/3004, G06F9/30043, G06F9/30047, G06F9/30065, G06F9/30079, G06F9/3887, G06F9/5011, G06F9/5077, G06F12/0215, G06F12/0238, G06F12/0246, G06F12/0607, G06F12/0802, G06F12/0804, G06F12/0811, G06F12/0862, G06F12/0866, G06F12/0871, G06F12/0875, G06F12/0882, G06F12/0888, G06F12/0891, G06F12/0893, G06F12/0895, G06F12/0897, G06F12/1009, G06F12/128, G06F15/8046, G06F17/16, G06F17/18, G06T1/20, G06T1/60, H03M7/46, G06F9/3802, G06F9/3818, G06F9/3867, G06F2212/1008, G06F2212/1021, G06F2212/1044, G06F2212/302, G06F2212/401, G06F2212/455, G06F2212/60, G06N3/08, G06T15/06
Abstract: Embodiments are generally directed to cache structure and utilization. An embodiment of an apparatus includes one or more processors including a graphics processor; a memory for storage of data for processing by the one or more processors; and a cache to cache data from the memory; wherein the apparatus is to provide for dynamic overfetching of cache lines for the cache, including receiving a read request and accessing the cache for the requested data, and upon a miss in the cache, overfetching data from memory or a higher level cache in addition to fetching the requested data, wherein the overfetching of data is based at least in part on a current overfetch boundary, and provides for data to be prefetched extending to the current overfetch boundary.
-
Publication No.: US20240272907A1
Publication date: 2024-08-15
Application No.: US18438097
Filing date: 2024-02-09
Inventors: Sofiane LANDI, Enea DIMROCI
IPC classification: G06F9/30
CPC classification: G06F9/3012, G06F9/3001
Abstract: A register bank includes a plurality of without-reset registers. The register bank has a write input, a write-enable input, and a write-address input coupled to the plurality of without-reset registers. The register bank has a plurality of operating modes, including an initialization mode of operation and a write mode of operation. In the initialization mode of operation, the register bank responds to receipt of a write-enable signal on the write-enable input by storing initialization data received on the write input into a register of the first plurality of without-reset registers based on a write-address signal received on the write-address input.
-
Publication No.: US20240256828A1
Publication date: 2024-08-01
Application No.: US18601739
Filing date: 2024-03-11
Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
CPC classification: G06N3/04, G06F17/153, G06F17/16, G06N3/08, G06T9/002, G06F9/3001
Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.
-