-
1.
公开(公告)号:US20190042237A1
公开(公告)日:2019-02-07
申请号:US16016278
申请日:2018-06-22
申请人: Omid AZIZI , Guy BOUDOUKH , Tony WERNER , Andrew YANG , Michael ROTZIN , Chen KOREN , Eriko NURVITADHI
发明人: Omid AZIZI , Guy BOUDOUKH , Tony WERNER , Andrew YANG , Michael ROTZIN , Chen KOREN , Eriko NURVITADHI
摘要: Disclosed embodiments relate to sparse matrix multiplication (SMM) acceleration using column folding and squeezing. In one example, a processor, in response to a SMM instruction having fields to specify locations of first, second, and output matrices, the second matrix being a sparse matrix, uses execution circuitry to pack the second matrix by replacing one or more zero-valued elements with non-zero elements yet to be processed, each of the replaced elements further including a field to identify its logical position within the second matrix, and, the execution circuitry further to, for each non-zero element at row M and column K of the specified first matrix, generate a product of the element and each corresponding non-zero element at row K, column N of the packed second matrix, and accumulate each generated product with a previous value of a corresponding element at row M and column N of the specified output matrix.
-
公开(公告)号:US20180308206A1
公开(公告)日:2018-10-25
申请号:US15698217
申请日:2017-09-07
申请人: Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu
发明人: Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu
CPC分类号: G06T1/20 , G06F3/0613 , G06F3/0659 , G06F3/0679 , G06F3/1438 , G06N3/0445 , G06N3/0454 , G06N3/063 , G06N3/08 , G06N3/084 , G06T1/60 , G09G5/001 , G09G5/363 , G09G2352/00 , G09G2360/06 , G09G2360/08 , G09G2360/121 , G09G2360/123 , G09G2370/08
摘要: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a memory device including a first integrated circuit (IC) including a plurality of memory channels and a second IC including a plurality of processing units, each coupled to a memory channel in the plurality of memory channels.
-
公开(公告)号:US20180189032A1
公开(公告)日:2018-07-05
申请号:US15394968
申请日:2016-12-30
申请人: ERIKO NURVITADHI , YU WANG , DEBORAH T. MARR
发明人: ERIKO NURVITADHI , YU WANG , DEBORAH T. MARR
IPC分类号: G06F9/44
CPC分类号: G06F8/20 , G06F8/30 , G06F17/5045 , G06F2217/08
摘要: An apparatus and method are described for designing an accelerator for processing sparse data. For example, one embodiment comprises a machine-readable medium having program code stored thereon which, when executed by a processor, causes the processor to perform the operations of: analyzing input graph program code and parameters associated with a target accelerator in view of an accelerator architecture template; responsively mapping the parameters onto the architecture template to implement customizations to the accelerator architecture template; and generating a hardware description representation of the target accelerator based on the determined mapping of the parameters to apply to the accelerator architecture template.
-
公开(公告)号:US20210216318A1
公开(公告)日:2021-07-15
申请号:US17214646
申请日:2021-03-26
IPC分类号: G06F9/30 , G06F30/347 , G06F9/445
摘要: The present disclosure relates to an integrated circuit device that includes a plurality of vector registers configurable to store a plurality of vectors and switch circuitry communicatively coupled to the plurality of vector registers. The switch circuitry is configurable to route a portion of the plurality of vectors. Additionally, the integrated circuit device includes a plurality of vector processing units communicatively coupled to the switch circuitry. The plurality of vector processing units is configurable to receive the portion of the plurality of vectors, perform one or more operations involving the portion of the plurality of vector inputs, and output a second plurality of vectors generated by performing the one or more operations.
-
公开(公告)号:US20200225948A1
公开(公告)日:2020-07-16
申请号:US16833597
申请日:2020-03-28
申请人: Jaewoong SIM , Alaa ALAMELDEEN , Eriko NURVITADHI , Deborah MARR
发明人: Jaewoong SIM , Alaa ALAMELDEEN , Eriko NURVITADHI , Deborah MARR
摘要: An apparatus and method for compressing floating-point values. For example, one embodiment of a processor comprises: instruction fetch circuitry to fetch instructions from a memory, the instructions including floating-point instructions; execution circuitry to execute the floating-point instructions, each floating-point instruction having one or more floating-point operands, each floating-point operand comprising an exponent value and a significand value; floating-point compression circuitry to compress a plurality of the exponent values associated with a corresponding plurality of the floating-point operands, the floating-point compression circuitry comprising: base generation circuitry to evaluate the plurality of the exponent values to generate a first base value; and delta generation circuitry to determine a difference between the plurality of exponent values and the first base value and to generate a corresponding first plurality of delta values, wherein the floating-point compression circuitry is to store the first base value and the corresponding first plurality of delta values as a plurality of compressed exponent values.
-
公开(公告)号:US20180308200A1
公开(公告)日:2018-10-25
申请号:US15494886
申请日:2017-04-24
申请人: Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu
发明人: Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu
CPC分类号: G06T1/20 , G06F8/41 , G06F9/45533 , G06F9/5061 , G06F9/5094 , G06F2009/45583 , G06N3/0445 , G06N3/0454 , G06N3/063 , G06N3/084
摘要: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of execution units (EUs), wherein the plurality of EUs comprise a first EU type and a second EU type
-
公开(公告)号:US20110307688A1
公开(公告)日:2011-12-15
申请号:US13158081
申请日:2011-06-10
申请人: Eriko Nurvitadhi , James C. Hoe
发明人: Eriko Nurvitadhi , James C. Hoe
IPC分类号: G06F9/38
CPC分类号: G06F17/5045
摘要: Computer-implemented methods and systems for synthesizing a hardware description for a pipelined datapath for a digital circuit. A transactional datapath specification framework and a transactional design automation system automatically synthesize pipeline implementations. The transactional datapath specification framework captures an abstract datapath, whose execution semantics is interpreted as a sequence of “transactions” where each transaction reads the state values left by the preceding transaction and computes a new set of state values to be seen by the next transaction. The transactional datapath specification framework exposes sufficient information about state accesses that can occur in a datapath, which is necessary for performing precise data hazards analysis, and eventually pipeline synthesis.
摘要翻译: 用于合成数字电路流水线数据路径的硬件描述的计算机实现的方法和系统。 交易数据路径规范框架和事务设计自动化系统自动合成管道实现。 事务数据路径规范框架捕获抽象数据路径,其执行语义被解释为“事务”序列,其中每个事务读取前一事务留下的状态值,并计算下一个事务要看到的一组新状态值。 事务数据路径规范框架暴露了可能发生在数据路径中的状态访问的足够信息,这对于执行精确的数据危害分析以及最终的流水线合成是必需的。
-
-
-
-
-
-