-
Publication No.: US10459727B2
Publication date: 2019-10-29
Application No.: US14986465
Filing date: 2015-12-31
IPC classification: G06F9/30, G06F9/38, G06F12/0875, G06F9/32, G06N3/02
Abstract: Loop code processor optimizations are implemented as a loop optimizer extension to a processor pipeline. The loop optimizer generates optimized code associated with code loops that include at least one zero-optimizable instruction. The loop optimizer may generate multiple versions of optimized code associated with a particular code loop, where each of the multiple versions of optimized code has a different associated condition under which the optimized code can be safely executed.
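The idea of keeping several optimized versions of one loop, each guarded by the condition under which it is safe, can be illustrated with a rough software analogy. The sketch below is not the patented hardware mechanism: the loop body, the function names, and the `VERSIONS` table are all hypothetical, and it assumes finite numeric inputs (with infinities or NaN, dropping a multiplication by zero would change the result).

```python
from typing import Callable, List, Sequence, Tuple

Number = float


def original_loop(a: Number, b: Number, x: Sequence[Number], y: Sequence[Number]) -> List[Number]:
    # Unoptimized loop: out[i] = a * x[i] + b * y[i]
    return [a * xi + b * yi for xi, yi in zip(x, y)]


def loop_a_zero(a: Number, b: Number, x: Sequence[Number], y: Sequence[Number]) -> List[Number]:
    # Safe only when a == 0: the a * x[i] term contributes nothing.
    return [b * yi for _, yi in zip(x, y)]


def loop_b_zero(a: Number, b: Number, x: Sequence[Number], y: Sequence[Number]) -> List[Number]:
    # Safe only when b == 0: the b * y[i] term contributes nothing.
    return [a * xi for xi, _ in zip(x, y)]


def loop_both_zero(a: Number, b: Number, x: Sequence[Number], y: Sequence[Number]) -> List[Number]:
    # Safe only when a == 0 and b == 0: every output element is zero.
    return [0 for _ in zip(x, y)]


# Each optimized version is paired with the condition under which it may run.
# The most specific condition is listed first.
VERSIONS: List[Tuple[Callable[[Number, Number], bool], Callable[..., List[Number]]]] = [
    (lambda a, b: a == 0 and b == 0, loop_both_zero),
    (lambda a, b: a == 0, loop_a_zero),
    (lambda a, b: b == 0, loop_b_zero),
]


def run_loop(a: Number, b: Number, x: Sequence[Number], y: Sequence[Number]) -> List[Number]:
    # Pick the first version whose condition holds; otherwise run the original loop.
    for condition, version in VERSIONS:
        if condition(a, b):
            return version(a, b, x, y)
    return original_loop(a, b, x, y)
```

Calling `run_loop(0, 2, x, y)` takes the `loop_a_zero` path, while `run_loop(3, 2, x, y)` falls back to the original loop because no condition holds.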
-
Publication No.: US12056604B2
Publication date: 2024-08-06
Application No.: US16024369
Filing date: 2018-06-29
Inventors: Vivek Seshadri, Amar Phanishayee, Deepak Narayanan, Aaron Harlap, Nikhil Devanur Rangarajan
Abstract: Layers of a deep neural network (DNN) are partitioned into stages using a profile of the DNN. Each of the stages includes one or more of the layers of the DNN. The partitioning of the layers of the DNN into stages is optimized in various ways including optimizing the partitioning to minimize training time, to minimize data communication between worker computing devices used to train the DNN, or to ensure that the worker computing devices perform an approximately equal amount of the processing for training the DNN. The stages are assigned to the worker computing devices. The worker computing devices process batches of training data using a scheduling policy that causes the workers to alternate between forward processing of the batches of the DNN training data and backward processing of the batches of the DNN training data. The stages can be configured for model parallel processing or data parallel processing.
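One of the objectives named in the abstract, giving each worker an approximately equal share of the processing, can be sketched as a small dynamic program. This is only an illustration under stated assumptions: `layer_times` is a hypothetical per-layer profile, stages are assumed to be contiguous runs of layers, and communication cost and data-parallel replication are ignored. It is not the partitioning procedure claimed in the patent.

```python
from typing import List


def partition_layers(layer_times: List[float], num_stages: int) -> List[List[int]]:
    """Split layers into contiguous stages so the slowest stage is as fast as possible."""
    n = len(layer_times)
    if not 1 <= num_stages <= n:
        raise ValueError("num_stages must be between 1 and the number of layers")

    # prefix[i] = total profiled time of layers 0 .. i-1
    prefix = [0.0]
    for t in layer_times:
        prefix.append(prefix[-1] + t)

    inf = float("inf")
    # best[s][i]: smallest achievable bottleneck when the first i layers form s stages
    best = [[inf] * (n + 1) for _ in range(num_stages + 1)]
    # cut[s][i]: index where the last of those s stages begins
    cut = [[0] * (n + 1) for _ in range(num_stages + 1)]
    best[0][0] = 0.0

    for s in range(1, num_stages + 1):
        for i in range(s, n + 1):
            for j in range(s - 1, i):
                # Last stage holds layers j .. i-1; earlier layers use s-1 stages.
                cost = max(best[s - 1][j], prefix[i] - prefix[j])
                if cost < best[s][i]:
                    best[s][i] = cost
                    cut[s][i] = j

    # Walk the cut table backwards to recover the layer indices of each stage.
    stages: List[List[int]] = []
    i = n
    for s in range(num_stages, 0, -1):
        j = cut[s][i]
        stages.append(list(range(j, i)))
        i = j
    stages.reverse()
    return stages
```

With profiled times `[1, 2, 3, 4, 5]` and two stages, the sketch places layers 0 to 2 in the first stage and layers 3 and 4 in the second, so the slowest stage takes 9 time units rather than 10 or more.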
-
Publication No.: US20170192793A1
Publication date: 2017-07-06
Application No.: US14986463
Filing date: 2015-12-31
CPC classification: G06F9/3867, G06F9/3001, G06F9/30021, G06F9/3802, G06F9/3869, G06F9/3873
Abstract: Efficient instruction processing for sparse data includes extensions to a processor pipeline to identify zero-optimizable instructions that include at least one zero input operand, and bypass the execute stage of the processor pipeline, determining the result of the operation without executing the instruction. When possible, the extensions also bypass the writeback stage of the processor pipeline.
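A toy register-machine model can make the two bypasses concrete. Everything here is an assumption made for illustration: the `Instr` type, the two supported operations, integer registers, and the rule that writeback is skipped when the destination already holds the shortcut result. The abstract does not specify when writeback can be skipped, so that rule is a guess at one plausible case rather than the patent's design.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Instr:
    op: str    # "add" or "mul" (the only operations this sketch models)
    dst: str   # destination register name
    src1: str  # first source register name
    src2: str  # second source register name


def zero_shortcut(instr: Instr, regs: Dict[str, int]) -> Optional[int]:
    """Return the result if it follows from a zero operand alone, else None."""
    a, b = regs[instr.src1], regs[instr.src2]
    if instr.op == "mul" and (a == 0 or b == 0):
        return 0          # anything times zero is zero
    if instr.op == "add" and a == 0:
        return b          # zero plus b is b
    if instr.op == "add" and b == 0:
        return a          # a plus zero is a
    return None


def new_stats() -> Dict[str, int]:
    return {"executed": 0, "execute_bypassed": 0, "writeback_bypassed": 0}


def step(instr: Instr, regs: Dict[str, int], stats: Dict[str, int]) -> None:
    if instr.op not in ("add", "mul"):
        raise ValueError(f"unsupported op: {instr.op}")

    result = zero_shortcut(instr, regs)
    if result is None:
        # No zero operand helps: run the (simulated) execute stage.
        a, b = regs[instr.src1], regs[instr.src2]
        result = a + b if instr.op == "add" else a * b
        stats["executed"] += 1
    else:
        stats["execute_bypassed"] += 1
        # Illustrative assumption: if the destination already holds the
        # result (e.g. add r1, r1, r2 with r2 == 0), skip writeback too.
        if regs[instr.dst] == result:
            stats["writeback_bypassed"] += 1
            return

    regs[instr.dst] = result  # simulated writeback stage
```

For example, with `r2 == 0`, `mul r3, r1, r2` skips only the execute stage, whereas `add r1, r1, r2` skips both stages because `r1` already holds the result.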
-
Publication No.: US10592252B2
Publication date: 2020-03-17
Application No.: US14986463
Filing date: 2015-12-31
Abstract: Efficient instruction processing for sparse data includes extensions to a processor pipeline to identify zero-optimizable instructions that include at least one zero input operand, and bypass the execute stage of the processor pipeline, determining the result of the operation without executing the instruction. When possible, the extensions also bypass the writeback stage of the processor pipeline.
-
Publication No.: US20170192896A1
Publication date: 2017-07-06
Application No.: US14986470
Filing date: 2015-12-31
IPC classification: G06F12/08
CPC classification: G06F12/0875, G06F9/30181, G06F9/30192, G06F9/381, G06F9/3836, G06F12/0848, G06F12/0888, G06F12/0893, G06F12/128, G06F2212/1016, G06F2212/1044, G06F2212/1048, G06F2212/401, G06F2212/452, G06N3/08
Abstract: A zero cache memory system extension includes a zero cache to store cache tags associated with zero cache lines, while a corresponding data cache stores cache tags and data bytes associated with non-zero cache lines. As non-zero data is written to the cache, cache lines may be moved from the zero cache to the data cache. Similarly, as zero data is written to the cache, cache lines may be moved from the data cache to the zero cache.
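The movement of lines between the two structures can be sketched as a small software model. The class name, the 64-byte `LINE_SIZE`, and the whole-line write interface are assumptions for illustration; capacity limits, eviction, and partial-line writes are left out, so this is a simplified picture rather than the patented design.

```python
from typing import Dict, Optional, Set

LINE_SIZE = 64  # assumed cache-line size in bytes


class ZeroAwareCache:
    def __init__(self) -> None:
        # Zero cache: tags only, because the line contents are known to be all zero.
        self.zero_tags: Set[int] = set()
        # Data cache: tag plus the actual bytes of each non-zero line.
        self.data_lines: Dict[int, bytes] = {}

    def write_line(self, tag: int, data: bytes) -> None:
        if len(data) != LINE_SIZE:
            raise ValueError("data must be exactly one cache line")
        if any(data):
            # Non-zero data: the line lives in (or moves to) the data cache.
            self.zero_tags.discard(tag)
            self.data_lines[tag] = data
        else:
            # All-zero data: the line lives in (or moves to) the zero cache.
            self.data_lines.pop(tag, None)
            self.zero_tags.add(tag)

    def read_line(self, tag: int) -> Optional[bytes]:
        if tag in self.zero_tags:
            return bytes(LINE_SIZE)      # synthesize zeros; nothing was stored
        return self.data_lines.get(tag)  # None models a miss in both caches
```

Writing an all-zero line stores only its tag; a later non-zero write to the same tag moves the line into the data cache, and a subsequent zero write moves it back.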
-
Publication No.: US20170192787A1
Publication date: 2017-07-06
Application No.: US14986465
Filing date: 2015-12-31
CPC classification: G06F9/30181, G06F9/30192, G06F9/325, G06F9/381, G06F9/3873, G06F12/0875, G06F2212/452, G06N3/08
Abstract: Loop code processor optimizations are implemented as a loop optimizer extension to a processor pipeline. The loop optimizer generates optimized code associated with code loops that include at least one zero-optimizable instruction. The loop optimizer may generate multiple versions of optimized code associated with a particular code loop, where each of the multiple versions of optimized code has a different associated condition under which the optimized code can be safely executed.