NEURAL NETWORK UNIT WITH MIXED DATA AND WEIGHT SIZE COMPUTATION CAPABILITY

    Publication Number: US20180165575A1

    Publication Date: 2018-06-14

    Application Number: US15372555

    Application Date: 2016-12-08

    IPC Classification: G06N3/08 G06N3/063

    Abstract: In a neural network unit, each neural processing unit (NPU) of an array of N NPUs receives respective first and second upper and lower bytes of 2N bytes received from first and second RAMs. In a first mode, each NPU sign-extends the first upper byte to form a first 16-bit word and performs an arithmetic operation on the first 16-bit word and a second 16-bit word formed by the second upper and lower bytes. In a second mode, each NPU sign-extends the first lower byte to form a third 16-bit word and performs the arithmetic operation on the third 16-bit word and the second 16-bit word formed by the second upper and lower bytes. In a third mode, each NPU performs the arithmetic operation on a fourth 16-bit word formed by the first upper and lower bytes and the second 16-bit word formed by the second upper and lower bytes.
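
    A minimal Python sketch of the three operand-formation modes the abstract describes follows. The function names, the signed interpretation of the second word, and the use of plain integer arithmetic are illustrative assumptions, not the patented circuit.

        def sign_extend_byte(b):
            """Sign-extend an 8-bit value to a signed 16-bit word."""
            return b - 0x100 if b & 0x80 else b

        def to_signed16(w):
            """Interpret a 16-bit value as a signed word."""
            return w - 0x10000 if w & 0x8000 else w

        def npu_operands(mode, first_hi, first_lo, second_hi, second_lo):
            """Form the two 16-bit operands one NPU operates on, per the selected mode."""
            second = to_signed16((second_hi << 8) | second_lo)  # always a full word
            if mode == 1:    # first mode: sign-extend the first upper byte
                first = sign_extend_byte(first_hi)
            elif mode == 2:  # second mode: sign-extend the first lower byte
                first = sign_extend_byte(first_lo)
            else:            # third mode: full 16-bit word from both first bytes
                first = to_signed16((first_hi << 8) | first_lo)
            return first, second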

    NEURAL NETWORK UNIT WITH MEMORY LAYOUT TO PERFORM EFFICIENT 3-DIMENSIONAL CONVOLUTIONS

    Publication Number: US20180157962A1

    Publication Date: 2018-06-07

    Application Number: US15366041

    Application Date: 2016-12-01

    IPC Classification: G06N3/04 G06N3/063 G06N3/08

    Abstract: A neural network unit convolves an H×W×C input with F R×S×C filters to generate F Q×P outputs. N processing units (PUs) each have a register that receives a respective word of an N-word row of a second memory and a multiplexed register that selectively receives either a respective word of an N-word row of a first memory or the word rotated from an adjacent PU's multiplexed register. H rows of the first memory hold input blocks of B words, each holding the channels of a respective 2-dimensional input row slice. R×S×C rows of the second memory hold filter blocks of B words, each holding P copies of a filter weight. B is the smallest factor of N greater than W. The PU blocks multiply-accumulate input blocks and filter blocks in column-channel-row order; they read a row of input blocks and rotate it around the N PUs while performing multiply-accumulate operations, so that each PU block receives each input block before another row is read.
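
    The abstract defines B as the smallest factor of N greater than W; a direct Python rendering of that definition (the function name is assumed):

        def smallest_block_factor(n, w):
            """Return the smallest factor of n that is greater than w."""
            for b in range(w + 1, n + 1):
                if n % b == 0:
                    return b
            raise ValueError("w must be less than n")

    For example, with N = 1024 PUs and input width W = 12, the factors of 1024 above 12 start at 16, so B = 16 and each memory row holds N / B = 64 blocks.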

    PROCESSOR WITH PROGRAMMABLE PREFETCHER

    Publication Number: US20170161196A1

    Publication Date: 2017-06-08

    Application Number: US15372045

    Application Date: 2016-12-07

    IPC Classification: G06F12/0862 G06F12/0855

    Abstract: A processor includes a front end, at least one load pipeline, and a memory system that includes a programmable prefetcher for prefetching information from an external memory. The front end converts fetched program instructions into microinstructions, including load microinstructions, and dispatches them for execution. The load pipeline executes dispatched load microinstructions and provides load requests to the memory system. The programmable prefetcher includes a load monitor, a programmable prefetch engine, and a prefetch requester. The load monitor tracks the load requests. The prefetch engine is configured to be programmed by at least one prefetch program to operate as a programmed prefetcher, such that during operation of the processor, the programmed prefetcher generates at least one prefetch address based on the load requests issued by the processor. The prefetch requester submits the at least one prefetch address to prefetch information from the memory system.
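
    One simple "prefetch program" such an engine could be loaded with is a stride detector. The Python model below is a hedged sketch of that idea only; the class name, the prefetch degree, and the single-stream tracking are invented for illustration and are not the patented engine.

        class StridePrefetchProgram:
            """Generates prefetch addresses from the stream of tracked load requests."""

            def __init__(self, degree=2):
                self.last_addr = None
                self.last_stride = None
                self.degree = degree  # how many addresses to prefetch ahead

            def observe_load(self, addr):
                """Called per load request; returns addresses for the prefetch requester."""
                prefetches = []
                if self.last_addr is not None:
                    stride = addr - self.last_addr
                    if stride != 0 and stride == self.last_stride:
                        # Two consecutive identical strides: predict the pattern continues.
                        prefetches = [addr + stride * k
                                      for k in range(1, self.degree + 1)]
                    self.last_stride = stride
                self.last_addr = addr
                return prefetches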

    NEURAL NETWORK UNIT THAT PERFORMS CONVOLUTIONS USING COLLECTIVE SHIFT REGISTER AMONG ARRAY OF NEURAL PROCESSING UNITS

    Publication Number: US20170103311A1

    Publication Date: 2017-04-13

    Application Number: US15090722

    Application Date: 2016-04-05

    IPC Classification: G06N3/08 G06N3/04

    Abstract: A neural network unit has a first memory that holds elements of a data matrix and a second memory that holds elements of a convolution kernel. Each neural processing unit (NPU) in an array has a multiplexed register that receives a corresponding element of a row from the first memory and that also receives the multiplexed register output of an adjacent NPU, and a register that receives a corresponding element of a row from the second memory. An arithmetic unit receives the outputs of the register, the multiplexed register, and an accumulator and performs a multiply-accumulate operation on them. For each sub-matrix of a plurality of sub-matrices of the data matrix, each arithmetic unit selectively receives either the element from the first memory or the adjacent NPU's multiplexed register output and performs a series of the multiply-accumulate operations to accumulate into the accumulator a convolution of the sub-matrix with the convolution kernel.
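
    The collective rotation can be modeled as below: each NPU multiply-accumulates the word currently in its multiplexed register, then the whole row of words rotates by one position. The rotation direction and the per-step weight schedule are assumptions made for illustration.

        def rotate_and_mac(data_row, weight_schedule):
            """data_row: one word per NPU; weight_schedule[t][i]: NPU i's weight at step t."""
            n = len(data_row)
            acc = [0] * n                  # one accumulator per NPU
            regs = list(data_row)          # the multiplexed registers
            for step_weights in weight_schedule:
                for i in range(n):
                    acc[i] += regs[i] * step_weights[i]  # multiply-accumulate
                regs = regs[1:] + regs[:1]  # rotate each word to the adjacent NPU
            return acc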

    PROCESSOR WITH HYBRID COPROCESSOR/EXECUTION UNIT NEURAL NETWORK UNIT

    Publication Number: US20170103307A1

    Publication Date: 2017-04-13

    Application Number: US15090798

    Application Date: 2016-04-05

    IPC Classification: G06N3/063 G06N3/04

    Abstract: A processor includes a front-end portion that issues instructions to execution units that execute the issued instructions. A hardware neural network unit (NNU) execution unit includes a first memory that holds data words associated with artificial neural network (ANN) neuron outputs, a second memory that holds weight words associated with connections between ANN neurons, and a third memory that holds a program comprising NNU instructions that are distinct, with respect to their instruction set, from the instructions issued to the NNU by the front-end portion of the processor. The program performs ANN-associated computations on the data and weight words. A first instruction instructs the NNU to transfer NNU instructions of the program from architectural general purpose registers to the third memory. A second instruction instructs the NNU to invoke the program stored in the third memory.
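
    The two architectural instructions amount to a write path and an invoke path into the third memory. The Python model below is schematic; representing an NNU instruction as a callable, and the class and method names, are assumptions for illustration.

        class NNUModel:
            def __init__(self):
                self.data_ram = []     # first memory: neuron-output data words
                self.weight_ram = []   # second memory: connection weight words
                self.program_ram = []  # third memory: the NNU program

            def write_program(self, gpr_words):
                """First instruction: move NNU instructions from GPRs into program memory."""
                self.program_ram.extend(gpr_words)

            def invoke(self):
                """Second instruction: run the program stored in the third memory."""
                for nnu_insn in self.program_ram:
                    nnu_insn(self)  # in this toy model an NNU instruction is a callable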

    MECHANISM TO PRECLUDE I/O-DEPENDENT LOAD REPLAYS IN AN OUT-OF-ORDER PROCESSOR

    Publication Number: US20160342414A1

    Publication Date: 2016-11-24

    Application Number: US14889199

    Application Date: 2014-12-14

    IPC Classification: G06F9/22 G06F1/32 G06F12/0875

    Abstract: An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction and indicates on a hold bus whether the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station is coupled to the hold bus and dispatches one or more younger micro instructions that depend on the load micro instruction for execution a number of clock cycles after dispatch of the load micro instruction; if the hold bus indicates that the load micro instruction is the specified load micro instruction, the second reservation station stalls dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand. The prescribed resources include an input/output (I/O) unit configured to perform I/O operations via an I/O bus that couples the out-of-order processor to I/O resources.
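
    The dispatch policy reduces to choosing the cycle at which a dependent micro instruction may issue. In the sketch below the fixed load-to-use latency is an invented placeholder; real values are implementation specific.

        LOAD_TO_USE_CYCLES = 4  # assumed fixed latency for an on-core cache hit

        def dependent_dispatch_cycle(load_dispatch_cycle, hold_bus_asserted,
                                     operand_ready_cycle):
            """Cycle at which a younger micro instruction dependent on the load may dispatch."""
            if hold_bus_asserted:           # e.g. a load directed at an I/O resource
                return operand_ready_cycle  # stall until the operand has been retrieved
            return load_dispatch_cycle + LOAD_TO_USE_CYCLES  # speculate on a cache hit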

    HARDWARE DATA COMPRESSOR WITH MULTIPLE STRING MATCH SEARCH HASH TABLES EACH BASED ON DIFFERENT HASH SIZE

    Publication Number: US20160336961A1

    Publication Date: 2016-11-17

    Application Number: US14883068

    Application Date: 2015-10-14

    IPC Classification: H03M7/42

    CPC Classification: H03M7/42 H03M7/3084

    Abstract: A hardware data compressor. A hardware engine maintains first and second hash tables while it scans an input block of characters to be compressed. The first hash table is indexed by a hash of N characters of the input block; the second hash table is indexed by a hash of M characters of the input block. M is greater than two, and N is greater than M. The engine uses the first hash table to search the input block behind a current search target location for a match of at least N characters at that location and, when the first hash table yields no such match, uses the second hash table to search the input block behind the current search target location for a match of at least M characters.
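
    The two-table search order can be sketched as below: try the longer-hash (N-character) table first and fall back to the M-character table only when the first yields nothing. N = 4 and M = 3 satisfy the abstract's constraints (M greater than two, N greater than M) but are assumed values, and for simplicity the lookup key here is the literal substring rather than a hash of it.

        def find_match(block, pos, table_n, table_m, n=4, m=3):
            """Return (match_pos, match_len) for an earlier occurrence, or None."""
            for table, length in ((table_n, n), (table_m, m)):
                key = bytes(block[pos:pos + length])
                # Candidates are earlier positions whose characters produced this key.
                for candidate in table.get(key, ()):
                    if bytes(block[candidate:candidate + length]) == key:
                        return candidate, length
            return None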
