摘要:
Techniques and mechanisms for performing in-memory computations with circuitry having a pipeline architecture. In an embodiment, various stages of a pipeline each include a respective input interface and a respective output interface, distinct from said input interface, to couple to different respective circuitry. These stages each further include a respective array of memory cells and circuitry to perform operations based on data stored by said array. A result of one such in-memory computation may be communicated from one pipeline stage to a respective next pipeline stage for use in further in-memory computations. Control circuitry, interconnect circuitry, configuration circuitry or other logic of the pipeline precludes operation of the pipeline as a monolithic, general-purpose memory device. In other embodiments, stages of the pipeline each provide a different respective layer of a neural network.
摘要:
Techniques and mechanisms for configuring a memory device to perform a sequence of in-memory computations. In an embodiment, a memory device includes a memory array and circuitry, coupled thereto, to perform data computations based on the data stored at the memory array. Based on instructions received at the memory device, control circuitry is configured to enable an automatic performance of a sequence of operations. In another embodiment, the memory device is coupled in an in-series arrangement of other memory devices to provide a pipeline circuit architecture. The memory devices each function as a respective stage of the pipeline circuit architecture, where the stages each perform respective in-memory computations. Some or all such stages each provide a different respective layer of a neural network.
摘要:
The present disclosure is directed to systems and methods of implementing an analog neural network using a pipelined SRAM architecture (“PISA”) circuitry disposed in on-chip processor memory circuitry. The on-chip processor memory circuitry may include processor last level cache (LLC) circuitry. One or more physical parameters, such as a stored charge or voltage, may be used to permit the generation of an in-memory analog output using a SRAM array. The generation of an in-memory analog output using only word-line and bit-line capabilities beneficially increases the computational density of the PISA circuit without increasing power requirements. Thus, the systems and methods described herein beneficially leverage the existing capabilities of on-chip SRAM processor memory circuitry to perform a relatively large number of analog vector/tensor calculations associated with execution of a neural network, such as a recurrent neural network, without burdening the processor circuitry and without significant impact to the processor power requirements.
摘要:
The present disclosure is directed to systems and methods of implementing a neural network using in-memory, bit-serial, mathematical operations performed by a pipelined SRAM architecture (bit-serial PISA) circuitry disposed in on-chip processor memory circuitry. The on-chip processor memory circuitry may include processor last level cache (LLC) circuitry. The bit-serial PISA circuitry is coupled to PISA memory circuitry via a relatively high-bandwidth connection to beneficially facilitate the storage and retrieval of layer weights by the bit-serial PISA circuitry during execution. Direct memory access (DMA) circuitry transfers the neural network model and input data from system memory to the bit-serial PISA memory and also transfers output data from the PISA memory circuitry to system memory circuitry. Thus, the systems and methods described herein beneficially leverage the on-chip processor memory circuitry to perform a relatively large number of vector/tensor calculations without burdening the processor circuitry.
摘要:
The present disclosure is directed to systems and methods of bit-serial, in-memory, execution of at least an nth layer of a multi-layer neural network in a first on-chip processor memory circuitry portion contemporaneous with prefetching and storing layer weights associated with the (n+1)st layer of the multi-layer neural network in a second on-chip processor memory circuitry portion. The storage of layer weights in on-chip processor memory circuitry beneficially decreases the time required to transfer the layer weights upon execution of the (n+1)st layer of the multi-layer neural network by the first on-chip processor memory circuitry portion. In addition, the on-chip processor memory circuitry may include a third on-chip processor memory circuitry portion used to store intermediate and/or final input/output values associated with one or more layers included in the multi-layer neural network.
摘要:
The present disclosure is directed to systems and methods of implementing a neural network using in-memory mathematical operations performed by pipelined SRAM architecture (PISA) circuitry disposed in on-chip processor memory circuitry. A high-level compiler may be provided to compile data representative of a multi-layer neural network model and one or more neural network data inputs from a first high-level programming language to an intermediate domain-specific language (DSL). A low-level compiler may be provided to compile the representative data from the intermediate DSL to multiple instruction sets in accordance with an instruction set architecture (ISA), such that each of the multiple instruction sets corresponds to a single respective layer of the multi-layer neural network model. Each of the multiple instruction sets may be assigned to a respective SRAM array of the PISA circuitry for in-memory execution. Thus, the systems and methods described herein beneficially leverage the on-chip processor memory circuitry to perform a relatively large number of in-memory vector/tensor calculations in furtherance of neural network processing without burdening the processor circuitry.
摘要:
A full-rail digital-read CIM circuit enables a weighted read operation on a single row of a memory array. A weighted read operation captures a value of a weight stored in the single memory array row without having to rely on weighted row-access. Rather, using full-rail access and a weighted sampling capacitance network, the CIM circuit enables the weighted read operation even under process variation, noise and mismatch.
摘要:
Integrated circuits may have arrays of memory elements. Data may be loaded into the memory elements and read from the memory elements using data lines. Address lines may be used to apply address signals to write address transistors and read circuitry. A memory element may include a bistable storage element. Read circuitry may be coupled between the bistable storage element and a data line. The read circuitry may include a data storage node. A capacitor may be coupled between the data storage node and ground and may be used in storing preloaded data from the bistable storage element. The read circuitry may include a transistor that is coupled between the bistable storage element and the data storage node and a transistor that is coupled between the data storage node and the data line.
摘要:
Application service requests received by an application hosting framework are automatically differentiated and categorized, and resource usage patterns associated with the requests are predicted. Resource usage data points are successively extracted from the hosting framework. Elements of an initial resource usage pattern matrix are computed from the data points. An estimate for the number of categories of requests is computed from the initial resource usage pattern matrix, where the requests in each category have similar resource usage patterns. Elements of a resource usage signature matrix and request categorization matrix are computed from the estimate for the number of categories of requests and the initial resource usage pattern matrix.
摘要:
A system and method for providing a handoff of a call from a mobile node. The call has associated state information. The call is established between the mobile node and a first RNN and the mobile node and a first PDSN. Thereafter, the mobile node roams and establishes a link between the mobile node and a second RNN. A second PDSN is selected to service the mobile node. A direct communication path is established between the first PDSN and the second PDSN.State information of the call is exchanged between the first PDSN and the second PDSN using the communication path.