摘要:
An integrated circuit comprises an external memory, a plurality of parallel connected Vector Processing Engines (VPEs), and an External Memory Unit (EMU) providing a data transfer path between the VPEs and the external memory. Each VPE contains a plurality of data processing units and a message queuing system adapted to transfer messages between the data processing units and other components of the integrated circuit.
摘要:
An integrated circuit comprises an external memory, a plurality of parallel connected Vector Processing Engines (VPEs), and an External Memory Unit (EMU) providing a data transfer path between the VPEs and the external memory. Each VPE contains a plurality of data processing units and a message queuing system adapted to transfer messages between the data processing units and other components of the integrated circuit.
摘要:
A system for performing data-parallel operations and task-parallel operations. A first switch fabric node (SFN) includes first and second lane processing engines (LPEs). The first LPE includes a first set of lane processing units (LPUs) configured to perform data-parallel operations, where each LPU performs a set of operations, and each LPU uses a different set of data for the set of operations, and each LPU within the first set of LPUs uses a different set of data for the set of operations. The second LPE includes a second set of LPUs configured to perform task-parallel operations, where each LPU performs a different set of operations. A processing control engine (PCE) is configured to distribute instructions and data to the first LPE and the second LPE. Advantageously, data parallel operations and task parallel operations are able to be performed on the same processor simultaneously.
摘要:
One embodiment of the invention sets forth a hardware-based physics processing unit (PPU) having unique architecture designed to efficiently generate physics data. The PPU includes a PPU control engine (PCE), a data movement engine and a floating point engine (FPE). The PCE manages the overall operation of the PPU by allocating memory resources and transmitting graphics processing commands to the FPE and data movement commands to the DME. The FPE includes multiple vector processors that operate in parallel and perform floating point operations on data received from a host unit to generate physics simulation data. The DME facilitates the transmission of data between the host unit and the FPE by performs data movement operations between memories internal and external to the PPU.
摘要:
A method and device for controlling bandwidth distribution through a switch fabric is provided wherein a plurality of line cards and processor cards are connected through a switch fabric for parallel processing of transmission requests, along with the provision of transmission “credits” allowing for transmitting additional data bytes during a given cycle, which provides efficient and speedy bandwidth distribution, as well as resolution of output contentions. The processors maintain a credit balance which allows flexibility in granting transmission requests to accommodate transmission scheduling and “bursty” transmissions. Processors on both of the line cards and the processor cards normalize the data transmission requirements for both inputs and outputs connected by the switch fabric. Smoothing of data transmission is provided using a time-weighted buffer.
摘要:
A method of providing physics data within a game program or simulation using a hardware-based physics processing unit having unique architecture designed to efficiently calculate physics related data.
摘要:
A method of providing physics data within a game program or simulation using a hardware-based physics processing unit having unique architecture designed to efficiently calculate physics related data.
摘要:
A method of operating a Linear Complementarity Problem (LCP) solver is disclosed, where the LCP solver is characterized by multiple execution units operating in parallel to implement a competent computational method adapted to resolve physics-based LCPs in real-time.
摘要:
An efficient quasi-custom instruction set for Physics Processing Unit (PPU) is enabled by balancing the dictates of a parallel arrangement of multiple, independent vector processors and programming considerations. A hierarchy of multiple, programmable memories and distributed control over data transfer is presented.
摘要:
A method of operating a Linear Complementarity Problem (LCP) solver is disclosed, where the LCP solver is characterized by multiple execution units operating in parallel to implement a competent computational method adapted to resolve physics-based LCPs in real-time.