摘要:
A computer architecture and programming model for high speed processing over broadband networks are provided. The architecture employs a consistent modular structure, a common computing module and uniform software cells. The common computing module includes a control processor, a plurality of processing units, a plurality of local memories from which the processing units process programs, a direct memory access controller and a shared main memory. A synchronized system and method for the coordinated reading and writing of data to and from the shared main memory by the processing units also are provided. A hardware sandbox structure is provided for security against the corruption of data among the programs being processed by the processing units. The uniform software cells contain both data and applications and are structured for processing by any of the processors of the network. Each software cell is uniquely identified on the network.
摘要:
Disclosed is a procedure or design approach for functional modules that may be used in connection with a multiprocessor integrated circuit chip. The approach includes keeping the dimensions of each module substantially the same and having the bus, power, clock and I/O connection configured the same on all modules. Further requirements for ease of use are to generalize the capability of each module as much as possible and to decentralize functions such as testing to be primarily performed within each module. The use of such considerations or rules substantially eases the design of a given type of custom chips, and based upon an initial chip design greatly facilitates the design of further custom chips, similar in application, but subsequent to the successful completion of the initial chip. The standardization modules and replication of the modules on a given chip also reduces physical verification time in initial chip design as well as redesign time of the initial chip when requirements for chip capability are redefined or otherwise changed. Any subsequent or further custom chips can include more or less of specific modules based upon already established parameters.
摘要:
Systems and methods for distributing thread instructions in the pipeline of a multi-threading digital processor are disclosed. More particularly, hardware and software are disclosed for successively selecting threads in an ordered sequence for execution in the processor pipeline. If a thread to be selected cannot execute, then a complementary thread is selected for execution.
摘要:
Disclosed is a procedure or design approach for functional modules that may be used in connection with a multiprocessor integrated circuit chip. The approach includes keeping the dimensions of each module substantially the same and having the bus, power, clock and I/O connection configured the same on all modules. Further requirements for ease of use are to generalize the capability of each module as much as possible and to decentralize functions such as testing to be primarily performed within each module. The use of such considerations or rules substantially eases the design of a given type of custom chips, and based upon an initial chip design greatly facilitates the design of further custom chips, similar in application, but subsequent to the successful completion of the initial chip. The standardization modules and replication of the modules on a given chip also reduces physical verification time in initial chip design as well as redesign time of the initial chip when requirements for chip capability are redefined or otherwise changed. Any subsequent or further custom chips can include more or less of specific modules based upon already established parameters.
摘要:
A program is into at least two object files: one object file for each of the supported processor environments. During compilation, code characteristics, such as data locality, computational intensity, and data parallelism, are analyzed and recorded in the object file. During run time, the code characteristics are combined with runtime considerations, such as the current load on the processors and the size of the data being processed, to arrive at an overall value. The overall value is then used to determine which of the processors will be assigned the task. The values are assigned based on the characteristics of the various processors. For example, if one processor is better at handling intensive computations against large streams of data, programs that are highly computationally intensive and process large quantities of data are weighted in favor of that processor. The corresponding object is then loaded and executed on the assigned processor.
摘要:
The present invention provides for asynchronous DMA command completion notification in a computer system. A command tag, associated with a plurality DMA command is generated. A DMA data movement command having the command tag is grouped with another DMA data movement command having the command tag. DMA commands belonging to the same tag group are monitored to see whether all DMA commands of the same tag group are completed.
摘要:
Disclosed is a coherent cache system that operates in conjunction with non-homogeneous processing units. A set of processing units of a first configuration has conventional cache and directly accesses common or shared system physical and virtual address memory through the use of a conventional MMU (Memory Management Unit). Additional processors of a different configuration and/or other devices that need to access system memory are configured to store accessed data in compatible caches. Each of the caches is compatible with a given protocol coherent memory management bus interspersed between the caches and the system memory.
摘要:
A method and system for executing one or more remote procedure calls. In one embodiment, a method comprises the step of a processing unit issuing a plurality of commands to a corresponding DMA controller. One or more commands of the plurality of commands issued by the processing unit are to copy attached processing unit instructions associated with one or more Attached Processing Unit's (APU's) and data associated with the attached processing unit instructions from the shared memory to one or more APU's. The attached processing unit instructions may include instructions that enable the associated one or more APU's to perform one or more particular operations on the data. The method further comprises the DMA controller issuing an indication to the one or more APU's to perform the one or more operations on the data associated with the attached processing unit instructions. Instead of having the particular APU that completed its operation notify the corresponding processing unit of its completion of the operation, the DMA controller polls a status line of each of the one or more attached processing units to determine if any of the one or more attached processing units completed its operation. The DMA controller then copies the results of the operations after each of the one or more attached processing units completes its operation.
摘要:
An improved cell circuit for data readout with reduced number of read wordlines is provided in a memory block of a multiport memory array. The number of read wordlines is significantly reduced by using a decoder between the read wordlines and a multiplexer in the cell circuit. The memory block has a plurality of address inputs and stores a plurality of write data signals. In the cell circuit, the decoder receives as decoder inputs a subset of the address inputs and outputs a plurality of select signals. The multiplexer is coupled to the decoder to receive the select signals and select one of the write data signals based on the select signals. Additionally, the read wordlines are coupled to the decoder for carrying the subset of the address inputs to the decoder.
摘要:
A system including a central processor and a plurality of attached processors all on a single die are disclosed. Each of the attached processors is preferably functionally equivalent to each of the other attached processors. The system further includes at least one redundant processor that is connectable to the central processor. The redundant processor may be substantially equivalent to each of the attached processors. Upon detecting a failure in one of the attached processors, the system is configured to disable the non-functional processor and enable the redundant processor. The attached processors may be connected to a memory interface unit via a parallel bus or a pipelined bus in which each attached processor is connected to a stage of the pipelined bus. The attached processors may each include a load/store unit and logic suitable for performing a mathematical function.