摘要:
An apparatus for controlling rounding modes in a single instruction multiple data (SIMD) floating-point unit is disclosed. The SIMD floating-point unit includes a floating-point status-and-control register (FPSCR) having a first rounding mode bit field and a second rounding mode bit field. The SIMD floating-point unit also includes means for generating a first slice and a second slice. During a floating-point operation, the SIMD floating-point unit concurrently performs a first rounding operation on the first slice and a second rounding operation on the second slice according to a bit in the first rounding mode bit field and a bit in the second rounding mode bit field within the FPSCR, respectively.
摘要:
A program is into at least two object files: one object file for each of the supported processor environments. During compilation, code characteristics, such as data locality, computational intensity, and data parallelism, are analyzed and recorded in the object file. During run time, the code characteristics are combined with runtime considerations, such as the current load on the processors and the size of the data being processed, to arrive at an overall value. The overall value is then used to determine which of the processors will be assigned the task. The values are assigned based on the characteristics of the various processors. For example, if one processor is better at handling intensive computations against large streams of data, programs that are highly computationally intensive and process large quantities of data are weighted in favor of that processor. The corresponding object is then loaded and executed on the assigned processor.
摘要:
Systems and methods for distributing thread instructions in the pipeline of a multi-threading digital processor are disclosed. More particularly, hardware and software are disclosed for successively selecting threads in an ordered sequence for execution in the processor pipeline. If a thread to be selected cannot execute, then a complementary thread is selected for execution.
摘要:
The circuit library available for logic synthesis is limited to a single dynamic circuit block or logic synthesis block. The circuit design method includes first defining the logic synthesis block and then performing logic synthesis for a predetermined logical operation to be implemented. The logic synthesis step constrained to the single logic synthesis block produces an intermediate circuit design which necessarily comprises a series of dynamic circuit blocks, each associated with a single reset signal. Once the intermediate circuit is produced, the circuit design method includes eliminating unnecessary devices from the intermediate circuit to produce a final logic circuit, and then sizing the devices in the final circuit to complete the design.
摘要:
High-precision floating-point function estimates are split in two instructions each: a low precision table lookup instruction and a linear interpolation instruction. Estimates of different functions can be implemented using this scheme: A separate table-lookup instruction is provided for each different function, while only a single interpolation instruction is needed, since the single interpolation instruction can perform the interpolation step for any of the functions to be estimated. Thus, significantly less overhead is incurred than would be incurred with specialized hardware, while still maintaining a uniform FPU latency, which allows for much simpler control logic.
摘要:
A computer architecture and programming model for high speed processing over broadband networks are provided. The architecture employs a consistent modular structure, a common computing module and uniform software cells. The common computing module includes a control processor, a plurality of processing units, a plurality of local memories from which the processing units process programs, a direct memory access controller and a shared main memory. A synchronized system and method for the coordinated reading and writing of data to and from the shared main memory by the processing units also are provided. A hardware sandbox structure is provided for security against the corruption of data among the programs being processed by the processing units. The uniform software cells contain both data and applications and are structured for processing by any of the processors of the network. Each software cell is uniquely identified on the network.
摘要:
The circuit library available for logic synthesis is limited to a single dynamic circuit block or logic synthesis block. The circuit design method includes first defining the logic synthesis block (16) and then performing logic synthesis (17) for a predetermined logical operation to be implemented. The logic synthesis step constrained to the single logic synthesis block produces an intermediate circuit design (29) which necessarily comprises a series of dynamic circuit blocks, each associated with a single reset signal. Once the intermediate circuit (29) is produced, the circuit design method includes eliminating unnecessary devices (46) from the intermediate circuit (29) to produce a final logic circuit, and then sizing the devices (48) in the final circuit to complete the design.
摘要:
A first device is operable to communicate on an bus according to a first protocol. A bridge is also operable to communicate on the bus according to the first protocol. A second device is coupled to the bus via the bridge and operable to communicate according to a second protocol. The bridge has a memory for holding data received from the second device and is operable to translate from the second to the first protocol. The second device sends write data responsive to receiving a ready signal from the bridge, and includes memory for holding the write data that the second device has sent, but for which completion has not been signaled. The second device re-sends the write data from the memory responsive to receiving a non-completion signal via the bridge, and releases the memory for the data responsive to receiving a completion signal via the bridge.
摘要:
The present invention provides a system for managing cache replacement eligibility. A first address register is configured to request an address from an L1 cache. An L1 cache is configured to determine whether a requested address is in the L1 cache and, in response to a determination that a requested address is not in the L1 cache, is further configured to transmit the requested address to a range register coupled to the L1 cache. The range register is configured to generate a class identifier in response to a received requested address and to transmit the requested address and class identifier to a replacement management table coupled to the range register. The replacement management table is configured to generate L2 tag replacement control indicia in response to a received requested address and class identifier. An L2 address register is coupled to the first address register and configured to request an address from an L2 cache. An L2 cache is coupled to the L2 address register and the replacement management table and is configured to determine whether a requested address is in the L2 cache and is further configured to assign replacement eligibility of at least one set of cache lines in the L2 cache in response to received L2 tag replacement control indicia. In response to a determination that a requested address is not in the L2 cache, the L2 cache is further configured to overwrite a cache line within a set of the L2 cache as a function of the replacement eligibility.
摘要:
A method and system for attached processing units accessing a shared memory in an SMT system. In one embodiment, a system comprises a shared memory. The system further comprises a plurality of processing elements coupled to the shared memory. Each of the plurality of processing elements comprises a processing unit, a direct memory access controller and a plurality of attached processing units. Each direct memory access controller comprises an address translation mechanism thereby enabling each associated attached processing unit to access the shared memory in a restricted manner without an address translation mechanism. Each attached processing unit is configured to issue a request to an associated direct memory access controller to access the shared memory specifying a range of addresses to be accessed as virtual addresses. The associated direct memory access controller is configured to translate the range of virtual addresses into an associated range of physical addresses.