摘要:
According to the invention, a process for averaging two pixel values is disclosed. In one step, an instruction is decoded. A plurality of first operands is loaded from a first input register. A plurality of second operands is loaded from a second input register. An average of one of the plurality of first operands and one of the plurality of second operands is produced. The average is stored in an output register.
摘要:
According to the invention, a processing core is disclosed that includes a first source register, a number of second operands, a destination register, and a number of arithmetic processors. A bitwise inverter is coupled to at least one of the first number of operands and the second number of operands. The first source register includes a plurality of first operands and the destination register includes a plurality of results. The number of arithmetic processors are respectively coupled to the first operands, second operands and results, wherein each arithmetic processor computes one of a sum and a difference of the first operand and a respective second operand.
摘要:
According to the invention, a processing core is disclosed. The processing core includes one or more processing pipelines and a number of register flies. The processing pipelines having a total of N-number of processing paths, where each of the processing paths processes instructions on M-bit data words. Each of the number of register files has Q-number of registers that are each M-bits wide. The Q-number of registers within each of the plurality of register files are either private or global registers. When a value is written to one of said Q-number of said registers, which is a global register within one of said number of register files, the value is propagated to a corresponding global register in the other of the number of register files. When a value is written to one of said Q-number of the registers, which is a private register within one of said number of register files, the value is not propagated to a corresponding register in the other of said number of register files.
摘要:
A microprocessor with reduced context switching overhead and a corresponding method is disclosed. The microprocessor comprises a working register file that comprises dirty bit registers and working registers. The working registers including one or more corresponding working registers for each of the dirty bit registers. The microprocessor also comprises a decoder unit that is configured to decode an instruction that has a dirty bit register field specifying a selected dirty bit register of the dirty bit registers. The decoder unit is configured to generate decode signals in response. Furthermore, the working register file is configured to cause the selected dirty bit register to store a new dirty bit in response to the decode signals. The new dirty bit indicates that each operand stored by the one or more corresponding working registers is inactive and no longer needs to be saved to memory if a new context switch occurs.
摘要:
An integrated processor/memory device comprising a main memory, a CPU, and a full width cache. The main memory comprises main memory banks. Each of the main memory banks stores rows of words. The rows are a predetermined number of words wide. The cache comprises cache banks. Each of the cache banks stores one or more cache lines of words. Each of the cache lines has a corresponding row in the corresponding main memory bank. The cache lines are the predetermined number of words wide. When the CPU issues an address in the address space of the corresponding main memory bank, the cache bank determines from the address and the tags of the cache lines whether a cache bank hit or a cache miss has occurred in the cache bank. When a cache bank miss occurs, the cache bank replaces a victim cache line of the cache lines with a new cache line that comprises the corresponding row of the corresponding memory bank specified by the issued address.
摘要:
In one embodiment, an apparatus includes a network management module configured to execute at a network device operatively coupled to a switch fabric. The network management module is configured to receive a first set of configuration information associated with a subset of network resources from a set of network resources, the set of network resources being included in a virtual local area network from a plurality of virtual local area networks, the plurality of virtual local area networks being defined within the switch fabric. The first set of configuration information dynamically includes at least a second set of configuration information associated with the set of network resources.
摘要:
An apparatus and method for mapping memory addresses to reduce or avoid conflicting memory accesses in memory systems such as cache memories is described in connection with a multithreaded multiprocessor chip. A CMT processor reduces the probability of hot-spots in cache operations by hashing certain bits of a physical cache address to form a hashed cache address. By using exclusive OR functionality to hash the index bits, an efficient address transformation is achieved for indexing into an L2 cache memory.
摘要:
A method and apparatus for delivering a device driver to an operating system without user intervention. One or more operating systems (e.g., different operating system programs, different versions of one operating system) execute on a computer platform. During booting of an operating system a device is identified for which a driver is needed. The driver is requested from a service processor of the platform, which includes memory or storage for storing multiple device drivers (or multiple versions of one driver, for different operating systems). The driver is retrieved from the service processor's storage and delivered to the operating system.
摘要:
A plurality of processors on a chip is operated in lockstep. A crossbar switch on the chip couples and decouples the plurality of processors to a plurality of banks in a level-two (L2) cache. As data is stored in a first bank of the L2 cache, the old data at that location is passed through the crossbar switch to a second bank of the L2 cache that is functioning as a first-in-first-out memory (FIFO). Thus, new data is cached at a location in the first bank of the level-two cache, i.e., stored, and old data, from that location, is logged in the second bank of the level-two cache. The logged data in the second bank is used to restore the first bank to a known prior state when necessary.
摘要:
A processing core comprising R-number of processing pipelines each comprising N-number of processing paths. Each of the R-number of processing pipelines are synchronized together to operate as a single very long instruction word (VLIW) processing core. The VLIW processing core is configured to process R×N-number of VLIW sub-instructions in parallel. In addition, the R-number of pipelines can be configured to operate independently as separately operating pipelines. In accordance with one embodiment of the present invention, each of the R-number of processing pipelines comprises S-number of register files, such that the processing core comprises R×S-number of register files. In accordance with another embodiment of the present invention, each of the R-number of processing pipelines comprises one register file for every two of the N-number of processing paths, such that S=N/2. In accordance with yet another embodiment of the invention, a single VLIW processing instruction comprises R×N-number of P-bit sub-instructions appended together.