Abstract:
A system and an accelerator circuit including a register file comprising instruction registers to store a trigonometric calculation instruction for evaluating a trigonometric function, and data registers comprising a first data register to store a floating-point input value associated with the trigonometric calculation instruction. The accelerator circuit further includes a determination circuit to identify the trigonometric calculation function and the floating-point input value associated with the trigonometric calculation instruction and determine whether the floating-point input value is in a small value range, and an approximation circuit to responsive to determining that the floating-point input value is in the small value, receive the floating-point input value and calculate an approximation of the trigonometric function with respect to the input value.
Abstract:
A processor including a vector register file comprising a plurality of vector registers, at least one buffer register, and a vector processing core, communicatively connected to the vector register file and the at least one buffer register, to receive a vector instruction comprising a first identifier representing a first vector register of the plurality of vector registers, and a second identifier representing a second vector register of the plurality of vector registers, wherein the first vector register is a source register and the second vector register is a destination register, execute the vector instruction based on data values stored in the first vector register to generate a result and store the result in the at least one buffer register, and responsive to determining that the second vector register is safe to write, copy the result from the at least one buffer register to the second vector register.
Abstract:
A processor comprising a cache, the cache comprising a cache line, an execution unit to execute an atomic primitive to responsive to executing a read instruction to retrieve a data item from a memory location, cause to store a copy of the data item in the cache line, execute a lock instruction to lock the cache line to the processor, execute at least one instruction while the cache line is locked to the processor, and execute an unlock instruction to cause the cache controller to release the cache line from the processor.
Abstract:
A computer processor is disclosed. The computer processor may comprise a vector unit comprising a vector register file comprising one or more registers to hold a varying number of elements. The computer processor may further comprise processing logic configured to implicitly type each of the varying number of elements in the vector register file. The computer processor may be implemented as a monolithic integrated circuit.
Abstract:
A computer processor is disclosed. The computer processor may comprises a vector unit comprising a vector register file comprising at least one register to hold a varying number of elements. The computer processor may further comprise processing logic configured to operate on the varying number of elements in the vector register file using one or more instructions that produce results with elements of widths different than that of the input elements. The computer processor may be implemented as a monolithic integrated circuit.
Abstract:
A processor includes a plurality of physical registers and a processor core, communicatively coupled to the plurality of physical registers, the processor core to execute a process comprising a plurality of instructions to responsive to issuance of a call instruction for out-of-order execution, identify, based on a head pointer of the plurality of physical registers, a first physical register of the plurality of physical registers, store a return address in the first physical register, wherein the first physical register is associated with a first identifier, store, based on an out-of-order pointer of a call stack associated with the process, the first identifier in a first entry of the call stack, and increment, modulated by a length of the call stack, the out-of-order pointer of the call stack to point to a second entry of the call stack.
Abstract:
A computer processor may include a plurality of hardware threads. The computer processor may further include state processor logic for a state of a hardware thread. The state processor logic may include per thread logic that contains state that is replicated in each hardware thread of the plurality of hardware threads and common logic that is independent of each hardware thread of the plurality of hardware threads. The computer processor may further include single threaded mode logic to execute instructions in a single threaded mode from only one hardware thread of the plurality of hardware threads. The computer processor may further include second mode logic to execute instructions in a second mode from more than one hardware thread of the plurality of hardware threads simultaneously. The computer processor may further include witching mode logic to switch between the first mode and the second mode.
Abstract:
A computer processor with an address register file is disclosed. The computer processor may include a memory. The computer processor may further include a general purpose register file comprising at least one general purpose register. The computer processor may further include an address register file comprising at least one address register. The computer processor may further include having access to the memory, the general purpose register file, and the address register file. The processing logic may execute a memory access instruction that accesses one or more memory locations in the memory at one or more corresponding addresses computed by retrieving the value of an address register of the at least one register of the address register file specified in the instruction and adding a displacement value encoded in the instruction.
Abstract:
A computer processor is disclosed. The computer processor may comprise a vector unit comprising a vector register file comprising at least one register to hold a varying number of elements. The computer processor may further comprise processing logic configured to operate on the varying number of elements in the vector register file using one or more graphics processing instructions. The computer processor may be implemented as a monolithic integrated circuit.
Abstract:
A system and an accelerator circuit including a register file comprising instruction registers to store an instruction for evaluating an elementary function, and data registers comprising a first data register to store an input value. The accelerator circuit further includes a successive cumulative rotation circuit comprising a reconfigurable inner stage to perform a successive cumulative rotation recurrence, and a determination circuit to determine a type of the elementary function based on the instruction, and responsive to determining that the input value is a fixed-point number, configure the reconfigurable inner stage to a configuration for evaluating the type of the elementary function, wherein the successive cumulative rotation circuit is to calculate an evaluation of the elementary function using the reconfigurable inner stage performing the successive cumulative rotation recurrence.