Abstract:
A system includes a memory, a processor, and an accelerator circuit. The accelerator circuit includes an internal memory, an input circuit block, a filter circuit block, a post-processing circuit block, and an output circuit block to concurrently perform tasks of a neural network application assigned to the accelerator circuit by the processor.
Abstract:
A computer processor is disclosed. The computer processor may comprise a vector unit comprising a vector register file comprising at least one register to hold a varying number of elements. The computer processor may further comprise processing logic configured to operate on the varying number of elements in the vector register file using one or more complex arithmetic instructions. The computer processor may be implemented as a monolithic integrated circuit.
Abstract:
A computer processor that implements pre-translation of virtual addresses is disclosed. The computer processor may include a register file comprising one or more registers. The computer processor may include processing logic. The processing logic may receive a value to store in a register of one or more registers. The processing logic may store the value in the register. The processing logic may designate the received value as a virtual address, the virtual address having a corresponding virtual base page number. The processing logic may translate the virtual base page number to a corresponding real base page number and zero or more real page numbers corresponding to zero or more virtual page numbers adjacent to the virtual base page number. The processing logic may further store in the register of the one or more registers the real base page number and the zero or more real page numbers.
Abstract:
A processor includes a plurality of physical registers and a processor core, communicatively coupled to the plurality of physical registers, the processor core to execute a process comprising a plurality of instructions to responsive to issuance of a call instruction for out-of-order execution, identify, based on a head pointer of the plurality of physical registers, a first physical register of the plurality of physical registers, store a return address in the first physical register, wherein the first physical register is associated with a first identifier, store, based on an out-of-order pointer of a call stack associated with the process, the first identifier in a first entry of the call stack, and increment, modulated by a length of the call stack, the out-of-order pointer of the call stack to point to a second entry of the call stack.
Abstract:
A computer processor is disclosed. The computer processor may comprise a vector unit comprising a vector register file comprising one or more registers to hold a varying number of elements. The computer processor may further comprise processing logic configured to implicitly type each of the varying number of elements in the vector register file. The computer processor may be implemented as a monolithic integrated circuit.
Abstract:
A computer processor with register direct branches and employing an instruction preload structure is disclosed. The computer processor may include a hierarchy of memories comprising a first memory organized in a structure having one or more entries for one or more addresses corresponding to one or more instructions. The one or more entries of the one or more addresses may have a starting address. The structure may have one or more locations for storing the one or more instructions. The computer processor may include one or more registers to which one or more corresponding instruction addresses are writable. The computer processor may include processing logic. In response to the processing logic writing the one or more instruction addresses to the one or more registers, the processing logic may to pre-fetch the one or more instructions of a linear sequence of instructions from a first memory level of the hierarchy of memories into a second memory level of the hierarchy of memories beginning at the starting address. At least one address of the one or more addresses may be the contents of a register of the one or more registers.
Abstract:
A computer processor is disclosed. The computer processor comprises a vector unit comprising a vector register file comprising at least one vector register to hold a varying number of elements. The number of architected vector registers in the vector register file differs from the number of physical vector registers in the vector register file.
Abstract:
A computer processor is disclosed. The computer processor may comprise a vector unit comprising a vector register file comprising at least one register to hold a varying number of elements. The computer processor may further comprise processing logic configured to operate on the varying number of elements in the vector register file using one or more graphics processing instructions. The computer processor may be implemented as a monolithic integrated circuit.
Abstract:
A system and an accelerator circuit including a register file comprising instruction registers to store a trigonometric calculation instruction for evaluating a trigonometric function, and data registers comprising a first data register to store a floating-point input value associated with the trigonometric calculation instruction. The accelerator circuit further includes a determination circuit to identify the trigonometric calculation function and the floating-point input value associated with the trigonometric calculation instruction and determine whether the floating-point input value is in a small value range, and an approximation circuit to responsive to determining that the floating-point input value is in the small value, receive the floating-point input value and calculate an approximation of the trigonometric function with respect to the input value.
Abstract:
A system and an accelerator circuit including a register file comprising instruction registers to store an instruction for evaluating an elementary function, and data registers comprising a first data register to store an input value. The accelerator circuit further includes a successive cumulative rotation circuit comprising a reconfigurable inner stage to perform a successive cumulative rotation recurrence, and a determination circuit to determine a type of the elementary function based on the instruction, and responsive to determining that the input value is a fixed-point number, configure the reconfigurable inner stage to a configuration for evaluating the type of the elementary function, wherein the successive cumulative rotation circuit is to calculate an evaluation of the elementary function using the reconfigurable inner stage performing the successive cumulative rotation recurrence.