Abstract:
A computer processor with indirect only branching is disclosed. The computer processor may include one or more target registers. The computer processor may include processing logic in signal communication with the one or more target registers. The processing logic may execute a non-interrupting branch instruction based on a value stored in a target register of the one or more target registers. The non-interrupting branch instruction may use the one or more target registers to specify a destination address of a branch specified by the non-interrupting branch instruction.
Abstract:
A chaining bit decoder of a computer processor receives an instruction stream. The chaining bit decoder selects a group of instructions from the instruction stream. The chaining bit decoder extracts a designated bit from each instruction of the instruction stream to produce a sequence of chaining bits. The chaining bit decoder decodes the sequence of chaining bits. The chaining bit decoder identifies zero or more instruction stream dependencies among the selected group of instructions in view of the decoded sequence of chaining bits. The chaining bit decoder outputs control signals to cause one or more pipelines stages of the processor to execute the selected group of instructions in view of the identified zero or more instruction stream dependencies among the group sequence of instructions.
Abstract:
A processor includes a vector register file including vector registers, at least one buffer register, and a vector processing core to receive a vector instruction comprising a first identifier representing a first vector register of the vector registers, and a second identifier representing a second vector register of the vector registers, wherein the first vector register is a source register and the second vector register is a destination register, execute the vector instruction based on data values stored in the first vector register to generate a result and store the result in the at least one buffer register, and copy the result from the at least one buffer register to the second vector register.
Abstract:
A computer processor comprising a vector unit is disclosed. The vector unit may comprise a vector register file comprising at least one register to hold a varying number of elements. The vector unit may further comprise a vector length register file comprising at least one register to specify the number of operations of a vector instruction to be performed on the varying number of elements in the at least one register of the vector register file. The computer processor may be implemented as a monolithic integrated circuit.
Abstract:
A processor includes a register file comprising a length register, a vector register file comprising a plurality of vector registers, a mask register file comprising a plurality of mask registers, and a vector instruction execution circuit to execute a masked vector instruction comprising a first length register identifier representing the length register, a first vector register identifier representing a first vector register of the vector register file, and a first mask register identifier representing a first mask register of the mask register file, wherein the length register is to store a length value representing a number of operations to be applied to data elements stored in the first vector register, the first mask register is to store a plurality of mask bits, and a first mask bit of the plurality of mask bits determines whether a corresponding first one of the operations causes an effect.
Abstract:
A system and an accelerator circuit including a register file comprising instruction registers to store a trigonometric calculation instruction for evaluating a trigonometric function, and data registers comprising a first data register to store a floating-point input value associated with the trigonometric calculation instruction. The accelerator circuit further includes a determination circuit to identify the trigonometric calculation function and the floating-point input value associated with the trigonometric calculation instruction and determine whether the floating-point input value is in a small value range, and an approximation circuit to responsive to determining that the floating-point input value is in the small value, receive the floating-point input value and calculate an approximation of the trigonometric function with respect to the input value.
Abstract:
A system and an accelerator circuit includes an internal memory to store data received a memory associated with a processor and a filter circuit block comprising a plurality of circuit stripes, each circuit stripe including a filter processor, a plurality of filter circuits, and a slice of the internal memory assigned to the plurality of filter circuits, where the filter processor is to execute a filter instruction to read data values from the internal memory based on a first memory address, for each of the plurality of circuit stripes: load the data values in weight registers and input registers associated with the plurality of filter circuits of the circuit stripe to generate a plurality of filter results, and write a result generated using the plurality of filter circuits in the internal memory at a second memory address.
Abstract:
A system and method include an accelerator circuit comprising an input circuit block, a filter circuit block, a post-processing circuit block, and an output circuit block and a processor to initialize the accelerator circuit, determining tasks of a neural network application to be performed by at least one of the input circuit block, the filter circuit block, the post-processing circuit block, or the output circuit block, assign each of the tasks to a corresponding one of the input circuit block, the filter circuit block, the post-processing circuit block, or the output circuit block, instruct the accelerator circuit to perform the tasks, and execute the neural network application based on results received from the accelerator circuit completing performance of the tasks.
Abstract:
A system and method relating to object detection using multiple sensor devices include receiving a range data comprising a plurality of points, each of plurality of points being associated with an intensity value and a depth value, determining, based on the intensity values and depth values of the plurality of points, abounding box surrounding a cluster of points among the plurality of points, receiving a video image comprising an array of pixels, determining a region in the video image corresponding to the bounding box, and applying a first neural network to the region to determine an object captured by the range data and the video image.
Abstract:
A computer processor may include a plurality of hardware threads. The computer processor may further include state processor logic for a state of a hardware thread. The state processor logic may include per thread logic that contains state that is replicated in each hardware thread of the plurality of hardware threads and common logic that is independent of each hardware thread of the plurality of hardware threads. The computer processor may further include single threaded mode logic to execute instructions in a single threaded mode from only one hardware thread of the plurality of hardware threads. The computer processor may further include second mode logic to execute instructions in a second mode from more than one hardware thread of the plurality of hardware threads simultaneously. The computer processor may further include switching mode logic to switch between the first mode and the second mode.