Abstract:
Technology for providing data to a processing unit is disclosed. A computer processor may be divided into a master processing unit and consumer processing units. The master processing unit at least partially decodes a machine instruction and determines whether data is needed to execute the machine instruction. The master processing unit sends a request to memory for the data. The request may indicate that the data is to be sent from the memory to a consumer processing unit. The data sent by the memory in response to the request may be stored in local read storage that is close to the consumer processing unit for fast access. The master processing unit may also provide the machine instruction to the consumer processing unit. The consumer processing unit may access the data from the local read storage and execute the machine instruction based on the accessed data.
Abstract:
A system and method of hardware multithreading in VLIW DSPs includes an instruction fetch and dispatch unit, a plurality of program control units coupled to the instruction fetch and dispatch unit, a plurality of function units coupled to the plurality of program control units, and a mode control unit coupled to the function units and the program control units, the mode control unit configured to dynamically organize the plurality of function units and the plurality of program control units into one or more threads, each thread comprising a program control of the plurality of program control units and a subset of the plurality of function units.
Abstract:
Methods and apparatus for inter-process communication are provided. A circuit may have a plurality of clusters, and at least one cluster may have a computation element (CE), a memory operatively coupled with the CE, and an autonomic transport system (ATS) block operatively coupled with the CE and the memory. The ATS block may be configured to perform inter-process communication (IPC) for the at least one cluster. In one embodiment, the ATS block may transfer a message to a different cluster based on a request from the CE. In another embodiment, the ATS block may receive a message by allocating a buffer in the memory and write the message into the buffer. The ATS block may also be configured to manage synchronization and schedule tasks for the CE.
Abstract:
An apparatus comprises a central processor that outputs a first control signal to data organizers that organizes and moves data and a second control signal to vector processors that receives a first and second set of data from the data organizers. A first vector processor includes a first instruction circuit that executes a first plurality of vector functions and a second instruction circuit that executes a second plurality of vector functions. A first vector function is selected from the first plurality of vector functions to process the first set of data in response to the second control signal. Similarly, a second vector function is selected from the second plurality of vector functions to process the second set of data in response to the second control signal.
Abstract:
An apparatus and method are provided for allocating resources to a plurality of threads to perform a service. In use, a request for service is received. At least one of a plurality of resources is allocated to the threads. Further, the service is performed with the threads, utilizing the allocated at least one resource.
Abstract:
An apparatus and method are provided for allocating resources to a plurality of threads to perform a service. In use, a request for service is received. At least one of a plurality of resources is allocated to the threads. Further, the service is performed with the threads, utilizing the allocated at least one resource.
Abstract:
An embodiment of a system and method for performing a numerical operation on input data in a hybrid floating-point format includes representing input data as a sign bit, exponent bits, and mantissa bits. The exponent bits are represented as an unsigned integer including an exponent bias, and a signed numerical value of zero is represented as a first reserved combination of the mantissa bits and the exponent bits. Each of all other combinations of the mantissa bits and the exponent bits represents a real finite non-zero number. The mantissa bits are operated on with a “one” bit before a radix point for the all other combinations of the mantissa bits and the exponent bits.
Abstract:
A method for scaling a plurality of data values includes storing a first subset of data values of the plurality of data values into a first vector register, determining a maximum data value of the first subset of data values, and storing the greater of the maximum data value and a value stored in a scalar register to the scalar register. Each data value of the subset of data values is stored in a different element of the first vector register. The method further includes determining an adjustment factor based on the value stored in the scalar register and adjusting each data value of the plurality of data values by the adjustment factor.
Abstract:
Technology for providing data to a processing unit is disclosed. A computer processor may be divided into a master processing unit and consumer processing units. The master processing unit at least partially decodes a machine instruction and determines whether data is needed to execute the machine instruction. The master processing unit sends a request to memory for the data. The request may indicate that the data is to be sent from the memory to a consumer processing unit. The data sent by the memory in response to the request may be stored in local read storage that is close to the consumer processing unit for fast access. The master processing unit may also provide the machine instruction to the consumer processing unit. The consumer processing unit may access the data from the local read storage and execute the machine instruction based on the accessed data.
Abstract:
An embodiment of a system and method for performing a numerical operation on input data in a hybrid floating-point format includes representing input data as a sign bit, exponent bits, and mantissa bits. The exponent bits are represented as an unsigned integer including an exponent bias, and a signed numerical value of zero is represented as a first reserved combination of the mantissa bits and the exponent bits. Each of all other combinations of the mantissa bits and the exponent bits represents a real finite non-zero number. The mantissa bits are operated on with a “one” bit before a radix point for the all other combinations of the mantissa bits and the exponent bits.