Abstract:
Embodiments of the invention provide a programming model for CPU-GPU platforms. In particular, embodiments of the invention provide a uniform programming model for both integrated and discrete devices. The model also works uniformly for multiple GPU cards and hybrid GPU systems (discrete and integrated). This allows software vendors to write a single application stack and target it to all the different platforms. Additionally, embodiments of the invention provide a shared memory model between the CPU and GPU. Instead of sharing the entire virtual address space, only a part of the virtual address space needs to be shared. This allows efficient implementation in both discrete and integrated settings.
Abstract:
In a system comprising a transactional memory architecture, a transactional-memory-based transaction is initiated and then, within the transaction, a lock is checked; if the lock is free, a critical section is executed.
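The sequence in this abstract (start a transaction, read the lock inside it, run the critical section only if the lock is free) can be illustrated with a small software simulation. The sketch below is not the patented mechanism and does not use hardware transactional memory: `ToyTM`, `Tx`, `Abort`, and `run_elided` are hypothetical names, and the version-check commit is an assumed stand-in for hardware conflict detection.

```python
import threading

class Abort(Exception):
    """Raised when a toy transaction cannot commit."""

class ToyTM:
    """Hypothetical software stand-in for a transactional memory."""
    def __init__(self, **cells):
        self.cells = dict(cells)
        self.versions = {name: 0 for name in cells}
        self._guard = threading.Lock()

    def store(self, name, value):
        # Non-transactional write: bumps the version so readers abort.
        with self._guard:
            self.cells[name] = value
            self.versions[name] = self.versions.get(name, 0) + 1

class Tx:
    """One transaction: buffered writes plus a versioned read set."""
    def __init__(self, tm):
        self.tm, self.reads, self.writes = tm, {}, {}

    def read(self, name):
        if name in self.writes:
            return self.writes[name]
        with self.tm._guard:
            self.reads.setdefault(name, self.tm.versions[name])
            return self.tm.cells[name]

    def write(self, name, value):
        self.writes[name] = value  # invisible to others until commit

    def commit(self):
        with self.tm._guard:
            # Abort if anything in the read set changed since it was read.
            if any(self.tm.versions[n] != v for n, v in self.reads.items()):
                raise Abort()
            for n, v in self.writes.items():
                self.tm.cells[n] = v
                self.tm.versions[n] = self.tm.versions.get(n, 0) + 1

def run_elided(tm, critical_section):
    """Return True if the critical section committed without taking the lock."""
    tx = Tx(tm)                    # initiate the transaction
    if tx.read("lock") != 0:       # check the lock inside the transaction
        return False               # lock is held: do not run transactionally
    critical_section(tx)           # lock is free: execute the critical section
    try:
        tx.commit()
    except Abort:
        return False
    return True
```

Because the lock word is in the transaction's read set, a thread that acquires the lock while the transaction is running forces it to abort. A caller would typically fall back to acquiring the lock conventionally when `run_elided` returns `False`; that fallback is an assumption and is not stated in the abstract.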
Abstract:
A system for extending the functionality of a host device using a smart flash storage device comprises the host device, which has a host interface and is configured to perform a specific function to generate a first set of data. The host device is coupled with a flash storage device. The flash storage device is configured to conform to a flash memory interface. A set of data generated by the host device is to be stored in flash memory storage of the flash storage device. A processor of the flash storage device is configured to run one or more user applications to process the set of data. The processor is to operate using power supplied by the host device.
Abstract:
In an embodiment of the invention, a technique includes assigning a first pointer to an address of an array and using the first pointer to identify a first location of first data of the array. The first pointer is used to locate at least one additional pointer to identify at least one additional location of additional data of the array.
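The abstract does not say how the additional pointers are located. One possible reading, shown here purely as an illustration, is that they are derived from the first pointer by a fixed stride, as with the rows of a two-dimensional array stored contiguously. In this sketch "pointers" are modeled as integer indices into a flat Python list, and `row_pointers`, `rows`, and `cols` are hypothetical names.

```python
def row_pointers(first_ptr, rows, cols):
    """Derive one pointer per row from the first pointer (assumed row-major layout)."""
    return [first_ptr + r * cols for r in range(rows)]

# Flat "memory" holding a 3 x 4 array; addresses are list indices.
memory = list(range(100, 112))

first_ptr = 0                       # first pointer: address of the array
ptrs = row_pointers(first_ptr, rows=3, cols=4)

first_data = memory[first_ptr]      # first location of first data
second_row_start = memory[ptrs[1]]  # additional location via an additional pointer
```

Other layouts, such as a table of pointers stored at the address the first pointer refers to, would fit the abstract's wording equally well.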
Abstract:
A hierarchical software profiling mechanism that gathers hierarchical path profile information is described. Software to be profiled is instrumented with instructions that save an outer path sum when an inner region is entered, and restore the outer path sum when the inner region is exited. While the inner region is being executed, an inner path sum is generated, and a profile indicator representing the inner path traversed is updated before the outer path sum is restored. The software to be profiled is instrumented using information from augmented control flow graphs that represent the software.
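The save and restore steps can be sketched by hand-instrumenting a small function. The example below is a minimal illustration, not the described tool: the edge increments (1 and 2) are assumed values chosen so that each path yields a distinct sum, and `PathProfile` and `run_instrumented` are hypothetical names. A real instrumenter would derive the increments from the augmented control flow graphs.

```python
from collections import Counter

class PathProfile:
    """Counts how often each outer and inner path was traversed."""
    def __init__(self):
        self.outer = Counter()
        self.inner = Counter()

def run_instrumented(profile, a, b, c, d):
    path_sum = 0                    # outer path sum
    if a:
        path_sum += 1               # outer edge increment (assumed value)

    # Entering the inner region: save the outer path sum.
    saved_outer = path_sum
    path_sum = 0                    # start the inner path sum
    if b:
        path_sum += 1               # inner edge increment (assumed value)
    if c:
        path_sum += 2               # inner edge increment (assumed value)
    # Update the inner profile indicator before restoring the outer sum.
    profile.inner[path_sum] += 1
    path_sum = saved_outer          # leaving the inner region: restore

    if d:
        path_sum += 2               # outer edge increment (assumed value)
    profile.outer[path_sum] += 1    # record the outer path traversed
```

Keeping the two sums separate means the outer path identifiers do not grow with the number of paths through the inner region, which is recorded in its own table.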
Abstract:
A speculative code reuse mechanism includes a reuse buffer, a main processing core and a reuse checking core. The reuse buffer includes inputs and outputs of previously executed instances of code reuse regions. Aliased reuse regions are regions that access memory locations that may change between executions of the region. When an aliased code reuse region is encountered and a matching instance exists in the reuse buffer, the main core speculatively executes code occurring after the reuse region, while the reuse checking core executes code from the reuse region to verify the matching instance. If the matching instance is verified, the speculative execution is committed, and if the matching instance is not verified, the speculative execution is squashed.
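The commit-or-squash decision can be sketched sequentially. The simulation below is an illustration under stated assumptions, not the hardware mechanism: the two cores are modeled as two ordinary function calls, the reuse buffer is a dictionary keyed by the region's input, and `run_with_reuse`, `region`, and `continuation` are hypothetical names.

```python
def run_with_reuse(region, continuation, x, memory, reuse_buffer):
    """Run region(x, memory) then continuation(result), reusing results when possible.

    Returns (final_value, outcome), where outcome is "miss", "committed",
    or "squashed".
    """
    if x in reuse_buffer:
        predicted = reuse_buffer[x]
        # Main core: speculatively run the code after the region.
        speculative = continuation(predicted)
        # Checking core: re-execute the aliased region to verify the instance.
        actual = region(x, memory)
        if actual == predicted:
            return speculative, "committed"
        # Verification failed: discard speculative work and redo it.
        reuse_buffer[x] = actual
        return continuation(actual), "squashed"

    # No matching instance: execute normally and record it.
    actual = region(x, memory)
    reuse_buffer[x] = actual
    return continuation(actual), "miss"

def region(x, memory):
    # Aliased region: its result depends on memory that may change.
    return x + memory["k"]

def continuation(value):
    return value * 2
```

In hardware the speculative execution and the verification would overlap in time, which is where the benefit comes from; the sequential version only shows when results are committed and when they are squashed.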