Abstract:
Embodiments of the present invention provide for the execution of threads and/or work-items on multiple processors of a heterogeneous computing system in a manner that allows them to share data correctly and efficiently. Disclosed method, system, and article of manufacture embodiments include, responsive to an instruction from a sequence of instructions of a work-item, determining an ordering of visibility, to other work-items, of one or more other data items in relation to a particular data item, and performing at least one cache operation upon at least one of the particular data item or the other data items present in any one or more cache memories in accordance with the determined ordering. The semantics of the instruction include a memory operation upon the particular data item.
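The ordering-of-visibility behaviour described here is close in spirit to release/acquire semantics, so a minimal C++11 sketch may help; the names payload, flag, producer_work_item, and consumer_work_item are invented for illustration, and the patent's cache operations are abstracted behind the release/acquire pair rather than modelled directly.

```cpp
// Illustrative only: models "ordering of visibility" with C++11 atomics.
#include <atomic>
#include <cassert>
#include <thread>

static int payload = 0;                       // an "other data item"
static std::atomic<int> flag{0};              // the "particular data item"

void producer_work_item() {
    payload = 42;                             // must become visible first
    flag.store(1, std::memory_order_release); // memory operation whose semantics
                                              // carry the visibility ordering
}

void consumer_work_item() {
    while (flag.load(std::memory_order_acquire) != 1) { /* spin */ }
    assert(payload == 42);                    // guaranteed by the ordering
}

int main() {
    std::thread a(producer_work_item), b(consumer_work_item);
    a.join(); b.join();
}
```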
Abstract:
Provided is a method of permitting the reordering of a visibility order of operations in a computer arrangement configured to permit threads of a first processor and a second processor to access a shared memory. The method includes receiving, in a program order, a first operation and a second operation in a first thread, and permitting the reordering of the visibility order of the operations in the shared memory based on the class of each operation. The visibility order determines the visibility in the shared memory, by a second thread, of stored results from the execution of the first and second operations.
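Reordering by operation class can be pictured with C++11 memory orders, where relaxed stores form one class whose visibility order may be reordered and a release store forms another class that may not; the sketch below is an illustration under that assumed correspondence, not the claimed mechanism, and all identifiers are invented.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> a{0}, b{0}, done{0};

void first_thread() {
    // One class of operations: these two stores may become visible to the
    // second thread in either order relative to each other.
    a.store(1, std::memory_order_relaxed);
    b.store(2, std::memory_order_relaxed);

    // Another class: the release store cannot become visible before the
    // stores that precede it in program order.
    done.store(1, std::memory_order_release);
}

void second_thread() {
    while (done.load(std::memory_order_acquire) != 1) { /* spin */ }
    // Both earlier stores are now visible, but nothing constrained which of
    // them an observer polling a and b would have seen appear first.
    (void)a.load(std::memory_order_relaxed);
    (void)b.load(std::memory_order_relaxed);
}

int main() {
    std::thread t1(first_thread), t2(second_thread);
    t1.join(); t2.join();
}
```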
Abstract:
Existing multiprocessor computing systems often have insufficient memory coherency and, consequently, are unable to efficiently utilize separate memory systems. Specifically, a CPU cannot effectively write to a block of memory and then have a GPU access that memory unless there is explicit synchronization. In addition, because the GPU is forced to statically split memory locations between itself and the CPU, existing multiprocessor computing systems are unable to efficiently utilize the separate memory systems. Embodiments described herein overcome these deficiencies by receiving a notification within the GPU that the CPU has finished processing data that is stored in coherent memory, and by invalidating, in the CPU caches, data from the coherent memory that the GPU has finished processing. Embodiments described herein also include dynamically partitioning a GPU memory into coherent memory and local memory through use of a probe filter.
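This abstract describes a hardware/driver flow rather than an API, so the sketch below is purely hypothetical: GpuRuntime, ProbeFilter, Region, and every function name are invented to make the notification-then-invalidate sequence and the coherent/local split concrete, and do not correspond to any real driver interface.

```cpp
#include <cstdint>
#include <vector>

struct Region { std::uint64_t base, size; };

struct ProbeFilter {
    // Tracks which GPU memory regions may be cached by the CPU and therefore
    // must stay coherent; untracked regions are treated as GPU-local.
    std::vector<Region> coherent;
    bool is_coherent(std::uint64_t addr) const {
        for (const Region& r : coherent)
            if (addr >= r.base && addr < r.base + r.size) return true;
        return false;
    }
};

struct GpuRuntime {
    ProbeFilter filter;

    // Notification that the CPU has finished producing data in coherent
    // memory: the GPU may now read it.
    void on_cpu_done(const Region& r) { filter.coherent.push_back(r); }

    // The GPU has finished with data in coherent memory: stale CPU-cached
    // copies are invalidated (modelled as a placeholder here).
    void on_gpu_done(const Region& r) {
        if (filter.is_coherent(r.base)) invalidate_cpu_caches(r);
    }

private:
    void invalidate_cpu_caches(const Region&) { /* platform-specific */ }
};

int main() {
    GpuRuntime rt;
    rt.on_cpu_done({0x1000, 0x1000});  // CPU hands a buffer to the GPU
    rt.on_gpu_done({0x1000, 0x1000});  // GPU hands it back
}
```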
Abstract:
Embodiments of the present invention provide a method by which a first processor uses a memory resource associated with a second processor. The method includes receiving a memory instruction from a first processor process, wherein the memory instruction refers to a shared memory address (SMA) that maps to a second processor memory. The method also includes mapping the SMA to the second processor memory, wherein the mapping produces a mapping result, and providing the mapping result to the first processor.
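As a rough illustration of the mapping step, the sketch below resolves an SMA through a page-granular table into a location in the second processor's memory; SmaTable, MappingResult, and the 4 KiB page assumption are all invented for illustration and are not taken from the patent.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

struct MappingResult {
    std::uint64_t device_phys;   // location in the second processor's memory
    bool          writable;
};

class SmaTable {
public:
    void add(std::uint64_t sma_page, MappingResult r) { table_[sma_page] = r; }

    // The step described in the abstract: map the SMA and produce a mapping
    // result that is handed back to the first processor.
    std::optional<MappingResult> map(std::uint64_t sma) const {
        auto it = table_.find(sma >> 12);           // 4 KiB pages assumed
        if (it == table_.end()) return std::nullopt;
        MappingResult r = it->second;
        r.device_phys += sma & 0xFFF;               // keep the page offset
        return r;
    }

private:
    std::unordered_map<std::uint64_t, MappingResult> table_;
};

int main() {
    SmaTable t;
    t.add(0x2000 >> 12, {0xD000'0000ULL, true});    // SMA page -> device memory
    auto r = t.map(0x2ABC);
    return (r && r->device_phys == 0xD000'0ABCULL) ? 0 : 1;
}
```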
Abstract:
Methods and systems are provided for mapping a memory instruction to a shared memory address space in a computer arrangement having a CPU and an APD. A method includes receiving a memory instruction that refers to an address in the shared memory address space, mapping the memory instruction based on the address to a memory resource associated with either the CPU or the APD, and performing the memory instruction based on the mapping.
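One way to picture the address-based mapping is a simple range check over the shared address space; the split point, the Resource enum, and the function names below are assumptions made purely for illustration.

```cpp
#include <cstdint>
#include <cstdio>

enum class Resource { CpuMemory, ApdMemory };

constexpr std::uint64_t kApdBase = 0x8000'0000ULL;   // assumed split point

// Map the memory instruction's address to the resource that backs it.
Resource map_address(std::uint64_t shared_addr) {
    return shared_addr >= kApdBase ? Resource::ApdMemory : Resource::CpuMemory;
}

// Perform the instruction based on the mapping result.
void perform_load(std::uint64_t shared_addr) {
    switch (map_address(shared_addr)) {
    case Resource::CpuMemory: std::puts("load serviced from CPU memory"); break;
    case Resource::ApdMemory: std::puts("load serviced from APD memory"); break;
    }
}

int main() {
    perform_load(0x1000);          // falls in the CPU-backed range
    perform_load(0x9000'0000ULL);  // falls in the APD-backed range
}
```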
Abstract:
A multiprocessor computer system is provided having a multiplicity of sub-systems and a main memory coupled to a system controller. An interconnect module interconnects the main memory and sub-systems in accordance with interconnect control signals received from the system controller. At least two of the sub-systems are data processors, each having a respective cache memory that stores multiple blocks of data and a set of master cache tags (Etags), including one cache tag for each data block stored by the cache memory. Each data processor includes a master interface for sending memory transaction requests to the system controller. The system controller processes each memory transaction and maintains a set of duplicate cache tags (Dtags) for each data processor. Finally, the system controller contains transaction execution circuitry for activating a transaction for servicing by the interconnect. The transaction execution circuitry pipelines memory access requests from the data processors, and includes invalidation circuitry for processing each writeback request from a given data processor prior to activation to determine whether the Dtag index corresponding to the victimized cache line is invalid. Thereafter, the invalidation circuitry activates writeback requests only if the Dtag index is not invalid and cancels the writeback request if the Dtag index is invalid.
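The writeback-cancellation check lends itself to a small software model; the sketch below is hypothetical (the Dtag layout, SystemController interface, and field names are invented) and only mirrors the decision described in the abstract: cancel the writeback when the victimized line's Dtag is invalid, otherwise activate it on the interconnect.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Dtag { bool valid = false; std::uint64_t tag = 0; };

struct WritebackRequest {
    unsigned      processor_id;
    std::size_t   dtag_index;     // index of the victimized cache line
    std::uint64_t line_address;
};

class SystemController {
public:
    SystemController(std::size_t lines_per_cpu, unsigned cpus)
        : dtags_(cpus, std::vector<Dtag>(lines_per_cpu)) {}

    // Returns true if the writeback was activated, false if cancelled.
    bool handle_writeback(const WritebackRequest& req) {
        const Dtag& d = dtags_[req.processor_id][req.dtag_index];
        if (!d.valid) return false;   // cancel: line already invalidated
        activate(req);                // otherwise service it on the interconnect
        return true;
    }

private:
    void activate(const WritebackRequest&) { /* issue to interconnect */ }
    std::vector<std::vector<Dtag>> dtags_;
};

int main() {
    SystemController sc(/*lines_per_cpu=*/512, /*cpus=*/2);
    WritebackRequest req{0, 17, 0xABCD00};
    return sc.handle_writeback(req) ? 1 : 0;   // cancelled here: Dtag starts invalid
}
```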
Abstract:
A multiprocessor computer system has data processors and a main memory coupled to a system controller. Each data processor has a cache memory. Each cache memory has a cache controller with two ports for receiving access requests. A first port receives access requests from the associated data processor and a second port receives access requests from the system controller. All cache memory access requests include an address value; access requests from the system controller also include a mode flag. A comparator in the cache controller processes the address value in each access request and generates a hit/miss signal indicating whether the data block corresponding to the address value is stored in the cache memory. The cache controller has two modes of operation, including a first standard mode of operation in which read/write access to the cache memory is preceded by generation of the hit/miss signal by the comparator, and a second accelerated mode of operation in which read/write access to the cache memory is initiated without waiting for the comparator to process the access request's address value. The first mode of operation is used for all access requests by the data processor and for system controller access requests when the mode flag has a first value. The second mode of operation is used for the system controller access requests when the mode flag has a second value distinct from the first value.
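A software caricature of the two access modes may help; the sketch below is hypothetical (the class layout and Mode enum are invented) and only shows the control-flow difference: the standard path generates a hit/miss signal before the data access, while the accelerated path starts the access without waiting for the tag compare.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

struct CacheLine { bool valid = false; std::uint64_t tag = 0; std::uint64_t data = 0; };

class CacheController {
public:
    enum class Mode { Standard, Accelerated };   // mode flag from the system controller

    std::optional<std::uint64_t> read(std::uint64_t addr, Mode mode) {
        const std::size_t index = (addr >> 6) % lines_.size();
        CacheLine& line = lines_[index];
        if (mode == Mode::Standard) {
            // Standard path: generate the hit/miss signal first.
            const bool hit = line.valid && line.tag == (addr >> 6);
            if (!hit) return std::nullopt;
        }
        // Accelerated path: the requester already knows the line is present,
        // so the data access begins without waiting for the tag compare.
        return line.data;
    }

private:
    std::array<CacheLine, 256> lines_{};
};

int main() {
    CacheController cc;
    auto miss = cc.read(0x40, CacheController::Mode::Standard);  // empty cache: miss
    return miss ? 1 : 0;
}
```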
Abstract:
Executing an ordering operation is disclosed. A store operation associated with storing a value into a portion of a memory is initiated. An ordering operation is executed to ensure that the store operation, but not necessarily all store operations, is completed.
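The guarantee described here, ordering one specific store rather than every pending store, has a rough analogue in C++11 release/acquire pairs; the sketch below is an illustration under that reading, not the mechanism claimed by the abstract, and the names particular and other are invented.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> particular{0};   // the store the ordering operation targets
std::atomic<int> other{0};        // a later store the reader does not wait for

void writer() {
    particular.store(1, std::memory_order_release);  // the store that must complete
    other.store(1, std::memory_order_relaxed);       // may lag behind
}

void reader() {
    while (particular.load(std::memory_order_acquire) != 1) { /* spin */ }
    // Here the particular store is guaranteed to be visible; the store to
    // `other` may or may not be, mirroring the weaker guarantee above.
}

int main() {
    std::thread w(writer), r(reader);
    w.join(); r.join();
}
```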