摘要:
A thread execution analyzer analyzes blocking events of threads in a program using execution data and callstacks collected at the blocking events. The thread execution analyzer attempts to identify an application programming interface (API) responsible for each blocking event and provides blocking analysis information to a user. The blocking analysis information may be used by a developer of the program to understand the causes of blocking events that occur for threads of the program.
摘要:
Methods, systems, and media for reducing memory latency seen by processors by providing a measure of control over on-chip memory (OCM) management to software applications, implicitly and/or explicitly, via an operating system are contemplated. Many embodiments allow part of the OCM to be managed by software applications via an application program interface (API), and part managed by hardware. Thus, the software applications can provide guidance regarding address ranges to maintain close to the processor to reduce unnecessary latencies typically encountered when dependent upon cache controller policies. Several embodiments utilize a memory internal to the processor or on a processor node so the memory block used for this technique is referred to as OCM.
摘要:
Methods, systems, and media for reducing memory latency seen by processors by providing a measure of control over on-chip memory (OCM) management to software applications, implicitly and/or explicitly, via an operating system are contemplated. Many embodiments allow part of the OCM to be managed by software applications via an application program interface (API), and part managed by hardware. Thus, the software applications can provide guidance regarding address ranges to maintain close to the processor to reduce unnecessary latencies typically encountered when dependent upon cache controller policies. Several embodiments utilize a memory internal to the processor or on a processor node so the memory block used for this technique is referred to as OCM.
摘要:
The present invention comprises a data access pattern interface that allows software to specify one or more data access patterns such as stream access patterns, pointer-chasing patterns and producer-consumer patterns. Software detects a data access pattern for a memory region and passes the data access pattern information to hardware via proper data access pattern instructions defined in the data access pattern interface. Hardware maintains the data access pattern information properly when the data access pattern instructions are executed. Hardware can then use the data access pattern information to dynamically detect data access patterns for a memory region throughout the program execution, and voluntarily invoke appropriate memory and cache operations such as pre-fetch, pre-send, acquire-ownership and release-ownership. Further, hardware can provide runtime monitoring information for memory accesses to the memory region, wherein the runtime monitoring information indicates whether the software-provided data access pattern information is accurate.
摘要:
The present invention extends to methods, systems, and computer program products for indicating parallel operations with user-visible events. Event markers can be used to indicate an abstracted outer layer of execution as well as expose internal specifics of parallel processing systems, including systems that provide data parallelism. Event markers can be used to show a variety of execution characteristics including higher-level markers to indicate the beginning and end of an execution program (e.g., a query). Inside the execution program (query) individual fork/join operations can be indicated with sub-levels of markers to expose their operations. Additional decisions made by an execution engine, such as, for example, when elements initially yield, when queries overlap or nest, when the query is cancelled, when the query bails to sequential operation, when premature merging or re-partitioning are needed can also be exposed.
摘要:
An analysis and visualization depicts how an application is leveraging processor cores of a distributed computing system, such as a computer cluster, in time. The analysis and visualization enables a developer to readily identify the degree of concurrency exploited by an application at runtime and the amount of overhead used by libraries or middleware. Information regarding processes or threads running on the nodes over time is received, analyzed, and presented to indicate portions of computer cluster that are used by the application, idle, other processes, and libraries in the system. The analysis and visualization can help a developer understand or confirm contention for or under-utilization of system resources for the application and libraries.
摘要:
The use of marker(s) in the source code of a program under evaluation. A representation of the marker(s) remains in the binary version of the program under evaluation. During execution, upon executing the marker, data is gathered regarding the timeline of the execution of the marker in the context of overall timeline of execution. A visualization of the marker is then displayed that illustrates the execution of the marker in the context of a larger timeframe of execution. Optionally, the marker may be associated with text, or other data, at least some of which being rendered with the visualization. Accordingly, an application developer, or indeed anyone evaluating the program, may place markers within source code and/or evaluate the timeline of execution of those markers.
摘要:
Thread blocking synchronization event analysis software uses kernel context switch data and thread unblocking data to form a visualization of thread synchronization behavior. The visualization provides interactive access to source code responsible for thread blocking, identifies blocking threads and blocked threads, summarizes execution delays due to synchronization and lists corresponding APIs and objects, correlates thread synchronization events with application program phases, and otherwise provides information associated with thread synchronization. The visualization may operate within an integrated development environment.
摘要:
A data processing unit, method, and computer-usable medium for contention-based cache performance optimization. Two or more processing cores are coupled by an interconnect. Coupled to the interconnect is a memory hierarchy that includes a collection of caches. Resource utilization over a time interval is detected over the interconnect. Responsive to detecting a threshold of resource utilization of the interconnect, a functional mode of a cache from the collection of caches is selectively enabled.
摘要:
A data processing unit, method, and computer-usable medium for contention-based cache performance optimization. Two or more processing cores are coupled by an interconnect. Coupled to the interconnect is a memory hierarchy that includes a collection of caches. Resource utilization over a time interval is detected over the interconnect. Responsive to detecting a threshold of resource utilization of the interconnect, a functional mode of a cache from the collection of caches is selectively enabled.