Abstract:
A method of balancing load on multiple cores includes maintaining multiple bitmaps in a global memory location. Each bitmap indicates the loads of the threads included in a thread domain, and multiple threads are associated with each core. Each core maintains and updates its respective bitmap based on the loads of its threads. The global memory location holding the bitmaps is accessible by multiple thread domains that are configured to execute threads using the cores. Execution of the thread domains is balanced across the cores based on the load of each thread as recorded in the bitmaps.
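A minimal sketch of the bitmap bookkeeping follows. It assumes that bit i of a core's bitmap is set when thread i of that core's domain is busy, so a core's load is the number of set bits. It also assumes a lock stands in for whatever synchronization the global memory location uses. The class and method names (`GlobalBitmapTable`, `update`, `least_loaded_core`) are hypothetical. "Pick the least-loaded core" is only one simple balancing policy, not the claimed method.

```python
import threading
from typing import Dict


class GlobalBitmapTable:
    """Hypothetical global table holding one load bitmap per core.

    Assumption: bit i of a core's bitmap is set when thread i of that
    core's thread domain is busy, so a core's load is its popcount.
    """

    def __init__(self, num_cores: int, threads_per_core: int) -> None:
        self.threads_per_core = threads_per_core
        self.bitmaps: Dict[int, int] = {core: 0 for core in range(num_cores)}
        # The table lives in shared ("global") memory, so guard it with a lock.
        self._lock = threading.Lock()

    def update(self, core: int, thread: int, busy: bool) -> None:
        """Called by a core to record the load of one of its own threads."""
        if not 0 <= thread < self.threads_per_core:
            raise ValueError("thread index out of range")
        mask = 1 << thread
        with self._lock:
            if busy:
                self.bitmaps[core] |= mask
            else:
                self.bitmaps[core] &= ~mask

    def load(self, core: int) -> int:
        """Number of busy threads on a core (set bits in its bitmap)."""
        with self._lock:
            return bin(self.bitmaps[core]).count("1")

    def least_loaded_core(self) -> int:
        """Core a thread domain should dispatch new work to; ties go to the lowest id."""
        return min(self.bitmaps, key=lambda core: (self.load(core), core))
```

For example, say core 0 marks threads 0 and 1 busy and core 1 marks thread 2 busy. Then `least_loaded_core()` returns 1, because core 1 has one set bit against core 0's two.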
Abstract:
An apparatus, method, and computer program product are provided for utilizing secondary threads to assist primary threads in performing application tasks. In use, a plurality of primary threads are utilized to perform at least one of a plurality of tasks of an application, using at least one corresponding core. Further, it is determined whether the primary threads require assistance in performing one or more of the tasks of the application. Based on that determination, a plurality of secondary threads are utilized to perform the one or more tasks of the application.
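The sketch below illustrates primary and secondary threads sharing one task queue. Its assistance test is a hypothetical rule: help is needed when the backlog exceeds `num_primary * capacity_per_thread`. The names `needs_assistance` and `run_with_assist` are illustrative only, and a real determination could use any runtime signal.

```python
import queue
import threading
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def needs_assistance(pending: int, num_primary: int, capacity_per_thread: int) -> bool:
    """Hypothetical rule: primaries need help when the backlog exceeds
    what they are expected to absorb on their own."""
    return pending > num_primary * capacity_per_thread


def run_with_assist(
    tasks: Sequence[T],
    work: Callable[[T], R],
    num_primary: int,
    num_secondary: int,
    capacity_per_thread: int,
) -> Tuple[List[R], bool]:
    """Run tasks on primary threads, adding secondary threads when needed."""
    pending: "queue.Queue[T]" = queue.Queue()
    for task in tasks:
        pending.put(task)

    results: List[R] = []
    lock = threading.Lock()

    def worker() -> None:
        # Primary and secondary threads drain the same shared queue.
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return
            result = work(task)
            with lock:
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(num_primary)]
    # Decide before any thread starts, so qsize() equals the full backlog.
    assisted = needs_assistance(pending.qsize(), num_primary, capacity_per_thread)
    if assisted:
        threads += [threading.Thread(target=worker) for _ in range(num_secondary)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, assisted
```

Results come back in completion order, not task order. Take 10 tasks, 2 primaries and a capacity of 3 per thread: the backlog of 10 exceeds 6, so the secondary threads are started.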
Abstract:
A method is implemented by a network element (NE) in a distributed system. The method comprises tracing an execution of a program in the distributed system to produce a record of the execution, where the record indicates the states of shared resources at various times during the execution; identifying, based on the record, a vulnerable operation that occurred during the execution, where the record indicates that a first shared resource is in a flawed state after the node that caused that flawed state crashed; and determining, by performing a fault-tolerance mechanism, that the vulnerable operation results in a time of fault (TOF) bug.
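The following is a minimal sketch of the two analysis steps over a simplified execution record. It assumes a hypothetical trace schema (`Event` with `write`/`crash` kinds and a `flawed` flag). It also assumes the fault-tolerance mechanism can be modeled as a recovery callback that reports whether it repaired the resource. Neither assumption comes from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """One entry of a hypothetical execution record."""
    time: int
    node: str
    kind: str                       # "write" or "crash"
    resource: Optional[str] = None  # shared resource touched by a write
    flawed: bool = False            # write left the resource in a flawed state


def find_vulnerable_ops(record: List[Event]) -> List[Event]:
    """Flawed writes whose writing node crashed before any later write
    to the same resource overwrote them."""
    last_write: Dict[str, Event] = {}
    vulnerable: List[Event] = []
    for ev in sorted(record, key=lambda e: e.time):
        if ev.kind == "write" and ev.resource is not None:
            last_write[ev.resource] = ev
        elif ev.kind == "crash":
            for write in last_write.values():
                if write.node == ev.node and write.flawed and write not in vulnerable:
                    vulnerable.append(write)
    return vulnerable


def is_tof_bug(op: Event, recover: Callable[[Optional[str]], bool]) -> bool:
    """Hypothetical check: run the fault-tolerance mechanism (a recovery
    callback returning True if it repaired the resource). If it cannot
    repair the flawed resource, flag the operation as a TOF bug."""
    return not recover(op.resource)
```

Here a node that leaves a `lock` resource half-written and then crashes yields one vulnerable operation. It becomes a TOF bug only if the recovery callback fails to restore `lock`.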
Abstract:
A method for operating a multithread processing system is provided, including assigning, by a controller, a subset of a plurality of tasks to a plurality of threads during a time N; collecting, by the controller, data during the time N concerning the operation of the plurality of threads; analyzing, by the controller, the data to determine at least one condition concerning the operation of the plurality of threads during the time N; and adjusting, by the controller, the number of threads available during time N+1 in accordance with the at least one condition.
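This is a sketch of the controller's per-epoch decision. It assumes the collected data is summed busy time plus the end-of-epoch backlog. It also assumes the condition is utilization above or below hypothetical thresholds (`high=0.9`, `low=0.5`) and that the adjustment is one thread per epoch. All of these choices are illustrative, not taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class EpochStats:
    """Hypothetical data collected by the controller during time N."""
    busy_time: float     # summed busy time of all threads in epoch N (seconds)
    epoch_length: float  # wall-clock length of epoch N (seconds)
    threads: int         # threads available during epoch N
    pending_tasks: int   # tasks still queued at the end of epoch N


def next_thread_count(
    stats: EpochStats,
    min_threads: int = 1,
    max_threads: int = 64,
    high: float = 0.9,
    low: float = 0.5,
) -> int:
    """Thread count for epoch N+1 derived from the condition observed in N."""
    utilization = stats.busy_time / (stats.threads * stats.epoch_length)
    if utilization > high and stats.pending_tasks > 0:
        target = stats.threads + 1  # saturated with a backlog: grow
    elif utilization < low:
        target = stats.threads - 1  # mostly idle: shrink
    else:
        target = stats.threads      # within band: keep
    return max(min_threads, min(max_threads, target))
```

With 4 threads busy for 3.8 s in a 1 s epoch and tasks still queued, utilization is 0.95, so the controller allows 5 threads in the next epoch.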
Abstract:
A method is provided for region-guided, change-tolerant fast shortest-path determination and graph preprocessing for network management and control. In an embodiment, the method includes partitioning, by a network component, a plurality of network nodes into a plurality of regions, each network node belonging to one of the regions; identifying, by the network component, border nodes for each region, each border node in a region connecting to at least one border node in a connecting region; determining, by the network component, intervals between regions according to the border nodes, each interval comprising a minimum distance and a maximum distance between two regions; and determining, by the network component, a path from a source node to a target node according to the intervals.
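One way to use such intervals is sketched below, assuming an undirected graph with non-negative weights and a given node-to-region map. The interval minimum works as an admissible A* lower bound. Any source-to-target path must leave the source's region through one of its border nodes and enter the target's region through one of its border nodes, so the path is at least as long as the shortest border-to-border distance between the two regions. Using A* here is a substituted technique chosen for illustration. The sketch computes the maximum but does not use it, and the function names are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # undirected adjacency map, non-negative weights
INF = float("inf")


def undirected(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return dict(graph)


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def border_nodes(graph: Graph, region: Dict[str, str]) -> Dict[str, Set[str]]:
    """A border node has at least one neighbor in a different region."""
    borders: Dict[str, Set[str]] = defaultdict(set)
    for u, nbrs in graph.items():
        if any(region[v] != region[u] for v in nbrs):
            borders[region[u]].add(u)
    return borders


def region_intervals(graph: Graph, region: Dict[str, str]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """(min, max) shortest distance over border-node pairs of two regions."""
    borders = border_nodes(graph, region)
    intervals: Dict[Tuple[str, str], Tuple[float, float]] = {}
    for ra, nodes_a in borders.items():
        for b1 in nodes_a:
            dist = dijkstra(graph, b1)
            for rb, nodes_b in borders.items():
                for b2 in nodes_b:
                    if rb != ra and b2 in dist:
                        lo, hi = intervals.get((ra, rb), (INF, 0.0))
                        intervals[(ra, rb)] = (min(lo, dist[b2]), max(hi, dist[b2]))
    return intervals


def region_guided_path(graph, region, intervals, source, target):
    """A* search with the interval minimum as the heuristic; nodes are
    reopened whenever a shorter path is found, so the result is optimal."""
    rt = region[target]

    def h(v: str) -> float:
        return 0.0 if region[v] == rt else intervals.get((region[v], rt), (INF, INF))[0]

    g = {source: 0.0}
    parent: Dict[str, Optional[str]] = {source: None}
    heap = [(h(source), 0.0, source)]
    while heap:
        _, d, u = heapq.heappop(heap)
        if d > g[u]:
            continue
        if u == target:
            path: List[str] = []
            node: Optional[str] = u
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], d
        for v, w in graph[u].items():
            if d + w < g.get(v, INF):
                g[v] = d + w
                parent[v] = u
                heapq.heappush(heap, (d + w + h(v), d + w, v))
    return None, INF
```

Because the intervals depend only on border-to-border distances, they can be precomputed once as the graph-preprocessing step and reused across many source/target queries.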
Abstract:
Described herein are systems and methods for distributed concurrency (DC) bug detection. The method includes identifying a plurality of nodes in a distributed computing cluster; identifying a plurality of messages to be transmitted during execution of an application by the distributed computing cluster; determining a set of orderings of the plurality of messages for DC bug detection, the set of orderings determined based upon the plurality of nodes and the plurality of messages; removing a subset of the orderings from the set of orderings based upon one or more of a state symmetry algorithm, a disjoint-update independence algorithm, or a zero-crash-impact reordering algorithm; and performing DC bug detection testing using the set of orderings after the subset of the orderings is removed from the set of orderings.
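The sketch below illustrates only the disjoint-update idea. It assumes each message carries the set of state keys it updates. Two messages with disjoint update sets commute, so orderings that agree on the relative order of every overlapping (dependent) pair are equivalent, and only one representative is tested. State-symmetry and zero-crash-impact pruning are not modeled. The sketch still enumerates all n! permutations, so it is illustrative only, and `run_ordering` is a hypothetical test harness.

```python
from itertools import combinations, permutations
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Ordering = Tuple[str, ...]


def prune_disjoint_updates(
    messages: Sequence[str], updates: Dict[str, FrozenSet[str]]
) -> List[Ordering]:
    """Keep one ordering per equivalence class of commuting messages."""
    seen = set()
    kept: List[Ordering] = []
    for order in permutations(messages):
        pos = {m: i for i, m in enumerate(order)}
        # Signature: relative order of every pair whose updates overlap.
        signature = tuple(
            (a, b) if pos[a] < pos[b] else (b, a)
            for a, b in combinations(messages, 2)
            if updates[a] & updates[b]
        )
        if signature not in seen:
            seen.add(signature)
            kept.append(order)
    return kept


def detect(
    messages: Sequence[str],
    updates: Dict[str, FrozenSet[str]],
    run_ordering: Callable[[Ordering], bool],
) -> List[Ordering]:
    """Replay each remaining ordering; run_ordering (hypothetical) returns
    True when the replay exposes a DC bug."""
    return [o for o in prune_disjoint_updates(messages, updates) if run_ordering(o)]
```

With `m1` and `m2` both updating `x` and `m3` updating only `y`, the six orderings collapse to two: one with `m1` before `m2` and one with `m2` before `m1`.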
Abstract:
A method for detecting distributed concurrency errors in a distributed cloud computing system includes tracing operations that access objects in functions involving inter-process messaging; applying a set of happens-before rules to the traced operations; analyzing the traced operations to identify concurrent operations that access a common object, thereby generating a list of potential distributed concurrency errors (DCbugs); and pruning the list to remove DCbugs that have only local effects and do not generate run-time errors.
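A minimal sketch of the happens-before analysis follows, using vector clocks under a hypothetical trace schema. It applies two rules: program order within a node, and a send happening before its matching receive. Two accesses to the same object, at least one of them a write, are reported when neither happens before the other. The pruning step is reduced to a hypothetical `local_only` flag that stands in for the real impact analysis. Here a candidate counts as local when both of its accesses carry the flag.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List, Optional, Tuple


@dataclass
class Op:
    """One traced operation (hypothetical trace schema)."""
    node: str
    kind: str                  # "access", "send", or "recv"
    obj: Optional[str] = None  # object touched by an "access"
    write: bool = False
    msg: Optional[str] = None  # message id for "send"/"recv"
    local_only: bool = False   # stand-in for "effect stays on this node"
    clock: Dict[str, int] = field(default_factory=dict)


def assign_vector_clocks(trace: List[Op]) -> None:
    """Assumes the trace lists each send before its matching receive."""
    node_clock: Dict[str, Dict[str, int]] = {}
    sent: Dict[str, Dict[str, int]] = {}
    for op in trace:
        vc = dict(node_clock.get(op.node, {}))  # rule 1: program order
        if op.kind == "recv" and op.msg in sent:
            for n, t in sent[op.msg].items():   # rule 2: send -> recv
                vc[n] = max(vc.get(n, 0), t)
        vc[op.node] = vc.get(op.node, 0) + 1
        op.clock = vc
        node_clock[op.node] = vc
        if op.kind == "send" and op.msg is not None:
            sent[op.msg] = vc


def happens_before(a: Op, b: Op) -> bool:
    return a.clock != b.clock and all(t <= b.clock.get(n, 0) for n, t in a.clock.items())


def find_dcbugs(trace: List[Op]) -> List[Tuple[Op, Op]]:
    assign_vector_clocks(trace)
    accesses = [op for op in trace if op.kind == "access"]
    candidates = [
        (a, b)
        for a, b in combinations(accesses, 2)
        if a.obj == b.obj
        and (a.write or b.write)
        and not happens_before(a, b)
        and not happens_before(b, a)
    ]
    # Pruning: drop candidates whose accesses both have only local effect.
    return [(a, b) for a, b in candidates if not (a.local_only and b.local_only)]
```

As an example, suppose node A writes `x` and then sends a message that node B receives before reading `x`. B's read is then ordered after A's write, but an unsynchronized write to `x` on node C is concurrent with both and is reported twice.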