Abstract:
Techniques for adaptive trace logging include, in one embodiment, obtaining input data on trace logging behavior and computing resources used by trace logging. Based on the obtained input data, an adaptive trace logging module automatically takes action at runtime to reduce the amount of computing resources consumed by trace logging. For example, the action taken may include decreasing a trace logging level of an executing software program to reduce the number of trace logging messages added to a trace log. In another embodiment, the techniques include detecting a condition of an executing software program that warrants a change to a trace logging level of the executing program. The adaptive trace logging module automatically changes the trace logging level of the executing program as needed for the detected condition. For example, the adaptive trace logging module may increase the trace logging level of an executing program upon detecting a deadlock or other abnormal condition of the executing program. By automatically increasing the trace logging level upon detecting an abnormal condition, additional trace logging messages may be written to a trace log, aiding diagnosis and troubleshooting of the condition.
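
The two behaviors described above (lowering verbosity when logging costs too much, raising it on an abnormal condition) can be sketched with Python's standard `logging` module. This is only an illustration: the function name `adapt_trace_level`, the `OVERHEAD_LIMIT` threshold, and the way overhead and abnormal conditions are measured are assumptions, not details taken from the abstract.

```python
import logging

# Hypothetical threshold: fraction of computing resources spent on trace
# logging above which verbosity is reduced. Not specified in the abstract.
OVERHEAD_LIMIT = 0.10


def adapt_trace_level(logger: logging.Logger, logging_overhead: float, abnormal: bool) -> int:
    """Apply and return a new level for `logger` based on the observed inputs."""
    if abnormal:
        # Abnormal condition (e.g. a suspected deadlock): log as much as possible.
        new_level = logging.DEBUG
    elif logging_overhead > OVERHEAD_LIMIT:
        # Logging is too expensive: go one step less verbose, capped at ERROR.
        new_level = min(logger.getEffectiveLevel() + 10, logging.ERROR)
    else:
        # Nothing to do: keep the current level.
        new_level = logger.getEffectiveLevel()
    logger.setLevel(new_level)
    return new_level


log = logging.getLogger("adaptive_demo")
log.setLevel(logging.INFO)
```

Calling `adapt_trace_level(log, 0.25, False)` would move the demo logger from `INFO` to `WARNING`, while passing `abnormal=True` would switch it to `DEBUG` regardless of overhead.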
Abstract:
A system, method, and computer program product are described for distinguishing between a computing system that is in a hang state and systems that are in an idle or otherwise non-hang state and do not need intervention before regaining the ability to adequately process work. According to some approaches, heuristics are employed to perform hang and idle system detection and validation. Data representative of system resources is analyzed and transformed in order to identify systems that are in a hang state.
Abstract:
Techniques are described herein for synchronizing cluster time. According to one technique, a master node is appointed in a cluster. Other “slave” nodes periodically synchronize their clocks with the master node. To synchronize its clock with the master node, a slave node sends a timestamped message to the master node, which also timestamps the message and sends the message back to the slave node, which then timestamps the message again. Based on the timestamps, the slave node is able to determine the difference between the master node's clock's time and slave node's clock's time, compensating for the message travel time between master node and slave node. Depending on various circumstances, and based on the determined difference, the slave node adjusts its clock so that the time indicated by the slave node's clock at least begins to approach more closely the time indicated by the master node's clock.
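
The three-timestamp exchange above can be turned into a small worked calculation. The sketch below assumes that the message travel time is the same in both directions and that the master replies immediately; the abstract does not state these assumptions. The names `estimate_offset`, `bounded_adjustment`, and `max_step` are hypothetical.

```python
def estimate_offset(t1: float, t2: float, t3: float) -> float:
    """Estimate (master clock - slave clock) from one message exchange.

    t1: slave clock when the message is sent
    t2: master clock when the master timestamps the message
    t3: slave clock when the reply arrives
    Assumes symmetric travel time, so the master's timestamp corresponds
    to the midpoint of the slave's send and receive times.
    """
    return t2 - (t1 + t3) / 2.0


def bounded_adjustment(offset: float, max_step: float) -> float:
    """Limit a single correction so the slave clock approaches the master gradually."""
    return max(-max_step, min(max_step, offset))


# Example: the slave sends at 100.0, the master stamps 105.5, the reply arrives at 101.0.
offset = estimate_offset(100.0, 105.5, 101.0)      # slave is 5.0 behind the master
step = bounded_adjustment(offset, max_step=0.5)    # move at most 0.5 per round
```

Bounding each step is one way to make the slave clock "begin to approach" the master's time without a sudden jump; the abstract leaves the actual adjustment policy open.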
Abstract:
A method and apparatus are provided for determining the most probable cause of a problem observed in a complex multi-host system. The approach relies on a probabilistic model to represent causes and effects in a complex computing system. However, complex systems include a multitude of independently operating components that can cause temporary anomalous states. To reduce the resources required to perform root cause analysis on each transient failure, as well as to raise the confidence in the most probable cause of a failure that is identified by the model, inputs to the probabilistic model are aggregated over a sliding window of values from the recent past.
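
The sliding-window aggregation can be sketched as follows. The probabilistic model itself is not described in the abstract, so this example covers only the aggregation step; the class name `SlidingWindow` and the choice of "fraction of anomalous observations" as the aggregate are assumptions.

```python
from collections import deque


class SlidingWindow:
    """Keep the most recent `size` observations and expose an aggregate."""

    def __init__(self, size: int) -> None:
        # A deque with maxlen silently discards the oldest value when full.
        self._values = deque(maxlen=size)

    def add(self, anomalous: bool) -> None:
        self._values.append(1 if anomalous else 0)

    def fraction(self) -> float:
        """Fraction of recent observations that were anomalous (0.0 if empty)."""
        return sum(self._values) / len(self._values) if self._values else 0.0


window = SlidingWindow(size=4)
for observation in (True, False, False, False):
    window.add(observation)
transient = window.fraction()    # one blip in four samples

for observation in (True, True, True):
    window.add(observation)
persistent = window.fraction()   # three of the last four samples are anomalous
```

Feeding the aggregate rather than each raw sample to the model means a single transient blip contributes a low value, while a persistent anomaly pushes the value up.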
Abstract:
A method and apparatus for managing shared resources in a clustered database management system are provided. In an embodiment, multiple master nodes exist in a database management system. A master node receives a lock request from a second node. The lock request is a request for a lock on a shared resource. The master node grants the lock request to the second node. While the second node holds the lock, the second node causes the master node to modify the shared resource.
Abstract:
Systems, methods, and other embodiments associated with selective tag-based file backup and recovery are described. One example method includes selectively tagging a file for inclusion in a snapshot-based backup image by associating a tag with the file. The associating may include encoding file metadata with a tag. The method may include selectively adding a file to the backup image upon determining that the file has experienced a write event and that the file is associated with a tag. The method may also include receiving a request to provide a recovery file from the backup image and selectively providing the recovery file upon determining that the recovery file is associated with a recovery tag specified in the request.
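
The tagging, backup, and recovery decisions described above can be sketched with in-memory dictionaries standing in for file metadata and the backup image. The class `TaggedBackup` and its method names are hypothetical, and no real snapshot mechanism is modeled.

```python
class TaggedBackup:
    """Toy model: only tagged files enter the backup image on a write event."""

    def __init__(self) -> None:
        self.tags = {}    # path -> set of tags (stand-in for file metadata)
        self.image = {}   # path -> content (stand-in for the backup image)

    def tag(self, path: str, tag: str) -> None:
        """Associate a tag with a file."""
        self.tags.setdefault(path, set()).add(tag)

    def on_write(self, path: str, content: str) -> bool:
        """Add the file to the image only if it has at least one tag."""
        if self.tags.get(path):
            self.image[path] = content
            return True
        return False

    def recover(self, path: str, recovery_tag: str):
        """Return the backed-up content only if the file carries the requested tag."""
        if path in self.image and recovery_tag in self.tags.get(path, set()):
            return self.image[path]
        return None


backup = TaggedBackup()
backup.tag("report.txt", "finance")
backup.on_write("report.txt", "Q1 numbers")   # tagged, so it is backed up
backup.on_write("scratch.tmp", "junk")        # untagged, so it is skipped
```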
Abstract:
Described herein are techniques for time limited lock ownership. In one embodiment, in response to receiving a request for a lock on a shared resource, the lock is granted and a lock lease period associated with the lock is established. Then, in response to determining that the lock lease period has expired, one or more lock lease expiration procedures are performed. In many cases, the time limited lock ownership may prevent system hanging, detect system deadlocks in a timely manner, and/or improve overall performance of the database.
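
A lease-based lock can be sketched as below. The abstract does not say what the lease expiration procedures are; this example assumes the simplest one, revoking the lock so another requester can obtain it. The class name `LeasedLock` and the injectable `clock` parameter are illustrative choices.

```python
import time


class LeasedLock:
    """Lock whose ownership lapses once a lease period has passed."""

    def __init__(self, lease_seconds: float, clock=time.monotonic) -> None:
        self._lease = lease_seconds
        self._clock = clock          # injectable so the behavior can be tested
        self.owner = None
        self._expires_at = 0.0

    def _expire_if_due(self) -> None:
        # Assumed expiration procedure: revoke the lock from its owner.
        if self.owner is not None and self._clock() >= self._expires_at:
            self.owner = None

    def acquire(self, requester: str) -> bool:
        """Grant the lock and start a lease if it is free or its lease has expired."""
        self._expire_if_due()
        if self.owner is None:
            self.owner = requester
            self._expires_at = self._clock() + self._lease
            return True
        return False


# Demonstration with a fake clock so no real waiting is needed.
now = [0.0]
lock = LeasedLock(lease_seconds=5.0, clock=lambda: now[0])
first = lock.acquire("node-A")     # granted, lease runs until t = 5.0
second = lock.acquire("node-B")    # refused while the lease is active
now[0] = 5.0
third = lock.acquire("node-B")     # lease expired, so the lock changes hands
```

Because a holder that stops responding loses the lock when its lease runs out, waiters are not blocked indefinitely, which matches the abstract's point about preventing hangs.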
Abstract:
Techniques are provided for remastering resources based on node utilization. According to one such technique, resources are remastered in response to the over-utilization of the node that currently masters those resources. The utilization of each node is tracked, and when a particular node's utilization exceeds a specified threshold, selected resources that are currently mastered by that node are remastered so that nodes other than the particular node become the new masters for the selected resources. Each node's utilization is based on that node's capacity, and each node's capacity may differ. According to an aspect of one technique, each node's capacity is based on that node's processing resources and memory resources. Remastering resources in this manner tends to reduce the average amount of time taken for nodes to handle requests for the resources that they master.
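
The threshold-driven remastering can be sketched as follows. The abstract does not specify how utilization is computed or which resources are selected; this example assumes utilization is the summed load of a node's mastered resources divided by its capacity, moves the heaviest resources first, and sends each one to the currently least-utilized other node. All names here are hypothetical.

```python
def utilization(node, masters, load, capacity):
    """Load of the resources mastered by `node`, relative to that node's capacity."""
    used = sum(load[r] for r, n in masters.items() if n == node)
    return used / capacity[node]


def remaster_overloaded(masters, load, capacity, threshold):
    """Return a new resource -> master mapping after one remastering pass.

    masters:  resource -> node currently mastering it
    load:     resource -> cost of mastering it
    capacity: node -> capacity (may differ per node)
    """
    masters = dict(masters)  # do not mutate the caller's mapping
    for node in capacity:
        others = [n for n in capacity if n != node]
        if not others:
            break  # a single-node cluster has nowhere to move resources
        owned = sorted((r for r, n in masters.items() if n == node),
                       key=load.get, reverse=True)
        for resource in owned:
            if utilization(node, masters, load, capacity) <= threshold:
                break  # node is back under the threshold
            target = min(others, key=lambda n: utilization(n, masters, load, capacity))
            masters[resource] = target
    return masters


before = {"r1": "A", "r2": "A", "r3": "B"}
after = remaster_overloaded(
    before,
    load={"r1": 40, "r2": 30, "r3": 10},
    capacity={"A": 50, "B": 100},
    threshold=0.8,
)
```

In this example node A starts at 140% of its capacity, so its heaviest resource `r1` is remastered to node B, leaving A at 60% and B at 50%.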
Abstract:
Various techniques are described for improving the performance of a shared-nothing database system in which at least two of the nodes that are running the shared-nothing database system have shared access to a disk. Specifically, techniques are provided for changing the ownership of data in a shared-nothing database dynamically, based on factors such as which node would be the most efficient owner relative to the performance of a particular operation. Once determined, the ownership of the data may be changed permanently to the new owner, or temporarily for the duration of the particular operation.
Abstract:
A method and apparatus for managing locks in a database system are provided. A master node grants a lock on a first resource and a group of resources that includes the first resource to a first requester node. The requester node receives a mapping corresponding to the group of resources that may indicate that a lock already exists for a second resource in the group. If the requester node desires a lock on a resource located in the group, the requester node grants itself the lock without notifying the master node. A second requester node requests a lock for a particular resource in the group of resources. The first requester node grants the lock on the particular resource and updates the mapping to indicate that a different node holds a lock for the particular resource.
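
The group-lock flow can be sketched from the point of view of the first requester node, once the master has handed it the group lock and the mapping. This is a loose model under stated assumptions: the class `GroupLockHolder`, its method names, and the rule that a resource can be granted only when the mapping shows it as unlocked are not taken from the abstract.

```python
class GroupLockHolder:
    """Node holding a lock on a group of resources, plus the group's lock mapping."""

    def __init__(self, name: str, mapping: dict) -> None:
        self.name = name
        # resource -> name of the node holding a lock on it, or None if unlocked
        self.mapping = dict(mapping)

    def lock_locally(self, resource: str) -> bool:
        """Self-grant a lock on a resource in the group without contacting the master."""
        if resource not in self.mapping:
            return False  # not part of this group
        if self.mapping[resource] is None:
            self.mapping[resource] = self.name
        return self.mapping[resource] == self.name

    def grant_to(self, other: str, resource: str) -> bool:
        """Grant a lock to another requester and record the new holder in the mapping."""
        if resource in self.mapping and self.mapping[resource] is None:
            self.mapping[resource] = other
            return True
        return False


# Mapping received from the master: r2 is already locked by some other node.
holder = GroupLockHolder("node-1", {"r1": "node-1", "r2": "node-9", "r3": None, "r4": None})
holder.lock_locally("r3")         # self-granted, master is not notified
holder.grant_to("node-2", "r4")   # second requester obtains r4 from node-1
```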