Abstract:
A system event notification service detects that an event has occurred that impacts infrastructure of a computing resource service. In response to the event, the service identifies a customer account that is impacted by the event. The service generates, for the customer account, event data corresponding to a plurality of computing resources impacted by the event. The service provides the event data in accordance with one or more preferences specified in the customer account.
Abstract:
The technology described in this document is, among other things, capable of efficiently monitoring storage device signal data for anomalies. In an example method, signal data for a plurality of non-transitory storage devices is collected. The method determines a hyper feature representation from the collected signal data and computes, using the hyper feature representation, scores for statistics associated with the non-transitory storage devices. The method further determines a reduced hyper feature representation aggregating the scores for each of the statistics associated with each of the non-transitory storage devices; generates, using the reduced hyper feature representation, storage device scores for the non-transitory storage devices of the plurality, respectively; and identifies one or more non-transitory storage devices from among the plurality of non-transitory storage devices exhibiting anomalous storage device behavior using the storage device scores.
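The scoring pipeline above can be sketched roughly as follows. The abstract does not define the hyper feature representation, so this sketch substitutes a robust z-score (distance from the median scaled by the median absolute deviation) as a stand-in, and aggregates per-statistic scores into one device score by averaging. The function names, the input layout, and the threshold are all assumptions made for illustration.

```python
from statistics import median


def robust_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    med = median(values)
    # Fall back to 1.0 when the MAD is zero to avoid division by zero.
    mad = median(abs(v - med) for v in values) or 1.0
    return [abs(v - med) / mad for v in values]


def find_anomalous_devices(signals, threshold=5.0):
    """signals maps device id -> {statistic name: value}.

    Assumes every device reports the same set of statistics.
    Returns (device_scores, flagged_device_ids).
    """
    devices = sorted(signals)
    stats = sorted(signals[devices[0]])
    # Per-statistic scores across devices (stand-in for the scored features).
    per_stat = {s: robust_scores([signals[d][s] for d in devices]) for s in stats}
    # Reduced representation: one aggregate (mean) score per device.
    device_scores = {
        d: sum(per_stat[s][i] for s in stats) / len(stats)
        for i, d in enumerate(devices)
    }
    flagged = [d for d in devices if device_scores[d] > threshold]
    return device_scores, flagged


signals = {
    "d0": {"latency": 10, "errors": 1},
    "d1": {"latency": 11, "errors": 1},
    "d2": {"latency": 9, "errors": 2},
    "d3": {"latency": 100, "errors": 50},
}
scores, flagged = find_anomalous_devices(signals)
```

With this hypothetical input only `d3` exceeds the threshold; a real system would pick the scoring function and threshold from its own data.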
Abstract:
Techniques are described for monitoring the health of a computer system and suggesting an order of repair when problems within the computer system have been identified. Problems and problem entities within the computer system are identified during monitoring. Relationships of the problem entities with other entities in the computer system are identified, and a relationship type is determined for each identified relationship. A combination of the identified problems, the identified problem entities, and the determined relationship types is analyzed to determine an order in which repairs of one or more user-visible entities of the computer system should occur in order to address the identified problems. An alert comprising the determined order of the repairs is then presented to a user.
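One plausible way to derive a repair order from problem entities and typed relationships is a topological sort, sketched below. The abstract does not say how the order is computed, so this is an assumption: the relationship type names (`depends_on`, `hosted_on`, `monitored_by`) and the rule "repair what others rely on first" are hypothetical. It uses `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical relationship types that imply "repair the other entity first".
ORDERING_TYPES = {"depends_on", "hosted_on"}


def repair_order(problem_entities, relationships):
    """Return problem entities in a suggested repair order.

    relationships is an iterable of (entity, relationship_type, other_entity).
    Only relationships between two problem entities affect the order.
    """
    problems = set(problem_entities)
    # Map each entity to the set of entities that must be repaired before it.
    graph = {entity: set() for entity in problems}
    for entity, rel_type, other in relationships:
        if entity in problems and other in problems and rel_type in ORDERING_TYPES:
            graph[entity].add(other)
    return list(TopologicalSorter(graph).static_order())


order = repair_order(
    ["app", "db", "disk"],
    [
        ("app", "depends_on", "db"),
        ("db", "hosted_on", "disk"),
        ("app", "monitored_by", "agent"),  # ignored: not an ordering relationship
    ],
)
```

Here the failing disk is repaired before the database it hosts, and the database before the application that depends on it.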
Abstract:
Methods and apparatuses relating to memory corruption detection are described. In one embodiment, a hardware processor includes an execution unit to execute an instruction to request access to a block of a memory through a pointer to the block of the memory, and a memory management unit to allow access to the block of the memory when a memory corruption detection value in the pointer is validated with a memory corruption detection value in the memory for the block, wherein a position of the memory corruption detection value in the pointer is selectable between a first location and a second, different location.
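The check described above can be modeled in software as a small simulation. This is only an illustrative sketch, not the hardware design: the 6-bit tag width, the 64-byte block size, the two candidate bit positions (56 and 48), and all names are assumptions.

```python
TAG_BITS = 6                      # assumed width of the detection value
TAG_MASK = (1 << TAG_BITS) - 1
BLOCK_SIZE = 64                   # assumed block granularity in bytes


class TagMismatch(Exception):
    """Raised when the pointer's detection value does not match memory's."""


def tag_pointer(address, tag, tag_shift):
    """Embed the detection value in the pointer bits starting at tag_shift."""
    return (address & ~(TAG_MASK << tag_shift)) | ((tag & TAG_MASK) << tag_shift)


def check_access(pointer, tag_shift, block_tags):
    """Validate the pointer's tag against the tag stored for its block.

    Returns the untagged address on success, raises TagMismatch otherwise.
    """
    tag = (pointer >> tag_shift) & TAG_MASK
    address = pointer & ~(TAG_MASK << tag_shift)
    if block_tags.get(address // BLOCK_SIZE) != tag:
        raise TagMismatch(hex(address))
    return address


# Memory-side metadata: block index -> detection value for that block.
block_tags = {0x1000 // BLOCK_SIZE: 0x2A}

# The same tag can live at either of two selectable positions in the pointer.
high = tag_pointer(0x1000, 0x2A, tag_shift=56)
low = tag_pointer(0x1000, 0x2A, tag_shift=48)
```

Calling `check_access(high, 56, block_tags)` or `check_access(low, 48, block_tags)` returns the plain address, while a pointer carrying a stale tag raises `TagMismatch`.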
Abstract:
The present disclosure relates to an apparatus and a method for collecting failure/error history lists to identify and categorize erring memory locations in randomly accessible memory of a computer system. Method and apparatus consistent with the present disclosure may identify whether particular memory cells, rows of memory cells, or columns of memory cells within a memory device are associated with transient or persistent errors. These methods and apparatus may also avoid using portions of memory that have been associated with persistent errors or failures.
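A failure history list of this kind could be categorized along the following lines. The abstract does not state the classification rule, so the rule used here (a cell that errs at least `persistent_threshold` times is persistent, otherwise transient) and every name in the sketch are assumptions.

```python
from collections import Counter


def classify_cells(error_history, persistent_threshold=3):
    """Label each erring cell as 'persistent' or 'transient'.

    error_history is a list of (row, column) locations, one per observed error.
    """
    counts = Counter(error_history)
    return {
        cell: "persistent" if n >= persistent_threshold else "transient"
        for cell, n in counts.items()
    }


def rows_to_avoid(labels):
    """Rows containing at least one persistent cell (assumed avoidance policy)."""
    return sorted({row for (row, _col), label in labels.items() if label == "persistent"})


history = [(4, 7), (4, 7), (9, 1), (4, 7), (2, 2)]
labels = classify_cells(history)
avoid = rows_to_avoid(labels)
```

In this hypothetical history, cell `(4, 7)` fails three times and is treated as persistent, so row 4 would be excluded from use.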
Abstract:
Generating a graphical display region including a synchronized display of alert data and impact data indicative of conditions of a computing infrastructure is described. Alerts are identified where each alert has a timestamp indicative of a first time at which it was identified. An impact calculation is performed to generate the impact data based on alerts valid as of a second time proximate to an impact calculation start time. The generated graphical display region includes impact data valid as of a display time and alert data indicative of the alerts valid as of the second time.
Abstract:
Example computer-implemented methods, computer-readable media, and computer systems are described for performing a computing node health check. In some aspects, a routine health check of a plurality of computing nodes of a computer system is performed. A computing job is assessed. A first set of computing nodes are allocated from the plurality of computing nodes to the computing job. A prior-job-execution diagnosis is performed on the first set of computing nodes. Whether the first set of computing nodes are all healthy is determined. In response to determining that the first set of computing nodes are healthy, the job is executed. The job is monitored while the job is running. Whether the job fails or succeeds is determined. In response to determining that the job fails, a post-job-execution diagnosis is performed on an exit code of the job. A result of the post-job-execution diagnosis is output via a user interface of the computer system.
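The sequence of checks above can be outlined as a small control flow. This is a sketch under stated assumptions: the job dictionary layout and the `is_healthy`, `execute`, and `diagnose_exit_code` callables are hypothetical placeholders for whatever the scheduler actually provides, and job monitoring is folded into `execute`.

```python
def run_job_with_health_checks(job, nodes, is_healthy, execute, diagnose_exit_code):
    """Allocate nodes, diagnose them before the job, run it, diagnose failures."""
    # Allocate the first N nodes to the job (assumed allocation policy).
    allocated = nodes[: job["nodes_needed"]]

    # Prior-job-execution diagnosis on the allocated nodes.
    unhealthy = [n for n in allocated if not is_healthy(n)]
    if unhealthy:
        return {"status": "not_started", "unhealthy_nodes": unhealthy}

    # Execute (and monitor) the job; a zero exit code is treated as success.
    exit_code = execute(job, allocated)
    if exit_code == 0:
        return {"status": "succeeded"}

    # Post-job-execution diagnosis based on the exit code.
    return {"status": "failed", "diagnosis": diagnose_exit_code(exit_code)}


job = {"name": "demo", "nodes_needed": 2}
result_ok = run_job_with_health_checks(
    job, ["n1", "n2", "n3"], lambda n: True, lambda j, ns: 0, lambda c: "n/a"
)
result_bad_node = run_job_with_health_checks(
    job, ["n1", "n2", "n3"], lambda n: n != "n2", lambda j, ns: 0, lambda c: "n/a"
)
result_failed = run_job_with_health_checks(
    job, ["n1", "n2", "n3"], lambda n: True, lambda j, ns: 137,
    lambda c: f"exit code {c}",
)
```

The returned dictionary stands in for the result that would be shown through the user interface.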
Abstract:
A data processing system (2) supports non-speculative execution of vector load instructions that perform at least one contingent load of a data value. Fault detection circuitry (26) serves to detect whether a contingent load is a fault-generating contingent load or a fault-free contingent load. Contingent load suppression circuitry (28) detects and suppresses a fault-free contingent load that matches predetermined criteria under which the load may result in an undesired change of architectural state (an undesired side-effect). Examples of such predetermined criteria are that the contingent load is to a non-memory device or that the contingent load will trigger a diagnostic response, such as entry into a halting debug mode or triggering of a debug exception.
Abstract:
A neural network (10) is implemented as a memristive neuromorphic circuit that includes a neuron circuit (112, 114) and a memristive device (113) connected to the neuron circuit (112, 114). An input voltage is sensed at a first terminal of the memristive device (113) during a feedforward operation of the neural network (10). An error voltage is sensed at a second terminal of the memristive device (113) during an error backpropagation operation of the neural network (10). In accordance with a training rule, a desired conductance change for the memristive device (113) is computed based on the sensed input voltage and the sensed error voltage. Then a training voltage is applied to the memristive device (113). Here, the training voltage is proportional to a logarithmic value of the desired conductance change.
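The logarithmic relationship can be illustrated numerically. The abstract gives neither the training rule nor the device model, so both are assumptions here: the desired change is taken as a product of the sensed voltages scaled by a learning rate, and the device is assumed to respond exponentially to voltage, with a change of magnitude `a * exp(b * |V|)`. Inverting that model yields a training voltage that scales with the logarithm of the desired change. The constants `a` and `b` are hypothetical.

```python
import math


def desired_conductance_change(v_in, v_err, learning_rate=0.1):
    """Assumed delta-rule style update: proportional to input times error."""
    return learning_rate * v_in * v_err


def training_voltage(delta_g, a=1e-6, b=2.0):
    """Voltage whose assumed effect a * exp(b * |V|) equals |delta_g|.

    The sign of the voltage follows the sign of the desired change.
    """
    if abs(delta_g) <= a:
        return 0.0  # change too small to program under this assumed model
    magnitude = math.log(abs(delta_g) / a) / b
    return math.copysign(magnitude, delta_g)


delta_g = desired_conductance_change(v_in=0.5, v_err=0.2)
v_train = training_voltage(delta_g)
# Round trip through the assumed device model recovers the desired change.
recovered = 1e-6 * math.exp(2.0 * v_train)
```

Because of the logarithm, a tenfold increase in the desired change raises the training voltage by a fixed increment rather than tenfold.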