21.
Publication No.: US10664341B2
Publication Date: 2020-05-26
Application No.: US16053538
Filing Date: 2018-08-02
Inventors: Wang Ping He, Larry Juarez, Matthew J. Kalos, John N. McCauley, Louis A. Rasor, Brian A. Rinaldi, Todd C. Sorenson
Abstract: Provided are a method, a system, and a computer program product in which a storage controller determines one or more resources that are impacted by an error. A cleanup of tasks associated with the one or more resources that are impacted by the error is performed, to recover from the error, wherein host input/output (I/O) operations continue to be processed, and wherein tasks associated with other resources continue to execute.
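The scoped recovery described above can be illustrated with a minimal Python sketch. It assumes each task records the set of resources it uses; `Task` and `recover_from_error` are hypothetical names, and marking a task as not running stands in for whatever cleanup the storage controller actually performs. This is an illustration, not the patented implementation.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task record: a name plus the resources it holds.
    name: str
    resources: frozenset
    running: bool = True


def recover_from_error(tasks, impacted):
    """Clean up only the tasks that touch an impacted resource.

    Tasks on unaffected resources are left running, so host I/O on
    those resources is not interrupted.
    """
    cleaned = []
    for task in tasks:
        if task.resources & impacted:  # uses at least one impacted resource
            task.running = False       # stand-in for aborting/cleaning the task
            cleaned.append(task.name)
    return cleaned


tasks = [
    Task("io-vol1", frozenset({"adapter0", "vol1"})),
    Task("io-vol2", frozenset({"adapter1", "vol2"})),
    Task("copy-svc", frozenset({"vol1", "vol3"})),
]
print(recover_from_error(tasks, impacted={"vol1"}))
```

Here only the tasks holding `vol1` are cleaned up; `io-vol2` keeps running.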
22.
Publication No.: US20200081630A1
Publication Date: 2020-03-12
Application No.: US16124123
Filing Date: 2018-09-06
IPC Classification: G06F3/06, G06F12/0868
Abstract: A method for protecting data in a storage system is disclosed. In one embodiment, such a method includes detecting, by a first rack power controller, a first battery-on status associated with a first uninterruptible power supply. The method further detects, by a second rack power controller, a second battery-on status associated with a second uninterruptible power supply. The method communicates the first battery-on status from the first rack power controller to the second rack power controller. The method then triggers, by the second rack power controller, a dump of modified data from memory to more persistent storage upon detecting both the first battery-on status and the second battery-on status. A corresponding system and computer program product are also disclosed.
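A minimal sketch of the two-controller handshake, assuming each rack power controller (RPC) tracks its own UPS status and the status reported by its peer. `RackPowerController` and its methods are hypothetical; the sketch only shows that a single battery-on report does not trigger the dump, while the combination of both does.

```python
class RackPowerController:
    """Hypothetical model of one rack power controller (RPC)."""

    def __init__(self, name):
        self.name = name
        self.local_on_battery = False  # status of this RPC's own UPS
        self.peer_on_battery = False   # status reported by the peer RPC
        self.dump_triggered = False

    def detect_local_battery_on(self, peer=None):
        # Record the local battery-on status and, if a peer is given,
        # communicate it to that peer.
        self.local_on_battery = True
        if peer is not None:
            peer.receive_peer_status(True)
        self._maybe_dump()

    def receive_peer_status(self, on_battery):
        self.peer_on_battery = on_battery
        self._maybe_dump()

    def _maybe_dump(self):
        # Dump modified data only when BOTH UPSes report battery-on.
        if self.local_on_battery and self.peer_on_battery and not self.dump_triggered:
            self.dump_triggered = True
            print(f"{self.name}: dumping modified data to persistent storage")


rpc1 = RackPowerController("rpc1")
rpc2 = RackPowerController("rpc2")
rpc1.detect_local_battery_on(peer=rpc2)  # one UPS on battery: no dump yet
rpc2.detect_local_battery_on()           # both on battery: rpc2 dumps
```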
23.
Publication No.: US20200073772A1
Publication Date: 2020-03-05
Application No.: US16677647
Filing Date: 2019-11-07
Inventors: Herve G.P. Andre, Matthew D. Carson, Rashmi Chandra, Clint A. Hardy, Larry Juarez, Tony Leung, Todd C. Sorenson
Abstract: Provided are a computer program product, system, and method for determining an availability score based on available resources of different resource types in a distributed computing environment of storage servers, to determine whether to perform a failure operation for one of the storage servers. A health status monitor program deployed in the storage servers performs: maintaining information indicating availability of a plurality of storage server resources for a plurality of resource types; calculating an availability score as a function of a number of available resources of the resource types; and transmitting information on the availability score to a management program. The management program uses the transmitted information to determine whether to migrate services from the storage server from which the availability score is received to at least one of the other storage servers in the distributed computing environment.
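The scoring and migration decision described above might look like the following sketch. The scoring function (mean fraction of available units per resource type), the 0.75 threshold, the target-selection rule, and all names are assumptions for illustration; the abstract states only that the score is a function of the number of available resources per type.

```python
def availability_score(resources):
    """resources maps a resource type to (available, total) counts.

    Hypothetical scoring: the mean fraction of available units per type,
    so every type carries equal weight regardless of its unit count.
    """
    fractions = [avail / total for avail, total in resources.values() if total > 0]
    return sum(fractions) / len(fractions) if fractions else 0.0


def should_migrate(score, threshold=0.75):
    # Hypothetical management-program policy: migrate services off a
    # server whose score falls below a fixed threshold.
    return score < threshold


def migration_target(scores, source):
    # Choose the highest-scoring other server as the migration target.
    others = {name: s for name, s in scores.items() if name != source}
    return max(others, key=others.get) if others else None


server_a = {"cpu": (8, 8), "memory_dimms": (16, 16), "host_adapters": (4, 4)}
server_b = {"cpu": (4, 8), "memory_dimms": (12, 16), "host_adapters": (2, 4)}
scores = {"a": availability_score(server_a), "b": availability_score(server_b)}
if should_migrate(scores["b"]):
    print("migrate services from b to", migration_target(scores, "b"))
```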
24.
Publication No.: US10545553B2
Publication Date: 2020-01-28
Application No.: US15640176
Filing Date: 2017-06-30
Abstract: In one embodiment, a method includes determining a plurality of hardware components of a system. The method also includes power cycling a first hardware component of the plurality of hardware components of the system according to a dynamic schedule. Also, the method includes determining whether the first hardware component experienced a power-up failure resulting from the power cycling. Moreover, the method includes outputting an indication to replace and/or repair the first hardware component in response to a determination that the first hardware component experienced the power-up failure resulting from the power cycling. Other systems, methods, and computer program products for preventing unexpected power-up failures of individual hardware components are described in accordance with more embodiments.
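A minimal sketch of the cycle-check-flag step, with the power-up outcome simulated by a flag on a hypothetical `Component` class. Actual power cycling and failure detection would go through platform firmware, which is not modeled here, nor is the dynamic schedule that decides when each component is due.

```python
class Component:
    """Hypothetical hardware component with a simulated power-up outcome."""

    def __init__(self, name, fails_on_power_up=False):
        self.name = name
        self.fails_on_power_up = fails_on_power_up

    def power_cycle(self):
        # Returns True if the component powered back up successfully.
        return not self.fails_on_power_up


def run_power_cycle_check(components):
    """Power-cycle each component and flag those that fail to power up."""
    to_replace = []
    for comp in components:
        if not comp.power_cycle():
            to_replace.append(comp.name)
            print(f"replace/repair: {comp.name}")
    return to_replace


run_power_cycle_check([Component("psu0"), Component("fan2", fails_on_power_up=True)])
```

Finding the failure during a planned cycle means the part can be replaced before an unplanned power loss exposes it.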
25.
Publication No.: US20190354158A1
Publication Date: 2019-11-21
Application No.: US16525342
Filing Date: 2019-07-29
Abstract: In one embodiment, a method includes determining a plurality of hardware components of a system. The method also includes power cycling a first hardware component of the plurality of hardware components of the system according to a dynamic schedule. A period of time in which power cycling of the first hardware component takes place is shortened as the age of the first hardware component approaches the expected lifespan of the first hardware component. Also, the method includes determining whether the first hardware component experienced a power-up failure resulting from the power cycling. Moreover, the method includes outputting an indication to replace and/or repair the first hardware component in response to a determination that the first hardware component experienced the power-up failure resulting from the power cycling. Other systems, methods, and computer program products for preventing unexpected power-up failures of individual hardware components are described in accordance with more embodiments.
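One way to shorten the power-cycling period as a component ages is linear interpolation between a maximum and a minimum interval. The linear form, the 90-day and 7-day bounds, and the function name `next_cycle_interval` are assumptions; the abstract says only that the period is shortened as the component approaches its expected lifespan.

```python
def next_cycle_interval(age_days, lifespan_days,
                        max_interval_days=90.0, min_interval_days=7.0):
    """Hypothetical dynamic schedule: days until the next power cycle.

    A new component is cycled every max_interval_days; at or beyond its
    expected lifespan it is cycled every min_interval_days.
    """
    # Fraction of expected life remaining: 1.0 when new, 0.0 at end of life.
    remaining = max(0.0, 1.0 - age_days / lifespan_days)
    return min_interval_days + (max_interval_days - min_interval_days) * remaining


for age in (0, 500, 1000, 1200):
    print(age, next_cycle_interval(age, lifespan_days=1000))
```

A curve that tightens faster near end of life (for example, quadratic in `remaining`) would fit the abstract equally well.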
26.
Publication No.: US20190332562A1
Publication Date: 2019-10-31
Application No.: US16284977
Filing Date: 2019-02-25
IPC Classification: G06F13/40
Abstract: Provided are techniques for detecting a type of storage adapter connected to an Input/Output (I/O) bay and miscabling of a microbay housing the storage adapter. Under control of an Input/Output (I/O) bay, cable sidebands are driven high for a predetermined period of time. It is determined whether a cable sidebands response has been detected that indicates that the cable sidebands have been driven low. In response to determining that the cable sidebands response has been detected, it is determined that the I/O bay is connected to a first storage adapter supporting a first protocol for the cable sidebands. In response to determining that the cable sidebands response has not been detected, it is determined that the I/O bay is connected to a second storage adapter supporting a second protocol for the cable sidebands. Moreover, I/O bay and port numbers stored by the microbay are used to determine miscabling.
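The detection logic reduces to one branch on whether the adapter pulls the sidebands low. In this sketch, `drive_sidebands_high` and `sidebands_pulled_low` are hypothetical hardware hooks supplied by the caller. The miscabling check, which compares the stored bay/port numbers with the expected location, is an assumed interpretation of how the stored numbers are used.

```python
from enum import Enum


class AdapterType(Enum):
    FIRST_PROTOCOL = "first"    # adapter responds by driving sidebands low
    SECOND_PROTOCOL = "second"  # adapter gives no sideband response


def detect_adapter_type(drive_sidebands_high, sidebands_pulled_low, hold_seconds=0.1):
    """Drive the cable sidebands high for a fixed time, then look for a low response.

    drive_sidebands_high and sidebands_pulled_low are hypothetical hooks
    standing in for the I/O bay's hardware access.
    """
    drive_sidebands_high(hold_seconds)
    if sidebands_pulled_low():
        return AdapterType.FIRST_PROTOCOL
    return AdapterType.SECOND_PROTOCOL


def is_miscabled(expected_bay, expected_port, stored_bay, stored_port):
    # Assumed check: the microbay's stored bay/port numbers should match
    # the location the cable is expected to connect to.
    return (stored_bay, stored_port) != (expected_bay, expected_port)


print(detect_adapter_type(lambda seconds: None, lambda: True))
print(is_miscabled(2, 1, stored_bay=3, stored_port=1))
```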
27.
Publication No.: US10324780B2
Publication Date: 2019-06-18
Application No.: US15650128
Filing Date: 2017-07-14
Abstract: For efficient data system error recovery, an error threshold is dynamically adjusted from a default error threshold to one of a plurality of error threshold values comprising at least high threshold values, medium threshold values, and low threshold values, for a particular error associated with an event object indicating a responsive action for handling the particular error in a data system. The responsive action to the event object comprises determining whether the error threshold needs to be adjusted for the particular error, and if it is determined that the error threshold for the particular error does not need adjustment, the default error threshold is used.
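A sketch of per-error threshold adjustment with a fallback to the default. The threshold magnitudes, the `handle_event` signature, and the rule of escalating once the error count reaches the threshold are all assumptions; the abstract names high, medium, and low levels without giving values.

```python
DEFAULT_THRESHOLD = 10

# Hypothetical threshold levels; the abstract does not give magnitudes.
THRESHOLD_LEVELS = {"high": 20, "medium": 10, "low": 3}


class ErrorThresholds:
    def __init__(self):
        self.overrides = {}  # error id -> adjusted threshold

    def adjust(self, error_id, level):
        self.overrides[error_id] = THRESHOLD_LEVELS[level]

    def threshold_for(self, error_id):
        # Fall back to the default when no adjustment was made.
        return self.overrides.get(error_id, DEFAULT_THRESHOLD)


def handle_event(thresholds, error_id, count, needs_adjustment=None):
    """Responsive action for an event object: optionally adjust, then compare.

    needs_adjustment is None when the default threshold should be kept,
    or a level name ("high", "medium", "low") otherwise. Returns True when
    the error count reaches the threshold and recovery should escalate.
    """
    if needs_adjustment is not None:
        thresholds.adjust(error_id, needs_adjustment)
    return count >= thresholds.threshold_for(error_id)


t = ErrorThresholds()
print(handle_event(t, "crc", 5))           # default threshold 10
print(handle_event(t, "crc", 5, "low"))    # lowered to 3
```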
28.
Publication No.: US20190004581A1
Publication Date: 2019-01-03
Application No.: US15640176
Filing Date: 2017-06-30
Abstract: In one embodiment, a method includes determining a plurality of hardware components of a system. The method also includes power cycling a first hardware component of the plurality of hardware components of the system according to a dynamic schedule. Also, the method includes determining whether the first hardware component experienced a power-up failure resulting from the power cycling. Moreover, the method includes outputting an indication to replace and/or repair the first hardware component in response to a determination that the first hardware component experienced the power-up failure resulting from the power cycling. Other systems, methods, and computer program products for preventing unexpected power-up failures of individual hardware components are described in accordance with more embodiments.
29.
Publication No.: US10114633B2
Publication Date: 2018-10-30
Application No.: US15373116
Filing Date: 2016-12-08
Inventors: Gary W. Batchelor, Veronica S. Davila, Enrique Q. Garcia, Robin Han, Jay T. Kirch, Ronald D. Martens, Trung N. Nguyen, Brian A. Rinaldi, Todd C. Sorenson
Abstract: Provided are techniques for concurrent Input/Output (I/O) enclosure firmware/Field-Programmable Gate Array (FPGA) update in a multi-node environment. First notifications are sent to each I/O enclosure management engine on each of a plurality of server nodes that code activation for a first set of I/O enclosures is starting. An update image is distributed to the first set of I/O enclosures. The update image on the first set of I/O enclosures is activated by sending an activate reset command to each of the first set of I/O enclosures, wherein a reset is not propagated to other devices within each I/O enclosure in the first set of I/O enclosures in response to determining that the reset is an activate reset. In response to the activate reset command completing, second notifications are sent to each I/O enclosure management engine that code activation for the first set of I/O enclosures has completed.
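The update sequence can be sketched as notify, distribute, activate, notify. The classes and method names are hypothetical; the activate reset is modeled as switching to the staged image without touching the enclosure's downstream devices, which is the property the abstract emphasizes.

```python
class IOEnclosure:
    """Hypothetical I/O enclosure with downstream devices."""

    def __init__(self, name, devices):
        self.name = name
        self.devices = devices
        self.staged_image = None
        self.active_image = "v1"
        self.device_resets = 0  # counts resets propagated to devices

    def reset(self, activate=False):
        if activate:
            # Activate reset: switch to the staged image and do NOT
            # propagate the reset to downstream devices.
            self.active_image = self.staged_image
        else:
            # Ordinary reset: every downstream device is reset too.
            self.device_resets += len(self.devices)


class ManagementEngine:
    """Hypothetical per-node I/O enclosure management engine."""

    def __init__(self, node):
        self.node = node
        self.log = []

    def notify(self, message):
        self.log.append(message)


def concurrent_update(engines, enclosure_set, image):
    for engine in engines:                  # first notifications
        engine.notify("activation starting")
    for enc in enclosure_set:               # distribute the update image
        enc.staged_image = image
    for enc in enclosure_set:               # activate via activate reset
        enc.reset(activate=True)
    for engine in engines:                  # second notifications
        engine.notify("activation complete")


nodes = [ManagementEngine("node0"), ManagementEngine("node1")]
first_set = [IOEnclosure("enc0", ["hba0", "hba1"])]
concurrent_update(nodes, first_set, "v2")
print(first_set[0].active_image, first_set[0].device_resets)
```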
30.
Publication No.: US20180165082A1
Publication Date: 2018-06-14
Application No.: US15373116
Filing Date: 2016-12-08
Inventors: Gary W. Batchelor, Veronica S. Davila, Enrique Q. Garcia, Robin Han, Jay T. Kirch, Ronald D. Martens, Trung N. Nguyen, Brian A. Rinaldi, Todd C. Sorenson
Abstract: Provided are techniques for concurrent Input/Output (I/O) enclosure firmware/Field-Programmable Gate Array (FPGA) update in a multi-node environment. First notifications are sent to each I/O enclosure management engine on each of a plurality of server nodes that code activation for a first set of I/O enclosures is starting. An update image is distributed to the first set of I/O enclosures. The update image on the first set of I/O enclosures is activated by sending an activate reset command to each of the first set of I/O enclosures, wherein a reset is not propagated to other devices within each I/O enclosure in the first set of I/O enclosures in response to determining that the reset is an activate reset. In response to the activate reset command completing, second notifications are sent to each I/O enclosure management engine that code activation for the first set of I/O enclosures has completed.