-
Publication number: US10127109B2
Publication date: 2018-11-13
Application number: US15625985
Application date: 2017-06-16
Applicant: Cray Inc.
Inventor: Laurence S. Kaplan, Preston Pengra Briggs, III, Miles Arthur Ohlrich, Willard Huston Leslie
Abstract: A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.
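The store/load/trap flow described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: the abstract does not say what form the error correction information takes, so the example assumes one XOR parity word per group of data words. The names `SimulatedMemory`, `ResiliencySystem`, `UncorrectableMemoryError`, and `group_size` are hypothetical, and a Python exception stands in for the hardware trap.

```python
class UncorrectableMemoryError(Exception):
    """Stands in for the trap raised when the memory system reports an error."""

    def __init__(self, address):
        super().__init__(f"memory error at address {address}")
        self.address = address


class SimulatedMemory:
    """Word-addressed memory that can be told to fail at chosen addresses."""

    def __init__(self, size):
        self.words = [0] * size
        self.poisoned = set()  # addresses whose next load reports an error

    def store(self, address, value):
        self.words[address] = value
        self.poisoned.discard(address)  # a fresh store clears the fault

    def load(self, address):
        if address in self.poisoned:
            raise UncorrectableMemoryError(address)
        return self.words[address]


class ResiliencySystem:
    """Assumed scheme: one XOR parity word per group of `group_size` words."""

    def __init__(self, memory, group_size=4):
        self.memory = memory
        self.group_size = group_size
        self.parity = {}  # group index -> XOR of all words in that group

    def store(self, address, value):
        # Generate and store error correction information on every store:
        # fold the old value out of the parity word and the new value in.
        old = self.load(address)
        group = address // self.group_size
        self.parity[group] = self.parity.get(group, 0) ^ old ^ value
        self.memory.store(address, value)

    def load(self, address):
        try:
            return self.memory.load(address)  # normal path, no error
        except UncorrectableMemoryError:
            return self._recover(address)     # "trap" into the handler

    def _recover(self, address):
        # Re-create the lost word from the parity word and the other words
        # of the group. A second error in the same group propagates, since
        # single parity cannot correct it.
        group = address // self.group_size
        start = group * self.group_size
        end = min(start + self.group_size, len(self.memory.words))
        value = self.parity.get(group, 0)
        for other in range(start, end):
            if other != address:
                value ^= self.memory.load(other)
        self.memory.store(address, value)  # repair the location
        return value                       # as if the load had completed


memory = SimulatedMemory(8)
resiliency = ResiliencySystem(memory, group_size=4)
for addr, word in enumerate([11, 22, 33, 44, 55, 66, 77, 88]):
    resiliency.store(addr, word)

# Simulate a memory error at address 2: the stored word is lost.
memory.words[2] = 0
memory.poisoned.add(2)
recovered = resiliency.load(2)
```

In this sketch the program only ever calls `resiliency.load`, so a failed load is repaired transparently and the caller receives the original value, which mirrors the abstract's "as if the load instruction had completed normally". Single parity is chosen only for brevity; it cannot recover from two errors in the same group.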
-
Publication number: US20170308430A1
Publication date: 2017-10-26
Application number: US15625985
Application date: 2017-06-16
Applicant: Cray Inc.
Inventor: Laurence S. Kaplan, Preston Pengra Briggs, III, Miles Arthur Ohlrich, Willard Huston Leslie
CPC classification number: G06F11/1076, G06F3/0619, G06F3/064, G06F3/067, G06F3/0673, G06F11/08, G06F11/10, G06F11/1004, G06F11/1008, G06F11/1016, G06F11/1068, G06F11/1088, G06F11/14, G06F11/1402, G06F11/1405, G06F11/141, G06F11/1479, G06F11/1662, G06F11/202, G06F11/2023, G06F11/2035, G06F2201/805, G06F2201/82
Abstract: A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.
-
Publication number: US10884859B2
Publication date: 2021-01-05
Application number: US16385448
Application date: 2019-04-16
Applicant: Cray Inc.
Inventor: Laurence S. Kaplan, Preston Pengra Briggs, III, Miles Arthur Ohlrich, Willard Huston Leslie
Abstract: A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.
-
Publication number: US10324792B2
Publication date: 2019-06-18
Application number: US15625957
Application date: 2017-06-16
Applicant: Cray Inc.
Inventor: Laurence S. Kaplan, Preston Pengra Briggs, III, Miles Arthur Ohlrich, Willard Huston Leslie
Abstract: A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.
-
Publication number: US09910731B2
Publication date: 2018-03-06
Application number: US15357448
Application date: 2016-11-21
Applicant: Cray Inc.
Inventor: Laurence S. Kaplan, Preston Pengra Briggs, III, Miles Arthur Ohlrich, Willard Huston Leslie
CPC classification number: G06F11/1076, G06F3/0619, G06F3/064, G06F3/067, G06F3/0673, G06F11/08, G06F11/10, G06F11/1004, G06F11/1008, G06F11/1016, G06F11/1068, G06F11/1088, G06F11/14, G06F11/1402, G06F11/1405, G06F11/141, G06F11/1479, G06F11/1662, G06F11/202, G06F11/2023, G06F11/2035, G06F2201/805, G06F2201/82
Abstract: A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.
-
Publication number: US20170068596A1
Publication date: 2017-03-09
Application number: US15357448
Application date: 2016-11-21
Applicant: Cray Inc.
Inventor: Laurence S. Kaplan, Preston Pengra Briggs, III, Miles Arthur Ohlrich, Willard Huston Leslie
CPC classification number: G06F11/1076, G06F3/0619, G06F3/064, G06F3/067, G06F3/0673, G06F11/08, G06F11/10, G06F11/1004, G06F11/1008, G06F11/1016, G06F11/1068, G06F11/1088, G06F11/14, G06F11/1402, G06F11/1405, G06F11/141, G06F11/1479, G06F11/1662, G06F11/202, G06F11/2023, G06F11/2035, G06F2201/805, G06F2201/82
Abstract: A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.