Deallocation of memory in a logically-partitioned computer
    1.
    Granted patent
    Legal status: lapsed

    Publication No.: US07478268B2

    Publication date: 2009-01-13

    Application No.: US11225653

    Filing date: 2005-09-13

    IPC classification: G06F11/00

    Abstract: A method, apparatus, system, and computer-readable storage medium that, in an embodiment, set uncorrectable error indicators in logical memory blocks in response to detecting an uncorrectable error in memory pages associated with the logical memory blocks. If the logical memory block is allocated to a hypervisor, the memory page may be deallocated in response to detection of the uncorrectable error. When an IPL of a partition is subsequently performed, a determination is made whether a logical memory block allocated to the partition previously encountered the uncorrectable error via the uncorrectable error indicator. If the logical memory block did previously encounter the uncorrectable error, the logical memory block is deallocated from the partition. In an embodiment, if spare memory exists, the logical memory block with the previously encountered uncorrectable error is replaced with the spare memory and the IPL of the partition is continued with the spare memory.


    Deallocation of memory in a logically-partitioned computer
    2.
    Patent application
    Legal status: lapsed

    Publication No.: US20070061612A1

    Publication date: 2007-03-15

    Application No.: US11225653

    Filing date: 2005-09-13

    IPC classification: G06F11/00

    Abstract: A method, apparatus, system, and signal-bearing medium that, in an embodiment, set uncorrectable error indicators in logical memory blocks in response to detecting an uncorrectable error in memory pages associated with the logical memory blocks. If the logical memory block is allocated to a hypervisor, the memory page may be deallocated in response to detection of the uncorrectable error. When an IPL of a partition is subsequently performed, a determination is made whether a logical memory block allocated to the partition previously encountered the uncorrectable error via the uncorrectable error indicator. If the logical memory block did previously encounter the uncorrectable error, the logical memory block is deallocated from the partition. In an embodiment, if spare memory exists, the logical memory block with the previously encountered uncorrectable error is replaced with the spare memory and the IPL of the partition is continued with the spare memory. If spare memory does not exist, the IPL of the partition is continued without the logical memory block that previously encountered the uncorrectable error. This allows a partition to IPL if it had not been able to because of a persistent uncorrectable error in its IPL path.

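The IPL-time decision described in this abstract can be sketched roughly as follows. This is only an illustration of the abstract's wording, not the patented implementation: the names `LogicalMemoryBlock`, `Partition`, `record_uncorrectable_error`, and `ipl_partition` are hypothetical, and the hypervisor-owned page case is left out.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalMemoryBlock:
    block_id: int
    # Hypothetical flag: set when an uncorrectable error hits a page in this block.
    ue_indicator: bool = False


@dataclass
class Partition:
    name: str
    blocks: list = field(default_factory=list)


def record_uncorrectable_error(block):
    """Mark the block so that a later IPL can see it previously failed."""
    block.ue_indicator = True


def ipl_partition(partition, spare_blocks):
    """Return the blocks the partition continues its IPL with.

    Flagged blocks are swapped for a spare when one exists; otherwise they
    are dropped and the IPL continues without them.
    """
    usable = []
    for block in partition.blocks:
        if not block.ue_indicator:
            usable.append(block)
        elif spare_blocks:
            usable.append(spare_blocks.pop(0))
        # No spare left: the flagged block is simply deallocated.
    partition.blocks = usable
    return usable
```

With three blocks, two of them flagged, and a single spare, the partition would come up with the healthy block plus the spare, and the second flagged block would be left out.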

    Method and apparatus for coordinating dynamic memory deallocation with a redundant bit line steering mechanism
    4.
    Patent application
    Legal status: lapsed

    Publication No.: US20050028039A1

    Publication date: 2005-02-03

    Application No.: US10631067

    Filing date: 2003-07-31

    Abstract: A method and apparatus for coordinating dynamic memory page deallocation with a redundant bit line steering mechanism are provided. With the method and apparatus, memory scrubbing and redundant bit line steering operations are performed in parallel with handling of notifications of runtime correctable errors. When a correctable error is encountered during runtime, and the correctable error is determined to be persistent, then dynamic memory page deallocation is requested of a hypervisor. The determination of persistence is based on a history CE table that is populated by the operation of the memory scrubbing and redundant bit line steering mechanism of a service processor. Thus, only those correctable errors that persist for longer than one memory scrubbing cycle are subject to memory page deallocation.

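One possible reading of the persistence check is sketched below. It assumes that the history CE table holds the page addresses on which the most recent scrub cycle still found a correctable error, and that a runtime error counts as persistent when its page is in that table. The abstract does not spell out the table layout, so `CEHistory`, `scrub_cycle`, and `handle_runtime_ce` are hypothetical names.

```python
class CEHistory:
    """Hypothetical history CE table kept by the service processor."""

    def __init__(self):
        # Pages on which the latest scrub cycle found a correctable error.
        self.table = set()

    def scrub_cycle(self, pages_with_ce):
        """Replace the table with the findings of the scrub cycle just completed."""
        self.table = set(pages_with_ce)

    def is_persistent(self, page):
        return page in self.table


def handle_runtime_ce(page, history, request_deallocation):
    """Request page deallocation only for errors that survived a scrub cycle.

    `request_deallocation` stands in for the call made to the hypervisor.
    Returns True when deallocation was requested.
    """
    if history.is_persistent(page):
        request_deallocation(page)
        return True
    return False
```

Under this reading, a first runtime error on a page is only noted, and a deallocation request follows only if a scrub cycle has recorded the same page in the meantime.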

    Bus failure management method and system
    5.
    Granted patent
    Legal status: lapsed

    Publication No.: US07895493B2

    Publication date: 2011-02-22

    Application No.: US12110611

    Filing date: 2008-04-28

    IPC classification: G01R31/3181 G01R31/40

    Abstract: A method, apparatus and program product improve computer reliability by, in part, identifying a plurality of error occurrences from Error Correction Codes. It may then be determined if the plurality of error occurrences are associated with a single bit of a bus. The determined, single bit may correspond to a faulty component of the bus. This level of identification efficiently addresses problems. For instance, a corrective algorithm may be applied if the plurality of error occurrences are associated with the single bit. Alternatively, the bus may be disabled if the plurality of error occurrences are not associated with the single bit of the bus. In this manner, implementations may detect, identify and act in response to multiple failure modes.

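The single-bit decision in this abstract can be illustrated with a small sketch. It assumes each ECC-reported error has already been decoded to the index of the failing bus bit. The function name `classify_bus_errors` and the action labels are hypothetical, and the corrective algorithm itself is not modelled.

```python
def classify_bus_errors(error_bits):
    """Decide how to react to a batch of ECC-reported bus errors.

    error_bits: iterable of bus bit indices, one per error occurrence.
    Returns a (action, bit) pair; `bit` is set only for the single-bit case.
    """
    distinct = set(error_bits)
    if not distinct:
        return ("no_action", None)
    if len(distinct) == 1:
        # Every occurrence points at the same bit: likely one faulty line,
        # so a corrective algorithm can target it.
        return ("apply_correction", distinct.pop())
    # Errors are spread over several bits: treat the bus as unreliable.
    return ("disable_bus", None)
```

For example, three errors all on bit 5 would yield `("apply_correction", 5)`, while errors on bits 5 and 7 would yield `("disable_bus", None)`.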

    Bus Failure Management Method and System
    6.
    Patent application
    Legal status: lapsed

    Publication No.: US20090271668A1

    Publication date: 2009-10-29

    Application No.: US12110611

    Filing date: 2008-04-28

    IPC classification: G06F11/00 G06F11/07

    Abstract: A method, apparatus and program product improve computer reliability by, in part, identifying a plurality of error occurrences from Error Correction Codes. It may then be determined if the plurality of error occurrences are associated with a single bit of a bus. The determined, single bit may correspond to a faulty component of the bus. This level of identification efficiently addresses problems. For instance, a corrective algorithm may be applied if the plurality of error occurrences are associated with the single bit. Alternatively, the bus may be disabled if the plurality of error occurrences are not associated with the single bit of the bus. In this manner, implementations may detect, identify and act in response to multiple failure modes.


    Method and apparatus for isolating uncorrectable errors while system continues to run
    7.
    Granted patent
    Legal status: lapsed

    Publication No.: US07089461B2

    Publication date: 2006-08-08

    Application No.: US10392504

    Filing date: 2003-03-20

    IPC classification: G06F11/00

    CPC classification: G06F11/106

    Abstract: A method, apparatus and computer program product are provided for implementing uncorrectable error isolation in a computer system while the system continues to run. A memory controller performs data fetching from a system memory, capturing error information, and responsive to detecting an uncorrectable error, generates a predefined attention to a service processor. The service processor utilizing a processor runtime diagnostic (PRD) program, reads the captured error data and identifies a memory extent with the uncorrectable error. Then the memory controller performs accelerated scrubbing of the identified memory extent with the uncorrectable error, capturing error information and responsive to a scrub correctable error threshold being exceeded, sends a predefined scrub threshold exceeded attention to the service processor. The service processor reads the captured error data and identifies a failed memory chip.

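The second half of this flow, the accelerated scrub followed by chip identification, might look like the sketch below. It rests on two assumptions the abstract does not confirm: that the threshold applies to the total number of correctable errors seen while scrubbing the extent, and that the failed chip is the one named most often in the captured error data. The function name `accelerated_scrub` is hypothetical.

```python
from collections import Counter


def accelerated_scrub(captured_errors, ce_threshold):
    """Scrub the suspect extent and name a failed chip once the threshold is exceeded.

    captured_errors: chip ids, one per correctable error seen during the scrub.
    Returns the chip id blamed for the errors, or None if the threshold is
    never exceeded and no attention is raised.
    """
    seen = []
    for chip_id in captured_errors:
        seen.append(chip_id)
        if len(seen) > ce_threshold:
            # Stand-in for the "scrub threshold exceeded" attention: the
            # service processor reads the captured data and picks a chip.
            return Counter(seen).most_common(1)[0][0]
    return None
```

With a threshold of 3, the error sequence `[3, 3, 1, 3]` would exceed the threshold on the fourth error and point at chip 3, while `[3, 3]` would raise no attention.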