Method and apparatus for fault tolerant time synchronization mechanism in a scaleable multi-processor computer

    Publication No.: US20060179364A1

    Publication date: 2006-08-10

    Application No.: US11054294

    Filing date: 2005-02-09

    IPC classification: G06F11/00

    Abstract: Redundant time-of-day (TOD) oscillators are aligned, within a master oscillator path, to a local logic oscillator and used to create independent step-sync signals. A step checker validates the oscillators and provides selection signals that identify which of the TOD oscillators operates according to a criterion. Independent step-sync signals are transmitted to several sibling chips. Local step and sync signals are delayed so that they arrive at the TOD register nearly synchronously with those at the TOD registers in sibling chips. A slave oscillator path may be used to select time signals generated in a sibling chip, whereby the master oscillator path is deselected. A primary control register set may be used to configure which among several chips is the master chip using the master oscillator path. All remaining chips are slave chips. All segments of the topology are redundant. One of multiple possible alternate topologies is defined in a secondary control register set. Commands and TOD values are passed on the fabric at predefined time increment boundaries to establish, restore, or maintain synchronization across all chips.
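
The step checker's selection role can be illustrated with a minimal sketch. The abstract does not state the criterion, so the check below (every observed step interval lies within a tolerance of the expected interval) is an assumption, as are the function name and parameters.

```python
from typing import List, Optional


def select_oscillator(step_intervals: List[List[int]],
                      expected: int,
                      tolerance: int) -> Optional[int]:
    """Return the index of the first oscillator that passes the check.

    Assumed criterion (not from the abstract): every observed step
    interval must lie within `tolerance` of `expected`.
    """
    for index, intervals in enumerate(step_intervals):
        if intervals and all(abs(i - expected) <= tolerance for i in intervals):
            return index
    return None  # no oscillator passed; the caller must handle this case


# Oscillator 0 shows one bad interval, so oscillator 1 is selected.
chosen = select_oscillator([[100, 130, 100], [100, 101, 99]],
                           expected=100, tolerance=2)
```

With two redundant oscillators, the returned index plays the role of the selection signal.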

    Method, apparatus, and product for an efficient virtualized time base in a scaleable multi-processor computer

    Publication No.: US20060242442A1

    Publication date: 2006-10-26

    Application No.: US11110180

    Filing date: 2005-04-20

    IPC classification: G06F1/12

    CPC classification: G06F1/14

    Abstract: A method, apparatus, and computer program product are disclosed for providing a virtualized time base in a logically partitioned data processing system. A time base is determined for each one of multiple processor cores. Each time base indicates the current time to the processor core for which it is determined. The time bases are synchronized across the processor cores such that each processor core holds its own copy of a synchronized time base. For one of the processor cores, a virtualized time base is generated that differs from the synchronized time base but remains synchronized with at least a portion of it. That processor core uses the virtualized time base instead of the synchronized time base to indicate the current time. The synchronized time bases and that portion of the virtualized time base remain in synchronization with one another.
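
One way to picture a virtualized time base that differs from the synchronized one yet stays synchronized with a portion of it is to offset only the high-order bits. The sketch below assumes a 64-bit time base whose low 40 bits are the shared portion; the widths, names, and per-partition offset are illustrative assumptions, not details from the abstract.

```python
TB_BITS = 64            # assumed total width of the time base
SYNCED_LOW_BITS = 40    # assumed width of the portion kept in sync
LOW_MASK = (1 << SYNCED_LOW_BITS) - 1
HIGH_MASK = (1 << (TB_BITS - SYNCED_LOW_BITS)) - 1


def virtualized_time_base(synced_tb: int, high_offset: int) -> int:
    """Apply an offset to the high-order bits only.

    The low-order bits are copied unchanged, so they remain
    synchronized with every other core's time base.
    """
    low = synced_tb & LOW_MASK
    high = ((synced_tb >> SYNCED_LOW_BITS) + high_offset) & HIGH_MASK
    return (high << SYNCED_LOW_BITS) | low


synced = (1 << SYNCED_LOW_BITS) | 5
virtual = virtualized_time_base(synced, high_offset=3)
```

Here `virtual` differs from `synced` in its high bits only, while the low 40 bits are identical.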

    Method for providing low-level hardware access to in-band and out-of-band firmware
    4.
    Patent application (lapsed)

    Publication No.: US20060179184A1

    Publication date: 2006-08-10

    Application No.: US11055675

    Filing date: 2005-02-10

    IPC classification: G06F3/06 G06F3/02 G06F3/00

    CPC classification: G06F15/161

    Abstract: In-band firmware executes instructions that cause commands to be sent on a coherency fabric. Fabric snoop logic monitors the coherency fabric for command packets that target a resource in one of the support chips attached via an FSI link. Conversion logic converts the information from the fabric packet into the FSI protocol. An FSI command is transmitted via the FSI transmit link to an FSI slave of the intended support chip. An FSI receive link receives response data from the FSI slave of the intended support chip. Conversion logic converts the information received from the support chip via the FSI receive link into the fabric protocol. Response packet generation logic generates the fabric response packet and returns it on the coherency fabric. An identical FSI link between a support processor and the support chips allows out-of-band firmware direct access to the same resources on the support chips.
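
The snoop-and-convert path can be sketched as follows. The abstract gives no packet layouts or address map, so the address window, field names, and the slave/register split below are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical address window that maps to support-chip resources.
SUPPORT_BASE = 0xF000_0000
SUPPORT_LIMIT = 0xF100_0000
SLAVE_SHIFT = 16          # assumed: upper offset bits select the FSI slave
REGISTER_MASK = 0xFFFF    # assumed: lower offset bits select the register


@dataclass(frozen=True)
class FabricPacket:
    address: int
    is_write: bool
    data: int = 0


@dataclass(frozen=True)
class FsiCommand:
    slave_id: int
    register: int
    is_write: bool
    data: int = 0


def targets_support_chip(packet: FabricPacket) -> bool:
    """Snoop check: does the packet address fall in the support window?"""
    return SUPPORT_BASE <= packet.address < SUPPORT_LIMIT


def fabric_to_fsi(packet: FabricPacket) -> FsiCommand:
    """Convert a snooped fabric command into an FSI command."""
    offset = packet.address - SUPPORT_BASE
    return FsiCommand(offset >> SLAVE_SHIFT, offset & REGISTER_MASK,
                      packet.is_write, packet.data)


def fsi_to_fabric_response(request: FabricPacket, response_data: int) -> FabricPacket:
    """Wrap FSI response data in a fabric response for the same address."""
    return FabricPacket(request.address, False, response_data)


request = FabricPacket(address=0xF002_0010, is_write=False)
if targets_support_chip(request):
    command = fabric_to_fsi(request)   # slave 2, register 0x10
```

Because out-of-band firmware reaches the same FSI slaves over an identical link, the same `FsiCommand` shape would serve both access paths in this model.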

    Method for indirect access to a support interface for memory-mapped resources to reduce system connectivity from out-of-band support processor
    5.
    Patent application (lapsed)

    Publication No.: US20060176897A1

    Publication date: 2006-08-10

    Application No.: US11055404

    Filing date: 2005-02-10

    IPC classification: H04L12/66

    CPC classification: G06F15/7842

    Abstract: A method and apparatus are provided for a support interface for memory-mapped resources. A support processor sends a sequence of commands over an FSI interface to a memory-mapped support interface on a processor chip. The memory-mapped support interface updates memory, memory-mapped registers, or memory-mapped resources. The interface uses fabric packet generation logic to generate a single command packet, consisting of an address, command, and/or data, in the protocol of the coherency fabric. Fabric commands are converted to the FSI protocol and forwarded to attached support chips to access the memory-mapped resource, and responses from the support chips are converted back to fabric response packets. Fabric snoop logic monitors the coherency fabric and decodes responses to packets previously sent by the fabric packet generation logic. The fabric snoop logic updates a status register and/or writes response data to a read data register. The system also reports any errors that are encountered.

    Double DRAM bit steering for multiple error corrections
    6.
    Patent application (lapsed)

    Publication No.: US20060179362A1

    Publication date: 2006-08-10

    Application No.: US11054417

    Filing date: 2005-02-09

    IPC classification: G06F11/00

    Abstract: A method and system are presented for correcting a data error in a primary Dynamic Random Access Memory (DRAM) in a Dual In-line Memory Module (DIMM). Each DRAM has a left half (for storing bits 0:3) and a right half (for storing bits 4:7). A determination is made as to whether the data error was in the left or right half of the primary DRAM. The half of the primary DRAM in which the error occurred is removed from service. All subsequent reads and writes for data originally stored in the primary DRAM's defective half are made to a half of a spare DRAM in the DIMM, while the primary DRAM's non-defective half continues to be used for subsequently storing data.
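
The half-DRAM steering described above can be modeled in a few lines. This is a minimal sketch with several assumptions: bit 0 is taken as the most significant bit (so the left half is the high nibble), the expected value stands in for whatever error-correcting logic the real system uses, and only data written after steering is covered. All names are hypothetical.

```python
from typing import Dict, List, Optional

LEFT, RIGHT = 0, 1   # LEFT = bits 0:3 (assumed high nibble), RIGHT = bits 4:7


def failing_half(expected: int, actual: int) -> Optional[int]:
    """Locate the half that holds the flipped bit(s).

    Simplification: if both halves differ, LEFT is reported.
    """
    diff = (expected ^ actual) & 0xFF
    if diff & 0xF0:
        return LEFT
    if diff & 0x0F:
        return RIGHT
    return None


class SteeredDram:
    """One primary DRAM plus one half of a spare DRAM."""

    def __init__(self) -> None:
        self.primary: Dict[int, List[int]] = {}   # addr -> [left, right] nibbles
        self.spare_half: Dict[int, int] = {}      # addr -> steered nibble
        self.steered: Optional[int] = None

    def steer(self, half: int) -> None:
        """Remove one half of the primary DRAM from service."""
        self.steered = half

    def write(self, addr: int, byte: int) -> None:
        nibbles = [(byte >> 4) & 0xF, byte & 0xF]
        if self.steered is not None:
            self.spare_half[addr] = nibbles[self.steered]
            nibbles[self.steered] = 0   # defective half no longer stores data
        self.primary[addr] = nibbles

    def read(self, addr: int) -> int:
        nibbles = list(self.primary[addr])
        if self.steered is not None:
            nibbles[self.steered] = self.spare_half[addr]
        return (nibbles[LEFT] << 4) | nibbles[RIGHT]


dram = SteeredDram()
dram.steer(failing_half(expected=0xA5, actual=0xA4))   # error in the right half
dram.write(0, 0x3C)
value = dram.read(0)   # right nibble is served from the spare half
```

The non-defective half of the primary DRAM keeps storing its nibble, which is why only half of the spare is consumed.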

    Mini-refresh processor recovery as bug workaround method using existing recovery hardware
    8.
    Patent application (under examination, published)

    Publication No.: US20060184771A1

    Publication date: 2006-08-17

    Application No.: US11055823

    Filing date: 2005-02-11

    IPC classification: G06F9/30

    CPC classification: G06F9/3863 G06F9/3851

    Abstract: A method in a data processing system for avoiding a microprocessor's design defects and recovering the microprocessor from failures caused by design defects. The method comprises the following steps. The method detects and reports events that warn of an error. It then locks the current checkpointed state and prevents instructions that are not checkpointed from checkpointing. After that, the method releases checkpointed-state stores to an L2 cache and drops stores that are not checkpointed. Next, the method blocks interrupts until recovery is completed. It then disables the power-saving states throughout the processor. After that, the method disables instruction fetch and instruction dispatch. Next, the method sends a hardware reset signal. It then restores selected registers from the current checkpointed state. Next, the method fetches instructions from the restored instruction addresses. Finally, the method resumes normal execution after a programmable number of instructions.
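
The ordering of the recovery steps is the essential part of this method, and it can be captured as a fixed sequence. The sketch below only models that ordering; the step names and the handler-dictionary interface are hypothetical, and real hardware would perform these actions in logic rather than software.

```python
from typing import Callable, Dict, List

# Step names are hypothetical labels for the steps listed in the abstract.
RECOVERY_STEPS = (
    "detect_and_report_event",
    "lock_checkpoint",
    "release_checkpointed_stores_drop_others",
    "block_interrupts",
    "disable_power_saving",
    "disable_fetch_and_dispatch",
    "send_hardware_reset",
    "restore_selected_registers",
    "fetch_from_restored_addresses",
    "resume_normal_execution",
)


def run_recovery(handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Invoke one handler per step, strictly in the listed order."""
    completed = []
    for step in RECOVERY_STEPS:
        handlers[step]()          # a missing handler raises KeyError
        completed.append(step)
    return completed


trace: List[str] = []
handlers = {step: (lambda s=step: trace.append(s)) for step in RECOVERY_STEPS}
run_recovery(handlers)
```

Keeping the sequence in one tuple makes the dependencies visible: the checkpoint is locked before any store is released, and registers are restored only after the reset.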

    System and method for creating precise exceptions
    9.
    Patent application (lapsed)

    Publication No.: US20060179290A1

    Publication date: 2006-08-10

    Application No.: US11055193

    Filing date: 2005-02-10

    IPC classification: G06F9/44

    Abstract: A method for creating precise exceptions includes checkpointing an exception-causing instruction. The checkpointing results in a current checkpointed state. The current checkpointed state is locked. It is determined whether any of a plurality of registers require restoration to the current checkpointed state. One or more of the registers are restored to the current checkpointed state when the determination indicates that those registers require restoring. The execution unit is restarted at the exception handler or at the next sequential instruction, depending on whether traps are enabled for the exception.
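
The two decisions in this method, which registers to restore and where to restart, can be sketched as below. The abstract does not say how the need for restoration is determined, so comparing against the checkpoint is an assumption, as are the fixed 4-byte instruction size and all names.

```python
from typing import Dict, List


def restore_registers(registers: Dict[str, int],
                      checkpoint: Dict[str, int]) -> List[str]:
    """Restore only registers that differ from the locked checkpoint.

    Assumption: "requires restoration" means the live value no longer
    matches the checkpointed value.
    """
    restored = [name for name, value in checkpoint.items()
                if registers.get(name) != value]
    for name in restored:
        registers[name] = checkpoint[name]
    return restored


def restart_address(faulting_pc: int, handler_address: int,
                    traps_enabled: bool, instruction_size: int = 4) -> int:
    """Restart at the handler if traps are enabled, else at the next instruction."""
    return handler_address if traps_enabled else faulting_pc + instruction_size


live = {"r1": 7, "r2": 99}
locked = {"r1": 7, "r2": 42}
changed = restore_registers(live, locked)          # only r2 is restored
target = restart_address(0x1000, 0x700, traps_enabled=False)
```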

    System and method for recovering from errors in a data processing system
    10.
    Patent application (lapsed)

    Publication No.: US20060179358A1

    Publication date: 2006-08-10

    Application No.: US11054186

    Filing date: 2005-02-09

    IPC classification: G06F11/00

    CPC classification: G06F11/1441 G06F11/2028

    Abstract: A system and method of recovering from errors in a data processing system. The data processing system includes one or more processor cores coupled to one or more memory controllers. The one or more memory controllers include at least a first memory interface coupled to a first memory and at least a second memory interface coupled to a second memory. In response to determining that an error has been detected in the first memory, access to the first memory via the first memory interface is inhibited. The first memory interface is then locally restarted without restarting the second memory interface.
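
The key property here is isolation: the failing interface is inhibited and restarted while the other interface keeps serving requests. A minimal model follows, with hypothetical class and method names; what happens after the restart (for example, when access is re-enabled) is not covered by the abstract and is left out.

```python
class MemoryInterface:
    """Toy model of one memory interface in a memory controller."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.inhibited = False
        self.restarts = 0

    def access(self) -> str:
        if self.inhibited:
            raise RuntimeError(f"{self.name}: access inhibited")
        return f"{self.name}: ok"

    def local_restart(self) -> None:
        self.restarts += 1


def recover_from_error(failed: MemoryInterface) -> None:
    """Inhibit access through the failing interface, then restart only it."""
    failed.inhibited = True
    failed.local_restart()


first = MemoryInterface("first")
second = MemoryInterface("second")
recover_from_error(first)
status = second.access()   # the second interface is unaffected
```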