-
Publication No.: US08812907B1
Publication date: 2014-08-19
Application No.: US13186087
Filing date: 2011-07-19
Applicant: Thomas D. Bissett , Paul A. Leveille , Ted M. Lin , Jerry Melnick , Angel L. Pagan , Glenn A. Tremblay
Inventor: Thomas D. Bissett , Paul A. Leveille , Ted M. Lin , Jerry Melnick , Angel L. Pagan , Glenn A. Tremblay
IPC classification: G06F11/00
CPC classification: G06F11/1484 , G06F11/2038 , G06F11/2097
Abstract: A computer system configured to provide fault tolerance includes a first host system and a second host system. The first host system is programmed to monitor a number of portions of memory of the first host system that have been modified by a guest running on the first host system and, upon determining that the number of portions exceeds a threshold level, determine that a checkpoint needs to be created. Upon determining that the checkpoint needs to be created, operation of the guest is paused and checkpoint data is generated. After generating the checkpoint data, operation of the guest is resumed while the checkpoint data is transmitted to the second host system.
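The checkpoint trigger described in this abstract can be sketched roughly as follows. This is a minimal illustration only, not the patented implementation: the `Guest` class, the page-keyed dictionary used as memory, and the `maybe_checkpoint` and `send_to_secondary` helpers are all hypothetical names introduced here, and transmission to the second host is modelled as a plain dictionary update.

```python
from dataclasses import dataclass, field


@dataclass
class Guest:
    """Toy guest (hypothetical): memory is a dict of page number -> bytes."""
    memory: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)   # pages modified since the last checkpoint
    paused: bool = False

    def write(self, page: int, data: bytes) -> None:
        if self.paused:
            raise RuntimeError("guest is paused")
        self.memory[page] = data
        self.dirty.add(page)


def maybe_checkpoint(guest: Guest, threshold: int):
    """Return checkpoint data if the dirty-page count exceeds `threshold`, else None."""
    if len(guest.dirty) <= threshold:
        return None
    guest.paused = True                        # pause the guest
    try:
        # Capture only the pages modified since the last checkpoint.
        checkpoint = {page: guest.memory[page] for page in guest.dirty}
        guest.dirty.clear()
    finally:
        guest.paused = False                   # resume before the data is transmitted
    return checkpoint


def send_to_secondary(secondary: dict, checkpoint: dict) -> None:
    """Stand-in for transmission to the second host (assumption: apply pages locally)."""
    secondary.update(checkpoint)
```

In this sketch the guest is already running again when `send_to_secondary` is called, which mirrors the abstract's point that transmission overlaps with resumed guest operation.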
-
Publication No.: US07877552B2
Publication date: 2011-01-25
Application No.: US11419936
Filing date: 2006-05-23
Applicant: Paul A. Leveille , Thomas D. Bissett , Stephen S. Corbin , Jerry Melnick , Glenn A. Tremblay , Satoshi Watanabe , Keiichi Koyama
Inventor: Paul A. Leveille , Thomas D. Bissett , Stephen S. Corbin , Jerry Melnick , Glenn A. Tremblay , Satoshi Watanabe , Keiichi Koyama
IPC classification: G06F13/00
CPC classification: G06F9/52 , G06F11/1405 , G06F11/1683 , G06F11/1691 , G06F11/2038 , G06F11/2097 , G06F12/1458 , G06F12/1483
Abstract: A symmetric multiprocessing fault-tolerant computer system controls memory access in a symmetric multiprocessing computer system. To do so, virtual page structures are created, where the virtual page structures reflect physical page access privileges to shared memory for processors in a symmetric multiprocessing computer system. Access to shared memory is controlled based on physical page access privileges reflected in the virtual paging structures to coordinate deterministic shared memory access between processors in the symmetric multiprocessing computer system. A symmetric multiprocessing fault-tolerant computer system may use duplication or continuous replay.
-
Publication No.: US5790397A
Publication date: 1998-08-04
Application No.: US710404
Filing date: 1996-09-17
Applicant: Thomas D. Bissett , Martin J. Fitzgerald, V , Paul A. Leveille , James D. McCollum , Erik Muench , Glenn A. Tremblay
Inventor: Thomas D. Bissett , Martin J. Fitzgerald, V , Paul A. Leveille , James D. McCollum , Erik Muench , Glenn A. Tremblay
IPC classification: G06F11/14 , G06F9/52 , G06F11/07 , G06F11/16 , G06F11/18 , G06F11/20 , G06F11/30 , G06F11/32 , G06F19/00
CPC classification: G06F11/07 , G06F11/0709 , G06F11/0745 , G06F11/079 , G06F11/1641 , G06F11/1645 , G06F11/1658 , G06F11/1683 , G06F11/1691 , G06F11/327 , G06F11/165 , G06F11/20 , G06F11/3065 , G06F2201/845
Abstract: Data transfer to computing elements is synchronized in a computer system that includes the computing elements and controllers that provide data from data sources to the computing elements. A request for data made by a computing element is intercepted and transmitted to the controllers. At least a first controller responds by transmitting requested data to the computing element and by indicating how a second controller will respond to the intercepted request.
-
Publication No.: US20090240916A1
Publication date: 2009-09-24
Application No.: US12434496
Filing date: 2009-05-01
CPC classification: G06F11/1691 , G06F11/1633
Abstract: A fault tolerant/fault resilient computer system includes a first coserver and a second coserver. The first coserver includes a first application environment (AE) processor and a first I/O subsystem processor on a first common motherboard. The second coserver includes a second AE processor and a second I/O subsystem processor on a second common motherboard.
-
Publication No.: US5896523A
Publication date: 1999-04-20
Application No.: US868670
Filing date: 1997-06-04
CPC classification: G06F11/1691 , G06F9/3851 , G06F11/1683
Abstract: Synchronized execution is maintained by compute elements processing instruction streams in a computer system including the compute elements and a controller. Each compute element includes a clock that operates asynchronously with respect to clocks of the other compute elements. Each compute element processes instructions from an instruction stream and counts the instructions processed. Upon processing a quantum of instructions from the instruction stream, the compute element initiates a synchronization procedure and continues to process instructions from the instruction stream and to count instructions processed from the instruction stream. The compute element halts processing of instructions from the instruction stream after processing an unspecified number of instructions from the instruction stream in addition to the quantum of instructions. Upon halting processing, the compute element sends a synchronization request to the controller and waits for a synchronization reply.
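The quantum-plus-overshoot behaviour in this abstract can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the patented mechanism: instructions are modelled as Python callables, the "unspecified number" of extra instructions is passed in as a hypothetical `overshoot` parameter, and `SyncRequest` and `run_to_sync` are names invented for this example. What the controller does with the request is not covered by the abstract and is not modelled.

```python
from dataclasses import dataclass


@dataclass
class SyncRequest:
    """Hypothetical message a compute element sends to the controller on halting."""
    element_id: int
    instructions_processed: int


def run_to_sync(element_id: int, stream: list, quantum: int, overshoot: int) -> SyncRequest:
    """Process `quantum` instructions, continue for `overshoot` more, then halt.

    Assumes quantum >= 1 and overshoot >= 0. `stream` is a list of zero-argument
    callables standing in for instructions.
    """
    count = 0
    halt_at = None                      # set once the synchronization procedure starts
    for instruction in stream:
        instruction()                   # "execute" the instruction
        count += 1                      # count every instruction processed
        if halt_at is None and count >= quantum:
            # Quantum reached: synchronization starts, but processing continues.
            halt_at = count + overshoot
        if halt_at is not None and count >= halt_at:
            break                       # halt after the extra instructions
    return SyncRequest(element_id, count)
```

Two elements given the same quantum but different overshoots halt at different instruction counts, which is why each one reports its count in the synchronization request.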
-
Publication No.: US06205565B1
Publication date: 2001-03-20
Application No.: US09081074
Filing date: 1998-05-19
Applicant: Thomas D. Bissett , Martin J. Fitzgerald, V , Paul A. Leveille , James D. McCollum , Erik Muench , Glenn A. Tremblay
Inventor: Thomas D. Bissett , Martin J. Fitzgerald, V , Paul A. Leveille , James D. McCollum , Erik Muench , Glenn A. Tremblay
IPC classification: G06F13/00
CPC classification: G06F11/07 , G06F11/0709 , G06F11/0745 , G06F11/079 , G06F11/1641 , G06F11/1645 , G06F11/165 , G06F11/1658 , G06F11/1683 , G06F11/1691 , G06F11/20 , G06F11/3065 , G06F11/327 , G06F2201/845
Abstract: Data transfer to computing elements is synchronized in a computer system that includes the computing elements and controllers that provide data from data sources to the computing elements. A request for data made by a computing element is intercepted and transmitted to the controllers. At least a first controller responds by transmitting requested data to the computing element and by indicating how a second controller will respond to the intercepted request.
-
Publication No.: US06728898B2
Publication date: 2004-04-27
Application No.: US10090728
Filing date: 2002-03-06
IPC classification: G06F11/00
CPC classification: G06F11/2082 , G06F11/2074
Abstract: Producing a mirror copy using incremental-divergence is performed in a computer system in which write requests are each associated with a reference label. A mirror set may be restored to a state in which the data storage devices contain identical data by copying from the data storage device having “good” data only portions of data which have not been stored on the data storage device having divergent data. Incremental-divergence copying may be accomplished by keeping track of the changes made after a point in which the data storage devices are known to contain identical data.
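The change-tracking idea behind incremental-divergence copying can be sketched as below. This is an illustrative toy, assuming a two-device mirror held in memory as lists of blocks; the `Mirror` class and its attribute names are hypothetical, and the reference labels attached to write requests in the abstract are not modelled.

```python
class Mirror:
    """Toy two-device mirror set (hypothetical) with block-level change tracking."""

    def __init__(self, n_blocks: int) -> None:
        self.good = [b""] * n_blocks        # device that always receives writes
        self.divergent = [b""] * n_blocks   # device that may miss writes
        self.divergent_online = True
        # Blocks written since the two devices were last known to be identical.
        self.changed = set()

    def write(self, block: int, data: bytes) -> None:
        self.good[block] = data
        if self.divergent_online:
            self.divergent[block] = data    # devices stay identical
        else:
            self.changed.add(block)         # remember the divergence

    def resync(self) -> list:
        """Copy only the tracked blocks from the good device; return what was copied."""
        copied = sorted(self.changed)
        for block in copied:
            self.divergent[block] = self.good[block]
        self.changed.clear()
        self.divergent_online = True
        return copied
```

The point of the design is that `resync` touches only the blocks recorded in `changed`, so restoring the mirror costs time proportional to the divergence rather than to the size of the device.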
-
Publication No.: US20150205671A1
Publication date: 2015-07-23
Application No.: US14571383
Filing date: 2014-12-16
IPC classification: G06F11/14
CPC classification: G06F11/1484
Abstract: A method for determining a delay in a dynamic, event driven, checkpoint interval. In one embodiment, the method includes the steps of determining the number of network bits to be transferred; determining the target bit transfer rate; and calculating the next cycle delay as the number of bits to be transferred divided by the target bit transfer rate. In another aspect, the invention relates to a method for delaying a checkpoint interval. In one embodiment, the method includes the steps of monitoring the transfer of a prior batch of network data and delaying a subsequent checkpoint until the transfer of a prior batch of network data has reached a certain predetermined level of completion. In another embodiment, the predetermined level of completion is 100%.
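The delay calculation stated in this abstract is a single division, and the completion gate is a simple comparison. The sketch below shows both, assuming bits and bits per second as units; the function names, the byte-based progress counters, and the default `required_fraction` of 1.0 (the 100% embodiment) are choices made for this illustration, not details taken from the patent.

```python
def next_cycle_delay(bits_to_transfer: int, target_bits_per_second: float) -> float:
    """Next cycle delay in seconds: bits to be transferred / target bit transfer rate."""
    if target_bits_per_second <= 0:
        raise ValueError("target bit transfer rate must be positive")
    return bits_to_transfer / target_bits_per_second


def may_start_checkpoint(bytes_sent: int, bytes_total: int,
                         required_fraction: float = 1.0) -> bool:
    """True once the prior batch of network data has reached the required completion level."""
    if bytes_total == 0:
        return True                     # nothing was pending, so nothing to wait for
    return bytes_sent / bytes_total >= required_fraction
```

For example, 8,000,000 bits at a target rate of 1 Gbit/s gives a delay of 0.008 seconds.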
-
Publication No.: US06473869B2
Publication date: 2002-10-29
Application No.: US09925487
Filing date: 2001-08-10
Applicant: Thomas D. Bissett , Paul A. Leveille , Erik Muench
Inventor: Thomas D. Bissett , Paul A. Leveille , Erik Muench
IPC classification: G06F11/00
CPC classification: G06F11/1633 , G06F11/1641 , G06F11/165 , G06F11/1658 , G06F11/1679 , G06F11/1687 , G06F11/1691
Abstract: A fault tolerant/fault resilient computer system includes at least two compute elements connected to at least one controller. Each compute element has clocks that operate asynchronously to clocks of the other compute elements. The compute elements operate in a first mode in which the compute elements each execute a first stream of instructions in emulated clock lockstep, and in a second mode in which the compute elements each execute a second stream of instructions in instruction lockstep. Each compute element may be a multi-processor compute element.
-
Publication No.: US06279119B1
Publication date: 2001-08-21
Application No.: US09190269
Filing date: 1998-11-13
Applicant: Thomas D. Bissett , Paul A. Leveille , Erik Muench
Inventor: Thomas D. Bissett , Paul A. Leveille , Erik Muench
IPC classification: G06F11/00
CPC classification: G06F11/1633 , G06F11/1641 , G06F11/165 , G06F11/1658 , G06F11/1679 , G06F11/1687 , G06F11/1691
Abstract: A fault tolerant/fault resilient computer system includes at least two compute elements connected to at least one controller. Each compute element has clocks that operate asynchronously to clocks of the other compute elements. The compute elements operate in a first mode in which the compute elements each execute a first stream of instructions in emulated clock lockstep, and in a second mode in which the compute elements each execute a second stream of instructions in instruction lockstep. Each compute element may be a multi-processor compute element.