Patent Application
- Title: Enhancing throughput and fault-tolerance in a parallel-processing system
- Title (Chinese): 提高并行处理系统的吞吐量和容错能力
- Application No.: US11371998
- Filing date: 2006-03-08
- Publication No.: US20070214394A1
- Publication date: 2007-09-13
- Inventors: Kenny Gross, Alan Wood
- Applicants: Kenny Gross, Alan Wood
- Main classification: G06F11/00
- IPC classification: G06F11/00
Abstract:
One embodiment of the present invention provides a system that enhances throughput and fault-tolerance in a parallel-processing system. During operation, the system first receives a task. Next, the system partitions N computing nodes into M set-aside nodes and N-M primary computing nodes, wherein M≧1. The system then processes the task in parallel across the N-M primary computing nodes. While doing so, the system proactively monitors the health of each of the N-M primary computing nodes. If the system detects a node in the N-M primary computing nodes to be at risk of failure, the system copies the portion of the task associated with the at-risk node to a subset of the M set-aside nodes. The system then processes the portion of the task in parallel across the subset of the M set-aside nodes while the N-M primary computing nodes continue executing.
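The flow described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the abstract does not say how node health is measured, how the task is divided, or how the subset of set-aside nodes is chosen. The `Node` class, the `risk` score, the `threshold` and `fanout` parameters, and the round-robin distribution are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    risk: float = 0.0  # hypothetical health score in [0, 1]; higher means closer to failure
    work: list = field(default_factory=list)


def partition(nodes, m):
    """Split N nodes into N-M primary nodes and M set-aside nodes (1 <= M < N)."""
    if not 1 <= m < len(nodes):
        raise ValueError("need 1 <= M < N")
    return nodes[:-m], nodes[-m:]


def assign(task, primaries):
    """Deal the task's work items round-robin across the primary nodes."""
    for i, item in enumerate(task):
        primaries[i % len(primaries)].work.append(item)


def migrate_at_risk(primaries, spares, threshold=0.8, fanout=2):
    """Copy each at-risk primary's portion onto up to `fanout` idle set-aside nodes.

    Returns {at_risk_node_id: [set_aside_node_ids]}. Healthy primaries are
    left untouched, so they keep executing their own portions.
    """
    moves = {}
    idle = [s for s in spares if not s.work]
    for node in primaries:
        if node.risk < threshold or not node.work or not idle:
            continue
        chosen, idle = idle[:fanout], idle[fanout:]
        # Copy (not move) the portion, spreading it across the chosen spares.
        for i, item in enumerate(node.work):
            chosen[i % len(chosen)].work.append(item)
        moves[node.node_id] = [s.node_id for s in chosen]
    return moves


# Example: N = 6 nodes, M = 2 set-aside nodes, a task of 8 work items.
nodes = [Node(i) for i in range(6)]
primaries, spares = partition(nodes, m=2)
assign(list(range(8)), primaries)
primaries[1].risk = 0.9  # pretend monitoring flagged node 1 as at risk
moves = migrate_at_risk(primaries, spares)
```

In this sketch the at-risk node keeps its copy of the work, which matches the abstract's wording that the portion is copied rather than removed. A real system would also need a rule for which result to accept once both copies finish, which the abstract does not address.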