I/O forwarding in a cache coherent shared disk computer system
    11.
    Granted invention patent
    Status: Expired

    Publication No.: US6112281A

    Publication date: 2000-08-29

    Application No.: US946084

    Filing date: 1997-10-07

    Abstract: A method and apparatus for I/O forwarding in a cache coherent shared disk computer system is provided. According to the method, a requesting node transmits a request for requested data to a managing node. The managing node receives the read request from the requesting node and grants a lock on the requested data. The managing node then forwards data that identifies the requested data to a disk controller. The disk controller receives the data that identifies the requested data from the managing node and reads a data item, based on the data that identifies the requested data, from a shared disk. After reading the data item from the shared disk, the disk controller transmits the data item to the requesting node. In one embodiment, an I/O destination handle is generated that identifies a read request and a buffer cache address to which the data item should be copied. The I/O destination handle is transmitted to the disk controller to facilitate transmission and processing of the data item from the disk controller to the requesting node. As a result of forwarding data that identifies the requested data directly from the managing node to the disk controller ("I/O forwarding"), the duration of a stall is reduced, contention on resources of the system is reduced and a context switch is eliminated.
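
The message flow in this abstract can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, calls are synchronous, the lock grant does no conflict checking, and the I/O destination handle is modeled as a plain integer that the requesting node maps to a buffer cache address.

```python
class DiskController:
    """Reads from the shared disk and replies directly to the requester."""

    def __init__(self, disk):
        self.disk = disk  # block id -> data (stand-in for the shared disk)

    def read_and_send(self, block_id, handle, requesting_node):
        data = self.disk[block_id]
        # The data item goes straight to the requesting node,
        # not back through the managing node.
        requesting_node.receive(handle, data)


class ManagingNode:
    """Grants the lock, then forwards the read to the disk controller."""

    def __init__(self, controller):
        self.controller = controller
        self.locks = {}  # block id -> name of the lock holder

    def handle_read(self, requesting_node, block_id, handle):
        self.locks[block_id] = requesting_node.name  # simplified lock grant
        self.controller.read_and_send(block_id, handle, requesting_node)


class RequestingNode:
    def __init__(self, name, manager):
        self.name = name
        self.manager = manager
        self.buffer_cache = {}  # buffer cache address -> data
        self.pending = {}       # handle -> buffer cache address
        self.next_handle = 0

    def read(self, block_id, buffer_addr):
        # The handle identifies this read request and its destination buffer.
        handle = self.next_handle
        self.next_handle += 1
        self.pending[handle] = buffer_addr
        self.manager.handle_read(self, block_id, handle)

    def receive(self, handle, data):
        addr = self.pending.pop(handle)
        self.buffer_cache[addr] = data


disk = {"blk7": b"hello"}
manager = ManagingNode(DiskController(disk))
node = RequestingNode("n1", manager)
node.read("blk7", buffer_addr=0x10)
```

After `node.read(...)` returns, the data sits at the requested buffer cache address without the managing node ever handling the data item itself.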


    Transferring a resource between caches of different nodes
    12.
    Granted invention patent
    Status: In force

    Publication No.: US06564230B2

    Publication date: 2003-05-13

    Application No.: US09894636

    Filing date: 2001-06-27

    IPC classification: G06F17/30

    Abstract: A method and apparatus are provided for transferring a resource from the cache of one database server to the cache of another database server without first writing the resource to disk. When a database server (Requestor) desires to modify a resource, the Requestor asks for the current version of the resource. The database server that has the current version (Holder) directly ships the current version to the Requestor. Upon shipping the version, the Holder loses permission to modify the resource, but continues to retain the resource in memory. When the retained version of the resource, or a later version thereof, is written to disk, the Holder can discard the retained version of the resource. Otherwise, the Holder does not discard the retained version. Using this technique, single-server failures are recovered without having to merge the recovery logs of the various database servers that had access to the resource.
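
A minimal sketch of the Holder/Requestor hand-off described above, assuming integer version numbers and an in-memory dictionary as the cache. The `Server` class, its method names, and the disk-write notification are hypothetical and only illustrate the retain-until-written rule.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.cache = {}          # resource id -> (version, data)
        self.can_modify = set()  # resource ids this server may modify

    def ship(self, rid, requestor):
        """Send the current version cache-to-cache, without a disk write."""
        version, data = self.cache[rid]
        requestor.cache[rid] = (version, data)
        requestor.can_modify.add(rid)
        # The Holder loses modify permission but keeps its copy in memory.
        self.can_modify.discard(rid)

    def modify(self, rid, data):
        if rid not in self.can_modify:
            raise PermissionError(f"{self.name} may not modify {rid}")
        version, _ = self.cache[rid]
        self.cache[rid] = (version + 1, data)

    def on_disk_write(self, rid, written_version):
        """Assumed notification that written_version has reached disk."""
        if rid in self.cache and rid not in self.can_modify:
            retained_version, _ = self.cache[rid]
            # Discard only if the retained version, or a later one, is on disk.
            if written_version >= retained_version:
                del self.cache[rid]


holder = Server("A")
holder.cache["r"] = (1, "a")
holder.can_modify.add("r")
requestor = Server("B")

holder.ship("r", requestor)
requestor.modify("r", "b")
```

Until `on_disk_write` reports version 1 or later, the Holder's retained copy remains available, which is what lets a single-server failure be recovered without merging logs from every server.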


    Managing recovery of data after failure of one or more caches

    Publication No.: US06411968B1

    Publication date: 2002-06-25

    Application No.: US09894325

    Filing date: 2001-06-27

    IPC classification: G06F17/30

    Abstract: A method and apparatus are provided for transferring a resource from the cache of one database server to the cache of another database server without first writing the resource to disk. When a database server (Requestor) desires to modify a resource, the Requestor asks for the current version of the resource. The database server that has the current version (Holder) directly ships the current version to the Requestor. Upon shipping the version, the Holder loses permission to modify the resource, but continues to retain the resource in memory. When the retained version of the resource, or a later version thereof, is written to disk, the Holder can discard the retained version of the resource. Otherwise, the Holder does not discard the retained version. Using this technique, single-server failures are recovered without having to merge the recovery logs of the various database servers that had access to the resource.

    Object hashing with incremental changes
    14.
    Granted invention patent
    Status: In force

    Publication No.: US06363396B1

    Publication date: 2002-03-26

    Application No.: US09218864

    Filing date: 1998-12-21

    IPC classification: G06F17/00

    Abstract: A method and system are provided for reconfiguring a multiple node system after an epoch change in a manner that reduces the overhead and system unavailability typically incurred during reconfiguration. A resource-to-master mapping is established using the combination of a resource-to-bucket hash function and a bucket-to-node hash function. The resource-to-bucket hash function is not changed in response to an epoch change. The bucket-to-node hash function does change in response to epoch changes. Techniques are disclosed for adjusting the dynamic bucket-to-node hash function after an epoch change in a manner that load balances among the new number of nodes in the system. Further, the changes to the bucket-to-node assignments are performed in a way that reduces the number of resources that have to be remastered. In one embodiment, only those resources that lose their masters during an epoch change are assigned new masters during an initial reconfiguration. Load balancing is then gradually achieved by migrating resources after the system has been made available. The old masters of resources forward access requests to new masters of resources once they have transferred the master resource objects for the requested resources. In addition, techniques are disclosed for migrating resources from a node in anticipation of a planned shutdown of the node.
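
The two-level mapping can be illustrated with a short sketch. It assumes a fixed bucket count, CRC32 as the static resource-to-bucket hash, and a plain dictionary as the dynamic bucket-to-node map; none of these choices come from the patent, and the function names are hypothetical. Only buckets whose master left the system are reassigned, which mirrors the "initial reconfiguration" embodiment.

```python
import zlib

NUM_BUCKETS = 12  # assumed value; stays fixed across epochs


def resource_to_bucket(resource_name):
    """Static hash: does not change when the set of nodes changes."""
    return zlib.crc32(resource_name.encode("utf-8")) % NUM_BUCKETS


def initial_assignment(nodes):
    """Dynamic bucket-to-node map: spread buckets round-robin over nodes."""
    return {b: nodes[b % len(nodes)] for b in range(NUM_BUCKETS)}


def reassign_after_failure(bucket_to_node, surviving_nodes):
    """Give new masters only to buckets whose old master is gone."""
    new_map = dict(bucket_to_node)
    load = {n: 0 for n in surviving_nodes}
    for node in new_map.values():
        if node in load:
            load[node] += 1
    for bucket, node in sorted(bucket_to_node.items()):
        if node not in load:
            # Hand the orphaned bucket to the least-loaded survivor.
            target = min(surviving_nodes, key=lambda n: (load[n], n))
            new_map[bucket] = target
            load[target] += 1
    return new_map


def master_of(resource_name, bucket_to_node):
    return bucket_to_node[resource_to_bucket(resource_name)]


epoch0 = initial_assignment(["A", "B", "C"])
epoch1 = reassign_after_failure(epoch0, ["A", "B"])  # node C has left
```

Because the resource-to-bucket hash never changes, a resource needs remastering only when its bucket moves; here the buckets already mastered on A and B keep their masters.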


    Providing a useable version of a data item
    15.
    Granted invention patent
    Status: In force

    Publication No.: US06957236B1

    Publication date: 2005-10-18

    Application No.: US10263493

    Filing date: 2002-10-02

    IPC classification: G06F12/00 G06F9/46 G06F17/30

    Abstract: Techniques are provided for providing a data item to a transaction in a multi-versioning system in which the data item may exist on multiple versions of a data block, and where versioning is performed at the granularity of the data block. According to one aspect of the invention, the technique involves locating, within volatile memory, a first version of a data block that includes a first version of the data item. It is then determined whether the first version of the data item is useable by the transaction without respect to whether the first version of the data block is generally useable by the transaction. If the first version of the data item is usable by the transaction, then the data item is established as a candidate that can be provided to the transaction. Thus, the data item within a block may be considered a candidate to be provided to a transaction even when the version of the data block on which the data item resides would otherwise disqualify the data block from being seen by that transaction. If the first version of the data item is not usable by the transaction, then a version of the data item that is usable by the transaction is obtained from a second version of the data block that is different from the first version.


    Consistent read in a distributed database environment
    16.
    Granted invention patent
    Status: In force

    Publication No.: US07334004B2

    Publication date: 2008-02-19

    Application No.: US10119672

    Filing date: 2002-04-09

    IPC classification: G06F17/30

    Abstract: Techniques are provided for determining which data item version to supply to a query. According to the techniques, the determination is made by associating a new field, which indicates the time a data item version was current, with each data item version; associating a new field with each query, which indicates the last change that the query must see made by the transaction to which the query belongs; and determining which data item version to use to answer the query based, in part, on a comparison between the values of the two new fields.


    Tracking dependencies between transactions in a database
    19.
    Granted invention patent
    Status: Expired

    Publication No.: US5806076A

    Publication date: 1998-09-08

    Application No.: US740544

    Filing date: 1996-10-29

    IPC classification: G06F9/46 G06F17/30

    Abstract: A method and an apparatus for tracking of the dependencies between transactions is provided. Every time a data item is updated, a record is made of the transaction that updated the data item. Before another transaction locks a data item previously locked by the transaction, the entry is updated to indicate that the transaction committed and the commit time of the transaction. These entries are contained in a list head that is maintained on the same block as the data item, and a list tail that is stored separate from the data block that contains the data item. A depends-on time is maintained for each transaction. Whenever the transaction updates a data item, the depends-on time is set to the greater of the current depends-on time and the commit time of the most recently committed transaction that updated the version of the data item. Whether a transaction depends on a committed transaction is then determined based on a simple comparison between the depends-on time associated with the transaction and the commit time of the committed transaction.
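
The depends-on bookkeeping can be sketched as below. The abstract does not say which way the final comparison runs, so this sketch assumes a transaction is treated as depending on a committed transaction when that commit time is less than or equal to its depends-on time. Integer logical timestamps, the class names, and the `touched` list are likewise hypothetical, and the list head/tail storage layout is not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataItem:
    value: object = None
    # Commit time of the most recently committed transaction that updated it.
    last_commit_time: int = 0


@dataclass
class Transaction:
    name: str
    depends_on_time: int = 0  # 0 means "depends on nothing yet"
    commit_time: Optional[int] = None
    touched: list = field(default_factory=list)


def update(txn, item, new_value):
    # Depends-on time becomes the greater of its current value and the
    # commit time of the last committed updater of this item.
    txn.depends_on_time = max(txn.depends_on_time, item.last_commit_time)
    item.value = new_value
    txn.touched.append(item)


def commit(txn, commit_time):
    txn.commit_time = commit_time
    for item in txn.touched:
        item.last_commit_time = commit_time


def depends_on(txn, committed):
    # Assumed comparison direction; conservative, may over-report.
    return (committed.commit_time is not None
            and committed.commit_time <= txn.depends_on_time)


x, y = DataItem(), DataItem()
t1, t2, t3 = Transaction("T1"), Transaction("T2"), Transaction("T3")
update(t1, x, 1)
commit(t1, 10)
update(t2, x, 2)  # T2 overwrites data committed by T1
update(t3, y, 5)  # T3 touches unrelated data
```

With these calls, `depends_on(t2, t1)` is true and `depends_on(t3, t1)` is false, using a single integer comparison per check.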


    Semantic response to lock requests to reduce coherence overhead in multi-node systems
    20.
    Granted invention patent
    Status: In force

    Publication No.: US08086579B1

    Publication date: 2011-12-27

    Application No.: US10056716

    Filing date: 2002-01-22

    IPC classification: G06F7/00 G06F17/00 G06F17/30

    CPC classification: G06F17/30362

    Abstract: Techniques are provided for lock management. The techniques are based on an enhanced lock management system that generates a semantic response in response to lock requests for a resource. The semantic response communicates both the underlying cause blocking the request, and information that may be used by the requester to obtain notification of when the underlying cause should no longer lead to denial of the lock request. The semantic response may be generated by the master of the resource, who provides the semantic response to the local lock manager of the lock requester. The semantic response may be retained by the local lock manager so that the semantic response can be provided to subsequent lock requesters, without need for interacting with another lock manager on another node.
