Abstract:
The present invention provides a method, apparatus, and computer implemented instructions for processing Web and other Internet or Intranet based services. The system for processing Web requests includes a Web server with a connection to the Internet or Intranet with a predefined network bandwidth, a set of primary Web and application server cluster nodes to process the requests, and a dispatcher to allocate requests to nodes; in addition, one or more offload server nodes are connected to the network. Client Web requests arrive at the dispatcher of the Web server, which determines whether the incoming request can be handled at the primary Web server cluster, whether all or part of the user Web request should be offloaded to one of the offload server nodes, or whether the request should be throttled. If the dispatcher determines that the request should be handled by the primary Web server cluster, it is appropriately routed to one of the nodes in the primary Web server cluster; else if the dispatcher determines that the request should be offloaded, one of the offload server nodes or service providers is selected, and the request is either routed to a primary server node with the appropriate indication to offload all or part of the request, or the request is routed to the selected offload service provider; otherwise, the request is throttled by either routing it to a node which returns information that the service is overloaded, or if the Web servers are too busy to provide even an overload indication, then the request is dropped.
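The dispatcher's three-way decision (handle at the primary cluster, offload, or throttle) can be illustrated with a minimal Python sketch. The utilization inputs, the threshold values, and the names `Action` and `dispatch` are hypothetical and do not come from the abstract; the sketch also folds the two offload variants (routing via a primary node or directly to the offload provider) into a single outcome.

```python
from enum import Enum


class Action(Enum):
    PRIMARY = "primary"                  # handle at the primary cluster
    OFFLOAD = "offload"                  # hand all or part to an offload node
    OVERLOAD_NOTICE = "overload_notice"  # throttle: reply that the service is overloaded
    DROP = "drop"                        # throttle: too busy even to reply


def dispatch(primary_load, offload_load, notice_load,
             primary_cap=0.8, offload_cap=0.8, notice_cap=0.95):
    """Choose an action from utilizations in [0, 1].

    All three caps are assumed values chosen for illustration only.
    """
    if primary_load < primary_cap:
        return Action.PRIMARY
    if offload_load < offload_cap:
        return Action.OFFLOAD
    if notice_load < notice_cap:
        return Action.OVERLOAD_NOTICE
    return Action.DROP
```

The order of the checks mirrors the order of the alternatives in the abstract: primary first, then offload, then the two forms of throttling.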
Abstract:
An on-demand manager provides an improved distributed data processing system for facilitating dynamic allocation of computing resources among multiple domains based on a current workload and service level agreements. Based on a service level agreement, the on-demand manager monitors and predicts the load on the system. If the current or predicted load cannot be handled with the current system configuration, the on-demand manager determines additional resources needed to handle the workload. If the service level agreement violations cannot be handled by reconfiguring resources at a domain, the on-demand manager sends a resource request to other domains. These other domains analyze their own commitments and may accept the resource request, reject the request, or counter-propose with an offer of resources and a corresponding service level agreement. Once the requesting domain has acquired resources, workload load balancers are reconfigured to allocate some of the workload from the requesting site to the acquired remote resources.
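How a remote domain might answer a resource request (accept, reject, or counter-propose) can be sketched as follows. This is an illustration under simplifying assumptions: resources are counted as whole servers, the service level agreement attached to a counter-offer is omitted, and the names `Response` and `answer_request` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    decision: str  # "accept", "reject", or "counter"
    servers: int   # number of servers offered


def answer_request(requested, total, committed):
    """Answer a request for `requested` servers given local commitments."""
    spare = total - committed
    if spare <= 0:
        return Response("reject", 0)        # nothing to give away
    if spare >= requested:
        return Response("accept", requested)
    return Response("counter", spare)       # offer what is actually free
```

A fuller model would attach SLA terms to each counter-offer and let the requesting domain compare offers from several domains before reconfiguring its load balancers.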
Abstract:
Identifying traffic patterns to web sites based on templates that characterize the arrival of traffic to the web sites is provided. Based on these templates, determinations are made as to which web sites should be co-located so as to optimize resource allocation. Web sites whose templates are complementary, i.e. a first web site having a peak in arrival traffic at time t1 and a second web site that has a trough in arrival traffic at time t1, are designated as being candidates for co-location. In addition, the templates identified for the traffic patterns of web sites are used to determine thresholds for offloading traffic to other servers. These thresholds include a first threshold at which offloading should be performed, a second threshold that takes into consideration the lead time needed to begin offloading, and a third threshold that takes into consideration a lag time needed to stop offloading of traffic.
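One possible way to score how well two templates complement each other is to look at the peak of their combined traffic: a peak in one site that lines up with a trough in the other keeps the combined peak low. The sketch below assumes each template is a list of arrival rates over the same time slots; the scoring rule and the names `combined_peak` and `best_colocation_pair` are illustrative assumptions, not the method of the abstract.

```python
from itertools import combinations


def combined_peak(a, b):
    """Peak of the summed traffic of two templates with equal slot counts."""
    return max(x + y for x, y in zip(a, b))


def best_colocation_pair(templates):
    """Return the pair of site names whose combined peak is lowest.

    templates: dict mapping site name -> list of per-slot arrival rates.
    """
    return min(
        combinations(sorted(templates), 2),
        key=lambda pair: combined_peak(templates[pair[0]], templates[pair[1]]),
    )
```

For example, a site with template `[10, 2, 1]` pairs better with one shaped `[1, 2, 10]` than with one shaped `[9, 3, 2]`, because the first pairing never needs more than 11 units of capacity in any slot.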
Abstract:
An affinity-based router and method for routing and load balancing in an encapsulated cluster of server nodes is disclosed. The system consists of a multi-node server, wherein any of the server nodes can handle a client request, but wherein clients have affinity to one or more of the server nodes that are preferred to handle a client request. Such affinity arises from state held at the servers, due either to previous routing requests or to data affinity at the server. At the multi-node server, a node may be designated as a TCP router. The address of the TCP router is given out to clients, and client requests are sent thereto. The TCP router selects one of the nodes in the multi-node server to process the client request, and routes the request to this server; in addition, the TCP router maintains affinity tables, containing affinity records, indicating which node a client was routed to. In processing the client request, the server nodes may determine that another node is better suited to handle the client request, and may reset the corresponding TCP router affinity table entry. The server nodes may also create, modify or delete affinity records in the TCP router affinity table. Subsequent requests from this client are routed to server nodes based on any affinity records, possibly combined with other information (such as load).
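The affinity table described above can be sketched as a small Python class. It assumes a client is identified by an opaque key and that, absent an affinity record, the router falls back to the least-loaded node; the class and method names are hypothetical, and real TCP-level routing is not modeled.

```python
class AffinityRouter:
    """Toy router that keeps one affinity record per client."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.load = {n: 0 for n in self.nodes}  # requests routed per node
        self.affinity = {}                      # client id -> preferred node

    def route(self, client):
        node = self.affinity.get(client)
        if node is None:
            # No record yet: pick the least-loaded node and remember it.
            node = min(self.nodes, key=lambda n: self.load[n])
            self.affinity[client] = node
        self.load[node] += 1
        return node

    def set_affinity(self, client, node):
        """Called by a server node that knows a better home for the client."""
        self.affinity[client] = node

    def clear_affinity(self, client):
        """Delete the record so the next request is balanced by load."""
        self.affinity.pop(client, None)
```

The `set_affinity` and `clear_affinity` methods stand in for the server nodes creating, modifying, or deleting records in the router's table.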
Abstract:
A method and system for recovering from a failure of a processing node in a partitioned shared nothing database processing system are provided. The processing system may include a pair of processing nodes having a storage device twin-tailed connected thereto. A first processing node of the pair of processing nodes has a first database instance running thereon which accesses a first data partition on the storage device prior to the failure. Upon detection of the failure, access to the first data partition on the storage device is provided to a third, spare processing node through the second processing node of the pair of processing nodes. The third processing node runs a replacement database instance for the first database instance which was running on the first processing node prior to the failure thereof. The replacement database instance accesses the first data partition on the storage device through the second processing node, thereby recovering from the failure of the first processing node. Access to the first data partition may include using a virtual shared disk utility having a server portion on the second processing node and a client portion on the third processing node.
Abstract:
Apparatus and methods for identifying traffic patterns to web sites based on templates that characterize the arrival of traffic to the web sites are provided. Based on these templates, determinations are made as to which web sites should be co-located so as to optimize resource allocation. Specifically, web sites whose templates are complementary, i.e. a first web site having a peak in arrival traffic at time t1 and a second web site that has a trough in arrival traffic at time t1, are designated as being candidates for co-location. In addition, the present invention uses the templates identified for the traffic patterns of web sites to determine thresholds for offloading traffic to other servers. These thresholds include a first threshold at which offloading should be performed, a second threshold that takes into consideration the lead time needed to begin offloading, and a third threshold that takes into consideration a lag time needed to stop all offloading of traffic to the other servers.
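The abstract gives no formulas for the three thresholds, so the following is only one plausible reading. It assumes the template shows traffic rising and falling at roughly constant rates near the offload point, so that the lead time and lag time can be converted into load offsets. The function name and every parameter are hypothetical.

```python
def offload_thresholds(offload_level, rise_rate, fall_rate, lead_time, lag_time):
    """Return (offload, start, stop) load thresholds.

    Assumption: traffic changes linearly near the offload point.
    - offload: load at which offloading should be in effect.
    - start:   load at which to begin setup so that offloading is ready
               lead_time later, when the load reaches `offload`.
    - stop:    load, on the way down, at which to issue the stop so that
               offloading actually ends lag_time later, at `offload`.
    """
    start = offload_level - rise_rate * lead_time
    stop = offload_level + fall_rate * lag_time
    return offload_level, start, stop
```

With an offload level of 100 requests per second, a rise of 2 per second, and a lead time of 10 seconds, setup would begin at a load of 80 under these assumptions.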
Abstract:
A system and method for a general and extensible infrastructure providing monitoring and recovery of interdependent systems in a distributed/clustered system is disclosed. Subsystems, built without provision for high availability, are incorporated into the infrastructure without modification to core subsystem function. The infrastructure comprises one or more computing nodes connected by one or more interconnection networks, and running one or more distributed subsystems. The infrastructure monitors the computing nodes using one or more heartbeat and membership protocols, and monitors the said distributed subsystems using subsystem-specific monitors. Events detected by monitors are sent to event handlers. Event handlers process events by filtering them through activities such as event correlation, removal of duplicates, and rollup. Filtered events are given by Event Managers to Recovery Drivers, which determine the recovery program corresponding to the event and execute the recovery program or set of recovery actions by coordination among the recovery managers. Given failures of said event handlers or recovery managers, the infrastructure performs the additional steps of: coordinating among remaining event handlers and recovery managers to handle completion or termination of ongoing recovery actions, discovering the current state of the system by resetting the said monitors, and handling any new failure events that may have occurred in the interim.
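The filtering stage (duplicate removal and rollup) can be illustrated with a short Python sketch. It assumes events are `(node, kind)` tuples and uses one hypothetical correlation rule: when a whole node is reported down, subsystem events from that node are rolled up into the node-level event. The event kinds and the function name are illustrative only.

```python
def filter_events(events):
    """Deduplicate events, then roll subsystem events up into node failures."""
    seen = set()
    unique = []
    for ev in events:
        if ev not in seen:  # drop exact duplicates, keep first-seen order
            seen.add(ev)
            unique.append(ev)

    # Hypothetical correlation rule: a node failure explains every
    # subsystem failure reported on that same node.
    down_nodes = {node for node, kind in unique if kind == "node_down"}
    return [
        (node, kind)
        for node, kind in unique
        if kind == "node_down" or node not in down_nodes
    ]
```

Only the filtered list would then be handed on for selection of a recovery program.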
Abstract:
A system and method for hierarchically caching objects includes one or more level 1 nodes, each including at least one level 1 cache; one or more level 2 nodes within which the objects are permanently stored or generated upon request, each level 2 node coupled to at least one of the one or more level 1 nodes and including one or more level 2 caches; and means for storing, in a coordinated manner, one or more objects in at least one level 1 cache and/or at least one level 2 cache, based on a set of one or more criteria. Furthermore, in a system adapted to receive requests for objects from one or more clients, the system having a set of one or more level 1 nodes, each containing at least one level 1 cache, a method for managing a level 1 cache includes the steps of applying, for part of the at least one level 1 cache, a cache replacement policy designed to minimize utilization of a set of one or more resources in the system; and using, for other parts of the at least one level 1 cache, one or more other cache replacement policies designed to minimize utilization of one or more other sets of one or more resources in the system.
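The idea of running different replacement policies in different parts of one level 1 cache can be sketched as follows. Each partition evicts the entry with the lowest value of its own key function. The two example policies (keep objects that are expensive to generate in order to save server CPU, keep large objects in order to save network bandwidth) and all names are assumptions made for illustration.

```python
class Partition:
    """Fixed-capacity cache partition with a pluggable eviction key."""

    def __init__(self, capacity, victim_key):
        self.capacity = capacity
        self.victim_key = victim_key  # entry with the lowest key is evicted
        self.items = {}               # object name -> metadata dict

    def put(self, name, meta):
        if name not in self.items and len(self.items) >= self.capacity:
            victim = min(self.items, key=lambda n: self.victim_key(self.items[n]))
            del self.items[victim]
        self.items[name] = meta


# Partition tuned to save CPU: evict what is cheapest to regenerate.
cpu_part = Partition(2, lambda m: m["cpu_cost"])
# Partition tuned to save bandwidth: evict what is smallest to refetch.
net_part = Partition(2, lambda m: m["size"])
```

A complete design would also need a rule for deciding which partition a newly fetched object enters; that choice is left open here.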
Abstract:
A system and method for hierarchically caching objects includes one or more level 1 nodes, each including at least one level 1 cache; one or more level 2 nodes within which the objects are permanently stored or generated upon request, each level 2 node coupled to at least one of the one or more level 1 nodes and including one or more level 2 caches; and a storage control device for storing, in a coordinated manner, one or more objects in at least one level 1 cache and/or at least one level 2 cache, based on a set of one or more criteria. Furthermore, in a system for receiving requests for objects from one or more clients, the system having a set of one or more level 1 nodes, each containing at least one level 1 cache, a method for managing a level 1 cache includes the steps of applying, for part of the at least one level 1 cache, a cache replacement policy designed to minimize utilization of a set of one or more resources in the system; and using, for other parts of the at least one level 1 cache, one or more other cache replacement policies designed to minimize utilization of one or more other sets of one or more resources in the system.
Abstract:
A method and system for controlling and guaranteeing a service level agreement (SLA) based on a communications outbound link bandwidth usage to a plurality of customers having electronic business activity hosted by at least one server as a server farm, includes monitoring the outbound communications bandwidth usage by each customer's traffic to determine a level of service being provided to each customer with respect to the agreed service level agreement in each service cycle time per unit of time. The flow of incoming requests to each customer business activity application is controlled so as to guarantee a level of service previously agreed with the customer by queuing requests to the customer and by selectively dropping requests to the customer to guarantee the agreed service levels to the customer. The controlling process controls and guarantees each outbound link usage based service level agreement by controlling the flow of incoming requests to the at least one server.
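Controlling outbound bandwidth by gating incoming requests can be sketched with a per-customer budget per service cycle. The sketch assumes the response size of each request can be estimated in advance (`expected_bytes`), which the abstract does not state; the class name, the queue limit, and the forward/queue/drop rule are likewise hypothetical.

```python
from collections import deque


class CustomerGate:
    """Per-customer admission control against an outbound byte budget."""

    def __init__(self, bytes_per_cycle, max_queue):
        self.budget = bytes_per_cycle  # agreed outbound bytes per cycle
        self.used = 0                  # bytes committed in the current cycle
        self.max_queue = max_queue
        self.queue = deque()           # (request, expected_bytes) pairs

    def admit(self, request, expected_bytes):
        if self.used + expected_bytes <= self.budget:
            self.used += expected_bytes
            return "forward"
        if len(self.queue) < self.max_queue:
            self.queue.append((request, expected_bytes))
            return "queue"
        return "drop"

    def new_cycle(self):
        """Reset usage and release queued requests that fit the new budget."""
        self.used = 0
        released = []
        while self.queue and self.used + self.queue[0][1] <= self.budget:
            request, size = self.queue.popleft()
            self.used += size
            released.append(request)
        return released
```

Queuing delays a request to a later cycle, while dropping is used only once the queue is full.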