Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Saurabh Dileep Baji"

1.

Granted invention patent
Fault tolerant distributed tasks using distributed file systems (Active)

Publication (announcement) No.: US09672122B1

Publication (announcement) date: 2017-06-06

Application No.: US14500762

Application date: 2014-09-29

Applicant: Amazon Technologies, Inc.

Inventor: Mohana Sudhan Gandhi , Rejith George Joseph , Bandish N. Chheda , Saurabh Dileep Baji

IPC: G06F11/07 , G06F11/20 , G06F17/30

CPC classification number: G06F11/1425 , G06F9/5088 , G06F11/1438 , G06F11/1484 , G06F11/203 , G06F11/2035 , G06F11/2041 , G06F11/2048 , G06F11/3433 , G06F17/30215 , G06F2201/805 , G06F2201/84

Abstract: Data files in a distributed system sometimes become unavailable. A method for fault tolerance without data loss in a distributed file system includes allocating data nodes of the distributed file system among a plurality of compute groups, replicating a data file among a subset of the plurality of the compute groups such that the data file is located in at least two compute zones, wherein the first compute zone is isolated from the second compute zone, monitoring the accessibility of the data files, and causing a distributed task requiring data in the data file to be executed by a compute instance in the subset of the plurality of the compute groups. Upon detecting a failure in the accessibility of a data node with the data file, the task management node may redistribute the distributed task among other compute instances with access to any replica of the data file.
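
The redistribution step in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the `Replica` type, the `pick_node` helper, the node and zone names, and the preference for a replica outside the failed node's zone are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    node: str  # hypothetical data node name
    zone: str  # hypothetical compute zone name


def pick_node(replicas, healthy_nodes, failed_node=None):
    """Return a healthy node that holds a replica of the data file.

    Assumption: replicas outside the failed node's zone are preferred,
    since the zones are described as isolated from each other.
    """
    failed_zone = next((r.zone for r in replicas if r.node == failed_node), None)
    candidates = [
        r for r in replicas
        if r.node in healthy_nodes and r.node != failed_node
    ]
    if not candidates:
        raise RuntimeError("no accessible replica of the data file")
    # False sorts before True, so replicas in other zones come first.
    candidates.sort(key=lambda r: r.zone == failed_zone)
    return candidates[0].node


replicas = [
    Replica("node-a", "zone-1"),
    Replica("node-b", "zone-1"),
    Replica("node-c", "zone-2"),
]
healthy = {"node-b", "node-c"}

# node-a has become inaccessible, so the task is moved to another replica holder.
new_node = pick_node(replicas, healthy, failed_node="node-a")
print(new_node)
```

Because the file is replicated into at least two zones, a candidate usually remains even when every node in the failed node's zone is unreachable.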

2.

Granted invention patent
Automatic scaling of resource instance groups within compute clusters (Active)

Publication (announcement) No.: US12069128B2

Publication (announcement) date: 2024-08-20

Application No.: US17352065

Application date: 2021-06-18

Applicant: Amazon Technologies, Inc.

Inventor: Jonathan Daly Einkauf , Luca Natali , Bhargava Ram Kalathuru , Saurabh Dileep Baji , Abhishek Rajnikant Sinha

IPC: H04L41/0893 , G06F9/50 , H04L41/0894 , H04L41/0897 , H04L41/22 , H04L41/5041 , H04L43/0876 , H04L67/10 , H04L67/1031 , H04L67/1074

CPC classification number: H04L67/1076 , G06F9/5077 , G06F9/5083 , H04L41/0893 , H04L41/0894 , H04L41/0897 , H04L41/22 , H04L41/5045 , H04L43/0876 , H04L67/10 , H04L67/1031

Abstract: A service provider may apply customer-selected or customer-defined auto-scaling policies to a cluster of resources (e.g., virtualized computing resource instances or storage resource instances in a MapReduce cluster). Different policies may be applied to different subsets of cluster resources (e.g., different instance groups containing nodes of different types or having different roles). Each policy may define an expression to be evaluated during execution of a distributed application, a scaling action to take if the expression evaluates true, and an amount by which capacity should be increased or decreased. The expression may be dependent on metrics emitted by the application, cluster, or resource instances by default, metrics defined by the client and emitted by the application, or metrics created through aggregation. Metric collection, aggregation and rules evaluation may be performed by a separate service or by cluster components. An API may support auto-scaling policy definition.
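
A policy of the kind this abstract describes (an expression, a scaling action, and an amount, applied per instance group) might be modeled as in the following Python sketch. The `ScalingPolicy` class, the `apply_policy` function, the action names, the group names, and the metric names are hypothetical; none of them come from the patent or from any real service API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ScalingPolicy:
    expression: Callable[[Dict[str, float]], bool]  # evaluated against current metrics
    action: str   # "add" or "remove" (hypothetical action names)
    amount: int   # number of instances to add or remove


def apply_policy(policy, metrics, capacity, min_capacity=1):
    """Return the new capacity of one instance group after evaluating its policy."""
    if not policy.expression(metrics):
        return capacity  # expression is false, so no scaling action is taken
    if policy.action == "add":
        return capacity + policy.amount
    if policy.action == "remove":
        return max(min_capacity, capacity - policy.amount)
    raise ValueError(f"unknown action: {policy.action}")


# Different policies for different instance groups (names are assumptions).
policies = {
    "core": ScalingPolicy(lambda m: m["hdfs_utilization"] > 0.8, "add", 2),
    "task": ScalingPolicy(lambda m: m["pending_containers"] == 0, "remove", 1),
}
capacities = {"core": 4, "task": 3}
metrics = {"hdfs_utilization": 0.9, "pending_containers": 0}

new_capacities = {
    group: apply_policy(policy, metrics, capacities[group])
    for group, policy in policies.items()
}
print(new_capacities)
```

Keeping the expression as a callable over a metrics dictionary lets default, client-defined, and aggregated metrics be treated uniformly in this sketch.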

3.

Granted invention patent
Automatic scaling of resource instance groups within compute clusters (Active)

Publication (announcement) No.: US11044310B2

Publication (announcement) date: 2021-06-22

Application No.: US16805412

Application date: 2020-02-28

Applicant: Amazon Technologies, Inc.

Inventor: Jonathan Daly Einkauf , Luca Natali , Bhargava Ram Kalathuru , Saurabh Dileep Baji , Abhishek Rajnikant Sinha

IPC: H04L29/08 , H04L12/26 , H04L12/24 , G06F9/50

Abstract: A service provider may apply customer-selected or customer-defined auto-scaling policies to a cluster of resources (e.g., virtualized computing resource instances or storage resource instances in a MapReduce cluster). Different policies may be applied to different subsets of cluster resources (e.g., different instance groups containing nodes of different types or having different roles). Each policy may define an expression to be evaluated during execution of a distributed application, a scaling action to take if the expression evaluates true, and an amount by which capacity should be increased or decreased. The expression may be dependent on metrics emitted by the application, cluster, or resource instances by default, metrics defined by the client and emitted by the application, or metrics created through aggregation. Metric collection, aggregation and rules evaluation may be performed by a separate service or by cluster components. An API may support auto-scaling policy definition.

4.

Granted invention patent
Fault-tolerant parallel computation (Active)

Publication (announcement) No.: US10936432B1

Publication (announcement) date: 2021-03-02

Application No.: US14495408

Application date: 2014-09-24

Applicant: Amazon Technologies, Inc.

Inventor: Tin-Yu Lee , Rejith George Joseph , Scott Michael Le Grand , Saurabh Dileep Baji

IPC: G06F11/00 , G06F11/14

Abstract: Methods, systems, and computer-readable media for implementing a fault-tolerant parallel computation framework are disclosed. Execution of an application comprises execution of a plurality of processes in parallel. Process states for the processes are stored during the execution of the application. The processes use a message passing interface for exchanging messages with one another. The messages are exchanged and the process states are stored at a plurality of checkpoints during execution of the application. A final successful checkpoint is determined after the execution of the application is terminated. The final successful checkpoint represents the most recent checkpoint at which the processes exchanged messages successfully. Execution of the application is resumed from the final successful checkpoint using the process states stored at the final successful checkpoint.
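
Selecting the final successful checkpoint, as this abstract describes it, can be sketched in Python as a backward scan over recorded checkpoints. The record layout (`id`, `exchanged`, `states`) and the function name are assumptions for illustration, not the framework's actual data model.

```python
def final_successful_checkpoint(checkpoints):
    """Return the most recent checkpoint at which every process exchanged
    messages successfully, or None if there is no such checkpoint."""
    for checkpoint in reversed(checkpoints):
        if all(checkpoint["exchanged"].values()):
            return checkpoint
    return None


# Checkpoints in the order they were taken; keys 0 and 1 are process ranks.
checkpoints = [
    {"id": 1, "exchanged": {0: True, 1: True}, "states": {0: "s0@1", 1: "s1@1"}},
    {"id": 2, "exchanged": {0: True, 1: True}, "states": {0: "s0@2", 1: "s1@2"}},
    # Rank 1 failed to exchange messages here, so this checkpoint is not usable.
    {"id": 3, "exchanged": {0: True, 1: False}, "states": {0: "s0@3", 1: None}},
]

final = final_successful_checkpoint(checkpoints)
# Execution would resume from the process states stored at that checkpoint.
resume_states = final["states"]
print(final["id"], resume_states)
```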

5.

Granted invention patent
Executing parallel jobs with message passing on compute clusters (Active)

Publication (announcement) No.: US10148736B1

Publication (announcement) date: 2018-12-04

Application No.: US14281582

Application date: 2014-05-19

Applicant: Amazon Technologies, Inc.

Inventor: Tin-Yu Lee , Rejith George Joseph , Scott Michael Le Grand , Saurabh Dileep Baji , Peter Sirota

IPC: H04L29/08 , G06F9/50

Abstract: A client may submit a job to a service provider that processes a large data set and that employs a message passing interface (MPI) to coordinate the collective execution of the job on multiple compute nodes. The framework may create a MapReduce cluster (e.g., within a VPC) and may generate a single key pair for the cluster, which may be downloaded by nodes in the cluster and used to establish secure node-to-node communication channels for MPI messaging. A single node may be assigned as a mapper process and may launch the MPI job, which may fork its commands to other nodes in the cluster (e.g., nodes identified in a hostfile associated with the MPI job), according to the MPI interface. A rankfile may be used to synchronize the MPI job and another MPI process used to download portions of the data set to respective nodes in the cluster.

6.

Invention application
Automatic Scaling of Resource Instance Groups Within Compute Clusters (Active)

Publication (announcement) No.: US20210392185A1

Publication (announcement) date: 2021-12-16

Application No.: US17352065

Application date: 2021-06-18

Applicant: Amazon Technologies, Inc.

Inventor: Jonathan Daly Einkauf , Luca Natali , Bhargava Ram Kalathuru , Saurabh Dileep Baji , Abhishek Rajnikant Sinha

IPC: H04L29/08 , H04L12/26 , H04L12/24 , G06F9/50

Abstract: A service provider may apply customer-selected or customer-defined auto-scaling policies to a cluster of resources (e.g., virtualized computing resource instances or storage resource instances in a MapReduce cluster). Different policies may be applied to different subsets of cluster resources (e.g., different instance groups containing nodes of different types or having different roles). Each policy may define an expression to be evaluated during execution of a distributed application, a scaling action to take if the expression evaluates true, and an amount by which capacity should be increased or decreased. The expression may be dependent on metrics emitted by the application, cluster, or resource instances by default, metrics defined by the client and emitted by the application, or metrics created through aggregation. Metric collection, aggregation and rules evaluation may be performed by a separate service or by cluster components. An API may support auto-scaling policy definition.

7.

Granted invention patent
Isolating compute clusters created for a customer (Active)

Publication (announcement) No.: US10659523B1

Publication (announcement) date: 2020-05-19

Application No.: US14286724

Application date: 2014-05-23

Applicant: Amazon Technologies, Inc.

Inventor: Rejith George Joseph , Tin-Yu Lee , Scott Michael Le Grand , Saurabh Dileep Baji

IPC: G06F15/16 , H04L29/08

Abstract: At the request of a customer, a distributed computing service provider may create multiple clusters under a single customer account, and may isolate them from each other. For example, various isolation mechanisms (or combinations of isolation mechanisms) may be applied when creating the clusters to isolate a given cluster of compute nodes from network traffic from compute nodes of other clusters (e.g., by creating the clusters in different VPCs); to restrict access to data, metadata, or resources that are within the given cluster of compute nodes or that are associated with the given cluster of compute nodes by compute nodes of other clusters in the distributed computing system (e.g., using an instance metadata tag and/or a storage system prefix); and/or to restrict access to application programming interfaces of the distributed computing service by the given cluster of compute nodes (e.g., using an identity and access manager).
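
Of the mechanisms this abstract lists, the storage system prefix is the simplest to illustrate. The Python sketch below assumes that each cluster's data lives under a key prefix equal to its cluster identifier; that key layout, the `may_access` function, and the cluster names are hypothetical and are not taken from the patent.

```python
def may_access(cluster_id: str, object_key: str) -> bool:
    """Allow access only to keys under the requesting cluster's own prefix.

    The trailing slash keeps "cluster-a" from matching keys that belong to
    a different cluster such as "cluster-a2".
    """
    return object_key.startswith(f"{cluster_id}/")


# Two clusters under one customer account, isolated by key prefix.
own_data = may_access("cluster-a", "cluster-a/output/part-0")
other_data = may_access("cluster-a", "cluster-b/output/part-0")
similar_name = may_access("cluster-a", "cluster-a2/output/part-0")
print(own_data, other_data, similar_name)
```

A real deployment would enforce such a rule in the storage service's access policy rather than in client code; the function only shows the shape of the check.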

8.

Granted invention patent
Automatic scaling of resource instance groups within compute clusters (Active)

Publication (announcement) No.: US09848041B2

Publication (announcement) date: 2017-12-19

Application No.: US14702080

Application date: 2015-05-01

Applicant: Amazon Technologies, Inc.

Inventor: Jonathan Daly Einkauf , Luca Natali , Bhargava Ram Kalathuru , Saurabh Dileep Baji , Abhishek Rajnikant Sinha

IPC: G06F15/173 , H04L29/08 , H04L12/26 , H04L12/24 , G06F9/50

CPC classification number: H04L67/1076 , G06F9/5077 , G06F9/5083 , H04L41/0893 , H04L43/0876 , H04L67/10 , H04L67/1031

Abstract: A service provider may apply customer-selected or customer-defined auto-scaling policies to a cluster of resources (e.g., virtualized computing resource instances or storage resource instances in a MapReduce cluster). Different policies may be applied to different subsets of cluster resources (e.g., different instance groups containing nodes of different types or having different roles). Each policy may define an expression to be evaluated during execution of a distributed application, a scaling action to take if the expression evaluates true, and an amount by which capacity should be increased or decreased. The expression may be dependent on metrics emitted by the application, cluster, or resource instances by default, metrics defined by the client and emitted by the application, or metrics created through aggregation. Metric collection, aggregation and rules evaluation may be performed by a separate service or by cluster components. An API may support auto-scaling policy definition.

9.

Invention application
FAULT TOLERANT DISTRIBUTED TASKS USING DISTRIBUTED FILE SYSTEMS (Under examination, published)

Publication (announcement) No.: US20170249215A1

Publication (announcement) date: 2017-08-31

Application No.: US15595732

Application date: 2017-05-15

Applicant: Amazon Technologies, Inc.

Inventor: Mohana Sudhan Gandhi , Rejith George Joseph , Bandish N. Chheda , Saurabh Dileep Baji

IPC: G06F11/14 , G06F9/50 , G06F11/34 , G06F11/20

CPC classification number: G06F11/1425 , G06F9/5088 , G06F11/1438 , G06F11/1484 , G06F11/203 , G06F11/2035 , G06F11/2041 , G06F11/2048 , G06F16/1844 , G06F2201/84

Abstract: Data files in a distributed system sometimes become unavailable. A method for fault tolerance without data loss in a distributed file system includes allocating data nodes of the distributed file system among a plurality of compute groups, replicating a data file among a subset of the plurality of the compute groups such that the data file is located in at least two compute zones, wherein the first compute zone is isolated from the second compute zone, monitoring the accessibility of the data files, and causing a distributed task requiring data in the data file to be executed by a compute instance in the subset of the plurality of the compute groups. Upon detecting a failure in the accessibility of a data node with the data file, the task management node may redistribute the distributed task among other compute instances with access to any replica of the data file.

10.

Granted invention patent
Automatic scaling of resource instance groups within compute clusters (Active)

Publication (announcement) No.: US10581964B2

Publication (announcement) date: 2020-03-03

Application No.: US15845855

Application date: 2017-12-18

Applicant: Amazon Technologies, Inc.

Inventor: Jonathan Daly Einkauf , Luca Natali , Bhargava Ram Kalathuru , Saurabh Dileep Baji , Abhishek Rajnikant Sinha

IPC: H04L29/08 , G06F9/50 , H04L12/24 , H04L12/26

Abstract: A service provider may apply customer-selected or customer-defined auto-scaling policies to a cluster of resources (e.g., virtualized computing resource instances or storage resource instances in a MapReduce cluster). Different policies may be applied to different subsets of cluster resources (e.g., different instance groups containing nodes of different types or having different roles). Each policy may define an expression to be evaluated during execution of a distributed application, a scaling action to take if the expression evaluates true, and an amount by which capacity should be increased or decreased. The expression may be dependent on metrics emitted by the application, cluster, or resource instances by default, metrics defined by the client and emitted by the application, or metrics created through aggregation. Metric collection, aggregation and rules evaluation may be performed by a separate service or by cluster components. An API may support auto-scaling policy definition.
