METHOD AND SYSTEM FOR DYNAMICALLY MANAGING BIG DATA IN HIERARCHICAL CLOUD STORAGE CLASSES TO IMPROVE DATA STORING AND PROCESSING COST EFFICIENCY
    1.
    Patent application
    Status: under examination (published)

    Publication number: US20140325151A1

    Publication date: 2014-10-30

    Application number: US13870165

    Filing date: 2013-04-25

    CPC classification number: G06F16/185

    Abstract: A system and method for autonomic data storage and movement for big data analytics. Costs, such as a storing cost and a processing cost, are calculated for received data. The processing type associated with the received data is determined in response to the calculated costs. The received data is classified into one of a set of hierarchical storage classes based upon the determined processing type. The hierarchical storage classes include no data store, memory, HDFS, database, disk archive, external clouds, and data removal. The received data is then stored in the storage location associated with that class. If insufficient capacity is available in that location, the priority of the received data and the priority of previously stored data are determined and compared. The priority is calculated based on potential usage, privacy, estimated cost, frequency of use, and the age of the data. The lower-priority data is then moved to the next lower hierarchical class for storage.
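
The demotion step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class `TieredStore`, the per-tier item-count capacities, and the numeric `priority` field are assumptions made for illustration. The sketch models only the five classes that hold data, treats "data removal" as the final sink, and leaves out "no data store" and the cost calculation entirely.

```python
from dataclasses import dataclass

# Storage classes that hold data, ordered from highest to lowest (from the abstract).
TIERS = ["memory", "hdfs", "database", "disk_archive", "external_cloud"]


@dataclass
class Item:
    name: str
    priority: float  # hypothetical precomputed score; higher means more important


class TieredStore:
    """Toy model: each tier holds a limited number of items (assumed capacity unit)."""

    def __init__(self, capacity):
        self.capacity = capacity                 # dict: tier name -> max item count
        self.tiers = {t: [] for t in TIERS}
        self.removed = []                        # stands in for the "data removal" class

    def put(self, item, tier):
        idx = TIERS.index(tier)
        while idx < len(TIERS):
            name = TIERS[idx]
            bucket = self.tiers[name]
            bucket.append(item)
            if len(bucket) <= self.capacity[name]:
                return
            # Tier is over capacity: compare priorities and push the
            # lowest-priority item down to the next lower class.
            item = min(bucket, key=lambda i: i.priority)
            bucket.remove(item)
            idx += 1
        self.removed.append(item)                # fell off the lowest tier


store = TieredStore({t: 1 for t in TIERS})
store.put(Item("a", 5.0), "memory")
store.put(Item("b", 9.0), "memory")   # "a" is demoted to hdfs
store.put(Item("c", 1.0), "memory")   # "c" cascades past memory and hdfs
```

A cascade is possible because the item pushed down may itself overflow the next tier, which is why the loop repeats the comparison at each level.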

    SYSTEMS AND METHODS FOR MANAGING DUPLICATION OF OPERATIONS
    2.
    Patent application
    Status: in force

    Publication number: US20140129575A1

    Publication date: 2014-05-08

    Application number: US13668772

    Filing date: 2012-11-05

    CPC classification number: G06F8/36 G06F17/30693

    Abstract: The present invention generally relates to systems and methods for executing scripts (a sequence of declarative operations) on large data sets. Some implementations store descriptions of previously executed operations and their associated input and output data sets. When similar operations are subsequently executed on the same data, or on a subset, a superset, or any fragment of it, some implementations detect the duplication of operations and access the previously stored output data sets in order to reuse data and reduce the amount of execution, thus avoiding time-consuming duplicative computations.
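
As a rough illustration of the reuse idea in the abstract, the Python sketch below caches operation outputs keyed by a fingerprint of the operation description and its input. The names `OperationCache` and `filter_gt` are hypothetical. The sketch handles only exact repeats on identical input; the subset, superset, and fragment matching mentioned in the abstract is not attempted here.

```python
import hashlib
import json


class OperationCache:
    """Stores outputs of previously executed operations, keyed by a fingerprint."""

    def __init__(self):
        self._results = {}
        self.hits = 0

    @staticmethod
    def _key(op_name, params, rows):
        # Fingerprint the operation description together with its input data.
        payload = json.dumps([op_name, params, rows], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, op_name, params, rows, func):
        key = self._key(op_name, params, rows)
        if key in self._results:
            self.hits += 1               # duplicate detected: reuse stored output
            return self._results[key]
        result = func(rows, **params)    # first time: actually execute
        self._results[key] = result
        return result


def filter_gt(rows, threshold):
    """Example declarative-style operation: keep values above a threshold."""
    return [r for r in rows if r > threshold]


cache = OperationCache()
first = cache.run("filter_gt", {"threshold": 2}, [1, 2, 3, 4], filter_gt)
second = cache.run("filter_gt", {"threshold": 2}, [1, 2, 3, 4], filter_gt)
```

Hashing the full input is only practical for small examples; a real system working on large data sets would more plausibly fingerprint data set identifiers or versions, which is an assumption beyond what the abstract states.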

    METHOD AND APPARATUS FOR RIPPLE RATE SENSITIVE AND BOTTLENECK AWARE RESOURCE ADAPTATION FOR REAL-TIME STREAMING WORKFLOWS
    3.
    Patent application
    Status: in force

    Publication number: US20160050151A1

    Publication date: 2016-02-18

    Application number: US14462044

    Filing date: 2014-08-18

    Abstract: A method, non-transitory computer readable medium, and apparatus for adapting resources of a cluster of nodes for a real-time streaming workflow are disclosed. For example, the method receives a notification that a node of the cluster, associated with an instance of a process of the real-time streaming workflow, is predicted to be a bottleneck; identifies, when the bottleneck is predicted, a number of hops over which to send a resource statement that minimizes the ripple effect associated with transmitting the resource statement; transmits the resource statement to one or more nodes of the cluster within that number of hops; receives a response from one of those nodes; and adapts resource usage to the node from which the response was received.
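
One step of this method, finding the nodes that lie within a given number of hops of the predicted bottleneck, can be sketched with a breadth-first search. The abstract does not say how the hop count is chosen, so the Python sketch below takes it as an input; the function name `nodes_within_hops` and the adjacency-list representation of the cluster are assumptions for illustration only.

```python
from collections import deque


def nodes_within_hops(adjacency, start, max_hops):
    """Return {node: hop_distance} for nodes reachable within max_hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue                      # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]                   # the bottleneck node itself is not a recipient
    return distance


# Hypothetical four-node chain: a - b - c - d, with "a" predicted to be the bottleneck.
cluster = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
recipients = nodes_within_hops(cluster, "a", 2)
```

Limiting the broadcast to a small hop radius is what keeps the resource statement from rippling through the whole cluster; a smaller radius reaches fewer candidate nodes, a larger one disturbs more of them.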

    Systems and methods for managing duplication of operations
    4.
    Granted patent
    Status: in force

    Publication number: US09563409B2

    Publication date: 2017-02-07

    Application number: US13668772

    Filing date: 2012-11-05

    CPC classification number: G06F8/36 G06F17/30693

    Abstract: The present invention generally relates to systems and methods for executing scripts (a sequence of declarative operations) on large data sets. Some implementations store descriptions of previously executed operations and their associated input and output data sets. When similar operations are subsequently executed on the same data, or on a subset, a superset, or any fragment of it, some implementations detect the duplication of operations and access the previously stored output data sets in order to reuse data and reduce the amount of execution, thus avoiding time-consuming duplicative computations.

    Method and apparatus for a user-driven priority based job scheduling in a data processing platform
    6.
    Granted patent
    Status: in force

    Publication number: US09304817B2

    Publication date: 2016-04-05

    Application number: US14089253

    Filing date: 2013-11-25

    Inventor: Hyun Joo Kim

    Abstract: A method, non-transitory computer readable medium, and apparatus for scheduling a job request in a data processing platform are disclosed. The method receives a new job request having a priority selected by a user; submits the new job request to an online job queue comprising a plurality of jobs, wherein each one of the plurality of jobs has a respective priority selected by a respective user; and schedules the new job request and the plurality of jobs in the online job queue to one or more available worker nodes in a unit time slot, based upon a comparison of the priority of the new job with the respective priorities of the plurality of jobs in the online job queue, wherein the scheduling algorithm is based on one of: blocks having a variable size and a static processing time, or blocks having a static size and a variable processing time.
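
The queueing behavior described in the abstract can be approximated with a priority heap. The Python sketch below is an illustration under stated assumptions, not the claimed algorithm: `OnlineJobQueue` and `schedule_slot` are hypothetical names, a larger number is assumed to mean a higher user-selected priority, ties are broken by submission order, and the two block models (variable size with static processing time, static size with variable processing time) are not represented.

```python
import heapq
import itertools


class OnlineJobQueue:
    """Jobs carry a user-selected priority; each time slot serves the highest first."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()   # tie-breaker: earlier submissions win

    def submit(self, job_id, priority):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._order), job_id))

    def schedule_slot(self, available_workers):
        """Assign queued jobs to the workers that are free in one unit time slot."""
        assignments = {}
        for worker in available_workers:
            if not self._heap:
                break
            _, _, job_id = heapq.heappop(self._heap)
            assignments[worker] = job_id
        return assignments


queue = OnlineJobQueue()
queue.submit("j1", priority=1)
queue.submit("j2", priority=5)
queue.submit("j3", priority=3)
slot_1 = queue.schedule_slot(["w1", "w2"])
slot_2 = queue.schedule_slot(["w1"])
```

Because a newly submitted job is pushed into the same heap as the waiting jobs, its priority is compared against theirs automatically the next time a slot is scheduled.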
