Data clustering based on candidate queries

    Publication No.: US10572511B2

    Publication Date: 2020-02-25

    Application No.: US15171168

    Filing Date: 2016-06-02

    Abstract: Received data records, each including one or more values in one or more fields, are processed to identify a matched data cluster. The processing includes, for each selected data record: generating a query from one or more values; identifying one or more candidate data records from the received data records using the query; determining whether or not the selected data record satisfies a cluster membership criterion for at least one candidate data cluster of one or more existing data clusters containing the candidate data records; and either selecting the matched data cluster from among one or more candidate data clusters based at least in part on a growth criterion for the candidate data clusters, or initializing the matched data cluster with the selected data record if the selected data record does not satisfy a cluster membership criterion for any of the existing data clusters or based on a result of the growth criterion.
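
The flow in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the token-based query, the Jaccard similarity used as the membership criterion, and the `max_size` limit used as the growth criterion are all assumptions chosen for illustration, as are the names `tokens`, `similarity`, and `cluster_records`.

```python
from collections import defaultdict

def tokens(record):
    """Query terms: lowercase words from every field value (assumed choice)."""
    return {w for v in record.values() for w in str(v).lower().split()}

def similarity(a, b):
    """Jaccard overlap of token sets, standing in for a real match score."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def cluster_records(records, threshold=0.5, max_size=3):
    clusters = []              # each cluster is a list of record indices
    index = defaultdict(set)   # token -> indices of records already processed
    assignment = {}            # record index -> cluster index
    for i, rec in enumerate(records):
        terms = tokens(rec)
        # Query: earlier records sharing at least one token are candidates.
        candidates = set().union(*(index[t] for t in terms))
        # Candidate clusters are the existing clusters holding those records.
        candidate_clusters = sorted({assignment[j] for j in candidates})
        # Membership criterion (assumed): similar enough to some member.
        members = [c for c in candidate_clusters
                   if any(similarity(rec, records[j]) >= threshold
                          for j in clusters[c])]
        # Growth criterion (assumed): a cluster may not exceed max_size.
        growable = [c for c in members if len(clusters[c]) < max_size]
        if growable:
            matched = min(growable, key=lambda c: len(clusters[c]))
        else:
            # No eligible cluster: initialize a new one with this record.
            clusters.append([])
            matched = len(clusters) - 1
        clusters[matched].append(i)
        assignment[i] = matched
        for t in terms:
            index[t].add(i)
    return clusters

people = [{"name": "john smith"}, {"name": "john smith jr"},
          {"name": "mary jones"}, {"name": "john smith"}]
```

Calling `cluster_records(people)` groups the three "john smith" variants together, while `cluster_records(people, max_size=2)` starts a new cluster for the last record because the growth limit blocks the existing one.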

    Managing a computing cluster using time interval counters

    Publication No.: US10558493B2

    Publication Date: 2020-02-11

    Application No.: US16175133

    Filing Date: 2018-10-30

    Abstract: A method for processing state update requests in a distributed data processing system with a number of processing nodes includes maintaining a number of counters: a working counter indicating a current time interval; a replication counter indicating a time interval for which all requests associated with that time interval are replicated at multiple processing nodes; and a persistence counter indicating a time interval for which all requests associated with that time interval are stored in persistent storage. The counters are used to manage processing of the state update requests.
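
A single-process Python sketch of the three counters is given here for illustration only. The class name `IntervalCounters`, its methods, and the rule that only a closed interval (one earlier than the working interval) can be marked replicated or persisted are assumptions, not details taken from the patent.

```python
class IntervalCounters:
    """Tracks which time intervals are in progress, replicated, or persisted."""

    def __init__(self):
        self.working = 1       # current time interval
        self.replication = 0   # latest interval fully replicated
        self.persistence = 0   # latest interval fully persisted
        self.requests = {}     # interval -> requests tagged with it

    def submit(self, request):
        # Each incoming request is tagged with the current working interval.
        self.requests.setdefault(self.working, []).append(request)
        return self.working

    def advance(self):
        # Close the current interval and open the next one.
        self.working += 1

    def _check_closed(self, interval):
        # Assumption: "all requests" is only meaningful for a closed interval.
        if interval >= self.working:
            raise ValueError("interval is still open")

    def mark_replicated(self, interval):
        self._check_closed(interval)
        self.replication = max(self.replication, interval)

    def mark_persisted(self, interval):
        self._check_closed(interval)
        self.persistence = max(self.persistence, interval)

    def status(self, interval):
        if interval <= self.persistence:
            return "persisted"
        if interval <= self.replication:
            return "replicated"
        return "pending"

counters = IntervalCounters()
tag = counters.submit({"key": "x", "value": 1})
counters.advance()
counters.mark_replicated(tag)
```

After these calls, `counters.status(tag)` reports `"replicated"`; a response to the request could be released at that point or held until `mark_persisted(tag)`, depending on the durability a caller needs.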

    Data processing graph compilation
    Granted invention patent

    Publication No.: US10423395B2

    Publication Date: 2019-09-24

    Application No.: US16042205

    Filing Date: 2018-07-23

    IPC Classification: G06F8/41 G06F8/40

    Abstract: A received graph-based program specification includes: a plurality of components, each corresponding to at least one operation; and a plurality of directed links each connecting an upstream component to a downstream component. Processed code is generated representing one or more groups of operations by: identifying a possible level of concurrency in a first group of operations based at least in part on a topology of the graph, such that multiple operations in the first group are not prevented by the topology of the graph from executing concurrently; analyzing at least some of the operations in the first group to determine runtime characteristics associated with the analyzed operations; and generating processed code for executing the operations, where the processed code enforces a reduced level of concurrency in the first group, lower than the identified possible level of concurrency, based at least in part on the determined runtime characteristics.
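
The idea of reducing concurrency below what the graph topology allows can be sketched in Python with the standard-library `graphlib` module (Python 3.9+). This is an illustrative simplification: the function `plan_stages`, the per-operation `cost` estimate used as the runtime characteristic, and the `min_cost` threshold are assumptions, and the "processed code" is represented only as a list of task groups.

```python
from graphlib import TopologicalSorter

def plan_stages(links, cost, min_cost=10):
    """links maps each component to the set of its upstream components."""
    sorter = TopologicalSorter(links)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        # Operations ready together are not ordered by the topology,
        # so the possible level of concurrency is len(ready).
        ready = sorted(sorter.get_ready())
        heavy = [op for op in ready if cost[op] >= min_cost]
        light = [op for op in ready if cost[op] < min_cost]
        # Assumed policy: an expensive operation gets its own task, while
        # cheap operations share one task and run serially inside it.
        tasks = [[op] for op in heavy] + ([light] if light else [])
        stages.append({"possible": len(ready),
                       "enforced": len(tasks),
                       "tasks": tasks})
        sorter.done(*ready)
    return stages

links = {"b": {"a"}, "c": {"a"}, "d": {"a"}, "e": {"b", "c", "d"}}
cost = {"a": 50, "b": 1, "c": 2, "d": 40, "e": 5}
stages = plan_stages(links, cost)
```

For this graph, the second stage could run `b`, `c`, and `d` concurrently, but the plan enforces two tasks: `d` alone, and `b` followed by `c`. The design choice is that spawning a task for a very cheap operation may cost more than running it inline.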

    Storing and retrieving data of a data cube

    Publication No.: US10210236B2

    Publication Date: 2019-02-19

    Application No.: US14949391

    Filing Date: 2015-11-23

    IPC Classification: G06F7/00 G06F17/30

    Abstract: Among other things, we describe a technique for storing data of a data cube in one or more flat files. We also describe a technique for processing a query to access data of a data cube. These techniques can be implemented in a number of ways, including as a method, a system, and/or a computer program product stored on a computer-readable storage device. One of the techniques includes receiving a set of data records having at least two dimensions, generating a set of grouped data records ordered by cardinality, and generating and storing at least one flat file containing the set of grouped data records, wherein a particular data record of the grouped data records includes a primary key that can be used to identify data of the particular data record in response to a request.
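
As a rough illustration of storing cube aggregates in a flat file keyed for lookup, consider the Python sketch below. It rests on several assumptions: "ordered by cardinality" is read as sorting dimensions by their number of distinct values, the primary key is a string built from the dimension values with `*` marking a rolled-up dimension, and the flat file is an in-memory CSV string. The names `build_cube` and `lookup` are hypothetical.

```python
import csv
import io
from itertools import combinations

def build_cube(records, dims, measure):
    # Order dimensions by cardinality, lowest first (assumed interpretation).
    ordered = sorted(dims, key=lambda d: (len({r[d] for r in records}), d))
    totals = {}
    for r in records:
        # Add the record to every grouping, from grand total to full detail.
        for k in range(len(ordered) + 1):
            for kept in combinations(ordered, k):
                key = "|".join(f"{d}={r[d]}" if d in kept else f"{d}=*"
                               for d in ordered)
                totals[key] = totals.get(key, 0) + r[measure]
    # Write one grouped record per line: primary key, aggregated measure.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for key in sorted(totals):
        writer.writerow([key, totals[key]])
    return ordered, buf.getvalue()

def lookup(flat_text, key):
    """Answer a request by scanning the flat file for the primary key."""
    for row in csv.reader(io.StringIO(flat_text)):
        if row[0] == key:
            return int(row[1])
    return None

sales = [
    {"region": "east", "product": "a", "sales": 10},
    {"region": "east", "product": "b", "sales": 5},
    {"region": "west", "product": "a", "sales": 7},
    {"region": "east", "product": "c", "sales": 1},
]
ordered_dims, flat_file = build_cube(sales, ["product", "region"], "sales")
```

Here `lookup(flat_file, "region=east|product=*")` returns the total for the east region across all products. A linear scan keeps the sketch short; because the lines are sorted by key, a real reader could use binary search or an offset index instead.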

    Specifying and applying rules to data

    Publication No.: US10191924B2

    Publication Date: 2019-01-29

    Application No.: US14886541

    Filing Date: 2015-10-19

    IPC Classification: G06F17/30 G06Q30/02

    Abstract: A computing system processes data units using one of at least two different modes of applying a rule. In a first mode, data units are received in a particular order and are processed, including writing an updated value to at least one state variable based on a result of applying the rule to the data unit. In a second mode, a selection of particular data units is processed, which includes, for each particular data unit: determining a first set of data units that includes an ordered subset of the data units occurring before the particular data unit; prior to applying the rule to the particular data unit, updating at least one state variable to the state that would result from processing the first set of data units in the first mode; and applying the rule to the particular data unit, including reading the updated value of the state variable.
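
The two modes can be contrasted with a small Python sketch. The rule itself (flag a unit once a running total exceeds 100) and the names `rule`, `run_in_order`, and `run_selected` are assumptions made for illustration; the point is only that the second mode rebuilds the state from the preceding units before applying the rule to one selected unit.

```python
def rule(state, unit):
    """Flag the unit if the running total, including it, exceeds 100."""
    total = state["total"] + unit["amount"]
    return total > 100, {"total": total}

def run_in_order(units):
    # First mode: process every unit in order, carrying the state forward.
    state = {"total": 0}
    results = []
    for unit in units:
        flagged, state = rule(state, unit)
        results.append(flagged)
    return results

def run_selected(units, position):
    # Second mode: replay the units before the selected one to bring the
    # state variable to the value the first mode would have produced...
    state = {"total": 0}
    for earlier in units[:position]:
        _, state = rule(state, earlier)
    # ...then apply the rule to the selected unit using that state.
    flagged, _ = rule(state, units[position])
    return flagged

units = [{"amount": 60}, {"amount": 30}, {"amount": 20}, {"amount": 5}]
```

With these units, `run_in_order(units)` flags the last two, and `run_selected(units, 2)` gives the same answer for the third unit without evaluating the fourth. This is useful, for example, when testing a rule against a handful of hand-picked units.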

    Mapping attributes of keyed entities

    Publication No.: US10191863B2

    Publication Date: 2019-01-29

    Application No.: US14658440

    Filing Date: 2015-03-16

    IPC Classification: G06F17/30 G06F13/10 G06Q10/06

    Abstract: One or more mappings each define a correspondence between input attributes of an input entity and output attributes of an output entity, where the input and output entities each include one or more key attributes identified as part of a unique key. Computing result information, displayed in a user interface, includes: processing instances of a first input entity to generate instances of a first output entity; determining one or more mapped input attributes of the first input entity that correspond to each of the key attributes of the first output entity; generating the instances of the first output entity based on the determined one or more mapped input attributes; computing a total number of instances of the first input entity that were processed; and computing a total number of instances of the first output entity that were generated.
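
A minimal Python sketch of key-driven mapping with the two counts follows. The function `apply_mapping`, the dictionary representation of entities, and the choice to keep the first input instance seen for each output key are assumptions for illustration only.

```python
def apply_mapping(inputs, mapping, output_key):
    """mapping: output attribute -> input attribute it is taken from."""
    # Find the input attributes that feed the output entity's key attributes.
    key_sources = [mapping[k] for k in output_key]
    outputs = {}
    for instance in inputs:
        key = tuple(instance[src] for src in key_sources)
        # One output instance per distinct key; first occurrence wins (assumed).
        if key not in outputs:
            outputs[key] = {out: instance[src] for out, src in mapping.items()}
    counts = {"inputs_processed": len(inputs),
              "outputs_generated": len(outputs)}
    return list(outputs.values()), counts

accounts = [
    {"account_id": 1, "customer_id": "c1", "customer_name": "Ada"},
    {"account_id": 2, "customer_id": "c1", "customer_name": "Ada"},
    {"account_id": 3, "customer_id": "c2", "customer_name": "Lin"},
]
mapping = {"id": "customer_id", "name": "customer_name"}
customers, counts = apply_mapping(accounts, mapping, output_key=["id"])
```

Three account instances collapse into two customer instances because the output key `id` is mapped from `customer_id`; the `counts` dictionary holds the totals a user interface could display.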

    Evaluating rules applied to data
    Granted invention patent

    Publication No.: US09984059B2

    Publication Date: 2018-05-29

    Application No.: US14495951

    Filing Date: 2014-09-25

    Abstract: Specifying rules for processing data included in fields of elements of a dataset includes rendering user interface elements, each associated with a respective condition. The user interface elements include: first subsets of user interface elements, at least some of which are associated with an input value derived from at least one field; and second subsets of user interface elements, each configured to receive user input associated with a respective condition. Conditions are applied to at least a first element of the dataset based on user input received from at least some of the user interface elements, in response to receiving user input for a first user interface element associated with a first field. Instructions are generated for applying one or more selected conditions associated with fewer than all of the user interface elements, the selected conditions including at least a condition associated with the first user interface element.