Abstract:
A history preserving data pipeline computer system and method. In one aspect, the history preserving data pipeline system provides immutable and versioned datasets. Because datasets are immutable and versioned, the system makes it possible to determine the data in a dataset at a point in time in the past, even if that data is no longer in the current version of the dataset.
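As an illustration only, the point-in-time lookup described above can be sketched with an append-only list of immutable snapshots. The names DatasetVersion, VersionedDataset, commit, and as_of are hypothetical and do not come from the abstract; this is a minimal sketch, assuming each version records its commit time.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable snapshot of a dataset (hypothetical structure)."""
    version: int
    committed_at: float           # commit time, e.g. seconds since epoch
    rows: Tuple[tuple, ...]       # tuples cannot be modified in place


class VersionedDataset:
    """Append-only history: committing never alters earlier versions."""

    def __init__(self) -> None:
        self._versions: List[DatasetVersion] = []

    def commit(self, rows: Iterable[Iterable], committed_at: float) -> DatasetVersion:
        if self._versions and committed_at < self._versions[-1].committed_at:
            raise ValueError("commit times must be non-decreasing")
        snapshot = tuple(tuple(r) for r in rows)
        version = DatasetVersion(len(self._versions) + 1, committed_at, snapshot)
        self._versions.append(version)
        return version

    def current(self) -> DatasetVersion:
        return self._versions[-1]

    def as_of(self, timestamp: float) -> DatasetVersion:
        """Return the latest version committed at or before `timestamp`."""
        candidates = [v for v in self._versions if v.committed_at <= timestamp]
        if not candidates:
            raise LookupError("no version existed at that time")
        return candidates[-1]


ds = VersionedDataset()
ds.commit([("a", 1), ("b", 2)], committed_at=100.0)
ds.commit([("a", 1)], committed_at=200.0)   # row "b" dropped in version 2
past = ds.as_of(150.0)                      # still contains row "b"
```

Here the row ("b", 2) is absent from the current version but remains retrievable through as_of, which is the property the abstract describes.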
Abstract:
Computer-implemented techniques for data extraction are described. The techniques include a method and system for retrieving an extraction job specification, wherein the extraction job specification comprises a source repository identifier that identifies a source repository comprising a plurality of data records; a data recipient identifier that identifies a data recipient; and a schedule that indicates a timing of when to retrieve the plurality of data records. The method and system further include retrieving the plurality of data records from the source repository based on the schedule, creating an extraction transaction from the plurality of data records, wherein the extraction transaction comprises a subset of the plurality of data records and metadata, and sending the extraction transaction to the data recipient.
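The shape of the job specification and the extraction transaction can be sketched as follows. All class, field, and function names are hypothetical, the schedule is kept as an opaque string to be interpreted by some external scheduler, and splitting records into fixed-size batches is an assumption about how a "subset" might be chosen.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ExtractionJobSpec:
    source_repository_id: str   # identifies where records are read from
    data_recipient_id: str      # identifies who receives the transactions
    schedule: str               # timing of retrieval; format is an assumption


@dataclass(frozen=True)
class ExtractionTransaction:
    records: Tuple[Any, ...]    # a subset of the retrieved records
    metadata: Dict[str, Any]    # descriptive data about this subset


def run_extraction(
    spec: ExtractionJobSpec,
    sources: Dict[str, Callable[[], List[Any]]],
    recipients: Dict[str, Callable[[ExtractionTransaction], None]],
    batch_size: int = 2,
) -> int:
    """Retrieve records, wrap each batch in a transaction, and send it.

    Intended to be called by a scheduler whenever `spec.schedule` fires.
    Returns the number of transactions sent.
    """
    records = sources[spec.source_repository_id]()
    send = recipients[spec.data_recipient_id]
    sent = 0
    for start in range(0, len(records), batch_size):
        subset = tuple(records[start:start + batch_size])
        metadata = {
            "source": spec.source_repository_id,
            "offset": start,
            "count": len(subset),
        }
        send(ExtractionTransaction(records=subset, metadata=metadata))
        sent += 1
    return sent
```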
Abstract:
Aspects of the present disclosure include a system comprising a computer-readable storage medium storing at least one program and a method for managing access permissions associated with data resources. Example embodiments involve evaluating user access permissions with respect to shared data resources of a group of network applications. The method includes receiving a request, from one of the network applications, to access a particular data resource. The request includes an identifier of a requesting user. The method further includes accessing a policy object associated with the data resource that includes policy information specifying operations the user is authorized to perform with respect to the data resource based on satisfaction of one or more conditions. The method further includes evaluating the user's access permissions with respect to the data resource based on the policy object, and communicating a response to the network application that includes the access permission of the user.
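One way to picture the policy evaluation is a policy object holding statements, each of which grants a set of operations when all of its conditions hold for the requesting user. The structure below is a sketch under that assumption; PolicyStatement, PolicyObject, evaluate_access, and the "groups" user attribute are hypothetical names, not taken from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

# A condition inspects the requesting user's attributes and returns True/False.
Condition = Callable[[Dict], bool]


@dataclass(frozen=True)
class PolicyStatement:
    operations: FrozenSet[str]          # operations granted by this statement
    conditions: tuple                   # all conditions must be satisfied


@dataclass(frozen=True)
class PolicyObject:
    resource_id: str
    statements: tuple


def evaluate_access(policy: PolicyObject, user: Dict) -> FrozenSet[str]:
    """Return the operations `user` may perform on the policy's resource."""
    allowed = set()
    for statement in policy.statements:
        if all(condition(user) for condition in statement.conditions):
            allowed |= statement.operations
    return frozenset(allowed)


def in_group(name: str) -> Condition:
    """Build a condition that checks group membership."""
    return lambda user: name in user.get("groups", set())


policy = PolicyObject(
    resource_id="dataset-42",
    statements=(
        PolicyStatement(frozenset({"read"}), (in_group("analysts"),)),
        PolicyStatement(frozenset({"read", "write"}), (in_group("admins"),)),
    ),
)

# The response sent back to the requesting application would carry this set.
permissions = evaluate_access(policy, {"id": "u1", "groups": {"analysts"}})
```

Because the policy object is shared, each network application in the group would get the same answer for the same user and resource.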
Abstract:
The systems and methods described herein provide highly dynamic and interactive data analysis user interfaces which enable data analysts to quickly and efficiently explore large volume data sources. In particular, a data analysis system, such as described herein, may provide features to enable the data analyst to investigate large volumes of data over many different paths of analysis while maintaining detailed and retraceable steps taken by the data analyst over the course of an investigation, as captured via the data analyst's queries and user interaction with the user interfaces provided by the data analysis system. Data analysis paths may involve exploration of high volume data sets, such as Internet proxy data, which may include trillions of rows of data. The data analyst may pursue a data analysis path that involves, among other things, applying filters, joining to other tables in a database, viewing interactive data visualizations, and so on.
Abstract:
Computer-implemented systems and methods are disclosed for constructing a parser that parses complex data. In some embodiments, a method is provided for receiving a parser definition as an input to a parser generator and generating a parser at least in part from the parser definition. In some embodiments, the generated parser comprises two or more handlers forming a processing pipeline. In some embodiments, the parser receives as input a first string into the processing pipeline. In some embodiments, the parser generates a second string by a first handler and inputs the second string regeneratively into the parsing pipeline, if the first string matches an expression specified for the first handler in the parser definition.
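The regenerative step can be sketched as a loop: when a handler's expression matches the input string, the handler produces a new string, and that string is fed back into the start of the pipeline. The sketch below assumes a parser definition is simply an ordered list of (regular expression, transform) pairs; build_parser and the max_passes guard are hypothetical additions.

```python
import re
from typing import Callable, List, Tuple

# A handler pairs the expression it responds to with a string transform.
Handler = Tuple[re.Pattern, Callable[[re.Match], str]]


def build_parser(definition: List[Tuple[str, Callable[[re.Match], str]]],
                 max_passes: int = 10) -> Callable[[str], str]:
    """Generate a parser from a definition (stand-in for a parser generator)."""
    handlers: List[Handler] = [(re.compile(expr), fn) for expr, fn in definition]

    def parse(text: str) -> str:
        current = text
        for _ in range(max_passes):
            for pattern, transform in handlers:
                match = pattern.fullmatch(current)
                if match:
                    # The handler's output re-enters the pipeline from the top.
                    current = transform(match)
                    break
            else:
                # No handler matched: the string is fully processed.
                return current
        raise RuntimeError("pipeline did not settle within max_passes")

    return parse


parser = build_parser([
    (r"\[(.*)\]", lambda m: m.group(1)),             # strip surrounding brackets
    (r"level=(\w+)", lambda m: m.group(1).upper()),  # extract the level value
])

result = parser("[level=warn]")   # "[level=warn]" -> "level=warn" -> "WARN"
```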
Abstract:
A computer system can receive one or more edits to be made to a canonical dataset and can temporarily store the one or more edits in a buffer. In response to receipt of a query of the canonical dataset, the computer system can rewrite the query to read from the canonical dataset and the buffer; combine the one or more edits from the buffer with the canonical dataset to form a combined dataset based on resolution policies to avoid conflicts between data; rewrite the query to execute on the combined dataset in lieu of the canonical dataset to optimize query performance; and execute the query on the combined dataset.
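The buffer-and-merge idea can be sketched in a few lines. Here the canonical dataset is a dictionary keyed by row identifier, the buffer is an ordered list of pending edits, and the resolution policy is last-write-wins; all three choices, and the function names, are assumptions made for illustration. The "query rewrite" is represented by running the caller's predicate against the combined view rather than the canonical data.

```python
from typing import Any, Callable, Dict, Hashable, List, Optional, Tuple

Row = Dict[str, Any]
Edit = Tuple[Hashable, Row]   # (row key, changed fields)


def last_write_wins(existing: Optional[Row], edit: Row) -> Row:
    """Resolution policy (assumed): later edits override earlier values."""
    merged = dict(existing or {})
    merged.update(edit)
    return merged


def combine(canonical: Dict[Hashable, Row],
            buffer: List[Edit],
            resolve: Callable[[Optional[Row], Row], Row] = last_write_wins,
            ) -> Dict[Hashable, Row]:
    """Overlay buffered edits on a copy of the canonical dataset."""
    combined = {key: dict(row) for key, row in canonical.items()}
    for key, edit in buffer:
        combined[key] = resolve(combined.get(key), edit)
    return combined


def query(canonical: Dict[Hashable, Row],
          buffer: List[Edit],
          predicate: Callable[[Row], bool]) -> List[Row]:
    """Execute the query against the combined dataset, not the canonical one."""
    return [row for row in combine(canonical, buffer).values() if predicate(row)]


canonical = {
    1: {"id": 1, "status": "open"},
    2: {"id": 2, "status": "closed"},
}
buffer = [
    (2, {"status": "open"}),            # edit to an existing row
    (3, {"id": 3, "status": "open"}),   # newly added row
]
open_rows = query(canonical, buffer, lambda row: row["status"] == "open")
```

The canonical dataset is left untouched, so the buffered edits stay temporary until some separate step applies them.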
Abstract:
A method comprises receiving, at a host, a request to set new service configuration information for a target service in a distributed computing environment; retrieving a current revision identifier of a current revision of service configuration information for the target service from a revision index key in a local replica of a configuration store, the revision index key storing one or more key-value pairs, a key in a specific key-value pair identifying the target service; assigning a new revision identifier based on the current revision identifier; writing the new service configuration information into a new revision of the service configuration information in the local replica; and updating the revision index key in an atomic compare-and-swap operation, the compare comprising verifying that the current revision identifier in the revision index key has remained the same since the retrieving, the swap comprising updating the specific key-value pair with the new revision identifier.
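The sequence of steps (read the current revision, derive a new one, write the new revision, then compare-and-swap the index) can be sketched with an in-memory stand-in for the configuration store. ConfigStore, set_config, the integer revision identifiers, and the retry loop are all assumptions for illustration; a real distributed store would provide its own atomic compare-and-swap primitive.

```python
import threading
from typing import Any, Dict, Tuple


class ConfigStore:
    """In-memory stand-in for a local replica of a configuration store."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # Revision index: service name -> current revision identifier.
        self.revision_index: Dict[str, int] = {}
        # Revision data: (service name, revision identifier) -> configuration.
        self.revisions: Dict[Tuple[str, int], Any] = {}

    def current_revision(self, service: str) -> int:
        return self.revision_index.get(service, 0)

    def write_revision(self, service: str, revision: int, config: Any) -> None:
        # Assumption: each writer stages revisions in its own local replica,
        # so staged writes from different hosts do not overwrite one another.
        self.revisions[(service, revision)] = config

    def compare_and_swap(self, service: str, expected: int, new: int) -> bool:
        """Atomically update the index only if it still holds `expected`."""
        with self._lock:
            if self.revision_index.get(service, 0) != expected:
                return False
            self.revision_index[service] = new
            return True


def set_config(store: ConfigStore, service: str, config: Any,
               max_retries: int = 5) -> int:
    """Publish new configuration; retry if another writer got there first."""
    for _ in range(max_retries):
        current = store.current_revision(service)    # retrieve
        new = current + 1                            # assign new identifier
        store.write_revision(service, new, config)   # write new revision
        if store.compare_and_swap(service, current, new):
            return new
    raise RuntimeError("could not update revision index; too much contention")


store = ConfigStore()
first = set_config(store, "search", {"replicas": 2})
second = set_config(store, "search", {"replicas": 3})
stale_update = store.compare_and_swap("search", expected=first, new=99)
```

The final call uses a stale expected revision, so the compare fails and the index is left unchanged, which is how a concurrent update would be detected.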
Abstract:
Systems, methods, and non-transitory computer readable media are provided for providing a redundancy tool for performing transactions. A transaction to be performed on data stored in a database may be received. A timestamp may be assigned to the transaction. A lock may be assigned on the timestamp. The transaction may be performed on the data. The lock may be refreshed while performing the transaction on the data. A validity of the lock may be checked after performing the transaction on the data. Responsive to the lock being valid, a result of performing the transaction on the data may be committed.
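The lock lifecycle (assign, refresh during the work, validate afterwards, commit only if still valid) can be sketched with a lease that expires unless refreshed. The lease-with-TTL model, the class and function names, and the injectable clock are assumptions made so the example is small and testable; the abstract does not say how validity is determined.

```python
import time
from typing import Callable, Dict, List


class TimestampLock:
    """A lease on a transaction timestamp that lapses unless refreshed."""

    def __init__(self, timestamp: float, ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timestamp = timestamp
        self._ttl = ttl
        self._clock = clock
        self._expires_at = clock() + ttl

    def is_valid(self) -> bool:
        return self._clock() < self._expires_at

    def refresh(self) -> bool:
        """Extend the lease; a lease that already lapsed cannot be revived."""
        if not self.is_valid():
            return False
        self._expires_at = self._clock() + self._ttl
        return True


def run_transaction(data: Dict, steps: List[Callable[[Dict], None]],
                    timestamp: float, ttl: float,
                    clock: Callable[[], float] = time.monotonic) -> bool:
    """Apply `steps` to a working copy and commit only if the lock held."""
    lock = TimestampLock(timestamp, ttl, clock)
    working = dict(data)              # perform the transaction off to the side
    for step in steps:
        step(working)
        lock.refresh()                # refresh while the transaction runs
    if lock.is_valid():               # check validity after performing
        data.clear()
        data.update(working)          # commit the result
        return True
    return False                      # lock lapsed: discard the result
```

Working on a copy means a transaction whose lock lapsed leaves the stored data untouched, so another worker can safely redo it.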
Abstract:
An apparatus, and a method performed by one or more processors, are disclosed. The method receives a build request associated with performing an external data processing task on a first data set, the task to be performed at a system external to a data processing platform and the first data set being stored in memory associated with the data processing platform. The method generates a task identifier for the data processing task, and provides, in association with the task identifier, the first data set to an agent associated with the external system with an indication of the data processing task, the agent being arranged to cause performance of the task at the external system, to receive a second data set resulting from performance of the task, and to provide the second data set and associated metadata indicative of the transformation. The method receives the second data set and metadata from the agent associated with the external system and stores the second data set and associated metadata.
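The hand-off between the platform and the agent can be sketched as below. The Agent and Platform classes, the use of a UUID as the task identifier, the output dataset naming, and the metadata fields are all hypothetical; the external system is represented by an ordinary Python callable.

```python
import uuid
from typing import Any, Callable, Dict, List, Tuple


class Agent:
    """Runs a task on the external system and reports back (stand-in)."""

    def __init__(self, system_name: str,
                 tasks: Dict[str, Callable[[List[Any]], List[Any]]]) -> None:
        self.system_name = system_name
        self._tasks = tasks   # task name -> function run "externally"

    def run(self, task_id: str, task_name: str,
            dataset: List[Any]) -> Tuple[List[Any], Dict[str, Any]]:
        result = self._tasks[task_name](list(dataset))
        metadata = {          # describes the transformation that was applied
            "task_id": task_id,
            "task": task_name,
            "system": self.system_name,
            "input_rows": len(dataset),
            "output_rows": len(result),
        }
        return result, metadata


class Platform:
    """Holds datasets and delegates external tasks to an agent."""

    def __init__(self, agent: Agent) -> None:
        self.agent = agent
        self.datasets: Dict[str, List[Any]] = {}
        self.metadata: Dict[str, Dict[str, Any]] = {}

    def build(self, dataset_name: str, task_name: str) -> str:
        """Handle a build request; returns the name of the stored output."""
        task_id = uuid.uuid4().hex                       # generate identifier
        first = self.datasets[dataset_name]
        second, meta = self.agent.run(task_id, task_name, first)
        output_name = f"{dataset_name}.{task_name}"      # naming is assumed
        self.datasets[output_name] = second              # store second data set
        self.metadata[output_name] = meta                # store its metadata
        return output_name


agent = Agent("external-cluster", {"dedupe": lambda rows: sorted(set(rows))})
platform = Platform(agent)
platform.datasets["events"] = [3, 1, 3, 2]
output = platform.build("events", "dedupe")
```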