Abstract:
One or more processors examine source code of one or more software packages that produce output messages and identify, in the source code, one or more call expressions that each represent a logging call. The one or more processors generate a number of search patterns for parsing output messages produced by the one or more software packages, wherein each of the search patterns is based on one or more arguments of a corresponding call expression of the one or more call expressions. The one or more processors further reduce the number of search patterns to be applied to the output messages produced by the one or more software packages to identify log entries among the output messages.
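The flow described above can be illustrated with a minimal Python sketch. It assumes the examined source is Python, that logging calls look like `logger.info("...%s...", arg)`, and that "reducing" the patterns means dropping duplicates. The helper names (`find_logging_templates`, `template_to_pattern`, `build_patterns`) are hypothetical and do not come from the abstract.

```python
import ast
import re

# Assumption: a call counts as a logging call if its method name is one of these.
LOG_METHODS = {"debug", "info", "warning", "error", "critical"}


def find_logging_templates(source: str) -> list[str]:
    """Return the literal first argument of each call such as logger.info("...")."""
    templates = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in LOG_METHODS
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            templates.append(node.args[0].value)
    return templates


def template_to_pattern(template: str) -> re.Pattern:
    """Escape the literal text and turn %s, %d, %r, %f placeholders into capture groups."""
    parts = re.split(r"%[sdrf]", template)
    return re.compile("^" + "(.+?)".join(re.escape(p) for p in parts) + "$")


def build_patterns(source: str) -> list[re.Pattern]:
    """Generate one pattern per logging call, then reduce the set by removing duplicates."""
    unique = {}
    for template in find_logging_templates(source):
        pattern = template_to_pattern(template)
        unique.setdefault(pattern.pattern, pattern)
    return list(unique.values())
```

Applying the resulting patterns to each output line and keeping the lines that match would then separate log entries from other output. Only the simple placeholders listed above are handled in this sketch.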
Abstract:
Computer-implemented techniques for data extraction are described. The techniques include a method and system for retrieving an extraction job specification, wherein the extraction job specification comprises a source repository identifier that identifies a source repository comprising a plurality of data records; a data recipient identifier that identifies a data recipient; and a schedule that indicates a timing of when to retrieve the plurality of data records. The method and system further include retrieving the plurality of data records from the source repository based on the schedule, creating an extraction transaction from the plurality of data records, wherein the extraction transaction comprises a subset of the plurality of data records and metadata, and sending the extraction transaction to the data recipient.
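As a rough illustration, the job specification and the extraction transaction might be modeled as below. This is a sketch under stated assumptions: the field names, the cron-style schedule string, the fixed batch size, and the metadata keys are all hypothetical, and retrieval from the repository and delivery to the recipient are left out.

```python
from dataclasses import dataclass


@dataclass
class ExtractionJobSpec:
    source_repository_id: str  # identifies the source repository
    data_recipient_id: str     # identifies the data recipient
    schedule: str              # assumption: a cron-style string


@dataclass
class ExtractionTransaction:
    records: list   # a subset of the retrieved data records
    metadata: dict  # descriptive metadata about that subset


def create_transactions(spec, records, batch_size):
    """Split the retrieved records into transactions of at most batch_size records."""
    transactions = []
    for start in range(0, len(records), batch_size):
        subset = records[start:start + batch_size]
        transactions.append(
            ExtractionTransaction(
                records=subset,
                metadata={
                    "source": spec.source_repository_id,
                    "recipient": spec.data_recipient_id,
                    "offset": start,
                    "count": len(subset),
                },
            )
        )
    return transactions
```

A scheduler would call the retrieval step at the times given by `schedule` and then send each transaction to the recipient named in the specification.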
Abstract:
Computer-implemented techniques for data extraction are described. The techniques include a method and system for retrieving an extraction job specification, wherein the extraction job specification has a source repository identifier that identifies a source repository including a plurality of data records; a data recipient identifier that identifies a data recipient; and a schedule that indicates a timing of when to retrieve the plurality of data records. The method and system further include retrieving the plurality of data records from the source repository based on the schedule, creating an extraction transaction from the plurality of data records, wherein the extraction transaction includes a subset of the plurality of data records and metadata, and sending the extraction transaction to the data recipient.
Abstract:
A method and system for serving assets is disclosed, comprising receiving an asset request to serve an asset, wherein the asset request originates at an application, and wherein the asset request comprises an advertisement of an asset to be served and a request for the network address of an asset server configured to serve the requested asset. The method further comprises causing a service discovery server to identify an asset server configured to serve the requested asset, and causing the requested asset to be served to the application.
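One way to picture the request flow is the following sketch. It assumes an in-memory registry stands in for the service discovery server and that the caller supplies a `fetch` function that contacts the asset server; `ServiceDiscovery`, `handle_asset_request`, and the request keys are hypothetical names.

```python
class ServiceDiscovery:
    """Assumption: a dictionary stands in for the service discovery server."""

    def __init__(self):
        self._servers = {}

    def register(self, asset_name, address):
        """Record that the server at `address` can serve `asset_name`."""
        self._servers[asset_name] = address

    def lookup(self, asset_name):
        """Return the network address of a server for the asset, or None."""
        return self._servers.get(asset_name)


def handle_asset_request(request, discovery, fetch):
    """Resolve the asset server for the requested asset and serve the asset."""
    address = discovery.lookup(request["asset"])
    if address is None:
        raise LookupError(f"no asset server registered for {request['asset']!r}")
    # fetch is supplied by the caller and is expected to contact the asset server.
    return fetch(address, request["asset"])
```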
Abstract:
A computer-implemented system and method for data revision control in large-scale data analytic systems. In one embodiment, for example, a computer-implemented method comprises the operations of storing a first version of a dataset that is derived by executing a first version of a driver program associated with the dataset; and storing a first build catalog entry comprising an identifier of the first version of the dataset and comprising an identifier of the first version of the driver program.
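A build catalog entry of this kind can be pictured as a small record linking two identifiers. The sketch below assumes string identifiers and an in-memory catalog; `BuildCatalogEntry`, `BuildCatalog`, and their methods are hypothetical names, not taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildCatalogEntry:
    dataset_version_id: str  # identifier of the stored dataset version
    driver_version_id: str   # identifier of the driver program version that produced it


class BuildCatalog:
    """Assumption: an in-memory mapping stands in for the catalog store."""

    def __init__(self):
        self._entries = {}

    def record_build(self, dataset_version_id, driver_version_id):
        """Store an entry tying a dataset version to the driver version that derived it."""
        entry = BuildCatalogEntry(dataset_version_id, driver_version_id)
        self._entries[dataset_version_id] = entry
        return entry

    def driver_for(self, dataset_version_id):
        """Return the driver version that produced the given dataset version."""
        return self._entries[dataset_version_id].driver_version_id
```

Keeping both identifiers in one entry makes it possible to trace any stored dataset version back to the exact driver program version that derived it.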
Abstract:
Systems and methods are disclosed for news events detection and visualization. In accordance with one implementation, a method is provided for news events detection and visualization. The method includes, for example, obtaining one or more user inputs, determining, based on the user inputs, an entity and a date range, obtaining one or more documents associated with the entity and with dates within the date range, the one or more documents being grouped into one or more clusters, and the clusters being grouped into one or more megaclusters, and presenting the one or more documents on one or more timelines, wherein documents grouped into different megaclusters are presented in visually distinct ways. The method further allows for filtering of the one or more clusters based on a value associated with the one or more clusters.
Abstract:
Systems and methods are disclosed for news events detection and visualization. In accordance with one implementation, a method is provided for news events detection and visualization. The method includes, for example, obtaining a document, obtaining from the document a plurality of tokens, obtaining a document vector based on a plurality of frequencies associated with the plurality of tokens, obtaining one or more clusters of documents, each cluster associated with a plurality of documents and a cluster vector, determining a matching cluster from the one or more clusters based at least on the similarity between the document vector and the cluster vector of the matching cluster, and updating a database to associate the document with the matching cluster.
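The matching step can be sketched as follows. The abstract does not name a similarity measure, so this example assumes cosine similarity over raw token counts, a simple alphanumeric tokenizer, and a similarity threshold below which no cluster is chosen; the function names and the dictionary standing in for the database are hypothetical.

```python
import math
import re
from collections import Counter


def document_vector(text):
    """Tokenize the document and count how often each token occurs."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse token-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_to_cluster(doc_id, text, clusters, database, threshold=0.3):
    """Find the most similar cluster and record the association in the database.

    clusters maps a cluster id to its cluster vector; threshold is an assumption.
    Returns the matching cluster id, or None if no cluster is similar enough.
    """
    vector = document_vector(text)
    best_id, best_score = None, 0.0
    for cluster_id, cluster_vector in clusters.items():
        score = cosine_similarity(vector, cluster_vector)
        if score > best_score:
            best_id, best_score = cluster_id, score
    if best_id is None or best_score < threshold:
        return None
    database[doc_id] = best_id
    return best_id
```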
Abstract:
Disclosed herein are systems and computer-implemented methods that include storing a sequence of events, each event associated with a sequence number indicating a temporal position of an event within the sequence of events; further storing one or more read-offsets that are associated with respective consumers and that indicate the sequence number up to which the respective consumers have read events within the sequence of events; determining a smallest read-offset of all read-offsets; compacting events and/or earlier snapshots with sequence numbers smaller than the smallest read-offset into a new snapshot; and replacing, in the sequence of events, the events and/or earlier snapshots with sequence numbers smaller than the smallest read-offset with the new snapshot.
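The compaction step can be illustrated with a short sketch. It assumes the log is a list ordered by sequence number, that each event sets a key to a value, and that a snapshot is the folded key-value state of everything it replaced; the `Event` and `Snapshot` types and the `compact` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    sequence_number: int
    key: str
    value: int


@dataclass
class Snapshot:
    sequence_number: int  # highest sequence number folded into this snapshot
    state: dict


def compact(log, read_offsets):
    """Replace every item that all consumers have already read with one snapshot.

    log is assumed to be ordered by sequence number; read_offsets maps a
    consumer name to the sequence number up to which that consumer has read.
    """
    if not read_offsets:
        return log
    smallest = min(read_offsets.values())
    old = [item for item in log if item.sequence_number < smallest]
    if not old:
        return log
    state = {}
    for item in old:
        if isinstance(item, Snapshot):
            state.update(item.state)      # fold in an earlier snapshot
        else:
            state[item.key] = item.value  # later events overwrite earlier ones
    snapshot = Snapshot(sequence_number=old[-1].sequence_number, state=state)
    return [snapshot] + [item for item in log if item.sequence_number >= smallest]
```

Using the smallest read-offset as the cut-off means no consumer loses an event it has not yet read, since only items below every consumer's position are folded into the snapshot.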
Abstract:
A method is disclosed. The method comprises receiving, from one or more search nodes of a distributed system, one or more requests for log data, the one or more search nodes being associated with one or more hot storage systems; identifying, from an index catalog, an indexed portion of the log data stored in a cold storage system of one or more cold storage systems based on at least part of a particular request of the one or more requests, the index catalog containing pointers to indexed portions of the log data in the one or more cold storage systems, the indexing being performed by one or more indexing nodes independently from the receiving by the one or more search nodes; and sending the indexed portion to the one or more search nodes for storage in the associated one or more hot storage systems, wherein the method is performed using one or more processors.
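The lookup path can be sketched with plain dictionaries. This is an illustration only: it assumes requests are keyed by a service name and a day, that the index catalog maps that key to a pointer made of a cold store name and an object key, and that hot storage is a local dictionary. All names here are hypothetical.

```python
def serve_log_request(request, index_catalog, cold_storage, hot_storage):
    """Resolve a request through the index catalog and copy the portion to hot storage.

    index_catalog maps (service, day) to a pointer (store_name, object_key);
    cold_storage maps store_name to a dictionary of object_key to indexed portion.
    Returns the indexed portion, or None if the catalog has no pointer for it.
    """
    pointer = index_catalog.get((request["service"], request["day"]))
    if pointer is None:
        return None
    store_name, object_key = pointer
    portion = cold_storage[store_name][object_key]
    hot_storage[object_key] = portion  # the search node keeps it in hot storage
    return portion
```

Because the catalog holds only pointers, indexing nodes can populate it independently of the search nodes that issue the requests.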