Abstract:
Systems and methods are provided for identifying relevant information for an entity, referred to as a seed entity. A plurality of search queries can be generated, each comprising a property of the seed entity or of one of the entities associated with the seed entity (seed-linked entities). Preferably, the collection of search queries includes queries representing different properties of the seed entity and properties of different seed-linked entities. Optionally, the collection of search queries is optimized to reduce search burden. Searches can then be conducted with the search queries in one or more data sources to obtain a plurality of search results, wherein each search result comprises a hit entity and one or more entities associated with the hit entity (hit-linked entities). For each of the search results, a score can be determined taking as input (a) the likelihood of a match between the seed entity and the hit entity, or between a seed-linked entity and a hit-linked entity, (b) the presence of a new entity in the search result that is not present in the search queries, or a difference between the new entity and an entity present in the search queries, and (c) a characteristic of the new entity in the search result. Based on the scores, high-priority search results can be presented to a user for further analysis.
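The three-part score can be illustrated with a minimal sketch. It assumes a weighted linear combination (the weights `w_match`, `w_new`, `w_char` are hypothetical), a precomputed match likelihood for input (a), the "presence of a new entity" variant of input (b), and a hypothetical per-entity weight table `entity_weight` standing in for the "characteristic" of input (c). None of these names come from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    hit_entity: str
    hit_linked: set[str] = field(default_factory=set)
    match_likelihood: float = 0.0  # input (a): assumed precomputed, in [0, 1]


def score_result(result, query_entities, entity_weight,
                 w_match=0.5, w_new=0.3, w_char=0.2):
    """Hypothetical weighted score combining inputs (a), (b) and (c)."""
    entities = {result.hit_entity} | result.hit_linked
    # Input (b): entities in the result that did not appear in any query.
    new_entities = entities - query_entities
    novelty = 1.0 if new_entities else 0.0
    # Input (c): strongest characteristic weight among the new entities.
    characteristic = max((entity_weight.get(e, 0.0) for e in new_entities),
                         default=0.0)
    return (w_match * result.match_likelihood
            + w_new * novelty
            + w_char * characteristic)


def prioritize(results, query_entities, entity_weight, top_k=3):
    """Return the top_k results, highest score first."""
    return sorted(results,
                  key=lambda r: score_result(r, query_entities, entity_weight),
                  reverse=True)[:top_k]
```

A linear combination keeps each input's contribution easy to inspect; a learned model could replace it without changing the ranking step.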
Abstract:
Systems and methods are provided for validating data in a data set. A data set including data to validate, and a validator to use in validating the data, are selected based on user input generated from a user's interactions with a graphical user interface. The validator is applied to the data to determine, based on a validation routine associated with the validator, whether one or more statistics generated through application of the validator to the data are valid or invalid. A data quality report indicating whether the data set is valid or invalid, based on the determination of whether the one or more statistics are valid or invalid, is generated and selectively presented to the user through the graphical user interface.
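A minimal sketch of the validator flow, assuming each validator pairs a statistic function with a validation routine that accepts or rejects that statistic. The `Validator` class, `validate` function, and report layout are hypothetical, and the graphical user interface is omitted.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass
class Validator:
    name: str
    statistic: Callable[[Sequence[float]], float]  # computes a statistic over the data
    routine: Callable[[float], bool]               # decides if that statistic is valid


def validate(data, validators):
    """Apply each validator and build a simple data quality report."""
    checks = {}
    for v in validators:
        value = v.statistic(data)
        checks[v.name] = {"value": value, "valid": v.routine(value)}
    # The data set is valid only if every statistic passes its routine.
    return {"data_set_valid": all(c["valid"] for c in checks.values()),
            "checks": checks}


validators = [
    Validator("row_count", len, lambda n: n >= 3),
    Validator("mean", mean, lambda m: 0 <= m <= 10),
]
```

With these validators, `validate([2.0, 4.0, 6.0], validators)` reports the data set as valid, while `validate([50.0, 60.0], validators)` fails both checks.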
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, are provided for a system that performs feature clustering of users, accesses a user correlation database, and generates user interfaces. The system can obtain information stored in different databases located across geographic regions and determine unique users from that information. The information can be included in distinct records in the databases, with each record describing a particular user and each user described by imperfect identifying information. The system can analyze the information using machine learning models and associate each record with a particular unique user. The system can obtain identifications of items associated with each user and determine the user's propensity to disassociate from one or more items, or determine the likelihood of future association with items not presently associated with the user.
Abstract:
A computer-based crime risk forecasting system and corresponding method are provided for generating crime risk forecasts and conveying the forecasts to a user. With the conveyed forecasts, the user can more effectively gauge both the level of increased crime threat and its potential duration. The user can then leverage the information conveyed by the forecasts to take a more proactive approach to law enforcement in the affected areas during the period of increased crime threat.
Abstract:
Systems and methods are disclosed for user interfaces enabling rapid analysis of viewership information. One of the methods includes accessing databases storing viewership information associated with segments, each segment being associated with common features of viewers. For each segment, measures of association between the segment and content items are maintained. An interactive user interface enabling creation of a customized viewing audience is presented via a user device. The interactive user interface receives user input indicating a segment, identifies similar segments based on associations between features of that segment and features of other segments, and presents the identified segments. Analysis information is presented for at least one of the segments included in the customized viewing audience.
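One way to identify similar segments from their features is set overlap. The sketch below assumes each segment is represented as a set of viewer features and uses Jaccard similarity with a hypothetical `threshold`. The abstract does not name a similarity measure, so Jaccard is a stand-in.

```python
def jaccard(a, b):
    """Overlap of two feature sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_segments(target, segments, threshold=0.3):
    """Return (segment, similarity) pairs at or above threshold, most similar first."""
    target_features = segments[target]
    scored = [(name, jaccard(target_features, feats))
              for name, feats in segments.items() if name != target]
    return sorted([s for s in scored if s[1] >= threshold],
                  key=lambda s: s[1], reverse=True)


segments = {
    "sports_fans": {"male", "18-34", "sports"},
    "gamers": {"male", "18-34", "gaming"},
    "cooks": {"female", "35-54", "cooking"},
}
```

Here `similar_segments("sports_fans", segments)` returns `[("gamers", 0.5)]`, since the two segments share two of four distinct features and "cooks" shares none.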
Abstract:
Systems and methods are provided for selecting training examples to increase the efficiency of supervised active machine learning processes. Training examples for presentation to a user may be selected according to a measure of the model's uncertainty in labeling the examples. The number of training examples selected may be chosen to improve the efficiency of interaction between the user and the processing system, specifically to minimize user downtime in the machine learning process.
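A minimal sketch of uncertainty sampling with a downtime-aware batch size, assuming a binary classifier that outputs a positive-class probability per example. The batch-size rule shown here is a hypothetical heuristic, not the abstract's method: it picks enough examples that the user's labeling time covers the model's retraining time, so the user is not left waiting.

```python
import math


def uncertainty(p):
    """Binary uncertainty: 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def batch_size(retrain_seconds, seconds_per_label, min_size=1):
    """Hypothetical rule: label at least as long as retraining takes."""
    return max(min_size, math.ceil(retrain_seconds / seconds_per_label))


def select_examples(probs, retrain_seconds, seconds_per_label):
    """probs maps example id -> predicted positive-class probability."""
    n = batch_size(retrain_seconds, seconds_per_label)
    ranked = sorted(probs, key=lambda ex: uncertainty(probs[ex]), reverse=True)
    return ranked[:n]
```

For example, with a 20-second retrain and 10 seconds per label, two examples are selected: the two whose predicted probabilities lie closest to 0.5.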
Abstract:
Computer-implemented systems and methods are disclosed for automatically clustering and canonically identifying related data in various data structures. Data structures may include a plurality of records, wherein each record is associated with a respective entity. In accordance with some embodiments, the systems and methods further comprise identifying clusters of records associated with a respective entity by grouping the records into pairs, analyzing each pair to determine a probability that both members of the pair relate to a common entity, and identifying a cluster of overlapping pairs to generate a collection of records relating to a common entity. Clusters may further be analyzed to determine canonical names or other properties for the respective entities by analyzing record fields and identifying similarities.
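The pair-then-cluster approach can be sketched with a union-find structure: pairs whose match probability exceeds a threshold are merged, so overlapping pairs collapse into one cluster. As an assumption, `difflib.SequenceMatcher` string similarity on a `name` field stands in for the pairwise probability model, and the canonical name is simply the most frequent name in the cluster. Both choices and the `threshold` value are illustrative.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def match_probability(a, b):
    # Stand-in for a learned pairwise model: similarity of lowercased names.
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def cluster_records(records, threshold=0.8):
    """Merge matching pairs; overlapping pairs end up in one cluster."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parents to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if match_probability(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(records[i])
    return list(clusters.values())


def canonical_name(cluster):
    """Pick the most common name in the cluster as its canonical name."""
    return Counter(r["name"] for r in cluster).most_common(1)[0][0]
```

Because clustering follows transitive links, two records can share a cluster without matching each other directly, provided a chain of matching pairs connects them.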
Abstract:
Systems, methods, and non-transitory computer readable media are provided for labeling depictions of objects within images. An image may be obtained. The image may include a depiction of an object. A user's marking of a set of dots within the image may be received. The set of dots may include one or more dots. The set of dots may be positioned within or near the depiction of the object. The depiction of the object within the image may be labeled based on the set of dots.
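A minimal sketch of turning a user's dots into a label, assuming the label is an axis-aligned bounding box around the dots, expanded by a hypothetical `padding` (because dots may sit near, not only within, the object) and clipped to the image bounds. The abstract does not specify the label's geometry, so the box representation is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    name: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def label_from_dots(dots, name, image_width, image_height, padding=5):
    """Bounding box enclosing all dots, padded and clipped to the image."""
    if not dots:
        raise ValueError("at least one dot is required")
    xs = [x for x, _ in dots]
    ys = [y for _, y in dots]
    return Label(
        name,
        max(0, min(xs) - padding),
        max(0, min(ys) - padding),
        min(image_width - 1, max(xs) + padding),
        min(image_height - 1, max(ys) + padding),
    )
```

A single dot yields a small square of side `2 * padding`, which a real system might instead grow using image segmentation.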