Abstract:
Suggesting query items based on database fields is described. A database system receives a character sequence entered in a search box. The database system identifies a first distribution of first field-based items that include the character sequence, and a second distribution of second field-based items that include the character sequence. The database system identifies a first item based on combining the first distribution with a distribution of queried fields, and a second item based on combining the second distribution with the distribution of queried fields. The database system outputs the first item and the second item to a location associated with the search box. In response to receiving a request to search for any item output to that location, the database system executes a search based on the requested item.
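The abstract does not say how an item distribution is combined with the distribution of queried fields. The Python sketch below assumes one simple rule: multiply an item's relative frequency within its field by the share of past queries that targeted that field, then keep the best item per field. The field names, sample values, and the functions `item_distribution` and `suggest` are hypothetical.

```python
from collections import Counter

# Hypothetical sample data: values stored in two fields, and the field that
# each past query targeted. Field names and values are illustrative only.
FIELD_VALUES = {
    "name": ["Acme Corp", "Acme Labs", "Acme Corp", "Apex Ltd"],
    "city": ["Acton", "Accra", "Acton", "Boston"],
}
QUERIED_FIELDS = ["name", "name", "name", "city"]


def item_distribution(values, typed):
    """Relative frequency of the field values that contain the typed sequence."""
    matches = Counter(v for v in values if typed.lower() in v.lower())
    total = sum(matches.values())
    return {v: n / total for v, n in matches.items()}


def suggest(typed):
    """Return (field, item, score) suggestions, best first, one per field."""
    field_dist = {f: n / len(QUERIED_FIELDS) for f, n in Counter(QUERIED_FIELDS).items()}
    suggestions = []
    for field, values in FIELD_VALUES.items():
        dist = item_distribution(values, typed)
        if not dist:
            continue  # no value in this field contains the typed sequence
        best = max(dist, key=dist.get)
        # Assumed combination rule: P(item | field) * P(field is queried).
        suggestions.append((field, best, dist[best] * field_dist.get(field, 0.0)))
    return sorted(suggestions, key=lambda s: s[2], reverse=True)
```

With this data, typing "ac" ranks "Acme Corp" from the frequently queried `name` field above "Acton" from the rarely queried `city` field.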
Abstract:
Systems and methods are provided for atomic transactions in a NoSQL database. A system writes a pending transaction identifier to write claim data for a first data item in a NoSQL database in response to a determination that the write claim data for the first data item includes a first previous transaction identifier included in last commit data for the first data item. The system writes the pending transaction identifier and a pending commit identifier to the last commit data for the first data item. The system writes a first value associated with a pending transaction to the first data item. The system aborts the pending transaction in response to a determination that the write claim data for the first data item does not include the first previous transaction identifier included in the last commit data for the first data item.
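A minimal single-process sketch of the claim-then-commit sequence, assuming each data item stores its value, its write claim, and its last commit data as a (transaction id, commit id) pair. `DataItem` and `write_in_transaction` are hypothetical names; a real NoSQL store would perform the claim check as one atomic conditional write rather than a plain `if`.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    """Hypothetical in-memory stand-in for one NoSQL data item."""
    value: object = None
    write_claim: str = "t0"            # id of the transaction claiming the item
    last_commit: tuple = ("t0", "c0")  # (transaction id, commit id)


def write_in_transaction(item, txn_id, commit_id, new_value):
    """Apply one pending transaction to `item`; return False if it aborts."""
    # The claim succeeds only if the current claim is the last committed
    # transaction, i.e. no other transaction holds an outstanding claim.
    # A real store would make this check-and-set a single conditional write.
    if item.write_claim != item.last_commit[0]:
        return False  # abort the pending transaction
    item.write_claim = txn_id
    item.last_commit = (txn_id, commit_id)  # record pending txn and commit ids
    item.value = new_value
    return True
```

If another transaction has claimed the item but not yet updated its last commit data, the claim check fails and the caller aborts.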
Abstract:
Systems and methods are provided for controlling access to data of heterogeneous origin. A system creates combined access rights from access rights and other access rights, for combined data that includes data and other data. The system receives a request to access data that is part of the combined data and determines, based on access rights that are part of the combined access rights, whether to provide access to at least part of the data. If it determines to provide access, the system provides access to at least part of the data.
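A sketch of combining per-origin access rights, assuming rights are expressed as the set of record keys each user may read and that record keys are unique across origins. The origins, users, and the functions `combine` and `request_access` are hypothetical; only the permitted part of a request is returned, which mirrors providing access to "at least part of the data".

```python
# Hypothetical data from two origins, each with its own access rights:
# rights map a user to the set of record keys that user may read.
CRM_DATA = {"acct-1": "Acme", "acct-2": "Globex"}
CRM_RIGHTS = {"alice": {"acct-1", "acct-2"}, "bob": {"acct-1"}}
ERP_DATA = {"inv-9": 1200}
ERP_RIGHTS = {"alice": {"inv-9"}}


def combine(*sources):
    """Merge (data, rights) pairs into combined data and combined access rights."""
    combined_data, combined_rights = {}, {}
    for data, rights in sources:
        combined_data.update(data)  # assumes keys are unique across origins
        for user, keys in rights.items():
            combined_rights.setdefault(user, set()).update(keys)
    return combined_data, combined_rights


def request_access(user, keys, combined_data, combined_rights):
    """Return only the requested records that the user's combined rights allow."""
    allowed = combined_rights.get(user, set())
    return {k: combined_data[k] for k in keys if k in allowed and k in combined_data}
```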
Abstract:
Methods and systems are provided for evaluating standing queries against updated contact entries configured as a stream of facts. The method includes resolving the standing queries into an array of rules, each rule having a first and a second condition; sorting one of the facts into a first property and a second property; comparing the first property of the fact to the first condition of each rule in the array of rules to produce a first subset of matching rules; comparing the second property of the fact to the second condition of each rule in the first subset of rules to produce a second subset of matching rules; and reporting at least one rule of the second subset to the author of that matching rule. The method further includes populating a first hash with indicia of the first subset, and populating a second hash with the second subset.
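The two-stage filtering can be sketched in Python, assuming each fact has already been split into a (first property, second property) pair and each rule's conditions are plain predicates. `Rule`, `evaluate`, and the example rules in `RULES` are hypothetical; the two dictionaries play the role of the first and second hashes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """One resolved standing query with two conditions (hypothetical shape)."""
    rule_id: str
    author: str
    first_condition: Callable[[str], bool]
    second_condition: Callable[[str], bool]


def evaluate(fact, rules):
    """Match one fact against all rules; return (author, rule_id) notifications."""
    first_property, second_property = fact  # fact already split into two properties
    # First hash: rules whose first condition matches the first property.
    first_hash = {r.rule_id: r for r in rules if r.first_condition(first_property)}
    # Second hash: survivors whose second condition matches the second property.
    second_hash = {rid: r for rid, r in first_hash.items()
                   if r.second_condition(second_property)}
    return [(r.author, rid) for rid, r in second_hash.items()]


# Hypothetical standing queries from three authors.
RULES = [
    Rule("r1", "dana", lambda f: f == "title", lambda v: "CTO" in v),
    Rule("r2", "eli", lambda f: f == "city", lambda v: v == "Paris"),
    Rule("r3", "fay", lambda f: f == "title", lambda v: v == "VP Sales"),
]
```

Only rules that survive the first comparison are checked against the second property, so `evaluate(("title", "CTO, Acme"), RULES)` never tests `r2` and returns `[("dana", "r1")]`.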
Abstract:
The technology disclosed describes systems and methods for generating feature vectors from resource description framework (RDF) graphs. Machine learning tasks frequently operate on vectors of features. Available systems for parsing multiple documents often generate RDF graphs. Once a set of interesting features to be considered has been established, the disclosed technology describes systems and methods for generating feature vectors from the RDF graphs for the documents. In one example setting, a machine learning system can use generated feature vectors to determine how interesting a news article might be, or to learn information-of-interest about a specific subject reported in multiple articles. In another example setting, feature vectors generated from a resume database with the disclosed systems and methods can be used to identify viable interview candidates for a particular job opening.
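One simple way to turn an RDF graph into a fixed-length vector, assuming the established feature set is a list of (predicate, object) pairs and each graph is given as plain (subject, predicate, object) string triples. The counting scheme, `feature_vector`, and the sample article graph are assumptions for illustration, not the disclosed method.

```python
def feature_vector(triples, features):
    """Count occurrences of each (predicate, object) feature in an RDF graph.

    `triples` is an iterable of (subject, predicate, object) strings; the
    feature list fixes the vector layout so every document's vector aligns.
    """
    counts = dict.fromkeys(features, 0)
    for _subject, predicate, obj in triples:
        if (predicate, obj) in counts:
            counts[(predicate, obj)] += 1
    return [counts[f] for f in features]


# Hypothetical graph for one news article (IRIs shortened to plain strings).
ARTICLE = [
    ("para1", "mentions", "Salesforce"),
    ("para2", "mentions", "Salesforce"),
    ("article1", "topic", "acquisition"),
]
FEATURES = [("mentions", "Salesforce"), ("topic", "acquisition"), ("topic", "sports")]
```

Because the feature list fixes the vector layout, vectors from different documents can be compared position by position or fed directly to a classifier.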
Abstract:
The technology disclosed relates to improving parallel functional processing using abstractions and methods defined based on category theory. In particular, the technology disclosed provides a range of useful categorical functions for processing large data sets in parallel. These categorical functions manage all phases of distributed computing, including dividing a data set into subsets of approximately equal size and combining the results of the subset calculations into a final result, while hiding many of the low-level programming details. These categorical functions are well-ordered and have a sophisticated type system and type inference, which allows maps to be generated and reduced with concise, expressive programs that can make the whole software development process significantly more efficient.
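The split, map, and combine phases can be sketched with a monoid (an identity value plus an associative operation), the category-theory structure that makes combining partial results independent of how the data was divided. `Monoid`, `split`, and `map_reduce` are hypothetical names, not the disclosed API. Threads are used for simplicity; CPU-bound work in CPython would usually use a process pool instead.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable


@dataclass(frozen=True)
class Monoid:
    """An identity element plus an associative binary operation."""
    empty: Any
    combine: Callable[[Any, Any], Any]


def split(data, parts):
    """Divide `data` into `parts` contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def map_reduce(data, f, monoid, parts=4):
    """Map `f` over `data` in parallel chunks, then fold results with `monoid`."""
    def fold_chunk(chunk):
        return reduce(monoid.combine, (f(x) for x in chunk), monoid.empty)

    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(fold_chunk, split(data, parts)))
    # Associativity guarantees the chunk boundaries do not change the result.
    return reduce(monoid.combine, partials, monoid.empty)
```

`pool.map` returns partial results in chunk order, so even a non-commutative monoid such as string concatenation produces the same result as a sequential fold.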
Abstract:
A system determines a count of each item in each item set, sorts the counts in ascending order, assigns an ascending identifier to each item according to its sorted count, and sorts the identifiers within each item set in descending order. The system partitions the item sets into a first group and a second group of item sets, each item set in the first group including a common largest identifier. The system determines a count for each subset of each item set of the first group, and determines a count of each subset of each item set by summing each count for each subset of each item set of the first group with the corresponding count for the corresponding subset of each item set of the second group. The system outputs a recommended item set based on the count of each subset of each item set.
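A small Python sketch of the counting steps, under stated assumptions: ties in item count are broken by item name, the first group is the item sets that contain the overall largest identifier, and the "recommended item set" is taken to be the most frequent subset of two or more items. `encode`, `subset_counts`, and `recommend` are hypothetical names. Sorting identifiers in descending order gives every subset one canonical tuple, so the counts from the two groups can be summed key by key.

```python
from collections import Counter
from itertools import combinations


def encode(item_sets):
    """Replace items by ids that ascend with item frequency; sort ids descending."""
    counts = Counter(item for s in item_sets for item in s)
    # Ties in count are broken by item name (an assumption) so ids are stable.
    order = sorted(counts, key=lambda item: (counts[item], item))
    ids = {item: i + 1 for i, item in enumerate(order)}
    encoded = [tuple(sorted((ids[item] for item in s), reverse=True)) for s in item_sets]
    return encoded, {i: item for item, i in ids.items()}


def subset_counts(group):
    """Count every non-empty subset (as a descending id tuple) across a group."""
    counts = Counter()
    for s in group:
        for k in range(1, len(s) + 1):
            counts.update(combinations(s, k))
    return counts


def recommend(item_sets):
    """Return the most frequent subset of two or more items, with its count."""
    encoded, decode = encode(item_sets)
    top = max(s[0] for s in encoded)
    first_group = [s for s in encoded if s[0] == top]  # share the largest id
    second_group = [s for s in encoded if s[0] != top]
    total = subset_counts(first_group) + subset_counts(second_group)
    candidates = {s: n for s, n in total.items() if len(s) >= 2}
    if not candidates:
        return set(), 0
    best = max(candidates, key=lambda s: (candidates[s], len(s), s))
    return {decode[i] for i in best}, candidates[best]
```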
Abstract:
An error-checking technique for database records is described. A record is selected, and its entities are compared with the entities of other records stored in the database to determine the likelihood that the labels associated with the entities of the selected record are correct. The likelihood that each entity of the selected record is correctly labeled can be determined by comparing the number of times that the entity appears in the database records with that label to the number of times that the entity appears with any other label. If the likelihood does not exceed a threshold, an error is likely, and action can be taken to correct the record.
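A sketch of the per-entity check, assuming each record is a list of (entity, label) pairs. It expresses the comparison as the share of an entity's occurrences in the other records that carry the same label. The 0.5 threshold, the choice to skip entities that never appear elsewhere, and the function names are assumptions.

```python
from collections import Counter


def label_likelihood(entity, label, other_records):
    """Share of the entity's occurrences in other records that carry `label`.

    Returns None when the entity never appears elsewhere (no evidence).
    """
    labels = Counter(lbl for record in other_records
                     for ent, lbl in record if ent == entity)
    total = sum(labels.values())
    return labels[label] / total if total else None


def likely_errors(selected, other_records, threshold=0.5):
    """Return (entity, label) pairs of `selected` whose likelihood <= threshold."""
    flagged = []
    for entity, label in selected:
        p = label_likelihood(entity, label, other_records)
        if p is not None and p <= threshold:
            flagged.append((entity, label))
    return flagged
```

If "Paris" is labeled CITY twice and PERSON once in the other records, a selected record labeling it PERSON scores 1/3 and is flagged for review.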
Abstract:
The technology disclosed relates to methods for partitioning sets of features for a Bayesian classifier: finding a data partition that makes the classification process faster and more accurate while discovering and taking into account dependence among sets of features in the data set. It relates to computing class entropy scores for a class label across all tuples that share a feature subset, arranging the tuples in order of non-decreasing entropy score for the class label, and constructing the data partition that offers the highest improvement in predictive accuracy for the data set. Also disclosed are a method for partitioning a complete set of feature records in a batch computation of increasing predictive power, and a method that starts with singleton partitions and uses an iterative process to construct the data partition that offers the highest improvement in predictive accuracy for the data set.
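The entropy-scoring and ordering step can be illustrated as follows, assuming records are dictionaries with a `label` key and a feature subset is a tuple of feature names. The sketch stops at the sorted scores; the partition search that uses them is not shown, and `class_entropy` and `grouped_entropies` are hypothetical names.

```python
from collections import Counter, defaultdict
from math import log2


def class_entropy(labels):
    """Shannon entropy (bits) of the class-label distribution in `labels`."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def grouped_entropies(rows, feature_subset, label_key="label"):
    """Group rows by their values on `feature_subset`; score each group's
    class-label entropy and return (values, entropy) in non-decreasing order."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in feature_subset)
        groups[key].append(row[label_key])
    scored = [(key, class_entropy(labels)) for key, labels in groups.items()]
    return sorted(scored, key=lambda kv: kv[1])
```

Groups with entropy 0 determine the class label outright, which is why low-entropy tuples come first when judging how much a feature subset improves predictive accuracy.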
Abstract:
Some embodiments of the present invention include determining whether updates performed by a second user include a systematic change, such as a reversal, within a time window, of an update previously performed by a first user. The reversal is associated with a record of data used by a gamification application executing in a computer system. If the update performed by the second user includes the reversal within the time window, a time delay is introduced between that update and rewarding the second user. An update history of the first user and the second user is evaluated to identify patterns of reversals associated with similar records within the time window. The second user is prevented from being rewarded if such patterns of reversals are identified in the update history.
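A sketch of the reversal check and reward decision, assuming each update records the user, record, field, old and new values, and a timestamp in seconds. A "reversal" is taken to mean a different user restoring the previous value within the window. A "pattern" is taken to be at least `pattern_threshold` earlier reversals of the same first user's updates by the same second user. The `Update` type, the threshold, and the returned labels are hypothetical. "delay" stands in for scheduling the reward later rather than implementing the delay itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """One field change on a record (hypothetical log entry)."""
    user: str
    record_id: str
    field: str
    old: object
    new: object
    time: float  # seconds


def is_reversal(earlier, later, window):
    """True if `later` (by another user) restores the value `earlier` replaced."""
    return (later.user != earlier.user
            and later.record_id == earlier.record_id
            and later.field == earlier.field
            and later.old == earlier.new and later.new == earlier.old
            and 0 <= later.time - earlier.time <= window)


def reward_decision(update, history, window, pattern_threshold=2):
    """Return "reward", "delay", or "deny" for the user who made `update`."""
    reversed_updates = [u for u in history if is_reversal(u, update, window)]
    if not reversed_updates:
        return "reward"  # not a reversal: reward normally
    first_user = reversed_updates[0].user
    # Count earlier reversals of the first user's updates by the same second user.
    past = sum(1 for earlier in history for later in history
               if earlier.user == first_user and later.user == update.user
               and is_reversal(earlier, later, window))
    return "deny" if past >= pattern_threshold else "delay"
```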