Abstract:
A computer system includes an anonymization module and a database configured to receive a query and to produce a list of user IDs. The anonymization module is configured to receive the list of user IDs produced in response to the query, the list defining a true user count; generate a noisy user count from the list of user IDs; compare the true user count to a first threshold value stored in memory; compare the noisy user count to a second threshold value stored in memory; and output the noisy user count only if the true user count is greater than the first threshold value and the noisy user count is greater than the second threshold value.
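The two-threshold check described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name, the use of Gaussian noise, the `noise_scale` parameter, and counting distinct IDs are all assumptions, since the abstract does not say how the noise is generated or whether IDs are deduplicated.

```python
import random


def noisy_count_or_none(user_ids, first_threshold, second_threshold,
                        noise_scale=2.0, rng=None):
    """Return a noisy user count, or None when either threshold check fails.

    Assumptions (not from the abstract): noise is zero-mean Gaussian with
    standard deviation `noise_scale`, and the true count is the number of
    distinct IDs in the list.
    """
    rng = rng or random.Random()
    true_count = len(set(user_ids))
    noisy_count = true_count + rng.gauss(0.0, noise_scale)
    # Output only if BOTH comparisons pass, as the abstract requires.
    if true_count > first_threshold and noisy_count > second_threshold:
        return noisy_count
    return None
```

Returning `None` for a suppressed result is a design choice of this sketch; a real system might instead return an empty answer or a fixed placeholder.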
Abstract:
Data analysis of altered data includes analyzing (64) a test data set (14) with a data analysis technique, using one or more configured processors (30), to create one or more analytical measures, the test data set being selected from an altered data set (12) according to a confidence score. At least one reliability measure of the one or more analytical measures is calculated, using the one or more configured processors, based on the similarity between the one or more analytical measures and the same analytical measures created by applying the data analysis technique to one or more reliability test data sets (16, 18) selected from the altered data set according to different confidence scores.
Abstract:
The present application relates to the field of computer technologies, and in particular, to a stop word identification method used in an information retrieval system. In the stop word identification method, after a first query input by a user is acquired, a second query that belongs to the same session as the first query is acquired, and a stop word in the first query is identified according to a change-based feature of each word in the first query relative to the second query. According to the solution provided by the present application, a stop word in a query can be identified more accurately, and the efficiency and precision of an information retrieval system are improved.
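One way to picture a change-based feature is to record, for each word of the first query, whether it survives into the second query of the same session. The sketch below is a hypothetical heuristic, not the method claimed in the application: the "kept"/"dropped" labels, the `min_overlap` parameter, and the rule that dropped words in a mostly unchanged query are stop word candidates are all assumptions made for illustration.

```python
def change_features(first_query, second_query):
    """Label each word of the first query as 'kept' or 'dropped'
    relative to the second query (a hypothetical change-based feature)."""
    first_words = first_query.lower().split()
    second_words = set(second_query.lower().split())
    return {w: ("kept" if w in second_words else "dropped") for w in first_words}


def stop_word_candidates(first_query, second_query, min_overlap=0.5):
    """Return words dropped during reformulation, provided enough of the
    first query was kept to suggest the two queries share an intent."""
    features = change_features(first_query, second_query)
    if not features:
        return []
    kept = sum(1 for label in features.values() if label == "kept")
    if kept / len(features) < min_overlap:
        return []  # queries differ too much to draw any conclusion
    return [w for w, label in features.items() if label == "dropped"]
```

For example, reformulating "the best hotels in paris" as "best hotels paris" marks "the" and "in" as candidates, while two unrelated queries yield no candidates.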
Abstract:
Systems and methods for using disparate data sets to attribute data to an entity are disclosed. Disparate data sets can be obtained from a variety of data sources. The disclosed systems and methods can obtain a first and second data set. Trajectories can represent multiple data records in a data set associated with an entity. Trajectories from the obtained data sets can be used to associate data stored among the various data sets. The association can be based on the agreement between the trajectories. The associated data records can further be used to associate the entities related to the associated data records.
Abstract:
There is provided a method that includes (a) receiving an inquiry to initiate a search for data for a specific individual, (b) determining, based on the inquiry, a strategy and flexible predictiveness equations to search a reference database, (c) searching the reference database, in accordance with the strategy, for a match to the inquiry, and (d) outputting the match. The method may also output flexible feedback related to the match that reflects the inferred quality of the match experience, which an end-user can use to determine the degree to which the matched entity meets that end-user's quality-based criteria. There is also provided a system that performs the method, and a storage medium that contains instructions that control a processor to perform the method.
Abstract:
The present invention proposes a data mining method comprising: compiling statistics on the feature vectors of each target object from the records in a target data set to form a rough data set, each feature vector including the value of at least one attribute of its corresponding target object; selecting from the rough data set the feature vectors that correspond to all known target objects of a first type, and filtering the selected feature vectors to obtain samples; and building a regression model from the samples, then using the built regression model to determine whether each known target object of a second type potentially belongs to the first type. The disclosed data mining method is capable of mining and classifying target objects according to their comprehensive features.
Abstract:
A system, method, and computer program product enable lightweight, high-accuracy (high-confidence) comparison of tables where one is a copy of the other and the copy may be kept synchronized by replication. The method performs database comparison using a sample-based, statistics-based, or materialized-query-table-based approach. The method first identifies a block comprising a subset of rows of a source database table and a corresponding block from a target database table, and obtains a statistical value associated with each block. The statistical values for the corresponding source and target blocks are then compared, and a consistency evaluation of the source and target databases is determined from the comparison results. Further methods determine whether the data is persistent in a manner that accounts for real-time modifications to the underlying source and target database tables while the identified blocks are being compared.
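The block-level comparison described above can be illustrated with a short sketch. It assumes, beyond what the abstract states, that both tables are available as lists of rows in the same key order, that blocks are fixed-size runs of rows, and that the per-block "statistical value" is a SHA-256 checksum; the function names and `block_size` parameter are hypothetical. It does not model the persistence check for rows modified during the comparison.

```python
import hashlib


def block_checksums(rows, block_size):
    """Compute one checksum per fixed-size block of rows."""
    checksums = []
    for start in range(0, len(rows), block_size):
        digest = hashlib.sha256()
        for row in rows[start:start + block_size]:
            digest.update(repr(row).encode("utf-8"))
        checksums.append(digest.hexdigest())
    return checksums


def inconsistent_blocks(source_rows, target_rows, block_size=2):
    """Return indices of blocks whose checksums differ or that exist
    on only one side."""
    src = block_checksums(source_rows, block_size)
    tgt = block_checksums(target_rows, block_size)
    total = max(len(src), len(tgt))
    return [i for i in range(total)
            if i >= len(src) or i >= len(tgt) or src[i] != tgt[i]]
```

Only the blocks flagged here would need a row-by-row comparison, which is what keeps the approach lightweight.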
Abstract:
A system and method are provided for associating records across lists, wherein the lists include a plurality of records and the plurality of records is associated with a respective entity. In accordance with some embodiments, the systems and methods comprise grouping one or more records from a first list into a first group based on fields of the records in the first list, grouping one or more records from a second list into a second group based on fields of the records in the second list, pairing a record from the first group with a record from the second group, assessing each pair of records by evaluating the pair according to its fields, and associating records from the first group and records from the second group with an entity based on the assessment.
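The group, pair, assess, and associate steps above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed system: records are plain dictionaries with an `id` field, grouping uses exact equality on one field (`block_key`), and the assessment is the fraction of compared fields that match exactly. All names and the `threshold` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import product


def group_by(records, key):
    """Group records by the value of one field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return groups


def pair_score(a, b, fields):
    """Fraction of the given fields on which two records agree exactly."""
    return sum(1 for f in fields if a.get(f) == b.get(f)) / len(fields)


def link_records(first_list, second_list, block_key, fields, threshold=0.5):
    """Pair records that share a group and keep pairs scoring at or above
    the threshold; returns sorted (first_id, second_id) tuples."""
    first_groups = group_by(first_list, block_key)
    second_groups = group_by(second_list, block_key)
    links = []
    for key in first_groups.keys() & second_groups.keys():
        for a, b in product(first_groups[key], second_groups[key]):
            if pair_score(a, b, fields) >= threshold:
                links.append((a["id"], b["id"]))
    return sorted(links)
```

Grouping first means only records within matching groups are paired, which avoids scoring every record in one list against every record in the other.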
Abstract:
In one aspect, data, such as information and/or articles, is sorted and prioritized based on a plurality of factors, such as user interest and the popularity of the data among other users. The data is first sorted by initial personal (i.e., user) data so that the items most relevant to the user come first. Passive interaction data is then used to continually reorder the articles in real time as new stories are injected into the stream, while other articles rise or fall in stature based on their popularity among other users and on time decay. The system thus feeds information to users efficiently, ordered by time relevance, by the assumed interest of the given user (based on that user's past actions or other information known about the user), and by the interest in the articles demonstrated by other users.
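A ranking that combines user interest, popularity, and time decay might look like the sketch below. The scoring formula is an assumption made for illustration: the abstract names the factors but not how they are combined. Here interest and popularity are summed and multiplied by an exponential decay with a hypothetical `half_life_hours` parameter, and articles are dictionaries with hypothetical `interest`, `popularity`, and `age_hours` fields.

```python
def article_score(interest, popularity, age_hours, half_life_hours=6.0):
    """Combine interest and popularity, halving the score every half-life."""
    decay = 0.5 ** (age_hours / half_life_hours)
    return (interest + popularity) * decay


def rank_articles(articles, half_life_hours=6.0):
    """Return article IDs ordered from highest to lowest score."""
    ordered = sorted(
        articles,
        key=lambda a: article_score(a["interest"], a["popularity"],
                                    a["age_hours"], half_life_hours),
        reverse=True,
    )
    return [a["id"] for a in ordered]
```

Re-running `rank_articles` whenever interaction data or popularity changes would approximate the continual reordering described above; a production system would more likely update scores incrementally.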