Abstract:
A plurality of description phrases associated with a first domain may be determined, based on an analysis of a first plurality of documents to determine co-occurrences of the description phrases with one or more name labels associated with the first domain. An entity associated with the first domain may be obtained. An analysis of a second plurality of documents may be initiated to identify co-occurrences of mentions of the obtained entity and one or more of the plurality of description phrases, and contexts associated with each of the co-occurrences of the mentions and description phrases, in each one of the second plurality of documents. A description tag association between the obtained entity and one of the description phrases may be determined, based on an analysis of the identified contexts.
Abstract:
Described is a constraint language and related technology by which complex constraints may be used in selecting configurations for use in physical database design tuning. The complex constraint (or constraints) is processed, e.g., in a search framework, to determine and output at least one configuration that meets the constraint, e.g., a best configuration found before a stopping condition is met. The search framework processes a current configuration into candidate configurations, including by searching for candidate configurations from a current configuration based upon a complex constraint, iteratively evaluating a search space until a stopping condition is satisfied, using transformation rules to generate new candidate configurations, and selecting a best candidate configuration. Transformation rules and pruning rules are applied to efficiently perform the search. Constraints may be specified as hard assertions that must be satisfied, or as soft assertions that should be satisfied as closely as possible.
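The search loop described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the index names, sizes, and benefits, the size-budget constraint, the single "drop one index" transformation rule, and the greedy step. It omits pruning rules and soft assertions, and it is not the constraint language or search framework of the described technology.

```python
# Hypothetical toy data: index sizes (MB) and workload-cost savings.
SIZES = {"ix_a": 40, "ix_b": 30, "ix_c": 50}
BENEFIT = {"ix_a": 10, "ix_b": 2, "ix_c": 25}


def cost(config):
    """Estimated workload cost of a configuration (lower is better)."""
    return 100 - sum(BENEFIT[i] for i in config)


def within_budget(config, budget=90):
    """Hard assertion (assumed): total index size must not exceed the budget."""
    return sum(SIZES[i] for i in config) <= budget


def neighbors(config):
    """Transformation rule (assumed): drop exactly one index."""
    return [config - {i} for i in config]


def search(initial, max_steps=10):
    """Greedy search; returns the best valid configuration seen before stopping."""
    current, best = initial, None
    for _ in range(max_steps):  # stopping condition: step limit
        if within_budget(current) and (best is None or cost(current) < cost(best)):
            best = current
        candidates = neighbors(current)
        if not candidates:  # nothing left to transform
            break
        current = min(candidates, key=cost)
    return best


best = search(frozenset(SIZES))
```

In this toy setup the search starts from the full set of indexes, which violates the budget, and keeps the cheapest configuration that satisfies the assertion.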
Abstract:
This patent application relates to taxonomy editing. One implementation involves a taxonomy editor configured to generate a visual representation of a taxonomy associated with a set of scientific papers. The taxonomy editor includes a properties module configured to identify properties relating to an individual node of the taxonomy and a statistics module configured to determine trends relating to the individual node. The taxonomy editor further includes a similarity module configured to evaluate keyword similarity relative to individual scientific papers associated with the individual node. The taxonomy editor also includes a suggestion module configured to utilize the properties, the trends and the keyword similarity to identify potential modifications to the taxonomy. The taxonomy editor is further configured to present at least some of the potential modifications, the properties, the trends, and the keyword similarity concurrently with the visual representation of the taxonomy.
Abstract:
This patent application relates to foreign-key detection. One implementation obtains a set of data tables. This implementation automatically determines foreign-key relationships of columns from separate tables of the set.
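The abstract does not say how the relationships are determined. As a hedged illustration only, one common heuristic is an inclusion check: a column is a foreign-key candidate if all of its values appear in a unique column of another table. The table layout and the function name `candidate_foreign_keys` below are hypothetical.

```python
def candidate_foreign_keys(tables):
    """Return ((fk_table, fk_col), (ref_table, ref_col)) pairs.

    `tables` maps table name -> {column name -> list of values}.
    Heuristic (an assumption, not the described method): the referenced
    column must be unique and must contain every value of the FK column.
    """
    results = []
    for ref_table, ref_cols in tables.items():
        for ref_col, ref_vals in ref_cols.items():
            ref_set = set(ref_vals)
            if len(ref_set) != len(ref_vals):
                continue  # referenced column is not unique, so not a key
            for fk_table, fk_cols in tables.items():
                if fk_table == ref_table:
                    continue  # only consider columns from separate tables
                for fk_col, fk_vals in fk_cols.items():
                    if fk_vals and set(fk_vals) <= ref_set:
                        results.append(((fk_table, fk_col), (ref_table, ref_col)))
    return results


tables = {
    "customers": {"id": [1, 2, 3], "name": ["a", "b", "c"]},
    "orders": {"order_id": [10, 11, 12, 13], "customer_id": [1, 1, 3, 2]},
}
pairs = candidate_foreign_keys(tables)
```

An inclusion check alone tends to produce false positives on real data (for example, small integer columns that happen to overlap), so a practical detector would need further filtering.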
Abstract:
A system is described that can analyze a multi-dimensional input and then establish a search query based upon features extracted from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image to establish a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items, thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.
Abstract:
Techniques for error-tolerant autocompletion are described. While displaying characters of an input string as they are entered by a user, when a character is added to the input string by the user, matching strings may be selected from among a set of candidate strings by determining which of the candidate strings have a prefix that matches the input string within a given edit distance.
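The matching criterion can be sketched with a brute-force check: compute the smallest edit distance between the input string and any prefix of each candidate. This is a minimal illustration that assumes Levenshtein distance and rescans every candidate on each call. The function names are hypothetical, and the described techniques presumably update matches incrementally as each character arrives rather than recomputing from scratch.

```python
def prefix_edit_distance(query: str, candidate: str) -> int:
    """Smallest Levenshtein distance between `query` and any prefix of `candidate`."""
    # prev[j] holds the distance between query[:i] and candidate[:j].
    prev = list(range(len(candidate) + 1))
    for i, qc in enumerate(query, start=1):
        curr = [i]
        for j, cc in enumerate(candidate, start=1):
            substitution = prev[j - 1] + (0 if qc == cc else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    # The last row covers the full query against every candidate prefix.
    return min(prev)


def complete(query, candidates, max_edits=1):
    """Candidates having some prefix within `max_edits` of the query."""
    return [c for c in candidates if prefix_edit_distance(query, c) <= max_edits]


matches = complete("dtab", ["database", "datum", "table", "index"])
```

Here "dtab" matches "database" (prefix "datab", one insertion) and "table" (prefix "tab", one deletion), but not "datum" or "index".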
Abstract:
Identifying synonyms of entities using a collection of documents is disclosed herein. In some aspects, a document from a collection of documents may be analyzed to identify hit sequences that include one or more tokens (e.g., words, numbers, etc.). The hit sequences may then be used to generate discriminating token sets (DTS's) that are subsets of both the hit sequences and the entity names. The DTS's are matched with corresponding entity names, and then used to create DTS phrases by selecting adjacent text in the document that is proximate to the DTS. The DTS phrases may be analyzed to determine whether the corresponding DTS is a synonym of the entity name. In various aspects, the tokens of an associated entity name that are present in the DTS phrases are used to generate a score for the DTS. When the score at least reaches a threshold, the DTS may be designated as a synonym. A list of synonyms may be generated for each entity name.
Abstract:
Identifying synonyms of entities using web search results is disclosed herein. In some aspects, a candidate string of tokens of an entity name is selected as a search term. The search term is transmitted by a server to a search engine, which, in turn, transmits search results back to the server after performing a search. The server analyzes the search results, generates a score based on the search results, and then determines a status (synonym or not a synonym) of the candidate string based on the score. In further aspects, additional candidate strings are designated as synonyms or not synonyms based on the status of the searched candidate string by using relationships of a lattice formed from all possible candidate strings of the entity name.
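The lattice idea can be sketched as follows, under two assumptions that the abstract does not state: candidate strings are modeled as non-empty subsets of the entity name's token positions, and status propagates monotonically (if a candidate is a synonym, so is every candidate containing all of its tokens; if it is not, neither is any candidate built from a subset of its tokens). The entity name and function names are hypothetical.

```python
from itertools import combinations


def candidate_lattice(entity_name):
    """All non-empty subsets of token positions, as frozensets."""
    n = len(entity_name.split())
    return [
        frozenset(idx)
        for r in range(1, n + 1)
        for idx in combinations(range(n), r)
    ]


def propagate(candidates, searched, is_synonym):
    """Infer the status of other candidates from one searched candidate.

    Assumed rule: synonym status spreads to supersets, and
    non-synonym status spreads to subsets.
    """
    status = {}
    for c in candidates:
        if is_synonym and c >= searched:
            status[c] = True
        elif not is_synonym and c <= searched:
            status[c] = False
    return status


# Hypothetical entity name with tokens at positions 0, 1, 2.
lattice = candidate_lattice("Acme Turbo 3000")
positive = propagate(lattice, frozenset({1, 2}), True)
negative = propagate(lattice, frozenset({0, 1}), False)
```

Under these assumptions, one search result can settle the status of several candidates at once, which reduces the number of queries sent to the search engine.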
Abstract:
The described implementations relate to filtered index recommendations. In one case, a filtered index recommendation (FIR) tool is configured to recommend a final set of filtered indexes to use with a workload. The final set is selected from a first set of candidate filtered indexes and a second set of merged filtered indexes.
Abstract:
A system that facilitates estimating functional relationships associated with one or more columns in a database comprises a sampling component that receives a random sample of records within the database. An estimate generator component calculates an estimate of strength of functional relationships based at least in part upon the received samples. For example, the estimate generator component can calculate an estimate of strength of a column as a key column based at least in part upon the received samples.
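As a rough illustration of estimating key strength from a sample, the sketch below uses the fraction of distinct values in a random sample as a naive proxy. Both the proxy and the function name `sample_key_strength` are assumptions; the abstract does not give the estimator, and this proxy tends to overstate key strength because duplicates that are rare in the full column often do not appear in a small sample.

```python
import random


def sample_key_strength(values, sample_size, seed=0):
    """Fraction of distinct values in a random sample of a column.

    1.0 means no duplicates were seen in the sample. This is a naive
    proxy for key strength, not the estimator of the described system.
    """
    if not values or sample_size <= 0:
        raise ValueError("need a non-empty column and a positive sample size")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    sample = rng.sample(values, min(sample_size, len(values)))
    return len(set(sample)) / len(sample)


unique_column = list(range(1000))
constant_column = [7] * 1000
strong = sample_key_strength(unique_column, 50)
weak = sample_key_strength(constant_column, 50)
```

A fully unique column scores 1.0 and a constant column scores 1/50 with a sample of 50 records.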