摘要:
A method for finding clusters of units in high-dimensional data having the steps of determining dense units in selected subspaces within a data space of the high-dimensional data, determining each cluster of dense units that are connected to other dense units in the selected subspaces within the data space, determining maximal regions covering each cluster of connected dense units, determining a minimal cover for each cluster of connected dense units, and identifying the minimal cover for each cluster of connected dense units.
摘要:
A computer-implemented method, apparatus, and article of manufacture that constructs graph models from logs of past, unstructured executions of the given process. The graph model so produced conforms to the dependencies and past executions present in the log. By providing graph models that capture the previous executions of the process, this technique allows easier introduction of a workflow system and evaluation and evolution of existing processes.
摘要:
A user can easily organize computerized document folders by associating a few sample documents in the document database with each folder. The present invention learns folder profiles based on the sample documents and moves the remaining documents into the folders accordingly. In this way, the user can construct new folders, or rearrange existing folders, or cause the computer to automatically rearrange and maintain the folders. This is particularly useful for managing a database of perhaps thousands of emails.
摘要:
A dense data-set mining system and method is provided that directly exploits all user-specified constraints including minimum support, minimum confidence, and a new constraint, known as minimum gap, which prunes any rule having conditions that do not contribute to its predictive accuracy. The method maintains efficiency even at low supports on data that is dense in the sense that many items appear with high frequency (e.g. relational data).
摘要:
A system, process, and article of manufacture for organizing a large text database into a hierarchy of topics and for maintaining this organization as documents are added and deleted and as the topic hierarchy changes. Given sample documents belonging to various nodes in the topic hierarchy, the tokens (terms, phrases, dates, or other usable feature in the document) that are most useful at each internal decision node for the purpose of routing new documents to the children of that node are automatically detected. Using feature terms, statistical models are constructed for each topic node. The models are used in an estimation technique to assign topic paths to new unlabeled documents. The hierarchical technique, in which feature terms can be very different at different nodes, leads to an efficient context-sensitive classification technique. The hierarchical technique can handle millions of documents and tens of thousands of topics. A resulting taxonomy and path enhanced retrieval system (TAPER) is used to generate context-dependent document indexing terms. The topic paths are used, in addition to keywords, for better focused searching and browsing of the text database.
摘要:
A tree structure has a node associated with each category of a hierarchy of item categories. Child nodes of the tree are associated with sub-categories of the categories associated with parent nodes. Training data including received queries and indicators of a selected item category for each received query is combined with the tree structure by associating each query with the node corresponding to the selected category of the query. When a query is received, a classifier is applied to the nodes to generate a probability that the query is intended to match an item of the category associated with the node. The classifier is applied until the probability is below a threshold. One or more categories associated with the nodes that are closest to the intent of the received query are selected and indicators of items of those categories that match the received query are output.
摘要:
Methods, computer-readable storage media, and systems are provided to facilitate visually distinguishing common attributes of users an electronic communication network or messaging service. In particular, user profile attributes are compared between a first and second user, and similar attributes are visually highlighted by assigning, for example, a distinct font, font size, color, font effect, and/or other visual effect to the user's screen name to designate which attributes are similar. In addition, or alternatively, when the first user views a user profile of the second user, common user attributes are visually highlighted. In one embodiment, the font, font size, color, and/or font effect assigned to the highlighted attribute indicates a degree of similarity of the attribute. Such implementations may allow users to more easily recognize and interact with others that have similar interests and attributes.
摘要:
An implementation wherein RFID data is shared across independent organizations has been addressed. RFID data is usually spread across different parties, e.g. enterprises in a supply chain and thus, efficient query processing across all parties is required. Traceability is emerging as one of the key applications of RFID technology. A generic data model is introduced for querying RFID data across a network of independently operated data sources. The model can be used to facilitate traceability query processing and give a set of representative traceability queries. A newly designed process-and-forward approach is implemented for executing traceability queries.
摘要:
As provided herein objects from a source catalog, such as a provider's catalog, can be added to a target catalog, such as an enterprise master catalog, in a scalable manner utilizing catalog taxonomies. A baseline classifier determines probabilities for source objects to target catalog classes. Source objects can be assigned to those classes with probabilities that meet a desired threshold and meet a desired rate. A classification cost for target classes can be determined for respective unassigned source objects, which can comprise determining an assignment cost and separation cost for the source objects for respective desired target classes. The separation and assignment costs can be combined to determine the classification cost, and the unassigned source objects can be assigned to those classes having a desired classification cost.
摘要:
Techniques are disclosed herein for providing a custom search engine. In one aspect, a first search query is received from a requestor. First search results contain search result items that match the first search query are obtained. A least one sub-query is generated from the first search results. The generating is based on rules for a particular custom search engine. Second search results that match the sub-query are then obtained. A search result set is formed from a corpus that includes the first search results and the second search results. The generating of the search result set is based on the rules for the particular custom search engine. The search result set is provided to the requester. In one aspect an interface for designing a custom search engine is provided. The interface allows the designer to specify the layout of a search results page.