Abstract:
A system (100) for searching and retrieving documents includes a database (106), a memory device (108), a user interface device (102) and a controller (104). The database (106) stores documents. The memory device (108) stores software, tokens and an index. The software performs methods according to a background routine (118) and a foreground routine (116). Each token (e.g., speed) is defined by the related expressions assigned to it (e.g., miles per hour, mph, kilometers per hour, and kph). The index assigns to each token the documents that contain an occurrence of one of that token's related expressions. The user interface device (102) accepts and sends search queries having a token and, responsive to a user interface process (120), receives information related to the documents that contain an occurrence of the related expressions for the token. The controller (104) is electrically coupled to the memory device (108), the user interface device (102) and the database (106). The controller (104) manages communications between the memory device (108) and the user interface device (102) responsive to the foreground routine (116) in the software to respond to the search queries having the token. The controller (104) also manages communications between the memory device (108) and the database (106) responsive to the background routine (118) in the software to create the index.
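The token index described in this abstract can be illustrated with a minimal sketch. It assumes an in-memory dictionary of documents and plain case-insensitive substring matching; the names `TOKENS`, `build_index` and `search` are hypothetical and do not come from the patent.

```python
from collections import defaultdict

# Hypothetical token definitions: each token maps to its related expressions.
TOKENS = {
    "speed": ["miles per hour", "mph", "kilometers per hour", "kph"],
}

def build_index(documents, tokens):
    """Background step: assign each document to every token whose
    related expressions occur in the document text."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        lowered = text.lower()
        for token, expressions in tokens.items():
            # Assumption: naive substring matching stands in for real tokenization.
            if any(expr in lowered for expr in expressions):
                index[token].add(doc_id)
    return index

def search(index, token):
    """Foreground step: answer a query that names a token."""
    return sorted(index.get(token, set()))

documents = {
    "d1": "The car reached 60 mph on the test track.",
    "d2": "The recipe calls for two cups of flour.",
    "d3": "Wind gusts exceeded 90 kilometers per hour.",
}
index = build_index(documents, TOKENS)
print(search(index, "speed"))  # ['d1', 'd3']
```

Building the index ahead of time keeps the query path to a single dictionary lookup, which mirrors the split between the background and foreground routines.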
Abstract:
A system, method and search engine for searching images for data contained therein. Training images are provided and image attributes are extracted from the training images. Attributes extracted from training images include image features characteristic of a particular numerically generated image type, such as horizontal lines, vertical lines, percentage white area, circular arcs and text. Then, the training images are classified according to extracted attributes and a particular classifier is selected for each group of training images. Classifiers can include classification trees, discriminant functions, regression trees, support vector machines, neural nets and hidden Markov models. Available images are collected from remotely connected computers, e.g., over the Internet. Collected images are indexed and provided for interrogation by users. As a user enters queries, indexed images are identified and returned to the user. The user may provide additional data as supplemental data to the extracted image data. A chart, representative of the supplemented data, may be generated and provided to the user in response to a particular query.
Abstract:
A method, device, and computer program product are provided for regular expression learning. An initial regular expression may be received from a user. The initial regular expression is executed over a database. Positive matches and negative matches are labeled. The initial regular expression and the labeled positive and negative matches are input to a transformation process. The transformation process may iteratively execute character class restrictions, quantifier restrictions, and negative lookaheads on the initial regular expression to transform it into a pool of candidate regular expressions. The transformation process may execute the character class restrictions, the quantifier restrictions, and the negative lookaheads one at a time. A candidate regular expression is selected from the pool of candidate regular expressions, where the selected candidate regular expression has the best F-measure in the pool.
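The selection step can be sketched as follows. This is an illustration only: the candidate pool is written by hand rather than generated by a transformation process, full-match semantics are assumed, and the patterns, example strings and function names are hypothetical.

```python
import re

def f_measure(pattern, positives, negatives):
    """F1 score of a pattern against labeled strings (full-match semantics)."""
    regex = re.compile(pattern)
    tp = sum(1 for s in positives if regex.fullmatch(s))
    fp = sum(1 for s in negatives if regex.fullmatch(s))
    fn = len(positives) - tp
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_best(candidates, positives, negatives):
    """Pick the candidate with the highest F-measure."""
    return max(candidates, key=lambda p: f_measure(p, positives, negatives))

initial = r"\w+-\w+"
# Hand-written stand-ins for the three kinds of transformation.
candidates = [
    initial,
    r"\d+-\d+",              # character class restriction
    r"\d{3}-\d{4}",          # quantifier restriction
    r"(?!000)\d{3}-\d{4}",   # negative lookahead
]
positives = ["555-1234", "867-5309"]
negatives = ["abc-defg", "12-34", "000-0000"]

best = select_best(candidates, positives, negatives)
print(best)  # (?!000)\d{3}-\d{4}
```

Each restriction removes some false positives while keeping both labeled positives, so the F-measure rises from the initial pattern to the last candidate.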
Abstract:
A system, method and program storage device implementing a method for modeling a data generating process, wherein the modeling comprises observing a data sequence comprising irregularly sampled data, obtaining an observation sequence based on the observed data sequence, assigning a time index sequence to the data sequence, obtaining a hidden state sequence of the data sequence, and decoding the data sequence based on a combination of the time index sequence and the hidden state sequence to model the data sequence. The method further comprises assigning a probability distribution over time stamp values of the observation sequence, wherein the decoding comprises using a Hidden Markov Model. The method further comprises using an expectation maximization methodology to learn the Hidden Markov Model.
Abstract:
Disclosed are embodiments of a method for online analytic processing of queries and, more particularly, of a method that extends the online analytic processing (OLAP) data model to represent data ambiguity, such as imprecision and uncertainty, in data values. Specifically, the embodiments of the method incorporate a statistical model that allows uncertain measures to be modeled as conditional probabilities. Additionally, an embodiment of the method identifies natural query properties (e.g., consistency and faithfulness) and uses them to shed light on alternative query semantics. Lastly, an embodiment of the method introduces an allocation-based approach to the semantics of aggregation queries over such data.
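One way to picture an allocation-based approach is sketched below. The abstract does not specify an allocation policy, so the count-based weighting used here is an assumption, as are the hierarchy, the fact records and the function names.

```python
from collections import defaultdict

# Hypothetical dimension hierarchy: a region covers several states.
HIERARCHY = {"East": ["NY", "MA"], "West": ["CA"]}

facts = [
    {"loc": "NY", "sales": 100.0},
    {"loc": "NY", "sales": 50.0},
    {"loc": "MA", "sales": 30.0},
    {"loc": "East", "sales": 40.0},  # imprecise: the state is unknown
]

def allocate(facts, hierarchy):
    """Split each imprecise fact across the leaf values it could refer to.
    Assumed policy: weight each leaf by its count of precise facts,
    falling back to uniform weights when no precise facts exist."""
    counts = defaultdict(int)
    for fact in facts:
        if fact["loc"] not in hierarchy:
            counts[fact["loc"]] += 1
    allocated = []
    for fact in facts:
        leaves = hierarchy.get(fact["loc"])
        if leaves is None:
            allocated.append((fact["loc"], 1.0, fact["sales"]))
            continue
        total = sum(counts[leaf] for leaf in leaves)
        for leaf in leaves:
            weight = counts[leaf] / total if total else 1.0 / len(leaves)
            allocated.append((leaf, weight, fact["sales"]))
    return allocated

def sum_sales(allocated, loc):
    """Aggregation query: weighted SUM of sales for one leaf value."""
    return sum(weight * value for leaf, weight, value in allocated if leaf == loc)

allocated = allocate(facts, HIERARCHY)
print(sum_sales(allocated, "NY"), sum_sales(allocated, "MA"))
```

Because the weights for each imprecise fact sum to one, the per-state totals add up to the overall total of 220.0, so the imprecise record is neither dropped nor double counted.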
Abstract:
A text annotation structured storage system stores text annotations with associated type information in a structured data store. The present system persists or stores annotations in a structured data store in an indexable and queryable format. Exemplary structured data stores comprise XML databases and relational databases. The system exploits type information in a type system to develop corresponding schemas in a structured data model. The system comprises techniques for mapping annotations to an XML data model and a relational data model. The system captures various features of the type system, such as complex types and inheritance, in the schema for the persistent store. In particular, the repository provides support for path navigation over the hierarchical type system starting at any type.
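A relational mapping of typed annotations might look like the sketch below. The abstract does not give the schema, so the single-table layout, the materialized type path column and every name here (`TYPE_PARENT`, `annotation`, `find_by_type`) are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical type system with inheritance: child type -> parent type.
TYPE_PARENT = {"Entity": None, "Person": "Entity", "Author": "Person", "Place": "Entity"}

def type_path(name):
    """Encode a type and its ancestors as a path such as /Entity/Person/."""
    parts = []
    while name is not None:
        parts.append(name)
        name = TYPE_PARENT[name]
    return "/" + "/".join(reversed(parts)) + "/"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE annotation (id INTEGER PRIMARY KEY, doc_id TEXT, "
    "begin_pos INTEGER, end_pos INTEGER, type_path TEXT, covered_text TEXT)"
)
rows = [
    ("d1", 0, 9, "Author", "Jane Hale"),
    ("d1", 20, 26, "Place", "Boston"),
    ("d2", 5, 11, "Person", "Li Wei"),
]
conn.executemany(
    "INSERT INTO annotation (doc_id, begin_pos, end_pos, type_path, covered_text) "
    "VALUES (?, ?, ?, ?, ?)",
    [(d, b, e, type_path(t), text) for d, b, e, t, text in rows],
)

def find_by_type(conn, type_name):
    """Return annotations of the given type or any of its subtypes,
    navigating the hierarchy from an arbitrary starting type."""
    pattern = "%/" + type_name + "/%"
    cursor = conn.execute(
        "SELECT covered_text FROM annotation WHERE type_path LIKE ? ORDER BY id",
        (pattern,),
    )
    return [row[0] for row in cursor]

print(find_by_type(conn, "Person"))  # ['Jane Hale', 'Li Wei']
```

Storing the full ancestor path lets a query start at any type in the hierarchy and still pick up its subtypes, at the cost of rewriting paths if the type system changes.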
Abstract:
Disclosed is a system, method, and program storage device for aggregating opinions, comprising consolidating a plurality of expressed opinions on various dimensions of topics as discrete probability distributions, generating an aggregate opinion as a single point probability distribution by minimizing a sum of weighted divergences between a plurality of the discrete probability distributions, and presenting the aggregate opinion as a Bayesian network, wherein the divergences comprise Kullback-Leibler distance divergences, and wherein the expressed opinions are generated by experts and comprise opinions on sentiments of products and services. Moreover, the aggregate opinion predicts success of the products and services. Furthermore, the experts are arranged in a hierarchy of knowledge, wherein the knowledge comprises the various dimensions of topics on which opinions may be expressed.
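The aggregation step can be sketched for a single topic dimension. The abstract does not state the direction of the divergence, so this sketch assumes the objective is the weighted sum of KL(p_i || q) over expert opinions p_i, for which the minimizer q is the weighted arithmetic mean of the opinions. The opinions, weights and function names are made up, and the Bayesian network presentation is not covered.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def objective(q, opinions, weights):
    """Weighted sum of divergences from each expert opinion to q."""
    return sum(w * kl(p, q) for p, w in zip(opinions, weights))

def aggregate(opinions, weights):
    """Closed-form minimizer under the assumed direction KL(p_i || q):
    the weighted arithmetic mean of the expert distributions."""
    total = sum(weights)
    size = len(opinions[0])
    return [sum(w * p[k] for p, w in zip(opinions, weights)) / total
            for k in range(size)]

# Hypothetical opinions over (negative, neutral, positive) sentiment.
opinions = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
weights = [2.0, 1.0, 1.0]  # assumed expert weights

q = aggregate(opinions, weights)
print(q, objective(q, opinions, weights))
```

Under the opposite direction, KL(q || p_i), the minimizer would instead be a normalized weighted geometric mean, so the choice of direction changes the aggregate.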
Abstract:
Given a log of previous web-surfer behavior listing the order in which each surfer downloaded specific items at the web site, and given a meaningful classification of those same items, future surfer behavior is predicted by the present invention. The algorithm utilizes a quantitative model relating items downloaded prior to some specified event to items downloaded after that same event. When the model is applied to a new surfer's session prior to an analogous event, the present invention predicts the likely behavior of the surfer subsequent to that event. The predicted behavior is then further analyzed to derive a quantitative value for the utility of the expected behavior. By randomly selecting sample sessions from a web log, multiple models of surfer behavior can be generated. The multiple models can then be applied to a new surfer's session to produce a predicted behavior/utility distribution and thus a confidence interval for the predicted behavior/utility.
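The idea of building multiple models from random samples of the log can be sketched as a simple bootstrap. The log format, the utility values, the frequency-based model and the function names are all assumptions made for illustration; the abstract does not specify them.

```python
import random
from collections import Counter, defaultdict

# Hypothetical log: (item category seen before the event, category after it).
LOG = [("specs", "buy"), ("specs", "buy"), ("specs", "leave"),
       ("faq", "leave"), ("faq", "buy"), ("specs", "buy"),
       ("faq", "leave"), ("specs", "leave"), ("specs", "buy")]
UTILITY = {"buy": 1.0, "leave": 0.0}  # assumed utility of each outcome

def fit(sessions):
    """Model: frequency of each post-event category given the pre-event one."""
    counts = defaultdict(Counter)
    for before, after in sessions:
        counts[before][after] += 1
    return counts

def expected_utility(model, before):
    """Predicted utility for a new session, or None if the model has no data."""
    outcomes = model.get(before)
    if not outcomes:
        return None
    total = sum(outcomes.values())
    return sum(UTILITY[after] * n / total for after, n in outcomes.items())

def utility_interval(log, before, n_models=200, seed=0, level=0.9):
    """Fit one model per random resample of the log and return a
    percentile interval over the predicted utilities."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_models):
        sample = [rng.choice(log) for _ in log]
        value = expected_utility(fit(sample), before)
        if value is not None:
            values.append(value)
    values.sort()
    low = values[int((1 - level) / 2 * (len(values) - 1))]
    high = values[int((1 + level) / 2 * (len(values) - 1))]
    return low, high

print(expected_utility(fit(LOG), "specs"), utility_interval(LOG, "specs"))
```

The spread of predictions across the resampled models is what yields the interval; a larger log would normally narrow it.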
Abstract:
A method and apparatus for visualizing a multi-dimensional data set in which the multi-dimensional data set is clustered into k clusters, with each cluster having a centroid. Then, either two distinct current centroids or three distinct non-collinear current centroids are selected. A current 2-dimensional cluster projection is generated based on the selected current centroids. In the case when two distinct current centroids are selected, two distinct target centroids are selected, with at least one of the two target centroids being different from the two current centroids. In the case when three distinct current centroids are selected, three distinct non-collinear target centroids are selected, with at least one of the three target centroids being different from the three current centroids. An intermediate 2-dimensional cluster projection is generated based on a set of interpolated centroids, with each interpolated centroid corresponding to a current centroid and to a target centroid associated with the current centroid. Each interpolated centroid is interpolated between the corresponding current centroid and the target centroid associated with the current centroid. Alternatively, the intermediate 2-dimensional cluster projection is generated based on an interpolated 2-dimensional nonlinear cluster projection that is based on the selected current centroids and the selected target centroids.
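The three-centroid case with linearly interpolated centroids can be sketched as follows. It assumes the 2-dimensional projection is the orthogonal projection onto the plane through the three centroids, built by Gram-Schmidt; the helper names and the sample data are hypothetical, and the sketch assumes the interpolated centroids stay non-collinear.

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(a, s):
    return [x * s for x in a]

def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

def plane_basis(c1, c2, c3):
    """Orthonormal basis of the plane through three non-collinear centroids."""
    u = normalize(sub(c2, c1))
    w = sub(c3, c1)
    v = normalize(sub(w, scale(u, dot(w, u))))  # Gram-Schmidt step
    return u, v

def project(point, centroids):
    """2-D coordinates of a point in the plane defined by the centroids."""
    c1, c2, c3 = centroids
    u, v = plane_basis(c1, c2, c3)
    d = sub(point, c1)
    return (dot(d, u), dot(d, v))

def interpolate(current, target, t):
    """Move each current centroid toward its target centroid, 0 <= t <= 1."""
    return [[(1 - t) * a + t * b for a, b in zip(c, g)]
            for c, g in zip(current, target)]

# Hypothetical 4-dimensional centroids; only the third target differs.
current = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0]]
target = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0]]
point = [1, 1, 1, 0]

for t in (0.0, 0.5, 1.0):
    print(t, project(point, interpolate(current, target, t)))
```

Stepping `t` from 0 to 1 produces a sequence of intermediate projections, which is what lets the view change gradually from the current projection to the target one.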