Abstract:
Methods, apparatus and computer program products are provided for retrieving information from a text data collection and for classifying a document into none, one or more of a plurality of predefined classes. In each aspect, a representation of at least a portion of the original matrix is projected into a lower dimensional subspace, and those portions of the subspace representation that relate to the term(s) of the query are weighted after the projection. To retrieve the documents most relevant to a query, the documents are then scored, with better scores generally indicating greater relevance. Alternatively, to classify a document, its relationship to each class of documents is scored, and the document is classified into those classes, if any, that have the best scores.
Abstract:
A text mining program is provided that allows a user to perform text mining operations such as information retrieval, term and document visualization, term and document clustering, term and document classification, summarization of individual documents and groups of documents, and document cross-referencing. This is accomplished by representing the text of a document collection using subspace transformations. The representation is built by constructing a term frequency matrix of the term frequencies for each of the documents, transforming the term frequencies for statistical purposes, and projecting the documents or the terms into a lower dimensional subspace. As the document collection is updated, the subspace is dynamically updated to reflect the new document collection.
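The three construction steps can be sketched as follows. This is a minimal illustration, not the program's actual method: the abstract names neither the statistical transform nor the projection technique, so the log scaling and the truncated SVD used here are assumptions, as are the function names and the toy documents.

```python
import numpy as np

def term_frequency_matrix(docs):
    """Return (sorted vocabulary, terms x documents count matrix)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    tf = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            tf[index[w], j] += 1.0
    return vocab, tf

def project(tf, k):
    """Scale the counts, then project documents into a k-dimensional subspace."""
    scaled = np.log1p(tf)  # assumed transform; the abstract does not name one
    u, s, vt = np.linalg.svd(scaled, full_matrices=False)  # assumed projection
    doc_coords = (s[:k, None] * vt[:k]).T  # one row of k coordinates per document
    return u[:, :k], doc_coords

# Hypothetical toy collection.
docs = [
    "wing flap actuator",
    "flap actuator failure",
    "engine fuel pump",
    "fuel pump leak",
]
vocab, tf = term_frequency_matrix(docs)
basis, coords = project(tf, k=2)
```

Here `basis` holds the subspace axes in term space and `coords` holds each document's position in the subspace. The dynamic update described in the abstract is not shown; the simplest stand-in would be to recompute the decomposition when the collection changes.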
Abstract:
A plurality of data from a first coordinate system is transformed into a plurality of metadata items, each comprising a location identifier and a value summarizing the number of data points in the first coordinate system that are associated with the corresponding location in a second coordinate system identified by the location identifier. A metadata item is formed only when a non-zero value is assigned to a location.
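One way to picture this transformation is a sparse count over grid cells. The sketch below assumes that the first coordinate system is a continuous 2-D plane and the second is a uniform grid; neither is stated in the abstract, and the `summarize` function and `cell_size` parameter are hypothetical.

```python
from collections import Counter

def summarize(points, cell_size):
    """Map 2-D points to grid cells; return {cell_id: count} for occupied cells only."""
    counts = Counter()
    for x, y in points:
        # The cell index serves as the location identifier in the second system.
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    # Cells that received no points never get an entry.
    return dict(counts)

points = [(0.2, 0.3), (0.8, 0.1), (2.5, 2.5), (2.9, 2.1)]
metadata = summarize(points, cell_size=1.0)
```

Because entries are created only on first use, empty locations cost no storage, which matches the rule that a metadata item exists only for a non-zero value.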
Abstract:
A text summarizer identifies relevant terms in a document, weights the terms and extracts one or more segments to produce a summary or abstract. The various terms in a particular document are weighted in relation to an existing document collection. A term weight computer computes term weights for terms in the document, and a threshold comparator compares the term weights to a threshold to determine whether the corresponding terms are relevant to the document collection. Next, a term weight summer adds the term weights for each occurrence of each relevant term in the various segments of the document, and a summation comparator compares the summations to identify a text summarization segment representative of the document. Optionally, relevant terms can be highlighted in the text summarization segment.
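The four stages (weighting, thresholding, per-segment summation, comparison) can be sketched as below. The abstract does not give the weighting formula, so a smoothed inverse document frequency is assumed here, and the direction of the threshold test (keep terms whose weight exceeds it) is also an assumption. All names and sample texts are hypothetical.

```python
import math
import re

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def summarize(segments, collection, threshold):
    """Return (best segment, relevant term weights, per-segment scores)."""
    n_docs = len(collection)
    doc_sets = [set(tokenize(d)) for d in collection]

    def weight(term):
        # Assumed weight: smoothed inverse document frequency in the collection.
        df = sum(term in s for s in doc_sets)
        return math.log((n_docs + 1) / (df + 1))

    terms = {t for seg in segments for t in tokenize(seg)}
    # Threshold comparator: keep only terms whose weight exceeds the threshold.
    relevant = {t: weight(t) for t in terms if weight(t) > threshold}
    # Term weight summer: add the weight once per occurrence in each segment.
    scores = [sum(relevant.get(t, 0.0) for t in tokenize(seg)) for seg in segments]
    # Summation comparator: the highest-scoring segment represents the document.
    best = max(range(len(scores)), key=scores.__getitem__)
    return segments[best], relevant, scores

collection = [
    "the pump was replaced",
    "the valve was inspected",
    "the crew reported the fault",
]
segments = ["The crew was briefed.", "The hydraulic pump leaked hydraulic fluid."]
best, relevant, scores = summarize(segments, collection, threshold=0.5)
```

With these inputs the second segment is selected, since it contains more occurrences of terms that are rare in the collection, while a word such as "the" falls below the threshold and contributes nothing.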
Abstract:
An ultrasonic stimulus pulse is emitted incident to a laminar structure and recorded as pulse data. Echoes resulting from the stimulus pulse are recorded as echo data. One or more vectors are derived by way of time-shifting the recorded pulse data by respective amounts and a matrix Φ is defined including the one or more vectors. An echo vector Y is defined using the recorded echo data. A solution vector X is determined in accordance with: Y=Φ*X, typically within a predetermined tolerance. B-scan display or other analysis of one or more distinct solution vectors enables user and/or automated identification and measurement of any anomalies within the laminate material.
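The relation Y=Φ*X can be illustrated with a small numerical sketch. The abstract does not say how X is solved for, so ordinary least squares is assumed here; the pulse shape, the shifts, and the reflection amplitudes are synthetic values invented for the illustration, not measured data.

```python
import numpy as np

def shifted_pulse_matrix(pulse, n_samples, shifts):
    """Build Phi: each column is the recorded pulse delayed by one shift (zero padded)."""
    phi = np.zeros((n_samples, len(shifts)))
    for col, shift in enumerate(shifts):
        end = min(n_samples, shift + len(pulse))
        phi[shift:end, col] = pulse[: end - shift]
    return phi

# Synthetic recorded pulse and candidate time shifts (in samples).
pulse = np.array([1.0, -0.5, 0.25])
shifts = list(range(8))
phi = shifted_pulse_matrix(pulse, n_samples=10, shifts=shifts)

# Synthetic echo vector Y: two reflections at shifts 2 and 5.
true_x = np.zeros(len(shifts))
true_x[2] = 0.8
true_x[5] = -0.3
y = phi @ true_x

# Assumed solver: least squares for X such that Y is approximately Phi @ X.
x, *_ = np.linalg.lstsq(phi, y, rcond=None)
residual = np.linalg.norm(phi @ x - y)
```

Each nonzero entry of `x` marks a time shift at which a scaled copy of the pulse appears in the echo, and `residual` is the quantity that would be compared against the predetermined tolerance.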
Abstract:
Improved methods, apparatus, and computer program products are provided for text differentiation, which identifies differences between documents with similar content, not merely similar terms, and generates results. Text differentiation finds non-similar, or different, content hidden within documents whose overall content is similar but not identical. It may be used to quickly identify key differences between similar documents.
Abstract:
A streaming text data comparator performs real-time text data mining on streaming text data. The comparator receives a streaming text data document and generates a vector representation of the term frequencies relating to an existing document collection. The comparator then transforms the term frequency vector into a projection in a precomputed multidimensional subspace that represents the original document collection. The comparator further calculates a relationship value representing the similarities or differences between the vector representation and the subspace, and compares the relationship value to a predetermined threshold to determine whether the streaming text data document is related to the original document collection. If the streaming text data document is related, the streaming text data comparator intercalates the new document into the document collection. If the new document is not related, the comparator may store or delete the unrelated document.
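The comparison step can be sketched as follows. The abstract does not define the relationship value, so this sketch assumes it is the fraction of the document vector's length that survives projection into the subspace; the subspace itself is assumed to come from a truncated SVD of the existing collection. The function names, the toy collection, and the threshold of 0.8 are hypothetical.

```python
import numpy as np

def relationship(term_vector, basis):
    """Fraction of the vector's length captured by the subspace, between 0 and 1."""
    norm = np.linalg.norm(term_vector)
    if norm == 0.0:
        return 0.0
    projection = basis.T @ term_vector  # coordinates in the precomputed subspace
    return float(np.linalg.norm(projection) / norm)

def route(term_vector, basis, threshold):
    """Decide whether a streamed document joins the collection."""
    return "add" if relationship(term_vector, basis) >= threshold else "set_aside"

# Toy collection: 4 terms (rows) x 3 documents (columns); terms 2 and 3 never occur.
collection = np.array([
    [2.0, 1.0, 1.0],
    [1.0, 2.0, 1.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])
u, s, vt = np.linalg.svd(collection, full_matrices=False)
basis = u[:, :2]  # precomputed 2-dimensional subspace

related = np.array([3.0, 1.0, 0.0, 0.0])    # uses only terms seen in the collection
unrelated = np.array([0.0, 0.0, 2.0, 1.0])  # uses only unseen terms
decisions = [route(v, basis, threshold=0.8) for v in (related, unrelated)]
```

A document built from terms the collection already uses projects almost entirely into the subspace and is added, while one built from unseen terms projects to nearly nothing and is set aside for storage or deletion.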