Abstract:
In one embodiment, a method to create a system to manage documents with sensitive or classified content comprises: extracting a list of text features; enabling interaction with the user developing the system to create a rule-based classifier based on the list of text features and one or more synonymous features; applying the rule-based classifier to one or more selected documents to tag a set of documents with the sensitive or classified information they contain; training a statistical text classifier using the tagged documents as a training set; applying the trained statistical text classifier to the training set; and reapplying the refined rule-based classifier to the one or more documents to tag a set of documents with the sensitive or classified information they contain. Other embodiments may be described.
Abstract:
A method, apparatus and computer program product are provided for identifying a part name within a data record. A maintenance expression is initially identified within a data record and a candidate part name string is then identified by identifying a head noun within a window that is positioned within the data record based upon the expression. In addition to identifying the head noun, a modifier may also be identified adjacent to or near any occurrence of the head noun in the course of identifying the candidate part name string. The candidate part name string may then be separately matched to respective ones of a plurality of standard names with each of a plurality of string matching techniques. The resulting potential matches are then analyzed to determine a best match.
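The matching step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the string matching techniques or say how the potential matches are analyzed, so the two scoring functions (`char_ratio`, `token_jaccard`) and the rule of summing the scores of each technique's nominee are hypothetical choices.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def char_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (assumed technique #1)."""
    return SequenceMatcher(None, a, b).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of whitespace-separated tokens (assumed technique #2)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0


# Hypothetical set of techniques; the abstract only says "a plurality".
TECHNIQUES = [char_ratio, token_jaccard]


def best_match(candidate: str, standard_names: list[str]) -> str:
    """Match the candidate separately with each technique, then pick a winner."""
    candidate = candidate.lower()
    totals = defaultdict(float)
    for technique in TECHNIQUES:
        # Each technique nominates its own top-scoring standard name.
        nominee = max(standard_names, key=lambda s: technique(candidate, s.lower()))
        totals[nominee] += technique(candidate, nominee.lower())
    # Assumed analysis rule: the nominee with the highest summed score wins.
    return max(totals, key=totals.get)


names = ["FUEL PUMP ASSEMBLY", "HYDRAULIC PUMP", "FUEL FILTER"]
print(best_match("fuel pump assy", names))
```

Running each technique separately before combining keeps a single weak technique from dominating the result, which seems to be the point of using several matchers.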
Abstract:
Methods and maintenance systems for use in analyzing data related to maintenance of at least one vehicle are disclosed. One example method includes receiving at least one fault message and receiving a maintenance event log for a vehicle, the maintenance event log including at least one maintenance event associated with the at least one fault message. The example method further includes automatically identifying a corrective action within a most recent maintenance event of the at least one maintenance event and storing a diagnostic entry including the at least one fault message and the identified corrective action, such that the diagnostic entry is retrievable from the computing device to evaluate a subsequent like fault message.
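A minimal sketch of the flow described above, under stated assumptions: the abstract does not say how the corrective action is identified, so the keyword pattern `ACTION_PATTERN`, the `MaintenanceEvent` record, and the dictionary shape of the diagnostic entry are all hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date

# Assumed heuristic: a corrective action starts with one of these verbs
# and runs to the end of the sentence.
ACTION_PATTERN = re.compile(
    r"\b(replaced|repaired|reset|cleaned|adjusted)\b[^.]*", re.IGNORECASE
)


@dataclass
class MaintenanceEvent:  # hypothetical record layout
    performed_on: date
    fault_message: str
    notes: str


def diagnostic_entry(fault_message, event_log):
    """Build an entry from the most recent event tied to the fault message."""
    events = [e for e in event_log if e.fault_message == fault_message]
    if not events:
        return None
    latest = max(events, key=lambda e: e.performed_on)
    match = ACTION_PATTERN.search(latest.notes)
    if match is None:
        return None
    return {
        "fault_message": fault_message,
        "corrective_action": match.group(0).strip(),
    }


log = [
    MaintenanceEvent(date(2020, 1, 5), "BLEED LEAK", "Inspected duct. Reset breaker."),
    MaintenanceEvent(date(2020, 2, 1), "BLEED LEAK", "Replaced bleed valve. Ops check good."),
]
print(diagnostic_entry("BLEED LEAK", log))
```

Storing the returned entries keyed by fault message would let a later, similar fault message look up the action that was taken last time.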
Abstract:
Technologies are described herein for providing automated analysis and summarization of free-form comments in survey response data. A number of topic words are identified from the survey response comments, and a numeric weight is calculated for each topic word that reflects the relevance of the topic word to each comment. Each topic word is associated with one or more topics, and the comments relevant to each topic are then determined based on the weights of the associated topic words in each comment. A report is generated which summarizes the topics and their relative importance in the survey response comments based upon the number of comments relevant to each.
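The weighting and counting steps above can be sketched as follows. The abstract does not specify the weighting formula, so the smoothed TF-IDF used here, the whitespace tokenizer, and the relevance threshold are assumptions, and the topic-to-word mapping is supplied by hand for illustration.

```python
import math
from collections import Counter


def topic_word_weights(comments, topic_words):
    """Return one {topic word: weight} dict per comment (assumed: smoothed TF-IDF)."""
    tokenized = [c.lower().split() for c in comments]
    n = len(tokenized)
    df = {w: sum(1 for toks in tokenized if w in toks) for w in topic_words}
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        size = max(len(toks), 1)  # avoid dividing by zero on empty comments
        weights.append(
            {w: (tf[w] / size) * (math.log((1 + n) / (1 + df[w])) + 1) for w in topic_words}
        )
    return weights


def summarize(comments, topics, threshold=0.0):
    """Count the comments relevant to each topic and rank topics by that count."""
    words = {w for ws in topics.values() for w in ws}
    weights = topic_word_weights(comments, words)
    counts = {
        topic: sum(1 for cw in weights if sum(cw[w] for w in ws) > threshold)
        for topic, ws in topics.items()
    }
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


comments = ["pay is too low", "great pay and great team", "my manager listens"]
topics = {"compensation": ["pay"], "management": ["manager"]}
print(summarize(comments, topics))
```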
Abstract:
A text summarizer identifies relevant terms in a document, weights the terms and extracts one or more segments to produce a summary or abstract. The various terms in a particular document are weighted in relation to an existing document collection. A term weight computer computes term weights for terms in the document, and a threshold comparator compares the term weights to a threshold to determine if the corresponding terms are relevant to the document collection. Next, a term weight summer adds the term weights for each occurrence of each relevant term in the various segments of the document, and a summation comparator compares the summations to identify a text summarization segment representative of the document. Optionally, relevant terms can be highlighted in the text summarization segment.
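The four stages above (term weight computer, threshold comparator, term weight summer, summation comparator) can be sketched as follows. The abstract does not give the weighting formula, so the smoothed inverse document frequency, the direction of the threshold test, and the sentence-based segmentation are assumptions made for illustration.

```python
import math
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(document, collection, threshold=1.5):
    """Return (best segment, relevant terms) for `document`."""
    segments = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    if not segments:
        return "", []
    n = len(collection)
    doc_sets = [set(tokenize(d)) for d in collection]

    # Term weight computer (assumed: smoothed inverse document frequency).
    def weight(term):
        df = sum(1 for s in doc_sets if term in s)
        return math.log((1 + n) / (1 + df)) + 1

    weights = {t: weight(t) for t in set(tokenize(document))}
    # Threshold comparator: keep only terms whose weight reaches the threshold.
    relevant = {t: w for t, w in weights.items() if w >= threshold}
    # Term weight summer: add the weight of every relevant-term occurrence per segment.
    scores = [sum(relevant.get(t, 0.0) for t in tokenize(seg)) for seg in segments]
    # Summation comparator: the segment with the largest sum is the summary.
    best = max(range(len(segments)), key=scores.__getitem__)
    return segments[best], sorted(relevant)


collection = ["the cat sat on the mat", "the dog sat on the log", "the bird sat on the wire"]
document = "The cat sat on the mat. The turbine blade cracked near the turbine hub."
segment, terms = summarize(document, collection)
print(segment)
```

Because repeated occurrences are summed, a segment that mentions a distinctive term several times outranks one that mentions it once.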
Abstract:
Provided are improved methods, apparatus, and computer program products for text differentiation which involves identifying differences between documents with similar content, not merely similar terms, and generating results. Text differentiation provides the ability to find non-similar, or different, content hidden within documents with similar overall content, but not exactly the same content. Text differentiation may be used to quickly identify key differences between similar documents.
Abstract:
A streaming text data comparator performs real-time text data mining on streaming text data. The comparator receives a streaming text data document and generates a vector representation of the term frequencies relating to an existing document collection. The comparator then transforms the term frequency vector into a projection in a precomputed multidimensional subspace that represents the original document collection. The comparator further calculates a relationship value representing the similarities or differences between the vector representation and the subspace, and compares the relationship value to a predetermined threshold to determine whether the streaming text data document is related to the original document collection. If the streaming text data document is related, the streaming text data comparator intercalates the new document into the document collection. If the new document is not related, the comparator may store or delete the unrelated document.
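The comparison described above can be sketched as follows. The abstract does not say how the subspace is precomputed or which relationship value is used, so this sketch assumes the subspace is supplied as orthonormal basis vectors and takes the cosine of the angle between the term frequency vector and its projection as the relationship value; the function names and the threshold of 0.7 are hypothetical.

```python
import math


def term_frequency_vector(text, vocabulary):
    """Count occurrences of each vocabulary term in the incoming document."""
    tokens = text.lower().split()
    return [float(tokens.count(term)) for term in vocabulary]


def relationship_value(vector, basis):
    """Cosine between `vector` and the subspace spanned by the orthonormal rows of `basis`."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return 0.0
    # Coordinates of the projection in the subspace.
    coords = [sum(b * x for b, x in zip(row, vector)) for row in basis]
    return math.sqrt(sum(c * c for c in coords)) / norm


def process(text, vocabulary, basis, collection, threshold=0.7):
    """Add the document to the collection if it is related; report the decision."""
    value = relationship_value(term_frequency_vector(text, vocabulary), basis)
    related = value >= threshold
    if related:
        collection.append(text)
    return related


vocabulary = ["engine", "fuel", "pump", "recipe"]
basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
collection = []
print(process("fuel pump engine fuel", vocabulary, basis, collection))
print(process("recipe recipe fuel", vocabulary, basis, collection))
```

In practice the basis would more likely come from a decomposition of the original collection's term-document matrix; the hand-written basis here only keeps the example self-contained.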
Abstract:
A method and apparatus are provided to efficiently generate a fulsome query in order to increase the recall and/or precision of a search. A method may construct a query by receiving one or more initial search terms and then defining a concept for each search term. In order to define a concept, the method may determine if a concept associated with a respective search term has been previously defined. In an instance in which a concept associated with a respective search term has been previously defined, the method at least initially utilizes the previously defined concept. However, in an instance in which a concept associated with a respective search term has not been previously defined, the method constructs the concept based on terms related to the respective search term. The method may then combine the concepts defined for the one or more search terms to generate the query.
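The concept lookup and query assembly above can be sketched as follows. The abstract does not state how concepts are represented or combined, so treating a concept as a list of terms joined with OR, joining concepts with AND, and saving newly built concepts for reuse are all assumptions; `concept_store` and `related_terms` are hypothetical inputs.

```python
def build_query(search_terms, concept_store, related_terms):
    """Define one concept per search term, then combine the concepts into a query."""
    clauses = []
    for term in search_terms:
        concept = concept_store.get(term)
        if concept is None:
            # No previously defined concept: construct one from related terms
            # and (assumption) remember it for later queries.
            concept = [term] + [t for t in related_terms.get(term, []) if t != term]
            concept_store[term] = concept
        clauses.append("(" + " OR ".join(f'"{t}"' for t in concept) + ")")
    return " AND ".join(clauses)


store = {"aircraft": ["aircraft", "airplane"]}
related = {"corrosion": ["rust", "oxidation"]}
print(build_query(["aircraft", "corrosion"], store, related))
```

Reusing a stored concept first is what makes the construction efficient: only terms without a prior definition pay the cost of gathering related terms.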
Abstract:
A system generates rules for classifying documents by building a vocabulary of features (e.g., words, phrases, acronyms, etc.) that are related to classifying concepts. The system includes a security document reader that receives a security document defining security concepts for a particular project and parses the security document to separate the security concepts. A vocabulary builder receives samples provided by the user that contain information related to the project. For each security concept, the vocabulary builder uses statistical analysis techniques to find features in the samples that are related to that concept. A rule generation assistant, for each security concept, generates rules based on the built vocabulary and the samples. The rule generation assistant uses statistical analysis techniques on the vocabulary and samples to determine features that optimally predict a particular concept. The rules can be used by a downgrader to process information to be distributed.