Secure information classification
    1.
    Granted invention patent
    Status: in force

    Publication No.: US08751424B1

    Publication date: 2014-06-10

    Application No.: US13327046

    Filing date: 2011-12-15

    IPC classification: G06F15/18

    Abstract: In one embodiment, a method to create a system to manage documents with sensitive or classified content comprises: extracting a list of text features; enabling interaction with the user developing the system to create a rule-based classifier based on the list of text features and one or more synonymous features; applying the rule-based classifier to one or more selected documents to tag a set of documents with the sensitive or classified information they contain; training a statistical text classifier using the tagged documents generated as a training set; applying the trained statistical text classifier to the training set; and reapplying the refined rule-based classifier to the one or more documents to tag a set of documents with the sensitive or classified information they contain. Other embodiments may be described.
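
The first stages of the method in this abstract (rule-based tagging, training a statistical classifier on the tagged set, and applying it back to the training set) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name a statistical model, so a small naive Bayes classifier is swapped in, and the rule table, labels, and sample documents are hypothetical. The rule-refinement and reapplication steps are not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical rules: each label maps to text features plus synonyms.
RULES = {
    "sensitive": {"payload", "warhead", "munition"},
    "public": {"brochure", "newsletter"},
}

def rule_tag(text, rules=RULES):
    """Return the first label whose features appear in the text, else None."""
    words = set(text.lower().split())
    for label, features in rules.items():
        if words & features:
            return label
    return None

def train(tagged):
    """Fit a multinomial naive Bayes model (an assumed choice) on (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in tagged:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the most probable label, using add-one smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical sample documents.
documents = [
    "payload telemetry from the flight test",
    "warhead telemetry report",
    "company newsletter for staff",
    "visitor brochure and newsletter",
    "telemetry summary for review",  # matches no rule, so it stays untagged
]

# Step 1: tag documents with the rule-based classifier.
tagged = [(d, rule_tag(d)) for d in documents if rule_tag(d) is not None]
# Step 2: train the statistical classifier on the tagged set.
model = train(tagged)
# Step 3: apply it back to the training set; disagreements hint at rules to refine.
disagreements = [(t, l) for t, l in tagged if classify(t, model) != l]
# The trained model can also label a document the rules missed.
untagged_label = classify("telemetry summary for review", model)
```

In this sketch, any training document on which the two classifiers disagree would be a candidate for review by the user developing the rules.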

    Automated analysis and summarization of comments in survey response data
    4.
    Granted invention patent
    Status: in force

    Publication No.: US08577884B2

    Publication date: 2013-11-05

    Application No.: US12119697

    Filing date: 2008-05-13

    IPC classification: G06F17/30

    CPC classification: G06Q30/02

    Abstract: Technologies are described herein for providing automated analysis and summarization of free-form comments in survey response data. A number of topic words are identified from the survey response comments, and a numeric weight is calculated for each topic word that reflects the relevance of the topic word to each comment. Each topic word is associated with one or more topics, and the comments relevant to each topic are then determined based on the weights of the associated topic words in each comment. A report is generated that summarizes the topics and their relative importance in the survey response comments based upon the number of comments relevant to each.
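
The weighting and reporting flow in this abstract can be illustrated with a short sketch. The abstract does not specify how the numeric weight is computed, so a TF-IDF-style weight is assumed here; the topic map, threshold, and sample comments are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    cleaned = (w.strip(".,!?;:").lower() for w in text.split())
    return [w for w in cleaned if w]

def topic_word_weights(comments, topic_words):
    """Return one {topic_word: weight} dict per comment (assumed TF-IDF-style weight)."""
    tokenized = [tokenize(c) for c in comments]
    n = len(comments)
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks) & topic_words)
    weights = []
    for toks in tokenized:
        counts = Counter(t for t in toks if t in topic_words)
        weights.append(
            {w: c * (math.log(n / doc_freq[w]) + 1.0) for w, c in counts.items()}
        )
    return weights

def summarize_topics(comments, topics, threshold=1.0):
    """Count comments relevant to each topic and rank topics by that count."""
    topic_words = set().union(*topics.values())
    weights = topic_word_weights(comments, topic_words)
    report = {}
    for topic, words in topics.items():
        report[topic] = sum(
            1 for w in weights if sum(w.get(t, 0.0) for t in words) >= threshold
        )
    return sorted(report.items(), key=lambda item: item[1], reverse=True)

# Hypothetical survey comments and topic-word associations.
comments = [
    "The salary is too low.",
    "Great benefits and salary!",
    "My manager is supportive.",
    "Manager never listens.",
    "Benefits are poor.",
]
topics = {"compensation": {"salary", "benefits"}, "management": {"manager"}}
report = summarize_topics(comments, topics)
```

Ranking by the number of relevant comments mirrors the report described in the abstract; a different weight or threshold would change which comments count as relevant.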

    Query-based text summarization
    5.
    Granted invention patent
    Status: in force

    Publication No.: US07752204B2

    Publication date: 2010-07-06

    Application No.: US11281499

    Filing date: 2005-11-18

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30719 Y10S707/917

    Abstract: A text summarizer identifies relevant terms in a document, weights the terms, and extracts one or more segments to produce a summary or abstract. The various terms in a particular document are weighted in relation to an existing document collection. A term weight computer computes term weights for terms in the document, and a threshold comparator compares the term weights to determine if the corresponding terms are relevant to the document collection. Next, a term weight summer adds the term weights for each occurrence of each relevant term in the various segments of the document, and a summation comparator compares the summations to identify a text summarization segment representative of the document. Optionally, relevant terms can be highlighted in the term summarization segment.
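
The four components named in this abstract (term weight computer, threshold comparator, term weight summer, summation comparator) can be sketched roughly as below. The abstract does not give the weighting formula, so this sketch assumes a term's weight is the fraction of collection documents that contain it; the stopword list, threshold, and sample texts are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list so that function words do not dominate.
STOPWORDS = {"the", "a", "an", "its", "was", "of", "and"}

def tokens(text):
    """Lowercased alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def term_weights(document, collection):
    """Term weight computer: assumed weight = share of collection docs containing the term."""
    doc_freq = Counter()
    for doc in collection:
        doc_freq.update(set(tokens(doc)))
    return {t: doc_freq[t] / len(collection) for t in set(tokens(document))}

def summarize(document, collection, threshold=0.5):
    """Return the highest-scoring segment and the relevant terms used to score it."""
    weights = term_weights(document, collection)
    # Threshold comparator: keep only terms deemed relevant to the collection.
    relevant = {t: w for t, w in weights.items() if w >= threshold}
    # Segments are sentences here; the abstract leaves the segment unit open.
    segments = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

    def score(segment):
        # Term weight summer: add the weight for every occurrence of a relevant term.
        return sum(relevant.get(t, 0.0) for t in tokens(segment))

    # Summation comparator: the segment with the largest sum represents the document.
    return max(segments, key=score), relevant

# Hypothetical collection and incoming document.
collection = [
    "Satellite imagery analysis",
    "Satellite orbit prediction",
    "Orbit debris tracking",
]
document = (
    "The weather was pleasant. The satellite adjusted its orbit. "
    "Lunch was served late."
)
summary, relevant_terms = summarize(document, collection)
```

The returned relevant terms could also drive the optional highlighting step mentioned at the end of the abstract.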

    Text differentiation methods, systems, and computer program products for content analysis
    6.
    Granted invention patent
    Status: in force

    Publication No.: US07403932B2

    Publication date: 2008-07-22

    Application No.: US11173600

    Filing date: 2005-07-01

    IPC classification: G06N5/00

    CPC classification: G06F17/2211 G06F17/30719

    Abstract: Provided are improved methods, apparatus, and computer program products for text differentiation, which involves identifying differences between documents with similar content, not merely similar terms, and generating results. Text differentiation provides the ability to find non-similar, or different, content hidden within documents with similar overall content, but not exactly the same content. Text differentiation may be used to quickly identify key differences between similar documents.

    Streaming text data mining method and apparatus using multidimensional subspaces
    8.
    Granted invention patent
    Status: in force

    Publication No.: US08234279B2

    Publication date: 2012-07-31

    Application No.: US11246195

    Filing date: 2005-10-11

    IPC classification: G06F7/00 G06F17/00

    CPC classification: G06F17/30705 G06F17/30616

    Abstract: A streaming text data comparator performs real-time text data mining on streaming text data. The comparator receives a streaming text data document and generates a vector representation of the term frequencies relating to an existing document collection. The comparator then transforms the term frequency vector into a projection in a precomputed multidimensional subspace that represents the original document collection. The comparator further calculates a relationship value representing the similarities or differences between the vector representation and the subspace, and compares the relationship value to a predetermined threshold to determine whether the streaming text data document is related to the original document collection. If the streaming text data document is related, the streaming text data comparator intercalates the new document into the document collection. If the new document is not related, the comparator may store or delete the unrelated document.
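
The comparison step in this abstract can be sketched as below, under several assumptions that the abstract does not state: the subspace is given as a precomputed orthonormal basis (for example from a truncated SVD of the collection), and the relationship value is taken to be the cosine between the term frequency vector and its projection. The vocabulary, basis, and threshold are hypothetical.

```python
import math

# Hypothetical vocabulary and precomputed orthonormal basis (one row per dimension).
VOCABULARY = ["satellite", "orbit", "recipe", "flour"]
BASIS = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]

def term_frequency_vector(text, vocabulary=VOCABULARY):
    """Count occurrences of each vocabulary term in the incoming document."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]

def project(vector, basis=BASIS):
    """Coordinates of the vector in the subspace spanned by the orthonormal basis rows."""
    return [sum(v * b for v, b in zip(vector, row)) for row in basis]

def relationship_value(vector, basis=BASIS):
    """Assumed measure: norm of the projection divided by the norm of the vector."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return 0.0  # a document with no vocabulary terms is treated as unrelated
    coords = project(vector, basis)
    return math.sqrt(sum(c * c for c in coords)) / norm

def is_related(text, threshold=0.7):
    """Compare the relationship value with a predetermined threshold."""
    return relationship_value(term_frequency_vector(text)) >= threshold

collection = []
for incoming in ["satellite orbit orbit", "recipe flour flour satellite"]:
    if is_related(incoming):
        collection.append(incoming)  # related: add the new document to the collection
```

A value near 1 means the document lies almost entirely inside the subspace; values near 0 mean most of its term mass falls outside it.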

    Method and apparatus for constructing a query based upon concepts associated with one or more search terms
    9.
    Granted invention patent
    Status: in force

    Publication No.: US09589053B1

    Publication date: 2017-03-07

    Application No.: US12971799

    Filing date: 2010-12-17

    IPC classification: G06F17/30 G06Q30/02

    CPC classification: G06F17/30864 G06Q30/02

    Abstract: A method and apparatus are provided to efficiently generate a fulsome query in order to increase the recall and/or precision provided by the search. A method may construct a query by receiving the one or more initial search terms and then defining a concept for each search term. In order to define a concept, the method may determine if a concept associated with a respective search term has been previously defined. In an instance in which a concept associated with a respective search term has been previously defined, the method at least initially utilizes the previously defined concept. However, in an instance in which a concept associated with a respective search term has not been previously defined, the method constructs the concept based on terms related to the respective search term. The method may then combine the concepts defined for the one or more search terms to generate the query.
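
The lookup-or-construct logic in this abstract can be sketched as follows. The concept store, the related-terms table, and the Boolean query syntax (OR within a concept, AND across concepts) are all assumptions made for illustration; the abstract does not specify how concepts are stored or how they are combined.

```python
def build_query(search_terms, concept_store, related_terms):
    """Define a concept per search term, reusing stored concepts, then combine them."""
    clauses = []
    for term in search_terms:
        concept = concept_store.get(term)
        if concept is None:
            # Not previously defined: construct the concept from related terms.
            concept = [term] + [t for t in related_terms.get(term, []) if t != term]
            concept_store[term] = concept  # remember it for later queries
        # Assumed syntax: any term of the concept may match.
        clauses.append("(" + " OR ".join(f'"{t}"' for t in concept) + ")")
    # Assumed syntax: every concept must match.
    return " AND ".join(clauses)

# Hypothetical data: "car" already has a concept, "repair" does not.
concept_store = {"car": ["car", "automobile"]}
related_terms = {"repair": ["fix", "maintenance"]}
query = build_query(["car", "repair"], concept_store, related_terms)
```

Storing each newly built concept means a later query containing the same term takes the "previously defined" branch.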

    Automated rule generation for a secure downgrader
    10.
    Granted invention patent
    Status: in force

    Publication No.: US08272064B2

    Publication date: 2012-09-18

    Application No.: US11280610

    Filing date: 2005-11-16

    IPC classification: H04L29/06

    Abstract: A system generates rules for classifying documents by building a vocabulary of features (e.g., words, phrases, acronyms, etc.) that are related to classifying concepts. The system includes a security document reader that receives a security document defining security concepts for a particular project and parses the security document to separate the security concepts. A vocabulary builder receives samples provided by the user that contain information related to the project. For each security concept, the vocabulary builder uses statistical analysis techniques to find features in the samples that are related to that concept. A rule generation assistant, for each security concept, generates rules based on the built vocabulary and the samples. The rule generation assistant uses statistical analysis techniques on the vocabulary and samples to determine features that optimally predict a particular concept. The rules can be used by a downgrader to process information to be distributed.
