Methods for obtaining improved text similarity measures which replace similar characters with a string pattern representation by using a semantic data tree
    1.
    Granted patent (status: lapsed)

    Publication number: US07945525B2

    Publication date: 2011-05-17

    Application number: US11937550

    Filing date: 2007-11-09

    IPC classification: G06F17/00

    Abstract: The embodiments of the invention provide methods for obtaining improved text similarity measures. More specifically, a method of measuring similarity between at least two electronic documents begins by identifying similar terms between the electronic documents. Similarity between those terms is based on patterns, which can include word patterns, letter patterns, numeric patterns, and/or alphanumeric patterns, and the identification step can detect multiple pattern types between the documents. Basing similarity on patterns also identifies terms in the documents that fall within a category of a hierarchy: the identification reviews a hierarchical data tree whose nodes represent terms within the electronic documents, with specific terms at lower nodes and general terms at higher nodes.
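
The two ideas in this abstract (replacing characters with a pattern representation, and generalizing terms by walking up a hierarchical tree) can be illustrated with a small sketch. Everything here is an assumption for illustration, not the patented method: the character-class scheme (letters to `A`, digits to `9`, runs collapsed), the toy `PARENT` taxonomy, the `levels_up` parameter, and the use of Jaccard overlap as the final similarity score are all hypothetical choices.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical toy taxonomy: child -> parent (None marks the root).
# Lower nodes hold specific terms, higher nodes hold general terms.
PARENT: Dict[str, Optional[str]] = {
    "poodle": "dog", "beagle": "dog", "siamese": "cat",
    "dog": "animal", "cat": "animal", "animal": None,
}


def pattern_of(term: str) -> str:
    """Replace each character by its class (letter -> 'A', digit -> '9'),
    keep other characters, and collapse runs of the same class."""
    out = []
    for ch in term:
        cls = "A" if ch.isalpha() else "9" if ch.isdigit() else ch
        if not out or out[-1] != cls:
            out.append(cls)
    return "".join(out)


def generalize(term: str, parent: Dict[str, Optional[str]], levels_up: int) -> str:
    """Walk up to `levels_up` edges toward the root of the tree."""
    for _ in range(levels_up):
        up = parent.get(term)
        if up is None:
            break
        term = up
    return term


def features(doc: str, parent: Dict[str, Optional[str]], levels_up: int) -> Set[str]:
    """Map every token to a comparable feature."""
    feats: Set[str] = set()
    for tok in re.findall(r"\w+", doc.lower()):
        if tok in parent:  # term known to the hierarchy: use its ancestor
            feats.add("T:" + generalize(tok, parent, levels_up))
        elif any(c.isdigit() for c in tok):  # code or number: use its pattern
            feats.add("P:" + pattern_of(tok))
        else:  # ordinary word: compare literally
            feats.add("W:" + tok)
    return feats


def similarity(a: str, b: str, parent=PARENT, levels_up: int = 1) -> float:
    """Jaccard overlap of the two documents' feature sets."""
    fa, fb = features(a, parent, levels_up), features(b, parent, levels_up)
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)


doc_a = "Order A1234 for a poodle"
doc_b = "Order B5678 for a beagle"
print(similarity(doc_a, doc_b, levels_up=0))  # 4/6: the breeds still differ
print(similarity(doc_a, doc_b, levels_up=1))  # 1.0: both generalize to "dog"
```

The order codes `A1234` and `B5678` never match as literal strings, but both reduce to the pattern `A9`; likewise `poodle` and `beagle` only match once each is replaced by its parent node. Raising `levels_up` trades precision for recall.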

    METHODS FOR OBTAINING IMPROVED TEXT SIMILARITY MEASURES
    2.
    Patent application (status: lapsed)

    Publication number: US20090125805A1

    Publication date: 2009-05-14

    Application number: US11937550

    Filing date: 2007-11-09

    IPC classification: G06F17/24

    Abstract: Identical to the abstract of US07945525B2 above; both publications stem from application US11937550.

    DATA OBFUSCATION OF TEXT DATA USING ENTITY DETECTION AND REPLACEMENT
    4.
    Patent application (status: lapsed)

    Publication number: US20080118150A1

    Publication date: 2008-05-22

    Application number: US11562559

    Filing date: 2006-11-22

    IPC classification: G06K9/34

    Abstract: A data obfuscation method, apparatus and computer program product are disclosed in which at least selected text entities, such as words or abbreviations in a document, are obfuscated to prevent the disclosure of private information if the document is disclosed. A user establishes various configuration parameters for the text entities to be obfuscated. The document is processed, and text entities matching the configuration parameters are tagged for obfuscation. The tagged entities are then replaced in the document with obfuscating text, which can be derived from a hash table. The hash table may also be used to provide a reverse obfuscation method by which the original data can be restored to an obfuscated document.
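
The tag, substitute, and reverse cycle described in this abstract can be sketched as follows. This is a hypothetical illustration, not the patented implementation: the `Obfuscator` class, the regex-per-entity-type configuration, the salted SHA-256 token format, and the plain `dict` used as the reversal table are all assumptions.

```python
import hashlib
import re
from typing import Dict


class Obfuscator:
    """Sketch: tag configured entities, swap them for hash-derived tokens,
    and keep a lookup table so the swap can be reversed."""

    def __init__(self, patterns: Dict[str, str], salt: str = "example-salt") -> None:
        # One combined regex with a named group per entity type, so the text
        # is scanned once and inserted tokens are never re-matched.
        self.regex = re.compile(
            "|".join(f"(?P<{kind}>{rx})" for kind, rx in patterns.items())
        )
        self.salt = salt  # demo value only; a real deployment would keep it secret
        self.table: Dict[str, str] = {}  # token -> original text

    def _token(self, kind: str, value: str) -> str:
        digest = hashlib.sha256((self.salt + value).encode("utf-8")).hexdigest()
        return f"<{kind}_{digest[:8]}>"

    def obfuscate(self, text: str) -> str:
        def replace(match: re.Match) -> str:
            original = match.group(0)
            token = self._token(match.lastgroup, original)  # lastgroup = entity type
            self.table[token] = original
            return token
        return self.regex.sub(replace, text)

    def restore(self, text: str) -> str:
        for token, original in self.table.items():
            text = text.replace(token, original)
        return text


# Hypothetical user configuration: entity type -> pattern to tag.
ob = Obfuscator({
    "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "SSN": r"\d{3}-\d{2}-\d{4}",
})
original = "Contact jane.doe@example.com (SSN 123-45-6789)."
masked = ob.obfuscate(original)
print(masked)
print(ob.restore(masked) == original)
```

Because the token is a deterministic hash of the value, the same entity is replaced by the same token everywhere in a document, which keeps the obfuscated text internally consistent. The reversal table can recover the original data, so in practice it would need the same protection as that data.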

    AUTOMATICALLY ASSESSING DOCUMENT QUALITY FOR DOMAIN-SPECIFIC DOCUMENTATION
    6.
    Patent application (status: pending, published)

    Publication number: US20120123767A1

    Publication date: 2012-05-17

    Application number: US12944970

    Filing date: 2010-11-12

    IPC classification: G06F17/27

    CPC classification: G06F17/274

    Abstract: Methods and arrangements for document quality assessment. Documents are accepted, and a quality specification containing predetermined quality criteria is assimilated. Each document is assessed against the predetermined quality criteria and assigned a quality score, which is a function of the positive and negative attributes assessed for that document.
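
One simple way to read "a function of positive and negative attributes" is a weighted sum in which penalizing attributes carry negative weights. The sketch below assumes exactly that; the `Criterion` type, the three example criteria, their weights, and the 30-word sentence limit are hypothetical stand-ins for a real quality specification.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    name: str
    weight: float                    # positive rewards, negative penalizes
    measure: Callable[[str], float]  # degree to which the attribute holds, in [0, 1]


def long_sentence_ratio(doc: str, limit: int = 30) -> float:
    """Fraction of sentences longer than `limit` words."""
    sentences = [s for s in re.split(r"[.!?]+", doc) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) > limit for s in sentences) / len(sentences)


# Hypothetical quality specification, e.g. for installation guides.
SPEC: List[Criterion] = [
    Criterion("covers_required_topics", 2.0,
              lambda d: sum(t in d.lower() for t in ("install", "usage")) / 2),
    Criterion("contains_todo", -1.5, lambda d: float("todo" in d.lower())),
    Criterion("overlong_sentences", -1.0, long_sentence_ratio),
]


def quality_score(doc: str, spec: List[Criterion]) -> float:
    """Weighted sum: positive attributes raise the score, negative ones lower it."""
    return sum(c.weight * c.measure(doc) for c in spec)


good = "Install the package with pip. Usage is shown below."
bad = "TODO: write usage notes."
print(quality_score(good, SPEC))  # 2.0
print(quality_score(bad, SPEC))   # -0.5
```

Keeping the criteria as data rather than code paths mirrors the abstract's separation between the quality specification and the assessment step: a different domain would swap in a different `SPEC` list without touching `quality_score`.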

    Data obfuscation of text data using entity detection and replacement
    8.
    Granted patent (status: lapsed)

    Publication number: US07724918B2

    Publication date: 2010-05-25

    Application number: US11562559

    Filing date: 2006-11-22

    IPC classification: G06K9/00

    Abstract: Identical to the abstract of US20080118150A1 above; both publications stem from application US11562559.

    DATA OBFUSCATION OF TEXT DATA USING ENTITY DETECTION AND REPLACEMENT
    9.
    Patent application (status: lapsed)

    Publication number: US20080181396A1

    Publication date: 2008-07-31

    Application number: US12061783

    Filing date: 2008-04-03

    IPC classification: H04L9/28, G06F17/00

    Abstract: Identical to the abstract of US20080118150A1 above.
