Content grouping systems and methods
    3.
    Granted invention patent
    Content grouping systems and methods (status: lapsed)

    Publication number: US08577887B2

    Publication date: 2013-11-05

    Application number: US12639768

    Filing date: 2009-12-16

    IPC classification: G06F17/30

    CPC classification: G06F17/30911

    Abstract: A method of grouping a plurality of media content is provided. The method includes converting at least a portion of the media content into at least one document object model (“DOM”) using a processor. The DOM can include a plurality of block elements, each comprising at least one content object. The method includes apportioning the content objects into a relevant portion and an irrelevant portion and extracting a set of keywords, the set comprising at least one keyword, within the relevant portion of the content objects. The method includes apportioning the relevant portion of the content objects into a related portion and an unrelated portion using at least a portion of the set of keywords and grouping the related portion of the content to provide a group of related content.

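The four steps named in the abstract (block elements, relevance split, keyword extraction, related grouping) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: blocks are plain strings rather than DOM nodes, and the relevance rule (a minimum token count), the stopword list, and the keyword-overlap threshold are all hypothetical choices made for the example.

```python
import re
from collections import Counter

# Hypothetical stopword list; the abstract does not specify one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "by", "with"}

def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def split_relevant(blocks, min_words=4):
    """Assumed relevance rule: a block is relevant if it has at least min_words tokens."""
    relevant = [b for b in blocks if len(tokenize(b)) >= min_words]
    irrelevant = [b for b in blocks if len(tokenize(b)) < min_words]
    return relevant, irrelevant

def extract_keywords(blocks, top_n=3):
    """Assumed keyword rule: the top_n most frequent tokens across the relevant blocks."""
    counts = Counter(w for b in blocks for w in tokenize(b))
    return {w for w, _ in counts.most_common(top_n)}

def group_related(blocks, keywords, min_hits=1):
    """A block is related if it contains at least min_hits of the keywords."""
    related = [b for b in blocks if len(keywords & set(tokenize(b))) >= min_hits]
    unrelated = [b for b in blocks if b not in related]
    return related, unrelated

# Made-up block contents standing in for the text of DOM block elements.
blocks = [
    "Solar panels convert sunlight into electricity for homes",
    "Home | Login | Share",
    "New solar panels lower electricity bills in summer",
    "Local bakery wins regional bread baking award",
]
relevant, irrelevant = split_relevant(blocks)
keywords = extract_keywords(relevant)
related, unrelated = group_related(relevant, keywords)
```

In this toy run the navigation strip is discarded as irrelevant, the two solar-energy blocks end up in the related group, and the bakery block is relevant but unrelated.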

    CONTENT GROUPING SYSTEMS AND METHODS
    4.
    Invention patent application
    CONTENT GROUPING SYSTEMS AND METHODS (status: lapsed)

    Publication number: US20110145249A1

    Publication date: 2011-06-16

    Application number: US12639768

    Filing date: 2009-12-16

    IPC classification: G06F17/30

    CPC classification: G06F17/30911

    Abstract: A method of grouping a plurality of media content is provided. The method includes converting at least a portion of the media content into at least one document object model (“DOM”) using a processor. The DOM can include a plurality of block elements, each comprising at least one content object. The method includes apportioning the content objects into a relevant portion and an irrelevant portion and extracting a set of keywords, the set comprising at least one keyword, within the relevant portion of the content objects. The method includes apportioning the relevant portion of the content objects into a related portion and an unrelated portion using at least a portion of the set of keywords and grouping the related portion of the content to provide a group of related content.


    SYSTEMS AND METHODS FOR ADDING COMMERCIAL CONTENT TO PRINTOUTS
    5.
    Invention patent application
    SYSTEMS AND METHODS FOR ADDING COMMERCIAL CONTENT TO PRINTOUTS (status: under examination, published)

    Publication number: US20150138605A1

    Publication date: 2015-05-21

    Application number: US13821356

    Filing date: 2010-09-21

    IPC classification: G06Q30/02, G06F3/12, G06K15/02

    Abstract: Systems, devices and methods are provided which relate to detecting a print command on a client computer, the print command reflecting an interest to print content of an electronic document, accessible by a client computer, as a hard copy printout. One method includes analyzing the electronic document content to determine its underlying subject matter, identifying commercial content relevant to the underlying subject matter, and creating and formatting a new, printable document that includes the electronic document content and the identified commercial content.


    Document Key Phrase Extraction Method
    6.
    Invention patent application
    Document Key Phrase Extraction Method (status: in force)

    Publication number: US20120047149A1

    Publication date: 2012-02-23

    Application number: US13264806

    Filing date: 2009-05-12

    IPC classification: G06F17/30

    Abstract: A computer-implemented method of extracting key phrases from a document is disclosed comprising the steps of accessing a repository comprising linked subjects, the repository comprising first and second data structures representing the relationship between said subjects using different representation criteria; pruning the first data structure by removing links between subjects based on a further relationship between said subjects in the second data structure; matching phrases in said document to subjects in the pruned first data structure; further pruning the pruned first data structure by removing unmatched subjects that are not linked to matched subjects; determining a ranking for each matched subject; and selecting key phrases using the determined subject rankings. A computer program for implementing the steps of this method when executed on a computer is also disclosed.

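The sequence of steps in the abstract (prune, match, prune again, rank, select) can be sketched as below. Everything concrete here is an assumption made for illustration: the first data structure is modelled as a link graph and the second as a subject-to-category map, a link is pruned when the two subjects share no category, matching is a case-insensitive substring test, and the ranking is simply the number of links a matched subject retains. The abstract does not say which criteria the actual method uses.

```python
def prune_links(links, categories):
    """Keep a link only if both subjects share a category (assumed pruning criterion)."""
    return {
        s: {t for t in nbrs if categories.get(s, set()) & categories.get(t, set())}
        for s, nbrs in links.items()
    }

def match_subjects(document, graph):
    """Subjects whose name occurs in the document text (assumed matching rule)."""
    text = document.lower()
    return {s for s in graph if s.lower() in text}

def prune_unmatched(graph, matched):
    """Drop unmatched subjects that are not linked to any matched subject."""
    keep = {s for s in graph if s in matched or graph[s] & matched}
    return {s: graph[s] & keep for s in keep}

def rank_subjects(graph, matched):
    """Assumed ranking: more retained links first, ties broken alphabetically."""
    return sorted(matched, key=lambda s: (-len(graph[s]), s))

def key_phrases(document, links, categories, top_k=2):
    graph = prune_links(links, categories)
    matched = match_subjects(document, graph)
    graph = prune_unmatched(graph, matched)
    return rank_subjects(graph, matched)[:top_k]

# Hypothetical repository: a symmetric link graph plus a category map.
links = {
    "neural network": {"deep learning", "backpropagation", "banana"},
    "deep learning": {"neural network", "backpropagation"},
    "backpropagation": {"neural network", "deep learning"},
    "banana": {"neural network", "fruit"},
    "fruit": {"banana"},
}
categories = {
    "neural network": {"ml"},
    "deep learning": {"ml"},
    "backpropagation": {"ml"},
    "banana": {"food"},
    "fruit": {"food"},
}
document = "A neural network is trained with deep learning methods."
phrases = key_phrases(document, links, categories)
```

In this example the cross-category link between "neural network" and "banana" is removed in the first pruning pass, and "banana" and "fruit" are dropped in the second because neither is matched nor linked to a matched subject.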

    Document key phrase extraction method
    7.
    Granted invention patent
    Document key phrase extraction method (status: in force)

    Publication number: US08935260B2

    Publication date: 2015-01-13

    Application number: US13264806

    Filing date: 2009-05-12

    IPC classification: G06F17/30, G06F17/27

    Abstract: A computer-implemented method of extracting key phrases from a document is disclosed comprising the steps of accessing a repository comprising linked subjects, the repository comprising first and second data structures representing the relationship between said subjects using different representation criteria; pruning the first data structure by removing links between subjects based on a further relationship between said subjects in the second data structure; matching phrases in said document to subjects in the pruned first data structure; further pruning the pruned first data structure by removing unmatched subjects that are not linked to matched subjects; determining a ranking for each matched subject; and selecting key phrases using the determined subject rankings. A computer program for implementing the steps of this method when executed on a computer is also disclosed.


    METHOD FOR KEYWORD EXTRACTION
    8.
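    Invention patent application
    METHOD FOR KEYWORD EXTRACTION (status: under examination, published)

    Publication number: US20130036076A1

    Publication date: 2013-02-07

    Application number: US13641054

    Filing date: 2010-04-14

    IPC classification: G06F17/30, G06F15/18

    CPC classification: G06F16/313, G06N5/04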

    Abstract: Presented is a method of extracting keywords. The method includes obtaining a corpus of documents, determining a first set of words that appear as keywords in a document present in the corpus of documents, determining a second set of words that appear in the corpus of documents but do not necessarily appear as keywords in the document, and determining a final set of keywords for the document by combining the first set of words with the second set of words.

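One way to read the two-set scheme in the abstract is sketched below. The abstract does not say how either set is determined, so both rules are assumptions: the first set is taken to be the document's top TF-IDF terms, and the second set is taken to be the words that occur most often in other corpus documents sharing a first-set keyword. The final set is the union of the two.

```python
import math
from collections import Counter

def tfidf_keywords(doc, corpus, top_n=2):
    """First set (assumed rule): the document's top_n terms by TF-IDF."""
    tf = Counter(doc)
    n = len(corpus)

    def score(word):
        df = sum(1 for d in corpus if word in d)  # df >= 1 because doc is in corpus
        return tf[word] * math.log(n / df)

    return set(sorted(tf, key=lambda w: (-score(w), w))[:top_n])

def related_corpus_words(doc, corpus, first_set, top_n=1):
    """Second set (assumed rule): words absent from doc that occur most often
    in other documents sharing at least one first-set keyword."""
    counts = Counter()
    for other in corpus:
        if other is not doc and first_set & set(other):
            counts.update(set(other) - set(doc))
    return set(sorted(counts, key=lambda w: (-counts[w], w))[:top_n])

def final_keywords(doc, corpus):
    first = tfidf_keywords(doc, corpus)
    second = related_corpus_words(doc, corpus, first)
    return first | second

# Made-up, pre-tokenized corpus; the first entry is the document of interest.
corpus = [
    ["solar", "solar", "solar", "panel", "panel", "cost"],
    ["solar", "photovoltaic", "inverter"],
    ["panel", "photovoltaic", "mount"],
    ["cost", "bakery", "flour"],
    ["bakery", "bread"],
]
keywords = final_keywords(corpus[0], corpus)
```

Here "photovoltaic" never appears in the document itself, yet it enters the final set because it co-occurs with the document's own keywords elsewhere in the corpus.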

    System and Method for Automatically Extracting Metadata from Unstructured Electronic Documents
    9.
    Invention patent application
    System and Method for Automatically Extracting Metadata from Unstructured Electronic Documents (status: in force)

    Publication number: US20120278705A1

    Publication date: 2012-11-01

    Application number: US13258484

    Filing date: 2010-01-18

    IPC classification: G06F17/21

    CPC classification: G06F17/2745, G06F17/30722

    Abstract: A system and method for automatically extracting metadata from unstructured electronic documents is disclosed. In one embodiment, the unstructured electronic document is converted into a plain text document. Further, a document header of the unstructured electronic document is extracted from the plain text document using a rule-based document header extractor, where the rule-based document header extractor may be based on a rule that includes determining the ratio of the number of words with capitalized initial letters in a text line to the total number of words in that text line in the plain text document. Moreover, metadata is extracted from the extracted document header using a heuristic approach.

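The capitalization-ratio rule described in the abstract is easy to illustrate. In the sketch below, the ratio itself follows the abstract; the 0.6 threshold, the choice to treat the leading run of high-ratio lines as the header, and the title/authors heuristic are assumptions added for the example, and the sample text and names are made up.

```python
def capitalized_ratio(line):
    """Words with a capitalized initial letter divided by all words in the line."""
    words = line.split()
    if not words:
        return 0.0
    return sum(1 for w in words if w[0].isupper()) / len(words)

def extract_header(text, threshold=0.6):
    """Assumed rule: the header is the leading run of non-empty lines
    whose capitalized ratio is at least threshold."""
    header = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between header lines
        if capitalized_ratio(line) < threshold:
            break  # first body-like line ends the header
        header.append(line.strip())
    return header

def header_metadata(header):
    """Hypothetical heuristic: first header line is the title, second lists the authors."""
    meta = {}
    if header:
        meta["title"] = header[0]
    if len(header) > 1:
        meta["authors"] = [a.strip() for a in header[1].split(" and ")]
    return meta

sample = (
    "Automatic Metadata Extraction From Documents\n"
    "Jane Doe and John Smith\n"
    "\n"
    "This paper describes a rule-based approach to header detection.\n"
)
header = extract_header(sample)
meta = header_metadata(header)
```

The title line has a ratio of 1.0 and the author line 0.8, so both pass the threshold, while the first body sentence (one capitalized word out of nine) ends the header.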

    Classification of a document according to a weighted search tree created by genetic algorithms
    10.
    Granted invention patent
    Classification of a document according to a weighted search tree created by genetic algorithms (status: in force)

    Publication number: US08639643B2

    Publication date: 2014-01-28

    Application number: US13119936

    Filing date: 2008-10-31

    IPC classification: G06F15/18, G06N3/00, G06N3/12

    CPC classification: G06F17/30707

    Abstract: A device for classifying a document comprises a module to generate a data tree structure and configured to assign terms to a first plurality of nodes of the data tree structure, where each of the first plurality of nodes is assigned a weight. In assigning the weights of the first plurality of nodes, a first generation of combinations of possible weights assignable as the weights of the first plurality of nodes is obtained, and a second generation of combinations of possible weights assignable as the weights of the first plurality of nodes is obtained by applying genetic algorithms to the first generation of combinations of possible weights. The device determines whether the document is in a document class based at least on the weights of the first plurality of nodes.

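The idea of evolving node weights from a first generation to a second can be sketched as follows. This is an illustration under heavy assumptions rather than the patented device: the tree is flattened to a list of term nodes, a document is placed in the class when the summed weights of its matching terms reach a threshold, fitness is accuracy on a small labeled sample, and the genetic operators (elitism, one-point crossover, Gaussian mutation) are generic textbook choices. The terms and sample documents are made up.

```python
import random

TERMS = ["invoice", "payment", "recipe", "oven"]  # hypothetical terms on the tree's nodes

def score(weights, doc_words):
    """Sum of the weights of the term nodes that occur in the document."""
    return sum(w for term, w in zip(TERMS, weights) if term in doc_words)

def classify(weights, doc_words, threshold=1.0):
    """Assumed decision rule: in the class if the score reaches the threshold."""
    return score(weights, doc_words) >= threshold

def fitness(weights, samples):
    """Fraction of labeled samples classified correctly."""
    return sum(classify(weights, words) == label for words, label in samples) / len(samples)

def first_generation(rng, size):
    """Random combinations of possible weights, one weight per term node."""
    return [[rng.uniform(-1, 1) for _ in TERMS] for _ in range(size)]

def next_generation(rng, population, samples, mutation_rate=0.2):
    """Derive a new generation by selection, crossover, and mutation."""
    ranked = sorted(population, key=lambda w: fitness(w, samples), reverse=True)
    parents = ranked[: max(2, len(ranked) // 2)]
    children = [ranked[0][:]]  # elitism: carry the best combination over unchanged
    while len(children) < len(population):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, len(TERMS))
        child = a[:cut] + b[cut:]  # one-point crossover
        child = [w + rng.gauss(0, 0.3) if rng.random() < mutation_rate else w for w in child]
        children.append(child)
    return children

# Made-up labeled documents: True means "belongs to the billing class".
samples = [
    ({"invoice", "payment", "due"}, True),
    ({"payment", "received"}, True),
    ({"recipe", "oven", "bake"}, False),
    ({"oven", "temperature"}, False),
]
rng = random.Random(0)
gen1 = first_generation(rng, 8)
gen2 = next_generation(rng, gen1, samples)
```

Because the best combination is copied into the next generation unchanged, the best fitness in `gen2` can never be lower than in `gen1`; repeating `next_generation` in a loop would continue the search.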