Method and apparatus for document filtering using ensemble filters
    1.
    Type: Granted patent
    Legal status: Lapsed

    Publication No.: US07398269B2

    Publication date: 2008-07-08

    Application No.: US10713592

    Filing date: 2003-11-14

    IPC classification: G06F7/00

    Abstract: A technique for representing an information need and employing one or more filters to select documents that satisfy the represented information need, including a technique of creating filters that involves (a) dividing a set of documents into one or more subsets such that each subset can be used as the source of features for creating a filtering profile or used to set or validate the score threshold for the profile and (b) determining whether multiple profiles are required and how to combine them to create an effective filter. Multiple profiles can be incorporated into an individual filter and the individual filters combined to create an ensemble filter. Ensemble filters can then be further combined to create meta filters.
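The combination of thresholded profiles into an ensemble filter can be illustrated with a minimal sketch. It is not the patented method: the term-weight profile representation, the `ProfileFilter` class, the `ensemble_accepts` function, and the voting rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical document representation: term -> weight.
Document = Dict[str, float]


@dataclass
class ProfileFilter:
    """One filtering profile (term weights) paired with a score threshold."""
    weights: Dict[str, float]
    threshold: float

    def score(self, doc: Document) -> float:
        # Dot product between the profile and the document.
        return sum(w * doc.get(term, 0.0) for term, w in self.weights.items())

    def accepts(self, doc: Document) -> bool:
        return self.score(doc) >= self.threshold


def ensemble_accepts(filters: List[ProfileFilter], doc: Document,
                     min_votes: int = 1) -> bool:
    """Accept the document if at least `min_votes` member filters accept it."""
    votes = sum(f.accepts(doc) for f in filters)
    return votes >= min_votes


sports = ProfileFilter({"goal": 1.0, "match": 0.5}, threshold=1.0)
finance = ProfileFilter({"stock": 1.0, "bond": 1.0}, threshold=1.0)
doc = {"goal": 1.0, "match": 1.0}

print(ensemble_accepts([sports, finance], doc))               # True
print(ensemble_accepts([sports, finance], doc, min_votes=2))  # False
```

With `min_votes=1` the ensemble behaves as a union of its members; raising it moves toward an intersection. The abstract leaves the actual combination rule open.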

    Method and apparatus for constructing a compact similarity structure and for using the same in analyzing document relevance
    3.
    Type: Granted patent
    Legal status: Lapsed

    Publication No.: US07949644B2

    Publication date: 2011-05-24

    Application No.: US12152522

    Filing date: 2008-05-15

    IPC classification: G06F7/00

    Abstract: A computer-readable medium comprises a data structure for providing information about levels of similarity between pairs of N documents. The data structure comprises a plurality of entries of similarity values representing levels of similarity for a plurality of pairs of the documents. Each of the similarity values represents a level of similarity of one document of a given pair relative to the other document of the given pair. The similarity value of each entry is greater than a threshold similarity value that is greater than zero. The plurality of similarity-value entries are fewer than $N^2 - N$ in number if the similarity values are asymmetric with regard to document pairing, and the plurality of similarity-value entries are fewer than $(N^2 - N)/2$ in number if the similarity values are symmetric with regard to document pairing. A method and apparatus for generating the data structure are described.
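The idea of storing only above-threshold similarities can be sketched as follows for the symmetric case. This is an illustrative assumption, not the patent's construction: Jaccard similarity over term sets, the dictionary keyed by index pairs, and the names `jaccard` and `build_compact_similarity` are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Symmetric similarity between two term sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_compact_similarity(
    docs: List[Set[str]], threshold: float
) -> Dict[Tuple[int, int], float]:
    """Keep an entry for pair (i, j), i < j, only if similarity > threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be greater than zero")
    entries: Dict[Tuple[int, int], float] = {}
    for i, j in combinations(range(len(docs)), 2):
        s = jaccard(docs[i], docs[j])
        if s > threshold:
            entries[(i, j)] = s
    return entries


docs = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}, {"x", "y", "z"}]
table = build_compact_similarity(docs, threshold=0.4)

n = len(docs)
print(sorted(table))                     # [(0, 1), (2, 3)]
print(len(table), "<", (n * n - n) // 2)  # 2 < 6
```

A full symmetric table for four documents would hold $(4^2 - 4)/2 = 6$ entries; here only the two pairs above the threshold are stored. Note that this sketch still computes every pairwise similarity; it only compacts what is stored.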

    Method and apparatus for constructing a compact similarity structure and for using the same in analyzing document relevance
    4.
    Type: Granted patent
    Legal status: Lapsed

    Publication No.: US07472131B2

    Publication date: 2008-12-30

    Application No.: US11298500

    Filing date: 2005-12-12

    IPC classification: G06F7/00 G06F17/00 G06F17/30

    Abstract: A computer-readable medium comprises a data structure for providing information about levels of similarity between pairs of N documents. The data structure comprises a plurality of entries of similarity values representing levels of similarity for a plurality of pairs of the documents. Each of the similarity values represents a level of similarity of one document of a given pair relative to the other document of the given pair. The similarity value of each entry is greater than a threshold similarity value that is greater than zero. The plurality of similarity-value entries are fewer than $N^2 - N$ in number if the similarity values are asymmetric with regard to document pairing, and the plurality of similarity-value entries are fewer than $(N^2 - N)/2$ in number if the similarity values are symmetric with regard to document pairing. A method and apparatus for generating the data structure are described.

      Method and apparatus for constructing a compact similarity structure and for using the same in analyzing document relevance
      5.
      Type: Patent application
      Legal status: Lapsed

      Publication No.: US20080275870A1

      Publication date: 2008-11-06

      Application No.: US12152522

      Filing date: 2008-05-15

      IPC classification: G06F17/30

      Abstract: A computer-readable medium comprises a data structure for providing information about levels of similarity between pairs of N documents. The data structure comprises a plurality of entries of similarity values representing levels of similarity for a plurality of pairs of the documents. Each of the similarity values represents a level of similarity of one document of a given pair relative to the other document of the given pair. The similarity value of each entry is greater than a threshold similarity value that is greater than zero. The plurality of similarity-value entries are fewer than $N^2 - N$ in number if the similarity values are asymmetric with regard to document pairing, and the plurality of similarity-value entries are fewer than $(N^2 - N)/2$ in number if the similarity values are symmetric with regard to document pairing. A method and apparatus for generating the data structure are described.

        Method and apparatus for adjusting the model threshold of a support vector machine for text classification and filtering
        6.
        Type: Granted patent
        Legal status: Lapsed

        Publication No.: US07356187B2

        Publication date: 2008-04-08

        Application No.: US10822327

        Filing date: 2004-04-12

        IPC classification: G06K9/62

        CPC classification: G06F17/3069 G06K9/6269

        Abstract: An information need can be modeled by a binary classifier such as a support vector machine (SVM). SVMs can exhibit very conservative, precision-oriented behavior when modeling information needs. This conservative behavior can be overcome by adjusting the position of the hyperplane, the geometric representation of an SVM. The present invention describes a couple of automatic techniques for adjusting the position of an SVM model based upon a beta-gamma thresholding procedure, cross-fold validation and retrofitting. This adjustment technique can also be applied to other types of learning strategies.
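The effect of moving the decision boundary can be shown with a small sketch. It does not implement the beta-gamma thresholding procedure named in the abstract; instead it simply picks, from held-out decision scores, the threshold that maximizes F1. The function names and the example scores are assumptions for illustration.

```python
from typing import List, Tuple


def f1_at(scores: List[float], labels: List[bool], threshold: float) -> float:
    """F1 when every score >= threshold is predicted positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(scores: List[float], labels: List[bool]) -> Tuple[float, float]:
    """Try the default boundary (0.0) and each observed score; keep the best F1."""
    candidates = [0.0] + sorted(set(scores), reverse=True)
    best = max(candidates, key=lambda t: f1_at(scores, labels, t))
    return best, f1_at(scores, labels, best)


# Hypothetical signed distances from an SVM hyperplane on validation data.
scores = [1.2, 0.3, -0.2, -0.4, -0.9, -1.5]
labels = [True, True, True, True, False, False]

print(round(f1_at(scores, labels, 0.0), 3))  # 0.667 at the default boundary
print(best_threshold(scores, labels))        # (-0.4, 1.0)
```

In this toy data the default boundary at 0.0 misses two relevant documents; shifting the threshold to -0.4 recovers them without admitting the non-relevant ones, which is the kind of conservative behavior the abstract describes correcting.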

        Query parser derivation computing device and method for making a query parser for parsing unstructured search queries
        7.
        Type: Granted patent
        Legal status: In force

        Publication No.: US09218390B2

        Publication date: 2015-12-22

        Application No.: US13194887

        Filing date: 2011-07-29

        IPC classification: G06F7/00 G06F17/30

        CPC classification: G06F17/30401 G06F17/3087

        Abstract: A system and method are provided which may comprise parsing an unstructured geographic web-search query into a field-based format, by utilizing conditional random fields, learned by semi-supervised automated learning, to parse structured information from the unstructured geographic web-search query. The system and method may also comprise establishing semi-supervised conditional random fields utilizing one of a rule-based finite state machine model and a statistics-based conditional random field model. Systematic geographic parsing may be used with one of the rule-based finite state machine model and the statistics-based conditional random field model. Parsing an unstructured local geographical web-based query in a local domain may be done by applying a learned model parser to the query, using at least one class-based query log from a form-based query system. The learned model parser may comprise at least one class-level n-gram language model-based feature harvested from a structured query log.

        Query Parser Derivation Computing Device and Method for Making a Query Parser for Parsing Unstructured Search Queries
        8.
        Type: Patent application
        Legal status: In force

        Publication No.: US20130031113A1

        Publication date: 2013-01-31

        Application No.: US13194887

        Filing date: 2011-07-29

        IPC classification: G06F17/30

        CPC classification: G06F17/30401 G06F17/3087

        Abstract: A system and method are provided which may comprise parsing an unstructured geographic web-search query into a field-based format, by utilizing conditional random fields, learned by semi-supervised automated learning, to parse structured information from the unstructured geographic web-search query. The system and method may also comprise establishing semi-supervised conditional random fields utilizing one of a rule-based finite state machine model and a statistics-based conditional random field model. Systematic geographic parsing may be used with one of the rule-based finite state machine model and the statistics-based conditional random field model. Parsing an unstructured local geographical web-based query in a local domain may be done by applying a learned model parser to the query, using at least one class-based query log from a form-based query system. The learned model parser may comprise at least one class-level n-gram language model-based feature harvested from a structured query log.

        Fuzzy text categorizer
        9.
        Type: Granted patent
        Legal status: In force

        Publication No.: US06868411B2

        Publication date: 2005-03-15

        Application No.: US09928619

        Filing date: 2001-08-13

        Applicant: James G. Shanahan

        Inventor: James G. Shanahan

        CPC classification: G06F17/30707

        Abstract: A text categorizer classifies a text object into one or more classes. The text categorizer includes a pre-processing module, a knowledge base, and an approximate reasoning module. The pre-processing module performs feature extraction, feature reduction, and fuzzy set generation to represent an unlabelled text object in terms of one or more fuzzy sets. The approximate reasoning module uses a measured degree of match between the one or more fuzzy sets and categories represented by fuzzy rules in the knowledge base to assign labels of those categories that satisfy a selected decision-making rule.
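The pipeline in this abstract (fuzzy set generation, degree of match, decision rule) can be sketched in a few lines. Everything specific here is an assumption: the abstract does not say how memberships or the degree of match are computed, so this sketch uses peak-normalized term counts as memberships and a min-overlap ratio as the match measure, and it omits feature reduction. The names `to_fuzzy_set`, `degree_of_match`, and `categorize` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

FuzzySet = Dict[str, float]  # term -> membership in [0, 1]


def to_fuzzy_set(text: str) -> FuzzySet:
    """Pre-processing: term counts normalized by the most frequent term."""
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    peak = max(counts.values())
    return {term: c / peak for term, c in counts.items()}


def degree_of_match(doc: FuzzySet, category: FuzzySet) -> float:
    """Share of the document's membership mass covered by the category."""
    total = sum(doc.values())
    if total == 0:
        return 0.0
    overlap = sum(min(m, category.get(t, 0.0)) for t, m in doc.items())
    return overlap / total


def categorize(text: str, knowledge_base: Dict[str, FuzzySet],
               min_match: float = 0.5) -> List[str]:
    """Decision rule: assign every label whose match reaches `min_match`."""
    doc = to_fuzzy_set(text)
    return [label for label, cat in knowledge_base.items()
            if degree_of_match(doc, cat) >= min_match]


kb = {
    "sports": {"goal": 1.0, "match": 1.0, "team": 1.0},
    "finance": {"stock": 1.0, "bond": 1.0, "market": 1.0},
}

print(categorize("team goal stock", kb))  # ['sports']
print(categorize("goal stock", kb))       # ['sports', 'finance']
```

Because the decision rule is a threshold rather than an arg-max, a text object can receive more than one label, matching the "one or more classes" wording of the abstract.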