System and method for providing interactive feature selection for training a document classification system
    1.
    Type: Patent application
    Status: Pending (published)

    Publication number: US20060212142A1

    Publication date: 2006-09-21

    Application number: US11376989

    Filing date: 2006-03-15

    IPC classification: G05B13/02

    Abstract: A method for facilitating development of a document classification function comprises selecting a feature of a document, the feature being less than an entirety of the document; presenting the feature to a human subject; asking the human subject for a feature relevance value of the feature; and generating a classification function using the feature relevance value. The method may also include the steps of presenting the document to the human subject at the same time as presenting the feature; asking the human subject for a document relevance value that measures relevance of the document to a category; and wherein generating the classification function also uses the document relevance value.
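
    The steps in this abstract can be illustrated with a minimal Python sketch. It is not the claimed method: the names `collect_feature_relevance` and `generate_classifier` and the `ask` callback (a stand-in for the human subject) are hypothetical, and so is the weighting rule, which simply multiplies each feature relevance value by the relevance values of the documents that contain the feature.

```python
def collect_feature_relevance(features, ask):
    """Present each feature to the human subject via `ask` and record the answer."""
    return {feature: float(ask(feature)) for feature in features}


def generate_classifier(labeled_docs, feature_relevance):
    """Build a classification function from feature and document relevance values.

    labeled_docs: iterable of (text, document_relevance) pairs, where
    document_relevance is +1 (relevant to the category) or -1 (not relevant).
    The weighting rule below is an assumption made for illustration.
    """
    weights = {}
    for text, doc_relevance in labeled_docs:
        for word in set(text.lower().split()):
            if word in feature_relevance:
                weights[word] = weights.get(word, 0.0) + feature_relevance[word] * doc_relevance

    def classify(text):
        # A document is judged relevant when its summed feature weights are positive.
        score = sum(weights.get(word, 0.0) for word in set(text.lower().split()))
        return score > 0

    return classify


# Usage: a dictionary lookup stands in for the human subject's answers.
answers = {"goal": 1.0, "the": 0.0, "senate": 1.0}
relevance = collect_feature_relevance(answers, answers.get)
classify = generate_classifier(
    [("the goal was late", 1), ("the senate voted", -1)], relevance
)
```

    In an interactive setting, `ask` could wrap a prompt shown next to the document; features that the subject rates 0.0 (such as "the" here) contribute nothing to the classification function.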


    System and method for biasing search results based on topic familiarity
    2.
    Type: Patent application
    Status: Granted

    Publication number: US20060212423A1

    Publication date: 2006-09-21

    Application number: US11378871

    Filing date: 2006-03-16

    IPC classification: G06F17/30

    CPC classification: G06F17/30864 G06F17/30666

    Abstract: A familiarity level classifier comprises a stopwords engine for conducting a stopwords analysis of stopwords, e.g., introductory level stopwords and advanced level stopwords, in a document, e.g., a website; and a familiarity level classifier module for generating a document familiarity level based on the stopwords analysis. The classifier may be in an indexing module, a search engine, a user computer, or elsewhere in a computer network. The classifier may also include a reading level engine for conducting a reading level analysis of the document, and wherein the familiarity level classifier module is configured to generate the familiarity level also based on the reading level analysis. The classifier may also include a document features engine for conducting a feature analysis of the document, and wherein the familiarity level classifier module is configured to generate the document familiarity level also based on the feature analysis.
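
    As a rough illustration of the stopwords analysis described above, the following Python sketch counts introductory-level and advanced-level stopwords and derives a familiarity level from the counts. The two word lists and the function names are hypothetical; the abstract does not say which words belong to either list, and the reading level and feature analyses are left out.

```python
import re

# Hypothetical placeholder lists; the source does not enumerate the stopwords.
INTRO_STOPWORDS = {"you", "your", "what", "how", "why", "so"}
ADVANCED_STOPWORDS = {"thus", "hence", "whereas", "thereby", "via"}


def stopword_analysis(text):
    """Count introductory-level and advanced-level stopwords in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    intro = sum(token in INTRO_STOPWORDS for token in tokens)
    advanced = sum(token in ADVANCED_STOPWORDS for token in tokens)
    return intro, advanced


def familiarity_level(text):
    """Map the stopword counts to a coarse document familiarity level."""
    intro, advanced = stopword_analysis(text)
    if intro == advanced:
        return "unknown"  # no evidence either way in this simplified rule
    return "introductory" if intro > advanced else "advanced"
```

    A search engine could use such a level to bias results toward documents that match the searcher's familiarity with the topic; a fuller version would combine this signal with the reading level and document feature analyses.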


    System and method for biasing search results based on topic familiarity
    3.
    Type: Granted patent
    Status: Granted

    Publication number: US08095487B2

    Publication date: 2012-01-10

    Application number: US11378871

    Filing date: 2006-03-16

    IPC classification: G06E1/00 G06E3/00

    CPC classification: G06F17/30864 G06F17/30666

    Abstract: A familiarity level classifier comprises a stopwords engine for conducting a stopwords analysis of stopwords, e.g., introductory level stopwords and advanced level stopwords, in a document, e.g., a website; and a familiarity level classifier module for generating a document familiarity level based on the stopwords analysis. The classifier may be in an indexing module, a search engine, a user computer, or elsewhere in a computer network. The classifier may also include a reading level engine for conducting a reading level analysis of the document, and wherein the familiarity level classifier module is configured to generate the familiarity level also based on the reading level analysis. The classifier may also include a document features engine for conducting a feature analysis of the document, and wherein the familiarity level classifier module is configured to generate the document familiarity level also based on the feature analysis.


    System and method for learning a network of categories using prediction
    4.
    Type: Granted patent
    Status: Granted

    Publication number: US07877335B2

    Publication date: 2011-01-25

    Application number: US11975483

    Filing date: 2007-10-18

    Applicant: Omid Madani

    Inventor: Omid Madani

    IPC classification: G06F15/18

    CPC classification: G06F17/276

    Abstract: An improved system and method is provided for efficiently learning a network of categories using prediction. A learning engine may receive a stream of characters and incrementally segment the stream of characters beginning with individual characters into larger and larger categories. To do so, a prediction engine may be provided for predicting a target category from the stream of characters using one or more context categories. Upon predicting the target category, the edges of the network of categories may be updated. A category composer may also be provided for composing a new category from existing categories in the network of categories, and a newly composed category may then be added to the network of categories. Advantageously, iterative episodes of prediction and learning of categories for large scale applications may result in hundreds of thousands of categories connected by millions of prediction edges.
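
    The segment-then-compose loop in this abstract can be sketched in Python under strong simplifying assumptions. The function names `segment` and `learn_categories`, the greedy longest-match segmentation, and the `min_count` threshold are all hypothetical, and plain adjacency counts stand in for the prediction edges; the patented prediction and update rules are not reproduced here.

```python
from collections import Counter


def segment(stream, categories):
    """Greedy longest-match segmentation of the stream into known categories."""
    max_len = max(map(len, categories), default=1)
    pieces, i = [], 0
    while i < len(stream):
        for n in range(min(max_len, len(stream) - i), 0, -1):
            piece = stream[i:i + n]
            # Single characters are always accepted, so the loop terminates.
            if n == 1 or piece in categories:
                pieces.append(piece)
                i += n
                break
    return pieces


def learn_categories(stream, episodes=3, min_count=2):
    """Start from individual characters and compose larger categories."""
    categories = set(stream)
    edges = Counter()
    for _ in range(episodes):
        pieces = segment(stream, categories)
        # Edge (context, target): how often `target` directly followed `context`.
        edges = Counter(zip(pieces, pieces[1:]))
        for (context, target), count in edges.items():
            if count >= min_count:
                categories.add(context + target)  # compose a new category
    return categories, edges


categories, edges = learn_categories("abababab")
```

    On the stream "abababab" the categories grow from single characters to "ab" and then "abab", which mirrors, in a very small way, the incremental growth from characters into larger and larger categories.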


    System and method for learning a weighted index to categorize objects
    5.
    Type: Granted patent
    Status: Granted

    Publication number: US07756845B2

    Publication date: 2010-07-13

    Application number: US11648323

    Filing date: 2006-12-28

    IPC classification: G06F17/30

    CPC classification: G06N99/005 G06F17/30707

    Abstract: An improved system and method is provided for learning a weighted index to categorize objects using ranked recall. In an offline embodiment, a learning engine may learn a weighted index for classifying objects using ranked recall by training during an entire initial pass of a training sequence of a collection of objects. In an online embodiment, a learning engine may learn a weighted index for classifying objects using ranked recall by dynamically updating the weighted index as each instance of the collection of objects may be categorized. Advantageously, an instance of a large collection of objects may be accurately and efficiently recalled for many large scale applications with hundreds of thousands of categories by quickly identifying a small set of candidate categories for the given instance of the object.
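
    The online embodiment can be pictured with a small Python sketch of a feature-to-category weighted index. The class `WeightedIndex`, its `step` parameter, and the mistake-driven additive update are assumptions chosen for illustration; the abstract does not specify the update rule.

```python
from collections import defaultdict


class WeightedIndex:
    """Maps each feature to weighted candidate categories (hypothetical sketch)."""

    def __init__(self, step=1.0):
        self.step = step
        self.weights = {}  # feature -> {category: weight}

    def rank(self, features, top_k=3):
        """Ranked recall: return a short list of candidate categories."""
        scores = defaultdict(float)
        for feature in features:
            for category, weight in self.weights.get(feature, {}).items():
                scores[category] += weight
        # Highest score first; ties are broken alphabetically for determinism.
        ordered = sorted(scores, key=lambda c: (-scores[c], c))
        return ordered[:top_k]

    def update(self, features, true_category):
        """Online update: adjust the index only when the top candidate is wrong."""
        if self.rank(features, top_k=1) == [true_category]:
            return
        for feature in features:
            row = self.weights.setdefault(feature, {})
            row[true_category] = row.get(true_category, 0.0) + self.step


index = WeightedIndex()
for features, category in [
    (["goal", "match"], "sports"),
    (["vote", "senate"], "politics"),
    (["goal", "league"], "sports"),
]:
    index.update(features, category)
```

    Because only the features present in an instance are looked up, ranking touches a small set of candidate categories rather than all of them, which is the efficiency argument the abstract makes for large category sets.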


    Concept learning system and method
    7.
    Type: Patent application
    Status: Pending (published)

    Publication number: US20080050712A1

    Publication date: 2008-02-28

    Application number: US11502949

    Filing date: 2006-08-11

    IPC classification: G09B3/00

    CPC classification: G09B7/02

    Abstract: According to a preferred embodiment, a concept learning system and method is used for classifying instances, which, for example, may include web pages or text documents. An instance is input into the system. One or more candidate concepts are recalled from a set of candidate concepts. For each recalled concept, a classifier that corresponds to it is applied to the instance to determine if the recalled concept is related to the instance. Samples are selected from a training set. A learning method is applied, and the set of candidate concepts is updated according to the results from applying the learning method.


    System and method for learning a network of categories using prediction
    8.
    Type: Patent application
    Status: Granted

    Publication number: US20090106022A1

    Publication date: 2009-04-23

    Application number: US11975483

    Filing date: 2007-10-18

    Applicant: Omid Madani

    Inventor: Omid Madani

    IPC classification: G10L15/16

    CPC classification: G06F17/276

    Abstract: An improved system and method is provided for efficiently learning a network of categories using prediction. A learning engine may receive a stream of characters and incrementally segment the stream of characters beginning with individual characters into larger and larger categories. To do so, a prediction engine may be provided for predicting a target category from the stream of characters using one or more context categories. Upon predicting the target category, the edges of the network of categories may be updated. A category composer may also be provided for composing a new category from existing categories in the network of categories, and a newly composed category may then be added to the network of categories. Advantageously, iterative episodes of prediction and learning of categories for large scale applications may result in hundreds of thousands of categories connected by millions of prediction edges.


    System and method for learning a weighted index to categorize objects
    9.
    Type: Patent application
    Status: Granted

    Publication number: US20080162385A1

    Publication date: 2008-07-03

    Application number: US11648323

    Filing date: 2006-12-28

    IPC classification: G06F15/18

    CPC classification: G06N99/005 G06F17/30707

    Abstract: An improved system and method is provided for learning a weighted index to categorize objects using ranked recall. In an offline embodiment, a learning engine may learn a weighted index for classifying objects using ranked recall by training during an entire initial pass of a training sequence of a collection of objects. In an online embodiment, a learning engine may learn a weighted index for classifying objects using ranked recall by dynamically updating the weighted index as each instance of the collection of objects may be categorized. Advantageously, an instance of a large collection of objects may be accurately and efficiently recalled for many large scale applications with hundreds of thousands of categories by quickly identifying a small set of candidate categories for the given instance of the object.
