ASSIGNING INTO ONE SET OF CATEGORIES INFORMATION THAT HAS BEEN ASSIGNED TO OTHER SETS OF CATEGORIES
    12.
    Invention application
    Status: under examination (published)

    Publication No.: US20110137908A1

    Publication date: 2011-06-09

    Application No.: US12980162

    Filing date: 2010-12-28

    IPC classification: G06F17/30

    CPC classification: G06F16/353

    Abstract: Techniques are described for assigning, to target categories of a target scheme, items that have been obtained from a plurality of sources. In situations in which one or more of the sources has organized its information according to a source scheme that differs from the target scheme, the assignment may be based, in part, on an estimate of the probability that items from a particular source category should be assigned to a particular target category. Such probability estimates may be based on how many training set items associated with the particular source category have been assigned to the particular target category. Source categories may be grouped into clusters. The probability estimates may also be based on how many training set items within the cluster to which the particular source category has been mapped have been assigned to the particular target category.

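The count-based estimate described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the per-category counts and the per-cluster counts are combined, so the pseudo-count smoothing below, the `cluster_weight` parameter, and all function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def build_counts(training_items, cluster_of):
    """Count target-category assignments per source category and per cluster.

    training_items: iterable of (source_category, target_category) pairs.
    cluster_of: dict mapping each source category to its cluster id.
    """
    by_source = defaultdict(Counter)
    by_cluster = defaultdict(Counter)
    for source_cat, target_cat in training_items:
        by_source[source_cat][target_cat] += 1
        by_cluster[cluster_of[source_cat]][target_cat] += 1
    return by_source, by_cluster


def estimate_probability(source_cat, target_cat, by_source, by_cluster,
                         cluster_of, cluster_weight=1.0):
    """Estimate P(target category | source category).

    Assumption: the cluster-level frequency acts as a prior worth
    `cluster_weight` pseudo-items, so sparse source categories fall
    back on the behaviour of their cluster.
    """
    src = by_source.get(source_cat, Counter())
    clu = by_cluster.get(cluster_of.get(source_cat), Counter())
    src_total = sum(src.values())
    clu_total = sum(clu.values())
    if clu_total == 0:
        return 0.0  # no training evidence at all for this cluster
    prior = clu[target_cat] / clu_total
    return (src[target_cat] + cluster_weight * prior) / (src_total + cluster_weight)
```

With this scheme a source category that has no training items of its own simply inherits the frequencies of its cluster, while a well-populated source category is dominated by its own counts.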

    Assigning into one set of categories information that has been assigned to other sets of categories
    13.
    Granted patent
    Status: in force

    Publication No.: US07885859B2

    Publication date: 2011-02-08

    Application No.: US11373726

    Filing date: 2006-03-10

    IPC classification: G06Q30/00

    CPC classification: G06F17/30707

    Abstract: Techniques are described for assigning, to target categories of a target scheme, items that have been obtained from a plurality of sources. In situations in which one or more of the sources has organized its information according to a source scheme that differs from the target scheme, the assignment may be based, in part, on an estimate of the probability that items from a particular source category should be assigned to a particular target category. Such probability estimates may be based on how many training set items associated with the particular source category have been assigned to the particular target category. Source categories may be grouped into clusters. The probability estimates may also be based on how many training set items within the cluster to which the particular source category has been mapped have been assigned to the particular target category.


    System and method for determining web page quality using collective inference based on local and global information
    14.
    Invention application
    Status: in force

    Publication No.: US20080195631A1

    Publication date: 2008-08-14

    Application No.: US11706025

    Filing date: 2007-02-13

    IPC classification: G06F7/00

    CPC classification: G06F17/30873

    Abstract: An improved system and method are provided for determining web page quality using collective inference based on local and global web page information. A classification engine may be provided for classifying a web page using local features of a seed set of web pages and global web graph information about the seed set of web pages. In an embodiment, a dual algorithm based on graph regularization, formulated as a well-formed optimization solution, may be used to apply collective inference for binary classification of the web page. The classification uses the local web page information and global web graph information of the web page itself, of an authoritative set of web pages, and of a non-authoritative set of web pages.

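To make the idea of graph regularization concrete, here is a minimal sketch of binary scoring on a link graph. It is not the dual algorithm of the application and it ignores local page features: it only minimizes a squared fit to the seed labels plus a smoothness penalty over linked pages, using plain coordinate descent. The function name, the `smoothness` parameter, and the +1/-1 label encoding are assumptions.

```python
def graph_regularized_scores(edges, labels, num_nodes, smoothness=1.0, sweeps=200):
    """Score nodes by minimizing
        sum_{i labeled} (f_i - y_i)^2 + smoothness * sum_{(i,j) in edges} (f_i - f_j)^2

    edges: iterable of undirected (i, j) pairs of node indices.
    labels: dict node -> +1.0 (authoritative seed) or -1.0 (non-authoritative seed).
    Returns a list of real-valued scores; the sign gives the predicted class.
    """
    neighbors = [[] for _ in range(num_nodes)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    scores = [float(labels.get(i, 0.0)) for i in range(num_nodes)]
    for _ in range(sweeps):
        for i in range(num_nodes):
            fit = 1.0 if i in labels else 0.0
            denom = fit + smoothness * len(neighbors[i])
            if denom == 0:
                continue  # isolated, unlabeled node: nothing to infer
            numer = fit * labels.get(i, 0.0)
            numer += smoothness * sum(scores[j] for j in neighbors[i])
            # Closed-form minimizer of the objective in f_i, others held fixed.
            scores[i] = numer / denom
    return scores
```

On a chain of four pages with an authoritative seed at one end and a non-authoritative seed at the other, the unlabeled pages in the middle take the sign of the nearer seed.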

    Model selection in machine learning with applications to document clustering
    15.
    Granted patent
    Status: no longer in force

    Publication No.: US06584456B1

    Publication date: 2003-06-24

    Application No.: US09597913

    Filing date: 2000-06-19

    IPC classification: G06F17/00

    Abstract: An objective function based on a Bayesian statistical estimation framework is used to determine an optimal model selection by choosing both the optimal number of clusters and the optimal feature set. Heuristics can be applied to find the optimum (or at least a sub-optimum) of this objective function in terms of the feature sets and the number of clusters, wherein the maximization of the objective function corresponds to the optimal model structure.

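The joint search over cluster counts and feature sets can be sketched as follows. The abstract does not name a particular heuristic, so the greedy forward selection used here is an assumption, and the Bayesian objective itself is left as a caller-supplied function (`objective` is a hypothetical callable, not something defined by the patent).

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Objective = Callable[[int, FrozenSet[str]], float]


def greedy_model_search(cluster_counts: Iterable[int],
                        features: Iterable[str],
                        objective: Objective) -> Tuple[int, FrozenSet[str], float]:
    """For each candidate number of clusters, grow a feature set greedily,
    adding the single feature that raises the objective most, and stop when
    no addition helps. Returns the best (k, feature_set, score) found.

    Being a heuristic, this may return a sub-optimal model structure.
    """
    all_features = set(features)
    best = None  # (score, k, feature_set)
    for k in cluster_counts:
        chosen: FrozenSet[str] = frozenset()
        score = objective(k, chosen)
        while True:
            trials = [(objective(k, chosen | {f}), f)
                      for f in sorted(all_features - chosen)]
            if not trials:
                break
            top_score, top_feature = max(trials)
            if top_score <= score:
                break  # no single feature improves the objective
            chosen, score = chosen | {top_feature}, top_score
        if best is None or score > best[0]:
            best = (score, k, chosen)
    if best is None:
        raise ValueError("cluster_counts must not be empty")
    return best[1], best[2], best[0]
```

Any scoring function with the signature `objective(k, feature_set)` can be plugged in, for example a penalized log-likelihood computed by fitting a clustering model with `k` clusters on the selected features.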

    Enhanced hypertext categorization using hyperlinks
    16.
    Granted patent
    Status: no longer in force

    Publication No.: US06389436B1

    Publication date: 2002-05-14

    Application No.: US08990292

    Filing date: 1997-12-15

    IPC classification: G06F15/00

    Abstract: A method, apparatus, and article of manufacture are provided for a computer-implemented hypertext classifier. A new document containing citations to and from other documents is classified. Initially, documents within a neighborhood of the new document are identified. For each document and each class, an initial probability is determined that indicates the probability that the document fits a particular class. Next, iterative relaxation is performed to identify a class for each document using the initial probabilities. A class is selected into which the new document is to be classified based on the initial probabilities and identified classes.

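The iterative relaxation step can be illustrated with a simplified sketch. The update rule below, which multiplies each document's initial class probability by the average current probability of that class among its linked neighbors, is an assumption chosen for brevity (it presumes linked documents tend to share a class); the patent's actual update is not given in the abstract. All names are hypothetical.

```python
def relaxation_labeling(initial, links, rounds=10):
    """Iteratively refine class probabilities using linked neighbors.

    initial: dict doc -> dict class -> initial probability (e.g. from text alone).
    links: dict doc -> list of neighboring docs (citations in either direction).
    Returns (labels, probabilities) where labels maps doc -> most probable class.
    """
    current = {doc: dict(probs) for doc, probs in initial.items()}
    for _ in range(rounds):
        updated = {}
        for doc, prior in initial.items():
            nbrs = [n for n in links.get(doc, []) if n in current]
            scores = {}
            for cls, p0 in prior.items():
                if nbrs:
                    # Average belief of the neighborhood in this class.
                    support = sum(current[n].get(cls, 0.0) for n in nbrs) / len(nbrs)
                else:
                    support = 1.0  # no neighbors: keep the initial estimate
                scores[cls] = p0 * support
            total = sum(scores.values())
            if total > 0:
                updated[doc] = {c: s / total for c, s in scores.items()}
            else:
                updated[doc] = dict(prior)
        current = updated  # synchronous update of all documents
    labels = {doc: max(probs, key=probs.get) for doc, probs in current.items()}
    return labels, current
```

A new document whose text alone slightly favors one class can end up in another when the documents it cites, and those citing it, agree strongly on that other class.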

    Multilevel taxonomy based on features derived from training documents classification using fisher values as discrimination values
    17.
    Granted patent
    Status: no longer in force

    Publication No.: US06233575B1

    Publication date: 2001-05-15

    Application No.: US09102861

    Filing date: 1998-06-23

    IPC classification: G06F17/30

    Abstract: A system, process, and article of manufacture are provided for organizing a large text database into a hierarchy of topics and for maintaining this organization as documents are added and deleted and as the topic hierarchy changes. Given sample documents belonging to various nodes in the topic hierarchy, the tokens (terms, phrases, dates, or other usable features in the document) that are most useful at each internal decision node for the purpose of routing new documents to the children of that node are automatically detected. Using feature terms, statistical models are constructed for each topic node. The models are used in an estimation technique to assign topic paths to new unlabeled documents. The hierarchical technique, in which feature terms can be very different at different nodes, leads to an efficient context-sensitive classification technique. The hierarchical technique can handle millions of documents and tens of thousands of topics. A resulting taxonomy and path enhanced retrieval system (TAPER) is used to generate context-dependent document indexing terms. The topic paths are used, in addition to keywords, for better focused searching and browsing of the text database.

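The title refers to Fisher values as discrimination values for picking feature terms at a decision node. A minimal sketch of one common form of the Fisher score follows: the squared differences between class means divided by the summed within-class variances. Whether this matches the exact formula used in the patent is an assumption, as are the function name and the input layout.

```python
def fisher_scores(docs_by_class):
    """Compute a Fisher-style discrimination score for every term.

    docs_by_class: dict class -> non-empty list of documents, where each
    document is a dict term -> frequency. These classes would be the child
    topics of one internal node in the hierarchy.
    Higher scores mark terms that separate the classes well.
    """
    terms = {t for docs in docs_by_class.values() for d in docs for t in d}
    means = {
        c: {t: sum(d.get(t, 0.0) for d in docs) / len(docs) for t in terms}
        for c, docs in docs_by_class.items()
    }
    classes = sorted(docs_by_class)
    scores = {}
    for t in terms:
        # Between-class scatter: squared mean differences over class pairs.
        between = sum(
            (means[a][t] - means[b][t]) ** 2
            for i, a in enumerate(classes)
            for b in classes[i + 1:]
        )
        # Within-class scatter: per-class variance of the term frequency.
        within = sum(
            sum((d.get(t, 0.0) - means[c][t]) ** 2 for d in docs) / len(docs)
            for c, docs in docs_by_class.items()
        )
        if within > 0:
            scores[t] = between / within
        else:
            scores[t] = float("inf") if between > 0 else 0.0
    return scores
```

Terms that occur equally often in every child topic, such as common function words, score zero and would be dropped from the feature set at that node, while topic-specific terms rank at the top.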

    Video story board user interface for selective downloading and displaying of desired portions of remote-stored video data objects
    18.
    Granted patent
    Status: no longer in force

    Publication No.: US6166735A

    Publication date: 2000-12-26

    Application No.: US984460

    Filing date: 1997-12-03

    IPC classification: G06F17/30; G06F19/00

    Abstract: A system and method are provided for supporting video browsing over a communication network such as the Internet/World Wide Web. A graphical user interface is provided through a client software tool such as a Web browser. A client/user selects a video data object stored at a remote server. A set of points within the object is displayed at the client's graphical user interface display as representations, preferably thumbnail images, of the points within the object. The user selects an interval defined by the representations, preferably by using the graphical user interface to select two of the representations. The two selected representations delimit the beginning and end of a portion of the video object. Responsive to this selection, that portion of the video object is downloaded and displayed.

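The interval selection described in the abstract can be sketched in a few lines: two chosen thumbnails delimit the start and end of the portion to download. The timestamp list, the function names, and in particular the constant-bitrate mapping from time to byte offsets are assumptions for illustration; the abstract does not describe how the server locates the requested portion.

```python
def selected_interval(thumbnail_times, first_pick, second_pick):
    """Return (start, end) in seconds for the portion delimited by two thumbnails.

    thumbnail_times: ascending timestamps (seconds) of the displayed thumbnails.
    first_pick, second_pick: indices of the two thumbnails the user selected,
    in either order.
    """
    low, high = sorted((first_pick, second_pick))
    if low < 0 or high >= len(thumbnail_times):
        raise IndexError("thumbnail index out of range")
    return thumbnail_times[low], thumbnail_times[high]


def approximate_byte_range(start, end, duration, size_bytes):
    """Map a time interval to a byte range.

    Assumption: the video has a constant bitrate, so byte offsets are
    proportional to time. Real container formats usually need an index.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    first = int(size_bytes * start / duration)
    last = int(size_bytes * end / duration)
    return first, last
```

For thumbnails taken every 30 seconds, picking the fourth and then the second thumbnail selects the portion from 30 s to 90 s, regardless of the order in which they were clicked.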