UTILIZING OFFLINE CLUSTERS FOR REALTIME CLUSTERING OF SEARCH RESULTS
    1.
    Invention patent application
    UTILIZING OFFLINE CLUSTERS FOR REALTIME CLUSTERING OF SEARCH RESULTS [Under examination, published]

    Publication No.: US20120284275A1

    Publication date: 2012-11-08

    Application No.: US13099197

    Filing date: 2011-05-02

    IPC classification: G06F17/30

    CPC classification: G06F16/358 G06F16/951

    Abstract: Techniques for clustering of search results are described. In an example embodiment, a plurality of first clusters is determined, in a corpus of articles, independently of user queries issued against the corpus of articles, where each first cluster represents a group of articles that relate to a news story. One or more cluster identifiers are assigned to each article in the corpus, where the one or more cluster identifiers respectively identify one or more of the plurality of first clusters to which the article belongs. A query that specifies search criteria against the corpus of articles is received. In response to receiving the query, a result for the query is generated by at least selecting, from the corpus of articles, a set of articles based on the search criteria. The selected set of articles is grouped into one or more second clusters based at least on the one or more cluster identifiers that are assigned to each article in the set of articles. In the result for the query, the set of articles is organized according to the one or more second clusters.
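The query-time step in this abstract (grouping the selected articles by the cluster identifiers assigned offline) can be illustrated with a minimal sketch. The article ids, cluster ids, the `OFFLINE_CLUSTER_IDS` table, and the rule of filing a multi-cluster article under its first identifier are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import defaultdict

# Hypothetical offline output: article id -> cluster ids assigned before any query.
OFFLINE_CLUSTER_IDS = {
    "a1": ["c-election"],
    "a2": ["c-election", "c-economy"],
    "a3": ["c-economy"],
    "a4": [],  # article that belongs to no offline cluster
}

def group_results(result_ids, cluster_ids=OFFLINE_CLUSTER_IDS):
    """Group query results into query-time clusters keyed by offline cluster id."""
    groups = defaultdict(list)
    for article_id in result_ids:
        # Articles without an offline cluster get a singleton group of their own.
        ids = cluster_ids.get(article_id) or [f"singleton:{article_id}"]
        # Assumption: an article with several ids is filed under its first id only.
        groups[ids[0]].append(article_id)
    return dict(groups)

# Result order stands in for the ranking produced by the search criteria.
clusters = group_results(["a2", "a1", "a4", "a3"])
```

Because the expensive clustering happened offline, the query-time work in this sketch is a single pass over the result list with dictionary lookups.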

    SYSTEM FOR INCREMENTALLY CLUSTERING NEWS STORIES
    2.
    Invention patent application
    SYSTEM FOR INCREMENTALLY CLUSTERING NEWS STORIES [In force]

    Publication No.: US20120303623A1

    Publication date: 2012-11-29

    Application No.: US13117022

    Filing date: 2011-05-26

    IPC classification: G06F17/30

    CPC classification: G06F17/3071

    Abstract: Disclosed are methods and apparatus for clustering news stories, which are to be presented over a computer network. In general, an incremental clustering system is configured to update a current set of news clusters with newly arrived news articles without having to recompute the clusters for the entire corpus, as well as form new clusters for recently generated news topics. In one embodiment, a plurality of news articles are initially obtained via the computer network, and the news articles are clustered into a plurality of initial clusters. For only news articles, including any unclustered news articles, that are less than a predetermined age limit, it is determined in an incremental clustering process whether to form one or more new clusters or assign to the initial clusters. Indications of the initial clusters and the one or more new clusters, if any, are then stored so as to be accessible for sending a portion of the news articles to users in a clustered format based on the initial clusters and the one or more new clusters, if any.
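The incremental pass described in this abstract might look roughly like the sketch below. The Jaccard similarity over term sets, the `threshold` and `max_age` values, and the `Cluster` type are illustrative assumptions; the abstract does not say which similarity measure or limits are used.

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    terms: set
    article_ids: list = field(default_factory=list)

def jaccard(a, b):
    """Overlap between two term sets, in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

def incremental_update(clusters, articles, now, max_age=48, threshold=0.3):
    """Assign recent articles to the best existing cluster or start a new one.

    articles: iterable of (article_id, terms, published_hour) tuples.
    Only articles younger than max_age hours are considered, so the
    existing clusters are never recomputed for the whole corpus.
    """
    for article_id, terms, published in articles:
        if now - published >= max_age:
            continue  # too old: skipped by the incremental pass
        best = max(clusters, key=lambda c: jaccard(c.terms, terms), default=None)
        if best is not None and jaccard(best.terms, terms) >= threshold:
            best.article_ids.append(article_id)
            best.terms |= terms  # fold the article's terms into the cluster
        else:
            clusters.append(Cluster(set(terms), [article_id]))  # new topic
    return clusters

clusters = [Cluster({"quake", "japan", "tsunami"}, ["n1"])]
incremental_update(
    clusters,
    [
        ("n2", {"quake", "japan", "relief"}, 95),  # recent, similar to n1
        ("n3", {"oscars", "film"}, 96),            # recent, new topic
        ("n4", {"quake", "japan"}, 10),            # older than the age limit
    ],
    now=100,
)
```

In this run `n2` joins the existing cluster, `n3` starts a new one, and `n4` is ignored because it exceeds the assumed age limit.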

    System for incrementally clustering news stories
    3.
    Granted invention patent
    System for incrementally clustering news stories [In force]

    Publication No.: US08832105B2

    Publication date: 2014-09-09

    Application No.: US13117022

    Filing date: 2011-05-26

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/3071

    Abstract: Disclosed are methods and apparatus for clustering news stories, which are to be presented over a computer network. In general, an incremental clustering system is configured to update a current set of news clusters with newly arrived news articles without having to recompute the clusters for the entire corpus, as well as form new clusters for recently generated news topics. In one embodiment, a plurality of news articles are initially obtained via the computer network, and the news articles are clustered into a plurality of initial clusters. For only news articles, including any unclustered news articles, that are less than a predetermined age limit, it is determined in an incremental clustering process whether to form one or more new clusters or assign to the initial clusters. Indications of the initial clusters and the one or more new clusters, if any, are then stored so as to be accessible for sending a portion of the news articles to users in a clustered format based on the initial clusters and the one or more new clusters, if any.

    Method and Apparatus for Utilizing Social Network Information for Showing Reviews
    4.
    Invention patent application
    Method and Apparatus for Utilizing Social Network Information for Showing Reviews [In force]

    Publication No.: US20090287774A1

    Publication date: 2009-11-19

    Application No.: US12121593

    Filing date: 2008-05-15

    IPC classification: G06F15/16

    Abstract: A method and apparatus are provided for utilizing social network information to show reviews written by others. In one example, the method includes uploading at least one review written by an entity other than a particular user; filtering the at least one review according to criteria specified by the particular user; and integrating into one central location reviews written by others, wherein the reviews include the at least one review.
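As a rough illustration of the filtering step in this abstract, the sketch below keeps only reviews that satisfy user-specified criteria. The `Review` type and the two criteria used here (author is in the user's contact set, rating meets a minimum) are assumptions; the abstract leaves the criteria open.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Review:
    author: str
    item: str
    rating: int

def reviews_for_user(uploaded, friends, min_rating=0):
    """Keep reviews written by the user's contacts that meet the user's criteria."""
    return [r for r in uploaded if r.author in friends and r.rating >= min_rating]

uploaded = [
    Review("ana", "cafe", 5),
    Review("bob", "cafe", 2),
    Review("eve", "cafe", 4),  # not in the user's network
]
# The returned list plays the role of the "one central location" for the user.
feed = reviews_for_user(uploaded, friends={"ana", "bob"}, min_rating=3)
```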

    Annotating HTML segments with functional labels
    5.
    Granted invention patent
    Annotating HTML segments with functional labels [In force]

    Publication No.: US09594730B2

    Publication date: 2017-03-14

    Application No.: US12829265

    Filing date: 2010-07-01

    IPC classification: G06F17/00 G06F17/22

    CPC classification: G06F17/2241

    Abstract: A method and apparatus are described for assigning functional labels to segments of web pages in an application-independent way. In the approach described herein, one of a generic set of functional labels is automatically assigned to each segment of a web page, where the generic functional labels may be topic-independent and application-independent. Applications with different needs can determine which segments of the web page to process based on which functional labels correspond to the types of information needed by each application. Thus, the work of classifying the function of each segment of a web page is separated from the work of selecting which segments satisfy the need of a particular application. The work of classification can be performed in an application-independent way, relieving every application developer of the burden of creating their own classifiers.
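The separation this abstract describes (label segments once, let each application pick the labels it needs) can be sketched as follows. The label names and the `SEGMENTS` data are hypothetical, and the labeling itself is assumed to have been done already by some application-independent classifier.

```python
# Hypothetical output of an application-independent labeling pass over one page.
SEGMENTS = [
    {"label": "navigation", "text": "Home | World | Sports"},
    {"label": "main_content", "text": "The council approved the budget."},
    {"label": "advertisement", "text": "Buy one, get one free"},
    {"label": "comments", "text": "Great article!"},
]

def select_segments(segments, wanted_labels):
    """Return the text of every segment whose functional label the caller wants."""
    return [s["text"] for s in segments if s["label"] in wanted_labels]

# Two applications with different needs reuse the same labels.
indexer_text = select_segments(SEGMENTS, {"main_content"})
sentiment_text = select_segments(SEGMENTS, {"main_content", "comments"})
```

Neither application needs its own classifier; each only declares which functional labels match the information it processes.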

    Method and apparatus for utilizing social network information for showing reviews
    6.
    Granted invention patent
    Method and apparatus for utilizing social network information for showing reviews [In force]

    Publication No.: US08407286B2

    Publication date: 2013-03-26

    Application No.: US12121593

    Filing date: 2008-05-15

    IPC classification: G06F15/16

    Abstract: A method and apparatus are provided for utilizing social network information to show reviews written by others. In one example, the method includes uploading at least one review written by an entity other than a particular user; filtering the at least one review according to criteria specified by the particular user; and integrating into one central location reviews written by others, wherein the reviews include the at least one review.

    Annotating HTML Segments With Functional Labels
    7.
    Invention patent application
    Annotating HTML Segments With Functional Labels [In force]

    Publication No.: US20120005686A1

    Publication date: 2012-01-05

    Application No.: US12829265

    Filing date: 2010-07-01

    IPC classification: G06F9/46

    CPC classification: G06F17/2241

    Abstract: A method and apparatus are described for assigning functional labels to segments of web pages in an application-independent way. In the approach described herein, one of a generic set of functional labels is automatically assigned to each segment of a web page, where the generic functional labels may be topic-independent and application-independent. Applications with different needs can determine which segments of the web page to process based on which functional labels correspond to the types of information needed by each application. Thus, the work of classifying the function of each segment of a web page is separated from the work of selecting which segments satisfy the need of a particular application. The work of classification can be performed in an application-independent way, relieving every application developer of the burden of creating their own classifiers.

    Sensitivity Categorization of Web Pages
    8.
    Invention patent application
    Sensitivity Categorization of Web Pages [In force]

    Publication No.: US20110184817A1

    Publication date: 2011-07-28

    Application No.: US12696006

    Filing date: 2010-01-28

    CPC classification: G06Q30/02 G06Q30/0277

    Abstract: Methods, systems, and computer programs for categorizing the sensitivity of web pages are presented. In one method, a space of sensitive pages is identified based on the sensitivity categorization of a first plurality of web pages and a second plurality of web pages. The first plurality of web pages is obtained by performing search queries using known sensitive words, and the second plurality of web pages includes randomly selected web pages. Additionally, the method identifies a third plurality of web pages that includes web pages on or near the boundary between the space of sensitive pages and the space of non-sensitive pages. The space of sensitive pages is then redefined based on the sensitivity categorization of the first, second, and third pluralities of web pages. Once the space of sensitive pages is defined, the method is used to determine that a given web page is sensitive when the given web page is in the space of sensitive pages. Web pages are included in a marketing operation when the web pages are not sensitive.
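One way to picture the "third plurality" of pages near the boundary is the sketch below, which picks the pages whose classifier score lies closest to a decision threshold. The scalar score per page, the 0.5 boundary, and the page ids are assumptions for illustration; the abstract does not describe how the space of sensitive pages is represented.

```python
def boundary_pages(scores, k, boundary=0.5):
    """Return the k page ids whose score is closest to the decision boundary."""
    return sorted(scores, key=lambda page: abs(scores[page] - boundary))[:k]

def is_sensitive(score, boundary=0.5):
    """A page counts as sensitive when its score falls on the sensitive side."""
    return score >= boundary

# Hypothetical sensitivity scores from a model trained on the first two page sets.
scores = {"p1": 0.95, "p2": 0.52, "p3": 0.10, "p4": 0.45}

# Pages to categorize next, so the boundary can be refined.
to_label = boundary_pages(scores, k=2)

# Pages eligible for the marketing operation: those that are not sensitive.
eligible = [page for page in scores if not is_sensitive(scores[page])]
```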

    System for training classifiers in multiple categories through active learning
    10.
    Granted invention patent
    System for training classifiers in multiple categories through active learning [In force]

    Publication No.: US08498950B2

    Publication date: 2013-07-30

    Application No.: US12905543

    Filing date: 2010-10-15

    IPC classification: G06F15/18

    CPC classification: G06N99/005

    Abstract: A system for training classifiers in multiple categories through an active learning system, including a computer having a memory and a processor, the processor programmed to: train an initial set of m binary one-versus-all classifiers, one for each category in a taxonomy, on a labeled dataset of examples stored in a database coupled with the computer; uniformly sample up to a predetermined large number of examples from a second, larger dataset of unlabeled examples stored in a database coupled with the computer; order the sampled unlabeled examples in order of informativeness for each classifier; determine a minimum subset of the unlabeled examples that are most informative for a maximum number of the classifiers to form an active set for learning; and use editorially-labeled versions of the examples of the active set to re-train the classifiers, thereby improving the accuracy of at least some of the classifiers.
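The active-set selection step (a small subset of examples that is informative for as many classifiers as possible) resembles a set-cover problem. The sketch below uses a greedy set-cover heuristic as a stand-in; it is not claimed to be the patented procedure, and the classifier names, example ids, and `budget` parameter are hypothetical.

```python
def select_active_set(top_examples, budget):
    """Greedily pick examples that are informative for the most classifiers.

    top_examples: classifier name -> set of its most informative example ids
    (assumed to come from a per-classifier informativeness ranking).
    """
    uncovered = set(top_examples)  # classifiers not yet served by a chosen example
    chosen = []
    while uncovered and len(chosen) < budget:
        counts = {}
        for clf in uncovered:
            for ex in top_examples[clf]:
                counts[ex] = counts.get(ex, 0) + 1
        if not counts:
            break  # remaining classifiers have no candidate examples
        # Sorting first makes tie-breaking deterministic (lowest id wins).
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        uncovered = {c for c in uncovered if best not in top_examples[c]}
    return chosen

top = {
    "sports": {"e1", "e2"},
    "finance": {"e2", "e3"},
    "travel": {"e4"},
}
active_set = select_active_set(top, budget=5)
```

Here `e2` is chosen first because it serves two classifiers at once, so only two examples need editorial labels to cover all three classifiers.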