Topic distillation via subsite retrieval
    91.
    Patent application
    Legal status: In force

    Publication No.: US20070214116A1

    Publication date: 2007-09-13

    Application No.: US11375612

    Filing date: 2006-03-13

    IPC classification: G06F17/30

    Abstract: A method and system for generating a search result for a query of hierarchically organized documents based on retrieval of subtrees that are key resources for topic distillation is provided. The retrieval system may identify documents relevant to a query using conventional searching techniques. The retrieval system then calculates a subtree feature for each subtree that has an identified document as its root. The retrieval system may then generate a subtree relevance score for each subtree based on its subtree feature, and order the identified documents based on their corresponding subtree relevance scores.
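
The ranking step described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not define the subtree feature, so the depth-decayed sum of per-document scores, the `decay` parameter, and the function names are all assumptions made for the example.

```python
def subtree_scores(children, doc_score, roots, decay=0.5):
    """Score each root by its own relevance plus the decayed relevance of its descendants.

    children  : dict mapping a document to the list of its child documents
    doc_score : dict mapping a document to its conventional relevance score
    decay     : hypothetical per-level discount (an assumption, not from the abstract)
    """
    def feature(node, depth=0):
        total = doc_score.get(node, 0.0) * (decay ** depth)
        for child in children.get(node, []):
            total += feature(child, depth + 1)
        return total

    return {root: feature(root) for root in roots}


def rank_by_subtree(children, doc_score, identified):
    """Order the identified documents by the score of the subtree rooted at each."""
    scores = subtree_scores(children, doc_score, identified)
    return sorted(identified, key=lambda doc: scores[doc], reverse=True)


# Usage: "a" has two relevant children, "b" is a lone page.
children = {"a": ["a/1", "a/2"]}
doc_score = {"a": 0.4, "a/1": 0.6, "a/2": 0.6, "b": 0.7}
print(rank_by_subtree(children, doc_score, ["b", "a"]))
```

In this toy input, "b" scores higher as a single page, but "a" ranks first once its subtree is taken into account, which is the effect the abstract attributes to subtree retrieval.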

    Calculating web page importance based on a conditional Markov random walk
    92.
    Patent application
    Legal status: In force

    Publication No.: US20070214108A1

    Publication date: 2007-09-13

    Application No.: US11375611

    Filing date: 2006-03-13

    IPC classification: G06F17/30

    CPC classification: G06F17/30864, G06F17/30882

    Abstract: An importance system calculates the importance of pages using a conditional Markov random walk model rather than a conventional Markov random walk model. The importance system calculates the importance of pages factoring in the importance of the sites that contain those pages. The importance system may factor in the importance of sites based on the strength of the correlation between the importance of a page and the importance of its site. The strength of the correlation may be based upon the depth of the page within the site. The importance system may iteratively calculate the importance of the pages using “conditional” transition probabilities. During each iteration, the importance system may recalculate the conditional transition probabilities based on the importance of sites, which is derived from the importance of pages recalculated during that iteration.
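
The iteration described in this abstract might look roughly like the sketch below. It rests on several assumptions that the abstract does not state: site importance is taken as the sum of the importance of the site's pages, a link's conditional transition probability is taken as proportional to the importance of the target's site, and a PageRank-style damping factor handles teleportation. The depth-based correlation strength mentioned in the abstract is omitted, and all names are hypothetical.

```python
def conditional_page_importance(links, site_of, damping=0.85, iterations=50):
    """Iteratively compute page importance with site-conditioned transitions.

    links   : dict mapping a page to the list of pages it links to
    site_of : dict mapping every page to the site that contains it
    """
    pages = sorted(site_of)
    n = len(pages)
    importance = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Derive site importance from the current page importance (assumed: a sum).
        site_importance = {}
        for p in pages:
            site = site_of[p]
            site_importance[site] = site_importance.get(site, 0.0) + importance[p]

        new = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            targets = links.get(src, [])
            if not targets:
                # A page without out-links spreads its importance uniformly.
                for p in pages:
                    new[p] += damping * importance[src] / n
                continue
            # Recalculate the conditional transition probabilities for this iteration.
            weights = [site_importance[site_of[t]] for t in targets]
            total = sum(weights)
            for t, w in zip(targets, weights):
                new[t] += damping * importance[src] * w / total
        importance = new

    return importance


# Usage: two pages on site "A", one page on site "B".
links = {"a1": ["a2", "b1"], "a2": ["a1"], "b1": ["a1"]}
site_of = {"a1": "A", "a2": "A", "b1": "B"}
print(conditional_page_importance(links, site_of))
```

Because "a2" sits on the more important site, it receives a larger share of the importance flowing out of "a1" than "b1" does, even though both are linked once.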

    Method and system for generating a classifier using inter-sample relationships

    Publication No.: US20060110028A1

    Publication date: 2006-05-25

    Application No.: US10997073

    Filing date: 2004-11-23

    IPC classification: G06K9/62

    CPC classification: G06K9/00711, G06K9/6292

    Abstract: A method and system for generating a classifier to classify sub-objects of an object based on a relationship between sub-objects is provided. The classification system provides training sub-objects along with the actual classification of each training sub-object. The classification system may iteratively train sub-classifiers based on feature vectors representing the features of each sub-object, the actual classification of the sub-object, and a weight associated with the sub-object. After a sub-classifier is trained, the classification system classifies the training sub-objects using the trained sub-classifier. The classification system then adjusts the classifications based on relationships between training sub-objects. The classification system assigns a weight to the sub-classifier and a weight to each sub-object based on the accuracy of the adjusted classifications.

    Augmenting a training set for document categorization
    94.
    Granted patent
    Legal status: In force

    Publication No.: US09058382B2

    Publication date: 2015-06-16

    Application No.: US12254798

    Filing date: 2008-10-20

    IPC classification: G06F7/00, G06F17/30

    Abstract: A method and system for augmenting a training set used to train a classifier of documents is provided. The augmentation system augments a training set with training data derived from features of documents based on a document hierarchy. The training data of the initial training set may be derived from the root documents of the hierarchies of documents. The augmentation system generates additional training data that includes an aggregate feature that represents the overall characteristics of a hierarchy of documents, rather than just the root document. After the training data is generated, the augmentation system augments the initial training set with the newly generated training data.
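
The augmentation step can be illustrated with a short sketch. The abstract does not say how the aggregate feature is computed, so the element-wise mean used here is an assumption, as are the function names and the `(feature_vector, label)` representation of training data.

```python
def aggregate_feature(hierarchy_docs):
    """Average the feature vectors of every document in one hierarchy (assumed aggregate)."""
    n = len(hierarchy_docs)
    dims = len(hierarchy_docs[0])
    return [sum(doc[i] for doc in hierarchy_docs) / n for i in range(dims)]


def augment(initial_training_set, hierarchies):
    """Return the initial training set plus one aggregate example per hierarchy.

    initial_training_set : list of (feature_vector, label) built from root documents
    hierarchies          : list of (list_of_feature_vectors, label), root document first
    """
    augmented = list(initial_training_set)
    for docs, label in hierarchies:
        augmented.append((aggregate_feature(docs), label))
    return augmented


# Usage: one hierarchy whose root and child have different feature vectors.
initial = [([1.0, 0.0], "sports")]
hierarchies = [([[1.0, 0.0], [0.0, 1.0]], "sports")]
print(augment(initial, hierarchies))
```

The initial set is copied rather than modified in place, so the root-only training data stays available for comparison.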

    ATTRACTIVENESS-BASED ONLINE ADVERTISEMENT CLICK PREDICTION
    95.
    Patent application
    Legal status: Under examination (published)

    Publication No.: US20130211905A1

    Publication date: 2013-08-15

    Application No.: US13372358

    Filing date: 2012-02-13

    IPC classification: G06Q30/02

    CPC classification: G06Q30/0242

    Abstract: The probability that a user clicks on an online advertisement may depend on the attractiveness of the online advertisement. To determine this click probability, an advertisement attractiveness model for estimating the attractiveness of an online advertisement to users may be developed. A click behavior model is then created by combining the advertisement attractiveness model with a relevance model. The relevance model may be used for estimating relevance between the online advertisement and a search query. The click behavior model may be applied to features extracted from the online advertisement to calculate a click probability for the online advertisement.

    Look-ahead document ranking system
    96.
    Granted patent

    Publication No.: US08484193B2

    Publication date: 2013-07-09

    Application No.: US12503813

    Filing date: 2009-07-15

    Applicant: Tie-Yan Liu

    Inventor: Tie-Yan Liu

    IPC classification: G06F17/00

    Abstract: A method and system are provided for calculating the importance of documents based on transition probabilities from a source document to a target document, where the probabilities are determined by looking ahead to the information content of the target documents of the source document. A look-ahead importance system generates transition probabilities of transitioning between any pair of source and target documents based on analysis of links to target documents of the source document. The system may calculate the transition probabilities based on the number of links on documents a look-ahead distance away. The system then solves for the stationary probabilities of the transition probabilities. The stationary probabilities represent the importance of the documents.
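
A minimal sketch of the look-ahead idea follows, assuming a look-ahead distance of one: the probability of moving from a source to a target is taken as proportional to one plus the number of links on the target. The add-one smoothing, the damping factor, the power-iteration solver, and the function names are all assumptions chosen for the example; the abstract does not specify them.

```python
def look_ahead_transitions(links, src):
    """Transition probabilities from src, weighted by the link count of each target."""
    targets = links.get(src, [])
    # Assumed weighting: one plus the number of out-links on the target document.
    weights = [1 + len(links.get(t, [])) for t in targets]
    total = sum(weights)
    return {t: w / total for t, w in zip(targets, weights)}


def look_ahead_importance(links, damping=0.85, iterations=100):
    """Approximate the stationary probabilities by power iteration."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            transitions = look_ahead_transitions(links, src)
            if not transitions:
                # A document without links spreads its probability uniformly.
                for p in pages:
                    new[p] += damping * rank[src] / n
                continue
            for target, prob in transitions.items():
                new[target] += damping * rank[src] * prob
        rank = new
    return rank


# Usage: "hub" carries three links, "leaf" carries none.
links = {"home": ["hub", "leaf"], "hub": ["home", "leaf", "x"], "x": ["home"]}
print(look_ahead_transitions(links, "home"))
print(look_ahead_importance(links))
```

Under a conventional random walk, "home" would split its probability evenly between "hub" and "leaf"; here the link-rich "hub" receives the larger share.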

    Search Engine Menu-based Advertising
    97.
    Patent application
    Legal status: Under examination (published)

    Publication No.: US20130173398A1

    Publication date: 2013-07-04

    Application No.: US13340195

    Filing date: 2011-12-29

    IPC classification: G06Q30/02

    CPC classification: G06Q30/0256

    Abstract: Implementations for providing menu-based advertising are disclosed. Based on user input entered into a search query field on a search page, a search engine front-end determines non-search-engine information pages that are relevant to that input. A suggestion menu is then displayed on the search page. The suggestion menu includes interactive elements that cause a client device to retrieve the non-search-engine information pages associated with them. The interactive elements may be advertisements, and the suggestion menu may also be used to display search query suggestions.

    Data caching for distributed execution computing
    98.
    Granted patent
    Legal status: In force

    Publication No.: US08229968B2

    Publication date: 2012-07-24

    Application No.: US12055777

    Filing date: 2008-03-26

    IPC classification: G06F7/00, G06F17/30

    Abstract: Embodiments are described for caching and accessing Directed Acyclic Graph (DAG) data to and from a computing device of a DAG distributed execution engine during the processing of an iterative algorithm. In accordance with one embodiment, a method includes processing, in the computing device, a first subgraph of a plurality of subgraphs from the distributed storage system. The first subgraph is processed with associated input values in the computing device to generate first output values in an iteration. The method further includes storing a second subgraph, which is a duplicate of the first subgraph, in a cache of the device. If the device is to process the first subgraph in each of one or more subsequent iterations, the method also includes processing the second subgraph with the first output values to generate second output values.

    Distributed hierarchical text classification framework
    99.
    Granted patent
    Legal status: In force

    Publication No.: US07809723B2

    Publication date: 2010-10-05

    Application No.: US11464761

    Filing date: 2006-08-15

    IPC classification: G06F7/00, G06F17/30

    CPC classification: G06N99/005

    Abstract: A method and system for distributed training of a hierarchical classifier for classifying documents using a classification hierarchy is provided. A training system provides training data that includes the documents and the classifications of the documents within the classification hierarchy. The training system distributes the training of the classifiers of the hierarchical classifier to various agents so that the classifiers can be trained in parallel. For each classifier, the training system identifies an agent that is to train the classifier. Each agent then trains its classifiers.
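
The distribution scheme in this abstract can be sketched with threads standing in for agents. Everything specific here is an assumption for illustration: the round-robin assignment, the toy `train_classifier` that merely collects a vocabulary per category, and the data layout are not taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def train_classifier(category, training_data):
    """Toy stand-in for a real trainer: collect the words seen under a category."""
    vocab = set()
    for text, label_path in training_data:
        if category in label_path:
            vocab.update(text.lower().split())
    return category, vocab


def train_hierarchy(categories, training_data, num_agents=2):
    """Assign each classifier to an agent, then let the agents train in parallel."""
    # Identify the agent for each classifier (assumed: round-robin assignment).
    assignment = {a: categories[a::num_agents] for a in range(num_agents)}

    def run_agent(agent_id):
        # Each agent trains only the classifiers assigned to it.
        return [train_classifier(c, training_data) for c in assignment[agent_id]]

    with ThreadPoolExecutor(max_workers=num_agents) as pool:
        batches = list(pool.map(run_agent, range(num_agents)))
    return {category: model for batch in batches for category, model in batch}


# Usage: each document is labeled with its path in the classification hierarchy.
data = [("Goal scored", ("sports", "soccer")), ("Stock rally", ("finance",))]
models = train_hierarchy(["sports", "soccer", "finance"], data)
print(sorted(models))
```

A production system would place agents on separate machines; a thread pool is used here only to keep the example self-contained.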

    Ranking of web sites by aggregating web page ranks
    100.
    Granted patent
    Legal status: In force

    Publication No.: US07634476B2

    Publication date: 2009-12-15

    Application No.: US11459869

    Filing date: 2006-07-25

    IPC classification: G06F17/00

    Abstract: A method and system for determining a ranking of web sites based on an aggregation of rankings of the web pages within the web sites is provided. A ranking system identifies for each web site a stationary distribution of a stochastic complement of the transition probabilities between web pages of the web site. The ranking system then calculates transition probabilities between web sites based on the web page transition probabilities weighted by the stationary distribution of the stochastic complements. The ranking system then calculates the stationary distribution of the transition probabilities of the web sites to represent a ranking of the web sites.
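
The quantities named in this abstract correspond to standard definitions from Markov chain aggregation theory, summarized below. The notation is an assumption introduced for this note, and the patent's own formulation may differ in detail.

```latex
% P: page-level transition matrix, partitioned into blocks P_{ij} by web site.
% P_i: P with the i-th block row and block column removed.
% P_{i*}, P_{*i}: the i-th block row and block column of P with P_{ii} removed.

% Stochastic complement of site i:
S_{ii} = P_{ii} + P_{i*}\,(I - P_i)^{-1}\,P_{*i}

% Stationary distribution of the stochastic complement:
\pi_i^{\top} S_{ii} = \pi_i^{\top}, \qquad \pi_i^{\top}\mathbf{1} = 1

% Site-level transition probability from site i to site j,
% i.e. page transitions weighted by \pi_i:
c_{ij} = \pi_i^{\top} P_{ij}\,\mathbf{1}

% Site ranking: stationary distribution \xi of C = (c_{ij}):
\xi^{\top} C = \xi^{\top}, \qquad \xi^{\top}\mathbf{1} = 1
```

Each row of $C$ sums to one because the blocks $P_{ij}$ in block row $i$ together form stochastic rows and $\pi_i$ sums to one, so $C$ is itself a valid transition matrix over sites.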
