DETERMINING RELEVANCE OF A TERM TO CONTENT USING A COMBINED MODEL
    1.
    Patent application
    Legal status: Under examination (published)

    Publication No.: US20080103886A1

    Publication date: 2008-05-01

    Application No.: US11553897

    Filing date: 2006-10-27

    IPC classification: G06Q30/00

    Abstract: A method and system for generating and using a combined model to identify whether a bid term is relevant to an advertisement is provided. A relevance system trains a combined model that includes an initial model and a decision tree model, both trained using features that represent relationships between bid terms and advertisements. The relevance system trains the initial model to map initial model features to a modeled relevance. The relevance system trains the decision tree model to map the decision tree features and the modeled relevance to a final relevance. The trained initial model and decision tree model represent the combined model. The relevance system then uses the combined model to determine the relevance of bid terms to advertisements.
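
The two-stage training described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say what kind of model the initial model is, so a small logistic regression is assumed here, the decision tree is reduced to a single-split stump, and all feature names and values are hypothetical.

```python
import math

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_initial_model(rows, labels, epochs=500, lr=0.5):
    """Assumed initial model: logistic regression mapping features to a modeled relevance."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            g = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return lambda x: _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_stump(rows, labels):
    """Depth-1 decision tree: pick the (feature, threshold) split with the fewest errors."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum((right if r[j] > t else left) != y
                             for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, j, t, left, right)
    _, j, t, left, right = best
    return lambda r: right if r[j] > t else left

def train_combined(initial_rows, tree_rows, labels):
    """Stage 1 fits the initial model; stage 2 fits the tree on tree features + modeled relevance."""
    initial = train_initial_model(initial_rows, labels)
    augmented = [list(t) + [initial(x)] for x, t in zip(initial_rows, tree_rows)]
    tree = train_stump(augmented, labels)
    return lambda x, t: tree(list(t) + [initial(x)])

# Hypothetical features: [overlap with ad title, overlap with ad body] and [bid-term word count].
initial_rows = [[0.9, 0.8], [0.8, 0.6], [0.1, 0.2], [0.0, 0.1]]
tree_rows = [[2], [1], [3], [1]]
labels = [1, 1, 0, 0]  # 1 = bid term is relevant to the advertisement

predict = train_combined(initial_rows, tree_rows, labels)
print(predict([0.85, 0.7], [2]))
```

The point of the sketch is the data flow: the output of the first model becomes one more input feature of the second model, so the tree can either trust the modeled relevance or override it based on its own features.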

    Scalable probabilistic latent semantic analysis
    2.
    Granted patent
    Legal status: In force

    Publication No.: US07844449B2

    Publication date: 2010-11-30

    Application No.: US11392763

    Filing date: 2006-03-30

    IPC classification: G06F17/27

    CPC classification: G06F17/2785

    Abstract: A scalable two-pass probabilistic latent semantic analysis (PLSA) methodology is disclosed that may perform more efficiently, and in some cases more accurately, than traditional PLSA, especially where large and/or sparse data sets are provided for analysis. The improved methodology can greatly reduce the storage and/or computational costs of training a PLSA model. In the first pass of the two-pass methodology, objects are clustered into groups, and PLSA is performed on the groups instead of the original individual objects. In the second pass, the conditional probability of a latent class, given an object, is obtained. This may be done by extending the training results of the first pass. During the second pass, the most likely latent classes for each object are identified.

    IDENTIFYING INFLUENTIAL PERSONS IN A SOCIAL NETWORK
    3.
    Patent application
    Legal status: In force

    Publication No.: US20080070209A1

    Publication date: 2008-03-20

    Application No.: US11533742

    Filing date: 2006-09-20

    IPC classification: G09B19/00

    CPC classification: G06Q30/02 G06Q10/10

    Abstract: An influential persons identification system and method for identifying a set of influential persons (or influencers) in a social network (such as an online social network). The influential persons set is generated such that, by sending a message to the set, the message will be propagated through the network with the greatest speed and coverage. A ranking of users is generated, and a pruning process is performed starting with the top-ranked user and working down the list. For each user on the list, the user is identified as an influencer and then the user and each of his friends are deleted from the social network users list. Next, the same process is performed for the second-ranked user, the third-ranked user, and so forth. The process terminates when the list of users of the social network is exhausted or the desired number of influencers in the influential persons set is reached.
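
The pruning process in this abstract can be sketched in a few lines. The abstract does not say how the ranking is produced, so ranking by friend count is an assumption here, as are the function name `select_influencers`, the example graph, and the choice to skip ranked users who were already deleted as someone's friend.

```python
def select_influencers(friends, ranking, max_influencers=None):
    """Walk down the ranking; keep each surviving user and delete the user plus friends."""
    remaining = set(friends)  # users still on the social network users list
    influencers = []
    for user in ranking:
        if not remaining:
            break  # users list exhausted
        if max_influencers is not None and len(influencers) >= max_influencers:
            break  # desired number of influencers reached
        if user not in remaining:
            continue  # assumption: users already deleted are skipped
        influencers.append(user)
        remaining.discard(user)
        remaining -= friends[user]
    return influencers

# Hypothetical undirected friendship graph.
friends = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann"},
    "cat": {"ann", "eve"},
    "dan": {"ann"},
    "eve": {"cat", "fay"},
    "fay": {"eve"},
}
# Assumed ranking: most friends first, ties broken alphabetically.
ranking = sorted(friends, key=lambda u: (-len(friends[u]), u))

print(select_influencers(friends, ranking))     # ['ann', 'eve']
print(select_influencers(friends, ranking, 1))  # ['ann']
```

Deleting each influencer's friends keeps the selected set spread across the network, so two influencers never share a direct friendship and their audiences overlap less.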

    Key phrase navigation map for document navigation
    4.
    Patent application
    Legal status: Lapsed

    Publication No.: US20070219945A1

    Publication date: 2007-09-20

    Application No.: US11372365

    Filing date: 2006-03-09

    IPC classification: G06F17/30 G06F3/048

    Abstract: Computer-readable media having computer-executable instructions and apparatuses provide a keyphrase navigation map (KNM) for a document page. Keyphrases are extracted from the document page. Keyphrase clusters are subsequently formed by a measure of relevancy, and a salient keyphrase is determined for each cluster. A thumbnail is formed with tags corresponding to the salient keyphrases. A selected tag is expanded with associated keyphrases. An associated keyphrase may be further selected in order to facilitate the navigation of the document page. The displayed tags on the thumbnail are positioned in accordance with locations of associated keyphrases in the document page.

    Text classification by weighted proximal support vector machine based on positive and negative sample sizes and weights
    5.
    Granted patent
    Legal status: In force

    Publication No.: US07707129B2

    Publication date: 2010-04-27

    Application No.: US11384889

    Filing date: 2006-03-20

    IPC classification: G06F15/18 G06E1/00 G06E3/00

    CPC classification: G06F17/30707 G06K9/6269

    Abstract: Embodiments of the invention relate to improvements to the support vector machine (SVM) classification model. When text data is significantly unbalanced (i.e., positive and negative labeled data are disproportionate), the classification quality of standard SVM deteriorates. Embodiments of the invention are directed to a weighted proximal SVM (WPSVM) model that achieves substantially the same accuracy as the traditional SVM model while requiring significantly less computational time. A weighted proximal SVM (WPSVM) model in accordance with embodiments of the invention may include a weight for each training error and a method for estimating the weights, which automatically solves the unbalanced data problem. In addition, instead of solving the optimization problem via the KKT (Karush-Kuhn-Tucker) conditions and the Sherman-Morrison-Woodbury formula, embodiments of the invention use an iterative algorithm to solve an unconstrained optimization problem, which makes WPSVM suitable for classifying relatively high dimensional data.
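
A rough sketch of the idea follows, with several assumptions that are not taken from the patent: the objective is written as a weighted, regularized least-squares problem, `0.5 * (|w|^2 + b^2) + (nu / 2) * sum(delta_i * (1 - y_i * (w . x_i + b))^2)`; each per-example weight `delta_i` is set so that both classes carry equal total weight; and plain gradient descent stands in for the unspecified iterative algorithm. The data and the names `train_wpsvm` and `classify` are hypothetical.

```python
def train_wpsvm(xs, ys, nu=10.0, lr=0.02, iters=2000):
    """Gradient descent on an assumed weighted proximal SVM objective (labels are +1/-1)."""
    n_pos = sum(1 for y in ys if y > 0)
    n_neg = len(ys) - n_pos
    # Assumed weighting: each class contributes a total weight of 0.5,
    # so the rare class is not drowned out by the common one.
    deltas = [0.5 / n_pos if y > 0 else 0.5 / n_neg for y in ys]
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(iters):
        grad_w, grad_b = list(w), b  # gradient of the regularization term
        for x, y, d in zip(xs, ys, deltas):
            # Since y is +1 or -1, (1 - y * f) * y simplifies to (y - f).
            residual = y - (sum(wi * xi for wi, xi in zip(w, x)) + b)
            for j in range(dim):
                grad_w[j] -= nu * d * residual * x[j]
            grad_b -= nu * d * residual
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def classify(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical unbalanced data: 2 positive examples, 8 negative examples.
xs = [[0.9, 0.9], [0.8, 1.0],
      [0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.0],
      [0.2, 0.2], [0.1, 0.1], [0.3, 0.2], [0.2, 0.3]]
ys = [1, 1, -1, -1, -1, -1, -1, -1, -1, -1]

w, b = train_wpsvm(xs, ys)
print(classify(w, b, [0.7, 0.8]), classify(w, b, [0.25, 0.25]))
```

Because the objective is unconstrained and smooth, each iteration only needs passes over the data and vector updates, with no matrix inversion; that is the property the abstract points to for high-dimensional text features.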

    Text classification by weighted proximal support vector machine
    6.
    Patent application
    Legal status: In force

    Publication No.: US20070239638A1

    Publication date: 2007-10-11

    Application No.: US11384889

    Filing date: 2006-03-20

    IPC classification: G06F15/18

    CPC classification: G06F17/30707 G06K9/6269

    Abstract: Embodiments of the invention relate to improvements to the support vector machine (SVM) classification model. When text data is significantly unbalanced (i.e., positive and negative labeled data are disproportionate), the classification quality of standard SVM deteriorates. Embodiments of the invention are directed to a weighted proximal SVM (WPSVM) model that achieves substantially the same accuracy as the traditional SVM model while requiring significantly less computational time. A weighted proximal SVM (WPSVM) model in accordance with embodiments of the invention may include a weight for each training error and a method for estimating the weights, which automatically solves the unbalanced data problem. In addition, instead of solving the optimization problem via the KKT (Karush-Kuhn-Tucker) conditions and the Sherman-Morrison-Woodbury formula, embodiments of the invention use an iterative algorithm to solve an unconstrained optimization problem, which makes WPSVM suitable for classifying relatively high dimensional data.

    Collaborative filtering using cluster-based smoothing
    7.
    Patent application
    Legal status: Under examination (published)

    Publication No.: US20070239553A1

    Publication date: 2007-10-11

    Application No.: US11377130

    Filing date: 2006-03-16

    IPC classification: G06Q30/00

    Abstract: In an embodiment, a method of predicting an active user's rating for an item is disclosed. A database of users may be sorted into clusters. The data associated with the users in each cluster may be smoothed to fill in ratings for items that the users have not personally rated. An active user may then be compared to a set of users, where the set may be all or some portion of the database, to determine the K users that are most similar to the active user. The ratings of the K users regarding the item may be used to predict the active user's rating for the item. In an embodiment, the rating of each of the K users is assigned a confidence value that depends on whether the user personally rated the item or the rating was generated by the data smoothing process.
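
The steps in this abstract can be sketched as below. Everything numeric here is an assumption made for illustration: the clusters are given rather than computed, smoothing fills a missing rating with the cluster mean for that item, similarity is `1 / (1 + mean absolute rating gap)`, and smoothed ratings get a confidence of 0.5 against 1.0 for personal ratings. The abstract does not specify any of these choices.

```python
def smooth_ratings(ratings, clusters):
    """Fill each user's unrated items with the cluster mean; flag which ratings are personal."""
    smoothed = {}
    for members in clusters:
        items = sorted({item for user in members for item in ratings[user]})
        means = {}
        for item in items:
            seen = [ratings[u][item] for u in members if item in ratings[u]]
            means[item] = sum(seen) / len(seen)
        for user in members:
            smoothed[user] = {
                item: (ratings[user][item], True) if item in ratings[user]
                else (means[item], False)
                for item in items
            }
    return smoothed

def similarity(active, profile):
    """Assumed similarity: inverse of the mean absolute gap over items the active user rated."""
    shared = [item for item in active if item in profile]
    if not shared:
        return 0.0
    gap = sum(abs(active[i] - profile[i][0]) for i in shared) / len(shared)
    return 1.0 / (1.0 + gap)

def predict(active, item, smoothed, k=2, smoothed_confidence=0.5):
    """Confidence-weighted average of the K most similar users' ratings for the item."""
    candidates = [(similarity(active, p), p[item]) for p in smoothed.values() if item in p]
    top = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    weights = [sim * (1.0 if personal else smoothed_confidence)
               for sim, (_, personal) in top]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * rating for w, (_, (rating, _)) in zip(weights, top)) / total

# Hypothetical ratings database and cluster assignment.
ratings = {
    "u1": {"A": 5, "B": 4},
    "u2": {"A": 4, "B": 5, "C": 5},
    "u5": {"A": 3, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 1},
    "u4": {"A": 2, "C": 2},
}
clusters = [["u1", "u2", "u5"], ["u3", "u4"]]

smoothed = smooth_ratings(ratings, clusters)
prediction = predict({"A": 5, "B": 5}, "C", smoothed, k=2)
print(prediction)
```

In this example the two nearest users are `u1` and `u2`. `u1` never rated item C, so its smoothed value counts for half as much as `u2`'s personal rating in the weighted average.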

    Document characterization using a tensor space model
    8.
    Granted patent
    Legal status: Lapsed

    Publication No.: US07529719B2

    Publication date: 2009-05-05

    Application No.: US11378095

    Filing date: 2006-03-17

    IPC classification: G06N5/00

    CPC classification: G06N5/02 G06F17/30705

    Abstract: Computer-readable media having computer-executable instructions and apparatuses categorize a document or a corpus of documents. A Tensor Space Model (TSM), which models the text by a higher-order tensor, represents a document or a corpus of documents. Supported by techniques of multilinear algebra, TSM provides a framework for analyzing the multifactor structures. TSM is further supported by operations and presented tools, such as the High-Order Singular Value Decomposition (HOSVD) for a reduction of the dimensions of the higher-order tensor. The dimensionally reduced tensor is compared with tensors that represent possible categories. Consequently, a category is selected for the document or corpus of documents. Experimental results on the 20 Newsgroups dataset suggest that TSM has advantages over a Vector Space Model (VSM) for text classification.

    INTERACTIVELY CRAWLING DATA RECORDS ON WEB PAGES
    9.
    Patent application
    Legal status: Lapsed

    Publication No.: US20080016087A1

    Publication date: 2008-01-17

    Application No.: US11456753

    Filing date: 2006-07-11

    IPC classification: G06F7/00

    Abstract: The invention provides a method of interactively crawling data records on a web page. Users may select various data records of interest on a web page to generate templates to search for similar data items on the same web page or on different web pages. A tree matching algorithm may be used to compare and extract data matching the generated template.
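
The abstract does not name the tree matching algorithm. As one plausible illustration, the sketch below uses the classic Simple Tree Matching dynamic program on tag trees; this choice, the `(tag, children)` tree encoding, the normalization by average tree size, and the example records are all assumptions.

```python
def simple_tree_matching(a, b):
    """Count node pairs in a maximum top-down, order-preserving matching of two trees."""
    if a[0] != b[0]:
        return 0  # roots with different tags cannot be matched
    kids_a, kids_b = a[1], b[1]
    # m[i][j] = best matching between the first i children of a and the first j of b.
    m = [[0] * (len(kids_b) + 1) for _ in range(len(kids_a) + 1)]
    for i in range(1, len(kids_a) + 1):
        for j in range(1, len(kids_b) + 1):
            m[i][j] = max(
                m[i - 1][j],
                m[i][j - 1],
                m[i - 1][j - 1] + simple_tree_matching(kids_a[i - 1], kids_b[j - 1]),
            )
    return m[len(kids_a)][len(kids_b)] + 1  # +1 for the matched roots

def tree_size(t):
    return 1 + sum(tree_size(c) for c in t[1])

def template_similarity(template, candidate):
    """Matched nodes divided by the average size of the two trees (assumed normalization)."""
    matched = simple_tree_matching(template, candidate)
    return matched / ((tree_size(template) + tree_size(candidate)) / 2)

# Hypothetical template built from a record the user selected: <li><img><a><span></li>
template = ("li", [("img", []), ("a", []), ("span", [])])
similar_record = ("li", [("img", []), ("a", []), ("span", []), ("span", [])])
unrelated_block = ("div", [("p", [])])

print(simple_tree_matching(template, similar_record))  # 4
print(template_similarity(template, unrelated_block))  # 0.0
```

A crawler built on this idea would score every candidate subtree on a page against the template and keep those above a chosen similarity threshold as matching data records.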

    Identifying influential persons in a social network
    10.
    Granted patent
    Legal status: In force

    Publication No.: US08359276B2

    Publication date: 2013-01-22

    Application No.: US11533742

    Filing date: 2006-09-20

    IPC classification: G06Q99/00

    CPC classification: G06Q30/02 G06Q10/10

    Abstract: An influential persons identification system and method for identifying a set of influential persons (or influencers) in a social network (such as an online social network). The influential persons set is generated such that, by sending a message to the set, the message will be propagated through the network with the greatest speed and coverage. A ranking of users is generated, and a pruning process is performed starting with the top-ranked user and working down the list. For each user on the list, the user is identified as an influencer and then the user and each of his friends are deleted from the social network users list. Next, the same process is performed for the second-ranked user, the third-ranked user, and so forth. The process terminates when the list of users of the social network is exhausted or the desired number of influencers in the influential persons set is reached.
