Scalable probabilistic latent semantic analysis
    11.
    Patent application
    Scalable probabilistic latent semantic analysis (legal status: in force)

    Publication number: US20070239431A1

    Publication date: 2007-10-11

    Application number: US11392763

    Filing date: 2006-03-30

    IPC classification: G06F17/27

    CPC classification: G06F17/2785

    Abstract: A scalable two-pass probabilistic latent semantic analysis (PLSA) methodology is disclosed that may perform more efficiently, and in some cases more accurately, than traditional PLSA, especially where large and/or sparse data sets are provided for analysis. The improved methodology can greatly reduce the storage and/or computational costs of training a PLSA model. In the first pass of the two-pass methodology, objects are clustered into groups, and PLSA is performed on the groups instead of the original individual objects. In the second pass, the conditional probability of a latent class, given an object, is obtained. This may be done by extending the training results of the first pass. During the second pass, the most likely latent classes for each object are identified.
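
The two passes described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the grouping is assumed to be given (`group_of` is a hypothetical precomputed cluster assignment), the count matrix is dense, and the second pass is assumed to be a standard "fold-in" step that keeps the first-pass feature distributions fixed. All function names are made up for this example.

```python
import random

def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they sum to 0)."""
    total = sum(values)
    if total <= 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]

def em_step(rows, p_z_given_row, p_w_given_z):
    """One EM sweep; returns updated P(z|row) and expected class/feature counts."""
    n_z, n_w = len(p_w_given_z), len(p_w_given_z[0])
    class_feature = [[0.0] * n_w for _ in range(n_z)]
    new_p_z = []
    for row, p_z in zip(rows, p_z_given_row):
        acc = [0.0] * n_z
        for w, count in enumerate(row):
            if count == 0:
                continue  # only observed features contribute
            post = normalize([p_z[z] * p_w_given_z[z][w] for z in range(n_z)])
            for z in range(n_z):
                acc[z] += count * post[z]
                class_feature[z][w] += count * post[z]
        new_p_z.append(normalize(acc))
    return new_p_z, class_feature

def two_pass_plsa(counts, group_of, n_classes, n_iter=50, seed=0):
    """counts: object x feature counts; group_of: group id of each object (assumed given)."""
    rng = random.Random(seed)
    n_w = len(counts[0])
    # Pass 1: aggregate objects into groups, then train PLSA on the groups.
    grouped = [[0] * n_w for _ in range(max(group_of) + 1)]
    for row, g in zip(counts, group_of):
        for w, count in enumerate(row):
            grouped[g][w] += count
    p_z_g = [normalize([rng.random() for _ in range(n_classes)]) for _ in grouped]
    p_w_z = [normalize([rng.random() for _ in range(n_w)]) for _ in range(n_classes)]
    for _ in range(n_iter):
        p_z_g, class_feature = em_step(grouped, p_z_g, p_w_z)
        p_w_z = [normalize(r) for r in class_feature]
    # Pass 2 (assumed fold-in): keep P(w|z) fixed, estimate P(z|object) per object.
    p_z_obj = [[1.0 / n_classes] * n_classes for _ in counts]
    for _ in range(n_iter):
        p_z_obj, _ = em_step(counts, p_z_obj, p_w_z)
    # Most likely latent class for each object.
    best = [max(range(n_classes), key=p.__getitem__) for p in p_z_obj]
    return p_z_obj, best
```

The saving comes from the first pass: EM runs over one row per group rather than one row per object, and the per-object work in the second pass touches only each object's own observed features.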

    User segment suggestion for online advertising
    12.
    Granted patent
    User segment suggestion for online advertising (legal status: in force)

    Publication number: US07711735B2

    Publication date: 2010-05-04

    Application number: US11803503

    Filing date: 2007-05-15

    IPC classification: G06F17/30

    CPC classification: G06Q30/02 G06F17/30867

    Abstract: Described is a behavioral targeting technology for online advertising, by which an original attribute is uniformly expanded. Users that meet an original attribute are aggregated into a mid-result used to determine similarity relative to candidate attribute types. The most similar candidate attributes are selected for the expanded attribute. A URL/URL pattern suggestion technology is provided, with similarity computed from users/URLs visited by the users. URLs are separated into URL tree nodes, for calculating the number of users who have visited each URL and the number of users who have visited any URL on the sub-tree whose root is that node. URL/URL patterns are output based on similarity. Domains are also suggested based on user visits. Similarities between pairs of domains may be computed (e.g., offline), with an output for a given domain provided based on its similarity with each other domain.
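
The URL tree counting step mentioned in this abstract might look something like the sketch below. It assumes visits arrive as `(user, url)` pairs and that a URL is split into its host followed by its path segments; the class and function names are hypothetical, and the similarity measure itself is not shown because the abstract does not specify it.

```python
from urllib.parse import urlsplit

class UrlNode:
    """One node of the URL tree (a host or a path segment)."""
    def __init__(self):
        self.children = {}
        self.exact_users = set()    # users who visited exactly this URL
        self.subtree_users = set()  # users who visited any URL at or below this node

def split_url(url):
    """Split a URL into tree keys: host first, then non-empty path segments."""
    parts = urlsplit(url)
    return [parts.netloc] + [s for s in parts.path.split("/") if s]

def build_url_tree(visits):
    """visits: iterable of (user, url) pairs. Returns the root of the URL tree."""
    root = UrlNode()
    for user, url in visits:
        node = root
        node.subtree_users.add(user)
        for segment in split_url(url):
            node = node.children.setdefault(segment, UrlNode())
            node.subtree_users.add(user)
        node.exact_users.add(user)
    return root

def lookup(root, url):
    """Return the node for a URL or URL prefix that is already in the tree."""
    node = root
    for segment in split_url(url):
        node = node.children[segment]
    return node
```

With this layout, `len(node.exact_users)` gives the per-URL user count and `len(node.subtree_users)` gives the count for the URL pattern rooted at that node, so a pattern such as `example.com/a/*` can be scored without rescanning the visit log.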

    Block tracking mechanism for web personalization
    13.
    Patent application
    Block tracking mechanism for web personalization (legal status: in force)

    Publication number: US20080281834A1

    Publication date: 2008-11-13

    Application number: US11801404

    Filing date: 2007-05-09

    IPC classification: G06F17/30

    CPC classification: G06F17/30861

    Abstract: Described is a technology by which blocks of web pages may be selected, such as for building a user-personalized web page containing selected blocks. A selection mechanism, such as a browser toolbar add-on, provides a user interface for selecting blocks, and records information about selected blocks. A block tracking mechanism (e.g., a daemon program) uses the information to locate selected blocks of the web pages, including when the web page containing the block is updated with respect to content and/or layout. The block tracking mechanism may update a local gadget that, when invoked (such as by browsing to a particular web page), shows updated versions of the block on a personalized web page. Blocks may be efficiently located by processing trees representing web pages into reduced trees, and then by performing a minimum distance mapping algorithm on the reduced trees.

    Extracting semantic attributes
    14.
    Patent application
    Extracting semantic attributes (legal status: in force)

    Publication number: US20070239697A1

    Publication date: 2007-10-11

    Application number: US11392761

    Filing date: 2006-03-30

    IPC classification: G06F17/30

    Abstract: Extraction of semantic information and the generation of semantic attributes allow for improved organization and management of data. Semantic attributes are automatically generated and eliminate the need for manual entry of attribute information. A semantic file network may further be constructed based on similarities between files that are based on the semantic attribute information. Semantic links representing a semantic relationship may be built between similar or relevant files. In addition, user operations and user operation patterns may also be considered in building the file network. Semantic attributes and information may further facilitate browsing the file systems as well as improve the accuracy and speed of queries.

    Collaborative filtering using cluster-based smoothing
    15.
    Patent application
    Collaborative filtering using cluster-based smoothing (legal status: under examination, published)

    Publication number: US20070239553A1

    Publication date: 2007-10-11

    Application number: US11377130

    Filing date: 2006-03-16

    IPC classification: G06Q30/00

    Abstract: In an embodiment, a method of predicting an active user's rating for an item is disclosed. A database of users may be sorted into clusters. The data associated with the users in each cluster may be smoothed to fill in ratings for items that the users have not personally rated. An active user may then be compared to a set of users, where the set may be all or some portion of the database, to determine the K users that are most similar to the active user. The ratings of the K users regarding the item may be used to predict the active user's rating for the item. In an embodiment, the rating of each of the K users is assigned a confidence value that depends on whether the user personally rated the item or the rating was generated by the data smoothing process.
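
A rough sketch of the smoothing and prediction steps in this abstract is given below. Several details are assumptions made only for illustration: cluster assignments are taken as given (`cluster_of`), a missing rating is filled with the cluster's mean rating for that item, similarity is the inverse of the mean absolute rating difference, and smoothed ratings get a fixed confidence weight `smoothed_conf`. None of these choices is confirmed by the abstract.

```python
def smooth(ratings, cluster_of):
    """ratings: {user: {item: rating}}; cluster_of: {user: cluster_id}.
    Returns {user: {item: (rating, personally_rated)}} with gaps filled
    by the cluster mean for the item (assumed smoothing rule)."""
    sums = {}  # (cluster, item) -> [total, count]
    for user, items in ratings.items():
        for item, rating in items.items():
            acc = sums.setdefault((cluster_of[user], item), [0.0, 0])
            acc[0] += rating
            acc[1] += 1
    all_items = {item for items in ratings.values() for item in items}
    smoothed = {}
    for user, items in ratings.items():
        row = {}
        for item in all_items:
            key = (cluster_of[user], item)
            if item in items:
                row[item] = (items[item], True)
            elif key in sums:
                total, count = sums[key]
                row[item] = (total / count, False)
        smoothed[user] = row
    return smoothed

def predict(active, item, smoothed, k=2, smoothed_conf=0.5):
    """Predict the active user's rating for item from the K most similar users."""
    def similarity(user):
        shared = [i for i in active if i != item and i in smoothed[user]]
        if not shared:
            return 0.0
        diff = sum(abs(active[i] - smoothed[user][i][0]) for i in shared)
        return 1.0 / (1.0 + diff / len(shared))

    candidates = [(similarity(u), u) for u in smoothed if item in smoothed[u]]
    top_k = sorted(candidates, reverse=True)[:k]
    num = den = 0.0
    for sim, user in top_k:
        rating, personal = smoothed[user][item]
        weight = sim * (1.0 if personal else smoothed_conf)  # confidence value
        num += weight * rating
        den += weight
    return num / den if den else None
```

Carrying the `personally_rated` flag alongside each rating is what lets the prediction step discount values that came from smoothing rather than from the user.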

    Extracting semantic attributes
    16.
    Granted patent
    Extracting semantic attributes (legal status: in force)

    Publication number: US07502785B2

    Publication date: 2009-03-10

    Application number: US11392761

    Filing date: 2006-03-30

    IPC classification: G06F17/30

    Abstract: Extraction of semantic information and the generation of semantic attributes allow for improved organization and management of data. Semantic attributes are automatically generated and eliminate the need for manual entry of attribute information. A semantic file network may further be constructed based on similarities between files that are based on the semantic attribute information. Semantic links representing a semantic relationship may be built between similar or relevant files. In addition, user operations and user operation patterns may also be considered in building the file network. Semantic attributes and information may further facilitate browsing the file systems as well as improve the accuracy and speed of queries.

    Efficient Retrieval Algorithm by Query Term Discrimination
    17.
    Patent application
    Efficient Retrieval Algorithm by Query Term Discrimination (legal status: in force)

    Publication number: US20080215574A1

    Publication date: 2008-09-04

    Application number: US12038652

    Filing date: 2008-02-27

    IPC classification: G06F17/30

    CPC classification: G06F17/30675 G06Q10/10

    Abstract: An exemplary method for use in information retrieval includes, for each of a plurality of terms, selecting a predetermined number of top scoring documents for the term to form a corresponding document set for the term; receiving a plurality of terms, optionally as a query; ranking the plurality of terms for importance based at least in part on the document sets for the plurality of terms where the ranking comprises using an inverse document frequency algorithm; selecting a number of ranked terms based on importance where each selected, ranked term comprises its corresponding document set wherein each document in a respective document set comprises a document identification number; forming a union set based on the document sets associated with the selected number of ranked terms; and, for a document identification number in the union set, scanning a document set corresponding to an unselected term for a matching document identification number. Various other exemplary systems, methods, devices, etc. are also disclosed.
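
The steps listed in this abstract can be sketched as follows. The per-term scores, the document frequencies, and the rule of summing per-term scores into a final document score are assumptions for illustration; the abstract only specifies the top-scoring document sets, the IDF-based term ranking, the union set, and the scan of unselected terms' sets. Function and parameter names are hypothetical.

```python
import math

def build_top_lists(scores, n):
    """scores: {term: {doc_id: score}}. Keep only the n top scoring docs per term."""
    return {
        term: dict(sorted(docs.items(), key=lambda kv: (-kv[1], kv[0]))[:n])
        for term, docs in scores.items()
    }

def retrieve(query_terms, top_lists, doc_freq, n_docs, n_selected):
    """Rank query terms by IDF, build the union set from the most important
    terms, then look up matching doc ids in the unselected terms' sets."""
    def idf(term):
        return math.log(n_docs / doc_freq[term])

    ranked = sorted(query_terms, key=idf, reverse=True)
    selected, unselected = ranked[:n_selected], ranked[n_selected:]

    # Union of the document sets of the selected (most discriminating) terms.
    union = set()
    for term in selected:
        union.update(top_lists[term])

    # Assumed scoring rule: sum the per-term scores found for each doc id.
    totals = {}
    for doc_id in union:
        total = sum(top_lists[t].get(doc_id, 0.0) for t in selected)
        total += sum(top_lists[t].get(doc_id, 0.0) for t in unselected)
        totals[doc_id] = total
    return sorted(totals, key=lambda d: (-totals[d], d))
```

Because candidates come only from the selected terms' document sets, a frequent, low-IDF term never widens the candidate pool; it can only adjust the scores of documents that are already in the union.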

    INTERACTIVELY CRAWLING DATA RECORDS ON WEB PAGES
    18.
    Patent application
    INTERACTIVELY CRAWLING DATA RECORDS ON WEB PAGES (legal status: lapsed)

    Publication number: US20080016087A1

    Publication date: 2008-01-17

    Application number: US11456753

    Filing date: 2006-07-11

    IPC classification: G06F7/00

    Abstract: The invention provides a method of interactively crawling data records on a web page. Users may select various data records of interest on a web page to generate templates to search for similar data items on the same web page or on different web pages. A tree matching algorithm may be used to compare and extract data matching the generated template.
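
The abstract does not say which tree matching algorithm is used. As one possible illustration, the sketch below uses the well-known Simple Tree Matching dynamic program to compare a template tree with candidate record trees; the tuple-based tree representation, the normalization, and the 0.7 threshold are all assumptions made for this example.

```python
def simple_tree_matching(a, b):
    """a, b: trees as (tag, [children]) tuples.
    Returns the number of node pairs in a maximum order-preserving matching."""
    if a[0] != b[0]:
        return 0
    kids_a, kids_b = a[1], b[1]
    # m[i][j]: best matching between the first i children of a and first j of b.
    m = [[0] * (len(kids_b) + 1) for _ in range(len(kids_a) + 1)]
    for i in range(1, len(kids_a) + 1):
        for j in range(1, len(kids_b) + 1):
            m[i][j] = max(
                m[i - 1][j],
                m[i][j - 1],
                m[i - 1][j - 1] + simple_tree_matching(kids_a[i - 1], kids_b[j - 1]),
            )
    return m[len(kids_a)][len(kids_b)] + 1  # +1 for the matched roots

def tree_size(tree):
    """Number of nodes in a (tag, [children]) tree."""
    return 1 + sum(tree_size(child) for child in tree[1])

def matches_template(template, candidate, threshold=0.7):
    """Assumed rule: accept a candidate when the normalized match score is high enough."""
    score = simple_tree_matching(template, candidate)
    return score / max(tree_size(template), tree_size(candidate)) >= threshold
```

A template built from one user-selected record can then be run against sibling subtrees on the same page, or against subtrees of other pages, to collect records with a similar structure.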

    System and method for exploring a semantic file network
    20.
    Granted patent
    System and method for exploring a semantic file network (legal status: lapsed)

    Publication number: US07624130B2

    Publication date: 2009-11-24

    Application number: US11392640

    Filing date: 2006-03-30

    IPC classification: G06F17/30

    Abstract: Extraction of semantic information and the generation of semantic attributes allow for improved organization and management of data. Semantic attributes are automatically generated and eliminate the need for manual entry of attribute information. A semantic file network may further be constructed based on similarities between files that are based on the semantic attribute information. Semantic links representing a semantic relationship may be built between similar or relevant files. In addition, user operations and user operation patterns may also be considered in building the file network. Semantic attributes and information may further facilitate browsing the file systems as well as improve the accuracy and speed of queries.
