Identifying items that have experienced recent interest bursts
    1.
    Patent application
    Status: In force

    Publication No.: US20080155465A1

    Publication date: 2008-06-26

    Application No.: US11644848

    Filing date: 2006-12-21

    IPC classification: G06F17/30 G06F7/08 G06F3/048

    CPC classification: G06Q10/063 G06F17/30867

    Abstract: Techniques are described for identifying items that have recently undergone an interest burst. Items that have recently undergone an interest burst are identified by comparing how many interest-actions have been performed on the items during a current time window against how many interest-actions have been performed on the items historically. Various tests are performed to rule out candidates that are not likely to be of interest to other users. In addition, various spam detection techniques are described for reducing the possibility that the items that are listed as interest burst items are listed because of spam.
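
The comparison described in this abstract (current-window activity against historical activity) could be sketched roughly as follows. This is only an illustration, not the patented method: the function name `find_burst_items`, the ratio test, and the thresholds `min_actions` and `min_ratio` are all assumptions made for the example, and the spam checks the abstract mentions are not modeled.

```python
from collections import Counter


def find_burst_items(current_actions, historical_rates, min_actions=5, min_ratio=3.0):
    """Return (item, ratio) pairs for items whose current count far exceeds their history.

    current_actions: iterable of item ids, one entry per interest-action
        observed in the current time window.
    historical_rates: dict mapping item id -> average interest-actions per
        window in the past (hypothetical input format).
    """
    counts = Counter(current_actions)
    bursts = []
    for item, count in counts.items():
        if count < min_actions:
            # Rule out candidates with too little activity to be interesting.
            continue
        # Floor the baseline at 1.0 so new items do not divide by zero.
        baseline = max(historical_rates.get(item, 0.0), 1.0)
        ratio = count / baseline
        if ratio >= min_ratio:
            bursts.append((item, ratio))
    # Strongest bursts first.
    bursts.sort(key=lambda pair: pair[1], reverse=True)
    return bursts
```

With twelve current actions on an item that historically averaged two per window, the ratio is 6.0 and the item is reported; an item with six current actions against a historical average of five is not.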

    Identifying items that have experienced recent interest bursts
    2.
    Granted patent
    Status: In force

    Publication No.: US08046248B2

    Publication date: 2011-10-25

    Application No.: US11644848

    Filing date: 2006-12-21

    IPC classification: G06Q10/00

    CPC classification: G06Q10/063 G06F17/30867

    Abstract: Techniques are described for identifying items that have recently undergone an interest burst. Items that have recently undergone an interest burst are identified by comparing how many interest-actions have been performed on the items during a current time window against how many interest-actions have been performed on the items historically. Various tests are performed to rule out candidates that are not likely to be of interest to other users. In addition, various spam detection techniques are described for reducing the possibility that the items that are listed as interest burst items are listed because of spam.

    Efficient lexical trending topic detection over streams of data using a modified sequitur algorithm
    3.
    Granted patent
    Status: In force

    Publication No.: US08838599B2

    Publication date: 2014-09-16

    Application No.: US12780850

    Filing date: 2010-05-14

    IPC classification: G06F17/30

    CPC classification: G06F17/30616

    Abstract: Embodiments are directed towards a Modified Sequitur algorithm (MSA) using pipelining and indexed arrays to identify trending topics within a plurality of documents having user generated content (UGC). The documents are parallelized and distributed across a plurality of network devices, which place at least some of the received documents into a buffer; the MSA may then be applied to the documents within the buffer to identify n-grams or phrases within the documents' contents. The identified phrases are further analyzed to remove extraneous co-occurrences of phrases and/or words based on a part-of-speech analysis. A weighting of the remaining phrases is used to identify trending topic phrases. Links to content in the plurality of UGC documents that is associated with the trending topic phrases may then be displayed to a client device.

    System and method for automatically organizing bookmarks through the use of tag data
    4.
    Granted patent
    Status: In force

    Publication No.: US08010532B2

    Publication date: 2011-08-30

    Application No.: US11624072

    Filing date: 2007-01-17

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30884

    Abstract: The present invention is directed towards systems and methods for organization of bookmarks. The method according to one embodiment comprises retrieving one or more bookmarks associated with one or more content items, a given bookmark generated by a user of a client device, and identifying one or more tags associated with one or more uniform resource locators corresponding to the one or more bookmarks. A bookmark folder hierarchy is created through use of a clustering algorithm on the basis of the one or more tags associated with the one or more uniform resource locators corresponding to the one or more bookmarks.
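
The abstract does not say which clustering algorithm is used, so the following is only a rough sketch of the general idea of grouping bookmarks by tag overlap. It assumes a single-pass greedy clustering with Jaccard similarity and produces flat folders rather than a hierarchy; `cluster_bookmarks`, `jaccard`, and the `threshold` value are hypothetical names chosen for the example.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two tag sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_bookmarks(bookmark_tags, threshold=0.3):
    """Group URLs into folders by tag overlap.

    bookmark_tags: dict mapping URL -> set of tags (hypothetical input format).
    Returns a dict mapping folder name -> list of URLs, where each folder is
    named after the most frequent tag among its bookmarks.
    """
    clusters = []  # each cluster: {"urls": [...], "tags": Counter of tag counts}
    for url, tags in bookmark_tags.items():
        best, best_score = None, threshold
        for cluster in clusters:
            score = jaccard(tags, set(cluster["tags"]))
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            # No existing cluster is similar enough: start a new one.
            best = {"urls": [], "tags": Counter()}
            clusters.append(best)
        best["urls"].append(url)
        best["tags"].update(tags)

    folders = {}
    for cluster in clusters:
        name = cluster["tags"].most_common(1)[0][0] if cluster["tags"] else "untagged"
        folders.setdefault(name, []).extend(cluster["urls"])
    return folders
```

A hierarchical result could in principle be obtained by applying the same grouping again to the folders themselves, but that step is outside this sketch.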

    AUTOMATED SCREEN SCRAPING VIA GRAMMAR INDUCTION
    5.
    Patent application
    Status: In force

    Publication No.: US20100256974A1

    Publication date: 2010-10-07

    Application No.: US12417773

    Filing date: 2009-04-03

    IPC classification: G06F17/27

    CPC classification: G06F17/248 G06F17/2715

    Abstract: A method and a computer-readable medium are provided which perform screen scraping via grammar induction. The computer-readable medium stores instructions of the method, the instructions directing a computer processor to intercept display information transmitted to a computer-implemented display device representing information stored in a data source; induce a grammar via statistical analysis of the intercepted display information; provide the grammar to a parser-generator to generate a parser corresponding to the induced grammar; and perform screen scraping using the generated parser.

    Automated screen scraping via grammar induction
    6.
    Granted patent
    Status: In force

    Publication No.: US08838625B2

    Publication date: 2014-09-16

    Application No.: US12417773

    Filing date: 2009-04-03

    IPC classification: G06F17/30 G06F17/24 G06F17/27

    CPC classification: G06F17/248 G06F17/2715

    Abstract: A method and a computer-readable medium are provided which perform screen scraping via grammar induction. The computer-readable medium stores instructions of the method, the instructions directing a computer processor to intercept display information transmitted to a computer-implemented display device representing information stored in a data source; induce a grammar via statistical analysis of the intercepted display information; provide the grammar to a parser-generator to generate a parser corresponding to the induced grammar; and perform screen scraping using the generated parser.

    EFFICIENT LEXICAL TRENDING TOPIC DETECTION OVER STREAMS OF DATA USING A MODIFIED SEQUITUR ALGORITHM
    7.
    Patent application
    Status: In force

    Publication No.: US20110282874A1

    Publication date: 2011-11-17

    Application No.: US12780850

    Filing date: 2010-05-14

    IPC classification: G06F17/30

    CPC classification: G06F17/30616

    Abstract: Embodiments are directed towards a Modified Sequitur algorithm (MSA) using pipelining and indexed arrays to identify trending topics within a plurality of documents having user generated content (UGC). The documents are parallelized and distributed across a plurality of network devices, which place at least some of the received documents into a buffer; the MSA may then be applied to the documents within the buffer to identify n-grams or phrases within the documents' contents. The identified phrases are further analyzed to remove extraneous co-occurrences of phrases and/or words based on a part-of-speech analysis. A weighting of the remaining phrases is used to identify trending topic phrases. Links to content in the plurality of UGC documents that is associated with the trending topic phrases may then be displayed to a client device.

    Dynamic bloom filter for caching query results
    8.
    Granted patent
    Status: In force

    Publication No.: US07548908B2

    Publication date: 2009-06-16

    Application No.: US11475427

    Filing date: 2006-06-26

    IPC classification: G06F7/00

    Abstract: Methods, systems, and machine-readable media are disclosed for searching a corpus of information by utilizing a Bloom filter for caching query results. According to one aspect of the present invention, a method of caching information from a corpus of information can include populating one or more Bloom filters with a plurality of bits representative of information in the corpus of information. A search request can be received identifying requested information from the corpus of information. One or more bits in the filter(s) associated with the requested information can be checked and the requested information can be retrieved from the corpus of information based on results of said checking. Furthermore, the filter(s) can be used to determine which information to make available to a particular user in a system where certain information is associated with or access is limited to certain users or groups of users.
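
To illustrate the check-before-retrieve flow this abstract describes, here is a minimal sketch of a standard (fixed-size) Bloom filter placed in front of a lookup. It is not the dynamic filter of the patent; the class `BloomFilter`, the helper `search`, the SHA-256-based hashing, and the dict standing in for the corpus are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def search(query, bloom, corpus):
    """Consult the filter first; only touch the corpus when the bits are set."""
    if not bloom.might_contain(query):
        return None  # definitely absent, so the expensive lookup is skipped
    # Possibly present: a false positive simply yields None from the lookup.
    return corpus.get(query)
```

Because every added key sets all of its bits, a key that was added is always reported as possibly present; the saving comes from queries whose bits are not all set, which never reach the corpus.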

    ENTROPY-BASED MIXING AND PERSONALIZATION
    10.
    Patent application
    Status: In force

    Publication No.: US20110010371A1

    Publication date: 2011-01-13

    Application No.: US12499040

    Filing date: 2009-07-07

    IPC classification: G06F17/30 G06Q99/00

    Abstract: Techniques are provided for selecting a diverse mix of content items that may be displayed to a user. Content items such as user-generated events are received from a variety of sources. One or more content items are added to a set of content items based on a diversity of characteristics. The diversity of characteristics for the one or more content items may be calculated by measuring a diversity of characteristics of the set as if the one or more content items were added to the set. Content items that produce a greater diversity are selected for addition to the set. The set is displayed to the user, who is provided with a more meaningful mix of content due to the greater diversity in content.
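
The selection step in this abstract (measure the set's diversity as if a candidate were added, then keep the candidate that yields the most diversity) could be sketched as a greedy loop. The abstract does not define the diversity measure; given the title, this example assumes Shannon entropy over a single characteristic per item. The names `entropy` and `select_diverse` and the `(item_id, characteristic)` input format are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the distribution of labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_diverse(candidates, k):
    """Greedily pick up to k items that maximize the entropy of the selected set.

    candidates: list of (item_id, characteristic) pairs, e.g. the source or
        type of a user-generated event.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        # Score each candidate by the set's entropy as if it were already added.
        best = max(
            remaining,
            key=lambda cand: entropy([c for _, c in selected] + [cand[1]]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

Given two photo events, one blog event, and one music event with k = 3, the loop picks one item of each type, because adding a second photo would raise the entropy less than adding an unseen type. Ties are broken by input order.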
