Collaborative team crawling: Large scale information gathering over the internet
    1.
    Granted invention patent
    Status: Lapsed

    Publication No.: US06182085B2

    Publication date: 2001-01-30

    Application No.: US09086379

    Filing date: 1998-05-28

    IPC classification: G06F17/30

    Abstract: A distributed collection of web crawlers gathers information over a large portion of cyberspace. The crawlers share the overall crawling workload through a cyberspace partition scheme. They also collaborate with each other through load balancing to make maximal use of the computing resources of each crawler. The invention takes advantage of the hierarchical nature of the cyberspace namespace and uses the syntactic components of the URL structure as the main vehicle for dividing the crawling workload and assigning it to individual crawlers. The partition scheme is completely distributed: each crawler makes partitioning decisions based on its own crawling status and a globally replicated partition-tree data structure.

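The abstract does not spell out the data structure, so the following is only a minimal Python sketch under stated assumptions: URLs are split into reversed host labels followed by path segments, the hypothetical `PartitionTree` class plays the role of the replicated tree, and a URL belongs to the crawler named at the deepest assigned node along its component path. None of these names come from the patent.

```python
from urllib.parse import urlsplit


def url_components(url):
    """Hierarchical components of a URL: reversed host labels, then path segments."""
    parts = urlsplit(url)
    host_labels = (parts.hostname or "").split(".")
    path_segments = [seg for seg in parts.path.split("/") if seg]
    return list(reversed(host_labels)) + path_segments


class PartitionTree:
    """Hypothetical replicated partition tree; a node may name the crawler owning its subtree."""

    def __init__(self, owner=None):
        self.owner = owner
        self.children = {}

    def assign(self, prefix, owner):
        """Give the subtree rooted at `prefix` (a list of URL components) to `owner`."""
        node = self
        for comp in prefix:
            node = node.children.setdefault(comp, PartitionTree())
        node.owner = owner

    def owner_of(self, url):
        """Walk the URL's components; the deepest node carrying an owner decides."""
        node, owner = self, self.owner
        for comp in url_components(url):
            node = node.children.get(comp)
            if node is None:
                break
            if node.owner is not None:
                owner = node.owner
        return owner


# Crawler 0 owns everything by default; subtrees are handed off to peers.
tree = PartitionTree(owner=0)
tree.assign(["edu"], 1)
tree.assign(["com", "example"], 2)
print(tree.owner_of("http://www.example.com/news/today"))  # 2
print(tree.owner_of("http://cs.mit.edu/"))                 # 1
print(tree.owner_of("http://example.org/"))                # 0
```

Under this reading, load balancing would mean an overloaded crawler calling `assign` to hand one of its child prefixes to a less-loaded peer and then propagating the change to every replica; the replication protocol is not modeled here.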

    Outbound information analysis for generating user interest profiles and improving user productivity
    2.
    Granted invention patent
    Status: Lapsed

    Publication No.: US06654735B1

    Publication date: 2003-11-25

    Application No.: US09227225

    Filing date: 1999-01-08

    IPC classification: G06F17/30

    Abstract: A system for automatically generating user interest profiles and delivering information to users learns a user's interests by monitoring the user's outbound communication streams, i.e., the information that the user produces either by typing (e.g., while a user is composing an e-mail message or editing a word processor document) or by speaking (e.g., while a user is engaged in a phone conversation or listening to a lecture). The system uses the monitored text to build (and possibly update) a user interest profile. The profile is constructed from current text generated by the user, so that the retrieved information reflects present user interests. In addition, the profile may also retain past user interests, so that the profile reflects a combination of past and present user interests. The system then automatically queries diverse databases for information relevant to the interest profile. The databases may include internet web pages, files stored on the user's local network, and other local or remote data repositories. The queries may use a combination of internet search engines, the specific selection of which may depend upon the nature and/or content of the queries. The information retrieved in response to the queries is then presented to the user. The retrieved information may contain, for example, answers to questions that the user might ask and/or data related to the user's current and continuing interests. Because a user's current speech or typed text is highly correlated with the user's current interests, the retrieved information will be relevant to the user's actual interests. The communication stream monitoring, interest profile building, database querying, and presentation of retrieved information are all performed automatically, in real time, and in the background of current user activities.

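The abstract does not say how the monitored text becomes a profile. One plausible reading, sketched here in Python, is a weighted bag of words in which older interests fade by a fixed factor each time new text arrives. The stop-word list, the `keep` blending factor, and the helper names are all assumptions, and the step that sends the query to databases and search engines is left out.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real systems use larger ones).
STOPWORDS = {"the", "and", "for", "with", "that", "this", "are", "was", "you", "our"}


def extract_terms(text):
    """Count content words in a chunk of typed or transcribed text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)


def update_profile(profile, new_text, keep=0.5):
    """Blend past interests (scaled down by `keep`) with terms from the current text."""
    updated = Counter({term: keep * weight for term, weight in profile.items()})
    updated.update(extract_terms(new_text))
    return updated


def to_query(profile, k=3):
    """Top-k weighted terms, joined as a keyword query for a search engine."""
    return " ".join(term for term, _ in profile.most_common(k))


profile = update_profile(Counter(), "Drafting the patent claims and the patent abstract for the patent office")
profile = update_profile(profile, "crawler crawler crawler crawler design")
print(to_query(profile, k=2))  # crawler patent
```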

    Automatic user interest profile generation from structured document access information
    3.
    Granted invention patent
    Status: In force

    Publication No.: US06385619B1

    Publication date: 2002-05-07

    Application No.: US09227117

    Filing date: 1999-01-08

    IPC classification: G06F17/30

    Abstract: A system generates user interest profiles by monitoring and analyzing a user's access to a variety of hierarchical levels within a set of structured documents, e.g., documents available at a web site. Each information document has parts associated with it, and the documents are classified into categories using a known taxonomy. The user interest profiles are generated automatically based on the type of content viewed by the user, which is determined by the text within the parts of the documents viewed and by the classifications of those documents. The profiles are also generated based on other factors, including the frequency and currency (recency) of visits to documents having a given classification and/or the hierarchical depth of the levels or parts of the documents viewed. User profiles include an interest category code and an interest score that indicates the level of interest in a particular category. The profiles are updated automatically to accurately reflect an individual's current interests as well as past interests, with a time-dependent decay factor applied to the past interests. The system presents the user with documents, or references to documents, that match the current profile.

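One way to read the scoring described above, sketched in Python: every visit to a document in a taxonomy category adds to that category's interest score, weighted up for deeper hierarchy levels and weighted down by an exponential time decay. The half-life, the depth weight, the `(category_code, visit_time, depth)` record format, and the function name are assumptions chosen for illustration, not values taken from the patent.

```python
from collections import defaultdict


def interest_scores(visits, now, half_life=30.0, depth_weight=0.5):
    """Score each taxonomy category from (category_code, visit_time, depth) records.

    Each visit counts once (frequency), is discounted by an exponential
    time-dependent decay (currency), and is boosted when the user drilled
    deeper into the document hierarchy (depth).
    """
    scores = defaultdict(float)
    for category, t, depth in visits:
        decay = 0.5 ** ((now - t) / half_life)
        scores[category] += decay * (1.0 + depth_weight * depth)
    return dict(scores)


visits = [
    ("C1", 100.0, 0),  # top-level page in category C1, viewed today
    ("C1", 70.0, 0),   # same category, viewed one half-life ago
    ("C2", 100.0, 2),  # a page two levels deep in category C2, viewed today
]
scores = interest_scores(visits, now=100.0)
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))  # [('C2', 2.0), ('C1', 1.5)]
```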

    Method and apparatus for parallel profile matching in a large scale webcasting system
    4.
    Granted invention patent
    Status: Lapsed

    Publication No.: US06169989A

    Publication date: 2001-01-02

    Application No.: US09082747

    Filing date: 1998-05-21

    IPC classification: G06F17/00

    Abstract: A method and apparatus for efficiently matching a large collection of user profiles against a large volume of data in a webcasting system. In one embodiment, the invention parallelizes profile matching in four steps. First, an initial profile set is partitioned into several subsets, also referred to as sub-partitions, using various heuristic methods. Second, each sub-partition is mapped onto one or more independent processing units. The processing units need not have equal processing performance; however, for best results, one embodiment maps the subset with the highest cost to the fastest processor, the subset with the next highest cost to the next fastest processor, and so on. Where appropriate, the invention evaluates the relative subset processing speed of each processor and adjusts future subset mapping based on these evaluations. The third and fourth steps are executed for each information item I that needs to be matched against a profile predicate: the third step broadcasts I to all processing units, and the fourth step performs a sequential profile match on I.

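A minimal Python sketch of the four steps, assuming profiles are plain keywords whose matching cost is their length, and using a greedy "costliest profile into the least-loaded subset" rule as the partitioning heuristic (the abstract only says "various heuristic methods"). Threads stand in for the independent processing units, the adaptive re-mapping based on measured processor speed is not shown, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_profiles(profiles, n_subsets, cost):
    """Step 1 (greedy heuristic): costliest profile first, into the currently cheapest subset."""
    subsets = [[] for _ in range(n_subsets)]
    loads = [0.0] * n_subsets
    for profile in sorted(profiles, key=cost, reverse=True):
        i = loads.index(min(loads))
        subsets[i].append(profile)
        loads[i] += cost(profile)
    return subsets, loads


def map_to_processors(subsets, loads, speeds):
    """Step 2: costliest subset -> fastest processor, next costliest -> next fastest, ..."""
    by_cost = sorted(range(len(subsets)), key=lambda i: loads[i], reverse=True)
    by_speed = sorted(range(len(speeds)), key=lambda j: speeds[j], reverse=True)
    return {proc: subsets[s] for s, proc in zip(by_cost, by_speed)}


def match_item(item, assignment, matches):
    """Steps 3-4: broadcast `item` to every unit; each scans its own subset sequentially."""
    def scan(subset):
        return [p for p in subset if matches(p, item)]

    with ThreadPoolExecutor(max_workers=len(assignment)) as pool:
        return [p for hits in pool.map(scan, assignment.values()) for p in hits]


# Toy profiles are keywords; matching cost is taken to be keyword length.
profiles = ["jazz", "piano", "rock", "blues", "opera"]
subsets, loads = partition_profiles(profiles, 2, cost=len)
assignment = map_to_processors(subsets, loads, speeds=[1.0, 2.0])
print(match_item("smooth jazz piano", assignment, lambda p, item: p in item))  # ['jazz', 'piano']
```

For CPU-bound matching in CPython, a process pool or separate machines would give real parallelism; threads simply keep the sketch short and self-contained.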

    Efficient large-scale access control for internet/intranet information systems
    5.
    Granted invention patent
    Status: Lapsed

    Publication No.: US06219667B1

    Publication date: 2001-04-17

    Application No.: US09086272

    Filing date: 1998-05-28

    Applicants: Qi Lu; Shang-Hua Teng

    Inventors: Qi Lu; Shang-Hua Teng

    IPC classification: G06F17/00

    Abstract: An efficient method and apparatus for regulating access to information objects stored in a database that has a large number of users and access groups. The invention represents a hierarchical access-group structure as intervals over a set of integers, and uses a decomposition scheme that reduces any group structure to ones that have an interval representation. This representation reduces the problem of checking access rights to an interval containment problem. An interval tree, a popular data structure in computational geometry, may be used to execute the access-right checking method efficiently.

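A minimal Python sketch of the interval idea for the special case where groups form a tree: a depth-first numbering gives each group an integer interval that contains exactly the intervals of its subgroups, so checking whether a user's group is covered by a grant becomes an interval containment check. It assumes that a grant to a group also covers all of its subgroups, and it swaps the interval tree mentioned above for a sorted list plus binary search, which is enough here because intervals from a tree are always nested or disjoint. The patent's decomposition scheme for non-tree group structures is not shown, and the function names are hypothetical.

```python
from bisect import bisect_right


def number_groups(children, root):
    """DFS preorder numbering: group g gets [start, end] covering exactly its subgroups."""
    intervals, counter = {}, 0

    def visit(group):
        nonlocal counter
        start = counter
        counter += 1
        for sub in children.get(group, []):
            visit(sub)
        intervals[group] = (start, counter - 1)

    visit(root)
    return intervals


def compile_acl(granted_groups, intervals):
    """Drop intervals nested inside another; what is left is sorted and pairwise disjoint."""
    acl = []
    for start, end in sorted(intervals[g] for g in granted_groups):
        if acl and start <= acl[-1][1]:
            continue  # nested in the previous maximal interval (tree intervals never partially overlap)
        acl.append((start, end))
    return acl


def has_access(user_group, acl, intervals):
    """Stabbing query: is the group's start number inside some granted interval?"""
    point = intervals[user_group][0]
    i = bisect_right(acl, (point, float("inf"))) - 1
    return i >= 0 and acl[i][0] <= point <= acl[i][1]


children = {"all": ["eng", "sales"], "eng": ["db", "web"]}
intervals = number_groups(children, "all")
acl = compile_acl(["eng"], intervals)  # access granted to "eng" and hence its subgroups
print(has_access("db", acl, intervals), has_access("sales", acl, intervals))  # True False
```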

    Graphical user interface to query music by examples
    6.
    Granted invention patent
    Status: Lapsed

    Publication No.: US06674452B1

    Publication date: 2004-01-06

    Application No.: US09543218

    Filing date: 2000-04-05

    IPC classification: G06F17/30

    Abstract: According to the invention, a music search system includes a music player, a music analyzer, a search engine, and a sophisticated user interface that enables users to visually build complex query profiles from the structural information of one or more musical pieces. These query profiles are used to search for musical pieces that match the structural information they contain. The system lets the user supply an existing piece of music, or some of its components, as query arguments, and lets the music search engine find music that is similar to the given sample according to some similarity measure.

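The abstract leaves the similarity measure open. As a hedged illustration only, this Python sketch assumes each piece (or selected component) has already been reduced by the music analyzer to a numeric feature vector, averages the vectors of the user's examples into a query profile, and ranks the library by cosine similarity. The feature values, the averaging rule, and all names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def make_query_profile(examples):
    """Average the feature vectors of one or more example pieces (or excerpts)."""
    return [sum(column) / len(examples) for column in zip(*examples)]


def search(query, library, k=3):
    """Rank library pieces by similarity to the query profile and return the top k names."""
    return sorted(library, key=lambda name: cosine(query, library[name]), reverse=True)[:k]


# Hypothetical 3-number feature vectors produced by some music analyzer.
library = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [1.0, 1.0, 0.0]}
query = make_query_profile([[1.0, 0.0, 0.0], [1.0, 0.2, 0.0]])
print(search(query, library, k=2))  # ['a', 'c']
```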

    Local computation of rank contributions
    7.
    Granted invention patent
    Status: In force

    Publication No.: US08438189B2

    Publication date: 2013-05-07

    Application No.: US12124239

    Filing date: 2008-05-21

    IPC classification: G06F7/00; G06F17/30

    CPC classification: G06F17/30864

    Abstract: The claimed subject matter relates to an architecture that can identify, store, and/or output local contributions to the rank of a vertex in a directed graph. The architecture can receive a directed graph and a parameter, and examine a local subset of vertices (e.g., vertices local to a given vertex) in order to determine a local supporting set. The local supporting set can include a local set of vertices, each of which contributes at least a minimum fraction, specified by the parameter, of the vertex's rank. The local supporting set can serve as the basis for estimating the supporting set and/or rank of the vertex over the entire graph, and can be employed to detect link spam or web spam, as well as in other influence-based social network applications.

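The abstract does not give the algorithm. One standard way to compute such contributions locally is a reverse "push" approximation of personalized PageRank, sketched here in Python under these assumptions: `alpha` is the teleport probability, `eps` is the residual threshold that keeps the computation local, and the abstract's parameter is read as `delta`, the minimum fraction a vertex must contribute to join the supporting set. The total used for that fraction is taken over the explored vertices only, so the result is a local estimate. Function names are hypothetical.

```python
from collections import defaultdict, deque


def approx_contributions(out_edges, target, alpha=0.15, eps=1e-7):
    """Reverse local push.

    p[u] under-approximates the personalized-PageRank contribution of u to
    `target` (teleport probability alpha); only vertices reachable backwards
    from `target` whose residual exceeds eps are ever processed.
    """
    in_edges = defaultdict(list)
    for u, succs in out_edges.items():
        for w in succs:
            in_edges[w].append(u)
    p, r = defaultdict(float), defaultdict(float)
    r[target] = 1.0
    queue = deque([target])
    while queue:
        x = queue.popleft()
        rx = r[x]
        if rx <= eps:
            continue
        r[x] = 0.0
        p[x] += alpha * rx
        for u in in_edges[x]:
            before = r[u]
            r[u] += (1 - alpha) * rx / len(out_edges[u])
            if before <= eps < r[u]:
                queue.append(u)  # u's residual just crossed the threshold
    return dict(p)


def supporting_set(contrib, delta):
    """Vertices whose estimated contribution is at least a delta fraction of the total."""
    total = sum(contrib.values())
    return {u for u, c in contrib.items() if c >= delta * total}


graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["b"]}
contrib = approx_contributions(graph, "c")
print(sorted(supporting_set(contrib, delta=0.22)))  # ['a', 'b', 'c']
```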

    Generating local addresses and communication sets for data-parallel programs
    10.
    Granted invention patent
    Status: Lapsed

    Publication No.: US5450313A

    Publication date: 1995-09-12

    Application No.: US217404

    Filing date: 1994-03-24

    IPC classification: G06F9/45; G06F15/16

    CPC classification: G06F8/447; G06F8/45

    Abstract: An optimizing compilation process generates executable code that defines the computation and communication actions to be taken by each individual processor of a distributed-memory parallel computer running a program written in a data-parallel language. To this end, local memory layouts of the one-dimensional and multidimensional arrays used in the program are derived from one-level and two-level data mappings consisting of alignment and distribution, so that array elements are laid out in canonical order and local memory space is conserved. Executable code is then generated that produces, at program run time, a set of tables for each individual processor for each computation requiring access to a regular section of an array. The entries of these tables specify the spacing between successive elements of the regular section that reside in the processor's local memory, so that all the elements of the regular section can be located in a single pass through local memory using the tables. Further executable code is generated that produces, at program run time, another set of tables for each individual processor for each communication action requiring a given processor to transfer array data to another processor. The entries of these tables specify the destination processor to which the array data must be transferred and the location in the destination processor's local memory at which the data must be stored, so that all of the array data can be located in a single pass through local memory using these communication tables. Finally, executable node code is generated for each individual processor; at program run time, it uses the foregoing tables to perform the necessary computation and communication actions on each processor of the parallel computer.

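To make the tables concrete, here is a brute-force Python sketch for a one-level cyclic(k) distribution over p processors, with zero-based indices and an inclusive upper bound (all assumptions). It enumerates the section element by element, so it runs in time proportional to the section length rather than constructing the tables the way the patented method does; alignment and two-level mappings are omitted, and the function names are hypothetical.

```python
def owner_and_local(i, p, k):
    """cyclic(k) layout over p processors: (owning processor, local address) of global index i."""
    block, offset = divmod(i, k)
    return block % p, (block // p) * k + offset


def local_addresses(lo, hi, stride, p, k, proc):
    """Local addresses, in order, of the elements of section lo:hi:stride stored on `proc`."""
    result = []
    for i in range(lo, hi + 1, stride):
        owner, local = owner_and_local(i, p, k)
        if owner == proc:
            result.append(local)
    return result


def gap_table(addresses):
    """Spacing between successive local elements: enough to walk local memory in one pass."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


def comm_table(lo, hi, stride, src, dst, proc):
    """For section elements held by `proc` under layout src=(p, k), give
    (destination processor, destination local address) under layout dst=(p, k)."""
    table = []
    for i in range(lo, hi + 1, stride):
        if owner_and_local(i, *src)[0] == proc:
            table.append(owner_and_local(i, *dst))
    return table


# A(0:15:3) on 2 processors with block size 2: processor 0 holds global 0, 9, 12.
addrs = local_addresses(0, 15, 3, p=2, k=2, proc=0)
print(addrs, gap_table(addrs))                               # [0, 5, 6] [5, 1]
print(comm_table(0, 15, 3, src=(2, 2), dst=(2, 1), proc=0))  # [(0, 0), (1, 4), (0, 6)]
```

Because a global index and that index plus lcm(stride, p·k) have the same owner and the same offset within their block, the gap pattern on each processor is periodic, so a real implementation would only need to tabulate one period.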