Method and apparatus for finding mirrored hosts by analyzing connectivity and IP addresses
    1.
    Granted invention patent (status: in force)

    Publication No.: US06487555B1

    Publication date: 2002-11-26

    Application No.: US09307153

    Filing date: 1999-05-07

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: A method and system that detects mirrored host pairs using information about a large set of pages, including one or more of: URLs, IP addresses, and connectivity information. The identities of the detected mirrored hosts are then saved so that browsers, crawlers, proxy servers, or the like can correctly identify mirrored web sites. The described embodiments of the present invention use one or a combination of techniques to identify mirrors. A first group of techniques involves determining mirrors based on URLs and information about connectivity (i.e., hyperlinks) between pages. A second group of techniques looks at connectivity information at a higher granularity, considering all links from all pages on a host as one group and ignoring the target of each link beyond the host level.
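
The second group of techniques in the abstract above (pooling all links from a host and keeping only the target host of each link) can be illustrated with a minimal Python sketch. The function names, the use of Jaccard similarity, and the 0.8 threshold are assumptions made for illustration; the abstract does not specify a similarity measure.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_link_profile(links):
    """Map each source host to the set of hosts its pages link to.

    `links` is an iterable of (source_url, target_url) pairs. Everything
    beyond the host level of the target is ignored.
    """
    profile = defaultdict(set)
    for src, dst in links:
        src_host = urlsplit(src).hostname
        dst_host = urlsplit(dst).hostname
        if src_host and dst_host and src_host != dst_host:
            profile[src_host].add(dst_host)
    return profile


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed measure, not the patent's)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_mirrors(links, threshold=0.8):
    """Return host pairs whose host-level link profiles are similar.

    The threshold is a hypothetical tuning parameter.
    """
    profile = host_link_profile(links)
    hosts = sorted(profile)
    pairs = []
    for i, h1 in enumerate(hosts):
        for h2 in hosts[i + 1:]:
            if jaccard(profile[h1], profile[h2]) >= threshold:
                pairs.append((h1, h2))
    return pairs
```

This pairwise comparison is quadratic in the number of hosts, so it is only suitable for small examples; a large crawl would need some form of candidate pruning first.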

    Method for identifying near duplicate pages in a hyperlinked database
    4.
    Granted invention patent (status: in force)

    Publication No.: US6138113A

    Publication date: 2000-10-24

    Application No.: US131469

    Filing date: 1998-08-10

    IPC classification: G06F17/30

    Abstract: A method is described for identifying pages that are near duplicates in a linked database. In the linked database, pages can have incoming links and outgoing links. Two pages are selected, a first page and a second page. For each selected page, the number of outgoing links is determined. The two pages are marked as near duplicates based on the number of common outgoing links for the two pages.
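
A minimal Python sketch of the comparison described in the abstract above follows. The abstract only says the decision is based on the number of common outgoing links; the specific rule used here (common links as a fraction of the larger out-degree), the function name, and both parameters are assumptions for illustration.

```python
def near_duplicates(out_a, out_b, min_fraction=0.9, min_links=3):
    """Decide whether two pages look like near duplicates.

    `out_a` and `out_b` are the outgoing link targets of each page.
    `min_fraction` and `min_links` are hypothetical thresholds: pages with
    very few links are not compared, and the common links must cover at
    least `min_fraction` of the larger link set.
    """
    a, b = set(out_a), set(out_b)
    if min(len(a), len(b)) < min_links:
        return False
    common = len(a & b)
    return common >= min_fraction * max(len(a), len(b))
```

Requiring a minimum number of links avoids marking two pages as near duplicates merely because each links to the same single popular target.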

    Method and apparatus for preventing topic drift in queries in hyperlinked environments
    5.
    Granted invention patent (status: in force)

    Publication No.: US06321220B1

    Publication date: 2001-11-20

    Application No.: US09207215

    Filing date: 1998-12-07

    IPC classification: G06F17/30

    Abstract: A method and apparatus for preventing topic drift in queries in hyperlinked environments uses equivalence components for ranking pages containing information that is relevant to the topic of a user query input to a search engine. The method includes the step of providing a query to a search engine, where the query represents a predetermined topic; retrieving at least one page associated with the query; constructing a graph representing the pages in memory; creating at least one equivalence component representing a subset of the graph; processing each equivalence component; eliminating the equivalence component in accordance with whether it matches the predetermined topic; and ranking the remaining pages.
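
The sequence of steps in the abstract above (build a graph, split it into components, drop off-topic components, rank what remains) can be sketched in Python. The abstract does not define "equivalence component", so this sketch substitutes ordinary connected components; the term-overlap topic test, the `min_share` parameter, and the in-degree ranking are likewise assumptions, not the patented method.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Connected components of the undirected graph restricted to `nodes`."""
    adj = defaultdict(set)
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            cur = stack.pop()
            comp.add(cur)
            for nb in adj[cur]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(comp)
    return components


def rank_on_topic(pages, edges, query_terms, min_share=0.5):
    """Drop components that do not match the query topic, then rank the rest.

    `pages` maps a page id to its text. A component is kept when at least
    `min_share` of its pages contain a query term (a hypothetical test).
    Remaining pages are ordered by in-degree among kept pages, then by id.
    """
    terms = {t.lower() for t in query_terms}
    kept = set()
    for comp in connected_components(pages, edges):
        on_topic = sum(1 for p in comp if terms & set(pages[p].lower().split()))
        if on_topic / len(comp) >= min_share:
            kept |= comp
    indegree = {p: 0 for p in kept}
    for u, v in edges:
        if u in kept and v in kept:
            indegree[v] += 1
    return sorted(kept, key=lambda p: (-indegree[p], p))
```

For example, with the query term "car", a component of pages about jaguar cars would be kept while a separate component about jaguar cats would be eliminated before ranking.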

    System and method for impromptu shared communication spaces
    6.
    Granted invention patent (status: in force)

    Publication No.: US09425971B1

    Publication date: 2016-08-23

    Application No.: US13616467

    Filing date: 2012-09-14

    IPC classification: G06F17/30 H04L12/18 G06F15/16

    Abstract: Communications between entities who may share common interests. For entities determined to be sharing common interests (e.g., searching using the same terms or topics, browsing a page, a site or a group of topically related sites), options for communication among the entities are provided. For example, a chat room may be dynamically created for persons who are currently searching or browsing the same or related information. As another example, a "homepage" may be created for each query and contain various types of information related to the query. A permission module controls which entities may participate, what types of information (and from what sources) an entity can (or desires to) receive, and what types of information the entity may (or desires to) share.
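
As a rough illustration of the chat-room example in the abstract above, the following Python sketch groups users who issue the same query and applies a simple opt-in check. The class name, the query normalization, and the opt-in model are all hypothetical; the abstract's permission module covers far more than this single flag.

```python
from collections import defaultdict


class ImpromptuSpaces:
    """Toy registry that forms a shared space per normalized query."""

    def __init__(self):
        self._rooms = defaultdict(set)  # normalized query -> participating users
        self._opted_in = set()          # users who allow participation

    def allow(self, user):
        """Record that `user` permits being placed in shared spaces."""
        self._opted_in.add(user)

    @staticmethod
    def _key(query):
        # Assumed normalization: case-insensitive, word order ignored.
        return " ".join(sorted(query.lower().split()))

    def search(self, user, query):
        """Register a search and return the peers already in that space.

        Users who have not opted in neither join nor see the space.
        """
        if user not in self._opted_in:
            return set()
        key = self._key(query)
        peers = set(self._rooms[key])
        self._rooms[key].add(user)
        return peers
```

A user who has not opted in receives an empty set and is never added to a room, which is one simple way to keep participation under the user's control.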

    System and Method for Analyzing Data Records
    9.
    Invention patent application (status: in force)

    Publication No.: US20120215787A1

    Publication date: 2012-08-23

    Application No.: US13407632

    Filing date: 2012-02-28

    IPC classification: G06F17/30

    Abstract: A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data.
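
The flow in the abstract above (allocate record groups to parallel workers, apply a query per record, emit values into an intermediate structure, then aggregate) can be sketched in Python. This is an illustration under stated assumptions: threads stand in for the parallel processes, a `Counter` stands in for the intermediate data structure, and the word-extracting `query` is a made-up example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def query(record):
    """Produce zero or more values from one record.

    Hypothetical query: the lowercase words longer than three characters.
    """
    return [w for w in record.lower().split() if len(w) > 3]


def process_group(records):
    """Apply the query to each record and emit values into a per-worker table."""
    table = Counter()  # intermediate data structure for this worker
    for record in records:
        for value in query(record):
            table[value] += 1  # "emit" the value into a sum table
    return table


def analyze(records, workers=2):
    """Split records into groups, process them in parallel, and aggregate."""
    groups = [records[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_group, groups))
    total = Counter()
    for partial in partials:
        total.update(partial)  # aggregate the intermediate tables
    return total
```

Because each worker writes only to its own table and the tables are combined afterwards, no locking is needed during the per-record phase.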
