Providing an interface to browse links or redirects to a particular webpage
    1.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08554869B2

    Publication date: 2013-10-08

    Application No.: US11498540

    Filing date: 2006-08-02

    IPC classification: G06F15/16 G06F7/00

    CPC classification: G06F17/30873

    Abstract: Disclosed herein is a technique for providing an interface that allows a user to navigate backwards through linked webpages. Initially, a request to display inlinks of linking webpages that contain a link to a particular webpage is received. In response to the request, a new page that contains a set of inlinks that correspond to a set of linking webpages that each contain a link to the particular webpage is provided. Each of the inlinks may be associated with a particular clickable item. An indication of a selection of a clickable item associated with a particular inlink is received. In response, a second new page that contains a second set of inlinks that correspond to a second set of linking webpages that each contain a link to the webpage that corresponds to the particular inlink is provided. Some of the displayed inlinks may correspond to webpages that redirect to the particular webpage.
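
The backwards navigation described in this abstract can be pictured as a lookup in a reverse link index. What follows is a minimal Python sketch under stated assumptions, not the patented implementation: the link graph `OUTLINKS`, the example URLs, and the helper names `build_inlink_index` and `inlink_page` are all hypothetical, and redirects are treated the same way as ordinary links.

```python
from collections import defaultdict


def build_inlink_index(outlinks):
    """Invert a page -> outgoing-links map into a page -> inlinks map."""
    inlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            inlinks[target].add(source)
    return inlinks


def inlink_page(inlinks, url):
    """Return the sorted inlinks that the 'new page' for `url` would list."""
    return sorted(inlinks.get(url, ()))


# Hypothetical link graph: each key links (or redirects) to the listed URLs.
OUTLINKS = {
    "a.example/home": ["b.example/post"],
    "c.example/old": ["b.example/post"],  # could equally be a redirect
    "d.example/blog": ["a.example/home"],
}

index = build_inlink_index(OUTLINKS)
first_page = inlink_page(index, "b.example/post")
# Simulate the user selecting the first clickable inlink on that page.
second_page = inlink_page(index, first_page[0])
```

Selecting an inlink simply repeats the lookup with that inlink's URL, which is what would let a user walk backwards through the graph one page at a time.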

    Providing an interface to browse links or redirects to a particular webpage
    2.
    Type: Invention patent application
    Status: In force

    Publication No.: US20080034059A1

    Publication date: 2008-02-07

    Application No.: US11498540

    Filing date: 2006-08-02

    IPC classification: G06F15/16

    CPC classification: G06F17/30873

    Abstract: Disclosed herein is a technique for providing an interface that allows a user to navigate backwards through linked webpages. Initially, a request to display inlinks of linking webpages that contain a link to a particular webpage is received. In response to the request, a new page that contains a set of inlinks that correspond to a set of linking webpages that each contain a link to the particular webpage is provided. Each of the inlinks may be associated with a particular clickable item. An indication of a selection of a clickable item associated with a particular inlink is received. In response, a second new page that contains a second set of inlinks that correspond to a second set of linking webpages that each contain a link to the webpage that corresponds to the particular inlink is provided. Some of the displayed inlinks may correspond to webpages that redirect to the particular webpage.

    Techniques for detecting duplicate web pages
    3.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07698317B2

    Publication date: 2010-04-13

    Application No.: US11788505

    Filing date: 2007-04-20

    IPC classification: G06F17/00

    CPC classification: G06F17/30864 G06F17/2211

    Abstract: Techniques are disclosed for detecting web pages with duplicate content. In one embodiment, a set of shingles is computed for each page of a group of pages. An aggregate set of shingles is determined based on the sets of shingles computed for the group of pages. A first subset from the aggregate set of shingles is determined by selecting, from the aggregate set, shingles whose frequencies in the aggregate set exceed a specified threshold. A modified set of shingles is generated for each page of the group of pages by removing, from the set of shingles for that page, any shingle included in the first subset. One or more duplicate pages in the group of pages are determined based at least in part on the modified sets of shingles generated for the group of pages.
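
The steps in this abstract can be illustrated with a short Python sketch. It is only one possible reading, and several choices are assumptions that the abstract does not specify: shingles are taken to be word 3-grams, a shingle's frequency is taken to be the number of pages that contain it, and duplicates are decided by Jaccard similarity of the modified sets. The function names, the `PAGES` data, and the `min_similarity` parameter are hypothetical.

```python
from collections import Counter


def shingles(text, k=3):
    """Return the set of word k-grams (shingles) of a page's text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def modified_shingle_sets(pages, threshold, k=3):
    """Drop shingles that occur in more than `threshold` pages."""
    per_page = {name: shingles(text, k) for name, text in pages.items()}
    # Aggregate step (assumed): count how many pages contain each shingle.
    freq = Counter(s for page_set in per_page.values() for s in page_set)
    too_common = {s for s, n in freq.items() if n > threshold}
    return {name: page_set - too_common for name, page_set in per_page.items()}


def jaccard(a, b):
    """Jaccard similarity; two empty sets give 0.0 (no evidence either way)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_duplicates(pages, threshold, min_similarity=0.9, k=3):
    """Return page-name pairs whose modified shingle sets are near-identical."""
    modified = modified_shingle_sets(pages, threshold, k)
    names = sorted(modified)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if jaccard(modified[x], modified[y]) >= min_similarity
    ]


# Hypothetical pages that all share the navigation text "home about contact".
PAGES = {
    "p1": "home about contact cheap flights to paris today",
    "p2": "home about contact cheap flights to paris today",
    "p3": "home about contact gardening tips for early spring",
}
duplicates = find_duplicates(PAGES, threshold=2)
```

Removing the high-frequency shingles first means that shared boilerplate, such as a site-wide navigation bar, does not inflate the similarity between pages whose actual content differs.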

    Techniques for detecting duplicate web pages
    4.
    Type: Invention patent application
    Status: In force

    Publication No.: US20080263026A1

    Publication date: 2008-10-23

    Application No.: US11788505

    Filing date: 2007-04-20

    IPC classification: G06F17/30

    CPC classification: G06F17/30864 G06F17/2211

    Abstract: Techniques are disclosed for detecting web pages with duplicate content. In one embodiment, a set of shingles is computed for each page of a group of pages. An aggregate set of shingles is determined based on the sets of shingles computed for the group of pages. A first subset from the aggregate set of shingles is determined by selecting, from the aggregate set, shingles whose frequencies in the aggregate set exceed a specified threshold. A modified set of shingles is generated for each page of the group of pages by removing, from the set of shingles for that page, any shingle included in the first subset. One or more duplicate pages in the group of pages are determined based at least in part on the modified sets of shingles generated for the group of pages.

    SEARCH ENGINE RECENCY USING CONTENT PREVIEW
    5.
    Type: Invention patent application
    Status: In force

    Publication No.: US20110173180A1

    Publication date: 2011-07-14

    Application No.: US12687596

    Filing date: 2010-01-14

    IPC classification: G06F17/30

    Abstract: Disclosed herein is use of a preview of content from a target document, as provided by a content preview source such as a Really Simple Syndication (RSS) feed, by a search engine. The content preview source includes the preview of the target document's content and a reference, e.g., a Universal Resource Locator (URL) or other link. A content preview document is generated using data extracted from the content preview source. The content preview document is made available in a searchable index used by a search engine to respond to a search query. A fetch operation is scheduled to fetch the target document using the reference provided in the content preview source. Once fetched, the data extracted from the content preview source can be associated with the target document, and can be used in presenting the target document in search results.
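
The flow in this abstract (extract a preview from a feed, make it searchable at once, schedule a fetch of the target, then attach the preview to the fetched document) can be sketched in Python as follows. This is an illustrative sketch only: the in-memory `index` dictionary stands in for a real search index, `fetch_queue` stands in for a crawl scheduler, and the feed content, URL, and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical RSS 2.0 feed acting as the content preview source.
FEED = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item>
    <title>Launch day</title>
    <link>https://news.example/launch</link>
    <description>The rocket lifted off at dawn.</description>
  </item>
</channel></rss>"""

index = {}             # url -> searchable record (stand-in for a real index)
fetch_queue = deque()  # URLs waiting to be fetched (stand-in for a scheduler)


def preview_documents(feed_xml):
    """Build one content preview document per feed item."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "url": item.findtext("link", default="").strip(),
            "title": item.findtext("title", default="").strip(),
            "preview": item.findtext("description", default="").strip(),
        }
        for item in root.iter("item")
    ]


def ingest(feed_xml):
    """Index each preview immediately and schedule a fetch of its target."""
    for doc in preview_documents(feed_xml):
        index[doc["url"]] = {**doc, "fetched": False}
        fetch_queue.append(doc["url"])


def complete_fetch(url, body):
    """Attach the fetched body to the record; the preview data is kept."""
    index[url].update(body=body, fetched=True)


def search(term):
    """Return URLs whose title, preview, or fetched body contains `term`."""
    term = term.lower()
    return [
        url
        for url, rec in index.items()
        if term in " ".join(
            (rec["title"], rec["preview"], rec.get("body", ""))
        ).lower()
    ]
```

Under these assumptions, calling `ingest(FEED)` makes the item findable through `search` before its target page has been fetched, which is the recency benefit the title refers to; `complete_fetch` later enriches the same record rather than replacing it.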

    Search engine recency using content preview
    6.
    Type: Granted invention patent
    Status: In force

    Publication No.: US09465879B2

    Publication date: 2016-10-11

    Application No.: US12687596

    Filing date: 2010-01-14

    IPC classification: G06F17/30

    Abstract: Disclosed herein is use of a preview of content from a target document, as provided by a content preview source such as a Really Simple Syndication (RSS) feed, by a search engine. The content preview source includes the preview of the target document's content and a reference, e.g., a Universal Resource Locator (URL) or other link. A content preview document is generated using data extracted from the content preview source. The content preview document is made available in a searchable index used by a search engine to respond to a search query. A fetch operation is scheduled to fetch the target document using the reference provided in the content preview source. Once fetched, the data extracted from the content preview source can be associated with the target document, and can be used in presenting the target document in search results.

    Consecutive crawling to identify transient links
    7.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20070226206A1

    Publication date: 2007-09-27

    Application No.: US11388681

    Filing date: 2006-03-23

    IPC classification: G06F17/30

    CPC classification: G06F16/951

    Abstract: According to the approach described herein, an approach is provided for identifying transient links on a Web page by crawling a Web page consecutively after a brief interval and comparing the links from each crawl to identify transient links. The approach ensures that transient links are not crawled and archived, thereby saving resources for crawling valid links leading to useful information.
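
The comparison step in this abstract can be illustrated with plain set operations. The Python sketch below assumes, as one plausible reading, that a link is transient if it appears in only one of two crawls taken a short time apart; the abstract does not state the exact comparison rule. The function names and the example link lists are hypothetical, and the crawls themselves are supplied as ready-made lists rather than fetched.

```python
def transient_links(first_crawl, second_crawl):
    """Links seen in only one of the two consecutive crawls (assumed rule)."""
    return set(first_crawl) ^ set(second_crawl)


def stable_links(first_crawl, second_crawl):
    """Links seen in both crawls; these are the ones worth crawling."""
    return set(first_crawl) & set(second_crawl)


# Hypothetical links extracted from the same page a few seconds apart.
FIRST = ["/about", "/news", "/ad?session=111"]
SECOND = ["/about", "/news", "/ad?session=222"]

to_skip = transient_links(FIRST, SECOND)
to_crawl = stable_links(FIRST, SECOND)
```

Here the session-specific advertisement links change between the two crawls, so they land in `to_skip`, while `/about` and `/news` remain in `to_crawl`.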
