Patent search: ap:("Tal Cohen" OR "Ziv Bar-Yossef" OR "Igor Tsvetkov" OR "Tomer Kol" OR "Adi Mano" OR "Oren Naim" OR "Nitsan Oz" OR "Pravir K. Gupta" OR "Kavi J. Goel") AND inv:"Ziv Bar-Yossef", page 2

11.

Patent application
Methods and Apparatus for Assessing Web Page Decay [Pending, published]

Publication No.: US20080097977A1

Publication date: 2008-04-24

Application No.: US11955471

Filing date: 2007-12-13

Applicants: Andrei Broder, Ziv Bar-Yossef, Shanmagasundaram Ravikumar, Andrew Tomkins

Inventors: Andrei Broder, Ziv Bar-Yossef, Shanmagasundaram Ravikumar, Andrew Tomkins

IPC classification: G06F17/30

CPC classification: G06F16/958

Abstract: Systems and methods are herein disclosed for assessing the staleness of a web page. In particular, in one method of the present invention, the staleness of a web page is assessed by examining internal date references within the web page. In another method of the present invention, the staleness of a web page is assessed by examining the meta-data associated with the web page. In a further method of the present invention, the staleness of a hyperlinked web page is determined by examining the link status of the hyperlinks. If the web page has a relatively large number of dead links, it is assessed as being a stale web page. In a still further method of the present invention, the link status of web pages in the neighborhood of the web page being assessed is likewise examined.
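The dead-link criterion in this abstract can be illustrated with a small scoring function. This is a minimal sketch under stated assumptions, not the method claimed in the filing: link status is supplied as a precomputed mapping (no HTTP requests are made), the "neighborhood" is taken to be the page's live out-neighbors, and the names `dead_link_fraction` and `decay_score` as well as the linear blend weighted by `alpha` are hypothetical choices.

```python
from typing import Dict, List


def dead_link_fraction(links: List[str], status: Dict[str, bool]) -> float:
    """Fraction of a page's out-links whose target is known to be dead.

    `status` maps URL -> True if alive, False if dead. URLs missing from
    `status` are treated as alive (an assumption of this sketch).
    """
    if not links:
        return 0.0
    dead = sum(1 for url in links if not status.get(url, True))
    return dead / len(links)


def decay_score(
    page: str,
    out_links: Dict[str, List[str]],
    status: Dict[str, bool],
    alpha: float = 0.5,  # hypothetical weight of the page's own links
) -> float:
    """Hypothetical staleness score in [0, 1].

    Blends the page's own dead-link fraction with the mean dead-link
    fraction of its live out-neighbors (the assumed "neighborhood").
    """
    links = out_links.get(page, [])
    own = dead_link_fraction(links, status)
    neighbors = [u for u in links if status.get(u, True)]
    if not neighbors:
        return own
    neigh = sum(
        dead_link_fraction(out_links.get(u, []), status) for u in neighbors
    ) / len(neighbors)
    return alpha * own + (1 - alpha) * neigh
```

For example, a page linking to `b`, `c`, `d`, and a dead `x` has an own dead-link fraction of 0.25; if `b` in turn has one dead link out of two, the neighborhood term raises the page's score even though its own links are mostly alive.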

12.

Patent application
System and method for detecting matches of small edit distance [Pending, published]

Publication No.: US20070085716A1

Publication date: 2007-04-19

Application No.: US11241468

Filing date: 2005-09-30

Applicants: Ziv Bar-Yossef, Robert Krauthgamer, Shanmugasundaram Ravikumar, Jayram Thathachar

Inventors: Ziv Bar-Yossef, Robert Krauthgamer, Shanmugasundaram Ravikumar, Jayram Thathachar

IPC classification: H03M7/30

CPC classification: G06F16/90344

Abstract: A system and method of approximating edit distance for a set of character strings in a database includes producing a representative sketch for each of the character strings; and approximating an edit distance between two selected character strings based only on the representative sketch for each of the selected character strings. The character strings may comprise text, wherein the method further comprises encoding positions of substrings in the text using anchors, wherein the anchors comprise identical substrings occurring in two input character strings at a nearby position. A set of anchors may be used in a correlated manner, wherein character strings with a sufficiently small edit distance are likely to use a same sequence of anchors. The character strings may be substantially non-repetitive. The representative sketch of a first character string is preferably constructed absent knowledge of a second character string. A size of the representative sketch may be constant.
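The anchor-based sketch in this abstract is involved. As a simpler stand-in that shows the same workflow (sketch each string on its own, then estimate from the sketches alone), the code below uses q-gram profiles, a different, well-known technique and not the anchor method of the filing. One insertion, deletion, or substitution changes the L1 distance between two q-gram profiles by at most 2q, so that distance divided by 2q is a lower bound on the edit distance. Unlike the constant-size sketch the abstract mentions, a q-gram profile grows with string length. All function names are illustrative.

```python
from collections import Counter


def qgram_sketch(s: str, q: int = 2) -> Counter:
    """Sketch a string as the multiset of its length-q substrings.

    Built from one string alone, with no knowledge of any other string.
    """
    return Counter(s[i : i + q] for i in range(len(s) - q + 1))


def edit_distance_lower_bound(a: Counter, b: Counter, q: int = 2) -> float:
    """Lower bound on edit distance computed from two sketches only.

    Each edit operation changes the L1 distance between q-gram profiles
    by at most 2q, hence edit_distance >= L1 / (2q).
    """
    l1 = sum(((a - b) + (b - a)).values())  # L1 distance of the multisets
    return l1 / (2 * q)


def edit_distance(s: str, t: str) -> int:
    """Exact Levenshtein distance by dynamic programming, for comparison."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,               # delete cs
                           cur[j - 1] + 1,            # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute or match
        prev = cur
    return prev[-1]
```

For "kitten" and "sitting" the bigram profiles differ in 7 bigrams, giving a bound of 7 / 4 = 1.75, consistent with the exact distance of 3. The bound is cheap but loose; the anchor approach in the filing aims at a tighter approximation from fixed-size sketches.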

13.

Patent application
Methods and apparatus for assessing web page decay [Pending, published]

Publication No.: US20060112089A1

Publication date: 2006-05-25

Application No.: US10995770

Filing date: 2004-11-22

Applicants: Andrei Broder, Ziv Bar-Yossef, Shanmagasundaram Ravikumar, Andrew Tomkins

Inventors: Andrei Broder, Ziv Bar-Yossef, Shanmagasundaram Ravikumar, Andrew Tomkins

IPC classification: G06F17/30

CPC classification: G06F16/958

Abstract: Systems and methods are herein disclosed for assessing the staleness of a web page. In particular, in one method of the present invention, the staleness of a web page is assessed by examining internal date references within the web page. In another method of the present invention, the staleness of a web page is assessed by examining the meta-data associated with the web page. In a further method of the present invention, the staleness of a hyperlinked web page is determined by examining the link status of the hyperlinks. If the web page has a relatively large number of dead links, it is assessed as being a stale web page. In a still further method of the present invention, the link status of web pages in the neighborhood of the web page being assessed is likewise examined.

14.

Patent application
IDENTIFYING TOPICAL ENTITIES [Pending, published]

Publication No.: US20150278366A1

Publication date: 2015-10-01

Application No.: US13153365

Filing date: 2011-06-03

Applicants: Haran Pilpel, Tomer Shmiel, Eran Ofek, Eldad Barkai, Ziv Bar-Yossef

Inventors: Haran Pilpel, Tomer Shmiel, Eran Ofek, Eldad Barkai, Ziv Bar-Yossef

IPC classification: G06F17/30

CPC classification: G06F17/30867, G06F17/30958

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for identifying topical entities. In one aspect, a method includes obtaining a plurality of entities that are associated with a first resource; for one or more of the identified entities, receiving search results for a search query derived from the entity; determining that search results for a search query including a particular entity include a specific type of search results; and determining that the particular entity is a topical entity of the first resource based at least in part on the particular entity appearing in a title or a resource locator of the first resource, wherein the topical entity of the first resource represents a predominant topic of the first resource.
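The decision steps in this abstract can be sketched as a filter over candidate entities. Everything below is a hypothetical stand-in: `search` is an injected callable returning the result types for a query (no real search API is assumed), the query derived from an entity is simply the entity itself, `required_type` represents whatever "specific type of search results" a system checks for, and title or URL matching uses case-insensitive substring tests, with spaces in the entity also matched as hyphens, underscores, or nothing in the URL.

```python
from typing import Callable, Iterable, List


def appears_in_title_or_url(entity: str, title: str, url: str) -> bool:
    """Case-insensitive check that the entity occurs in the title or URL."""
    e = entity.lower()
    u = url.lower()
    # URLs often replace spaces, so try a few common spellings.
    url_forms = {e, e.replace(" ", "-"), e.replace(" ", "_"), e.replace(" ", "")}
    return e in title.lower() or any(form in u for form in url_forms)


def topical_entities(
    entities: Iterable[str],
    title: str,
    url: str,
    search: Callable[[str], List[str]],  # hypothetical: query -> result types
    required_type: str,                  # hypothetical result type to look for
) -> List[str]:
    """Return entities treated as topical for a resource (hypothetical sketch).

    An entity qualifies when the results for a query derived from it include
    `required_type` AND the entity appears in the resource's title or URL.
    """
    topical = []
    for entity in entities:
        result_types = search(entity)  # query derived from the entity
        if required_type in result_types and appears_in_title_or_url(entity, title, url):
            topical.append(entity)
    return topical
```

With a stub `search` that reports a `"knowledge_panel"` result only for "Eiffel Tower", a page titled "Visiting the Eiffel Tower" at `https://example.com/eiffel-tower-guide` yields `["Eiffel Tower"]`; "Paris" is rejected because its results lack the required type, even though it may be mentioned on the page.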

15.

Granted patent
Counting unique search results [Lapsed]

Publication No.: US08065309B1

Publication date: 2011-11-22

Application No.: US12106860

Filing date: 2008-04-21

Applicants: Ziv Bar-Yossef, Kfir Karmon

Inventors: Ziv Bar-Yossef, Kfir Karmon

IPC classification: G06F17/30, G06F15/16

CPC classification: G06F17/30979

Abstract: The subject matter of this specification can be embodied in, among other things, a computer-implemented method for counting one or more unique search results within a plurality of search results includes creating hash values for information in each of the search results using a first hash function. The first hash function has a predetermined hash value range size. The method further includes identifying a predetermined number of smallest hash values within the created hash values. The method further includes estimating a first number of unique search results based on the predetermined hash value range size, the predetermined number, and a largest hash value in the smallest hash values.
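The estimator described in this abstract has the shape of the well-known k-minimum-values (KMV) distinct-count sketch. A minimal sketch, assuming SHA-1 truncated to 64 bits as the hash function (so the hash value range size is 2^64) and the standard (k - 1) * range / h_k form of the estimate, where h_k is the largest of the k smallest hash values; the filing may use a different hash function or constant. `hash64` and `estimate_unique` are illustrative names.

```python
import hashlib
import heapq
from typing import Iterable, List, Set

HASH_RANGE = 2 ** 64  # assumed hash value range size (64-bit hashes)


def hash64(item: str) -> int:
    """Map a result's identifying info (e.g. its URL) into [0, 2**64)."""
    return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")


def estimate_unique(results: Iterable[str], k: int = 1024) -> float:
    """KMV estimate of the number of distinct search results.

    Keeps only the k smallest distinct hash values. With fewer than k
    distinct hashes the count is exact; otherwise the largest kept hash
    h_k yields the estimate (k - 1) * HASH_RANGE / h_k.
    """
    heap: List[int] = []    # max-heap via negated values
    members: Set[int] = set()  # hashes currently in the heap
    for r in results:
        h = hash64(r)
        if h in members:
            continue  # duplicate result, already counted
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heapreplace(heap, -h)  # drop current largest
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return float(len(heap))
    h_k = -heap[0]  # largest among the k smallest hash values
    return (k - 1) * HASH_RANGE / h_k
```

The bounded heap keeps memory at O(k) however many results are scanned, and a larger `k` trades memory for accuracy (relative error shrinks roughly like 1/sqrt(k)).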

16.

Patent application
Methods and Apparatus for Assessing Web Page Decay [Pending, published]

Publication No.: US20080097978A1

Publication date: 2008-04-24

Application No.: US11955481

Filing date: 2007-12-13

Applicants: Andrei Broder, Ziv Bar-Yossef, Shanmagasundaram Ravikumar, Andrew Tomkins

Inventors: Andrei Broder, Ziv Bar-Yossef, Shanmagasundaram Ravikumar, Andrew Tomkins

IPC classification: G06F17/30

CPC classification: G06F16/958

Abstract: Systems and methods are herein disclosed for assessing the staleness of a web page. In particular, in one method of the present invention, the staleness of a web page is assessed by examining internal date references within the web page. In another method of the present invention, the staleness of a web page is assessed by examining the meta-data associated with the web page. In a further method of the present invention, the staleness of a hyperlinked web page is determined by examining the link status of the hyperlinks. If the web page has a relatively large number of dead links, it is assessed as being a stale web page. In a still further method of the present invention, the link status of web pages in the neighborhood of the web page being assessed is likewise examined.

17.

Patent application
System, method, and service for using a focused random walk to produce samples on a topic from a collection of hyper-linked pages [Lapsed]

Publication No.: US20060122998A1

Publication date: 2006-06-08

Application No.: US11004412

Filing date: 2004-12-04

Applicants: Ziv Bar-Yossef, Tapas Kanungo, Robert Krauthgamer

Inventors: Ziv Bar-Yossef, Tapas Kanungo, Robert Krauthgamer

IPC classification: G06F17/30, G06F17/24

CPC classification: G06F17/30864

Abstract: A focused random walk system produces samples of on-topic pages from a collection of hyper-linked pages such as Web pages. The focused random walk system utilizes a focused random walk to produce a focused sample, which is a random sample of Web pages focused on a topic. The focused random walk system uniformly samples pages iteratively, where each iteration follows a random link from a union of the in-links and out-links of a page. The system then classifies this randomly selected link to determine whether the page is on-topic. The random walk sampling process could comprise a hard-focus method that selects only on-topic pages at each step of the focused random walk, or a soft-focus method that allows limited divergence to off-topic pages.
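A toy version of the hard-focus and soft-focus walks can be written over an in-memory link graph. This is a sketch under assumptions, not the filing's system: the graph is a dict of out-links from which in-links are derived, `is_on_topic` stands in for the classifier, soft focus is modeled as allowing at most `max_off_topic` consecutive off-topic steps, and only on-topic pages are recorded as samples. The corrections a real sampler needs to make the samples uniform are not reproduced here.

```python
import random
from typing import Callable, Dict, List, Set


def undirected_neighbors(out_links: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Union of in-links and out-links for every page."""
    nbrs: Dict[str, Set[str]] = {}
    for src, dsts in out_links.items():
        for dst in dsts:
            nbrs.setdefault(src, set()).add(dst)
            nbrs.setdefault(dst, set()).add(src)
    return nbrs


def focused_walk(
    out_links: Dict[str, List[str]],
    start: str,
    is_on_topic: Callable[[str], bool],  # hypothetical classifier
    steps: int,
    max_off_topic: int = 0,  # 0 = hard focus; > 0 = soft focus
    seed: int = 0,
) -> List[str]:
    """Return the on-topic pages visited by a focused random walk."""
    rng = random.Random(seed)
    nbrs = undirected_neighbors(out_links)
    current, off_run = start, 0
    samples: List[str] = []
    for _ in range(steps):
        # Sort so the walk is reproducible regardless of set ordering.
        candidates = sorted(nbrs.get(current, set()))
        if not candidates:
            break
        nxt = rng.choice(candidates)  # random in- or out-link
        if is_on_topic(nxt):
            current, off_run = nxt, 0
            samples.append(nxt)
        elif off_run < max_off_topic:
            current, off_run = nxt, off_run + 1  # limited off-topic divergence
        # else: reject the step and stay on the current page
    return samples
```

Setting `max_off_topic=0` gives the hard-focus behavior (the walk never leaves the topic), while a small positive value lets the walk cross short off-topic bridges to reach on-topic regions it could not otherwise reach.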

18.

Granted patent
Method and system for improving data quality in large hyperlinked text databases using pagelets and templates [In force]

Publication No.: US06968331B2

Publication date: 2005-11-22

Application No.: US10055586

Filing date: 2002-01-22

Applicants: Ziv Bar-Yossef, Sridhar Rajagopalan

Inventors: Ziv Bar-Yossef, Sridhar Rajagopalan

IPC classification: G06F17/30, G06F7/00

CPC classification: G06F17/30882, Y10S707/99932

Abstract: A computing system and method clean a set of hypertext documents to minimize violations of a Hypertext Information Retrieval (IR) rule set. Then, the system and method performs an information retrieval operation on the resulting cleaned data. The cleaning process includes decomposing each page of the set of hypertext documents into one or more pagelets; identifying possible templates; and eliminating the templates from the data. Traditional IR search and mining algorithms can then be used to search on the remaining pagelets, as opposed to the original pages, to provide cleaner, more precise results.
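The cleaning pipeline in this abstract (decompose pages into pagelets, identify templates, eliminate them) can be sketched as follows. The assumptions are all hypothetical: each page has already been split into a list of pagelet text blocks, a template is approximated as a pagelet whose whitespace- and case-normalized content hash appears on at least `min_pages` distinct pages, and detection uses exact hashes rather than the near-duplicate shingling or hyperlink analysis a production system might use.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set


def fingerprint(pagelet: str) -> str:
    """Hash a pagelet after collapsing whitespace and lowercasing."""
    normalized = " ".join(pagelet.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def find_templates(pages: Dict[str, List[str]], min_pages: int = 2) -> Set[str]:
    """Fingerprints of pagelets that recur on at least `min_pages` pages."""
    seen_on: Dict[str, Set[str]] = defaultdict(set)
    for url, pagelets in pages.items():
        for p in pagelets:
            seen_on[fingerprint(p)].add(url)
    return {fp for fp, urls in seen_on.items() if len(urls) >= min_pages}


def clean(pages: Dict[str, List[str]], min_pages: int = 2) -> Dict[str, List[str]]:
    """Drop template pagelets, keeping each page's remaining pagelets for IR."""
    templates = find_templates(pages, min_pages)
    return {
        url: [p for p in pagelets if fingerprint(p) not in templates]
        for url, pagelets in pages.items()
    }
```

Indexing the output of `clean` instead of the raw pages keeps shared navigation bars and boilerplate blocks from matching queries on every page of a site.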
