Patent search: ap:("Samson J. Liu" OR "Suk Hwan Lim" OR "Jian-Ming Jin" OR "Yuhong Xiong" OR "Parag M. Joshi" OR "Nina Bhatti" OR "Jerry J. Liu" OR "Jian Fan" OR "Sheng-Wen Yang") AND inv:"Jian Fan"

1.

Granted patent
Semantically ranking content in a website [In force]

Publication No.: US08918403B2

Publication date: 2014-12-23

Application No.: US13635412

Filing date: 2010-04-19

Applicants: Samson J. Liu, Suk Hwan Lim, Jian-Ming Jin, Yuhong Xiong, Parag M. Joshi, Nina Bhatti, Jerry J. Liu, Jian Fan, Sheng-Wen Yang

Inventors: Samson J. Liu, Suk Hwan Lim, Jian-Ming Jin, Yuhong Xiong, Parag M. Joshi, Nina Bhatti, Jerry J. Liu, Jian Fan, Sheng-Wen Yang

IPC: G06F17/30

CPC: G06F17/30864, G06F17/3089

Abstract: Semantically ranking content in a website (110) with a computerized ranking device (105) includes: parsing content from the website (110) into multiple autonomous content blocks (415-1 to 415-17) with the computerized ranking device (105) and assigning an importance ranking with said computerized ranking device (105) to each of the content blocks (415-1 to 415-17) based on a degree to which a substance of the content block (415-1 to 415-17) is relevant to one of a plurality of predefined categories.
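The abstract does not say how the "degree of relevance" to a category is measured. As a rough illustration only, the sketch below scores each content block by keyword overlap with a few predefined categories and ranks blocks by their best category score. The category keyword sets, the `Block` type and the `rank_blocks` helper are hypothetical and are not taken from the patent.

```python
import re
from dataclasses import dataclass

# Hypothetical predefined categories, each described by a small keyword set.
CATEGORIES = {
    "sports": {"game", "score", "team", "season", "coach"},
    "finance": {"stock", "market", "shares", "earnings", "investor"},
}


@dataclass
class Block:
    block_id: str
    text: str


def tokenize(text: str) -> list[str]:
    # Lower-case alphabetic tokens only; digits and punctuation are ignored.
    return re.findall(r"[a-z]+", text.lower())


def relevance(tokens: list[str], keywords: set[str]) -> float:
    # Fraction of the block's tokens that belong to the category's keyword set.
    if not tokens:
        return 0.0
    return sum(t in keywords for t in tokens) / len(tokens)


def rank_blocks(blocks, categories=CATEGORIES):
    """Return (block_id, best_category, importance) tuples, most important first."""
    scored = []
    for block in blocks:
        tokens = tokenize(block.text)
        # Importance = relevance to the single best-matching category.
        category, importance = max(
            ((name, relevance(tokens, kws)) for name, kws in categories.items()),
            key=lambda pair: pair[1],
        )
        scored.append((block.block_id, category, importance))
    return sorted(scored, key=lambda item: item[2], reverse=True)


blocks = [
    Block("nav", "Copyright 2010 all rights reserved"),
    Block("b1", "The team won the game with a late score"),
    Block("b2", "Stock market shares fell"),
]
for row in rank_blocks(blocks):
    print(row)
```

Boilerplate such as the copyright line matches no category and sinks to the bottom of the ranking; a real system would need a far richer relevance model than keyword overlap.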

2.

Patent application
Semantically Ranking Content in a Website [In force]

Publication No.: US20130114105A1

Publication date: 2013-05-09

Application No.: US13635412

Filing date: 2010-04-19

Applicants: Samson J. Liu, Suk Hwan Lim, Jian-Ming Jin, Yuhong Xiong, Parag M. Joshi, Nina Bhatti, Jerry J. Liu, Jian Fan, Sheng-Wen Yang

Inventors: Samson J. Liu, Suk Hwan Lim, Jian-Ming Jin, Yuhong Xiong, Parag M. Joshi, Nina Bhatti, Jerry J. Liu, Jian Fan, Sheng-Wen Yang

IPC: G06F17/30

CPC: G06F17/30864, G06F17/3089

Abstract: Semantically ranking content in a website (110) with a computerized ranking device (105) includes: parsing content from the website (110) into multiple autonomous content blocks (415-1 to 415-17) with the computerized ranking device (105) and assigning an importance ranking with said computerized ranking device (105) to each of the content blocks (415-1 to 415-17) based on a degree to which a substance of the content block (415-1 to 415-17) is relevant to one of a plurality of predefined categories.

3.

Patent application
DETERMINING SIMILARITY BETWEEN ELEMENTS OF AN ELECTRONIC DOCUMENT [Pending, published]

Publication No.: US20130091150A1

Publication date: 2013-04-11

Application No.: US13805212

Filing date: 2010-06-30

Applicants: Jian-Ming Jin, Suk Hwan Lim, Li-Wei Zheng, Jian Fan, Eamonn O'Brien-Strain, Yuhong Xiong, Jerry J. Liu

Inventors: Jian-Ming Jin, Suk Hwan Lim, Li-Wei Zheng, Jian Fan, Eamonn O'Brien-Strain, Yuhong Xiong, Jerry J. Liu

IPC: G06F17/30

CPC: G06F16/24578, G06F16/951

Abstract: Disclosed is a computer-implemented method of determining similarity between first and second elements of an electronic document. The method uses a computer to calculate a plurality of measures of similarity between the first and second elements in at least two representations of the electronic document. A computer program product and system implementing this method are also disclosed.
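The abstract requires several similarity measures computed over at least two representations of the document but does not name them. The sketch below assumes three plausible representations (plain text, rendered layout, DOM path) and computes one measure for each: word-set Jaccard, bounding-box intersection-over-union, and shared DOM-path prefix. The `Element` type and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Element:
    text: str                               # text representation
    box: tuple[float, float, float, float]  # rendered box: (left, top, right, bottom)
    dom_path: str                           # DOM representation, e.g. "html/body/div/p"


def text_similarity(a: Element, b: Element) -> float:
    # Jaccard similarity of the two elements' lower-cased word sets.
    wa, wb = set(a.text.lower().split()), set(b.text.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def _area(box) -> float:
    # Negative extents (no overlap) collapse to zero area.
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def visual_similarity(a: Element, b: Element) -> float:
    # Intersection-over-union of the rendered bounding boxes.
    overlap = (
        max(a.box[0], b.box[0]),
        max(a.box[1], b.box[1]),
        min(a.box[2], b.box[2]),
        min(a.box[3], b.box[3]),
    )
    inter = _area(overlap)
    union = _area(a.box) + _area(b.box) - inter
    return inter / union if union > 0 else 0.0


def structural_similarity(a: Element, b: Element) -> float:
    # Length of the shared DOM-path prefix relative to the longer path.
    pa, pb = a.dom_path.split("/"), b.dom_path.split("/")
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return shared / max(len(pa), len(pb))


def similarity_measures(a: Element, b: Element) -> dict[str, float]:
    return {
        "text": text_similarity(a, b),
        "visual": visual_similarity(a, b),
        "structural": structural_similarity(a, b),
    }


headline = Element("Breaking news today", (0, 0, 100, 20), "html/body/div/h1")
teaser = Element("breaking news", (0, 20, 100, 40), "html/body/div/p")
print(similarity_measures(headline, teaser))
```

Keeping the measures separate, rather than collapsing them into one number, leaves the weighting decision to whatever downstream step consumes them.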

4.

Granted patent
Producing web page content [In force]

Publication No.: US09218322B2

Publication date: 2015-12-22

Application No.: US13811912

Filing date: 2010-07-28

Applicants: Jian Fan, Ping Luo, Li-Wei Zheng, Samson J. Liu, Suk Hwan Lim, Jerry J. Liu, Yuhong Xiong

Inventors: Jian Fan, Ping Luo, Li-Wei Zheng, Samson J. Liu, Suk Hwan Lim, Jerry J. Liu, Yuhong Xiong

IPC: G06F17/00, G06F17/21, G06F17/30, G06F17/22

CPC: G06F17/21, G06F17/212, G06F17/227, G06F17/30896

Abstract: A method for producing web page content includes identifying blocks within a web page. The blocks are selectively assembled into sections. The sections are selectively assembled into article candidates. An article candidate that includes article content is distinguished from article candidates that do not include article content. Content is produced only from the article candidate distinguished as including article content.
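The block, section and article-candidate pipeline can be illustrated with a minimal sketch. The grouping rules here are assumptions, not the patent's: blocks in the same column with a small vertical gap form a section, all sections in one column form an article candidate, and a candidate counts as article content when it has enough words and a low share of link text. `Block`, `produce_content` and the thresholds are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Block:
    x: int               # left edge; blocks sharing x are treated as one column
    top: int
    bottom: int
    text: str
    link_words: int = 0  # words that sit inside hyperlinks


def assemble_sections(blocks, max_gap=20):
    # Consecutive blocks in the same column with a small vertical gap form a section.
    sections = []
    for b in sorted(blocks, key=lambda blk: (blk.x, blk.top)):
        prev = sections[-1][-1] if sections else None
        if prev is not None and prev.x == b.x and b.top - prev.bottom <= max_gap:
            sections[-1].append(b)
        else:
            sections.append([b])
    return sections


def assemble_candidates(sections):
    # Sections are already ordered by column, so groupby gathers each column
    # into one article candidate.
    return [
        [blk for sec in group for blk in sec]
        for _, group in groupby(sections, key=lambda sec: sec[0].x)
    ]


def word_count(candidate) -> int:
    return sum(len(b.text.split()) for b in candidate)


def has_article_content(candidate, min_words=30, max_link_ratio=0.3) -> bool:
    words = word_count(candidate)
    if words == 0 or words < min_words:
        return False
    links = sum(b.link_words for b in candidate)
    return links / words <= max_link_ratio


def produce_content(blocks):
    candidates = assemble_candidates(assemble_sections(blocks))
    articles = [c for c in candidates if has_article_content(c)]
    if not articles:
        return None  # no candidate looks like an article
    best = max(articles, key=word_count)
    return "\n\n".join(b.text for b in best)


para = " ".join(["lorem"] * 20)
page = [
    Block(0, 0, 10, "Home News Sports About", link_words=4),
    Block(0, 15, 25, "Contact Login", link_words=2),
    Block(200, 0, 50, para),
    Block(200, 60, 110, para, link_words=1),
]
print(produce_content(page))
```

Only the right-hand column survives: the navigation column is short and almost entirely link text, so it is never emitted.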

5.

Patent application
PRODUCING WEB PAGE CONTENT [In force]

Publication No.: US20130124953A1

Publication date: 2013-05-16

Application No.: US13811912

Filing date: 2010-07-28

Applicants: Jian Fan, Ping Luo, Li-Wei Zheng, Samson J. Liu, Suk Hwan Lim, Jerry J. Liu, Yuhong Xiong

Inventors: Jian Fan, Ping Luo, Li-Wei Zheng, Samson J. Liu, Suk Hwan Lim, Jerry J. Liu, Yuhong Xiong

IPC: G06F17/21

CPC: G06F17/21, G06F17/212, G06F17/227, G06F17/30896

Abstract: A method for producing web page content includes identifying blocks within a web page. The blocks are selectively assembled into sections. The sections are selectively assembled into article candidates. An article candidate that includes article content is distinguished from article candidates that do not include article content. Content is produced only from the article candidate distinguished as including article content.

6.

Patent application
Obtaining Rendering Co-ordinates Of Visible Text Elements [Pending, published]

Publication No.: US20130159889A1

Publication date: 2013-06-20

Application No.: US13808856

Filing date: 2010-07-07

Applicants: Li-Wei Zheng, De-Miao Lin, Jian-Ming Lin, Suk Hwan Lim, Jian Fan, Eamonn O'Brien-Strain, Yuhong Xiong, Jerry J. Liu

Inventors: Li-Wei Zheng, De-Miao Lin, Jian-Ming Lin, Suk Hwan Lim, Jian Fan, Eamonn O'Brien-Strain, Yuhong Xiong, Jerry J. Liu

IPC: G06F3/0481

CPC: G06F3/0481, G06F16/986, G06F17/218

Abstract: A computer-implemented method for obtaining the rendering co-ordinates of visible text elements on a web page is disclosed. The web page is represented by an input data structure comprising a plurality of text nodes, each of which represents a text element on the web page. The method comprises the following steps: a) using a computer device, wrapping each of the plurality of text nodes in a pair of mark-up language tags; b) using said computer device, obtaining the co-ordinates of a bounding rectangle for each text node using the mark-up language tags; c) using said computer device, attaching an attribute specifying the co-ordinates of the bounding rectangle to each text node; and d) using said computer device, determining whether each text node is invisible, and if it is, excluding it from an output data structure comprising the plurality of text nodes and attached attributes.
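Steps (a) and (d) can be sketched with Python's standard `html.parser`. Step (b), actually measuring each wrapper's bounding rectangle, needs a rendering engine and is out of scope here; the sketch simply assumes those rectangles come back as a dictionary. The `<span data-text-id>` wrapper, the zero-area visibility test and all helper names are illustrative assumptions, not the patent's specification.

```python
from html import escape
from html.parser import HTMLParser

RAW_TEXT_TAGS = {"script", "style"}  # text inside these is never rendered as visible text


class TextNodeWrapper(HTMLParser):
    """Re-serialise HTML, wrapping every non-blank text node in a marker <span>.

    Comments and the doctype are dropped in this simplified sketch.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.count = 0
        self._raw_depth = 0

    @staticmethod
    def _attrs(attrs) -> str:
        return "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        if tag in RAW_TEXT_TAGS:
            self._raw_depth += 1
        self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag in RAW_TEXT_TAGS and self._raw_depth:
            self._raw_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._raw_depth or not data.strip():
            # script/style content and whitespace-only text pass through unchanged
            self.out.append(data)
        else:
            # data-text-id lets a browser-side script find each wrapper later (step b)
            self.out.append(f'<span data-text-id="{self.count}">{escape(data, quote=False)}</span>')
            self.count += 1


def wrap_text_nodes(html: str) -> tuple[str, int]:
    parser = TextNodeWrapper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out), parser.count


def drop_invisible(rects: dict[int, tuple[float, float, float, float]]):
    """Keep text ids whose (left, top, width, height) rectangle has non-zero area.

    `rects` is assumed to be read back from a rendering engine; zero area is a
    simple stand-in for "invisible".
    """
    return {tid: r for tid, r in rects.items() if r[2] > 0 and r[3] > 0}


wrapped, n = wrap_text_nodes('<p class="x">Hi &amp; bye</p><script>var a = 1;</script><br/>')
print(n, wrapped)
```

A zero-area check misses text hidden by `visibility: hidden`, clipping or off-screen positioning; a fuller test would also consult computed styles.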

7.

Granted patent
System and method for web content extraction [In force]

Publication No.: US08819028B2

Publication date: 2014-08-26

Application No.: US13258482

Filing date: 2009-12-14

Applicants: Ping Luo, Jian Fan, Samson J. Liu, Yuhong Xiong, Jerry J. Liu

Inventors: Ping Luo, Jian Fan, Samson J. Liu, Yuhong Xiong, Jerry J. Liu

IPC: G06F17/30, G06F3/12

CPC: G06F17/30896, G06F3/1246

Abstract: A method and system for extracting Web content is disclosed. In one embodiment, Web content in a Webpage is extracted by identifying paragraphs in the Web content based on line-break node determination. A range of text-body associated with the identified paragraphs is then identified using a maximum scoring subsequence. Further, the identified text-body is refined using a heuristic rule of substantially horizontal alignment. Furthermore, one or more titles and one or more images associated with the Web content are extracted. Moreover, the Web content including the identified paragraphs, the one or more titles and the one or more images are outputted.
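The "maximum scoring subsequence" step finds the contiguous run of paragraphs whose scores sum to the largest total. The sketch below uses Kadane's linear-time algorithm for that search. The per-paragraph scoring function (non-link words minus a fixed threshold) is a hypothetical stand-in, since the abstract does not say how paragraphs are scored.

```python
def paragraph_score(word_count: int, link_word_count: int, threshold: int = 10) -> int:
    # Hypothetical score: non-link words above a threshold count as evidence of body
    # text; short or link-heavy paragraphs (menus, footers) come out negative.
    return (word_count - link_word_count) - threshold


def max_scoring_subsequence(scores):
    """Return (start, end, total) for the contiguous run with the largest sum.

    `end` is exclusive. Kadane's algorithm, O(n); an empty range (0, 0, 0) is
    returned when every score is negative.
    """
    best = (0, 0, 0)
    run_start, run_total = 0, 0
    for i, s in enumerate(scores):
        if run_total <= 0:  # a non-positive prefix can only hurt, so restart here
            run_start, run_total = i, 0
        run_total += s
        if run_total > best[2]:
            best = (run_start, i + 1, run_total)
    return best


# (word_count, link_word_count) per paragraph, in document order (made-up data)
paragraphs = [(3, 3), (2, 2), (60, 0), (45, 2), (8, 0), (70, 1), (5, 5), (4, 4), (12, 0)]
scores = [paragraph_score(w, l) for w, l in paragraphs]
start, end, total = max_scoring_subsequence(scores)
print(start, end, total)  # prints: 2 6 140
```

A short, mildly negative paragraph inside the article (index 4 here) is kept because the strong paragraphs on either side outweigh it, which is the main advantage of a maximum-sum range over per-paragraph thresholding.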

8.

Patent application
System and Method for Web Content Extraction [In force]

Publication No.: US20120303636A1

Publication date: 2012-11-29

Application No.: US13258482

Filing date: 2009-12-14

Applicants: Ping Luo, Jian Fan, Samson J. Liu, Yuhong Xiong, Jerry J. Liu

Inventors: Ping Luo, Jian Fan, Samson J. Liu, Yuhong Xiong, Jerry J. Liu

IPC: G06F17/30

CPC: G06F17/30896, G06F3/1246

Abstract: A method and system for extracting Web content is disclosed. In one embodiment, Web content in a Webpage is extracted by identifying paragraphs in the Web content based on line-break node determination. A range of text-body associated with the identified paragraphs is then identified using a maximum scoring subsequence. Further, the identified text-body is refined using a heuristic rule of substantially horizontal alignment. Furthermore, one or more titles and one or more images associated with the Web content are extracted. Moreover, the Web content including the identified paragraphs, the one or more titles and the one or more images are outputted.

9.

Patent application
Extraction of Content from a Web Page [Pending, published]

Publication No.: US20130283148A1

Publication date: 2013-10-24

Application No.: US13817656

Filing date: 2010-10-26

Applicants: Suk Hwan Lim, Jian-Ming Jin, Li-Wei Zheng, Jian Fan, Eamonn O'Brien-Strain, Parag Joshi

Inventors: Suk Hwan Lim, Jian-Ming Jin, Li-Wei Zheng, Jian Fan, Eamonn O'Brien-Strain, Parag Joshi

IPC: G06F17/22

CPC: G06F17/2247, G06F16/986

Abstract: A system and method are provided for extracting main content from a web page. Web page segmentation is performed on a web page to provide affinity-grouped segments. Descriptive features of at least one of the affinity-grouped segments are computed. At least one of the affinity-grouped segments is classified as a main body segment based on the computed descriptive features. Additional affinity-grouped segments are classified as to a document function based on the computed descriptive features. Classified affinity-grouped segments are assembled according to their classified document functions to provide the main content.
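The classify-then-assemble stage can be illustrated as follows, assuming segmentation into affinity-grouped segments has already been done. The features (word count, link density, font size, vertical position), the two rules and the labels `main_body`, `title` and `other` are hypothetical choices for illustration; the abstract does not list the actual features or document functions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    top: float          # vertical position of the segment on the rendered page
    font_size: float
    link_words: int = 0


def describe(seg: Segment) -> dict:
    # A few simple descriptive features; the real feature set is not specified.
    words = len(seg.text.split())
    return {
        "words": words,
        "link_density": seg.link_words / words if words else 1.0,
        "font_size": seg.font_size,
        "top": seg.top,
    }


def classify(segments: list[Segment]) -> list[str]:
    feats = [describe(s) for s in segments]
    # Main body: the link-poor segment with the most words.
    body = max(
        (i for i, f in enumerate(feats) if f["link_density"] < 0.5),
        key=lambda i: feats[i]["words"],
        default=None,
    )
    labels = []
    for i, f in enumerate(feats):
        if i == body:
            labels.append("main_body")
        elif (
            body is not None
            and f["words"] <= 20
            and f["font_size"] > feats[body]["font_size"]
            and f["top"] < feats[body]["top"]
        ):
            labels.append("title")  # short, larger text above the body
        else:
            labels.append("other")  # navigation, footers, ads, ...
    return labels


def assemble_main_content(segments: list[Segment]) -> str:
    # Order by document function first (title before body), then by page position.
    order = {"title": 0, "main_body": 1}
    kept = sorted(
        (order[label], seg.top, seg.text)
        for seg, label in zip(segments, classify(segments))
        if label in order
    )
    return "\n\n".join(text for _, _, text in kept)


body_text = " ".join(["word"] * 50)
segs = [
    Segment("Home World Sports Tech", top=0, font_size=12, link_words=4),
    Segment("Big Storm Hits Coast", top=40, font_size=28),
    Segment(body_text, top=100, font_size=14),
    Segment("Copyright 2010 Example Inc", top=900, font_size=10),
]
print(classify(segs))
```

Classifying the body first and then labelling the other segments relative to it keeps the rules simple; a trained classifier over the same features would be the natural next step.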

10.

Granted patent
Detecting separator lines in a web page [In force]

Publication No.: US08867837B2

Publication date: 2014-10-21

Application No.: US13812421

Filing date: 2010-07-30

Applicants: Hui-Man Hou, Li-Wei Zheng, Jian-Ming Jin, Jian Fan, Suk Hwan Lim

Inventors: Hui-Man Hou, Li-Wei Zheng, Jian-Ming Jin, Jian Fan, Suk Hwan Lim

IPC: G06K9/34, C07D309/28, G06K9/00

CPC: G06K9/00463, C07D309/28

Abstract: A system and method of detecting separator lines in a web page may include determining coordinates of visible web elements on a web page, generating an edge image of the web page based on the coordinates of the web elements, filtering edges belonging to non-separator line elements within the edge image, detecting horizontal lines within the edge image, detecting vertical lines within the edge image, and filtering short lines within the edge image. A system for detecting separator lines in a web page may include a memory device, and a processor communicatively coupled to the memory, in which the processor determines coordinates of visible web elements on a web page, generates an edge image of the web page based on the coordinates of the web elements, filters edges belonging to non-separator line elements within the edge image, detects horizontal lines within the edge image, detects vertical lines within the edge image, and filters short lines within the edge image.
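The detection steps can be sketched on a small binary edge image. This sketch skips edge-image generation and starts from a ready-made 0/1 grid; it assumes that text-element boxes are the "non-separator line elements" whose edges get cleared, and it treats any horizontal or vertical run of edge pixels of at least `min_len` as a line, dropping shorter runs. All names and the grid encoding are hypothetical.

```python
def to_grid(rows: list[str]) -> list[list[int]]:
    return [[int(c) for c in row] for row in rows]


def clear_boxes(edge, boxes):
    """Zero edge pixels inside non-separator elements (inclusive left, top, right, bottom)."""
    h = len(edge)
    w = len(edge[0]) if edge else 0
    cleaned = [row[:] for row in edge]
    for left, top, right, bottom in boxes:
        for y in range(max(top, 0), min(bottom, h - 1) + 1):
            for x in range(max(left, 0), min(right, w - 1) + 1):
                cleaned[y][x] = 0
    return cleaned


def horizontal_runs(grid, min_len):
    """(x0, y, x1, y) for every horizontal run of edge pixels at least min_len long."""
    lines = []
    for y, row in enumerate(grid):
        x = 0
        while x < len(row):
            if not row[x]:
                x += 1
                continue
            start = x
            while x < len(row) and row[x]:
                x += 1
            if x - start >= min_len:  # short runs are filtered out
                lines.append((start, y, x - 1, y))
    return lines


def vertical_runs(grid, min_len):
    # Transpose so columns become rows, reuse the horizontal scan, then swap axes back.
    transposed = [list(col) for col in zip(*grid)]
    return [(x, y0, x, y1) for y0, x, y1, _ in horizontal_runs(transposed, min_len)]


def detect_separators(edge, text_boxes, min_len=4):
    cleaned = clear_boxes(edge, text_boxes)
    return {
        "horizontal": horizontal_runs(cleaned, min_len),
        "vertical": vertical_runs(cleaned, min_len),
    }


edge = to_grid([
    "1111111111",  # a full-width rule
    "0000000000",
    "0111111000",  # glyph edges inside a text element
    "0000000001",
    "0000000001",
    "0000000001",
    "0000000001",  # a short vertical rule on the right
])
print(detect_separators(edge, text_boxes=[(0, 2, 7, 2)]))
```

Without the text-box filter, the glyph edges in row 2 would be long enough to pass as a separator, which is why the abstract filters non-separator edges before line detection.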
