-
Publication No.: US20120246552A1
Publication date: 2012-09-27
Application No.: US13052622
Filing date: 2011-03-21
Applicant: Samson J. Liu, Suk Hwan Lim, Jerry J. Liu
Inventor: Samson J. Liu, Suk Hwan Lim, Jerry J. Liu
IPC classification: G06F17/00
CPC classification: G06F16/951
Abstract: Examples disclosed herein are example systems and methods to provide a particular type of uniform resource locator. In one example, a processor identifies webpage source code associated with a list of text associated with the type of uniform resource locator. The processor may identify a uniform resource locator within the identified webpage source code and provide the uniform resource locator.
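The abstract above does not say how the text list is matched against the source code. As a minimal sketch, assuming the "list of text" is a set of keywords compared against anchor text, the idea could look like the following. The names `KeywordLinkFinder` and `find_urls` and the sample keywords are hypothetical and do not come from the patent.

```python
from html.parser import HTMLParser


class KeywordLinkFinder(HTMLParser):
    """Collect href values of <a> tags whose visible text contains a keyword."""

    def __init__(self, keywords):
        super().__init__()
        self.keywords = [k.lower() for k in keywords]
        self.matches = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).lower()
            if any(k in text for k in self.keywords):
                self.matches.append(self._href)
            self._href = None


def find_urls(source, keywords):
    """Return hrefs of links whose anchor text matches any keyword (assumed rule)."""
    finder = KeywordLinkFinder(keywords)
    finder.feed(source)
    finder.close()
    return finder.matches


page = '<p><a href="/home">Home</a> <a href="/print?id=7">Printer-friendly version</a></p>'
print(find_urls(page, ["print"]))
```

Matching on anchor text alone is one possible reading; a fuller treatment might also inspect attributes or surrounding markup.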
-
Publication No.: US20130124953A1
Publication date: 2013-05-16
Application No.: US13811912
Filing date: 2010-07-28
Applicant: Jian Fan, Ping Luo, Li-Wei Zheng, Samson J. Liu, Suk Hwan Lim, Jerry J. Liu, Yuhong Xiong
Inventor: Jian Fan, Ping Luo, Li-Wei Zheng, Samson J. Liu, Suk Hwan Lim, Jerry J. Liu, Yuhong Xiong
IPC classification: G06F17/21
CPC classification: G06F17/21, G06F17/212, G06F17/227, G06F17/30896
Abstract: A method for producing web page content includes identifying blocks within a web page. The blocks are selectively assembled into sections. The sections are selectively assembled into article candidates. An article candidate that includes article content is distinguished from article candidates that do not include article content. Content is produced only from the article candidate distinguished as including article content.
-
Publication No.: US08918403B2
Publication date: 2014-12-23
Application No.: US13635412
Filing date: 2010-04-19
Applicant: Samson J. Liu, Suk Hwan Lim, Jian-Ming Jin, Yuhong Xiong, Parag M. Joshi, Nina Bhatti, Jerry J. Liu, Jian Fan, Sheng-Wen Yang
Inventor: Samson J. Liu, Suk Hwan Lim, Jian-Ming Jin, Yuhong Xiong, Parag M. Joshi, Nina Bhatti, Jerry J. Liu, Jian Fan, Sheng-Wen Yang
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/3089
Abstract: Semantically ranking content in a website (110) with a computerized ranking device (105) includes: parsing content from the website (110) into multiple autonomous content blocks (415-1 to 415-17) with the computerized ranking device (105) and assigning an importance ranking with said computerized ranking device (105) to each of the content blocks (415-1 to 415-17) based on a degree to which a substance of the content block (415-1 to 415-17) is relevant to one of a plurality of predefined categories.
-
Publication No.: US09218322B2
Publication date: 2015-12-22
Application No.: US13811912
Filing date: 2010-07-28
Applicant: Jian Fan, Ping Luo, Li-Wei Zheng, Samson J. Liu, Suk Hwan Lim, Jerry J. Liu, Yuhong Xiong
Inventor: Jian Fan, Ping Luo, Li-Wei Zheng, Samson J. Liu, Suk Hwan Lim, Jerry J. Liu, Yuhong Xiong
CPC classification: G06F17/21, G06F17/212, G06F17/227, G06F17/30896
Abstract: A method for producing web page content includes identifying blocks within a web page. The blocks are selectively assembled into sections. The sections are selectively assembled into article candidates. An article candidate that includes article content is distinguished from article candidates that do not include article content. Content is produced only from the article candidate distinguished as including article content.
-
Publication No.: US20130114105A1
Publication date: 2013-05-09
Application No.: US13635412
Filing date: 2010-04-19
Applicant: Samson J. Liu, Suk Hwan Lim, Jian-Ming Jin, Yuhong Xiong, Parag M. Joshi, Nina Bhatti, Jerry J. Liu, Jian Fan, Sheng-Wen Yang
Inventor: Samson J. Liu, Suk Hwan Lim, Jian-Ming Jin, Yuhong Xiong, Parag M. Joshi, Nina Bhatti, Jerry J. Liu, Jian Fan, Sheng-Wen Yang
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/3089
Abstract: Semantically ranking content in a website (110) with a computerized ranking device (105) includes: parsing content from the website (110) into multiple autonomous content blocks (415-1 to 415-17) with the computerized ranking device (105) and assigning an importance ranking with said computerized ranking device (105) to each of the content blocks (415-1 to 415-17) based on a degree to which a substance of the content block (415-1 to 415-17) is relevant to one of a plurality of predefined categories.
-
Publication No.: US08577887B2
Publication date: 2013-11-05
Application No.: US12639768
Filing date: 2009-12-16
Applicant: Parag M. Joshi, Jian-Ming Jin, Sheng-Wen Yang, Samson J. Liu, Nina Bhatti, Suk Hwan Lim
Inventor: Parag M. Joshi, Jian-Ming Jin, Sheng-Wen Yang, Samson J. Liu, Nina Bhatti, Suk Hwan Lim
IPC classification: G06F17/30
CPC classification: G06F17/30911
Abstract: A method of grouping a plurality of media content is provided. The method includes converting at least a portion of the media content into at least one document object model (“DOM”) using a processor. The DOM can include a plurality of block elements, each comprising at least one content object. The method includes apportioning the content objects into a relevant portion and an irrelevant portion and extracting a set of keywords, the set comprising at least one keyword, within the relevant portion of the content objects. The method includes apportioning the relevant portion of the content objects into a related portion and an unrelated portion using at least a portion of the set of keywords and grouping the related portion of the content to provide a group of related content.
-
Publication No.: US20110145249A1
Publication date: 2011-06-16
Application No.: US12639768
Filing date: 2009-12-16
Applicant: Parag M. Joshi, Jian-Ming Jin, Sheng-Wen Yang, Samson J. Liu, Nina Bhatti, Suk Hwan Lim
Inventor: Parag M. Joshi, Jian-Ming Jin, Sheng-Wen Yang, Samson J. Liu, Nina Bhatti, Suk Hwan Lim
IPC classification: G06F17/30
CPC classification: G06F17/30911
Abstract: A method of grouping a plurality of media content is provided. The method includes converting at least a portion of the media content into at least one document object model (“DOM”) using a processor. The DOM can include a plurality of block elements, each comprising at least one content object. The method includes apportioning the content objects into a relevant portion and an irrelevant portion and extracting a set of keywords, the set comprising at least one keyword, within the relevant portion of the content objects. The method includes apportioning the relevant portion of the content objects into a related portion and an unrelated portion using at least a portion of the set of keywords and grouping the related portion of the content to provide a group of related content.
-
Publication No.: US20120303636A1
Publication date: 2012-11-29
Application No.: US13258482
Filing date: 2009-12-14
Applicant: Ping Luo, Jian Fan, Samson J. Liu, Yuhong Xiong, Jerry J. Liu
Inventor: Ping Luo, Jian Fan, Samson J. Liu, Yuhong Xiong, Jerry J. Liu
IPC classification: G06F17/30
CPC classification: G06F17/30896, G06F3/1246
Abstract: A method and system for extracting Web content is disclosed. In one embodiment, Web content in a Webpage is extracted by identifying paragraphs in the Web content based on line-break node determination. A range of text-body associated with the identified paragraphs is then identified using a maximum scoring subsequence. Further, the identified text-body is refined using a heuristic rule of substantially horizontal alignment. Furthermore, one or more titles and one or more images associated with the Web content are extracted. Moreover, the Web content including the identified paragraphs, the one or more titles and the one or more images are outputted.
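The abstract above names a maximum scoring subsequence as the way to locate the text-body range but gives no scoring details. A minimal sketch, assuming each paragraph already carries a numeric score (positive for article-like text, negative for boilerplate such as navigation), is the standard linear-time maximum-sum contiguous run. The function name and the sample scores are hypothetical.

```python
def max_scoring_range(scores):
    """Return (start, end, total) for the contiguous run with the highest sum.

    `end` is exclusive. An empty range (0, 0, 0.0) is returned when no
    paragraph has a positive score.
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    cur_sum, cur_start = 0.0, 0
    for i, score in enumerate(scores):
        if cur_sum <= 0:
            # A non-positive running total cannot help; restart the run here.
            cur_sum, cur_start = score, i
        else:
            cur_sum += score
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i + 1
    return best_start, best_end, best_sum


# Hypothetical per-paragraph scores: menu, ad, body, body, caption, body, footer, link.
paragraph_scores = [-1, -2, 3, 4, -1, 5, -6, 1]
print(max_scoring_range(paragraph_scores))
```

Paragraphs with indices from `start` up to, but not including, `end` would form the candidate text body; the alignment-based refinement mentioned in the abstract is not modeled here.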
-
Publication No.: US08819028B2
Publication date: 2014-08-26
Application No.: US13258482
Filing date: 2009-12-14
Applicant: Ping Luo, Jian Fan, Samson J. Liu, Yuhong Xiong, Jerry J. Liu
Inventor: Ping Luo, Jian Fan, Samson J. Liu, Yuhong Xiong, Jerry J. Liu
CPC classification: G06F17/30896, G06F3/1246
Abstract: A method and system for extracting Web content is disclosed. In one embodiment, Web content in a Webpage is extracted by identifying paragraphs in the Web content based on line-break node determination. A range of text-body associated with the identified paragraphs is then identified using a maximum scoring subsequence. Further, the identified text-body is refined using a heuristic rule of substantially horizontal alignment. Furthermore, one or more titles and one or more images associated with the Web content are extracted. Moreover, the Web content including the identified paragraphs, the one or more titles and the one or more images are outputted.
-
Publication No.: US20150138605A1
Publication date: 2015-05-21
Application No.: US13821356
Filing date: 2010-09-21
Applicant: Samson J. Liu, Parag M. Joshi, Sheng-Wen Yang, Jian-Ming Jin
Inventor: Samson J. Liu, Parag M. Joshi, Sheng-Wen Yang, Jian-Ming Jin
CPC classification: G06Q30/0251, G06F3/1243, G06F3/1289, G06K15/1809, G06K15/1822, G06Q30/0254, G06Q30/0276
Abstract: Systems, devices and methods are provided which relate to detecting a print command on a client computer, the print command reflecting an interest to print content of an electronic document, accessible by a client computer, as a hard copy printout. One method includes analyzing the electronic document content to determine its underlying subject matter, identifying commercial content relevant to the underlying subject matter, and creating and formatting a new, printable document that includes the electronic document content and the identified commercial content.