-
Publication No.: US20120246552A1
Publication date: 2012-09-27
Application No.: US13052622
Filing date: 2011-03-21
Applicants: Samson J. Liu, Suk Hwan Lim, Jerry J. Liu
Inventors: Samson J. Liu, Suk Hwan Lim, Jerry J. Liu
IPC classification: G06F17/00
CPC classification: G06F16/951
Abstract: Examples disclosed herein are example systems and methods to provide a particular type of uniform resource locator. In one example, a processor identifies webpage source code associated with a list of text associated with the type of uniform resource locator. The processor may identify a uniform resource locator within the identified webpage source code and provide the uniform resource locator.
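The abstract above does not say how the source code is matched against the list of text. One possible reading is sketched below, under the assumption that the "list of text" holds anchor-text terms typical of the wanted URL type (here, printer-friendly links). `LinkCollector`, `find_urls_of_type`, `PRINT_TERMS`, and the sample page are hypothetical names for illustration, not taken from the patent.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from HTML source."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_urls_of_type(source, type_terms):
    """Return hrefs whose anchor text contains any term from the list."""
    parser = LinkCollector()
    parser.feed(source)
    terms = [t.lower() for t in type_terms]
    return [href for href, text in parser.links
            if any(t in text.lower() for t in terms)]


# Assumed term list for the "printer-friendly" URL type.
PRINT_TERMS = ["print", "printer-friendly"]

page = ('<p><a href="/story/7">Read more</a> '
        '<a href="/story/7/print">Print this article</a></p>')
print(find_urls_of_type(page, PRINT_TERMS))
```

Matching on anchor text only is a simplification; a fuller version might also inspect attributes or the surrounding markup.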
-
Publication No.: US09330323B2
Publication date: 2016-05-03
Application No.: US14364743
Filing date: 2012-04-29
Applicants: Steven J Simske, Samson J. Liu
Inventors: Steven J Simske, Samson J. Liu
CPC classification: G06K9/18, G06K9/00442, G06K9/03
Abstract: A system and method to error correct extant electronic documents is disclosed. An electronic document may be rasterized to obtain a pixel representation of the electronic document (e.g., raster image). One or more optical character recognition (OCR) tasks may be performed on the raster image of the electronic document. Errors discovered by the OCR tasks may be corrected and a customized error corrected version of the electronic document may be created and stored. If the author of the electronic document is known, the raster image may be compared to a personalized tf*idf error dictionary associated with the author to determine known OCR errors specific to the author. The raster image may also be compared to a personalized electronic error dictionary associated with the author to determine known typographical errors specific to the author.
-
Publication No.: US09218322B2
Publication date: 2015-12-22
Application No.: US13811912
Filing date: 2010-07-28
Applicants: Jian Fan, Ping Luo, Li-Wei Zheng, Samson J. Liu, Suk Hwan Lim, Jerry J. Liu, Yuhong Xiong
Inventors: Jian Fan, Ping Luo, Li-Wei Zheng, Samson J. Liu, Suk Hwan Lim, Jerry J. Liu, Yuhong Xiong
CPC classification: G06F17/21, G06F17/212, G06F17/227, G06F17/30896
Abstract: A method for producing web page content includes identifying blocks within a web page. The blocks are selectively assembled into sections. The sections are selectively assembled into article candidates. An article candidate that includes article content is distinguished from article candidates that do not include article content. Content is produced only from the article candidate distinguished as including article content.
-
Publication No.: US20150049949A1
Publication date: 2015-02-19
Application No.: US14364743
Filing date: 2012-04-29
Applicants: Steven J Simske, Samson J. Liu
Inventors: Steven J Simske, Samson J. Liu
CPC classification: G06K9/18, G06K9/00442, G06K9/03
Abstract: A system and method to error correct extant electronic documents is disclosed. An electronic document may be rasterized to obtain a pixel representation of the electronic document (e.g., raster image). One or more optical character recognition (OCR) tasks may be performed on the raster image of the electronic document. Errors discovered by the OCR tasks may be corrected and a customized error corrected version of the electronic document may be created and stored. If the author of the electronic document is known, the raster image may be compared to a personalized tf*idf error dictionary associated with the author to determine known OCR errors specific to the author. The raster image may also be compared to a personalized electronic error dictionary associated with the author to determine known typographical errors specific to the author.
-
Publication No.: US08577887B2
Publication date: 2013-11-05
Application No.: US12639768
Filing date: 2009-12-16
Applicants: Parag M. Joshi, Jian-Ming Jin, Sheng-Wen Yang, Samson J. Liu, Nina Bhatti, Suk Hwan Lim
Inventors: Parag M. Joshi, Jian-Ming Jin, Sheng-Wen Yang, Samson J. Liu, Nina Bhatti, Suk Hwan Lim
IPC classification: G06F17/30
CPC classification: G06F17/30911
Abstract: A method of grouping a plurality of media content is provided. The method includes converting at least a portion of the media content into at least one document object model (“DOM”) using a processor. The DOM can include a plurality of block elements, each comprising at least one content object. The method includes apportioning the content objects into a relevant portion and an irrelevant portion and extracting a set of keywords, the set comprising at least one keyword, within the relevant portion of the content objects. The method includes apportioning the relevant portion of the content objects into a related portion and an unrelated portion using at least a portion of the set of keywords and grouping the related portion of the content to provide a group of related content.
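The three apportioning steps in the abstract above (relevant vs. irrelevant, keyword extraction, related vs. unrelated) can be illustrated with a small sketch. The heuristics here are assumptions chosen for brevity, not the patented criteria: content objects are plain strings rather than DOM nodes, very short objects count as irrelevant, and keywords are simply the most frequent non-stopwords. `group_related`, `min_words`, `top_k`, and `STOPWORDS` are hypothetical names.

```python
import re
from collections import Counter

# Tiny assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def words(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS]


def group_related(content_objects, min_words=4, top_k=2):
    # Step 1: relevant vs. irrelevant. Assumption: very short objects
    # (menus, buttons) are irrelevant.
    relevant = [c for c in content_objects if len(words(c)) >= min_words]
    # Step 2: extract a keyword set from the relevant portion only.
    counts = Counter(w for c in relevant for w in words(c))
    keywords = {w for w, _ in counts.most_common(top_k)}
    # Step 3: related vs. unrelated, by overlap with the keyword set.
    related = [c for c in relevant if keywords & set(words(c))]
    return keywords, related


objects = [
    "Home",
    "Printer ink prices rise as printer sales fall",
    "New printer models cut ink use",
    "Local team wins the championship game again",
    "Login",
]
keywords, related = group_related(objects)
print(keywords)
print(related)
```

In this sample, "Home" and "Login" fall into the irrelevant portion, and the sports headline is relevant but unrelated because it shares no keyword with the printer items.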
-
Publication No.: US20130114105A1
Publication date: 2013-05-09
Application No.: US13635412
Filing date: 2010-04-19
Applicants: Samson J. Liu, Suk Hwan Lim, Jian-Ming Jin, Yuhong Xiong, Parag M. Joshi, Nina Bhatti, Jerry J. Liu, Jian Fan, Sheng-Wen Yang
Inventors: Samson J. Liu, Suk Hwan Lim, Jian-Ming Jin, Yuhong Xiong, Parag M. Joshi, Nina Bhatti, Jerry J. Liu, Jian Fan, Sheng-Wen Yang
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/3089
Abstract: Semantically ranking content in a website (110) with a computerized ranking device (105) includes: parsing content from the website (110) into multiple autonomous content blocks (415-1 to 415-17) with the computerized ranking device (105) and assigning an importance ranking with said computerized ranking device (105) to each of the content blocks (415-1 to 415-17) based on a degree to which a substance of the content block (415-1 to 415-17) is relevant to one of a plurality of predefined categories.
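As a rough illustration of ranking content blocks by relevance to predefined categories, the sketch below scores each block by the fraction of its tokens that appear in its best-matching category's term set. The scoring rule, the `CATEGORIES` table, and the function names are assumptions for illustration; the abstract does not specify how the degree of relevance is measured.

```python
import re

# Hypothetical predefined categories, each with a small term set.
CATEGORIES = {
    "sports": {"team", "game", "score", "season"},
    "finance": {"stock", "market", "shares", "profit"},
}


def importance(block):
    """Return (score, category) for the best-matching category.

    The score is the fraction of the block's tokens found in that
    category's term set; category is None when nothing matches.
    """
    tokens = re.findall(r"[a-z]+", block.lower())
    if not tokens:
        return 0.0, None
    best = max(CATEGORIES,
               key=lambda c: sum(t in CATEGORIES[c] for t in tokens))
    hits = sum(t in CATEGORIES[best] for t in tokens)
    return hits / len(tokens), (best if hits else None)


def rank_blocks(blocks):
    """Sort blocks from most to least important."""
    return sorted(blocks, key=lambda b: importance(b)[0], reverse=True)


blocks = [
    "Share this page",
    "The team won the game",
    "Stock market shares fell on profit fears",
]
ranked = rank_blocks(blocks)
print(ranked)
```

Exact token matching keeps the sketch short; it means "Share" does not match the finance term "shares", so the sharing widget ranks last.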
-
Publication No.: US20120150637A1
Publication date: 2012-06-14
Application No.: US13391637
Filing date: 2009-08-26
Applicants: Samson J. Liu, Parag M. Joshi
Inventors: Samson J. Liu, Parag M. Joshi
IPC classification: G06Q30/02
CPC classification: G06F3/1203, G06F3/1243, G06F3/1287, G06Q30/02, G06Q30/0251
Abstract: In one embodiment, a system and method relate to detecting a print command received by a network browser of a client computer, the print command reflecting an interest to print content of a network page displayed in the network browser as a hard copy printout, analyzing the network page content to determine its underlying subject matter, identifying commercial content relevant to the underlying subject matter, and creating and formatting a document that includes the network page content and the identified commercial content.
-
Publication No.: US20110145249A1
Publication date: 2011-06-16
Application No.: US12639768
Filing date: 2009-12-16
Applicants: Parag M. Joshi, Jian-Ming Jin, Sheng-Wen Yang, Samson J. Liu, Nina Bhatti, Suk Hwan Lim
Inventors: Parag M. Joshi, Jian-Ming Jin, Sheng-Wen Yang, Samson J. Liu, Nina Bhatti, Suk Hwan Lim
IPC classification: G06F17/30
CPC classification: G06F17/30911
Abstract: A method of grouping a plurality of media content is provided. The method includes converting at least a portion of the media content into at least one document object model (“DOM”) using a processor. The DOM can include a plurality of block elements, each comprising at least one content object. The method includes apportioning the content objects into a relevant portion and an irrelevant portion and extracting a set of keywords, the set comprising at least one keyword, within the relevant portion of the content objects. The method includes apportioning the relevant portion of the content objects into a related portion and an unrelated portion using at least a portion of the set of keywords and grouping the related portion of the content to provide a group of related content.
-
Publication No.: US20150138605A1
Publication date: 2015-05-21
Application No.: US13821356
Filing date: 2010-09-21
Applicants: Samson J. Liu, Parag M. Joshi, Sheng-Wen Yang, Jian-Ming Jin
Inventors: Samson J. Liu, Parag M. Joshi, Sheng-Wen Yang, Jian-Ming Jin
CPC classification: G06Q30/0251, G06F3/1243, G06F3/1289, G06K15/1809, G06K15/1822, G06Q30/0254, G06Q30/0276
Abstract: Systems, devices and methods are provided which relate to detecting a print command on a client computer, the print command reflecting an interest to print content of an electronic document, accessible by a client computer, as a hard copy printout. One method includes analyzing the electronic document content to determine its underlying subject matter, identifying commercial content relevant to the underlying subject matter, and creating and formatting a new, printable document that includes the electronic document content and the identified commercial content.
-
Publication No.: US20130124953A1
Publication date: 2013-05-16
Application No.: US13811912
Filing date: 2010-07-28
Applicants: Jian Fan, Ping Luo, Li-Wei Zheng, Samson J. Liu, Suk Hwan Lim, Jerry J. Liu, Yuhong Xiong
Inventors: Jian Fan, Ping Luo, Li-Wei Zheng, Samson J. Liu, Suk Hwan Lim, Jerry J. Liu, Yuhong Xiong
IPC classification: G06F17/21
CPC classification: G06F17/21, G06F17/212, G06F17/227, G06F17/30896
Abstract: A method for producing web page content includes identifying blocks within a web page. The blocks are selectively assembled into sections. The sections are selectively assembled into article candidates. An article candidate that includes article content is distinguished from article candidates that do not include article content. Content is produced only from the article candidate distinguished as including article content.