Systems and methods for processing a digital captured image
    Type: Granted invention patent
    Legal status: In force

    Publication number: US07301564B2

    Publication date: 2007-11-27

    Application number: US10197072

    Filing date: 2002-07-17

    Applicant: Jian Fan

    Inventor: Jian Fan

    IPC classification: H04N5/235

    Abstract: In one embodiment, the present invention is directed to a method for processing a digitally captured image that comprises an imaged document. The method comprises: detecting graphical information related to spatial discontinuities of the digitally captured image; detecting lines from the detected graphical information; computing effective area parameters for quadrilaterals associated with ones of the detected lines, wherein each effective area parameter for a respective quadrilateral equals an area of the respective quadrilateral modified by at least a corner matching score that is indicative of a number of connected edge pixels in corners of the respective quadrilateral; and selecting a quadrilateral of the quadrilaterals that possesses a largest effective area parameter.

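    The selection step can be sketched in Python as below, assuming the candidate quadrilaterals (corner lists in boundary order) and a set of edge-pixel coordinates are already available. The corner score used here (the fraction of connected edge pixels found along short arms leaving each corner), the `arm` length, and the multiplication of area by score are illustrative assumptions; the abstract says only that the area is modified by at least a corner matching score.

```python
from typing import Iterable, List, Set, Tuple

Point = Tuple[int, int]
Quad = List[Point]


def polygon_area(quad: Quad) -> float:
    """Shoelace formula; vertices must be listed in boundary order."""
    s = 0.0
    for i in range(len(quad)):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % len(quad)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def corner_score(quad: Quad, edge_pixels: Set[Point], arm: int = 5) -> float:
    """Hypothetical corner matching score in [0, 1].

    From each corner, walk up to `arm` pixels toward each neighbouring corner and
    count consecutive (connected) edge pixels, stopping at the first gap.
    """
    hits = total = 0
    n = len(quad)
    for i in range(n):
        cx, cy = quad[i]
        for j in ((i - 1) % n, (i + 1) % n):
            nx, ny = quad[j]
            length = max(abs(nx - cx), abs(ny - cy))
            steps = min(arm, length)
            total += steps
            for t in range(1, steps + 1):
                x = round(cx + (nx - cx) * t / length)
                y = round(cy + (ny - cy) * t / length)
                if (x, y) not in edge_pixels:
                    break  # connectivity broken: stop counting along this arm
                hits += 1
    return hits / total if total else 0.0


def effective_area(quad: Quad, edge_pixels: Set[Point]) -> float:
    # Assumption: "modified by" is taken to mean multiplied by the corner score.
    return polygon_area(quad) * corner_score(quad, edge_pixels)


def select_quadrilateral(quads: Iterable[Quad], edge_pixels: Set[Point]) -> Quad:
    """Pick the candidate with the largest effective area."""
    return max(quads, key=lambda q: effective_area(q, edge_pixels))
```

    With a rectangular document outline in the edge map, a larger but misplaced candidate scores zero at its corners and loses to the true outline.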

    Extracting graphical bar codes from template-based documents
    Type: Granted invention patent
    Legal status: Lapsed

    Publication number: US07017816B2

    Publication date: 2006-03-28

    Application number: US10675026

    Filing date: 2003-09-30

    IPC classification: G02B26/10

    Abstract: Systems and methods of extracting from an input image a graphical bar code containing graphically encoded information are described. In one aspect, a document template is matched to the input image. The document template is selected from a set of document templates each having a respective predetermined page layout corresponding to a respective document type and including a predetermined graphical bar code location. The input image is cropped based on information relating to the graphical bar code location in the page layout of the document template matched to the input image to produce a cropped graphical bar code candidate for decoding.

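    A minimal Python sketch of the match-then-crop idea, assuming each template carries a tiny grayscale layout signature and a bar-code box given as page fractions. `DocumentTemplate`, `signature`, and `barcode_box` are hypothetical names, and sum-of-absolute-differences matching on block averages is a simple stand-in, not necessarily how the patent matches templates.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Image = List[List[float]]  # grayscale pixel rows


@dataclass
class DocumentTemplate:
    name: str
    signature: Image  # hypothetical low-resolution layout signature of the page type
    barcode_box: Tuple[float, float, float, float]  # (left, top, right, bottom) as page fractions


def downsample(img: Image, rows: int, cols: int) -> Image:
    """Block-average img down to rows x cols (img must be at least that large)."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(rows):
        r0, r1 = r * h // rows, (r + 1) * h // rows
        row = []
        for c in range(cols):
            c0, c1 = c * w // cols, (c + 1) * w // cols
            block = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def match_template(img: Image, templates: Sequence[DocumentTemplate]) -> DocumentTemplate:
    """Pick the template whose signature has the smallest sum of absolute differences."""
    def cost(t: DocumentTemplate) -> float:
        small = downsample(img, len(t.signature), len(t.signature[0]))
        return sum(abs(a - b) for ra, rb in zip(small, t.signature) for a, b in zip(ra, rb))
    return min(templates, key=cost)


def crop_barcode_candidate(img: Image, template: DocumentTemplate) -> Image:
    """Crop the region where the matched template says the bar code should be."""
    h, w = len(img), len(img[0])
    left, top, right, bottom = template.barcode_box
    x0, x1 = round(left * w), round(right * w)
    y0, y1 = round(top * h), round(bottom * h)
    return [row[x0:x1] for row in img[y0:y1]]
```

    The cropped candidate would then be handed to a bar-code decoder, which is outside the scope of this sketch.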

    Segmenting pixels in an image based on orientation-dependent adaptive thresholds

    Publication number: US20060062454A1

    Publication date: 2006-03-23

    Application number: US10948822

    Filing date: 2004-09-23

    IPC classification: G06K9/34; G06K9/00

    Abstract: Methods, machines, and computer-readable media storing machine-readable instructions for segmenting pixels in an image are described. In one aspect, a region of background pixels is identified in the image. At least some of the background pixels in the region are located on a boundary spatially delimiting the region. One or more orientation-dependent adaptive thresholds are determined for one or more respective candidate growth directions from a given background pixel located on the region boundary. Color distances between the given background pixel and candidate pixels in a neighborhood of the given background pixel are determined. The region is grown based on application of the one or more orientation-dependent adaptive thresholds to the determined color distances.
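
    The growth loop can be sketched in Python as follows. The form of the orientation-dependent threshold is an assumption made for illustration: for each of the eight growth directions d from a boundary pixel p, the tolerance adapts to the color change already observed when entering p from the opposite direction. `directional_threshold` and `tol` are hypothetical names.

```python
import math
from collections import deque
from typing import Iterable, List, Set, Tuple

Color = Tuple[float, float, float]
Pixel = Tuple[int, int]
# The eight candidate growth directions as (row step, column step).
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def color_distance(a: Color, b: Color) -> float:
    return math.dist(a, b)  # Euclidean distance in RGB (Python 3.8+)


def directional_threshold(img: List[List[Color]], region: Set[Pixel],
                          p: Pixel, d: Pixel, tol: float) -> float:
    """Hypothetical orientation-dependent adaptive threshold for growing from p along d.

    It adapts to the color change seen when entering p from the opposite
    direction (-d); if that neighbour is not background yet, only `tol` is allowed.
    """
    r, c = p
    behind = (r - d[0], c - d[1])
    local = color_distance(img[r][c], img[behind[0]][behind[1]]) if behind in region else 0.0
    return local + tol


def grow_background(img: List[List[Color]], seeds: Iterable[Pixel], tol: float = 10.0) -> Set[Pixel]:
    """Grow the background region from the seed pixels."""
    h, w = len(img), len(img[0])
    region = set(seeds)
    frontier = deque(region)  # pixels that may still lie on the region boundary
    while frontier:
        p = frontier.popleft()
        r, c = p
        for d in DIRECTIONS:
            q = (r + d[0], c + d[1])
            if not (0 <= q[0] < h and 0 <= q[1] < w) or q in region:
                continue
            dist = color_distance(img[r][c], img[q[0]][q[1]])
            if dist <= directional_threshold(img, region, p, d, tol):
                region.add(q)
                frontier.append(q)
    return region
```

    Pixels left outside the grown region (here, a dark "ink" pixel on a white page) are the foreground candidates.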

    Image processing methods and systems
    Type: Invention application (published)
    Legal status: Lapsed

    Publication number: US20050213848A1

    Publication date: 2005-09-29

    Application number: US10809234

    Filing date: 2004-03-25

    Applicants: Jian Fan; Hui Chao

    Inventors: Jian Fan; Hui Chao

    IPC classification: G06T11/60; G06K9/00; G06K1/00

    CPC classification: G06T11/60

    Abstract: Systems and methods according to the present invention provide techniques to automatically insert an object from one image into a region of another image. The systems and methods require little or no user interaction, allowing efficient re-use and updating of existing images, presentations, documents, and the like. An object and a container region are identified. Feasible placement locations within the container region for the object, as well as an associated scale factor, are determined. If multiple feasible placement locations are identified for a particular scale factor, one is selected based on predetermined criteria. The object can then be inserted into the container region and the resulting composite image stored or, alternatively, parameters can be stored that enable object insertion at a subsequent processing step.

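    A minimal Python sketch of the placement search, assuming the container region is given as a boolean mask and the object as a bounding box. The candidate scale factors and the tie-breaking criterion (box center closest to the container centroid) are hypothetical choices standing in for the "predetermined criteria" the abstract mentions.

```python
from typing import List, Optional, Sequence, Tuple

Mask = List[List[bool]]  # True where the container region is free


def feasible_positions(mask: Mask, obj_w: int, obj_h: int) -> List[Tuple[int, int]]:
    """All top-left (x, y) positions where an obj_w x obj_h box lies entirely in free pixels."""
    h, w = len(mask), len(mask[0])
    out = []
    for y in range(h - obj_h + 1):
        for x in range(w - obj_w + 1):
            if all(mask[yy][xx] for yy in range(y, y + obj_h) for xx in range(x, x + obj_w)):
                out.append((x, y))
    return out


def place_object(mask: Mask, obj_w: int, obj_h: int,
                 scales: Sequence[float] = (1.0, 0.75, 0.5, 0.25)
                 ) -> Optional[Tuple[float, Tuple[int, int]]]:
    """Return (scale factor, top-left position) for the largest scale that fits, or None."""
    free = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not free:
        return None
    # Centroid of the free pixels, measured at pixel centers (x + 0.5, y + 0.5).
    cx = sum(x for x, _ in free) / len(free) + 0.5
    cy = sum(y for _, y in free) / len(free) + 0.5
    for s in sorted(scales, reverse=True):
        sw, sh = max(1, round(obj_w * s)), max(1, round(obj_h * s))
        positions = feasible_positions(mask, sw, sh)
        if positions:
            # Hypothetical criterion: pick the box whose center is nearest the centroid.
            return s, min(positions,
                          key=lambda p: (p[0] + sw / 2 - cx) ** 2 + (p[1] + sh / 2 - cy) ** 2)
    return None
```

    Returning only the scale and position mirrors the abstract's alternative of storing insertion parameters instead of the composite image.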

    Methods and apparatus for analyzing and image and for controlling a scanner
    Type: Granted invention patent
    Legal status: Lapsed

    Publication number: US06757081B1

    Publication date: 2004-06-29

    Application number: US09545223

    Filing date: 2000-04-07

    IPC classification: H04N1/04

    Abstract: A method analyzes an image to be scanned and analyzes at least part of the image pixel-by-pixel. During or after a preview scan, a characteristic is assigned to a plurality of pixels in the image and pixels are grouped according to similar characteristics. A representation of at least one of the characteristics corresponding to a group of pixels is communicated to the scanner. For example, the pixels may be analyzed to determine whether each pixel is black or white. The pixels may also be analyzed to determine whether a pixel is on an edge between black and white. Black pixels that are adjacent to each other can be grouped together, and white pixels that are adjacent to each other can also be grouped together. A region of an image with a relatively high number of black and white groups can be characterized as black-and-white text only. That characterization can then be used to set the scanner properly, for example without user intervention, so that the final scan of the image can be done at 300 dpi with a low bit depth.

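    The preview analysis can be sketched in Python as below. The gray-level cut-offs, the 4-connected grouping, the `min_groups` count, and the 1-bit depth are assumptions; the abstract specifies only black/white classification, grouping of adjacent pixels, and a final scan at 300 dpi with a low bit depth. The edge-pixel test it also mentions is omitted for brevity.

```python
from collections import deque
from typing import Dict, List, Optional


def classify_pixel(value: int, black_max: int = 64, white_min: int = 192) -> str:
    """Label a grayscale preview pixel; the cut-offs are hypothetical."""
    if value <= black_max:
        return "black"
    if value >= white_min:
        return "white"
    return "gray"


def count_groups(labels: List[List[str]], target: str) -> int:
    """Count 4-connected groups of adjacent pixels sharing the `target` label."""
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x] != target or seen[y][x]:
                continue
            count += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and labels[ny][nx] == target:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def suggest_scan_settings(preview: List[List[int]], max_gray_fraction: float = 0.05,
                          min_groups: int = 4) -> Optional[Dict[str, int]]:
    """Return final-scan settings if the preview looks like black-and-white text, else None."""
    labels = [[classify_pixel(v) for v in row] for row in preview]
    total = sum(len(row) for row in labels)
    gray = sum(row.count("gray") for row in labels)
    if gray / total > max_gray_fraction:
        return None  # continuous-tone content: leave the scanner settings unchanged
    groups = count_groups(labels, "black") + count_groups(labels, "white")
    if groups >= min_groups:
        return {"dpi": 300, "bit_depth": 1}  # 300 dpi per the abstract; 1-bit is an assumption
    return None
```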

    Segmenting a Web Page into Coherent Functional Blocks
    Type: Invention application (published)
    Legal status: Pending (published, under examination)

    Publication number: US20130275854A1

    Publication date: 2013-10-17

    Application number: US13635410

    Filing date: 2010-04-19

    IPC classification: G06F17/22

    CPC classification: G06F17/2247; G06F17/2705

    Abstract: Segmenting a web page (110) into coherent function blocks (705-1 to 705-8) includes parsing content from the web page (110) into multiple coherent, collectively exhaustive nodes (405-1 to 405-37); calculating at least one matrix (500, 600, 605-1 to 605-4) of affinity values between each of the nodes (405-1 to 405-37); and clustering the nodes (405-1 to 405-37) into functional blocks (705-1 to 705-8) based on the affinity values in the at least one matrix (500, 600, 605-1 to 605-4).

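    A small Python sketch of the affinity-and-clustering pipeline, assuming each parsed node already has a rendered bounding box and a font size. The affinity formula (spatial proximity times a style-similarity factor), the `threshold`, and single-linkage clustering via union-find are illustrative assumptions; the abstract does not say how affinities are computed or which clustering method is used.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    x: float  # rendered bounding box (hypothetical features)
    y: float
    w: float
    h: float
    font_size: float


def gap(a: Node, b: Node) -> float:
    """Euclidean distance between two boxes (0 if they touch or overlap)."""
    dx = max(0.0, max(a.x, b.x) - min(a.x + a.w, b.x + b.w))
    dy = max(0.0, max(a.y, b.y) - min(a.y + a.h, b.y + b.h))
    return (dx * dx + dy * dy) ** 0.5


def affinity_matrix(nodes: List[Node], scale: float = 20.0) -> List[List[float]]:
    """Symmetric matrix of pairwise affinities in (0, 1]."""
    n = len(nodes)
    A = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            spatial = 1.0 / (1.0 + gap(nodes[i], nodes[j]) / scale)
            style = 1.0 if nodes[i].font_size == nodes[j].font_size else 0.5
            A[i][j] = A[j][i] = spatial * style
    return A


def cluster(A: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Single-linkage clustering: join nodes whose affinity reaches the threshold."""
    n = len(A)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if A[i][j] >= threshold:
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

    Two tightly stacked nodes near the top of a page and two near y = 300 form two functional blocks under these settings.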

    SYSTEMS AND METHODS FOR FILTERING WEB PAGE CONTENTS
    Type: Invention application (published)
    Legal status: Pending (published, under examination)

    Publication number: US20130145255A1

    Publication date: 2013-06-06

    Application number: US13817366

    Filing date: 2010-08-20

    IPC classification: G06F17/21

    Abstract: A system and method for selectively filtering web page contents are disclosed. In one example embodiment, a document object model (DOM) structure and visual information of the web page contents are generated. The DOM structure and the visual information are analyzed to determine multiple web page content attributes. One or more filtering parameters are selected from the multiple web page content attributes. The web page is filtered based on the one or more filtering parameters.

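    The filtering step can be sketched in Python as follows, assuming the DOM analysis and rendering have already produced one `Block` per content region. The attribute names (`link_density`, `area_ratio`, `text_length`) and the range-based `params` format are hypothetical; they only illustrate selecting filtering parameters from content attributes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Block:
    tag: str        # from the DOM structure
    text: str
    link_text: str  # text inside <a> descendants
    width: int      # from the rendered (visual) layout
    height: int


def attributes(block: Block, page_area: int) -> Dict[str, float]:
    """Hypothetical content attributes combining DOM and visual information."""
    n = len(block.text) or 1
    return {
        "link_density": len(block.link_text) / n,
        "area_ratio": block.width * block.height / page_area,
        "text_length": len(block.text),
    }


def filter_page(blocks: List[Block], page_area: int,
                params: Dict[str, Tuple[float, float]]) -> List[Block]:
    """Keep blocks whose selected attributes all fall inside the (low, high) ranges."""
    kept = []
    for b in blocks:
        attrs = attributes(b, page_area)
        if all(lo <= attrs[name] <= hi for name, (lo, hi) in params.items()):
            kept.append(b)
    return kept
```

    Choosing `link_density` and `area_ratio` as the filtering parameters, for example, drops navigation bars and small ad boxes while keeping the main article.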

    System and Method for Web Content Extraction
    Type: Invention application (published)
    Legal status: In force

    Publication number: US20120303636A1

    Publication date: 2012-11-29

    Application number: US13258482

    Filing date: 2009-12-14

    IPC classification: G06F17/30

    CPC classification: G06F17/30896; G06F3/1246

    Abstract: A method and system for extracting Web content is disclosed. In one embodiment, Web content in a Webpage is extracted by identifying paragraphs in the Web content based on line-break node determination. A range of text-body associated with the identified paragraphs is then identified using a maximum scoring subsequence. Further, the identified text-body is refined using a heuristic rule of substantially horizontal alignment. Furthermore, one or more titles and one or more images associated with the Web content are extracted. Finally, the Web content, including the identified paragraphs, the one or more titles, and the one or more images, is output.

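    Finding the single best contiguous run of paragraph scores is the maximum-subarray problem, which Kadane's algorithm solves in linear time; a Python sketch follows. The per-paragraph scoring rule (word count minus a length threshold, penalized by link words) is a hypothetical stand-in, since the abstract does not give the scoring function.

```python
from typing import Sequence, Tuple


def paragraph_score(text: str, link_words: int = 0, min_words: int = 10) -> float:
    """Hypothetical score: long, link-free paragraphs are positive; short or link-heavy ones negative."""
    return len(text.split()) - min_words - 2 * link_words


def max_scoring_subsequence(scores: Sequence[float]) -> Tuple[int, int]:
    """Kadane's algorithm: (start, end), inclusive, of the contiguous run with maximum total.

    Returns (0, -1) for an empty sequence.
    """
    best_sum = float("-inf")
    best = (0, -1)
    cur_sum = 0.0
    cur_start = 0
    for i, s in enumerate(scores):
        if cur_sum <= 0:
            # A non-positive running total can only hurt: restart the run here.
            cur_sum = s
            cur_start = i
        else:
            cur_sum += s
        if cur_sum > best_sum:
            best_sum = cur_sum
            best = (cur_start, i)
    return best
```

    Negative-scoring paragraphs (menus, short captions) at either end are trimmed, while a small negative paragraph inside the article body is kept because the run around it still adds up.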

    TEXT SEGMENTATION OF A DOCUMENT
    Type: Invention application (published)
    Legal status: Pending (published, under examination)

    Publication number: US20120102388A1

    Publication date: 2012-04-26

    Application number: US13227136

    Filing date: 2011-09-07

    Applicant: Jian Fan

    Inventor: Jian Fan

    IPC classification: G06F17/21

    CPC classification: G06F17/2264; G06F17/218

    Abstract: A system and method are provided for segmenting text from a portable document format (PDF) document. The system includes a memory for storing computer-executable instructions and a processing unit for accessing the memory and executing the computer-executable instructions. The computer-executable instructions include an engine that groups line segments into text blocks using a homogeneity measure based on the relative line-spacing difference between line segments and a homogeneity measure based on the difference in font size between line segments, where the line segments comprise text elements extracted from the PDF document.

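    A minimal Python sketch of the grouping rule, assuming line segments have already been extracted from the PDF with their vertical extent and font size. The tolerances, the rule used to accept a block's second line, and the `Line` type are assumptions; the abstract names only the two homogeneity measures (relative line-spacing difference and font-size difference).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    top: float        # y of the line's top edge (page coordinates growing downward)
    bottom: float
    font_size: float
    text: str


def group_lines(lines: List[Line], max_rel_space_diff: float = 0.5,
                max_font_diff: float = 1.0) -> List[List[Line]]:
    """Group consecutive lines into text blocks while spacing and font size stay homogeneous."""
    blocks: List[List[Line]] = []
    current: List[Line] = []
    block_space: Optional[float] = None  # line spacing established by the current block
    for line in sorted(lines, key=lambda l: l.top):
        if not current:
            current, block_space = [line], None
            continue
        prev = current[-1]
        space = line.top - prev.bottom
        font_ok = abs(line.font_size - prev.font_size) <= max_font_diff
        if block_space is None:
            # Hypothetical rule for a block's second line: gap no larger than the line height.
            space_ok = space <= prev.bottom - prev.top
        else:
            # Relative line-spacing difference against the block's established spacing.
            space_ok = abs(space - block_space) <= max_rel_space_diff * max(block_space, 1e-6)
        if font_ok and space_ok:
            if block_space is None:
                block_space = space
            current.append(line)
        else:
            blocks.append(current)
            current, block_space = [line], None
    if current:
        blocks.append(current)
    return blocks
```

    A heading in a larger font and a paragraph break with a wider gap each start a new block in this sketch.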