Segmenting pixels in an image based on orientation-dependent adaptive thresholds

    Publication No.: US20060062454A1

    Publication date: 2006-03-23

    Application No.: US10948822

    Filing date: 2004-09-23

    IPC classification: G06K9/34 G06K9/00

    Abstract: Methods, machines, and computer-readable media storing machine-readable instructions for segmenting pixels in an image are described. In one aspect, a region of background pixels is identified in the image. At least some of the background pixels in the region are located on a boundary spatially delimiting the region. One or more orientation-dependent adaptive thresholds are determined for one or more respective candidate growth directions from a given background pixel located on the region boundary. Color distances between the given background pixel and candidate pixels in a neighborhood of the given background pixel are determined. The region is grown based on application of the one or more orientation-dependent adaptive thresholds to the determined color distances.
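
    The region-growing idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how the thresholds are computed, so the `threshold_for` callback and the fixed per-direction values in `example_threshold` are assumptions made only for illustration, as are the Euclidean RGB distance and the 4-connected neighborhood.

```python
from collections import deque

# Candidate growth directions as (row step, column step), 4-connected.
DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def color_distance(a, b):
    """Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def grow_region(image, seeds, threshold_for):
    """Grow a background region outward from the seed pixels.

    image: list of rows of RGB tuples.
    seeds: iterable of (row, col) pixels already known to be background.
    threshold_for: hypothetical callback (pixel, direction) -> threshold.
    """
    height, width = len(image), len(image[0])
    region = set(seeds)
    frontier = deque(region)
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in DIRECTIONS:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < height and 0 <= nc < width):
                continue
            if (nr, nc) in region:
                continue
            # Accept the candidate only if its color distance passes the
            # threshold chosen for this particular growth direction.
            dist = color_distance(image[r][c], image[nr][nc])
            if dist <= threshold_for((r, c), (dr, dc)):
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region


def example_threshold(pixel, direction):
    """Assumed rule: permissive along rows (40), strict along columns (10)."""
    return 40.0 if direction[0] == 0 else 10.0


W, G, K = (255, 255, 255), (235, 235, 235), (0, 0, 0)
image = [[W, G, K],
         [G, K, K],
         [K, K, K]]
print(sorted(grow_region(image, [(0, 0)], example_threshold)))  # [(0, 0), (0, 1)]
```

    In the example, the light-gray pixel to the right of the seed is absorbed, while the equally light pixel below it is rejected because the vertical threshold is stricter.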

    Image processing methods and systems
    43.
    Patent application
    Status: lapsed

    Publication No.: US20050213848A1

    Publication date: 2005-09-29

    Application No.: US10809234

    Filing date: 2004-03-25

    Applicant: Jian Fan; Hui Chao

    Inventor: Jian Fan; Hui Chao

    IPC classification: G06T11/60 G06K9/00 G06K1/00

    CPC classification: G06T11/60

    Abstract: Systems and methods according to the present invention provide techniques to automatically insert an object from one image into a region of another image. The systems and methods require little or no user interaction to allow efficient re-use and updating of existing images, presentations, documents and the like. An object and a container region are identified. Feasible placement location(s) within the container region for the object, as well as an associated scale factor, are determined. If multiple feasible placement locations are identified for a particular scale factor, then one is selected based upon predetermined criteria. The object can then be inserted into the container region and the resulting composite image stored or, alternatively, parameters can be stored which enable object insertion at a subsequent processing step.
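
    The placement search described in this abstract can be sketched as follows. The grid mask, the list of candidate scale factors, and the "closest to the center" tie-break are assumptions chosen for illustration; the abstract only says that the selection uses predetermined criteria.

```python
def feasible_positions(mask, obj_h, obj_w):
    """Top-left corners where an obj_h x obj_w box covers only free cells."""
    rows, cols = len(mask), len(mask[0])
    return [
        (r, c)
        for r in range(rows - obj_h + 1)
        for c in range(cols - obj_w + 1)
        if all(mask[r + i][c + j] for i in range(obj_h) for j in range(obj_w))
    ]


def place_object(mask, obj_h, obj_w, scales=(1.0, 0.75, 0.5, 0.25)):
    """Return (scale, (row, col)) for the largest scale that fits, or None.

    mask: grid of booleans, True where the container region is free.
    scales: hypothetical candidate scale factors, tried largest first.
    """
    rows, cols = len(mask), len(mask[0])
    for scale in scales:
        h = max(1, round(obj_h * scale))
        w = max(1, round(obj_w * scale))
        spots = feasible_positions(mask, h, w)
        if spots:
            # Assumed criterion: prefer the box whose center is closest
            # to the center of the grid.
            def off_center(pos):
                dy = pos[0] + h / 2 - rows / 2
                dx = pos[1] + w / 2 - cols / 2
                return dy * dy + dx * dx

            return scale, min(spots, key=off_center)
    return None


F, T = False, True
mask = [[F, F, F, F],
        [F, T, T, T],
        [F, T, T, T],
        [F, F, F, F]]
print(place_object(mask, 4, 4))  # (0.5, (1, 1))
```

    Returning the scale and position rather than a rendered image mirrors the abstract's alternative of storing parameters for insertion at a later processing step.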

    Methods and apparatus for analyzing and image and for controlling a scanner
    44.
    Granted patent
    Status: lapsed

    Publication No.: US06757081B1

    Publication date: 2004-06-29

    Application No.: US09545223

    Filing date: 2000-04-07

    IPC classification: H04N1/04

    Abstract: A method analyzes an image to be scanned and analyzes at least part of the image pixel-by-pixel. During or after a preview scan, a characteristic is assigned to a plurality of pixels in the image and pixels are grouped according to similar characteristics. A representation of at least one of the characteristics corresponding to a group of pixels is communicated to the scanner. For example, the pixels may be analyzed to determine whether a pixel is black or white. The pixels may also be analyzed to determine whether a pixel is on an edge between black and white. Black pixels that are adjacent to each other can be grouped together, and white pixels that are adjacent to each other can also be grouped together. A region of an image with a relatively high number of black and white groups can be characterized as black and white text only. That characterization can then be used to properly set a scanner, for example, without user intervention, so that the final scan of the image can be done at 300 dpi with a low bit depth.
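
    The classify-then-group step in this abstract resembles connected-component labeling. The sketch below assumes 8-bit gray values, a fixed cutoff of 128, and 4-connectivity; none of these details come from the abstract.

```python
def group_pixels(gray, cutoff=128):
    """Group adjacent pixels of the same class (black or white).

    gray: list of rows of 0..255 gray values.
    cutoff: assumed black/white boundary.
    Returns a list of (class_name, [(row, col), ...]) groups.
    """
    rows, cols = len(gray), len(gray[0])
    cls = [["black" if v < cutoff else "white" for v in row] for row in gray]
    seen = set()
    groups = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen:
                continue
            # Flood-fill outward from this pixel over same-class neighbors.
            seen.add((r, c))
            stack, members = [(r, c)], []
            while stack:
                y, x = stack.pop()
                members.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and cls[ny][nx] == cls[r][c]):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            groups.append((cls[r][c], members))
    return groups


gray = [[0, 255, 0],
        [0, 255, 0],
        [255, 255, 255]]
groups = group_pixels(gray)
print([name for name, _ in groups])  # ['black', 'white', 'black']
```

    A count of such groups per region is the kind of statistic the abstract uses to label a region as black-and-white text; the cutoff for "relatively high" is not given in the source.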

    Segmenting a Web Page into Coherent Functional Blocks
    45.
    Patent application
    Status: under examination (published)

    Publication No.: US20130275854A1

    Publication date: 2013-10-17

    Application No.: US13635410

    Filing date: 2010-04-19

    IPC classification: G06F17/22

    CPC classification: G06F17/2247 G06F17/2705

    Abstract: Segmenting a web page (110) into coherent functional blocks (705-1 to 705-8) includes parsing content from the web page (110) into multiple coherent, collectively exhaustive nodes (405-1 to 405-37); calculating at least one matrix (500, 600, 605-1 to 605-4) of affinity values between each of the nodes (405-1 to 405-37); and clustering the nodes (405-1 to 405-37) into functional blocks (705-1 to 705-8) based on the affinity values in the at least one matrix (500, 600, 605-1 to 605-4).
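
    The affinity-matrix and clustering steps can be sketched in a few lines. The affinity measure used here (inverse of the vertical gap between nodes) and the threshold-plus-union-find clustering are stand-ins chosen for illustration; the abstract does not specify either.

```python
def vertical_gap(a, b):
    """Gap in pixels between two (top, bottom) extents; 0 if they overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def affinity_matrix(nodes):
    """Assumed affinity: 1 / (1 + vertical gap), so closer nodes score higher."""
    n = len(nodes)
    return [[1.0 / (1.0 + vertical_gap(nodes[i], nodes[j])) for j in range(n)]
            for i in range(n)]


def cluster(matrix, threshold):
    """Merge nodes whose pairwise affinity reaches the threshold."""
    n = len(matrix)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] >= threshold:
                parent[find(i)] = find(j)

    blocks = {}
    for i in range(n):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())


# Hypothetical nodes given as (top, bottom) pixel extents on the page.
nodes = [(0, 10), (12, 20), (60, 70), (72, 90)]
print(cluster(affinity_matrix(nodes), threshold=0.2))  # [[0, 1], [2, 3]]
```

    The abstract allows for more than one matrix; combining several affinity measures would only change how the single matrix passed to `cluster` is built.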

    SYSTEMS AND METHODS FOR FILTERING WEB PAGE CONTENTS
    46.
    Patent application
    Status: under examination (published)

    Publication No.: US20130145255A1

    Publication date: 2013-06-06

    Application No.: US13817366

    Filing date: 2010-08-20

    IPC classification: G06F17/21

    Abstract: A system and method for selectively filtering web page contents are disclosed. In one example embodiment, a document object model (DOM) structure and visual information of the web page contents are generated. The DOM structure and the visual information are analyzed to determine multiple web page content attributes. One or more filtering parameters are selected from the multiple web page content attributes. The web page is filtered based on the one or more filtering parameters.

    System and Method for Web Content Extraction
    48.
    Patent application
    Status: in force

    Publication No.: US20120303636A1

    Publication date: 2012-11-29

    Application No.: US13258482

    Filing date: 2009-12-14

    IPC classification: G06F17/30

    CPC classification: G06F17/30896 G06F3/1246

    Abstract: A method and system for extracting Web content is disclosed. In one embodiment, Web content in a Webpage is extracted by identifying paragraphs in the Web content based on line-break node determination. A range of text-body associated with the identified paragraphs is then identified using a maximum scoring subsequence. Further, the identified text-body is refined using a heuristic rule of substantially horizontal alignment. Furthermore, one or more titles and one or more images associated with the Web content are extracted. Moreover, the Web content, including the identified paragraphs, the one or more titles, and the one or more images, is output.
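
    The "maximum scoring subsequence" step can be illustrated with the classic linear-time scan for the contiguous run with the highest total score. How paragraphs are scored is not stated in the abstract, so the scores in the example are made up: positive for paragraphs that look like body text, negative for ones that look like navigation or ads.

```python
def max_scoring_subsequence(scores):
    """Return (start, end, total) of the best contiguous run; end is exclusive.

    Returns (0, 0, 0.0) when no run has a positive total.
    """
    best = (0, 0, 0.0)
    run_start, run_total = 0, 0.0
    for i, score in enumerate(scores):
        if run_total <= 0:
            # A non-positive prefix can never help, so restart the run here.
            run_start, run_total = i, 0.0
        run_total += score
        if run_total > best[2]:
            best = (run_start, i + 1, run_total)
    return best


# Hypothetical per-paragraph scores for one page.
scores = [-2, 3, 4, -1, 5, -6, 1]
print(max_scoring_subsequence(scores))  # (1, 5, 11.0)
```

    In the example, paragraphs 1 through 4 form the text body; the weak paragraph at index 3 is kept because the run around it still gains overall.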

    TEXT SEGMENTATION OF A DOCUMENT
    49.
    Patent application
    Status: under examination (published)

    Publication No.: US20120102388A1

    Publication date: 2012-04-26

    Application No.: US13227136

    Filing date: 2011-09-07

    Applicant: Jian Fan

    Inventor: Jian Fan

    IPC classification: G06F17/21

    CPC classification: G06F17/2264 G06F17/218

    Abstract: A system and method are provided for segmenting text from a portable document format (PDF) document. The system includes a memory for storing computer executable instructions and a processing unit for accessing the memory and executing the computer executable instructions. The computer executable instructions include an engine to group line segments into text blocks using a homogeneity measure based on relative line space difference between line segments and a homogeneity measure based on difference in font size between line segments, where the line segments comprise text elements extracted from the PDF document.
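
    A rough sketch of grouping line segments by the two homogeneity measures named in this abstract follows. The exact definitions of the measures are not given, so the relative-difference formulas, the three tolerance parameters, and the (baseline, font size) line representation are all assumptions.

```python
def group_lines(lines, max_space_diff=0.2, max_font_diff=0.1, max_leading=2.0):
    """Group vertically ordered lines into text blocks.

    lines: list of (baseline_y, font_size) tuples, sorted top to bottom.
    The three tolerances are hypothetical tuning parameters.
    """
    if not lines:
        return []
    blocks = []
    current = [lines[0]]
    prev_spacing = None  # spacing between the last two lines of the block
    for prev, line in zip(lines, lines[1:]):
        spacing = line[0] - prev[0]
        font_diff = abs(line[1] - prev[1]) / max(line[1], prev[1])
        if prev_spacing is None:
            # No spacing to compare against yet: accept a normal leading.
            space_ok = spacing <= max_leading * prev[1]
        else:
            largest = max(spacing, prev_spacing, 1e-9)
            space_ok = abs(spacing - prev_spacing) / largest <= max_space_diff
        if space_ok and font_diff <= max_font_diff:
            current.append(line)
            prev_spacing = spacing
        else:
            blocks.append(current)
            current = [line]
            prev_spacing = None
    blocks.append(current)
    return blocks


# Hypothetical page: a 24 pt title followed by two 12 pt paragraphs.
lines = [(100, 24), (140, 12), (154, 12), (168, 12), (210, 12), (224, 12)]
print([len(block) for block in group_lines(lines)])  # [1, 3, 2]
```

    The title splits off because of the font-size measure, and the two paragraphs split because the line spacing jumps from 14 to 42 units.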

    Image processing methods and systems
    50.
    Granted patent
    Status: in force

    Publication No.: US07672507B2

    Publication date: 2010-03-02

    Application No.: US10768461

    Filing date: 2004-01-30

    Applicant: Jian Fan

    Inventor: Jian Fan

    IPC classification: G06K9/00 G06K9/48

    Abstract: Systems and methods according to the present invention provide techniques to reliably detect edges, lines and quadrilaterals, especially those with low local contrast, in color images. Edges can be detected using a color gradient operator that is based on color distance with a non-linear weight determined by the consistency of local gradient orientations, thereby significantly improving the signal-to-noise ratio. In detecting lines, a variant of the Gradient Weighted Hough Transform can be used employing both the edge strength and orientation. Multiple non-overlapping quadrilaterals can be detected using a process which includes quality metrics (for both individual quadrilaterals and for a set of non-overlapping quadrilaterals) and a graph-searching method.
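
    As a simplified illustration of a color-distance-based gradient, the sketch below computes an edge strength per pixel from the color distances between its horizontal and vertical neighbors. It deliberately omits the non-linear orientation-consistency weight mentioned in the abstract, whose form is not given; the Euclidean RGB distance and the clamped borders are assumptions.

```python
import math


def color_distance(a, b):
    """Euclidean distance between two RGB tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def edge_strength(image):
    """Per-pixel gradient magnitude built from neighbor color distances.

    image: list of rows of RGB tuples. Border pixels reuse themselves as
    the missing neighbor (clamped indexing).
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            left = image[r][max(c - 1, 0)]
            right = image[r][min(c + 1, cols - 1)]
            up = image[max(r - 1, 0)][c]
            down = image[min(r + 1, rows - 1)][c]
            dx = color_distance(left, right)  # change across columns
            dy = color_distance(up, down)     # change across rows
            row.append(math.hypot(dx, dy))
        out.append(row)
    return out


K, W = (0, 0, 0), (255, 255, 255)
image = [[K, K, W, W] for _ in range(3)]  # vertical black-to-white edge
strength = edge_strength(image)
print([round(v, 1) for v in strength[1]])  # [0.0, 441.7, 441.7, 0.0]
```

    The two columns adjacent to the black-to-white transition respond strongly, and the flat columns stay at zero.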
