Method of merging large databases in parallel

    Publication No.: US5717915A

    Publication date: 1998-02-10

    Application No.: US610639

    Filing date: 1996-03-04

    Abstract: The semantic integration problem for merging multiple databases of very large size, the merge/purge problem, can be solved by multiple runs of the sorted neighborhood method or the clustering method with small windows, followed by the computation of the transitive closure over the results of each run. The sorted neighborhood method works well under this scheme but is computationally expensive due to the sorting phase. An alternative method based on data clustering reduces the complexity to linear time, making multiple runs followed by transitive closure feasible and efficient. A method is provided for identifying duplicate records in a database, each record having at least one field and a plurality of keys, including the steps of sorting the records according to a criterion applied to a first key; comparing a number of consecutive sorted records to each other, wherein the number is less than a number of records in said database, and identifying a first group of duplicate records; storing the identity of the first group; sorting the records according to a criterion applied to a second key; comparing a number of consecutive sorted records to each other, wherein the number is less than a number of records in said database, and identifying a second group of duplicate records; storing the identity of the second group; and subjecting the union of the first and second groups to transitive closure.
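The steps in this abstract (sort on one key, compare records inside a small window, repeat with a second key, then take the transitive closure of all matches) can be illustrated with a minimal Python sketch. It is not the patented implementation: the record fields, the `is_duplicate` rule, the window size of 2, and all function names are assumptions made for the example, and the closure is computed with a union-find structure as one convenient choice.

```python
def sorted_neighborhood_pass(records, key, window, is_duplicate):
    """One pass: sort record indices by `key`, then compare each record
    only with the (window - 1) records that precede it in sorted order."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        for j in order[max(0, pos - window + 1):pos]:
            if is_duplicate(records[i], records[j]):
                pairs.add((min(i, j), max(i, j)))
    return pairs


def transitive_closure(n, pairs):
    """Group record indices connected by matched pairs (union-find).
    Returns only groups with more than one member."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) > 1)


def merge_purge(records, keys, window, is_duplicate):
    """Run one windowed pass per key, union the matches, then close them."""
    pairs = set()
    for key in keys:
        pairs |= sorted_neighborhood_pass(records, key, window, is_duplicate)
    return transitive_closure(len(records), pairs)


# Hypothetical records: 0, 1 and 2 describe the same person with typos.
records = [
    {"last": "Smith", "first": "John", "ssn": "123456789"},
    {"last": "Smith", "first": "John", "ssn": "193456782"},
    {"last": "Smyth", "first": "Jon", "ssn": "193456782"},
    {"last": "Jones", "first": "Mary", "ssn": "555443333"},
    {"last": "Brown", "first": "Ann", "ssn": "150000000"},
    {"last": "Smithson", "first": "Al", "ssn": "170000000"},
]


def name_key(r):
    return (r["last"], r["first"])


def ssn_key(r):
    return r["ssn"]


def is_duplicate(a, b):
    # Assumed matching rule: identical SSN or identical full name.
    return a["ssn"] == b["ssn"] or name_key(a) == name_key(b)


groups = merge_purge(records, [name_key, ssn_key], 2, is_duplicate)
print(groups)  # [[0, 1, 2]]
```

With a window of 2, the name-sorted pass finds only the pair (0, 1) and the SSN-sorted pass finds only (1, 2); neither pass alone links record 0 to record 2. The closure over the union of both passes yields the single group, which is the point of combining several cheap small-window runs.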

    Method and apparatus for imaging, image processing and data compression merge/purge techniques for document image databases
    2.
    Granted invention patent (status: lapsed)

    Publication No.: US5668897A

    Publication date: 1997-09-16

    Application No.: US488333

    Filing date: 1995-06-07

    IPC classification: G06F17/30 G06K9/00 G06K9/36

    Abstract: A method for processing an image, consisting of a foreground and a background, to produce a highly compressed and accurate representation of the image, including the steps of scanning the image to create a digital image of the image, comparing the digital image against a codebook of stored digital images; matching the digital image with one of the stored digital images of the codebook; producing an index code identifying the background of the stored digital image as having matched the digital image; subtracting the stored digital image from the digital image to produce a second digital image representing the foreground of the stored digital image; and storing the second digital image with the index code. Techniques are also provided to enable merge/purge of the database(s) thereby created.
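The matching and subtraction steps described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: images are small lists of pixel rows, the match criterion is the sum of absolute pixel differences (the abstract does not say how matching is done), and the names `pixel_distance`, `encode`, `codebook`, and `scanned` are hypothetical.

```python
def pixel_distance(a, b):
    """Sum of absolute pixel differences between two same-sized images.
    Assumed match criterion; the abstract does not specify one."""
    return sum(
        abs(pa - pb)
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def encode(image, codebook):
    """Match the image to the closest stored template, then subtract that
    template so only the foreground remains.
    Returns (index code, foreground image)."""
    index = min(
        range(len(codebook)),
        key=lambda k: pixel_distance(image, codebook[k]),
    )
    template = codebook[index]
    foreground = [
        [p - t for p, t in zip(row_i, row_t)]
        for row_i, row_t in zip(image, template)
    ]
    return index, foreground


# Hypothetical codebook of two stored background templates.
codebook = [
    [[1, 1, 1, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]],
    [[1, 0, 0, 1],
     [1, 0, 0, 1],
     [1, 1, 1, 1]],
]

# A scanned page: the second template plus one extra mark at row 1, column 1.
scanned = [
    [1, 0, 0, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
]

index, foreground = encode(scanned, codebook)
stored = {"index": index, "foreground": foreground}
print(stored)
# {'index': 1, 'foreground': [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]}
```

The stored record holds only the index code and a mostly empty foreground image, which is what makes the representation compact when many scanned documents share the same background.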

    Method and apparatus for imaging, image processing and data compression
    3.
    Granted invention patent (status: lapsed)

    Publication No.: US5748780A

    Publication date: 1998-05-05

    Application No.: US259527

    Filing date: 1994-06-14

    IPC classification: G06F17/30 G06K9/00 G06K9/36

    Abstract: A method for processing an image, consisting of a foreground and a background, to produce a highly compressed and accurate representation of the image, including the steps of scanning the image to create a digital image of the image, comparing the digital image against a codebook of stored digital images; matching the digital image with one of the stored digital images of the codebook; producing an index code identifying the background of the stored digital image as having matched the digital image; subtracting the stored digital image from the digital image to produce a second digital image representing the foreground of the stored digital image; and storing the second digital image with the index code. An apparatus is also provided for compressing images having a foreground and a background, consisting of an image scanner, a template image storage device for storing background templates, a processor system for matching a scanned image of the image with one of the background templates, resulting in a template identifier, a processor system for compensating the scanned image for the matched template to produce a foreground image, and a data compression system for compressing the foreground image.

    Method of merging large databases in parallel
    4.
    Granted invention patent (status: lapsed)

    Publication No.: US5497486A

    Publication date: 1996-03-05

    Application No.: US213795

    Filing date: 1994-03-15

    Abstract: The semantic integration problem for merging multiple databases of very large size, the merge/purge problem, can be solved by multiple runs of the sorted neighborhood method or the clustering method with small windows, followed by the computation of the transitive closure over the results of each run. The sorted neighborhood method works well under this scheme but is computationally expensive due to the sorting phase. An alternative method based on data clustering reduces the complexity to linear time, making multiple runs followed by transitive closure feasible and efficient. A method is provided for identifying duplicate records in a database, each record having at least one field and a plurality of keys, including the steps of sorting the records according to a criterion applied to a first key; comparing a number of consecutive sorted records to each other, wherein the number is less than a number of records in said database, and identifying a first group of duplicate records; storing the identity of the first group; sorting the records according to a criterion applied to a second key; comparing a number of consecutive sorted records to each other, wherein the number is less than a number of records in said database, and identifying a second group of duplicate records; storing the identity of the second group; and subjecting the union of the first and second groups to transitive closure.
