Linking Data Elements Based on Similarity Data Values and Semantic Annotations
    4.
    Invention application
    Status: Under examination (published)

    Publication No.: US20130332466A1

    Publication date: 2013-12-12

    Application No.: US13491724

    Filing date: 2012-06-08

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: Data elements from data sources, each having a data value set, are linked by using hash functions to determine a dimensionally reduced instance signature for each data element based on all data values associated with that data element. This yields a plurality of dimensionally reduced instance signatures of equivalent fixed size, such that similarities among the data values in the data value sets across all data elements are maintained among the plurality of instance signatures. Candidate pairs of data elements to link are identified using the plurality of instance signatures in locality sensitive hash functions, and a similarity index is generated for each candidate pair using a pre-determined measure of similarity. Candidate pairs of data elements having a similarity index above a given threshold are linked.
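The pipeline in this abstract (fixed-size signatures, locality sensitive hashing to find candidate pairs, then a thresholded similarity index) can be illustrated with a minimal sketch. The abstract does not name the hash family or the similarity measure, so MinHash signatures, band-based LSH, and Jaccard similarity are assumptions here, as are all names and constants (`NUM_HASHES`, `BANDS`, `THRESHOLD`, `link`).

```python
import hashlib
from collections import defaultdict
from itertools import combinations

# All constants below are illustrative assumptions, not values from the patent.
NUM_HASHES = 32   # fixed signature length for every data element
BANDS = 16        # LSH bands; each band covers NUM_HASHES // BANDS rows
THRESHOLD = 0.5   # minimum similarity index required to link a pair


def _hash(seed: int, value: str) -> int:
    """Seeded hash of one data value (SHA-1 is used only for determinism)."""
    digest = hashlib.sha1(f"{seed}:{value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def signature(values):
    """MinHash signature: one minimum per seeded hash function.

    Assumes `values` is a non-empty set of strings.
    """
    return tuple(min(_hash(seed, v) for v in values) for seed in range(NUM_HASHES))


def candidate_pairs(signatures):
    """Band the signatures; elements sharing any band bucket become candidates."""
    rows = NUM_HASHES // BANDS
    pairs = set()
    for band in range(BANDS):
        buckets = defaultdict(list)
        for name, sig in signatures.items():
            buckets[sig[band * rows:(band + 1) * rows]].append(name)
        for names in buckets.values():
            pairs.update(combinations(sorted(names), 2))
    return pairs


def jaccard(a, b):
    """Assumed similarity measure: size of intersection over size of union."""
    return len(a & b) / len(a | b)


def link(elements):
    """Return (element, element, similarity) for candidate pairs above THRESHOLD."""
    sigs = {name: signature(vals) for name, vals in elements.items()}
    links = []
    for a, b in sorted(candidate_pairs(sigs)):
        score = jaccard(elements[a], elements[b])
        if score > THRESHOLD:
            links.append((a, b, score))
    return links
```

Calling `link` with a dictionary that maps each data element name to its set of values returns the linked pairs. Only candidate pairs are scored exactly, which is the point of the LSH step: it avoids comparing every element against every other one.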


    Querying and integrating structured and unstructured data
    8.
    Granted invention patent
    Status: In force

    Publication No.: US09037615B2

    Publication date: 2015-05-19

    Application No.: US13493174

    Filing date: 2012-06-11

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30946 G06F17/30292

    Abstract: A computer-implemented method, system, and article of manufacture for querying and integrating structured and unstructured data. The method includes: receiving entity information that is extracted from a first set of unstructured data using an open domain information extraction system, wherein the entity information comprises relationship information between a first entity and a second entity of the first set of unstructured data; recognizing a pattern based on the relationship information and creating a schema for the first set of unstructured data based on the pattern; and associating an element of the created schema with (i) an entity of a second set of unstructured data or (ii) a schema element of an existing set of structured data if there is sufficient overall similarity between the created schema element and either the second unstructured data entity or the schema element of the existing structured data.
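The method steps in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: the triple format, the frequency-based pattern rule (`min_support`), the string-ratio similarity, the threshold, and the function names `create_schema` and `associate` are all hypothetical stand-ins. In particular, the "overall similarity" of the abstract is replaced by a single string similarity for brevity.

```python
from collections import Counter
from difflib import SequenceMatcher


def create_schema(triples, min_support=2):
    """Create schema elements from extracted (entity, relation, entity) triples.

    Assumption: a relation seen at least `min_support` times counts as a
    recognized pattern and becomes one schema element.
    """
    counts = Counter(relation for _, relation, _ in triples)
    return sorted(
        relation.replace(" ", "_")
        for relation, n in counts.items()
        if n >= min_support
    )


def similarity(a, b):
    """Stand-in similarity: case-insensitive character-level match ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def associate(schema, existing_elements, threshold=0.6):
    """Map each created schema element to its most similar existing element,
    but only when the similarity reaches the (assumed) threshold."""
    links = {}
    for element in schema:
        best = max(existing_elements, key=lambda other: similarity(element, other))
        if similarity(element, best) >= threshold:
            links[element] = best
    return links
```

Here `triples` stands for the output of an open domain information extraction system, for example `("Alice", "works for", "Acme")`, and `existing_elements` could be the column names of an existing structured data set or entities from a second set of unstructured data.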


    QUERYING AND INTEGRATING STRUCTURED AND UNSTRUCTURED DATA
    10.
    Invention application
    Status: In force

    Publication No.: US20130332478A1

    Publication date: 2013-12-12

    Application No.: US13493174

    Filing date: 2012-06-11

    IPC classification: G06F17/30

    CPC classification: G06F17/30946 G06F17/30292

    Abstract: A computer-implemented method, system, and article of manufacture for querying and integrating structured and unstructured data. The method includes: receiving entity information that is extracted from a first set of unstructured data using an open domain information extraction system, wherein the entity information comprises relationship information between a first entity and a second entity of the first set of unstructured data; recognizing a pattern based on the relationship information and creating a schema for the first set of unstructured data based on the pattern; and associating an element of the created schema with (i) an entity of a second set of unstructured data or (ii) a schema element of an existing set of structured data if there is sufficient overall similarity between the created schema element and either the second unstructured data entity or the schema element of the existing structured data.
