Enhanced data conversion framework
    1.
    Granted patent
    Legal status: in force

    Publication number: US08346819B2

    Publication date: 2013-01-01

    Application number: US12341463

    Filing date: 2008-12-22

    IPC classification: G06F7/00

    CPC classification: G06F17/30569

    Abstract: An enhanced data conversion framework, in which a data record in each of first and second data sources is populated with manually selected, representative sample data, the first and second data sources using different data storage schemas to store the representative sample data as instance values of instance elements. Parameters for a CONCATENATE function or an EXTRACT function are automatically determined based on a selected succession graph, and non-sample data is converted between the different data storage schemas of the first and second data sources, using the CONCATENATE function or the EXTRACT function.

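The abstract does not say how the CONCATENATE and EXTRACT parameters are derived from the succession graph. As a rough illustration only, the Python sketch below infers comparable parameters (an ordering of source fields plus the literal separators between them) by locating each source sample value inside the target sample value, then reuses those parameters to convert non-sample records in both directions. It does not model the succession graph, and the parameter format and all function and field names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical parameter format: ordered source field names plus separators,
# where seps[0] is a prefix, seps[i + 1] follows fields[i], and seps[-1] is a suffix.
Params = Tuple[List[str], List[str]]


def infer_params(source_sample: Dict[str, str], target_sample: str) -> Optional[Params]:
    """Infer CONCATENATE/EXTRACT-style parameters from one pair of sample records."""
    hits = []
    for name, value in source_sample.items():
        idx = target_sample.find(value) if value else -1
        if idx < 0:
            return None  # a source value does not occur in the target sample
        hits.append((idx, name, value))
    hits.sort()  # order fields by where they occur in the target value
    fields: List[str] = []
    seps: List[str] = []
    cursor = 0
    for idx, name, value in hits:
        if idx < cursor:
            return None  # overlapping matches are ambiguous; give up
        seps.append(target_sample[cursor:idx])
        fields.append(name)
        cursor = idx + len(value)
    seps.append(target_sample[cursor:])
    return fields, seps


def concatenate(record: Dict[str, str], params: Params) -> str:
    """Merge several source fields into one target value."""
    fields, seps = params
    out = seps[0]
    for name, sep in zip(fields, seps[1:]):
        out += record[name] + sep
    return out


def extract(value: str, params: Params) -> Optional[Dict[str, str]]:
    """Split one value back into fields; assumes non-empty separators between fields."""
    fields, seps = params
    if not value.startswith(seps[0]):
        return None
    pos = len(seps[0])
    result: Dict[str, str] = {}
    for i, name in enumerate(fields):
        sep = seps[i + 1]
        if i == len(fields) - 1:
            end = len(value) - len(sep)  # last field runs up to the suffix
            if end < pos or not value.endswith(sep):
                return None
        else:
            end = value.find(sep, pos)
            if end < 0:
                return None
        result[name] = value[pos:end]
        pos = end + len(sep)
    return result
```

For example, with the source sample `{"first": "John", "last": "Smith"}` and the target sample `"Smith, John"`, `infer_params` returns `(["last", "first"], ["", ", ", ""])`; those parameters turn the non-sample record `{"first": "Ada", "last": "Lovelace"}` into `"Lovelace, Ada"` and split that value back into its two fields. The naive `find` means repeated or overlapping values are not handled.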

    Schema Matching for Data Migration
    2.
    Patent application
    Legal status: in force

    Publication number: US20080281820A1

    Publication date: 2008-11-13

    Application number: US12116352

    Filing date: 2008-05-07

    IPC classification: G06F7/06

    Abstract: Embodiments include a system for matching an element of a source schema to an element of a target schema. The system includes a processing unit and a communication unit. The processing unit may be configured to: identify a sample data item of the element of the target schema; match a part of the sample data item to a part of a sample instance of the source schema; and match the element of the source schema to which the part of the sample instance of the source schema belongs to the element of the target schema. The communication unit may be configured to: provide the sample data item through an interface and receive the sample instance of the source schema.

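The abstract leaves the matching criterion open. A minimal sketch, assuming that "matching a part" means sharing at least one case-insensitive word token, pairs each target element's sample data item with the source-schema elements whose sample-instance values contain one of its parts. The token rule and element names are illustrative assumptions, and the communication unit (the interface that exchanges samples) is not modelled.

```python
import re
from typing import Dict, List


def parts(text: str) -> List[str]:
    """Split a value into lower-case word tokens (the 'parts' being matched)."""
    return re.findall(r"\w+", text.lower())


def match_schema(target_samples: Dict[str, str],
                 source_instance: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each target element to the source elements whose sample-instance
    value shares at least one part with the target element's sample data item."""
    matches: Dict[str, List[str]] = {}
    for target_elem, sample_item in target_samples.items():
        wanted = set(parts(sample_item))
        matches[target_elem] = sorted(
            src_elem
            for src_elem, value in source_instance.items()
            if wanted & set(parts(value))
        )
    return matches
```

With the target sample `{"CustomerName": "Jane Doe", "City": "Berlin"}` and a source instance `{"FNAME": "Jane", "LNAME": "Doe", "CITY": "Berlin", "ZIP": "10115"}`, this maps `CustomerName` to `["FNAME", "LNAME"]` and `City` to `["CITY"]`, which also shows how one target element can be assembled from several source elements. Plain token overlap will produce false positives for common values, so a real matcher would need a stronger similarity rule.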

    Graph based re-composition of document fragments for name entity recognition under exploitation of enterprise databases
    3.
    Granted patent
    Legal status: in force

    Publication number: US08229883B2

    Publication date: 2012-07-24

    Application number: US12413611

    Filing date: 2009-03-30

    IPC classification: G06F17/20 G06F17/30

    CPC classification: G06F17/30622

    Abstract: Methods and systems are described that involve recognizing complex entities from text documents with the help of structured data and Natural Language Processing (NLP) techniques. In one embodiment, the method includes receiving a document as input from a set of documents, wherein the document contains text or unstructured data. The method also includes identifying a plurality of text segments from the document via a set of tagging techniques. Further, the method includes matching the identified plurality of text segments against attributes of a set of predefined entities. Lastly, a best matching predefined entity is selected for each text segment from the plurality of text segments. In one embodiment, the system includes a set of documents, each document containing text or unstructured data. The system also includes a database storage unit that stores a set of predefined entities, wherein each entity contains a set of attributes. Further, the system includes a processor to identify a plurality of text segments from a document via a set of tagging techniques and to match the identified plurality of text segments against the set of attributes.

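To make the tag-then-match pipeline concrete, here is a toy Python sketch. Two regular expressions stand in for the unspecified tagging techniques, an in-memory dictionary stands in for the database of predefined entities, and token-overlap (Jaccard) similarity against each attribute value is an assumed scoring rule. The graph-based re-composition named in the title is not modelled; all names and the 0.5 threshold are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# entity id -> attribute name -> attribute value (stand-in for database rows)
Entities = Dict[str, Dict[str, str]]


def tag_segments(text: str) -> List[str]:
    """Toy taggers: runs of capitalized words, plus tokens that contain a digit."""
    names = re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text)
    codes = re.findall(r"\b\w*\d\w*\b", text)
    return names + codes


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings, from 0.0 to 1.0."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def best_entity(segment: str, entities: Entities,
                threshold: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return the best-matching entity id and score, or None below the threshold."""
    best_id, best_score = None, 0.0
    for entity_id, attrs in entities.items():
        # an entity scores as well as its best-matching attribute value
        score = max((jaccard(segment, v) for v in attrs.values()), default=0.0)
        if score > best_score:
            best_id, best_score = entity_id, score
    if best_id is None or best_score < threshold:
        return None
    return best_id, best_score


def recognize(text: str, entities: Entities) -> Dict[str, Optional[Tuple[str, float]]]:
    """Tag a document, then resolve each segment to its best predefined entity."""
    return {seg: best_entity(seg, entities) for seg in tag_segments(text)}
```

On the sentence `"Acme Corp ordered 20 units of Widget Pro (4711)."`, with a customer entity whose `name` is `"Acme Corp"` and a product entity with `product` `"Widget Pro"` and `sku` `"4711"`, the segments `"Acme Corp"`, `"Widget Pro"` and `"4711"` resolve to their entities, while `"20"` is tagged but left unresolved because it matches no attribute.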

    GRAPH BASED RE-COMPOSITION OF DOCUMENT FRAGMENTS FOR NAME ENTITY RECOGNITION UNDER EXPLOITATION OF ENTERPRISE DATABASES
    4.
    Patent application
    Legal status: in force

    Publication number: US20100250598A1

    Publication date: 2010-09-30

    Application number: US12413611

    Filing date: 2009-03-30

    IPC classification: G06F17/30

    CPC classification: G06F17/30622

    Abstract: Methods and systems are described that involve recognizing complex entities from text documents with the help of structured data and Natural Language Processing (NLP) techniques. In one embodiment, the method includes receiving a document as input from a set of documents, wherein the document contains text or unstructured data. The method also includes identifying a plurality of text segments from the document via a set of tagging techniques. Further, the method includes matching the identified plurality of text segments against attributes of a set of predefined entities. Lastly, a best matching predefined entity is selected for each text segment from the plurality of text segments. In one embodiment, the system includes a set of documents, each document containing text or unstructured data. The system also includes a database storage unit that stores a set of predefined entities, wherein each entity contains a set of attributes. Further, the system includes a processor to identify a plurality of text segments from a document via a set of tagging techniques and to match the identified plurality of text segments against the set of attributes.


    Method and system for managing data quality
    5.
    Granted patent
    Legal status: in force

    Publication number: US07676523B2

    Publication date: 2010-03-09

    Application number: US11785929

    Filing date: 2007-04-20

    IPC classification: G06F7/00

    CPC classification: G05B23/0221

    Abstract: A method and system are described for managing data quality. An example method may include obtaining a first data stream interval including a first group of data items and a first aggregated data quality value associated with a quality of obtaining the first group, each data item including data attribute values, each data quality item including data quality attribute values associated with one of the data items. The first aggregated data quality value, a first indicator associating the first aggregated data quality value with the first group, and the first group may be selected. The first group and the first indicator may be stored in a user table of a database. A data quality table associated with the user table may be determined based on an entry in a system table. The first aggregated data quality value and the first indicator may be stored in the data quality table.

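The storage steps in this abstract (data items in a user table, the aggregated quality value in a separate data quality table located through a system table) can be sketched with Python's built-in `sqlite3` module. The table and column names are hypothetical, and the interval id plays the role of the "first indicator" that links the two tables; this illustrates the described flow, not the patented schema.

```python
import sqlite3
from typing import Iterable, Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical user table, its data quality table, and the system table."""
    conn.executescript(
        """
        -- system table: which data quality table belongs to which user table
        CREATE TABLE system_table (user_table TEXT PRIMARY KEY, dq_table TEXT NOT NULL);
        CREATE TABLE readings (interval_id INTEGER, sensor TEXT, value REAL);
        CREATE TABLE readings_dq (interval_id INTEGER PRIMARY KEY, quality REAL);
        INSERT INTO system_table VALUES ('readings', 'readings_dq');
        """
    )


def store_interval(conn: sqlite3.Connection, user_table: str, interval_id: int,
                   items: Iterable[Tuple[str, float]], aggregated_quality: float) -> None:
    """Store one interval's data items and its aggregated quality value."""
    # Look up the data quality table first, so nothing is written if it is missing.
    row = conn.execute(
        "SELECT dq_table FROM system_table WHERE user_table = ?", (user_table,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no data quality table registered for {user_table}")
    # Identifiers cannot be bound as SQL parameters, so table names are
    # interpolated; they must come from trusted configuration, never user input.
    with conn:  # commits both inserts together, or rolls back on error
        conn.executemany(
            f"INSERT INTO {user_table} (interval_id, sensor, value) VALUES (?, ?, ?)",
            [(interval_id, sensor, value) for sensor, value in items],
        )
        conn.execute(
            f"INSERT INTO {row[0]} (interval_id, quality) VALUES (?, ?)",
            (interval_id, aggregated_quality),
        )
```

Keeping the quality value in its own table leaves the user table's layout unchanged; reading items back together with their quality is then a join on `interval_id`.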

    Method and system for including data quality in data streams
    6.
    Patent application
    Legal status: in force

    Publication number: US20080263062A1

    Publication date: 2008-10-23

    Application number: US11785928

    Filing date: 2007-04-20

    IPC classification: G06F7/00

    Abstract: A method and system are described for including data quality in data streams. An example method may include obtaining a first group of data items, each data item including one or more data attribute values. A first group of data quality items may be determined, each data quality item including one or more data quality attribute values associated with one of the data items of the first group. A first aggregated data quality value may be determined based on the first group of data quality items. A first data stream interval including the first group of data items and the first aggregated data quality value may be output.

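A minimal sketch of building one data stream interval, assuming each data item carries per-item quality attributes with the same names (for example `accuracy` and `completeness`) and that aggregation means taking the mean of each quality attribute across the group. The abstract does not fix the aggregation function, so a minimum (worst case) would be an equally plausible choice; all class and field names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class DataItem:
    attributes: Dict[str, float]  # data attribute values, e.g. a sensor reading
    quality: Dict[str, float]     # data quality attribute values for this item


@dataclass
class StreamInterval:
    items: List[DataItem]
    aggregated_quality: Dict[str, float]


def build_interval(items: List[DataItem]) -> StreamInterval:
    """Aggregate per-item quality into one value per quality dimension."""
    if not items:
        raise ValueError("an interval needs at least one data item")
    dimensions = items[0].quality.keys()  # assumes every item has the same dimensions
    aggregated = {d: mean(item.quality[d] for item in items) for d in dimensions}
    return StreamInterval(items=items, aggregated_quality=aggregated)
```

Two items with accuracy 1.0 and 0.5 and completeness 0.5 and 0.25 yield an interval whose aggregated quality is `{"accuracy": 0.75, "completeness": 0.375}`; downstream consumers can then judge the whole interval from that one summary instead of inspecting every item.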

    ENHANCED DATA CONVERSION FRAMEWORK
    7.
    Patent application
    Legal status: in force

    Publication number: US20100161666A1

    Publication date: 2010-06-24

    Application number: US12341463

    Filing date: 2008-12-22

    IPC classification: G06F17/30

    CPC classification: G06F17/30569

    Abstract: An enhanced data conversion framework, in which a data record in each of first and second data sources is populated with manually selected, representative sample data, the first and second data sources using different data storage schemas to store the representative sample data as instance values of instance elements. Parameters for a CONCATENATE function or an EXTRACT function are automatically determined based on a selected succession graph, and non-sample data is converted between the different data storage schemas of the first and second data sources, using the CONCATENATE function or the EXTRACT function.


    Method and system for managing data quality
    8.
    Patent application
    Legal status: in force

    Publication number: US20080263096A1

    Publication date: 2008-10-23

    Application number: US11785929

    Filing date: 2007-04-20

    IPC classification: G06F7/00

    CPC classification: G05B23/0221

    Abstract: A method and system are described for managing data quality. An example method may include obtaining a first data stream interval including a first group of data items and a first aggregated data quality value associated with a quality of obtaining the first group, each data item including data attribute values, each data quality item including data quality attribute values associated with one of the data items. The first aggregated data quality value, a first indicator associating the first aggregated data quality value with the first group, and the first group may be selected. The first group and the first indicator may be stored in a user table of a database. A data quality table associated with the user table may be determined based on an entry in a system table. The first aggregated data quality value and the first indicator may be stored in the data quality table.


    Schema matching for data migration
    9.
    Granted patent
    Legal status: in force

    Publication number: US09280569B2

    Publication date: 2016-03-08

    Application number: US12116352

    Filing date: 2008-05-07

    IPC classification: G06F17/30

    Abstract: Embodiments include a system for matching an element of a source schema to an element of a target schema. The system includes a processing unit and a communication unit. The processing unit may be configured to: identify a sample data item of the element of the target schema; match a part of the sample data item to a part of a sample instance of the source schema; and match the element of the source schema to which the part of the sample instance of the source schema belongs to the element of the target schema. The communication unit may be configured to: provide the sample data item through an interface and receive the sample instance of the source schema.


    Method and system for including data quality in data streams
    10.
    Granted patent
    Legal status: in force

    Publication number: US07676522B2

    Publication date: 2010-03-09

    Application number: US11785928

    Filing date: 2007-04-20

    IPC classification: G06F7/00

    Abstract: A method and system are described for including data quality in data streams. An example method may include obtaining a first group of data items, each data item including one or more data attribute values. A first group of data quality items may be determined, each data quality item including one or more data quality attribute values associated with one of the data items of the first group. A first aggregated data quality value may be determined based on the first group of data quality items. A first data stream interval including the first group of data items and the first aggregated data quality value may be output.
