Methods and apparatus for mapping source schemas to a target schema using schema embedding
    1.
    Granted invention patent; status: in force

    Publication number: US07921072B2

    Publication date: 2011-04-05

    Application number: US11141357

    Filing date: 2005-05-31

    IPC classification: G06F7/00 G06F17/00

    CPC classification: G06F17/3092

    Abstract: Methods and apparatus are provided for mapping XML source documents to target documents using schema embeddings. According to one aspect of the invention, one or more edges in the one or more source schemas are mapped to one or more paths in at least one target schema. The disclosed mapping techniques ensure that (i) one or more source documents that conform to one or more of the source schemas can be recovered from one or more target documents that conform to the at least one target schema, if a mapping exists between the one or more of the source schemas and the at least one target schema; (ii) queries on one or more source documents that conform to one or more of the source schemas in a given query language can be answered on one or more target documents that conform to the at least one target schema; and (iii) the one or more target documents conform to a target schema.
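
A minimal Python sketch of the edge-to-path idea in the abstract above. It rests on simplifying assumptions that are ours, not necessarily the patent's. Schemas are plain parent-to-child type graphs. An embedding is a type map plus one target path per source edge. The prefix-free check on sibling paths is only a simple sufficient condition for recovering the source document. The names `Node`, `validate_embedding`, `to_target`, `follow` and `from_target`, and the course/student schemas, are all hypothetical. Query answering (point ii) is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical XML-like tree node: element label, child nodes, optional text."""
    label: str
    children: list[Node] = field(default_factory=list)
    text: Optional[str] = None


def validate_embedding(src, tgt, type_map, path_map):
    """Check that every source edge maps to a real target path ending at the mapped child type."""
    for parent, kids in src.items():
        for kid in kids:
            path = path_map[(parent, kid)]
            if not path or path[-1] != type_map[kid]:
                return False
            current = type_map[parent]
            for step in path:
                if step not in tgt.get(current, []):
                    return False
                current = step
        # Assumed sufficient condition for recoverability: sibling paths are prefix-free.
        paths = [tuple(path_map[(parent, k)]) for k in kids]
        for i, p in enumerate(paths):
            for j, q in enumerate(paths):
                if i != j and q[:len(p)] == p:
                    return False
    return True


def to_target(node, type_map, path_map):
    """Rename each node by the type map and expand each source edge into its target path."""
    out = Node(type_map[node.label], text=node.text)
    for child in node.children:
        path = path_map[(node.label, child.label)]
        mapped = to_target(child, type_map, path_map)
        for label in reversed(path[:-1]):  # wrap in fresh intermediate nodes
            mapped = Node(label, [mapped])
        out.children.append(mapped)
    return out


def follow(node, path):
    """Return all descendants reached by following the label path from node."""
    if not path:
        return [node]
    found = []
    for child in node.children:
        if child.label == path[0]:
            found.extend(follow(child, path[1:]))
    return found


def from_target(tnode, src_label, src, path_map):
    """Recover the source tree by following each source edge's path in the target tree."""
    out = Node(src_label, text=tnode.text)
    for kid in src.get(src_label, []):
        for hit in follow(tnode, path_map[(src_label, kid)]):
            out.children.append(from_target(hit, kid, src, path_map))
    return out


# Hypothetical schemas and embedding.
src = {"db": ["course"], "course": ["title", "student"], "title": [], "student": []}
tgt = {"school": ["courses"], "courses": ["class"], "class": ["name", "enrolled"],
       "enrolled": ["person"], "name": [], "person": []}
type_map = {"db": "school", "course": "class", "title": "name", "student": "person"}
path_map = {("db", "course"): ["courses", "class"],
            ("course", "title"): ["name"],
            ("course", "student"): ["enrolled", "person"]}

doc = Node("db", [Node("course", [Node("title", text="Databases"),
                                  Node("student", text="Ann"),
                                  Node("student", text="Bo")])])
round_trip = from_target(to_target(doc, type_map, path_map), "db", src, path_map)
```

Each source edge expands into its own chain of target nodes, so the target document follows the target graph by construction. `from_target` then recovers the source by walking the same paths back. The sketch does not preserve the relative order of sibling children that have different types.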

    Equivalence class-based method and apparatus for cost-based repair of database constraint violations
    2.
    Granted invention patent; status: in force

    Publication number: US08224863B2

    Publication date: 2012-07-17

    Application number: US11025846

    Filing date: 2004-12-29

    IPC classification: G06F17/30

    CPC classification: G06F17/30371 G06F17/3051

    Abstract: Methods and apparatus are provided for identifying constraint violation repairs in data consisting of a plurality of records, where each record has a plurality of cells. A database is processed based on a plurality of constraints that data in the database must satisfy. At least one constraint violation to be resolved is identified based on a cost of repair. The corresponding records to be resolved are then identified, together with the equivalent cells in the data that violate the identified constraint. A value for each of the equivalent cells can optionally be determined, and the determined value can be assigned to each of the equivalent cells. The at least one constraint violation selected for resolution may be, for example, the constraint violation with the lowest cost. The cost of repairing a constraint violation is based on a distance metric between attribute values.
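
A minimal sketch of cost-based repair with equivalence classes. It rests on assumptions that are our choices, not necessarily the patent's:

- Constraints are functional dependencies with a single right-hand-side attribute.
- The distance metric is unweighted Levenshtein edit distance.
- Violations are resolved greedily, each time merging the two cell classes whose merge adds the least cost.

`Repairer`, `edit_distance` and the zip/city data are hypothetical.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as the assumed distance metric between values."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class Repairer:
    """Greedy FD repair over equivalence classes of cells; a cell is (row index, attribute)."""

    def __init__(self, rows, fds):
        self.rows = rows      # original data, never modified
        self.fds = fds        # list of (lhs attribute tuple, rhs attribute)
        self.parent = {}      # union-find parent pointers
        self.groups = {}      # root cell -> member cells

    def find(self, cell):
        self.parent.setdefault(cell, cell)
        while self.parent[cell] != cell:
            cell = self.parent[cell]
        return cell

    def members(self, root):
        return self.groups.get(root, [root])

    def best(self, cells):
        """Cheapest common value for a class as (total distance, value), chosen from its originals."""
        originals = [self.rows[r][a] for r, a in cells]
        return min((sum(edit_distance(o, v) for o in originals), v) for v in set(originals))

    def value(self, cell):
        return self.best(self.members(self.find(cell)))[1]

    def cheapest_violation(self):
        """Find the FD violation whose class merge has the lowest added cost."""
        best = None
        for lhs, rhs in self.fds:
            for i, j in combinations(range(len(self.rows)), 2):
                if any(self.value((i, a)) != self.value((j, a)) for a in lhs):
                    continue
                ra, rb = self.find((i, rhs)), self.find((j, rhs))
                if ra == rb or self.value(ra) == self.value(rb):
                    continue
                a_cells, b_cells = self.members(ra), self.members(rb)
                delta = (self.best(a_cells + b_cells)[0]
                         - self.best(a_cells)[0] - self.best(b_cells)[0])
                if best is None or delta < best[0]:
                    best = (delta, ra, rb)
        return best

    def repair(self):
        """Merge classes until no violation remains, then return the repaired rows."""
        while (violation := self.cheapest_violation()) is not None:
            _, ra, rb = violation
            self.groups[rb] = self.members(ra) + self.members(rb)
            self.groups.pop(ra, None)
            self.parent[ra] = rb
        return [{a: self.value((i, a)) for a in row} for i, row in enumerate(self.rows)]


rows = [
    {"zip": "07974", "city": "Murray Hill"},
    {"zip": "07974", "city": "Murray Hil"},
    {"zip": "07974", "city": "Murray Hill"},
    {"zip": "10001", "city": "New York"},
]
fds = [(("zip",), "city")]
repaired = Repairer(rows, fds).repair()
```

In this example, the two exact "Murray Hill" cells outweigh the misspelled one. All three 07974 rows therefore converge on "Murray Hill", and the 10001 row is left untouched.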

    Methods and apparatus for contextual schema mapping of source documents to target documents
    3.
    Invention patent application; status: under examination (published)

    Publication number: US20080027930A1

    Publication date: 2008-01-31

    Application number: US11496271

    Filing date: 2006-07-31

    IPC classification: G06F17/30

    CPC classification: G06F16/20 G06F16/285

    Abstract: Methods and apparatus are provided for improved schema mapping of source documents to target documents. A list of matches is generated between at least one source table and at least one target table. One or more of the matches are annotated with a logical condition providing a context in which the match applies. Matches can be annotated with a logical condition, for example, by generating a set of candidate view conditions, C, to be applied to the one or more source tables. A schema match algorithm can generate the list of matches. Candidate logical conditions can be identified, for example, by (i) creating a set of views for categorical attributes in the tables and adding a view for each partitioning of the attribute values; (ii) using a classifier built on target attribute values; or (iii) evaluating internal features of a source table.
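
A minimal sketch of option (i) above, under stated assumptions:

- Each candidate view restricts the source table to one value of a single categorical attribute, rather than covering every partitioning of its values.
- Matches are scored by Jaccard overlap of value sets.
- A match is annotated with a condition only when the view's score reaches `min_score` and beats the unconditional score by at least `min_gain`.

All names, thresholds and data here are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two value sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_views(rows, cat_attr):
    """One candidate condition per distinct categorical value (a simplification)."""
    return sorted({row[cat_attr] for row in rows})


def contextual_matches(rows, columns, cat_attr, targets, min_score=0.5, min_gain=0.2):
    """Return (source column, target column, condition or None, score) tuples."""
    matches = []
    for col in columns:
        all_values = {row[col] for row in rows}
        for tgt, tgt_values in targets.items():
            base = jaccard(all_values, tgt_values)
            best_cond, best_score = None, base
            for v in candidate_views(rows, cat_attr):
                view_values = {row[col] for row in rows if row[cat_attr] == v}
                score = jaccard(view_values, tgt_values)
                if score > best_score:
                    best_cond, best_score = (cat_attr, v), score
            if best_cond is not None and best_score >= min_score and best_score - base >= min_gain:
                matches.append((col, tgt, best_cond, best_score))  # contextual match
            elif base >= min_score:
                matches.append((col, tgt, None, base))  # unconditional match
    return matches


source = [
    {"kind": "phone", "contact": "555-0100"},
    {"kind": "email", "contact": "ann@example.com"},
    {"kind": "phone", "contact": "555-0199"},
    {"kind": "email", "contact": "bo@example.com"},
]
targets = {
    "phone_number": {"555-0100", "555-0199", "555-0123"},
    "email_address": {"ann@example.com", "bo@example.com"},
}
matches = contextual_matches(source, ["contact"], "kind", targets)
```

Here `contact` matches `phone_number` only in the context `kind = phone`, and `email_address` only in the context `kind = email`. Unconditionally, neither match scores as high.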

    Methods and Apparatus for User-Guided Inference of Regular Expressions for Information Extraction
    4.
    Invention patent application; status: under examination (published)

    Publication number: US20080133443A1

    Publication date: 2008-06-05

    Application number: US11565213

    Filing date: 2006-11-30

    IPC classification: G06N5/04

    CPC classification: G06F17/2705

    Abstract: Methods and apparatus are provided for inferring regular expressions that parse and extract information from line-oriented data. A regular expression is generated that matches a line of text by: evaluating a plurality of characters of the line of text to identify one or more domains associated with each of the plurality of characters; assigning a run-length to each of the identified domains; populating a data structure having a data position corresponding to each of the characters with the identified domains and corresponding run-lengths; and generating the regular expression based on the data structure.
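
A minimal sketch of the run-length approach described above. It assumes a small, hypothetical domain table: ASCII digits, ASCII uppercase letters, ASCII lowercase letters, space/tab, and otherwise the literal character. Each character is assigned only its most specific domain. The patent allows several domains per character and user guidance, and neither is modeled here. `positions` builds the per-character table of domain and run length, and `infer_regex` turns that table into a pattern.

```python
import re

# Hypothetical domain table, most specific first: (name, regex fragment, membership test).
DOMAINS = [
    ("digit", "[0-9]", lambda c: "0" <= c <= "9"),
    ("upper", "[A-Z]", lambda c: "A" <= c <= "Z"),
    ("lower", "[a-z]", lambda c: "a" <= c <= "z"),
    ("blank", r"[ \t]", lambda c: c in " \t"),
]


def classify(c):
    """Return (domain name, regex fragment) for one character; fall back to the literal."""
    for name, pattern, test in DOMAINS:
        if test(c):
            return name, pattern
    return "lit:" + c, re.escape(c)


def positions(line):
    """Per-position table: (char, domain, fragment, run length of the run containing it)."""
    table, i = [], 0
    while i < len(line):
        name, pattern = classify(line[i])
        j = i
        while j < len(line) and classify(line[j])[0] == name:
            j += 1
        for k in range(i, j):
            table.append((line[k], name, pattern, j - i))
        i = j
    return table


def infer_regex(line):
    """Emit one fragment per run, with a {n} quantifier when the run is longer than 1."""
    table = positions(line)
    parts, i = [], 0
    while i < len(table):
        _, _, pattern, run = table[i]
        parts.append(pattern if run == 1 else f"{pattern}{{{run}}}")
        i += run
    return "^" + "".join(parts) + "$"
```

For example, the pattern inferred from `2006-11-30 ERROR disk` also matches `2011-04-05 FATAL disk`. It does not match `2011-4-05 FATAL disk`, because the month field has a different run length.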