Methods, apparatus and computer programs for evaluating and using a resilient data representation
    Invention application (status: granted)

    Publication No.: US20060026157A1

    Publication date: 2006-02-02

    Application No.: US10880141

    Filing date: 2004-06-29

    Abstract: Provided are methods, apparatus and computer programs for evaluating the resilience, to structural changes in a data source, of a representative label representing a data element within the data source. Also disclosed are applications using a resilient representative label. For example, a representative label may represent a particular data field or other data element within a semi-structured data source - such as within XML or HTML Web pages. An estimate of resilience to changes can be used to determine whether a candidate representative label satisfies a required degree of resilience, or to enable selection of a label with the highest resilience score among a set of representative labels. The validated or selected representative label may then be used for data extraction, remaining usable despite the possibility of future changes to the structure of a Web page, or for template clustering/classification.
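
The abstract describes scoring candidate representative labels by their resilience to structural change and then either checking a score against a required threshold or picking the highest-scoring label. The following is a minimal sketch of that idea only, not the method claimed in the application: it assumes labels are XPath-like expressions, assumes resilience can be estimated as the fraction of simulated structural changes under which a label still extracts the expected value, and uses a made-up page, made-up candidate labels, and a made-up threshold of 0.9.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page standing in for a semi-structured data source.
PAGE = """<html><body>
<div id="header"><span>Shop</span></div>
<div id="item"><span class="name">Widget</span><span class="price">9.99</span></div>
</body></html>"""

# Hypothetical candidate labels for the price field, written in the XPath
# subset that ElementTree supports.
CANDIDATES = [
    "./body/div[2]/span[2]",       # purely positional
    ".//div[@id='item']/span[2]",  # anchored on an id, positional child
    ".//span[@class='price']",     # anchored on a class attribute
]


def variants(root):
    """Yield copies of the page, each with one simulated structural change."""
    # Change 1: a new banner div is inserted at the top of <body>.
    v = copy.deepcopy(root)
    v.find("body").insert(0, ET.Element("div", {"id": "banner"}))
    yield v

    # Change 2: an extra span is inserted before the existing item fields.
    v = copy.deepcopy(root)
    v.find(".//div[@id='item']").insert(0, ET.Element("span", {"class": "badge"}))
    yield v

    # Change 3: the item div is wrapped in a new <section> element.
    v = copy.deepcopy(root)
    body = v.find("body")
    item = body.find("div[@id='item']")
    position = list(body).index(item)
    body.remove(item)
    wrapper = ET.Element("section")
    wrapper.append(item)
    body.insert(position, wrapper)
    yield v


def extract(root, label):
    """Return the text of the first element matched by the label, or None."""
    node = root.find(label)
    return node.text if node is not None else None


def resilience(root, label, expected):
    """Fraction of simulated changes under which the label still yields `expected`."""
    changed = list(variants(root))
    hits = sum(1 for v in changed if extract(v, label) == expected)
    return hits / len(changed)


root = ET.fromstring(PAGE)
scores = {label: resilience(root, label, "9.99") for label in CANDIDATES}

# Use 1: select the label with the highest resilience score.
best = max(scores, key=scores.get)

# Use 2: validate labels against a required degree of resilience (assumed 0.9).
REQUIRED = 0.9
validated = [label for label, score in scores.items() if score >= REQUIRED]

for label, score in scores.items():
    print(f"{score:.2f}  {label}")
```

In this toy setup the purely positional label breaks under every simulated change, while the attribute-anchored label survives all three, which is the kind of difference a resilience estimate is meant to surface before a label is used for data extraction.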

