Granted invention patent
- Patent title: Hierarchical conditional random fields for web extraction
- Patent title (Chinese): Web提取的分层条件随机字段
- Application number: US11461400; Filing date: 2006-07-31
- Publication number: US07720830B2; Publication date: 2010-05-18
- Inventors: Ji-Rong Wen, Wei-Ying Ma, Zaiqing Nie, Jun Zhu
- Applicants: Ji-Rong Wen, Wei-Ying Ma, Zaiqing Nie, Jun Zhu
- Applicant address: Redmond, WA, US
- Assignee: Microsoft Corporation
- Current assignee: Microsoft Corporation
- Current assignee address: Redmond, WA, US
- Agency: Perkins Coie LLP
- Main classification: G06F7/00
- IPC classification: G06F7/00; G06F17/30; G06F17/00; G06F15/173
Abstract:
A method and system for labeling object information of an information page is provided. A labeling system identifies an object record of an information page based on the labeling of object elements within an object record and labels object elements based on the identification of an object record that contains the object elements. To identify the records and label the elements, the labeling system generates a hierarchical representation of blocks of an information page. The labeling system identifies records and elements within the records by propagating probability-related information of record labels and element labels through the hierarchy of the blocks. The labeling system generates a feature vector for each block to represent the block and calculates a probability of a label for a block being correct based on a score derived from the feature vectors associated with related blocks. The labeling system searches for the labeling of records and elements that has the highest probability of being correct.
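The joint search described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the label set, the feature names, and the weights in `NODE_W` and `EDGE_W` are all hypothetical, chosen only to make the example concrete. The sketch assumes that a labeling's probability grows with its total score (as in a conditional random field, where probability is proportional to the exponential of the score), so it searches for the highest-scoring labeling and skips normalization.

```python
from dataclasses import dataclass, field

# Hypothetical label set: one record label plus a few element labels.
LABELS = ("record", "name", "price", "other")


@dataclass
class Block:
    """A block of the page: a feature vector plus child blocks."""
    features: dict  # feature name -> numeric value (illustrative)
    children: list = field(default_factory=list)


# Hypothetical weights; a real system would learn these from labeled pages.
NODE_W = {
    ("record", "has_children"): 2.0,
    ("name", "is_bold"): 1.5,
    ("price", "has_currency"): 2.5,
}
# Compatibility of a parent label with a child label.
EDGE_W = {
    ("record", "name"): 1.0,
    ("record", "price"): 1.0,
    ("other", "other"): 0.5,
}


def node_score(block, label):
    """Score of assigning `label` to `block`, from its feature vector."""
    return sum(NODE_W.get((label, f), 0.0) * v for f, v in block.features.items())


def best_subtree(block, label):
    """Best (score, preorder labels) for the subtree when `block` has `label`."""
    score = node_score(block, label)
    labels = [label]
    for child in block.children:
        candidates = []
        for child_label in LABELS:
            sub_score, sub_labels = best_subtree(child, child_label)
            edge = EDGE_W.get((label, child_label), 0.0)
            candidates.append((edge + sub_score, sub_labels))
        child_score, child_labels = max(candidates, key=lambda c: c[0])
        score += child_score
        labels.extend(child_labels)
    return score, labels


def label_page(root):
    """Search all root labels and return the highest-scoring labeling."""
    return max((best_subtree(root, l) for l in LABELS), key=lambda c: c[0])


page = Block(
    {"has_children": 1},
    [Block({"is_bold": 1}), Block({"has_currency": 1})],
)
score, labels = label_page(page)
print(score, labels)
```

In this toy page, labeling the parent block as a record raises the scores of the name and price labels on its children, and those element labels in turn support the record label, which mirrors the mutual reinforcement between record identification and element labeling that the abstract describes.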