Invention Patent Application
- Patent Title: System and method for smoothing hierarchical data using isotonic regression
- Application No.: US11800235
- Application Date: 2007-05-04
- Publication No.: US20080275890A1
- Publication Date: 2008-11-06
- Inventors: Deepayan Chakrabarti, Kunal Punera, Shanmugasundaram Ravikumar
- Applicants: Deepayan Chakrabarti, Kunal Punera, Shanmugasundaram Ravikumar
- Applicant Address: Sunnyvale, CA, US
- Assignee: Yahoo! Inc.
- Current Assignee: Yahoo! Inc.
- Current Assignee Address: Sunnyvale, CA, US
- Main Classification: G06F17/30
- IPC Classification: G06F17/30; G06F15/00
Abstract:
An improved system and method is provided for detecting a web page template. A web page template detector may be provided for performing page-level template detection on a web page. In general, a web page template classifier may be trained using automatically generated training data and then applied to web pages to identify web page templates. A web page template may be detected by classifying segments of a web page as template structures, assigning classification scores to the segments classified as template structures, and then smoothing the classification scores assigned to the segments of the web page. Generalized isotonic regression may be applied to smooth the scores associated with the nodes of a hierarchy by minimizing an optimization function using dynamic programming.
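The smoothing step described in the abstract can be illustrated with a small dynamic program over a tree. This is a minimal sketch, not the patent's method: the function name `smooth_tree_scores`, the dict-based tree, the constraint that a child's smoothed score never exceeds its parent's, the squared-error objective, and the finite grid of candidate `levels` are all assumptions made for illustration. Restricting values to the grid makes this an approximation of unconstrained squared-error isotonic regression on the tree.

```python
from typing import Dict, List, Sequence, Tuple


def smooth_tree_scores(
    children: Dict[str, List[str]],
    scores: Dict[str, float],
    root: str,
    levels: Sequence[float],
) -> Dict[str, float]:
    """Tree isotonic regression by dynamic programming (illustrative sketch).

    Assumptions (hypothetical, not from the patent): each child's smoothed
    score must be <= its parent's, smoothed scores are drawn from the
    candidate `levels`, and the objective is the sum of squared errors.
    """
    levels = sorted(levels)
    m = len(levels)

    # Parent-before-child ordering, built iteratively to avoid recursion.
    order: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    # best_prefix[v][k] = (min cost, argmin j) of v's subtree cost over j <= k.
    best_prefix: Dict[str, List[Tuple[float, int]]] = {}
    for node in reversed(order):  # every child is handled before its parent
        row = []
        for k in range(m):
            cost = (scores[node] - levels[k]) ** 2
            # Each child may take any level <= levels[k]; use its prefix minimum.
            for child in children.get(node, []):
                cost += best_prefix[child][k][0]
            row.append(cost)
        prefix: List[Tuple[float, int]] = []
        best = (float("inf"), -1)
        for k, cost in enumerate(row):
            if cost < best[0]:
                best = (cost, k)
            prefix.append(best)
        best_prefix[node] = prefix

    # Top-down pass: root takes its global best; children their best <= parent.
    assigned: Dict[str, int] = {root: best_prefix[root][m - 1][1]}
    for node in order:
        for child in children.get(node, []):
            assigned[child] = best_prefix[child][assigned[node]][1]
    return {node: levels[k] for node, k in assigned.items()}


# A parent scored below its child violates the assumed constraint,
# so both are pulled to a common level.
print(smooth_tree_scores({"root": ["child"]}, {"root": 0.2, "child": 0.8},
                         "root", [0.0, 0.5, 1.0]))
# → {'root': 0.5, 'child': 0.5}
```

The prefix minima let each child contribute in constant time per level, so the whole pass runs in O(n·m) time for n nodes and m candidate levels.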