Invention Application
- Patent Title: CJK NAME DETECTION
- Patent Title (Chinese): CJK名称检测 (CJK Name Detection)
- Application No.: US12746465
- Filing Date: 2007-12-06
- Publication No.: US20100306139A1
- Publication Date: 2010-12-02
- Inventors: Jun Wu, Hui Xu, Yifei Zhang
- Applicants: Jun Wu, Hui Xu, Yifei Zhang
- Applicant Address: Mountain View, CA, US
- Assignee: Google Inc.
- Current Assignee: Google Inc.
- Current Assignee Address: Mountain View, CA, US
- International Application: PCT/CN07/03464 WO 2007-12-06
- Main Classification: G06N5/02
- IPC Classifications: G06N5/02; G06F15/18
Abstract:
Aspects directed to name detection are provided. A method includes generating a raw name detection model using a collection of family names and an annotated corpus that includes a collection of n-grams, each n-gram having a corresponding probability of occurring. The method includes applying the raw name detection model to a collection of semi-structured data to form annotated semi-structured data, which identifies n-grams that are names and n-grams that are not names, and applying the raw name detection model to a large unannotated corpus to form large annotated corpus data, which identifies the n-grams of that corpus that are names and those that are not. The method further includes generating a name detection model by deriving a name model from the annotated semi-structured data identifying names and the large annotated corpus data identifying names, deriving a not-name model from the semi-structured data not identifying names, and deriving a language model from the large annotated corpus.
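The abstract names the training steps but not the decision rule of the raw model, the form of the derived models, or how they are combined. The Python below is a minimal sketch under loudly stated assumptions, not the patented method: `FAMILY_NAMES`, `raw_is_name`, and its `threshold` are hypothetical; every "model" is an add-one-smoothed relative-frequency distribution over character n-grams; and the final decision is an assumed likelihood-ratio test of the name model against the not-name model, restricted to n-grams that start with a listed family name. The language model is derived as the abstract describes but is not used in this toy decision.

```python
from collections import Counter
from math import log

# Hypothetical family-name list (assumption; not the patent's actual list).
FAMILY_NAMES = {"王", "李", "张"}


def candidate_ngrams(text, sizes=(2, 3)):
    """Yield every character n-gram of the given sizes (CJK text has no spaces)."""
    for n in sizes:
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def raw_is_name(ngram, annotated_probs, threshold=1e-3):
    """Hypothetical raw rule: starts with a known family name and the n-gram's
    probability in the annotated corpus is at least `threshold`."""
    return ngram[:1] in FAMILY_NAMES and annotated_probs.get(ngram, 0.0) >= threshold


def annotate(texts, annotated_probs):
    """Apply the raw model, splitting candidate n-grams into names and not-names."""
    names, not_names = Counter(), Counter()
    for text in texts:
        for g in candidate_ngrams(text):
            (names if raw_is_name(g, annotated_probs) else not_names)[g] += 1
    return names, not_names


def to_model(counts, vocab_size):
    """Add-one-smoothed relative-frequency model over n-grams (assumed form)."""
    total = sum(counts.values())
    return lambda g: (counts[g] + 1) / (total + vocab_size)


def build_name_detector(semi_structured, large_corpus, annotated_probs):
    semi_names, semi_not_names = annotate(semi_structured, annotated_probs)
    corpus_names, _ = annotate(large_corpus, annotated_probs)  # corpus not-names unused
    lm_counts = Counter(g for t in large_corpus for g in candidate_ngrams(t))
    # Vocabulary of all seen n-grams, plus one slot for unseen n-grams.
    vocab = len(set(semi_names) | set(semi_not_names) | set(lm_counts)) + 1

    name_model = to_model(semi_names + corpus_names, vocab)  # names from both sources
    not_name_model = to_model(semi_not_names, vocab)         # semi-structured not-names
    language_model = to_model(lm_counts, vocab)              # whole large corpus

    def is_name(g):
        # Assumed decision: positive log-likelihood ratio, family-name-initial only.
        return g[:1] in FAMILY_NAMES and log(name_model(g)) - log(not_name_model(g)) > 0

    return is_name, language_model


detect, lm = build_name_detector(
    semi_structured=["王明", "李华", "北京大学"],
    large_corpus=["王明去北京", "李华在上海"],
    annotated_probs={"王明": 0.01, "李华": 0.02},  # toy probabilities (hypothetical)
)
print(detect("王明"))  # True
print(detect("北京"))  # False
```

Bootstrapping from a crude raw model in this way lets unlabeled text supply the counts for the refined models; a real system would need far richer features and smoothing than this toy shows.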
Publication/Grant Documents
- US08478787B2 Name detection, publication/grant date: 2013-07-02