Invention Application
US20100306139A1 CJK NAME DETECTION (In force)

  • Patent title: CJK NAME DETECTION
  • Patent title (Chinese): CJK名称检测
  • Application number: US12746465
    Filing date: 2007-12-06
  • Publication number: US20100306139A1
    Publication date: 2010-12-02
  • Inventors: Jun Wu, Hui Xu, Yifei Zhang
  • Applicants: Jun Wu, Hui Xu, Yifei Zhang
  • Applicant address: US CA Mountain View
  • Assignee: Google Inc.
  • Current assignee: Google Inc.
  • Current assignee address: US CA Mountain View
  • International application: PCT/CN07/03464 WO 20071206
  • Main classification: G06N5/02
  • IPC classification: G06N5/02 G06F15/18
CJK NAME DETECTION
Abstract:
Aspects directed to name detection are provided. A method includes generating a raw name detection model using a collection of family names and an annotated corpus including a collection of n-grams, each n-gram having a corresponding probability of occurring. The method includes applying the raw name detection model to a collection of semi-structured data to form annotated semi-structured data identifying n-grams identifying names and n-grams not identifying names. The method also includes applying the raw name detection model to a large unannotated corpus to form large annotated corpus data identifying n-grams of the large unannotated corpus identifying names and n-grams not identifying names. The method includes generating a name detection model, including deriving a name model using the annotated semi-structured data identifying names and the large annotated corpus data identifying names, deriving a not-name model using the semi-structured data not identifying names, and deriving a language model using the large annotated corpus.
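
The data flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the family-name prefix test, the 0.5 threshold, and the use of plain relative frequencies as "models" are all assumptions made here for illustration, because the abstract does not specify how any of the models are estimated.

```python
from collections import Counter


def train_raw_model(family_names, annotated_ngrams):
    """Build a crude raw name detector (hypothetical helper).

    annotated_ngrams: iterable of (ngram, is_name) pairs from an annotated corpus.
    Returns a function mapping an n-gram to True (name) or False (not a name).
    """
    name_counts = Counter()
    total_counts = Counter()
    for ngram, labelled_name in annotated_ngrams:
        total_counts[ngram] += 1
        if labelled_name:
            name_counts[ngram] += 1

    def raw_is_name(ngram):
        # Assumption: a candidate name must begin with a known family name.
        if not any(ngram.startswith(f) for f in family_names):
            return False
        # Unseen n-gram: fall back on the family-name cue alone.
        if total_counts[ngram] == 0:
            return True
        # Assumption: call it a name if it was labelled so at least half the time.
        return name_counts[ngram] / total_counts[ngram] >= 0.5

    return raw_is_name


def annotate(raw_is_name, ngrams):
    """Split n-grams into those identified as names and those that are not."""
    names, not_names = [], []
    for ngram in ngrams:
        (names if raw_is_name(ngram) else not_names).append(ngram)
    return names, not_names


def relative_frequencies(ngrams):
    """Stand-in for a probabilistic model: relative frequency of each n-gram."""
    counts = Counter(ngrams)
    total = sum(counts.values())
    return {ngram: count / total for ngram, count in counts.items()}


def build_name_detection_model(family_names, annotated_ngrams,
                               semi_structured_ngrams, large_corpus_ngrams):
    """Follow the steps of the abstract: raw model, two annotation passes, three models."""
    raw_is_name = train_raw_model(family_names, annotated_ngrams)
    semi_names, semi_not_names = annotate(raw_is_name, semi_structured_ngrams)
    corpus_names, _ = annotate(raw_is_name, large_corpus_ngrams)
    return {
        # Name model: names found in semi-structured data and in the large corpus.
        "name_model": relative_frequencies(semi_names + corpus_names),
        # Not-name model: semi-structured n-grams not identified as names.
        "not_name_model": relative_frequencies(semi_not_names),
        # Language model: simplified here to frequencies over the whole large corpus.
        "language_model": relative_frequencies(large_corpus_ngrams),
    }
```

Calling `build_name_detection_model` with a set of family names and three lists of n-grams returns a dictionary holding the three component models. A real system would presumably use smoothed n-gram probabilities rather than raw relative frequencies.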