Granted invention patent
US08131539B2 Search-based word segmentation method and device for language without word boundary tag
Status: in force
- Patent title: Search-based word segmentation method and device for language without word boundary tag
- Patent title (Chinese): 基于搜索的词分割方法和无字边界标签语言的设备
- Application number: US12044258
- Filing date: 2008-03-07
- Publication (grant) number: US08131539B2
- Publication (grant) date: 2012-03-06
- Inventors: Wen Liu, Yong Qin, Xin Jing Wang
- Applicants: Wen Liu, Yong Qin, Xin Jing Wang
- Applicant address: Armonk, NY, US
- Patentee: International Business Machines Corporation
- Current patentee: International Business Machines Corporation
- Current patentee address: Armonk, NY, US
- Agents: William Stock; Anne Vachon Dougherty
- Priority: CN200710086030, 2007-03-07
- Main classification: G06F17/27
- IPC classifications: G06F17/27; G06F17/20
Abstract:
The present invention discloses a search-based word segmentation method and device for a language without a word boundary tag. The method includes the following steps:

a. providing at least one search engine with a segment of a text, the text including at least one segment;
b. searching for the segment with the at least one search engine and returning search results;
c. selecting a word segmentation approach for the segment based on at least part of the returned search results.

The invention thus addresses word segmentation for languages without word boundary tags. It overcomes limitations of the prior art with respect to flexibility, dependence on dictionary coverage, the availability of training corpora, and the handling of new words.
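The abstract does not say how the returned search results drive the choice in step c. The sketch below is one possible reading, not the patented method. It assumes that each search engine reports a hit count for a query string and treats those counts as unigram word frequencies. Dynamic programming then picks the split of the segment with the highest total log score.

All of the following are hypothetical illustrations, not names or values from the patent:
- `make_stub_engine` and `segment`;
- the `total_docs` index size and the `max_len` word-length cap;
- the hit counts used below.

```python
import math
from typing import Callable, Dict, List

# Hypothetical stand-in for "at least one search engine": maps a query
# string to a hit count. A real system would call a search service here.
HitCounter = Callable[[str], int]


def make_stub_engine(table: Dict[str, int]) -> HitCounter:
    """Return a fake search engine backed by a fixed hit-count table."""
    return lambda query: table.get(query, 0)


def segment(text: str, engines: List[HitCounter],
            total_docs: float = 1e9, max_len: int = 4) -> List[str]:
    """Choose the split of `text` whose words score best on search hits.

    Assumed scoring rule (not from the patent): each candidate word w scores
    log((hits(w) + 1) / total_docs), with hits summed over all engines.
    The +1 keeps strings with no hits from scoring -infinity.
    """
    cache: Dict[str, float] = {}

    def score(word: str) -> float:
        if word not in cache:  # send each candidate substring to the engines once
            hits = sum(engine(word) for engine in engines)
            cache[word] = math.log((hits + 1) / total_docs)
        return cache[word]

    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best total score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word in text[:i]
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            candidate = best[j] + score(text[j:i])
            if candidate > best[i]:
                best[i], back[i] = candidate, j

    # Walk the back-pointers from the end to recover the chosen words.
    words: List[str] = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]
```

Consider a stub engine with the hypothetical table `{"研究": 5_000_000, "研究生": 2_000_000, "生命": 3_000_000, "命": 1_000_000, "起源": 800_000}`. For `segment("研究生命起源", [engine])`, it prefers 研究 | 生命 | 起源 ("study the origin of life") over 研究生 | 命 | 起源 ("graduate student | fate | origin"). The first split wins because its words have the higher combined counts. Caching the per-word score limits the number of queries, which matters when every lookup is a real search request.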