Patent application
- Patent title: Chinese character-based parser
- Patent title (Chinese): 基于汉字的解析器
- Application number: US10826707
- Filing date: 2004-04-16
- Publication number: US20050234707A1
- Publication date: 2005-10-20
- Inventors: Xiaoqiang Luo, Robert Ward
- Applicants: Xiaoqiang Luo, Robert Ward
- Applicant address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current assignee: International Business Machines Corporation
- Current assignee address: US NY Armonk
- Main classification: G06F17/27
- IPC classification: G06F17/27; G06F17/28
Abstract:
A parser is provided that parses a Chinese text stream at the character level and builds a syntactic structure of Chinese character sequences. A character-based syntactic parse tree contains word boundaries, part-of-speech tags, and phrasal structure information. Syntactic knowledge constrains the system when it determines word boundaries. A deterministic procedure is used to convert word-based parse trees into character-based trees. Character-level tags are derived from word-level part-of-speech tags, and word-boundary information is encoded with a positional tag. Word-level part-of-speech tags become constituent labels in character-based trees. A maximum entropy parser is then built and tested.
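The deterministic word-to-character tree conversion described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The tuple-based tree representation, the function names `position_suffix` and `to_char_tree`, and the positional suffixes `b`, `m`, `e`, `s` (begin, middle, end, single) are all assumptions made for illustration; the abstract says only that a positional tag encodes word boundaries.

```python
# Sketch only: the tree layout and the b/m/e/s suffix scheme are assumptions,
# not details taken from the patent.
# A node is (label, children); a word leaf is (pos_tag, word_string).

def position_suffix(index, length):
    """Return a hypothetical positional tag for a character inside a word."""
    if length == 1:
        return "s"  # single-character word
    if index == 0:
        return "b"  # first character of the word
    if index == length - 1:
        return "e"  # last character of the word
    return "m"      # any character in between


def to_char_tree(node):
    """Convert a word-based parse tree into a character-based one."""
    label, body = node
    if isinstance(body, str):
        # Word leaf: the word-level POS tag becomes a constituent label,
        # and every character gets a tag built from POS plus position.
        chars = [
            (f"{label}-{position_suffix(i, len(body))}", ch)
            for i, ch in enumerate(body)
        ]
        return (label, chars)
    # Phrasal node: keep the label and convert each child recursively.
    return (label, [to_char_tree(child) for child in body])


if __name__ == "__main__":
    word_tree = ("IP", [("NP", [("NR", "上海")]), ("VP", [("VV", "发展")])])
    print(to_char_tree(word_tree))
```

Because the positional suffix marks where each word begins and ends, word boundaries can be read back from the character-level tags, which is how a character-based tree can carry segmentation, part-of-speech, and phrase structure at once.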
Published/granted documents
- US07464024B2 Chinese character-based parser (publication/grant date: 2008-12-09)