Granted invention patent
- Patent title: Automatic segmentation of continuous text using statistical approaches
- Patent title (Chinese): 使用统计方法自动分割连续文本
- Application number: US700823
- Filing date: 1996-09-04
- Publication number: US5806021A
- Publication date: 1998-09-08
- Inventors: Chengjun Julian Chen, Fu-Hua Liu, Michael Alan Picheny
- Applicants: Chengjun Julian Chen, Fu-Hua Liu, Michael Alan Picheny
- Applicant address: Armonk, NY
- Assignee: International Business Machines Corporation
- Current assignee: International Business Machines Corporation
- Current assignee address: Armonk, NY
- Main classification: G06F17/27
- IPC classification: G06F17/27; G06F17/20
Abstract:
An automatic segmenter for continuous text segments such text in a rapid, consistent and semantically accurate manner. Two statistical methods for segmenting continuous text are used. The first method, called "forward-backward matching", is simple and fast but can produce occasional errors in long phrases. The second method, called the "statistical stack search segmenter", uses statistical language models to generate more accurate segmentation output, at the expense of two times more execution time than the "forward-backward matching" method. In applications where speed is the major concern, "forward-backward matching" can be used, while in applications where highly accurate output is desired, the "statistical stack search segmenter" is ideal.
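
The abstract does not spell out how "forward-backward matching" works, so the following is only a minimal sketch of one common reading of the idea: scan the text greedily for the longest lexicon entry from left to right, do the same from right to left, and compare the two results. The function names, the toy lexicon, the single-character fallback, and the tie-break rule (prefer the segmentation with fewer words, otherwise the backward one) are all assumptions made for illustration and are not taken from the patent.

```python
def forward_match(text, lexicon, max_len):
    """Greedy longest-match scan from left to right."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches
        # (assumed fallback for text not covered by the lexicon).
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_match(text, lexicon, max_len):
    """Greedy longest-match scan from right to left."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the text
    return words


def forward_backward_match(text, lexicon):
    """Run both scans and pick one result (hypothetical tie-break rule)."""
    max_len = max((len(w) for w in lexicon), default=1)
    fwd = forward_match(text, lexicon, max_len)
    bwd = backward_match(text, lexicon, max_len)
    if fwd == bwd:
        return fwd
    # Assumption: prefer fewer words; on a tie, keep the backward result.
    return fwd if len(fwd) < len(bwd) else bwd


if __name__ == "__main__":
    lexicon = {"the", "them", "men", "eat"}
    print(forward_backward_match("themeneat", lexicon))
```

On the unspaced string `themeneat`, the forward scan greedily takes `them` and then falls back to single characters, while the backward scan recovers `the`, `men`, `eat`. This is the kind of disagreement that a purely greedy matcher cannot resolve on its own, and it suggests why a slower language-model-based search could give more accurate output.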