Invention Application
- Patent Title: Unsupervised training for overlapping ambiguity resolution in word segmentation
- Patent Title (中): 用于重叠模糊度分辨率的无监督训练
- Application No.: US10662502
- Application Date: 2003-09-15
- Publication No.: US20050060150A1
- Publication Date: 2005-03-17
- Inventor: Mu Li, Jianfeng Gao
- Applicant: Mu Li, Jianfeng Gao
- Applicant Address: US WA Redmond
- Assignee: Microsoft Corporation
- Current Assignee: Microsoft Corporation
- Current Assignee Address: US WA Redmond
- Main IPC: G06F17/27
- IPC: G06F17/27; G06F17/28; G10L15/00

Abstract:
A method is provided for resolving overlapping ambiguity strings in unsegmented languages such as Chinese. The method includes producing two possible segmentations of each sentence and recognizing overlapping ambiguity strings in the sentences. One of the two possible segmentations is selected as a function of probability information. The probability information is derived from unsupervised training data. A method of constructing a knowledge base containing the probability information needed to select one of the segmentations is also provided.
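
The selection step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: the two segmentations come from forward and backward maximum matching, a disagreement between them is treated as a sign of an overlapping ambiguity string, and the "probability information" is reduced to a toy unigram table. The lexicon, the probabilities, and all function names below are hypothetical and are not taken from the patent.

```python
import math

# Hypothetical toy lexicon with made-up unigram probabilities (not from the patent).
WORD_PROB = {"研究": 0.02, "研究生": 0.005, "生命": 0.01, "命": 0.001, "起源": 0.008}
MAX_WORD_LEN = max(len(w) for w in WORD_PROB)
UNKNOWN_PROB = 1e-8  # assumed floor for characters missing from the lexicon


def forward_max_match(text):
    """Greedy left-to-right segmentation, longest lexicon word first."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # Fall back to a single character when no lexicon word matches.
            if candidate in WORD_PROB or length == 1:
                words.append(candidate)
                i += length
                break
    return words


def backward_max_match(text):
    """Greedy right-to-left segmentation, longest lexicon word first."""
    words, end = [], len(text)
    while end > 0:
        for length in range(min(MAX_WORD_LEN, end), 0, -1):
            candidate = text[end - length:end]
            if candidate in WORD_PROB or length == 1:
                words.append(candidate)
                end -= length
                break
    return words[::-1]


def log_score(words):
    """Sum of log unigram probabilities; a stand-in for the patent's model."""
    return sum(math.log(WORD_PROB.get(w, UNKNOWN_PROB)) for w in words)


def resolve(text):
    """Return the higher-scoring segmentation when the two segmenters disagree."""
    fmm, bmm = forward_max_match(text), backward_max_match(text)
    if fmm == bmm:
        return fmm  # no ambiguity detected by this simple check
    return max(fmm, bmm, key=log_score)
```

For the string 研究生命起源, the forward pass yields 研究生 / 命 / 起源 and the backward pass yields 研究 / 生命 / 起源; with the toy probabilities above, `resolve` returns the backward result. In the method the abstract describes, the probabilities would instead be derived from unsupervised training data and stored in a knowledge base.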