Learning word segmentation from non-white space languages corpora
    Granted invention patent
    Status: Lapsed

    Publication No.: US08165869B2

    Publication date: 2012-04-24

    Application No.: US11953635

    Filing date: 2007-12-10

    IPC classification: G06F17/27 G06F17/20

    CPC classification: G06F17/2863 G06F17/277

    Abstract: Illustrative embodiments provide a computer implemented method, apparatus, and computer program product for learning word segmentation from non-white space language corpora. In one illustrative embodiment, the computer implemented method receives text input characters and calculates a ratio-measure for each pair of characters in the input characters. The computer implemented method further determines whether the ratio-measure of each pair of characters is less than a predetermined threshold value. Responsive to determining that the ratio-measure is less than the predetermined threshold value and is a local-minimum value, the computer implemented method identifies the pair as a weak pair and breaks the weak pair of characters.

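The steps in the abstract (score every adjacent character pair, then break the pairs whose score is both below a threshold and a local minimum) can be sketched in Python as follows. The abstract does not define the ratio-measure, so the formula used here, the bigram count divided by the product of the two character counts, is an assumption made only for illustration. The function names `train`, `ratio_measures`, and `segment`, the toy corpus, and the threshold value are likewise hypothetical and do not come from the patent.

```python
from collections import Counter


def train(corpus):
    """Count single characters and adjacent character pairs in unsegmented lines."""
    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        unigrams.update(line)
        bigrams.update(a + b for a, b in zip(line, line[1:]))
    return unigrams, bigrams


def ratio_measures(text, unigrams, bigrams):
    """Return one score per adjacent pair in `text`.

    ASSUMPTION: score = count(ab) / (count(a) * count(b)).
    The abstract does not say how the ratio-measure is computed.
    """
    scores = []
    for a, b in zip(text, text[1:]):
        denom = unigrams[a] * unigrams[b]
        scores.append(bigrams[a + b] / denom if denom else 0.0)
    return scores


def segment(text, unigrams, bigrams, threshold):
    """Break `text` at weak pairs: score below threshold and a local minimum."""
    if not text:
        return []
    scores = ratio_measures(text, unigrams, bigrams)
    words, start = [], 0
    for i, score in enumerate(scores):
        # Pairs at the ends of the text have only one neighbouring pair.
        left = scores[i - 1] if i > 0 else float("inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("inf")
        is_local_min = score <= left and score <= right
        if score < threshold and is_local_min:
            # Break between the two characters of the weak pair.
            words.append(text[start:i + 1])
            start = i + 1
    words.append(text[start:])
    return words


# Toy unsegmented corpus (hypothetical data, for illustration only).
corpus = ["abcd", "cdab", "abab", "cdcd"]
unigrams, bigrams = train(corpus)
print(segment("abcdab", unigrams, bigrams, threshold=0.1))
```

In this toy corpus the pairs "ab" and "cd" occur often and score 0.25, while "bc" and "da" score 0.0625, so only the latter two fall below the threshold of 0.1 and are broken. Requiring a local minimum in addition to the threshold keeps a run of several low-scoring pairs from being split at every position.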