-
Publication No.: US08027832B2
Publication date: 2011-09-27
Application No.: US11056707
Filing date: 2005-02-11
IPC classification: G06F17/27
CPC classification: G06F17/275, Y10T70/30, Y10T70/358, Y10T70/5726, Y10T70/5973, Y10T70/7057
Abstract: A system and methods of language identification of natural language text are presented. The system includes stored expected character counts and variances for a list of characters found in a natural language. Expected character counts and variances are stored for multiple languages to be considered during language identification. At run-time, one or more languages are identified for a text sample based on comparing actual and expected character counts. The present methods can be combined with upstream analyzing of Unicode ranges for characters in the text sample to limit the number of languages considered. Further, n-gram methods can be used in downstream processing to select the most probable language from among the languages identified by the present system and methods.
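The comparison of actual and expected character counts described in the abstract might be sketched as follows. This is a minimal illustration, not the patented method: the profile values, the per-100-letter normalization, the assumption that variance scales linearly with sample length, the averaged squared-deviation score, and the names `PROFILES`, `score_language`, `identify_languages`, and `threshold` are all hypothetical.

```python
from collections import Counter
import math

# Hypothetical profiles: character -> (expected count per 100 letters, variance).
# These numbers are invented for illustration, not measured from any corpus.
PROFILES = {
    "en": {"e": (12.0, 4.0), "t": (9.0, 3.0), "h": (6.0, 3.0), "w": (2.0, 1.0)},
    "de": {"e": (16.0, 4.0), "t": (6.0, 3.0), "h": (5.0, 3.0), "w": (2.0, 1.0)},
    "es": {"e": (13.0, 4.0), "t": (4.0, 3.0), "h": (1.0, 1.0), "w": (0.1, 0.5)},
}


def score_language(text, profile, per=100):
    """Average variance-normalized squared deviation; lower means a closer match."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return math.inf
    counts = Counter(letters)
    scale = len(letters) / per
    total = 0.0
    for ch, (expected, variance) in profile.items():
        exp = expected * scale
        var = variance * scale  # assumption: variance grows linearly with length
        total += (counts[ch] - exp) ** 2 / var
    return total / len(profile)


def identify_languages(text, profiles=PROFILES, threshold=4.0):
    """Return every language whose score is within the threshold, best first."""
    scores = {lang: score_language(text, p) for lang, p in profiles.items()}
    return sorted((l for l, s in scores.items() if s <= threshold), key=scores.get)
```

Returning every language under the threshold, rather than a single winner, mirrors the abstract's "one or more languages", leaving the final choice to a downstream step such as an n-gram model.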
-
Publication No.: US07584093B2
Publication date: 2009-09-01
Application No.: US11113612
Filing date: 2005-04-25
Applicant: Douglas W. Potter, Edward C. Hart, Jr., Hisakazu Igarashi, Patricia M. Schmid, William D. Ramsey
Inventor: Douglas W. Potter, Edward C. Hart, Jr., Hisakazu Igarashi, Patricia M. Schmid, William D. Ramsey
IPC classification: G06F17/27
CPC classification: G06F17/2795
Abstract: A computer implemented method of suggesting replacement words for words of a string. In the method, an input string of input words is received. The input words are then matched to subject words of a candidate table. Next, candidate replacement words and scores from the candidate table corresponding to the matched subject words are extracted. Each score is indicative of a probability that the input word should be replaced with the corresponding candidate replacement word. Finally, replacement of the input words with their corresponding candidate replacement words is selectively suggested based on the scores for the replacement words. Another aspect of the present invention is directed to a spell checking system that is configured to implement the method.
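The steps in the abstract (match input words to subject words, extract candidates and scores, then suggest selectively) might look roughly like the sketch below. The table contents, the whitespace tokenization, the lowercase matching, the fixed `threshold`, and the names `CANDIDATE_TABLE` and `suggest_replacements` are assumptions made for illustration; the patent record does not specify them.

```python
# Hypothetical candidate table: subject word -> [(candidate replacement, score)].
# Each score stands in for the probability that the subject word should be
# replaced by that candidate; the values are invented for illustration.
CANDIDATE_TABLE = {
    "teh": [("the", 0.98)],
    "recieve": [("receive", 0.95)],
    "their": [("there", 0.30), ("they're", 0.10)],
}


def suggest_replacements(input_string, table=CANDIDATE_TABLE, threshold=0.5):
    """Return (position, input word, suggested replacement, score) tuples."""
    suggestions = []
    for position, word in enumerate(input_string.split()):
        # Match the input word against the subject words of the table.
        candidates = table.get(word.lower(), [])
        # Keep only candidates whose score clears the threshold.
        kept = [(cand, score) for cand, score in candidates if score >= threshold]
        if kept:
            best, best_score = max(kept, key=lambda pair: pair[1])
            suggestions.append((position, word, best, best_score))
    return suggestions
```

With this table, "their" is matched but never suggested for replacement, because neither of its candidates reaches the threshold; that is one simple reading of "selectively suggested based on the scores".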
-