Patent search: ap:("KANG LI" OR "STEPHEN ALLEN KLODER" OR "IAN GEORGE JOHNSON" OR "SIARHEI ALONICHAU") AND inv:"IAN GEORGE JOHNSON"

1.

Invention application
Language Identification in Multilingual Text (status: valid)

Publication number: US20120095748A1

Publication date: 2012-04-19

Application number: US12904642

Filing date: 2010-10-14

Applicants: KANG LI, STEPHEN ALLEN KLODER, IAN GEORGE JOHNSON, SIARHEI ALONICHAU

Inventors: KANG LI, STEPHEN ALLEN KLODER, IAN GEORGE JOHNSON, SIARHEI ALONICHAU

IPC classification: G06F17/20

CPC classification: G06F17/275, G06F17/30864

Abstract: Methods, systems, and media are provided for identifying languages in multilingual text. A document is decoded into a universal representative coding for easier tag manipulation, then broken into plain-text content sections. The sections are identified and assigned a weight: more informative sections receive a higher weight and less informative sections a lower weight. A language likelihood score is determined for each word, phrase, or character n-gram in a section. Within each section, these scores are combined per language. The combined section scores are then summed to obtain a total document score for each language. The resulting per-language document scores can be ranked to determine the primary language of the document.
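The pipeline in the abstract (decode, split into sections, weight sections, score n-grams, combine per section, sum per document, rank) can be illustrated with a minimal Python sketch. It is not the patented implementation, and every concrete choice below is an assumption made for illustration. Decoding to Python Unicode strings stands in for the "universal representative coding". HTML tags define the sections. The tag-to-weight table `SECTION_WEIGHTS` and the helpers `SectionExtractor`, `build_profile`, `section_scores`, and `identify_language` are hypothetical. Character trigrams with add-one smoothing stand in for the likelihood model. Each section's n-gram scores are combined by averaging, and each section score is multiplied by its weight before the per-language sum.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Hypothetical tag weights: more informative sections weigh more.
SECTION_WEIGHTS = {"title": 3.0, "h1": 2.0, "p": 1.0, "footer": 0.2}
DEFAULT_WEIGHT = 1.0


class SectionExtractor(HTMLParser):
    """Split decoded HTML into (enclosing_tag, plain_text) sections."""

    def __init__(self):
        super().__init__()
        self.sections = []
        self._stack = []

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to the matching open tag, tolerating unclosed tags like <br>.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            tag = self._stack[-1] if self._stack else "body"
            self.sections.append((tag, text))


def char_ngrams(text, n=3):
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_profile(sample, n=3):
    """Toy language model: character n-gram counts from a sample text."""
    counts = Counter(char_ngrams(sample, n))
    return counts, sum(counts.values())


def ngram_log_likelihood(gram, profile, vocab_size):
    counts, total = profile
    # Add-one smoothing gives unseen n-grams a small non-zero probability.
    return math.log((counts[gram] + 1) / (total + vocab_size))


def section_scores(text, profiles, vocab_size):
    """Combine per-n-gram scores within one section (mean log-likelihood)."""
    grams = char_ngrams(text)
    return {
        lang: sum(ngram_log_likelihood(g, p, vocab_size) for g in grams) / len(grams)
        for lang, p in profiles.items()
    }


def identify_language(raw, profiles, encoding="utf-8"):
    html = raw.decode(encoding, errors="replace")  # bytes -> Unicode text
    parser = SectionExtractor()
    parser.feed(html)
    parser.close()
    vocab_size = len(set().union(*(counts for counts, _ in profiles.values()))) + 1
    totals = {lang: 0.0 for lang in profiles}
    for tag, text in parser.sections:
        weight = SECTION_WEIGHTS.get(tag, DEFAULT_WEIGHT)
        for lang, score in section_scores(text, profiles, vocab_size).items():
            totals[lang] += weight * score  # weighted sum across sections
    # Higher (less negative) total log-likelihood ranks first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


profiles = {
    "en": build_profile("the quick brown fox jumps over the lazy dog and the cat"),
    "de": build_profile("der schnelle braune fuchs springt über den faulen hund und die katze"),
}
doc = ("<html><title>The weather today</title><p>the dog and the cat</p>"
       "<footer>Impressum</footer></html>").encode("utf-8")
ranking = identify_language(doc, profiles)
print(ranking[0][0])  # "en" for this toy input
```

Averaging within a section keeps long sections from dominating purely by length, so the section weights can steer the result. A real system would train its profiles on far larger corpora and might use word or phrase features as well as character n-grams, as the abstract allows.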

