Publication number: US09317564B1
Publication date: 2016-04-19
Application number: US14518891
Filing date: 2014-10-20
Applicant: Google Inc.
Inventors: Dmity Korolev, Hartmut Maennel, Matthias Heiler, Michael Schaer, Thomas Hofmann, Wojciech Gajewski, Justyna Sidorska
IPC: G06F17/30
CPC classification numbers: G06F17/30, G06F15/00, G06F17/2785, G06F17/30707, G06K9/726
Abstract: Methods, systems, and apparatus, including computer program products, for constructing text classifiers. The method includes receiving a collection of candidate phrases for a given topic; filtering the received candidate phrases to remove erroneously included candidate phrases; assigning weights to the candidate phrases, including scoring each candidate phrase using an initial classifier and assigning weights to the candidate phrases based on the scores; and generating a linear classifier using the filtered and weighted candidate phrases, where the linear classifier varies the weight of each candidate phrase depending on the length of the document being classified.
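The steps named in the abstract (filter candidate phrases, score them with an initial classifier, then apply a linear classifier whose phrase weights depend on document length) can be sketched in Python. This is an illustrative sketch only, not the patented implementation: the names `build_phrase_weights`, `classify`, `is_valid`, and `initial_score` are hypothetical, using the raw score as the weight is an assumption, and the abstract does not state how weights vary with length, so the `1 / log(e + n_tokens)` damping below is an assumption chosen purely to show the idea.

```python
import math
from typing import Callable, Dict, Iterable, Tuple


def build_phrase_weights(
    candidates: Iterable[str],
    is_valid: Callable[[str], bool],
    initial_score: Callable[[str], float],
) -> Dict[str, float]:
    """Filter candidate phrases, then weight the survivors.

    `is_valid` stands in for the filtering step and `initial_score`
    for the initial classifier; both are hypothetical callables.
    """
    weights: Dict[str, float] = {}
    for phrase in candidates:
        if not is_valid(phrase):
            continue  # drop erroneously included candidates
        # Assumption: the weight is simply the initial classifier's score.
        weights[phrase.lower()] = float(initial_score(phrase))
    return weights


def classify(
    document: str,
    weights: Dict[str, float],
    threshold: float = 1.0,
) -> Tuple[float, bool]:
    """Linear classifier whose effective weights shrink on long documents."""
    tokens = document.lower().split()
    padded = " " + " ".join(tokens) + " "
    # Assumption: damp every weight by 1 / log(e + document length).
    length_factor = 1.0 / math.log(math.e + len(tokens))
    score = sum(
        weight * length_factor
        for phrase, weight in weights.items()
        if f" {phrase} " in padded  # whole-word phrase match
    )
    return score, score >= threshold
```

Under these assumptions, the same matched phrase contributes less to the score of a long document than to the score of a short one, which is one simple way to realise length-dependent weights.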