Publication number: US09483460B2
Publication date: 2016-11-01
Application number: US14047502
Filing date: 2013-10-07
Applicant: Google Inc.
Inventor: Tania Bedrax-Weiss, Geza Kovacs, Ulas Kirazci
IPC: G06F17/27
CPC classification number: G06F17/2735, G06F17/277, G06F17/2795
Abstract: A document analysis system analyzes a corpus of documents and automatically generates a dictionary of specialized phrases not already in conventional dictionaries. The dictionary generation process involves a series of operations on the phrases to identify the phrases most suitable for inclusion in a dictionary, such as phrase scoring and phrase clustering. The dictionary generation process also comprises the identification of one or more corresponding definitions for the various phrases identified for inclusion in the specialized dictionary.
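Illustrative sketch: the abstract names phrase scoring and phrase clustering but gives no algorithm, so the Python below is only one possible reading and not the patented method. It assumes document frequency as the score, a crude trailing-"s" normalization as the clustering rule, and a plain set of entries standing in for a conventional dictionary. The names `candidate_phrases`, `cluster_key`, `build_dictionary`, and the `min_df` threshold are all hypothetical. Finding definitions for the selected phrases is not covered.

```python
import re
from collections import Counter, defaultdict


def candidate_phrases(text, max_len=3):
    """Yield lowercase word n-grams (1 to max_len words) from one document."""
    words = re.findall(r"[a-z]+", text.lower())
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def cluster_key(phrase):
    """Assumed clustering rule: strip a trailing 's' from each longer word."""
    return " ".join(
        w[:-1] if w.endswith("s") and len(w) > 3 else w
        for w in phrase.split()
    )


def build_dictionary(corpus, conventional, min_df=2):
    """Map each cluster's representative phrase to its sorted variants.

    Score (assumed): number of documents that contain the phrase.
    Phrases already in `conventional` or scoring below `min_df` are dropped.
    """
    doc_freq = Counter()
    for doc in corpus:
        # A set ensures each phrase counts once per document.
        doc_freq.update(set(candidate_phrases(doc)))

    clusters = defaultdict(list)
    for phrase, score in doc_freq.items():
        if score >= min_df and phrase not in conventional:
            clusters[cluster_key(phrase)].append((phrase, score))

    result = {}
    for members in clusters.values():
        # Representative: highest score, ties broken toward the shorter phrase.
        best = max(members, key=lambda m: (m[1], -len(m[0])))[0]
        result[best] = sorted(p for p, _ in members)
    return result


corpus = [
    "The knowledge graph links entities.",
    "A knowledge graph stores facts.",
    "Knowledge graphs grow.",
    "Knowledge graphs help search.",
]
conventional = {
    "the", "a", "knowledge", "graph", "graphs", "links", "entities",
    "stores", "facts", "grow", "help", "search",
}
specialized = build_dictionary(corpus, conventional)
print(specialized)
```

In this toy corpus the single words are filtered out because they appear in the stand-in conventional dictionary, while the singular and plural forms of the two-word phrase fall into one cluster.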