SEMANTIC-BASED APPROACH FOR IDENTIFYING TOPICS IN A CORPUS OF TEXT-BASED ITEMS
    Invention patent application
    Status: Under examination (published)

    Publication number: US20130085745A1

    Publication date: 2013-04-04

    Application number: US13632848

    Filing date: 2012-10-01

    CPC classification number: G06F17/2785

    Abstract: A method of identifying topics in a corpus that includes a plurality of text-based items begins by extracting keytext from each of the plurality of text-based items, resulting in sets of keytext. The method continues by processing the keytext sets to generate a respective semantic footprint for each of the text-based items, resulting in a plurality of semantic footprints. The semantic footprints are used to calculate similarity values for the text-based items, wherein the similarity values indicate commonality between pairs of the text-based items. The method continues by clustering the text-based items into a number of topic groups, wherein the clustering is influenced by the similarity values, and by generating a topic heading for each of the number of topic groups, resulting in a number of topic headings. Next, the text-based items are grouped into accessible topic groups associated with the topic headings.
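
The steps in the abstract (keytext extraction, semantic footprints, pairwise similarity, clustering, topic headings) can be illustrated with a minimal Python sketch. The abstract does not say how any of these steps are computed, so every technique here is an assumed stand-in rather than the patent's method: keytext is taken to be stopword-filtered word tokens, a footprint is a term-frequency vector, similarity is cosine similarity, clustering is threshold-based single-link grouping, and a heading is the most frequent term in a group. All function names, the stopword list, and the `threshold` parameter are hypothetical.

```python
import math
import re
from collections import Counter

# Assumed stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "as", "the", "of", "in", "to"}


def extract_keytext(text):
    """Keytext stand-in: lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def footprint(keytext):
    """Semantic footprint stand-in: a term-frequency vector."""
    return Counter(keytext)


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(footprints, threshold=0.3):
    """Single-link grouping: merge items whose similarity meets the threshold."""
    n = len(footprints)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(footprints[i], footprints[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def topic_heading(group, footprints):
    """Heading stand-in: most frequent term in the group (ties broken alphabetically)."""
    total = Counter()
    for i in group:
        total.update(footprints[i])
    return max(sorted(total), key=total.get)


def identify_topics(items, threshold=0.3):
    """Run the full pipeline and return a mapping of heading to grouped items."""
    fps = [footprint(extract_keytext(text)) for text in items]
    topics = {}
    for group in cluster(fps, threshold):
        heading = topic_heading(group, fps)
        topics.setdefault(heading, []).extend(items[i] for i in group)
    return topics


if __name__ == "__main__":
    corpus = [
        "The cat chased the mouse",
        "A cat sat near the cat bed",
        "Stocks fell as markets closed",
        "Markets rallied and stocks rose",
    ]
    for heading, members in identify_topics(corpus).items():
        print(heading, "->", members)
```

A term-frequency vector captures only word overlap, so a real system would presumably use a richer semantic representation. The sketch is meant only to show how the similarity values feed the clustering step and how headings are attached to the resulting groups.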

