Method and apparatus for computing similarity between cross-field documents

Invention Grant

US10452696B2 Method and apparatus for computing similarity between cross-field documents (Active)

Patent Title: Method and apparatus for computing similarity between cross-field documents
Application No.: US15190985

Application Date: 2016-06-23
Publication No.: US10452696B2

Publication Date: 2019-10-22
Inventor: Liangwei Wang, Wing Ki Leung, Yang Yang
Applicant: Huawei Technologies Co., Ltd.
Applicant Address: CN Shenzhen
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Current Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Current Assignee Address: CN Shenzhen
Agency: Conley Rose, P.C.
Priority: CN201310722866 2013-12-24
Main IPC: G06F16/35
IPC: G06F16/35; G06F16/93; G06F16/33; G06F16/36

Abstract:

A method includes: storing documents of different fields and a relationship between any two documents of different fields; performing word segmentation and stop word removal on the documents to obtain a vocabulary data set; constructing an incidence matrix between the documents of different fields according to the relationship between any two documents of different fields; obtaining a topic cluster of the documents according to the vocabulary data set; obtaining, according to the incidence matrix and the topic cluster, a probability that any topic in the topic cluster appears in any document and a matching weight of that topic for any two different fields; and computing a similarity between any two documents according to those probabilities and the matching weight of each topic for the fields to which the two documents belong.
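
The steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not state any formula, so everything here is an assumption made for illustration: the whitespace tokenizer and tiny stop-word list stand in for a real word segmenter, the 0/1 incidence matrix is one possible encoding of the stored relationships, and the weighted sum over topics is one plausible way to combine per-topic probabilities with per-topic matching weights. The function names (`tokenize`, `incidence_matrix`, `similarity`) are hypothetical, and the topic-clustering step is omitted; the topic probabilities and weights are taken as given inputs.

```python
from typing import List, Sequence, Set, Tuple

# Illustrative stop-word list only; a real system would use a fuller one.
STOP_WORDS: Set[str] = {"a", "an", "and", "of", "the", "to"}


def tokenize(text: str) -> List[str]:
    """Split on whitespace and drop stop words (stand-in for word segmentation)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def incidence_matrix(
    docs_a: Sequence[str],
    docs_b: Sequence[str],
    links: Sequence[Tuple[str, str]],
) -> List[List[int]]:
    """Return M where M[i][j] is 1 if docs_a[i] is related to docs_b[j], else 0."""
    linked = set(links)
    return [[1 if (a, b) in linked else 0 for b in docs_b] for a in docs_a]


def similarity(
    theta_a: Sequence[float],
    theta_b: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Assumed scoring rule: sum over topics k of w_k * P(k | doc_a) * P(k | doc_b).

    theta_a and theta_b hold per-topic probabilities for the two documents;
    weights holds the per-topic matching weight for the pair of fields.
    """
    if not (len(theta_a) == len(theta_b) == len(weights)):
        raise ValueError("all inputs must have one entry per topic")
    return sum(w * pa * pb for w, pa, pb in zip(weights, theta_a, theta_b))
```

For example, with two topics, `similarity([0.5, 0.5], [1.0, 0.0], [2.0, 1.0])` counts only the first topic, because the second document assigns zero probability to the second topic.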

Published/Granted Documents

US20160306873A1 Method and Apparatus for Computing Similarity Between Cross-Field Documents (Publication/Grant Date: 2016-10-20)

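IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F16/00	Information retrieval; database structures; file system structures
G06F16/30	. Unstructured textual data (document management systems: see G06F 16/93)
G06F16/35	.. Clustering; classification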