Publication No.: US20230034027A1
Publication Date: 2023-02-02
Application No.: US17873371
Filing Date: 2022-07-26
Applicant: KYOCERA Document Solutions Inc.
Inventor: Koji SATO, Kanako MORIMOTO, Rui HAMABE, Kazunori TANAKA, Takuya MIYAMOTO
Abstract: A vector generation unit derives a feature vector of a reference document and a feature vector of each population document. A feature quantity extraction unit performs a dimensionality reduction process on these feature vectors and sets the resulting reduced-dimension values as a first feature quantity; it also derives the cosine similarity between the feature vector of the reference document and the feature vector of each population document as a second feature quantity. To limit the retrieval range, a retrieval range control unit extracts a specific number of population documents, starting from the population document with the shortest distance to the reference document in the feature quantity space of the first feature quantity. A training data extraction unit then extracts, as training data, a specific number of documents from the extracted documents, starting from the document with the lowest cosine similarity.
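
The two-stage selection described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the method claimed in the application: the abstract does not name the dimensionality reduction technique or the distance metric, so `reduce_dims` is a hypothetical stand-in that simply keeps the first two components, and Euclidean distance is assumed. The names `select_training_data`, `k_nearest`, and `n_train` are likewise invented for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (second feature quantity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reduce_dims(vec, dims=2):
    """Hypothetical stand-in for the dimensionality reduction process.

    Assumption: keeps only the first `dims` components. The abstract does
    not specify the technique; a real system might use PCA or similar.
    """
    return vec[:dims]


def select_training_data(reference, population, k_nearest, n_train, reduce=reduce_dims):
    """Return indices of population documents chosen as training data."""
    ref_low = reduce(reference)
    scored = []
    for idx, vec in enumerate(population):
        # First feature quantity: position in the reduced feature space.
        # Euclidean distance is an assumption.
        dist = math.dist(ref_low, reduce(vec))
        # Second feature quantity: cosine similarity on the full vectors.
        cos = cosine_similarity(reference, vec)
        scored.append((idx, dist, cos))

    # Stage 1: limit the retrieval range to the k nearest documents.
    nearest = sorted(scored, key=lambda t: t[1])[:k_nearest]
    # Stage 2: within that range, take the documents with the lowest
    # cosine similarity first.
    chosen = sorted(nearest, key=lambda t: t[2])[:n_train]
    return [idx for idx, _, _ in chosen]


reference = [1.0, 0.0, 0.0]
population = [
    [1.0, 0.1, 0.0],  # close in reduced space, very similar
    [0.9, 0.0, 2.0],  # close in reduced space, dissimilar
    [5.0, 5.0, 0.0],  # far away in reduced space
    [1.0, 0.5, 0.5],  # moderately close, moderately similar
]
selected = select_training_data(reference, population, k_nearest=3, n_train=2)
print(selected)
```

In this toy data, the third document is dropped in the first stage because it lies outside the limited retrieval range, even though its cosine similarity is lower than that of the fourth document. The second stage then picks the least similar documents among those that remain.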