Invention Grant
US08035855B2 Automatic selection of a subset of representative pages from a multi-page document (Active)

Automatic selection of a subset of representative pages from a multi-page document
Abstract:
Provided herein is a method for automatically selecting a subset of pages from a multi-page document for image processing, wherein each selected page is substantially different from all other pages according to certain features of interest, and wherein the combined content of the selected pages approximately represents the content of the entire document. Selected pages are clustered, with each page represented by a feature vector meaningfully related to the task to be performed. A matrix of these feature vectors is analyzed, and basis vectors are extracted from it using rank-reduction techniques. Clustering is performed by subspace projection of the page features onto the basis vectors, with each page assigned to the cluster onto which it maximally projects. Representative pages are then selected from each cluster and can be used as input to a secondary process.
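The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract names only "rank-reduction techniques", so the use of a truncated SVD here is an assumption, as are the function name `select_representative_pages`, the parameter `n_clusters`, and the rule that picks the page with the largest projection as each cluster's representative.

```python
import numpy as np


def select_representative_pages(features, n_clusters):
    """Cluster pages by subspace projection and pick one page per cluster.

    features   : (n_pages, n_features) array, one feature vector per page.
    n_clusters : number of basis vectors (and hence clusters) to keep.
    Returns (labels, representatives), where representatives maps each
    non-empty cluster index to the index of its chosen page.
    """
    F = np.asarray(features, dtype=float)

    # Rank reduction (assumed here to be a truncated SVD): the rows of vt
    # are orthonormal basis vectors in feature space, ordered by
    # decreasing singular value.
    _, _, vt = np.linalg.svd(F, full_matrices=False)
    basis = vt[:n_clusters]

    # Subspace projection: magnitude of each page's projection onto each
    # basis vector. The absolute value is taken because the sign of a
    # singular vector is arbitrary.
    proj = np.abs(F @ basis.T)

    # Each page joins the cluster onto which it projects maximally.
    labels = proj.argmax(axis=1)

    # Assumed selection rule: within a cluster, the representative is the
    # page with the largest projection onto that cluster's basis vector.
    representatives = {}
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        if members.size:
            best = members[proj[members, c].argmax()]
            representatives[c] = int(best)
    return labels, representatives
```

Calling `select_representative_pages(F, k)` on a page-by-feature matrix `F` returns a cluster label for every page and one page index per non-empty cluster. Because this selection rule favors pages with large feature norms, normalizing each row of `F` first may be preferable for some tasks.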