Invention Grant
- Patent Title: Automatic selection of a subset of representative pages from a multi-page document
- Application No.: US12024208
- Application Date: 2008-02-01
- Publication No.: US08035855B2
- Publication Date: 2011-10-11
- Inventor: Vishal Monga, Raja Bala
- Applicant: Vishal Monga, Raja Bala
- Applicant Address: Norwalk, CT, US
- Assignee: Xerox Corporation
- Current Assignee: Xerox Corporation
- Current Assignee Address: Norwalk, CT, US
- Agency: Fleit Gibbons Gutman Bongini & Bianco P.L.
- Agent: Philip E. Blair
- Main IPC: G06F15/00
- IPC: G06F15/00

Abstract:
Provided herein is a method for automatically selecting a subset of pages from a multi-page document for image processing, such that each selected page is substantially different from the other selected pages according to certain features of interest, and the combined content of the selected pages approximately represents the content of the entire document. The pages are clustered, with each page represented by a feature vector meaningfully related to the task to be performed. The feature vectors are assembled into a matrix, and basis vectors are extracted from this matrix using rank-reduction techniques. Clustering is performed by subspace projection of the page features onto the basis vectors, with each page assigned to the cluster whose basis vector it projects onto maximally. Representative pages are then selected from each cluster and can be used as input to a secondary process.
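The abstract does not name a particular rank-reduction technique or a rule for choosing the representative page, so the following is only a minimal sketch under stated assumptions. It assumes nonnegative matrix factorization (NMF, computed with the Lee and Seung multiplicative updates) as the rank-reduction step, takes the rows of the factor `W` as the basis vectors, assigns each page to the unit-normalized basis vector onto which its feature vector projects most strongly, and (as a further assumption) picks as each cluster's representative the page with the largest projection onto that cluster's basis vector. All function names (`nmf`, `cluster_by_projection`, `select_representatives`) and the toy feature values are hypothetical illustrations, not the patented implementation.

```python
import math
import random

Matrix = list[list[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nmf(x: Matrix, k: int, iters: int = 500, seed: int = 0, eps: float = 1e-9):
    """Assumed rank reduction: X (pages x features) ~= H (pages x k) @ W (k x features),
    fitted with Lee-Seung multiplicative updates. Rows of W serve as basis vectors."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    h = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    w = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # W <- W * (H^T X) / (H^T H W), elementwise
        ht = transpose(h)
        num, den = matmul(ht, x), matmul(matmul(ht, h), w)
        w = [[wi * nu / (de + eps) for wi, nu, de in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
        # H <- H * (X W^T) / (H W W^T), elementwise
        wt = transpose(w)
        num, den = matmul(x, wt), matmul(h, matmul(w, wt))
        h = [[hi * nu / (de + eps) for hi, nu, de in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
    return w, h


def cluster_by_projection(x: Matrix, k: int):
    """Project every page onto each unit-length basis vector; assign the page
    to the basis vector (cluster) with the largest projection."""
    basis, _ = nmf(x, k)
    unit = []
    for b in basis:
        norm = math.sqrt(sum(v * v for v in b)) or 1.0
        unit.append([v / norm for v in b])
    proj = [[sum(p * u for p, u in zip(page, ub)) for ub in unit] for page in x]
    labels = [max(range(k), key=lambda c: row[c]) for row in proj]
    return labels, proj


def select_representatives(x: Matrix, k: int):
    """Hypothetical selection rule: per cluster, keep the page that projects
    most strongly onto that cluster's basis vector."""
    labels, proj = cluster_by_projection(x, k)
    best: dict[int, int] = {}
    for page, c in enumerate(labels):
        if c not in best or proj[page][c] > proj[best[c]][c]:
            best[c] = page
    return sorted(best.values()), labels


if __name__ == "__main__":
    # Toy 4-dimensional page features (made-up values): three text-like pages
    # followed by three image-like pages.
    pages = [
        [0.9, 0.1, 0.0, 0.0],
        [0.8, 0.2, 0.0, 0.0],
        [1.0, 0.0, 0.1, 0.0],
        [0.0, 0.1, 0.9, 0.8],
        [0.0, 0.0, 1.0, 0.9],
        [0.1, 0.0, 0.8, 1.0],
    ]
    reps, labels = select_representatives(pages, k=2)
    print("cluster labels:", labels)
    print("representative pages:", reps)
```

Using NMF keeps the basis vectors nonnegative, so a large projection reads naturally as "this page is mostly made of this component"; a truncated SVD would also work as the rank-reduction step, but its sign ambiguity would require comparing absolute projections instead.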
Published/Granted literature:
- US20090195796A1: AUTOMATIC SELECTION OF A SUBSET OF REPRESENTATIVE PAGES FROM A MULTI-PAGE DOCUMENT (published 2009-08-06)