Abstract:
A method for extracting information from electronic documents, including: learning terms and term variants from a training corpus, wherein the terms and the term variants correspond to a specialized dictionary related to the training corpus; generating a list of negative indicators found in the training corpus; performing a partial match of the terms and the term variants in a set of electronic documents to create initial match results; and performing a negation test using the negative indicators and a positive terms test using the terms and the term variants on the initial match results to remove matches from the initial match results that fail either the negation test or the positive terms test, resulting in final match results.
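The match-then-filter flow described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the dictionary `TERMS`, the list `NEGATIVE_INDICATORS`, the three-word `WINDOW`, and the rule that the positive terms test requires an exact variant are all assumptions made for the example. In the method itself, the terms and negative indicators are learned from the training corpus.

```python
import re

# Hypothetical stand-ins for the learned dictionary and negative indicators.
TERMS = {"pneumonia": ["pneumonia", "pneumonitis"], "fever": ["fever", "febrile"]}
NEGATIVE_INDICATORS = ["no", "denies", "without", "negative for"]
WINDOW = 3  # assumed: number of preceding words inspected by the negation test


def partial_matches(text):
    """Initial match results: every word that starts with a known variant."""
    hits = []
    for term, variants in TERMS.items():
        for variant in variants:
            pattern = r"\b" + re.escape(variant) + r"\w*"
            for m in re.finditer(pattern, text, re.IGNORECASE):
                hits.append((term, m.start(), m.end()))
    return sorted(hits, key=lambda h: h[1])


def is_negated(text, start):
    """Negation test: a negative indicator occurs just before the match."""
    preceding = " ".join(re.findall(r"\w+", text[:start].lower())[-WINDOW:])
    return any(
        re.search(r"\b" + re.escape(neg) + r"\b", preceding)
        for neg in NEGATIVE_INDICATORS
    )


def final_matches(text):
    """Drop initial matches that fail the negation or positive terms test."""
    allowed = {v for variants in TERMS.values() for v in variants}
    kept = []
    for term, start, end in partial_matches(text):
        word = text[start:end].lower()
        if is_negated(text, start):
            continue  # fails the negation test
        if word not in allowed:
            continue  # fails the (assumed) positive terms test
        kept.append((term, word))
    return kept


NOTE = "Patient denies fever. Chest film shows pneumonia. Takes feverfew daily."
print(final_matches(NOTE))
```

In this example the partial match finds "fever", "pneumonia", and "feverfew". The first is removed by the negation test and the last by the positive terms test.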
Abstract:
A method for de-identification of visual media data, including: merging a sequence of images from a set of visual media data into an averaged image; bounding portions of the averaged image that are determined to be relatively fixed, wherein each bounded portion is identified by a corresponding position in the averaged image; generating a template comprising the bounded portions and the corresponding position for each bounded portion in the averaged image; and de-identifying the sequence of images by obfuscating content in the bounded portions.
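A rough sketch of the averaging and obfuscation steps is given below, using small grayscale frames stored as nested lists. The abstract does not say how "relatively fixed" is decided. The rule here (a bright average that every frame stays close to) and the thresholds `max_dev` and `min_level` are assumptions, and the sketch draws a single bounding box rather than one box per portion.

```python
def average_image(frames):
    """Merge a sequence of equally sized grayscale frames into one averaged image."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(width)]
            for y in range(height)]


def fixed_mask(frames, avg, max_dev=2.0, min_level=128):
    """Assumed rule: a pixel is fixed if it is bright and barely changes."""
    return [[avg[y][x] >= min_level
             and all(abs(f[y][x] - avg[y][x]) <= max_dev for f in frames)
             for x in range(len(avg[0]))]
            for y in range(len(avg))]


def bounding_box(mask):
    """Return (top, left, bottom, right) around all fixed pixels, or None."""
    points = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    ys = [p[0] for p in points]
    xs = [p[1] for p in points]
    return (min(ys), min(xs), max(ys), max(xs))


def deidentify(frames, box, fill=0):
    """Obfuscate the bounded portion in a copy of every frame."""
    cleaned = [[row[:] for row in f] for f in frames]
    if box is None:
        return cleaned
    top, left, bottom, right = box
    for frame in cleaned:
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                frame[y][x] = fill
    return cleaned


def make_frame(k):
    """Synthetic 4x6 frame: a bright fixed overlay in the top-left, moving content elsewhere."""
    return [[255 if (y == 0 and x < 3) else (k * 40 + x * 10 + y * 5) % 100
             for x in range(6)] for y in range(4)]


frames = [make_frame(k) for k in range(3)]
avg = average_image(frames)
template = bounding_box(fixed_mask(frames, avg))  # position of the fixed portion
cleaned = deidentify(frames, template)
print(template)
```

The template stores only positions, so in principle it could be reused on other sequences that share the same layout.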
Abstract:
A method for inferring disease similarity by similarity retrieval of electrocardiogram time-series, comprising: acquiring user ECG waveforms that depict multiple cardiac cycles of the hearts of multiple users and are stored in a database; pre-processing each of the user ECG waveforms to isolate sets of single cardiac cycles corresponding to the different heart rates detected in that waveform, wherein each single cardiac cycle corresponds to one detected heart rate; acquiring patient ECG waveforms depicting multiple cardiac cycles of the heart of a query patient; and pre-processing the patient ECG waveforms to isolate sets of single cardiac cycles corresponding to the different heart rates detected in each patient ECG waveform, wherein each single cardiac cycle corresponds to one detected heart rate.
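One way the pre-processing step might look is sketched below: the waveform is split at R-peaks, and each resulting cycle is filed under the heart rate implied by its length. The threshold-based peak detector, the function names, and the 10 bpm `bucket` width are assumptions for illustration. The abstract does not specify how cycles or heart rates are detected.

```python
from collections import defaultdict


def r_peaks(signal, threshold):
    """Naive R-peak detector: local maxima above a fixed threshold (assumed)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > threshold
            and signal[i] > signal[i - 1]
            and signal[i] >= signal[i + 1]]


def cycles_by_heart_rate(signal, fs, threshold, bucket=10):
    """Group single cardiac cycles by heart rate, rounded down to `bucket` bpm."""
    peaks = r_peaks(signal, threshold)
    groups = defaultdict(list)
    for start, end in zip(peaks, peaks[1:]):
        bpm = 60.0 * fs / (end - start)  # one cycle spans (end - start) samples
        groups[int(bpm // bucket) * bucket].append(signal[start:end])
    return dict(groups)


def beat(n):
    """Synthetic beat: a unit spike followed by n - 1 flat samples."""
    return [1.0] + [0.0] * (n - 1)


# Two beats at 60 bpm followed by two beats at 120 bpm, sampled at 100 Hz.
ecg = [0.0] + beat(100) + beat(100) + beat(50) + beat(50) + [1.0, 0.0]
groups = cycles_by_heart_rate(ecg, fs=100, threshold=0.5)
print({bpm: len(cycles) for bpm, cycles in groups.items()})
```

The same routine would be applied to both the stored user waveforms and the query patient's waveforms, so that cycles are compared at matching heart rates.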
Abstract:
The embodiments of the invention provide an apparatus, method, etc. for a camera-equipped writing tablet that digitizes form entries. More specifically, a data capture apparatus comprises a form holder and an image capture device connected to the form holder. The image capture device is positioned to capture an image of a form on the form holder, wherein the form may be a paper form. A clip is connected to the form holder, and the image capture device is mounted to the clip. The apparatus further comprises an electronic pen connected to the form holder, wherein the form holder comprises an electronic pen capture device that electronically captures marks made on the form with the electronic pen. The electronic pen is a combination ink and electronic pen that is temporarily connected to the form holder.
Abstract:
A method for matching objects based on the spatial layout of their regions, using a shape similarity model that detects similarity between general 2D objects. The method uses the shape similarity model to determine whether two objects are similar in four steps: logical region generation, in which logical regions are automatically derived from information in the objects to be matched; region correspondence, in which a correspondence is established between the regions on the objects; pose computation, in which the individual transforms relating corresponding regions are recovered; and pose verification, in which the extent of spatial similarity is measured by projecting one document onto the other using the computed pose parameters. The method can be carried out in a microprocessor-based system programmed to perform it.
Abstract:
A method of locating handwritten words in handwritten text images under a variety of transformations, including changes in document orientation, skew, noise, and changes in the handwriting style of a single author. The method avoids a detailed search of the image for every word by pre-computing relevant information in a hash table and indexing the table for word localization. Both the hash table construction and the indexing can be done as fast operations taking time quadratic in the number of basis points. Generally, the method involves four stages: (1) pre-processing, where features for word localization are extracted; (2) image hash table construction; (3) indexing, where query word features are used to look up candidate locations in the hash table; and (4) verification, where the query word is projected and registered with the underlying word at the candidate locations. The method has applications in digital libraries, handwriting tokenization, document management, and OCR systems.
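Stages (2) and (3) can be illustrated with a deliberately simplified sketch. It swaps in plain quantized feature vectors where the method would use features computed relative to basis points, and it omits pre-processing and verification. The quantization step `BIN`, the function names, and the location labels are hypothetical.

```python
from collections import Counter, defaultdict

BIN = 0.25  # assumed quantization step, so that similar features share a key


def quantize(feature):
    """Map a feature vector to a discrete hash key."""
    return tuple(round(v / BIN) for v in feature)


def build_hash_table(page_features):
    """Stage 2: pre-compute, for every key, the word locations that produce it."""
    table = defaultdict(set)
    for location, features in page_features.items():
        for feature in features:
            table[quantize(feature)].add(location)
    return table


def index(table, query_features):
    """Stage 3: let each query feature vote for the locations stored under its key."""
    votes = Counter()
    for feature in query_features:
        for location in table.get(quantize(feature), ()):
            votes[location] += 1
    return votes.most_common()


# Hypothetical features extracted from two word locations on a page.
page_features = {
    "line1/word3": [(0.10, 0.52), (0.77, 0.30), (1.30, 0.05)],
    "line4/word1": [(0.10, 0.52), (2.00, 1.00)],
}
table = build_hash_table(page_features)

# A slightly perturbed rendition of the word at line1/word3.
query = [(0.12, 0.49), (0.80, 0.27), (1.28, 0.02)]
candidates = index(table, query)
print(candidates)
```

Only the top-voted candidate locations would then go on to the verification stage, which is what lets the method skip a detailed search of the whole image.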
Abstract:
A network server is provided that interfaces a client with selected database sites from a plurality of database sites. The network server comprises a meta-database (including both text information and multimedia information), a search agent, and a refining module. The search agent indexes the meta-database with a user query obtained from the client, and then distributes queries, developed pursuant to such indexing, to the selected ones of the plurality of database sites. In turn, database site information (responsive to the distributed queries) is retrieved from the selected ones of the plurality of database sites. The refining module is used to update the meta-database with the database relevancy information.
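The interplay between the three components might be sketched as below. Everything concrete here is an assumption made for illustration: the keyword-to-site relevance scores, the `limit` on how many sites are queried, the 50/50 score-blending rule in `refine`, and the use of plain callables to stand in for remote database sites.

```python
from collections import defaultdict


class MetaDatabase:
    """Assumed layout: for each keyword, a relevance score per database site."""

    def __init__(self):
        self.relevance = defaultdict(dict)

    def rank_sites(self, query, limit):
        """Index the meta-database with the query and pick the best sites."""
        scores = defaultdict(float)
        for keyword in query.lower().split():
            for site, score in self.relevance.get(keyword, {}).items():
                scores[site] += score
        return sorted(scores, key=lambda s: (-scores[s], s))[:limit]


def refine(meta, query, results):
    """Refining module: blend each site's old score with its latest hit count."""
    for keyword in query.lower().split():
        for site, hits in results.items():
            old = meta.relevance[keyword].get(site, 0.0)
            meta.relevance[keyword][site] = 0.5 * old + 0.5 * len(hits)


def search(meta, sites, query, limit=2):
    """Search agent: query only the selected sites, then update the meta-database."""
    selected = meta.rank_sites(query, limit)
    results = {name: sites[name](query) for name in selected}
    refine(meta, query, results)
    return results


# Hypothetical database sites, each modeled as a function from query to hits.
sites = {
    "maps": lambda q: ["map-1", "map-2"],
    "news": lambda q: [],
    "photos": lambda q: ["img-7"],
}
meta = MetaDatabase()
meta.relevance["bridge"] = {"maps": 1.0, "news": 1.0, "photos": 0.6}

first = search(meta, sites, "bridge")
second = search(meta, sites, "bridge")
print(list(first), list(second))
```

In this run the "news" site returns nothing for the first query, so its score drops and the second query goes to "photos" instead.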