摘要:
Systems and methods are described that facilitate performing feature extraction across multiple received input features to reduce computational overhead associated with feature processing related to, for instance, optical character recognition. Input feature information can be unfolded and concatenated to generate an aggregated input matrix, which can be convolved with a kernel matrix to produce output feature information for multiple output features concurrently.
摘要:
The present invention provides a unique system and method that facilitates obtaining high performance and more secure HIPs. More specifically, the HIPs can be generated in part by caching pre-rendered characters and/or pre-rendered arcs as bitmaps in binary form and then selecting any number of the characters and/or arcs randomly to form a HIP sequence. The warp field can be pre-computed and converted to integers in binary form and can include a plurality of sub-regions. The warp field can be cached as well. Any one sub-region can be retrieved from the warp field cache and mapped to the HIP sequence to warp the HIP. Thus, the pre-computed warp field can be used to warp multiple HIP sequences. The warping can occur in binary form and at a high resolution to mitigate reverse engineering. Following, the warped HIP sequence can be down-sampled and texture and/or color can be added as well to improve its appearance.
摘要:
A method and system for implementing character recognition is described herein. An input character is received. The input character is composed of one or more logical structures in a particular layout. The layout of the one or more logical structures is identified. One or more of a plurality of classifiers are selected based on the layout of the one or more logical structures in the input character. The entire character is input into the selected classifiers. The selected classifiers classify the logical structures. The outputs from the selected classifiers are then combined to form an output character vector.
摘要:
The subject invention leverages a scalable character glyph hash table to provide an efficient means to identify print characters where the character glyphs are identical over independent presentation. The hash table allows for quick determinations of glyph meta data as, for example, a pre-filter to traditional OCR techniques. The hash table can be trained for a particular environment, user, language, character set (e.g., alphabet), document type, and/or specific document and the like. This permits substantial flexibility and increases in speed in identifying unknown glyphs. The hash table itself can be composed of single or multiple tables that have a specific optimization purpose. In one instance of the subject invention, traditional OCR techniques can be utilized to update the hash tables as needed based on glyph frequency. This keeps the hash tables from growing by limiting updates that reduce its performance, while adding frequently determined glyphs to increase the pre-filter performance.
摘要:
A computer-implemented word processing system comprises an interface component that receives a features vector associated with an electronic document. An analysis component communicatively coupled to the interface component analyzes the features vector and determines a viewing mode in which to display the electronic document. In accordance with one aspect of the subject invention, the viewing mode can be one of a conventional viewing mode and a viewing mode associated with enhanced readability.
摘要:
The subject invention provides a unique system and method that facilitates creating HIP challenges (HIPs) that can be readily segmented and solved by human users but that are too difficult for non-human users. More specifically, the system and method utilize a variety of unique alteration techniques that are segmentation-based. For example, the system and method employ thicker arcs or occlusions that do not intersect characters already placed in the HIP. The thickness of the arc can be measured or determined by the thickness of the characters in the HIP. In addition to increasing the thickness, the arcs can be lengthened because longer arcs tend to resemble pieces of characters and may be harder to erode. Usability maps can be generated and used to selectively place clutter or occlusions and to selectively warp characters or the character sequence to facilitate human recognition of the characters.
摘要:
A printer, scanner device and methods for using same are described herein. A printer device may include a dedicated input that, when actuated, generates and sends a request to a computer for known data or a predetermined print job, e.g., schedule information from a personal information management (PIM) application. A scanner device may include another dedicated input that, when actuated, automatically scans a document fed to the device by the user and sends the scanned image to IM (or other) software on a computer, bypassing the need to manipulate the scanned image using scanner software. The device may be used with printed metapaper, which includes a barcode or other indicia identifying the metapaper and corresponds to a stored template image of the metapaper. When the metapaper is rescanned, the scan can be compared to the stored template information to identify changes and synchronize the changes with the IM software.
摘要:
The subject disclosure pertains to systems and methods that facilitate detection of cloaked web pages. Commercial value of search terms and/or queries can be indicative of the likelihood that web pages associated with the keywords or queries are cloaked. Commercial value can be determined based upon popularity of terms and/or advertisement market value as established based upon advertising revenue, fees and the like. Commercial value can be utilized in conjunction with term frequency difference analysis to identify a cloaked page automatically. In addition, commercial values of terms associated with web pages can be used to order or prioritize web pages for further analysis.
摘要:
A printer, scanner device and methods for using same are described herein. A printer device may include a dedicated input that, when actuated, generates and sends a request to a computer for known data or a predetermined print job, e.g., schedule information from a personal information management (PIM) application. A scanner device may include another dedicated input that, when actuated, automatically scans a document fed to the device by the user and sends the scanned image to IM (or other) software on a computer, bypassing the need to manipulate the scanned image using scanner software. The device may be used with printed metapaper, which includes a barcode or other indicia identifying the metapaper and corresponds to a stored template image of the metapaper. When the metapaper is rescanned, the scan can be compared to the stored template information to identify changes and synchronize the changes with the IM software.
摘要:
An approach for identifying suspect network sites in a network environment entails using one or more malware analysis modules to identify distribution sites that host malicious content and/or benign content. The approach then uses a linking analysis module to identify landing sites that are linked to the distribution sites. These linked sites are identified as suspect sites for further analysis. This analysis can be characterized as “bottom up” because it is initiated by the detection of potentially problematic distribution sites. The approach can also perform linking analysis to identify a suspect network site based on a number of alternating paths between that network site and a set of distribution sites that are known to host malicious content. The approach can also train a classifier module to predict whether an unknown landing site is a malicious landing site or a benign landing site.