Abstract:
A training system for a classifier uses both a back-propagation system, which iteratively modifies parameters of functions that provide raw output indications of desired categories, with the parameters modified based on weight decay, and a probability determining system with further parameters that are determined during iterative training. A margin error metric may be combined with weight decay, and a sigmoid is used to calibrate the raw outputs to probability percentages for each category. A method of training such a system involves gathering a training set of inputs and desired corresponding outputs. Classifier parameters are then initialized and a margin error is calculated with respect to the classifier parameters. Weight decay is then used to adjust the parameters. After a selected number of passes through the training set, the parameters are deemed final, and an optimization routine is used to derive a set of probability transducer parameters for use in calculating the probable classification for each input.
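The two training stages described above can be illustrated with a minimal sketch. It makes several assumptions that are not in the abstract: a single linear scorer stands in for the back-propagation network, the margin error is taken to be a hinge-style error, and the sigmoid parameters are fit by plain gradient descent on log loss. All function names and hyperparameters (`train_scorer`, `fit_sigmoid`, `decay`, `margin`) are hypothetical.

```python
import math

def raw_output(w, b, x):
    """Raw (uncalibrated) output of a linear scorer."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_scorer(samples, labels, epochs=50, lr=0.1, decay=0.01, margin=1.0):
    """Stage 1: adjust classifier parameters with a margin error and weight decay.

    Assumption: labels are +1 or -1, and a linear model replaces the network.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):  # a selected number of passes through the set
        for x, y in zip(samples, labels):
            score = raw_output(w, b, x)
            # Weight decay: shrink every weight slightly on each step.
            w = [wi * (1.0 - lr * decay) for wi in w]
            if y * score < margin:  # margin violated, move the score toward y
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def fit_sigmoid(scores, labels, steps=2000, lr=0.05):
    """Stage 2: fit P(y=+1 | s) = 1 / (1 + exp(a*s + c)) to the final raw outputs."""
    a, c = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_c = 0.0
        for s, y in zip(scores, labels):
            t = 1.0 if y > 0 else 0.0
            p = 1.0 / (1.0 + math.exp(a * s + c))
            grad_a += (t - p) * s  # derivative of log loss with respect to a
            grad_c += (t - p)      # derivative of log loss with respect to c
        a -= lr * grad_a / n
        c -= lr * grad_c / n
    return a, c

def probability(a, c, score):
    """Map a raw output to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(a * score + c))
```

In this sketch the classifier parameters are frozen before `fit_sigmoid` runs, which mirrors the order given in the abstract: the parameters are deemed final first, and the probability transducer parameters are derived afterwards.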
Abstract:
A method and system are provided for drawing or writing using an input device in computer systems. The system provides an absolute-coordinate drawing mode, in which the user may draw written strokes to which the computer system display is responsive, and a relative-coordinate repositioning mode, in which the user may figuratively “pick up the pen” and reposition the beginning of a following written stroke. The system enters the drawing mode in response to a user command, and remains in the drawing mode in response to continued written strokes. The system enters the repositioning (cursor) mode after a selected time period occurs with no written strokes, or in response to other user commands. In the drawing mode, a coordinate system for the touchpad is mapped to a coordinate system for a selected window on the display. The mapping is selected so that an initial position of a written stroke in the window corresponds to a final position selected during the cursor mode, so that the user is able to reposition the input device when it has run off, or is about to run off, “the edge” of an input tablet which is relatively smaller than the display. The system uses either a resistive touchpad input tablet, or a capacitive touchpad input tablet in conjunction with a passive stylus input device having a flexible conductive writing tip. The system alternatively provides a signature mode, in which the user is able to make a simple drawing, such as a handwritten signature.
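The drawing-mode mapping can be sketched as a simple offset-and-scale transform. This is only an illustration under stated assumptions: the abstract does not give the form of the mapping, and the function name `make_drawing_map` and the `scale` parameter are hypothetical.

```python
def make_drawing_map(pad_start, window_start, scale=1.0):
    """Build a touchpad-to-window mapping for one drawing session.

    pad_start:    touchpad position where the new stroke begins
    window_start: window position chosen during the cursor mode
    The returned function maps pad_start exactly onto window_start, so the
    stroke begins where the cursor was left.
    """
    px, py = pad_start
    wx, wy = window_start

    def to_window(x, y):
        # Absolute-coordinate mapping: offsets on the pad become offsets
        # in the window, multiplied by the assumed scale factor.
        return (wx + scale * (x - px), wy + scale * (y - py))

    return to_window
```

For example, `make_drawing_map((10, 20), (300, 400), scale=2.0)` yields a function for which the pad point `(10, 20)` lands on the window point `(300, 400)`, and every unit of pad movement moves the pen two units in the window.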
Abstract:
A method for incremental recognition of ideographic handwriting comprises in order the steps of: (1) entering in a natural stroke order at least one stroke of an ideographic character from a computer entry tablet; (2) providing the at least one stroke to an incremental character recognizer, which produces a hypothesis list of at least one candidate character; (3) displaying a hypothesis list of candidate characters containing the at least one stroke; (4) selecting a correct character from among the candidate characters on the hypothesis list if a correct character appears thereon; (5) entering in natural stroke order at least one additional stroke of the ideographic character from the computer entry tablet if no candidate character is a correct character; (6) providing the additional stroke(s) to the incremental character recognizer, which produces an updated hypothesis list; (7) displaying the updated hypothesis list of candidate characters containing every stroke; (8) selecting a correct character from among the candidate characters on the updated hypothesis list if a correct character appears thereon; and (9) repeating steps (5) through (8) until a correct character is selected from the updated hypothesis list.
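The loop in steps (1) through (9) can be sketched with a toy recognizer. The assumptions here are not from the abstract: strokes are reduced to one-letter codes (`h` for horizontal, `v` for vertical), the stroke table `CHAR_STROKES` is a tiny illustrative stand-in for a real recognizer, and a candidate "contains" the entered strokes when they form a prefix of its stroke sequence. The `pick` callback is a hypothetical stand-in for the user's selection.

```python
# Toy stroke table: simplified stroke codes, for illustration only.
CHAR_STROKES = {
    "十": ["h", "v"],
    "土": ["h", "v", "h"],
    "王": ["h", "h", "v", "h"],
}

def hypotheses(strokes):
    """Return candidate characters whose stroke sequence starts with `strokes`."""
    n = len(strokes)
    return sorted(ch for ch, seq in CHAR_STROKES.items() if seq[:n] == strokes)

def recognize(stroke_source, pick):
    """Run the incremental loop.

    stroke_source: iterable of strokes in natural stroke order
    pick:          callback that receives the hypothesis list and returns the
                   chosen character, or None to keep writing
    Returns (chosen character or None, number of strokes entered).
    """
    entered = []
    for stroke in stroke_source:
        entered.append(stroke)            # steps (1) and (5): enter a stroke
        candidates = hypotheses(entered)  # steps (2) and (6): update the list
        choice = pick(candidates)         # steps (3)-(4) and (7)-(8)
        if choice is not None:
            return choice, len(entered)
    return None, len(entered)
```

With this table, a user writing 土 can select it after the first stroke, because all three characters begin with a horizontal stroke and therefore appear on the first hypothesis list.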
Abstract:
An architecture monitors interaction data (e.g., search queries, query results and click-through rates), and provides users with links to other users that fall into similar categories with respect to the monitored activities (e.g., providing links to individuals and groups that share common interests and/or profiles). A search engine can be interactively coupled with one or more social networks, and maps individuals and/or groups within respective social networks to subsets of categories associated with searches. A database stores mapped information which can be continuously updated and reorganized as links within the system mapping become stronger or weaker. The architecture can comprise a social network system that includes a database for mapping search-related information to an entity of a social network, and a search component for processing a search query for search results and returning a link to an entity of a social network based on the search query.
Abstract:
A reliable automated malware classification approach with substantially low false positive rates is provided. Graph-based local and/or global file relationships are used to improve malware classification along with a feature selection algorithm. File relationships such as containing, creating, copying, downloading, modifying, etc. are used to assign malware probabilities and simultaneously reduce the false positive and false negative rates on executable files.
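One way to picture how file relationships could influence malware probabilities is a simple propagation over a relationship graph. This is a loose sketch and not the method of the abstract, which does not say how probabilities are assigned: the blending rule, the `alpha` weight, and the function name `propagate` are all assumptions.

```python
def propagate(base, edges, alpha=0.5, rounds=10):
    """Blend each file's own score with the mean score of its related files.

    base:  dict of file name -> initial malware probability from some classifier
    edges: list of (file_a, relationship, file_b) tuples, e.g. "creating";
           every file named in an edge must also appear in `base`
    alpha: assumed weight given to the neighbours' mean score
    """
    neighbors = {name: set() for name in base}
    for a, _relationship, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    probs = dict(base)
    for _ in range(rounds):
        updated = {}
        for name in probs:
            related = neighbors[name]
            if related:
                mean = sum(probs[r] for r in related) / len(related)
                updated[name] = (1 - alpha) * base[name] + alpha * mean
            else:
                updated[name] = base[name]  # isolated files keep their score
        probs = updated
    return probs
```

In this sketch a file with an ambiguous score that was created by a highly suspicious file ends up with a higher probability, while an unrelated file is left unchanged.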
Abstract:
A document-term matrix may be generated based on a corpus. A term representation matrix may be generated based on modifying a plurality of elements of the document-term matrix based on antonym information included in the corpus. Similarities may be determined based on a plurality of elements of the term representation matrix.
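A small sketch can make the three steps concrete. It assumes, beyond what the abstract states, that each "document" is a thesaurus-style entry listing synonyms and antonyms, that the antonym-based modification is a sign flip on the antonym entries, and that similarity is cosine similarity between term columns. The names `term_matrix` and `similarity` are hypothetical.

```python
import math

def term_matrix(entries, vocab):
    """Build one row per entry; synonyms get +1.0 and antonyms get -1.0.

    entries: list of (synonyms, antonyms) pairs of term lists
    vocab:   ordered list of all terms (defines the column order)
    """
    column = {term: i for i, term in enumerate(vocab)}
    rows = []
    for synonyms, antonyms in entries:
        row = [0.0] * len(vocab)
        for term in synonyms:
            row[column[term]] = 1.0
        for term in antonyms:
            row[column[term]] = -1.0  # assumed antonym modification: flip the sign
        rows.append(row)
    return rows

def similarity(matrix, vocab, term_a, term_b):
    """Cosine similarity between the column vectors of two terms."""
    ia, ib = vocab.index(term_a), vocab.index(term_b)
    va = [row[ia] for row in matrix]
    vb = [row[ib] for row in matrix]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0
```

With the entries `(["hot", "warm"], ["cold"])` and `(["cold", "chilly"], ["hot"])`, the terms "hot" and "cold" receive a negative similarity while "hot" and "warm" receive a positive one, which is the kind of distinction the antonym information is meant to provide.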
Abstract:
Various embodiments promote the discoverability of data that can be contained within a database. In one or more embodiments, data within a database is organized in a structure having a schema. The structure and data can be processed in a manner that renders one or more pseudo-documents, each of which constitutes a sub-structure that can be indexed. Once produced and indexed, the pseudo-documents constitute a set of searchable objects, each of which relationally points back to its associated structure within the database. Searches can then be performed against the pseudo-documents, which in turn return a set of search results. The set of search results can include multiple sub-sets of pseudo-documents, each sub-set of which is associated with a different structure.
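The idea of rendering and indexing pseudo-documents can be sketched as follows. The sketch assumes, for illustration only, that each database row becomes one pseudo-document, that rows carry an `id` column, and that the index is a plain inverted index over whitespace-separated tokens. None of the names below come from the abstract.

```python
def render_pseudo_documents(table_name, rows):
    """Turn each row into a pseudo-document that points back to its source."""
    docs = []
    for row in rows:
        text = " ".join(str(value) for key, value in row.items() if key != "id")
        docs.append({"text": text.lower(), "ref": (table_name, row["id"])})
    return docs

def build_index(docs):
    """Inverted index: token -> set of positions in the docs list."""
    index = {}
    for position, doc in enumerate(docs):
        for token in doc["text"].split():
            index.setdefault(token, set()).add(position)
    return index

def search(index, docs, term):
    """Return the (table, id) back-references of matching pseudo-documents."""
    return sorted(docs[i]["ref"] for i in index.get(term.lower(), ()))
```

Because every hit carries a `(table, id)` reference, the results of one search can be grouped by table, which corresponds to the sub-sets associated with different structures mentioned above.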
Abstract:
A method described herein includes an act of receiving a query from a user, wherein the query is configured to search over a plurality of documents belonging to a particular domain. The method also includes an act of providing data to the user for display on a display screen of a computing apparatus, wherein the data is provided based at least in part upon a statistical analysis undertaken with respect to structured data pertaining to the particular domain, wherein the structured data is based at least in part upon data included in the plurality of documents.
Abstract:
A method of identifying a malware file using multiple classifiers is disclosed. The method includes receiving a file at a client computer. The file includes static metadata. A set of metadata classifier weights is applied to the static metadata to generate a first classifier output. A dynamic classifier is initiated to evaluate the file and to generate a second classifier output. The method includes automatically identifying the file as potential malware based on at least the first classifier output and the second classifier output.
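The combination of the two classifier outputs can be sketched as below. The abstract does not say how the outputs are produced or combined, so everything specific here is an assumption: the logistic form of the metadata classifier, the behaviour-counting stand-in for the dynamic classifier, the averaging rule, the threshold, and all names (`metadata_output`, `dynamic_output`, `SUSPICIOUS_BEHAVIORS`).

```python
import math

# Hypothetical behaviours a dynamic classifier might watch for.
SUSPICIOUS_BEHAVIORS = {"writes_autorun_key", "injects_into_process"}

def metadata_output(metadata, weights, bias=0.0):
    """First classifier output: weighted sum of static metadata features,
    squashed to the range (0, 1) with a logistic function."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in metadata.items())
    return 1.0 / (1.0 + math.exp(-z))

def dynamic_output(observed_behaviors):
    """Second classifier output: fraction of the watched behaviours observed
    while the file was evaluated (a stand-in for a real dynamic classifier)."""
    hits = SUSPICIOUS_BEHAVIORS.intersection(observed_behaviors)
    return len(hits) / len(SUSPICIOUS_BEHAVIORS)

def is_potential_malware(first_output, second_output, threshold=0.5):
    """Flag the file when the average of both outputs reaches the threshold."""
    return (first_output + second_output) / 2.0 >= threshold
```

Averaging is only one possible rule. A deployment could just as well weight the two outputs differently or require both to exceed separate thresholds.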