Abstract:
A method, system, and computer program product for preventing network service attacks, including processing a message to validate the message for message version and syntax via a security firewall; canonicalizing the message and extracting a message header and body via a converter; converting the body into a Patricia Trie via the converter; and validating the header and the converted body for security via a comparator.
Abstract:
A method and system for generating a decision-tree classifier from a training set of records, independent of the system memory size. The method includes the steps of: generating an attribute list for each attribute of the records, sorting the attribute lists for numeric attributes, and generating a decision tree by repeatedly partitioning the records using the attribute lists. For each node, split points are evaluated to determine the best split test for partitioning the records at the node. Preferably, a gini index and class histograms are used in determining the best splits. The gini index indicates how well a split point separates the records, while the class histograms reflect the class distribution of the records at the node. Also, a hash table is built as the attribute list of the split attribute is divided among the child nodes; the hash table is then used for splitting the remaining attribute lists of the node. The method reduces I/O read time by combining the read for partitioning the records at a node with the read required for determining the best split test for the child nodes. Further, it requires writes of the records at only one out of n levels of the decision tree, where n ≥ 2. Finally, a novel data layout on disk minimizes disk seek time. The I/O optimizations work in a general environment for hierarchical data partitioning. They also work in a multi-processor environment. After the decision tree is generated, any prior-art pruning method may be used for pruning the tree.
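The split evaluation described above can be illustrated with a minimal sketch. It assumes a single numeric attribute list that is already sorted by value, and it scores each candidate split by the weighted gini index of the two class histograms on either side. The function names and the choice of the midpoint between adjacent values as the threshold are assumptions made for illustration, not details taken from the abstract.

```python
from collections import Counter

def gini(hist, total):
    """Gini index of a class histogram: 1 - sum of squared class fractions."""
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in hist.values())

def best_numeric_split(attribute_list):
    """Find the best test 'value <= threshold' for one numeric attribute.

    attribute_list holds (value, class_label) pairs pre-sorted by value.
    Returns (threshold, weighted_gini); threshold is None if no split exists.
    """
    n = len(attribute_list)
    below = Counter()                                        # histogram left of the split
    above = Counter(label for _, label in attribute_list)    # histogram right of the split
    best = (None, float("inf"))
    for i in range(n - 1):
        value, label = attribute_list[i]
        below[label] += 1
        above[label] -= 1
        next_value = attribute_list[i + 1][0]
        if value == next_value:
            continue  # cannot split between two equal values
        n_below = i + 1
        n_above = n - n_below
        score = (n_below / n) * gini(below, n_below) + (n_above / n) * gini(above, n_above)
        if score < best[1]:
            # Assumption: use the midpoint between adjacent values as the threshold.
            best = ((value + next_value) / 2, score)
    return best

records = [(23, "high"), (30, "high"), (40, "low"), (55, "low")]
threshold, score = best_numeric_split(records)
```

Because the histograms are updated incrementally during one pass over the sorted list, every candidate split is scored without rescanning the records.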
Abstract:
Multidimensional similarity join finds pairs of multi-dimensional points that are within some small distance of each other. Databases in domains such as multimedia and time-series can require a high number of dimensions. The ε-k-d-B tree has been proposed as a data structure that scales better as the number of dimensions increases compared to previous data structures such as the R-tree (and variations), grid-file, and k-d-B tree. We present a cost model of the ε-k-d-B tree and use it to optimize the leaf size. This new leaf size is shown to be better in most situations compared to previous work that used a constant leaf size. We present novel parallel procedures for the ε-k-d-B tree. A load-balancing strategy based on equi-depth histograms is shown to work well for uniform or low-skew situations, whereas another based on weighted, equi-depth histograms works far better for high-skew datasets. The latter strategy is only slightly slower than the former strategy for low-skew datasets. The weights for the latter strategy are based on the same cost model that is used to determine optimal leaf sizes.
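As a rough illustration of the unweighted load-balancing idea, the sketch below computes equi-depth histogram boundaries along one dimension so that each partition receives about the same number of points. The function names are hypothetical, and the weighted variant driven by the cost model is not shown because the abstract does not specify it.

```python
from bisect import bisect_right

def equi_depth_boundaries(values, num_partitions):
    """Return num_partitions - 1 cut points that split values into equal-count buckets."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]

def assign_partition(boundaries, value):
    """Index of the partition (for example, a processor) that owns value."""
    return bisect_right(boundaries, value)

coords = [1, 2, 3, 4, 5, 6, 7, 8]
cuts = equi_depth_boundaries(coords, 4)
owners = [assign_partition(cuts, v) for v in coords]
```

With skewed data, equal counts do not imply equal join work, which is the motivation the abstract gives for weighting the histogram.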
Abstract:
The present invention relates to analysis of large, disk-resident data sets using a Patient Rule Induction Method (PRIM) in a computer system wherein a relational data table is initially received. The relational data table includes continuous attributes, discrete attributes, a meta parameter and a cost attribute. The cost attribute represents cost output values based on continuous attribute values and discrete attribute values as inputs. A hyper-rectangle is then formed which encloses a multi-dimensional space defined by the continuous attribute values and the discrete attribute values. The continuous attribute values and the discrete attribute values are represented as points within the multi-dimensional space. A plurality of points along edges of the hyper-rectangle are then removed based on an average of the cost output value from the plurality of points until a count of the points enclosed within the hyper-rectangle equals the meta parameter. Discrete attribute values and continuous attribute values which were removed from the hyper-rectangle are next added along edges of the hyper-rectangle until a sum of the cost output value over the multi-dimensional space enclosed by the hyper-rectangle changes. In a further embodiment, a parallel architecture computer system calculates the cost attribute average values over the plurality of points enclosed by the hyper-rectangle in parallel. The invention analyzes large disk-resident data sets without having to load the data set into main memory and can be practiced on a parallel computer architecture or a symmetric multi-processor architecture to improve performance.
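The peeling phase can be sketched in memory as follows. This is a simplified illustration under several assumptions that the abstract does not state: only continuous attributes are handled, each step peels a fraction `alpha` of the remaining points from one edge, and the edge chosen is the one that leaves the highest average cost inside the box. The names `peel`, `alpha`, and `min_points` (standing in for the meta parameter) are hypothetical, and the disk-resident and pasting aspects are not modeled.

```python
def peel(points, costs, min_points, alpha=0.1):
    """Shrink a box around points until min_points remain.

    points: list of equal-length tuples of floats (continuous attributes only).
    costs:  list of floats, one cost output value per point.
    Returns (box, inside): per-dimension (low, high) bounds and surviving indices.
    """
    if min_points < 1:
        raise ValueError("min_points must be at least 1")
    inside = list(range(len(points)))
    dims = len(points[0])
    while len(inside) > min_points:
        # Peel a small fraction, but never go below min_points.
        k = max(1, int(alpha * len(inside)))
        k = min(k, len(inside) - min_points)
        best_mean, best_keep = None, None
        for d in range(dims):
            ordered = sorted(inside, key=lambda i: points[i][d])
            # Candidate peels: drop the k lowest or the k highest along dimension d.
            for keep in (ordered[k:], ordered[:-k]):
                mean = sum(costs[i] for i in keep) / len(keep)
                if best_mean is None or mean > best_mean:
                    best_mean, best_keep = mean, keep
        inside = best_keep
    box = [
        (min(points[i][d] for i in inside), max(points[i][d] for i in inside))
        for d in range(dims)
    ]
    return box, inside

pts = [(float(x),) for x in range(10)]
cost = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
box, inside = peel(pts, cost, min_points=3)
```

Each step needs only the average cost over the candidate remainder, which is the quantity the abstract describes computing in parallel.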
Abstract:
The real-time electronic service interaction management system and method of the invention facilitate presentation of information that increases the probability of desirable target interaction. Desirable target interaction includes metrics associated with campaign objectives (e.g., maximize profits) and constraints (e.g., budget constraints). The system and method automatically develop interaction motivation plans that determine a stimulation action (e.g., information presented to a target). An interaction motivation plan is a procedure utilized to determine a stimulation action to present to a target with specific attributes under certain system attributes. The present invention adaptively optimizes and tests interaction motivation plans to permit automated learning about target individual interaction activities and accordingly modifies interaction motivation plans both in real time and over the lifetime of a campaign. It also facilitates the development of behavioral models that provide predictions associated with the probability of target behavior based upon a set of target characteristics and system attributes.
Abstract:
A multicasting system for multicasting window events to various application programs running on a computer system, each such program having an application window. A global control program runs on the computer system and has a global control window. Through the global control program, a user selects one or more of the application programs to receive incoming window events. Later, when the global control window is active, any incoming window event is received in that window. The global control program automatically multicasts each such event to every application program that the user has selected to receive incoming window events. Events may be multicast directly to child windows of the various application windows. The global control window may have a global child window that receives incoming window events; such events are multicast directly to selected child windows of the application programs. The application programs may be resident locally or on a remote computer system. If window events are received out of sequence, the global control program may either ignore them or resequence them for proper operation.
Abstract:
The present invention is an apparatus and method for classifying high-dimensional sparse datasets. A raw data training set is flattened by converting it from categorical representation to a boolean representation. The flattened data is then used to build a class model on which new data not in the training set may be classified. In one embodiment, the class model takes the form of a decision tree, and large itemsets and cluster information are used as attributes for classification. In another embodiment, the class model is based on the nearest neighbors of the data to be classified. An advantage of the invention is that, by flattening the data, classification accuracy is increased by eliminating artificial ordering induced on the attributes. Another advantage is that the use of large itemsets and clustering increases classification accuracy.
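The flattening step can be pictured with a short sketch: every (attribute, value) pair seen in the training set becomes its own boolean column, so no ordering among the categorical values is implied. The function name and the dictionary record layout are assumptions made for illustration.

```python
def flatten(records):
    """Convert categorical records into boolean rows.

    records: list of dicts mapping attribute name -> categorical value.
    Returns (columns, rows), where columns is a sorted list of
    (attribute, value) pairs and rows holds one list of booleans per record.
    """
    columns = sorted({(attr, value) for record in records for attr, value in record.items()})
    rows = [
        [(attr, value) in record.items() for attr, value in columns]
        for record in records
    ]
    return columns, rows

training = [
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "S"},
]
columns, rows = flatten(training)
```

The resulting rows are typically sparse, since each record sets only one column per original attribute.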
Abstract:
The present invention is directed to an improved data clustering method and apparatus for use in data mining operations. The present invention determines the pattern vectors of a k-d tree structure which are closest to a given prototype cluster by pruning prototypes through geometrical constraints, before a k-means process is applied to the prototypes. For each sub-branch in the k-d tree, a candidate set of prototypes is formed from the parent of a child node. The minimum and maximum distances from any point in the child node to each prototype in the candidate set are determined. The smallest of the maximum distances found is compared to the minimum distance of each prototype in the candidate set. Those prototypes with a minimum distance greater than the smallest of the maximum distances are pruned or eliminated. Pruning the remote prototypes reduces the number of distance calculations for the k-means process, significantly reducing the overall computation time.
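The pruning test can be sketched as follows, assuming each k-d tree node is summarized by an axis-aligned bounding box and that distances are Euclidean. Both assumptions, and all function names, are illustrative and not taken from the abstract.

```python
import math

def min_dist(box_min, box_max, p):
    """Smallest distance from prototype p to any point in the box."""
    return math.sqrt(sum(
        max(lo - x, 0.0, x - hi) ** 2
        for lo, hi, x in zip(box_min, box_max, p)
    ))

def max_dist(box_min, box_max, p):
    """Largest distance from prototype p to any point in the box (farthest corner)."""
    return math.sqrt(sum(
        max(abs(x - lo), abs(x - hi)) ** 2
        for lo, hi, x in zip(box_min, box_max, p)
    ))

def prune(box_min, box_max, candidates):
    """Keep only prototypes that could be nearest to some point in the box."""
    bound = min(max_dist(box_min, box_max, p) for p in candidates)
    return [p for p in candidates if min_dist(box_min, box_max, p) <= bound]

survivors = prune((0.0, 0.0), (1.0, 1.0), [(0.5, 0.5), (2.0, 0.5), (10.0, 10.0)])
```

A prototype whose minimum distance exceeds the smallest maximum distance can never be the closest prototype for any point in the node, so dropping it does not change the k-means assignment.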
Abstract:
The disclosed embodiments provide a system that facilitates use of an application. During operation, the system obtains an activity history of interaction between a user and the application during use of the application by the user. Next, the system applies a predictive model to the activity history to predict a probability of a user action in the application. Finally, the system facilitates subsequent real-time use of the application by the user based on the probability of the user action.