Abstract:
A system, method, and machine-readable storage medium for detecting an anomaly are provided. In some embodiments, the method includes computing an access rate of a set of entities for each user of a plurality of users. The access rate may refer to data operations for the set of entities stored by a storage system. The method also includes normalizing the access rates for a subset of the plurality of users, the subset belonging to a community. The method further includes determining whether a normalized access rate from among the access rates satisfies a threshold. The method also includes detecting an anomaly in response to a determination that the normalized access rate satisfies the threshold.
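The steps in this abstract can be sketched as follows. This is a minimal illustration, not the claimed method: the abstract does not say how access rates are normalized, so the z-score within the community is an assumption, and the name `detect_anomalies` and the threshold value are hypothetical.

```python
from statistics import mean, pstdev

def detect_anomalies(access_rates, community, threshold=1.5):
    """Return the users in `community` whose normalized access rate meets `threshold`.

    access_rates: dict mapping user -> data operations per unit time.
    Normalization here is a z-score within the community (an assumption).
    """
    rates = {u: access_rates[u] for u in community if u in access_rates}
    if len(rates) < 2:
        return []  # nothing to compare against
    mu = mean(rates.values())
    sigma = pstdev(rates.values())
    if sigma == 0:
        return []  # every user has the same rate, so no outlier exists
    return sorted(u for u, r in rates.items() if (r - mu) / sigma >= threshold)

rates = {"a": 10, "b": 12, "c": 11, "d": 9, "e": 100}
print(detect_anomalies(rates, {"a", "b", "c", "d", "e"}))
```

Normalizing within a community rather than across all users means a user is compared only against peers with similar access patterns.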
Abstract:
An embodiment of the invention provides an apparatus and method for classifying a workload of a computing entity. In an embodiment, the computing entity samples a plurality of values for a plurality of parameters of the workload. Based on the plurality of values of each parameter, the computing entity determines a parameter from the plurality of parameters that the computing entity's response time is dependent on. Here, the computing entity's response time is indicative of a time required by the computing entity to respond to a service request from the workload. Further, based on the identified significant parameter, the computing entity classifies the workload of the computing entity by selecting a workload classification from a plurality of predefined workload classifications.
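One way to picture this classification is sketched below. The abstract does not say how the significant parameter is identified, so the Pearson correlation with response time is an assumption, and the parameter names and the `CLASSES` table are hypothetical.

```python
from math import sqrt

# Hypothetical mapping from the significant parameter to a predefined class.
CLASSES = {
    "iops": "iops-bound",
    "block_size": "throughput-bound",
    "queue_depth": "latency-bound",
}

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def classify_workload(samples, response_times):
    """samples: dict mapping parameter name -> sampled values, aligned with response_times."""
    significant = max(samples, key=lambda p: abs(pearson(samples[p], response_times)))
    return significant, CLASSES.get(significant, "unclassified")

samples = {"iops": [100, 200, 300, 400], "block_size": [4, 4, 8, 4]}
print(classify_workload(samples, [1.0, 2.0, 3.0, 4.0]))
```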
Abstract:
A system, method, and machine-readable storage medium for forming a community based on a common set of attributes are provided. In some embodiments, the method includes creating a list of entities associated with a plurality of users, each entity included in the list of entities being accessed by a user of the plurality of users. The method also includes identifying a first entity accessed by a group of users of the plurality of users. The method further includes determining a first set of entities accessed by each user of the group of users, the first set of entities being included in the list of entities. The method also includes removing the first entity and the first set of entities from the list of entities. The method further includes forming a first community including the group of users, the first entity, and the first set of entities.
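The community-forming loop described above might look like the following sketch. The abstract does not say which entity is picked first, so choosing the entity with the most users is an assumption, and `form_communities` is a hypothetical name.

```python
def form_communities(accesses):
    """accesses: dict mapping user -> set of entities that user accessed.

    Returns a list of (users, entities) pairs, one per community.
    """
    # List of every entity accessed by at least one user.
    remaining = sorted({e for ents in accesses.values() for e in ents})
    communities = []
    while remaining:
        # Assumption: start from the remaining entity with the most users.
        first = max(remaining, key=lambda e: sum(e in ents for ents in accesses.values()))
        group = {u for u, ents in accesses.items() if first in ents}
        # Entities accessed by every user of the group (this includes `first`).
        shared = set.intersection(*(accesses[u] for u in group)) & set(remaining)
        communities.append((group, shared))
        # Remove the first entity and the shared set from the list.
        remaining = [e for e in remaining if e not in shared]
    return communities

accesses = {"u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"c"}}
print(form_communities(accesses))
```

Each pass removes at least the first entity from the list, so the loop always terminates.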
Abstract:
A system, method, and machine-readable storage medium for resolving a candidate community are provided. In some embodiments, a method includes obtaining a candidate community and a neighbor set for the candidate community, the neighbor set including zero or more stable communities. The method also includes resolving the candidate community as being a new stable community if the neighbor set is empty. The method further includes resolving the candidate community as being part of a matching stable community if a hash value of the candidate community matches a hash value of one or more stable communities included in the neighbor set. The method also includes resolving the candidate community as being a new stable community if an entropy value is greater than a threshold, the entropy value being based on the candidate community and the neighbor set.
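The resolution rules above can be sketched as a short decision function. The abstract does not define the hash or the entropy value, so the SHA-256 digest over sorted members and the Shannon entropy of the candidate's overlap with each neighbor are assumptions. The "unresolved" outcome is also hypothetical, because the abstract does not say what happens when no rule applies.

```python
import hashlib
from math import log2

def community_hash(members):
    """Order-independent hash of a community's members (strings)."""
    return hashlib.sha256("\n".join(sorted(members)).encode()).hexdigest()

def overlap_entropy(candidate, neighbors):
    """Shannon entropy of how the candidate's members spread across neighbors."""
    counts = [len(candidate & n) for n in neighbors]
    counts.append(len(candidate - set().union(*neighbors)))  # members in no neighbor
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in counts if c)

def resolve(candidate, neighbors, threshold=1.0):
    if not neighbors:
        return ("new", None)           # empty neighbor set
    h = community_hash(candidate)
    for n in neighbors:
        if community_hash(n) == h:
            return ("match", n)        # same hash as a stable community
    if overlap_entropy(candidate, neighbors) > threshold:
        return ("new", None)           # members scattered across neighbors
    return ("unresolved", None)        # hypothetical fallback

print(resolve({"a", "b", "c", "d"}, [{"a"}, {"b"}, {"c"}]))
```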
Abstract:
Methods and systems for document classification are provided. One method includes generating, by a processor, a plurality of topics using content of a plurality of electronic documents, where each topic includes a plurality of words associated with the plurality of electronic documents; reducing, by the processor, the plurality of topics to a subset of topics to represent the plurality of electronic documents based on a parameter indicating a property of each subset topic and separation between the subset topics; automatically generating, by the processor, a tag for each subset topic based on the tag's position within the subset topic, wherein each tag is an attribute of each subset topic; storing, by the processor, the subset of topics with corresponding tags in a model data structure; and updating, by the processor, the model data structure based on one of a new topic and a new tag associated with an electronic document.
Abstract:
Systems and methods disclosed herein provide intelligent filtering of system log messages having low utility value. In providing the filtering, the systems and methods determine the utility value of a system log message and delete the message from the system log if the message is determined to be of low utility value. As such, embodiments herein provide a system log filter, which reduces the amount of data stored in the system log based on the utility value of the message.
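A utility-based log filter could be sketched as below. The abstract does not say how the utility value is determined, so the scoring used here (a severity weight divided by the number of times the same message has already appeared) is an assumption, and `SEVERITY_WEIGHT` and `min_utility` are hypothetical.

```python
# Hypothetical severity weights; unknown levels fall back to 0.5.
SEVERITY_WEIGHT = {"DEBUG": 0.1, "INFO": 0.3, "WARNING": 0.7, "ERROR": 1.0}

def utility(message, seen_count):
    """Score a message: more severe is more useful, repeats are less useful."""
    level = message.split(":", 1)[0].strip().upper()
    return SEVERITY_WEIGHT.get(level, 0.5) / (1 + seen_count)

def filter_log(messages, min_utility=0.25):
    """Return the messages kept in the log; low-utility messages are dropped."""
    seen = {}
    kept = []
    for m in messages:
        count = seen.get(m, 0)
        if utility(m, count) >= min_utility:
            kept.append(m)
        seen[m] = count + 1
    return kept

log = ["INFO: heartbeat", "INFO: heartbeat", "ERROR: disk failed", "DEBUG: tick"]
print(filter_log(log))
```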
Abstract:
Methods and systems for securing unstructured data are provided. One method includes generating, by a processor, a schema from unstructured data, the schema including one or more relationships between named entities of the unstructured data; identifying, by the processor, a plurality of semantic relationships between the named entities; determining, by the processor, a sensitive relationship from the plurality of semantic relationships; and anonymizing, by the processor, sensitive data associated with the sensitive relationship by replacing a first portion of the sensitive data with generalized information.
Abstract:
A method, a computing device, and a non-transitory machine-readable medium for detecting malware attacks are provided. In one example, an agent implemented in an operating system detects an overwrite in which an original data component is overwritten with a new data component. The agent computes a plurality of features associated with the overwrite, the plurality of features including an original entropy corresponding to the original data component, a new entropy corresponding to the new data component, an overwrite fraction, and a set of divergence features. The agent determines whether the new data component is encrypted using the plurality of features.
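The features named in this abstract can be illustrated with a short sketch. The abstract does not define the overwrite fraction or the divergence features, so the fraction of changed bytes and the KL divergence of the byte histogram from a uniform distribution are assumptions. The threshold rule in `looks_encrypted` is hypothetical; high entropy alone also describes compressed data, so a real detector would need more than this.

```python
from collections import Counter
from math import log2

def byte_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * log2(c / n) for c in Counter(data).values())

def overwrite_features(original, new):
    # Assumed definition: share of byte positions that differ, counting
    # any length difference as changed bytes.
    changed = sum(a != b for a, b in zip(original, new)) + abs(len(original) - len(new))
    new_entropy = byte_entropy(new)
    return {
        "original_entropy": byte_entropy(original),
        "new_entropy": new_entropy,
        "overwrite_fraction": changed / max(len(original), len(new), 1),
        # KL(P || uniform) over 256 byte values equals 8 - H(P).
        "kl_new_from_uniform": 8.0 - new_entropy,
    }

def looks_encrypted(f):
    """Hypothetical rule: near-uniform new data, a large entropy jump, mostly rewritten."""
    return (f["new_entropy"] >= 7.5
            and f["new_entropy"] - f["original_entropy"] >= 1.0
            and f["overwrite_fraction"] >= 0.5)

features = overwrite_features(b"a" * 256, bytes(range(256)))
print(features, looks_encrypted(features))
```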
Abstract:
Methods and apparatuses for performing selective deduplication in a storage system are introduced here. Techniques are provided for determining a probability of deduplication for a data object based on a characteristic of the data object and performing a deduplication operation on the data object in the storage system prior to the data object being stored in persistent storage of the storage system if the probability of deduplication for the data object has a specified relationship to a specified threshold.
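A minimal sketch of this selective path follows. The abstract leaves the characteristic and the probability model open, so the object kind and the `DEDUP_PROBABILITY` table are hypothetical, and "has a specified relationship to a specified threshold" is read here as "greater than or equal to".

```python
import hashlib

# Hypothetical probabilities of finding a duplicate, keyed by object kind.
DEDUP_PROBABILITY = {"vm_image": 0.9, "backup": 0.8, "video": 0.05}

class Store:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.blocks = {}   # fingerprint -> data, stored once
        self.objects = {}  # name -> ("ref", fingerprint) or ("raw", data)

    def write(self, name, data, kind):
        """Store `data`; return True if the deduplication path was taken."""
        probability = DEDUP_PROBABILITY.get(kind, 0.0)
        if probability >= self.threshold:
            # Deduplicate before the object reaches persistent storage.
            fingerprint = hashlib.sha256(data).hexdigest()
            self.blocks.setdefault(fingerprint, data)
            self.objects[name] = ("ref", fingerprint)
            return True
        # Unlikely to deduplicate: skip the fingerprinting cost.
        self.objects[name] = ("raw", data)
        return False

store = Store()
store.write("a", b"x" * 10, "vm_image")
store.write("b", b"x" * 10, "vm_image")
print(len(store.blocks))
```

Skipping the fingerprint step for objects that rarely deduplicate avoids the hashing and lookup cost on the write path for that data.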