Abstract:
Embodiments of the invention provide systems and methods for adaptive content processing and classification in a high-availability environment. More specifically, a computer system is provided having a plurality of processing engines and at least one server that classifies data objects on the computer system. The classification includes analyzing the data objects for the presence of a type of content. This can include assigning a score corresponding to the amount of the type of content in each of the data objects. Moreover, the server can remove a data object from the computer system based on the results of the analysis. The results of the analysis are stored, and the computer system is updated with feedback information. This can include allowing a user to review the results of the analysis and aggregating the user's reviews into the feedback information.
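The score, remove, and feedback loop described above can be sketched roughly as follows. This is a minimal illustration, not the patented method: the keyword-ratio score, the `ContentClassifier` name, the `remove_threshold` parameter, and the rule that raises the threshold when most reviews disagree are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ContentClassifier:
    # Hypothetical keyword set standing in for a real content detector.
    keywords: frozenset
    remove_threshold: float = 0.5
    results: dict = field(default_factory=dict)  # object id -> stored score
    reviews: list = field(default_factory=list)  # (object id, reviewer agrees?)

    def score(self, text):
        """Fraction of words matching the content type, from 0.0 to 1.0."""
        words = text.lower().split()
        if not words:
            return 0.0
        return sum(w in self.keywords for w in words) / len(words)

    def classify(self, store):
        """Score every object, store the result, remove high scorers."""
        for obj_id in list(store):
            s = self.score(store[obj_id])
            self.results[obj_id] = s
            if s >= self.remove_threshold:
                del store[obj_id]

    def review(self, obj_id, correct):
        """Record one user review of a stored result."""
        self.reviews.append((obj_id, correct))

    def apply_feedback(self, step=0.1):
        """Aggregate reviews; assumed rule: if most reviews disagree with
        the results, make removal stricter by raising the threshold."""
        if not self.reviews:
            return
        wrong = sum(1 for _, ok in self.reviews if not ok)
        if wrong * 2 > len(self.reviews):
            self.remove_threshold = min(1.0, self.remove_threshold + step)
        self.reviews.clear()
```

A real system would replace the keyword ratio with an actual content analyzer; the point here is only the order of operations: score, store, remove, then fold reviews back in.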
Abstract:
Disclosed are embodiments of a system and method for managing an on-line community. Electronic postings are pre-screened based on one or more metrics to determine a risk value indicative of the likelihood that an individual posting contains objectionable content. These metrics are based on the profile of a poster, including various parameters of the poster and/or the poster's record of objectionable content postings. These metrics can also be based on the social network profile of a poster, including the average of various parameters of other users in the poster's social network and/or a compiled record of objectionable content postings of other users in the poster's social network. If the risk value is relatively low, the posting can be displayed to the on-line community immediately. If the risk value is relatively high, display of the posting can be delayed until further content analysis is completed. Finally, if the risk value is above a predetermined high risk threshold value, the posting can be removed automatically.
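As a rough sketch of the pre-screening logic above, the risk value can be computed from the poster's own record and the average record of the poster's social network, then mapped to one of three actions. The weights (`w_self`, `w_network`), the thresholds (`low`, `high`), and the `Poster` fields are hypothetical; the abstract does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class Poster:
    posts: int = 0            # total postings made
    flagged: int = 0          # postings found objectionable
    friends: list = field(default_factory=list)  # other Poster objects

    def own_rate(self):
        """Share of this poster's postings that were objectionable."""
        return self.flagged / self.posts if self.posts else 0.0


def risk_value(poster, w_self=0.7, w_network=0.3):
    """Blend the poster's record with the network average (assumed weights)."""
    rates = [f.own_rate() for f in poster.friends]
    network = sum(rates) / len(rates) if rates else 0.0
    return w_self * poster.own_rate() + w_network * network


def route(risk, low=0.2, high=0.8):
    """Map a risk value to an action using assumed thresholds."""
    if risk > high:
        return "remove"    # above the high-risk threshold
    if risk >= low:
        return "delay"     # hold until further content analysis
    return "display"       # low risk: show immediately
```

For example, a poster with no flagged postings and no flagged contacts gets a risk of 0.0 and is displayed at once, while a poster whose own record and network are both fully flagged is removed.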
Abstract:
A system and associated method for bulk processing of semi-structured result streams from many different resources ingest bytes, parse as many bytes as practical, and return to process additional bytes. The system processes network packets as they arrive from a computing resource, creating intermediate results. The intermediate results are held in a stack until sufficient information is accumulated. The system then merges the intermediate results to form a single document model. As network packets at one connection are consumed by the system, the system can select another connection at which packets are waiting for processing. The processing of a result at a connection can be interrupted while the system processes the results at another connection. In this manner, the system is able to utilize one thread to process many incoming results in parallel.
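The ingest, parse, and resume pattern can be illustrated with a small sketch. It assumes a made-up wire format (newline-delimited `key=value` records) and simulates each connection as a list of byte chunks; `StreamParser` and `process_all` are hypothetical names, and a real implementation would read from non-blocking sockets instead.

```python
class StreamParser:
    """Incrementally parses newline-delimited 'key=value' records."""

    def __init__(self):
        self.pending = b""   # trailing bytes of an incomplete record
        self.stack = []      # intermediate results awaiting merge

    def feed(self, chunk):
        """Parse as many complete records as the bytes so far allow."""
        data = self.pending + chunk
        *complete, self.pending = data.split(b"\n")
        for line in complete:
            if line:
                key, _, value = line.partition(b"=")
                self.stack.append({key.decode(): value.decode()})

    def merge(self):
        """Merge the intermediate results into one document model."""
        doc = {}
        for part in self.stack:
            doc.update(part)
        self.stack.clear()
        return doc


def process_all(connections):
    """connections: name -> iterable of byte chunks.

    One thread visits the connections round-robin, so parsing of one
    result is interrupted while another connection is serviced."""
    parsers = {name: StreamParser() for name in connections}
    active = {name: iter(chunks) for name, chunks in connections.items()}
    while active:
        for name in list(active):
            chunk = next(active[name], None)
            if chunk is None:
                del active[name]       # connection exhausted
            else:
                parsers[name].feed(chunk)
    return {name: p.merge() for name, p in parsers.items()}
```

Note that a record split across two chunks is completed on the next visit to that connection; bytes left over without a terminating newline are simply dropped in this sketch.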
Abstract:
Methods and systems for optimizing the retrieval of data from multiple sources are described. A slot map including slots for the storage of data elements can be obtained. The data elements associated with the slots can be prioritized by weighting values with costs of retrieving the data elements from respective data sources. Each value can be associated with a different data element and can indicate a respective degree of importance of the associated data element. Further, the systems and methods can direct the retrieval of data elements from the respective data sources in an order in accordance with the priority of the data elements to optimize the quality of data obtainable within a critical time constraint. In addition, the retrieved data elements can be stored in corresponding slots on a storage medium.
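One way to picture the prioritization is to rank each data element by its importance divided by its retrieval cost and fetch in that order until the time budget runs out. The ratio, the `retrieve` function, and the use of declared costs in place of measured wall-clock time are assumptions for illustration; the abstract states only that values are weighted with costs.

```python
def retrieve(slot_specs, budget):
    """Fill a slot map in priority order within a cost budget.

    slot_specs: slot name -> (importance, cost, fetch), where cost is a
    positive estimate of retrieval time and fetch is a zero-argument
    callable standing in for a data source (all hypothetical)."""
    slots = {name: None for name in slot_specs}   # the empty slot map
    order = sorted(
        slot_specs,
        key=lambda n: slot_specs[n][0] / slot_specs[n][1],
        reverse=True,
    )
    spent = 0.0
    for name in order:
        _importance, cost, fetch = slot_specs[name]
        if spent + cost > budget:
            continue          # too expensive now; a cheaper one may fit
        slots[name] = fetch()
        spent += cost
    return slots
```

With a budget of 4, an element of importance 8 and cost 1 is fetched before one of importance 10 and cost 2, and an element of importance 6 and cost 6 is left empty.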
Abstract:
Data deduplication compression in a streaming storage application is provided. The disclosed deduplication process provides a deduplication archive that enables storage of the archive to, and extraction from, a streaming storage medium. One implementation involves compressing fully sequential data stored in a data repository to a sequential streaming storage, by: splitting fully sequential data into data blocks; hashing content of each data block and comparing each hash to an in-memory lookup table for a match, the in-memory lookup table storing all hashes that have been encountered during the compression of the fully sequential data; for each data block without a hash match, adding the data block as a new data block for compression of fully sequential data; and encoding duplicate data blocks using the in-memory lookup table into data segments.
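The block-hashing steps above might look like the following sketch. Fixed-size blocks, SHA-256, and the `("new", block)` / `("ref", index)` segment encoding are assumptions chosen for brevity; the abstract does not name a block size, hash function, or segment format.

```python
import hashlib


def dedup_compress(data, block_size=4):
    """Split data into blocks and emit a sequential list of segments."""
    seen = {}        # in-memory lookup table: hash -> index of unique block
    segments = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            # Duplicate: encode a reference instead of the bytes.
            segments.append(("ref", seen[digest]))
        else:
            seen[digest] = len(seen)
            segments.append(("new", block))
    return segments


def dedup_extract(segments):
    """Rebuild the data in one forward pass over the segments."""
    blocks = []              # unique blocks in order of first appearance
    out = bytearray()
    for kind, payload in segments:
        if kind == "new":
            blocks.append(payload)
            out += payload
        else:
            out += blocks[payload]
    return bytes(out)
```

Because every reference points backward to a block already emitted, both compression and extraction read the stream strictly in order, which is what a sequential medium such as tape requires.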
Abstract:
A method for anonymization of unstructured data comprises determining structured references in the unstructured data; populating a table with the structured references; anonymizing the structured references in the table using ontological analysis; and rewriting the structured references in the unstructured data with the anonymized structured references from the table to produce anonymized data. A system for anonymizing unstructured data comprises an entity spotting module configured to determine structured references in the unstructured data and populate a table with the determined structured references; an anonymization module configured to anonymize the structured references in the table using ontological analysis; and a replacement module configured to rewrite the structured references in the unstructured data with the anonymized structured references from the table to produce anonymized data.
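The three modules can be sketched as three small functions. Here "ontological analysis" is reduced to looking up a more general parent concept in a tiny hand-written ontology, which is an assumption made for the example; the `ONTOLOGY` contents and the function names are hypothetical.

```python
import re

# Hypothetical ontology: specific term -> more general parent concept.
ONTOLOGY = {
    "Boston": "a North American city",
    "Paris": "a European city",
    "aspirin": "an analgesic",
}


def _pattern(term):
    return re.compile(r"\b" + re.escape(term) + r"\b")


def spot_entities(text, ontology):
    """Entity spotting: table of references found, not yet anonymized."""
    return {term: None for term in ontology if _pattern(term).search(text)}


def anonymize_table(table, ontology):
    """Anonymization: generalize each reference to its parent concept."""
    return {term: ontology[term] for term in table}


def rewrite(text, table):
    """Replacement: substitute each reference with its anonymized form."""
    for term, replacement in table.items():
        # A function replacement avoids backslash processing in re.sub.
        text = _pattern(term).sub(lambda m, r=replacement: r, text)
    return text


def anonymize(text, ontology=ONTOLOGY):
    table = anonymize_table(spot_entities(text, ontology), ontology)
    return rewrite(text, table)
```

Replacements are applied one term at a time, so this sketch assumes no replacement string itself contains another ontology term.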