摘要:
The embodiments of the invention provide methods for obtaining improved text similarity measures. More specifically, a method of measuring similarity between at least two electronic documents begins by identifying similar terms between the electronic documents. This includes basing similarity between the similar terms on patterns, wherein the patterns can include word patterns, letter patterns, numeric patterns, and/or alphanumeric patterns. The identifying of the similar terms also includes identifying multiple pattern types between the electronic documents. Moreover, the basing of the similarity on patterns identifies terms within the electronic documents that are within a category of a hierarchy. Specifically, the identifying of the terms reviews a hierarchical data tree, wherein nodes of the tree represent terms within the electronic documents. Lower nodes of the tree have specific terms; and, wherein higher nodes of the tree have general terms.
摘要:
The embodiments of the invention provide methods for obtaining improved text similarity measures. More specifically, a method of measuring similarity between at least two electronic documents begins by identifying similar terms between the electronic documents. This includes basing similarity between the similar terms on patterns, wherein the patterns can include word patterns, letter patterns, numeric patterns, and/or alphanumeric patterns. The identifying of the similar terms also includes identifying multiple pattern types between the electronic documents. Moreover, the basing of the similarity on patterns identifies terms within the electronic documents that are within a category of a hierarchy. Specifically, the identifying of the terms reviews a hierarchical data tree, wherein nodes of the tree represent terms within the electronic documents. Lower nodes of the tree have specific terms; and, wherein higher nodes of the tree have general terms.
摘要:
Methods and arrangements for automatically finding the dependency of a software product on other software products or components. From an install image or directory, a signature is found by deriving the same from a directory structure of the software. Further, a directory tree structure is built and an approximate sub-tree matching algorithm is applied to find commonalties across software products.
摘要:
Data obfuscation of text data using entity detection and replacement A data obfuscation method, apparatus and computer program product are disclosed in which at least selected text entities such as words or abbreviations in a document are obfuscated to prevent the disclosure of private information if the document is disclosed. A user establishes various configuration parameters for selected text entities desired to obfuscated. The document is processed and text entities matching the configuration parameters are tagged for obfuscation. The tagged entities are then substituted in the document with obfuscating text. The obfuscating text can be derived from a hash table. The hash table may be used to provide a reverse obfuscation method by which original data can be restored to an obfuscated document.
摘要:
Methods, systems and computer program products for automatically enforcing obligations in accordance with a data-handling policy are disclosed. Requests by users for accessing data stored in a data repository are intercepted. A determination is made whether any obligations apply to each data item requested in accordance with the data handling policy. The determination may relate to whether rules having associated obligations identified in the data-handling policy apply to data items requested by a user. The obligations are automatically executed at an appropriate time after access of the data. Association of a data item requested by the user with an obligation may be recorded and tracked to determine the appropriate time for executing the obligation.
摘要:
Methods and arrangements for document quality assessment. Documents are accepted and a quality specification containing predetermined quality criteria is assimilated. Each document is assessed based on the predetermined quality criteria, and a quality score is assigned to each document, the quality score being a function of positive and negative attributes assessed for each document.
摘要:
Harvesting assets for packaged application practices, in one aspect, may include obtaining one or more work products associated with deployment of packaged software applications, extracting content and style, enhancing content and style with models of work products, and storing assets in asset repository.
摘要:
A data obfuscation method, apparatus and computer program product are disclosed in which at least selected text entities such as words or abbreviations in a document are obfuscated to prevent the disclosure of private information if the document is disclosed. A user establishes various configuration parameters for selected text entities desired to obfuscated. The document is processed and text entities matching the configuration parameters are tagged for obfuscation. The tagged entities are then substituted in the document with obfuscating text. The obfuscating text can be derived from a hash table. The hash table may be used to provide a reverse obfuscation method by which original data can be restored to an obfuscated document.
摘要:
A data obfuscation method, apparatus and computer program product are disclosed in which at least selected text entities such as words or abbreviations in a document are obfuscated to prevent the disclosure of private information if the document is disclosed. A user establishes various configuration parameters for selected text entities desired to obfuscated. The document is processed and text entities matching the configuration parameters are tagged for obfuscation. The tagged entities are then substituted in the document with obfuscating text. The obfuscating text can be derived from a hash table. The hash table may be used to provide a reverse obfuscation method by which original data can be restored to an obfuscated document.
摘要:
Methods and arrangements for automatically finding the dependency of a software product on other software products or components. From an install image or directory, a signature is found by deriving the same from a directory structure of the software. Further, a directory tree structure is built and an approximate sub-tree matching algorithm is applied to find commonalities across software products.