Abstract:
The embodiments of the invention provide a method for the automatic identification of changing subtopics within topics. The method begins by receiving customer satisfaction data having unstructured data objects. Next, the data objects are automatically categorized into pre-defined topics, wherein the pre-defined topics do not change throughout the customer satisfaction analysis. The pre-defined topics can be automatically defined based on a history of customer satisfaction data. Following this, a clustering analysis is automatically performed to identify subtopics of the data objects within the pre-defined topics. The subtopics are more specific than the pre-defined topics, and the subtopics can change. Further, the clustering analysis can include extracting features from the data objects and grouping the features into the subtopics. Each of the subtopics includes features having a predetermined degree of similarity.
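The two-level idea (fixed topics, changing subtopics) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the keyword-based `TOPICS` table, the stopword list, the bag-of-words features, the Jaccard similarity measure and the `threshold` value are all hypothetical, and the greedy single-pass grouping is a simple stand-in for whatever clustering analysis the embodiments actually use.

```python
from collections import defaultdict

# Hypothetical pre-defined topics; these stay fixed for the whole analysis.
TOPICS = {
    "billing": {"invoice", "charge", "refund"},
    "support": {"agent", "call", "wait"},
}

# Hypothetical stopword list used during feature extraction.
STOPWORDS = {"the", "a", "an", "my", "was", "is", "on", "for", "to", "i", "and", "of"}


def features(text):
    """Extract a set of lowercase content words as the features of a comment."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def categorize(feats):
    """Pick the fixed topic whose keywords overlap the features most, or None."""
    best = max(TOPICS, key=lambda t: len(TOPICS[t] & feats))
    return best if TOPICS[best] & feats else None


def jaccard(a, b):
    """Similarity of two feature sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def subtopics(comments, threshold=0.3):
    """Group comments into subtopics inside each fixed topic (greedy, single pass)."""
    clusters = defaultdict(list)  # topic -> list of {"features", "members"}
    for text in comments:
        feats = features(text)
        topic = categorize(feats)
        if topic is None:
            continue  # comment matches no pre-defined topic
        for cluster in clusters[topic]:
            if jaccard(feats, cluster["features"]) >= threshold:
                cluster["members"].append(text)
                cluster["features"] |= feats
                break
        else:
            # No existing subtopic is similar enough, so start a new one.
            clusters[topic].append({"features": set(feats), "members": [text]})
    return dict(clusters)
```

Re-running `subtopics` on a newer batch of comments may yield different subtopics, while the keys of `TOPICS` remain unchanged.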
Abstract:
Methods, apparatus and computer programs are provided for characterizing Web-based information resources based on their interactions. A Web-based information resource is a single Web document or a collection of related Web documents. Unlike simple text documents, Web documents contain hyperlinks and other HTML tags. Different types of interactions, including inbound hyperlinks, outbound hyperlinks and internal links associated with a Web-based information resource, are used to characterize the Web-based information resource. A DOM tree representing the tag structure of a Web-based information resource is used to identify text items likely to be useful as context for a hyperlink anchor text, and the anchor text is combined with the context to generate a representation. The representation of Web-based information resources based on interactions can be used for clustering and classification, and in Web mining applications such as query disambiguation and automatic taxonomy generation.
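As a rough illustration of pairing anchor text with context taken from the tag structure, the sketch below uses Python's standard `html.parser` module. The choice of `BLOCK_TAGS`, the rule "context is the text of the nearest enclosing block element", and the assumption of well-formed markup with explicit closing tags are all hypothetical simplifications, not the method described in the abstract.

```python
from html.parser import HTMLParser

# Hypothetical set of elements whose text serves as context for a hyperlink.
BLOCK_TAGS = {"p", "li", "td", "div", "h1", "h2", "h3"}


class AnchorContextParser(HTMLParser):
    """Collect each hyperlink with its anchor text and enclosing-block text."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # stack of (text_parts, anchors) per open block element
        self.current = None  # anchor currently being read
        self.links = []      # finished records: href, text, context

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.blocks.append(([], []))
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.current = {"href": href, "text": ""}

    def handle_data(self, data):
        if self.current is not None:
            self.current["text"] += data
        if self.blocks:
            self.blocks[-1][0].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current is not None:
            self.current["text"] = " ".join(self.current["text"].split())
            if self.blocks:
                # Context is filled in when the enclosing block closes.
                self.blocks[-1][1].append(self.current)
            else:
                self.current["context"] = ""
                self.links.append(self.current)
            self.current = None
        elif tag in BLOCK_TAGS and self.blocks:
            parts, anchors = self.blocks.pop()
            context = " ".join("".join(parts).split())
            for anchor in anchors:
                anchor["context"] = context
                self.links.append(anchor)


def link_representations(html):
    """Return one record per hyperlink found in the given HTML string."""
    parser = AnchorContextParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

A downstream step could turn each record's `text` and `context` into a term vector for clustering or classification; that step is left out here.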
Abstract:
The present invention provides a method, system and computer program product for profiling an entity based on information obtained from at least one information source. Various contexts associated with the entity are identified. This can be achieved by using a clustering algorithm, an ontology, a thesaurus, association rules or manually by an expert. After the contexts are identified, the associated concepts are classified into various sets and ranked using a ranking algorithm. Thereafter, certain top-ranked concepts are presented to a user as the profile of the entity.
Abstract:
Documents are represented based on their structure, which arises from the relationship between various elements in the document. After representing documents based on their structure in vector form, a method of measuring similarity between vectors is used to obtain the measure of structural similarity between two given documents.
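One possible reading of this, sketched under stated assumptions: each document is well-formed XML, its structure vector counts root-to-node tag paths, and cosine similarity is the vector measure. The abstract fixes neither the vector encoding nor the similarity measure, so both choices (and the function names) are illustrative only.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def structure_vector(xml_text):
    """Count every root-to-node tag path; text content is ignored."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def structural_similarity(xml_a, xml_b):
    """Structural similarity of two XML documents, from 0.0 to about 1.0."""
    return cosine(structure_vector(xml_a), structure_vector(xml_b))
```

Two documents with the same element layout but different text score close to 1, while documents that share no tag paths score 0.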
Abstract:
Web pages are previewed without actually having to browse to those web pages. A method is performed in relation to a first web page that is being browsed by a user and that has a hyperlink to a second web page. The second web page is acquired, and a site-specific preview, a user-specific preview, and a time-specific preview of the second web page are constructed. The site-specific preview is specific to a web site encompassing the second web page. The user-specific preview is specific to the user browsing the first web page. The time-specific preview is nominally specific to a time at which the user previews the second web page. These three previews are combined into an overall preview. In response to the user performing an action in relation to the hyperlink on the first web page, the overall preview of the second web page is displayed without browsing to that page.
Abstract:
A method (400) is disclosed of extracting factoids from text repositories, with the factoids being associated with a given factoid category. The method (400) starts by training a classifier (230) to recognize factoids relevant to that given factoid category. Documents or document summaries relevant to the given factoid category are next collected (410) from the text repositories. Sentences having a predetermined association to the given factoid category are extracted (420) from the documents or document summaries. Those sentences are classified (440), in a noisy environment, using the classifier (230) to extract snippets containing phrases relevant to the given factoid category. The extracted snippets are the factoids associated with the given factoid category.
Abstract:
Methods and arrangements are provided for automatically finding the dependency of a software product on other software products or components. From an install image or directory, a signature is derived from the directory structure of the software. Further, a directory tree structure is built and an approximate sub-tree matching algorithm is applied to find commonalities across software products.
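The sketch below shows one hypothetical way to derive a signature from a directory structure and to look for a component embedded in a product. The signature here is simply the set of relative file paths, and the matching is a coverage score computed for every subdirectory; this is a deliberately simple stand-in, not the approximate sub-tree matching algorithm the abstract refers to. The function names are invented for illustration.

```python
import os


def signature(root):
    """Set of file paths relative to root, using '/' as the separator."""
    sig = set()
    for dirpath, _dirs, files in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for name in files:
            if rel == ".":
                sig.add(name)
            else:
                sig.add(rel.replace(os.sep, "/") + "/" + name)
    return sig


def best_embedded_match(product_root, component_sig):
    """Find the product subdirectory that covers the most of a component.

    The score is the fraction of the component's files found under the
    subdirectory, so a score below 1.0 is an approximate match.
    """
    best_dir, best_score = None, 0.0
    if not component_sig:
        return best_dir, best_score
    for dirpath, _dirs, _files in os.walk(product_root):
        sub = signature(dirpath)
        score = len(sub & component_sig) / len(component_sig)
        if score > best_score:
            best_dir, best_score = dirpath, score
    return best_dir, best_score
```

A high score for some subdirectory suggests that the product bundles, and therefore depends on, the component whose signature was supplied. Recomputing the signature for every subdirectory is quadratic in the worst case, which is acceptable only for a sketch.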
Abstract:
Techniques for protecting information in an audio file are provided. The techniques include obtaining an audio file, detecting one or more information-bearing segments in a speech signal, wherein the information comprises information sought for protection, encrypting the information sought for protection by scrambling the one or more segments using a scrambling filter, and selectively decrypting an amount of the encrypted information, wherein the amount of the encrypted information to be decrypted depends on user access privilege, and wherein selectively decrypting the amount of the encrypted information protects said amount of the encrypted information.
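The selective, privilege-dependent decryption can be illustrated with a toy example. The assumptions are loud ones: the audio is a plain list of samples, the sensitive segments are already detected and given as non-overlapping `(start, end, level)` tuples, and the "scrambling filter" is replaced by a keyed permutation of the samples inside each segment. A seeded `random.Random` shuffle is not cryptographically secure and is used only to show the control flow; all names are hypothetical.

```python
import random


def _permutation(key, n):
    """Deterministic pseudo-random ordering of range(n) derived from key."""
    order = list(range(n))
    random.Random(key).shuffle(order)
    return order


def scramble(samples, segments, key):
    """Permute the samples inside every protected segment."""
    out = list(samples)
    for i, (start, end, _level) in enumerate(segments):
        order = _permutation(f"{key}:{i}", end - start)
        out[start:end] = [samples[start + j] for j in order]
    return out


def unscramble(scrambled, segments, key, privilege):
    """Restore only the segments whose level does not exceed the privilege."""
    out = list(scrambled)
    for i, (start, end, level) in enumerate(segments):
        if level > privilege:
            continue  # this user may not hear the segment; leave it scrambled
        order = _permutation(f"{key}:{i}", end - start)
        for pos, j in enumerate(order):
            out[start + j] = scrambled[start + pos]
    return out
```

A user with a lower `privilege` value gets back audio in which the higher-level segments remain scrambled, while the unprotected samples are never altered.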
Abstract:
Techniques for extracting information from a formatted document are provided. The techniques include combining one or more visual layout rules, one or more mark-up rules and one or more text-based rules in connection with a formatted document, and specifying one or more rules from the one or more visual layout rules, one or more mark-up rules and one or more text-based rules to extract information from the formatted document.