摘要:
Methods, apparatus and computer programs are provided for characterizing Web-based information resources based on their interactions. A Web-based information resource is a single Web document or a collection of related Web documents. Unlike simple text documents, Web documents contain hyperlinks and other HTML tags. Different types of interactions, including inbound hyperlinks, outbound hyperlinks and internal links associated with a Web-based information resource, are used to characterize the Web-based information resource. A DOM tree representing the tag structure of a Web-based information resource is used to identify text items likely to be useful as context for a hyperlink anchor text, and the anchor text is combined with the context to generate a representation. The representation of Web-based information resources based on interactions can be used for clustering and classification, and in Web mining applications such as query disambiguation and automatic taxonomy generation.
摘要:
Methods, apparatus and computer programs are provided for characterizing Web-based information resources based on their interactions. A Web-based information resource is a single Web document or a collection of related Web documents. Unlike simple text documents, Web documents contain hyperlinks and other HTML tags. Different types of interactions, including inbound hyperlinks, outbound hyperlinks and internal links associated with a Web-based information resource, are used to characterize the Web-based information resource. A DOM tree representing the tag structure of a Web-based information resource is used to identify text items likely to be useful as context for a hyperlink anchor text, and the anchor text is combined with the context to generate a representation. The representation of Web-based information resources based on interactions can be used for clustering and classification, and in Web mining applications such as query disambiguation and automatic taxonomy generation.
摘要:
The embodiments of the invention provide a method for the automatic identification of changing subtopics within topics. The method begins by receiving customer satisfaction data having unstructured data objects. Next, the data objects are automatically categorized into pre-defined topics, wherein the pre-defined topics do not change throughout the customer satisfaction analysis. The pre-defined topics can be automatically defined based on a history of customer satisfaction data. Following this, a clustering analysis is automatically performed to identify subtopics of the data objects within the pre-defined topics. The subtopics are more specific than the pre-defined topics, and the subtopics can change. Further, the clustering analysis can include extracting features from the data objects and grouping the features into the subtopics. Each of the subtopics includes features having a predetermined degree of similarity.
摘要:
Web pages are previewed without actually having to browse to those web pages. A method is performed in relation to a first web page being browsed by a user and that has a hyperlink to a second web page. The second web page is acquired, and a site-specific preview, a user-specific preview, and a time-specific preview of the second web page are constructed. The site-specific preview is specific to a web site encompassing the second web page. The user-specific preview is specific to the user browsing the first web page. The time-specific preview is nominally specific to a time at which the user previews the second web page. These three previews are combined into an overall preview. In response to the user performing an action in relation to the hyperlink on the first web page, the overall preview of the second web page is displayed without browsing to that page.
摘要:
A method is provided for forming discrete segment clusters of one or more sequential sentences from a corpus of communication transcripts of transactional communications that comprises dividing the communication transcripts of the corpus into a first set of sentences spoken by a caller and a second set of sentences spoken by a responder; generating a specified number of sentence clusters by grouping the first and second sets of sentences according to a measure of lexical similarity using an unsupervised partitional clustering method; generating a collection of sequences of sentence types by assigning a distinct sentence type to each sentence cluster and representing each sentence of each communication transcript of the corpus with the sentence type assigned to the sentence cluster into which the sentence is grouped; and generating a specified number of discrete segment clusters by successively merging sentence clusters according to a proximity-based measure between the sentence types assigned to the sentence clusters within sequences of the collection.
摘要:
A method, system, and computer program product for translating a text file are disclosed. A text file in a source language is received and text snippets from the text file are extracted. The text snippets are distributed to a first set of remote workers for translation. The translated text snippets are validated by a second set of remote workers and the validated text snippets are used to generate a translated text file.
摘要:
Partial access to electronic documents and aggregation for secure document distribution is disclosed. The embodiments herein relate to providing access to electronic documents and, more particularly, to providing access to portions of electronic documents and aggregating such portions in secure document distribution environment. Existing document distribution mechanisms do not provide means to access partial documents based on the attributes such as roles of the agents within an organization, location of access, time of access, device ID and so on. The disclosed method allows agents to access partial contents of documents based on the attributes. Meta data tags are attached to the documents in order to control the access of the documents by the defined attributes. The agent who wishes to access the document enters his credential and based on the credentials he is provided access to the content that is assigned for him
摘要:
The present disclosure provides a method for incenting potential contributors for creating content in response to a posting. The method comprises: posting a task to a first crowdsource with the task having a first expiry period of δ1; waiting for δ1 period to expire; determining whether the task is complete; reposting the task if not complete including a second expiry period of δ2; waiting for the second period of δ2 to expire; reposting the task if not yet complete including an increased reward and a third expiry period of δ3; waiting for the third period of δ3 to expire; and, reposting the task if still not complete, wherein the reposting includes a second crowdsource.
摘要:
The application discloses systems and methods for physically sharing a hard copy of a document. The systems and methods include presenting to a user a graphical user interface having printing options for printing the document, where the graphical user interface has an input for receiving an indication by the user that the user is willing to share the hard copy of the document; presenting to the user options for defining characteristics of the hard copy of the document in response to receiving the indication; and publishing at least one of the defined characteristics within a profile page of the user.
摘要:
Partial access to electronic documents and aggregation for secure document distribution is disclosed. The embodiments herein relate to providing access to electronic documents and, more particularly, to providing access to portions of electronic documents and aggregating such portions in secure document distribution environment. Existing document distribution mechanisms do not provide means to access partial documents based on the attributes such as roles of the agents within an organization, location of access, time of access, device ID and so on. The disclosed method allows agents to access partial contents of documents based on the attributes. Meta data tags are attached to the documents in order to control the access of the documents by the defined attributes. The agent who wishes to access the document enters his credential and based on the credentials he is provided access to the content that is assigned for him