Abstract:
Systems and methods for filtering tokens from a document for determining whether the document describes substantially similar subject matter compared to another document are described. In one embodiment, a first document is obtained. This document is organized into a plurality of fields, and at least some of the fields include tokens representing the subject matter described by the document. A field of this document is selected and a token from within the selected field having the highest inverse document frequency (IDF) is selected. Those tokens that have a higher IDF than the selected token are removed. Using the remaining tokens, a determination is made as to whether the first document describes substantially similar subject matter to the subject matter described by a second document. An indication is provided as to whether the first document describes substantially similar subject matter to that described by a second document according to the determination.
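The filtering step can be sketched as follows. The abstract does not say which tokens are compared against the selected token, so this sketch assumes the selected field sets an IDF ceiling and that tokens in any field of the document above that ceiling are dropped. The names `idf_table`, `filter_high_idf`, and `reference_field` are hypothetical, and IDF is assumed to be `log(N / df)`.

```python
import math
from collections import Counter

def idf_table(corpus):
    """Map each token to log(N / df), where df counts documents containing it."""
    n = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log(n / count) for tok, count in df.items()}

def filter_high_idf(fields, reference_field, idf):
    """Drop tokens whose IDF exceeds the highest IDF in reference_field.

    Assumption: tokens missing from the table get IDF 0.0 and are kept.
    An empty reference field imposes no ceiling.
    """
    ceiling = max(
        (idf.get(tok, 0.0) for tok in fields[reference_field]),
        default=float("inf"),
    )
    return {
        name: [tok for tok in toks if idf.get(tok, 0.0) <= ceiling]
        for name, toks in fields.items()
    }

# Illustrative data only.
corpus = [
    ["acme", "blender", "500w"],
    ["acme", "toaster"],
    ["acme", "blender", "x9z7q"],
    ["zenith", "kettle"],
]
document = {
    "title": ["acme", "blender"],
    "description": ["acme", "blender", "x9z7q"],
}
filtered = filter_high_idf(document, "title", idf_table(corpus))
```

Here the rare token `x9z7q` is removed from the description because its IDF is higher than that of any title token; the surviving tokens would then feed the similarity determination.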
Abstract:
Systems and methods for determining whether a first document is a potential duplicate of a second document, such that the two documents describe the same or substantially the same subject matter, wherein the first and second documents include attribute data in attribute fields. A set of rules is obtained for determining whether the first document is a potential duplicate of the second document. For each rule in the set of rules, a determination is made as to whether data in a first set of attributes of the first document is contained in a second set of attributes of the second document. According to the results of the evaluated rules in the rule set, a determination is made as to whether the first document is a potential duplicate of the second document. If the first document is determined to be a potential duplicate of the second document, a reference to the first document is stored in a set of potential duplicates of the second document.
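A minimal sketch of the rule evaluation, under several assumptions that are not in the abstract: a rule is a pair of attribute-name tuples, "contained in" means a case-insensitive substring test, and the rule results are combined with `all` (the `combine` parameter lets a caller swap in `any`). All names are hypothetical.

```python
def rule_matches(rule, first, second):
    """True if every value in the rule's first-document attributes appears
    (case-insensitive substring) in the rule's second-document attributes."""
    src_attrs, dst_attrs = rule
    needles = [str(first[a]).lower() for a in src_attrs if first.get(a)]
    haystack = " ".join(str(second.get(a, "")).lower() for a in dst_attrs)
    return bool(needles) and all(n in haystack for n in needles)

def is_potential_duplicate(rules, first, second, combine=all):
    """Combine per-rule results; `all` is an assumed default policy."""
    return combine(rule_matches(r, first, second) for r in rules)

# Illustrative data only.
rules = [
    (("brand",), ("brand", "title")),
    (("model",), ("title",)),
]
first = {"id": "A1", "brand": "Acme", "model": "X100"}
second = {"id": "B7", "brand": "ACME", "title": "Acme X100 Blender"}

potential_duplicates = {}  # second-document id -> ids of potential duplicates
if is_potential_duplicate(rules, first, second):
    potential_duplicates.setdefault(second["id"], set()).add(first["id"])
```

Storing only the first document's identifier mirrors the abstract's "reference to the first document" rather than a copy of it.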
Abstract:
Under the present invention, a client-based editor is launched (e.g., from a web server or the like) within a client interface such as a browser. Upon being launched, initial configuration parameters are passed from a portal server to the editor. The present invention also provides a “communications tunnel” between the editor and the portal server in the form of a portlet interface on the web server, so that any characteristics expressed by the portal server (e.g., changes to the initial configuration parameters) can be pushed to the editor. Moreover, the portlet interface allows the editor to query the portal server to obtain any needed services (e.g., a spreadsheet computation).
Abstract:
Extracting content from an associate website may enable a host website to gain insight into web content that is effective at driving consumers to the host website. The content extraction may involve selecting an associate website from multiple associate websites for content extraction, with the associate website including a referral link to an item for sale on the host merchant website. Content may be obtained from one or more web pages of the associate website, and at least a part of the content may be associated with the item that is listed for sale on the host website.
Abstract:
Disclosed are various embodiments of systems, methods and computer programs for proactive pricing. An offer to sell a product extended by a seller is maintained in a server. The offer to sell includes a plurality of asking terms and at least one selling rule that is associated with the offer and authorizes a deviation from the asking terms. A plurality of purchase offers from at least one buyer to purchase the product is maintained in the server. Each of the purchase offers specifies at least one purchase term. The purchase offers are ranked based upon a degree to which the respective purchase terms match the asking terms.
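The ranking step might look like the sketch below. The abstract does not define "degree of match", so this assumes it is the fraction of asking terms that a purchase offer meets exactly; the selling rules that authorize deviations are left out. `match_degree`, `rank_offers`, and the term names are hypothetical.

```python
def match_degree(asking, purchase):
    """Fraction of asking terms the purchase offer meets exactly (assumed metric)."""
    if not asking:
        return 0.0
    met = sum(1 for key, value in asking.items() if purchase.get(key) == value)
    return met / len(asking)

def rank_offers(asking, offers):
    """Return purchase offers sorted from best to worst match."""
    return sorted(
        offers,
        key=lambda offer: match_degree(asking, offer["terms"]),
        reverse=True,
    )

# Illustrative data only.
asking = {"price": 100, "quantity": 10, "shipping": "ground"}
offers = [
    {"buyer": "b1", "terms": {"price": 90, "quantity": 10}},
    {"buyer": "b2", "terms": {"price": 100, "quantity": 10, "shipping": "ground"}},
    {"buyer": "b3", "terms": {"price": 80}},
]
ranked = [offer["buyer"] for offer in rank_offers(asking, offers)]
```

A real system would probably score numeric terms by distance rather than equality; exact matching is used here only to keep the sketch short.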
Abstract:
Enabling network-accessible applications to be integrated into content aggregation frameworks (such as portals) and to become dynamically interactive through proxying components (such as proxying portlets), thereby providing run-time cooperation and data sharing.
Abstract:
Disclosed are various embodiments of systems, methods, and computer programs that facilitate haggling in an electronic commerce system. An average spread of a user is calculated, which is the average difference between an initial list price and a final transaction price among transactions in a transaction history. A rounds score is also calculated based on the number of counteroffers extended by the user in the transaction history. A volume score is calculated based on the volume of transactions the user has consummated in the transaction history. An abandonment score is calculated based on the percentage of transactions the user has abandoned. A haggling rating is calculated based on a combination of the average spread, the rounds score, the volume score, and the abandonment score, and represents the effectiveness of the user in haggling and completing transactions with other users.
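The four components can be sketched as below. The abstract names the inputs but not the formulas, so everything beyond the average spread is an assumption: the rounds score is the total counteroffer count, the volume score is the number of completed transactions, the abandonment score is the abandoned fraction, and the combination is a caller-weighted sum. The `Transaction` type and the weights are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

@dataclass
class Transaction:
    list_price: float
    final_price: Optional[float]  # None marks an abandoned transaction
    counteroffers: int            # counteroffers the user extended

def haggling_components(history):
    """Compute the four component scores from a transaction history."""
    completed = [t for t in history if t.final_price is not None]
    spread = mean(t.list_price - t.final_price for t in completed) if completed else 0.0
    abandoned = (len(history) - len(completed)) / len(history) if history else 0.0
    return {
        "average_spread": spread,
        "rounds_score": sum(t.counteroffers for t in history),
        "volume_score": len(completed),
        "abandonment_score": abandoned,
    }

def haggling_rating(history, weights):
    """Weighted sum of the components; the weights are an assumed policy."""
    parts = haggling_components(history)
    return sum(weights[name] * value for name, value in parts.items())

# Illustrative data only.
history = [
    Transaction(100.0, 80.0, 2),
    Transaction(50.0, 45.0, 1),
    Transaction(200.0, None, 3),
]
weights = {
    "average_spread": 1.0,
    "rounds_score": 0.5,
    "volume_score": 2.0,
    "abandonment_score": -10.0,  # abandonment lowers the rating
}
rating = haggling_rating(history, weights)
```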
Abstract:
A computer system and method for determining whether the subject matter described in a received document is substantially similar to the subject matter of other documents in a document corpus, such that the received document can be considered a duplicate document. After receiving a first document, a set of tokens for the first document is generated. A non-fielded relevance search on a token index is executed. The relevance search returns a set of candidate duplicate documents with scores corresponding to each candidate document. For each candidate document with a score above a threshold, filtering is performed on each candidate document to determine whether each candidate document is a true duplicate of the first document. A set of candidate documents with a score above the threshold that were not disqualified as candidate documents is then provided.
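The search-then-filter flow can be sketched as follows. As a stand-in for the relevance search, this assumes a plain inverted index where a candidate's score is the number of distinct tokens it shares with the received document. The filter shown is a hypothetical placeholder, since the abstract does not describe the filters.

```python
from collections import defaultdict

def build_index(corpus):
    """Non-fielded inverted index: token -> ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, tokens in corpus.items():
        for tok in tokens:
            index[tok].add(doc_id)
    return index

def relevance_search(index, tokens):
    """Score each candidate by its count of shared distinct tokens (assumed metric)."""
    scores = defaultdict(int)
    for tok in set(tokens):
        for doc_id in index.get(tok, ()):
            scores[doc_id] += 1
    return dict(scores)

def find_duplicates(index, tokens, threshold, filters):
    """Keep candidates scoring above threshold that no filter disqualifies."""
    scores = relevance_search(index, tokens)
    return {
        doc_id: score
        for doc_id, score in scores.items()
        if score > threshold and all(keep(tokens, doc_id) for keep in filters)
    }

# Illustrative data only.
corpus = {
    "d1": ["acme", "blender", "x100"],
    "d2": ["acme", "toaster"],
    "d3": ["acme", "blender", "x100", "red"],
}
received = ["acme", "blender", "x100", "red"]

def same_token_count(tokens, doc_id):
    # Hypothetical filter: the candidate must have as many distinct tokens.
    return len(set(corpus[doc_id])) == len(set(tokens))

duplicates = find_duplicates(build_index(corpus), received, 2, [same_token_count])
```

Running the cheap index lookup first and the stricter filters only on candidates above the threshold keeps the expensive comparison off most of the corpus.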
Abstract:
According to aspects of the disclosed subject matter, a method for identifying a set of documents from a document corpus that are potential duplicates of a source document is provided. A source document is obtained. A list of queries corresponding to the source document is identified. Each query in the identified list of queries is executed on the document corpus, wherein the execution of each query yields a corresponding results set identifying an ordered set of documents in the document corpus. For each document identified in each results set, a document score is generated for the identified document based on the identified document's ordinal position in its results set. A subset of the identified documents of the results sets is selected according to the generated document scores that satisfy predetermined selection criteria. The selected subset of identified documents is stored or displayed.
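Position-based scoring can be sketched as below, taking the ordered results sets as given rather than executing queries. The scoring function is an assumption: each appearance contributes the reciprocal of its ordinal position, contributions are summed across results sets, and the selection criterion is a minimum total score. `score_by_position` and `select_candidates` are hypothetical names.

```python
from collections import defaultdict

def score_by_position(result_sets):
    """Sum 1/position over every results set a document appears in (assumed scoring)."""
    scores = defaultdict(float)
    for results in result_sets:
        for position, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / position
    return dict(scores)

def select_candidates(scores, min_score):
    """Return ids whose score meets min_score, highest score first."""
    kept = [doc_id for doc_id, score in scores.items() if score >= min_score]
    return sorted(kept, key=lambda doc_id: scores[doc_id], reverse=True)

# Illustrative data only: one ordered results set per query.
result_sets = [
    ["d3", "d1", "d2"],
    ["d1", "d3"],
    ["d4"],
]
scores = score_by_position(result_sets)
selected = select_candidates(scores, min_score=1.0)
```

Documents that rank near the top for several queries accumulate the highest scores, which is the intuition behind using ordinal position instead of raw engine scores that may not be comparable across queries.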
Abstract:
A system and method for determining the likelihood of two documents describing substantially similar subject matter is presented. A set of tokens for each of two documents is obtained, each set representing strings of characters found in the corresponding document. A matrix of token pairs is determined, each token pair comprising a token from each set of tokens. For each token pair in the matrix, a similarity score is determined. Those token pairs in the matrix with a similarity score above a threshold score are selected and added to a set of matched tokens. A similarity score for the two documents is determined according to the scores of the token pairs added to the set of matched tokens. The determined similarity score is provided as the likelihood that the first and second documents describe substantially similar subject matter.
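The token-pair matrix can be sketched as follows. The abstract does not name a pair-similarity measure or an aggregation rule, so this assumes `difflib.SequenceMatcher` ratios for pairs and, for the document score, the sum of matched pair scores divided by the larger token count, capped at 1.0. Function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product

def pair_scores(tokens_a, tokens_b):
    """Score every (a, b) token pair; the dict plays the role of the matrix."""
    return {
        (a, b): SequenceMatcher(None, a, b).ratio()
        for a, b in product(tokens_a, tokens_b)
    }

def matched_pairs(tokens_a, tokens_b, threshold):
    """Keep only the pairs whose similarity is above the threshold."""
    scores = pair_scores(tokens_a, tokens_b)
    return {pair: s for pair, s in scores.items() if s > threshold}

def document_similarity(tokens_a, tokens_b, threshold):
    """Assumed aggregation: matched score mass over the larger token count."""
    if not tokens_a or not tokens_b:
        return 0.0
    matched = matched_pairs(tokens_a, tokens_b, threshold)
    total = sum(matched.values())
    return min(1.0, total / max(len(tokens_a), len(tokens_b)))

# Illustrative data only.
doc_a = ["acme", "blender", "x100"]
doc_b = ["acme", "blender", "x-100"]
matched = matched_pairs(doc_a, doc_b, threshold=0.8)
likelihood = document_similarity(doc_a, doc_b, threshold=0.8)
```

Scoring pairs approximately rather than requiring exact equality lets near-identical tokens such as `x100` and `x-100` still count toward the match.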