Filtering invalid tokens from a document using high IDF token filtering
    21.
    Granted invention patent
    Status: in force

    Publication number: US07908279B1

    Publication date: 2011-03-15

    Application number: US11856581

    Filing date: 2007-09-17

    CPC classification: G06F17/2211 Y10S707/917

    Abstract: Systems and methods for filtering tokens from a document for determining whether the document describes substantially similar subject matter compared to another document are described. In one embodiment, a first document is obtained. This document is organized into a plurality of fields, and at least some of the fields include tokens representing the subject matter described by the document. A field of this document is selected and a token from within the selected field having the highest inverse document frequency (IDF) is selected. Those tokens that have a higher IDF than the selected token are removed. Using the remaining tokens, a determination is made as to whether the first document describes substantially similar subject matter to the subject matter described by a second document. An indication is provided as to whether the first document describes substantially similar subject matter to that described by a second document according to the determination.
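
The filtering step can be sketched in a few lines of Python. This is only one reading of the abstract, not the patented implementation: the highest IDF found in a chosen reference field is used as a cutoff, and tokens anywhere in the document with a higher IDF are dropped. The function names, the `log(N / df)` formula, and the treatment of unseen tokens are all assumptions made for illustration.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Map each token to log(N / document_frequency) over a list of token lists."""
    n_docs = len(corpus)
    df = Counter(token for doc in corpus for token in set(doc))
    return {token: math.log(n_docs / count) for token, count in df.items()}


def filter_high_idf(document, reference_field, idf, unseen_idf=float("inf")):
    """Drop tokens whose IDF exceeds the highest IDF found in reference_field.

    `document` maps field names to token lists. Tokens missing from `idf`
    are treated as maximally rare (an assumption of this sketch).
    """
    cutoff = max(idf.get(t, unseen_idf) for t in document[reference_field])
    return {
        field: [t for t in tokens if idf.get(t, unseen_idf) <= cutoff]
        for field, tokens in document.items()
    }
```

With a title field of `["red", "widget"]` and a description that also contains a rare string such as a stray part number, the rare string has a higher IDF than anything in the title and is removed before the documents are compared.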

    Comparison engine for identifying documents describing similar subject matter
    22.
    Granted invention patent
    Status: in force

    Publication number: US07904462B1

    Publication date: 2011-03-08

    Application number: US11953726

    Filing date: 2007-12-10

    IPC classification: G06F7/00 G06F17/00

    CPC classification: G06Q30/06

    Abstract: Systems and methods are described for determining whether a first document is a potential duplicate of a second document, such that the two documents describe the same or substantially the same subject matter, wherein the first and second documents include attribute data in attribute fields. A set of rules is obtained for determining whether the first document is a potential duplicate of the second document. Moreover, for each rule in the set of rules, a determination is made as to whether data in a first set of attributes of the first document is contained in a second set of attributes of the second document. According to the results of the evaluated rules in the rules set, a determination is made as to whether the first document is a potential duplicate of the second document. If, according to the evaluated rules in the rules set, the first document is determined to be a potential duplicate of the second document, a reference to the first document is stored in a set of potential duplicates of the second document.
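
A minimal Python sketch of the rule evaluation described above follows. The abstract does not say how a rule is represented or how rule results are combined, so this sketch assumes a rule is a pair of attribute-name tuples, that "contained" means token containment, and that a single matching rule is enough. All names are hypothetical.

```python
def tokens_of(document, attributes):
    """Collect lowercase whitespace-separated tokens from the named attributes."""
    return {
        word
        for name in attributes
        for word in str(document.get(name, "")).lower().split()
    }


def rule_matches(rule, first, second):
    """A rule is (first_attrs, second_attrs); it holds when every token from
    first_attrs of `first` also appears in second_attrs of `second`."""
    first_attrs, second_attrs = rule
    needed = tokens_of(first, first_attrs)
    return bool(needed) and needed <= tokens_of(second, second_attrs)


def record_if_duplicate(rules, first_id, first, second, potential_duplicates):
    """Store first_id when any rule matches (the any-rule policy is assumed)."""
    if any(rule_matches(rule, first, second) for rule in rules):
        potential_duplicates.add(first_id)
        return True
    return False
```

For example, the rule `(("brand", "model"), ("title",))` flags a first document with brand "Acme" and model "X100" as a potential duplicate of a second document titled "Acme X100 cordless drill".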

    Providing artifact and configuration cohesion across disparate portal application models
    23.
    Granted invention patent
    Status: no longer in force

    Publication number: US07877465B2

    Publication date: 2011-01-25

    Application number: US10891287

    Filing date: 2004-07-14

    IPC classification: G06F15/177

    CPC classification: G06F17/24

    Abstract: Under the present invention, a client-based editor is launched (e.g., from a web server or the like) within a client interface such as a browser. Upon being launched, initial configuration parameters are passed from a portal server to the editor. The present invention also provides a “communications tunnel” between the editor and the portal server in the form of a portlet interface on the web server. This is so that any characteristics expressed by the portal server (e.g., changes to the initial configuration parameters) can be pushed to the editor. Moreover, the portlet interface allows the editor to query the portal server to obtain any needed services (e.g., a spreadsheet computation).

    Reverse associate website discovery

    Publication number: US10013699B1

    Publication date: 2018-07-03

    Application number: US13170043

    Filing date: 2011-06-27

    IPC classification: G06Q30/00 G06Q30/02

    CPC classification: G06Q30/0214 G06Q30/0211

    Abstract: Extracting content from an associate website may enable a host website to gain insight into web content that is effective at driving consumers to the host website. The content extraction may involve selecting an associate website from multiple associate websites for content extraction, with the associate website including a referral link to an item for sale on the host merchant website. Content may be obtained from one or more web pages of the associate website, and at least a part of the content may be associated with the item that is listed for sale on the host website.

    Proactive Pricing
    25.
    Granted invention patent
    Status: in force

    Publication number: US09324109B1

    Publication date: 2016-04-26

    Application number: US12039816

    Filing date: 2008-02-29

    IPC classification: G06Q30/00 G06Q30/08

    CPC classification: G06Q30/08

    Abstract: Disclosed are various embodiments of systems, methods and computer programs for proactive pricing. An offer to sell a product extended by a seller is maintained in a server. The offer to sell includes a plurality of asking terms and at least one selling rule that is associated with the offer and authorizes a deviation from the asking terms. A plurality of purchase offers from at least one buyer to purchase the product is maintained in the server. Each of the purchase offers specifies at least one purchase term. The purchase offers are ranked based upon a degree to which the respective purchase terms match the asking terms.
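
The ranking step can be illustrated with a short Python sketch. The abstract does not define how the degree of match is measured, so this example assumes terms are key/value pairs and that the degree is the fraction of asking terms an offer meets exactly. The selling rule that authorizes deviations is not modeled, and all names are hypothetical.

```python
def match_degree(asking_terms, purchase_terms):
    """Fraction of asking terms that the purchase offer meets exactly."""
    met = sum(
        1 for key, value in asking_terms.items()
        if purchase_terms.get(key) == value
    )
    return met / len(asking_terms)


def rank_offers(asking_terms, purchase_offers):
    """Return offers sorted from best to worst match; ties keep input order."""
    return sorted(
        purchase_offers,
        key=lambda offer: match_degree(asking_terms, offer["terms"]),
        reverse=True,
    )
```

An offer that matches price, quantity, and shipping method ranks ahead of one that matches only quantity, which in turn ranks ahead of one that matches nothing.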

    Haggling in an electronic commerce system
    27.
    Granted invention patent
    Status: in force

    Publication number: US08108262B1

    Publication date: 2012-01-31

    Application number: US12039784

    Filing date: 2008-02-29

    IPC classification: G06Q30/00

    CPC classification: G06Q30/0601

    Abstract: Disclosed are various embodiments of systems, methods, and computer programs that facilitate haggling in an electronic commerce system. An average spread of a user is calculated, which is the average difference between an initial list price and a final transaction price among transactions in a transaction history. A rounds score is also calculated, which is based on the number of counteroffers extended by a user in the transaction history. A volume score is calculated based on the volume of transactions a user has consummated in the transaction history. An abandonment score is calculated based on the percentage of transactions the user has abandoned. A haggling rating is calculated based on a combination of the average spread, the rounds score, the volume score, and the abandonment score, and represents an effectiveness of the user in haggling and completing transactions with other users.
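
The four inputs to the haggling rating can be computed from a transaction history as in the Python sketch below. The abstract names the components but not their exact formulas or how they are combined, so the raw counts used here and the weighted sum are assumptions. `Transaction`, `haggling_components`, and `haggling_rating` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Transaction:
    list_price: float
    final_price: Optional[float]  # None when the transaction was abandoned
    counteroffers: int


def haggling_components(history):
    """Compute the four quantities named in the abstract from a user's history."""
    completed = [t for t in history if t.final_price is not None]
    return {
        "average_spread": (
            mean(t.list_price - t.final_price for t in completed) if completed else 0.0
        ),
        "rounds": sum(t.counteroffers for t in history),
        "volume": len(completed),
        "abandonment": (len(history) - len(completed)) / len(history) if history else 0.0,
    }


def haggling_rating(components, weights):
    """Combine the components as a weighted sum (the weighting is assumed)."""
    return sum(weights[name] * value for name, value in components.items())
```

A caller would supply its own weights, for instance a negative weight on `abandonment` so that users who walk away from many negotiations receive a lower rating.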

    Duplicate entry detection system and method
    28.
    Granted invention patent
    Status: in force

    Publication number: US08046372B1

    Publication date: 2011-10-25

    Application number: US11754237

    Filing date: 2007-05-25

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30616

    Abstract: A computer system and method for determining whether the subject matter described in a received document is substantially similar to the subject matter of other documents in a document corpus, such that the received document can be considered a duplicate document. After receiving a first document, a set of tokens for the first document is generated. A non-fielded relevance search on a token index is executed. The relevance search returns a set of candidate duplicate documents with scores corresponding to each candidate document. For each candidate document with a score above a threshold, filtering is performed on each candidate document to determine whether each candidate document is a true duplicate of the first document. A set of candidate documents with a score above the threshold that were not disqualified as candidate documents is then provided.
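
The search-then-filter pipeline can be sketched in Python as follows. This is a simplified stand-in, not the patented system: relevance is approximated by the fraction of query tokens a candidate shares, the token index is a plain in-memory inverted index, and the filtering stage is a caller-supplied predicate. All names are hypothetical.

```python
from collections import Counter, defaultdict


def build_token_index(corpus):
    """Inverted index over each document's full text, ignoring field boundaries."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for token in set(text.lower().split()):
            index[token].add(doc_id)
    return index


def find_duplicates(text, index, threshold, is_true_duplicate):
    """Return ids of candidates that score above threshold and pass the filter."""
    tokens = set(text.lower().split())
    scores = Counter()
    for token in tokens:
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1
    return sorted(
        doc_id
        for doc_id, score in scores.items()
        if score / len(tokens) > threshold and is_true_duplicate(doc_id)
    )
```

The threshold keeps the more expensive filter from running on weak candidates, which is the main reason to split the work into two stages.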

    Identifying potential duplicates of a document in a document corpus
    29.
    Granted invention patent
    Status: in force

    Publication number: US07895225B1

    Publication date: 2011-02-22

    Application number: US11952020

    Filing date: 2007-12-06

    IPC classification: G06F7/00 G06F17/00

    Abstract: According to aspects of the disclosed subject matter, a method for identifying a set of documents from a document corpus that are potential duplicates of a source document is provided. A source document is obtained. A list of queries corresponding to the source document is identified. Each query in the identified list of queries is executed on the document corpus, wherein the execution of each query yields a corresponding results set identifying an ordered set of documents in the document corpus. For each document identified in each results set, a document score is generated for the identified document based on the identified document's ordinal position in its results set. A subset of the identified documents of the results set is selected according to the generated document scores that satisfy predetermined selection criteria. The selected subset of identified documents is stored or displayed.
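
Position-based scoring can be sketched in Python as shown below. The abstract only says that the score depends on a document's ordinal position in its results set. The reciprocal-rank formula, the summing across results sets, and the minimum-score selection criterion are assumptions made for illustration. Query execution is not modeled; the function takes the ordered results sets as input.

```python
from collections import defaultdict


def score_by_position(result_sets):
    """Sum 1/rank over every results set in which a document appears."""
    scores = defaultdict(float)
    for results in result_sets:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / rank
    return dict(scores)


def select_candidates(result_sets, min_score):
    """Keep documents whose score meets min_score, best first, ties by id."""
    scores = score_by_position(result_sets)
    chosen = [doc_id for doc_id, score in scores.items() if score >= min_score]
    return sorted(chosen, key=lambda doc_id: (-scores[doc_id], doc_id))
```

A document that appears near the top of several results sets accumulates a higher score than one that appears once far down a single list.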

    Generating similarity scores for matching non-identical data strings
    30.
    Granted invention patent
    Status: in force

    Publication number: US07814107B1

    Publication date: 2010-10-12

    Application number: US11754241

    Filing date: 2007-05-25

    IPC classification: G06F7/00 G06F17/00

    CPC classification: G06F17/30011

    Abstract: A system and method for determining the likelihood of two documents describing substantially similar subject matter is presented. A set of tokens for each of two documents is obtained, each set representing strings of characters found in the corresponding document. A matrix of token pairs is determined, each token pair comprising a token from each set of tokens. For each token pair in the matrix, a similarity score is determined. Those token pairs in the matrix with a similarity score above a threshold score are selected and added to a set of matched tokens. A similarity score for the two documents is determined according to the scores of the token pairs added to the set of matched tokens. The determined similarity score is provided as the likelihood that the first and second documents describe substantially similar subject matter.
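
The token-pair matrix can be illustrated with the Python sketch below. The abstract does not specify the pairwise similarity measure or the aggregation, so this example substitutes the standard library's `difflib.SequenceMatcher` ratio for the pair score and averages each first-document token's best matched score. Both choices, the default threshold, and the function names are assumptions.

```python
from difflib import SequenceMatcher


def matched_pairs(tokens_a, tokens_b, threshold=0.8):
    """Score every (a, b) token pair and keep those above the threshold."""
    matched = {}
    for a in tokens_a:
        for b in tokens_b:
            score = SequenceMatcher(None, a, b).ratio()
            if score > threshold:
                matched[(a, b)] = score
    return matched


def document_similarity(tokens_a, tokens_b, threshold=0.8):
    """Average, over distinct tokens of the first document, of the best
    matched-pair score for that token (0 when the token has no match)."""
    if not tokens_a:
        return 0.0
    best = {}
    for (a, _), score in matched_pairs(tokens_a, tokens_b, threshold).items():
        best[a] = max(best.get(a, 0.0), score)
    return sum(best.values()) / len(set(tokens_a))
```

Because pairs are scored rather than compared for equality, near-identical strings such as "drill" and "drills" still count as a match, which is the point of comparing non-identical data strings.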