SYSTEM AND METHOD FOR STORING TEXT ANNOTATIONS WITH ASSOCIATED TYPE INFORMATION IN A STRUCTURED DATA STORE
    3.
    Type: Patent application
    Legal status: In force

    Publication No.: US20090049021A1

    Publication date: 2009-02-19

    Application No.: US12257110

    Filing date: 2008-10-23

    IPC classification: G06F7/06 G06F17/30

    Abstract: A text annotation structured storage system stores text annotations with associated type information in a structured data store. The present system persists or stores annotations in a structured data store in an indexable and queryable format. Exemplary structured data stores comprise XML databases and relational databases. The system exploits type information in a type system to develop corresponding schemas in a structured data model. The system comprises techniques for mapping annotations to an XML data model and a relational data model. The system captures various features of the type system, such as complex types and inheritance, in the schema for the persistent store. In particular, the repository provides support for path navigation over the hierarchical type system starting at any type.

    System and Method for an Online Advertising Exchange with Submarkets Formed by Portfolio Optimization
    4.
    Type: Patent application
    Legal status: Under examination (published)

    Publication No.: US20100250362A1

    Publication date: 2010-09-30

    Application No.: US12415846

    Filing date: 2009-03-31

    IPC classification: G06Q30/00 G06N5/02

    Abstract: A system and method to distribute computation for an exchange in which advertisers buy online advertising space from publishers. The exchange maintains submarkets, each containing a subset of the ad calls supplied by publishers and a subset of the offers and budgets representing demand from advertisers. Portfolio optimization techniques allocate the supply of ad calls from publishers over the submarkets, with the goal of maximizing profits for publishers while limiting the volatility of those profits. Portfolio optimization techniques allocate the demand from advertisers over the submarkets, with the goal of maximizing return on investment for advertisers. The exchange re-allocates supply and demand over submarkets periodically. Also, periodically, the most effective submarkets are replicated and the least effective submarkets are eliminated.

    REDUCING REVENUE RISK IN ADVERTISEMENT ALLOCATION
    5.
    Type: Patent application
    Legal status: Under examination (published)

    Publication No.: US20100241486A1

    Publication date: 2010-09-23

    Application No.: US12406469

    Filing date: 2009-03-18

    IPC classification: G06Q30/00 G06Q10/00 G06N5/02

    Abstract: Methods, systems, and apparatuses are provided for selecting advertisements in an advertisement auction. A plurality of bids for an advertisement placement is received. An average expected payout for each bid of the plurality of bids is calculated to determine a plurality of average expected payouts. A plurality of possible allocations of the advertisements is determined. An expected revenue value for each of the possible allocations is calculated based on the calculated average expected payouts to generate a plurality of expected revenue values. A risk value is calculated for each of the possible allocations to generate a plurality of risk values. A bid of the plurality of bids is enabled to be selected based on the calculated expected revenue values and risk values.
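
The steps in the abstract above can be sketched roughly as follows. This is an illustration only, not the method claimed in the application. It assumes a pay-per-click setting in which each bid has a hypothetical `payout` and `click_prob`, so that the average expected payout is their product; it treats each allocation as a choice of `slots` bids, uses the summed payout variance (assuming independent clicks) as the risk value, and combines revenue and risk with a hypothetical `risk_aversion` weight.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Bid:
    advertiser: str
    payout: float      # assumed amount paid if the ad is clicked
    click_prob: float  # assumed probability that the ad is clicked


def expected_payout(bid: Bid) -> float:
    """Average expected payout of a single bid."""
    return bid.payout * bid.click_prob


def payout_variance(bid: Bid) -> float:
    """Variance of a payout that equals `payout` with probability p, else 0."""
    return bid.payout ** 2 * bid.click_prob * (1.0 - bid.click_prob)


def select_allocation(bids, slots=1, risk_aversion=0.5):
    """Return the allocation (tuple of bids) with the best revenue-minus-risk score."""
    best, best_score = None, float("-inf")
    for allocation in combinations(bids, slots):  # every possible allocation
        revenue = sum(expected_payout(b) for b in allocation)
        risk = sum(payout_variance(b) for b in allocation)  # assumes independent clicks
        score = revenue - risk_aversion * risk
        if score > best_score:
            best, best_score = allocation, score
    return best


# Made-up bids: A pays more on average but is far more volatile than B.
bids = [Bid("A", 10.0, 0.1), Bid("B", 2.0, 0.45)]
print([b.advertiser for b in select_allocation(bids, risk_aversion=0.0)])
print([b.advertiser for b in select_allocation(bids, risk_aversion=0.5)])
```

With these made-up bids, a risk-neutral selection picks A, while a risk aversion of 0.5 picks the steadier bid B.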

    APPROACHES FOR THE UNSUPERVISED CREATION OF STRUCTURAL TEMPLATES FOR ELECTRONIC DOCUMENTS
    6.
    Type: Patent application
    Legal status: Under examination (published)

    Publication No.: US20100169311A1

    Publication date: 2010-07-01

    Application No.: US12346483

    Filing date: 2008-12-30

    IPC classification: G06F7/06 G06F17/30

    CPC classification: G06F16/951

    Abstract: A method and apparatus for creating templates for electronic documents is provided. One or more attributes are extracted, using a seed template, from a first document, such as a web page. A second document that contains a particular attribute, extracted from the first document, is identified. The second document may be in a different cluster than the first document. The second document is annotated, using an extracted attribute, to create an annotated document. The second document is annotated without human intervention. A new template for the annotated document is generated. The new template facilitates extraction of information from the annotated document. The new template may be used to extract additional attributes from all documents in the cluster of documents of which the second document is a member. The process may continue over numerous iterations to generate a large number of templates in an automated fashion.

    TECHNIQUES FOR CONSTRUCTING SITEMAP OR HIERARCHICAL ORGANIZATION OF WEBPAGES OF A WEBSITE USING DECISION TREES
    8.
    Type: Patent application
    Legal status: Under examination (published)

    Publication No.: US20090171986A1

    Publication date: 2009-07-02

    Application No.: US11965320

    Filing date: 2007-12-27

    IPC classification: G06F17/30

    CPC classification: G06F16/951

    Abstract: A decision tree may be determined that is a site map for a domain of web pages. A clustering of a plurality of web pages of a domain is determined, in an unsupervised fashion, based on content-related features of the plurality of web pages. Each determined cluster includes a plurality of web pages, each of the plurality of web pages characterized by a resource locator and each of the resource locators being characterized by at least one resource locator token. The clustering is processed to organize indications of the content-related features of the plurality of web pages into a decision tree characterized by a plurality of nodes, each node characterized by a feature and a value, the feature being at least one of the resource locator tokens and the value being a value of that resource locator token.

    Mining of generalized disjunctive association rules
    9.
    Type: Granted patent
    Legal status: In force

    Publication No.: US06754651B2

    Publication date: 2004-06-22

    Application No.: US09836118

    Filing date: 2001-04-17

    IPC classification: G06F17/30

    Abstract: The present invention provides a system and a method for mining a new kind of association rules called disjunctive association rules, where the antecedent or the consequent of a rule may contain disjuncts of terms (X∨Y or X⊕Y). Such rules are a natural generalisation of the kind of rules that have been mined hitherto. Furthermore, disjunctive association rules are generalised in the sense that the algorithm also mines rules which have disjunctions of conjuncts (C⇒(A∧B)∨(D∧E)). Since the number of combinations of disjuncts is explosive, we use clustering to find a generalized subset. The said clustering is preferably performed using agglomerative clustering methods for finding the greedy subset.
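
To make the notion of a disjunctive rule concrete, the sketch below scores a rule whose consequent is a disjunction of conjuncts, for example a → (b OR c), over a list of transactions. It is an illustration under assumptions, not the mining algorithm of the patent: the function name `rule_stats`, the use of the standard support and confidence measures, and the `exclusive` flag (true when exactly one disjunct holds, which matches exclusive-or for two disjuncts) are all hypothetical choices, and the clustering step is not shown.

```python
def rule_stats(transactions, antecedent, disjuncts, exclusive=False):
    """Support and confidence of the rule: antecedent -> (d1 OR d2 OR ...).

    Each disjunct is itself a set of items that must all be present (a conjunct).
    With exclusive=True the consequent holds only if exactly one disjunct holds.
    """
    antecedent = set(antecedent)
    covered = 0  # transactions that contain the antecedent
    matched = 0  # ... and that also satisfy the disjunctive consequent
    for items in transactions:
        items = set(items)
        if not antecedent <= items:
            continue
        covered += 1
        hits = sum(1 for d in disjuncts if set(d) <= items)
        satisfied = (hits == 1) if exclusive else (hits >= 1)
        if satisfied:
            matched += 1
    support = matched / len(transactions) if transactions else 0.0
    confidence = matched / covered if covered else 0.0
    return support, confidence


# Made-up market-basket data.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"a"}, {"b"}]
print(rule_stats(transactions, {"a"}, [{"b"}, {"c"}]))
print(rule_stats(transactions, {"a"}, [{"b"}, {"c"}], exclusive=True))
```

In this toy data the inclusive rule is satisfied by three of the four transactions containing `a`, and the exclusive variant by two of them.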

    Dynamically ranking nodes and labels in a hyperlinked database
    10.
    Type: Granted patent
    Legal status: Lapsed

    Publication No.: US07991755B2

    Publication date: 2011-08-02

    Application No.: US11015989

    Filing date: 2004-12-17

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30864

    Abstract: The World Wide Web (WWW) can be modelled as a labelled directed graph G(V,E,L), in which V is the set of nodes, E is the set of edges, and L is a label function that maps edges to labels. This model, when applied to the WWW, indicates that V is a set of hypertext documents or objects, E is a set of hyperlinks connecting the documents in V, and the edge-label function represents the anchor-text corresponding to the hyperlinks. One can find a probabilistic ranking of the nodes for any given label, a ranking of the labels for any given node, and rankings of labels and pages using flow based models. Further, the flows can be computed using sparse matrix operations.
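
The labelled directed graph G(V,E,L) can be illustrated with a small sketch. The code below stores each hyperlink as a (source, target, label) triple and ranks target nodes for a label, and labels for a node, by simple relative frequency. This counting scheme is a stand-in chosen for illustration; it is not the flow-based model or the sparse matrix computation described in the patent, and the function names are hypothetical.

```python
from collections import Counter


def _normalise(counts):
    """Turn raw counts into (key, probability) pairs, highest probability first."""
    total = sum(counts.values())
    ranked = [(key, count / total) for key, count in counts.items()]
    return sorted(ranked, key=lambda kv: (-kv[1], kv[0]))


def rank_nodes_for_label(edges, label):
    """Rank target nodes by the share of `label`-edges that point at them."""
    return _normalise(Counter(dst for _, dst, lab in edges if lab == label))


def rank_labels_for_node(edges, node):
    """Rank labels by their share among the edges pointing at `node`."""
    return _normalise(Counter(lab for _, dst, lab in edges if dst == node))


# Made-up hyperlinks: (source page, target page, anchor text).
edges = [
    ("a", "c", "news"),
    ("b", "c", "news"),
    ("a", "d", "news"),
    ("b", "d", "sports"),
]
print(rank_nodes_for_label(edges, "news"))
print(rank_labels_for_node(edges, "d"))
```

Here page `c` ranks first for the label "news" because two of the three "news" links point at it; ties are broken alphabetically.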
