System and method of feature selection for text classification using subspace sampling
    31.
    Invention patent (granted)
    Status: in force

    Publication No.: US08046317B2

    Publication date: 2011-10-25

    Application No.: US12006178

    Filing date: 2007-12-31

    IPC classification: G06N5/00

    Abstract: An improved system and method are provided for feature selection for text classification using subspace sampling. A text classifier generator may be provided for selecting a small set of features using subspace sampling from the corpus of training data, and for training a text classifier that uses the small set of features to classify texts. To select the small set of features, a subspace of features from the corpus of training data may be randomly sampled according to a probability distribution over the set of features, where the probability assigned to each feature is proportional to the squared Euclidean norm of the corresponding row of the left singular vectors of the feature matrix representing the corpus of training texts. The text classifier may then classify texts using only the relevant features among a very large number of training features.
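
The sampling distribution described in the abstract can be illustrated with a small sketch. It assumes a feature-by-document matrix `A` with one row per feature, and it keeps the full column space rather than a truncated set of singular vectors. Instead of an explicit SVD it builds an orthonormal basis with Gram-Schmidt; the squared row norms are the same, because both bases span the same column space. The names `leverage_scores` and `sample_features` and the toy matrix are illustrative assumptions, not taken from the patent.

```python
import random

def leverage_scores(A, tol=1e-12):
    """Squared row norms of an orthonormal basis for the column space of A.

    A is a list of rows: one row per feature, one column per document.
    """
    m, n = len(A), len(A[0])
    basis = []  # orthonormal vectors of length m
    for j in range(n):
        v = [A[i][j] for i in range(m)]
        for q in basis:  # remove components along earlier basis vectors
            dot = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - dot * qi for vi, qi in zip(v, q)]
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm > tol:  # skip columns that add no new direction
            basis.append([vi / norm for vi in v])
    return [sum(q[i] ** 2 for q in basis) for i in range(m)]

def sample_features(A, k, seed=0):
    """Draw k feature indices with probability proportional to the scores."""
    scores = leverage_scores(A)
    total = sum(scores)
    probs = [s / total for s in scores]
    rng = random.Random(seed)
    picks = rng.choices(range(len(A)), weights=probs, k=k)
    return probs, sorted(set(picks))

# Toy corpus: 4 features (rows) by 3 documents (columns).
A = [
    [2, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],  # a feature that never occurs gets probability 0
]
probs, selected = sample_features(A, k=20, seed=1)
print(probs, selected)
```

A feature whose row is all zeros receives probability zero and is never drawn, while features that dominate a direction of the column space are drawn most often.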

    Automatic Management of Networked Publisher-Subscriber Relationships
    32.
    Invention patent application
    Status: under examination (published)

    Publication No.: US20110208559A1

    Publication date: 2011-08-25

    Application No.: US12711873

    Filing date: 2010-02-24

    IPC classification: G06Q10/00 G06Q50/00 G06T11/20

    Abstract: A method for automatic management of networked publisher-subscriber relationships in an advertising server network. The method comprises steps for constructing a directed graph representation comprising at least one publisher node (e.g. an Internet property), at least one subscriber node (e.g. an Internet advertiser), at least one intermediary node (e.g. an Internet advertising agent), and at least one edge, wherein each edge is directly associated with at least one target predicate (e.g. an advertising target predicate). The directed graph representation is used in conjunction with an inverted index for retrieving a valid node list comprising only nodes having at least one target predicate that matches at least one event predicate. The event predicate (like any target predicate) may be an arbitrarily complex Boolean expression, and is used in producing a result node list comprising only nodes that both match the event predicate against an advertising target predicate and are reachable.
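
The combination of an inverted index and a reachability check can be sketched as follows. This is a simplified illustration, not the claimed method: predicates are restricted to conjunctions of attribute/value pairs (the abstract allows arbitrary Boolean expressions), and the node names, `EDGES` table, and function names are hypothetical.

```python
from collections import defaultdict, deque

# (source node, destination node, target predicate as attribute/value pairs)
EDGES = [
    ("publisher", "agent", {"country": "US"}),
    ("agent", "advertiser_a", {"country": "US", "topic": "sports"}),
    ("agent", "advertiser_b", {"topic": "finance"}),
    ("other_agent", "advertiser_c", {"country": "US"}),
]

def build_index(edges):
    """Inverted index: (attribute, value) -> ids of edges requiring it."""
    index = defaultdict(list)
    for edge_id, (_, _, predicate) in enumerate(edges):
        for item in predicate.items():
            index[item].append(edge_id)
    return index

def matching_edges(edges, index, event):
    """Edges whose every required attribute/value pair appears in the event."""
    hits = defaultdict(int)
    for item in event.items():
        for edge_id in index.get(item, ()):
            hits[edge_id] += 1
    return [e for e, count in hits.items() if count == len(edges[e][2])]

def reachable_matches(edges, index, event, start):
    """Nodes reachable from start by following only matching edges."""
    adjacency = defaultdict(list)
    for edge_id in matching_edges(edges, index, event):
        src, dst, _ = edges[edge_id]
        adjacency[src].append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return sorted(seen)

index = build_index(EDGES)
event = {"country": "US", "topic": "sports"}
print(reachable_matches(EDGES, index, event, "publisher"))
# advertiser_c matches the event but is not reachable from "publisher"
```

The index narrows the candidate edges before any graph traversal happens, so the breadth-first search only walks edges already known to match the event.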

    Using intra-document indices to improve XQuery processing over XML streams
    33.
    Invention patent (granted)
    Status: in force

    Publication No.: US07991786B2

    Publication date: 2011-08-02

    Application No.: US10723391

    Filing date: 2003-11-25

    IPC classification: G06F17/30

    CPC classification: G06F17/30911

    Abstract: A system and method for parsing documents in query processing comprises producing at least one index of a document written in a mark-up language, associating the index with the document, scanning the document, and selectively skipping portions of the document based on instructions from the index. Furthermore, the mark-up language comprises any of HTML and XML; the skipped portions of the document comprise portions irrelevant to the query; the index comprises a plurality of elements representing textual categories of the query; and the instructions match the elements to the query. If the elements do not match the query, then the parser uses the index to skip the portions of the document corresponding to the unmatched elements. Moreover, each of the elements corresponds to a position in the document, wherein the position comprises an end position, which determines where to resume scanning the document after skipping the portions of the document.
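
The skipping behaviour can be sketched with a toy index that records, for each element, its tag and its end offset. The document string, the regex-based `build_index`, and the restriction to non-nested elements without attributes are all assumptions made for illustration; the patent does not prescribe this representation.

```python
import re

DOC = "<lib><book>XQuery</book><cd>Jazz</cd><book>XML</book></lib>"

def build_index(doc, tags):
    """Map each element's start offset to (tag, end offset).

    Only handles flat, attribute-free elements; a real index would be
    produced by a proper parser.
    """
    index = {}
    for tag in tags:
        for match in re.finditer(rf"<{tag}>.*?</{tag}>", doc):
            index[match.start()] = (tag, match.end())
    return index

def scan(doc, index, wanted):
    """Scan the document, jumping over indexed elements the query ignores."""
    pos, kept, skipped = 0, [], 0
    while pos < len(doc):
        entry = index.get(pos)
        if entry is None:
            pos += 1  # no index entry here: keep scanning
            continue
        tag, end = entry
        if tag in wanted:
            kept.append(doc[pos:end])  # stand-in for actually parsing it
        else:
            skipped += end - pos  # characters never examined
        pos = end  # resume at the recorded end position
    return kept, skipped

index = build_index(DOC, ["book", "cd"])
kept, skipped = scan(DOC, index, wanted={"book"})
print(kept, skipped)
```

The end offset stored in the index is what lets the scanner resume immediately after an irrelevant element without reading its contents.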

    System and method for budgeted generalization search in hierarchies
    34.
    Invention patent (granted)
    Status: in force

    Publication No.: US07991769B2

    Publication date: 2011-08-02

    Application No.: US11483048

    Filing date: 2006-07-07

    IPC classification: G06F17/30

    CPC classification: G06F17/30646 G06F17/30864

    Abstract: An improved system and method are provided for searching a collection of objects that may be located in hierarchies of auxiliary information for retrieval of response objects. A framework to perform a generalization search in hierarchies may be used to generalize a search by moving up to a higher level in a hierarchy of taxonomies or to specialize a search by moving down to a lower level in the hierarchy of taxonomies. Once the system decides to enumerate response objects at a particular level of generalization, a budgeted generalization search may be used for enumerating a set of response objects within a budgeted cost.
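
One way to picture generalization under a budget is the sketch below. It assumes a single taxonomy stored as a child-to-parent map, a cost of one unit per enumerated object, and a policy of moving up one level whenever too few results were found. None of these choices comes from the patent; the taxonomy, object names, and `budgeted_search` are hypothetical.

```python
PARENT = {"sneakers": "shoes", "boots": "shoes", "shoes": "apparel", "apparel": None}
OBJECTS = {
    "sneakers": ["s1"],
    "boots": ["b1", "b2"],
    "shoes": ["sh1"],
    "apparel": ["a1", "a2"],
}

def subtree(node):
    """Yield a node followed by all of its descendants."""
    yield node
    for child, parent in PARENT.items():
        if parent == node:
            yield from subtree(child)

def budgeted_search(start, wanted, budget):
    """Enumerate up to `wanted` objects, generalizing upward as needed.

    Each enumerated object costs one unit; the search stops as soon as
    the budget is spent or enough objects have been collected.
    """
    results, seen, level = [], set(), start
    while level is not None and len(results) < wanted:
        for node in subtree(level):
            if node in seen:
                continue  # already enumerated at a more specific level
            seen.add(node)
            for obj in OBJECTS.get(node, []):
                if budget == 0 or len(results) == wanted:
                    return results
                budget -= 1
                results.append(obj)
        level = PARENT[level]  # generalize: move one level up
    return results

print(budgeted_search("sneakers", wanted=3, budget=10))
print(budgeted_search("sneakers", wanted=3, budget=2))
```

With a generous budget the search widens from "sneakers" to "shoes" to find three objects; with a budget of two it stops early and returns what it could afford.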

    Formal language and translator for parallel processing of data
    35.
    Invention patent (granted)
    Status: in force

    Publication No.: US07921416B2

    Publication date: 2011-04-05

    Application No.: US11551336

    Filing date: 2006-10-20

    IPC classification: G06F9/45

    CPC classification: G06F17/30427 G06F17/3041

    Abstract: The present invention, in an example embodiment, provides a special-purpose formal language and translator for the parallel processing of large databases in a distributed system. The special-purpose language has features of both a declarative programming language and a procedural programming language and supports the co-grouping of tables, each with an arbitrary alignment function, and the specification of procedural operations to be performed on the resulting co-groups. The language's translator translates a program in the language into optimized structured calls to an application programming interface for implementations of functionality related to the parallel processing of tasks over a distributed system. In an example embodiment, the application programming interface includes interfaces for MapReduce functionality, whose implementations are supplemented by the embodiment.
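
Co-grouping with a per-table alignment function can be shown in a few lines. This is a single-process sketch of the operation itself, assuming two small in-memory tables; it does not show the language or its translation into MapReduce calls, and the table contents and the `cogroup` helper are hypothetical.

```python
from collections import defaultdict

def cogroup(left, left_key, right, right_key):
    """Group rows of two tables by each table's own alignment function.

    Returns {key: (rows from left, rows from right)}.
    """
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[left_key(row)][0].append(row)
    for row in right:
        groups[right_key(row)][1].append(row)
    return dict(groups)

users = [("u1", "US"), ("u2", "DE")]                      # (user id, country)
clicks = [("ad7", "u1"), ("ad9", "u1"), ("ad1", "u3")]    # (ad id, user id)

# Each table is aligned on the user id, found in a different column.
groups = cogroup(users, lambda u: u[0], clicks, lambda c: c[1])

# Procedural step applied to every co-group: count clicks per known user.
clicks_per_user = {
    key: len(click_rows)
    for key, (user_rows, click_rows) in groups.items()
    if user_rows
}
print(clicks_per_user)
```

Unlike a join, a co-group keeps the rows of each table in separate bags under the shared key, so the follow-up procedural step can treat empty and non-empty sides differently.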

    Querying markup language data sources using a relational query processor
    37.
    Invention patent (granted)
    Status: in force

    Publication No.: US07844629B2

    Publication date: 2010-11-30

    Application No.: US11837567

    Filing date: 2007-08-13

    IPC classification: G06F17/30

    Abstract: An XML wrapper queries an XML document in an on-the-fly manner so that only parent nodes in the document that satisfy the query are extracted and then unnested. The parent nodes and associated descendant nodes are located using XPath expressions contained as options in data definition language (DDL) statements. The parent nodes satisfying the query and associated descendant nodes are extracted and stored outside of a database according to a relational schema. The wrapper facilitates applications that use conventional SQL queries and views to operate on the information stored according to the relational schema. The wrapper also responds to query optimizer requests for costs associated with queries against external data sources associated with the wrapper.
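
Extracting qualifying parent nodes and unnesting their descendants into rows can be sketched with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The sample document, the column layout, and the `unnest` function are assumptions for illustration; the wrapper in the patent takes its XPath expressions from DDL options instead.

```python
import xml.etree.ElementTree as ET

XML = """<orders>
  <order id="1" status="open"><item sku="A" qty="2"/><item sku="B" qty="1"/></order>
  <order id="2" status="closed"><item sku="C" qty="5"/></order>
</orders>"""

def unnest(xml_text, parent_xpath, child_xpath):
    """Return one (order id, sku, qty) row per child of each matching parent."""
    root = ET.fromstring(xml_text)
    rows = []
    for parent in root.findall(parent_xpath):      # only parents that satisfy the query
        for child in parent.findall(child_xpath):  # unnest the descendants
            rows.append((parent.get("id"), child.get("sku"), int(child.get("qty"))))
    return rows

rows = unnest(XML, "./order[@status='open']", "./item")
for row in rows:
    print(row)
```

Each nested item becomes its own row that repeats the parent's key, which is the shape ordinary SQL queries and views expect.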

    METHOD AND SYSTEM FOR SELECTING ADVERTISEMENTS
    39.
    Invention patent application
    Status: under examination (published)

    Publication No.: US20100121706A1

    Publication date: 2010-05-13

    Application No.: US12269365

    Filing date: 2008-11-12

    IPC classification: G06Q30/00 G06Q40/00 G06Q90/00

    Abstract: A system for selecting advertisements for a web page. The system includes an advertisement serving and optimization engine that receives an advertisement request. The advertisement serving and optimization engine evaluates the web page and identifies a content attribute based on the content of the web page. The advertisement serving and optimization engine accesses a database that stores an association between the content attribute and an advertisement attribute, where the advertisement attribute is not lexically related to the content attribute. An association engine is also in communication with the database to define and store the attribute association between the advertisement attribute and the content attribute. The association engine generates attribute associations by evaluating external data sources. The advertisement attribute is used to retrieve advertisement results for display on the web page.
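
The lookup path from page content to advertisements can be sketched as two table lookups. The association table, the ad inventory, and the word-matching used to find content attributes are all hypothetical stand-ins; the abstract says the associations are derived from external data sources, which this sketch does not attempt.

```python
# Content attribute -> advertisement attributes that share no words with it.
ASSOCIATIONS = {
    "marathon": ["energy drink"],
    "newborn": ["life insurance"],
}

# Advertisement attribute -> ad identifiers.
ADS = {
    "energy drink": ["ad-101"],
    "life insurance": ["ad-202"],
    "running shoes": ["ad-303"],
}

def content_attributes(page_text, vocabulary):
    """Known content attributes that appear as words on the page."""
    words = set(page_text.lower().split())
    return sorted(words & set(vocabulary))

def select_ads(page_text):
    """Follow content attribute -> ad attribute -> ads."""
    ads = []
    for content_attr in content_attributes(page_text, ASSOCIATIONS):
        for ad_attr in ASSOCIATIONS[content_attr]:
            ads.extend(ADS.get(ad_attr, []))
    return ads

print(select_ads("Training tips for your first marathon"))
```

A purely lexical matcher would look for ads tagged "marathon"; the stored association is what lets a page about marathons retrieve an ad for an unrelated term such as "energy drink".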
