Abstract:
Provided are techniques for processing a query. A query is received, wherein the query is formed by one or more paths, and wherein each path includes one or more steps. A hierarchical document including one or more document nodes is received. While processing the query and traversing the hierarchical document, one or more extraction entries are constructed, wherein each extraction entry includes a step instance match candidate identifying a document node and a step instance ancestor path for the document node, and one or more tuples are constructed using the one or more extraction entries by associating the step instance match candidate from one of the one or more extraction entries with the step instance match candidate from at least one of the one or more other extraction entries.
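Below is a minimal Python sketch of the extraction-entry idea. It rests on simplifying assumptions that are not in the abstract. The document is a tree of hypothetical `Node` objects, and each query step is a plain element name on the descendant axis. A tuple pairs an outer-step candidate with an inner-step candidate when the outer node appears in the inner entry's ancestor path.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical document node: element name, unique id, children."""
    tag: str
    nid: int
    children: list = field(default_factory=list)

@dataclass
class ExtractionEntry:
    step: str              # query step the candidate matched
    candidate: int         # id of the matching document node
    ancestor_path: tuple   # ids of the node's ancestors, root first

def build_entries(root, steps):
    """Single traversal: record an entry for every node matching a step."""
    entries = []

    def visit(node, path):
        if node.tag in steps:
            entries.append(ExtractionEntry(node.tag, node.nid, path))
        for child in node.children:
            visit(child, path + (node.nid,))

    visit(root, ())
    return entries

def build_tuples(entries, outer_step, inner_step):
    """Associate outer-step candidates with inner-step candidates whose
    ancestor path contains them (an ancestor/descendant join)."""
    outers = [e for e in entries if e.step == outer_step]
    inners = [e for e in entries if e.step == inner_step]
    return [(o.candidate, i.candidate)
            for o in outers for i in inners
            if o.candidate in i.ancestor_path]
```

Consider a `lib` element that holds two `book` elements, with one and two `title` children respectively. For that input, `build_tuples(entries, "book", "title")` pairs each title with its own book only. Each entry stores its ancestor path, so the join can be decided without traversing the document a second time.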
Abstract:
Provided are techniques for processing a query. The query is received, and the query is formed by one or more paths, where each path includes one or more steps. A hierarchical document is received that includes one or more document nodes. While processing the query and traversing the hierarchical document to find document nodes described by at least one of the one or more steps of the query, a match graph is constructed that includes one or more match nodes. Each of the match nodes identifies a step instance and is associated with step instances that are ancestors and descendants of the identified step instance. Also, each of the match nodes is associated with a level. In addition, the match graph includes zero or more edges between the match nodes indicating relationships between the match nodes. The match nodes in the match graph are traversed from lower levels to higher levels to construct results for the query.
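The match-graph variant can be sketched in the same way. Two illustrative choices are made here that the abstract does not state. The level of a match node is the index of the query step it matches, and the query is a single descendant-axis path.

```python
from collections import defaultdict

def build_match_graph(root, steps):
    """root is a (tag, node_id, children) tuple; steps are the tags of a
    descendant-axis path such as //a//b. A match node is (level, node_id);
    each gets edges to its ancestor match nodes one level lower."""
    levels = defaultdict(list)   # level -> node ids matched at that level
    edges = defaultdict(set)     # (level, node_id) -> ancestor match nodes

    def visit(node, open_matches):
        tag, nid, children = node
        new = []
        for level, step in enumerate(steps):
            if tag != step:
                continue
            parents = {m for m in open_matches if m[0] == level - 1}
            if level == 0 or parents:
                levels[level].append(nid)
                edges[(level, nid)] |= parents
                new.append((level, nid))
        for child in children:
            visit(child, open_matches + new)

    visit(root, [])
    return levels, edges

def construct_results(levels, edges, num_steps):
    """Traverse the match graph from level 0 upward, extending partial results."""
    partial = {(0, nid): [(nid,)] for nid in levels[0]}
    for level in range(1, num_steps):
        partial = {
            (level, nid): [p + (nid,) for parent in edges[(level, nid)]
                           for p in partial[parent]]
            for nid in levels[level]
        }
    return sorted(r for rs in partial.values() for r in rs)
```

A document node that matches a step is stored once, with edges to every qualifying ancestor match. As a result, nested matches such as an `a` inside another `a` share one descendant match node rather than duplicating it.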
Abstract:
A system and method to facilitate classification and storage of events in a network are described. An event and associated content information are received from an entity over a network. The content information is further analyzed to determine one or more themes representing subject matter related to the content information. The event is further classified according to the themes into one or more corresponding categories. Finally, the event is stored into one or more corresponding databases of a data storage module according to the one or more corresponding categories.
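Here is a keyword-based sketch of the classify-then-store flow. The theme lexicon, the theme-to-category mapping, and the use of one SQLite database per category are all hypothetical. The abstract does not say how themes are determined.

```python
import re
import sqlite3

# Hypothetical theme lexicon: theme -> words that signal it in the content.
THEMES = {
    "music": {"concert", "band", "album"},
    "sports": {"match", "league", "score"},
}
# Hypothetical mapping from detected theme to storage category.
CATEGORY_FOR_THEME = {"music": "entertainment", "sports": "sports"}

def detect_themes(content):
    words = set(re.findall(r"[a-z]+", content.lower()))
    return {theme for theme, cues in THEMES.items() if words & cues}

def classify_and_store(event_id, content, databases):
    """Classify an event by the themes of its content and store it in the
    database of every matching category. databases: category -> connection."""
    categories = sorted({CATEGORY_FOR_THEME[t] for t in detect_themes(content)})
    for category in categories:
        conn = databases[category]
        conn.execute("CREATE TABLE IF NOT EXISTS events (id TEXT, content TEXT)")
        conn.execute("INSERT INTO events VALUES (?, ?)", (event_id, content))
        conn.commit()
    return categories
```

An event can carry several themes. In that case it is written to each corresponding database, which mirrors the "one or more corresponding categories" in the abstract.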
Abstract:
A system and method of generating bid values for sponsored search includes steps or acts of: receiving a bid phrase for an advertisement for an item, wherein the bid phrase specifies a search query for which the advertisement should be displayed; receiving first information at a first input/output interface, the first information related to a bidding behavior of the advertiser; receiving second information at a second input/output interface, the second information relating to a history of bids by other advertisers for the bid phrase; and generating a bid value for the bid phrase submitted for the advertisement for the search query, based on the information received.
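This is one possible way to combine the two information sources into a bid value. The blending weight, the use of the mean and the median, and the cap are hypothetical choices made for illustration. The abstract does not specify the generation model.

```python
from statistics import mean, median

def generate_bid(own_past_bids, competitor_bids, max_bid, weight=0.5):
    """Hypothetical heuristic: blend the advertiser's own bidding behavior
    (mean of its past bids; at least one is required) with the market for
    the phrase (median of other advertisers' bids), capped at max_bid."""
    own = mean(own_past_bids)
    if not competitor_bids:  # no history for this phrase: rely on own behavior
        return min(own, max_bid)
    market = median(competitor_bids)
    return min(weight * own + (1 - weight) * market, max_bid)
```

Take past bids of 1.0 and 1.5 and competitor bids of 0.5, 1.0 and 2.0. With the default weight this gives 0.5 · 1.25 + 0.5 · 1.0 = 1.125, unless `max_bid` is lower.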
Abstract:
Automatic generation of bid phrases for online advertising comprises storing a computer code representation of a landing page for use with a language model and a translation model (with a parallel corpus) to produce a set of candidate bid phrases that probabilistically correspond to the landing page and/or to web search phrases. Operations include extracting a set of raw candidate bid phrases from a landing page and generating a set of translated candidate bid phrases using a parallel corpus in conjunction with the raw candidate bid phrases. To score and/or reduce the number of candidate bid phrases, a translation table is used to capture the probability that a bid phrase from the raw candidate bid phrases is generated from a bid phrase in the set of translated candidate bid phrases. Scoring and ranking operations reduce the translated candidate bid phrases to those most relevant to the landing page inputs.
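The following is a toy version of the translation-table scoring step. It assumes that raw candidates are the page's unigrams and bigrams. It also assumes that `translation_table[t][r]` already holds P(r | t), the probability that raw phrase r is generated from translated phrase t. Learning that table from a parallel corpus, and the language model, are out of scope here.

```python
import re
from collections import Counter

def raw_candidates(page_text, max_n=2):
    """Raw candidate bid phrases: all 1..max_n-grams on the landing page."""
    words = re.findall(r"[a-z0-9]+", page_text.lower())
    grams = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            grams[" ".join(words[i:i + n])] += 1
    return grams

def rank_bid_phrases(page_text, translation_table, k=3):
    """Score each translated candidate t by sum_r P(r | t) * freq(r) / total,
    i.e. how well it explains the page's raw phrases; keep the top k > 0."""
    raw = raw_candidates(page_text)
    total = sum(raw.values())
    scores = {t: sum(p * raw[r] / total for r, p in probs.items() if r in raw)
              for t, probs in translation_table.items()}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, s in ranked[:k] if s > 0]
```

Candidates that explain none of the page's raw phrases score zero and are dropped. This is a crude stand-in for the pruning that the scoring and ranking operations perform.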
Abstract:
Described are a system and method for determining an event occurrence rate. A sample set of content items may be obtained. Each of the content items may be associated with at least one region in a hierarchical data structure. A first impression volume may be determined for the at least one region as a function of a number of impressions registered for the content items associated with the at least one region. A scale factor may be applied to the first impression volume to generate a second impression volume. The scale factor may be selected so that the second impression volume is within a predefined range of a third impression volume. A click-through-rate (CTR) may be estimated as a function of the second impression volume and a number of clicks on the content item.
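The arithmetic of the CTR estimate is sketched below under several assumptions:

- Regions are tuples naming a path in the hierarchy.
- A region's first impression volume includes every item at or below it.
- The scale factor is either the exact ratio or the first of a hypothetical list of candidate factors that lands within a relative tolerance of the third volume.

```python
def region_impressions(items, region):
    """items: (region_path, impressions) pairs, e.g. (("US", "CA"), 1200).
    The region's volume counts every item at or below it in the hierarchy."""
    return sum(imp for path, imp in items if path[:len(region)] == region)

def choose_scale_factor(first_volume, third_volume, tolerance=0.05, candidates=None):
    """Pick s so that s * first_volume is within a relative tolerance of
    third_volume. Without candidates, use the exact ratio."""
    if candidates is None:
        return third_volume / first_volume
    for s in candidates:
        if abs(s * first_volume - third_volume) <= tolerance * third_volume:
            return s
    raise ValueError("no candidate brings the scaled volume into range")

def estimate_ctr(clicks, first_volume, third_volume, **kwargs):
    """CTR estimated from clicks and the scaled (second) impression volume."""
    scale = choose_scale_factor(first_volume, third_volume, **kwargs)
    second_volume = scale * first_volume
    return clicks / second_volume
```

Suppose there are 2,000 sampled impressions under `("US",)`, a reference volume of 100,000, and candidate factors 10, 25, 50 and 100. The factor 50 is chosen, and 1,500 clicks then give an estimated CTR of 1,500 / 100,000 = 0.015.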
Abstract:
Provided is a method for modeling the cost of XML operators as well as relational operators. As with traditional relational cost estimation, a set of system catalog statistics that summarizes the XML data is exploited; in addition, the novel use of a set of simple path statistics is proposed. A new statistical learning technique called transform regression is used, instead of detailed analytical models, to predict the overall cost of an operator. Additionally, the query optimizer of a database can be made self-tuning, automatically adapting to changes over time in the query workload and in the system environment.
Abstract:
A system, method, and computer program product for updating a partitioned index of a dataset. A document is indexed by separating it into indexable sections, such that different ones of the indexable sections may be contained in different partitions of the partitioned index. The partitioned index is updated using an updated version of the document by updating only those sections of the index corresponding to sections of the document that have been updated in the updated version.
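The sketch below shows section-level updates to a partitioned inverted index. Three details are assumptions made for illustration: documents are split on blank lines, `(doc_id, section)` is hashed to choose a partition, and section text is compared to detect changes.

```python
import hashlib

class PartitionedIndex:
    """Hypothetical inverted index split into partitions. Each document is cut
    into sections (blank-line-separated paragraphs); each section lives in the
    partition chosen by hashing (doc_id, section number)."""

    def __init__(self, num_partitions=4):
        self.partitions = [{} for _ in range(num_partitions)]  # term -> {(doc, sec)}
        self.sections = {}  # (doc, sec) -> text currently indexed

    def _partition(self, key):
        digest = hashlib.sha1(f"{key[0]}:{key[1]}".encode()).digest()
        return self.partitions[digest[0] % len(self.partitions)]

    def _index(self, key, text, add):
        part = self._partition(key)
        for term in set(text.lower().split()):
            postings = part.setdefault(term, set())
            if add:
                postings.add(key)
            else:
                postings.discard(key)

    def update(self, doc_id, document):
        """(Re)index a document, touching only sections whose text changed.
        Returns the section numbers that were re-indexed."""
        new = document.split("\n\n")
        old_count = sum(1 for d, _ in self.sections if d == doc_id)
        touched = []
        for sec in range(max(len(new), old_count)):
            key = (doc_id, sec)
            old_text = self.sections.get(key)
            new_text = new[sec] if sec < len(new) else None
            if old_text == new_text:
                continue  # unchanged section: its partition is left alone
            if old_text is not None:
                self._index(key, old_text, add=False)
                del self.sections[key]
            if new_text is not None:
                self._index(key, new_text, add=True)
                self.sections[key] = new_text
            touched.append(sec)
        return touched

    def search(self, term):
        hits = set()
        for part in self.partitions:
            hits |= part.get(term.lower(), set())
        return sorted(hits)
```

Suppose a document is re-indexed after only its second paragraph changed. Then only section 1 is touched, so only the partition holding that section is rewritten.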
Abstract:
A system and method to facilitate mapping and storage of data within one or more data taxonomies are described. Content information is received over a network. The content information is further analyzed to determine at least one theme representing subject matter related to the content information. Finally, the content information is stored within respective predetermined categories organized within at least one taxonomy, the predetermined categories being associated with the at least one theme.
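Here is a sketch of mapping content into a hierarchical taxonomy. The taxonomy, the theme lexicon, and the rule that a leaf category accepts any content sharing one of its themes are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical taxonomy: nested categories; each leaf lists its associated themes.
TAXONOMY = {
    "Arts": {"Music": {"music"}, "Film": {"film"}},
    "Science": {"Astronomy": {"space"}, "Biology": {"biology"}},
}
# Hypothetical lexicon used to detect themes in free text.
LEXICON = {"music": {"album", "guitar"}, "film": {"director", "movie"},
           "space": {"orbit", "telescope"}, "biology": {"cell", "gene"}}

def themes_of(text):
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {theme for theme, cues in LEXICON.items() if words & cues}

def category_paths(tree, themes, prefix=()):
    """Yield the path of every leaf category associated with a detected theme."""
    for name, sub in tree.items():
        path = prefix + (name,)
        if isinstance(sub, dict):
            yield from category_paths(sub, themes, path)
        elif sub & themes:
            yield path

def store(content, repository):
    """Store content under each matching category path, e.g. 'Arts/Film'."""
    paths = sorted(category_paths(TAXONOMY, themes_of(content)))
    for path in paths:
        repository["/".join(path)].append(content)
    return paths
```

Content with several themes lands in several categories, which may even sit in different branches of the taxonomy. For example, a film about astronomy is filed under both `Arts/Film` and `Science/Astronomy`.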