Abstract:
An architecture scales up the non-negative matrix factorization (NMF) technique into a distributed NMF (DNMF) to handle large matrices, for example, at web scale, where a matrix can include millions or billions of data points. To analyze web-scale data, DNMF is applied through parallelism on distributed computer clusters, for example, clusters with thousands of machines. To maximize parallelism and data locality, matrices are partitioned along the short dimension. The probabilistic DNMF can employ not only Gaussian and Poisson NMF techniques, but also exponential NMF for modeling web dyadic data (e.g., the dwell time of a user on browsed web pages).
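The partitioned computation can be illustrated with a small single-process sketch. It assumes the Gaussian case with the standard multiplicative updates for A ≈ WH, and it splits A and W into row blocks as a stand-in for the partitioning mentioned above; the abstract does not spell out the exact scheme. All function names here (`dnmf`, `partial_stats`, and so on) are hypothetical, and the list of blocks simulates what separate machines would hold.

```python
import random

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def partial_stats(A_block, W_block):
    """Per-partition step: this block's contribution to W^T A and W^T W."""
    Wt = transpose(W_block)
    return matmul(Wt, A_block), matmul(Wt, W_block)

def update_H(A_blocks, W_blocks, H, eps=1e-9):
    """H <- H * (W^T A) / (W^T W H), with both products summed over partitions."""
    stats = [partial_stats(a, w) for a, w in zip(A_blocks, W_blocks)]
    WtA, WtW = stats[0]
    for p, q in stats[1:]:
        WtA, WtW = add(WtA, p), add(WtW, q)
    denom = matmul(WtW, H)
    return [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
            for hr, nr, dr in zip(H, WtA, denom)]

def update_W_block(A_block, W_block, H, eps=1e-9):
    """W <- W * (A H^T) / (W H H^T); uses only this block's rows plus the small H."""
    Ht = transpose(H)
    numer = matmul(A_block, Ht)
    denom = matmul(W_block, matmul(H, Ht))
    return [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
            for wr, nr, dr in zip(W_block, numer, denom)]

def squared_error(A_blocks, W_blocks, H):
    err = 0.0
    for a, w in zip(A_blocks, W_blocks):
        approx = matmul(w, H)
        err += sum((x - y) ** 2 for ar, pr in zip(a, approx) for x, y in zip(ar, pr))
    return err

def dnmf(A, k, n_parts=2, iters=50, seed=0):
    """Factor non-negative A (m x n) into W (m x k) and H (k x n)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    step = (m + n_parts - 1) // n_parts
    A_blocks = [A[i:i + step] for i in range(0, m, step)]
    W_blocks = [[[rng.random() + 0.1 for _ in range(k)] for _ in blk] for blk in A_blocks]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    errors = [squared_error(A_blocks, W_blocks, H)]
    for _ in range(iters):
        H = update_H(A_blocks, W_blocks, H)
        W_blocks = [update_W_block(a, w, H) for a, w in zip(A_blocks, W_blocks)]
        errors.append(squared_error(A_blocks, W_blocks, H))
    W = [row for blk in W_blocks for row in blk]
    return W, H, errors
```

In this sketch each partition only ever exchanges k-by-n and k-by-k summaries, which is one way to keep the large matrix local to the machine that stores it.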
Abstract:
The claimed subject matter provides a system or method for web crawling hidden files. An exemplary method comprises loading a web page with a browser agent, and executing any dynamic elements hosted on the web page using the browser agent to insert pre-determined values. A list of form controls may be retrieved from the web page using the browser agent, and the controls may be analyzed using a driver component. Form control values may be sent from the driver component to the browser agent, and an event may be submitted to the web page by the browser agent or scripted content may be run to trigger operations on the web page corresponding to the form control values. A URL may be generated for various form control values using a generalizer.
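The final step, generating a URL for each combination of form control values, might look roughly like the sketch below. The abstract does not say how the generalizer builds URLs, so this assumes a form submitted with GET whose controls map directly to query parameters. The function name `generate_urls` and the example address are hypothetical.

```python
from itertools import product
from urllib.parse import urlencode

def generate_urls(action_url, control_values):
    """Enumerate one URL per combination of candidate form control values.

    control_values maps a control name to the list of values to try.
    Assumes a GET form; a POST form would need a different representation.
    """
    names = sorted(control_values)  # fixed order so the output is reproducible
    urls = []
    for combo in product(*(control_values[name] for name in names)):
        query = urlencode(list(zip(names, combo)))
        urls.append(f"{action_url}?{query}")
    return urls

urls = generate_urls(
    "http://example.com/search",
    {"q": ["a", "b"], "lang": ["en"]},
)
```

The number of URLs grows as the product of the value-list lengths, so a real crawler would presumably limit which combinations it tries.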
Abstract:
Techniques are described for generating a statistical model from observed click chains. The model can be used to compute a probability that a document is relevant to a given search query. With the model, a probability of a user examining a given document in a given search result conditionally depends on: a probability that a preceding document in the given search result is examined by a user viewing the given search result; and a probability that the preceding document is clicked on by a user viewing the given search result, which conditionally depends directly on the probability that the preceding document is examined and on a probability of relevance of the preceding document.
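The dependency structure described above can be made concrete with a simplified forward pass over a ranked result list. This is only a sketch: the two continuation probabilities (`cont_no_click`, `cont_click`) and the assumption that an examined document is clicked with probability equal to its relevance are illustrative choices, not parameters given in the abstract.

```python
def examination_and_click_probs(relevance, cont_no_click=0.8, cont_click=0.5):
    """Marginal examination and click probabilities for each ranked document.

    relevance[i] is the assumed probability that document i is relevant.
    A document is clicked only if examined, with probability relevance[i].
    The next document is examined only if the current one was, with a
    continuation probability that depends on whether a click occurred.
    """
    p_exam = 1.0  # assumption: the first document is always examined
    exam, click = [], []
    for r in relevance:
        exam.append(p_exam)
        click.append(p_exam * r)
        p_exam *= (1.0 - r) * cont_no_click + r * cont_click
    return exam, click

exam, click = examination_and_click_probs([0.5, 0.5])
```

Fitting the model would go the other way, inferring relevance from observed clicks; the forward pass only shows how examination of one document depends on what happened at the preceding one.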
Abstract:
The claimed subject matter provides a system or method for web crawling hidden files. An exemplary method includes loading a web page with a browser agent, and executing any dynamic elements hosted on the web page using the browser agent to insert pre-determined values. A list of form controls may be retrieved from the web page using the browser agent, and the controls may be analyzed using a driver component. Form control values may be sent from the driver component to the browser agent, and an event may be submitted to the web page by the browser agent or scripted content may be run to trigger operations on the web page corresponding to the form control values. A URL may be generated for various form control values using a generalizer.
Abstract:
Probabilistic gradient boosted machines are described herein. A probabilistic gradient boosted machine can be utilized to learn a function based at least in part upon sets of observations of a target attribute that is common across a plurality of entities and feature vectors that are representative of such entities. The sets of observations are assumed to conform to a distribution function in the exponential family. The learned function is utilized to generate values that are employed to parameterize the distribution function, such that sets of observations can be predicted for different entities.
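As a rough illustration, the sketch below picks the Poisson distribution as the exponential-family member and boosts regression stumps on a single numeric feature. Both choices are assumptions made for the example; the abstract does not name a distribution or a weak learner. The learned function F(x) supplies the log of the Poisson rate for an entity with feature x, and all names are hypothetical.

```python
import math

def fit_stump(xs, targets):
    """Least-squares regression stump; requires at least two distinct x values."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate split: x <= t goes left
        left = [g for x, g in zip(xs, targets) if x <= t]
        right = [g for x, g in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((g - lm) ** 2 for g in left) + sum((g - rm) ** 2 for g in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def fit_poisson_gbm(xs, observations, rounds=100, lr=0.1):
    """xs[i] is entity i's feature; observations[i] is its list of observed counts."""
    total = sum(sum(ys) for ys in observations)
    count = sum(len(ys) for ys in observations)
    f0 = math.log(total / count)  # start from the pooled mean rate
    scores = [f0] * len(xs)
    stumps = []
    for _ in range(rounds):
        # Negative gradient of the Poisson negative log-likelihood with respect
        # to F, divided by the number of observations for each entity.
        residuals = [sum(ys) / len(ys) - math.exp(f)
                     for ys, f in zip(observations, scores)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lr * lv, lr * rv))
        scores = [f + (lr * lv if x <= t else lr * rv) for x, f in zip(xs, scores)]
    return f0, stumps

def predict_rate(model, x):
    """Poisson rate parameter predicted for an entity with feature x."""
    f0, stumps = model
    return math.exp(f0 + sum(lv if x <= t else rv for t, lv, rv in stumps))

model = fit_poisson_gbm(
    [0.0, 1.0, 2.0, 3.0],
    [[1, 2, 3], [2, 2], [9, 11], [10, 10, 10]],
)
```

With the predicted rate in hand, the full distribution of counts for a new entity follows from the Poisson formula, which is the sense in which the learned function parameterizes the distribution.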
Abstract:
A system is described herein that includes a preference deriver component that receives a predefined preference rule that indicates a hierarchy pertaining to entities belonging to a domain, wherein each of the entities has attributes and values for such attributes corresponding thereto, and wherein the preference deriver component outputs preferences between various subsets of entities based at least in part upon the preference rule. The system also includes a learning component that learns a computer-implemented ranker component that is configured to rank the entities belonging to the domain, wherein the learning component learns the computer-implemented ranker based at least in part upon the preferences between the various subsets of the entities output by the preference deriver component.
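The two components can be sketched as follows, under several assumptions that go beyond the abstract: the preference rule is an ordered list of values for one attribute, the derived preferences are pairs of entity subsets (preferred, less preferred), and the ranker is a linear scorer trained with a pairwise perceptron. The names `derive_preferences` and `learn_ranker` and the example data are hypothetical.

```python
def derive_preferences(entities, attribute, hierarchy):
    """Group entities by their attribute value and emit (preferred, other) subset pairs.

    hierarchy lists attribute values from most to least preferred.
    """
    rank = {value: i for i, value in enumerate(hierarchy)}
    groups = {}
    for name, attrs in entities.items():
        groups.setdefault(rank[attrs[attribute]], []).append(name)
    levels = sorted(groups)
    prefs = []
    for i, hi in enumerate(levels):
        for lo in levels[i + 1:]:
            prefs.append((sorted(groups[hi]), sorted(groups[lo])))
    return prefs

def learn_ranker(features, prefs, epochs=20, lr=0.1):
    """Pairwise perceptron: adjust weights until preferred entities score higher."""
    dim = len(next(iter(features.values())))
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in prefs:
            for b in better:
                for c in worse:
                    diff = [x - y for x, y in zip(features[b], features[c])]
                    if sum(wi * d for wi, d in zip(w, diff)) <= 0:
                        w = [wi + lr * d for wi, d in zip(w, diff)]
    return w

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

entities = {
    "a": {"tier": "gold"}, "b": {"tier": "silver"},
    "c": {"tier": "gold"}, "d": {"tier": "bronze"},
}
features = {"a": [3.0, 1.0], "b": [1.5, 1.0], "c": [2.5, 0.0], "d": [0.5, 0.0]}
prefs = derive_preferences(entities, "tier", ["gold", "silver", "bronze"])
w = learn_ranker(features, prefs)
ranking = sorted(features, key=lambda name: score(w, features[name]), reverse=True)
```

Keeping the two steps separate mirrors the description above: the rule is applied once to produce training preferences, and the learner never sees the rule itself.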
Abstract:
The present invention relates to a method for displaying information associated with a television program. The method includes: fetching a plurality of sequentially arranged program listings and corresponding program notes; generating an electronic program guide according to the program listings and corresponding program notes; and displaying the electronic program guide. The electronic program guide includes a program listing, a program note associated with the program listing, and at least one icon that the user can select to display the previous or the next program listing in the electronic program guide. The present invention further provides an electronic program guide and a processing apparatus for generating the electronic program guide. The electronic program guide can display information associated with TV programs in a more intuitive manner.