摘要:
A system may determine a document inception date associated with a document, generate a score for the document based, at least in part, on the document inception date, and rank the document with regard to at least one other document based, at least in part, on the score.
摘要:
A system may determine a measure of how a content of a document changes over time, generate a score for the document based, at least in part, on the measure of how the content of the document changes over time, and rank the document with regard to at least one other document based, at least in part, on the score.
摘要:
A method may include receiving a document and an initial score for the document; determining that there has been a decrease in a rate or quantity of new links that point to the document over time; classifying the document as stale in response to the determining; decreasing the initial score for the document, resulting in an updated score; and ranking the document with regard to at least one other document based, at least in part, on the score.
摘要:
A system may determine time-varying behavior of links pointing to a document, generate a score for the document based, at least in part, on the time-varying behavior of the links pointing to the document, and rank the document with regard to at least one other document based, at least in part, on the score.
摘要:
Communications between entities who may share common interests. For entities determined to be sharing common interests (e.g., searching using the same terms or topics, browsing a page, a site or a groups of topically related sites), options for communication among the entities are provided. For example, a chat room may be dynamically created for persons who are currently searching or browsing the same or related information. As another example, a “homepage” may be created for each query and contain various types of information related to the query. A permission module controls which entities may participate, what types of information (and from what sources) an entity can (or desires to) receive, what types of information the entity may (or desires to) share.
摘要:
A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data.
摘要:
A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data.
摘要:
A large-scale data processing system and method for processing data in a distributed and parallel processing environment. The system includes an application-independent framework for processing data having a plurality of application-independent map modules and reduce modules. These application-independent modules use application-independent operators to automatically handle parallelization of computations across the distributed and parallel processing environment when performing user-specified data processing operations. The system also includes a plurality of user-specified, application-specific operators, for use with the application-independent framework to perform a user-specified data processing operation on a user-specified set of input files. The application-specific operators include: a map operator and a reduce operator. The map operator is applied by the application-independent map modules to input data in the user-specified set of input files to produce intermediate data values. The reduce operator is applied by the application-independent reduce modules to process the intermediate data values to produce final output data.
摘要:
Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition and other applications.
摘要:
Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.