Abstract:
Systems and methods for the decision-theoretic control and optimization of crowd-sourced workflows utilize a computing device to map a workflow to complete a directive. The directive includes a utility function, and the workflow comprises an ordered task set. Decision points precede and follow each task in the task set, and each decision point may require (a) posting a call for workers to complete instances of tasks in the task set; (b) adjusting parameters of tasks in the task set; or (c) submitting an artifact generated by a worker as output. The computing device accesses a plurality of workers having capability parameters that describe the workers' respective abilities to complete tasks. The computing device implements the workflow by optimizing and/or selecting user-preferred choices at decision points according to the utility function and submits an artifact as output. The computing device may also implement a training phase to ascertain worker capability parameters.
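The choice made at a single decision point can be illustrated with a minimal one-step expected-utility sketch. This is not the disclosed method: the `Worker.skill` parameter, the fixed `gain` per successful task, the `task_cost`, and the linear utility used in the example are all hypothetical, and option (b), adjusting task parameters, is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Worker:
    name: str
    # Hypothetical capability parameter: probability that this worker
    # improves the artifact when given an improvement task.
    skill: float


def expected_utilities(
    quality: float,
    workers: List[Worker],
    utility: Callable[[float], float],
    task_cost: float,
    gain: float,
) -> Dict[str, float]:
    """Score the two actions considered here: submit now, or post one more task."""
    mean_skill = sum(w.skill for w in workers) / len(workers)
    improved = min(1.0, quality + gain)
    # Posting a task pays its cost up front; the artifact improves with
    # probability mean_skill and stays unchanged otherwise.
    post = (
        mean_skill * utility(improved)
        + (1.0 - mean_skill) * utility(quality)
        - task_cost
    )
    return {"submit": utility(quality), "post_task": post}


def choose_action(
    quality: float,
    workers: List[Worker],
    utility: Callable[[float], float],
    task_cost: float,
    gain: float,
) -> str:
    """Pick the action with the highest expected utility."""
    scores = expected_utilities(quality, workers, utility, task_cost, gain)
    return max(scores, key=scores.get)


def linear_utility(q: float) -> float:
    # Assumed utility function: 100 units of value for a perfect artifact.
    return 100.0 * q


if __name__ == "__main__":
    pool = [Worker("w1", 0.8), Worker("w2", 0.6)]
    print(choose_action(0.50, pool, linear_utility, task_cost=5.0, gain=0.2))
    print(choose_action(0.95, pool, linear_utility, task_cost=5.0, gain=0.2))
```

Under these assumptions a mediocre artifact is worth another round of work, while a nearly finished one is submitted because the expected gain no longer covers the task cost.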
Abstract:
A procedure is disclosed for automatically constructing wrappers that perform information extraction from sites, such as Internet resources, that display relevant information interspersed with extraneous text fragments such as HTML formatting commands or advertisements. The procedure has three basic steps. First, a set of example pages is collected by a subroutine named GatherExamples. GatherExamples is provided with information describing how to pose example queries to the site whose wrapper is to be learned. Second, these example pages are labeled by a subroutine named LabelExamples; that is, the information to be extracted from each example is identified for use in the third step. The LabelExamples subroutine uses a general framework for labeling pages with site-specific heuristics called recognizers, and it allows users to correct and modify the recognized instances. Finally, the labeled example pages are passed to a BuildWrapper subroutine, which constructs a wrapper.
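The three steps can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not the disclosed procedure: the wrapper here is a simple pair of left and right delimiter strings learned from the text surrounding each labeled instance, the site is a hypothetical function `fake_site`, and the recognizer is a hypothetical price regex. The interactive correction of labels by users is omitted.

```python
import os
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int]            # (start, end) offsets of one labeled instance
Labeled = Tuple[str, List[Span]]  # a page plus its labeled spans


def gather_examples(fetch: Callable[[str], str], queries: List[str]) -> List[str]:
    """Step 1: pose example queries to the site and collect the pages."""
    return [fetch(q) for q in queries]


def label_examples(pages: List[str],
                   recognizer: Callable[[str], List[Span]]) -> List[Labeled]:
    """Step 2: mark the text to extract on each page using a recognizer."""
    return [(page, recognizer(page)) for page in pages]


def common_suffix(strings: List[str]) -> str:
    """Longest string that every input ends with."""
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def build_wrapper(labeled: List[Labeled]) -> Callable[[str], List[str]]:
    """Step 3: learn delimiters shared by all labeled instances."""
    befores = [page[:start] for page, spans in labeled for start, _ in spans]
    afters = [page[end:] for page, spans in labeled for _, end in spans]
    left = common_suffix(befores)          # text that always precedes a value
    right = os.path.commonprefix(afters)   # text that always follows a value
    if not left or not right:
        raise ValueError("no consistent delimiters in the labeled examples")

    def wrapper(page: str) -> List[str]:
        results, pos = [], 0
        while True:
            i = page.find(left, pos)
            if i == -1:
                break
            start = i + len(left)
            j = page.find(right, start)
            if j == -1:
                break
            results.append(page[start:j])
            pos = j + len(right)
        return results

    return wrapper


def fake_site(query: str) -> str:
    # Hypothetical site: relevant data mixed with markup and an ad.
    return (f"<html><h1>{query}</h1><hr>Price: <b>${len(query)}.99</b>"
            "<br>Buy now!</html>")


def price_recognizer(page: str) -> List[Span]:
    # Hypothetical site-specific heuristic: anything that looks like a price.
    return [(m.start(), m.end()) for m in re.finditer(r"\$\d+\.\d\d", page)]


if __name__ == "__main__":
    pages = gather_examples(fake_site, ["tea", "coffee"])
    wrapper = build_wrapper(label_examples(pages, price_recognizer))
    print(wrapper(fake_site("cocoa")))
```

The learned wrapper no longer needs the recognizer: it extracts values from unseen pages purely by their position between the learned delimiters.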
Abstract:
This invention provides assistance to a user in accessing network-attached information sources. In one aspect, the invention is a method for intelligently routing a user query to information sources relevant to that query, extracting relevant data fields from received responses, and intelligently presenting the extracted data in order of estimated relevance. The system of this invention implements one or more steps of the method in a centralized or distributed manner on one or more network-attached computers. Further, this invention provides a novel language and implementation that facilitate descriptions of information source query and response formats that are easy to write and maintain.
Abstract:
This invention provides methods to locate and plan the retrieval of data from networked information sources in response to a user query. The methods utilize descriptions of the information sources, of the information domain of the sources, and of the query. The methods of this invention integrate both legacy systems and full relational databases with an efficient, domain-independent query-planning algorithm; reason about the capabilities of different information sources; handle partial goal satisfaction (i.e., gather as much data as possible when not everything the user requested can be gathered); and are sound, complete, and efficient.
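Reasoning about source capabilities and partial goal satisfaction can be illustrated with a simplified forward-chaining sketch. It is not the query-planning algorithm of the invention: the `Source` description format (attributes a source requires as input and attributes it provides), the source names, and the greedy selection rule are all assumptions made for the example, and the sketch does not prune sources that turn out to be irrelevant to the request.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Source:
    name: str
    requires: FrozenSet[str]  # attributes that must be known to query the source
    provides: FrozenSet[str]  # attributes the source returns


def plan(sources: List[Source],
         bound: Set[str],
         wanted: Set[str]) -> Tuple[List[str], Set[str], Set[str]]:
    """Return (ordered source names, attributes obtained, attributes missing)."""
    known = set(bound)
    steps: List[str] = []
    remaining = list(sources)
    progress = True
    while progress and not wanted <= known:
        progress = False
        for source in remaining:
            # A source is usable once all of its inputs are known, and
            # worth calling only if it adds at least one new attribute.
            if source.requires <= known and not source.provides <= known:
                steps.append(source.name)
                known |= source.provides
                remaining.remove(source)
                progress = True
                break
    # Partial goal satisfaction: report what was reachable and what was not.
    return steps, wanted & known, wanted - known


if __name__ == "__main__":
    catalog = [
        Source("directory", frozenset({"name"}), frozenset({"phone", "city"})),
        Source("weather", frozenset({"city"}), frozenset({"forecast"})),
        Source("credit", frozenset({"ssn"}), frozenset({"credit_score"})),
    ]
    steps, got, missing = plan(catalog, {"name"},
                               {"phone", "forecast", "credit_score"})
    print(steps, sorted(got), sorted(missing))
```

In this example the planner chains the directory into the weather source and still returns the phone number and forecast, even though the credit score cannot be retrieved without an input the query does not supply.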
Abstract:
This invention provides assistance to a user in accessing network-attached information sources. In one aspect, the invention is a method for intelligently routing a user query to information sources relevant to that query, extracting relevant data fields from received responses, and intelligently presenting the extracted data in order of estimated relevance. The system of this invention implements one or more steps of the method in a centralized or distributed manner on one or more network-attached computers. Further, this invention provides a novel language and implementation that facilitate descriptions of information source query and response formats that are easy to write and maintain.