Publication number: US09443250B1
Publication date: 2016-09-13
Application number: US13859530
Filing date: 2013-04-09
Applicant: Google Inc.
Inventor: Fei Xiao, Cristos Goodrow
CPC classification number: G06Q30/0201, G06F17/30864, G06Q30/02, G06Q30/06, G06Q30/0641
Abstract: A learning module of an information retrieval system is configured to automatically learn the distinctive characteristics that different web sites use when presenting data variables of interest. The learned information can then be used to identify data variables of interest on arbitrary web pages of those web sites. In one embodiment, the learning process is guided by feeds provided by the web sites, which list values for data variables of interest, and by web pages also provided by the web sites. The feed values enable the learning module to identify candidate portions of the web pages that may represent a data variable of interest. Weights are computed for the different values of various properties of the candidate portions, aggregated over all the analyzed pages, and used to identify one of the candidate portions as the best candidate.
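
The learning process described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the property names (`tag`, `css_class`), the representation of a page as a list of dictionaries, the exact-match test against the feed value, and the scheme that splits one unit of credit evenly among the candidates found on a page. The abstract does not state how the weights are actually computed.

```python
from collections import defaultdict

# Hypothetical properties of a page portion; the abstract does not name any.
PROPERTIES = ("tag", "css_class")


def learn_weights(pages, feed):
    """Aggregate a weight per (property, value) pair over all analyzed pages.

    pages: {page_id: [portion dicts with "tag", "css_class", "text"]}
    feed:  {page_id: value of the data variable listed in the site's feed}
    """
    weights = defaultdict(float)
    for page_id, portions in pages.items():
        value = feed.get(page_id)
        if value is None:
            continue  # no feed entry, so this page cannot guide learning
        # Candidate portions: those whose text equals the feed value.
        candidates = [p for p in portions if p["text"] == value]
        if not candidates:
            continue
        # Assumed scheme: split one unit of credit among the candidates.
        share = 1.0 / len(candidates)
        for cand in candidates:
            for prop in PROPERTIES:
                weights[(prop, cand[prop])] += share
    return dict(weights)


def best_candidate(portions, weights):
    """Pick the portion whose property values carry the most learned weight."""
    def score(portion):
        return sum(weights.get((prop, portion[prop]), 0.0) for prop in PROPERTIES)
    return max(portions, key=score)


# Toy training data: on p1 the price also appears in a "related" block.
pages = {
    "p1": [
        {"tag": "span", "css_class": "price", "text": "19.99"},
        {"tag": "div", "css_class": "related", "text": "19.99"},
        {"tag": "h1", "css_class": "title", "text": "Widget"},
    ],
    "p2": [
        {"tag": "span", "css_class": "price", "text": "5.00"},
        {"tag": "div", "css_class": "related", "text": "7.50"},
    ],
}
feed = {"p1": "19.99", "p2": "5.00"}

weights = learn_weights(pages, feed)

# An arbitrary page from the same site, with no feed entry available.
new_page = [
    {"tag": "div", "css_class": "related", "text": "3.25"},
    {"tag": "span", "css_class": "price", "text": "12.00"},
]
best = best_candidate(new_page, weights)
print(best["text"])  # prints 12.00
```

In this toy run, the ambiguous page `p1` gives half a unit of credit to each of its two matching portions, while `p2` gives a full unit to its single match. After aggregation, the `span`/`price` combination outweighs `div`/`related`, so the sketch selects the `span` portion on the unseen page.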