Publication number: US11714954B1
Publication date: 2023-08-01
Application number: US17119465
Filing date: 2020-12-11
Applicant: AMAZON TECHNOLOGIES, INC.
Inventor: Vijay Daniel Manason, Sathya Prakash Podila Venkata Subramanya, Ansar Pasha, Meghana Agrawal, Mandar Subhashrao Joshi, Shrikant G Nayak, Sandeep Bhaskar, Antonisamy Arokiasamy, Navin Anand
IPC: G06F40/137, G06F40/143, G06F16/901, G06F16/958, G06F40/30
CPC classification number: G06F40/137, G06F16/9024, G06F16/986, G06F40/143, G06F40/30
Abstract: A webpage containing information to be extracted may undergo changes to a layout of elements that present the information. These changes could result in an inability to retrieve the information later. A first graph is determined that represents elements of a first version of a webpage at a first time. An element in the first graph for which information is being acquired is specified. A relevant portion of the first graph is designated that includes the element and immediate neighbors in the first graph. Later, a second version of the webpage is retrieved, and a second graph of that second version is determined. The relevant portion of the first graph is compared to the second graph. If a match is found, the information of interest is extracted from the specified element of the second graph. This allows extraction of information to proceed even if the layout of elements changes.
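Illustrative sketch (not taken from the patent): the flow described in the abstract could look roughly like the Python below. Everything here is an assumption made for illustration: the graph is simplified to the parent/child tree of HTML elements, a node's label is its tag plus `class` attribute, the "relevant portion" is the element's label together with the labels of its parent and children, and a match requires exact equality. The names `Node`, `GraphBuilder`, `relevant_portion`, `find_match`, and `follow_path`, as well as the sample pages `V1` and `V2`, are hypothetical. The parser assumes well-formed HTML without void tags such as `<br>`.

```python
from html.parser import HTMLParser


class Node:
    """One page element; graph edges are the parent/child links."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = ""


class GraphBuilder(HTMLParser):
    """Builds a tree-shaped graph (assumes well-formed HTML, no void tags)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_graph(html):
    builder = GraphBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def label(node):
    # Assumed labeling scheme: tag name plus class attribute.
    return (node.tag, node.attrs.get("class") or "")


def relevant_portion(node):
    # The element plus its immediate neighbors (parent and children).
    parent = label(node.parent) if node.parent else None
    children = tuple(sorted(label(c) for c in node.children))
    return (label(node), parent, children)


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def find_match(root, portion):
    # Return the first node whose neighborhood equals the stored portion.
    for node in walk(root):
        if relevant_portion(node) == portion:
            return node
    return None


def follow_path(root, path):
    # Fixed positional path (list of child indices) used to pick the element.
    node = root
    for index in path:
        node = node.children[index]
    return node


# Hypothetical first and second versions of a page; the layout changes.
V1 = ('<html><body><div class="product"><h1 class="title">Widget</h1>'
      '<span class="price">$10</span></div></body></html>')
V2 = ('<html><body><section><div class="ad">Buy now</div>'
      '<div class="product"><span class="price">$12</span>'
      '<h1 class="title">Widget</h1></div></section></body></html>')

first = build_graph(V1)
target = follow_path(first, [0, 0, 0, 1])   # the price element in V1
portion = relevant_portion(target)

second = build_graph(V2)
match = find_match(second, portion)
extracted = match.text if match else None
```

In this sketch the positional path `[0, 0, 0, 1]` selects the price element in `V1` but lands on a different element in `V2`, whereas comparing the stored neighborhood against the second graph still locates the price. A practical system would likely need approximate rather than exact matching; that is outside what this example shows.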