Patent search: ap:("EVERNOTE CORP") AND inv:"COARNA GABRIEL ALEXANDRU", page 1

1.

Invention publication
EXTRACTING PRINCIPAL CONTENT FROM WEB PAGES (status: under examination, published)
Title translation: Extracting principal content from websites

Publication (announcement) number: EP2776945A4

Publication (announcement) date: 2015-05-27

Application number: EP12847034

Filing date: 2012-11-07

Applicant: EVERNOTE CORP

Inventors: BIGNERT JAKOB, COARNA GABRIEL ALEXANDRU

IPC classification: G06F17/30

CPC classification: G06F17/3089, G06F17/30707

Abstract: Extracting principal content from Web pages includes identifying and classifying items on the Web page, building a list of candidates, calculating candidate scores, selecting a top score candidate, performing clean up processing for the top score candidate, and performing final page processing for the top score candidate. Candidate scores may vary according to a number of paragraphs and images grouped according to size. A word length of CJK (Chinese-Japanese-Korean) text may be determined according to punctuation therein. Candidate scores may be modified according to a number of containers and pieces, where a container is a Web page element that is associated with the tags 'body', 'div', 'td', 'li', 'article/section', and pieces are candidates that do not include other candidates. Candidate scores may be modified according to a number of ratios corresponding to text and link density.
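
The scoring and selection steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the `Candidate` fields, the weights in `score`, and the way link density is applied are all assumptions chosen for illustration, since the abstract does not state any formulas. The sketch only shows the general idea of building a candidate list, scoring each candidate from paragraph and image counts, reducing the score for link-heavy elements, and selecting the top score candidate.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one container element on a page."""
    tag: str          # e.g. 'body', 'div', 'td', 'li', 'article'
    text_chars: int   # visible text characters in the element
    link_chars: int   # characters that sit inside links
    paragraphs: int   # number of paragraphs in the element
    images: int       # number of images in the element


def link_density(c: Candidate) -> float:
    """Share of the element's text that is link text (1.0 if no text)."""
    return c.link_chars / c.text_chars if c.text_chars else 1.0


def score(c: Candidate) -> float:
    """Illustrative score; the weights are assumptions, not from the patent."""
    base = c.paragraphs * 10 + c.images * 5 + c.text_chars / 100
    # Link-heavy elements (menus, footers) are scaled down.
    return base * (1.0 - link_density(c))


def select_top(candidates: list[Candidate]) -> Candidate:
    """Return the candidate with the highest score."""
    return max(candidates, key=score)


candidates = [
    Candidate("div", text_chars=400, link_chars=360, paragraphs=1, images=0),
    Candidate("article", text_chars=3000, link_chars=150, paragraphs=12, images=2),
    Candidate("li", text_chars=80, link_chars=80, paragraphs=0, images=0),
]
top = select_top(candidates)
print(top.tag)
```

With these made-up inputs, the navigation-like `div` and the link-only `li` are penalized by their high link density, so the `article` element is selected. The clean up and final page processing steps in the abstract are not covered by this sketch.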