-
Publication No.: US09773182B1
Publication date: 2017-09-26
Application No.: US13614858
Filing date: 2012-09-13
CPC classification: G06K9/3208, G06F17/2745
Abstract: A method and system for classifying document data is described. An exemplary method includes identifying a markup language document having a plurality of portions, determining a set of substantive content metrics and a set of noise metrics for each of the plurality of portions, calculating a noise-to-content ratio for each of the plurality of portions based on a corresponding set of substantive content metrics and a corresponding set of noise metrics, and removing noise from the markup language document using the noise-to-content ratio.
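Illustrative sketch (not taken from the patent): the abstract does not say which metrics are used, so the example below assumes a deliberately simple choice. Visible word count stands in for the substantive content metrics, and tag count plus link count stands in for the noise metrics. The names noise_to_content_ratio and remove_noise and the 0.5 threshold are hypothetical.

```python
from html.parser import HTMLParser


class _Counter(HTMLParser):
    """Collects assumed metrics: visible words (content), tags and links (noise)."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self.tags = 0
        self.links = 0

    def handle_starttag(self, tag, attrs):
        self.tags += 1
        if tag == "a":
            self.links += 1

    def handle_data(self, data):
        self.words += len(data.split())


def noise_to_content_ratio(portion):
    """Return (tags + links) / words for one markup fragment."""
    counter = _Counter()
    counter.feed(portion)
    counter.close()
    noise = counter.tags + counter.links
    # Guard against division by zero for portions with no visible text.
    return noise / max(counter.words, 1)


def remove_noise(portions, threshold=0.5):
    """Keep only portions whose ratio does not exceed the assumed threshold."""
    return [p for p in portions if noise_to_content_ratio(p) <= threshold]


portions = [
    "<p>This is a long paragraph of real article text here.</p>",
    "<div><a href='/a'>Home</a><a href='/b'>Login</a></div>",
]
kept = remove_noise(portions)
```

With these assumed metrics, the navigation fragment scores far above the threshold and is dropped, while the paragraph is kept. A real system would likely combine several metrics per portion, as the abstract implies.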
-
Publication No.: US09268858B1
Publication date: 2016-02-23
Application No.: US13534699
Filing date: 2012-06-27
Applicant: Sherif M. Yacoub, Dongmei Jia, Bernhard Wolkerstorfer, Nicholas Alan Tostenrude, Stephen Kang, Gerald J. Strode
Inventor: Sherif M. Yacoub, Dongmei Jia, Bernhard Wolkerstorfer, Nicholas Alan Tostenrude, Stephen Kang, Gerald J. Strode
IPC classification: G06F17/30
CPC classification: G06F17/30867, G06F17/30525, G06F17/30554, G06F17/30648, G06F17/30699
Abstract: Techniques are described for identifying potentially interesting portions of a content item to be provided as a preview of the content item for prospective purchasers, based on previously collected data associated with the content item. Portions of a content item may be identified as potentially interesting based on a number of annotations (e.g., highlights, bookmarks, notes, and shares) previously made by viewers of a digital version of the content item. Potentially interesting portions may also include portions which prior viewers spent more time viewing, portions related to identified interests of the potential buyer, portions that are identified as separable for particular categories of content, and/or portions that have been previously identified as associated with elements of the content such as character, plot, and/or keywords.
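Illustrative sketch (not taken from the patent): the abstract for US09268858B1 mentions selecting preview portions by the number of annotations prior viewers made. The example below assumes the simplest possible scoring, an unweighted sum of highlights, bookmarks, notes, and shares, and picks the top k portions. The Portion class, the pick_preview_portions function, and the sample counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Portion:
    """One portion of a content item with assumed per-type annotation counts."""

    portion_id: str
    highlights: int = 0
    bookmarks: int = 0
    notes: int = 0
    shares: int = 0

    def annotation_count(self):
        # Unweighted sum; a real system might weight annotation types differently.
        return self.highlights + self.bookmarks + self.notes + self.shares


def pick_preview_portions(portions, k=2):
    """Return the ids of the k most-annotated portions, most annotated first."""
    ranked = sorted(portions, key=lambda p: p.annotation_count(), reverse=True)
    return [p.portion_id for p in ranked[:k]]


# Made-up sample data for illustration only.
sample = [
    Portion("ch1", highlights=3),
    Portion("ch2", highlights=10, shares=2),
    Portion("ch3", notes=1),
]
preview = pick_preview_portions(sample, k=2)
```

The other signals named in the abstract (viewing time, buyer interests, category-specific separability, character or plot associations) are not modeled here; they could be added as further terms in the score.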
-