-
Publication No.: US20110289182A1
Publication date: 2011-11-24
Application No.: US12783620
Filing date: 2010-05-20
Applicant: Xiao Kong, Shouqiu Yu, Wei Wang, Jiang-Ming Yang, Rui Cai, Haifeng Li, Xiaosong Yang
Inventor: Xiao Kong, Shouqiu Yu, Wei Wang, Jiang-Ming Yang, Rui Cai, Haifeng Li, Xiaosong Yang
CPC classification: G06F17/30781, G06F17/30864
Abstract: A classifier may be integrated into a pipeline of a general web crawler. The classifier may classify crawled webpages as either video pages or non-video pages. Video pages and information regarding domain importance may be aggregated. Ones of the domains of the video pages may be selected based on domain importance rankings. Webpages of the selected domains may be randomly sampled. The sampled webpages may be structurally analyzed and hint information may be generated with respect to each of the selected domains. The hint information may guide a deep crawling operation for discovering all video pages within the selected domains. Video links within the video pages may be found, one or more videos may be downloaded, and one or more representations of the one or more videos may be indexed.
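The domain selection and random sampling steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `select_domains` and `sample_pages`, the `top_k` cutoff, the numeric importance scores, and the fixed random seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from urllib.parse import urlparse


def select_domains(video_page_urls, domain_importance, top_k):
    """Group video pages by domain and keep the top_k most important domains.

    domain_importance maps a domain name to a score (hypothetical input);
    domains without a score are treated as having importance 0.0.
    """
    pages_by_domain = defaultdict(list)
    for url in video_page_urls:
        pages_by_domain[urlparse(url).netloc].append(url)
    ranked = sorted(
        pages_by_domain,
        key=lambda domain: domain_importance.get(domain, 0.0),
        reverse=True,
    )
    return {domain: pages_by_domain[domain] for domain in ranked[:top_k]}


def sample_pages(pages, sample_size, seed=0):
    """Randomly sample up to sample_size pages of one selected domain."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return rng.sample(pages, min(sample_size, len(pages)))
```

In this sketch the sampled pages would then be passed to whatever structural analysis produces the per-domain hint information; that step is not modeled here because the abstract does not describe it in detail.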
-
Publication No.: US08473574B2
Publication date: 2013-06-25
Application No.: US12783620
Filing date: 2010-05-20
Applicant: Xiao Kong, Shouqiu Yu, Wei Wang, Jiang-Ming Yang, Rui Cai, Haifeng Li, Xiaosong Yang
Inventor: Xiao Kong, Shouqiu Yu, Wei Wang, Jiang-Ming Yang, Rui Cai, Haifeng Li, Xiaosong Yang
CPC classification: G06F17/30781, G06F17/30864
Abstract: A classifier may be integrated into a pipeline of a general web crawler. The classifier may classify crawled webpages as either video pages or non-video pages. Video pages and information regarding domain importance may be aggregated. Ones of the domains of the video pages may be selected based on domain importance rankings. Webpages of the selected domains may be randomly sampled. The sampled webpages may be structurally analyzed and hint information may be generated with respect to each of the selected domains. The hint information may guide a deep crawling operation for discovering all video pages within the selected domains. Video links within the video pages may be found, one or more videos may be downloaded, and one or more representations of the one or more videos may be indexed.
-
Publication No.: US20120330952A1
Publication date: 2012-12-27
Application No.: US13166813
Filing date: 2011-06-23
Applicant: Xiao Kong, Wei Wang, Rui Cai, Haifeng Li, Yanfeng Sun
Inventor: Xiao Kong, Wei Wang, Rui Cai, Haifeng Li, Yanfeng Sun
IPC classification: G06F17/30
CPC classification: G06F17/3082, G06F17/30864
Abstract: Video entity templates defining common features that relate to various metadata types shared among a group of video Web pages are generated for target Web sites. Metadata associated with videos contained within Web pages belonging to a particular target Web site can then be automatically and accurately extracted using a video entity template generated for the particular target Web site. This metadata can then be indexed for use by video search applications in providing video search results.
-
Publication No.: US08645353B2
Publication date: 2014-02-04
Application No.: US13166810
Filing date: 2011-06-23
Applicant: Xiao Kong, Wei Wang, Rui Cai, Haifeng Li, Yanfeng Sun
Inventor: Xiao Kong, Wei Wang, Rui Cai, Haifeng Li, Yanfeng Sun
IPC classification: G06F17/30
CPC classification: G06F17/30047, G06F17/30864, G06F17/30867
Abstract: Anchor images and information associated therewith are accumulated during a Web crawling operation. One or more rules are applied to the accumulated candidate anchor images to filter out candidate anchor images that are not appropriate for use as the anchor image for a particular target video. The remaining candidate anchor image is then selected as the anchor image for the particular video.
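The rule-based filtering described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the `AnchorCandidate` fields, the two example rules (`big_enough`, `not_shared`), their thresholds, and the choice to take the first surviving candidate when several remain are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AnchorCandidate:
    image_url: str
    width: int
    height: int
    linked_video_count: int  # hypothetical: number of distinct videos the image links to


Rule = Callable[[AnchorCandidate], bool]


def big_enough(candidate: AnchorCandidate) -> bool:
    """Hypothetical rule: reject icons and other very small images."""
    return candidate.width >= 80 and candidate.height >= 60


def not_shared(candidate: AnchorCandidate) -> bool:
    """Hypothetical rule: reject images that link to more than one video."""
    return candidate.linked_video_count == 1


def select_anchor(candidates: List[AnchorCandidate],
                  rules: List[Rule]) -> Optional[AnchorCandidate]:
    """Drop candidates that fail any rule, then return the first survivor (or None)."""
    remaining = [c for c in candidates if all(rule(c) for rule in rules)]
    return remaining[0] if remaining else None
```

Expressing each rule as a plain predicate keeps the filter list easy to extend, which seems consistent with the abstract's "one or more rules" wording.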
-
Publication No.: US08645354B2
Publication date: 2014-02-04
Application No.: US13166813
Filing date: 2011-06-23
Applicant: Xiao Kong, Wei Wang, Rui Cai, Haifeng Li, Yanfeng Sun
Inventor: Xiao Kong, Wei Wang, Rui Cai, Haifeng Li, Yanfeng Sun
IPC classification: G06F17/30
CPC classification: G06F17/3082, G06F17/30864
Abstract: Video entity templates defining common features that relate to various metadata types shared among a group of video Web pages are generated for target Web sites. Metadata associated with videos contained within Web pages belonging to a particular target Web site can then be automatically and accurately extracted using a video entity template generated for the particular target Web site. This metadata can then be indexed for use by video search applications in providing video search results.
-
Publication No.: US20120330922A1
Publication date: 2012-12-27
Application No.: US13166810
Filing date: 2011-06-23
Applicant: Xiao Kong, Wei Wang, Rui Cai, Haifeng Li, Yanfeng Sun
Inventor: Xiao Kong, Wei Wang, Rui Cai, Haifeng Li, Yanfeng Sun
IPC classification: G06F17/30
CPC classification: G06F17/30047, G06F17/30864, G06F17/30867
Abstract: Anchor images and information associated therewith are accumulated during a Web crawling operation. One or more rules are applied to the accumulated candidate anchor images to filter out candidate anchor images that are not appropriate for use as the anchor image for a particular target video. The remaining candidate anchor image is then selected as the anchor image for the particular video.
-
Publication No.: US08606780B2
Publication date: 2013-12-10
Application No.: US13179258
Filing date: 2011-07-08
Applicant: Rui Hu, Xin-Jing Wang, Juan Xu, Xiao Kong
Inventor: Rui Hu, Xin-Jing Wang, Juan Xu, Xiao Kong
CPC classification: G06F17/30265
Abstract: Search queries for images are received from users. An original order of responsive images to the query is determined. Duplicate images and words associated with the duplicate images are identified for each of the responsive images. Common words associated with the duplicate images are identified. The responsive images are annotated with the common words and an annotated order is determined. A re-ranked order is determined based on the original order and the annotated order. Responsive images are presented to the user in the re-ranked order.
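The re-ranking flow in this abstract can be illustrated with a short Python sketch. The abstract does not say how common words are chosen, how the annotated order is scored, or how the two orders are combined, so everything specific here is an assumption: the `min_share` threshold, scoring by overlap with the query words, and the linear blend controlled by `weight` are hypothetical choices, as are the names `common_words` and `rerank`.

```python
from collections import Counter


def common_words(duplicate_texts, min_share=0.5):
    """Return words found in at least min_share of the texts attached to duplicates."""
    if not duplicate_texts:
        return set()
    counts = Counter(
        word for text in duplicate_texts for word in set(text.lower().split())
    )
    threshold = min_share * len(duplicate_texts)
    return {word for word, n in counts.items() if n >= threshold}


def rerank(query, original_order, duplicate_texts_by_image, weight=0.5):
    """Blend the original rank with a rank derived from common-word annotations.

    weight=1.0 reproduces the original order; weight=0.0 uses only the
    annotated order. Both the scoring and the blend are assumptions.
    """
    query_words = set(query.lower().split())
    annotations = {
        image: common_words(duplicate_texts_by_image.get(image, []))
        for image in original_order
    }
    # Annotated order: more query words among the annotations ranks higher.
    # sorted() is stable, so ties keep their original relative order.
    annotated_order = sorted(
        original_order, key=lambda image: -len(annotations[image] & query_words)
    )
    original_rank = {image: i for i, image in enumerate(original_order)}
    annotated_rank = {image: i for i, image in enumerate(annotated_order)}
    return sorted(
        original_order,
        key=lambda image: weight * original_rank[image]
        + (1 - weight) * annotated_rank[image],
    )
```

A rank-based blend is used here only because it needs no score calibration between the two orderings; any other combination of the original and annotated orders would fit the abstract equally well.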