Patent search ap:"Paul A. Viola", page 1

1.

Invention application
IDENTIFICATION OF PEOPLE USING MULTIPLE TYPES OF INPUT (in force)

Publication number: US20110313766A1

Publication date: 2011-12-22

Application number: US13221640

Filing date: 2011-08-30

Applicants: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui

Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui

IPC classification: G10L17/00

CPC classification: G06K9/6256, G06K9/4614, G10L25/78, G10L2021/02166, H04N7/147, H04N7/15, H04N21/42203, H04N21/4223, H04N21/4394, H04N21/44008, H04N21/44213, H04N21/4788

Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.

2.

Invention grant
Spatial recognition and grouping of text and graphics (lapsed)

Publication number: US07729538B2

Publication date: 2010-06-01

Application number: US10927452

Filing date: 2004-08-26

Applicants: Michael Shilman, Paul A. Viola, Kumar H. Chellapilla

Inventors: Michael Shilman, Paul A. Viola, Kumar H. Chellapilla

IPC classification: G06K9/00

CPC classification: G06K9/726, G06K9/00402, G06K9/344, G06K9/4614, G06K2209/01

Abstract: The present invention leverages spatial relationships to provide a systematic means to recognize text and/or graphics. This allows augmentation of a sketched shape with its symbolic meaning, enabling numerous features including smart editing, beautification, and interactive simulation of visual languages. The spatial recognition method obtains a search-based optimization over a large space of possible groupings from simultaneously grouped and recognized sketched shapes. The optimization utilizes a classifier that assigns a class label to a collection of strokes. The overall grouping optimization assumes the properties of the classifier so that if the classifier is scale and rotation invariant the optimization will be as well. Instances of the present invention employ a variant of AdaBoost to facilitate in recognizing/classifying symbols. Instances of the present invention employ dynamic programming and/or A-star search to perform optimization. The present invention applies to both hand-sketched shapes and printed handwritten text, and even heterogeneous mixtures of the two.

3.

Invention grant
Application of grammatical parsing to visual recognition tasks (in force)

Publication number: US07639881B2

Publication date: 2009-12-29

Application number: US11151708

Filing date: 2005-06-13

Applicants: Paul A. Viola, Michael Shilman

Inventors: Paul A. Viola, Michael Shilman

IPC classification: G06K9/00, G06K9/34, G06K9/62, G06K9/72

CPC classification: G06K9/726, G06F17/271, G06K2209/01

Abstract: Image recognition is utilized to facilitate in scoring parse trees for two-dimensional recognition tasks. Trees and subtrees are rendered as images and then utilized to determine parsing scores. Other instances of the subject invention can incorporate additional features such as stroke curvature and/or nearby white space as rendered images as well. Geometric constraints can also be employed to increase performance of a parsing process, substantially improving parsing speed, some even resolvable in polynomial time. Additional performance enhancements can be achieved in yet other instances of the subject invention by employing constellations of integral images and/or integral images of document features.
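
The abstract above mentions integral images without defining them. As a rough illustration only, the standard summed-area-table construction can be sketched as follows; the function names `integral_image` and `rect_sum` are hypothetical and are not taken from the patent, which may compute its features differently.

```python
def integral_image(img):
    """Build a summed-area table with an extra leading row and column of zeros.

    ii[y][x] holds the sum of all img values in rows < y and columns < x.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of img over rows top..bottom-1 and columns left..right-1.

    Uses four table lookups regardless of the rectangle size.
    """
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


if __name__ == "__main__":
    page = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
    table = integral_image(page)
    print(rect_sum(table, 1, 1, 3, 3))  # sum of the lower-right 2x2 block
```

Once the table is built, any rectangular sum costs the same four lookups, which is why such tables are commonly used to evaluate many rectangle features quickly.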

4.

Invention application
Image Organization (in force)

Publication number: US20080226174A1

Publication date: 2008-09-18

Application number: US11725129

Filing date: 2007-03-15

Applicants: Gang Hua, Steven M. Drucker, Michael Revow, Paul A. Viola, Richard Zemel

Inventors: Gang Hua, Steven M. Drucker, Michael Revow, Paul A. Viola, Richard Zemel

IPC classification: G06K9/68, G06K9/46

CPC classification: G06K9/00228, G06K9/6251

Abstract: A system for organizing images includes an extraction component that extracts visual information (e.g., faces, scenes, etc.) from the images. The extracted visual information is provided to a comparison component which computes similarity confidence data between the extracted visual information. The similarity confidence data is an indication of the likelihood that items of extracted visual information are similar. The comparison component then generates a visual distribution of the extracted visual information based upon the similarity confidence data. The visual distribution can include groupings of the extracted visual information based on computed similarity confidence data. For example, the visual distribution can be a two-dimensional layout of faces organized based on the computed similarity confidence data, with faces in closer proximity faces computed to have a greater probability of representing the same person. The visual distribution can then be utilized by a user to sort, organize and/or tag images.

5.

Invention grant
Methods and apparatus for populating electronic forms from scanned documents (in force)

Publication number: US07305129B2

Publication date: 2007-12-04

Application number: US10808194

Filing date: 2004-03-24

Applicants: Kumar H. Chellapilla, Cormac E. Herley, Trausti T. Kristjansson, Paul A. Viola

Inventors: Kumar H. Chellapilla, Cormac E. Herley, Trausti T. Kristjansson, Paul A. Viola

IPC classification: G06K9/46

CPC classification: G06K9/033, G06F17/243, G06K9/00449, G06K9/2054, G06K2209/01

Abstract: A computer-implemented method and apparatus are provided for populating an electronic form from an electronic image. The method and apparatus identify a size, orientation and position of an object within the electronic image, and identify information elements from pixels within the image that correspond to the object. Fields of the electronic form are displayed to a user along with the identified information elements through a graphical user interface. The information elements are parsed into tagged groups of different information types. At least some of the fields of the electronic form are populated with the tagged groups to produce a populated form. The user is allowed to edit the populated fields through the graphical user interface.

6.

Invention grant
Method and system for providing an audio element cache in a customized personal radio broadcast (lapsed)

Publication number: US06985694B1

Publication date: 2006-01-10

Application number: US09656884

Filing date: 2000-09-07

Applicants: Jeremy S. De Bonet, Paul A. Viola

Inventors: Jeremy S. De Bonet, Paul A. Viola

IPC classification: H04H1/00

CPC classification: H04H20/40, H04H60/66

Abstract: An audio element cache is provided that is capable of caching audio elements for each user in a personal radio server system. In operation, customized radio content is provided to remote listeners in a personal radio server system by: storing a plurality of audio elements in a file server; retrieving a subset of the plurality of audio elements from the file server by predicting the content desired by a remote listener based on a user profile of the remote listener; storing the subset of the plurality of audio elements in an audio element cache; selecting audio elements to provide to a remote listener from the audio element cache; and transmitting the audio elements to the remote listener. In an embodiment, the plurality of audio elements are stored in the audio element cache when a remote listener logs-on the personal radio server system.
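
The store, predict, cache, select and transmit steps listed in the abstract above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the class name `AudioElementCache`, the dictionary-based file server, the genre-weight user profile and the ranking rule are hypothetical and are not described in the patent record.

```python
class AudioElementCache:
    """Toy per-user cache that is filled when a listener logs on."""

    def __init__(self, file_server, capacity_per_user=3):
        # file_server: dict mapping element id -> (genre, payload); assumed layout
        self.file_server = file_server
        self.capacity = capacity_per_user
        self.cache = {}  # user id -> {element id: payload}

    def on_login(self, user_id, profile):
        """Prefetch the elements predicted to be wanted by this listener.

        profile maps genre -> preference weight (a stand-in for the
        user profile mentioned in the abstract).
        """
        ranked = sorted(
            self.file_server.items(),
            key=lambda item: profile.get(item[1][0], 0.0),
            reverse=True,
        )
        self.cache[user_id] = {
            eid: payload for eid, (genre, payload) in ranked[: self.capacity]
        }

    def next_element(self, user_id):
        """Select one cached element to transmit, or None if nothing is cached."""
        user_cache = self.cache.get(user_id, {})
        if not user_cache:
            return None
        eid = next(iter(user_cache))  # highest-ranked remaining element
        return eid, user_cache.pop(eid)


if __name__ == "__main__":
    server = {"a": ("jazz", b"A"), "b": ("rock", b"B"), "c": ("news", b"C")}
    cache = AudioElementCache(server, capacity_per_user=2)
    cache.on_login("listener-1", {"rock": 0.9, "jazz": 0.5})
    print(cache.next_element("listener-1"))
```

The prediction step is reduced here to sorting by a single preference weight; a real system would presumably use a richer model of the listener.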

7.

Invention grant
Identification of people using multiple types of input (in force)

Publication number: US08510110B2

Publication date: 2013-08-13

Application number: US13546153

Filing date: 2012-07-11

Applicants: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui

Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui

IPC classification: G10L15/00

CPC classification: G06K9/6256, G06K9/4614, G10L25/78, G10L2021/02166, H04N7/147, H04N7/15, H04N21/42203, H04N21/4223, H04N21/4394, H04N21/44008, H04N21/44213, H04N21/4788

Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.

8.

Invention grant
Grammatical parsing of document visual structures (in force)

Publication number: US08249344B2

Publication date: 2012-08-21

Application number: US11173280

Filing date: 2005-07-01

Applicants: Paul A. Viola, Michael Shilman

Inventors: Paul A. Viola, Michael Shilman

IPC classification: G06K9/34, G06K9/72

CPC classification: G06K9/726, G06F17/271, G06K2209/01

Abstract: A two-dimensional representation of a document is leveraged to extract a hierarchical structure that facilitates recognition of the document. The visual structure is grammatically parsed utilizing two-dimensional adaptations of statistical parsing algorithms. This allows recognition of layout structures (e.g., columns, authors, titles, footnotes, etc.) and the like such that structural components of the document can be accurately interpreted. Additional techniques can also be employed to facilitate document layout recognition. For example, grammatical parsing techniques that utilize machine learning, parse scoring based on image representations, boosting techniques, and/or “fast features” and the like can be employed to facilitate in document recognition.

9.

Invention grant
Face recognition using discriminatively trained orthogonal tensor projections (in force)

Publication number: US07936906B2

Publication date: 2011-05-03

Application number: US11763909

Filing date: 2007-06-15

Applicants: Gang Hua, Paul A Viola, Steven M. Drucker, Michael Revow

Inventors: Gang Hua, Paul A Viola, Steven M. Drucker, Michael Revow

IPC classification: G06K9/00

CPC classification: G06K9/00288, G06K9/6232

Abstract: Systems and methods are described for face recognition using discriminatively trained orthogonal rank one tensor projections. In an exemplary system, images are treated as tensors, rather than as conventional vectors of pixels. During runtime, the system designs visual features, embodied as tensor projections, that minimize intraclass differences between instances of the same face while maximizing interclass differences between the face and faces of different people. Tensor projections are pursued sequentially over a training set of images and take the form of a rank one tensor, i.e., the outer product of a set of vectors. An exemplary technique ensures that the tensor projections are orthogonal to one another, thereby increasing ability to generalize and discriminate image features over conventional techniques. Orthogonality among tensor projections is maintained by iteratively solving an ortho-constrained eigenvalue problem in one dimension of a tensor while solving unconstrained eigenvalue problems in additional dimensions of the tensor.
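
The abstract above describes a projection as a rank one tensor, the outer product of a set of vectors. For a two-dimensional image this can be sketched as below. The sketch only shows how a rank-one projection is evaluated and why orthogonality in one dimension is enough to make two projections orthogonal; it does not attempt the discriminative training or the eigenvalue problems, and the helper names `outer`, `inner` and `project_rank_one` are hypothetical.

```python
def outer(u, v):
    """Rank-one matrix u v^T built from two vectors."""
    return [[ui * vj for vj in v] for ui in u]


def inner(a, b):
    """Elementwise (Frobenius) inner product of two equally sized matrices."""
    return sum(a[i][j] * b[i][j] for i in range(len(a)) for j in range(len(a[0])))


def project_rank_one(image, u, v):
    """Scalar feature u^T X v, computed without forming the outer product."""
    return sum(
        u[i] * image[i][j] * v[j] for i in range(len(u)) for j in range(len(v))
    )


if __name__ == "__main__":
    image = [[1, 2],
             [3, 4]]
    u, v = [1, 1], [1, -1]
    # Both expressions give the same feature value.
    print(project_rank_one(image, u, v), inner(image, outer(u, v)))

    # Two rank-one projections whose first-dimension vectors are orthogonal
    # are themselves orthogonal, whatever the second-dimension vectors are.
    p1 = outer([1, 1], [1, 2])
    p2 = outer([1, -1], [3, 4])
    print(inner(p1, p2))
```

The second check relies on the identity that the inner product of u1 v1^T and u2 v2^T equals (u1 · u2)(v1 · v2), so a zero in either factor makes the whole product zero.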

10.

Invention grant
Systems and methods that facilitate improved display of electronic documents (in force)

Publication number: US07661065B2

Publication date: 2010-02-09

Application number: US11135717

Filing date: 2005-05-24

Applicants: Radoslav Petrov Nickolov, Kumar H. Chellapilla, David M. Bargeron, Patrice Y. Simard, Paul A. Viola

Inventors: Radoslav Petrov Nickolov, Kumar H. Chellapilla, David M. Bargeron, Patrice Y. Simard, Paul A. Viola

IPC classification: G06F17/00

CPC classification: G06Q10/10

Abstract: A computer-implemented word processing system comprises an interface component that receives a features vector associated with an electronic document. An analysis component communicatively coupled to the interface component analyzes the features vector and determines a viewing mode in which to display the electronic document. In accordance with one aspect of the subject invention, the viewing mode can be one of a conventional viewing mode and a viewing mode associated with enhanced readability.
