Facial feature extraction method and apparatus for a neural network
acoustic and visual speech recognition system
    2.
    Granted invention patent (expired)

    Publication No.: US5680481A

    Publication date: 1997-10-21

    Application No.: US488840

    Filing date: 1995-06-09

    Abstract: A facial feature extraction method and apparatus uses the variation in light intensity (gray-scale) of a frontal view of a speaker's face. The sequence of video images is sampled and quantized into a regular array of 150×150 pixels that naturally forms a coordinate system of scan lines and pixel positions along a scan line. Left and right eye areas and a mouth area are located by thresholding the pixel gray-scale and finding the centroids of the three areas. The line segment joining the eye area centroids is bisected at a right angle to form an axis of symmetry. A straight line through the centroid of the mouth area, at a right angle to the axis of symmetry, constitutes the mouth line. Pixels along the mouth line and along the axis of symmetry in the vicinity of the mouth area form a horizontal and a vertical gray-scale profile, respectively. The profiles could be used as feature vectors, but it is more efficient to select the peaks and valleys (maxima and minima) of the profiles that correspond to important physiological speech features, such as lower and upper lip, mouth corner, and mouth area positions and pixel values, and to use these together with their time derivatives as visual vector components. Time derivatives are estimated from pixel position and value changes between video image frames. A speech recognition system uses the visual feature vector in combination with a concomitant acoustic vector as inputs to a time-delay neural network.
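The geometric steps in this abstract (threshold, centroids, axis of symmetry, mouth line, profiles, peaks and valleys, frame-to-frame derivatives) can be illustrated with a minimal pure-Python sketch. This is not the patented implementation. The function names, the rectangular search windows, the nearest-pixel sampling, and the strict-extremum rule are all assumptions made for illustration; the abstract does not say how the three areas are separated from one another.

```python
import math


def centroid(image, window, threshold):
    """Centroid (x, y) of pixels darker than `threshold` inside a search
    window (x0, y0, x1, y1). The window is an assumption of this sketch."""
    x0, y0, x1, y1 = window
    sum_x = sum_y = count = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            if image[y][x] < threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        raise ValueError("no pixels below threshold in window")
    return sum_x / count, sum_y / count


def face_axes(left_eye, right_eye, mouth):
    """Return (crossing, axis_dir, mouth_dir).

    axis_dir is the unit direction of the perpendicular bisector of the
    eye-centroid segment (the axis of symmetry); mouth_dir is the unit
    direction of the mouth line, perpendicular to that axis; crossing is
    the point where the mouth line meets the axis."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("eye centroids coincide")
    mouth_dir = (dx / norm, dy / norm)
    axis_dir = (-mouth_dir[1], mouth_dir[0])
    mid = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    # Project the mouth centroid onto the axis of symmetry.
    s = (mouth[0] - mid[0]) * axis_dir[0] + (mouth[1] - mid[1]) * axis_dir[1]
    crossing = (mid[0] + s * axis_dir[0], mid[1] + s * axis_dir[1])
    return crossing, axis_dir, mouth_dir


def profile(image, origin, direction, half_len):
    """Gray levels of the nearest pixels along a line through `origin`,
    sampled at unit steps from -half_len to +half_len."""
    height, width = len(image), len(image[0])
    values = []
    for t in range(-half_len, half_len + 1):
        x = math.floor(origin[0] + t * direction[0] + 0.5)
        y = math.floor(origin[1] + t * direction[1] + 0.5)
        if 0 <= x < width and 0 <= y < height:
            values.append(image[y][x])
    return values


def extrema(values):
    """Indices of strict local maxima (peaks) and minima (valleys)."""
    peaks, valleys = [], []
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] > values[i + 1]:
            peaks.append(i)
        elif values[i - 1] > values[i] < values[i + 1]:
            valleys.append(i)
    return peaks, valleys


def time_derivative(previous, current, frame_dt):
    """Finite-difference estimate from two consecutive frames' features."""
    return [(c - p) / frame_dt for p, c in zip(previous, current)]
```

A caller would compute the three centroids, pass them to `face_axes`, sample the horizontal profile from the mouth centroid along `mouth_dir` and the vertical profile from `crossing` along `axis_dir`, and then keep only the positions and values at the indices returned by `extrema`.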

    Speaker recognition using spatiotemporal cues
    3.
    Granted invention patent (expired)

    Publication No.: US5625704A

    Publication date: 1997-04-29

    Application No.: US336974

    Filing date: 1994-11-10

    Abstract: A speaker recognition method uses visual image representations of the mouth movements associated with the generation of an acoustic utterance by a speaker who is the person to be recognized. No acoustic data is used, and normal ambient lighting conditions are used. The method generates a spatiotemporal gray-level function representative of the spatiotemporal inner mouth area confined between the lips during the utterance. From this function a cue-block is generated that isolates the essential information, from which a feature vector for recognition is generated. The feature vector includes utterance duration, maximum lip-to-lip separation and its location in time, speed of lip movement opening, speed of lip movement closure, and a spatiotemporal area measure representative of the area enclosed between the lips during the utterance and of the frontal area of the oral cavity during the utterance. Experimental data shows distinct clustering in feature space for different speakers.
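To make the listed feature vector concrete, here is a minimal sketch that derives comparable quantities from a per-frame lip-to-lip separation series. It is a sketch under stated assumptions only. The abstract does not define how the cue-block yields these values, so the input series, the `open_threshold` parameter, the speed definitions (peak separation divided by opening or closing time), and the use of the area under the separation curve as a stand-in for the spatiotemporal area measure are all hypothetical.

```python
def lip_features(separation, frame_dt, open_threshold=0.0):
    """Feature dictionary from a per-frame lip separation series.

    separation     -- hypothetical per-frame lip-to-lip distance (pixels)
    frame_dt       -- time between frames (seconds)
    open_threshold -- separation above which the mouth counts as open
    """
    active = [i for i, s in enumerate(separation) if s > open_threshold]
    if not active:
        raise ValueError("mouth never opens in this sequence")
    start, end = active[0], active[-1]
    peak = max(range(start, end + 1), key=lambda i: separation[i])
    max_separation = separation[peak]

    # Assumption: lips are closed one frame before `start` and one frame
    # after `end`, so opening and closing each span at least one frame.
    opening_time = (peak - start + 1) * frame_dt
    closing_time = (end - peak + 1) * frame_dt

    return {
        "duration": (end - start + 1) * frame_dt,
        "max_separation": max_separation,
        "time_of_max": (peak - start) * frame_dt,
        "opening_speed": max_separation / opening_time,
        "closing_speed": max_separation / closing_time,
        # Area under the separation-versus-time curve (rectangle rule).
        "area": sum(separation[start:end + 1]) * frame_dt,
    }
```

Vectors built this way for several utterances per speaker could then be compared in feature space, which is where the abstract reports distinct per-speaker clustering.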

    Method and apparatus for document verification and tracking
    5.
    Granted invention patent (expired)

    Publication No.: US5671282A

    Publication date: 1997-09-23

    Application No.: US376861

    Filing date: 1995-01-23

    CPC classification: G07D7/00 G07F7/122

    Abstract: A document processing system in which a server subsystem stores information corresponding to a document containing human readable and machine readable information and a client subsystem receives the document and interprets the machine readable information. The client subsystem contacts the server to verify validity of information in the document using a communications network that allows information to be exchanged between the server and the client.
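The client and server roles described in this abstract can be illustrated with a small in-process sketch. Everything concrete here is an assumption: the class names, the `key=value;key=value` payload encoding, the `id` field, and the direct method call that stands in for the communications network. The abstract specifies none of these.

```python
class VerificationServer:
    """Stores the information that corresponds to each issued document."""

    def __init__(self):
        self._records = {}

    def register(self, doc_id, fields):
        self._records[doc_id] = dict(fields)

    def verify(self, doc_id, fields):
        """True only if the document is known and every field matches."""
        stored = self._records.get(doc_id)
        return stored is not None and stored == dict(fields)


def parse_machine_readable(payload):
    """Decode a hypothetical 'key=value;key=value' machine-readable payload."""
    fields = {}
    for part in payload.split(";"):
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError("malformed field: " + part)
        fields[key.strip()] = value.strip()
    return fields


class VerificationClient:
    """Interprets a document's machine-readable data and asks the server."""

    def __init__(self, server):
        # A real client would reach the server over a network; a direct
        # object reference keeps this sketch self-contained.
        self._server = server

    def check(self, payload):
        fields = parse_machine_readable(payload)
        doc_id = fields.pop("id", None)
        if doc_id is None:
            return False
        return self._server.verify(doc_id, fields)
```

In this sketch a document that was never registered, or whose decoded fields differ from the stored record, fails verification.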
