Abstract:
A method and apparatus for encoding color information in monochrome (black and white) documents. The present invention also includes a method and apparatus for producing color documents, which interprets color information stored on a document containing a monochrome image and generates a color image based on that information.
Abstract:
A facial feature extraction method and apparatus uses the variation in light intensity (gray-scale) of a frontal view of a speaker's face. The sequence of video images is sampled and quantized into a regular array of 150×150 pixels that naturally forms a coordinate system of scan lines and pixel positions along a scan line. Left and right eye areas and a mouth area are located by thresholding the pixel gray-scale and finding the centroids of the three areas. The line segment joining the eye area centroids is bisected at a right angle to form an axis of symmetry. A straight line through the centroid of the mouth area, at a right angle to the axis of symmetry, constitutes the mouth line. Pixels along the mouth line and along the axis of symmetry in the vicinity of the mouth area form a horizontal and a vertical gray-scale profile, respectively. The profiles could be used as feature vectors, but it is more efficient to select the peaks and valleys (maxima and minima) of the profiles that correspond to important physiological speech features, such as lower and upper lip, mouth corner, and mouth area positions, and to use their positions, pixel values, and time derivatives as visual vector components. Time derivatives are estimated from changes in pixel position and value between video image frames. A speech recognition system uses the visual feature vector in combination with a concomitant acoustic vector as inputs to a time-delay neural network.
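The geometric steps in this abstract (thresholding, centroids, axis of symmetry, mouth line, gray-scale profiles) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the hand-supplied search regions, the "darker than threshold" rule, and the nearest-pixel sampling are all assumptions made for illustration; the abstract does not specify them.

```python
from math import floor, hypot


def centroid(image, region, threshold):
    """Centroid (row, col) of pixels darker than `threshold` inside `region`.

    `region` is (row0, row1, col0, col1), half-open. How the search regions
    are chosen is an assumption; the abstract only mentions thresholding.
    """
    r0, r1, c0, c1 = region
    row_sum = col_sum = count = 0
    for r in range(r0, r1):
        for c in range(c0, c1):
            if image[r][c] < threshold:
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        raise ValueError("no pixels below threshold in region")
    return row_sum / count, col_sum / count


def face_axes(left_eye, right_eye):
    """Return (midpoint, axis_dir, mouth_dir) from the two eye centroids.

    The axis of symmetry is the perpendicular bisector of the eye segment,
    so the mouth line (perpendicular to the axis) is parallel to the eyes.
    """
    dr, dc = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    length = hypot(dr, dc)
    if length == 0:
        raise ValueError("eye centroids coincide")
    mouth_dir = (dr / length, dc / length)
    axis_dir = (-mouth_dir[1], mouth_dir[0])  # mouth_dir rotated by 90 degrees
    midpoint = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    return midpoint, axis_dir, mouth_dir


def project_onto_axis(midpoint, axis_dir, point):
    """Point where the mouth line through `point` crosses the axis of symmetry."""
    s = (point[0] - midpoint[0]) * axis_dir[0] + (point[1] - midpoint[1]) * axis_dir[1]
    return midpoint[0] + s * axis_dir[0], midpoint[1] + s * axis_dir[1]


def profile(image, origin, direction, half_length):
    """Gray levels sampled at unit steps along a line (nearest pixel)."""
    values = []
    for t in range(-half_length, half_length + 1):
        r = floor(origin[0] + t * direction[0] + 0.5)
        c = floor(origin[1] + t * direction[1] + 0.5)
        if 0 <= r < len(image) and 0 <= c < len(image[0]):
            values.append(image[r][c])
    return values
```

Calling `profile` at the mouth centroid along `mouth_dir` gives the horizontal profile. Calling it at the projected point along `axis_dir` gives the vertical profile. Peak and valley selection and the time derivatives between frames are left out of this sketch.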
Abstract:
A speaker recognition method uses visual image representations of mouth movements associated with the generation of an acoustic utterance by a speaker who is the person to be recognized. No acoustic data is used, and the method operates under normal ambient lighting conditions. The method generates a spatiotemporal gray-level function representative of the inner mouth area confined between the lips during the utterance. From this function a cue-block is generated that isolates the essential information, and from the cue-block a feature vector for recognition is generated. The feature vector includes utterance duration, maximum lip-to-lip separation and its location in time, speed of lip opening, speed of lip closure, and a spatiotemporal area measure representative of the area enclosed between the lips during the utterance and of the frontal area of the oral cavity during the utterance. Experimental data shows distinct clustering in feature space for different speakers.
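As a rough illustration of the feature vector listed in this abstract, the following Python sketch derives the named quantities from a per-frame lip-to-lip separation series. It is a simplification, not the patented method: the abstract works from a spatiotemporal gray-level function and a cue-block, whereas here a one-dimensional separation signal is assumed as input. The function name `lip_features`, the average-speed definitions, and the trapezoidal area measure are all assumptions.

```python
def lip_features(separation, frame_interval):
    """Feature dictionary from a lip-to-lip separation series for one utterance.

    `separation` holds one lip-to-lip distance per video frame (assumed input);
    `frame_interval` is the time between frames in seconds.
    """
    n = len(separation)
    if n < 2:
        raise ValueError("need at least two frames")
    duration = (n - 1) * frame_interval

    # Maximum lip-to-lip separation and its location in time.
    peak_index = max(range(n), key=lambda i: separation[i])
    peak = separation[peak_index]
    peak_time = peak_index * frame_interval

    # Assumed definition: average speed from first frame to peak, and
    # from peak to last frame.
    closing_time = duration - peak_time
    opening_speed = (peak - separation[0]) / peak_time if peak_time > 0 else 0.0
    closing_speed = (peak - separation[-1]) / closing_time if closing_time > 0 else 0.0

    # Assumed stand-in for the spatiotemporal area measure:
    # trapezoidal integral of separation over time.
    area = sum(
        (separation[i] + separation[i + 1]) / 2 * frame_interval
        for i in range(n - 1)
    )

    return {
        "duration": duration,
        "max_separation": peak,
        "peak_time": peak_time,
        "opening_speed": opening_speed,
        "closing_speed": closing_speed,
        "area": area,
    }
```

Feature dictionaries from different speakers could then be compared in feature space, which is where the abstract reports distinct clustering. The comparison step is not sketched here.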
Abstract:
A conversion method and apparatus for converting a hardcopy document into a hyperdocument and vice versa. During hardcopy-to-hyperdocument conversion, hypertext information stored on the hardcopy document is used to set up links to other documents. During hyperdocument-to-hardcopy conversion, hypertext link information is encoded and stored on the hardcopy document.
Abstract:
A document processing system in which a server subsystem stores information corresponding to a document containing human readable and machine readable information and a client subsystem receives the document and interprets the machine readable information. The client subsystem contacts the server to verify validity of information in the document using a communications network that allows information to be exchanged between the server and the client.