Abstract:
A system and method for inserting visual subtitles into videos are described. The method comprises segmenting an input video signal to extract speech segments and music segments. Next, a speaker representation is associated with each speech segment corresponding to a speaker visible in the frame. The speech segments are then analysed to compute the phones and the duration of each phone. Each phone is mapped to a corresponding viseme, and a viseme-based language model is created with a corresponding score. The most relevant viseme is selected for the speech segments by computing a total viseme score. A speaker representation sequence is then created such that phones and emotions in the speech segments are represented as reconstructed lip movements and eyebrow movements. Finally, the speaker representation sequence is integrated with the music segments and superimposed on the input video signal to create subtitles.
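The viseme selection step can be illustrated with a minimal Python sketch. The abstract does not say how the total viseme score is composed, so this sketch assumes it is the log of the summed recogniser probabilities of all phones sharing a viseme plus a weighted log bigram language-model score. The tables `PHONE_TO_VISEME` and `VISEME_BIGRAM`, the viseme names, and the functions `total_viseme_scores` and `select_viseme` are all hypothetical.

```python
import math

# Hypothetical many-to-one phone-to-viseme table (illustrative values only).
PHONE_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "aa": "V_open", "iy": "V_spread",
}

# Hypothetical viseme bigram language model: P(current viseme | previous viseme).
VISEME_BIGRAM = {
    ("V_open", "V_bilabial"): 0.5,
    ("V_open", "V_labiodental"): 0.2,
}

def total_viseme_scores(prev_viseme, phone_probs, lm_weight=1.0, floor=1e-3):
    """Return an assumed total score per candidate viseme for one time slot.

    phone_probs maps each candidate phone to its recogniser probability.
    """
    # Phones that share a viseme pool their probability mass.
    pooled = {}
    for phone, prob in phone_probs.items():
        viseme = PHONE_TO_VISEME[phone]
        pooled[viseme] = pooled.get(viseme, 0.0) + prob

    scores = {}
    for viseme, prob in pooled.items():
        if prob <= 0.0:
            continue  # skip impossible candidates to avoid log(0)
        lm_prob = VISEME_BIGRAM.get((prev_viseme, viseme), floor)
        scores[viseme] = math.log(prob) + lm_weight * math.log(lm_prob)
    return scores

def select_viseme(prev_viseme, phone_probs, lm_weight=1.0):
    """Pick the viseme with the highest assumed total score."""
    scores = total_viseme_scores(prev_viseme, phone_probs, lm_weight)
    return max(scores, key=scores.get)
```

In this sketch the phone `f` is the single most likely phone in `{"p": 0.3, "b": 0.3, "f": 0.4}`, yet the bilabial viseme is selected because `p` and `b` pool their probability and the assumed language model favours it after an open viseme.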
Abstract:
A method and system are provided for identifying the personal context of a user having a portable mobile communication device at a particular location in order to derive social interaction information of the user. A user within a predefined range is identified using the personal context of the user at the particular location, and the identified personal context is assigned a confidence value. The current location information of the user within the particular location is then obtained by fusing the assigned confidence values. The proximity of the user in the current location is estimated by finding the accurate straight-line distance between users. Two users having similar current location information at the particular location are grouped together according to predefined density criteria. Finally, the social interaction information of the user is derived by multimodal sensor data fusion at a fusion engine and represented using a human network graph.
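The proximity and grouping steps can be sketched in Python as follows. This is a minimal illustration, not the claimed method: it assumes each user already has a fused 2-D position in metres, links users whose straight-line distance is within a hypothetical `max_distance`, and treats a hypothetical minimum group size `min_members` as the density criterion.

```python
import math
from itertools import combinations

def straight_line_distance(a, b):
    """Euclidean distance between two (x, y) positions in metres."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def group_users(positions, max_distance=2.0, min_members=2):
    """Group users who are transitively within max_distance of each other.

    positions maps a user id to an (x, y) tuple. Groups smaller than
    min_members (the assumed density criterion) are discarded.
    """
    parent = {user: user for user in positions}

    def find(user):
        # Union-find lookup with path halving.
        while parent[user] != user:
            parent[user] = parent[parent[user]]
            user = parent[user]
        return user

    # Each close pair is an edge of the interaction graph; merge its endpoints.
    for u, v in combinations(positions, 2):
        if straight_line_distance(positions[u], positions[v]) <= max_distance:
            parent[find(u)] = find(v)

    groups = {}
    for user in positions:
        groups.setdefault(find(user), set()).add(user)
    return [members for members in groups.values() if len(members) >= min_members]
```

The close pairs found in the loop could equally be stored as weighted edges if a graph of interactions is needed downstream.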
Abstract:
Devices and methods are provided for non-invasive, goal-oriented, and personalized monitoring of substance consumption, directed towards aiding reduction of substance intake by a user. Based on the substance consumption characteristics and the user's profile, the user's substance consumption profile is identified and the average amount of the substance in the body at a given time is computed. A threshold corresponding to the amount of substance the body can sustain is then computed based on goals set by the user, the substance consumption characteristics, and the user's profile. Alerts can be generated and transmitted to the user based on pre-determined conditions to help the user achieve the set goals.
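As a rough illustration of estimating the amount of substance in the body and checking it against a threshold, the Python sketch below assumes simple first-order elimination with a fixed half-life. The abstract does not specify the model, so the decay formula, the function names `amount_in_body` and `should_alert`, and every parameter here are assumptions, and the threshold is passed in rather than derived from goals and profile.

```python
def amount_in_body(doses, now_h, half_life_h):
    """Estimate the remaining substance (mg) at time now_h (hours).

    doses is a list of (time_h, amount_mg) intakes. Each dose is assumed
    to decay exponentially with the given half-life (an assumption).
    """
    return sum(
        amount_mg * 0.5 ** ((now_h - time_h) / half_life_h)
        for time_h, amount_mg in doses
        if time_h <= now_h  # ignore intakes that have not happened yet
    )

def should_alert(doses, now_h, half_life_h, threshold_mg):
    """Return True when the estimated amount exceeds the user's threshold."""
    return amount_in_body(doses, now_h, half_life_h) > threshold_mg
```

For example, a single 100 mg intake at hour 0 with an assumed 5-hour half-life leaves an estimated 50 mg at hour 5; a second 100 mg intake at that moment raises the estimate to 150 mg, which would trigger an alert against a 120 mg threshold.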
Abstract:
Text output of speech recognition engines tends to be erroneous when spoken data has domain-specific terms. The present disclosure facilitates automatic correction of errors in speech-to-text conversion using abstractions of evolutionary development and artificial development. The words in the text output of a speech recognition engine are treated as a set of injured genes in a biological cell that need repair. The genes are repaired to form genotypes, which are in turn repaired to phenotypes through a series of repair steps based on matching, mapping, and linguistic repair guided by fitness criteria. A basic genetic-level repair involves a phonetic MATCHING function together with a FITNESS function to select the best among the matching genes. A second genetic-level repair involves a contextual MAPPING function for repairing the remaining 'injured' genes of the speech recognition engine output. Finally, a genotype-to-phenotype repair involves using linguistic rules and semantic rules of the domain.
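The basic genetic-level repair (MATCHING followed by FITNESS) can be sketched in Python. This is only an approximation under stated assumptions: character-level similarity from the standard library's `difflib.SequenceMatcher` stands in for true phonetic matching, and the function names, the thresholds `min_similarity` and `accept`, and the example domain lexicon are hypothetical.

```python
from difflib import SequenceMatcher

def fitness(word, candidate):
    """Similarity in [0, 1]; a stand-in for a phonetic fitness measure."""
    return SequenceMatcher(None, word.lower(), candidate.lower()).ratio()

def matching(word, domain_terms, min_similarity=0.6):
    """MATCHING: domain terms similar enough to be repair candidates."""
    return [term for term in domain_terms if fitness(word, term) >= min_similarity]

def repair(text, domain_terms, accept=0.8):
    """Replace each 'injured' word with its fittest domain term, if any."""
    repaired = []
    for word in text.split():
        candidates = matching(word, domain_terms)
        best = max(candidates, key=lambda term: fitness(word, term), default=None)
        if best is not None and fitness(word, best) >= accept:
            repaired.append(best)   # gene repaired from the domain lexicon
        else:
            repaired.append(word)   # left for later, contextual repair steps
    return " ".join(repaired)
```

With the hypothetical lexicon `["mortgage", "annuity", "premium"]`, calling `repair("the morgage premum", ...)` yields `"the mortgage premium"`, while the word "the" is left untouched because no domain term is similar enough.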