Abstract:
A method, apparatus and computer program product may be provided for generating a plurality of compressed feature descriptors that can be represented by a relatively small number of bits, thereby facilitating transmission and storage of the feature descriptors. A method, apparatus and computer program product may also be provided for permitting a compressed representation of a feature descriptor to be compared with a plurality of compressed representations of feature descriptors of respective predefined features. By permitting the comparison to be performed utilizing compressed representations of feature descriptors, a respective feature descriptor may be identified without having to first decompress the feature descriptor, thereby potentially increasing the efficiency with which feature descriptors may be identified.
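The idea of matching descriptors while they are still compressed can be illustrated with a minimal sketch. The abstract does not say how the descriptors are compressed, so everything here is an assumption made for illustration: descriptor values are taken to lie in [0, 1], each value is quantized to a 4-bit code, and the names `compress`, `compressed_distance` and `identify` are hypothetical. Distances are read from a table indexed by the packed codes, so the floating-point descriptor is never reconstructed.

```python
from typing import List, Sequence, Tuple

LEVELS = 16  # assumption: 4 bits per descriptor dimension


def compress(descriptor: Sequence[float]) -> bytes:
    """Quantize each value in [0, 1] to a 4-bit code and pack two codes per byte."""
    codes = [min(LEVELS - 1, max(0, int(v * LEVELS))) for v in descriptor]
    if len(codes) % 2:
        codes.append(0)  # pad to an even number of codes
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


def _byte_distance(a: int, b: int) -> int:
    """Squared difference of the two 4-bit codes held in each byte."""
    hi = (a >> 4) - (b >> 4)
    lo = (a & 0x0F) - (b & 0x0F)
    return hi * hi + lo * lo


# Precomputed once on the code values; matching only performs table lookups.
DIST_TABLE = [[_byte_distance(a, b) for b in range(256)] for a in range(256)]


def compressed_distance(x: bytes, y: bytes) -> int:
    """Distance between two compressed descriptors, computed on the packed bytes."""
    return sum(DIST_TABLE[a][b] for a, b in zip(x, y))


def identify(query: bytes, database: List[Tuple[str, bytes]]) -> str:
    """Return the label of the predefined feature closest to the query."""
    return min(database, key=lambda item: compressed_distance(query, item[1]))[0]


database = [
    ("corner", compress([0.1, 0.9, 0.5, 0.2])),
    ("edge", compress([0.8, 0.1, 0.3, 0.7])),
]
query = compress([0.12, 0.85, 0.55, 0.25])
print(identify(query, database))
```

In this sketch a four-dimensional descriptor occupies two bytes, and the comparison cost is one table lookup per byte.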
Abstract:
Various methods for tracking and recognition with rotation invariant feature descriptors are provided. One example method includes generating an image pyramid of an image frame, detecting a plurality of interest points within the image pyramid, and extracting feature descriptors for each respective interest point. According to some example embodiments, the feature descriptors are rotation invariant. Further, the example method may also include tracking movement by matching the feature descriptors to feature descriptors of a previous frame and performing recognition of an object within the image frame based on the feature descriptors. Related example methods and example apparatuses are also provided.
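Two of the steps named above, building an image pyramid and matching descriptors against the previous frame, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the downsampling filter or the matching rule, so 2x2 block averaging and nearest-neighbour matching on squared Euclidean distance are stand-ins, and the function names are hypothetical. Interest-point detection and rotation-invariant descriptor extraction are not shown.

```python
from typing import List

Image = List[List[float]]


def downsample(image: Image) -> Image:
    """Halve width and height by averaging each 2x2 block (assumed filter)."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(image: Image, min_size: int = 2) -> List[Image]:
    """Return the image followed by successively halved levels."""
    levels = [image]
    while len(levels[-1]) // 2 >= min_size and len(levels[-1][0]) // 2 >= min_size:
        levels.append(downsample(levels[-1]))
    return levels


def match_to_previous(current: List[List[float]],
                      previous: List[List[float]]) -> List[int]:
    """For each current descriptor, give the index of the closest previous one."""
    def sq_dist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(previous)), key=lambda j: sq_dist(d, previous[j]))
        for d in current
    ]


frame = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = build_pyramid(frame)
print([len(level) for level in pyramid])
print(match_to_previous([[0.9, 1.1], [0.1, -0.1]], [[0.0, 0.0], [1.0, 1.0]]))
```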
Abstract:
Systems and methods are described that can provide users with personalized video content feeds. In several embodiments, a multi-modal segmentation process is utilized that relies upon cues derived from video, audio and/or text data present in a video data stream. In a number of embodiments, video streams from a variety of sources are segmented. Links are identified between video segments and between video segments and online articles containing additional information relevant to the video segments. The additional information obtained by linking a video segment to an additional source of data can be utilized in the generation of personalized playlists. In the context of news programming, the dynamic mixing and aggregation of news videos from multiple sources can greatly enrich the news watching experience. In several embodiments, processes for linking video segments to additional sources of data can be implemented as part of a video search engine service.
Abstract:
A system and method for performing echo suppression on a server in browser-based online audio conferences without downloading or installing software on a participant's computing device is disclosed. Streams of audio communication data from the participants in an audio conference are received at the server. An echo suppression application determines the first party that speaks by analyzing the streams to locate speech data, and assigns that party as the “owner” of the audio channel. The speech data is sent to the other participants in the conference. The application then determines whether newly received audio from the owner of the channel is new speech; if so, then the party remains the owner of the channel, and the new speech data is also sent to the other parties in the conference. The channel is surrendered if no new speech is received from the owner in a defined period, and the next party that speaks becomes the new owner of the channel. The other audio data from the participants is replaced by silence.
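The channel-ownership rule described above can be sketched as a small state machine. The speech detector, the threshold, the release period and the class name `ChannelArbiter` are all assumptions made for illustration; the abstract does not say how speech is located in a stream or how long the "defined period" is. Here one frame per participant is processed at a time, speech is assumed to mean a mean absolute amplitude above a fixed threshold, and the channel is surrendered after a fixed number of silent frames from the owner.

```python
from typing import Dict, List, Optional

SPEECH_THRESHOLD = 500  # assumption: mean absolute amplitude that counts as speech
RELEASE_AFTER = 3       # assumption: silent owner frames before the channel is surrendered


def is_speech(samples: List[int]) -> bool:
    """Crude energy-based speech check (a stand-in for a real detector)."""
    return bool(samples) and sum(abs(s) for s in samples) / len(samples) > SPEECH_THRESHOLD


class ChannelArbiter:
    def __init__(self) -> None:
        self.owner: Optional[str] = None
        self.silent_frames = 0

    def process(self, frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
        """Take one frame per participant and return the frames to forward."""
        if self.owner is not None:
            if is_speech(frames.get(self.owner, [])):
                self.silent_frames = 0  # new speech: the owner keeps the channel
            else:
                self.silent_frames += 1
                if self.silent_frames >= RELEASE_AFTER:
                    self.owner = None   # surrender the channel
        if self.owner is None:
            # The first party found speaking becomes the owner (dict order breaks ties).
            for party, samples in frames.items():
                if is_speech(samples):
                    self.owner = party
                    self.silent_frames = 0
                    break
        # Only the owner's speech is forwarded; all other audio becomes silence.
        return {
            party: samples if party == self.owner and is_speech(samples)
            else [0] * len(samples)
            for party, samples in frames.items()
        }


loud, quiet = [1000, -1000, 1000], [10, -10, 10]
arbiter = ChannelArbiter()
print(arbiter.process({"alice": quiet, "bob": loud}), arbiter.owner)
```

Because all of this state lives on the server in the sketch, a participant's browser would only need to send and receive audio, which matches the no-install goal stated in the abstract.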
Abstract:
A circuit arrangement is provided for controlling audio signal transmissions in a communications system that includes a microphone and a video camera. The arrangement comprises a video processor configured and arranged to receive a video signal from the video camera, detect movement of an object in the video signal, and provide a motion-indicating signal indicating movement relative to the object. An audio processor is coupled to the video processor and is configured and arranged to modify the audio signal to be transmitted responsive to the motion-indicating signal. In another embodiment, a video signal processor is configured and arranged to receive a video signal from the video camera, detect mouth movement of a person, and provide a mouth-movement signal indicative of movement of the person's mouth. An echo-cancellation circuit is coupled to the video signal processor and is configured and arranged, responsive to the mouth-movement signal, to filter from the audio signal provided by the microphone the sound energy output by a speaker.
Abstract:
A wireless user-interface method and arrangement are provided that have a light source and a circuit arrangement that detects the presence and position of the light source. A light source emits modulated light having a first modulation frequency that is captured by a camera circuit arrangement. A circuit arrangement uses relative pixel luminances and differences in pixel luminances between frames to detect the presence of modulated light from the light source. The position of the light source is tracked from frame to frame, and the position information is output for use by application software or circuitry, for example to direct movement of a pointer on a computer display. Modulated light having a second modulation frequency from the light source is detected by the circuit arrangement and interpreted as selection of a control function, which is provided as output to application software or circuitry. The application uses the position of the light source along with control signals to identify an operation to perform.
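Detecting a modulated light from frame-to-frame luminance differences can be sketched roughly as below. The details are assumptions for illustration only: frames are small grids of 0 to 255 luminance values, a pixel is treated as flickering when its luminance jumps by at least `min_delta` between consecutive frames, and the function names and thresholds are hypothetical rather than taken from the abstract.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # luminance values, 0..255


def toggle_counts(frames: List[Frame], min_delta: int = 60) -> List[List[int]]:
    """Count, per pixel, the large luminance jumps between consecutive frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    counts = [[0] * cols for _ in range(rows)]
    for prev, cur in zip(frames, frames[1:]):
        for r in range(rows):
            for c in range(cols):
                if abs(cur[r][c] - prev[r][c]) >= min_delta:
                    counts[r][c] += 1
    return counts


def locate_light(frames: List[Frame], min_delta: int = 60,
                 min_toggles: int = 3) -> Optional[Tuple[int, int, int]]:
    """Return (row, col, toggles) of the most strongly flickering pixel, or None."""
    counts = toggle_counts(frames, min_delta)
    best = max(
        (counts[r][c], r, c)
        for r in range(len(counts))
        for c in range(len(counts[0]))
    )
    if best[0] < min_toggles:
        return None  # nothing flickers enough to be the modulated source
    return best[1], best[2], best[0]


def make_frame(value: int) -> Frame:
    """3x3 frame with a constant background and one pixel set to `value`."""
    frame = [[50] * 3 for _ in range(3)]
    frame[1][2] = value
    return frame


frames = [make_frame(200 if i % 2 == 0 else 20) for i in range(5)]
print(locate_light(frames))
```

One possible extension, again an assumption, would be to compare the toggle count with the number of frame intervals to tell the first modulation frequency from the second.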