Abstract:
In one embodiment, a video conference endpoint may detect one or more participants within a field of view of a camera of the video conference endpoint. The video conference endpoint may determine one or more alternative framings of an output of the camera based on the detected one or more participants. The video conference endpoint may send the output of the camera to one or more far-end video conference endpoints participating in a video conference with the video conference endpoint. The video conference endpoint may also send data descriptive of the one or more alternative framings of the output of the camera to the far-end video conference endpoints. The far-end video conference endpoints may utilize the data to display one of the one or more alternative framings.
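As a rough illustration (not drawn from the embodiment itself), the data descriptive of the alternative framings might be encoded as crop rectangles normalized to the camera output, so a far-end endpoint can apply them at any resolution. The field names and the JSON encoding below are assumptions:

    # Hypothetical sketch: per-framing crop rectangles, normalized to the
    # full camera output. Names and encoding are assumptions.
    import json
    from dataclasses import dataclass, asdict

    @dataclass
    class Framing:
        label: str      # e.g. "overview", "close-up" (labels are assumed)
        x: float        # left edge, 0..1 of output width
        y: float        # top edge, 0..1 of output height
        width: float    # 0..1 of output width
        height: float   # 0..1 of output height

    def framing_metadata(framings):
        """Serialize alternative framings for transmission alongside the
        full camera output (the encoding is an assumption)."""
        return json.dumps({"framings": [asdict(f) for f in framings]})

    # Example: an overview plus a close-up on a detected participant.
    print(framing_metadata([
        Framing("overview", 0.0, 0.0, 1.0, 1.0),
        Framing("close-up", 0.55, 0.20, 0.30, 0.45),
    ]))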
Abstract:
In one embodiment, a method is provided to intelligently frame groups of participants in a meeting. This gives a more pleasing experience with fewer switches, better contextual understanding, and more natural framing, as would be seen in a video production made by a human director. Furthermore, in accordance with another embodiment, conversational framing techniques are provided. During speaker tracking, when two local participants are addressing each other, a method is provided to show a close-up framing that includes both participants. By evaluating the direction in which participants are looking, together with a speaker history, it is determined whether a local discussion is taking place, and an appropriate framing is selected to give far-end participants the most contextually rich experience.
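One plausible form of the gaze-and-history test is sketched below; the turn window, the gaze representation, and the helper names are illustrative assumptions, not details of the embodiment:

    # Illustrative sketch: decide whether two local participants are in a
    # back-and-forth discussion, based on who spoke recently and whether
    # they appear to be facing each other. All thresholds are assumptions.
    from collections import deque

    SPEAKER_WINDOW = 6  # number of recent speaker turns to consider

    speaker_history = deque(maxlen=SPEAKER_WINDOW)

    def note_speaker(participant_id):
        speaker_history.append(participant_id)

    def is_local_discussion(gaze_targets):
        """gaze_targets maps participant id -> the id they appear to be
        looking at (e.g., from head-pose estimation; representation is
        assumed). Returns the conversing pair, or None."""
        recent = set(speaker_history)
        if len(recent) != 2:
            return None          # not exactly two people trading turns
        a, b = recent
        facing = gaze_targets.get(a) == b and gaze_targets.get(b) == a
        return (a, b) if facing else None

If a pair is returned, the endpoint would select a close-up framing just wide enough to contain both participants.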
Abstract:
A video conference endpoint includes a camera to capture video and a microphone array to sense audio. One or more preset views are defined. Images in the captured video are processed with a face detection algorithm to detect faces. Active talkers are detected from the sensed audio. The camera is controlled to capture video from the preset views, and from dynamic views created without user input and which include a dynamic overview and a dynamic close-up view. The camera is controlled to dynamically adjust each of the dynamic views to track changing positions of detected faces over time, and dynamically switch the camera between the preset views, the dynamic overview, and the dynamic close-up view over time based on positions of the detected faces and the detected active talkers relative to the preset views and the dynamic views.
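The switching decision might be organized along the following lines; the view labels and the containment test are assumptions:

    # Sketch of the view selection described above: prefer a preset view
    # when the active talker falls inside one, otherwise fall back to a
    # dynamic close-up on the talker, or a dynamic overview of all faces.
    def select_view(preset_views, faces, active_talker, contains):
        """contains(view, position) -> bool is an assumed helper testing
        whether a position (e.g., a detected face) lies within a view."""
        if active_talker is not None:
            for view in preset_views:
                if contains(view, active_talker):
                    return ("preset", view)
            return ("dynamic-close-up", active_talker)
        # No one is talking: frame everyone with a dynamic overview.
        return ("dynamic-overview", faces)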
Abstract:
A camera system for a video conference endpoint includes a fixed wide lens camera providing a view of a space, a first fixed camera providing a view of a first portion of the space, a second fixed camera providing a view of a second portion of the space, a third fixed camera providing a view of a third portion of the space, and a processor operatively coupled to each of the cameras. Each of the cameras is configured to produce a video signal, and the processor is configured to receive the video signals and select a relevant video signal from among them. The processor is also configured to process the relevant video signal by digitally panning, tilting, and zooming it to generate a video stream from the processed video signal.
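Digital panning, tilting, and zooming of a fixed camera's signal is commonly realized as a crop-and-scale of each frame. The sketch below assumes that approach, a normalized pan/tilt parameterization, and OpenCV for the resize, none of which is specified by the abstract:

    # Sketch: digital PTZ as crop-and-scale of the selected camera frame.
    # pan/tilt in [-1, 1] shift the crop center; zoom >= 1 narrows the
    # crop. The parameterization is an assumption.
    import cv2  # pip install opencv-python

    def digital_ptz(frame, pan=0.0, tilt=0.0, zoom=1.0):
        h, w = frame.shape[:2]
        cw, ch = int(w / zoom), int(h / zoom)        # crop size
        cx = int(w / 2 + pan * (w - cw) / 2)         # crop center
        cy = int(h / 2 + tilt * (h - ch) / 2)
        x0 = min(max(cx - cw // 2, 0), w - cw)       # clamp to frame
        y0 = min(max(cy - ch // 2, 0), h - ch)
        crop = frame[y0:y0 + ch, x0:x0 + cw]
        return cv2.resize(crop, (w, h))              # back to full size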
Abstract:
Adaptively adjusting the layout of a video conference includes receiving, at a first video conference endpoint, a video stream from a second video conference endpoint. Activity in an environment of the first video conference endpoint or an environment of the second video conference endpoint is detected with at least one physical activity sensor. Based on the detected activity, a presentation of the video stream at the first video conference endpoint is dynamically adjusted.
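A minimal sketch of how a sensor event might drive the layout adjustment; the event fields and the layout interface are hypothetical placeholders for whatever the endpoint exposes:

    # Sketch: dynamically adjust how a received stream is presented when
    # a physical activity sensor fires. All names here are placeholders.
    def on_activity(event, layout):
        if event["source"] == "near-end" and event["kind"] == "person-approached":
            layout.enlarge("self-view")        # show the local framing
        elif event["source"] == "far-end" and event["kind"] == "whiteboard-use":
            layout.enlarge("far-end-stream")   # give the activity more pixels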
Abstract:
At a video conference endpoint including a microphone array and a camera, different camera framings are established to frame different views of a talker based on different sets of pan, tilt, and focal length settings of the camera. Different video frames of the different views are captured using the different camera framings, respectively. A sound source direction of the talker relative to the microphone array in a fixed three-dimensional (3D) global coordinate system is determined for the different views based on sound from the talker detected by the microphone array. The sound source direction relative to the microphone array is converted to different sound source positions in planar coordinates relative to the different video frames based on the different sets of pan, tilt, and focal length settings, respectively. The different video frames, the sound, and the different sound source positions in planar coordinates are transmitted.
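The conversion from a global sound source direction to planar frame coordinates can be sketched as a rotation into the camera frame followed by a pinhole projection using the focal length in pixels. The axis conventions and the pinhole model below are assumptions, since the abstract does not fix a parameterization:

    # Sketch: azimuth positive to the camera's right, elevation positive
    # upward, in the fixed global frame; pan/tilt use the same convention.
    import math

    def direction_to_pixel(azimuth, elevation, pan, tilt, focal_px,
                           img_w, img_h):
        """All angles in radians; returns (u, v) pixel coordinates, or
        None if the source is behind the camera."""
        # Unit vector in the global frame (z forward, x right, y up).
        x = math.cos(elevation) * math.sin(azimuth)
        y = math.sin(elevation)
        z = math.cos(elevation) * math.cos(azimuth)
        # Undo camera pan (about y), then tilt (about x), to enter the
        # camera frame.
        xc = x * math.cos(pan) - z * math.sin(pan)
        zc = x * math.sin(pan) + z * math.cos(pan)
        yc = y * math.cos(tilt) - zc * math.sin(tilt)
        zc = y * math.sin(tilt) + zc * math.cos(tilt)
        if zc <= 0:
            return None
        u = img_w / 2 + focal_px * xc / zc
        v = img_h / 2 - focal_px * yc / zc   # image v grows downward
        return (u, v)

Under this convention, a source lying exactly along the camera's optical axis (azimuth equal to pan, elevation equal to tilt) maps to the center of the frame, which is a quick way to sanity-check the transform.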
Abstract:
A video conference endpoint includes one or more cameras to capture video of different views and a microphone array to sense audio. One or more preset views are defined. The endpoint detects faces in the captured video and active audio sources from the sensed audio. The endpoint detects, as active talkers, any detected faces that coincide positionally with detected active audio sources, and also detects whether any active talker is in one of the preset views. Based on whether an active talker is detected in any of the preset views, the endpoint switches between capturing video of one of the preset views and capturing video of a dynamic view.
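The positional coincidence test might look like the following; the angular threshold and the (azimuth, elevation) representation are assumptions:

    # Sketch: an "active talker" is a detected face whose direction
    # roughly coincides with an active audio source. The 10-degree
    # threshold is an assumption; azimuth wraparound near +/-180 degrees
    # is ignored for brevity.
    MATCH_DEG = 10.0

    def active_talkers(faces, audio_sources):
        """faces, audio_sources: lists of (azimuth_deg, elevation_deg)."""
        talkers = []
        for f in faces:
            for s in audio_sources:
                if (abs(f[0] - s[0]) <= MATCH_DEG and
                        abs(f[1] - s[1]) <= MATCH_DEG):
                    talkers.append(f)
                    break
        return talkers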
Abstract:
In one embodiment, a method is provided for handling a call from a conferencing endpoint configured to handle a conference between multiple participants. A request to call a participant is received from the conferencing endpoint. Information about the presence of one or more participants in the call is inferred based on a detection of the one or more participants by presence detection equipment associated with the conferencing endpoint. Additional call context information is determined based on the inferred information. The additional call context information is provided to the participant in addition to the call, wherein the additional call context information is accessible to the participant.
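The additional call context could be packaged roughly as below; the payload shape and field names are assumptions:

    # Sketch: attach inferred presence information to an outgoing call
    # request. The dict shapes are hypothetical.
    def build_call_context(detected_participants, meeting_subject=None):
        """detected_participants: list of dicts like
        {"name": "...", "identified": bool} (shape is assumed)."""
        context = {
            "participant_count": len(detected_participants),
            "participants": [p["name"] for p in detected_participants
                             if p.get("identified")],
        }
        if meeting_subject:
            context["subject"] = meeting_subject
        return context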
Abstract:
Presented herein are techniques for cropping video streams to create an optimized layout in which participants of a meeting appear at a similar size. A user device receives a plurality of video streams, each video stream including at least one face of a participant participating in a video communication session. Faces in one or more of the plurality of video streams are cropped so that faces in the plurality of video streams are approximately equal in size, to produce a plurality of processed video streams. The plurality of processed video streams are sorted according to video stream widths to produce sorted video streams, and the plurality of sorted video streams are distributed for display across the smallest number of rows possible on a display of the user device.
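A sketch of the equalize-then-pack pipeline follows; the target face height and the greedy row fill are assumptions (the abstract only requires the smallest possible number of rows):

    # Sketch: scale streams so detected faces come out roughly equal in
    # size, then sort by width and pack into rows that fit the display.
    # Face heights are assumed to come from a face detector; the numbers
    # are illustrative.
    def equalize_faces(streams, target_face_h=120):
        """streams: list of dicts {"w": int, "h": int, "face_h": int}.
        Returns streams rescaled so each face is ~target_face_h tall."""
        out = []
        for s in streams:
            scale = target_face_h / s["face_h"]
            out.append({"w": round(s["w"] * scale),
                        "h": round(s["h"] * scale)})
        return out

    def pack_rows(streams, display_w):
        """Greedy fill of width-sorted streams into rows. Greedy packing
        is an assumed heuristic and does not guarantee the true minimum
        number of rows in every case."""
        rows, row, used = [], [], 0
        for s in sorted(streams, key=lambda s: s["w"], reverse=True):
            if row and used + s["w"] > display_w:
                rows.append(row)
                row, used = [], 0
            row.append(s)
            used += s["w"]
        if row:
            rows.append(row)
        return rows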