Abstract:
A computer-implemented method is disclosed. The method includes, at a server in communication with at least first and second collaboration endpoints that are each located within the same physical space: determining a relative positioning of the first and second collaboration endpoints; and configuring content displayed at each of the first and second collaboration endpoints based on that relative positioning.
Abstract:
A camera system for a video conference endpoint includes a fixed wide lens camera providing a view of a space, a first fixed camera providing a view of a first portion of the space, a second fixed camera providing a view of a second portion of the space, a third fixed camera providing a view of a third portion of the space, and a processor operatively coupled to each of the cameras. Each of the cameras is configured to produce a video signal and the processor is configured to receive the video signals and select a relevant video signal from the video signals. The processor is also configured to process the relevant video signal by digitally panning, tilting, and zooming of the relevant video signal to generate a video stream from the processed video signal.
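The selection and digital pan/tilt/zoom steps described above can be illustrated with a minimal sketch. The relevance scores, the normalized pan/tilt range of -1 to 1, and the names `select_relevant`, `digital_ptz`, and `CropRect` are assumptions made for illustration; the abstract does not specify how relevance is computed or how the crop is parameterized.

```python
from dataclasses import dataclass


@dataclass
class CropRect:
    """Crop window inside a source frame, in pixels."""
    x: int
    y: int
    width: int
    height: int


def select_relevant(scores):
    """Return the camera id with the highest (hypothetical) relevance score."""
    return max(scores, key=scores.get)


def digital_ptz(frame_width, frame_height, pan=0.0, tilt=0.0, zoom=1.0):
    """Compute a crop window that emulates pan, tilt, and zoom digitally.

    pan and tilt are assumed to be normalized to [-1, 1], where -1 is the
    left/top edge and 1 is the right/bottom edge; zoom is a factor >= 1.
    """
    if zoom < 1.0:
        raise ValueError("zoom must be >= 1.0")
    pan = max(-1.0, min(1.0, pan))
    tilt = max(-1.0, min(1.0, tilt))
    width = round(frame_width / zoom)
    height = round(frame_height / zoom)
    # Map [-1, 1] onto the range of valid top-left corners.
    x = round((frame_width - width) * (pan + 1.0) / 2.0)
    y = round((frame_height - height) * (tilt + 1.0) / 2.0)
    return CropRect(x, y, width, height)
```

Cropping the selected frame to the returned window and scaling it back to the output resolution would yield the processed video stream.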
Abstract:
In one embodiment, a method is provided to intelligently frame groups of participants in a meeting. This gives a more pleasing experience with fewer switches, better contextual understanding, and more natural framing, as would be seen in a video production made by a human director. Furthermore, in accordance with another embodiment, conversational framing techniques are provided. During speaker tracking, when two local participants are addressing each other, a method is provided to show a close-up framing showing both participants. By evaluating the direction participants are looking and a speaker history, it is determined if there is a local discussion going on, and an appropriate framing is selected to give far-end participants the most contextually rich experience.
Abstract:
In one embodiment, a system and method are described, the system and method including a first camera, which, when activated, captures a first video of a first field of view (FOV), a first display spatially associated with the first camera, the first display for displaying video received from a remote site when the first camera is activated, a second camera, which, when activated, captures a second video of a second FOV, a second display spatially associated with the second camera, the second display for displaying video received from the remote site when the second camera is activated, and a processor which controls the first camera, the second camera, the first display, the second display, and a triggering mechanism, wherein the triggering mechanism activates the first camera to capture video in the first FOV, identifies over time if a mode change occurs and, upon identifying the mode change, deactivates the first camera and the first display and activates the second camera and the second display. Related apparatus, systems and methods are also described.
Abstract:
A video conference endpoint detects faces at associated face positions in video frames capturing a scene. The endpoint frames the video frames to a view of the scene encompassing all of the detected faces. The endpoint detects that a previously detected face is no longer detected. In response, a timeout period is started and independently of detecting faces, motion is detected across the view. It is determined if any detected motion (i) coincides with the face position of the previously detected face that is no longer detected, and (ii) occurs before the timeout period expires. If conditions (i) and (ii) are met, the endpoint restarts the timeout period and repeats the independently detecting motion and the determining. Otherwise, the endpoint reframes the view to encompass the remaining detected faces.
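The timeout logic above can be sketched as a small state holder. The box format `(x, y, w, h)`, the 5-second default, and the names `LostFaceTracker` and `overlaps` are assumptions for illustration; the abstract does not define how "coincides" is measured, so a simple rectangle overlap stands in for it here.

```python
def overlaps(a, b):
    """Return True if two axis-aligned boxes (x, y, w, h) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


class LostFaceTracker:
    """Tracks one face that was detected earlier but is no longer detected."""

    def __init__(self, face_box, lost_at, timeout_s=5.0):
        self.face_box = face_box
        self.timeout_s = timeout_s
        # The timeout period starts when the face is first lost.
        self.deadline = lost_at + timeout_s

    def update(self, now, motion_boxes):
        """Return 'hold' to keep the current framing, or 'reframe'."""
        if now >= self.deadline:
            # No coinciding motion arrived before the timeout expired.
            return "reframe"
        if any(overlaps(self.face_box, m) for m in motion_boxes):
            # Condition (i) and (ii) both hold: restart the timeout period.
            self.deadline = now + self.timeout_s
        return "hold"
```

An endpoint would call `update` on each motion-detection pass and reframe to the remaining faces once it returns `'reframe'`.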
Abstract:
Distance-based framing includes obtaining at least a video stream during an online conference session. The video stream, an audio stream received with the video stream, or both the video stream and the audio stream are analyzed, and a framing that either focuses on a speaker in the video stream or provides an overview of participants in the video stream is composed based on the analyzing. A potential error in the framing is detected based on further analysis of the video stream, the audio stream, and an amount of motion in the room. If distance sensor data contradicts the potential error, the framing is maintained, but if the distance sensor data confirms the potential error, a new framing is generated.
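The final decision step can be written as a short function. This is only a sketch of the maintain-or-regenerate branch: the boolean inputs and the `compose_new_framing` callback are hypothetical, and how the distance sensor data is compared against the potential error is not described in the abstract.

```python
def resolve_framing(current_framing, potential_error, distance_confirms_error,
                    compose_new_framing):
    """Decide whether to keep the current framing or generate a new one.

    potential_error: True if audio/video analysis flagged a possible error.
    distance_confirms_error: True if distance sensor data agrees with it.
    compose_new_framing: hypothetical callback returning a new framing.
    """
    if not potential_error:
        return current_framing
    if distance_confirms_error:
        # Sensor data confirms the error, so a new framing is generated.
        return compose_new_framing()
    # Sensor data contradicts the error, so the framing is maintained.
    return current_framing
```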
Abstract:
Presented herein are techniques for cropping video streams to create an optimized layout in which participants of a meeting are a similar size. A user device receives a plurality of video streams, each video stream including at least one face of a participant participating in a video communication session. Faces in one or more of the plurality of video streams are cropped so that faces in the plurality of video streams are approximately equal in size, to produce a plurality of processed video streams. The plurality of processed video streams are sorted according to video stream widths to produce sorted video streams and the plurality of sorted video streams are distributed for display across a smallest number of rows possible on a display of the user device.
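A rough sketch of the cropping and row distribution steps follows. It assumes each stream reports its frame height and detected face height in pixels, and it uses a greedy first-fit-decreasing pass to fill rows, which is a stand-in chosen for illustration and is not guaranteed to find the smallest possible number of rows. The names `crop_heights_for_equal_faces` and `pack_rows` are hypothetical.

```python
def crop_heights_for_equal_faces(streams):
    """Return a crop height per stream so faces fill the same frame fraction.

    streams: list of dicts with 'height' and 'face_height' in pixels.
    The stream whose face already fills the largest fraction is left
    uncropped; the others are cropped down to match that fraction.
    """
    target_ratio = max(s["face_height"] / s["height"] for s in streams)
    return [round(s["face_height"] / target_ratio) for s in streams]


def pack_rows(widths, display_width):
    """Sort stream widths (widest first) and place them into rows greedily."""
    rows = []
    for w in sorted(widths, reverse=True):
        if w > display_width:
            raise ValueError("stream is wider than the display")
        for row in rows:
            if sum(row) + w <= display_width:
                row.append(w)
                break
        else:
            # No existing row has room, so start a new one.
            rows.append([w])
    return rows
```

For example, `pack_rows([400, 300, 600, 200], 1000)` places the 600 and 400 wide streams in one row and the 300 and 200 wide streams in a second row.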
Abstract:
A method, computer system, and computer program product are provided for virtual background replacement during a video communication session. A frame comprising a user image and a captured background is obtained from a video capture device as part of a video stream. The frame is processed to replace the captured background with a virtual background. A change to an exposure characteristic of the video capture device is determined. In response to determining the change to the exposure characteristic, the virtual background is modified to match a virtual background brightness with a user image brightness to produce a modified virtual background. The frame is modified by combining the user image with the modified virtual background.
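The brightness matching step can be illustrated with a minimal sketch. It assumes pixels are given as flat lists of 8-bit luminance values and applies a single linear gain so that the mean brightness of the virtual background matches that of the user image; the abstract does not state which brightness measure or adjustment is used, and the function name is hypothetical.

```python
def match_background_brightness(background, user_pixels):
    """Scale background luminance so its mean matches the user image mean.

    background, user_pixels: non-empty lists of luminance values in 0..255.
    Returns a new list; the input background is not modified.
    """
    bg_mean = sum(background) / len(background)
    if bg_mean == 0:
        # A fully black background cannot be scaled by a gain factor.
        return list(background)
    gain = (sum(user_pixels) / len(user_pixels)) / bg_mean
    return [min(255, max(0, round(p * gain))) for p in background]
```

In a pipeline following the abstract, this function would be called only after a change in the capture device's exposure characteristic is detected, and the result would then be combined with the segmented user image.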