Abstract:
Distance-based framing includes obtaining at least a video stream during an online conference session. The video stream, an audio stream received with the video stream, or both the video stream and the audio stream are analyzed, and a framing that either focuses on a speaker in the video stream or provides an overview of participants in the video stream is composed based on the analyzing. A potential error in the framing is detected based on further analysis of at least one of the video stream, the audio stream, or distance sensor data received with the video stream. When the framing focuses on the speaker, the potential error may be contradicted or confirmed based on an amount of motion. If the distance sensor data contradicts the potential error, the framing is maintained, but if the distance sensor data confirms the potential error, a new framing is generated.
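The arbitration step described above can be sketched as a small decision function. This is a minimal illustration, not the patented method: the motion threshold, the framing labels, and the fallback to an overview framing are all hypothetical choices.

```python
def evaluate_framing(framing, motion_amount, distance_confirms_error,
                     motion_threshold=0.3):
    """Hypothetical sketch of the error-arbitration step.

    A high amount of motion while the framing focuses on the speaker
    flags a potential error; distance sensor data then confirms or
    contradicts that error. Confirmed -> compose a new framing;
    contradicted -> maintain the current framing.
    """
    potential_error = framing == "speaker" and motion_amount > motion_threshold
    if potential_error and distance_confirms_error:
        return "overview"   # generate a new framing
    return framing          # maintain the current framing
```

For example, heavy motion plus sensor confirmation would switch from a speaker framing to an overview, while the same motion with contradicting sensor data leaves the framing unchanged.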
Abstract:
In one embodiment, a video conference endpoint may detect one or more participants within a field of view of a camera of the video conference endpoint. The video conference endpoint may determine one or more alternative framings of an output of the camera of the video conference endpoint based on the detected one or more participants. The video conference endpoint may send the output of the camera of the video conference endpoint to one or more far-end video conference endpoints participating in a video conference with the video conference endpoint. The video conference endpoint may send data descriptive of the one or more alternative framings of the output of the camera to the far-end video conference endpoints. The far-end video conference endpoints may utilize the data to display one of the one or more alternative framings.
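One plausible shape for the "data descriptive of the one or more alternative framings" is a set of crop rectangles derived from detected participant boxes. The sketch below assumes this representation; the margin values and the overview/closeup split are illustrative, not taken from the abstract.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """A crop rectangle in camera-output pixel coordinates."""
    x: int
    y: int
    w: int
    h: int

def overview_framing(participants, margin=20):
    """Union of all detected participant boxes, padded by a margin."""
    x0 = min(p.x for p in participants) - margin
    y0 = min(p.y for p in participants) - margin
    x1 = max(p.x + p.w for p in participants) + margin
    y1 = max(p.y + p.h for p in participants) + margin
    return Box(x0, y0, x1 - x0, y1 - y0)

def framing_metadata(participants):
    """Data sent alongside the full camera output so each far-end
    endpoint can choose which alternative framing to display."""
    closeups = [Box(p.x - 10, p.y - 10, p.w + 20, p.h + 20)
                for p in participants]
    return {"overview": overview_framing(participants),
            "closeups": closeups}
```

Because the full camera output is sent unchanged, each far-end endpoint can apply a different crop from this metadata without any extra encoding at the sender.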
Abstract:
The present disclosure provides methods and systems for automatic adjustment of screen brightness to optimize presentation for both a physically present and a remote audience during a multimedia collaboration session. In one aspect, a method includes detecting the presence of a screen in the field of view of a camera in a meeting room; determining whether the exposure of the camera or the brightness of the screen is to be adjusted, to yield a determination; and controlling at least one of the exposure of the camera or the brightness of the screen based on the determination, such that both the meeting room and the screen are legible for a remote audience and the screen is legible for an audience present in the meeting room.
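The determination step could be implemented as a simple heuristic comparing the screen's apparent brightness to the rest of the room in the camera image. The luminance inputs, tolerance, and action names below are hypothetical, intended only to show the shape of the decision.

```python
def adjust_for_legibility(screen_luma, room_luma, tol=0.1):
    """Hypothetical heuristic for the brightness/exposure determination.

    screen_luma, room_luma: normalized mean luminance (0.0-1.0) of the
    detected screen region and of the rest of the camera's field of view.
    A much brighter screen would blow out in the sent video, so dim it;
    a much darker room suggests raising the camera exposure instead.
    """
    if screen_luma - room_luma > tol:
        return "dim_screen"
    if room_luma - screen_luma > tol:
        return "raise_exposure"
    return "no_change"
```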
Abstract:
A video conference system may include two or more video conference endpoints, each having a display configured to display content. The video conference system may detect a plurality of participants within a field of view of a camera of the system. The video conference system may determine an attention score for each endpoint based on the participants. The video conference system may determine whether the content of the first endpoint and/or the content of the second endpoint are active content based on whether the attention scores exceed a predetermined threshold value. The video conference system may send to secondary video conference systems an indication of the active content to enable the secondary video conference systems to display the active content.
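The attention-score comparison lends itself to a short sketch. How the per-participant attention signal is measured is not specified in the abstract; the gaze-angle proxy and the threshold values below are assumptions for illustration only.

```python
def attention_score(gaze_offsets_deg, max_offset_deg=30.0):
    """Hypothetical attention metric: the fraction of detected
    participants whose gaze deviates from an endpoint's display by
    less than a threshold angle."""
    if not gaze_offsets_deg:
        return 0.0
    attentive = sum(1 for g in gaze_offsets_deg if abs(g) < max_offset_deg)
    return attentive / len(gaze_offsets_deg)

def active_content(score_first, score_second, threshold=0.5):
    """Mark each endpoint's content active when its attention score
    exceeds the predetermined threshold value."""
    return {"first": score_first > threshold,
            "second": score_second > threshold}
```

The resulting flags are what would be sent to the secondary video conference systems so they can display whichever content is active.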
Abstract:
A video conference endpoint includes one or more cameras to capture video of different views and a microphone array to sense audio. One or more closeup views are defined. The endpoint detects faces in the captured video and active audio sources from the sensed audio. The endpoint detects any active talker having detected face positions that coincide with detected active audio sources, and also uses speaker clustering to detect whether any active talker is associated with a previously stored closeup view. Based on whether an active talker is detected in any of the stored closeup views, the endpoint switches between capturing video of one of the closeup views and a best overview of the participants in the conference room.
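The coincidence test between face positions and active audio sources can be sketched as an angular match. Representing both as horizontal angles and the tolerance value are assumptions; a real microphone-array implementation would work with richer direction-of-arrival estimates.

```python
def active_talkers(face_angles_deg, audio_source_angles_deg, tol_deg=10.0):
    """Return the detected faces whose angular position coincides
    (within a tolerance) with a detected active audio source."""
    return [face for face in face_angles_deg
            if any(abs(face - src) <= tol_deg
                   for src in audio_source_angles_deg)]
```

If this list is non-empty and the talker matches a stored closeup view, the endpoint would switch to that closeup; otherwise it falls back to the best overview.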
Abstract:
A video conference endpoint includes a camera to capture video and a microphone array to sense audio. One or more preset views are defined. Images in the captured video are processed with a face detection algorithm to detect faces. Active talkers are detected from the sensed audio. The camera is controlled to capture video from the preset views, and from dynamic views created without user input and which include a dynamic overview and a dynamic close-up view. The camera is controlled to dynamically adjust each of the dynamic views to track changing positions of detected faces over time, and dynamically switch the camera between the preset views, the dynamic overview, and the dynamic close-up view over time based on positions of the detected faces and the detected active talkers relative to the preset views and the dynamic views.
Abstract:
A video conference endpoint detects faces at associated face positions in video frames capturing a scene. The endpoint frames the video to a view of the scene encompassing all of the detected faces. The endpoint then detects that a previously detected face is no longer detected. In response, a timeout period is started and, independently of detecting faces, motion is detected across the view. It is determined whether any detected motion (i) coincides with the face position of the previously detected face that is no longer detected, and (ii) occurs before the timeout period expires. If conditions (i) and (ii) are not both met, the endpoint reframes the view.
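The two-condition test above can be captured in a few lines. The box/point representation of face positions and motion events is an assumption made for the sketch.

```python
def should_reframe(lost_face_box, motion_events, timeout_s):
    """Return True when the view should be reframed.

    lost_face_box: (x, y, w, h) of the face that is no longer detected.
    motion_events: list of (t_seconds, (px, py)) motion detections,
    timed from the start of the timeout period.

    Reframing is suppressed only if some motion (i) coincides with the
    lost face's position and (ii) occurs before the timeout expires.
    """
    def coincides(box, point):
        x, y, w, h = box
        px, py = point
        return x <= px <= x + w and y <= py <= y + h

    for t, point in motion_events:
        if t < timeout_s and coincides(lost_face_box, point):
            return False  # motion explains the missed detection; keep framing
    return True           # no qualifying motion: reframe the view
```

The effect is that a participant who briefly turns away (face lost, but motion at the same spot) does not trigger a distracting reframe, while a participant who actually leaves does.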
Abstract:
A loudspeaker transmits an ultrasonic signal into a spatial region. A microphone transduces ultrasonic sound, including an echo of the transmitted ultrasonic signal, received from the spatial region into a received ultrasonic signal. A controller transforms the ultrasonic signal and the received ultrasonic signal into respective time-frequency domains that cover respective ultrasound frequency ranges. The controller computes an error signal, representative of an estimate of an echo-free received ultrasonic signal, based on the transformed ultrasonic signal and the transformed received ultrasonic signal. The controller computes power estimates of the error signal over time, and detects a change in people presence in the spatial region based on a change in the power estimates of the error signal over time.
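The final step, detecting presence from a change in the error-signal power estimates, can be illustrated with a simple step-change detector over short-term averages. The window length and ratio threshold are hypothetical; the abstract does not specify how the change is measured.

```python
def presence_change(error_powers, window=3, ratio=2.0):
    """Hypothetical change detector over power estimates of the
    echo-cancellation error signal.

    A person entering or leaving the spatial region alters the echo
    path, so the residual (error) power steps up or down. Compare the
    mean of the last `window` estimates against the preceding window.
    """
    if len(error_powers) < 2 * window:
        return False  # not enough history yet
    prev = sum(error_powers[-2 * window:-window]) / window
    curr = sum(error_powers[-window:]) / window
    return curr > ratio * prev or prev > ratio * curr
```

In practice the power estimates would come from the time-frequency-domain error signal described above, aggregated over the ultrasound frequency range.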