Abstract:
Disclosed herein are methods and systems for identifying background in video data using geometric primitives. One embodiment takes the form of a process that includes obtaining video data depicting at least a portion of a user. The process also includes detecting at least one geometric primitive within the video data. The at least one detected geometric primitive is a type of geometric primitive included in a set of geometric-primitive models. The process also includes identifying a respective region within the video data associated with each detected geometric primitive. The process also includes classifying each respective region as background of the video data.
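The detect-then-classify flow can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not part of the abstract. The frame is assumed to have already been reduced to a binary edge map. The set of geometric-primitive models is assumed to contain only two hypothetical types: full-width horizontal lines and full-height vertical lines, such as wall edges or door frames. The region associated with a primitive is assumed to be a fixed band of pixels around it. The names detect_lines, background_pixels, min_fraction, and half_band are illustrative only.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]    # (row, col)
EdgeMap = List[List[int]]  # 1 = edge pixel, 0 = no edge

def detect_lines(edges: EdgeMap, min_fraction: float = 0.9) -> Dict[str, List[int]]:
    """Detect two hypothetical primitive types: rows (horizontal lines) and
    columns (vertical lines) in which at least min_fraction of pixels are edges."""
    h, w = len(edges), len(edges[0])
    horizontal = [r for r in range(h) if sum(edges[r]) / w >= min_fraction]
    vertical = [c for c in range(w)
                if sum(edges[r][c] for r in range(h)) / h >= min_fraction]
    return {"horizontal_line": horizontal, "vertical_line": vertical}

def background_pixels(edges: EdgeMap, min_fraction: float = 0.9,
                      half_band: int = 1) -> Set[Pixel]:
    """Classify a band of +/- half_band pixels around each detected line as background."""
    h, w = len(edges), len(edges[0])
    detected = detect_lines(edges, min_fraction)
    background: Set[Pixel] = set()
    for r in detected["horizontal_line"]:
        for rr in range(max(0, r - half_band), min(h, r + half_band + 1)):
            background.update((rr, c) for c in range(w))
    for c in detected["vertical_line"]:
        for cc in range(max(0, c - half_band), min(w, c + half_band + 1)):
            background.update((r, cc) for r in range(h))
    return background

# Example: a full-width edge row (e.g. a ceiling line) and a full-height edge column.
edges = [[1] * 6] + [[0, 0, 1, 0, 0, 0] for _ in range(4)]
print(detect_lines(edges))                       # {'horizontal_line': [0], 'vertical_line': [2]}
print(len(background_pixels(edges, half_band=0)))  # 10
```

A practical system would more likely detect primitives with a technique such as a Hough transform on real edge data. The row and column scan is used here only to keep the sketch dependency-free.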
Abstract:
Disclosed herein are methods and systems for classifying pixels as foreground using both short-range depth data and long-range depth data. One embodiment takes the form of a process that includes obtaining video data depicting at least a portion of a user. The process also includes obtaining short-range depth data associated with the video data. The process also includes obtaining long-range depth data associated with the video data. The video data, short-range depth data, and long-range depth data may be obtained via a single 3-D video camera. The process also includes classifying pixels of the video data as foreground based at least in part on both the short-range depth data and the long-range depth data. In some embodiments, classifying pixels of the video data as foreground comprises employing an alpha mask. The alpha mask may comprise binary foreground (hard) indicators. The alpha mask may comprise foreground-likelihood (soft) indicators.
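Combining the two depth sources can be sketched in Python under a hypothetical per-pixel convention: depths are in metres, and None marks a pixel where a sensor has no valid reading. The fusion rule is an assumption chosen for illustration. It uses the short-range reading when that reading is valid and falls back to the long-range reading otherwise. The threshold max_fg_depth and the logistic softness parameter are illustrative values, not taken from the source. The function returns both a hard (binary) alpha mask and a soft (foreground-likelihood) alpha mask.

```python
import math
from typing import List, Optional

Depth = Optional[float]        # metres; None = no valid reading (assumed convention)
DepthMap = List[List[Depth]]

def fuse_depth(short: Depth, long: Depth) -> Depth:
    """Assumed fusion rule: trust short-range depth when valid, else use long-range."""
    return short if short is not None else long

def alpha_masks(short_map: DepthMap, long_map: DepthMap,
                max_fg_depth: float = 1.5, softness: float = 0.1):
    """Return (hard, soft) alpha masks built from both depth sources."""
    hard, soft = [], []
    for s_row, l_row in zip(short_map, long_map):
        h_row, s_out = [], []
        for s, l in zip(s_row, l_row):
            d = fuse_depth(s, l)
            if d is None:
                h_row.append(0)    # no depth from either source: not foreground
                s_out.append(0.5)  # ...but maximally uncertain in the soft mask
            else:
                h_row.append(1 if d <= max_fg_depth else 0)
                # Logistic falloff, 0.5 exactly at the threshold; clamp to avoid overflow.
                z = max(-50.0, min(50.0, (d - max_fg_depth) / softness))
                s_out.append(1.0 / (1.0 + math.exp(z)))
        hard.append(h_row)
        soft.append(s_out)
    return hard, soft

short = [[0.8, None], [None, None]]  # short-range sensor sees only the near pixel
long = [[3.0, 1.0], [2.5, None]]     # long-range sensor covers the rest
hard, soft = alpha_masks(short, long)
print(hard)  # [[1, 1], [0, 0]]
```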
Abstract:
Disclosed herein are methods and systems for assigning pixels distance-cost values using a flood fill technique. One embodiment takes the form of a process that includes obtaining video data depicting a head of a user, obtaining depth data associated with the video data, and selecting seed pixels for a flood fill at least in part by using the depth information. The process also includes performing the flood fill from the selected seed pixels. The flood fill assigns respective distance-cost values to pixels of the video data based on position-space cost values and color-space cost values. In some embodiments, the process also includes classifying pixels of the video data as foreground based at least in part on the assigned distance-cost values. In some other embodiments, the process also includes assigning pixels of the video data foreground-likelihood values based at least in part on the assigned distance-cost values.
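One way to realize such a flood fill is a Dijkstra-style propagation from the seed pixels. Each step to a 4-connected neighbour adds a position-space cost (a constant per step) plus a color-space cost (the Euclidean RGB difference between the two pixels). This is an illustrative sketch, not the method of the source. Several details are assumptions: seeds are chosen by a simple depth threshold, the weights are fixed parameters, and the names select_seeds, flood_fill_costs, classify, and likelihood are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

RGB = Tuple[int, int, int]
Pixel = Tuple[int, int]

def select_seeds(depth: List[List[Optional[float]]], max_depth: float) -> List[Pixel]:
    """Hypothetical seed rule: pixels with a valid depth no farther than max_depth."""
    return [(r, c) for r, row in enumerate(depth) for c, d in enumerate(row)
            if d is not None and d <= max_depth]

def flood_fill_costs(image: List[List[RGB]], seeds: List[Pixel],
                     pos_weight: float = 1.0, color_weight: float = 1.0) -> Dict[Pixel, float]:
    """Dijkstra-style flood fill: each step adds a position cost plus a color cost."""
    h, w = len(image), len(image[0])
    cost: Dict[Pixel, float] = {p: 0.0 for p in seeds}
    heap = [(0.0, p) for p in seeds]
    heapq.heapify(heap)
    while heap:
        cur, (r, c) = heapq.heappop(heap)
        if cur > cost[(r, c)]:
            continue  # stale heap entry; a cheaper path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                step = pos_weight + color_weight * math.dist(image[r][c], image[nr][nc])
                if cur + step < cost.get((nr, nc), math.inf):
                    cost[(nr, nc)] = cur + step
                    heapq.heappush(heap, (cur + step, (nr, nc)))
    return cost

def classify(cost: Dict[Pixel, float], max_cost: float) -> Dict[Pixel, bool]:
    """Hard foreground decision from the assigned distance-cost values."""
    return {p: v <= max_cost for p, v in cost.items()}

def likelihood(cost: Dict[Pixel, float], scale: float) -> Dict[Pixel, float]:
    """Soft alternative: foreground likelihood decays exponentially with cost."""
    return {p: math.exp(-v / scale) for p, v in cost.items()}

# One row: two skin-toned pixels followed by two dark background pixels.
image = [[(200, 150, 120), (200, 150, 120), (10, 10, 10), (10, 10, 10)]]
depth = [[0.6, None, 2.0, None]]
cost = flood_fill_costs(image, select_seeds(depth, max_depth=1.0))
print(classify(cost, max_cost=10.0))
# {(0, 0): True, (0, 1): True, (0, 2): False, (0, 3): False}
```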
Abstract:
Systems and methods relate to receiving video streams captured of a subject by video cameras, each video stream including video frames that are time-synchronized across the video streams, each video camera having a known vantage point in a predetermined coordinate system; obtaining at least one three-dimensional (3D) mesh of the subject, the mesh being time-synchronized with the video frames and including a plurality of mesh vertices with known locations; identifying a user-selected viewpoint and a viewpoint-specific subset of the mesh vertices that are visible from that viewpoint; generating 3D submeshes of the subject by calculating visible-vertices lists from the vantage point of each video camera from which the viewpoint-specific subset of mesh vertices is visible; projecting mesh vertices from the calculated visible-vertices lists onto video pixels; and rendering viewpoint-adaptive 3D personas of the subject by weighting video pixel colors from different video-camera vantage points according to the geometric relationship of each video-camera vantage point to the user-selected viewpoint.
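The final color-weighting step can be sketched in Python with a weighting rule chosen for illustration. Each camera's weight is the clamped cosine of the angle between the camera direction and the user-selected viewpoint direction, both measured from the subject's center. That cosine is then raised to a hypothetical sharpness exponent. A camera for which a vertex is not on its visible-vertices list contributes no color; its entry is passed as None. The function names, the sharpness parameter, and the plain-average fallback are assumptions, not details from the source.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
RGB = Tuple[float, float, float]

def _unit(v: Sequence[float]) -> Vec3:
    n = math.sqrt(sum(x * x for x in v))  # assumes v is non-zero
    return (v[0] / n, v[1] / n, v[2] / n)

def view_weights(center: Vec3, cameras: List[Vec3], viewpoint: Vec3,
                 sharpness: float = 4.0) -> List[float]:
    """Weight each camera by the clamped cosine between its direction and the
    viewpoint direction (both from the subject center), raised to `sharpness`."""
    vdir = _unit([v - c for v, c in zip(viewpoint, center)])
    weights = []
    for cam in cameras:
        cdir = _unit([p - c for p, c in zip(cam, center)])
        cos = sum(a * b for a, b in zip(cdir, vdir))
        weights.append(max(0.0, cos) ** sharpness)
    return weights

def blend_vertex_color(colors: List[Optional[RGB]], weights: List[float]) -> Optional[RGB]:
    """Blend colors from cameras that see the vertex (None = not visible)."""
    pairs = [(col, w) for col, w in zip(colors, weights) if col is not None]
    if not pairs:
        return None
    total = sum(w for _, w in pairs)
    if total == 0.0:
        # No visible camera faces the viewpoint: fall back to a plain average.
        pairs = [(col, 1.0) for col, _ in pairs]
        total = float(len(pairs))
    return tuple(sum(w * col[i] for col, w in pairs) / total for i in range(3))

cameras = [(0.0, 0.0, 2.0), (2.0, 0.0, 0.0)]   # front camera and side camera
w = view_weights((0.0, 0.0, 0.0), cameras, viewpoint=(0.0, 0.0, 5.0))
print(w)                                                  # [1.0, 0.0]
print(blend_vertex_color([(255, 0, 0), (0, 0, 255)], w))  # (255.0, 0.0, 0.0)
```

A higher sharpness value makes the rendering favor the camera nearest the viewpoint more strongly, at the cost of more visible seams when the viewpoint moves between cameras.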
Abstract:
Methods and systems for real-time user extraction using deep learning networks. In one embodiment, user extraction comprises obtaining a given frame of color pixel data, checking whether a reset flag is cleared or set, and generating a trimap for the given frame. If the reset flag is cleared, generating the trimap comprises: obtaining a user-extraction contour based on a preceding frame; and generating the trimap based on the obtained user-extraction contour. If the reset flag is set, generating the trimap comprises: detecting at least one persona feature in the given frame; generating an alpha mask by aligning an intermediate contour with the detected persona feature(s), wherein the intermediate contour is based on a color-based flood-fill operation performed on a previous frame that was segmented by a machine-learning-segmentation process; and generating the trimap based on the generated alpha mask. The generated trimap is output for extracting a user persona.
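The reset-flag branching and trimap construction can be sketched in Python. This is a simplified illustration. Masks are assumed to be binary lists of lists. The trimap is built by eroding and dilating the mask by a hypothetical band radius: pixels inside the eroded mask are foreground, pixels outside the dilated mask are background, and the rest are unknown. The callable realign_alpha is hypothetical; it stands in for persona-feature detection and contour alignment, which the abstract does not specify. The 255/128/0 label values follow a common convention and are assumed here.

```python
from typing import Callable, List, Optional

Mask = List[List[int]]          # 1 = user, 0 = not user
FG, UNKNOWN, BG = 255, 128, 0   # assumed trimap label values

def _morph(mask: Mask, radius: int, erode: bool) -> Mask:
    """Square-window erosion (min) or dilation (max), clipped at the frame border."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [mask[rr][cc]
                      for rr in range(max(0, r - radius), min(h, r + radius + 1))
                      for cc in range(max(0, c - radius), min(w, c + radius + 1))]
            out[r][c] = min(window) if erode else max(window)
    return out

def trimap_from_mask(mask: Mask, band: int = 1) -> Mask:
    """Eroded mask -> FG, outside dilated mask -> BG, the band in between -> UNKNOWN."""
    inner = _morph(mask, band, erode=True)
    outer = _morph(mask, band, erode=False)
    return [[FG if i else (UNKNOWN if o else BG) for i, o in zip(i_row, o_row)]
            for i_row, o_row in zip(inner, outer)]

def generate_trimap(frame: object, reset_flag: bool, previous_contour_mask: Optional[Mask],
                    realign_alpha: Callable[[object], Mask], band: int = 1) -> Mask:
    if not reset_flag:
        # Flag cleared: reuse the user-extraction contour from the preceding frame.
        return trimap_from_mask(previous_contour_mask, band)
    # Flag set: hypothetical realign_alpha detects persona features and aligns an
    # intermediate contour to them, producing a fresh alpha mask.
    return trimap_from_mask(realign_alpha(frame), band)

# A 3x3 user blob in the middle of a 7x7 frame.
mask = [[1 if 2 <= r <= 4 and 2 <= c <= 4 else 0 for c in range(7)] for r in range(7)]
trimap = generate_trimap(frame=None, reset_flag=False,
                         previous_contour_mask=mask, realign_alpha=lambda f: mask)
print(trimap[3][3], trimap[1][1], trimap[0][0])  # 255 128 0
```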
Abstract:
Systems and methods relate to encoded video streams including geometric-data streams transmitted to a receiver for rendering of a viewpoint-adaptive 3D persona. A method includes obtaining at least one triangle-based three-dimensional (3D) submesh of a subject, wherein the obtained triangle-based 3D submesh includes a plurality of submesh vertices that define a plurality of submesh triangles, identifying a plurality of strips of the submesh triangles, generating triangle-strip data representing the identified strips of submesh triangles, generating compressed-submesh data that includes the triangle-strip data, and transmitting the compressed-submesh data to a receiver for reconstruction of the triangle-based 3D submesh of the subject.
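The strip-identification step can be illustrated with a simple greedy stripifier in Python. Starting from any unused triangle, the strip is extended repeatedly by an unused triangle that shares the strip's last edge. This sketch is an assumption, not the source's algorithm. It ignores triangle winding order; a real encoder would track the alternating orientation of strip triangles. It also applies no further entropy coding. The function names and the packed layout (strip lengths plus concatenated vertex indices) are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Tri = Tuple[int, int, int]

def build_strips(triangles: List[Tri]) -> List[List[int]]:
    """Greedy stripification: extend each strip across its last edge while possible."""
    edge_to_tris: Dict[FrozenSet[int], List[int]] = {}
    for i, (a, b, c) in enumerate(triangles):
        for edge in ((a, b), (b, c), (c, a)):
            edge_to_tris.setdefault(frozenset(edge), []).append(i)
    used = [False] * len(triangles)
    strips: List[List[int]] = []
    for start, tri in enumerate(triangles):
        if used[start]:
            continue
        used[start] = True
        strip = list(tri)
        while True:
            last_edge = frozenset(strip[-2:])
            nxt = next((j for j in edge_to_tris.get(last_edge, []) if not used[j]), None)
            if nxt is None:
                break
            used[nxt] = True
            # Append the one vertex of the neighbouring triangle not on the shared edge.
            strip.append(next(v for v in triangles[nxt] if v not in last_edge))
        strips.append(strip)
    return strips

def pack(strips: List[List[int]]) -> dict:
    """Hypothetical compressed-submesh layout: strip lengths plus concatenated indices."""
    return {"strip_lengths": [len(s) for s in strips],
            "indices": [v for s in strips for v in s]}

def unpack_triangles(data: dict) -> List[Tri]:
    """Receiver side: rebuild triangles (winding ignored) from the packed strips."""
    tris, pos = [], 0
    for n in data["strip_lengths"]:
        s = data["indices"][pos:pos + n]
        tris.extend((s[k], s[k + 1], s[k + 2]) for k in range(n - 2))
        pos += n
    return tris

# A 2x3 vertex grid (0-1-2 over 3-4-5) split into four triangles.
tris = [(0, 3, 1), (1, 3, 4), (1, 4, 2), (2, 4, 5)]
strips = build_strips(tris)
print(strips)  # [[0, 3, 1, 4, 2, 5]]
```

For this grid the greedy pass yields one strip of six indices instead of twelve separate triangle indices. A receiver can rebuild the same set of triangles from the packed data.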