Abstract:
Systems and methods are disclosed for enabling a user to interact naturally, using gestures, with images displayed on the surface of an integrated monitor whose display contents are governed by an appliance such as a PC, smart phone, or tablet. Some embodiments include the display as well as the appliance in a single package, such as an all-in-one computer. User interaction includes gestures that may occur within a three-dimensional hover zone spaced apart from the display surface.
Abstract:
Systems and methods for natural interaction with graphical user interfaces using gestural and vocal input in accordance with embodiments of the invention are disclosed. In one embodiment, a method for interpreting a command sequence that includes a gesture and a voice cue to issue an application command includes: receiving image data; receiving an audio signal; selecting an application command from a command dictionary based upon a gesture identified using the image data, a voice cue identified using the audio signal, and metadata describing combinations of a gesture and a voice cue that form a command sequence corresponding to an application command; retrieving a list of processes running on an operating system; selecting at least one process based upon the selected application command and the metadata, where the metadata also includes information identifying at least one process targeted by the application command; and issuing an application command to the selected process.
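The selection steps in this abstract can be sketched as a lookup over a small command dictionary. This is only an illustrative sketch: the `CommandEntry` record, the dictionary entries, the process names, and the function names are all hypothetical, and gesture and voice cue recognition are assumed to have already produced plain string labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommandEntry:
    """Hypothetical metadata record: a gesture + voice cue pair and its command."""
    gesture: str
    voice_cue: str
    command: str
    target_processes: tuple  # names of processes the command is aimed at


# Illustrative command dictionary; every entry is invented for this sketch.
COMMAND_DICTIONARY = [
    CommandEntry("swipe_left", "next", "NEXT_SLIDE", ("presenter",)),
    CommandEntry("pinch", "zoom", "ZOOM_OUT", ("viewer", "browser")),
]


def select_command(gesture: str, voice_cue: str) -> Optional[CommandEntry]:
    """Return the entry whose gesture and voice cue both match, if any."""
    for entry in COMMAND_DICTIONARY:
        if entry.gesture == gesture and entry.voice_cue == voice_cue:
            return entry
    return None


def select_processes(entry: CommandEntry, running: list) -> list:
    """Keep only the running processes that the metadata names as targets."""
    return [name for name in running if name in entry.target_processes]


def issue(gesture: str, voice_cue: str, running: list) -> list:
    """Return the (process, command) pairs that would be issued."""
    entry = select_command(gesture, voice_cue)
    if entry is None:
        return []  # the pair does not form a known command sequence
    return [(proc, entry.command) for proc in select_processes(entry, running)]
```

For example, `issue("pinch", "zoom", ["shell", "browser"])` pairs the `ZOOM_OUT` command with the `browser` process only, because `shell` is not listed as a target in the metadata.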
Abstract:
Systems and methods for tracking human hands by performing parts based template matching using images captured from multiple viewpoints are described. One embodiment includes a processor, a reference camera, an alternate view camera, and memory containing: a hand tracking application; and a plurality of edge feature templates that are rotated and scaled versions of a finger template that includes an edge features template. In addition, the hand tracking application configures the processor to: detect at least one candidate finger in a reference frame, where each candidate finger is a grouping of pixels identified by searching the reference frame for a grouping of pixels that have image gradient orientations that match one of the plurality of edge feature templates; and verify the correct detection of a candidate finger in the reference frame by locating a grouping of pixels in an alternate view frame that correspond to the candidate finger.
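As a rough illustration of matching image gradient orientations against rotated and scaled edge feature templates, the following sketch scores a template at one pixel location. It rests on several assumptions that are not taken from the abstract: a template is a list of hypothetical `(dx, dy, orientation)` edge points, the image is a 2D list of grayscale values, and the score is the mean absolute cosine of the orientation difference. Verification against an alternate view frame is not shown.

```python
import math


def gradient(image, x, y):
    """Central-difference gradient (gx, gy) at pixel (x, y) of a 2D list."""
    gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
    gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
    return gx, gy


def transform_template(template, angle, scale):
    """Rotate and scale a template of (dx, dy, orientation) edge points."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (round(scale * (c * dx - s * dy)),
         round(scale * (s * dx + c * dy)),
         theta + angle)
        for dx, dy, theta in template
    ]


def match_score(image, template, x, y, min_magnitude=1e-6):
    """Mean orientation agreement in [0, 1] between template and image."""
    height, width = len(image), len(image[0])
    total = 0.0
    for dx, dy, theta in template:
        px, py = x + dx, y + dy
        if not (1 <= px < width - 1 and 1 <= py < height - 1):
            continue  # points outside the usable image area add nothing
        gx, gy = gradient(image, px, py)
        if math.hypot(gx, gy) < min_magnitude:
            continue  # no edge at this pixel
        # abs() treats light-to-dark and dark-to-light edges alike.
        total += abs(math.cos(math.atan2(gy, gx) - theta))
    return total / len(template)


def best_template(image, templates, x, y):
    """Index and score of the template that matches best at (x, y)."""
    scores = [match_score(image, t, x, y) for t in templates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

A full detector would evaluate `best_template` over many locations and keep those above a threshold as candidate fingers; the threshold and search strategy are left open here.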
Abstract:
Systems and methods in accordance with embodiments of the invention implement three-dimensional (3D) gesture based graphical user interfaces (GUI) using gesture reactive interface objects. One embodiment includes using a computing device to render an initial user interface comprising a set of interface objects, detect a targeting 3D gesture in captured image data that identifies a targeted interface object within the user interface, change the rendering of at least the targeted interface object within the user interface in response to the targeting 3D gesture that targets the interface object, detect an interaction 3D gesture in additional captured image data that identifies a specific interaction with a targeted interface object, modify the user interface in response to the interaction with the targeted interface object identified by the interaction 3D gesture, and render the modified user interface.
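The targeting-then-interaction flow described above can be sketched as a small state holder. This is a minimal sketch under stated assumptions: the class names, the `highlighted` and `activated` flags, and the text-based `render` output are all hypothetical stand-ins, and gesture detection from captured image data is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class InterfaceObject:
    name: str
    highlighted: bool = False  # rendering change applied while targeted
    activated: bool = False    # set once an interaction gesture is applied


@dataclass
class GestureUI:
    objects: dict = field(default_factory=dict)
    targeted: str = ""  # name of the targeted object; empty when none

    def add(self, name: str) -> None:
        self.objects[name] = InterfaceObject(name)

    def on_targeting_gesture(self, name: str) -> bool:
        """Mark one object as targeted and change how it is rendered."""
        if name not in self.objects:
            return False
        for obj in self.objects.values():
            obj.highlighted = obj.name == name
        self.targeted = name
        return True

    def on_interaction_gesture(self) -> bool:
        """Apply the interaction to the targeted object, if there is one."""
        if not self.targeted:
            return False
        self.objects[self.targeted].activated = True
        return True

    def render(self) -> list:
        """Return one text line per object as a stand-in for real drawing."""
        lines = []
        for obj in self.objects.values():
            marker = "*" if obj.highlighted else " "
            state = "on" if obj.activated else "off"
            lines.append(f"[{marker}] {obj.name}: {state}")
        return lines
```

An interaction gesture received before any targeting gesture is ignored in this sketch, which mirrors the order of steps in the abstract.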
Abstract:
User interaction with a display is detected using at least two cameras whose intersecting fields of view (FOVs) define a three-dimensional hover zone within which user interactions can be imaged. Each camera substantially simultaneously acquires, from its vantage point, two-dimensional images of the user within the hover zone. Separately and collectively, the image data is analyzed to identify a relatively small number of landmarks definable on the user. A substantially unambiguous correspondence is established between the same landmark on each acquired image, and a three-dimensional reconstruction of those landmarks is made in a common coordinate system. This landmark identification and position information can be converted into a command causing the display to respond appropriately to a gesture made by the user. Advantageously, the size of the hover zone can far exceed the size of the display, making the invention usable with smart phones as well as large entertainment TVs.
Abstract:
User interaction with a display is detected substantially simultaneously using at least two cameras whose intersecting fields of view (FOVs) define a three-dimensional hover zone within which user interactions can be imaged. Separately and collectively, image data is analyzed to identify a relatively small number of user landmarks. A substantially unambiguous correspondence is established between the same landmark on each acquired image, and a three-dimensional reconstruction is made in a common coordinate system. Preferably, the cameras are modeled as pinhole cameras, enabling rectified epipolar geometric analysis to facilitate more rapid disambiguation among potential landmark points. Consequently, processing overhead and latency are substantially reduced. Landmark identification and position information is convertible into a command causing the display to respond appropriately to a user gesture. Advantageously, the size of the hover zone can far exceed the size of the display, making the invention usable with smart phones as well as large entertainment TVs.
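To show why rectified pinhole geometry speeds up disambiguation, the sketch below filters right-image candidates to the same image row as the left-image landmark and then triangulates the match with the standard rectified-stereo relation Z = f * B / d. The function names and parameters are hypothetical, and the sketch assumes both rectified cameras share one focal length (in pixels) and one principal point (cx, cy).

```python
def same_row_candidates(v_left, candidates, tolerance_px=1.0):
    """Keep right-image (u, v) candidates lying on the left landmark's row.

    In rectified images the epipolar line of a left-image point is the
    same row in the right image, so other rows can be discarded early.
    """
    return [(u, v) for (u, v) in candidates if abs(v - v_left) <= tolerance_px]


def triangulate_rectified(u_left, v_left, u_right, focal_px, baseline, cx, cy):
    """Return (X, Y, Z) in the left camera frame, in the units of `baseline`."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline / disparity   # depth from disparity
    x = (u_left - cx) * z / focal_px      # back-project through the pinhole
    y = (v_left - cy) * z / focal_px
    return (x, y, z)
```

With a 500-pixel focal length and a 0.1 m baseline, a landmark seen at column 370 in the left image and column 320 in the right image has a 50-pixel disparity, giving a depth of about 1 m.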
Abstract:
Systems and methods for initializing motion tracking of human hands are disclosed. One embodiment includes a processor; a reference camera; and memory containing: a hand tracking application; and a plurality of edge feature templates that are rotated and scaled versions of a base template. The hand tracking application configures the processor to: determine whether any pixels in a frame of video are part of a human hand, where a part of a human hand is identified by searching the frame of video data for a grouping of pixels that have image gradient orientations that match the edge features of one of the plurality of edge feature templates; track the motion of the part of the human hand visible in a sequence of frames of video; confirm that the tracked motion corresponds to an initialization gesture; and commence tracking the human hand as part of a gesture based interactive session.
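The confirmation step can be illustrated with a simple check on a tracked trajectory. The abstract does not say what the initialization gesture is, so the side-to-side wave used here is purely a hypothetical choice, as are the function names and the thresholds; the input is assumed to be the horizontal position of the tracked hand part in successive frames.

```python
def count_reversals(xs, min_travel):
    """Count direction changes, ignoring moves smaller than min_travel."""
    if len(xs) < 2:
        return 0
    reversals = 0
    direction = 0      # +1 moving right, -1 moving left, 0 not yet known
    anchor = xs[0]     # last position where a significant move ended
    for x in xs[1:]:
        delta = x - anchor
        if abs(delta) < min_travel:
            continue   # treat small moves as jitter
        step = 1 if delta > 0 else -1
        if direction and step != direction:
            reversals += 1
        direction = step
        anchor = x
    return reversals


def is_wave(xs, min_travel=5.0, min_reversals=2):
    """Hypothetical initialization test: enough back-and-forth motion."""
    return count_reversals(xs, min_travel) >= min_reversals
```

Steady motion in one direction and small jitter both yield zero reversals, so neither would start a session under this assumed rule.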
Abstract:
Systems and methods for tracking human hands by performing parts based template matching using images captured from multiple viewpoints are described. One embodiment of the invention includes a processor, a reference camera, an alternate view camera, and memory containing: a hand tracking application; and a plurality of edge feature templates that are rotated and scaled versions of a finger template that includes an edge features template. In addition, the hand tracking application configures the processor to: detect at least one candidate finger in a reference frame, where each candidate finger is a grouping of pixels identified by searching the reference frame for a grouping of pixels that have image gradient orientations that match one of the plurality of edge feature templates; and verify the correct detection of a candidate finger in the reference frame by locating a grouping of pixels in an alternate view frame that correspond to the candidate finger.
Abstract:
Systems and methods for initializing motion tracking of human hands within bounded regions are disclosed. One embodiment includes: a processor; reference and alternate view cameras; and memory containing a plurality of templates that are rotated and scaled versions of a base template. In addition, a hand tracking application configures the processor to: obtain reference and alternate view frames of video data; generate a depth map; identify at least one bounded region within the reference frame of video data containing pixels having distances from the reference camera that are within a specific range of distances; determine whether any of the pixels within the at least one bounded region are part of a human hand; track the motion of the part of the human hand in a sequence of frames of video data obtained from the reference camera; and confirm that the tracked motion corresponds to a predetermined initialization gesture.
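Identifying a bounded region from a depth map can be sketched as a range test followed by a bounding box. This sketch assumes the depth map is a 2D list of distances from the reference camera and returns a single box around all in-range pixels; the abstract allows for more than one region, which would require grouping connected pixels first. The function name and the return layout are hypothetical.

```python
def bounded_region(depth_map, near, far):
    """Return (top, left, bottom, right) enclosing in-range pixels, or None."""
    hits = [
        (r, c)
        for r, row in enumerate(depth_map)
        for c, depth in enumerate(row)
        if near <= depth <= far
    ]
    if not hits:
        return None  # nothing lies within the requested distance range
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))
```

Restricting the later hand search to this box is what keeps the per-frame work small: pixels outside the distance range are never compared against any template.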
Abstract:
Natural three-dimensional (xw,yw,zw,tw) gesture player interaction with a two-dimensional game application rendered on a two- or three-dimensional display includes mapping acquired (xw,yw,zw,tw) gesture data to virtual game-world (xv,yv,zv,tv) coordinates or vice versa, and scaling if needed. The game application is caused to render at least one image on the display responsive to the mapped and scaled (xw,yw,zw) data, where the display and game interaction is rendered from the player's perception viewpoint. The (xw,yw,zw) data is preferably acquired using spaced-apart two-dimensional cameras coupled to software that reduces the acquired images to a relatively small number of landmark points, from which player gestures may be recognized. The invention may be implemented in a handheld device such as a smart phone or tablet, which device may include a gyroscope and/or accelerometer.
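The mapping and scaling between real-world and virtual game-world coordinates can be sketched as a per-axis scale and offset, with the inverse for the "vice versa" direction. This is an assumed form of the mapping, not one stated in the abstract: the function names and the scale and offset parameters are hypothetical, and rotation between the two frames is ignored.

```python
def world_to_virtual(point_w, scale, offset):
    """Map (xw, yw, zw) to (xv, yv, zv) with a per-axis scale and offset."""
    return tuple(s * w + o for w, s, o in zip(point_w, scale, offset))


def virtual_to_world(point_v, scale, offset):
    """Inverse mapping from game-world coordinates back to the real world."""
    return tuple((v - o) / s for v, s, o in zip(point_v, scale, offset))
```

Because both functions iterate over whatever axes they are given, the same sketch also covers a fourth (time) coordinate if one is supplied in all three tuples.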