Abstract:
A system that includes a head mounted display device and a processing unit connected to the head mounted display device is used to fuse virtual content into real content. In one embodiment, the processing unit is in communication with a hub computing device. The system creates a volumetric model of a space, segments the model into objects, identifies one or more of the objects including a first object, and displays a virtual image over the first object on a display (of the head mounted display) that allows actual direct viewing of at least a portion of the space through the display.
Abstract:
Embodiments are disclosed that relate to the use of identity information to help avoid the occurrence of false positive speech recognition events in a speech recognition system. One embodiment provides a method comprising receiving speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from a microphone array, and confidence data comprising a recognition confidence value, and also receiving image data comprising visual locational data related to a location of each person in an image. The acoustic locational data is compared to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor that captured the image, and the confidence data is adjusted depending on this determination.
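The comparison and adjustment described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: locations are reduced to a single horizontal angle in degrees, and the names `RecognitionEvent` and `adjust_confidence`, the tolerance, and the boost and penalty amounts are all hypothetical. The abstract states only that the confidence is adjusted depending on the determination, so the direction and size of the adjustment below are one possible choice.

```python
from dataclasses import dataclass


@dataclass
class RecognitionEvent:
    text: str                 # recognized speech segment
    source_angle_deg: float   # acoustic direction estimate (assumed representation)
    confidence: float         # recognition confidence in [0, 1]


def adjust_confidence(event, person_angles_deg, tolerance_deg=10.0,
                      boost=0.1, penalty=0.3):
    """Return an adjusted confidence for a recognition event.

    The acoustic direction is compared against the direction of every
    person found in the image. The boost/penalty values are illustrative
    assumptions, not values from the source.
    """
    matched = any(abs(event.source_angle_deg - angle) <= tolerance_deg
                  for angle in person_angles_deg)
    adjusted = event.confidence + boost if matched else event.confidence - penalty
    # Clamp so the result remains a valid confidence value.
    return min(1.0, max(0.0, adjusted))


event = RecognitionEvent("play music", source_angle_deg=12.0, confidence=0.6)
in_view = adjust_confidence(event, person_angles_deg=[10.0, -40.0])
out_of_view = adjust_confidence(event, person_angles_deg=[-40.0])
```

In this sketch a segment whose acoustic origin lines up with a visible person ends with a higher confidence than the same segment with no matching person, which is how a false positive from, say, a television would be discouraged.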
Abstract:
Techniques are provided for synchronization of sensor signals between devices. One or more of the devices may collect sensor data. Such a device may create a sensor signal from the sensor data, which it may make available to other devices under a publish/subscribe model. The other devices may subscribe to the sensor signals they choose. A device may be a provider or a consumer of the sensor signals. A device may have a layer of code between an operating system and software applications that processes the data for the applications. The processing may include such actions as synchronizing the data in a sensor signal to a local time clock, predicting future values for data in a sensor signal, and providing data samples for a sensor signal at a frequency that an application requests, among other actions.
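The three processing actions named in this abstract (synchronizing to a local clock, predicting future values, and resampling at a requested frequency) can be sketched as follows. This is a minimal illustration under stated assumptions: a signal is a list of `(timestamp, value)` pairs with strictly increasing timestamps, the clock offset between devices is already known, and prediction is plain linear extrapolation from the last two samples. The function names are hypothetical and the abstract does not specify any of these methods.

```python
from bisect import bisect_right


def to_local_time(samples, clock_offset):
    """Shift remote timestamps onto the local clock (offset assumed known)."""
    return [(t + clock_offset, v) for t, v in samples]


def value_at(samples, t):
    """Estimate the signal value at time t.

    Interpolates linearly between neighbouring samples; for t past the
    last sample it extrapolates from the last two samples, which serves
    as a very simple predictor. Requires at least two samples.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    times = [s[0] for s in samples]
    i = bisect_right(times, t)
    i = min(max(i, 1), len(samples) - 1)   # clamp to a valid segment
    (t0, v0), (t1, v1) = samples[i - 1], samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def resample(samples, start, rate_hz, count):
    """Produce `count` values at the rate an application asked for."""
    return [value_at(samples, start + k / rate_hz) for k in range(count)]


remote = [(0.0, 0.0), (1.0, 10.0), (2.0, 20.0)]
local = to_local_time(remote, clock_offset=0.5)
at_2hz = resample(local, start=0.5, rate_hz=2.0, count=3)
predicted = value_at(local, 3.0)   # beyond the last sample at t = 2.5
```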
Abstract:
A media server in a Universal Plug and Play (UPnP) network includes a resource sharing service to govern the distribution of resource information to rendering devices. In one case, the resource sharing service consults a criterion to determine whether an identified network device is authorized to receive resource information. In another case, the resource sharing service consults another criterion to determine whether a specified individual associated with the media server must consent to the transfer of the resource information in order for the transfer to occur. The resource information may include resource metadata that describes high-level information regarding resources, as well as resource content. The media server provides various user interface presentations that allow the media server user to specify shared resources and distribution criteria.
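The two criteria described in this abstract can be sketched as a small policy check. This is an illustration only: the abstract presents the authorization check and the consent check as separate cases, and chaining them in one function is an assumption, as are the names `SharingPolicy` and `may_share` and the use of plain device identifier strings.

```python
from dataclasses import dataclass, field


@dataclass
class SharingPolicy:
    # Devices allowed to receive resource information at all.
    authorized_devices: set = field(default_factory=set)
    # Devices for which a person must approve each transfer.
    consent_required: set = field(default_factory=set)


def may_share(policy, device_id, ask_consent):
    """Decide whether resource information may go to `device_id`.

    `ask_consent` is a callable standing in for the user interface
    prompt; it receives the device id and returns True or False.
    """
    if device_id not in policy.authorized_devices:
        return False
    if device_id in policy.consent_required:
        return bool(ask_consent(device_id))
    return True


policy = SharingPolicy(authorized_devices={"tv", "phone"},
                       consent_required={"phone"})
tv_allowed = may_share(policy, "tv", ask_consent=lambda device: False)
phone_denied = may_share(policy, "phone", ask_consent=lambda device: False)
```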
Abstract:
A media server in a Universal Plug and Play (UPnP) network includes a resource sharing service to govern the distribution of media resource information to rendering devices. The media server includes: a media service module operating in a clamped-down user context (e.g., a local service user context) and configured to share resource information over the network; a supplemental module operating in a local system user context and configured to assist the media service module in sharing resource information over the network; and a control panel module operating in a logged-on user context and configured to interact with a user via a user interface display. The local system user context provides a higher level of access to media server resources than the clamped-down user context. The media server also provides fast user switching (FUS) functionality that allows multiple users to have respective instances of the control panel module pending at the same time. Further, the media server includes a mechanism to prevent rogue applications from masquerading as the control panel module and thereby gaining unauthorized access to the media service module.
Abstract:
A UPnP network provides a flexible technique for retrieving a resource content item from a media server using a parameterized uniform resource locator (URL). In operation, the media server sends a control point a parameterized URL in response to a consumer's browse or search request. The URL includes at least one parameter that specifies a characteristic attribute of the resource content item, which determines the manner in which the resource content item can be presented. For example, the parameter can describe a format type of the resource content item, a format resolution of the resource content item, and/or another property of the resource content item. The control point can modify a value associated with the parameter to produce a modified URL. This modified URL is submitted to the media server, whereupon the media server locates the resource content item and converts it to the characteristic state specified by the modified URL (if conversion is needed). The media server then provides the located (and potentially converted) resource content item to a rendering device for presentation.
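The control point's step of modifying a parameter value can be sketched with the Python standard library. The abstract does not say where in the URL the parameters live, so carrying them in the query string is an assumption, and the host, path, and parameter names (`format`, `width`) below are hypothetical examples rather than anything defined by the source or by UPnP.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_params(url, **changes):
    """Return `url` with the given query parameters replaced or added."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update(changes)          # existing keys keep their position
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical URL as a media server might return it for a browse request.
original = "http://192.168.0.10:10243/item/42?format=jpeg&width=1024"

# The control point asks for a smaller rendition suited to its renderer.
modified = with_params(original, width="320")
```

The server would then be expected to convert the item if the requested characteristics differ from the stored ones; that conversion is outside the scope of this sketch.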
Abstract:
A system and method are described for providing an augmented reality environment in which the environmental mapping process is decoupled from the localization processes performed by one or more mobile devices. In some embodiments, an augmented reality system includes a mapping system with independent sensing devices for mapping a particular real-world environment and one or more mobile devices. Each of the one or more mobile devices utilizes a separate asynchronous computing pipeline for localizing the mobile device and rendering virtual objects from a point of view of the mobile device. This distributed approach provides an efficient way to support mapping and localization processes for a large number of mobile devices, which are typically constrained by form factor and battery life limitations.
Abstract:
A device for suppressing ambient sounds from speech received by a microphone array is provided. One embodiment of the device comprises a microphone array, a processor, an analog-to-digital converter, and memory comprising instructions stored therein that are executable by the processor. The instructions stored in the memory are configured to receive a plurality of digital sound signals, each digital sound signal based on an analog sound signal originating at the microphone array, receive a multi-channel speaker signal, generate a monophonic approximation signal of the multi-channel speaker signal, apply a linear acoustic echo canceller to suppress a first ambient sound portion of each digital sound signal, generate a combined directionally-adaptive sound signal from a combination of each digital sound signal by a combination of time-invariant and adaptive beamforming techniques, and apply one or more nonlinear noise suppression techniques to suppress a second ambient sound portion of the combined directionally-adaptive sound signal.
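Two of the simpler stages in this pipeline can be sketched in plain Python. The abstract does not say how the monophonic approximation or the time-invariant beamformer is computed, so both choices below are assumptions: the mono signal is a per-sample average of the speaker channels, and the fixed beamformer is a delay-and-sum over integer sample delays. The echo canceller, the adaptive beamformer, and the nonlinear noise suppression are not shown, and the function names are hypothetical.

```python
def mono_approximation(channels):
    """Average equal-length speaker channels sample by sample."""
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    return [sum(frame) / len(channels) for frame in zip(*channels)]


def delay_and_sum(mic_signals, delays):
    """Fixed beamformer: delay each microphone signal, then average.

    `delays` holds one non-negative integer sample delay per microphone,
    chosen so that sound from the target direction lines up in time.
    Samples shifted in from before the start are treated as silence.
    """
    length = len(mic_signals[0])
    output = []
    for n in range(length):
        total = 0.0
        for signal, delay in zip(mic_signals, delays):
            index = n - delay
            total += signal[index] if 0 <= index < length else 0.0
        output.append(total / len(mic_signals))
    return output


speaker_mono = mono_approximation([[1.0, 2.0], [3.0, 4.0]])

# The same impulse reaches microphone 2 one sample before microphone 1,
# so microphone 2 is delayed by one sample to align the two.
aligned = delay_and_sum([[0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]], delays=[0, 1])
```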
Abstract:
An audio/visual system (e.g., an entertainment console or other computing device) plays a base audio track, such as a portion of a pre-recorded song or notes from one or more instruments. Using a depth camera or other sensor, the system automatically detects that a user (or a portion of the user) enters a first collision volume of a plurality of collision volumes. Each collision volume of the plurality of collision volumes is associated with a different audio stem. In one example, an audio stem is a sound from a subset of instruments playing a song, a portion of a vocal track for a song, or notes from one or more instruments. In response to automatically detecting that the user (or a portion of the user) entered the first collision volume, the appropriate audio stem associated with the first collision volume is added to the base audio track or removed from the base audio track.
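The entry detection and stem switching described in this abstract can be sketched as follows. Several details are assumptions for illustration: collision volumes are axis-aligned boxes, the tracked part of the user is a single 3D point, and each entry toggles the stem (adds it if absent, removes it if present). The abstract says only that the stem is added or removed on entry. The class names and stem names are hypothetical, and no audio is actually played.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollisionVolume:
    name: str
    min_corner: tuple   # (x, y, z) lower bounds
    max_corner: tuple   # (x, y, z) upper bounds
    stem: str           # audio stem associated with this volume

    def contains(self, point):
        return all(lo <= p <= hi
                   for lo, p, hi in zip(self.min_corner, point, self.max_corner))


class StemMixer:
    def __init__(self, base_track, volumes):
        self.base_track = base_track
        self.volumes = list(volumes)
        self.active_stems = set()   # stems currently mixed onto the base track
        self._inside = set()        # volumes the tracked point was in last frame

    def update(self, point):
        """Process one tracked position; toggle a stem on each new entry."""
        for volume in self.volumes:
            inside = volume.contains(point)
            if inside and volume.name not in self._inside:
                self.active_stems ^= {volume.stem}   # add if absent, else remove
            if inside:
                self._inside.add(volume.name)
            else:
                self._inside.discard(volume.name)
        return set(self.active_stems)


mixer = StemMixer("base_song", [
    CollisionVolume("left", (-1.0, 0.0, 1.0), (0.0, 2.0, 3.0), stem="drums"),
    CollisionVolume("right", (0.5, 0.0, 1.0), (1.5, 2.0, 3.0), stem="vocals"),
])
first_entry = mixer.update((-0.5, 1.0, 2.0))   # hand enters the left volume
still_inside = mixer.update((-0.4, 1.0, 2.0))  # staying inside is not a new entry
mixer.update((5.0, 5.0, 5.0))                  # hand leaves every volume
second_entry = mixer.update((-0.5, 1.0, 2.0))  # re-entry toggles the stem off
```

Tracking which volumes the point occupied on the previous frame is what turns a continuous position stream into discrete entry events, so a hand resting inside a volume does not retrigger the stem on every frame.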