摘要:
A system and method for forming a segmented speech signal from an input speech signal having a plurality of frames. The input speech signal is converted from a time domain signal to a frequency domain signal having a plurality of speech frames, wherein each speech frame in the frequency domain signal is represented by at least one spectral value associated with the speech frame. A spectral difference value is then determined for each pair of adjacent frames in the frequency domain signal, wherein the spectral difference value for each pair of adjacent frames is representative of a difference between the at least one spectral value associated with each frame in the pair of adjacent frames. An initial cluster boundary is set between each pair of adjacent frames in the frequency domain signal, and a variance value is assigned to each cluster in the frequency domain signal, wherein the variance value for each cluster is equal to one of the determined spectral difference values. Next, a plurality of cluster merge parameters is calculated, wherein each of the cluster merge parameters is associated with a pair of adjacent clusters in the frequency domain signal. A minimum cluster merge parameter is selected from the plurality of cluster merge parameters. A merged cluster is then formed by canceling a cluster boundary between the clusters associated with the minimum merge parameter and assigning a merged variance value to the merged cluster, wherein the merged variance value is representative of the variance values assigned to the clusters associated with the minimum merge parameter. The process is repeated in order to form a plurality of merged clusters, and the segmented speech signal is formed in accordance with the plurality of merged clusters.
摘要:
The example techniques of this disclosure are directed to generating a stereoscopic view from an application designed to generate a mono view. For example, the techniques may modify source code of a vertex shader to cause the modified vertex shader, when executed, to generate graphics content for the images of the stereoscopic view. As another example, the techniques may modify a command that defines a viewport for the mono view to commands that define the viewports for the images of the stereoscopic view.
摘要:
This disclosure describes a multi-stage tessellation technique for tessellating a curve during graphics rendering. In particular, a first tessellation stage tessellates the curve into a first set of line segments that each represents a portion of the curve. A second tessellation stage further tessellates the portion of the curve represented by each of the line segments of the first set into additional line segments that more finely represent the shape of the curve. In this manner, each portion of the curve that was represented by only one line segment after the first tessellation stage is represented by more than one line segment after the second tessellation stage. In some instances, more than two tessellation stages may be performed to tessellate the curve.
摘要:
The speech recognition training unit is modified to store digitized speech samples into a speech database that can be accessed at recognition time. The improved recognition unit comprises a noise analysis, modeling, and synthesis unit which continually analyzes the noise characteristics present in the audio environment and produces an estimated noise signal with similar characteristics. The recognition unit then constructs a noise-compensated template database by adding the estimated noise signal to each of the speech samples in the speech database and performing parameter determination on the resulting sums. This procedure accounts for the presence of noise in the recognition phase by retraining all the templates using an estimated noise signal with similar characteristics as the actual noise signal that corrupted the word to be recognized. This method improves the likelihood of a good template match, which increases the recognition accuracy.
摘要:
The present disclosure provides systems, methods and apparatus, including computer programs encoded on computer storage media, for providing virtual keyboards. In one aspect, a system includes a camera, a display, a video feature extraction module and a gesture pattern matching module. The camera captures a sequence of images containing a finger of a user, and the display displays each image combined with a virtual keyboard having a plurality of virtual keys. The video feature extraction module detects motion of the finger in the sequence of images relative to virtual sensors of the virtual keys, and determines sensor actuation data based on the detected motion relative to the virtual sensors. The gesture pattern matching module uses the sensor actuation data to recognize a gesture.
摘要:
This disclosure describes techniques for modifying application program interface (API) calls in a manner that can cause a device to render native three dimensional (3D) graphics content in stereoscopic 3D. The techniques of this disclosure can be implemented in a manner where API calls themselves are modified, but the API itself and the GPU hardware are not modified. The techniques of the present disclosure include using the same viewing frustum defined by the original content to generate a left-eye image and a right-eye image and shifting the viewport offset of the left-eye image and the right-eye image.
摘要:
An apparatus and method for displaying content is disclosed. A particular method includes determining a viewing orientation of a user relative to a display and providing a portion of content to the display based on the viewing orientation. The portion includes at least a first viewable element of the content and does not include at least one second viewable element of the content. The method also includes determining an updated viewing orientation of the user and updating the portion of the content based on the updated viewing orientation. The updated portion includes at least the second viewable element. A display difference between the portion and the updated portion is non-linearly related to an orientation difference between the viewing orientation and the updated viewing orientation.
摘要:
A method and device for performing and processing user-defined clipping in object space to reduce the number of computations needed for the clipping operation. The method and device also combine the modelview transformation of the vertex coordinates with projection transform. The user-defined clipping in object space provides a higher performance and less power consumption by avoiding generation of eye coordinates if there is no lighting. The device includes a driver for the user-defined clipping in the object space to perform dual mode user-defined clipping in object space when a lighting function is disabled and in eye space when the lighting function is enabled.
摘要:
A system and method of video processing are disclosed. In a particular implementation, a device includes a processor configured to generate index data for video content. The index data includes a summary frame and metadata. The summary frame is associated with a portion of the video content and illustrates multiple representations of an object included in the portion of the video content. The metadata includes marker data that indicates a playback position of the video content. The playback position is associated with the summary frame. The device also includes a memory configured to store the index data.
摘要:
An apparatus and method for displaying content is disclosed. A particular method includes determining a viewing orientation of a user relative to a display and providing a portion of content to the display based on the viewing orientation. The portion includes at least a first viewable element of the content and does not include at least one second viewable element of the content. The method also includes determining an updated viewing orientation of the user and updating the portion of the content based on the updated viewing orientation. The updated portion includes at least the second viewable element. A display difference between the portion and the updated portion is non-linearly related to an orientation difference between the viewing orientation and the updated viewing orientation.