Abstract:
A method for producing 3D multi-view visual content includes capturing a visual scene from at least one first point of view to generate a first bidimensional image of the scene and a corresponding first depth map indicating the distance of different parts of the scene from the first point of view. The method further includes: capturing the visual scene from at least one second point of view to generate a second bidimensional image; processing the first bidimensional image to derive at least one predicted second bidimensional image predicting the visual scene as captured from the at least one second point of view; and deriving at least one predicted second depth map, predictive of the distance of different parts of the scene from the at least one second point of view, by processing the first depth map, the at least one predicted second bidimensional image and the second bidimensional image.
Abstract:
A service architecture for providing textual information and the related speech synthesis to a user terminal of a communications network, the user terminal being provided with a speech synthesis engine and a basic database of speech waveforms, includes: a content server for downloading textual information requested by means of a browser application on the user terminal; a context manager for extracting context information from the textual information requested by the user terminal; a context selector for selecting an incremental database of speech waveforms associated with the extracted context information and for downloading the incremental database into the user terminal; and a database manager on the user terminal for managing the composition of an enlarged database of speech waveforms for the speech synthesis engine, including the basic and the incremental databases of speech waveforms.
Abstract:
A filter, such as a Wiener filter, for noise reduction in a signal, such as a speech signal, affected by background noise includes a circuit for determining values of an update function relating a new value of estimated noise power to a previous value of estimated noise power, the update function being a function of said previous estimated noise power and a mean input power spectral density. The circuit includes a look-up table having values for the update function stored therein, with the previous value of estimated noise power and the mean input power spectral density as a first and a second search entry, respectively. These search entries are entered via an input module and exploited by search circuitry associated with the look-up table for selectively searching values for the update function in the look-up table. The search is preferably carried out on the basis of an index computed starting from said first and second search entries.
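The look-up scheme described above can be illustrated with a minimal sketch. The abstract does not give the update function, the quantization, or the index formula, so `placeholder_update`, `LEVELS`, `MAX_POWER` and the row-major index below are all hypothetical choices made only for illustration.

```python
LEVELS = 8                  # hypothetical number of quantization levels per search entry
MAX_POWER = 8.0             # hypothetical upper bound of the power range
STEP = MAX_POWER / LEVELS   # width of one quantization cell


def placeholder_update(prev_noise, mean_psd, alpha=0.9):
    """Stand-in update rule (an assumption, not taken from the source)."""
    return alpha * prev_noise + (1.0 - alpha) * min(mean_psd, 2.0 * prev_noise)


def quantize(power):
    """Map a power value to a cell number, clamped to the table range."""
    return min(LEVELS - 1, max(0, int(power / STEP)))


def cell_center(cell):
    """Representative power value of a quantization cell."""
    return (cell + 0.5) * STEP


# Precompute the update function once for every pair of quantized entries.
TABLE = [
    placeholder_update(cell_center(i), cell_center(j))
    for i in range(LEVELS)
    for j in range(LEVELS)
]


def lookup(prev_noise, mean_psd):
    """Compute a single index from the two search entries and read the table."""
    index = quantize(prev_noise) * LEVELS + quantize(mean_psd)
    return TABLE[index]
```

In this sketch the run-time cost of an update is two quantizations and one table read, which is presumably the kind of saving a look-up table is meant to provide; the table resolution trades memory against quantization error.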
Abstract:
A method for compressing data, the data being represented by an input vector having Q features, wherein Q is an integer higher than 1, including the steps of 1) providing a vector codebook of sub-sets of indexed Q-feature reference vectors and threshold values associated with the sub-sets for a prefixed feature; 2) identifying a sub-set of reference vectors among the sub-sets by progressively comparing the value of a feature of the input vector which corresponds to the prefixed feature, with the threshold values associated with the sub-sets; and 3) identifying the reference vector which, within the sub-set identified in step 2), provides the lowest distortion with respect to the input vector.
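Steps 1) to 3) can be sketched as follows. The abstract does not say how the sub-sets and thresholds are built, so `build_codebook` (sort by the chosen feature, split into equal, non-overlapping sub-sets, place thresholds midway between neighbours) is an assumption, as are all function names and the squared-error distortion measure.

```python
def sq_dist(a, b):
    """Squared Euclidean distortion between two Q-feature vectors (assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_codebook(reference_vectors, feature, n_subsets):
    """Step 1 (hypothetical construction): indexed sub-sets plus thresholds."""
    ordered = sorted(enumerate(reference_vectors), key=lambda iv: iv[1][feature])
    size = -(-len(ordered) // n_subsets)  # ceiling division
    subsets = [ordered[k:k + size] for k in range(0, len(ordered), size)]
    thresholds = [
        (left[-1][1][feature] + right[0][1][feature]) / 2.0
        for left, right in zip(subsets, subsets[1:])
    ]
    return subsets, thresholds


def encode(x, subsets, thresholds, feature):
    """Steps 2 and 3: pick a sub-set by threshold comparison, then search it."""
    k = 0
    # Step 2: progressively compare the chosen feature with the thresholds.
    while k < len(thresholds) and x[feature] > thresholds[k]:
        k += 1
    # Step 3: lowest-distortion reference vector within the selected sub-set.
    best_index, _ = min(subsets[k], key=lambda iv: sq_dist(x, iv[1]))
    return best_index


refs = [(0.0, 0.0), (1.0, 5.0), (4.0, 1.0), (5.0, 5.0)]
subsets, thresholds = build_codebook(refs, feature=0, n_subsets=2)
code = encode((4.5, 4.0), subsets, thresholds, feature=0)
```

Only the vectors of one sub-set are compared in full, which reduces the number of distortion computations. With the non-overlapping sub-sets assumed here, the result can differ from an exhaustive search when the input lies close to a threshold.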
Abstract:
An automated emotional recognition system is adapted to determine emotional states of a speaker based on the analysis of a speech signal. The emotional recognition system includes at least one server function and at least one client function in communication with the at least one server function for receiving assistance in determining the emotional states of the speaker. The at least one client function includes an emotional features calculator adapted to receive the speech signal and to extract therefrom a set of speech features indicative of the emotional state of the speaker. The emotional state recognition system further includes at least one emotional state decider adapted to determine the emotional state of the speaker exploiting the set of speech features based on a decision model. The server function includes at least a decision model trainer adapted to update the selected decision model according to the speech signal. The decision model to be used by the emotional state decider for determining the emotional state of the speaker is selectable based on a context of use of the recognition system.
Abstract:
A method of transmitting speech data to a remote device in a distributed speech recognition system includes the steps of: dividing an input speech signal into frames; calculating, for each frame, a voice activity value representative of the presence of speech activity in the frame; grouping the frames into multiframes, each multiframe including a predetermined number of frames; calculating, for each multiframe, a voice activity marker representative of the number of frames in the multiframe representing speech activity; and selectively transmitting, on the basis of the voice activity marker associated with each multiframe, the multiframes to the remote device.
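The steps listed above can be sketched as below. The abstract does not specify how voice activity is detected or how the marker drives the transmit decision, so the mean-energy detector, the `min_active` threshold rule and every name in the code are assumptions for illustration only.

```python
def frame_signal(samples, frame_len):
    """Divide the input signal into consecutive, non-overlapping frames."""
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def voice_activity(frame, energy_threshold):
    """Per-frame voice activity value: 1 if mean energy exceeds a threshold (assumed detector)."""
    energy = sum(s * s for s in frame) / len(frame)
    return 1 if energy > energy_threshold else 0


def select_multiframes(samples, frame_len, frames_per_multiframe,
                       energy_threshold, min_active):
    """Return (multiframe number, marker) for each multiframe chosen for transmission."""
    frames = frame_signal(samples, frame_len)
    flags = [voice_activity(f, energy_threshold) for f in frames]
    selected = []
    last_start = len(frames) - frames_per_multiframe
    for start in range(0, last_start + 1, frames_per_multiframe):
        # Marker: number of frames in the multiframe that carry speech activity.
        marker = sum(flags[start:start + frames_per_multiframe])
        if marker >= min_active:  # assumed decision rule
            selected.append((start // frames_per_multiframe, marker))
    return selected


# Synthetic signal: silence, speech, then a half-active stretch.
signal = [0.0] * 8 + [1.0] * 8 + [0.0] * 4 + [1.0] * 4
to_send = select_multiframes(signal, frame_len=2, frames_per_multiframe=4,
                             energy_threshold=0.5, min_active=2)
```

Raising `min_active` in this sketch suppresses multiframes that contain only a little speech, so the parameter trades bandwidth against the risk of dropping short utterances.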
Abstract:
A device for maintaining fine alignment between an incoming spread spectrum signal and a locally generated code in a digital communication receiver comprises: a delay line (56) for storing a plurality of consecutive samples (E−1, E, M, L, L+1) of the incoming spread spectrum signal; three digitally controlled interpolators (24, 26, 28) for determining, by interpolation between consecutive samples, an interpolated early sample (e), an interpolated middle sample (m) and an interpolated late sample (l); two correlators (30, 32) for calculating an error signal (ξ) as the difference between the energies of the symbols computed from the interpolated early (e) and late (l) samples; a circuit for generating a control signal (SOUT?) for controlling the interpolation phase of the digitally controlled interpolator (24) for the early sample (e); and a digital non-linear filter (68) for smoothing the control signal (SOUT?) of the interpolator (24) for the early sample (e), enabling the update of the control signal only when the absolute value (|ξ(n)|) of the error signal at a time instant n is smaller than the absolute value (|ξ(n−1)|) of the same error signal at time instant n−1.
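The gating rule of the non-linear filter can be sketched in a few lines. Only the condition |ξ(n)| < |ξ(n−1)| comes from the abstract; the function name, the notion of a per-instant "candidate" control value, and the choice to hold the initial value at the first instant (where no previous error exists) are assumptions.

```python
def gated_control(errors, candidates, initial=0):
    """Hold the control value unless the error magnitude has just decreased.

    errors     -- error signal values xi(n), one per time instant
    candidates -- control value proposed at each instant (hypothetical input)
    initial    -- control value held before the first enabled update
    """
    control = initial
    prev_abs = None  # no previous error at the first instant (assumed: no update)
    history = []
    for err, candidate in zip(errors, candidates):
        # Update only when |xi(n)| < |xi(n-1)|; otherwise keep the old value.
        if prev_abs is not None and abs(err) < prev_abs:
            control = candidate
        prev_abs = abs(err)
        history.append(control)
    return history


trace = gated_control(errors=[0.5, -0.3, 0.4, -0.1], candidates=[1, 2, 3, 4])
```

In this sketch the control value is frozen whenever the error magnitude grows or stays the same, which has a smoothing effect on the interpolation phase.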