Abstract:
To enable a receiving side to easily and appropriately perform processing for obtaining predetermined information in a case where the predetermined information is divided across a predetermined number of audio frames and transmitted. The predetermined information is inserted into an audio compressed data stream, and the audio compressed data stream into which the predetermined information is inserted is transmitted. The pieces of divided information obtained by dividing the predetermined information can be inserted into the predetermined number of audio frames of the audio compressed data stream. Information indicating the overall size of the predetermined information is added to the first piece of divided information. It is therefore possible to reserve space for storing the predetermined information in a storage medium, on the basis of the information indicating the overall size, at the time point when the first piece of divided information is obtained.
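The division and reassembly described above can be sketched as follows. This is a minimal illustration, not the format defined in the disclosure: the 4-byte big-endian size field, the `split_into_frames` function, and the `Receiver` class are all assumptions made for the example.

```python
import struct


def split_into_frames(payload: bytes, num_frames: int) -> list[bytes]:
    """Divide payload into num_frames pieces (num_frames >= 1).

    Assumed layout: only the first piece is prefixed with the total size.
    """
    chunk = -(-len(payload) // num_frames)  # ceiling division
    pieces = [payload[i * chunk:(i + 1) * chunk] for i in range(num_frames)]
    # Hypothetical header: 4-byte big-endian overall size on the first piece.
    pieces[0] = struct.pack(">I", len(payload)) + pieces[0]
    return pieces


class Receiver:
    """Collects pieces; reserves storage as soon as the first piece arrives."""

    def __init__(self) -> None:
        self.buffer = None
        self.offset = 0

    def on_piece(self, piece: bytes, is_first: bool) -> None:
        if is_first:
            (total,) = struct.unpack(">I", piece[:4])
            self.buffer = bytearray(total)  # space reserved from the size field
            self.offset = 0
            piece = piece[4:]
        self.buffer[self.offset:self.offset + len(piece)] = piece
        self.offset += len(piece)

    def complete(self) -> bool:
        return self.buffer is not None and self.offset == len(self.buffer)


payload = b"predetermined information"
receiver = Receiver()
for index, piece in enumerate(split_into_frames(payload, 4)):
    receiver.on_piece(piece, is_first=(index == 0))
```

Because the size arrives with the first piece, the receiver in this sketch never has to grow its buffer while later frames are still in transit.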
Abstract:
The present disclosure relates to an information processing device and an information processing method capable of recognizing an acquisition position of voice data on an image. A web server transmits image frame size information indicating the image frame size of image data and audio position information indicating the acquisition position of voice data. The present disclosure is applicable to an information processing system or similar system that includes a file generation device, a web server, and a video playback terminal and that performs tiled streaming in a manner compliant with Moving Picture Experts Group phase - Dynamic Adaptive Streaming over HTTP (MPEG-DASH).
Abstract:
The present invention relates to a signal processing apparatus and method, a program, and a data recording medium configured such that the playback level of an audio signal can be easily and effectively enhanced without requiring prior analysis. An analyzer 21 generates mapping control information in the form of the root mean square of samples in a given segment of a supplied audio signal. A mapping processor 22 takes a nonlinear function determined by the mapping control information as a mapping function, and conducts amplitude conversion on the supplied audio signal using the mapping function. In this way, by conducting amplitude conversion of an audio signal using a nonlinear function that changes according to the characteristics of respective segments of the audio signal, the playback level of the audio signal can be easily and effectively enhanced without requiring prior analysis. The present invention may be applied to a portable playback apparatus.
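The analyzer and mapping processor above can be sketched as follows. The abstract does not state which nonlinear function is used, so the normalized tanh curve and the rule deriving its steepness `k` from the RMS value are assumptions chosen only for illustration, as are the function names.

```python
import math


def rms(segment):
    """Root mean square of the samples in one segment (the control information)."""
    return math.sqrt(sum(x * x for x in segment) / len(segment))


def mapping_function(x, control):
    """Assumed nonlinear curve: quieter segments get a steeper boost.

    Maps [-1, 1] onto [-1, 1], so full-scale samples are left at full scale.
    """
    k = min(1.0 / max(control, 1e-3), 10.0)  # hypothetical steepness rule
    return math.tanh(k * x) / math.tanh(k)


def enhance(signal, segment_len):
    """Analyze each segment, then convert its amplitudes with the mapped curve."""
    out = []
    for start in range(0, len(signal), segment_len):
        segment = signal[start:start + segment_len]
        control = rms(segment)
        out.extend(mapping_function(x, control) for x in segment)
    return out


quiet = [0.1, -0.1, 0.1, -0.1]
boosted = enhance(quiet, segment_len=4)
```

Each segment is analyzed and converted as it is processed, so no pass over the whole signal is needed beforehand.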
Abstract:
Provided is a voice processing apparatus including: a feature quantity calculation section that extracts a feature quantity from a target frame of an input voice signal; a sound pressure estimation candidate point updating section that makes each frame of the input voice signal a sound pressure estimation candidate point, retains the feature quantity of each sound pressure estimation candidate point, and updates the sound pressure estimation candidate point based on the feature quantity of the sound pressure estimation candidate point and the feature quantity of the target frame; a sound pressure estimation section that calculates an estimated sound pressure of the input voice signal based on the feature quantity of the sound pressure estimation candidate point; a gain calculation section that calculates a gain to be applied to the input voice signal based on the estimated sound pressure; and a gain application section that performs a gain adjustment of the input voice signal based on the gain.
Abstract:
The present technology relates to a neural network device capable of improving recognition performance. The neural network device includes a non-linear transformation layer processing unit that performs a transformation with a non-linear function having a learnable parameter. The present technology can be applied to a neural network.
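As an illustration of a non-linear function with a learnable parameter, the sketch below uses a parametric ReLU whose negative-side slope `a` is fitted by gradient descent. The abstract does not name the function it uses, so this choice, the squared-error loss, and all function names are assumptions.

```python
def prelu(x, a):
    """Parametric ReLU: identity for positive inputs, slope a otherwise."""
    return x if x > 0 else a * x


def prelu_grad_a(x):
    """Derivative of prelu with respect to the learnable slope a."""
    return x if x <= 0 else 0.0


def train_slope(samples, a=0.25, lr=0.1, epochs=200):
    """Fit a to (input, target) pairs by gradient descent on mean squared error."""
    for _ in range(epochs):
        grad = 0.0
        for x, target in samples:
            err = prelu(x, a) - target
            grad += 2.0 * err * prelu_grad_a(x)
        a -= lr * grad / len(samples)
    return a


# Targets generated with a slope of 0.5 on the negative side.
samples = [(-2.0, -1.0), (-1.0, -0.5), (1.0, 1.0), (3.0, 3.0)]
learned = train_slope(samples)
```

In this toy setting only the negative inputs contribute to the gradient, and the slope converges toward the value that generated the targets.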
Abstract:
A device and method capable of performing image-following type audio control or image-non-following type audio control are implemented. Images in different directions are selectively displayed on a display unit, and output audio is controlled in accordance with the image display. A data processing unit executes, in units of individual controllable audio elements, image-following type audio control, which moves an audio source direction in accordance with movement of the image displayed on the display unit, and image-non-following type audio control, which does not move the audio source direction in accordance with the movement of the image. The data processing unit acquires audio control information from an MP4 file or a media presentation description (MPD) file and executes either the image-following type audio control or the image-non-following type audio control for each individual controllable audio element in accordance with the acquired audio control information.
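The per-element choice between the two control types can be sketched as follows. The `follows_image` flag stands in for the audio control information read from the MP4 or MPD file; the flag, the `AudioElement` class, and the sign convention for angles are assumptions, not the signalling defined in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class AudioElement:
    name: str
    azimuth_deg: float   # source direction relative to the initial front image
    follows_image: bool  # hypothetical stand-in for the audio control information


def wrap_deg(angle):
    """Wrap an angle into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def rendered_azimuth(element, view_yaw_deg):
    """Direction from which the element is rendered after the view rotates.

    Image-following elements stay attached to the picture, so their direction
    relative to the viewer shifts against the view rotation. Non-following
    elements keep their direction regardless of the displayed image.
    """
    if element.follows_image:
        return wrap_deg(element.azimuth_deg - view_yaw_deg)
    return wrap_deg(element.azimuth_deg)


dialogue = AudioElement("dialogue", azimuth_deg=30.0, follows_image=True)
narration = AudioElement("narration", azimuth_deg=0.0, follows_image=False)
```

With the view turned 90 degrees, the dialogue in this sketch moves to the listener's side while the narration stays in front.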
Abstract:
The present technology relates to an information processing apparatus, an information processing method, and a program that reduce the processing load on a distribution side while also reducing the volume of information transferred. The information processing apparatus includes: an acquisition unit that acquires low accuracy position information, which has first accuracy and indicates a position of an object within a space where a user is located, and acquires additional information for obtaining position information that has second accuracy higher than the first accuracy, indicates the position of the object within the space, and corresponds to a position of the user; and a position information calculation unit that obtains the position information on the basis of the low accuracy position information and the additional information. The present technology is applicable to an information processing apparatus.
Abstract:
The present technique relates to a frequency band extension apparatus, a frequency band extension method, and a program which are configured to more easily obtain a high quality sound signal. An input signal may be divided into sub-band signals of a plurality of sub-bands, powers of high frequency sub-bands of the input signal may be estimated based on feature values extracted from the input signal to obtain high frequency sub-band power estimation values, the high frequency sub-band powers obtained from the sub-band signals of high-frequency sub-bands of the input signal may be compared with the high frequency sub-band power estimation values, and a high-frequency signal of the input signal may be generated based on a result of the comparison and the sub-band signals.
Abstract:
The present technology relates to an encoding device and method, a decoding device and method, and a program therefor capable of improving audio signal transmission efficiency. An identification information generation unit determines whether or not an audio signal is to be encoded on the basis of the audio signal, and generates identification information indicating the determination result. An encoding unit encodes only audio signals determined to be encoded. A packing unit generates a bit stream containing the identification information and encoded audio signals. As a result of storing only encoded audio signals in the bit stream and storing the identification information indicating whether or not the respective audio signals are to be encoded in the bit stream in this manner, the transmission efficiency of audio signals can be improved. The present technology can be applied to an encoder and a decoder.
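The identification, encoding, and packing steps above can be sketched as follows. The silence threshold used to decide whether a signal is encoded, the 16-bit PCM stand-in for the encoder, and the byte layout are all assumptions made for the example, not the bit stream syntax of the disclosure.

```python
import struct


def pack(channels, threshold=1e-4):
    """Pack per-channel flags plus only the channels flagged for encoding.

    Assumed layout: channel count (1 byte), frame length (2 bytes),
    one flag byte per channel, then 16-bit samples of flagged channels.
    Samples are expected in the range [-1.0, 1.0].
    """
    frame_len = len(channels[0])
    # Hypothetical decision rule: encode a channel only if it is not silent.
    flags = [max((abs(s) for s in ch), default=0.0) > threshold for ch in channels]
    out = struct.pack(">BH", len(channels), frame_len)
    out += bytes(int(flag) for flag in flags)
    for ch, flag in zip(channels, flags):
        if flag:
            pcm = [int(round(s * 32767)) for s in ch]
            out += struct.pack(f">{frame_len}h", *pcm)
    return out


def unpack(data):
    """Read the flags, decode flagged channels, and fill the rest with silence."""
    count, frame_len = struct.unpack_from(">BH", data, 0)
    pos = 3
    flags = [bool(b) for b in data[pos:pos + count]]
    pos += count
    channels = []
    for flag in flags:
        if flag:
            pcm = struct.unpack_from(f">{frame_len}h", data, pos)
            pos += 2 * frame_len
            channels.append([s / 32767 for s in pcm])
        else:
            channels.append([0.0] * frame_len)
    return flags, channels


signals = [[0.5, -0.5, 0.25, 0.0], [0.0] * 4, [0.1, 0.2, 0.3, 0.4]]
stream = pack(signals)
flags, decoded = unpack(stream)
```

The silent middle channel costs one flag byte instead of a full frame of samples, which is where the saving in this sketch comes from.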
Abstract:
To suitably regulate sound pressure of object content on a receiving side. An audio stream including coded data of a predetermined number of pieces of object content is generated. A container of a predetermined format including the audio stream is transmitted. Information indicating a range within which sound pressure is allowed to increase and decrease for each piece of object content is inserted into a layer of the audio stream and/or a layer of the container. On a receiving side, sound pressure of each piece of object content increases and decreases within the allowable range based on the information.
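The receiving-side behavior above amounts to limiting a requested level change to the signalled range. The sketch below assumes the range is expressed in decibels as a lower and upper bound per object; that representation and the function names are illustrative assumptions.

```python
def limit_gain(requested_db, allowed_min_db, allowed_max_db):
    """Clamp a requested gain change to the range allowed for the object."""
    return max(allowed_min_db, min(allowed_max_db, requested_db))


def scale_samples(samples, gain_db):
    """Apply a gain in decibels to linear amplitude samples."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


# Assumed example: an object whose level may change by at most 6 dB either way.
applied_db = limit_gain(12.0, allowed_min_db=-6.0, allowed_max_db=6.0)
adjusted = scale_samples([0.1, -0.2, 0.05], applied_db)
```

A request of +12 dB is reduced to the permitted +6 dB before the samples are scaled.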