Abstract:
Various embodiments of the present invention are directed to systems and methods for multimodal object localization using one or more depth sensors and two or more microphones. In one aspect, a method comprises capturing three-dimensional images of a region of space in which the object is located; these images provide the three-dimensional depth sensor observations. The method also collects ambient audio generated by the object, which provides acoustic observations of the time difference of arrival of that audio at the microphones. The method then determines the coordinate location of the object as the location that maximizes a joint probability distribution combining two terms: the probability that the acoustic observations emanate from each coordinate location in the region of space, and the probability of each coordinate location in the region of space given the depth sensor observations.
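The joint maximization described above can be illustrated with a small grid search. This is a minimal sketch, not the claimed method: it assumes a 2-D grid of candidate locations, a single microphone pair, a speed of sound of 343 m/s, and two hypothetical Gaussian scoring functions (`acoustic_likelihood` for the TDOA error and `depth_likelihood` around a depth-sensor detection). The two scores are multiplied, which assumes the acoustic and depth observations are conditionally independent given the location.

```python
import math
from itertools import product

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def tdoa(point, mic_a, mic_b):
    """Time difference of arrival (s): path to mic_a minus path to mic_b."""
    return (math.dist(point, mic_a) - math.dist(point, mic_b)) / SPEED_OF_SOUND


def acoustic_likelihood(point, mics, observed_tdoa, sigma=1e-4):
    """Hypothetical Gaussian model of the TDOA error for a source at `point`."""
    err = tdoa(point, *mics) - observed_tdoa
    return math.exp(-0.5 * (err / sigma) ** 2)


def depth_likelihood(point, depth_detection, sigma=0.2):
    """Hypothetical Gaussian score of `point` given a depth-sensor detection (m)."""
    d = math.dist(point, depth_detection)
    return math.exp(-0.5 * (d / sigma) ** 2)


def localize(grid, mics, observed_tdoa, depth_detection):
    """Grid point maximizing the joint (product) score of both observations."""
    return max(
        grid,
        key=lambda p: acoustic_likelihood(p, mics, observed_tdoa)
        * depth_likelihood(p, depth_detection),
    )


# Usage: 4 m x 4 m area sampled every 10 cm, two microphones 1 m apart.
grid = [(x / 10, y / 10) for x, y in product(range(41), repeat=2)]
mics = ((0.0, 0.0), (1.0, 0.0))
true_source = (2.0, 3.0)
observed = tdoa(true_source, *mics)  # noise-free TDOA, for the demo only
depth_hit = (2.1, 2.9)               # depth detection slightly off the true source
best = localize(grid, mics, observed, depth_hit)
print(best)
```

A single microphone pair only constrains the source to a hyperbola; the depth term is what picks one point along it, which is the motivation for combining the two modalities.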
Abstract:
Systems and methods for performing sound source localization are provided. In one aspect, a method for locating a sound source using a computing device subdivides a space into subregions. The method then computes a sound source power for each of the subregions and determines which subregion has the largest sound source power. When the volume of that subregion is less than a threshold volume, the method outputs it. Otherwise, the stages of partitioning, computing, and determining the subregion having the largest sound source power are repeated.
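The partition-and-refine loop can be sketched as a coarse-to-fine octree search. This sketch assumes axis-aligned boxes split into 8 equal octants, assumes that only the winning subregion is partitioned again, and uses a hypothetical `toy_power` function (evaluated at each box center) in place of a real sound source power computation such as a steered response power.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterator, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned subregion of the search space."""
    lo: Point
    hi: Point

    def volume(self) -> float:
        return math.prod(h - l for l, h in zip(self.lo, self.hi))

    def center(self) -> Point:
        return tuple((l + h) / 2 for l, h in zip(self.lo, self.hi))

    def octants(self) -> Iterator["Box"]:
        """Split the box into 8 equal children (one bit per axis)."""
        c = self.center()
        for bits in range(8):
            upper = [(bits >> axis) & 1 for axis in range(3)]
            lo = tuple(c[i] if upper[i] else self.lo[i] for i in range(3))
            hi = tuple(self.hi[i] if upper[i] else c[i] for i in range(3))
            yield Box(lo, hi)


def locate(space: Box, power: Callable[[Box], float], threshold_volume: float) -> Box:
    """Keep the most powerful subregion until its volume drops below the threshold."""
    region = space
    while True:
        best = max(region.octants(), key=power)
        if best.volume() < threshold_volume:
            return best
        region = best  # assumption: refine only the winning subregion


# Hypothetical power map for the demo: peaks at a known source position.
source = (1.3, 2.2, 0.7)


def toy_power(box: Box) -> float:
    return 1.0 / (1.0 + math.dist(box.center(), source) ** 2)


result = locate(Box((0.0, 0.0, 0.0), (4.0, 4.0, 4.0)), toy_power, threshold_volume=1e-3)
print(result)
```

Each pass evaluates only 8 subregions, so reaching a given resolution costs a number of power evaluations proportional to the number of refinement levels rather than to the number of cells in a full fine grid.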
Abstract:
A method for time delay estimation performed by a physical computing system includes: passing a first input signal obtained by a first sensor through a filter bank to form a first set of sub-band output signals; passing a second input signal obtained by a second sensor, placed at a distance from the first sensor, through the filter bank to form a second set of sub-band output signals; computing cross-correlation data between the first and second sets of sub-band output signals; and applying a time delay determination function to the cross-correlation data to obtain a time delay estimate.
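The steps in this abstract can be sketched end to end on sampled signals. The filter bank, the way per-band correlations are combined, and the determination function are all assumptions here: a hypothetical two-band Haar-style bank (`FILTER_BANK`), a plain sum over bands, and "lag of the peak" as the time delay determination function.

```python
import random
from typing import List, Sequence

# Hypothetical two-band analysis filter bank (Haar-style lowpass/highpass pair).
FILTER_BANK = [[0.5, 0.5], [0.5, -0.5]]


def fir(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n - k]."""
    return [
        sum(t * signal[n - k] for k, t in enumerate(taps) if n - k >= 0)
        for n in range(len(signal))
    ]


def cross_correlation(a: Sequence[float], b: Sequence[float], lag: int) -> float:
    """r[lag] = sum_n a[n] * b[n + lag] over overlapping samples (equal lengths)."""
    n = len(a)
    return sum(a[i] * b[i + lag] for i in range(max(0, -lag), min(n, n - lag)))


def estimate_delay(x1: Sequence[float], x2: Sequence[float], max_lag: int) -> int:
    """Delay of x2 relative to x1, in samples (positive: x2 arrives later)."""
    bands1 = [fir(x1, taps) for taps in FILTER_BANK]
    bands2 = [fir(x2, taps) for taps in FILTER_BANK]
    lags = range(-max_lag, max_lag + 1)
    # Cross-correlation data: one correlation per sub-band, summed over bands.
    combined = {
        lag: sum(cross_correlation(s1, s2, lag) for s1, s2 in zip(bands1, bands2))
        for lag in lags
    }
    # Assumed time delay determination function: the lag of the peak.
    return max(lags, key=combined.get)


# Usage: white noise at sensor 1, the same noise 7 samples later at sensor 2.
rng = random.Random(0)
x1 = [rng.uniform(-1.0, 1.0) for _ in range(200)]
true_delay = 7
x2 = [0.0] * true_delay + x1[:-true_delay]
delay = estimate_delay(x1, x2, max_lag=20)
print(delay)
```

Dividing by the speed of sound is not needed here because the delay is reported in samples; converting it to seconds only requires dividing by the sampling rate.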
Abstract:
The present invention describes a method of determining the active talker for display on a video conferencing system, including the steps of: for each participant, capturing audio data using an audio capture sensor and video data using a video capture sensor; determining the probability of active speech (pA, pB, ..., pN), where the probability of active speech is a function of the probability of soft voice detection, based on data captured by the audio capture sensor, and the probability of lip motion detection, based on data captured by the video capture sensor; and automatically displaying at least the participant with the highest probability of active speech.
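The abstract leaves the combining function open, so the sketch below assumes one: a hypothetical weighted average of the two detector outputs with weight `AUDIO_WEIGHT`. It shows how the per-participant probabilities feed the selection of the participant to display.

```python
from typing import Dict, Tuple

# Hypothetical fusion weight: how much to trust audio over video evidence.
AUDIO_WEIGHT = 0.6


def active_speech_probability(p_voice: float, p_lip: float, w: float = AUDIO_WEIGHT) -> float:
    """Assumed fusion rule: convex combination of the two detector outputs."""
    if not (0.0 <= p_voice <= 1.0 and 0.0 <= p_lip <= 1.0):
        raise ValueError("detector outputs must be probabilities in [0, 1]")
    return w * p_voice + (1.0 - w) * p_lip


def select_active_talker(observations: Dict[str, Tuple[float, float]]) -> str:
    """Return the participant with the highest probability of active speech."""
    scores = {
        name: active_speech_probability(p_voice, p_lip)
        for name, (p_voice, p_lip) in observations.items()
    }
    return max(scores, key=scores.get)


# Usage: (soft voice detection, lip motion detection) per participant.
# B's microphone picks up strong audio (e.g. a neighbour talking) but B's lips are still.
observations = {"A": (0.6, 0.9), "B": (0.8, 0.1), "C": (0.1, 0.2)}
talker = select_active_talker(observations)
print(talker)
```

In this example A scores 0.72 and B scores 0.52, so the video evidence overrides B's higher voice probability; this is the kind of case where an audio-only detector would pick the wrong participant.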
Abstract:
Embodiments of the present invention disclose a system and method for distributed meeting capture. According to one embodiment, the system includes a plurality of personal devices configured to capture video data and audio data associated with at least one operating user. A media hub includes a plurality of I/O ports and is configured to receive video and audio data from the plurality of personal devices. In addition, the media hub is configured to collect the video data and/or audio data from the plurality of personal devices and output at least one audio-visual data stream for facilitating video conferencing over a network.
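The hub's role can be sketched as a small data structure. Everything in this sketch is hypothetical: the `Frame` type, the port bookkeeping, averaging the devices' audio into one mix, and the policy of forwarding the video of the device with the most audio energy. It only illustrates the shape of "many device streams in, one audio-visual stream out".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Frame:
    audio: List[float]  # one block of PCM samples
    video: str          # stand-in for an encoded video frame


@dataclass
class MediaHub:
    num_ports: int
    ports: Dict[int, str] = field(default_factory=dict)     # port -> device id
    latest: Dict[str, Frame] = field(default_factory=dict)  # device id -> newest frame

    def connect(self, port: int, device_id: str) -> None:
        """Attach a personal device to a free I/O port."""
        if not 0 <= port < self.num_ports:
            raise ValueError(f"no such port: {port}")
        if port in self.ports:
            raise ValueError(f"port {port} already in use")
        self.ports[port] = device_id

    def receive(self, device_id: str, frame: Frame) -> None:
        """Store the newest frame from a connected device."""
        if device_id not in self.ports.values():
            raise KeyError(f"device {device_id!r} is not connected")
        self.latest[device_id] = frame

    def output(self) -> Optional[Frame]:
        """Combine the latest frames into one outgoing audio-visual frame."""
        if not self.latest:
            return None
        frames = list(self.latest.values())
        n = min(len(f.audio) for f in frames)
        # Assumed mixing rule: average the devices' audio sample by sample.
        mixed = [sum(f.audio[i] for f in frames) / len(frames) for i in range(n)]
        # Assumed video policy: forward the device with the most audio energy.
        loudest = max(frames, key=lambda f: sum(s * s for s in f.audio))
        return Frame(audio=mixed, video=loudest.video)


# Usage: two personal devices plugged into a four-port hub.
hub = MediaHub(num_ports=4)
hub.connect(0, "laptop")
hub.connect(1, "phone")
hub.receive("laptop", Frame([0.5, 0.5], "laptop-frame-1"))
hub.receive("phone", Frame([0.1, -0.1], "phone-frame-1"))
out = hub.output()
print(out)
```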
Abstract:
Various embodiments of the present invention are directed to methods for dereverberation of audio generated in a room. In one aspect, a method for dereverberating reverberant digital signals comprises using a computing device to transform a reverberant digital signal from the time domain into Fourier domain signals, each Fourier domain signal corresponding to a subband. For each subband, the method computes autoregressive model coefficients of the reverberation from the current and previous magnitudes of the Fourier domain signal, and uses the computing device to inverse filter the magnitude of the Fourier domain signal based on the autoregressive model coefficients and the previous magnitudes. Finally, the method inverse transforms the Fourier domain signals with filtered magnitudes into an approximation of the dereverberated digital signal.
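The per-subband step can be sketched on one sequence of Fourier magnitudes. This sketch assumes a first-order autoregressive model fitted by least squares (the abstract does not fix the order or the estimator), a hypothetical spectral floor `floor_ratio`, and it leaves out the forward and inverse transforms that would surround it.

```python
from typing import List, Sequence


def ar1_coefficient(mag: Sequence[float]) -> float:
    """Least-squares AR(1) fit of the magnitudes: mag[n] ~ a * mag[n - 1]."""
    num = sum(mag[n] * mag[n - 1] for n in range(1, len(mag)))
    den = sum(mag[n - 1] ** 2 for n in range(1, len(mag)))
    if den == 0.0:
        return 0.0
    # Assumed safeguard: keep the decay estimate in a plausible range.
    return min(max(num / den, 0.0), 0.99)


def dereverberate_subband(mag: Sequence[float], floor_ratio: float = 0.05) -> List[float]:
    """Inverse-filter one subband: subtract the AR prediction of the reverberant tail."""
    a = ar1_coefficient(mag)
    out = [mag[0]] if mag else []
    for n in range(1, len(mag)):
        estimate = mag[n] - a * mag[n - 1]
        # Hypothetical spectral floor keeps magnitudes non-negative.
        out.append(max(estimate, floor_ratio * mag[n]))
    return out


# Usage: impulses every 8 frames, smeared by an exponential decay of 0.6 per frame.
dry = [1.0 if n % 8 == 0 else 0.0 for n in range(64)]
reverberant = []
prev = 0.0
for d in dry:
    prev = d + 0.6 * prev
    reverberant.append(prev)
clean = dereverberate_subband(reverberant)
print([round(v, 3) for v in clean[:10]])
```

In a full pipeline, each filtered magnitude would be recombined with the original phase of its Fourier coefficient before the inverse transform.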