Abstract:
A method is disclosed for selecting a signal of interest from a compound signal. The method includes generating a matrix and initializing the first row of the matrix to zero. The compound signal is obtained as a digital waveform signal with a sampling rate of S samples per second. For each sample of the digital signal, a new row of the matrix is computed recursively. For each frequency bin in the matrix (where f is the center frequency of the bin), the value in the new row is computed by multiplying the value for that bin in the previous row by the complex number r·e^(i2πf/S) and adding the new signal sample multiplied by a real constant. The method includes identifying the signal of interest from the matrix, thereby eliminating uncertainty about which frequency bin contains the signal of interest.
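The row-by-row recursion can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the decay factor r = 0.99, the real gain of 1.0, the bin list, and the function name `recursive_bin_matrix` are all hypothetical choices for the example. Each bin behaves like a one-pole complex resonator tuned to its center frequency f.

```python
import cmath
import math


def recursive_bin_matrix(samples, sample_rate, bin_freqs, r=0.99, gain=1.0):
    """Build the matrix one row per input sample (hypothetical helper).

    Row 0 is all zeros. For a bin with center frequency f:
        new = prev * r * exp(i*2*pi*f/S) + gain * x[n]
    r (assumed < 1 so old samples decay) and gain (the real constant)
    are illustrative values, not taken from the source.
    """
    # One complex multiplier per frequency bin.
    rotors = [r * cmath.exp(2j * math.pi * f / sample_rate) for f in bin_freqs]
    rows = [[0j] * len(bin_freqs)]  # first row initialized to zero
    for x in samples:
        prev = rows[-1]
        rows.append([p * w + gain * x for p, w in zip(prev, rotors)])
    return rows
```

With a 1 kHz tone sampled at 8000 samples per second and bins at 500, 1000 and 1500 Hz, the 1000 Hz bin accumulates a much larger magnitude than its neighbors, which is how the matrix indicates the bin containing the signal.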
Abstract:
System, apparatus and method for determining semantic information from audio, where incoming audio is sampled and processed to extract audio features, including temporal, spectral, harmonic and rhythmic features. The extracted audio features are compared to stored audio templates that include ranges and/or values for certain features and are tagged for specific ranges and/or values. The semantic information may be associated with audio signature data. Extracted audio features that are most similar to one or more templates from the comparison are identified according to the tagged information. The tags are used to determine the semantic audio data, which includes genre, instrumentation, style, acoustical dynamics, and emotive descriptors for the audio signal.
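One way to picture the template comparison is a range-distance score. In this sketch the class `AudioTemplate`, the feature names (`tempo_bpm`, `spectral_centroid_hz`), the width-normalized distance rule, and the example tags are all hypothetical; the abstract does not specify how similarity is scored.

```python
from dataclasses import dataclass, field


@dataclass
class AudioTemplate:
    # Hypothetical template: per-feature (low, high) ranges plus semantic tags.
    name: str
    ranges: dict
    tags: dict = field(default_factory=dict)


def range_distance(value, low, high):
    """0 inside [low, high]; otherwise distance to the nearest edge,
    normalized by the range width (an assumed scoring rule)."""
    width = max(high - low, 1e-9)
    if value < low:
        return (low - value) / width
    if value > high:
        return (value - high) / width
    return 0.0


def match_templates(features, templates, top_k=1):
    """Rank templates by total range distance over the features they define."""
    def score(t):
        return sum(range_distance(features[k], lo, hi)
                   for k, (lo, hi) in t.ranges.items() if k in features)
    return sorted(templates, key=score)[:top_k]


def semantic_info(features, templates):
    """Return the tags of the most similar template."""
    return match_templates(features, templates)[0].tags
```

Ranges rather than single values let one template cover the spread of feature values expected within, for example, a genre.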
Abstract:
During operation, a “coarse search” stage applies variable-scale windowing to the query pitch contours to compare them with fixed-length segments of target pitch contours, finding matching candidates while efficiently scanning over variable tempo differences and target locations. Because the target segments are of fixed length, this drastically reduces the storage space required compared with a prior-art method. Furthermore, by breaking the query contours into parts, rhythmic inconsistencies can be handled more flexibly. Normalization is also applied to the contours to allow comparisons independent of differences in musical key. In a “fine search” stage, a “segmental” dynamic time warping (DTW) method is applied that calculates a more accurate similarity score between the query and each candidate target, with more explicit consideration of rhythmic inconsistencies.
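The two-stage search can be sketched as below, under several assumptions: contours are lists of pitches in semitones, key normalization subtracts the mean pitch, variable-scale windowing is modeled by linearly resampling query prefixes of different lengths onto a fixed target segment length, and plain DTW stands in for the segmental DTW of the fine-search stage. All function names and parameter values are illustrative.

```python
import math


def normalize_key(contour):
    """Subtract the mean pitch so contours in different keys compare equally."""
    mean = sum(contour) / len(contour)
    return [p - mean for p in contour]


def resample(contour, length):
    """Linearly resample a contour to a fixed length."""
    if length == 1:
        return [sum(contour) / len(contour)]
    if len(contour) == 1:
        return contour * length
    out = []
    for i in range(length):
        pos = i * (len(contour) - 1) / (length - 1)
        j = min(int(pos), len(contour) - 2)
        frac = pos - j
        out.append(contour[j] * (1 - frac) + contour[j + 1] * frac)
    return out


def coarse_search(query, target, seg_len=16, scales=(0.8, 1.0, 1.25), hop=4):
    """Compare tempo-scaled query windows with fixed-length target segments.
    Returns (distance, start) pairs, best first."""
    candidates = []
    for start in range(0, len(target) - seg_len + 1, hop):
        seg = normalize_key(target[start:start + seg_len])
        best = math.inf
        for s in scales:
            # s controls how many query frames are mapped onto seg_len target frames.
            n = max(2, min(len(query), round(seg_len * s)))
            win = normalize_key(resample(query[:n], seg_len))
            best = min(best, sum((a - b) ** 2 for a, b in zip(win, seg)))
        candidates.append((best, start))
    return sorted(candidates)


def dtw(a, b):
    """Plain DTW distance (a stand-in for segmental DTW in the fine search)."""
    D = [[math.inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[len(a)][len(b)]
```

Only fixed-length segments of the target are stored and compared in the coarse stage; the more expensive DTW is run only on the surviving candidates.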
Abstract:
A technique is disclosed for evaluating an audio characteristic such as singing ability, and processing an image to be displayed according to the evaluation result in a manner that can attract the interest of a user. JPEG 2000 code data of a moving image for a karaoke system, for example, are transmitted from a server to a client along with accompanying audio data, and the code data are then decoded at a decoder to form an image to be displayed. An audio signal, such as the voice of the user input to a microphone, is evaluated at an evaluation unit, and the evaluation result is transmitted to the server. Based on this evaluation result, an inter-code transform unit conducts image processing by selectively discarding codes from the code data of the image to be transmitted to the client.
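A rough model of the inter-code transform treats the JPEG 2000 code data as an ordered list of quality layers, so that discarding trailing layers lowers image quality without re-encoding. The linear score-to-layer mapping, the 0 to 100 score scale, the direction of the mapping (higher score keeps more layers), and the function names are assumptions for illustration; the abstract does not say which codes are discarded or how the evaluation result maps onto them.

```python
def layers_to_keep(score, total_layers, min_layers=1):
    """Map an evaluation score in [0, 100] to a number of quality layers
    (assumed linear mapping; at least min_layers are always kept)."""
    score = max(0.0, min(100.0, score))
    return max(min_layers, round(total_layers * score / 100))


def discard_codes(layers, score):
    """Keep only the leading quality layers; the rest are dropped before sending."""
    return layers[:layers_to_keep(score, len(layers))]
```

Because the code data are only truncated, the server avoids decoding and re-encoding the image for each evaluation result.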
Abstract:
A music classification technique computes histograms of Daubechies wavelet coefficients at various frequency subbands with various resolutions. The resulting coefficient histograms are then used as input to a machine learning technique to identify the genre and emotional content of music.
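Feature extraction of this kind can be sketched with the Haar wavelet, which is the Daubechies wavelet of order 1 (db1); higher-order Daubechies filters would replace `haar_step`. The number of levels, the eight-bin histograms over [-1, 1] (which assumes audio normalized to that range), and the function names are assumptions. The machine learning stage is not shown.

```python
import math


def haar_step(x):
    """One level of the Haar (Daubechies-1) DWT: returns (approximation, detail)."""
    s = 1 / math.sqrt(2)
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) * s for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) * s for i in range(half)]
    return approx, detail


def histogram(values, bins, lo, hi):
    """Normalized histogram over [lo, hi]; out-of-range values go to the edge bins."""
    counts = [0] * bins
    for v in values:
        k = int((v - lo) / (hi - lo) * bins)
        counts[min(max(k, 0), bins - 1)] += 1
    total = len(values) or 1
    return [c / total for c in counts]


def wavelet_histogram_features(signal, levels=3, bins=8, lo=-1.0, hi=1.0):
    """Concatenate one histogram of detail coefficients per subband (resolution level)."""
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.extend(histogram(detail, bins, lo, hi))
    return features
```

The resulting fixed-length vector (levels × bins values) is the kind of input a standard classifier can consume.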
Abstract:
An additive sound synthesis process for generating complex, realistic sounds is realized in a computationally efficient manner. In accordance with one aspect of the invention, polyphony is efficiently achieved by dosing the energy of a given partial between separate transform sums corresponding to different channels. In accordance with another aspect of the invention, noise is injected by randomly perturbing the phase of the sound, either on a per-partial basis or on a transform-sum basis. In the latter case, the phase is perturbed in different regions of the spectrum to a degree determined by the amount of energy present in each region. In accordance with yet another aspect of the invention, a transform sum representing a sound is processed in the transform domain to achieve, with great economy, effects that would be achievable only at much greater expense outside the transform domain. Transforms other than the Fourier transform may also be used to advantage. For example, the Hartley transform produces comparable results but allows transforms to be computed approximately twice as fast as with the Fourier transform.
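For reference, the discrete Hartley transform uses the real kernel cas(θ) = cos(θ) + sin(θ), so a real input yields a real output, and for real input it equals Re X[k] - Im X[k] of the DFT X[k]. The direct O(N²) version below only illustrates the definition and checks those identities; it is not a fast Hartley algorithm and does not demonstrate the speed claim. Function names are illustrative.

```python
import cmath
import math


def dht(x):
    """Direct discrete Hartley transform: H[k] = sum_n x[n] * cas(2*pi*k*n/N)."""
    n_pts = len(x)
    out = []
    for k in range(n_pts):
        acc = 0.0
        for n, v in enumerate(x):
            theta = 2 * math.pi * k * n / n_pts
            acc += v * (math.cos(theta) + math.sin(theta))  # cas(theta)
        out.append(acc)
    return out


def dft(x):
    """Direct DFT with the exp(-i*2*pi*k*n/N) sign convention, for comparison."""
    n_pts = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * n / n_pts) for n, v in enumerate(x))
            for k in range(n_pts)]


def idht(h):
    """The DHT is its own inverse up to a factor of 1/N."""
    return [v / len(h) for v in dht(h)]
```

Since the same real-valued routine serves as both forward and inverse transform, no complex arithmetic is needed anywhere in the round trip.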
Abstract:
An apparatus for making music includes a cymbal, an acoustic transducer, a signal-processing system that receives a first signal from the acoustic transducer and generates a second signal based on a property of the first signal, and a classifier that determines, based on the second signal, the particular manner in which the cymbal was struck and provides an output trigger signal for triggering production of a sound that is consistent with that manner.
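A minimal sketch of the signal-processing and classifier stages follows. The choice of RMS level and zero-crossing rate as the second signal, the nearest-centroid classifier, the articulation labels, the centroid values, and the 0 to 127 velocity scale are all hypothetical; the abstract does not name the property or the classification method.

```python
import math


def strike_features(frame):
    """Derive a second signal from a transducer frame: (RMS level, zero-crossing rate).
    These particular features are an assumption for illustration."""
    rms = math.sqrt(sum(v * v for v in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(frame) - 1)
    return rms, zcr


def classify_strike(features, centroids):
    """Nearest-centroid classifier over the feature vector."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


def trigger_for(frame, centroids, velocity_scale=127):
    """Return a hypothetical trigger event for a sound module."""
    rms, zcr = strike_features(frame)
    label = classify_strike((rms, zcr), centroids)
    return {"articulation": label,
            "velocity": min(velocity_scale, round(rms * velocity_scale))}
```

In practice the centroids would be learned from labeled example strikes on the actual cymbal and transducer.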
Abstract:
A method for producing an electronically-simulated live musical performance, the method comprising providing morph-friendly solo tracks, morphing the morph-friendly solo tracks to produce a morphed track, and post-processing the morphed track. The method may also include combining the post-processed morphed track with one or more supporting tracks to produce an acoustic image for playback.
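The three steps can be sketched with simple sample-domain operations. Here morphing is approximated by a weighted sample-by-sample blend of time-aligned tracks, post-processing by peak normalization, and combination by summing with supporting tracks; these stand-ins and the function names are assumptions, since the abstract does not define the morphing operation.

```python
def morph_tracks(tracks, weights):
    """Blend time-aligned solo tracks sample by sample
    (a simple linear-interpolation stand-in for morphing)."""
    if len(tracks) != len(weights):
        raise ValueError("one weight per track")
    total = sum(weights)
    length = min(len(t) for t in tracks)
    return [sum(w * t[i] for t, w in zip(tracks, weights)) / total
            for i in range(length)]


def post_process(track, peak=0.9):
    """Peak-normalize the morphed track (one possible post-processing step)."""
    m = max((abs(v) for v in track), default=0.0)
    return list(track) if m == 0 else [v * peak / m for v in track]


def combine(morphed, supporting, gain=1.0):
    """Mix the post-processed morph with supporting tracks into one playback buffer."""
    length = min([len(morphed)] + [len(s) for s in supporting])
    return [gain * morphed[i] + sum(s[i] for s in supporting) for i in range(length)]
```

Varying the weights from one playback to the next would yield a slightly different solo each time, which is one way to read the goal of simulating a live performance.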