Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex evolution recurrent neural networks. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A first vector sequence comprising audio features determined from the audio data is generated. A second vector sequence is generated, as output of a first recurrent neural network in response to receiving the first vector sequence as input, where the first recurrent neural network has a transition matrix that implements a cascade of linear operators comprising (i) first linear operators that are complex-valued and unitary, and (ii) one or more second linear operators that are non-unitary. An output vector sequence of a second recurrent neural network is generated. A transcription for the utterance is generated based on the output vector sequence generated by the second recurrent neural network. The transcription for the utterance is provided.
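The cascade described above can be illustrated with a minimal sketch. It assumes a diagonal phase operator and a permutation as the unitary operators and a real diagonal scaling as the non-unitary operator; these particular choices, and all function names, are hypothetical and are not taken from the abstract. The sketch only shows that the unitary part preserves the norm of the hidden state while the non-unitary part can change it.

```python
import cmath
import math

def apply_diagonal_phase(vec, phases):
    """Unitary operator (assumed): multiply each entry by exp(i * phase)."""
    return [v * cmath.exp(1j * p) for v, p in zip(vec, phases)]

def apply_permutation(vec, perm):
    """Unitary operator (assumed): output entry i is input entry perm[i]."""
    return [vec[j] for j in perm]

def apply_scaling(vec, scales):
    """Non-unitary operator (assumed): real diagonal scaling."""
    return [v * s for v, s in zip(vec, scales)]

def norm(vec):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(v) ** 2 for v in vec))

def transition(hidden, phases, perm, scales):
    """Cascade: unitary phase, unitary permutation, then non-unitary scaling."""
    h = apply_diagonal_phase(hidden, phases)
    h = apply_permutation(h, perm)
    return apply_scaling(h, scales)

hidden = [1 + 2j, -0.5j, 3 + 0j]
phases = [0.3, -1.2, 2.0]
perm = [2, 0, 1]

# Unitary operators alone leave the norm unchanged.
unitary_only = apply_permutation(apply_diagonal_phase(hidden, phases), perm)
# Adding the non-unitary scaling changes the norm.
full = transition(hidden, phases, perm, [0.5, 1.0, 2.0])
```

Comparing `norm(hidden)`, `norm(unitary_only)`, and `norm(full)` shows the difference between the two operator types.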
Abstract:
A computer-implemented method is provided for generating a prediction of a next musical note by a computer having at least a processor and a memory. A computer processor system is also provided for generating a prediction of a next musical note. The method includes storing sequential musical notes in the memory. The method further includes dividing, by the processor, the sequential musical notes into sections of a given length based on a Generative Theory of Tonal Music. The method also includes generating, by the processor, the prediction of the next musical note based upon a music model, the sections, and the sequential musical notes stored in the memory. The given length is determined based on one or more conditions.
Abstract:
An automated music composition and generation system allowing users to create and deliver electronic messages and documents such as text, SMS and email, augmented with automatically-composed music generated using user-selected music emotion and style descriptors. The automated music composition and generation system includes an automated music composition and generation engine operably connected to a system user interface, and the infrastructure of the Internet. Mobile and desktop client machines provide text, SMS and/or email services supported on the Internet. Each client machine has a text application, SMS application and/or email application that is augmented by the addition of automatically-composed music by users using the automated music composition and generation engine. By selecting and providing musical emotion and style descriptor icons to the engine, music is automatically composed, generated, and embedded in text, SMS and/or email messages for delivery to other client machines over the infrastructure of the Internet.
Abstract:
Photodiodes in combination with an amplifier of transimpedance configuration provide an optical vibration detector having a linear frequency response, with a light emitter and sensor of sufficiently small size to be inserted between strings of a musical instrument in order to provide signals suitable for amplification. The frequencies of vibrating strings of a musical instrument can be converted in accordance with either of two converter embodiments to control a music synthesizer, an automatic music transcription arrangement or the like.
Abstract:
A method of creating autonomous musical output, including: creating a mutually inhibiting neuronal network including a plurality of nodes arranged to integrate and fire; associating each of the plurality of nodes with a musical instrument; and creating, when a node fires, a musical output corresponding to the musical instrument associated with the firing node.
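A rough sketch of such a network follows. It assumes a discrete-time integrate-and-fire rule with a constant drive per node, a reset to zero on firing, and a fixed inhibition amount subtracted from every other node; the abstract specifies none of these details, so the function `simulate` and its parameters are hypothetical.

```python
def simulate(drives, instruments, steps, threshold=1.0, inhibition=0.25):
    """Run a toy mutually inhibiting integrate-and-fire network.

    drives[i] is the amount node i integrates per step (assumed constant).
    Returns a list of (step, instrument) events, one per firing.
    """
    potentials = [0.0] * len(drives)
    events = []
    for step in range(steps):
        # Integrate: every node accumulates its drive.
        for i, drive in enumerate(drives):
            potentials[i] += drive
        # Fire: at most one node per step, the one with the highest potential.
        winner = max(range(len(potentials)), key=lambda i: potentials[i])
        if potentials[winner] >= threshold:
            events.append((step, instruments[winner]))
            potentials[winner] = 0.0  # reset the firing node
            # Mutual inhibition: push every other node away from threshold.
            for j in range(len(potentials)):
                if j != winner:
                    potentials[j] = max(0.0, potentials[j] - inhibition)
    return events

events = simulate([0.5, 0.375], ["drum", "bass"], steps=5)
print(events)  # [(1, 'drum'), (3, 'bass'), (4, 'drum')]
```

Each event stands in for the musical output of the instrument associated with the firing node; a real system would trigger a sound instead of appending to a list.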
Abstract:
The present invention relates to a method and apparatus for selectively and retroactively recording only a music section out of radio broadcast content. According to the present invention, there is provided a method for selectively and retroactively recording only a music section out of radio broadcast content, comprising the steps of (a) detecting a start point of the music section; (b) temporarily recording the music section from the start point in a buffer memory; (c) detecting a command to record the music section placed by a user; and (d) transferring the music section recorded in the buffer memory to a semi-permanent memory.
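Steps (a) through (d) can be sketched as a small state machine. The sketch assumes that some upstream classifier supplies an `is_music` flag for each audio frame and uses plain Python lists to stand in for the buffer memory and the semi-permanent memory; the class and method names are hypothetical.

```python
class RetroactiveRecorder:
    """Toy model of retroactive recording of a music section."""

    def __init__(self):
        self.buffer = []       # temporary buffer memory
        self.saved = []        # stands in for semi-permanent memory
        self.in_music = False  # whether the previous frame was music

    def feed(self, frame, is_music):
        """Process one frame; is_music is an assumed classifier output."""
        if is_music and not self.in_music:
            # (a) Start point of a music section detected: restart the buffer.
            self.buffer = []
        self.in_music = is_music
        if is_music:
            # (b) Temporarily record the section from its start point.
            self.buffer.append(frame)

    def record_command(self):
        """(c) User command received: (d) transfer the buffered section."""
        self.saved.append(list(self.buffer))

recorder = RetroactiveRecorder()
recorder.feed("talk_1", is_music=False)
recorder.feed("song_1", is_music=True)
recorder.feed("song_2", is_music=True)
recorder.record_command()  # issued mid-song, yet the song start is kept
```

Because buffering begins at the detected start point, a command issued partway through a song still captures the section from its beginning.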
Abstract:
An improved control structure for music synthesis is provided in which: 1) the sound representation provided to the adaptive function mapper allows for a greatly increased degree of control over the sound produced; and 2) training of the adaptive function mapper is performed using an error measure, or error norm, that greatly facilitates learning while ensuring perceptual identity of the produced sound with the training example. In accordance with one embodiment of the invention, sound data is produced by applying to an adaptive function mapper control parameters including: at least one parameter selected from the set of time and timbre space coordinates; and at least one parameter selected from the set of pitch, Δpitch, articulation and dynamic. Using an adaptive function mapper, mapping is performed from the control parameters to synthesis parameters to be applied to a sound synthesizer. In accordance with another embodiment of the invention, an adaptive function mapper is trained to produce, in accordance with information stored in a mapping store, synthesis parameters to be applied to a sound synthesizer, by steps including: analyzing sounds to produce sound parameters describing the sounds; further analyzing the sound parameters to produce control parameters; applying the control parameters to the adaptive function mapper, the adaptive function mapper in response producing trial synthesis parameters comparable to the sound parameters; deriving from the sound parameters and the trial synthesis parameters an error measure in accordance with a perceptual error norm in which at least some error contributions are weighted in approximate degree to which they are perceived by the human ear during synthesis; and adapting the information stored in the mapping store in accordance with the error measure.
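One simple way to picture a perceptually weighted error measure is a weighted sum of squared differences. The sketch below assumes that form and uses made-up weights; the abstract does not state the actual norm or how the weights are derived, so `perceptual_error` is a hypothetical stand-in.

```python
def perceptual_error(sound_params, trial_params, weights):
    """Weighted squared error between analyzed and trial parameters.

    weights[i] is an assumed stand-in for how audible an error in
    parameter i is; larger weights penalize more audible errors.
    """
    if not (len(sound_params) == len(trial_params) == len(weights)):
        raise ValueError("all three sequences must have the same length")
    return sum(
        w * (s - t) ** 2
        for s, t, w in zip(sound_params, trial_params, weights)
    )

# The large error in the third parameter is down-weighted because it is
# assumed to be barely audible; the small first error is up-weighted.
error = perceptual_error([1.0, 2.0, 3.0], [1.5, 2.0, 1.0], [2.0, 1.0, 0.25])
print(error)  # 1.5
```

During training, an error of this kind would drive the update of the information in the mapping store.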
Abstract:
In an electronic musical apparatus having an acoustic instrument manually operable to commence an acoustic vibration and a tone generator responsive to the acoustic vibration to generate a musical tone having a pitch corresponding to that of the acoustic vibration, a pitch detecting device utilizes a pickup for picking up the acoustic vibration to convert the same into a waveform signal. Further, a first detector operates according to a fast algorithm for processing the waveform signal so as to responsively produce a first output representative of the pitch of the acoustic vibration, and a second detector operates in parallel to the first detector for processing the same waveform signal according to a slow algorithm so as to stably produce a second output representative of the pitch of the acoustic vibration. A selector selectively feeds one of the first output and the second output to the tone generator so that the first detector and the second detector can cooperate to ensure responsive and stable detection of the pitch. An additional detector processes the waveform signal to measure a time interval between a pair of peaks so as to detect a plucking point. A controller controls the tone generator according to the detected plucking point to change the timbre of the tone generator.
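The cooperation of a fast and a slow detector can be sketched as follows. The abstract does not name the algorithms or the selection rule, so this example substitutes zero-crossing spacing as the fast detector and autocorrelation as the slow one, and assumes the selector prefers the slow output whenever it is available. All names and parameters are hypothetical.

```python
import math

def zero_crossing_pitch(samples, sample_rate):
    """Fast estimate (assumed): mean spacing of upward zero crossings."""
    crossings = [
        i for i in range(1, len(samples))
        if samples[i - 1] < 0 <= samples[i]
    ]
    if len(crossings) < 2:
        return None
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period

def autocorrelation_pitch(samples, sample_rate, min_lag, max_lag):
    """Slow estimate (assumed): lag with the largest autocorrelation."""
    if len(samples) <= max_lag:
        return None  # not enough signal yet for a stable estimate
    def corr(lag):
        return sum(
            samples[i] * samples[i + lag]
            for i in range(len(samples) - lag)
        )
    best_lag = max(range(min_lag, max_lag + 1), key=corr)
    return sample_rate / best_lag

def select_pitch(fast, slow):
    """Selector (assumed rule): use the slow output once it exists."""
    return slow if slow is not None else fast

sample_rate = 8000
tone = [
    math.sin(2 * math.pi * 200 * n / sample_rate + 0.1)
    for n in range(400)
]
fast = zero_crossing_pitch(tone, sample_rate)
slow = autocorrelation_pitch(tone, sample_rate, min_lag=20, max_lag=60)
pitch = select_pitch(fast, slow)
```

On a short frame at note onset the slow detector returns `None`, so the selector falls back to the fast estimate; once enough signal has accumulated, the slow estimate takes over.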
Abstract:
The present disclosure provides systems and methods that leverage one or more machine-learned models to generate music from text. In particular, a computing system can include a music generation model that is operable to extract one or more structural features from an input text. The one or more structural features can be indicative of a structure associated with the input text. The music generation model can generate a musical composition from the input text based at least in part on the one or more structural features. For example, the music generation model can generate a musical composition that exhibits a musical structure that mimics or otherwise corresponds to the structure associated with the input text. For example, the music generation model can include a machine-learned audio generation model. In such fashion, the systems and methods of the present disclosure can generate music that exhibits a globally consistent theme and/or structure.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
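The front end of this pipeline can be sketched in a few lines. The example assumes a plain discrete Fourier transform to obtain frequency domain data and a small hand-written complex weight matrix for the projection; the log-magnitude step is an added assumption about how complex outputs might be turned into real-valued features, and the acoustic-model neural network itself is not shown. All names are hypothetical.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(frame))
        for k in range(n)
    ]

def complex_linear_projection(spectrum, weights):
    """Each output is a complex-weighted sum over the frequency bins."""
    return [sum(w * x for w, x in zip(row, spectrum)) for row in weights]

def log_magnitude(values, floor=1e-6):
    """Assumed post-processing: real features from complex outputs."""
    return [math.log(max(abs(v), floor)) for v in values]

frame = [1.0, 0.0, -1.0, 0.0]
spectrum = dft(frame)  # approximately [0, 2, 0, 2]

# Hypothetical 3 x 4 complex projection matrix (learned in a real system).
weights = [
    [1, 0, 0, 0],
    [0, 1j, 0, 0],
    [0, 0.5, 0, 0.5j],
]
projected = complex_linear_projection(spectrum, weights)
features = log_magnitude(projected)
```

In a full system, `features` would be passed to the neural network trained as an acoustic model.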