Abstract:
The disclosed embodiments provide a system that performs a sound-recognition operation. During operation, the system recognizes a sequence of sound primitives in an audio stream, wherein a sound primitive is associated with a semantic label comprising one or more words that describe a sound characterized by the sound primitive. Next, the system feeds the sequence of sound primitives into a finite-state automaton that recognizes events associated with sequences of sound primitives. Finally, the system feeds the recognized events into an output system that generates an output associated with the recognized events to be displayed to a user.
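The automaton stage described above can be sketched as a small transition table keyed by (state, semantic label). This is a minimal illustration, not the disclosed implementation: the state names, primitive labels, and event name are all hypothetical, and the choice to ignore primitives that match no transition is an assumption.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical transition table: (state, primitive label) -> next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("idle", "glass_crack"): "cracked",
    ("cracked", "shatter"): "idle",
}

# Hypothetical events emitted when a given transition is taken.
EVENTS: Dict[Tuple[str, str], str] = {
    ("cracked", "shatter"): "window_broken",
}


def recognize_events(primitives: Iterable[str], start: str = "idle") -> List[str]:
    """Run the automaton over primitive labels and collect emitted events."""
    state = start
    events: List[str] = []
    for label in primitives:
        key = (state, label)
        if key not in TRANSITIONS:
            continue  # assumption: primitives with no matching transition are ignored
        if key in EVENTS:
            events.append(EVENTS[key])
        state = TRANSITIONS[key]
    return events
```

Calling `recognize_events(["hum", "glass_crack", "shatter"])` would pass the resulting event list to whatever output stage displays it to the user.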
Abstract:
The disclosed embodiments provide a system that generates sound primitives to facilitate sound recognition. First, the system performs a feature-detection operation on sound samples to detect a set of sound features, wherein each sound feature comprises a measurable characteristic of a window of consecutive sound samples. Next, the system creates feature vectors from coefficients generated by the feature-detection operation, wherein each feature vector comprises a set of coefficients for sound features detected in a window. The system then performs a clustering operation on the feature vectors to produce feature-vector clusters, wherein each feature-vector cluster comprises a set of feature vectors that are proximate to each other in a feature-vector space that contains the feature vectors. After the clustering operation, the system defines a set of sound primitives, wherein each sound primitive is associated with a feature-vector cluster. Finally, the system associates semantic labels with the set of sound primitives.
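The pipeline above (windowed features, feature vectors, clustering) might look roughly like the following sketch. The abstract names neither the features nor the clustering algorithm, so RMS energy, zero-crossing rate, and a naive k-means with evenly spaced initial centroids are assumptions chosen for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def window_features(samples: Sequence[float], size: int) -> List[Vector]:
    """Return one (RMS energy, zero-crossing rate) vector per non-overlapping window.

    Assumes size >= 2 so the zero-crossing rate is well defined.
    """
    vectors: List[Vector] = []
    for start in range(0, len(samples) - size + 1, size):
        window = samples[start:start + size]
        rms = math.sqrt(sum(x * x for x in window) / size)
        crossings = sum(1 for a, b in zip(window, window[1:]) if (a < 0) != (b < 0))
        vectors.append((rms, crossings / (size - 1)))
    return vectors


def kmeans(vectors: Sequence[Vector], k: int, iterations: int = 20):
    """Naive k-means; initial centroids are picked at evenly spaced indices."""
    centroids = [vectors[i * len(vectors) // k] for i in range(k)]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        # Move each centroid to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment
```

Each resulting cluster index would then stand for one sound primitive, and a separate mapping (for example `{0: "quiet buzz", 1: "loud tone"}`, a hypothetical labeling) would attach the semantic labels.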
Abstract:
The disclosed embodiments provide a system for recognizing a sound event in raw sound. During operation, the system receives the raw sound, wherein the raw sound comprises a sequence of digital samples of sound. Next, the system segments the raw sound into a sequence of tiles, wherein each tile comprises a set of consecutive digital samples. The system then converts the sequence of tiles into a sequence of snips, wherein each snip includes a symbol representing an associated tile in the sequence of tiles. Next, the system generates annotations for the sequence of snips and the raw sound, wherein each annotation specifies a property associated with one or more snips in the sequence of snips or the raw sound. Finally, the system recognizes the sound event based on the generated annotations.
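The tile, snip, and annotation stages above can be sketched end to end as follows. Everything concrete here is an assumption made for illustration: the loud/quiet symbolizer, the run-length annotation, and the "sustained loud run" rule are hypothetical stand-ins for whatever the disclosed system actually uses.

```python
from typing import Dict, List, Sequence


def segment_into_tiles(raw: Sequence[float], tile_size: int) -> List[Sequence[float]]:
    """Split raw samples into consecutive, non-overlapping tiles."""
    return [raw[i:i + tile_size] for i in range(0, len(raw) - tile_size + 1, tile_size)]


def tile_to_symbol(tile: Sequence[float], threshold: float = 0.5) -> str:
    """Hypothetical symbolizer: 'L' for a loud tile, 'Q' for a quiet one."""
    return "L" if max(abs(x) for x in tile) >= threshold else "Q"


def annotate_runs(snips: Sequence[str]) -> List[Dict[str, object]]:
    """Annotate each run of identical snips with its symbol, start index, and length."""
    annotations: List[Dict[str, object]] = []
    for index, symbol in enumerate(snips):
        if annotations and annotations[-1]["symbol"] == symbol:
            annotations[-1]["length"] += 1
        else:
            annotations.append({"symbol": symbol, "start": index, "length": 1})
    return annotations


def has_sustained_loud_event(annotations, min_length: int = 2) -> bool:
    """Hypothetical recognition rule: any loud run of at least min_length snips."""
    return any(a["symbol"] == "L" and a["length"] >= min_length for a in annotations)
```

A caller would chain these: segment the raw sound, map each tile to a snip symbol, annotate the snip sequence, and then apply the recognition rule to the annotations.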
Abstract:
The disclosed embodiments provide a system that transforms a sound into a symbolic representation. During operation, the system extracts a sequence of tiles, comprising spectrogram slices, from the sound. Next, the system determines tile features for each tile in the sequence of tiles. The system then performs a clustering operation based on the tile features to identify clusters of tiles and to associate each tile with a cluster. Finally, the system associates each identified cluster with a unique symbol, and represents the sound as a sequence of symbols representing clusters, which are associated with the sequence of tiles.
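As a rough illustration of turning spectrogram slices into a symbol sequence, the sketch below uses one-pass leader clustering. That algorithm is swapped in for brevity; the abstract does not say which clustering operation is used. The tile features (spectral centroid and total energy) and the `radius` parameter are likewise assumptions.

```python
import math
import string
from typing import List, Sequence, Tuple


def tile_features(spectrum: Sequence[float]) -> Tuple[float, float]:
    """Assumed features: spectral centroid (in bins) and total energy of one slice."""
    energy = sum(spectrum)
    if energy <= 0:
        return (0.0, 0.0)
    centroid = sum(i * m for i, m in enumerate(spectrum)) / energy
    return (centroid, energy)


def symbolize(tiles: Sequence[Sequence[float]], radius: float = 1.0) -> str:
    """Leader clustering: join the nearest leader within radius, else start a cluster.

    Each cluster gets a unique uppercase letter, so this sketch supports
    at most 26 clusters.
    """
    leaders: List[Tuple[float, float]] = []
    symbols: List[str] = []
    for tile in tiles:
        features = tile_features(tile)
        best_index, best_distance = None, radius
        for index, leader in enumerate(leaders):
            distance = math.dist(features, leader)
            if distance <= best_distance:
                best_index, best_distance = index, distance
        if best_index is None:
            leaders.append(features)
            best_index = len(leaders) - 1
        symbols.append(string.ascii_uppercase[best_index])
    return "".join(symbols)
```

Tiles with similar spectra end up sharing a letter, so a sound that alternates between two spectral shapes would be represented by an alternating two-letter string.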
Abstract:
Systems and methods for identifying a perceived sound event are provided. In one exemplary embodiment, the system includes an audio signal receiver, a processor, and an analyzer. The system deconstructs a received audio signal into a plurality of audio chunks, for which one or more sound identification characteristics are determined. One or more distances of a distance vector are then calculated based on one or more of the sound identification characteristics. The distance vector can be a sound gene that serves as an identifier for the sound event. The distance vector for a received audio signal is compared to distance vectors of predefined sound events to identify the source of the received audio signal. A variety of other systems and methods related to sound identification are also provided.
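The final comparison step can be illustrated with a nearest-neighbor lookup over stored distance vectors. This sketch covers only that comparison; how the distances themselves are derived from the chunk characteristics is not specified in the abstract and is not modeled here. The Euclidean metric, the `max_distance` cutoff, and the event names are assumptions.

```python
import math
from typing import Dict, Optional, Sequence


def identify_source(
    query: Sequence[float],
    known_events: Dict[str, Sequence[float]],
    max_distance: float = 1.0,
) -> Optional[str]:
    """Return the predefined event whose vector is closest to query.

    Returns None when no stored vector lies within max_distance
    (an assumed rejection threshold).
    """
    best_name, best_distance = None, max_distance
    for name, vector in known_events.items():
        distance = math.dist(query, vector)
        if distance <= best_distance:
            best_name, best_distance = name, distance
    return best_name


# Hypothetical library of predefined sound events.
KNOWN_EVENTS = {
    "doorbell": (0.2, 0.9, 0.4),
    "smoke_alarm": (0.8, 0.1, 0.7),
}
```

With this library, a query vector close to the stored doorbell vector resolves to `"doorbell"`, while a vector far from every entry yields `None`.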
Abstract:
A method, apparatus, and system are provided for transforming a progressing sound signal into a progressing visual pattern that a user can perceive and recognize as the progressing sound signal in real time. The progressing visual pattern displays in real time a set of optical attributes, which are transformations of a set of sound features that define the sound signal in real time. The sound features and optical attributes, along with changes in the sound features and optical attributes over time, are preselected to be isomorphic to sound, perceptible to human vision, and efficiently processed by human cognition, and therefore to be recognizable to a human who has been exposed to the pattern and actively or passively trained on it.
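One way to picture a transformation from sound features to optical attributes is a per-frame mapping such as the sketch below. The abstract does not name the features or attributes, so the pairing used here (pitch to hue on a logarithmic scale, loudness to brightness) and the frequency bounds are purely illustrative assumptions.

```python
import colorsys
import math
from typing import Tuple


def sound_to_color(
    freq_hz: float,
    loudness: float,
    low_hz: float = 55.0,
    high_hz: float = 7040.0,
) -> Tuple[int, int, int]:
    """Map a (pitch, loudness) pair to an 8-bit RGB color.

    Assumed mapping: log-frequency position selects the hue (red for low
    pitch through blue for high pitch), and loudness in [0, 1] sets brightness.
    freq_hz must be positive.
    """
    span = math.log2(high_hz) - math.log2(low_hz)
    position = (math.log2(freq_hz) - math.log2(low_hz)) / span
    position = min(max(position, 0.0), 1.0)
    hue = position * (2.0 / 3.0)  # 0.0 is red, 2/3 is blue
    value = min(max(loudness, 0.0), 1.0)
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, value)
    return (round(r * 255), round(g * 255), round(b * 255))
```

Applying such a function to successive analysis frames would produce a color sequence that changes along with the sound, which is the kind of time-varying pattern the abstract describes.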