Abstract:
A method for recognizing an environmental sound in a client device in cooperation with a server is disclosed. The client device includes a client database having a plurality of sound models of environmental sounds and a plurality of labels, each of which identifies at least one sound model. The client device receives an input environmental sound and generates an input sound model based on the input environmental sound. At the client device, a similarity value is determined between the input sound model and each of the sound models to identify one or more sound models from the client database that are similar to the input sound model. A label is selected from labels associated with the identified sound models, and the selected label is associated with the input environmental sound based on a confidence level of the selected label.
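The client-side matching step described above can be sketched as follows. This is a minimal illustration and not the disclosed implementation: it assumes each sound model is a plain feature vector, uses cosine similarity as the similarity value, and derives a confidence level from similarity-weighted votes. The names `recognize`, `top_k`, and `min_confidence` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recognize(input_model, client_db, top_k=3, min_confidence=0.6):
    """Return (label, confidence); label is None when confidence is too low.

    client_db is a list of (label, model_vector) pairs (assumed layout).
    """
    # Score every stored sound model and keep the top_k most similar ones.
    scored = sorted(
        ((max(cosine_similarity(input_model, model), 0.0), label)
         for label, model in client_db),
        reverse=True,
    )[:top_k]
    # Accumulate similarity-weighted votes per label (assumed scoring rule).
    votes = {}
    for similarity, label in scored:
        votes[label] = votes.get(label, 0.0) + similarity
    total = sum(votes.values())
    if total <= 0:
        return None, 0.0
    best = max(votes, key=votes.get)
    confidence = votes[best] / total
    # A low-confidence result is where a client might defer to the server.
    return (best if confidence >= min_confidence else None), confidence
```

When `recognize` returns `None` as the label, a client built along these lines could send the input sound model to the server instead of labeling the sound locally.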
Abstract:
A method for generating an anti-model of a sound class is disclosed. A plurality of candidate sound data is provided for generating the anti-model. A plurality of similarity values between the plurality of candidate sound data and a reference sound model of a sound class is determined. An anti-model of the sound class is generated based on at least one candidate sound data having the similarity value within a similarity threshold range.
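A minimal sketch of the candidate selection described above, under stated assumptions: candidate sound data and the reference sound model are represented as feature vectors, similarity is taken as 1 / (1 + Euclidean distance), and the anti-model is the mean of the selected candidates. None of these choices comes from the abstract; `build_anti_model`, `low`, and `high` are hypothetical names.

```python
import math

def similarity(candidate, reference):
    """Assumed similarity measure: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(candidate, reference))

def build_anti_model(candidates, reference, low, high):
    """Average the candidates whose similarity lies in [low, high].

    Returns None when no candidate falls inside the threshold range.
    """
    selected = [c for c in candidates if low <= similarity(c, reference) <= high]
    if not selected:
        return None
    # Component-wise mean of the selected candidates (assumed model form).
    return [sum(c[i] for c in selected) / len(selected)
            for i in range(len(reference))]
```

The threshold range keeps candidates that are close enough to the sound class to be confusable with it while excluding both near-duplicates and unrelated sounds.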
Abstract:
A particular method includes transitioning out of a low-power state at a processor. The method also includes retrieving audio feature data from a buffer after transitioning out of the low-power state. The audio feature data indicates features of audio data received during the low-power state of the processor.
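The buffering behavior described above can be illustrated with a small sketch. It assumes a fixed-capacity buffer that a front end fills with feature frames while the processor sleeps, and that the processor drains the buffer on wake-up. The class `FeatureBuffer` and the function `on_wake` are hypothetical names, and dropping the oldest frame on overflow is an assumed policy that the abstract does not specify.

```python
from collections import deque

class FeatureBuffer:
    """Fixed-capacity buffer of audio feature frames (assumed design)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest frame when full.
        self._frames = deque(maxlen=capacity)

    def push(self, features):
        """Called by the front end while the processor is in the low-power state."""
        self._frames.append(features)

    def drain(self):
        """Return all buffered frames, oldest first, and empty the buffer."""
        frames = list(self._frames)
        self._frames.clear()
        return frames

def on_wake(buffer):
    """Processor-side step after transitioning out of the low-power state."""
    return buffer.drain()
```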
Abstract:
A method for providing information for a conference at one or more locations is disclosed. One or more mobile devices monitor one or more starting requirements of the conference and transmit input sound information to a server when the one or more starting requirements of the conference are detected. The one or more starting requirements may include a starting time of the conference, a location of the conference, and/or acoustic characteristics of a conference environment. The server generates conference information based on the input sound information from each mobile device and transmits the conference information to each mobile device. The conference information may include information on attendees, a current speaker among the attendees, an arrangement of the attendees, and/or a meeting log of attendee participation at the conference.
Abstract:
A processor is configured to transition in and out of a low-power state at a first rate and to operate in a first mode or a second mode. In a particular method, the processor while coupled to a coder/decoder (CODEC) retrieves audio feature data from a buffer after transitioning out of the low-power state. The CODEC is configured to operate at a second rate in the first mode and at a third rate in the second mode, the second rate and the third rate each greater than the first rate. The audio feature data indicates features of audio data received during the low-power state of the processor. A ratio of CODEC activity to processor activity in the second mode is less than the ratio in the first mode.
Abstract:
A method for grouping a plurality of client devices is disclosed. The method includes receiving sound descriptors from the plurality of client devices. The sound descriptors are extracted from environmental sound and transmitted to a server, which determines the similarity of the sound descriptors received from the client devices. The server groups the plurality of client devices into at least one similar context group based on the similarity of the sound descriptors.
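The server-side grouping step can be sketched as follows, assuming each sound descriptor is a feature vector, similarity is cosine similarity, and two devices belong to the same context group when they are linked by a chain of pairs whose similarity meets a threshold. The function `group_devices` and the `threshold` parameter are hypothetical; the abstract does not name a similarity measure or a grouping rule.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def group_devices(descriptors, threshold=0.9):
    """Group device ids whose descriptors are similar (union-find sketch).

    descriptors maps a device id to its sound descriptor vector.
    """
    parent = {device: device for device in descriptors}

    def find(device):
        # Follow parent links to the group root, compressing the path.
        while parent[device] != device:
            parent[device] = parent[parent[device]]
            device = parent[device]
        return device

    # Merge the groups of every sufficiently similar pair of devices.
    for a, b in combinations(descriptors, 2):
        if cosine_similarity(descriptors[a], descriptors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for device in descriptors:
        groups.setdefault(find(device), []).append(device)
    return sorted(sorted(group) for group in groups.values())
```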
Abstract:
A sleep monitoring application is installed on a mobile device. The mobile device is placed in a location where a user sleeps and records environmental sound. The sleep monitoring application determines indicators of sleep activity, such as breathing sounds made by the user, and determines a sleep state of the user based on the indicators of sleep activity. Sleep disorders can be detected from the indicators of sleep activity. The sleep monitoring application may generate a report that summarizes the user's sleep states and alerts the user to any sleep disorders. The sleep monitoring application can use the environmental sound and the determined sleep states to determine ambient sound that is associated with good sleep. Later, if the sleep monitoring application determines that the user is having problems sleeping, it can play the determined ambient sound to help the user sleep.
Abstract:
Embodiments of the invention describe methods and apparatus for performing context-sensitive OCR. A device obtains an image using a camera coupled to the device. The device identifies a portion of the image comprising a graphical object. The device infers a context associated with the image and selects a group of graphical objects based on the context associated with the image. Improved OCR results are generated using the group of graphical objects. Input from various sensors including microphone, GPS, and camera, along with user inputs including voice, touch, and user usage patterns may be used in inferring the user context and selecting dictionaries that are most relevant to the inferred contexts.
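A rough sketch of the dictionary selection idea, assuming the context has already been inferred and is passed in as a string. The context names, the word lists in `CONTEXT_DICTIONARIES`, and the function `correct_ocr` are all hypothetical. Fuzzy matching uses `difflib.get_close_matches` from the Python standard library as a stand-in for whatever matching the embodiments actually use.

```python
import difflib

# Hypothetical context-specific dictionaries; real ones would be far larger.
CONTEXT_DICTIONARIES = {
    "restaurant": ["menu", "pasta", "salad", "dessert"],
    "pharmacy": ["aspirin", "dosage", "tablet"],
}

def correct_ocr(raw_word, context, cutoff=0.6):
    """Snap a raw OCR result to the closest word in the context dictionary.

    Returns raw_word unchanged when the context is unknown or when no
    dictionary word is similar enough.
    """
    dictionary = CONTEXT_DICTIONARIES.get(context, [])
    matches = difflib.get_close_matches(raw_word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else raw_word
```

With these assumed dictionaries, a misread such as `pa5ta` is corrected only when the inferred context makes `pasta` a plausible word.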