Abstract:
User interface techniques provide user vocalists with mechanisms for seeding subsequent performances by other users (e.g., joiners). A seed may be a full-length seed spanning much or all of a pre-existing audio (or audiovisual) work and mixing, to seed further contributions of one or more joiners, a user's captured media content for at least some portions of the audio (or audiovisual) work. A short seed may span less than all (and in some cases, much less than all) of the audio (or audiovisual) work. For example, a verse, chorus, refrain, hook or other limited “chunk” of an audio (or audiovisual) work may constitute a seed. A seeding user's call invites other users to join the full-length or short-form seed by singing along, singing a particular vocal part or musical section, singing harmony or other duet part, rapping, talking, clapping, recording video, adding a video clip from camera roll, etc. The resulting group performance, whether full-length or just a chunk, may be posted, livestreamed, or otherwise disseminated in a social network.
Abstract:
Captured vocals may be automatically transformed using advanced digital signal processing techniques that provide captivating applications, and even purpose-built devices, in which mere novice user-musicians may generate, audibly render and share musical performances. In some cases, the automated transformations allow spoken vocals to be segmented, arranged, temporally aligned with a target rhythm, meter or accompanying backing tracks and pitch corrected in accord with a score or note sequence. Speech-to-song music applications are one such example. In some cases, spoken vocals may be transformed in accord with musical genres such as rap using automated segmentation and temporal alignment techniques, often without pitch correction. Such applications, which may employ different signal processing and different automated transformations, may nonetheless be understood as speech-to-rap variations on the theme.
Abstract:
Vocal audio of a user together with performance synchronized video is captured and coordinated with audiovisual contributions of other users to form composite duet-style or glee club-style or window-paned music video-style audiovisual performances. In some cases, the vocal performances of individual users are captured (together with performance synchronized video) on mobile devices, television-type display and/or set-top box equipment in the context of karaoke-style presentations of lyrics in correspondence with audible renderings of a backing track. Contributions of multiple vocalists are coordinated and mixed in a manner that selects for presentation, at any given time along a given performance timeline, performance synchronized video of one or more of the contributors. Selections are in accord with a visual progression that codes a sequence of visual layouts in correspondence with other coded aspects of a performance score such as pitch tracks, backing audio, lyrics, sections and/or vocal parts.
Abstract:
Digital signal processing and machine learning techniques can be employed in a vocal capture and performance social network to computationally generate vocal pitch tracks from a collection of vocal performances captured against a common temporal baseline such as a backing track or an original performance by a popularizing artist. In this way, crowd-sourced pitch tracks may be generated and distributed for use in subsequent karaoke-style vocal audio captures or other applications. Large numbers of performances of a song can be used to generate a pitch track. Computationally determined pitch trackings from individual audio signal encodings of the crowd-sourced vocal performance set are aggregated and processed as an observation sequence of a trained Hidden Markov Model (HMM) or other statistical model to produce an output pitch track.
Abstract:
Vocal musical performances may be captured and, in some cases or embodiments, pitch-corrected and/or processed in accord with a user selectable vocal effects schedule for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured on mobile devices in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real-time at the mobile device in accord with pitch correction settings. Vocal effects schedules may also be selectively applied to such performances. In these ways, even amateur user/performers with imperfect pitch are encouraged to take a shot at “stardom” and/or take part in a game play, social network or vocal achievement application architecture that facilitates musical collaboration on a global scale and/or, in some cases or embodiments, to initiate revenue generating in-application transactions.
Abstract:
Embodiments described herein relate generally to systems comprising a display device, a display device-coupled computing platform, a mobile device in communication with the computing platform, and a content server in which methods and techniques of capture and/or processing of audiovisual performances are described and, in particular, description of techniques suitable for use in connection with display device connected computing platforms for rendering vocal performance captured by a handheld computing device.
Abstract:
Despite many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances may be captured and continuously pitch-corrected for mixing and rendering with backing tracks in ways that create compelling user experiences. Based on the techniques described herein, even mere amateurs are encouraged to share with friends and family or to collaborate and contribute vocal performances as part of virtual “glee clubs.” In some implementations, these interactions are facilitated through social network- and/or eMail-mediated sharing of performances and invitations to join in a group performance. Using uploaded vocals captured at clients such as a mobile device, a content server (or service) can mediate such virtual glee clubs by manipulating and mixing the uploaded vocal performances of multiple contributing vocalists.
Abstract:
Despite many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances may be captured and continuously pitch-corrected for mixing and rendering with backing tracks in ways that create compelling user experiences. Based on the techniques described herein, even mere amateurs are encouraged to share with friends and family or to collaborate and contribute vocal performances as part of virtual “glee clubs.” In some implementations, these interactions are facilitated through social network- and/or eMail-mediated sharing of performances and invitations to join in a group performance. Using uploaded vocals captured at clients such as a mobile device, a content server (or service) can mediate such virtual glee clubs by manipulating and mixing the uploaded vocal performances of multiple contributing vocalists.