-
Publication No.: US20210183372A1
Publication Date: 2021-06-17
Application No.: US16717507
Filing Date: 2019-12-17
Applicant: Spotify AB
Inventor: Andreas Jansson, Eric J. Humphrey, Rachel Malia Bittner, Sravana K. Reddy
IPC: G10L15/08, G10L15/187, G10L15/14
Abstract: Term masking is performed by generating a time-alignment value for a plurality of identifiable units of sound in vocal audio content contained in a mixed audio track, force-aligning each of the plurality of identifiable units of sound to the vocal audio content based on the time-alignment value, thereby generating a plurality of force-aligned identifiable units of sound, identifying from the plurality of force-aligned identifiable units of sound a force-aligned identifiable unit of sound to be muddled, and audio muddling the force-aligned identifiable unit of sound to be muddled.
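The following minimal Python sketch illustrates the general idea described in this abstract, not the claimed implementation: it assumes word-level timestamps have already been produced by a forced aligner, and it obscures flagged terms by overwriting the corresponding samples of the mixed audio with low-level noise. The function name, the alignment format, and the noise-based masking are assumptions made for illustration only.

```python
# Illustrative sketch only; not the patented term-masking method.
import numpy as np

def mask_terms(audio, sr, aligned_words, terms_to_mask):
    """Obscure flagged words in a mixed audio track.

    audio: 1-D float array of samples (assumed mono).
    sr: sample rate in Hz.
    aligned_words: list of (word, start_sec, end_sec) tuples, assumed to come
        from a forced aligner that time-aligned each unit of sound to the vocals.
    terms_to_mask: set of lowercase words to render unintelligible.
    """
    masked = audio.copy()
    for word, start, end in aligned_words:
        if word.lower() in terms_to_mask:
            s, e = int(start * sr), int(end * sr)
            # Replace the flagged span with quiet noise so the term is muddled.
            masked[s:e] = 0.01 * np.random.randn(e - s)
    return masked

# Example usage with a hypothetical alignment output.
sr = 22050
audio = np.zeros(sr * 3, dtype=np.float32)
alignment = [("hello", 0.0, 0.4), ("badword", 0.5, 0.9), ("world", 1.0, 1.4)]
clean = mask_terms(audio, sr, alignment, {"badword"})
```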
-
Publication No.: US20210103422A1
Publication Date: 2021-04-08
Application No.: US16595404
Filing Date: 2019-10-07
Applicant: SPOTIFY AB
Inventor: Michael Scibor, Thor Kell, Rachel Malia Bittner, Tristan Jehan
IPC: G06F3/16, G06N3/04, G06F16/683
Abstract: A cuepoint determination system utilizes a convolutional neural network (CNN) to determine cuepoint placements within media content items to facilitate smooth transitions between them. For example, audio content from a media content item is normalized to a plurality of beats, the beats are partitioned into temporal sections, and acoustic feature groups are extracted from each beat in one or more of the temporal sections. The acoustic feature groups include at least downbeat confidence, position in bar, peak loudness, timbre and pitch. The extracted acoustic feature groups for each beat are provided as input to the CNN on a per temporal section basis to predict whether a beat immediately following the temporal section within the media content item is a candidate for cuepoint placement. A cuepoint placement is then determined from among the candidate cuepoint placements predicted by the CNN.
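As a rough illustration of the approach described in this abstract (not the patented model), the Python sketch below feeds a per-beat acoustic feature matrix for one temporal section into a small 1-D convolutional network that outputs the probability that the beat following the section is a cuepoint candidate. The number of beats per section, the treatment of timbre and pitch as scalars, and the network layout are assumptions for illustration only.

```python
# Illustrative sketch only; architecture and feature shapes are assumed.
import torch
import torch.nn as nn

N_FEATURES = 5      # downbeat confidence, position in bar, peak loudness, timbre, pitch (assumed scalars)
SECTION_BEATS = 32  # beats per temporal section (assumed)

class CuepointCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(N_FEATURES, 16, kernel_size=4),
            nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        # x: (batch, N_FEATURES, SECTION_BEATS) beat-level features for one section.
        # Output: probability that the beat immediately after the section is a candidate.
        return torch.sigmoid(self.net(x))

# One temporal section of beat features -> candidate probability.
section = torch.randn(1, N_FEATURES, SECTION_BEATS)
prob = CuepointCNN()(section)
```

A final cuepoint placement would then be chosen from among the beats the network flags as candidates, for example by taking the highest-scoring one.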
-
Publication No.: US11714594B2
Publication Date: 2023-08-01
Application No.: US16595404
Filing Date: 2019-10-07
Applicant: SPOTIFY AB
Inventor: Michael Scibor, Thor Kell, Rachel Malia Bittner, Tristan Jehan
IPC: G06F3/16, G06F16/683, G06N3/048
CPC classification number: G06F3/165, G06F16/683, G06N3/048, G10H2210/031, G10H2210/061, G10H2210/091
Abstract: A cuepoint determination system utilizes a convolutional neural network (CNN) to determine cuepoint placements within media content items to facilitate smooth transitions between them. For example, audio content from a media content item is normalized to a plurality of beats, the beats are partitioned into temporal sections, and acoustic feature groups are extracted from each beat in one or more of the temporal sections. The acoustic feature groups include at least downbeat confidence, position in bar, peak loudness, timbre and pitch. The extracted acoustic feature groups for each beat are provided as input to the CNN on a per temporal section basis to predict whether a beat immediately following the temporal section within the media content item is a candidate for cuepoint placement. A cuepoint placement is then determined from among the candidate cuepoint placements predicted by the CNN.
-
Publication No.: US20230409281A1
Publication Date: 2023-12-21
Application No.: US18335060
Filing Date: 2023-06-14
Applicant: Spotify AB
Inventor: Michael Scibor, Thor Kell, Rachel Malia Bittner, Tristan Jehan
IPC: G06F3/16, G06F16/683, G06N3/048
CPC classification number: G06F3/165, G06F16/683, G06N3/048, G10H2210/031, G10H2210/091, G10H2210/061
Abstract: A cuepoint determination system utilizes a convolutional neural network (CNN) to determine cuepoint placements within media content items to facilitate smooth transitions between them. For example, audio content from a media content item is normalized to a plurality of beats, the beats are partitioned into temporal sections, and acoustic feature groups are extracted from each beat in one or more of the temporal sections. The acoustic feature groups include at least downbeat confidence, position in bar, peak loudness, timbre and pitch. The extracted acoustic feature groups for each beat are provided as input to the CNN on a per temporal section basis to predict whether a beat immediately following the temporal section within the media content item is a candidate for cuepoint placement. A cuepoint placement is then determined from among the candidate cuepoint placements predicted by the CNN.
-
Publication No.: US11574627B2
Publication Date: 2023-02-07
Application No.: US17379325
Filing Date: 2021-07-19
Applicant: Spotify AB
Inventor: Andreas Jansson, Eric J. Humphrey, Rachel Malia Bittner, Sravana K. Reddy
IPC: G10L15/08, G10L15/14, G10L15/187
Abstract: Term masking is performed by generating a time-alignment value for a plurality of units of sound in vocal audio content contained in a mixed audio track, force-aligning each of the plurality of units of sound to the vocal audio content based on the time-alignment value, thereby generating a plurality of force-aligned identifiable units of sound, identifying from the plurality of force-aligned units of sound a force-aligned unit of sound to be altered, and altering the identified force-aligned unit of sound.
-
Publication No.: US11087744B2
Publication Date: 2021-08-10
Application No.: US16717507
Filing Date: 2019-12-17
Applicant: Spotify AB
Inventor: Andreas Jansson, Eric J. Humphrey, Rachel Malia Bittner, Sravana K. Reddy
IPC: G10L15/08, G10L15/187, G10L15/14
Abstract: Term masking is performed by generating a time-alignment value for a plurality of identifiable units of sound in vocal audio content contained in a mixed audio track, force-aligning each of the plurality of identifiable units of sound to the vocal audio content based on the time-alignment value, thereby generating a plurality of force-aligned identifiable units of sound, identifying from the plurality of force-aligned identifiable units of sound a force-aligned identifiable unit of sound to be muddled, and audio muddling the force-aligned identifiable unit of sound to be muddled.