GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework
Abstract
GETMusic, a unified diffusion framework built on a novel music representation, generates high-quality music tracks from scratch or conditioned on source tracks, outperforming previous methods across diverse track combinations.
Symbolic music generation aims to create musical notes, which can help users compose music, for example by generating target instrumental tracks from scratch or based on user-provided source tracks. Given the diverse and flexible combinations of source and target tracks, a unified model capable of generating arbitrary tracks is crucially needed. Previous works fail to meet this need due to inherent constraints in their music representations and model architectures. We therefore propose a unified representation and diffusion framework named GETMusic ("GET" stands for GEnerate music Tracks), which comprises a novel music representation, GETScore, and a diffusion model, GETDiff. GETScore represents notes as tokens and organizes them in a 2D structure, with tracks stacked vertically and progressing horizontally over time. During training, tracks are randomly designated as either target or source. In the forward process, target tracks are corrupted by masking their tokens, while source tracks remain ground truth. In the denoising process, GETDiff learns to predict the masked target tokens, conditioned on the source tracks. Thanks to the separate tracks in GETScore and the non-autoregressive nature of the model, GETMusic can explicitly control the generation of any target tracks, either from scratch or conditioned on source tracks. We conduct experiments on music generation involving six instrumental tracks, yielding a total of 665 source-target combinations. GETMusic produces high-quality results across these diverse combinations and surpasses prior works designed for specific combinations.
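To make the representation and the masking-based corruption concrete, below is a minimal Python/NumPy sketch of a toy GETScore-style grid (tracks stacked vertically, time progressing horizontally) and the forward process described in the abstract. All names here (MASK, random_getscore, mask_forward) and the uniform t/T masking schedule are illustrative assumptions, not the authors' code or their exact noise schedule.

```python
import numpy as np

MASK = 0          # hypothetical id reserved for the special [MASK] token
VOCAB_SIZE = 512  # hypothetical vocabulary size for note/duration tokens

rng = np.random.default_rng(0)

def random_getscore(num_tracks=6, num_steps=32):
    """Toy stand-in for a GETScore: a 2D grid of token ids with
    tracks stacked vertically and time progressing horizontally."""
    return rng.integers(1, VOCAB_SIZE, size=(num_tracks, num_steps))

def split_tracks(num_tracks):
    """Randomly designate which tracks are generation targets;
    the remaining tracks serve as sources (conditioning)."""
    is_target = rng.random(num_tracks) < 0.5
    if not is_target.any():          # ensure at least one target track
        is_target[rng.integers(num_tracks)] = True
    return is_target

def mask_forward(score, is_target, t, T):
    """Forward (corruption) step: with probability t/T, replace tokens
    in target tracks with [MASK]; source tracks stay as ground truth."""
    corrupted = score.copy()
    mask = (rng.random(score.shape) < t / T) & is_target[:, None]
    corrupted[mask] = MASK
    return corrupted, mask

score = random_getscore()
is_target = split_tracks(score.shape[0])
x_t, masked = mask_forward(score, is_target, t=75, T=100)

# A denoiser (GETDiff in the paper) would be trained to predict the
# original token ids at `masked` positions, conditioning on everything
# else in x_t -- the unmasked source tracks plus any surviving target
# tokens -- in a non-autoregressive fashion.
print("target tracks:", np.where(is_target)[0])
print("fraction masked in targets:", masked[is_target].mean())
```

Under these assumptions, the same interface covers every source/target split at inference time: tracks the user wants generated start fully masked, user-provided tracks are left intact, and the denoiser fills in the masked positions over the T denoising steps.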
Community
Hello mishig👋, I hope you're doing well. I recently noticed that the original checkpoint files for this work (presumably the GETMusic framework described in the paper) are no longer available, and I'm wondering whether you happen to have a backup of them.
Having access to the checkpoints would be very helpful for further research and practice with music generation models like GETMusic, which uses the GETScore representation and the GETDiff diffusion model to handle diverse instrumental track combinations. If you have a backup, could you please let me know? If not, that's totally understandable too; I just wanted to check with you first. Thanks a lot for your time and help!