Masked Expressiveness: Conditioned Generation of Piano Key Striking Velocity Using Masked Language Modeling
Abstract: Creating and experiencing expressive renditions of composed musical pieces are fundamental to how people engage with music. Numerous studies have focused on building computational models of musical expressiveness to gain a deeper understanding of its nature and explore potential applications. This paper examines masked language modeling (MLM) for modeling expressiveness in piano performance, specifically targeting the prediction of key striking velocity using vanilla Bidirectional Encoder Representations from Transformers (BERT). While MLM has been explored in previous studies, this work applies it in a novel direction by concentrating on the fine-grained conditioned prediction of velocity information. The results show that the model can predict masked velocity events in various contexts within an acceptable margin of error, relying solely on the pitch, timing, and velocity data encoded in Musical Instrument Digital Interface (MIDI) files. Additionally, the study employs a sequential masking and prediction approach toward rendering the velocity of unseen MIDI files and achieves more musically convincing results. This approach holds promise for developing interactive systems for expressive performance generation, such as advanced piano conducting or accompaniment systems.
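To make the training objective concrete, here is a minimal sketch, assuming a joint event-token vocabulary and a layout where every third token carries velocity; the names, sizes, and token layout below are illustrative assumptions rather than the repository's actual encoding.

```python
# Minimal sketch of the masked-velocity objective: mask only velocity tokens
# and train a vanilla BERT to reconstruct them from pitch and timing context.
import torch
from transformers import BertConfig, BertForMaskedLM

VOCAB_SIZE = 512   # assumed joint vocabulary of pitch, time-shift, and velocity tokens
MASK_ID = 0        # assumed id reserved for the [MASK] token

config = BertConfig(vocab_size=VOCAB_SIZE, hidden_size=256, num_hidden_layers=4,
                    num_attention_heads=4, intermediate_size=512)
model = BertForMaskedLM(config)  # "vanilla BERT" with a masked-LM head

# Toy batch: one sequence of 30 tokens laid out as pitch, time, velocity, ...
tokens = torch.randint(1, VOCAB_SIZE, (1, 30))
velocity_positions = torch.arange(2, 30, 3)

# Mask only the velocity tokens; pitch and timing stay visible as conditioning.
inputs = tokens.clone()
inputs[0, velocity_positions] = MASK_ID

# -100 marks positions ignored by the MLM loss, so only velocities are scored.
labels = torch.full_like(tokens, -100)
labels[0, velocity_positions] = tokens[0, velocity_positions]

loss = model(input_ids=inputs, labels=labels).loss
loss.backward()  # one gradient step on the velocity-reconstruction objective
```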
```bash
conda create -n maskexp python=3.11 -y
conda activate maskexp
pip install -r requirements.txt
pip install -e .
```
This demo allows users to interact with a pretrained model by inputting user-performed fragments of a musical piece (e.g., the main melody line). Based on these fragments, the model reconstructs a full expressive performance of the piece, capturing and extending the user's style, with the help of an offline symbolic music alignment algorithm.
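For unseen MIDI, the abstract describes a sequential masking-and-prediction pass. The sketch below is one plausible reading of that idea, with illustrative names rather than the demo's actual API: velocities are revealed one at a time so that each committed prediction conditions the next.

```python
import torch

@torch.no_grad()
def render_velocities(model, tokens, velocity_positions, mask_id):
    """Sequentially fill in masked velocity tokens, left to right,
    feeding each committed prediction back as context for the next."""
    tokens = tokens.clone()
    tokens[0, velocity_positions] = mask_id       # all velocities start out unknown
    for pos in velocity_positions:
        logits = model(input_ids=tokens).logits   # BERT-style masked-LM forward pass
        tokens[0, pos] = logits[0, pos].argmax()  # commit the most likely velocity token
    return tokens
```

Revealing one velocity at a time, rather than predicting every masked position in a single pass, lets earlier dynamics inform later ones, which is consistent with the more musically convincing renderings reported in the abstract.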
Program Arguments:

- `score_path`: absolute path to the musical score in MusicXML format. You will need to rename its extension to `.xml` for compatibility with the alignment algorithm.
- `performance_path`: absolute path to your performance in MIDI format (e.g., `xxx.mid`). Please make sure the extension is `.mid` instead of `.midi`.
- `ref_midi_path`: in case the score is too complex to be aligned well, you may instead provide a MIDI version of the score. This usually leads to a better alignment result, given that the performance is an incomplete fragment of the musical piece.
  - if `ref_midi_path` is used, `score_path` should be omitted.
  - if `score_path` is used, `ref_midi_path` should be omitted.
- `output_dir`: directory for writing the predicted performance
- `file_stem`: output file name (without extension)
- `ckpt_path`: path to the torch model checkpoint
You can download the pretrained model from this link.
```bash
python maskexp/demo/prediction.py \
    --score_path [PATH_TO_XML_SCORE] \
    --performance_path [PATH_TO_MIDI] \
    --output_dir [ABSOLUTE_PATH] \
    --file_stem [FILE_NAME] \
    --ckpt_path [CKPT_PATH]
```

To align against a reference MIDI instead of the score, replace `--score_path [PATH_TO_XML_SCORE]` with `--ref_midi_path [PATH_TO_MIDI]`; use one of the two, not both.
This repository is licensed under the MIT License. However, it includes portions of code from the Magenta project, which are stored in the magenta folder and licensed under the Apache License 2.0. Their usage is subject to the terms of the Apache License 2.0. The project also includes a copy of Nakamura et al.'s symbolic music alignment tool, which uses the MIT license.