File loading

musif is mainly built on top of MusicXML files. Specifically, it uses music21 to parse them and a caching system to cache the parsed objects and speed up successive parsing.

While MusicXML offers a rather complete representation of Western music notation, it does not offer a standard way of encoding complex harmonic annotations. For this reason, we adopted MuseScore files as well (another option could be the MEI or the IEEE 1599 formats).

Finally, we offer the option to load any file format supported by MuseScore or music21.

All these file formats are managed by 3 directories:

  1. data_dir: directory for files loaded by music21 (They default to .xml, .mxl, .musicxml, .mid, .mei)

  2. musescore_dir: directory for MuseScore files or files loaded by MuseScore

  3. cache_dir: directory for cached files

File precedence

The file loading system works as follows.

First, a list of files is obtained in this way:

  1. The data_dir directory is recuresively searched for a list of specific extension files (anything supported by music21 parser)

  2. The musescore_dir directory is recursively searched for a specific extension for harmony annotations only (used to calculate harmony features) (.mscx by default).

Even if no files are found in data_dir directory, musif will attempt to find files in cache_dir and proceed with the extraction. If even then no files are found, the extraction will stop.

Once the list of files has been obtained, we proceed to the parsing of each filename:

  1. If a corresponding file is found in cache_dir, unpickle it and skip the parsing.

  2. If filename has one of the extensions specified for music21, parse it using music21.

  3. If filename also exists in musescore_dir (with the extension specified for MuseScore) and harmonic features are requested, load it using ms3 in search of harmonic annotations. Harmonic features are listed (and editable) in a constant variable and by default they are 'harmony' and 'scale_relative' only.

In general, the recommended file format is MusicXML, as created by MakeMusic Finale® 27.

NOTE: We experienced errors due to the parsing of MusicXML by music21 and to the slight differences in the MusicXML files produced by different notation software, such as MuseScore, Finale, Sibelius, Dorico, etc.

Annotating

musif was created in the context of the ERC Didone project and thus it works better when using the same work pipeline and annotations.

In the Didone project, we adopted the following approach:

  1. An engraver transcribed the original manuscript source using Finale® 27 and exported it to a MusicXML file with .xml extension.

  2. Further checks and error fixing was performed by a group of musicologists and orchestra conductors.

  3. A music theorist annotated the harmonic analysis using MuseScore and saved it to a .mscx file; harmonic anotations were created following the guidelines presented in [1] and extended in [2].

As consequence, we suggest the use of MuseScore files for harmonic annotations and Finale® for engraving the music. Note that the harmonic features of musif only work with the annotation format presented in [1].

Experimental formats

musif also supports dfs_dir, which is an experimental feature to export the music data parsed using MuseScore and music21 to pandas DataFrames that are easily usable for datascience purposes. For now, this format is not usable for extracting features and can be accessed using the load_score_df function. More details are available in the API docs.

References

[1] Neuwirth, M., Harasim, D., Moss, F. C., & Rohrmeier, M. (2018). The Annotated Beethoven Corpus (ABC): A Dataset of Harmonic Analyses of All Beethoven String Quartets. Frontiers in Digital Humanities, 5. https://doi.org/10.3389/fdigh.2018.00016

[2] Hentschel, J., Neuwirth, M., & Rohrmeier, M. (2021). The Annotated Mozart Sonatas: Score, Harmony, and Cadence. Transactions of the International Society for Music Information Retrieval, 4(1), 67–80. https://doi.org/10.5334/tismir.63