Using musif in pro mode

This tutorial is intended for people who already have some programming skills. If you just want to try out and explore musif, first check the Getting started tutorial, where you will also find a guide to installation and set-up.

Download the Advanced tutorial notebook here

! pip install musif

First, let’s import musif. You may need to restart the notebook kernel here!

import musif
print(musif.__version__)
1.2.4

Data

To start, let’s download a dataset which contains various pop-rock scores in MIDI format, from Eric Clapton to The Beatles. Just run the following cell:

import urllib.request
import zipfile
from pathlib import Path

data_dir = Path("data_poprock")
dataset_path = "dataset.zip"
urllib.request.urlretrieve("https://figshare.com/ndownloader/articles/5436031/versions/1", dataset_path)
with zipfile.ZipFile(dataset_path, 'r') as zip_ref:
    zip_ref.extractall(data_dir)

Alternatively, you can download it here from Zenodo and uncompress it into a folder named data_poprock in the same directory as this notebook.

Create pre-cache hooks

Since musif was created with a strong focus on 18th-century opera arias, there might be some elements in this dataset that are not fully supported by the parsing engine, or some aspect of the music that is not considered by the features.

First, let’s focus on the notation elements that are not supported by our parsing engine. The best way to handle them is to create pre-cache hooks, i.e., pieces of code that are executed before the cache system is initialized and can therefore alter music21 objects.

There are several ways to create these hooks. For more information refer to the caching documentation page. Here, we’ll create them using the class approach.
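
As a minimal sketch of the class approach, a hook is simply an object exposing an execute(cfg, data) method that returns the (possibly modified) data dictionary. The snippet below is plain Python and does not call musif: the AddFlag and DropEmptyParts hooks, the apply_hooks helper, and the dictionary keys are all hypothetical names, used only to illustrate how such hooks could be chained.

```python
class AddFlag:
    """Hypothetical hook: marks the data dictionary as visited."""
    @staticmethod
    def execute(cfg, data):
        data["visited"] = True
        return data


class DropEmptyParts:
    """Hypothetical hook: removes parts that contain no notes."""
    @staticmethod
    def execute(cfg, data):
        data["parts"] = [p for p in data["parts"] if p["notes"]]
        return data


def apply_hooks(hooks, cfg, data):
    """Run each hook in order, feeding the output of one into the next."""
    for hook in hooks:
        data = hook.execute(cfg, data)
    return data


# Toy stand-in for the data dictionary (not the real musif structure)
data = {"parts": [{"name": "Guitar", "notes": [60, 62]},
                  {"name": "Empty", "notes": []}]}
result = apply_hooks([AddFlag, DropEmptyParts], cfg=None, data=data)
print([p["name"] for p in result["parts"]])  # ['Guitar']
```

Because every hook returns the dictionary it received, the order of the hooks in the list matters: a later hook sees the changes made by the earlier ones.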

Removing drums

Our pop-rock MIDI corpus contains one or more parts corresponding to drums or other percussion instruments. Currently, musif cannot handle this kind of notation and would throw errors about the score key signature. For this reason, we are going to remove the drum parts before starting the extraction.

Let’s create a RemoveDrums class that finds every part that corresponds to a drum part and removes it from the data dictionary, so musif can extract the score peacefully.

# musif.extract.constants contains various constants useful to access the `data` dictionary
import musif.extract.constants as C

# the following list is specific to this dataset
drums_list = ['drumset', 'tambourine', 'drum', 'concert snare drum', 'hi-hat', 'automobile brake drums']

# a hook is any object that exposes an `execute(cfg, data)` method
class RemoveDrums:
    @staticmethod
    def execute(cfg, data):
        # let's get the list of parts
        parts = list(data[C.DATA_FILTERED_PARTS])
        # C.DATA_SCORE is a music21.stream.Score object;
        # parts without a name are never treated as drums
        drums = [p for p in data[C.DATA_SCORE].parts
                 if p.partName and p.partName.split(',')[0].lower() in drums_list]
        # remove all the drums from the score
        data[C.DATA_SCORE].remove(drums)
        # reset the list of parts in the dictionary
        data[C.DATA_FILTERED_PARTS] = tuple(p for p in parts if p not in drums)
        # return the data
        return data
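
To make the matching rule explicit, here is a small sketch of the same name test applied to plain strings instead of music21 parts. The part names are made up for illustration, and the is_drum helper is a hypothetical name that is not part of musif.

```python
drums_list = ['drumset', 'tambourine', 'drum', 'concert snare drum',
              'hi-hat', 'automobile brake drums']


def is_drum(part_name):
    """Return True if the text before the first comma names a percussion instrument."""
    if part_name is None:
        return False
    return part_name.split(',')[0].lower() in drums_list


# Made-up part names, including a missing one
names = ['Drumset, Standard Kit', 'Electric Guitar', 'Hi-Hat', None, 'Drum Machine']
kept = [n for n in names if not is_drum(n)]
print(kept)  # ['Electric Guitar', None, 'Drum Machine']
```

Note that the comparison is an exact match on the lowercased name, so a name such as 'Drum Machine' is kept. For a different corpus, the list would need to be adapted to the part names actually present.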

Renaming similar parts

musif differentiates between the features of each part based on the part name. For instance, if two parts are both named Guitar, it will merge all their notes into a single part. We decided to enforce this requirement by design.

For this reason, we must create a hook that renames all parts that have the same name in our internal data dictionary, so extraction will be run with names already altered.

class RenameSimilarParts:
    @staticmethod
    def execute(cfg, data):
        parts = list(data[C.DATA_FILTERED_PARTS])
        part_names = set()
        counter = 1
        for part in data[C.DATA_SCORE].parts:
            # If the name was already seen (e.g. a second guitar part),
            # change the part's partName, partAbbreviation and id.
            if part.partName in part_names:
                part.partName = part.partName + f'({counter})'
                part.partAbbreviation = part.partAbbreviation + f'({counter})'
                # ids can be integers or strings, so handle both
                if isinstance(part.id, int):
                    part.id = part.id + counter
                else:
                    part.id = f'{part.id}({counter})'
                counter += 1
            part_names.add(part.partName)
        # The music21 part objects are altered in place, so the change is
        # already reflected in the parts list.
        data[C.DATA_FILTERED_PARTS] = tuple(parts)
        return data
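
The renaming logic can be checked on plain strings. The sketch below mirrors the loop above with a hypothetical rename_duplicates helper; it does not touch music21 or musif.

```python
def rename_duplicates(names):
    """Append '(1)', '(2)', ... to every name that was already seen."""
    seen = set()
    counter = 1
    renamed = []
    for name in names:
        if name in seen:
            name = f'{name}({counter})'
            counter += 1
        seen.add(name)
        renamed.append(name)
    return renamed


print(rename_duplicates(['Guitar', 'Bass', 'Guitar', 'Guitar', 'Bass']))
# ['Guitar', 'Bass', 'Guitar(1)', 'Guitar(2)', 'Bass(3)']
```

The counter is shared across all names, so the suffix acts as a unique tag rather than as a per-instrument index.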

Create your own custom feature

Now, we are going to create a custom feature. Methods to create custom features are detailed in the custom features documentation page. Briefly, there are two main ways to do it:

  • As a module: like the stock musif features. If you have musif installed you can just copy one of the modules and adapt it to your taste, for instance musif.extract.features.ambitus could be a good starting point.

  • As a class: We need to create a class that contains a handler sub-class that will run our calculations. This is the method we are going to show now, as it is more suitable for small modules and for a Jupyter notebook.

First, let’s create a class. In our case, we will create a dummy feature that calculates the number of beats for which the vocal part is in silence over the total of beats of the song. Note that musif is actually already computing the opposite, i.e., the ratio of measures in which the vocal part is singing.
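
Before looking at the musif implementation, the arithmetic behind this ratio can be sketched with made-up numbers. The events list, the 4/4 time signature, and the three-measure length below are illustrative assumptions, not values from the dataset.

```python
# Each event is (is_rest, quarter_length); values are made up for illustration.
events = [(False, 2.0), (True, 2.0), (True, 4.0), (False, 3.0), (True, 1.0)]

beats_per_measure = 4   # e.g. a 4/4 time signature
number_of_measures = 3

# Sum the duration of the rests only
rests_duration = sum(length for is_rest, length in events if is_rest)
total_beats = beats_per_measure * number_of_measures
# Guard against a division by zero for empty scores
voice_silence = rests_duration / total_beats if total_beats else 0

print(rests_duration, total_beats, round(voice_silence, 3))  # 7.0 12 0.583
```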

For the purpose of the tutorial, we also show how the file name of a music score can be used to retrieve information, e.g., metadata.

First, let’s import some musif utilities:

from ntpath import basename
from musif.musicxml.tempo import get_number_of_beats
import musif.extract.constants as C
from musif.extract.features.core.constants import DATA_NOTES_AND_RESTS, DATA_MEASURES
from musif.extract.features.tempo.constants import TIME_SIGNATURE
from musif.extract.features.prefix import get_part_feature

Considering a musif stock feature like ambitus, musif.extract.features is the feature package while musif.extract.features.ambitus is a feature module.

Here, since we are working inside a Jupyter notebook, we will substitute the package containing the features with a class named custom_feature_package, so that the hierarchy is feature package > feature module > handler. Inside the feature package there can be multiple features.

We will name our features voice_silence_beats and custom_file_name. These will also be the feature module names. Each feature must contain an object called handler, which in turn must contain two methods: update_part_objects and update_score_objects.

In the case of our custom feature, custom_feature_package, voice_silence_beats, custom_file_name, and handler are all classes, but they could also be packages and modules.

Since we want custom_file_name to be computed only once for each window, we implement it inside a different feature package, namely the class custom_basic_module. However, it could be implemented inside custom_feature_package as well.

VOICE_SILENCE_BEATS = 'voice_silence_beats'

class custom_feature_package:
    class voice_silence_beats:
        """
        This feature calculates the number of beats the voice is in silence over the total of the song
        """
        class handler:
            def update_part_objects(
                score_data: dict = None,
                part_data: dict = None,
                cfg: object = None,
                part_features: dict = None,
            ):
                # We extract the number of beats during which voice parts are silent and
                # divide it by the total number of beats. First, we keep only voice/singer parts.
                if part_data[C.DATA_FAMILY] == 'voice':
                    rests = [i for i in part_data[DATA_NOTES_AND_RESTS] if i.isRest]
                    # quarterLength is measured in quarter notes, so this matches the
                    # beat count only when the beat unit is a quarter note
                    rests_duration = sum(i.quarterLength for i in rests)
                    number_of_measures = score_data[DATA_MEASURES]
                    total_beats = get_number_of_beats(score_data[TIME_SIGNATURE]) * number_of_measures
                    voice_silence = rests_duration / total_beats if total_beats else 0
                    feature_name = get_part_feature(part_data[C.DATA_PART_ABBREVIATION], VOICE_SILENCE_BEATS)
                    part_features[feature_name] = voice_silence

            def update_score_objects(
                score_data: dict = None,
                parts_data: list = None,
                cfg: object = None,
                parts_features: list = None,
                score_features: dict = None,
            ):
                # Finally, we need to add the data to score_features,
                # the dictionary where all final info is stored.
                # Otherwise it will not be reflected in the final dataframe.
                features = {}
                for i, part_data in enumerate(parts_data):
                    if part_data[C.DATA_FAMILY] == 'voice':
                        part_abbreviation = part_data[C.DATA_PART_ABBREVIATION]
                        feature_name = get_part_feature(part_abbreviation, VOICE_SILENCE_BEATS)
                        features[feature_name] = parts_features[i][feature_name]
                score_features.update(features)

class custom_basic_module:
    class custom_file_name:
        "Set up artist and title from the file name"
        class handler:
            def update_score_objects(
                score_data,
                parts_data,
                cfg,
                parts_features,
                score_features
            ):
                chunks = basename(score_data[C.DATA_FILE]).split('.')
                artist = chunks[0]
                title = chunks[-1]

                score_features.update(
                    {
                        'Artist': artist,
                        'Title': title,
                    }
                )

            def update_part_objects(
                score_data, part_data, cfg, part_features
            ):
                pass
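
The file-name parsing can be tried in isolation. The sketch below assumes a hypothetical 'Artist.Title.ext' naming pattern, which may differ from the actual file names in the dataset; artist_and_title is an illustrative helper, not a musif function.

```python
from pathlib import PurePath


def artist_and_title(file_path):
    """Split a hypothetical 'Artist.Title.ext' file name into (artist, title)."""
    stem = PurePath(file_path).stem          # file name without the final extension
    artist, _, title = stem.partition('.')   # split on the first dot only
    return artist, title


print(artist_and_title('data_poprock/Beatles.Yesterday.mid'))  # ('Beatles', 'Yesterday')
print('Beatles.Yesterday.mid'.split('.'))  # ['Beatles', 'Yesterday', 'mid']
```

With a plain split('.'), the last chunk is the extension whenever the path still carries one, so it is worth checking what the file entry of score_data contains before relying on the last chunk as the title.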

Configuration

Let’s create a configuration for our experiment. For more information, see the Getting started tutorial.

Here, we will point your attention towards the following options:

  1. features: We added voice_silence_beats, our custom feature.

  2. feature_modules_addresses and basic_modules_addresses: We added custom_feature_package and custom_basic_module.

  3. precache_hooks: We added our hooks RemoveDrums and RenameSimilarParts, but it can also contain strings representing modules or packages that can be imported by Python, similarly to feature_modules_addresses.

  4. window_size and overlap: We are extracting features at the window level, where each window is 8 measures long and the hop size is 4 measures.
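
To visualize what a window size of 8 and an overlap of 4 mean, the sketch below lists the measure ranges covered by each window. It is a simplified illustration: window_starts is a hypothetical helper, it only enumerates full windows, and how musif treats a trailing partial window is not covered here.

```python
def window_starts(n_measures, window_size=8, overlap=4):
    """Return (first, last) measure numbers (1-based, inclusive) of each full window."""
    hop = window_size - overlap  # distance between the starts of consecutive windows
    return [(start, start + window_size - 1)
            for start in range(1, n_measures - window_size + 2, hop)]


print(window_starts(20))  # [(1, 8), (5, 12), (9, 16), (13, 20)]
```

Each measure after the first four therefore contributes to two consecutive windows, which is why neighbouring rows of the resulting table are correlated.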

from musif.config import ExtractConfiguration
from pathlib import Path
config = ExtractConfiguration(
    None,
    data_dir = "data_poprock",
    # We can pass our custom features as Python variables or as strings that are importable by __import__
    feature_modules_addresses=["musif.extract.features", custom_feature_package],
    basic_modules_addresses=["musif.extract.basic_modules", custom_basic_module],
    # Now, we list the features that we want to extract
    basic_modules = ['scoring', "custom_file_name"],
    features = ["core", "ambitus", "melody", "tempo",
                "density", "texture", "lyrics", "scale",
                "key", "dynamics", "rhythm",
                "voice_silence_beats"],
    # As for features, hooks can be expressed as variables or as strings that can be imported by __import__
    precache_hooks = [RemoveDrums, RenameSimilarParts],
    # window size is 8 measures with an overlap of 4 measures (50%)
    window_size = 8,
    overlap = 4,
    # Important! This parameter allows extracting all files while skipping those that
    # fail during extraction. If you encounter any errors, please report them by opening
    # an issue on GitHub and we'll take a look as soon as possible!
    ignore_errors=True,
    # cache_dir='__tutorial_cache', #If cache use is desired
    parallel = -1, #Set number of cores. 1 for no parallel, -1 for all cores
    output_dir = "output_dir"
)

Feature extraction

Now that we have our configuration, we pass it to the FeaturesExtractor constructor:

from musif.extract.extract import FeaturesExtractor

extractor = FeaturesExtractor(config)
import musif.musescore.constants as musescore_c
musescore_c.MUSESCORE_FILE_EXTENSION = '.mid'
df = extractor.extract()

Since we extracted features at the window level, the DataFrame now has a multi-index: the first level identifies the music score, while the second level identifies the window within the score:

# To show the DataFrame in a Jupyter notebook, just use it as last instruction of the cell, like this:
df.shape
(566, 61250)
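
As a quick illustration of how a two-level index of this kind can be used, here is a toy pandas example. The level names, score names, and feature values are made up; they are not taken from the musif output.

```python
import pandas as pd

# Hypothetical (score, window) index with three windows in total
index = pd.MultiIndex.from_tuples(
    [("song_a", 0), ("song_a", 1), ("song_b", 0)],
    names=["score", "window"],
)
toy = pd.DataFrame({"feature": [1.0, 3.0, 10.0]}, index=index)

song_a_windows = toy.loc["song_a"]        # all windows of one score
per_score = toy.groupby(level=0).mean()   # one row per score
print(song_a_windows)
print(per_score)
```

Selecting by the first level returns every window of a score, while grouping on level 0 collapses the windows into one row per score.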

Post-processing

We will now post-process the data. See the Getting started tutorial for more information.

try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

# Check if in colab
if IN_COLAB:
    print('in colab')
    import urllib.request
    # Replace with the raw URL of the YAML file on GitHub
    github_url = "https://raw.githubusercontent.com/DIDONEproject/musif/main/config_postprocess_example.yml"  
    # Replace with the desired local file name
    local_file_name = "config_postprocess_example.yml"  
    urllib.request.urlretrieve(github_url, local_file_name)
    print(f"File downloaded to: {local_file_name}")
else:
    local_file_name = "../../config_postprocess_example.yml"  
from musif.process.processor import DataProcessor

processed_df = DataProcessor(df,local_file_name).process().data

# with `.shape` you can see the number of rows and columns of the DataFrame.
processed_df.shape

Post-processing data...
(566, 162)

As you can see, there are now far fewer columns than before: 162 instead of 61250.

Let’s try to remove NaN values (entries that should be numbers but cannot be expressed, e.g., the result of the division $0/0$). To do this, we’ll use the dropna method of the pandas.DataFrame object; with axis=1, it drops every column that contains at least one NaN.

processed_df.dropna(axis=1, inplace=True)
processed_df.shape
(566, 162)
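
For reference, here is what dropna with axis=1 does on a toy table with made-up values:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [1.0, np.nan, 3.0],   # one missing value: the whole column is dropped
    "c": [0.5, 0.5, 0.5],
})
cleaned = toy.dropna(axis=1)
print(toy.shape, cleaned.shape)   # (3, 3) (3, 2)
print(list(cleaned.columns))      # ['a', 'c']
```

In the run above the shape stayed at (566, 162), which suggests that no column containing NaN was left after post-processing.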

Visualization

We will now visualize the 566 windows contained in the dataset using Linear Discriminant Analysis (LDA). Note that LDA is a supervised method, so a proper evaluation would need cross-validation (e.g., leave-one-out) to monitor overfitting. For the purpose of this tutorial, however, we will simply showcase an overfitted model.
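
For readers who want to check for overfitting, a leave-one-out evaluation could be sketched as follows. This is only an illustration on synthetic data, assuming scikit-learn is already installed; the class labels and feature values are made up and unrelated to the dataset.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the feature table: 3 classes, 20 samples each, 5 features.
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(20, 5)) for c in range(3)])
y = np.repeat(["artist_a", "artist_b", "artist_c"], 20)

model = make_pipeline(StandardScaler(), LDA())
# Each sample is held out once; its score is 1.0 if classified correctly, else 0.0
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(len(scores), scores.mean())
```

With real windowed data, windows from the same score are strongly correlated, so a group-aware splitter such as scikit-learn's LeaveOneGroupOut may be more appropriate than plain leave-one-out.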

For this visualization, you will need to install scikit-learn and seaborn. After installing them, remember to restart the kernel of the notebook!

%pip install scikit-learn seaborn
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import VarianceThreshold
from sklearn.decomposition import PCA
import seaborn as sns

target = processed_df['Artist']
data = processed_df.drop(columns=['Id', 'WindowId']).select_dtypes([int, float])
data = make_pipeline(StandardScaler(), 
                     VarianceThreshold(0.25),
                     LDA(n_components=2, solver='svd')).fit_transform(data, target)
sns.scatterplot(x=data[:, 0], y=data[:, 1], hue=target)

[Figure: scatter plot of the two LDA components, one point per window, colored by artist]

The Beatles, The Rolling Stones, and Eric Clapton appear rather similar. Taking simple statistics of the windows, such as the mean, doesn’t help:

target = processed_df['Artist'].groupby(level=0).nth(0) # 'Artist' of each window number 0
data = processed_df.drop(columns=['Id', 'WindowId']).select_dtypes([int, float])
data = data.groupby(level=0).mean() # taking the average of the features across the windows here!
data = make_pipeline(StandardScaler(),
                     VarianceThreshold(0.25),
                     LDA(n_components=2, solver='svd')).fit_transform(data, target)
sns.scatterplot(x=data[:, 0], y=data[:, 1], hue=target)

[Figure: scatter plot of the two LDA components, one point per score (window features averaged), colored by artist]

Let’s see if the development of the features across time can reveal something.

We will repeat the same process for creating the embedding, but we will add a signal-processing step consisting of the computation of the Discrete Fourier Transform (DFT). In this way, the LDA computes an embedding that takes into account the development of the features across time.

Note that the size of the FFT may be increased when a larger dataset is used.
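
To see why the DFT can capture information that the mean discards, consider two made-up feature sequences with the same mean but a different behaviour over time. The sketch uses NumPy's FFT with the same size (n=8) as the code in this section; the sequences are purely illustrative.

```python
import numpy as np

flat = np.array([1, 1, 1, 1, 1, 1, 1, 1], dtype=float)
alternating = np.array([0, 2, 0, 2, 0, 2, 0, 2], dtype=float)

print(flat.mean(), alternating.mean())  # both means are 1.0

# Magnitude of the 8-point DFT of each sequence
flat_spectrum = np.abs(np.fft.fft(flat, n=8))
alt_spectrum = np.abs(np.fft.fft(alternating, n=8))
print(np.round(flat_spectrum, 6))  # energy only in bin 0
print(np.round(alt_spectrum, 6))   # energy in bin 0 and in bin 4
```

Both sequences share the same value in bin 0 (the sum of the samples), but only the alternating one has energy in bin 4, so the spectrum separates them where the mean cannot.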

from scipy.fft import fft
from sklearn.preprocessing import FunctionTransformer
import pandas as pd
import numpy as np

# data selection
target = processed_df.set_index('Id').groupby(level=0).nth(0)['Artist']  # 'Artist' of each window number 0
score_groups = processed_df['Id'].to_numpy().astype(str) # groups of samples used for the FFT
data = processed_df.select_dtypes([float, int])          # features

# function to compute the FFT of a score
def compute_fft(x):
    groups = pd.DataFrame(x).groupby(score_groups)
    ret = groups.apply(lambda df: fft(df, n=8, axis=0).mean(axis=0))
    return np.absolute(np.stack(ret.to_list()))

# training model
model = make_pipeline(StandardScaler(), 
                      VarianceThreshold(0.2),
                      FunctionTransformer(compute_fft),
                      LDA(n_components=2, solver='svd'))
data = model.fit_transform(data, target)
sns.scatterplot(x=data[:, 0], y=data[:, 1], hue=target, legend=True)

[Figure: scatter plot of the two LDA components computed from the FFT of each score's windows, colored by artist]

As we can see, the development of the features across time clearly separates the different scores and artists. One conclusion we can draw is that the way an artist’s style unfolds along a piece plays a significant role.

That’s it! It’s now your turn to create your own features and pre-cache hooks, and to get the most out of musif to obtain interesting musicological results.

If you have any doubts or run into a malfunction, please do not hesitate to open an issue or to reach out to us, and we’ll do our best to solve it.

Enjoy!