musif.extract package

Submodules

musif.extract.common module

musif.extract.constants module

musif.extract.constants.GLOBAL_TIME_SIGNATURE = 'global_ts'

The name used for the column indicating the global time signature

musif.extract.constants.ID = 'Id'

The name used for the column of the music score’s id

musif.extract.constants.MUSIC21_FILE_EXTENSIONS = ['.xml', '.mxl', '.musicxml', '.mid', '.mei']

Extensions used by music21. Defaults to [“.xml”, “.mxl”, “.musicxml”, “.mid”, “.mei”]

musif.extract.constants.PLAYTHROUGH = 'playthrough'

Constant for the playthrough column (count of measures) added to the ms3 DataFrame

musif.extract.constants.REQUIRE_MSCORE = ['harmony', 'scale_relative']

Names of modules that require a harmonic analysis in a .mscx file

musif.extract.constants.VOICES_LIST = ['sop', 'ten', 'alt', 'bar', 'bbar', 'bass']

List of prefixes of singers’ names that might appear in the scores

musif.extract.constants.WINDOW_ID = 'WindowId'

The name used for the column of the window’s id

musif.extract.constants.WINDOW_RANGE = 'WindowRange'

The name used for the column indicating the start and end of a window
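The constants are plain module-level values. The following minimal sketch uses two hypothetical helpers, is_music21_file and needs_mscore (neither is part of musif), to illustrate how MUSIC21_FILE_EXTENSIONS and REQUIRE_MSCORE could be used to decide whether a file is handled by music21 and whether a selection of feature modules also needs a .mscx file. Case-insensitive extension matching is an assumption of the sketch.

```python
from pathlib import PurePath
from typing import List

# Values copied from musif.extract.constants as documented above.
MUSIC21_FILE_EXTENSIONS = [".xml", ".mxl", ".musicxml", ".mid", ".mei"]
REQUIRE_MSCORE = ["harmony", "scale_relative"]


def is_music21_file(path: str) -> bool:
    """Hypothetical helper: True if the extension is one music21 handles.

    Matching is case-insensitive (an assumption of this sketch).
    """
    return PurePath(path).suffix.lower() in MUSIC21_FILE_EXTENSIONS


def needs_mscore(feature_modules: List[str]) -> bool:
    """Hypothetical helper: True if any module needs a .mscx harmonic analysis."""
    return any(module in REQUIRE_MSCORE for module in feature_modules)
```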

musif.extract.extract module

class musif.extract.extract.FeaturesExtractor(*args, **kwargs)

Bases: object

Extracts features for a score or a list of scores, according to the parameters established in the configuration files. It extracts musical features using the music21 and ms3 libraries, based on the configuration, and stores them in a dictionary (score features) that is finally returned as a DataFrame by the extract method.

During parsing, unpitched objects (e.g. objects belonging to percussion instruments) may be removed (see the remove_unpitched_objects option in the configuration).

__init__(*args, **kwargs)
Parameters:
  • *args – A path to a .yml file, an AbstractExtractConfiguration object or a dictionary. At most one positional argument is accepted.

  • **kwargs – Keywords used to construct an ExtractConfiguration.

  • limit_files (List[str] = None) – List of file names relative to obj. Only these files are taken. Incompatible with exclude_files

  • exclude_files (List[str] = None) – List of file names relative to obj. None of these files are taken. Incompatible with limit_files

Raises:
  • TypeError

    • If the argument is not of the expected type (str, dict or ExtractConfiguration).

  • ValueError

    • If there are too many positional arguments (args).

  • FileNotFoundError

    • If any of the file or directory paths inside the configuration doesn’t exist.
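The argument rules above can be summarised in a short sketch. build_configuration and the ExtractConfiguration stand-in below are hypothetical simplifications, not musif's implementation: the real class loads and validates a YAML file (raising FileNotFoundError for missing paths), whereas this sketch only records the path. Raising ValueError when both limit_files and exclude_files are given is also an assumption, since the exception type for that case is not documented.

```python
from typing import Any, List, Optional


class ExtractConfiguration:
    """Hypothetical stand-in for musif's configuration class."""

    def __init__(self, data: Optional[dict] = None, **kwargs: Any) -> None:
        self.data = dict(data or {}, **kwargs)


def build_configuration(*args: Any,
                        limit_files: Optional[List[str]] = None,
                        exclude_files: Optional[List[str]] = None,
                        **kwargs: Any) -> ExtractConfiguration:
    # Zero or one positional argument is accepted.
    if len(args) > 1:
        raise ValueError("Expected at most one positional argument")
    # limit_files and exclude_files are incompatible (exception type assumed).
    if limit_files is not None and exclude_files is not None:
        raise ValueError("limit_files and exclude_files are incompatible")
    if not args:
        return ExtractConfiguration(**kwargs)
    obj = args[0]
    if isinstance(obj, ExtractConfiguration):
        return obj
    if isinstance(obj, dict):
        return ExtractConfiguration(obj, **kwargs)
    if isinstance(obj, str):
        # A real implementation would load the YAML file here; the sketch
        # only records the path.
        return ExtractConfiguration({"config_path": obj}, **kwargs)
    raise TypeError(f"Unexpected configuration type: {type(obj).__name__}")
```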

extract() → pandas.DataFrame

Extracts the features given in the configuration data from a file, a directory, or several file paths, returning a DataFrame containing the musical features.

Returns:

Score DataFrame with the extracted features of the given scores. For a single score, the DataFrame has only one row.

Return type:

pandas.DataFrame

Raises:
  • ParseFileError – If the MusicXML file can’t be parsed for any reason.

  • KeyError – If features aren’t loaded in the correct order or their dependencies are missing.

musif.extract.extract.find_files(extensions: str, base_dir: str | List[str | PurePath], limit_files: List[str] | None = None, exclude_files: List[str] | None = None) → List[PurePath]

Finds the paths to files with the given extensions.

Given a directory path, returns a list of paths of the files found, in alphabetical order. It searches recursively inside base_dir. If base_dir is a file path, or a list of file or directory paths, the matching files are returned in a list. If base_dir is neither a string nor a list of strings, a TypeError is raised, and if the path doesn’t exist, a ValueError is raised.

Parameters:
  • extensions (str or Iterable[str]) – The extension, or list of extensions, that will be looked for

  • base_dir (Union[str, Iterable[str]]) – A path or directory

  • limit_files (Iterable[str] = None) – List of file names relative to base_dir. Only these files are taken. Incompatible with exclude_files

  • exclude_files (Iterable[str] = None) – List of file names relative to base_dir. None of these files are taken. Incompatible with limit_files

Returns:

resp – The list of files found in the provided arguments, sorted in alphabetical order.

Return type:

List[PurePath]

Raises:
  • TypeError – If the type is not the expected (str or List[str]).

  • ValueError – If the provided string is neither a directory nor a file path
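As a rough, stdlib-only sketch of the documented behaviour (recursive search, alphabetical order, limit_files/exclude_files filtering relative to base_dir), the hypothetical find_files_sketch below mirrors the parameters of find_files. It is not musif's code; in particular, matching file names relative to each directory root, and raising ValueError when both filters are given, are assumptions.

```python
from pathlib import Path, PurePath
from typing import Iterable, List, Optional, Union


def find_files_sketch(extensions: Union[str, Iterable[str]],
                      base_dir: Union[str, PurePath, List[Union[str, PurePath]]],
                      limit_files: Optional[Iterable[str]] = None,
                      exclude_files: Optional[Iterable[str]] = None) -> List[PurePath]:
    """Hypothetical re-implementation of the documented find_files behaviour."""
    # Exception type for incompatible filters is an assumption.
    if limit_files is not None and exclude_files is not None:
        raise ValueError("limit_files and exclude_files are incompatible")
    exts = {extensions} if isinstance(extensions, str) else set(extensions)
    if isinstance(base_dir, (str, PurePath)):
        roots = [base_dir]
    elif isinstance(base_dir, list):
        roots = base_dir
    else:
        raise TypeError("base_dir must be a path or a list of paths")
    limit = set(limit_files) if limit_files is not None else None
    exclude = set(exclude_files) if exclude_files is not None else set()

    found: List[PurePath] = []
    for root in map(Path, roots):
        if root.is_dir():
            # Recursive search; names are compared relative to the directory.
            candidates = [(p, p.relative_to(root).as_posix())
                          for p in root.rglob("*") if p.is_file()]
        elif root.is_file():
            candidates = [(root, root.name)]
        else:
            raise ValueError(f"{root} is neither a directory nor a file path")
        for path, relative in candidates:
            if path.suffix not in exts:
                continue
            if limit is not None and relative not in limit:
                continue
            if relative in exclude:
                continue
            found.append(path)
    return sorted(found)  # alphabetical order of the full paths
```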

musif.extract.extract.parse_filename(file_path: str, split_keywords: List[str], expand_repeats: bool = False, export_dfs_to: str | PurePath | None = None, remove_unpitched_objects: bool = True) → music21.stream.Score

This function parses a MusicXML file and returns a music21 Score object. If the file has already been parsed, it is loaded from the cache instead of being processed again. A part is split into different parts if its instrument family is in the split_keywords argument, and repeats are expanded if indicated.

Parameters:
  • file_path (str) – A path to a MusicXML file.

  • split_keywords (List[str]) – A list of keywords based on music21 instrument sound names to split into different parts.

  • expand_repeats (bool) – Determines whether to expand or not the repetitions. Default value is False.

  • export_dfs_to (Union[str, PurePath]) – Path to a directory where DataFrames containing the score data are exported. If None, nothing is exported. Default value is None.

  • remove_unpitched_objects (bool) – Whether to remove unpitched objects (e.g. objects belonging to percussion instruments) during parsing. Default value is True.

Returns:

resp – The score saved in cache or the new score parsed with the necessary parts split.

Return type:

Score

Raises:

ParseFileError – If the xml file can’t be parsed for any reason.
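The caching behaviour can be illustrated with a minimal in-memory memoization sketch. ParsedScoreCache and the fake parser used in the test are hypothetical; how musif actually stores its cache (for example, whether it persists to disk) is not described here.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical cache key: file path plus the options that change the result.
CacheKey = Tuple[str, Tuple[str, ...], bool]


class ParsedScoreCache:
    """Hypothetical in-memory cache: parse each (file, options) combination once."""

    def __init__(self, parse: Callable[[str, Tuple[str, ...], bool], Any]) -> None:
        self._parse = parse
        self._store: Dict[CacheKey, Any] = {}

    def get(self, file_path: str, split_keywords: List[str],
            expand_repeats: bool = False) -> Any:
        key = (file_path, tuple(split_keywords), expand_repeats)
        if key not in self._store:
            # Only parse on a cache miss.
            self._store[key] = self._parse(*key)
        return self._store[key]
```

Including split_keywords and expand_repeats in the cache key is a design choice of the sketch: it prevents a score parsed with one set of options from being returned for a call with different options.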

musif.extract.extract.parse_musescore_file(file_path: str, expand_repeats: bool = False) → pandas.DataFrame

This function parses a musescore file and returns a pandas dataframe. If the file has already been parsed, it will be loaded from cache instead of processing it again.

Parameters:
  • file_path (str) – A path to a MuseScore .mscx file.

  • expand_repeats (bool) – Determines whether to expand or not the repetitions. Default value is False.

Returns:

resp – The score saved in cache or the new score parsed in the form of a dataframe.

Return type:

pd.DataFrame

Raises:

ParseFileError – If the musescore file can’t be parsed for any reason.

musif.extract.utils module

musif.extract.utils.expand_score_repetitions(score, repeat_elements: list)

Given a music21 Score object and a list containing repetition elements, expands the score object and places all measures in their corresponding chronological order.

Parameters:
  • score (music21 Score) – Score object parsed by music21.

  • repeat_elements (list) – List containing all repetition elements.

Returns:

final_score – Score object with expanded repetitions

Return type:

music21 Score
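To illustrate what chronological order means after expansion, the sketch below works on measure numbers instead of music21 objects. expand_repeats_order is hypothetical and uses a deliberately simplified model: each (start, end) span is a non-nested repeat played twice, and first/second endings, da capo and dal segno are ignored. The real function operates on a music21 Score and its repetition elements.

```python
from typing import List, Tuple


def expand_repeats_order(n_measures: int, repeats: List[Tuple[int, int]]) -> List[int]:
    """Return 1-based measure numbers in playback order (simplified model).

    Each (start, end) span is a non-nested repeat played twice; voltas,
    da capo and dal segno are not handled in this hypothetical sketch.
    """
    order: List[int] = []
    next_measure = 1
    for start, end in sorted(repeats):
        if start < next_measure or end < start or end > n_measures:
            raise ValueError(f"Invalid or overlapping repeat span ({start}, {end})")
        order.extend(range(next_measure, start))        # measures before the repeat
        order.extend(list(range(start, end + 1)) * 2)  # repeated block, played twice
        next_measure = end + 1
    order.extend(range(next_measure, n_measures + 1))   # remaining measures
    return order
```

For example, expand_repeats_order(5, [(2, 3)]) yields [1, 2, 3, 2, 3, 4, 5].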

musif.extract.utils.extract_global_time_signature(score_data)

Extracts a global time signature for the score in cases where it is not possible to get a measure-by-measure time signature.

musif.extract.utils.process_musescore_file(file_path: str, expand_repeats: bool = False) → pandas.DataFrame

Given a mscx file name, parses the file using the ms3 library and returns a DataFrame containing all harmonic information. Adds a Playthrough column that contains the number of every measure in chronological order.

Parameters:
  • file_path (str) – Path to the mscx file.

  • expand_repeats (bool) – Determines whether to expand or not the repetitions. Default value is False.

Returns:

harmonic_analysis – Dataframe containing harmonic information

Return type:

pandas.DataFrame
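The Playthrough column can be sketched with plain dictionaries standing in for DataFrame rows. add_playthrough is hypothetical, and the "measure" key is an assumed placeholder for whatever measure identifier the ms3 table provides; the playback order is assumed to come from a repeat expansion. Rows are duplicated each time their measure is played, and each played measure receives a running playthrough number.

```python
from collections import defaultdict
from typing import Dict, List

PLAYTHROUGH = "playthrough"  # value of musif.extract.constants.PLAYTHROUGH


def add_playthrough(rows: List[Dict], playback_order: List[int]) -> List[Dict]:
    """Hypothetical sketch: repeat harmony rows in playback order and number them.

    Assumes each row has a placeholder 'measure' key with its measure number.
    """
    by_measure: Dict[int, List[Dict]] = defaultdict(list)
    for row in rows:
        by_measure[row["measure"]].append(row)
    expanded: List[Dict] = []
    for playthrough, measure in enumerate(playback_order, start=1):
        for row in by_measure.get(measure, []):
            # Copy the row so the input rows are left unchanged.
            expanded.append({**row, PLAYTHROUGH: playthrough})
    return expanded
```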