add tutorial files
RichardFreedman committed Jun 27, 2023
1 parent 8399efd commit bd637a1
Showing 45 changed files with 2,032 additions and 0 deletions.
Binary file modified .DS_Store
218 changes: 218 additions & 0 deletions tutorial/01_Introduction.md
@@ -0,0 +1,218 @@
# CRIM Intervals: A Python Library for Analysis

Based on Python, Pandas, and Mike Cuthbert's [music21](http://web.mit.edu/music21/), CRIM Intervals is a pattern-finding engine for musical scores, with an emphasis on the kinds of melodic and harmonic patterns found in Renaissance polyphony. It has been developed as a primary data analysis tool for CRIM, but can be applied and adapted to a wide range of styles. Results are reported as Pandas DataFrames (and are thus exportable in a variety of standard formats for further analysis), and also via several visualization methods.

## File Types Compatible with CRIM Intervals

Since CRIM Intervals is based on music21, any file type that music21 can read will work with CRIM Intervals. Be sure to include the appropriate file extension as part of each file name: '.mei', '.mid', '.midi', '.abc', '.xml', '.musicxml'. Note that the `lyrics` function is untested with MIDI and ABC files.
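Before importing, it can help to check that each file name carries one of these extensions. The following is a minimal sketch using only Python's standard library. `has_supported_extension` is a hypothetical helper, not part of CRIM Intervals, and it only inspects the name, not the file contents:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions listed above as readable by music21 (and so by CRIM Intervals).
SUPPORTED_EXTENSIONS = {".mei", ".mid", ".midi", ".abc", ".xml", ".musicxml"}


def has_supported_extension(path_or_url: str) -> bool:
    """Hypothetical helper: True if the file name ends in a listed extension."""
    # urlparse keeps only the path part of a URL (dropping any "?query");
    # a plain local path passes through unchanged.
    path = urlparse(path_or_url).path
    # Compare the final suffix case-insensitively, so ".MEI" also counts.
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```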

## Importing a Piece: `importScore()`

CRIM Intervals begins by importing one or more MEI, MusicXML, or MIDI files. A single piece can be imported directly, as shown:

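```python
from crim_intervals import *  # assumes CRIM Intervals is installed as the crim_intervals package

piece = importScore('https://crimproject.org/mei/CRIM_Model_0008.mei')
```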

The argument to `importScore()` can be either a URL or a local file path, and must be a string surrounded by quotes, as shown.

Note that the **local file path must also be preceded by a `/` [forward slash]**, for example:

```python
piece = importScore('/path/to/mei/file2.mei')
```
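If you are unsure whether a path is absolute, Python's standard `pathlib` module can convert a relative path into an absolute one (on macOS and Linux, the result starts with `/`). This is a minimal sketch; `absolute_score_path` is a hypothetical helper, not part of CRIM Intervals:

```python
from pathlib import Path


def absolute_score_path(path: str) -> str:
    """Hypothetical helper: turn a possibly relative local path into an absolute one."""
    # expanduser() expands a leading "~"; resolve() makes the path absolute
    # relative to the current working directory (the file need not exist yet).
    return str(Path(path).expanduser().resolve())


# Usage, assuming a file called file2.mei inside a local "mei" folder:
# piece = importScore(absolute_score_path('mei/file2.mei'))
```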

### Check Metadata for Imported Piece

To confirm that the import succeeded, view the metadata with `print(piece.metadata)`. Alternatively, add the parameter `verbose = True` to the `importScore()` function, and CRIM Intervals will report as it runs whether it was able to import the given piece. For example:

```python
piece = importScore('https://crimproject.org/mei/CRIM_Model_0008.mei', verbose = True)
```

Note that import errors will be reported even if `verbose = False`.


## Importing Multiple Pieces at Once: `CorpusBase()`

If you pass `importScore()` a **path to a directory** it will import all the files in that directory, for example:

```python
pieces = importScore('/Users/rfreedma/Downloads/MEI_Folder')
```

Adding the parameter `recursive = True` will in turn import all of the pieces in the main directory and any subdirectories, for example:

```python
pieces = importScore('/Users/rfreedma/Downloads/MEI_Folder', recursive = True)
```

As with a single piece, the parameter `verbose = True` will report the status of each attempted import.
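To see which files a recursive search would pick up, you can list them yourself with the standard library. This is a sketch that mirrors the idea of `recursive = True` without calling CRIM Intervals. `list_score_files` is hypothetical, and the extension list is the one given at the top of this guide:

```python
from pathlib import Path
from typing import List

# The file extensions listed earlier in this guide.
EXTENSIONS = {".mei", ".mid", ".midi", ".abc", ".xml", ".musicxml"}


def list_score_files(folder: str, recursive: bool = False) -> List[str]:
    """Hypothetical helper: list score files in a folder, and in its
    subfolders too when recursive=True."""
    root = Path(folder)
    # glob("*") looks only in this folder; rglob("*") also walks every subfolder.
    candidates = root.rglob("*") if recursive else root.glob("*")
    # Keep regular files with a supported extension, sorted for a stable order.
    return sorted(
        str(p) for p in candidates
        if p.is_file() and p.suffix.lower() in EXTENSIONS
    )
```

Since the result is a list of path strings, it could presumably be handed to `CorpusBase()`, described next, for example `CorpusBase(list_score_files('/Users/rfreedma/Downloads/MEI_Folder', recursive = True))`.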

The CRIM Intervals library also allows the user to import multiple pieces at once through the `CorpusBase()` function. This function operates similarly to the `importScore()` function, but accepts a **list of piece URLs or paths** instead of a single URL or path. The individual items in the Python list must be:

* surrounded by quotation marks (remember the `/` at the start of any item that is a local path!)
* separated by commas (but no comma after the last item in the list)

The entire list must then be enclosed in square brackets.

The complete import statement will look like this:

```python
corpus = CorpusBase(['url_to_mei_file1.mei', 'url_to_mei_file2.mei', '/path/to/mei/file1.mei', '/path/to/mei/file2.mei'])
```
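When many pieces share the same URL pattern, a list comprehension can build the quoted, comma-separated list for you. This is a sketch; the piece IDs are taken from examples elsewhere in this guide, and the URL pattern is assumed from those same examples:

```python
# Piece IDs taken from the examples in this guide.
piece_ids = ["CRIM_Model_0008", "CRIM_Model_0009", "CRIM_Mass_0014_3"]

# Each ID is inserted into the URL pattern used throughout this guide.
urls = [f"https://crimproject.org/mei/{pid}.mei" for pid in piece_ids]

# corpus = CorpusBase(urls)
```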

Note that a special format is required when a CRIM Intervals function (such as `melodic()` or `harmonic()`) is applied to a **corpus** object. See below, and also the **batch method** documentation for each individual function.

## Using CRIM Intervals Functions

Once one or more pieces have been imported, they can be examined and analyzed through a wide variety of functions that find the notes, durations, melodic intervals, harmonic intervals, and so on. Most of these functions follow one of a few common formats:

*like this:*

```python
piece.func()
```

*or:*

```python
piece.func(some_parameter)
```

*or:*

```python
piece.func(param_1 = True, param_2 = "d")
```

Except in the case of a **corpus** of pieces (see below), the parentheses that follow the function are *always required*. Most functions have parameters that can be adjusted (for instance, the choice of diatonic or chromatic intervals). You can simply accept the default settings by passing no parameters, or adjust the parameters as needed.

The specific details of how to format each function vary from case to case. The choices and common settings for each function are detailed in the pages of this guide. It is also possible to read the built-in documentation as explained below under **Help and Documentation**.

## Batch Methods for a Corpus of Pieces

In the case of a **corpus** of pieces, it is also necessary to use the `batch` function, which applies one of the main functions (such as `notes`, `melodic`, `harmonic`, etc.) to each of the pieces in the corpus in turn and returns a list of dataframes, one per piece, which can then be concatenated into a single dataframe. First create the corpus:

```python
corpus = CorpusBase(['https://crimproject.org/mei/CRIM_Mass_0014_3.mei',
                     'https://crimproject.org/mei/CRIM_Model_0009.mei'])
```

Then specify the function to be used (NB: **do not include the parentheses after the function!**):

```python
func = ImportedPiece.notes
```

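Finally, run the function on each piece and concatenate the resulting list of dataframes into a single dataframe:

```python
list_of_dfs = corpus.batch(func)
output = pd.concat(list_of_dfs)  # requires: import pandas as pd
```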
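Why leave the parentheses off? In Python, `ClassName.method` written without parentheses refers to the function itself, and calling it with an object as its first argument is equivalent to calling the method on that object. Passing the function uncalled presumably lets `batch` make that call once for each piece. The toy class below is hypothetical, standing in for `ImportedPiece`, and only illustrates the equivalence:

```python
class ToyPiece:
    """Hypothetical stand-in for ImportedPiece, used only to illustrate the syntax."""

    def __init__(self, title):
        self.title = title

    def notes(self, combineUnisons=False):
        # A real notes() returns a DataFrame; this one just returns a description.
        return f"notes of {self.title} (combineUnisons={combineUnisons})"


piece = ToyPiece("Model 8")

func = ToyPiece.notes       # the function itself, not yet called
result_a = piece.notes()    # the usual method-call syntax
result_b = func(piece)      # the same call, with the piece passed explicitly

print(result_a == result_b)  # True
```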

Normally parameters are passed to a function within the parentheses (as noted above). But with the batch methods for a corpus, the parameters are instead passed as **kwargs** (that is, as a *dictionary of keyword arguments*, with each parameter and its corresponding value written as a `key: value` pair).

For example, here is code for batch processing a corpus with the `melodic` function using some keyword arguments:


```python
import pandas as pd

# define the corpus
corpus = CorpusBase(['https://crimproject.org/mei/CRIM_Mass_0014_3.mei',
                     'https://crimproject.org/mei/CRIM_Model_0009.mei'])
# specify the function
func = ImportedPiece.melodic  # <- NB there are no parentheses here
# provide the kwargs
kwargs = {'kind': 'c', 'directed': False}
# build a list of dataframes, one for each piece in the corpus
list_of_dfs = corpus.batch(func, kwargs)
# concatenate the list to a single dataframe
output = pd.concat(list_of_dfs)
```
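Conceptually, a batch method loops over the pieces and unpacks the `kwargs` dictionary into keyword arguments for each call. The sketch below illustrates that idea with made-up stand-ins: `toy_batch` and `toy_melodic` are hypothetical, their default values are placeholders, and this is not the source code of `CorpusBase.batch`:

```python
def toy_batch(pieces, func, kwargs=None):
    """Hypothetical sketch of a batch loop (not CorpusBase.batch itself)."""
    kwargs = kwargs or {}
    # Call func on each piece in turn; **kwargs turns {'kind': 'c'} into kind='c'.
    return [func(piece, **kwargs) for piece in pieces]


def toy_melodic(piece, kind="q", directed=True):
    # Stand-in for an analysis function; the defaults here are placeholders.
    # It simply echoes the settings it received.
    return (piece, kind, directed)


results = toy_batch(["piece_A", "piece_B"], toy_melodic, {"kind": "c", "directed": False})
print(results)  # [('piece_A', 'c', False), ('piece_B', 'c', False)]
```

Leaving out `kwargs` falls back to the function's own defaults, just as calling a function with empty parentheses does.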

### Chaining Together Batch Methods

CRIM Intervals functions often need to be chained together, as explained in the individual sections for each function. The result of the first batch call (a list of dataframes) is passed to the second function via the `df` parameter in the `kwargs`:

```python
# define the corpus
corpus = CorpusBase(['https://crimproject.org/mei/CRIM_Mass_0014_3.mei',
                     'https://crimproject.org/mei/CRIM_Model_0009.mei'])
# first function
func1 = ImportedPiece.melodic
# first function results as a list of dfs
list_of_dfs = corpus.batch(func = func1, kwargs = {'end': False}, metadata = False)
# second function
func2 = ImportedPiece.ngrams
# now the list_of_dfs from the first function is passed to the second function as the keyword argument 'df'
list_of_melodic_ngrams = corpus.batch(func = func2, kwargs = {'n': 4, 'df': list_of_dfs})
```
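The `'n': 4` keyword above asks for n-grams of length 4, that is, runs of four consecutive melodic intervals. The sketch below shows only the sliding-window idea for a single voice. `toy_ngrams` and the interval values are hypothetical; the library's own `ngrams()` works on whole DataFrames and has further options:

```python
def toy_ngrams(intervals, n=4):
    """Hypothetical sketch: every run of n consecutive intervals in one voice."""
    # A window of length n slides one step at a time; if the voice has
    # fewer than n intervals, the range is empty and no n-grams are returned.
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]


# Made-up melodic intervals for a single voice.
voice = [2, 2, -3, 2, 1, -2]
grams = toy_ngrams(voice, n=4)
print(grams)  # [(2, 2, -3, 2), (2, -3, 2, 1), (-3, 2, 1, -2)]
```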

### Piece Metadata and Batch Methods: The `metadata` Parameter

The `batch` method will normally include `metadata` for each piece. But if the aim is to chain several functions together in a series of batch processes, it is probably best to request the metadata only for the final step:

```python
# define the corpus
corpus = CorpusBase(['https://crimproject.org/mei/CRIM_Mass_0014_3.mei',
                     'https://crimproject.org/mei/CRIM_Model_0009.mei'])
# first function
func1 = ImportedPiece.melodic
# first function results as a list of dfs
# notice that 'metadata = False' for this step
list_of_dfs = corpus.batch(func = func1, kwargs = {'end': False}, metadata = False)
# second function
func2 = ImportedPiece.ngrams
# now the list_of_dfs from the first function is passed to the second function as the keyword argument 'df'
# here metadata remains True (the default), so the parameter can be omitted
list_of_melodic_ngrams = corpus.batch(func = func2, kwargs = {'n': 4, 'df': list_of_dfs})
```

### Tracking Batch Processing Errors: The `verbose` Parameter

As with single-piece imports, setting `verbose = True` in a `batch` call confirms that each piece has been successfully imported. This can be useful for pinpointing a piece that is triggering a bug.

```python
corpus = CorpusBase(['https://crimproject.org/mei/CRIM_Mass_0014_3.mei',
                     'https://crimproject.org/mei/CRIM_Model_0009.mei'])
func = ImportedPiece.notes
list_of_dfs = corpus.batch(func, verbose = True)
```

### Voice Part Names vs Staff Position in Batch Processing: The `number_parts` Parameter

By default, `.batch()` replaces column names that consist of part names (like `.melodic()` results) or combinations of part names (like `.harmonic()` results) with **staff position numbers**, starting with "1" for the highest staff, "2" for the second highest, and so on. This is useful when combining results from pieces whose parts have different names. For example:

```python
list_of_dfs_with_numbers_for_part_names = corpus.batch(ImportedPiece.melodic)
```

To keep the **original part names** in the columns, set the `number_parts` parameter to `False`. For example:

```python
list_of_dfs_with_original_part_names = corpus.batch(ImportedPiece.melodic, number_parts = False)
```
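The reason numbering helps is that `pd.concat` aligns columns by name. The sketch below uses made-up results for two pieces whose voices have different names. `number_columns` is a hypothetical helper that imitates the effect described above; it is not how `batch` implements it:

```python
import pandas as pd

# Made-up melodic results for two pieces; columns run from highest to lowest voice.
piece_1 = pd.DataFrame({"Superius": ["2", "-3"], "Tenor": ["1", "2"]})
piece_2 = pd.DataFrame({"Cantus": ["-2", "4"], "Tenor": ["3", "-2"]})

# With original names, unmatched voices land in separate columns padded with NaN.
by_name = pd.concat([piece_1, piece_2])


def number_columns(df):
    """Hypothetical helper: rename columns "1", "2", ... from left (highest) to right."""
    return df.rename(columns={name: str(i) for i, name in enumerate(df.columns, start=1)})


# With staff-position numbers, the voices line up in shared columns.
by_position = pd.concat([number_columns(piece_1), number_columns(piece_2)])
```

Here `by_name` has three columns (Superius, Tenor, Cantus) with missing values, while `by_position` has just two fully populated columns.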


## Exporting CRIM Intervals Results

CRIM Intervals is a Python library, but it also makes extensive use of Pandas (Python for Data Analysis). The most common output of the CRIM Intervals functions is thus a **DataFrame**. DataFrames can be viewed in the output window of VS Code (or a similar IDE where CRIM Intervals is running), or in Jupyter Notebooks. There are also two useful ways to save results for later use:

### Export to CSV:

If you are running the JupyterHub version of this code, there should be a folder called 'saved_csv'. This is where we will export files, and from there you can download them to your computer.

To export a DataFrame generated for a piece as a CSV file, use the following command:

```python
notebook_data_frame_name.to_csv('saved_csv/your_file_title.csv')
```

Replace 'notebook_data_frame_name' with the name of your DataFrame. For example, if you had run the following lines:

```python
piece = importScore('https://crimproject.org/mei/CRIM_Model_0008.mei')
mel = piece.melodic()
```

You could then save this model's melodic interval data to a CSV file with the file name 'CRIM_Model_0008.csv' by running the following:

```python
mel.to_csv('saved_csv/CRIM_Model_0008.csv')
```
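Outside the JupyterHub setup, the 'saved_csv' folder may not exist yet, and `to_csv` will not create it for you. A minimal sketch that creates the folder first and reads the file back to check it. The small `mel` DataFrame here is made up, standing in for real `melodic()` results:

```python
import os

import pandas as pd

# Create the folder if needed; exist_ok=True makes this safe to re-run.
os.makedirs("saved_csv", exist_ok=True)

# Made-up stand-in for the `mel` DataFrame above, indexed by offset.
mel = pd.DataFrame({"Superius": ["2", "-3"], "Tenor": ["1", "2"]}, index=[0.0, 4.0])
mel.to_csv("saved_csv/CRIM_Model_0008.csv")

# Read the file back; index_col=0 restores the offsets as the index.
check = pd.read_csv("saved_csv/CRIM_Model_0008.csv", index_col=0)
```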

### Export to Excel:

Alternatively, a DataFrame can be saved as an Excel file with the following lines (the `xlsxwriter` package must be installed for this engine). Replace 'file_name.xlsx' with your desired file name, replace 'Sheet1' with your desired sheet name **(in quotes)**, and replace 'frame_name' in the second line with the name of your DataFrame **(without quotes)**, which was `mel` in the last example:

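```python
with pd.ExcelWriter('saved_csv/file_name.xlsx', engine = 'xlsxwriter') as writer:
    frame_name.to_excel(writer, sheet_name = 'Sheet1')
# leaving the 'with' block saves and closes the file (writer.save() was removed in pandas 2.0)
```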

Substituting the information from the first example, we could write that same DataFrame to an Excel sheet with the following commands:

```python
with pd.ExcelWriter('saved_csv/CRIM_Model_0008.xlsx', engine = 'xlsxwriter') as writer:
    mel.to_excel(writer, sheet_name = 'CRIM Model 0008')
```

## Help and Documentation

The documentation associated with each function can be read with a line of the following form:

```python
print(piece.notes.__doc__)
```

This line prints the documentation (`.__doc__`) associated with the `notes()` function, which is available on the object `piece`. Note that to print the documentation for a function, you must reach the function through an object that provides it, such as `piece` in the example above.
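In Python, the same docstring is also reachable through the class, and the built-in `help()` shows it together with the function's signature. The toy class below is hypothetical; since `ImportedPiece.notes` is used in the batch examples above, `print(ImportedPiece.notes.__doc__)` would likely work the same way, though that is an assumption:

```python
class ToyPiece:
    """Hypothetical stand-in for ImportedPiece."""

    def notes(self):
        """Return a table of notes and rests."""
        return None


piece = ToyPiece()

# The docstring is the same whether reached through an instance or the class.
print(piece.notes.__doc__)     # Return a table of notes and rests.
print(ToyPiece.notes.__doc__)  # Return a table of notes and rests.

# help() prints the signature along with the docstring.
help(ToyPiece.notes)
```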

-----

## Sections in this guide

* [01_Introduction](01_Introduction.md)
* [02_NotesAndRests](02_NotesAndRests.md)
* [03_MelodicIntervals](03_MelodicIntervals.md)
* [04_HarmonicIntervals](04_HarmonicIntervals.md)
* [05_Ngrams](05_Ngrams.md)
* [06_Durations](06_Durations.md)
* [07_Lyrics](07_Lyrics.md)
* [08_Time-Signatures](08_TimeSignatures.md)
* [09_DetailIndex](09_DetailIndex.md)
* [10_Cadences](10_Cadences.md)
* [11_Pandas](11_Pandas.md)
106 changes: 106 additions & 0 deletions tutorial/02_NotesAndRests.md
@@ -0,0 +1,106 @@
# Finding the Notes and Rests in Pieces

## The `notes()` Function

After importing one or more pieces, the `notes()` function can be run to create a table of all of a piece's notes and rests, in order:

```python
piece.notes()
```

The results are displayed as a table in which each row marks a point where a new note or rest begins in at least one voice. The left-most column is an index of offsets: 0 is the start of the piece, and 1 unit of offset equals a quarter note. Because new events do not occur at regular time intervals, this index will not necessarily be evenly spaced. Each remaining column represents a different voice of the piece, as indicated by the column headings.

By default, displaying `piece.notes()` shows only the first and last five rows of the table, that is, the first and last five points in the piece at which any voice changes note. To control how many rows are shown, use `head()` or `tail()`.

To print only the first 20 rows of the table:

```python
piece.notes().head(20)
```

To print only the last 20 rows of the table:

```python
piece.notes().tail(20)
```

In its simplest form, the `notes()` function produces a DataFrame of each note or group of rests, but its output can be modified through its parameters:

* `combineUnisons`, which controls how consecutive pitch repetitions are treated; default treats unisons as separate notes
* `combineRests`, which controls how consecutive rests are treated; default sums consecutive rests into a single extended rest

## `notes()` Parameters

### Dealing with Consecutive Pitch Repetition: The `combineUnisons` Parameter:

A unison occurs when a new note is sounded but the pitch remains the same (e.g. a C5 half note followed by a C5 quarter note). The `notes()` function has a parameter called `combineUnisons`, which defaults to `False`. When `combineUnisons` is set to `True`, any unisons are treated as a continuation of the previous note, effectively adding a tie between those notes. As a result, the table output of the `notes()` function will not print anything at the offset of the repeated note. The `combineUnisons` parameter may be set as follows:

```python
piece.notes(combineUnisons = True)
# Or, the default value:
piece.notes(combineUnisons = False)
```

![Alt text](images/notes_2.png)

The `head()` function can be combined with `notes(combineUnisons = True)` as shown in the following examples:

```python
whole_piece = piece.notes(combineUnisons = True)
whole_piece.head(20)
```

Or, more directly:

```python
piece.notes(combineUnisons = True).head(20)
```

The first example stores the result in a variable and then applies a function to it, while the second chains the two functions together on a single line. Beyond the CRIM Intervals library, the first approach is often more efficient when coding in general, because the stored result can be reused without recomputing the same statement each time it is needed, which saves time.

### Dealing with Consecutive Rests: The `combineRests` Parameter:

The `combineRests` parameter works similarly to the `combineUnisons` parameter: any rests in the piece that do not precede the first non-rest note are combined with neighboring rests (e.g. three measures of whole rests in a row will be treated as a single rest three measures long). By default, the `combineRests` parameter of the `notes()` function is set to `True`; note that this is different from the default of the `combineUnisons` parameter. It can be controlled in the same way as `combineUnisons`:

```python
piece.notes(combineRests = False)
# Or, the default value:
piece.notes(combineRests = True)
```

![Alt text](images/_notes_3.png)

Or, once again, combined with the `head()` function:

```python
piece_separate_rests = piece.notes(combineRests = False)
piece_separate_rests.head(20)
```

Additionally, the `combineRests` and `combineUnisons` parameters may be changed at the same time, as follows:

```python
piece.notes(combineRests = False, combineUnisons = True).head(20)
```
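To make the two parameters concrete, here is a sketch of their effect on a single voice, written as a list of `(offset, label)` pairs where each pair marks where a note or rest begins. `toy_simplify` is hypothetical and follows only the behavior described above (including leaving rests before the first note alone); the `"Rest"` label and the data are assumptions, and this is not the CRIM Intervals implementation:

```python
def toy_simplify(events, combine_unisons=False, combine_rests=True):
    """Hypothetical sketch of the combineUnisons / combineRests behavior for one voice."""
    kept = []
    seen_note = False  # rests before the first note are not combined
    for offset, label in events:
        is_rest = label == "Rest"
        previous = kept[-1][1] if kept else None
        if combine_unisons and not is_rest and label == previous:
            continue  # repeated pitch: treated as a continuation of the previous note
        if combine_rests and is_rest and previous == "Rest" and seen_note:
            continue  # rest after a rest: merged into one longer rest
        if not is_rest:
            seen_note = True
        kept.append((offset, label))
    return kept


# Made-up voice: two opening rests, a repeated C5, then two rests mid-piece.
voice = [(0, "Rest"), (4, "Rest"), (8, "C5"), (10, "C5"), (11, "D5"),
         (12, "Rest"), (16, "Rest"), (20, "E5")]

defaults = toy_simplify(voice)
# [(0, 'Rest'), (4, 'Rest'), (8, 'C5'), (10, 'C5'), (11, 'D5'), (12, 'Rest'), (20, 'E5')]
swapped = toy_simplify(voice, combine_unisons=True, combine_rests=False)
# [(0, 'Rest'), (4, 'Rest'), (8, 'C5'), (11, 'D5'), (12, 'Rest'), (16, 'Rest'), (20, 'E5')]
```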

### Dealing with "NaN" Outputs: The `fillna()` Function

It is important to realize that in code, `0` is not the same as nothing. The former is an integer value, while the latter is the absence of any value. These absent values are often called 'nulls' or, in Python and Pandas, 'NaN' values. Applying the CRIM Intervals functions to a piece will inevitably produce some 'NaN' values. This is because the `notes()` function, for example, marks each point where a pitch or rest *begins*, but not where it is *held*. As a result, wherever a voice is holding a note at a given offset, that voice's cell in the row will contain 'NaN'.

To reduce visual clutter, these 'NaN' values can be replaced with the `fillna()` function. `fillna()` accepts a parameter containing the value that should replace the 'NaN' elements of the `notes()` output table. This can be an empty string or another symbol such as '-'. Any of the following are valid ways to replace 'NaN' values with a less obtrusive symbol (though `fillna(0)` is somewhat more specialized):

```python
piece.notes().fillna('')
piece.notes().fillna('-')
piece.notes().fillna(0)
```

![Alt text](images/notes_4.png)

Note that the parameter of the `fillna()` function need not be a text string: any valid value can be provided, such as an integer. Later we will see cases where replacing 'NaN' values with 0 rather than a text string is the better choice, but often it is simply helpful to pass some unobtrusive symbol to `fillna()` for the benefit of a human reader.

Once again, `.head()` can be added to the same line:

```python
piece.notes().fillna('-').head(20)
```
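Why might `fillna(0)` be the better choice in some cases? Filling with a symbol turns a numeric column into text, while filling with 0 keeps it numeric so arithmetic still works. The small table of counts below is made up purely to illustrate this Pandas behavior:

```python
import numpy as np
import pandas as pd

# Made-up numeric table, with NaN where nothing was found.
counts = pd.DataFrame({"Superius": [2.0, np.nan, 1.0], "Tenor": [np.nan, 3.0, 1.0]})

for_reading = counts.fillna("-")  # easier to read, but the columns become text (object dtype)
for_math = counts.fillna(0)       # stays numeric

# Adding columns: NaN + number gives NaN, while 0 + number gives the number.
raw_total = counts["Superius"] + counts["Tenor"]        # NaN, NaN, 2.0
filled_total = for_math["Superius"] + for_math["Tenor"]  # 2.0, 3.0, 2.0
```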

## More About Measures, Beats, and Offsets: The `detailIndex()` Function

By default, the `notes()` function returns a DataFrame indexed by offsets: that is, each event is located by its distance, in quarter notes, from the start of the piece. This is useful for measuring time distances between events, but not for a human reader who wants to refer back to the musical score itself. Measure and beat indexes can easily be added by passing the result of the function to the `detailIndex()` function, as shown:

```python
notes_rests = piece.notes()
notes_rest_DI = piece.detailIndex(notes_rests)
```

For more information about the `detailIndex` function, consult [the function's documentation](09_DetailIndex.md).
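The arithmetic behind a measure and beat index can be sketched for the simplest case. `toy_measure_beat` is hypothetical and assumes a single unchanging time signature, no pickup measure, and beats counted in quarter notes starting from 1; `detailIndex()` reads the actual meters from the score and may count beats differently:

```python
def toy_measure_beat(offset, quarters_per_measure=4.0):
    """Hypothetical sketch: convert an offset (in quarter notes from the start)
    to a (measure, beat) pair under one constant time signature."""
    # Whole measures already elapsed, counted from measure 1.
    measure = int(offset // quarters_per_measure) + 1
    # Position within the measure, counted from beat 1.
    beat = offset % quarters_per_measure + 1
    return measure, beat


# In 4/2 time a measure holds 8 quarter notes, so offset 10 is measure 2, beat 3.
print(toy_measure_beat(10, quarters_per_measure=8))  # (2, 3)
```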

-----

## Sections in this Guide

* [01_Introduction](01_Introduction.md)
* [02_NotesAndRests](02_NotesAndRests.md)
* [03_MelodicIntervals](03_MelodicIntervals.md)
* [04_HarmonicIntervals](04_HarmonicIntervals.md)
* [05_Ngrams](05_Ngrams.md)
* [06_Durations](06_Durations.md)
* [07_Lyrics](07_Lyrics_Homorhythm.md)
* [08_TimeSignatures_BeatStrength](08_TimeSignatures_BeatStrength.md)
* [09_DetailIndex](09_DetailIndex.md)
* [10_Cadences](10_Cadences.md)
* [11_Pandas](11_Pandas.md)
* [12_Modules](12_Modules.md)