A database linking piano and orchestral MIDI scores with application to automatic projective orchestration
A DATABASE LINKING PIANO AND ORCHESTRAL MIDI SCORES WITH APPLICATION TO AUTOMATIC PROJECTIVE ORCHESTRATION

Léopold Crestel 1, Philippe Esling 1, Lena Heng 2, Stephen McAdams 2
1 Music Representations, IRCAM, Paris, France
2 Schulich School of Music, McGill University, Montréal, Canada
leopold.crestel@ircam.fr

ABSTRACT

This article introduces the Projective Orchestral Database (POD), a collection of MIDI scores composed of pairs linking piano scores to their corresponding orchestrations. To the best of our knowledge, this is the first database of its kind, which supports piano or orchestral prediction, but more importantly makes it possible to learn the correlations between piano and orchestral scores. Hence, we also introduce the projective orchestration task, which consists in learning how to perform the automatic orchestration of a piano score. We show how this task can be addressed using learning methods and provide methodological guidelines for the proper use of this database.

1. INTRODUCTION

Orchestration is the subtle art of writing musical pieces for the orchestra by combining the properties of various instruments in order to achieve a particular musical idea [11, 23]. Among the variety of writing techniques for orchestra, we define as projective orchestration [8] the technique which consists in first writing a piano score and then orchestrating it (akin to a projection operation, as depicted in Figure 1). This technique has been used by classical composers for centuries. One such example is the orchestration by Maurice Ravel of Pictures at an Exhibition, a piano work written by Modest Mussorgsky. This paper introduces the first dataset of musical scores dedicated to projective orchestration. It contains pairs of piano pieces associated with their orchestrations written by famous composers.
Hence, the purpose of this database is to offer a solid basis for studying the correlations involved in the transformation from a piano to an orchestral score.

The remainder of this paper is organized as follows. First, the motivations for a scientific investigation of orchestration are exposed (Section 2). By reviewing the previous attempts, we highlight the specific need for a symbolic database of piano and corresponding orchestral scores. In an attempt to fill this gap, we built the Projective Orchestral Database (POD) and detail its structure in Section 3. In Section 4, the automatic projective orchestration task is proposed as an evaluation framework for automatic orchestration systems. We report our experiment with a set of learning-based models derived from the Restricted Boltzmann Machine [26] and present their performance in the previously defined evaluation framework. Finally, in Section 5 we provide methodological guidelines and conclusions.

© Léopold Crestel, Philippe Esling, Lena Heng, Stephen McAdams. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Léopold Crestel, Philippe Esling, Lena Heng, Stephen McAdams. "A database linking piano and orchestral MIDI scores with application to automatic projective orchestration", 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017.

Figure 1. Projective orchestration of the first three bars of Modest Mussorgsky's piano piece Pictures at an Exhibition by Maurice Ravel. Piano notes are assigned to one or several instruments, possibly with doubling or harmonic enhancement.

2. A SCIENTIFIC INVESTIGATION OF ORCHESTRATION

Over the past centuries, several treatises have been written by renowned composers in an attempt to decipher some guiding rules in orchestration [11, 21, 23].
Even though they present a remarkable set of examples, none of them builds a systematic set of rules towards a comprehensive theory of orchestration. The reason behind this lack lies in the tremendous complexity that emerges from orchestral works. A large number of possible sounds can be created by combining the pitch and intensity ranges of each instrument in a symphonic orchestra. Furthermore, during a performance, the sound produced by a mixture of instruments is also the result of highly non-linear acoustic effects. Finally, the way we perceive those sounds involves complex psychoacoustic phenomena [14, 16, 25]. It seems almost impossible for a human mind to grasp in its entirety the intertwined mechanisms of an orchestral rendering.

Hence, we believe that a thorough scientific investigation could help disentangle the multiple factors involved in orchestral works. This could provide a first step towards a greater understanding of this complex and widely uncharted discipline. Recently, major works have refined our understanding of the perceptual and cognitive mechanisms specifically involved when listening to instrumental mixtures [15, 22, 25]. Orchids, an advanced tool for assisting composers in the search for a particular sonic goal, has been developed [8]. It relies on the multi-objective optimization of several spectro-temporal features such as those described in [20].

However, few attempts have been made to tackle a scientific exploration of orchestration based on the study of musical scores. Yet, symbolic representations implicitly convey high-level information about the spectral knowledge composers have exploited for timbre manipulations. In [6], a generative system for orchestral music is introduced. Given a certain style, the system is able to generate a melodic line and its accompaniment by a full symphonic orchestra.
Their approach relies on a set of templates and hand-designed rules characteristic of different styles. [19] is a case study of how to automatically transfer the Ode to Joy to different styles. Unfortunately, very few details are provided about the models used, but it is interesting to observe that different models are used for different styles. Automatic arrangement, which consists in reducing an orchestral score to a piano version that can be played by a two-hand pianist, has been tackled in [10] and [24]. The proposed systems rely on an automatic analysis of the orchestral score in order to split it into structuring elements. Then, each element is assigned a role which determines whether it is played or discarded in the reduction. To the best of our knowledge, the inverse problem of automatically orchestrating a piano score has never been tackled. However, we believe that unknown mechanisms of orchestration could be revealed by observing how composers perform projective orchestration, which essentially consists in highlighting the existing harmonic, rhythmic and melodic structure of a piano piece through a timbral structure.

Even though symbolic data are generally regarded as a more compact representation than a raw signal in the computer music field, the number of pitch combinations that a symphonic orchestra can produce is extremely large. Hence, the manipulation of symbolic data still remains costly from a computational point of view. Even through computer analysis, an exhaustive investigation of all the possible combinations is not feasible. For that reason, the approaches found in the literature rely heavily on heuristics and hand-designed rules to limit the number of possible solutions and decrease the complexity. However, recent advances in machine learning have brought techniques that can cope with the dimensionality involved in symbolic orchestral data.
Besides, even if a wide range of orchestrations exists for a given piano score, all of them will share strong relations with the original piano score. Therefore, we make the assumption that projective orchestration might be a relatively simple and well-structured transformation lying in a complex high-dimensional space. Neural networks have precisely demonstrated a spectacular ability to extract a structured lower-dimensional manifold from a high-dimensional entangled representation [13]. Hence, we believe that statistical tools are now powerful enough to support a scientific investigation of projective orchestration based on symbolic data.

These statistical methods require an extensive amount of data, but there is no symbolic database dedicated to orchestration. This dataset is a first attempt to fill this gap by building a freely accessible symbolic database of piano scores and corresponding orchestrations.

3. DATASET

3.1 Structure of the Database

The database can be found on the companion website 1 of this article, along with statistics and Python code for reproducibility.

3.1.1 Organization

The Projective Orchestral Database (POD) contains 392 MIDI files. Those files are grouped in pairs containing a piano score and its orchestral version. Each pair is stored in a folder indexed by a number. The files have been collected from several free-access databases [1] or created by professional orchestration teachers.

3.1.2 Instrumentation

As the files gathered in the database have various origins, different instrument names were found under a variety of aliases and abbreviations. Hence, we provide a comma-separated value (CSV) file associated with each MIDI file in order to normalize the corresponding instrumentations. In these files, the track names of the MIDI files are linked to a normalized instrument name.
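As a minimal sketch, such an instrumentation CSV can be read as a mapping from raw track names to normalized instrument names. The column names and the alias rows below are illustrative assumptions, not the database's actual header or vocabulary:

```python
import csv
import io

# Hypothetical instrumentation CSV: each row links a raw MIDI track name
# (alias or abbreviation) to a normalized instrument name.
csv_text = """track_name,instrument
Vln I,violin
Violins 2,violin
Kl. 1,clarinet
"""

def load_instrumentation(fp):
    """Return a dict mapping raw track names to normalized instrument names."""
    return {row["track_name"]: row["instrument"] for row in csv.DictReader(fp)}

mapping = load_instrumentation(io.StringIO(csv_text))
print(mapping["Vln I"])  # → violin
```

In practice, one such mapping per MIDI file lets heterogeneous track names from different sources be merged into a single, consistent instrument vocabulary.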
3.1.3 Metadata

For each folder, a CSV file with the name of the folder contains the relative path from the database root directory, the composer name and the piece name for the orchestral and piano works. A list of the composers present in the database can be found in Table 1. It is important to note the imbalanced representativeness of composers in the database. It can be problematic in the learning context we investigate, because a kind of stylistic consistency is a priori necessary in order to extract a coherent set of rules. Picking a subset of the database would be one solution, but another possibility would be to add this stylistic information to the database and use it in a learning system.

1 https://qsdfo.github.io/LOP/database

Arcadelt, Jacob: 1 (0.07)
Arresti, Floriano: 3 (0.57)
Bach, Anna Magdalena: 3 (0.43)
Bach, Johann Sebastian: 9 (4.57) / 4 (0.81)
Banchieri, Adriano: 1 (0.32)
Beethoven, Ludwig van: 1 (0.60) / 38 (42.28)
Berlioz, Hector: 1 (0.14)
Brahms, Johannes: 3 (0.28)
Buxtehude, Dietrich: 1 (0.21)
Byrd, William: 1 (0.13)
Charpentier, Marc-Antoine: 2 (0.38)
Chopin, Frederic: 2 (0.44)
Clarke, Jeremiah: 1 (0.23)
Debussy, Claude: 1 (0.59) / 6 (0.90)
Dvorak, Anton: 6 (2.42)
Erlebach, Philipp Heinrich: 1 (0.10)
Faure, Gabriel: 1 (0.60)
Fischer, Johann Caspar Ferdinand: 1 (0.10)
Gluck, Christoph Willibald: 1 (1.61)
Grieg, Edvard: 1 (2.10)
Guerrero, Francisco: 1 (0.12)
Handel, George Frideric: 4 (1.00) / 1 (0.75)
Haydn, Joseph: 6 (1.01)
Kempff, Wilhelm: 1 (1.58)
Leontovych, Mykola: 2 (0.22)
Liszt, Franz: 34 (39.98)
Mahler, Gustav: 1 (0.85)
Mendelssohn, Felix: 2 (1.41)
Moussorgsky, Modest: 1 (0.04)
Mozart, Wolfgang Amadeus: 1 (0.71) / 8 (1.45)
Okashiro, Chitose: 3 (1.09)
Pachelbel, Johann: 1 (0.15)
Praetorius, Michael: 2 (0.14)
Purcell, Henry: 1 (0.08)
Ravel, Maurice: 6 (6.49) / 8 (6.69)
Rondeau, Michel: 2 (0.25) / 1 (0.14)
Schonberg, Arnold: 1 (0.21)
Schumann, Robert: 1 (0.05)
Shorter, Steve: 1 (0.26)
Smetana, Bedrich: 1 (0.61)
Soler, Antonio: 1 (0.54)
Strauss, Johann: 1 (0.04)
Strauss, Richard: 1 (0.22)
Stravinsky, Igor: 4 (0.94)
Tchaikovsky, Piotr Ilyich: 36 (20.08)
Telemann, Georg Philipp: 2 (1.04)
Unknown: 107 (40.18) / 28 (7.47)
Vivaldi, Antonio: 4 (2.94)
Walther, Johann Gottfried: 1 (0.14)
Wiberg, Steve: 1 (0.75)
Zachow, Friedrich Wilhelm: 1 (0.32) / 2 (0.23)

Table 1. Relative importance of the different composers present in the database. Each entry gives the number of files followed, in parentheses, by the percentage of frames; where two pairs are given, the first refers to piano scores and the second to orchestral scores, and entries with a single pair have files in only one of the two categories. The total number of files is 392. As the length of the files can vary significantly, a more meaningful indicator of a composer's representativeness in the database is the ratio of the number of frames from his or her scores to the total number of frames in the database.

Figure 2 highlights the activation ratio of each pitch in the orchestral scores over the whole dataset, defined as #{pitch on} / (#{pitch on} + #{pitch off}), where # denotes the cardinality of a set. Note that this activation ratio does not take the duration of notes into consideration, but only their number of occurrences. The pitch range of each instrument can be observed beneath the horizontal axis. Two different kinds of imbalance can be observed in Figure 2. First, a given pitch is rarely played. Second, some pitches are played more often than others. Class imbalance is known to be problematic for machine learning systems, and these two observations highlight how challenging the projective orchestration task is.

[Figure 2: bar chart of the activation ratio per pitch; instrument pitch ranges shown on the horizontal axis: Vln. (40, 101), Fl. (38, 101), Tba. (21, 66), Bsn. (21, 77), Org. (35, 88), Ob. (54, 94), Picc. (59, 111), Horn (25, 93), Vc. (21, 85), Tbn. (25, 81), Vla. (40, 92), Voice (31, 88), Db. (8, 68), Tpt. (42, 92), Clar. (35, 98), Hp. (20, 107).]

Figure 2. Activation ratio per pitch in the whole orchestral score database.
For one bin on the horizontal axis, the height of the bar represents the number of notes played by this instrument divided by the total number of frames in the database. This value is computed for the event-level aligned representations (Section 4.2). The pitch axis covers the different instruments, and one can observe the peaks formed by their medium ranges. The maximum value on the vertical axis (0.06), which is well below 1, indicates that each pitch is rarely played in the whole database. More statistics about the whole database can be found on the companion website.

3.1.4 Integrity

Both the metadata and instrumentation CSV files have been automatically generated but manually checked. We followed a conservative approach by automatically rejecting any score with the slightest ambiguity between a track name and a possible instrument (for instance, bass can refer to double-bass or voice bass).

3.1.5 Formats

To facilitate research work, we provide pre-computed piano-roll representations such as the one displayed in Figure 3. In this case, all the MIDI files of the piano (respectively orchestral) works have been transformed and concatenated into a unique two-dimensional matrix. The starting and ending time of each track is indicated in the metadata.pkl file. These matrices can be found in Lua/Torch (.t7), Matlab (.m), Python (.npy) and raw (.csv) data formats.

3.1.6 Score Alignment

Two versions of the database are provided. The first version contains unmodified MIDI files. The second version contains MIDI files automatically aligned using the Needleman-Wunsch algorithm [18], as detailed in Section 3.2.

[Figure 3: the first bars of an orchestral brass score (horns 1.2 and 3.4, trumpets 1-3, trombones 1.2, bass trombone, tuba) shown alongside its piano-roll representation, with time on the horizontal axis and pitch on the vertical axis.]

Figure 3. Piano-roll representation of orchestral scores. The piano-roll pr is a matrix. A pitch p at time t played with an intensity i is represented by pr(p, t) = i, where 0 is a note off. This definition is extended to an orchestra by simply concatenating the piano-rolls of every instrument along the pitch dimension.

3.2 Automatic Alignment

Given the diverse origins of the MIDI files, a piano score and its corresponding orchestration are almost never aligned temporally. These misalignments are very problematic for learning or mining tasks, and in general for any processing which intends to take advantage of the joint information provided by the piano and orchestral scores. Hence, we propose a method to automatically align two scores, and released its Python implementation on the companion website 2. More precisely, we consider the piano-roll representations (Figure 3), where the scores are represented as sequences of vectors. By defining a distance between two vectors, the problem of aligning two scores can be cast as a univariate sequence-alignment problem.

3.2.1 Needleman-Wunsch

The Needleman-Wunsch (NW) algorithm [18] is a dynamic programming technique which finds the optimal alignment between two symbolic sequences by allowing the introduction of gaps (empty spaces) in the sequences. An application of the NW algorithm to the automatic alignment of musical performances is introduced in [9].
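Since the alignment operates on sequences of piano-roll vectors, it may help to sketch the representation defined in the Figure 3 caption (pr(p, t) = i, 0 = note off) and the concatenation of instruments along the pitch dimension. The note-tuple format below is a hypothetical simplification; real MIDI parsing is omitted:

```python
import numpy as np

def notes_to_pianoroll(notes, n_frames, n_pitches=128):
    """Build a piano-roll matrix with pr[p, t] = intensity (0 = note off).

    notes: list of (pitch, onset_frame, offset_frame, intensity) tuples,
    an illustrative format standing in for parsed MIDI events.
    """
    pr = np.zeros((n_pitches, n_frames), dtype=np.int16)
    for pitch, onset, offset, intensity in notes:
        pr[pitch, onset:offset] = intensity
    return pr

def orchestra_pianoroll(instrument_rolls):
    """Extend the definition to an orchestra: concatenate the per-instrument
    piano-rolls along the pitch dimension."""
    return np.concatenate(instrument_rolls, axis=0)

# Toy example: a violin note (C4, frames 0-3) and a flute note (C5, frames 2-5).
violin = notes_to_pianoroll([(60, 0, 4, 80)], n_frames=8)
flute = notes_to_pianoroll([(72, 2, 6, 64)], n_frames=8)
orch = orchestra_pianoroll([violin, flute])  # shape (256, 8)
```

Each column of `orch` is then one of the vectors that the sequence-alignment step compares.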
As pointed out in that article, NW is the most suitable technique for aligning two sequences with important structural differences, such as skipped parts. The application of the NW algorithm relies solely on the definition of a cost function, which allows the pairwise comparison of elements from the two sequences, and on the costs of opening or extending a gap in one of the two sequences.

2 https://qsdfo.github.io/LOP/code

3.2.2 Similarity Function

To measure the similarity between two chords, we propose the following process:

• Discard intensities by representing played notes as one and zero otherwise.

• Compute the pitch-class representation of the two vectors, which flattens all notes to a single-octave vector (12 notes). In our case, we set a pitch class to one if at least one note of that class is played. For instance, we set the pitch class of C to one if any note with pitch C is played in the piano-roll vector. This provides an extremely rough approximation of the harmony, which proved sufficient for aligning two scores. After this step, the dimension of each vector is 12.

• If one of the vectors contains only zeros, it represents a silence, and the similarity is automatically set to zero (note that the score function can take negative values).

• For two pitch-class vectors A and B, we define the score as

S(A, B) = C × ( Σ_{i=1..12} δ(A_i + B_i) ) / max(‖A + B‖_1, 1)    (1)

where δ is defined as

δ(x) = 0 if x = 0, −1 if x = 1, 1 if x = 2.

C is a tunable parameter and ‖x‖_1 = Σ_i |x_i| is the L1 norm. Based on the values recommended in [18] and our own experimentation, we set C to 10. The gap-open parameter, which defines the cost of introducing a gap in one of the two sequences, is set to 3, and the gap-extend parameter, which defines the cost of extending a gap in one of the two sequences, is set to 1.
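The similarity score of Equation (1) can be sketched directly in Python. This is a minimal illustration, not the released implementation; the helper `pitch_class_vector` and all names are our own:

```python
def pitch_class_vector(frame):
    """Collapse a 128-dim binary piano-roll frame to a 12-dim pitch-class
    vector: a class is 1 if at least one note of that class is played."""
    return [1 if any(frame[p] for p in range(pc, 128, 12)) else 0
            for pc in range(12)]

def similarity(A, B, C=10):
    """S(A, B) of Equation (1): a pitch class present in both vectors counts
    +1, a class present in only one counts -1, scaled by C and normalised by
    the L1 norm of A + B. A silence (all-zero vector) scores 0.
    Gap costs quoted in the text: gap-open = 3, gap-extend = 1."""
    if sum(A) == 0 or sum(B) == 0:
        return 0
    delta = {0: 0, 1: -1, 2: 1}
    num = sum(delta[a + b] for a, b in zip(A, B))
    # For binary vectors, ||A + B||_1 is simply sum(A) + sum(B).
    return C * num / max(sum(A) + sum(B), 1)

c_and_g = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(similarity(c_and_g, c_and_g))  # → 5.0 (identical chords)
```

Note that identical non-silent vectors always score C/2, while fully disjoint chords score negatively, which is what lets NW prefer matching harmonically similar frames over opening gaps.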
4. AN APPLICATION: PROJECTIVE AUTOMATIC ORCHESTRATION

In this section, we introduce and formalize the automatic projective orchestration task (Figure 1). In particular, we propose a system based on statistical learning and define an evaluation framework for using the POD database.

4.1 Task Definition

4.1.1 Orchestral Inference

For each orchestral piece, we define as O and P the aligned sequences of column vectors from the piano-rolls of the orchestral and piano parts. We denote as T the length of the aligned sequences O and P.

The objective of this task is to infer the present orchestral frame knowing both the recent past of the orchestral sequence and the present piano frame. Mathematically, it consists in designing a function f such that

Ô(t) = f[P(t), O(t−1), ..., O(t−N)]   ∀ t ∈ [N, ..., T]    (2)

where N defines the order of the model.

4.1.2 Evaluation Framework

We propose a quantitative evaluation framework based on a one-step predictive task. As discussed in [5], we make the assumption that an accurate predictive model will be able to generate original, acceptable works. Whereas evaluating the generation of a complete musical score is subjective and difficult to quantify, a predictive framework provides us with a quantitative evaluation of the performance of a model. Indeed, many satisfying orchestrations can be created from the same piano score. However, the number of reasonable inferences of an orchestral frame given its context (as described in Equation 2) is much more limited.

As suggested in [4, 12], the accuracy measure [2] can be used to compare an inferred frame Ô(t) drawn from (2) to the ground truth O(t) from the original file:

Accuracy(t) = 100 · TP(t) / ( TP(t) + FP(t) + FN(t) )    (3)

where TP(t) (true positives) is the number of notes correctly predicted (notes played in both Ô(t) and O(t)).
FP(t) (false positives) is the number of predicted notes that are not in the original sequence (notes played in Ô(t) but not in O(t)). FN(t) (false negatives) is the number of unreported notes (notes absent from Ô(t) but played in O(t)).

When the quantization gets finer, we observed that a model which simply repeats the previous frame gradually obtains the best accuracy, as displayed in the results table. To correct this bias, we recommend using an event-level evaluation framework where the comparison between the ground truth and the model's output is only performed for time indices in T_e, defined as the set of indices t_e such that

O(t_e) ≠ O(t_e − 1).

The definition of event-level indices is illustrated in Figure 4. In the context of learning algorithms, splitting the database into disjoint train and test subsets is highly recommended [3, pp. 32-33], and the performance of a given model is only assessed on the test subset. Finally, the mean accuracy measure over the dataset is given by

(1/K) · Σ_{s ∈ D_test} Σ_{t_e ∈ T_e(s)} Accuracy(t_e)    (4)

where D_test defines the test subset, T_e(s) the set of event-time indices for a given score s, and K = Σ_{s ∈ D_test} |T_e(s)|.

4.2 Proposed Model

In this section, we propose a learning-based approach to tackle the automatic orchestral inference task.

4.2.1 Models

We present the results for two models called the conditional Restricted Boltzmann Machine (cRBM) and the Factored Gated cRBM (FGcRBM). The models we explored are defined in a probabilistic framework, where the vectors O(t) and P(t) are represented as binary random variables. The orchestral inference function is a neural network that expresses the conditional dependencies between the different variables: the present orchestral frame O(t), the present piano frame P(t) and the past orchestral frames O(t−1, ..., t−N).
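The evaluation framework above (Equation 3, the event indices T_e and Equation 4) can be sketched as follows. Treating t = 0 as an event and scoring two empty frames as a perfect match are conventions we assume, not choices stated in the text:

```python
import numpy as np

def frame_accuracy(pred, truth):
    """Accuracy(t) = 100 * TP / (TP + FP + FN) between two binary frames (Eq. 3)."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    denom = tp + fp + fn
    return 100.0 * tp / denom if denom else 100.0  # two empty frames: assumed 100

def event_indices(roll):
    """T_e = { t : O(t) != O(t-1) } on a (pitch, time) piano-roll;
    t = 0 is kept as a first event (our assumption)."""
    diff = np.any(roll[:, 1:] != roll[:, :-1], axis=0)
    return [0] + [int(t) + 1 for t in np.flatnonzero(diff)]

def mean_event_accuracy(pred_roll, truth_roll):
    """Event-level mean accuracy for a single score (Eq. 4 restricted to one s)."""
    te = event_indices(truth_roll)
    return sum(frame_accuracy(pred_roll[:, t], truth_roll[:, t]) for t in te) / len(te)

# Worked example from the discussion in Section 4.3: if only one note out of
# five changes between two events, a repeat model scores 4 / (4 + 1 + 1).
prev = [1, 1, 1, 1, 1, 0]
curr = [1, 1, 1, 1, 0, 1]
print(round(frame_accuracy(prev, curr), 1))  # → 66.7
```

Over a test set, Equation (4) simply pools the per-event accuracies of all scores before averaging, rather than averaging per-score means.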
Hidden units are introduced to model the co-activation of these variables. Their number is a hyper-parameter with an order of magnitude of 1000. A theoretical introduction to these models can be found in [26], whereas their application to projective orchestration is detailed in [7].

4.2.2 Data Representation

In order to process the scores, we import them as piano-roll matrices (see Figure 3). Their extension to orchestral scores is obtained by concatenating the piano-rolls of each instrument along the pitch dimension.

Then, event indices t_e ∈ T_e are extracted from both piano-rolls as described in Section 4.1. A consequence is that the trained model apprehends the scores as a succession of events with no rhythmic structure. This is a simplification that considers the rhythmic structure of the projected orchestral score to be exactly the same as that of the original piano score. This is false in the general case, since a composer can decide to add nonexistent events in an orchestration. However, it provides a reasonable approximation that holds in a vast majority of cases. During the generation of an orchestral score given a piano score, the next orchestral frame is predicted in the event-level framework, but inserted at the temporal location of the corresponding piano frame, as depicted in Figure 4.

Automatic alignment of the two piano-rolls is performed on the event-level representations, as described in Section 3.2.

In order to reduce the input dimensionality, we systematically remove any pitch which is never played in the training database for each instrument. With this simplification, the dimension of the orchestral vector typically decreases from 3584 to 795, and the piano vector dimension from 128 to 89. Also, we follow the usual simplifications made when writing orchestral scores by grouping together all the instruments of a same section.
For instance, the violin section, which might be composed of several instrumentalists, is written as a single part. Finally, the velocity information is discarded, since we use binary units that solely indicate whether a note is on or off.

Eventually, we observed that an important proportion of the frames are silences, which mathematically correspond to column vectors filled with zeros in the piano-roll representation. A consequence of this over-representation of silences is that a model trained on this database will lean towards orchestrating any piano input with a silence, which is statistically the most relevant choice. Therefore, orchestrations of silences in the piano score (P(t) = 0) are not used as training points. However, it is important to note that they are not removed from the piano-rolls. Hence, silences can still appear in the past sequence of a training point, since they are valuable information regarding the structure of the piece. At generation time, the silences in the piano score are automatically orchestrated with a silence in the orchestral score. Besides, silences are taken into consideration when computing the accuracy.

Figure 4. From a piano score, the generation of an orchestral score consists in extracting the event-level representation of the piano score, generating the sequence of orchestral events, and then injecting them at the positions of the events from the piano score. Note that the silence in the fourth event of the piano score is not orchestrated by the probabilistic model, but is automatically mapped to a silence in the orchestral version.

4.2.3 Results

The results of the cRBM and FGcRBM on the orchestral inference task are compared to two naive models. The first model is a random generation of the orchestral frames obtained by sampling a Bernoulli distribution of parameter 0.5.
The second model predicts an orchestral frame at time t by simply repeating the frame at time t−1. The results are summed up in the results table.

4.3 Discussion

As expected, the random model obtains very poor results. The repeat model outperforms the three other models, surprisingly even in the event-level framework. Indeed, we observed that repeated notes still occur frequently in the event-level framework. For instance, if between two successive events only one note out of five is modified, the accuracy of the repeat model on this frame will be equal to 66%. While the FGcRBM model outperforms the cRBM model in the frame-level framework, the cRBM is slightly better than the FGcRBM model in the event-level framework.

Generations from both models can be listened to on the companion website 3. Even though some fragments are coherent with respect to the piano score and the recent past orchestration, the results are mostly unsatisfying. Indeed, we observed that the models learn an extremely high probability for every note to be off. Using regularization methods such as weight decay has not proven efficient. We believe that this is due to the sparsity of the vectors O(t) we try to generate, and finding a better-adapted data representation of the input will be a crucial step.

5. CONCLUSION AND FUTURE WORK

We introduced the Projective Orchestral Database (POD), a collection of MIDI files dedicated to the study of the relations between piano scores and corresponding orchestrations. We believe that recent advances in machine learning and data mining have provided the proper tools to take advantage of this important mass of information and investigate the correlations between a piano score and its orchestrations. We provide all MIDI files freely, along with aligned and non-aligned pre-processed piano-roll representations, on the website https://qsdfo.github.io/LOP/index.html.

We proposed a task called automatic orchestral inference.
Given a piano score and a corresponding orchestration, it consists in trying to predict orchestral time frames, knowing the corresponding piano frame and the recent past of the orchestra. Then, we introduced an evaluation framework for this task based on a train/test split of the database and the definition of an accuracy measure. We finally presented the results of two models (the cRBM and FGcRBM) in this framework.

We hope that the POD will be useful to many researchers. Besides the projective orchestration task we defined in this article, the database can be used in several other applications, such as generating data for a source-separation model [17]. Even if small errors still persist, we thoroughly checked the database manually and guarantee its quality. However, the number of files collected is still small for the purpose of statistical investigation. Hence, we also hope that people will contribute to enlarging this database by sharing files and helping us gather the missing information.

3 https://qsdfo.github.io/LOP/results

6. REFERENCES

[1] IMSLP. http://imslp.org/wiki/Main_Page. Accessed: 2017-01-23.

[2] Mert Bay, Andreas F. Ehmann, and J. Stephen Downie. Evaluation of multiple-F0 estimation and tracking systems. In ISMIR, pages 315-320, 2009.

[3] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[4] Nicolas Boulanger-Lewandowski, Yoshua Bengio, and Pascal Vincent. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. arXiv preprint arXiv:1206.6392, 2012.

[5] Darrell Conklin and Ian H. Witten. Multiple viewpoint systems for music prediction. Journal of New Music Research, 24(1):51-73, 1995.

[6] J. Cookerly. Complete orchestration system, May 18, 2010. US Patent 7,718,883.

[7] Léopold Crestel and Philippe Esling. Live orchestral piano, a system for real-time orchestral music generation.
In Proceedings of the 14th Sound and Music Computing Conference, Aalto, Finland, July 2017.

[8] Philippe Esling, Grégoire Carpentier, and Carlos Agon. Dynamic musical orchestration using genetic algorithms and a spectro-temporal description of musical instruments. Applications of Evolutionary Computation, pages 371-380, 2010.

[9] Maarten Grachten, Martin Gasser, Andreas Arzt, and Gerhard Widmer. Automatic alignment of music performances with structural differences. In Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR), 2013.

[10] Jiun-Long Huang, Shih-Chuan Chiu, and Man-Kwan Shan. Towards an automatic music arrangement framework using score reduction. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 8(1):8, 2012.

[11] Charles Koechlin. Traité de l'orchestration. Éditions Max Eschig, 1941.

[12] Victor Lavrenko and Jeremy Pickens. Polyphonic music modeling with random fields. In Proceedings of the Eleventh ACM International Conference on Multimedia, pages 120-129. ACM, 2003.

[13] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436-444, 2015.

[14] Sven-Amin Lembke and Stephen McAdams. Timbre blending of wind instruments: acoustics and perception. 2012.

[15] Stephen McAdams. Timbre as a structuring force in music. In Proceedings of Meetings on Acoustics, volume 19, page 035050. Acoustical Society of America, 2013.

[16] Stephen McAdams and Bruno L. Giordano. The perception of musical timbre. The Oxford Handbook of Music Psychology, pages 72-80, 2009.

[17] M. Miron, J. Janer, and E. Gómez. Generating data to train convolutional neural networks for classical music source separation. In Proceedings of the 14th Sound and Music Computing Conference, pages 227-233, Aalto, Finland, 2017.

[18] Saul B. Needleman and Christian D. Wunsch.
A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443-453, 1970.

[19] François Pachet. A joyful ode to automatic orchestration. ACM Transactions on Intelligent Systems and Technology, 8(2):18:1-18:13, October 2016.

[20] Geoffroy Peeters, Bruno L. Giordano, Patrick Susini, Nicolas Misdariis, and Stephen McAdams. The timbre toolbox: Extracting audio descriptors from musical signals. The Journal of the Acoustical Society of America, 130(5):2902-2916, 2011.

[21] Walter Piston. Orchestration. New York: Norton, 1955.

[22] Daniel Pressnitzer, Stephen McAdams, Suzanne Winsberg, and Joshua Fineberg. Perception of musical tension for nontonal orchestral timbres and its relation to psychoacoustic roughness. Perception & Psychophysics, 62(1):66-80, 2000.

[23] Nikolay Rimsky-Korsakov. Principles of Orchestration. Russischer Musikverlag, 1873.

[24] Hirofumi Takamori, Haruki Sato, Takayuki Nakatsuka, and Shigeo Morishima. Automatic arranging musical score for piano using important musical elements. In Proceedings of the 14th Sound and Music Computing Conference, Aalto, Finland, July 2017.

[25] Damien Tardieu and Stephen McAdams. Perception of dyads of impulsive and sustained instrument sounds. Music Perception, 30(2):117-128, 2012.

[26] Graham W. Taylor and Geoffrey E. Hinton. Factored conditional restricted Boltzmann machines for modeling motion style. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1025-1032. ACM, 2009.