User Curated Shaping of Expressive Performances


Authors: Zhengshan Shi, Carlos Cancino-Chacon, Gerhard Widmer

Zhengshan Shi (CCRMA, Stanford University, USA), Carlos Cancino-Chacón (Austrian Research Institute for Artificial Intelligence, Vienna, Austria), Gerhard Widmer (Austrian Research Institute for Artificial Intelligence, Vienna, Austria, and Johannes Kepler University Linz, Austria). Correspondence to: Zhengshan Shi <kittyshi@ccrma.stanford.edu>, Carlos Cancino-Chacón <carlos.cancino@ofai.at>.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

Abstract

Musicians produce individualized, expressive performances by manipulating parameters such as dynamics, tempo and articulation. This manipulation of expressive parameters is informed by elements of score information such as pitch, meter, and tempo and dynamics markings (among others). In this paper we present an interactive interface that gives users the opportunity to explore the relationship between structural elements of a score and expressive parameters. This interface draws on the basis function models, a data-driven framework for expressive performance. In this framework, expressive parameters are modeled as a function of score features, i.e., numerical encodings of specific aspects of a musical score, using neural networks. With the proposed interface, users are able to weight the contribution of individual score features and understand how an expressive performance is constructed.

1. Introduction

The way a piece of music is performed expressively constitutes a very important aspect of our enjoyment of the music. In Western art music, performers convey expression in their performances through variations in expressive dimensions such as tempo, dynamics and articulation, among others. While most computational models of expressive performance allow for modeling only a single performance strategy, musicians can interpret a piece of music with a wide variety of stylistic and expressive inflections (Kirke & Miranda, 2013; Cancino-Chacón et al., 2018). Some computational models allow users to control global characteristics of the performance (like tempo and dynamics) in real time (Dixon et al., 2005; Chew et al., 2006; Baba et al., 2010). In this work we present a prototype of a system that allows users to generate individualized piano performances by weighting the contribution of individual aspects of the musical score to the overall performance.

The rest of this paper is structured as follows: Section 2 provides a brief overview of the basis function models. Section 3 describes the proposed extension to the basis function models that allows the user to weight the contribution of individual aspects of the score to shape expressive performances. Finally, the paper is concluded in Section 4.

2. Basis Function Models

The basis function models are a data-driven framework for modeling musical expressive performance of notated music (Grachten & Widmer, 2012; Cancino-Chacón & Grachten, 2016). In this framework, numerical representations of expressive dimensions such as tempo and dynamics (which we refer to as expressive parameters) are modeled as a function of score basis functions: numerical encodings of structural aspects of a musical score. These aspects include low-level notated features such as pitch and metrical information, as well as music-theoretic features and cognitively motivated features. More formally, an expressive parameter can be written as $y_i = f(\phi_i)$, where $\phi_i$ is a vector of basis functions evaluated on score element $x_i$ (e.g., a note or a position in the score, which we refer to as score onset) and $f(\cdot)$ is a (non-linear) function (the output of a neural network, as described below). For a thorough description of the basis function models, see (Cancino-Chacón, 2018).
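As an illustrative sketch of the relation $y_i = f(\phi_i)$ (not the authors' implementation), a small feed-forward network can map a hand-coded feature vector for one score element to a single expressive parameter. The feature names, network size, and random weights below are purely hypothetical:

```python
import numpy as np

# Hypothetical basis functions for one score element: numerical encodings
# of score aspects such as pitch, metrical position, and dynamics markings.
def basis_functions(note):
    return np.array([
        note["pitch"] / 127.0,           # normalized MIDI pitch
        note["metrical_strength"],       # e.g. 1.0 on a downbeat
        float(note["under_crescendo"]),  # 1.0 if inside a crescendo hairpin
    ])

rng = np.random.default_rng(0)

# A tiny untrained network f(phi): one hidden tanh layer, scalar output.
W1, b1 = rng.normal(size=(8, 3)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)

def f(phi):
    return (W2 @ np.tanh(W1 @ phi + b1) + b2)[0]

note = {"pitch": 60, "metrical_strength": 1.0, "under_crescendo": True}
phi = basis_functions(note)
y = f(phi)  # predicted expressive parameter, e.g. standardized MIDI velocity
print(round(float(y), 3))
```

In the actual models the networks are recurrent and trained on aligned performance data, as described in Section 2.2.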
2.1. Representing Performance Information

In order to capture the sequential nature of music, we divide the performance information into onset-wise and note-wise parameters. Onset-wise parameters capture aspects of the performance with respect to the corresponding temporal (score) position, while note-wise parameters capture aspects of the performance of each note:

2.1.1. Onset-wise Parameters

1. MIDI velocity trend (vt). The maximal MIDI velocity at each score onset.

2. Log beat period ratio (lbpr). The logarithm of the beat period, i.e., the time interval between consecutive beats, divided by the average beat period of the piece.

2.1.2. Note-wise Parameters

1. MIDI velocity deviations (vd). The deviation of the MIDI velocity of each note from the trend.

2. Timing (tim). Onset deviations of the individual notes from the grid established by the local beat period.

3. Articulation (art). The logarithm of the ratio of the actual duration of a performed note to its reference (notated) duration according to the local beat period.

These parameters are then standardized per piece to be zero-mean and unit variance.

2.2. Modeling Expressive Performance

We use bi-directional LSTMs to model both onset-wise and note-wise parameters, given their sequential nature. The input of the networks predicting onset-wise parameters are the basis functions evaluated at each score onset, while the input of the networks for note-wise parameters are the basis functions evaluated for every note. The models are trained in a supervised fashion to minimize the reconstruction error on the Magaloff/Chopin (Flossmann et al., 2010) and Zeilinger/Beethoven (Cancino-Chacón et al., 2017) datasets. These datasets consist of recordings of piano music performed on computer-controlled Bösendorfer grand pianos, which have been aligned to their scores.
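Following the definitions in Section 2.1, the lbpr and articulation parameters can be computed directly from beat times and note durations. The sketch below is a minimal illustration with toy data, not the authors' code; all variable names and values are assumptions:

```python
import numpy as np

# Performed beat times (seconds) at consecutive score beats (toy data).
beat_times = np.array([0.0, 0.50, 1.10, 1.55, 2.05])

# Log beat period ratio (lbpr): log of each inter-beat interval divided
# by the piece's average beat period.
beat_periods = np.diff(beat_times)
lbpr = np.log(beat_periods / beat_periods.mean())

# Articulation (art): log ratio of a note's performed duration to its
# notated duration, expressed via the local beat period.
performed_duration = 0.40            # seconds the key was actually held
notated_duration_beats = 1.0         # e.g. a quarter note, one beat long
local_beat_period = beat_periods[0]  # 0.5 s per beat around this note
art = np.log(performed_duration / (notated_duration_beats * local_beat_period))

# Parameters are then standardized per piece (zero mean, unit variance).
lbpr_std = (lbpr - lbpr.mean()) / lbpr.std()
print(lbpr.round(3), round(float(art), 3))
```

Here art is negative (log(0.8)), indicating the note was released early relative to its notated duration, i.e., played somewhat staccato.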
3. User-controlled Basis Function Models

In order to allow users to explore and adjust the contribution of individual score features, we first need to compute the contribution of each feature to the output of the model. One way to do so is to define a locally-linear approximation of the output of the neural network modeling each expressive parameter as follows:

$$\tilde{y}_i = c + \left( \frac{\partial}{\partial \phi} f(\phi^*) \right)^{T} (\phi_i - \phi^*), \qquad (1)$$

where $\tilde{y}_i$ is the approximated value of the expressive parameter for score element $x_i$; $c$ is a user-defined constant value (e.g., the average lbpr or MIDI velocity of the piece); $\phi_i$ is the vector of basis functions evaluated for score element $x_i$; and $\frac{\partial}{\partial \phi} f(\phi^*)$ is the gradient of $f$ with respect to $\phi$, evaluated at $\phi^*$. We can naturally extend this locally-linear approximation to onset-wise models by constructing a temporal Jacobian matrix, whose $ij$-th element can be interpreted as the "contribution" of the $j$-th basis function (e.g., the pitch, the inter-onset interval, etc.) to the performance of the $i$-th score onset.

Figure 1. The user interface, displaying the waveform of the predicted performance. Sliders are provided for the user to shape the performance; curves indicating the expressive parameters are updated as the user changes the sliders.

3.1. Interactive Interface

The interface allows users to explore the contribution of individual score descriptors (e.g., the velocity on downbeats, the timing surrounding a beat phase) by adjusting the scaling of each column of the temporal Jacobian matrix. Curves indicating velocity and beat period are updated to visualize the changes. The onset-wise and note-wise parameters are calculated with the locally-linear approximation, and a new performance is rendered and displayed for listening. Users can also indicate their preference for the overall tempo and articulation of the piece by adjusting the mean and standard deviation.
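The per-feature weighting of the temporal Jacobian can be sketched as follows. This is a toy illustration under stated assumptions: the Jacobian J is taken as given (in practice it would come from differentiating the trained network), and the sizes, feature names, and weights are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

n_onsets, n_features = 6, 3  # toy sizes; real models use many basis functions
feature_names = ["pitch", "downbeat", "crescendo"]  # hypothetical labels

# Temporal Jacobian: J[i, j] = contribution of basis function j to the
# expressive parameter at score onset i (Eq. 1, evaluated per onset).
J = rng.normal(size=(n_onsets, n_features))
phi = rng.normal(size=(n_onsets, n_features))  # basis functions per onset
phi_star = phi.mean(axis=0)                    # linearization point
c = 64.0                                       # e.g. mean MIDI velocity

def render(weights):
    """Locally-linear prediction with user-weighted feature columns."""
    w = np.asarray(weights)
    return c + ((J * w) * (phi - phi_star)).sum(axis=1)

neutral = render([1.0, 1.0, 1.0])     # unmodified approximation
emphasized = render([1.0, 2.0, 0.0])  # boost downbeats, mute crescendo

# Scaling a column rescales that feature's contribution at every onset;
# a zero weight removes it entirely.
assert not np.allclose(neutral, emphasized)
```

Setting all weights to zero recovers the constant $c$ at every onset, which matches Eq. (1) with the gradient term suppressed.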
In this way, they can shape the way a performance is rendered, while exploring the contribution of different musical dimensions (Figure 1).

4. Conclusions

In this paper we have presented a prototype of an interface that allows users to explore the contribution of individual score descriptors to the expressiveness of a performance. Such an interface could have potential pedagogical applications: users can interactively explore the complex patterns through which score features contribute to the overall expressiveness, while at the same time creating personalized interpretations by giving more importance to certain parameters.

Acknowledgements

This research has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme under grant agreement No. 670035 (project "Con Espressione").

References

Baba, T., Hashida, M., and Katayose, H. "VirtualPhilharmony": A Conducting System with Heuristics of Conducting an Orchestra. In Proceedings of the 10th International Conference on New Interfaces for Musical Expression, NIME 2010, pp. 263–270, Sydney, Australia, 2010.

Cancino-Chacón, C., Grachten, M., Goebl, W., and Widmer, G. Computational Models of Expressive Music Performance: A Comprehensive and Critical Review. Frontiers in Digital Humanities, 5:25, 2018. ISSN 2297-2668. doi: 10.3389/fdigh.2018.00025. URL https://www.frontiersin.org/article/10.3389/fdigh.2018.00025.

Cancino-Chacón, C. E. Computational Modeling of Expressive Music Performance with Linear and Non-linear Basis Function Models. PhD thesis, Johannes Kepler University Linz, Linz, Austria, 2018.

Cancino-Chacón, C. E. and Grachten, M. The Basis Mixer: A Computational Romantic Pianist. In Proceedings of the Late Breaking/Demo Session, 17th International Society for Music Information Retrieval Conference (ISMIR 2016), New York, NY, USA, 2016.

Cancino-Chacón, C. E., Gadermaier, T., Widmer, G., and Grachten, M. An Evaluation of Linear and Non-linear Models of Expressive Dynamics in Classical Piano and Symphonic Music. Machine Learning, 106(6):887–909, 2017.

Chew, E., Liu, J., and François, A. R. J. ESP: Roadmaps As Constructed Interpretations and Guides to Expressive Performance. In Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia, pp. 137–145, New York, NY, USA, 2006. ACM.

Dixon, S., Goebl, W., and Widmer, G. The "Air Worm": an Interface for Real-Time Manipulation of Expressive Music Performance. In Proceedings of the 2005 International Computer Music Conference (ICMC 2005), Barcelona, Spain, 2005.

Flossmann, S., Goebl, W., Grachten, M., Niedermayer, B., and Widmer, G. The Magaloff Project: An Interim Report. Journal of New Music Research, 39(4):363–377, 2010.

Grachten, M. and Widmer, G. Linear Basis Models for Prediction and Analysis of Musical Expression. Journal of New Music Research, 41(4):311–322, December 2012.

Kirke, A. and Miranda, E. R. An Overview of Computer Systems for Expressive Music Performance. In Kirke, A. and Miranda, E. R. (eds.), Guide to Computing for Expressive Music Performance, pp. 1–48. Springer-Verlag, London, UK, 2013.
