Eigentriads and Eigenprogressions on the Tonnetz
We introduce a new multidimensional representation, named eigenprogression transform, that characterizes some essential patterns of Western tonal harmony while being equivariant to time shifts and pitch transpositions. This representation is deep, mu…
Authors: Vincent Lostanlen
Vincent Lostanlen
Music and Audio Research Lab, New York University, New York, NY, USA

ABSTRACT

We introduce a new multidimensional representation, named eigenprogression transform, that characterizes some essential patterns of Western tonal harmony while being equivariant to time shifts and pitch transpositions. This representation is deep, multiscale, and convolutional in the piano-roll domain, yet incurs no prior training, and is thus suited to both supervised and unsupervised MIR tasks. The eigenprogression transform combines ideas from the spiral scattering transform, spectral graph theory, and wavelet shrinkage denoising. We report state-of-the-art results on a task of supervised composer recognition (Haydn vs. Mozart) from polyphonic music pieces in MIDI format.

1. EIGENTRIADS

Let $x[t, p] \in \mathcal{M}_{T,P}(\mathbb{R})$ be the piano-roll matrix of a musical piece, obtained either by parsing symbolic data or by extracting a melody salience representation from audio [2]. The constant $T$ (resp. $P$) is typically equal to $2^{10}$ (resp. $2^{7}$). Within the framework of twelve-tone equal temperament, we define the major and minor triads as the tuples $\mathcal{I}_1 = (0, 4, 7)$ and $\mathcal{I}_0 = (0, 3, 7)$. For each quality $q \in \mathbb{Z}_2$ and frequency $\beta_1 \in \mathbb{Z}_3$, let

$$\psi^{\mathrm{triad}}_{\beta_1,q}[p] = \sum_{n=1}^{3} \exp\left(2\pi \mathrm{i}\,\frac{\beta_1 n}{3}\right) \delta\big[p - \mathcal{I}_q[n]\big], \tag{1}$$

where $\delta[p - \mathcal{I}_q[n]]$ is the Kronecker delta symbol, equal to one if $p = \mathcal{I}_q[n]$ and zero otherwise. Let $G_q$ be the subgraph induced by $\mathcal{I}_q$, where $\mathcal{I}_q$ is understood as a set of vertices in $\mathbb{Z}_P$. Observe that $\{p \mapsto \psi^{\mathrm{triad}}_{\beta_1,q}[p]\}_{\beta_1}$ consists of the eigenfunctions of the unnormalized Laplacian matrix of $G_q$:

$$L^{\mathrm{triad}}_q[p, p'] = |\mathcal{I}_q|\,\delta[p \in \mathcal{I}_q]\,\delta[p' \in \mathcal{I}_q] - \delta\big[p \overset{G_q}{\sim} p'\big]. \tag{2}$$

As a result, we propose to name eigentriads the complex-valued signals $\psi^{\mathrm{triad}}_{\beta_1,q}$ in pitch space.
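The eigenfunction property stated around Equations 1 and 2 can be checked numerically. The sketch below is only an illustration, not the author's code: it assumes that $G_q$ is the complete graph on the three notes of the triad and uses the standard Laplacian $D - A$ (degree matrix minus adjacency matrix). The helper names `eigentriad` and `triad_laplacian` are hypothetical.

```python
import numpy as np

P = 128  # number of pitches in the piano roll (2**7 in the text)
TRIADS = {1: (0, 4, 7), 0: (0, 3, 7)}  # I_1 (major) and I_0 (minor)


def eigentriad(beta, q, n_pitches=P):
    """Equation 1: complex signal supported on the three notes of the triad."""
    psi = np.zeros(n_pitches, dtype=complex)
    for n, pitch in enumerate(TRIADS[q], start=1):
        psi[pitch] = np.exp(2j * np.pi * beta * n / 3)
    return psi


def triad_laplacian(q, n_pitches=P):
    """Laplacian D - A of the complete graph on the triad notes.

    Assumption: G_q is taken to be complete; all other pitches are isolated.
    """
    laplacian = np.zeros((n_pitches, n_pitches))
    for a in TRIADS[q]:
        for b in TRIADS[q]:
            laplacian[a, b] = 2.0 if a == b else -1.0
    return laplacian


eigenvalues = {}
for q in (0, 1):
    laplacian = triad_laplacian(q)
    for beta in (-1, 0, 1):
        psi = eigentriad(beta, q)
        # The Rayleigh quotient gives the candidate eigenvalue.
        lam = np.real(np.vdot(psi, laplacian @ psi) / np.vdot(psi, psi))
        # psi is an eigenvector if and only if L psi = lam psi.
        assert np.allclose(laplacian @ psi, lam * psi)
        eigenvalues[(beta, q)] = lam
```

Under this assumption, the constant eigentriad ($\beta_1 = 0$) lies in the kernel of the Laplacian, while the two rotating eigentriads ($\beta_1 = \pm 1$) share one nonzero eigenvalue, regardless of the triad quality.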
This work is supported by the ERC InvariantClass grant. The source code to reproduce experiments is released under MIT license at: www.github.com/lostanlen/ismir2018-lbd. The author thanks Moreno Andreatta, Joanna Devaney, Peter van Kranenburg, Stéphane Mallat, Brian McFee, and Gissel Velarde for helpful comments. © Vincent Lostanlen. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Vincent Lostanlen. "Eigentriads and Eigenprogressions on the Tonnetz", 19th International Society for Music Information Retrieval Conference, Paris, France, 2018.

We construct a multiresolution convolutional operator in the piano-roll domain by separable interference

$$\Psi^{\mathrm{triad}}_{(\alpha_1,\beta_1,q)}[t, p] = \psi_{\alpha_1}[t]\;\psi^{\mathrm{triad}}_{\beta_1,q}[p]$$

between the aforementioned eigentriads and a family of temporal Gabor wavelets

$$\psi_{\alpha_1}[t] = \alpha_1 \exp\left(-\frac{\alpha_1^2 t^2}{2\sigma^2}\right) \exp(\mathrm{i}\,\alpha_1 \xi t) \tag{3}$$

for $t \in [\![0; T[\![$. We set $\xi = 2\pi/3$, $\sigma = 0.1$, and $\log_2 \alpha_1 \in [\![0; \log_2 T]\!]$. We define the eigentriad transform of $x$ as the rank-five tensor resulting from the complex modulus of all convolutions between $x$ and the multivariable wavelets $\Psi^{\mathrm{triad}}_{(\alpha_1,\beta_1,q)}[t, p]$:

$$U_1(x)[t, p, q, \alpha_1, \beta_1] = \Big| x \ast \Psi^{\mathrm{triad}}_{(\alpha_1,\beta_1,q)} \Big|[t, p] = \left| \sum_{t'=0}^{T-1} \sum_{p'=0}^{P-1} x[t', p']\; \Psi^{\mathrm{triad}}_{(\alpha_1,\beta_1,q)}[t - t', p - p'] \right|, \tag{4}$$

where the difference in $t$ (resp. in $p$) is computed in $\mathbb{Z}_T$ (resp. in $\mathbb{Z}_P$). By averaging the tensor $U_1(x)$ over the dimensions of time $t$, pitch $p$, and triad quality $q$, one obtains the matrix

$$S_1(x)[\alpha_1, \beta_1] = \sum_{t \in \mathbb{Z}_T} \sum_{p \in \mathbb{Z}_P} \sum_{q \in \mathbb{Z}_2} U_1(x)[t, p, q, \alpha_1, \beta_1]. \tag{5}$$

The operator $x \mapsto S_1(x)$ characterizes the relative amounts of ascending triads ($\beta_1 = 1$), descending triads ($\beta_1 = -1$), and perfect chords ($\beta_1 = 0$) at various temporal scales $\alpha_1$ in the piece $x$, while keeping a relatively low dimensionality, equal to $3 \log_2 T$.
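Equations 3 to 5 can be sketched in a few lines of NumPy, with the circular convolution of Equation 4 computed through the 2-D FFT. This is a minimal sketch under stated assumptions rather than the released implementation: the function names are hypothetical, the piano roll is random, and the time axis of the Gabor wavelet is assumed to be centered around zero and measured in units of $T$ samples (the text does not specify the time unit).

```python
import numpy as np

TRIADS = {1: (0, 4, 7), 0: (0, 3, 7)}  # I_1 (major) and I_0 (minor)


def gabor(alpha, T, xi=2 * np.pi / 3, sigma=0.1):
    """Temporal Gabor wavelet of Equation 3, sampled on Z_T.

    Assumptions (not stated in the text): time is centered around zero
    and expressed in units of T samples.
    """
    t = np.arange(T)
    t = np.where(t < T // 2, t, t - T) / T
    envelope = alpha * np.exp(-((alpha * t) ** 2) / (2 * sigma**2))
    return envelope * np.exp(1j * alpha * xi * t)


def eigentriad(beta, q, P):
    """Eigentriad of Equation 1 in pitch space."""
    psi = np.zeros(P, dtype=complex)
    for n, pitch in enumerate(TRIADS[q], start=1):
        psi[pitch] = np.exp(2j * np.pi * beta * n / 3)
    return psi


def eigentriad_transform(x, alphas, betas=(-1, 0, 1)):
    """Equation 4: modulus of circular convolutions, indexed [t, p, q, alpha_1, beta_1]."""
    T, P = x.shape
    x_hat = np.fft.fft2(x)
    U1 = np.empty((T, P, 2, len(alphas), len(betas)))
    for i, alpha in enumerate(alphas):
        for j, beta in enumerate(betas):
            for q in (0, 1):
                # Separable wavelet: Gabor in time, eigentriad in pitch.
                Psi = np.outer(gabor(alpha, T), eigentriad(beta, q, P))
                # Product in the Fourier domain = circular convolution on Z_T x Z_P.
                U1[:, :, q, i, j] = np.abs(np.fft.ifft2(x_hat * np.fft.fft2(Psi)))
    return U1


rng = np.random.default_rng(0)
T, P = 64, 128  # toy sizes; the text uses T = 2**10
x = (rng.random((T, P)) < 0.05).astype(float)  # random stand-in for a piano roll
alphas = 2.0 ** np.arange(int(np.log2(T)) + 1)  # log2(alpha_1) in [0, log2(T)]
U1 = eigentriad_transform(x, alphas)
S1 = U1.sum(axis=(0, 1, 2))  # Equation 5: sum over t, p, and q
```

The FFT route is chosen here because the differences in Equation 4 are taken modulo $T$ and $P$, which is exactly the circular convolution that the discrete Fourier transform diagonalizes.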
The averaging along variables $t$ and $p$ involved in Equation 5 guarantees that $S_1$ is invariant to the action of any temporal shift operator $\tau_{\Delta t} : x[t, p] \mapsto x[t + \Delta t, p]$, as well as any pitch transposition operator $\pi_{\Delta p} : x[t, p] \mapsto x[t, p + \Delta p]$:

$$\forall \Delta t \in \mathbb{Z}_T,\ \forall \Delta p \in \mathbb{Z}_P,\quad S_1(\pi_{\Delta p} \circ \tau_{\Delta t} \circ x) = S_1(x). \tag{6}$$

Furthermore, the averaging across triad qualities $q$ implies approximate invariance to tonality, in the sense that replacing major triads by minor triads and vice versa in $x$ (insofar as this is feasible in the signal $x$ at hand) does not affect the matrix $S_1(x)$. From the standpoint of serialist music theory [1], the presence of signed eigentriad frequencies $\beta_1 = \pm 1$ ensures that $S_1$ is not invariant to retrogradation $R : x[t, p] \mapsto x[-t, p]$, i.e. time reversal:

$$(R \circ x) \neq x \implies S_1(R \circ x) \neq S_1(x). \tag{7}$$

However, the averaging across triad qualities $q$ causes $S_1$ to be invariant to inversion $I : x[t, p] \mapsto x[t, -p]$, i.e. reversal of the pitch axis:

$$(I \circ x) \neq x \;\not\!\!\implies S_1(I \circ x) \neq S_1(x). \tag{8}$$

The above property hinders the accurate modeling of chord progressions in the context of Western tonal music. Indeed, $S_1$ fails to distinguish a perfect major cadence (C maj → F maj) from a plagal minor cadence (F min → C min), as one proceeds from the other by involution with $I$. More generally, the eigentriad transform may suffice for extracting the quality of isolated chords, but lacks longer-term context of harmonic tension and release so as to infer the tonal functions of such chords (e.g. tonic vs. dominant).

2. EIGENPROGRESSIONS

In this section, we introduce a second multidimensional feature $U_2$, built on top of $U_1$ and named eigenprogression transform, that aims at integrating harmonic context in Western tonal music while still respecting the aforementioned requirements of invariance to global temporal shifts $\tau_{\Delta t}$ and pitch transpositions $\pi_{\Delta p}$.
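The cadence example motivating this second feature can be verified by hand on pitch classes. The short sketch below applies the inversion $p \mapsto -p$ to the progression C maj → F maj; it assumes that pitches are reduced modulo 12 (whereas the text defines $I$ on the full pitch axis), and its helper names are hypothetical.

```python
MAJOR, MINOR = (0, 4, 7), (0, 3, 7)  # I_1 and I_0 from Section 1


def triad(root, quality):
    """Pitch-class set of a triad (quality 1 = major, 0 = minor)."""
    intervals = MAJOR if quality == 1 else MINOR
    return frozenset((root + i) % 12 for i in intervals)


def invert(pitch_classes):
    """Inversion I restricted to pitch classes: p -> -p (mod 12)."""
    return frozenset((-p) % 12 for p in pitch_classes)


def identify(pitch_classes):
    """Return (root, quality) of a major or minor triad."""
    for root in range(12):
        for quality in (0, 1):
            if triad(root, quality) == pitch_classes:
                return root, quality
    raise ValueError("not a major or minor triad")


C, F = 0, 5
progression = [triad(C, 1), triad(F, 1)]  # C maj -> F maj
inverted = [identify(invert(chord)) for chord in progression]
# inverted holds (root, quality) pairs: (F, minor) then (C, minor).
```

Applying `invert` twice returns the original chord, which is the involution property invoked in the text: the two cadences are mirror images of each other under $I$.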
We begin by defining the Tonnetz as an undirected graph over the 24 vertices of triads $(p, q) \in \mathbb{Z}_{12} \times \mathbb{Z}_2$. Let $J_1 = 4$ (resp. $J_0 = 3$) be the number of semitones in a major (resp. minor) third. The unnormalized Laplacian tensor of the Tonnetz is

$$L^{\mathrm{Tonnetz}}[p, q, q', p'] = \Big( \delta\big[(-1)^{q} (p - p') \in J_q\big] + \delta\big[(-1)^{q'} (p - p') \in J_{q'}\big] + \delta[p - p'] \Big) \times \delta[q - q' + 1] - 3\,\delta[p - p']\,\delta[q - q']. \tag{9}$$

We define eigenvalues $\lambda$ of the Laplacian tensor, and corresponding eigenvectors $v$, as the solutions of the following equation:

$$(L^{\mathrm{Tonnetz}} \otimes v)[p, q] = \sum_{p', q'} L^{\mathrm{Tonnetz}}[p, q, q', p']\; v[p', q'] = \lambda\, v[p, q]. \tag{10}$$

We denote by $K$ the number of distinct eigenvalues $\lambda_1, \ldots, \lambda_K$ satisfying Equation 10. Their associated eigensubspaces form a direct sum: $V_1 \oplus \ldots \oplus V_K = \mathcal{M}_{P,2}(\mathbb{R})$. For values of $k$ such that $\dim V_k = 1$, we define the $k$-th eigenprogression as $\psi^{\mathrm{Tonnetz}}_k[p, q] = v_k[p, q]$, where $v_k \in V_k$ and $\|v_k\|_2 = 1$. In contrast, for values of $k$ such that $\dim V_k = 2$, we arbitrarily select two vectors $v^{\mathrm{Re}}_k$ and $v^{\mathrm{Im}}_k$ satisfying $v^{\mathrm{Re}}_k \perp v^{\mathrm{Im}}_k$, $\|v^{\mathrm{Re}}_k\|_2 = \|v^{\mathrm{Im}}_k\|_2 = 1$, and $\mathrm{span}(\{v^{\mathrm{Re}}_k, v^{\mathrm{Im}}_k\}) = V_k$; and define eigenprogressions as

$$\psi^{\mathrm{Tonnetz}}_{\beta_2}[p, q] = v^{\mathrm{Re}}_k[p, q] + \mathrm{i}\, v^{\mathrm{Im}}_k[p, q]. \tag{11}$$

We define multivariable eigenprogression wavelets as

$$\Psi^{\mathrm{prog}}_{(\alpha_2,\beta_2,\gamma_2)}[t, p, q] = \psi_{\alpha_2}[t]\;\psi^{\mathrm{Tonnetz}}_{\beta_2}[p, q]\;\psi^{\mathrm{spiral}}_{\gamma_2}[p], \tag{12}$$

where $\psi_{\alpha_2}$ is a temporal Gabor wavelet of the form given in Equation 3 and $\psi^{\mathrm{spiral}}_{\gamma_2}$ is a Gabor wavelet on the Shepard pitch spiral [4]:

$$\psi^{\mathrm{spiral}}_{\gamma_2}[p] = \gamma_2 \exp\left(-\frac{\gamma_2^2 \lfloor p/12 \rfloor^2}{2\sigma^2}\right) \exp\left(\mathrm{i}\,\gamma_2\, \xi \left\lfloor \frac{p}{12} \right\rfloor\right), \tag{13}$$

wherein $\gamma_2 \in \{0, \pm 1\}$. We define the eigenprogression transform of $x$ as the following rank-eight tensor:

$$U_2(x)[t, p, q, \alpha_1, \beta_1, \alpha_2, \beta_2, \gamma_2] = \Big| U_1(x) \overset{t,p,q}{\ast} \Psi^{\mathrm{prog}}_{(\alpha_2,\beta_2,\gamma_2)} \Big|[t, p, q]. \tag{14}$$

At first sight, Equation 10 suffers from an identifiability problem.
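One way to see where this ambiguity comes from is to diagonalize the Tonnetz Laplacian numerically and count how many eigenvectors share each eigenvalue. The sketch below is a minimal NumPy illustration, not the released implementation. It assumes the usual neo-Riemannian adjacency, which is my reading of Equation 9: the major triad of root $p$ is linked to the minor triads of roots $p$, $p + 4$, and $p + 9$. The flattening of $(p, q)$ to the index $2p + q$ and all variable names are hypothetical.

```python
import numpy as np


def tonnetz_laplacian():
    """24 x 24 matrix A - 3I, following the sign convention of Equation 9.

    Vertex (p, q) is flattened to index 2 * p + q (q = 1 major, q = 0 minor).
    Assumption: major triad p is adjacent to minor triads p, p + 4, p + 9.
    """
    A = np.zeros((24, 24))
    for p in range(12):
        major = 2 * p + 1
        for root in (p, (p + 4) % 12, (p + 9) % 12):
            minor = 2 * root
            A[major, minor] = A[minor, major] = 1.0
    return A - 3.0 * np.eye(24)


L = tonnetz_laplacian()
eigenvalues, eigenvectors = np.linalg.eigh(L)  # L is real symmetric

# Group numerically equal eigenvalues to obtain the eigensubspaces V_k.
distinct, multiplicities = np.unique(np.round(eigenvalues, 8), return_counts=True)

eigenprogressions = {}
for lam, m in zip(distinct, multiplicities):
    basis = eigenvectors[:, np.isclose(eigenvalues, lam)]  # orthonormal basis of V_k
    if m == 1:
        eigenprogressions[lam] = basis[:, 0].astype(complex)
    elif m == 2:
        # Equation 11: v_Re + i v_Im, for one arbitrary orthonormal pair.
        eigenprogressions[lam] = basis[:, 0] + 1j * basis[:, 1]
    # Any other multiplicity is left aside: the text only covers dimensions 1 and 2.
```

Whenever an entry of `multiplicities` exceeds one, `np.linalg.eigh` returns just one of infinitely many orthonormal bases of the corresponding subspace, so the pair $(v^{\mathrm{Re}}_k, v^{\mathrm{Im}}_k)$ is not unique.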
Indeed, a different choice of basis for $V_k$ would incur a phase shift and/or a complex conjugation of the convolutional response $U_1(x) \overset{p,q}{\ast} \psi^{\mathrm{Tonnetz}}_k$. Yet, because the eigenprogression transform consists of Gabor wavelets (i.e. with a symmetric amplitude profile) and is followed by a complex modulus operator, such differences in phase and/or spin are eventually canceled and thus have no effect on the outcome of the transform. Consequently, we pick one arbitrary pair $(v^{\mathrm{Re}}_k, v^{\mathrm{Im}}_k)$ for each subspace $V_k$, without loss of generality.

3. EXPERIMENTS

We evaluate the eigenprogression transform on a task of supervised composer recognition between Haydn and Mozart string quartets [7]. After averaging along time $t$ and pitch $p$, we standardize each feature in the rank-five tensor

$$S_2(x)[\alpha_1, \beta_1, \alpha_2, \beta_2, \gamma_2] = \sum_{t, p, q} U_2(x)[t, p, q, \alpha_1, \beta_1, \alpha_2, \beta_2, \gamma_2] \tag{15}$$

to null mean and unit variance. Then, we train a linear support vector machine with $C = 10^4$, and report results with leave-one-out cross-validation. The ablation study in Table 1 confirms that all five scattering variables $(\alpha_1, \beta_1, \alpha_2, \beta_2, \gamma_2)$ are beneficial to both sparsity and accuracy. However, because the dataset of string quartet movements contains only 107 examples in total, the full eigenprogression transform (in dimension 8385) is exposed to statistical overfitting. For the sake of simplicity, rather than running a feature selection algorithm, we apply wavelet shrinkage denoising, i.e. we keep the 1119 coefficients of largest energy on average, summing up to 50% of the total energy. This adaptive procedure has been proven to be near-optimal in the context of wavelet bases [3]. It leads to a state-of-the-art classification accuracy of 82.2%.

4. CONCLUSION

We have diagonalized the Laplacian of the Tonnetz graph and derived a multivariable scattering transform, named eigenprogression transform, that captures some local harmonic context in Western tonal music. Although the numerical example we gave was a task of composer recognition, the eigenprogression transform could, in principle, address other MIR tasks in the future, including cover song retrieval, key estimation, and structure analysis.

Variables                               dim.    ℓ1/ℓ2   acc. (%)
[7]                                                     79.4
[8]                                                     80.4
α1                                      8       2.6     67.3
α1, β1                                  24      4.6     71.0
α1, β1, α2                              129     6.1     72.0
α1, β1, α2, β2                          1677    17.0    76.7
α1, β1, α2, β2, γ2                      8385    42.4    77.6
α1, β1, α2, β2, γ2 (with shrinkage)     1119    22.3    82.2

Table 1. Comparison between the eigenprogression transform and other transforms of smaller tensor rank, in terms of dimensionality, sparsity (ℓ1/ℓ2 ratio), and accuracy on a supervised task of composer recognition.

5. REFERENCES

[1] Milton Babbitt. Twelve-tone invariants as compositional determinants. The Musical Quarterly, 46(2):246–259, 1960.
[2] Rachel M. Bittner, Brian McFee, Justin Salamon, Peter Li, and Juan Pablo Bello. Deep salience representations for f0 estimation in polyphonic music. In Proc. ISMIR, 2017.
[3] David L. Donoho and Iain M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425–455, 1994.
[4] Vincent Lostanlen and Stéphane Mallat. Wavelet scattering on the pitch spiral. In Proc. DAFx, 2015.
[5] Stéphane Mallat. Group invariant scattering. Commun. Pure Appl. Math., 65(10):1331–1398, 2012.
[6] Oriol Nieto and Juan Pablo Bello. MSAF: Music structure analysis framework. In Proc. ISMIR, 2015.
[7] Peter Van Kranenburg and Eric Backer. Musical style recognition: a quantitative approach. In Handbook of Pattern Recognition and Computer Vision, pages 583–600. World Scientific, 2005.
[8] Gissel Velarde, Tillman Weyde, Carlos Eduardo Cancino Chacón, David Meredith, and Maarten Grachten. Composer recognition based on 2D-filtered piano-rolls. In Proc. ISMIR, 2016.