Chord Generation from Symbolic Melody Using BLSTM Networks

Hyungui Lim (1,2), Seungyeon Rhyu (1), and Kyogu Lee (1,2)
(1) Music and Audio Research Group, Graduate School of Convergence Science and Technology
(2) Center for Super Intelligence
Seoul National University, Korea
{goongding7, rsy1026, kglee}@snu.ac.kr

ABSTRACT

Generating a chord progression from a monophonic melody is a challenging problem because a chord progression requires a series of layered notes played simultaneously. This paper presents a novel method of generating chord sequences from a symbolic melody using bidirectional long short-term memory (BLSTM) networks trained on a lead sheet database. To this end, a group of feature vectors composed of 12 semitones is extracted from the notes in each bar of a monophonic melody. To ensure that the data share uniform key and duration characteristics, the keys and time signatures of the vectors are normalized. The BLSTM networks then learn from the data to incorporate the temporal dependencies needed to produce a chord progression. Both quantitative and qualitative evaluations are conducted by comparing the proposed method with conventional HMM- and DNN-HMM-based approaches. The proposed model achieves 23.8% and 11.4% performance increases over these models, respectively. User studies further confirm that the chord sequences generated by the proposed method are preferred by listeners.

1. INTRODUCTION

Generating chords from melodies is an artistic process for musicians that requires knowledge of chord progressions and tonal harmony. While it plays an important role in music composition studies, implementing this process can be difficult, especially for individuals without prior experience or domain knowledge in music. For this reason, the chord generation process often serves as an obstacle for novices who try to compose music based on a melody.
To overcome this limitation, automatic chord generation systems have been implemented based on machine learning methods [1, 2]. One of the most popular approaches for this task is probabilistic modeling, which commonly applies the hidden Markov model (HMM). A single HMM has been used with 12-semitone vectors of the melody as observations and the corresponding chords as hidden states [3, 4]. Allan and Williams trained a first-order HMM on pieces composed by Bach to generate chorale harmonies [5]. A more complex method is presented by Raczyński et al. [6], using time-varying tonalities and bigrams as observations together with melody variables. In addition, a multi-level graphical model using tree structures and an HMM is proposed by Paiement et al. [7]. Their model generates chord progressions based on the root-note progression predicted from a melodic sequence. Forsyth and Bello [8] also introduced a MIDI-based harmonic accompaniment system using a finite state transducer (FST).

Although the HMM has been successfully used for various tasks, it has several drawbacks. Under the model's assumptions, observations occur independently of their neighbors, depending only on the current state, and the current state of a Markov chain is affected only by its previous state. These drawbacks matter for generating chords from melody because long-term dependencies exist in the chord progressions and melodic sequences of Western tonal music [6].

Meanwhile, deep learning based approaches have recently shown great improvements in machine learning tasks on large datasets. Especially for temporal sequences, recurrent neural networks (RNN) and long short-term memory (LSTM) networks have proven to be more powerful than HMMs in handwriting recognition [9], speech recognition [10], and emotion recognition [11].
Nowadays, music generation research has increasingly adopted RNN/LSTM models in two major streams: one that aims to generate complete music sequences [12, 13], and another that concentrates on generating individual musical components such as melody, chord, and drum sequences [14, 15]. We attempt an extended approach to the latter stream by implementing a chord generation system that takes a melody as input. In this paper, we implement a chord generation algorithm based on bidirectional LSTM (BLSTM) and evaluate its ability to reflect temporal dependencies in melody/chord progressions by comparing it with two HMM-based methods: a simple HMM and a deep neural network-HMM (DNN-HMM). We then present the quantitative analysis and accuracy results of the three models, and describe the qualitative results based on subjective ratings provided by 25 non-musicians.

© Hyungui Lim, Seungyeon Rhyu and Kyogu Lee. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Hyungui Lim, Seungyeon Rhyu and Kyogu Lee. "Chord Generation from Symbolic Melody Using BLSTM Networks", 18th International Society for Music Information Retrieval Conference, 2017.

The remainder of the paper is organized as follows. In Section 2, we explain the preprocessing step and the details of the machine learning methods we apply. Section 3 describes the experimental setup for evaluating the proposed approach. The experimental results are presented in Section 4, with additional discussion. Finally, we draw a conclusion, followed by limitations and future work, in Section 5.

2. METHODOLOGY

The method proposed in this paper can be divided into two main parts. The first is a preprocessing procedure to extract input/output features from lead sheets. The other consists of model training and the chord generation process.
We apply BLSTM networks for the proposed model and two types of HMM as baseline models. The overall framework of our proposed method is shown in Figure 1.

2.1 Preprocessing

To extract appropriate features for this task, we first collect musical features such as time signature, measure (bar), key {fifths, mode}, chord {root, type} and note {root, octave, duration} from the lead sheets. These features are represented in a matrix by concatenating rows, each of which holds the musical features of a single note, as shown in Figure 2. The generated data is then preprocessed to establish a suitable relation between melody input and chord output. All songs in the database are in a major key and are transposed to C major for data consistency; in other words, all chord and note roots are shifted to C major to normalize the differing characteristics of melodies and chords across songs. Each song carries a time signature, and meters vary (4/4, 3/4, 6/8, etc.). This variety causes an imbalance in total note duration per bar across songs, so note durations are normalized by multiplying them by the reciprocal of each time signature. After that, every note in a bar is stored into one of 12 semitone classes, discarding octave information. Each class holds a single value that accumulates the duration of the corresponding semitone within the bar. Since the total number of chord types is quite large, treating every chord type as an independent class would leave each chord with too few samples. For this reason, all chord types are mapped into one of two primary triads, major or minor, and each chord is represented as a binary 24-dimensional class indicating one of the 24 major/minor chords.
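The per-bar feature extraction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the (pitch, duration) note representation are our assumptions. Durations are divided by the number of beats in the bar, so a full bar sums to one regardless of meter, and octave information is discarded by folding MIDI pitches into 12 pitch classes.

```python
def bar_to_semitone_vector(notes, beats_per_bar):
    """notes: list of (midi_pitch, duration_in_beats) pairs in one bar.
    Returns a 12-dimensional vector of duration-weighted pitch classes."""
    vec = [0.0] * 12
    for pitch, dur in notes:
        # Normalize by the meter so every bar's vector sums to 1 (4/4, 3/4, 6/8, ...),
        # then accumulate into the semitone class, dropping the octave.
        vec[pitch % 12] += dur / beats_per_bar
    return vec

# Example: a 4/4 bar holding C4 (MIDI 60) for two beats and E4 (MIDI 64) for two beats.
v = bar_to_semitone_vector([(60, 2), (64, 2)], 4)
```

With this normalization, the same melodic content contributes equally to the feature vector whether it appears in a 4/4 or a 6/8 song, which is the point of the duration rescaling in the paper.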
2.2 BLSTM Networks

A recurrent neural network (RNN) is a deep learning model that learns complex relationships not only by reconstructing the input features through a nonlinear process, but also by using the parameters of previous states in its hidden layer. RNNs have a notion of "time step", which controls the number of feedback iterations in the recurrent process. This property enables the model to incorporate temporal dependencies by storing past information in its internal memory, in contrast to a simple feedforward deep neural network (DNN). Despite these advantages, RNN models still struggle with long-term dependencies. This is caused by the vanishing gradient in backpropagation through time (BPTT) [16]: when computing the gradient of the loss function, the error between the estimated and actual values diminishes as the number of hidden layers increases. We therefore use long short-term memory (LSTM) layers instead, which mitigate the difficulty of storing long-term history with three multiplicative gates [17].

Figure 1. The overview of the proposed system.
Figure 2. An example of extracted data from a single bar.

Generally, chords and melodies are formed in a sequential order that is affected by both the previous and the next element. Based on this, we can expect that if we reverse a lead sheet and train on the reversed musical progressions, a meaningful sequential context similar to the original will appear. Hence, we apply a BLSTM so that the network can reflect musical context not only in the forward but also in the backward direction. As shown in Figure 1, the input semitone vectors from each bar enter the network sequentially over the time steps (i.e., a fixed number of bars) and emit the corresponding output chord classes in the same order. This is possible because the hidden layer in the network returns an output for each input.
To train on sequences of multiple bars, we reconstruct our dataset by applying a window the size of the time step, overlapping windows with a hop size of one bar. Each window, composed of multiple bars, is then used as one training sample. For our model, we build a time-distributed input layer with 12 units, representing the sequence of semitone vectors; 2 hidden layers with 128 BLSTM units each; and a time-distributed output layer with 24 units, representing the sequence of chord classes. We empirically choose the numbers of hidden layers and units that yield the best results. We use the hyperbolic tangent activation function in the hidden layers to reconstruct the features nonlinearly, and apply the softmax function to the output layer to produce values corresponding to the probability of each class. Dropout is employed with a rate of 0.2 on all hidden layers to prevent overfitting. We use mini-batch gradient descent with categorical cross-entropy as the cost function and Adam as the optimizer. For training, we use a batch size of 512 and early stopping with a patience of 10 epochs.

2.3 Hidden Markov Model

We apply two types of supervised HMM as baseline models. The first is a simple HMM, a generative model; the other is a hybrid deep neural network-HMM (DNN-HMM), a sequence-discriminative model [18].

2.3.1 Simple HMM

The simple HMM consists of three parameters: the initial state distribution, the transition probability, and the emission probability. In our case, the initial state distribution is the histogram of each chord in our training set. The transition probability is computed from the bigram of chord transitions and is assumed to follow a general first-order Markov chain.
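The sliding-window dataset construction described above (window length equal to the time step, hop size of one bar) can be sketched as follows; the function name is ours, and the per-bar features stand in for the 12-semitone vectors:

```python
def make_windows(bars, time_steps):
    """bars: list of per-bar feature vectors for one song.
    Returns all windows of `time_steps` consecutive bars with a hop of one bar;
    each window is one training sample for the sequence model."""
    return [bars[i:i + time_steps]
            for i in range(len(bars) - time_steps + 1)]

# A song of 6 bars (dummy 1-dimensional features) cut into 4-bar windows:
song = [[0], [1], [2], [3], [4], [5]]
windows = make_windows(song, 4)
```

Because consecutive windows overlap by all but one bar, every bar (except near the song boundaries) is seen in several temporal contexts, which increases the number of training samples drawn from a fixed set of lead sheets.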
A higher-order transition probability is not taken into account because the fixed length of an input bar in our task is not long enough. The emission probability is determined by a multinomial distribution of semitone observations for each chord class. Once the parameters are learned, the model generates a sequence of hidden chord states from a melody in three steps. First, the probabilities of the 24 chord classes in each bar are determined from the melody distribution in that bar. As mentioned above, the simple HMM is a generative model, so it uses not only the emission probability but also a class prior to compute the posterior probability via Bayes' rule; we define the class prior to be the same as the initial probability, i.e., the histogram of each chord. Second, to reflect sequential effects, the transition probability is applied to adjust the chord-class probabilities. For the first chord state, since there is no previous state, the initial probability is applied instead. Finally, the Viterbi decoding algorithm is used to find the optimal chord sequence most likely to match the observed melody sequence [19].

2.3.2 DNN-HMM

The hybrid DNN-HMM is a popular model in the field of speech recognition [20]. It is a sequence-discriminative model that adopts the sequential modeling advantages of the HMM but does not require the class prior and the emission probability to obtain the posterior, because the output of a DNN softmax layer can itself be treated as a posterior probability. The two remaining HMM parameters, the initial state distribution and the transition probability, are applied exactly as in the simple HMM to run the Viterbi decoding algorithm. We build an input layer with 12 units, 3 identical hidden layers with 128 units each, and an output layer with 24 units.
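The Viterbi decoding step used by both HMM baselines can be sketched as below. This is a generic log-domain Viterbi, not the authors' implementation: it takes per-bar log-scores for each chord class (for the simple HMM these come from the emission likelihood and class prior; for the DNN-HMM, from the softmax posteriors), plus the initial and transition log-probabilities, and returns the most likely chord-state sequence.

```python
import math

def viterbi(log_emit, log_init, log_trans):
    """log_emit: T x N per-bar log-scores of each chord given the melody.
    log_init: N initial log-probabilities; log_trans: N x N transition log-probs.
    Returns the most likely state sequence as a list of T chord indices."""
    T, N = len(log_emit), len(log_init)
    # The first bar uses the initial distribution instead of a transition.
    score = [log_init[j] + log_emit[0][j] for j in range(N)]
    back = []
    for t in range(1, T):
        prev = score
        step_back, step_score = [], []
        for j in range(N):
            # Best predecessor state for landing in chord j at bar t.
            k = max(range(N), key=lambda i: prev[i] + log_trans[i][j])
            step_back.append(k)
            step_score.append(prev[k] + log_trans[k][j] + log_emit[t][j])
        back.append(step_back)
        score = step_score
    # Backtrack from the best final state to recover the chord sequence.
    best = max(range(N), key=lambda j: score[j])
    path = [best]
    for bp in reversed(back):
        path.append(bp[path[-1]])
    return path[::-1]
```

A toy two-state example: with emissions that strongly favor state 0 in every bar, the decoder stays on state 0; if later bars favor state 1 enough to outweigh the transition penalty, it switches.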
We use the hyperbolic tangent activation function for the hidden layers and softmax for the output layer. Other settings such as dropout, loss function, optimizer, and batch size are the same as for the BLSTM.

3. EXPERIMENTS

In this section, we first introduce our dataset, which is parsed from digital lead sheets. Then we present the experimental setup for evaluating the performance of the chord generation models. We conduct both quantitative and qualitative evaluations for this task.

3.1 Dataset

We use the lead sheet database provided by Wikifonia.org, which was a public lead sheet repository. The site unfortunately stopped service in 2013, but some of the data, consisting of 5,533 Western music lead sheets in MusicXML format covering rock, pop, country, jazz, folk, R&B, children's songs, etc., was obtained before the termination, and we extracted features from the data for academic purposes only. From the obtained database, we collect 2,252 lead sheets that are all in a major key and in which the majority of bars have a single chord. If a bar contains two or more chords, we choose the first chord in the bar. We then extract the musical features and convert them to CSV format (see Section 2.1). The set is split into a training set of 1,802 songs (72,418 bars) and a test set of 450 songs (17,768 bars). Since the musical features in this dataset can be useful not only for chord generation but also for other kinds of symbolic music tasks, the dataset is shared on our website (http://marg.snu.ac.kr/chord_generation/) for public access.

3.2 Quantitative Evaluation

We perform a quantitative analysis by comparing the chord estimation accuracies of the models on the test set.
Accuracy is calculated by counting the number of matching samples between the predicted and true chords and dividing by the total number of samples. We mainly use a 4-bar melody input for our task, but also experiment with 8-, 12- and 16-bar inputs to analyze the influence of the melody sequence length. Determining the "right" chord is a difficult process because chord choices vary among people according to their musical styles and tastes. Nevertheless, this accuracy measure is often used to evaluate how well a model captures long-term dependencies in musical progressions [6, 8], so we use it to measure which model reflects the relationship between chord and melody most adequately.

3.3 Qualitative Evaluation

As mentioned above, a quantitative analysis alone is limited for evaluating model performance, so we also conduct a qualitative evaluation based on subjective ratings from actual users. This assessment allows us to determine the validity of each model by comparing how the chords generated by the different models are perceived. For the experiment, we collect eighteen 4-bar melodies from the lead sheets of thirteen K-pop songs and five Western pop songs. Every melodic sequence is converted into a vector of 12 semitones as described in Section 2.1. The HMM, DNN-HMM, and BLSTM then generate chord sequences from each vector. These sequences are evaluated by 25 musically untrained participants (13 males and 12 females) through a web-based survey. The participants complete 18 survey sets at their own pace. At the beginning of each set, participants listen to a melody; they then listen to four chord progressions, including the one from the original song, played along with the melody.
Participants are asked to rate each chord progression on a five-point scale (1: 'not appropriate'; 5: 'very appropriate'). At the end of each set, participants are also asked whether they have pre-existing familiarity with the original song. The audio samples used for the experiment are available on our website.

4. RESULTS

4.1 Chord Prediction Performance

Table 1 presents the accuracy of the three models for four different bar lengths. The results show that the BLSTM method achieves the best performance on the test set, followed by the DNN-HMM and the HMM. According to the models' average scores, the BLSTM shows 23.8% and 11.4% performance increases over the HMM and DNN-HMM, respectively. The results also indicate that the number of input bars is not an important factor for accuracy in any model, as no clear trend appears across lengths. To examine the quality of the predicted chords from each model in more depth, we compute a confusion matrix for each model, which allows us to analyze the results easily through visualization. We normalize the matrix by the number of samples of each chord so that each row represents the distribution of predicted chords for the corresponding true chord class. Figure 3 displays the normalized confusion matrix of each model. Several noteworthy findings emerge. First, the HMM yields a skewed result with severe misclassification of chords, especially on C, F and G, as shown in Figure 3(a). We hypothesize that this results from the model's lack of complexity: the emission probability does not properly capture the correlation between the chords and the corresponding melodies.
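The row normalization applied to the confusion matrices here can be sketched as follows (a generic illustration with our own function name, not the paper's code): each row is divided by its true-class sample count, so row i becomes the distribution of predicted chords over samples whose true chord is i.

```python
def normalized_confusion(true, pred, n_classes):
    """Row-normalized confusion matrix: entry [i][j] is the fraction of
    samples with true class i that were predicted as class j."""
    m = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true, pred):
        m[t][p] += 1
    for row in m:
        total = sum(row)
        if total:  # avoid dividing an empty row (class absent from `true`)
            for j in range(n_classes):
                row[j] /= total
    return m

# Toy example with 2 classes: one of the two class-0 samples is misclassified.
cm = normalized_confusion([0, 0, 1, 1], [0, 1, 1, 1], 2)
```

Normalizing per row is what makes frequent chords (C, F, G, over 60% of the data) comparable with rare ones: without it, the rare chords' rows would be nearly invisible in the visualization.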
Moreover, the fact that the training data contains frequent occurrences of the C, F and G chords (over 60% of the total samples) reduces the accuracy of the HMM, which uses the prior probability to obtain the posterior as mentioned in Section 2.3.1. Lastly, a noticeable bias in the transition matrix toward the C chord also seems to lower the precision of the model. The result of the DNN-HMM is similar to the HMM, but the skewness toward the C chord spreads out slightly to the F and G chords. Despite our initial expectation that the DNN would perform better, since it is a discriminative model that calculates the posterior directly, many misclassifications on these three chords remain, as shown in Figure 3(b). To find the reason behind this observation, we test a simple DNN with a 1-bar input and without the sequential parameters of the HMM. Its accuracy is higher than the DNN-HMM (46.93%), and its confusion matrix shows stronger diagonal elements, as shown in Figure 4.

Table 1. Chord prediction performance using different numbers of input bars.

This finding suggests that the transition probability of the HMM forces the model to generate a limited set of classes and that the model is not adequate for learning varied chord progressions. In contrast to the HMM-based methods, the confusion matrix of the BLSTM shows a less skewed distribution and clearer diagonal elements, as shown in Figure 3(c). The BLSTM has far more complex parameters in its hidden layers, which learn the sequential information of both melodies and chords; we believe this property accounts for its better performance.

4.2 User Preference

In the subjective user test, evaluation scores are obtained from 450 sets (18 sets x 25 participants). Each set contains chord sequences from the HMM, DNN-HMM, and BLSTM. The original chord sequence is also included for relative comparison of the generated results against the original.
These four chord sequences are evaluated as described in Section 3.3. Figure 5 shows an example of the melody and chord sequences used in the user test; more examples are available for listening on our website. The average score of each model is shown in Figure 6. The original chord progression is preferred the most, followed by the BLSTM, DNN-HMM, and HMM. To investigate whether the score differences between the models are significant, we conduct a one-way repeated-measures ANOVA with model as the factor. The result shows that at least one of the four scores is significantly different from the others (F(3, 1772) = 310, p < 0.001). We then conduct pairwise t-tests with Bonferroni correction on the mean scores between each pair of models as a post-hoc analysis. The differences between all pairs prove significant (p < 0.01). It can therefore be concluded that the BLSTM produces the most satisfying chord sequences among the computational models, although still less satisfying than the original. Moreover, since the gap between the BLSTM and the DNN-HMM is larger than for the other pairs, there appears to be a substantial quality difference between them.

To verify our hypothesis that familiarity with the original song affects the results, we perform a further analysis. We separate the 450 evaluation sets into two groups, 248 sets marked as known and the rest as unknown. A simple comparison of the two groups' evaluation scores shows that awareness of the songs does not affect the preference ranking of the models. We also perform a one-way repeated-measures ANOVA for each awareness group (known songs: F(3, 964) = 286, p < 0.001; unknown songs: F(3, 780) = 72, p < 0.001) and pairwise t-tests with Bonferroni correction. The results are presented in Figure 7. As the figure shows, when songs are unknown, the preference for the HMM-based models increases while it decreases for the BLSTM-generated and original chords. A plausible explanation is that when listeners know a song, they are more perceptive of the monotonous chord sequences generated by the HMM and DNN-HMM, which tend to produce C, F and G more often than other chords; when they do not know the song, they are less aware of the monotonous progression and tend to give those two models more generous scores. For the BLSTM, the result is the opposite: listeners who are used to the dynamic chord progression of the original song tend to give relatively higher scores to the BLSTM than to the HMM-based methods, probably because the BLSTM often generates more diverse chord sequences. On the other hand, when the songs are unknown, the relative preference for both the BLSTM and the original chords is less pronounced. The reduced gap among the four options for unknown songs may be explained by the assumption that when songs are unfamiliar, all four options are roughly equally acceptable to listeners. Regardless of these differences, however, the BLSTM is preferred over the other two models in both cases.

Figure 3. Normalized confusion matrices of the HMM (a), DNN-HMM (b), and BLSTM (c) using 4-bar melody input.
Figure 4. Normalized confusion matrix of the simple DNN using single-bar melody input.
Figure 5. An example of generated chord progressions from three different models and the original progression.

5. CONCLUSIONS

We have introduced a novel approach for generating a chord sequence from a symbolic melody using neural network models. The results show that the BLSTM achieves the best performance, followed by the DNN-HMM and the HMM.
Therefore, the recurrent layers of the BLSTM are more appropriate for modeling the relationship between melody and chord than HMM-based sequential methods.

Our work can be further improved by modifying the data extraction and preprocessing steps. First, since the lead sheets used in this study have one chord per bar, the task is constrained to generating one chord per bar. Because actual music often contains bars with multiple chords, an additional extraction process is needed to allow the model to generate multiple chords per bar. Second, in the preprocessing step, all chords are mapped into only 24 major and minor classes; further chord classes such as maj7 and min7 should be included to improve performance. Lastly, our input feature vectors consist of 12 semitones accumulated from the melody notes in each bar, so the sequential information of the melody within each bar disappears in this step. Another feature-preprocessing step may be needed to preserve this information, which can be a crucial factor in future work. We hope that more research will be done with our published data to overcome these limitations and to develop this task further.

6. ACKNOWLEDGEMENTS

This work was supported by Kakao Corp. and Kakao Brain Corp.

Figure 6. Mean score of the subjective evaluation of each model.
Figure 7. Mean score of the subjective evaluation for the group of known songs (a) and unknown songs (b).

7. REFERENCES

[1] E. C. Lee and M. W. Park: "Music Chord Recommendation of Self Composed Melodic Lines for Making Instrumental Sound," Multimedia Tools and Applications, pp. 1-17, 2016.

[2] S. D. You and P. Liu: "Automatic Chord Generation System Using Basic Music Theory and Genetic Algorithm," Proceedings of the IEEE Conference on Consumer Electronics (ICCE), pp. 1-2, 2016.

[3] I. Simon, D. Morris, and S. Basu: "MySong: Automatic Accompaniment Generation for Vocal Melodies," Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 725-734, 2008.

[4] H. Lee and J. Jang: "i-Ring: A System for Humming Transcription and Chord Generation," Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Vol. 2, pp. 1031-1034, 2004.

[5] M. Allan and C. Williams: "Harmonizing Chorales by Probabilistic Inference," Advances in Neural Information Processing Systems, Vol. 17, pp. 25-32, 2005.

[6] S. A. Raczyński, S. Fukayama, and E. Vincent: "Melody Harmonization with Interpolated Probabilistic Models," Journal of New Music Research, Vol. 42, No. 3, pp. 223-235, 2013.

[7] J. Paiement, D. Eck, and S. Bengio: "Probabilistic Melodic Harmonization," Proceedings of the 19th Canadian Conference on Artificial Intelligence, pp. 218-229, 2006.

[8] J. P. Forsyth and J. P. Bello: "Generating Musical Accompaniment Using Finite State Transducers," Proceedings of the 16th International Conference on Digital Audio Effects (DAFx-13), 2013.

[9] A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber: "A Novel Connectionist System for Unconstrained Handwriting Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31, No. 5, pp. 855-868, 2009.

[10] H. Sak, A. Senior, and F. Beaufays: "Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition," CoRR arXiv:1402.1128, 2014.

[11] M. Wöllmer, A. Metallinou, F. Eyben, B. Schuller, and S. Narayanan: "Context-Sensitive Multimodal Emotion Recognition from Speech and Facial Expression Using Bidirectional LSTM Modeling," Interspeech, pp. 2362-2365, 2010.

[12] I. Liu and B. Ramakrishnan: "Bach in 2014: Music Composition with Recurrent Neural Network," CoRR arXiv:1412.3191, 2014.

[13] D. D. Johnson: "Generating Polyphonic Music Using Tied Parallel Networks," International Conference on Evolutionary and Biologically Inspired Music and Art, pp. 128-143, 2017.

[14] A. E. Coca, D. C. Corrêa, and L. Zhao: "Computer-aided Music Composition with LSTM Neural Network and Chaotic Inspiration," Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN), pp. 1-7, 2013.

[15] K. Choi, G. Fazekas, and M. Sandler: "Text-based LSTM Networks for Automatic Music Composition," CoRR arXiv:1604.05358, 2016.

[16] F. A. Gers, J. Schmidhuber, and F. Cummins: "Learning to Forget: Continual Prediction with LSTM," Neural Computation, Vol. 12, pp. 2451-2471, 2000.

[17] S. Hochreiter and J. Schmidhuber: "Long Short-Term Memory," Neural Computation, Vol. 9, No. 8, pp. 1735-1780, 1997.

[18] K. Veselý, A. Ghoshal, L. Burget, and D. Povey: "Sequence-Discriminative Training of Deep Neural Networks," Interspeech, pp. 2345-2349, 2013.

[19] K. Lee and M. Slaney: "Automatic Chord Recognition from Audio Using an HMM with Supervised Learning," Proceedings of the 7th International Conference on Music Information Retrieval, pp. 133-137, 2006.

[20] G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury: "Deep Neural Networks for Acoustic Modeling in Speech Recognition," IEEE Signal Processing Magazine, Vol. 29, No. 6, pp. 82-97, 2012.
