Autonomous Control of a Tendon-driven Robotic Limb with Elastic Elements Reveals that Added Elasticity can Enhance Learning

Ali Marjaninejad¹, Jie Tan², and Francisco Valero-Cuevas³, Senior Member, IEEE

Abstract — Passive elastic elements can contribute to stability, energetic efficiency, and impact absorption in both biological and robotic systems. They also add dynamical complexity, which makes them more challenging to model and control. The impact of this added complexity on autonomous learning has not been thoroughly explored. This is especially relevant to tendon-driven limbs, whose cables and tendons are inevitably elastic. Here, we explored the efficacy of autonomous learning and control on a simulated bio-plausible tendon-driven leg across different tendon stiffness values. We demonstrate that increasing the stiffness of the simulated muscles can require more iterations for the inverse map to converge, but the resulting map can then perform more accurately, especially in discrete tasks. Moreover, the system is robust to subsequent changes in muscle stiffness and can adapt on-the-go within 5 attempts. Lastly, we tested the system on the functional task of locomotion and found similar effects of muscle stiffness on learning and performance. Given that a range of stiffness values led to improved learning and maximized performance, we conclude that robot bodies and autonomous controllers, at least for tendon-driven systems, can be co-developed to take advantage of elastic elements. Importantly, this also opens the door to development efforts that recapitulate the beneficial aspects of the co-evolution of brains and bodies in vertebrates.

I. INTRODUCTION

Elastic elements are known to contribute passively to a number of advantageous mechanical properties of robotic and biological systems. These include absorbing impacts, storing energy, and providing postural stability.
By absorbing impacts, elastic elements reduce noise and prevent damage to the structural elements and actuators (linkages, hinges, and motors in robots; bones, joints, and musculotendons in animals) or to the environment [1], [2], [3], [4]. Also, opposing pairs of elastic elements act like proportional controllers (that can only pull) and can passively grant postural stability [5], [6], [7], [8], [9], [10], [11], [12]. It is also known that great energetic efficiency can be achieved by storing energy in elastic elements and releasing it at the right time [3], [13], [14]. These benefits, however, come at a cost: elastic elements can add nonlinearities, hysteresis, and oscillatory modes to the system dynamics and, in general, make it harder to model the system and to find accurate and robust analytical control solutions [15]. This is especially the case for analytical control methods that require precise models of the plant and the environment to operate accurately [16], [17], which is, in general, infeasible for most real-world plants and problems. Moreover, the mechanical properties of elastic materials are often susceptible to changes in environmental (e.g., temperature) and use-case (e.g., wear and tear) factors.

An alternative approach to the control of plants with elastic elements is to use control methods that do not depend on prior models and are data-driven, autonomous, or adaptable on the fly. However, the performance of these methods in dealing with the added dynamical complexity introduced by elastic elements has not been thoroughly explored.

¹ A. Marjaninejad is with the University of Southern California, Los Angeles, CA 90089 USA (e-mail: marjanin@usc.edu).
² J. Tan is with Google Brain, Mountain View, CA 94043 (e-mail: jietan@google.com).
³ F. Valero-Cuevas is with the University of Southern California, Los Angeles, CA 90089 USA (corresponding author; e-mail: valero@usc.edu; phone: 213-740-4219).
Moreover, the robustness of such methods to changes in stiffness values, or to operation in different functional regimes (e.g., nonlinear springs), needs to be addressed as well. This problem is under-studied, especially in bio-inspired, tendon-driven systems. Tendon-driven systems are particularly interesting because they can offer great functional agility and versatility as well as freedom of design (e.g., actuator placement and tendon routing) [18], [19], [20], [21], [22]. Moreover, they can help us better understand, and even approach, the diversity and functional versatility of animals by shedding light on the governing principles of vertebrate form and function [23], [24].

These systems, on the other hand, are harder for engineers to model and control analytically for a number of reasons. To begin with, they are simultaneously under- and over-determined: multiple combinations of muscle forces can produce the same net torque at a joint (under-determined), yet a single joint rotation sets the lengths of all muscles that cross it (over-determined). Thus, it can be challenging to find solutions that satisfy all the constraints imposed by the tendons and by the task specifications at the same time [23], [20]. Moreover, because their actuators do not operate directly on the degrees of freedom (as is the case in joint-driven systems), it is challenging to use an off-the-shelf controller (such as a simple PID setup) without access to the dynamical equations of the system or to a forward or inverse kinematics model [25]. Also, tendon-driven systems often require accurate modeling and control strategies for applications such as animation of lifelike figures [26], control of anatomical limbs to understand neurological conditions [27], [28], [29], or functional electrical stimulation of limbs (e.g., [30] or [31]). Here, we explored the efficacy of autonomous learning and control on a simulated bio-plausible tendon-driven leg across different tendon stiffness values.
For the sake of generality, in this first study we used two autonomous learning algorithms (one that builds a data-driven explicit kinematics model of the limb and one that uses end-to-end learning; see Methods) to gauge the effect of actuator elasticity on learning and performance. Our results show that autonomous learning (both with an explicit inverse map and end-to-end) could learn to control the limb across all stiffness values. Our results also show that an appropriate value of added stiffness can enhance learning and precision in all cases and can even lead to the emergence of lower energy consumption. This is significant because the elasticity that is inherent to some types of plants (i.e., tendon-driven systems) can now be leveraged to improve learning and performance.

II. METHODS

In this paper, we studied how adding elastic elements affects autonomous learning in a simulated two-joint, three-tendon limb (similar to [23], [25]) in the MuJoCo environment [32] (Fig. 1a). The muscle model we use consists of a contractile element with force-length-velocity properties [32], [20], a small parallel damper (100 Ns/m), and a parallel elastic element with stiffness value K (see Fig. 1b). Specifically, we studied the convergence of the inverse kinematics map, how its performance accuracy changes with stiffness, and its adaptability when learning with one stiffness value and then performing with a different value.

For learning, we used our autonomous few-shot hierarchical learning algorithm General-to-Particular (G2P) [23] and the end-to-end Proximal Policy Optimization (PPO) autonomous learning algorithm [33]. G2P is a hierarchical autonomous learning algorithm that, at its lower level, creates an inverse kinematics map using output kinematics collected from an initial random set of actuation commands (motor babbling).
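As a rough illustration of the musculotendon model described above, the sketch below sums a contractile force with a parallel spring of stiffness K and the 100 Ns/m parallel damper. It is a minimal sketch under stated assumptions, not MuJoCo's internal muscle implementation: the function name `musculotendon_tension`, the slack length `rest_length`, and the one-sided spring (silent when slack) are hypothetical, and the active force is taken as a given input rather than computed from force-length-velocity curves.

```python
import numpy as np

DAMPING_B = 100.0  # parallel damper from the text, in N*s/m


def musculotendon_tension(active_force, length, velocity, stiffness_k, rest_length):
    """Hypothetical tension (N) of one musculotendon: active + parallel spring + damper.

    active_force: contractile-element force, assumed already computed elsewhere.
    length, velocity: musculotendon length (m) and lengthening rate (m/s).
    stiffness_k: parallel spring stiffness K in N/m (0 to 20k N/m in this study).
    rest_length: hypothetical slack length below which the spring produces no force.
    """
    # Assumption: the parallel spring only pulls once stretched past its slack length.
    stretch = np.maximum(length - rest_length, 0.0)
    # The damper adds tension while lengthening and subtracts it while shortening.
    passive = stiffness_k * stretch + DAMPING_B * velocity
    # Tendons can only pull, so the net tension is clipped at zero.
    return np.maximum(active_force + passive, 0.0)
```

For example, at 1 cm of stretch beyond the assumed slack length and zero velocity, the passive contribution is 0 N with K = 0 and 70 N with K = 7k N/m.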
Systems that use an explicit kinematics model are, in general, easier to study and interpret, more data efficient, and can generalize to a wider range of tasks; however, they can suffer from model inaccuracies, especially during complex dynamical interactions (e.g., contact dynamics, injury to the body, or changes in the environment) [25], [23], [34], [35], [36], [37]. Systems that perform end-to-end learning (such as PPO), on the other hand, usually require a larger number of samples to learn a task, are harder to interpret due to their implicit modeling, and usually do not generalize well across tasks [38], [39], [33], [40], [41]. These methods, however, can achieve better asymptotic performance, even in challenging tasks.

A. Simulated experiments

For this study, we performed three sets of simulated experiments. In all simulations, elastic elements are modeled as elements in parallel with each musculotendon (Fig. 1b); the stiffness values of all elements are equal within each simulation and are referred to simply as the stiffness. The details of each set of simulations are provided below.

1) Controlling the limb with different stiffness values in the muscle model: In this simulation, for each stiffness value, we first randomly activated the muscles and recorded the resulting kinematics (motor babbling [23]) for 3 minutes (100 samples per second). The recorded kinematics are the joint angles, angular velocities, and angular accelerations of both joints (a vector of 6 values).

Fig. 1. (a) The studied tendon-driven limb in the MuJoCo environment. (b) Each musculotendon consists of a muscle model (M), an elastic element (K), and a damper (B).
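The babbling data collection can be sketched as follows. This is an illustrative sketch only: the piecewise-constant activation scheme (each muscle's activation held at a random level in [0, 1] for a random 0.05 to 0.5 s), the helper names, and the use of finite differences to obtain velocities and accelerations are assumptions, not details given in the text. Only the 3-minute duration, the 100 Hz sampling rate, the three muscles, and the 6-value kinematics vector come from the description above.

```python
import numpy as np

FS = 100            # samples per second (from the text)
DURATION_S = 180    # 3 minutes of babbling (from the text)
N_MUSCLES = 3


def babbling_activations(rng, fs=FS, duration_s=DURATION_S, n_muscles=N_MUSCLES,
                         min_hold_s=0.05, max_hold_s=0.5):
    """Random activations in [0, 1], each level held for a random duration (assumed scheme)."""
    n = fs * duration_s
    acts = np.empty((n, n_muscles))
    for m in range(n_muscles):
        i = 0
        while i < n:
            # Hold duration in samples, drawn uniformly between the two bounds.
            hold = int(rng.integers(int(min_hold_s * fs), int(max_hold_s * fs) + 1))
            acts[i:i + hold, m] = rng.uniform(0.0, 1.0)
            i += hold
    return acts


def kinematics_vector(q, fs=FS):
    """Stack angles, velocities, and accelerations of both joints into shape (n, 6).

    q: recorded joint angles with shape (n, 2), in radians.
    Velocities and accelerations are estimated with finite differences (assumption).
    """
    dt = 1.0 / fs
    qd = np.gradient(q, dt, axis=0)
    qdd = np.gradient(qd, dt, axis=0)
    return np.hstack([q, qd, qdd])
```

Each row of `kinematics_vector(q)`, paired with the activation row at the same time index, forms one input/output training sample for the inverse map.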
Next, we trained a multilayer perceptron (MLP) artificial neural network (ANN; one hidden layer with 15 neurons; trained for 20 epochs; 80% training, 20% validation; loss function: MSE; optimizer: Adam) with kinematics as input and activations as output to form the inverse kinematics map (similar to [23], [25]). Finally, this inverse map was used to control the system in two tasks: cyclical and point-to-point movements.

a) Cyclical movements: In this task, the system was commanded to trace a perfect circle in its configuration space (joint angle space), i.e., the joint angles change sinusoidally with a π/2 phase difference. The frequency of these cyclical movements was set to 0.7 Hz, and the task continued for 21 cycles (a total of 30 seconds).

b) Point-to-point movements: Unlike the cyclical movement task, which is smooth and continuous, the point-to-point task consists of discrete joint angle locations connected by rapid movements. In this task, 10 independent random angles (sampled from a uniform distribution within the range of each joint) are selected for each joint. The system is then commanded to go to each joint angle pair and stay there for 3 seconds (a total of 30 seconds).

Similar to our previous work [23], we chose these tasks because they cover both extremes of the movement spectrum: continuous, smooth movements and discrete movements with fast transitions. For each joint, we calculate the error as the root mean square error (RMSE) between the actual and desired joint angles, in radians. We disregard the error for the first 25% of the signal to make sure any initial-condition effect is washed out [23], [25].

2) Adaptability to changes in stiffness: The stiffness value of an elastic element can change as a function of many physical factors, such as temperature, wear and tear, etc.
This can degrade the performance of an autonomously controlled system even if the system performs accurately in the absence of such changes. This task is designed to study this effect, as well as the feasibility of adaptive learning on-the-go (without needing to stop the system and redo the babbling) to compensate for these changes. Here, we first perform motor babbling for a system with an initial stiffness value (call it A) and train the inverse map with the collected data. Then, we change the stiffness value (to, say, B) and command the system to perform a cyclical movement attempt (described above). After each attempt, we concatenate all collected data and refine the inverse map using the cumulative data (the refinement phase of G2P [23]). Here, we show results for up to 5 refinements for a system whose stiffness value has changed (from A to B), as well as for systems that performed both babbling and refinements with the same stiffness value (A to A and B to B), to provide a baseline for comparing adaptation performance.

3) Functional task of locomotion: Studying the ability of our system to create an inverse kinematics map for different stiffness values provides insight into the effects of stiffness on control and learning. However, a precise inverse map does not necessarily translate into better performance in functional tasks that also feature contact dynamics [25]. Also, most autonomous control methods do not use an explicit inverse kinematics map. Therefore, it is important to study the effects of stiffness on the performance of the system in a functional task. We chose a locomotion task that involves contact dynamics, deals with gravity and inertia, and yields a reward as a measure of success. For this task, the limb is connected to a chassis that can move along the x-axis (forward-backward), with friction to stop the system from floating.
The system can also move along the y-axis (up-down), where it is assisted by a spring-damper mechanism (similar to a gantry [25]). Please see the Supplementary Video for the task in action. We performed this task with two leading autonomous learning algorithms. The first is G2P [23], [25], which is specifically designed to handle the challenging task of learning and adaptation with no prior model and only limited experience (a requirement in most real-world applications), and which has been shown to work well on tendon-driven systems. The second is PPO, one of the leading end-to-end learning methods, which, for each observation, predicts activations that will yield high reward. The G2P implementation faithfully followed the original paper [23]: the babbling time was set to 3 minutes, and the exploration-exploitation reward threshold was set to 3 meters of chassis movement in the forward direction. Each attempt consisted of 10 steps (1.3 seconds each). For PPO, we used the PPO1 implementation from the Stable Baselines repository (a fork of OpenAI Baselines). We ran the training for 5000 episodes (1000 samples each; sampling rate: 100 Hz).

[Figure 2: "Learning curves for different stiffness values"; epoch MSE vs. epoch number (1 to 10) for K = 0, 500, 1k, 2k, 4k, 7k, 10k, 15k, and 20k.]
Fig. 2. MSE over the training data as a function of the epoch number across stiffness values (average of 50 Monte Carlo runs).

III. RESULTS

1) Controlling the limb for different stiffness values: Fig. 2 shows the MSE over the training data as a function of the epoch number across stiffness values. We see a consistent pattern in the training error curves: systems with higher stiffness values start with larger errors, yet once enough training rounds (epochs) are performed, they exhibit the smallest training errors.
This pattern can be explained by the fact that more stiffness adds more dynamics to the system, which initially makes them harder for the ANN to capture; once the network has converged, however, these extra dynamics can reduce the size of the solution space [42], [18], [20] (less ambiguity caused by the under-determined nature of the system) and therefore yield more precise predictions. However, these MSE values only show how well the ANN could fit the training data coming from motor babbling (see Methods). Therefore, to study its performance across tasks, we now focus on the results from the cyclical and point-to-point tasks. Fig. 3 shows the RMSE values for this simulation across all tested stiffness values (also see the Supplementary Video). We see that stiffness in the range of 2k to 10k N/m can significantly improve performance compared to zero stiffness or very high stiffness values. This improvement is even more pronounced in the point-to-point task, which is explained by the fact that this task is more prone to the adverse effects of control in under-determined systems (see Discussion and [18], [20]).

2) Adaptability to changes in stiffness: Fig. 4 shows the performance of the system trained and tested with different stiffness values, as well as its progress through refinements. Fig. 4 also shows the performance of systems trained, refined, and tested with the same stiffness values for comparison. In Fig. 4, A and B correspond to 7k N/m and 2k N/m, respectively.
Adaptability between other stiffness values, in general, also followed the same pattern (in all error bars and error shades in this paper, the end-to-end height of the whiskers/shades equals one standard deviation of the data).

[Figure 3: three panels (average across both joints; proximal joint (q0); distal joint (q1)), each plotting RMSE (rad) against stiffness (0, 500, 1k, 2k, 4k, 7k, 10k, 15k, 20k N/m) for the cyclical and point-to-point tasks.]
Fig. 3. RMSE of joint angles as a function of stiffness for cyclical (dark blue) and point-to-point (light blue) tasks. 50 Monte Carlo runs for each case.

[Figure 4: three panels (average across both joints; proximal joint (q0); distal joint (q1)), each plotting RMSE (rad) after babbling and after refinements #1 to #5 for the A_A, B_A, B_B, and A_B conditions.]
Fig. 4. RMSE of the systems trained and tested with different stiffness values. A: 7k N/m and B: 2k N/m. A_B (orange): trained with A, refined and tested with B; B_A (light green): the other way around. The performance of systems trained, refined, and tested with the same stiffness values is shown for baseline comparison (A_A and B_B, red and dark green, respectively). 50 Monte Carlo runs for each case.

The results in Fig. 4 show that it is feasible for a system to create an initial inverse map and then adapt on-the-go while converging to performance similar to that of a system that experienced no change. This is important because it suggests that elastic elements whose properties change with physical factors (temperature, wear and tear, etc.) are feasible in real-world robotic systems.
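For reference, the desired trajectories and the error metric behind Figs. 3 and 4 can be sketched as below, following the task descriptions in Methods (0.7 Hz, 21 cycles, π/2 phase offset; 10 random targets held for 3 s each; per-joint RMSE in radians with the first 25% of samples discarded). The circle's center and radius, the joint ranges passed to `point_to_point_targets`, and all function names are hypothetical, since the text does not specify them.

```python
import numpy as np

FS = 100  # samples per second


def cyclical_targets(freq_hz=0.7, n_cycles=21, center=(0.0, 0.0), radius=(0.5, 0.5), fs=FS):
    """Desired joint kinematics tracing a circle in joint space.

    Returns an (n, 6) array: angles, velocities, and accelerations of both joints.
    center and radius (rad) are hypothetical values.
    """
    duration = n_cycles / freq_hz                  # 21 cycles at 0.7 Hz = 30 s
    t = np.arange(int(round(duration * fs))) / fs
    w = 2.0 * np.pi * freq_hz
    phase = np.array([0.0, np.pi / 2])             # pi/2 phase difference between joints
    arg = w * t[:, None] + phase                   # shape (n, 2)
    r = np.asarray(radius)
    q = np.asarray(center) + r * np.sin(arg)
    qd = r * w * np.cos(arg)
    qdd = -r * w ** 2 * np.sin(arg)
    return np.hstack([q, qd, qdd])


def point_to_point_targets(rng, low, high, n_points=10, hold_s=3.0, fs=FS):
    """Piecewise-constant joint-angle targets: n_points random pairs, each held hold_s."""
    pairs = rng.uniform(low, high, size=(n_points, 2))  # one uniform draw per joint
    return np.repeat(pairs, int(hold_s * fs), axis=0)   # shape (n_points * hold samples, 2)


def joint_rmse(q_actual, q_desired, discard_frac=0.25):
    """Per-joint RMSE in radians, ignoring the first discard_frac of the samples."""
    start = int(discard_frac * len(q_desired))
    err = np.asarray(q_actual)[start:] - np.asarray(q_desired)[start:]
    return np.sqrt(np.mean(err ** 2, axis=0))
```

With equal radii, the two angle columns of `cyclical_targets` satisfy (q0 − c0)² + (q1 − c1)² = r², which is the "perfect circle" in configuration space; the first two columns are what `joint_rmse` would compare against the measured joint angles.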
We used G2P here and showed the feasibility of adapting on-the-go to changes in tendon stiffness values. However, we want to underline that other adaptive learning methods can also be used (e.g., [34], [35], [43]).

3) Functional task of locomotion: In this section, we study the results of the functional task of locomotion for two autonomous learning algorithms, namely G2P and PPO (see Methods). It is important to note that the focus of this section is to study the potential effects and contributions of the elastic elements in an unbiased manner, not to maximize performance (e.g., using feedback to minimize the error [25], or finding the optimal or most efficient solution) or to modify the algorithms to do so.

Fig. 5 shows the results of the G2P implementation of the locomotion task for 50 Monte Carlo runs (also see the Supplementary Video). Fig. 5a shows the success rate (whether the algorithm found a solution that passes the 3 m threshold within 100 exploration attempts). This figure shows that, except at very high stiffness values, the algorithm could find a way to fulfill the task. Fig. 5b shows the final reward for the successful attempts. Since the G2P algorithm does not strictly maximize the reward (it finds a good-enough solution within a few attempts), we do not see large differences among these final rewards. Fig. 5c shows the energy consumption for the attempts with the final reward (here we define energy as the sum of squared activation values over all three muscles and across time). This figure shows that energy consumption is lower for mid-range stiffness values. It is important to note that we did not include an energy cost term in the reward; therefore, this pattern is an emergent feature of the physics of the system. This result motivates future studies focused on using stiffness to reduce energy costs.

[Figure 5: three panels plotting success rate, reward, and energy against stiffness (0, 500, 1k, 2k, 4k, 7k, 10k, 15k, 20k).]
Fig. 5. Results of the locomotion task using the G2P algorithm. 50 Monte Carlo runs for each case.

[Figure 6: (a) learning curves, displacement (m) vs. episode number (0 to 500) for K = 0, 1k, 4k, 10k, and 20k; (b) passing-threshold episode vs. stiffness (N/m); (c) average final rewards, displacement (m) vs. stiffness (N/m).]
Fig. 6. Results of the locomotion task using the PPO algorithm. 50 Monte Carlo runs for each case.

Lastly, Fig. 6 shows the results of the PPO implementation of the locomotion task for 50 Monte Carlo runs (also see the Supplementary Video). Fig. 6a shows that all learning curves exhibit a consistent pattern in which systems with mid-range stiffness values rise faster and also end with higher final rewards. Fig. 6b shows the first episode in which each Fig. 6a curve passed an arbitrary reward threshold (9 m for this figure). The plot can change slightly with the selected threshold, but the pattern is consistent: systems with mid-range stiffness values need fewer episodes to pass any reward threshold. Finally, Fig. 6c shows the final rewards; again, consistent with all other findings of this paper, mid-range stiffness values resulted in higher performance. It is important to note that although the PPO algorithm does not use an explicit inverse map, it builds an implicit one, which explains why its results are consistent with those of the G2P algorithm (which uses an inverse map in a hierarchical structure). One important point we observed in our simulations was oscillatory behavior (chatter) in systems with very high stiffness (see Supplementary Video).
The likely origin of this chatter is that high stiffness in the muscle model can give the system modes at higher resonant frequencies (analogous to high gains for small errors in a proportional controller), which can lead to instability and interfere with the numerical integrator. This happens at high stiffness values even though our MuJoCo model has mild damping and frictional losses distributed throughout the body (i.e., at the joints, contact model, muscles, etc.) to make the system more stable, realistic, and numerically efficient.

IV. DISCUSSION

Here we show, first, the feasibility of autonomous learning and adaptation in the presence of elastic elements in tendon-driven systems and, second, evidence that changes in the parallel stiffness of the actuators (i.e., the muscle model) affect both learning rate and performance. Our results are useful in that they show (i) fast learning and adaptation in systems known to be challenging to control with analytical approaches, and (ii) great promise and opportunity for the design of robotic systems in which tuning the stiffness of the actuators can greatly enhance performance while leveraging the inherent passive properties of elastic elements that also grant stability and potential energy efficiency with, importantly, minimal to no degradation in learning rates. These findings are critical for the future evolution of robot design, which to date has splintered into two main camps: 'conventional' robot design with stiff bodies and actuators [44] vs. 'soft robots' that have few to no stiff elements [45]. Our work now points to a third option that can, in principle, combine the benefits of both approaches by populating the spectrum between them.
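The resonance explanation given for the high-stiffness chatter in the results can be illustrated with a single mass on a linear spring. The sketch below is a toy model, not the limb: the 0.1 kg mass, the 2 ms time step, the omission of damping, and the semi-implicit Euler update are assumptions chosen for illustration. It shows that the natural frequency grows with the square root of stiffness, and that a fixed-step integrator stops producing a bounded oscillation once ω·dt exceeds 2.

```python
import numpy as np


def natural_frequency_hz(stiffness_k, mass):
    """Undamped natural frequency of a mass on a linear spring, in Hz."""
    return np.sqrt(stiffness_k / mass) / (2.0 * np.pi)


def peak_displacement(stiffness_k, mass=0.1, dt=0.002, steps=2000, x0=0.01):
    """Simulate an undamped mass-spring with semi-implicit Euler and return max |x|.

    For this scheme the motion stays bounded only while omega * dt < 2,
    i.e., while stiffness_k < 4 * mass / dt**2. mass, dt, and x0 are hypothetical.
    """
    x, v = x0, 0.0
    peak = abs(x)
    for _ in range(steps):
        v -= (stiffness_k / mass) * x * dt   # update velocity from the current spring force
        x += v * dt                          # then update position with the new velocity
        peak = max(peak, abs(x))
        if peak > 1e3 * abs(x0):             # clearly diverging; stop before overflow
            break
    return peak
```

With these hypothetical values the stability bound is 4m/dt² = 10^5 N/m. Because the bound scales with the effective mass and inversely with the square of the time step, the stiffness at which problems appear in the full limb depends on the inertia seen by each musculotendon and on the simulation settings; these numbers are not calibrated to the limb model.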
In our prior work, we have emphasized that the design space of tendon-driven systems must include both the topology of the limb (i.e., the number, type, and connectivity of its elements) and the parameters of the individual elements (e.g., joints, linkages, and tendons) for both robots and musculoskeletal systems [46], [47]; we have also explored the extreme case of purely data-driven locomotion of tensegrity structures [48] and limbs [23]. However, that work did not explicitly explore the consequences of elasticity for learning per se. We now argue that elasticity is an inevitable element of tendon-driven robots and biological systems (see Introduction), and thus must be systematically and explicitly considered in the current AI wave seeking to develop autonomous learning for robots and to understand neuromuscular control in animals. As such, our results argue for, and enable, the co-development of robot bodies and autonomous controllers that take advantage of elastic elements, which can lead to improved learning and performance while also taking advantage of their intrinsic benefits of stability, energetic efficiency, and impact absorption. It is important to underline that the main focus of this study was not to optimize for performance or energy efficiency. Moreover, we used two recent algorithms that proved suitable for the test case at hand, but other state-of-the-art algorithms could similarly be used in the future to control these systems. Our results, therefore, open the door to development efforts that recapitulate the beneficial aspects of the co-evolution of brains and bodies in vertebrates.

One particularly interesting observation from Fig. 2 is that it was initially easier for the ANN in G2P to fit the data when the muscles had low stiffness values. Then, after a few epochs, the fit was better with higher stiffness values.
This suggests that, in principle, learning would be optimized if one were to start out with low stiffnesses that increase over time. This is paralleled by the fact that most vertebrates start their lives with a more compliant anatomy, which stiffens with development [49], [50], [51]. In our prior work, we have discussed in detail how the over-determined nature of tendon-driven systems with stretch reflexes in the muscles can make them difficult to control [20]. This is because the rotation of a joint will be impeded or disrupted if even one of the muscles that crosses it fails to lengthen (via its stretch reflex). That is, multiple constraints (i.e., muscle lengthenings) must be satisfied when driven by few variables (i.e., joint angles). Such over-determined systems, which have more equations (constraints) than variables, have at most one exact solution and are solved in practice via least-squares error methods. In such methods, a solution is found by finding the set of variables that violates the constraint equations the least (in a Euclidean-norm or sum-of-squares sense). This is why, in the past, we have called the elasticity of musculotendons (the combination of muscle and tendon) a 'critical enabler' of the neural control of smooth movements [20]. The results in Fig. 2 bear this out: it is easier to learn to control a tendon-driven system when low stiffnesses at the muscles provide a large error margin for muscle lengths, at the expense of performance; but stiffening the system once initial learning has taken place will improve performance. This, in a sense, is a form of morphological curriculum learning that can enable new thinking about 'developmental robotics', where changes that happen within an individual's life span improve learning and performance, echoing the work of Bongard in which morphological changes within a single individual aid learning [52].
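The least-squares view of over-determined tendon-driven systems described above can be made concrete with a small linear example: three muscle-length constraints and two joint angles. The moment-arm values in `R`, the sign convention ds = −R·dq, and the function name are hypothetical, chosen only for illustration; the solver is NumPy's `np.linalg.lstsq`.

```python
import numpy as np

# Hypothetical moment-arm matrix R (m): rows = 3 muscles, columns = 2 joints.
# Small joint rotations dq (rad) change muscle lengths by ds = -R @ dq.
R = np.array([[0.02, 0.01],
              [-0.015, 0.02],
              [0.0, -0.025]])


def joint_rotation_from_lengths(ds_desired):
    """Least-squares joint rotation that best matches 3 desired muscle-length changes.

    With three length constraints and only two joint angles the system is
    over-determined: an exact solution exists only if ds_desired lies in the
    column space of -R. Otherwise lstsq returns the dq that minimizes
    ||-R @ dq - ds_desired||_2, and the returned residual is that squared norm.
    """
    dq, residual, rank, _ = np.linalg.lstsq(-R, ds_desired, rcond=None)
    return dq, residual
```

When the requested length changes are consistent, the residual is essentially zero; otherwise even the best joint rotation leaves some muscles longer or shorter than their targets. In this picture, a compliant musculotendon can absorb that residual instead of blocking the rotation, which is the error margin that low stiffness provides early in learning and that later stiffening trades for precision, i.e., the morphological curriculum discussed above.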
Such within-lifetime development is an interesting path for future work and is distinct from 'evolutionary' robotics, which occurs over multiple generations of individuals. Other future work could focus on developing hardware and software that exploit these benefits of elastic elements, especially in tendon-driven systems. This also opens up opportunities for testing autonomous learning algorithms and assessing their performance in more sophisticated designs (such as bipeds or quadrupeds, especially in their physical implementations) and in more challenging tasks and environments. It is important to note that this study worked within the abilities and limitations of MuJoCo, which implements a very particular version of a Hill-type muscle model that does not include a tendon with the elasticity and viscosity parameters of the aponeurosis and tendon [20]. The stiffness values that we changed in the muscle model are those of the elastic element in parallel with the force-generating module, which uses a simple approximation of the force-length and force-velocity properties of muscle [32], [20] and does not include the natural spinal closed-loop control (i.e., afferentation) of muscles [53]. Studying the effects of the series elastic element (which, incidentally, also corresponds to the stress-strain curve of a mechanical cable in a robot) would be an interesting and necessary direction for future work.

CODE AVAILABILITY

The code and the MuJoCo models used in this study, as well as the supplementary video, can be accessed through the project's GitHub repository at: https://github.com/marjanin/tendon_stiffness

ACKNOWLEDGMENTS

This project was supported by NIH Grants R01-052345 and R01-050520, award MR150091 by DoD, and award W911NF1820264 by the DARPA-L2M program, as well as by a USC Provost Fellowship to A.M. and a Consejo Nacional de Ciencia y Tecnología (Mexico) fellowship to D.U.-M.

REFERENCES

[1] A. Ananthanarayanan, M. Azadi, and S. Kim, "Towards a bio-inspired leg design for high-speed running," Bioinspiration & Biomimetics, vol. 7, no. 4, p. 046005, 2012.
[2] P. M. Wensing, A. Wang, S. Seok, D. Otten, J. Lang, and S. Kim, "Proprioceptive actuator design in the MIT Cheetah: Impact mitigation and high-bandwidth physical interaction for dynamic legged robots," IEEE Transactions on Robotics, vol. 33, no. 3, pp. 509–522, 2017.
[3] A. Mazumdar, S. J. Spencer, C. Hobart, J. Salton, M. Quigley, T. Wu, S. Bertrand, J. Pratt, and S. P. Buerger, "Parallel elastic elements improve energy efficiency on the STEPPR bipedal walking robot," IEEE/ASME Transactions on Mechatronics, vol. 22, no. 2, pp. 898–908, 2016.
[4] S. Jain and C. K. Liu, "Controlling physics-based characters using soft contacts," in ACM Transactions on Graphics (TOG), vol. 30, no. 6. ACM, 2011, p. 163.
[5] G. A. Pratt, "Low impedance walking robots," Integrative and Comparative Biology, vol. 42, no. 1, pp. 174–181, 2002.
[6] X. Zhou and S. Bi, "A survey of bio-inspired compliant legged robot designs," Bioinspiration & Biomimetics, vol. 7, no. 4, p. 041001, 2012.
[7] S. Babikian, F. J. Valero-Cuevas, and E. Kanso, "Slow movements of bio-inspired limbs," Journal of Nonlinear Science, vol. 26, no. 5, pp. 1293–1309, 2016.
[8] T. E. Milner, "Contribution of geometry and joint stiffness to mechanical stability of the human arm," Experimental Brain Research, vol. 143, no. 4, pp. 515–519, 2002.
[9] F. A. Mussa-Ivaldi, N. Hogan, and E. Bizzi, "Neural, mechanical, and geometric factors subserving arm posture in humans," Journal of Neuroscience, vol. 5, no. 10, pp. 2732–2743, 1985.
[10] R. Osu and H. Gomi, "Multijoint muscle regulation mechanisms examined by measured human arm stiffness and EMG signals," Journal of Neurophysiology, vol. 81, no. 4, pp. 1458–1468, 1999.
[11] E. J. Perreault, R. F. Kirsch, and P. E. Crago, "Voluntary control of static endpoint stiffness during force regulation tasks," Journal of Neurophysiology, vol. 87, no. 6, pp. 2808–2816, 2002.
[12] E. J. Perreault, R. F. Kirsch, and P. E. Crago, "Effects of voluntary force generation on the elastic components of endpoint stiffness," Experimental Brain Research, vol. 141, no. 3, pp. 312–323, 2001.
[13] S. Seok, A. Wang, M. Y. M. Chuah, D. J. Hyun, J. Lee, D. M. Otten, J. H. Lang, and S. Kim, "Design principles for energy-efficient legged locomotion and implementation on the MIT Cheetah robot," IEEE/ASME Transactions on Mechatronics, vol. 20, no. 3, pp. 1117–1129, 2014.
[14] J. W. Hurst, "The role and implementation of compliance in legged locomotion," Ph.D. dissertation, Carnegie Mellon University, The Robotics Institute, 2008.
[15] J. Tan, G. Turk, and C. K. Liu, "Soft body locomotion," ACM Transactions on Graphics (TOG), vol. 31, no. 4, p. 26, 2012.
[16] D. Nguyen-Tuong and J. Peters, "Online kernel-based learning for task-space tracking robot control," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 9, pp. 1417–1425, 2012.
[17] J. C. Doyle, "Guaranteed margins for LQG regulators," IEEE Transactions on Automatic Control, vol. 23, no. 4, pp. 756–757, 1978.
[18] A. Marjaninejad and F. J. Valero-Cuevas, "Should anthropomorphic systems be redundant?" in Biomechanics of Anthropomorphic Systems. Springer, 2019, pp. 7–34.
[19] A. Marjaninejad, R. Annigeri, and F. J. Valero-Cuevas, "Model-free control of movement in a tendon-driven limb via a modified genetic algorithm," in 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2018, pp. 1767–1770.
[20] F. J. Valero-Cuevas, Fundamentals of Neuromechanics. Springer, 2016.
[21] J. P. King, D. Bauer, C. Schlagenhauf, K.-H. Chang, D. Moro, N. Pollard, and S. Coros, "Design, fabrication, and evaluation of tendon-driven multi-fingered foam hands," in 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids). IEEE, 2018, pp. 1–9.
[22] S. Sueda, A. Kaufman, and D. K. Pai, "Musculotendon simulation for hand animation," ACM Transactions on Graphics (TOG), vol. 27, no. 3, p. 83, 2008.
[23] A. Marjaninejad, D. Urbina-Meléndez, B. A. Cohn, and F. J. Valero-Cuevas, "Autonomous functional movements in a tendon-driven limb via limited experience," Nature Machine Intelligence, vol. 1, no. 3, p. 144, 2019.
[24] A. Marjaninejad, J. A. Berry, and F. J. Valero-Cuevas, "An analytical approach to posture-dependent muscle force and muscle activation patterns," in 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2018, pp. 2068–2071.
[25] A. Marjaninejad, D. Urbina-Meléndez, and F. J. Valero-Cuevas, "Simple kinematic feedback enhances autonomous learning in bio-inspired tendon-driven systems," arXiv preprint, 2019.
[26] S. Lee, M. Park, K. Lee, and J. Lee, "Scalable muscle-actuated human simulation and control," ACM Transactions on Graphics (TOG), vol. 38, no. 4, p. 73, 2019.
[27] M. U. Kurse, H. Lipson, and F. J. Valero-Cuevas, "Extrapolatable analytical functions for tendon excursions and moment arms from sparse datasets," IEEE Transactions on Biomedical Engineering, vol. 59, no. 6, pp. 1572–1582, 2012.
[28] C. M. Niu, K. Jalaleddini, W. J. Sohn, J. Rocamora, T. D. Sanger, and F. J. Valero-Cuevas, "Neuromorphic meets neuromechanics, part I: the methodology and implementation," Journal of Neural Engineering, vol. 14, no. 2, p. 025001, 2017.
[29] K. Jalaleddini, C. M. Niu, S. C. Raja, W. J. Sohn, G. E. Loeb, T. D. Sanger, and F. J. Valero-Cuevas, "Neuromorphic meets neuromechanics, part II: the role of fusimotor drive," Journal of Neural Engineering, vol. 14, no. 2, p. 025002, 2017.
[30] G. E. Loeb, F. J.
Richmond, and L. L. Baker, “The bion devices: in- jectable interfaces with peripheral nerves and muscles, ” Neur osur gical focus , vol. 20, no. 5, pp. 1–9, 2006. [31] P . H. Peckham and J. S. Knutson, “Functional electrical stimulation for neuromuscular applications, ” Annu. Rev . Biomed. Eng. , vol. 7, pp. 327–360, 2005. [32] E. T odorov , T . Erez, and Y . T assa, “Mujoco: A physics engine for model-based control, ” in 2012 IEEE/RSJ International Confer ence on Intelligent Robots and Systems . IEEE, 2012, pp. 5026–5033. [33] J. Schulman, F . W olski, P . Dhariwal, A. Radford, and O. Klimov , “Proximal polic y optimization algorithms, ” arXiv preprint arXiv:1707.06347 , 2017. [34] R. Kwiatkowski and H. Lipson, “T ask-agnostic self-modeling ma- chines, ” Science Robotics , vol. 4, no. 26, p. eaau9354, 2019. [35] A. Cully , J. Clune, D. T arapore, and J.-B. Mouret, “Robots that can adapt like animals, ” Natur e , vol. 521, no. 7553, p. 503, 2015. [36] Y . Y ang, K. Caluwaerts, A. Iscen, T . Zhang, J. T an, and V . Sind- hwani, “Data efﬁcient reinforcement learning for legged robots, ” arXiv pr eprint arXiv:1907.03613 , 2019. [37] D. Nguyen-T uong and J. Peters, “Model learning for robot control: a survey , ” Cognitive processing , vol. 12, no. 4, pp. 319–340, 2011. [38] T . P . Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T . Erez, Y . T assa, D. Silver , and D. Wierstra, “Continuous control with deep reinforce- ment learning, ” arXiv preprint , 2015. [39] J. Schulman, S. Levine, P . Abbeel, M. Jordan, and P . Moritz, “Trust region policy optimization, ” in International conference on machine learning , 2015, pp. 1889–1897. [40] N. Heess, S. Sriram, J. Lemmon, J. Merel, G. W ayne, Y . T assa, T . Erez, Z. W ang, S. Eslami, M. Riedmiller, et al. , “Emergence of locomotion behaviours in rich en vironments, ” arXiv pr eprint arXiv:1707.02286 , 2017. [41] V . Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. V eness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. 
Fidjeland, G. Ostrovski, et al. , “Human-level control through deep reinforcement learning, ” Natur e , vol. 518, no. 7540, p. 529, 2015. [42] B. A. Cohn, M. Szedl ´ ak, B. G ¨ artner , and F . J. V alero-Cue vas, “Feasi- bility theory reconciles and informs alternativ e approaches to neuro- muscular control, ” Fr ontiers in computational neuroscience , vol. 12, 2018. [43] J. Bongard, V . Zykov , and H. Lipson, “Resilient machines through continuous self-modeling, ” Science , vol. 314, no. 5802, pp. 1118– 1121, 2006. [44] T . Y oshikawa, F oundations of r obotics: analysis and contr ol . MIT press, 1990. [45] A. V erl, A. Albu-Sch ¨ affer , O. Brock, and A. Raatz, Soft Robotics . Springer , 2015. [46] J. M. Inouye, J. J. Kutch, and F . J. V alero-Cuev as, “Optimizing the topology of tendon-driven ﬁngers: Rationale, predictions and implementation, ” in The Human Hand as an Inspiration for Robot Hand Development . Springer , 2014, pp. 247–266. [47] F . J. V alero-Cue vas, V . V . Anand, A. Sax ena, and H. Lipson, “Be- yond parameter estimation: extending biomechanical modeling by the explicit exploration of model topology , ” IEEE Tr ansactions on Biomedical Engineering , vol. 54, no. 11, pp. 1951–1964, 2007. [48] J. Rieffel, F . V alero-Cuev as, and H. Lipson, “ Automated discovery and optimization of large irregular tensegrity structures, ” Computers & Structures , vol. 87, no. 5-6, pp. 368–379, 2009. [49] L. Stenroth, J. Peltonen, N. J. Cronin, S. Sipil ¨ a, and T . Finni, “ Age- related differences in achilles tendon properties and triceps surae muscle architecture in viv o, ” Journal of Applied Physiology , vol. 113, no. 10, pp. 1537–1544, 2012. [50] T . D. OBrien, N. D. Reeves, V . Baltzopoulos, D. A. Jones, and C. N. Maganaris, “Mechanical properties of the patellar tendon in adults and children, ” Journal of biomechanics , vol. 43, no. 6, pp. 1190–1195, 2010. [51] J. Gosline, M. Lillie, E. Carrington, P . Guerette, C. Ortlepp, and K. 
Savage, “Elastic proteins: biological roles and mechanical prop- erties, ” Philosophical Tr ansactions of the Royal Society of London. Series B: Biological Sciences , vol. 357, no. 1418, pp. 121–132, 2002. [52] J. Bongard, “Morphological change in machines accelerates the ev o- lution of robust behavior , ” Proceedings of the National Academy of Sciences , vol. 108, no. 4, pp. 1234–1239, 2011. [53] A. Nagamori, C. M. Laine, and F . J. V alero-Cuev as, “Cardinal features of inv oluntary force variability can arise from the closed-loop con- trol of viscoelastic afferented muscles, ” PLoS computational biology , vol. 14, no. 1, p. e1005884, 2018.
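The limitations discussion separates two elastic elements of a Hill-type muscle: the element in parallel with the force-generating module, whose stiffness was varied in this study, and the series elastic element (a tendon, or a cable in a robot), which is left for future work. The Python sketch below illustrates that distinction with a generic, simplified Hill-type formulation. It is not MuJoCo's muscle implementation and not the authors' code. The Gaussian force-length curve, the force-velocity shape (including the assumed 1.5 lengthening plateau), the linear springs, and every parameter value (F_MAX, SLACK_PE, CURV, k_pe, k_se) are illustrative assumptions.

```python
import math

# Illustrative parameters only (assumptions): not taken from the paper or from MuJoCo.
F_MAX = 100.0    # maximum isometric force of the contractile element [N]
SLACK_PE = 1.0   # normalized length at which the parallel elastic element starts to pull
CURV = 0.25      # assumed Hill curvature constant for the shortening branch


def active_force_length(l_norm: float, width: float = 0.45) -> float:
    """Gaussian approximation of the active force-length curve (equals 1.0 at l_norm = 1)."""
    return math.exp(-(((l_norm - 1.0) / width) ** 2))


def active_force_velocity(v_norm: float) -> float:
    """Force-velocity factor; v_norm < 0 is shortening, normalized by max shortening speed."""
    if v_norm <= -1.0:
        return 0.0  # no active force at or beyond the maximum shortening speed
    if v_norm <= 0.0:
        # Hill hyperbola: 1.0 when isometric, 0.0 at maximum shortening speed.
        return (1.0 + v_norm) / (1.0 - v_norm / CURV)
    # Lengthening branch (assumed shape): rises smoothly from 1.0 toward 1.5.
    return 1.5 - 0.5 * math.exp(-v_norm / CURV)


def parallel_elastic_force(l_norm: float, k_pe: float) -> float:
    """Linear spring in parallel with the contractile element; slack below SLACK_PE.

    k_pe is in N per unit of normalized length (assumed units).
    """
    return k_pe * max(0.0, l_norm - SLACK_PE)


def muscle_force(a: float, l_norm: float, v_norm: float, k_pe: float) -> float:
    """Total muscle force: activation-scaled contractile force plus parallel elastic force."""
    active = a * F_MAX * active_force_length(l_norm) * active_force_velocity(v_norm)
    return active + parallel_elastic_force(l_norm, k_pe)


def series_elastic_stretch(force: float, k_se: float) -> float:
    """A linear series element carries the full muscle force, so stretch = F / k_se.

    With k_se in N/m (assumed units), the stretch is in meters.
    """
    return force / k_se


if __name__ == "__main__":
    # Same activation, length, and velocity; only the parallel stiffness changes.
    for k_pe in (0.0, 200.0, 800.0):
        f = muscle_force(a=0.5, l_norm=1.2, v_norm=0.0, k_pe=k_pe)
        stretch = series_elastic_stretch(f, k_se=5000.0)
        print(f"k_pe={k_pe:6.1f}  force={f:7.2f} N  series stretch={stretch:.4f} m")
```

In this sketch, raising k_pe adds force at a given stretched length without altering the active contribution, whereas k_se changes how far an in-series tendon or cable lengthens for a given force; the latter is the effect the study identifies as future work.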
