Simplified decision making in the belief space using belief sparsification


Authors: Khen Elimelech, Vadim Indelman

Khen Elimelech¹ and Vadim Indelman²

Abstract
In this work, we introduce a new and efficient solution approach for the problem of decision making under uncertainty, which can be formulated as decision making in a belief space, over a possibly high-dimensional state space. Typically, to solve a decision problem, one should identify the optimal action from a set of candidates, according to some objective. We claim that one can often generate and solve an analogous yet simplified decision problem, which can be solved more efficiently. A wise simplification method can lead to the same action selection, or one for which the maximal loss in optimality can be guaranteed. Furthermore, such simplification is separated from the state inference and does not compromise its accuracy, as the selected action would finally be applied on the original state. First, we present the concept for general decision problems and provide a theoretical framework for a coherent formulation of the approach. We then practically apply these ideas to decision problems in the belief space, which can be simplified by considering a sparse approximation of their initial belief. The scalable belief sparsification algorithm we provide is able to yield solutions which are guaranteed to be consistent with the original problem. We demonstrate the benefits of the approach in the solution of a realistic active-SLAM problem and manage to significantly reduce computation time, with no loss in the quality of solution. This work is both fundamental and practical, and holds numerous possible extensions.

Keywords
Decision making under uncertainty, belief space planning, POMDP, sparse systems, sparsification, active SLAM

1 Introduction

1.1 Background
In this era, intelligent autonomous agents and robots can be found all around us.
They are designed for various functions, such as operating in remote domains, e.g., underwater and space; imitating humans and interacting with them; performing repetitive tasks; and ensuring safety of operations. They might be physically noticeable, e.g., personal-use drones, industrial robotic arms, and military vehicles; or less so, with the popularization of the internet of things (IoT), smart homes, and virtual assistants. Still, these agents share the same fundamental goal – to autonomously plan and execute their actions. Yet, the increasing demand for these "smart" systems presents new challenges: integration of robotic agents into everyday life requires them to operate in real time, using inexpensive hardware. In addition, when planning their actions, these agents should account for real-world uncertainty in order to achieve reliable and robust performance. There are multiple possible sources for such uncertainty, including dynamic environments, in which unpredictable events might occur; noisy or limited observations, such as an imprecise GPS signal; and inaccurate delivery of actions. Also, problems such as long-term autonomous navigation and sensor placement over large areas often involve optimization of numerous variables. These settings require reasoning over high-dimensional probabilistic states, known as "beliefs". Appropriately, the corresponding planning problem is known as Belief Space Planning (BSP). The objective in such a problem is to select "safe" actions, which account for the uncertainty of the agent's belief. Other relevant instantiations include active Simultaneous Localization and Mapping (SLAM), active sensing, robotic manipulation, and even cognitive tasks, such as dialogue management.
The BSP problem is often modeled as a Partially Observable Markov Decision Process (POMDP), according to which we shall propagate the belief, and evaluate the development of uncertainty, considering multiple courses of action (Kaelbling et al. 1998). Further, proper uncertainty measures, such as differential entropy, are expensive to calculate for high-dimensional and continuous beliefs. Overall, the computational complexity of the problem can turn exceptionally high, thus making it challenging for online systems, or when having limited processing power.

1.2 Objectives and approach overview
The previous discussion leads us to our main goal – allowing computationally efficient decision making. Note that in this study, we differentiate between planning and decision making. Planning is a broad concept, which takes into consideration many aspects, such as goal setting and balancing, generation of candidate actions, accounting for different planning horizons and future developments, coordination of agents, and so on. After refining these aspects, we eventually arrive at a decision problem: considering an initial state, and a given set of candidate actions (or action sequences), we use an objective function to measure the scalar values attained by applying each action on the initial state; to solve the problem, we shall identify the optimal candidate action, which generates the highest objective value. With this rudimentary viewpoint, we dismiss problem-specific attributes, which allows our formulation to address a wider range of problems. Nonetheless, our work heavily focuses on contributing to decision making in the belief space.

¹ Robotics and Autonomous Systems Program, Technion – Israel Institute of Technology.
² Department of Aerospace Engineering, Technion – Israel Institute of Technology.
Corresponding author: Khen Elimelech, Technion, Haifa 3200003, Israel. Email: khen@technion.ac.il
In these decision problems, the initial state is a belief over a (possibly) high-dimensional state, and the objective function is a belief-based information-theoretic value, measured from the propagated (updated) belief, after applying a candidate action. A traditional solution to the decision problem requires calculation of the objective function for each candidate action. We would like to reduce the cost of the solution by sparing this exhaustive calculation and comparison. Instead, we suggest identifying and solving a simplified decision problem, which leads to the same action selection, or one for which the loss in quality of solution can be bounded. A problem may be simplified by adapting each of its components – initial state, objective function, and candidate actions. To allow such analysis, we first provide a general theoretical framework, which does not depend on any problem-specific attributes; the framework allows us to formally quantify the effect of the simplification on the action selection, and to form optimality guarantees for it. We then show how these ideas can be practically applied to high-dimensional BSP problems. In this case, the problem is simplified by considering a sparse approximation of the initial belief, which can be efficiently propagated, in order to calculate the candidates' objective values. The resulting simplified problem can be solved in any desired manner, making our approach complementary to other solvers. Furthermore, while several works already utilize belief sparsification to allow long-term operation and tractable state inference, the novelty in our approach is the exploitation of sparsification exclusively and dedicatedly for efficient decision making. After solving the decision problem, the selected action is then applied on the original belief; by such, we do not compromise the accuracy of the estimated state.
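To make this workflow concrete, here is a toy sketch for a Gaussian belief in information form: candidate actions are ranked using a sparsified information matrix, while the selected action is finally applied on the original, dense one. This is not the paper's sparsification algorithm – the thresholding rule, the rank-1 "actions", and all names are illustrative assumptions.

```python
import numpy as np

def entropy_objective(Lambda):
    # For a Gaussian, differential entropy decreases as log det of the
    # information matrix grows, so we rank actions by log det.
    return np.linalg.slogdet(Lambda)[1]

def sparsify(Lambda, thresh=0.2):
    # Toy sparsification (NOT the paper's algorithm): zero weak
    # off-diagonal correlations of the information matrix.
    L = Lambda.copy()
    weak = np.abs(L) < thresh
    np.fill_diagonal(weak, False)
    L[weak] = 0.0
    return L

rng = np.random.default_rng(0)
n = 6
G = rng.normal(scale=0.1, size=(n, n))
Lambda = np.eye(n) + G @ G.T                  # original (dense) prior belief
candidates = {a: np.outer(e, e) for a, e in
              enumerate(np.eye(n))}           # each action observes one variable

Lambda_s = sparsify(Lambda)
# Select the action using the cheap, sparse belief...
a_star = max(candidates,
             key=lambda a: entropy_objective(Lambda_s + candidates[a]))
# ...but apply it on the ORIGINAL belief, so estimation accuracy is untouched.
posterior = Lambda + candidates[a_star]
```

Note that the sparsified matrix is only ever used to compare candidates; the posterior that the agent keeps is computed from the original belief.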
For clarity, we list the contributions of this work, in the order they are presented in the manuscript:

1. A theoretical framework supporting the concept of decision problem simplification;
2. Formulation of decision making in the belief space, and application of the concept to it;
3. A scalable belief sparsification algorithm;
4. Derivation of quality-of-solution guarantees;
5. Experimental demonstration in a highly realistic active-SLAM scenario, where a significant improvement in run-time is achieved.

Please note that this paper extends our previous publications (Elimelech and Indelman 2017a,b,c). Besides the expanded experimental evaluation, the belief sparsification algorithm, which was previously introduced, is now reformed into a more stable and efficient version. Also, the theoretical formulation includes several revisions and corrections to previously introduced definitions; the conclusive versions are those presented here. Finally, to allow fluid reading, proofs for all theorems, lemmas, and corollaries are given in the appendix.

1.3 Related work
Several works explore ideas similar to the ones presented here. In this section we do our best to provide an extensive review of such works, in comparison to ours. As mentioned, numerous methods consider sparsification for the probabilistic state inference problem, in order to limit the belief size and improve its tractability for long-term operation. Although this is a well-researched concept, these methods do not examine sparsification in the context of planning problems (influence over action selection, computational benefits, etc.). Thrun et al. (2004), for example, showed that in a SLAM scenario, when using the information filter, forcing a certain sparsity pattern on the belief's information matrix can lead to improved efficiency in belief update.
However, they emphasized that the approximation quality was not guaranteed and that certain scenarios could lead to significant divergence. Also, since Dellaert and Kaess (2006) demonstrated the equivalence between sparse matrices and (factor) graphs for belief representation, graph-based solutions for SLAM problems (which are often sparse) have become more popular. Accordingly, methods for graph sparsification have also gained relevance. For example, Huang et al. (2012) introduced a graph sparsification method, using node marginalization. The resulting graph is notably consistent, meaning the sparsified representation is not more confident than the original one. Several other approaches suggest sparsifying the graph using the Chow-Liu tree approximation, and show that the KL-divergence from the original graph remains low (Carlevaris-Bianco et al. 2014; Carlevaris-Bianco and Eustice 2014; Kretzschmar and Stachniss 2012). Hsiung et al. (2018) reach similar conclusions for fixed-lag Markov blankets. Notably, our sparsification method, which is presented both in matrix and graph forms, preserves the dimensionality of the belief, and only modifies the correlations between the variables. It is also guaranteed to exactly preserve the entropy of the belief. The approach described by Mu et al. (2017) separated the sparsification into two stages: problem-specific removal of nodes, and problem-agnostic removal of correlations. The authors then demonstrated the superiority of their scheme over agnostic graph optimization, in terms of collision percentage. This two-stage solution resembles the logic of our sparsification method: first, identifying variables with minimal contribution to the decision problem, and then sparsifying the corresponding elements. Of course, we use such sparsification for planning and not graph optimization. Exploiting sparsity to improve efficiency can also be done in other manners.
Fundamental works (e.g., Davis et al. 2004), alongside newer ones (e.g., Frey et al. 2017; Agarwal and Olson 2012), provide heuristics for variable elimination order or variable pruning order, in order to minimize fill-in during factorization of the information matrix (which is utilized during belief propagation).

In the context of planning under uncertainty and POMDPs, the research community has been extensively investigating solution methods to provide better scalability for real-world problems. Finding optimal solutions (policies) according to the POMDP formulation is often done by utilizing dynamic programming algorithms, such as value and policy iteration (e.g., Porta et al. 2006; Pineau et al. 2006). Such methods are extremely computationally demanding, especially when considering high-dimensional state spaces (i.e., search spaces). These methods are thus generally not suitable for "online" planning problems for autonomous agents, in which we want to infer a specific sequence of actions to be executed immediately. Instead, when considering "online" scenarios, we typically perform a forward search from the current belief, and are often forced to rely on approximated solutions. Standard online POMDP solvers (e.g., Silver and Veness 2010; Ye et al. 2017) often perform search in the state space, and not the belief space, as we care to do here. Works which do consider planning in the belief space typically focus on methods for alleviating the search. For example, some solution methods perform direct (localized) trajectory optimization (e.g., Indelman et al. 2015; Van Den Berg et al. 2012). Otherwise, while building on established motion planners (e.g., Karaman and Frazzoli 2011; Kavraki et al. 1996), works such as the Belief Roadmap (by Prentice and Roy 2009), FIRM (by Agha-Mohammadi et al. 2014), SLAP (by Agha-mohammadi et al. 2018), and others (e.g., by Patil et al.
2014) rely on sub-sampling a finite graph in the belief space, in which the solution can be searched. However, such methods are severely limited, by only allowing propagation of the belief over a single (most-recent) pose through the graph; i.e., they perform low-dimensional pose filtering, rather than high-dimensional belief smoothing, as we do. This forced marginalization of state variables surely compromises the accuracy of the estimation, and limits the applicability to (problems such as) active-SLAM, in which we often wish to examine the information (uncertainty) of the entire posterior state, including the map and/or executed trajectory (Stachniss et al. 2004; Kim and Eustice 2014). Nonetheless, we do not focus on the generation (or sampling) of candidates, but, instead, on efficient comparison of their objective values, by lowering the cost of belief updates. Hence, our approach is complementary to the aforementioned graph-based methods, which focus on generating feasible candidates. We demonstrated this compatibility in our experimental evaluation, where we used a graph-based motion planner (from the most recent pose) to simply generate a set of candidate actions; we then efficiently selected the optimal candidate by propagating the sparsified (high-dimensional) belief, and evaluating its posterior uncertainty. In that regard, we may mention additional works which similarly address the issue of high-dimensional belief propagation, in the context of active-SLAM (e.g., Chaves and Eustice 2016; Kopitkov and Indelman 2017). Also, closely related to our approach, several other works examine approximation of the state or the objective function in order to reduce the planning complexity. A recent approach (Bopardikar et al. 2016) suggested using a bound over the maximal eigenvalue of the covariance matrix as a cost function for planning, in an autonomous navigation scenario.
Benefits of using this cost function include easy computation, holding an optimal substructure property (incremental search), and the ability to account for misdetection of measurements. Yet, the actual quality of results in terms of final uncertainty, when measured with conventional methods, is unclear. Their usage of bounds in an attempt to improve planning efficiency resembles aspects of our work; however, we use bounds to quantify the quality of solution. As they mention in their discussion, an unanswered question is the difference in quality of solution between planning using the exact maximal eigenvalue, and planning using its bound. Our theoretical framework might be able to provide an answer to this question. Boyen and Koller (1998) suggested maintaining an approximation of the belief for efficient state inference. This approximation is done by dividing the state variables into a set number of classes, and then using a product of marginals, while treating each class of variables as a single "meta variable". A k-class belief simplification cuts the original exponential inference complexity by a factor of k. The study showed that in rapidly-mixing POMDPs the expectation of the error could be bounded. This simplification method was later examined under a restrictive planning scenario (McAllester and Singh 1999). There, the planning was performed using a planning-tree search, in which a constant number of possible observations was sampled for each tree level, and again assuming a rapidly-mixing POMDP; the error induced by planning in the approximated belief space can be bounded as well. This method shares similar objectives with our work, but examines a very specific scenario, which limits its generality. In the approach described by Roy et al. (2005), the authors attempted to find approximate POMDP solutions by utilizing belief compression, which was done with a PCA-based algorithm.
This key idea is similar to ours; yet, in that work, the objective value calculation (i.e., decision making) still relied on the original decompressed belief, instead of the simplified one. Thus, no apparent computational improvement was achieved in planning complexity. The paper also did not make a comparison of this nature, and only presented an analysis of the quality of compression. The work presented by Indelman (2015, 2016) contained the first explicit attempt to use belief sparsification to specifically achieve efficient planning. The papers showed that using a diagonal covariance approximation, a similar action selection could usually be maintained, while significantly reducing the complexity of the objective calculation. This claim, however, is most often not guaranteed. Optimal action selection was only proved under severely simplifying assumptions – when candidate actions and observations only update a single state variable, with a rank-1 update of the information. This attempt inspired our extensive research and in-depth, formal analysis. Finally, it is worth mentioning that the idea of examining only the order of candidate actions, instead of their cardinal objective values, sometimes appears in the context of economics under the term ordinal utility (e.g., Manski 1988); this term, however, is not prominent in the context of artificial intelligence. We examine a similar idea in our theoretical framework, to follow.

2 Simplified decision making
To begin with, let us consider a decision problem P, which we formally define in Definition 1.

Definition 1. A decision problem P is a 3-tuple (ξ, A, V), where ξ is the initial state, from which we examine a set of candidate actions A (finite or infinite), using an objective function V : {ξ} × A → ℝ. Solving the problem means selecting the optimal action a*, such that

a* = argmax_{a ∈ A} V(ξ, a).
(1)

According to our suggested solution approach, we wish to generate and solve a simplified yet analogous decision problem P_s := (ξ_s, A_s, V_s), which results in the same (or similar) action selection, but for which the solution is more computationally efficient. This can be achieved by altering or approximating any of the problem components – initial state, candidate actions, or objective function – in order to alleviate the calculation of the candidates' objective values. Nonetheless, approximating each of these components represents a different simplification approach. For example, there is a logical difference between simplifying the initial state (i.e., examining different states under the same objective function), and simplifying the objective function (i.e., examining the same state under different objectives); in the first case, we would like to maintain a certain relation between states, and in the second one, a relation between functions. Next, we will introduce additional ideas to help formalize our goal, and see how these can guide us towards designing effective simplification methods, which are guaranteed to preserve the quality of solution.

2.1 Analyzing simplifications

2.1.1 Simplification loss
Examining a simplified decision problem may lead to a loss in the quality of solution, when the selected action is not the real optimal action. We can express this loss with the following simplification quality measure:

Definition 2. The simplification loss between a decision problem P := (ξ, A, V) and its simplified version P_s := (ξ_s, A_s, V_s), due to sub-optimal action selection, is

loss(P, P_s) := V(ξ, a*) − V(ξ, a*_s),

where

a* = argmax_{a ∈ A} V(ξ, a),   a*_s = argmax_{a_s ∈ A_s} V_s(ξ_s, a_s).
(2)

To put it in words, this loss is the difference between the maximal objective value, attained by applying the optimal candidate action a* on ξ, and the value attained by applying a*_s (the action returned from the solution of the simplified problem) on ξ. This idea is illustrated in Fig. 1a. We implicitly assume that the original objective function V can accept actions from the simplified set of candidates A_s. When the solutions to the problems agree, loss(P, P_s) = 0. Most often, it is indeed possible to settle for a simplified decision problem formulation (which can lead to a sub-optimal action), in order to reduce the complexity of action selection; though, it is important to quantify and bound the potential loss, before applying the selected action, in order to guarantee that this solution can be relied on.

Figure 1. P_s is a simplified version of a decision problem P; the graphs show the objective values of each problem's candidate actions. (a) a*_s is the optimal action according to the simplified problem, and a* is the real optimal action; the difference between the (real) objective values of these two actions is the loss induced by the simplification. (b) The offset measures the maximal difference between respective objective values from the two problems, and does not require explicitly identifying a*/a*_s.

2.1.2 Simplification offset
To assess the simplification loss, we suggest identifying the simplification offset, which acts as an intuitive "distance" measure in the space of decision problems:

Definition 3. The simplification offset of a candidate a ∈ A, between a decision problem P := (ξ, A, V) and its simplified version P_s := (ξ_s, A, V_s), is

δ(P, P_s, a) := |V(ξ, a) − V_s(ξ_s, a)|.
(3)

Overall, the simplification offset between P and P_s is

∆(P, P_s) := max_{a ∈ A} δ(P, P_s, a).   (4)

Unlike the loss, the offset (which is illustrated in Fig. 1b) measures the maximal difference between respective objective values from the two problems, and does not require explicitly identifying the optimal actions. Further, for each candidate a ∈ A, the offset defines an interval around the respective approximated value V_s(ξ_s, a), in which the real value V(ξ, a) must lie, i.e.:

V_s(ξ_s, a) − δ(a) ≤ V(ξ, a) ≤ V_s(ξ_s, a) + δ(a).   (5)

Notably, the offset represents only the size of this interval, and not its location on the value axis (around V_s(ξ_s, a)). This means that the offset, in contrast to the loss, is a property of the simplification method, and does not depend on the solution of P nor P_s. It can thus potentially be examined without explicitly solving either of the problems, nor calculating V or V_s, as we shall see. Note that when defining the offset, we implicitly considered that the two problems examine the same set of candidate actions; this will be valid from now on, unless stated otherwise. Also, for brevity, we will no longer write the initial state as input to V/V_s, nor V, V_s as input to δ/∆, whenever the context is clear. Next, we will explain how we can utilize the offset to infer loss guarantees.

2.2 Optimality guarantees

2.2.1 Bounding the offset
Obviously, knowing the offset exactly for every action would be equivalent to having access to the original solution. We would thus usually rely on a bound of the offset to infer loss guarantees. As mentioned, the offset measures the difference between respective objective values from the original and simplified problems, and is independent of their solutions.
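Definitions 1-3 translate directly into code. The following is a minimal sketch with hypothetical names, assuming a finite candidate set and real-valued objective functions:

```python
def solve(xi, A, V):
    """Definition 1: pick the candidate maximizing V(xi, a)."""
    return max(A, key=lambda a: V(xi, a))

def loss(P, Ps):
    """Definition 2: optimality loss from trusting the simplified problem."""
    (xi, A, V), (xi_s, A_s, V_s) = P, Ps
    a_star = solve(xi, A, V)          # true optimum
    a_star_s = solve(xi_s, A_s, V_s)  # optimum of the simplified problem
    return V(xi, a_star) - V(xi, a_star_s)

def offset(P, Ps):
    """Definition 3 / Eq. (4): max |V(xi, a) - V_s(xi_s, a)| over candidates."""
    (xi, A, V), (xi_s, _, V_s) = P, Ps
    return max(abs(V(xi, a) - V_s(xi_s, a)) for a in A)

# Toy example: exact vs. perturbed objective over 4 candidate actions.
A = [0, 1, 2, 3]
V = lambda xi, a: [1.0, 4.0, 3.5, 2.0][a]
V_s = lambda xi, a: [1.2, 3.4, 3.6, 2.1][a]   # simplified (approximate) values
P, Ps = (None, A, V), (None, A, V_s)

print(loss(P, Ps))    # V picks a=1, V_s picks a=2, so loss = 4.0 - 3.5
print(offset(P, Ps))  # largest per-candidate gap, |4.0 - 3.4|
```

On this instance the loss (0.5) indeed stays within twice the offset (0.6), in agreement with the bound of Lemma 1, to follow.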
Thus, we can evaluate and attempt to bound the offset before solving the problem; by utilizing the general structure of problems in our domain, and knowing how they are affected by the simplification method in question, we can try to infer a symbolic formula for the offset, and draw conclusions from it. This type of analysis often allows us to draw general conclusions regarding the simplification method, rather than a specific problem. For example, in Section 3.2, we discuss a novel belief simplification method, used to reduce the cost of planning in the "belief space". By symbolically analyzing the offset (for any decision problem in this domain), we could identify the conditions under which its value is zero, and the simplification is guaranteed to induce no loss. This idea is later demonstrated in Section 3.3.1. Still, we note that providing completely general guarantees, which are valid for all the decision problems in the domain, is not always possible from pure symbolic analysis. Sometimes, to draw decisive conclusions, we must assign the properties of the specific decision problem we wish to solve. If we failed to reach valuable conclusions from such "pre-solution" symbolic analysis of the offset, we can try to bound it "post-solution", by utilizing the calculated (simplified) values, and (any) known bounds, or limits, for the real objective values; these limits should be selected based on domain knowledge of the specific problem. Then, the following can be easily derived from the definition of the simplification offset:

δ(a) ≤ max{ V_s(ξ_s, a) − LB{V(ξ, a)},  UB{V(ξ, a)} − V_s(ξ_s, a) },   (6)

where LB, UB stand for lower and upper bounds, respectively. We demonstrate how to practically utilize this idea in Section 3.3.2.

2.2.2 Bounding the loss
As discussed, our goal is to guarantee that relying on a certain simplification would not induce more than the acceptable loss.
As with the offset, bounding the loss can be done on two occasions: (i) pre-solution analysis – this type of analysis occurs before solving the simplified problem (based on the availability of "symbolic" offset bounds); and (ii) post-solution analysis – which occurs after solving the simplified problem (but before applying the selected action). Surely, we would prefer to know if the simplified solution is worthwhile before investing in it; for example, we may consider the case where action execution is costly (as measured with the objective function), and beyond a certain loss, improving the decision making efficiency is not worth the execution of a sub-optimal action. Nonetheless, post-solution guarantees are typically tighter, as we can also rely on the calculated values. The notion of offset allows us to seamlessly derive both types of guarantees, and easily improve them when refining the solution, or given access to new information. From the properties of the absolute value, it is also easy to infer that the offset is a valid metric (a distance measure) between decision problems. Indeed, Lemma 1 intuitively indicates that when the offset between a problem and its simplification is small, then the induced loss is also small, and the action selection stays "similar".

Lemma 1. For any two decision problems P and P_s,

0 ≤ loss(P, P_s) ≤ 2 · ∆(P, P_s).   (7)

This conclusion is potentially reachable in pre-solution analysis, as it does not rely on the simplified solution, i.e., the calculated objective values; when these become available, in post-solution analysis, this bound can be refined, as indicated in Lemma 2.

Lemma 2. For any two decision problems P and P_s,

loss(P, P_s) ≤ max{ 0,  2 · ∆(P, P_s) + max_{a ≠ a*_s} V_s(a) − V_s(a*_s) }.
(8)

For an extended discussion regarding the derivation of loss guarantees, including a proof of Lemma 2 and more intricate loss bounding techniques, please refer to Elimelech (2021). Specifically, when we do not have access to a symbolic formula for the offset, and instead rely on the "post-solution offset bound" (6), the expression in (8) simplifies to:

loss(P, P_s) ≤ max_{a ≠ a*_s} UB{V(a)} − LB{V(a*_s)}.   (9)

Notably, such post-solution analysis allows us to understand not only what the maximal possible loss is, but also which candidates are likely to cause it.

2.3 Reducing simplification bias
Previously, we suggested the simplification offset as a "distance measure" between decision problems, and recognized that it (or its bound) can be used to bound the simplification loss. However, this distance measure may be deceiving, as the problems may appear to be separated by a large offset, even when the simplification induces a small loss. Specifically, this can be the case when the simplification causes a large "bias" in the simplified objective values. In the following section, we introduce another concept to help us handle such scenarios.

2.3.1 Action consistency
We point out a key observation: to solve the decision problem, we only need to sort (or rank) the candidate actions in terms of their objective function value; changing the values themselves, without changing the order of actions, does not change the action selection. Hence, when two problems maintain the same order of candidate actions, their solution is equivalent. In this case, we can simply say that the two problems are action consistent, as demonstrated in Fig. 2a.

Definition 4. Two decision problems, P_1 := (ξ_1, A, V_1) and P_2 := (ξ_2, A, V_2), are action consistent, marked P_1 ≃ P_2, if the following applies ∀ a_i, a_j ∈ A:

V_1(ξ_1, a_i) < V_1(ξ_1, a_j) ⟺ V_2(ξ_2, a_i) < V_2(ξ_2, a_j).
(10)

If also V_1 ≡ V_2, we can simply say that ξ_1, ξ_2 are action consistent, and mark ξ_1 ≃ ξ_2. This relation holds several interesting properties.

Lemma 3. Action consistency (≃) is an equivalence relation; i.e., any three decision problems P_1, P_2, P_3 satisfy the following properties:
1. Reflexivity: P_1 ≃ P_1.
2. Symmetry: P_1 ≃ P_2 ⟺ P_2 ≃ P_1.
3. Transitivity: P_1 ≃ P_2 ∧ P_2 ≃ P_3 ⟹ P_1 ≃ P_3.

Lemma 3 implies that the entire space of decision problems is divided into separate equivalence classes of action consistent problems. Lemma 4 adds that we can transfer between action consistent problems using monotonically increasing functions. We remind again that all proofs are given in Appendix B.

Lemma 4. For any two decision problems P_1 and P_2,

P_1 ≃ P_2 ⟺ the mapping f : V_1(ξ_1, a) ↦ V_2(ξ_2, a) is monotonically increasing.   (11)

Meaning, if the (scalar) mapping of respective objective values between the two problems agrees with a monotonically increasing function (e.g., a constant shift, a linear transform, or a logarithmic function), then the problems are action consistent. If this mapping is not monotonically increasing, then the problems are not action consistent.

2.3.2 Unbiased simplification offset
The notion of action consistency can help us achieve better guarantees when utilizing our previously developed analysis approach. We now understand that, when deriving loss bounds, instead of

Figure 2. (a) Each graph represents the objective values of the candidate actions of a certain decision problem; although the values are different, all the graphs maintain the same trend among the actions, and therefore the problems are action consistent.
(b) The simplification offset ∆ between P and P_s is the maximal difference between the values of respective actions. The offset can be reduced by utilizing a monotonically increasing function f (here we used a constant shift), which leads to a less biased yet action-consistent problem P_s^f.

examining a simplified problem P_s, we can, equivalently, examine any other problem P_s^f that is action consistent with it. Further, such a problem will necessarily be of the form P_s^f ≐ (ξ_s, A, f ∘ V_s), where f is monotonically increasing. Accordingly, instead of examining the simplification offset, as considered thus far, we can examine the unbiased simplification offset:

Definition 5. The unbiased simplification offset between a decision problem P ≐ (ξ, A, V) and its simplified version P_s ≐ (ξ_s, A, V_s) is

∆*(P, P_s) ≐ min { ∆(P, P_s^f) | f : ℝ → ℝ is monotonically increasing ∧ P_s^f ≐ (ξ_s, A, f ∘ V_s) }. (12)

The unbiased offset is the minimal offset between P and any problem action consistent with P_s. A demonstrative example appears in Fig. 2b. Specifically, P ≃ P_s if and only if the unbiased offset is zero:

Lemma 5. For any two decision problems P and P_s,

P ≃ P_s ⟺ ∆*(P, P_s) = 0. (13)

Thankfully, our previous conclusions still hold, and we can use the unbiased simplification offset to bound the loss:

Lemma 6. For any two decision problems P and P_s,

0 ≤ loss(P, P_s) ≤ 2 · ∆*(P, P_s). (14)

Since ∆*(P, P_s) ≤ ∆(P, P_s^f) for any monotonically increasing f, we can symbolically develop ∆(P, P_s^f) for any such f that is convenient, in order to bound the loss; such a function should help "counter" the effect of the simplification on the objective values. We may also recognize that the unbiased offset satisfies the triangle inequality (like the standard offset):

Lemma 7.
For any three decision problems P_1, P_2, and P_3, the unbiased simplification offset satisfies the triangle inequality, i.e.,

∆*(P_1, P_2) + ∆*(P_2, P_3) ≥ ∆*(P_1, P_3). (15)

This property can potentially help in bounding the loss when applying multiple simplifications. However, unlike the standard offset, the unbiased offset is scaled according to the original objective values (like the loss), and is asymmetric in its input arguments. It is, therefore, not considered a metric*.

We may also note that the notions of action consistency and simplification offset are related to the concept of "rank correlation" – a scalar statistic which measures the correlation between two ranking vectors (see Kendall 1948). Yet, such ordinal vectors are oblivious to the cardinal objective values and, therefore, cannot be used to bound the simplification loss. The rank correlation coefficient mostly serves for statistical analysis, as its calculation requires perfect knowledge of the ranking vectors. Since the rank variables are not independent of each other, a change or addition of a single vector entry may subsequently lead to a change in all other entries, and require complete recalculation of the correlation coefficient. On the other hand, the concepts we introduced rely on a "local relation" between the problems: to check for action consistency, we only examine pairs of actions at a time; and to evaluate the offset – only pairs of respective objective values. Addition of candidates, for example, does not affect these relations between the existing candidates. As we explain next, this locality can be utilized to derive offset and loss bounds.

3 Decision making in the belief space

In the previous section, we examined the concept of decision problem simplification. We now wish to practically apply this idea to allow efficient decision making under uncertainty, which we formulate as decision making in the belief space.
In this domain, the initial state of the decision problem is actually a probability distribution ("belief"), and, as to be explained, the problem is simplified by considering a sparse approximation of it. We provide an appropriate sparsification algorithm, and then show that the induced loss can be bounded. First of all, we define the problem.

3.1 Problem definition

3.1.1 Belief propagation

We consider a sequential probabilistic process. At time-step k, an agent transitions from pose x_{k−1} to pose x_k, using a control u_k. It then receives an observation of the world z_k, based on its updated state. The agent's state vector X_k ≐ (x_0^T, ..., x_k^T, L_k^T)^T consists of the series of poses, and may also include external variables, which are introduced by the observations; for example, in a full-SLAM scenario, L_k can stand for the positions of maintained landmarks. Pose transition and observation are both probabilistic operations, which induce probabilistic constraints over the state variables, known as factors. Here, we assume the transition and observation models are described with the following dependencies:

x_k = g_k(x_{k−1}, u_k) + w_k,  w_k ∼ N(0, W_k), (16)
z_k = h_k(X_k) + v_k,  v_k ∼ N(0, V_k), (17)

where W_k, V_k are the covariance matrices of the respective normally-distributed (Gaussian) zero-mean noise models w_k, v_k, and g_k, h_k are deterministic functions. At each time-step, the agent maintains the posterior distribution over its current state vector X_k, given the controls and observations taken until that time; this distribution, which is defined by the product of these factors, is also known as its belief:

b_k ≐ P(X_k | u_{1:k}, z_{1:k}) ∝ ∏_{i=1}^{k} f_{u_i} f_{z_i}, (18)

where u_{1:k} ≐ {u_1, ..., u_k} and z_{1:k} ≐ {z_1, ..., z_k}, and f_{u_i}, f_{z_i} are the factors matching the respective controls and observations.
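As a concrete (and purely hypothetical) instance of the models in (16)–(17), one may picture a planar robot whose pose is its 2D position, with an additive odometry transition and a relative-position landmark observation; the following numpy sketch samples the zero-mean Gaussian noise terms accordingly. All dimensions and model choices here are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative planar models: pose x in R^2, landmark l in R^2.
W = 0.01 * np.eye(2)   # transition noise covariance W_k
V = 0.04 * np.eye(2)   # observation noise covariance V_k

def g(x_prev, u):
    # Transition model (16): a simple odometry step.
    return x_prev + u

def h(x, l):
    # Observation model (17): relative position of the landmark.
    return l - x

x0 = np.zeros(2)
u1 = np.array([1.0, 0.0])
l1 = np.array([2.0, 1.0])

# One propagation step, with sampled zero-mean Gaussian noise.
w = rng.multivariate_normal(np.zeros(2), W)
v = rng.multivariate_normal(np.zeros(2), V)
x1 = g(x0, u1) + w          # noisy pose transition
z1 = h(x1, l1) + v          # noisy observation of l1
```

Each executed control and received observation contributes one factor (f_{u_i} or f_{z_i}) to the product in (18).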
As widely considered, by utilizing local model linearization, we may conclude that, given the previously-defined models, the belief b_k is also normally-distributed (for the full derivation see Elimelech (2021)). Hence, to describe it, we can use a covariance matrix Σ_k, or equivalently, its inverse, the (Fisher) information matrix Λ_k:

b_k = N(X*_k, Σ_k) ≡ N(X*_k, Λ_k^{−1}). (19)

The matrices are symmetric, and the order of their rows and columns matches the specific order of variables in the state. We may now reason about a posterior belief b_{k+1}, after performing a control u_{k+1} and taking an observation z_{k+1}:

b_{k+1} ≐ P(X_{k+1} | u_{1:k+1}, z_{1:k+1}) ∝ b_k · P(x_{k+1} | x_k, u_{k+1}) · P(z_{k+1} | X_{k+1}). (20)

This belief remains normally-distributed and can be described with the following information matrix:

Λ_{k+1} = Λ̆_k + G_{k+1}^T W_{k+1}^{−1} G_{k+1} + H_{k+1}^T V_{k+1}^{−1} H_{k+1}, (21)

where the matrices G_{k+1} and H_{k+1} are the Jacobians ∇g_{k+1}|_{X_{k+1}} and ∇h_{k+1}|_{X_{k+1}}, respectively, around some initial estimate, and Λ̆_k is the augmented prior information matrix. Since controls and observations may introduce new variables to the state vector, its size at time-step k often does not match its size at time-step k+1. Hence, the prior information matrix Λ_k should be augmented to accommodate these new variables. We use the accent ˘ to indicate augmentation of the prior information matrix (with entries of zero) to match the posterior size. Adding new variables is possible at any index in the state, as long as we make sure the augmentation keeps the same variable order.

* Still, the aforementioned properties, along with the obvious non-negativity, make the unbiased offset a quasi-metric (or asymmetric metric), which induces an appropriate topology on the space of decision problems, as explained by Künzi (2001).
If the prior state is of size n, and we add m new variables to the end of it, then

Λ̆_k ≐ [ Λ_k^{n×n}  0^{n×m} ; 0^{m×n}  0^{m×m} ]. (22)

The expression in (21) can be written in a more compact form, by marking the collective Jacobian J_{k+1}^δ, which encapsulates the new information regarding the control and the succeeding observation:

Λ_{k+1} = Λ̆_k + J_{k+1}^{δ T} J_{k+1}^δ,  where  J_{k+1}^δ = [ W_{k+1}^{−1/2} G_{k+1} ; V_{k+1}^{−1/2} H_{k+1} ]. (23)

Each belief update can be described using a collective Jacobian of this form. Thanks to the additivity of the information, we can easily examine the information matrix of the posterior belief b_{k+T} after applying a sequence of T controls u ≐ u_{k+1:k+T}; the respective collective Jacobians of each control can simply be stacked to yield the collective Jacobian U of the entire sequence u:

Λ_{k+T} = Λ̆_k + ∑_{t=1}^{T} J_{k+t}^{δ T} J_{k+t}^δ ≐ Λ̆_k + U^T U,  where  U ≐ [ J_{k+1}^δ ; ... ; J_{k+T}^δ ]. (24)

3.1.2 Decision making

At time-step k, the agent performs a planning session. According to its current (prior) belief b_k, it wishes to select the control sequence which minimizes the expected uncertainty in the future (posterior) belief. To measure the uncertainty we use the differential entropy, which, for a normally-distributed belief b of state size n, with an information matrix Λ, is

H(b) = (1/2) · ln((2πe)^n / |Λ|) = −(1/2) · (ln|Λ| − n · ln(2πe)), (25)

where |·| represents the determinant operation. Although other uncertainty measures with a lower computational cost exist, e.g., the trace of the covariance matrix, the entropy bests those by taking inter-variable correlations into account; those can have a dramatic effect on the measured uncertainty, and are crucial for correct analysis.
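To make the update rule concrete, the sketch below (with arbitrary, illustrative dimensions and a synthetic prior) zero-augments a prior information matrix for m new variables as in (22), stacks two per-step collective Jacobians into U, and forms the posterior information Λ_{k+T} = Λ̆_k + UᵀU as in (24); the final assertion checks the additivity property the text relies on.

```python
import numpy as np

rng = np.random.default_rng(1)

n, m = 6, 2                        # prior state size, newly added variables
A = rng.standard_normal((n, n))
Lam = A @ A.T + n * np.eye(n)      # a synthetic, positive-definite prior information matrix

# Eq. (22): zero-augment the prior to the posterior size n + m.
Lam_aug = np.zeros((n + m, n + m))
Lam_aug[:n, :n] = Lam

# Collective Jacobians of two steps (rows are noise-whitened constraints).
J1 = rng.standard_normal((3, n + m))
J2 = rng.standard_normal((3, n + m))
U = np.vstack([J1, J2])            # Eq. (24): stack the per-step Jacobians

Lam_post = Lam_aug + U.T @ U       # posterior information matrix

# Additivity of the information: per-step contributions sum to the same result.
assert np.allclose(Lam_post, Lam_aug + J1.T @ J1 + J2.T @ J2)
```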
Thus, while utilizing the information update rule from (24), we define the following information-theoretic value, or objective function, which measures the expected information gain between the current and final beliefs:

Ṽ(b_k, u) ≐ E_Z [H(b_k) − H(b_{k+T})], (26)

where u is a candidate control sequence, and Z is the set of observations taken while performing this sequence. We may also take the common assumption of achieving the most likely observations, around the current mean (the "maximum likelihood" assumption, as examined by Platt et al. (2010)), which would allow us to drop the expectation from this expression. We will also drop the augmentation mark and time index from now on, for the sake of concise writing. Overall, from an initial belief b, and considering a given set of candidate control sequences U, we are interested in solving the decision problem P ≐ (b, U, V), where V is the objective function:

V(b, u) ≐ (1/2) · ( ln|Λ̆ + U^T U| − ln|Λ| − m · ln(2πe) ), (27)

Λ is the information matrix of the prior belief b, U is the collective Jacobian of u, and m is the number of variables added to the state when executing u (the difference between the number of columns in U and in Λ). For clarification, we described the process as sequential to conform to the common POMDP framework; we treat every planning session as a separate decision problem. Further, the "maximum likelihood" assumption is not essential, but is used to achieve a clear discussion, where each candidate control sequence can be described with a single collective Jacobian; for a generalized discussion, where this assumption is relaxed, and where we also allow examination of candidate policies, please see Elimelech (2021). Finally, we can use the information matrix to examine the future beliefs, even if the state inference process is not based on such an information smoother.
If the initial information matrix is not provided, it can be calculated by inverting the covariance matrix.

3.1.3 The square root matrix

An alternative way to represent the belief b_k (and propagate it) is using the upper triangular square root matrix R_k of the information matrix Λ_k, given (e.g.) by calculating the Cholesky factorization:

Λ_k = R_k^T R_k. (28)

Like Λ_k, the order of rows and columns of R_k also matches the order of variables in the state. Prominent state-of-the-art SLAM algorithms, e.g., iSAM2 (Kaess et al. 2012), rely on this representation, as it allows the calculation of the posterior mean (state inference) to be performed incrementally, while exploiting inherent sparsity. Our belief simplification method, as described in the following section, also relies on this representation. Unfortunately, in this form, the information update loses its convenient additivity property, and requires re-calculation (or update) of the factorization, in order to find the posterior square root matrix R_{k+T}, such that

R_{k+T}^T R_{k+T} = Λ_{k+T} = R̆_k^T R̆_k + U^T U, (29)

where U is defined as in (24), and R̆_k marks an appropriate augmentation of the prior root matrix:

R̆_k ≐ [ R_k^{n×n}  0^{n×m} ]. (30)

On the other hand, the determinant of the posterior information can be calculated in linear time – by multiplying the diagonal elements of this triangular matrix. The objective function (27) can thus be re-written as

V(b, u) ≡ (1/2) · ( ∑_{i=1}^{N} ln(R_{ii}^+)^2 − ∑_{i=1}^{n} ln(R_{ii})^2 − m · ln(2πe) ), (31)

where n is the prior state size, N is the posterior state size, R^+ marks the posterior square root matrix, and the subscript ·_{ij} marks the matrix element in the i-th row and j-th column.
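The equivalence of (27) and (31) is easy to verify numerically. The sketch below (using synthetic matrices, not the paper's data) evaluates the objective once via log-determinants of the information matrices, and once via the diagonals of their upper triangular Cholesky factors:

```python
import numpy as np

rng = np.random.default_rng(2)

n, m = 5, 2                        # prior size, added variables
A = rng.standard_normal((n, n))
Lam = A @ A.T + n * np.eye(n)      # prior information matrix (positive definite)
R = np.linalg.cholesky(Lam).T      # upper triangular root, Λ = RᵀR

# Zero-augmented prior and a random collective Jacobian U.
N = n + m
Lam_aug = np.zeros((N, N))
Lam_aug[:n, :n] = Lam
U = rng.standard_normal((N + 1, N))
Lam_post = Lam_aug + U.T @ U       # posterior information matrix
R_post = np.linalg.cholesky(Lam_post).T

c = m * np.log(2 * np.pi * np.e)

# Eq. (27): objective via log-determinants.
V_det = 0.5 * (np.linalg.slogdet(Lam_post)[1] - np.linalg.slogdet(Lam)[1] - c)

# Eq. (31): objective via the diagonals of the triangular roots
# (a linear-time determinant computation).
V_root = 0.5 * (np.sum(np.log(np.diag(R_post) ** 2))
                - np.sum(np.log(np.diag(R) ** 2)) - c)

assert np.isclose(V_det, V_root)
```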
As explained, using this form, the significant computational cost of calculating the objective value moves from the determinant calculation to the information update phase, though this can be performed incrementally.

3.2 Belief sparsification

We now wish to present a simplification method for the decision problem we have just formalized: P ≐ (b, U, V). We choose to keep the same objective function V and set U of candidate actions, and focus on simplifying the initial belief b. As stated, candidate actions here are actually control sequences for the agent; we assume the collective Jacobians for the set of actions are available. As we saw, calculation of the objective function (as defined in (27)) involves calculation of the determinant of the posterior information matrix, after performing an appropriate belief update for the candidate action. The cost of this calculation depends directly on the number of non-zero elements in the matrix, and is significantly lower for sparse matrices. Thanks to the additivity of the information, sparsifying the prior information matrix Λ could potentially lead to a sparser posterior information matrix Λ + U^T U, for every candidate action u with collective Jacobian U; notably, such sparsification of the prior is only calculated once, for any number of actions. We also note that in many problems, especially in navigation problems, the collective Jacobians are inherently sparse, and as the state grows, involve fewer variables in relation to its size. Hence, even after their addition to the sparsified prior information matrix, its sparsity shall be retained. Equivalently, we may seek to sparsify R, the square root of Λ, which is used in (31), in order to improve the efficiency of the factorization update process.
Overall, assuming the initial belief of the decision problem is b = N(X*, Λ^{−1}), our simplified problem shall rely instead on b_s = N(X*, Λ_s^{−1}) as the initial belief, where Λ_s is a sparse approximation of Λ. In the following section, we present a sparsification algorithm† for the information matrix (or its square root matrix). Fig. 3 summarizes the paradigm of belief sparsification for efficient decision making in the belief space; clarification regarding its steps is to follow.

[Figure 3 – flowchart. Inputs: the initial belief b, and the updates corresponding to each candidate action (the "collective Jacobians"). Steps: identify uninvolved variables → select a subset S of state variables to sparsify → find a sparse approximation b_s of the initial belief using Algorithm 1 → pre-solution analysis → calculate the objective values for all candidates using b_s → select the "optimal" candidate → post-solution analysis: derive loss bounds, to guarantee the quality of solution → apply the selected action on the original belief b.]

Figure 3. Belief sparsification for efficient decision making in the belief space. Essential steps are in dark blue; optional steps, in order to provide guarantees, are in light blue. Here, candidate actions represent control sequences for the agent.

3.2.1 The algorithm

Algorithm 1 summarizes our suggested method for belief sparsification. The algorithm may receive as input, and return as output, a belief represented using either the information matrix or its square root. This scalable algorithm depends on a pre-selected subset S of state variables, and wisely removes elements which correspond to these variables from the matrix. Approximations of different degrees can be generated using different variable selections S, as to be explained in Section 3.3.1. For a clear discussion, when S contains all the variables, we say this is a full sparsification; using any other partial selection of variables is a partial sparsification. Fig.
4 contains a visual demonstration of the algorithm steps. In the following section (Section 3.2.2), we provide an extended probabilistic analysis of the algorithm, and explain how it can also be applied to general (non-Gaussian) beliefs; a visual demonstration of such application, where we represent the belief using a generic factor graph, is given in Figure 5. An example of the algorithm output is provided in Figure 6.

Figure 4. The steps of Algorithm 1 (from left to right), for sparsification of a Gaussian belief (shown in Fig. 5a); the state variables are X ≐ [x_1, l_1, l_2, x_2, x_3, l_3]^T (in that order), and the subset of variables selected for sparsification is S = {x_1, l_2, x_2} (in green). (a) The sparsity pattern of the symmetric information matrix of the belief. (b) Reordering the variables, such that all the variables in S appear first; this is done by simply permuting the rows and columns of the matrix. (c) Calculating the upper triangular square root matrix chol(Λ^p) of the permuted information matrix; each row corresponds to a state variable. (d) Removing off-diagonal elements from rows corresponding to variables in S. (e) After the sparsification, we may permute the variables back to their original order directly in the square root matrix, without breaking its upper triangular shape. (f) Reforming the sparsified information matrix Λ_s ≐ R_s^T R_s; note that the process affects the values in the matrix, and may also introduce new non-zeros (marked in purple).

Algorithm 1: Scalable belief sparsification.
Inputs: a belief b = N(X*, Λ^{−1}), such that Λ = R^T R; a subset S of state variables to sparsify
Output: a sparsified belief b_s ≐ N(X*, Λ_s^{−1}), such that Λ_s ≐ R_s^T R_s
// reorder the state variables such that the variables in S are first in the state vector
1  P ← an appropriate (column) permutation matrix
2  if the algorithm input is Λ then
3      Λ^p ← P^T Λ P
4      R^p ← chol(Λ^p)
5  else if the algorithm input is R then
6      R^p ← modify R to convey the appropriate variable reordering (see the remark in the main text)
7  R^p_s ← zero the off-diagonal elements of R^p in rows matching variables in S   // sparsify R^p
8  R_s ← P R^p_s P^T   // return to the original variable order
9  if the algorithm output is Λ then
10     Λ_s ← R_s^T R_s   // reform the information matrix

Let us break down the algorithm steps. First, we should check if the variables are ordered properly, i.e., such that the variables we wish to sparsify (variables in S) appear first in the state. If not, we should reorder the variables accordingly. This requires appropriate modification of the input matrix. If the algorithm input is the symmetric matrix Λ (line 2), we shall simply permute its rows and columns by calculating the product P^T Λ P of the information matrix with an appropriate (column) permutation matrix P. After this permutation, we can derive R^p, the square root matrix of the permuted information matrix, using the Cholesky decomposition (line 4). If the algorithm input is the matrix R (line 5), the task of variable reordering is not trivial, as trying to modify R by permuting its rows and columns would break its triangular shape. Instead, this task (typically) requires re-factorization of Λ under the new variable order.

Remark. In our follow-up work (Elimelech and Indelman 2021), we provide an efficient modification algorithm for R, which is intended for the task of variable reordering, and can spare the matrix re-factorization; we can use this algorithm to efficiently derive R^p (line 6).

† Algorithm 1 is a revised version of the sparsification algorithm that appeared in our previous publication (Elimelech and Indelman 2017c).
If no reordering is required, and the algorithm input is Λ, we may directly calculate the Cholesky decomposition (line 4); if no reordering is required, and the input is R, we may skip directly to line 7. Specifically, when all of S is already at the beginning of the state, no reordering is needed. This situation particularly occurs when sparsifying all the variables (i.e., full sparsification). Next, in line 7, we zero off-diagonal elements in the permuted square root matrix R^p, in rows corresponding to variables in S, to yield the sparsified square root matrix R^p_s. Since the prior belief should be updated according to the predicted hypotheses, the variable order in the sparsified information matrix (or its square root) must match the variable order in the collective Jacobians. Thus, we should reorder the variables back to their original order (line 8). Though, we notice that after the sparsification, this permutation can be performed on the square root matrix directly, without resorting to the information matrix, and without breaking its triangular shape, by calculating P R^p_s P^T (note the reverse multiplication order). This claim is formalized in Corollary 1 (and proved in Appendix B).

Corollary 1. After sparsification of the square root matrix (line 7 of Algorithm 1), permutation of the variables back to their original order can be performed on the square root matrix directly, without breaking its triangular shape.

Finally, we may return the sparsified belief, represented either with R_s or Λ_s. In the latter case, this requires us to (easily) reconstruct the sparsified information matrix from its sparsified root (line 10). After the sparsification, the values of the non-zero (NZ) entries in the sparsified information matrix may differ from the corresponding entries in the original matrix (including the diagonal), and new NZs may be added in compensation for the removed entries (factors).
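A compact numpy rendition of Algorithm 1 may look as follows (our own sketch, not the authors' code): it permutes the variables in S to the front, factorizes, zeroes the off-diagonal entries of the rows matching S, and permutes back directly on the root, as licensed by Corollary 1. The assertions check two properties stated in the text: R_s remains upper triangular, and the determinant (and hence the entropy) is preserved.

```python
import numpy as np

def sparsify_belief(Lam, S):
    """Algorithm 1 (sketch): sparsify the state variables indexed by S.

    Lam -- prior information matrix (symmetric positive definite)
    S   -- list of variable indices to sparsify
    Returns (R_s, Lam_s): the sparsified root and information matrix.
    """
    n = Lam.shape[0]
    order = list(S) + [i for i in range(n) if i not in S]
    P = np.eye(n)[:, order]             # (column) permutation matrix (line 1)
    Lam_p = P.T @ Lam @ P               # variables in S first (line 3)
    R_p = np.linalg.cholesky(Lam_p).T   # upper triangular root (line 4)
    R_p_s = R_p.copy()
    for i in range(len(S)):             # zero off-diagonal entries in rows of S (line 7)
        R_p_s[i, i + 1:] = 0.0
    R_s = P @ R_p_s @ P.T               # back to the original order (line 8)
    Lam_s = R_s.T @ R_s                 # reform the information matrix (line 10)
    return R_s, Lam_s

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))
Lam = A @ A.T + 6 * np.eye(6)           # a synthetic, positive-definite prior

R_s, Lam_s = sparsify_belief(Lam, S=[0, 2, 3])

# Corollary 1: the permuted-back root is still upper triangular.
assert np.allclose(R_s, np.triu(R_s))
# Eq. (42): the determinant (and hence the entropy) is preserved.
assert np.isclose(np.linalg.slogdet(Lam)[1], np.linalg.slogdet(Lam_s)[1])
```

With S covering all the variables (full sparsification), the returned root and information matrix are diagonal, as noted in the text.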
Also, note that the permutation of variables back to their original order can potentially be skipped, by equivalently permuting the columns of all the candidate collective Jacobians, to match the altered order. The derivation of R^p (in line 4 or line 6), when conducted, is the costliest step of the algorithm, which defines its maximal computational complexity; we may recall that the complexity of the Cholesky decomposition is O(n^3), at worst, where n is the state size (Hämmerlin and Hoffmann 2012). In comparison, the computational cost of the remaining steps, i.e., matrix permutation (lines 3 and 8), removal of matrix elements (line 7), and reconstruction of the information matrix (line 10), is usually minor. Still, it should be noted that depending on the configuration, many of the steps are often not necessary. For example, as mentioned, when the input matrix is already in the desired order, the permutations can be skipped; this is specifically correct in full sparsification. In that case, if given the square root matrix as input, the algorithm holds an almost negligible complexity – we only need to extract the matrix's diagonal. Also, in full sparsification, the sparsified information matrix, if required, can be reconstructed from its root in linear complexity, as both R_s and Λ_s are diagonal. Nonetheless, we remind that the approach is meant to reduce the overall decision-making time, as the time spent on performing the sparsification (performed once) is lower than the time saved in performing the (multiple) belief updates. For example, since full sparsification leads to a diagonal approximation (of the information matrix or its root), and considering that the collective Jacobians are sparse, belief updates can be performed with an almost linear complexity.
Also, since the cost of sparsification does not depend on the number of candidates or hypotheses, as this number grows, the relative "investment" in calculating the sparsification becomes less significant.

3.2.2 Probabilistic analysis

Let us analyze the suggested sparsification algorithm from a wider perspective, using probabilistic graphical models. As explained, the belief b (18) is constructed as a product of factors – probabilistic constraints between variables, e.g., those induced by observations or constraints between poses. A belief can be graphically represented with a factor graph – where variable nodes are connected with edges to the factor nodes in which they are involved. In Fig. 5a, we can see an exemplary factor graph, which represents a belief b with six variables and eight factors:

b(X) ∝ f_{x_1} · f_{x_1 l_1} · f_{x_1 l_2} · f_{x_1 x_2} · f_{x_2 l_1} · f_{x_2 l_2} · f_{x_2 x_3} · f_{x_3 l_3}, (32)

where the state X ≐ [x_1, l_1, l_2, x_2, x_3, l_3]^T contains three poses and three landmarks, and f_{ij} is a factor between i and j. As explained, in the linear(ized) Gaussian system, the belief b is described with the information matrix Λ, as shown in Fig. 4a. Off-diagonal non-zero entries in the information matrix Λ indicate the existence of factors between the corresponding variables. The belief b can be factorized into a product of conditional probability distributions, in a process known as "variable elimination" (see Davis 2006):

b ∝ ∏_{i=1}^{n−1} P(X_i | d(X_i)) · P(X_n), (33)

where d(X_i) denotes the set of variables X_i is conditionally dependent on – a subset of the variables which follow X_i according to the variable (elimination) order. Practically, fixing the variable order in the state sets the decomposition of the belief. Thus, according to Algorithm 1, we begin the sparsification process by reordering the state variables, such that all variables in S appear first in the state.
This step requires us to permute the information matrix accordingly (as shown in Fig. 4b); here, we chose S = {x_1, l_2, x_2}. Note that variables can be conditionally dependent even if there is no factor between them. By starting the elimination with the variables in S, we force conditional separation of the variables for sparsification and the remaining variables, i.e.,

b ∝ P(S | ¬S) · P(¬S). (34)

This means that no variable in ¬S is conditionally dependent on a variable in S. The factorization of the belief into a product of conditional probabilities can be graphically represented with a Bayesian network ("Bayes net"), as shown in Fig. 5c. In this directed graph, the existence of an edge from node i to j indicates that i ∈ d(j). As established by Dellaert and Kaess (2006), this factorization is equivalent to the factorization of the (permuted) information matrix Λ^p into its upper triangular square root R^p (Fig. 4c). The conditional probability distribution of the i-th variable corresponds to the respective row of R^p. Off-diagonal entries in that row represent the conditional dependencies: if the off-diagonal entry R^p_{ij} is non-zero, then X_j is in d(X_i), and X_j is a parent of X_i in the Bayes net; specifically, if all elements on the i-th row, besides the diagonal entry, are zero, then X_i is not conditionally dependent on any variable (according to the elimination order), and has no parents in the Bayes net. For more details, see Dellaert and Kaess (2017). According to the next step in the algorithm, we shall now zero off-diagonal entries in R^p, in the rows which correspond to variables in S (Fig. 4d); equivalently, this process can be seen as removing edges from the Bayes net (Fig. 5d).
By removing all the off-diagonal entries from the i-th row, we replace the conditional probability distribution

P(X_i | d(X_i)) = N( μ(d(X_i)), (R^p_{ii}{}^T R^p_{ii})^{−1} ) (35)

with an independent probability distribution over X_i,

P_s(X_i) ≐ N( μ_i, (R^p_{ii}{}^T R^p_{ii})^{−1} ). (36)

Essentially, we fix the mean of X_i to a constant value, which is no longer dependent on other variables. We, of course, would like to preserve the mean of the overall belief, and therefore shall select μ_i = X*_i. It should be mentioned that this probability distribution is not the marginal distribution over X_i, which is given as N(X*_i, Σ_{ii}). The sparsified belief is thus given as the product

b_s ∝ ∏_{x∈S} P_s(x) · P(¬S). (37)

The chosen elimination order makes sure that the inner dependencies among the non-sparsified variables remain exact. Notably, the suggested sparsification is performed by manipulating the square root matrix, which is equivalent to manipulating the Bayes net. In contrast, traditional belief sparsification methods (as we reviewed) perform sparsification on Λ directly, or equivalently, on the factor graph. Still, we would like to understand what the factor decomposition corresponding to the sparsified belief is.

[Figure 5 – panels (from left to right): (a) Factor Graph; (b) (Partial) Variable Elimination; (c) Bayes Net; (d) Bayes Net; (e) Factor Graph.]

Figure 5. Visualizing the steps of Algorithm 1 (from left to right), for sparsification of a belief with probabilistic graphical models. (a) The factor graph of the prior belief b (matching Fig. 4a); the state variables are X ≐ [x_1, l_1, l_2, x_2, x_3, l_3]^T, and the subset of variables selected for sparsification is S = {x_1, l_2, x_2} (circled in green).
(b) Eliminating the variables in the factor graph in order to derive the corresponding Bayes net; the figure describes an intermediate step of the elimination process, after eliminating the variables in S: x_1, l_2, x_2 (in this order); note the added marginal factor (in purple). (c) The final Bayes net of b, after eliminating all the variables. (d) Removing all edges which lead to variables in S (green arrows); this is the Bayes net describing the sparsified belief b_s. (e) Reforming the factor graph of the sparsified belief b_s; the variables in S are now independent, and each is connected to a modified prior factor (in green); the remaining variables are inter-connected with the same factors which connected them originally (in black), alongside the marginal factors, which were added after elimination of S (in purple).

Let us look again at the exemplary belief, given in (32). We begin its factorization (after the initial reordering) by eliminating the variables in S (in order). First, x_1:

b ∝ P(x_1 | x_2, l_1, l_2) · f′_{x_2 l_1 l_2} · f_{x_2 l_1} · f_{x_2 l_2} · f_{x_2 x_3} · f_{x_3 l_3}. (38)

Then, l_2:

b ∝ P(x_1 | x_2, l_1, l_2) · P(l_2 | x_2, l_1) · f′_{x_2 l_1} · f_{x_2 l_1} · f_{x_2 x_3} · f_{x_3 l_3}. (39)

Finally, x_2:

b ∝ P(x_1 | x_2, l_1, l_2) · P(l_2 | x_2, l_1) · P(x_2 | l_1, x_3) · f′_{x_3 l_1} · f_{x_3 l_3}. (40)

This partial elimination is visualized in Fig. 5b. As we can see, after elimination of variables, new "marginal" factors (f′_{x_2 l_1 l_2}, f′_{x_2 l_1}, f′_{x_3 l_1}) may be introduced to the belief, representing new links among the non-eliminated variables; in our case, after eliminating all the sparsified variables, one marginal factor still remains: f′_{x_3 l_1}. According to the previous analysis, in the sparsification, each of the conditional distributions on the sparsified variables is replaced with an independent distribution.
These are, in fact, unary factors over these variables; here, we mark those as f''_{x_1}, f''_{l_2}, f''_{x_2}. The sparsified belief can thus be given as a product of these unary factors over the sparsified variables, the marginal factors introduced after eliminating these variables, and the remaining non-eliminated factors (here, f_{x_3 l_3}). Overall, in our example, this product is:

b^s \propto f''_{x_1} \cdot f''_{l_2} \cdot f''_{x_2} \cdot f'_{x_3 l_1} \cdot f_{x_3 l_3}.    (41)

The factor graph matching this belief is shown in Fig. 5e. It is clear that the sparsification does not affect the elimination of the remaining variables (the variables in \neg S). Continuing the elimination process from either b (40) or b^s (41) would result in the same distribution P(\neg S). To complete the analysis, we shall note that this sparsification method does not change the diagonal entries of the information root matrix, and, thus, the determinants of \Lambda and \Lambda_s remain the same:

|\Lambda| = |\Lambda^p| = \left|{R^p}^T R^p\right| = |R^p|^2 = \prod_{i=1}^{n} (R^p_{ii})^2 = |R^p_s|^2 = \left|{R^p_s}^T R^p_s\right| = |\Lambda^p_s| = |\Lambda_s|.    (42)

Hence, the sparsification method preserves the overall entropy of the belief (as defined in (25)), no matter which variables are sparsified. This is usually not guaranteed in the aforementioned traditional sparsification methods. Still, when incorporating new factors in the future, a divergence in entropy between the original and sparsified beliefs (i.e., a simplification offset) might indeed occur. This offset depends on the variables selected for sparsification, and can even be zero, as we shall discuss next. Since the sparsified variables become independent, if we wish to update our estimation after applying new actions, or after acquiring a new observation of an existing variable (i.e., a loop closure), information would no longer propagate from a sparsified variable to another variable, or vice versa, unless they are observed together.
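The two properties derived above can be checked numerically: fixing \mu_i = X^*_i preserves the overall belief mean, and leaving the diagonal of the root matrix untouched preserves |\Lambda| (eq. (42)). Below is a minimal numpy sketch of the row-sparsification step; the matrix R, right-hand side d, and the sparsified set S are small synthetic stand-ins, not data from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
# Hypothetical upper-triangular square-root information matrix R and rhs d,
# so that the belief mean mu satisfies R @ mu = d.
R = np.triu(rng.normal(size=(n, n)))
R[np.diag_indices(n)] = np.abs(R.diagonal()) + 1.0   # well-conditioned diagonal
d = rng.normal(size=n)
mu = np.linalg.solve(R, d)                           # MAP estimate X*

S = [0, 2, 3]                                        # illustrative sparsified variables
R_s, d_s = R.copy(), d.copy()
for i in S:
    R_s[i, i + 1:] = 0.0                             # drop the dependence of X_i on d(X_i)
    d_s[i] = R_s[i, i] * mu[i]                       # fix the mean of X_i to X*_i

# The overall belief mean is preserved:
assert np.allclose(np.linalg.solve(R_s, d_s), mu)
# The diagonal is untouched, so |Lambda| = |R|^2 is preserved, as in eq. (42):
assert np.allclose(np.linalg.det(R_s.T @ R_s), np.prod(R.diagonal()) ** 2)
```

Back substitution makes the mean-preservation property easy to see: the non-sparsified rows are unchanged, and each sparsified row now resolves directly to \mu_i.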
Notably, though, unlike simply marginalizing the sparsified variables out of the state, as done in filtering, they can still be updated in the future.

Figure 6. A square root matrix (taken from our experimental evaluation) and its sparse approximations generated with Algorithm 1, for different variable selections S. On the left, the original matrix; in the center, the matrix after partial sparsification, of only the uninvolved variables (here, about half of the variables); on the right, the matrix after full sparsification. The matrices on the left and in the center are guaranteed to be action consistent. Full sparsification results in a convenient diagonal approximation of the information. For all degrees of sparsification, the determinant of the matrix remains the same.

3.3 Optimality guarantees

3.3.1 Variable selection and pre-solution guarantees

Next, we shall present the conclusions of our symbolic analysis of the suggested simplification method (as explained in Section 2.2). In this evaluation, we utilized our knowledge of the decision problem formulation, and of Algorithm 1, in order to derive general guarantees for the simplification loss. More specifically, we shall explain which variables should be sparsified, such that the effect on the objective value of each candidate action (i.e., the simplification offset) is minimal. Considering a specific action, a state variable is involved if applying the action adds a constraint (factor) on it; i.e., if g or h, which define the relevant transition and observation models (defined in (16) and (17)), are affected by this variable. Practically, in the collective Jacobian of an action, each column corresponds to a state variable, and every row represents a constraint; a variable is involved if at least one of the entries in its matching column is non-zero; uninvolved variables correspond to columns of zeros.
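This column test is simple to state in code. The following sketch uses made-up collective Jacobians for two hypothetical candidate actions over a six-variable state; the specific entries and variable indices are illustrative only.

```python
import numpy as np

# Hypothetical collective Jacobians for two candidate actions; each row is a
# predicted constraint, each column matches a state variable.
A_right = np.array([[0., 0., 0., 1., 0., 1.],    # e.g., constrains variables 3 and 5
                    [0., 0., 0., 1., 0., 0.]])
A_left  = np.array([[0., 1., 0., 1., 0., 0.],    # e.g., constrains variables 1 and 3
                    [0., 0., 0., 1., 0., 0.]])

stacked = np.vstack([A_right, A_left])
# Uninvolved variables: columns of zeros in the Jacobians of ALL candidates.
uninvolved = np.flatnonzero(~stacked.any(axis=0))
print(uninvolved)                                # -> [0 2 4]
```

Stacking the candidates' Jacobians before the column test directly yields the set that is safe to sparsify for the whole candidate set.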
For example, in a navigation scenario, the landmarks we predict to observe by taking the action (along with the current pose) are involved; variables referring to landmarks from the past, which we do not predict to observe, are uninvolved. An illustration of this example is given in Fig. 7. We emphasize that since this is a planning problem, the collective Jacobians, the objective values, and the involved variables are determined based on our prediction of the outcome of each action. Further, these components can only be based on our current belief, and not on the ground truth, as it is unknown. Thus, although a landmark we identified as uninvolved might still be observed when applying the action (e.g., if the initial belief was distant from the ground truth), this is not a concern in the planning context. As explained, in our formulation, the objective function (27) relies on the "most likely" observation. In other words, we consider only the single "most likely" outcome of each action. Theoretically, we could consider multiple probabilistic outcomes for each action, each determining its own set of involved variables; as mentioned, this generalized discussion is brought by Elimelech (2021). We claim that for any given action, sparsifying the uninvolved variables from the prior belief b, before computing the posterior belief, would not affect the posterior entropy (which defines our objective function V). Hence, for a set of candidate actions U, we can sparsify from the prior belief all variables which are uninvolved in all of the actions, and use this sparsified belief b^s to compute the objective function, without affecting its values. Specifically, this means that the simplification offset is zero, and that this sparsified belief is action consistent with the original one: b \simeq b^s. This claim is formally expressed in Theorem 1. A proof for this claim is given in Appendix B.

Theorem 1. Consider a decision problem P
\doteq (b, U, V), where b is a (Gaussian) initial belief, and V is the objective function from (27). Considering a set S of state variables which are uninvolved in all of the candidates in U, Algorithm 1 returns a belief b^s, such that \Delta(P, P^s) = 0, where P^s \doteq (b^s, U, V).

In principle, only a single sparsification process is conducted for each decision problem (i.e., planning session), regardless of the number of candidate actions. Selecting variables which are uninvolved in all of the candidate actions allows us to maintain action consistency for the entire set of candidates. Still, it is possible to break the set of actions into several subsets of similar actions, and consider the uninvolved variables in each subset. For each subset we would create a custom prior approximation, and then select the best candidate in each of the subsets, before finding the overall best candidate among those. This can result in a more adapted sparsification for each subset. Yet, the calculation of the sparsification itself has a cost, which needs to be considered when trying to achieve the best performance. Here we examine the most general case: treating the set of actions as a whole.

Remark. We note that if we consider (1) sparsification of only uninvolved variables; (2) the output of Algorithm 1 to be the square root matrix; and (3) no requirement to maintain the original variable order after the sparsification

Figure 7. A factor graph representing the belief of an agent in an exemplary full-SLAM scenario. The current (prior) state consists of three poses x_1, x_2, x_3 (blue nodes), and the positions of three landmarks l_1, l_2, l_3 (yellow nodes), which were previously observed. Factors (black nodes) between poses mark motion constraints, and factors between a pose and a landmark mark observation constraints.
At the time of planning, the agent is at pose x_3, and wishes to infer which of the candidate paths U = {left, right} is the optimal one. If taking the right path, the agent predicts augmenting its state with two new poses x^{right}_4, x^{right}_5, with motion constraints connecting them to the current pose; based on its current state estimation, it also predicts observing landmark l_3 from x^{right}_4 (i.e., adding an observation constraint between l_3 and the new pose). The variables (from the prior state) involved with this action are those directly connected to any of the predicted new factors: x_3, l_3. If taking the left path, the agent predicts augmenting its state with two new poses x^{left}_4, x^{left}_5, and observing landmark l_1 from x^{left}_4. The variables involved with this action are x_3, l_1. The involved variables (in any of the actions) are marked with a black outline. Note that x_1, x_2, l_2 are never involved; these are marked with a dark green outline. Theorem 1 suggests that the uninvolved variables can be sparsified from the prior belief (via Algorithm 1), while maintaining action consistency.

(by, instead, reordering the collective Jacobians); then, there is no need to actually zero entries in the rows of the "sparsified" variables. The initial reordering is sufficient to make sure that these rows would not be updated when (incrementally) incorporating new constraints. An in-depth look at this variation was examined in our follow-up work (see Elimelech and Indelman 2019). We proved that sparsifying uninvolved variables does not affect the objective function values, and, therefore, they should always be included in the set S of variables for sparsification. It is possible to also sparsify involved variables, but then a "zero offset" and action consistency are not guaranteed.
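Theorem 1 can also be sanity-checked numerically: with the set S ordered first, and a candidate Jacobian whose columns over S are zero, the posterior determinant (and hence the posterior entropy) is identical whether computed from the original or from the sparsified prior. A small synthetic sketch (arbitrary matrices, not the paper's experimental data):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 3                                    # first k variables form S (uninvolved)
R = np.triu(rng.normal(size=(n, n))) + n * np.eye(n)   # prior root, S ordered first

A = rng.normal(size=(4, n))                    # hypothetical collective Jacobian
A[:, :k] = 0.0                                 # uninvolved variables: zero columns

R_s = R.copy()
for i in range(k):
    R_s[i, i + 1:] = 0.0                       # Algorithm-1-style sparsification of S

post   = np.linalg.det(R.T   @ R   + A.T @ A)  # posterior determinant, original prior
post_s = np.linalg.det(R_s.T @ R_s + A.T @ A)  # posterior determinant, sparsified prior
assert np.allclose(post, post_s)               # zero simplification offset
```

The equality follows from the block structure: the determinant factors into the (unchanged) diagonal block over S and the Schur complement over \neg S, which the zero Jacobian columns leave untouched.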
Intuitively, selecting more involved variables for S results in a sparser approximation, but potentially a larger divergence from the original objective values. In Appendix A.1, we show that under additional restrictions, we can symbolically derive offset (and loss) bounds also when sparsifying involved variables; these bounds are only applicable for "rank 1" updates, i.e., when the collective Jacobians are limited to a single row.

3.3.2 Post-solution guarantees

For a more general scenario, when sparsifying involved variables, and with actions possibly having multi-row collective Jacobians, we can try to bound the loss by performing post-solution analysis, as discussed in Section 2.2. Unlike before, such guarantees are derived after solving the simplified problem (but before applying the selected action). As explained, we can utilize the calculated (simplified) objective values, and domain-specific lower and upper bounds on the objective function (LB, UB, respectively), to yield offset bounds (6); from these offset bounds, we can then easily derive loss bounds (9). As our decision problem domain relies on beliefs, which, as we saw, can be represented with a (factor) graph, we can potentially exploit topological aspects to derive the desired objective bounds. For example, we can utilize conclusions from a recent work by Kitanov and Indelman (2019), which extends a previous work by Khosoussi et al. (2018). There, the following bounds on the information gain were proved, for when the corresponding factor graph contains only the agent's poses, and each pose consists of the position and the orientation of the agent (i.e., pose-SLAM):

LB_{top}\{V(b,u)\} \doteq 3 \ln t(b,u) + \mu + H(b),    (43)

UB_{top}\{V(b,u)\} \doteq LB_{top}\{V(b,u)\} + \sum_{i=2}^{n} \ln(d_i + \Psi) - \ln\left|\tilde{L}\right|,    (44)

where t(b,u) stands for the number of spanning trees in the factor graph of the posterior belief (b after applying u); n marks the graph size; \tilde{L} is the reduced Laplacian matrix of the graph; and the d_i's are the node degrees corresponding to \tilde{L}. They also assume that the factors between the poses are described with a constant diagonal noise covariance; \mu and \Psi are constants which depend on this noise model and on the posterior graph size (i.e., the length of the action sequence). In their demonstration, they show that when the ratio between the angular variance and the position variance is small, these bounds are empirically tight. This case can happen, for example, when a navigating agent is equipped with a compass, which reduces the angular noise. For a detailed derivation of these bounds, please refer to Kitanov and Indelman (2019). For different problem domains, it is possible to use various other objective bounds in a similar manner. For example, in Appendix A.2, we present additional bounds which exploit known determinant inequalities. These make no assumptions on the state structure, and are potentially useful when the matrix \Lambda is diagonally dominant.

4 Experimental results

4.1 The scenario

To demonstrate the advantages of the approach, we applied it to the solution of a highly realistic active-SLAM problem. In this scenario, a robotic agent navigates through a list of goals in an unknown indoor environment. We used the Gazebo simulation engine (Koenig and Howard 2004) to simulate the environment and the robot, a Pioneer 3-AT, which is a standard ground robot used in academic research worldwide. The robot is equipped with a lidar sensor, a Hokuyo UST-10LX. These components can be seen in Fig. 8. Despite examining a 2D navigation scenario, our method does not impose any restrictions on the pose size or the state structure.

Figure 8.
A Pioneer 3-AT robot in the simulated indoor environment. The robot is equipped with a lidar sensor, a Hokuyo UST-10LX, as visible on top of it.

We used the pose-SLAM paradigm, meaning the agent's state X_k \doteq (x_0^T, \ldots, x_k^T)^T consists only of poses kept along its entire trajectory. Each of these poses consists of three variables, representing the position and orientation. Our approach is highly relevant in this case, in which the state size grows quickly as the navigation progresses, making the planning more computationally challenging. The belief over the state is represented as a factor graph, and implemented using the GTSAM C++ library (Dellaert 2012). When adding a new pose to the graph, the sensor scans the environment in a range of 30 meters, and provides a point cloud of it. This point cloud is then matched to scans taken in previous poses using ICP matching (Besl and McKay 1992). If a match is found, a loop-closure factor (constraint) is added between these poses. To keep the computational cost of the scan matching feasible, and to avoid creating redundant constraints, we make sure to compare the current pose only to key poses within a certain range of (estimated) distances from it. Transition (motion) constraints are also created between every two consecutive poses. Both the observation and motion contain some Gaussian noise, which matches the real hardware's specs. The Robot Operating System (ROS) is used to run and coordinate the system components: state inference, decision making, sensing, etc. The full indoor map is unknown to the robot, and is incrementally approximated by it using the scans taken during the navigation. We do, however, rely on the full and exact map to produce collision-free candidate trajectories. We use the Probabilistic RoadMap (PRM) algorithm (Kavraki et al. 1996) to sample that map, and then use the K-diverse-paths algorithm (Voss et al.
2015) to build a set U of trajectories to the current goal. This usage of the map is irrelevant to the demonstration of our method; in our formulation, we consider the candidate actions as given. The complete indoor map is shown in Fig. 9, with the sampled PRM graph on it. Each trajectory matches, of course, a certain control sequence, and is translated to a series of factors and constraints to be added to the prior factor graph. Loop-closure constraints are added between poses in the new trajectory and poses in the previously-executed trajectory, according to their estimated locations (i.e., where we expect to add them when executing this trajectory). The corresponding collective Jacobians of the candidate trajectories are constructed as explained in Section 3.1. Since all trajectories lead to the goal, we only wish to optimize the "safety" of taking the path; meaning, keeping the uncertainty of the state low, by preferring a more informative trajectory. We use the aforementioned objective function V (from (31)) to compare between candidates. Under the "maximum likelihood" assumption, our method is only relevant to the computation of this information-theoretic measure, so for a more convenient discussion, we do not consider other objectives, such as the length of the trajectory. To cover its list of goals, the robot executes several planning sessions. In each session, the robot is provided with one goal, generates a set of candidate trajectories U to it, and selects the best candidate by solving a decision problem. The robot completes executing the entire selected trajectory before starting a new planning session to the next goal.

Figure 9. The entire indoor environment from a top view. Walls are colored in light blue. The PRM graph, from which trajectories are built, is colored in red and green. Each square on the map represents a 1m × 1m square in reality.
To evaluate our method, in each planning session, we solved three decision problems, each using another version of the initial belief. The robot's original initial belief accounts for the trajectory of poses executed up to that point (the entire inferred state). The other two versions are generated by sparsifying the original belief using Algorithm 1: one with partial sparsification, and one with full sparsification. Overall, in each session, the three configurations of the decision problem are as follows:

1. P = (b, U, V) – the original decision problem;
2. P_involved = (b_involved, U, V) – with sparsification of the uninvolved variables – an action consistent problem. We remind again that uninvolved variables correspond to columns of zeros in the collective Jacobians of all candidate actions, as explained in Section 3.3.1;
3. P_diagonal = (b_diagonal, U, V) – with sparsification of all variables, leading to a diagonal information matrix, but not necessarily an action consistent problem.

For each configuration, we measured the objective function calculation time for each candidate action, along with the one-time calculation of the sparsification itself for the latter two. On the whole, in each planning session, we measured the total decision making time for each of the three configurations. For a fair comparison of the problems, the objective function calculation was detached from the factor-graph-based implementation of the belief. From GTSAM, we extracted the square root matrix of the initial belief, and the collective Jacobians corresponding to (the factors added by) each candidate trajectory. Then, using Algorithm 1, we created the two additional versions of the prior matrix, as detailed before.
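The per-candidate objective evaluation reduces to updating the prior square-root matrix with the candidate's collective Jacobian and reading the determinant off the triangular factor. A minimal numpy sketch of such a QR update, with hypothetical matrices (GTSAM performs this incrementally in C++; this is only an illustration of the computation):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
R = np.triu(rng.normal(size=(n, n))) + n * np.eye(n)   # hypothetical prior root
A = rng.normal(size=(3, n))                            # hypothetical collective Jacobian

# Posterior square root via QR factorization of the stacked system [R; A]:
R_post = np.linalg.qr(np.vstack([R, A]), mode='r')

# ln|Lambda_post| from the triangular factor's diagonal, vs. direct computation:
logdet_qr = 2.0 * np.log(np.abs(R_post.diagonal())).sum()
logdet_direct = np.linalg.slogdet(R.T @ R + A.T @ A)[1]
assert np.allclose(logdet_qr, logdet_direct)
```

Working with the triangular factor keeps the determinant (and hence the entropy objective) available as a cheap sum of log-diagonal terms.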
For each of the three decision problems, i.e., using each version of the prior square root matrix, we calculated the corresponding posterior square root matrix (via a QR update); as explained in Section 3.1.3, we could then easily extract the determinants of these triangular matrices, to calculate the objective values. At the end of each session, we applied the action selected by configuration 1. Of course, in a real application we would only solve the problem using a single configuration; here we present a comparison of the results for the different configurations. We also did not invest in a smart selection of variables for sparsification, as even full sparsification achieved very accurate results.

4.2 Results

In the following section we present and analyze the results from a sequence of six planning sessions. Of course, these sessions took place after the robot had already executed a certain trajectory in the environment, in order to build a state of a substantial size, and a map; if the prior state is empty, examining its sparsification is vain. Figs. 10-15 showcase a summary of each of the planning sessions, and contain several components: (a) A screenshot of the scenario, which includes: the map estimation (blue occupancy grid); the current estimated position (yellow arrow-head) and goal (yellow circle); the trajectory taken up to that point (thin green line); the candidate trajectories from the current position to the goal (thick lines in various colors); and the selected trajectory (highlighted in bright green). (b) A comparison of the objective function values of the candidate actions (i.e., trajectories), considering each of the versions of the initial belief: P with the original belief in red; P_involved with sparsification of the uninvolved variables in blue; and P_diagonal with sparsification of all the variables in green. For scale, the comparison also contains the prior differential entropy, before applying any action.
This "prior value" is not affected by the sparsification, and is the same for the three configurations (see (42)). (c) A comparison of the solution times for the three decision problems. Again, P in red, P_involved in blue, and P_diagonal in green. The highlighted parts of the blue and green bars mark the cost of the sparsification itself out of the total solution time. (d) A comparison of the three versions of the triangular square root matrix. The figures indicate the non-zero entries in each matrix, i.e., their sparsity patterns. (e) The sparsity patterns of the collective Jacobians of the examined trajectories. Again, uninvolved variables are identified by having columns of zeros in all the Jacobians. For the first and last sessions we provide an in-depth inspection, including all the components. Since the structure of the belief and Jacobians in all the sessions is similar, for the intermediate sessions we only present a summarized version, with components (a)-(c). The square root matrix and its approximations, given previously in Fig. 6, are extracted from the third session. Additionally, the numerical data shown in the figures is summarized in Table 1. Further data regarding the loss is later given in Table 2.

4.2.1 Efficiency

As expected, the sparsification leads to a significant reduction in decision making time. The simplified problem P_diagonal consistently achieves the best performance, followed by P_involved, while both are vastly more efficient than the original problem P. Surely, a higher degree of sparsification (S containing more variables) leads to a greater improvement in computation time. As discussed in Section 3.2.1, full sparsification of the square root matrix has a particularly low cost: we only need to extract its diagonal. From Table 1 and the run-time comparison bar diagrams, it is clear that the cost of a partial sparsification is also minor in relation to the entire decision making.
In some of the diagrams, the highlighted section of the bar, which stands for the cost of the sparsification, is hardly visible. Also, since the sparsification cost does not depend on the number of candidate actions, the larger the set of actions is, the less significant the sparsification cost should become. We see a correlation between the ratio of uninvolved variables and the reduction in run time with P_involved. Variables corresponding to the executed trajectory become involved when a loop-closure factor is created between them and a candidate trajectory. Hence, the ratio of uninvolved variables represents the overlap of the candidate trajectories with the previously executed trajectory. In the first session, the executed trajectory is short, resulting in a relatively small state size, and a sparse root matrix, since not many loop closures were formed. As the sessions progress, the prior matrix becomes larger and denser, due to new loop closures, as apparent in the sixth session. In principle, we also notice a correlation between the state size and the relative improvement in performance, for both sparsification configurations. Updating the square root factorization, in order to calculate the posterior determinant, has, at worst, cubic complexity in relation to the matrix size. An update to a variable at the beginning of the state (i.e., a loop closure) may force us to recalculate the entire factorization, bearing this maximal computational cost. Sparsification of variables reduces the number of elements to update, and should thus be more beneficial when handling larger and denser beliefs.

4.2.2 Accuracy

Alongside the undeniable improvement in efficiency, we can also examine the quality of the selected action. According to Theorem 1, not only are P and P_involved action consistent, but they produce exactly the same objective values. Hence, solving P_involved always leads to the optimal action selection, and induces no loss.
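Whether a simplified configuration preserves the candidate ranking can be quantified by rank-correlating the objective values it produces with those of the original problem. A minimal sketch with hypothetical objective values (a coefficient of 1.0 indicates an identical ranking, i.e., action consistency):

```python
import numpy as np

def rank_correlation(a, b):
    """Rank correlation between two candidate-value vectors (1.0 = same ranking)."""
    ra = np.argsort(np.argsort(a))     # rank of each candidate under objective a
    rb = np.argsort(np.argsort(b))     # rank of each candidate under objective b
    return float(np.corrcoef(ra, rb)[0, 1])

V_original   = np.array([850., 920., 880., 1010.])   # hypothetical objective values
V_simplified = np.array([848., 918., 877., 1005.])   # same ordering after sparsification
rho = rank_correlation(V_original, V_simplified)
```

Here the simplified values differ slightly from the originals, but induce the same ranking, so rho evaluates to 1.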
P_diagonal is not always action consistent with the original problem, and maintaining the same action selection is not guaranteed; however, it is evident from Figs. 10-15 that even when sparsifying all the variables, the quality of the solution is maintained. Not only do the graphs of P and P_diagonal maintain a very similar trend, which practically leads to the same action selection, and zero loss, but the difference (offset) between them is also slim. This is also evident when examining the Pearson rank correlation coefficient ρ (which we mentioned in Section 2.1) between the solutions of the original and simplified decision problems. A value of ρ = 1 represents a perfect correlation of the candidate rankings (i.e., action consistency), and ρ = −1 represents exactly opposite rankings. Clearly, the calculated values, presented in Table 2, indicate that P_diagonal indeed resulted in an action consistent solution (or very close to it). We emphasize again that, regardless of the selected action, the inference of the next state remains unchanged, as it is done on the original belief.

Table 1. Numerical summary for all sessions. "Uninvolved var. ratio" represents the percentage of uninvolved variables in the prior state. "Run-time" represents the reduction in decision making time in the specified configuration, in comparison to the original problem. "Non-zeros" represents the reduction in the number of non-zero entries in the prior square root matrix, after using the sparsification. "Sparsification time" represents the cost of this one-time calculation, out of the entire problem run-time.

Session | Prior size | Uninvolved var. ratio | P_involved run-time | P_involved sparsification time | P_involved non-zeros | P_diagonal run-time | P_diagonal sparsification time | P_diagonal non-zeros
1 | 567 | 46% | -23% | 3% | -76% | -55% | 1% | -97%
2 | 762 | 74% | -34% | 4% | -77% | -67% | 1% | -98%
3 | 1182 | 60% | -66% | 1% | -83% | -85% | 1% | -99%
4 | 1269 | 69% | -70% | 2% | -86% | -86% | 2% | -99%
5 | 1341 | 65% | -67% | 2% | -84% | -82% | 2% | -99%
6 | 1392 | 44% | -52% | <1% | -61% | -80% | <1% | -99%

(a) A screenshot of the scenario, which includes: the map estimation (blue occupancy grid); the current estimated position (yellow arrow-head) and goal (yellow circle); the trajectory taken up to that point (thin green line); the candidate trajectories from the current position to the goal (thick lines in various colors); and the selected trajectory (highlighted in bright green). (b) Objective function comparison. (c) Run-time. (d) Original prior information root matrix and its sparse approximations. (e) Collective Jacobians of the candidate trajectories.
Figure 10. Results summary for planning session #1.

(a) The scenario. (b) Objective function comparison. (c) Run-time.
Figure 11. Results summary for planning session #2.

(a) The scenario. (b) Objective function comparison. (c) Run-time.
Figure 12. Results summary for planning session #3.

(a) The scenario. (b) Objective function comparison. (c) Run-time.
Figure 13. Results summary for planning session #4.

(a) The scenario. (b) Objective function comparison. (c) Run-time.
Figure 14.
Results summary for planning session #5.

(a) A screenshot of the scenario, which includes: the map estimation (blue occupancy grid); the current estimated position (yellow arrow-head) and goal (yellow circle); the trajectory taken up to that point (thin green line); the candidate trajectories from the current position to the goal (thick lines in various colors); and the selected trajectory (highlighted in bright green). (b) Objective function comparison. (c) Run-time. (d) Original prior information root matrix and its sparse approximations. (e) Collective Jacobians of the candidate trajectories.
Figure 15. Results summary for planning session #6.

Table 2. The loss induced by the two simplified configurations, alongside the bounds on the loss (of the diagonal configuration), for different noise models. The specified ratio for each bound represents the ratio between the angular variance and the position variance. No bound is calculated for the other configuration, since it is guaranteed to induce no loss. The loss and its bounds are brought as a percentage of the maximal approximated value in that session. Also shown is the Pearson rank correlation coefficient ρ.

Session | ρ(P, P_involved) | ρ(P, P_diagonal) | loss(P, P_involved) | loss(P, P_diagonal) | loss(P, P_diagonal) bound (0.01:1) | bound (0.25:1) | bound (0.85:1)
1 | 1 | 0.99 | 0% | 0% | 2% | 16% | 46%
2 | 1 | 1 | 0% | 0% | 2% | 16% | 47%
3 | 1 | 1 | 0% | 0% | 1% | 13% | 39%
4 | 1 | 0.99 | 0% | 0% | 1% | 15% | 43%
5 | 1 | 1 | 0% | 0% | 1% | 16% | 43%
6 | 1 | 0.99 | 0% | 0% | 1% | 15% | 41%

4.2.3 Guarantees

Throughout the experiment, it was possible to guarantee the quality of the solution for P_diagonal, by bounding loss(P, P_diagonal) in a post-solution evaluation – after solving each (simplified) planning session, and before applying the selected action. Obviously, no bound should be calculated for P_involved, since its loss was guaranteed to be zero in our pre-solution "offline" evaluation. As explained in Section 2.2, (9) provides a formula for the loss bound, given the solution of the simplified problem (which is available), and some domain-specific bounds/limits for the objective function. Here, we used the topological bounds from (43) and (44), and assigned them in the formula to provide guarantees during each planning session. The tightness of these topological bounds, which affects the tightness of the loss bound, depends on the ratio between the angular variance and the position variance with which we model the noise in factors between poses; the smaller the angular noise is, in relation to the latter, the tighter the bounds are (as analyzed by Khosoussi et al. (2018) and by Kitanov and Indelman (2019)). Hence, we calculated the loss bound assuming different noise models (different such ratios), and examined their effects. Such a change to the noise model has a minor effect on the objective evaluation, since it does not change the sparsity pattern of the matrices; thus, we only present the effect on the inferred loss bound, and not on the entire planning process. The bounds, which were calculated assuming the different noise ratios, are given in Table 2.
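The spanning-tree count t(b, u) appearing in these topological bounds can be computed via Kirchhoff's matrix-tree theorem, as the determinant of the reduced Laplacian of the pose graph. A minimal sketch on a toy 4-node cycle graph, which has exactly 4 spanning trees (the graph here is illustrative, not one of the paper's pose graphs):

```python
import numpy as np

# Toy pose graph: a 4-node cycle.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = 4
L = np.zeros((n, n))                   # graph Laplacian: degrees minus adjacency
for i, j in edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1

L_reduced = L[1:, 1:]                  # reduced Laplacian: delete one row and column
t = round(np.linalg.det(L_reduced))    # matrix-tree theorem: t = det(L_reduced)
print(t)                               # -> 4
```

This is the quantity whose logarithm enters the lower bound, alongside the node degrees and the reduced Laplacian determinant of the upper bound.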
The loss and its bounds are brought as a percentage of the maximal approximated objective function value in that session, to allow a correct comparison. In the scenario showcased before, the ratio of the angular variance to the position variance was 0.25:1. Indeed, changing the noise model has a significant influence on the tightness of the loss bounds. A ratio of 0.01:1 yields a very tight bound. It is not far-fetched that the angular variance would be this low in a navigation scenario, for example, when a compass is available, as mentioned before. Raising this ratio results in more conservative bounds, especially in comparison to the exact loss, which is zero. Yet, they can still be used to guarantee that the solution stays in an acceptable range. Developing tighter bounding methods for the objective function shall help make these guarantees less conservative. To clarify, this discussion, alongside any assumptions on the noise or state structure, is only brought in order to examine our ability to provide guarantees using this specific topological method; it is not essential in any way in order to apply the sparsification and improve the performance.

5 Conclusions

In an attempt to allow efficient autonomous decision making, and, specifically, decision making in the (high-dimensional) belief space, we introduced a new solution paradigm, which suggests performing a conscious simplification of the decision problem. Its impact is intended to be both conceptual and practical. Conceptually, we claimed that decision making, i.e., the identification of the best candidate action, can utilize a simplified representation or approximation of the initial state, without compromising the accuracy of the state inference process. After efficiently selecting a candidate action, it should be applied on the original state, which remains exact.
On top of that, we presented the simplification loss as a quality-of-solution measure, and explained how it can be bounded (e.g., using the simplification offset) in order to provide guarantees. We recognized that when the simplification maintains action consistency, i.e., when the trend of the objective function is maintained after the simplification, there is no loss. Practically, when applying the paradigm to the belief space, decision making can be conducted considering a sparse approximation of the prior belief. We provided a scalable algorithm for the generation of such approximations. This versatile algorithm can generate approximations of different degrees, based on the subset of state variables selected for the sparsification. Specifically, by identifying the problem's uninvolved variables, we can provide an action-consistent approximation, which is guaranteed to preserve the action selection. As explained in Section 3.2.2, our sparsification approach is original and intuitive, as it exploits the belief's underlying Bayes net structure. We presented an in-depth study of our approach, and demonstrated it in a highly realistic active-SLAM simulation. We showed that, using sparsification of uninvolved variables, planning time can be significantly reduced, while, as mentioned, guaranteeing no loss in the quality of solution. We then showed that planning time can be reduced even further when sparsifying all the state variables; in practice, for this configuration, we experienced no loss in the quality of solution as well. Nonetheless, we demonstrated how the theoretical loss in that case can be bounded. The proposed novel paradigm offers many possible future research directions. In general, other sparsification methods, besides the provided algorithm, can be used in similar ways; however, their impact on the action selection should be examined.
Potentially, existing (approximated) solution methods for POMDPs can also be evaluated with our theoretical framework, to provide a standard comparison tool for measuring the accuracy of planning algorithms. Also, this framework can be used to develop a scheme for the elimination of candidate actions; in fact, we have already developed a proof of concept for this idea (Elimelech and Indelman 2017b). We can also examine other simplification methods, such as altering the action set or the objective function. Developing simplification methods for more general beliefs, such as multi-modal Gaussians, can hold important practical significance. Derivation of tighter loss bounds is also of interest. Overall, with the versatility of these ideas, we expect the approach to yield a substantial contribution to the research community.

6 Acknowledgments

The authors would like to acknowledge Dr. Andrej Kitanov from the Faculty of Aerospace Engineering at the Technion – Israel Institute of Technology, for insightful discussions concerning Section 3.3.2, and his assistance with implementing the simulation.

7 Declaration of conflicting interest

The authors declare that there is no conflict of interest.

8 Funding

This work was supported by the Israel Science Foundation (grant 351/15).

References

Agarwal, P. and Olson, E. (2012), Variable reordering strategies for SLAM, in 'IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)', IEEE, pp. 3844–3850.
Agha-mohammadi, A.-a., Agarwal, S., Kim, S.-K., Chakravorty, S. and Amato, N. M. (2018), 'SLAP: Simultaneous localization and planning under uncertainty via dynamic replanning in belief space', IEEE Trans. Robotics 34(5), 1195–1214.
Agha-Mohammadi, A.-A., Chakravorty, S. and Amato, N. M. (2014), 'FIRM: Sampling-based feedback motion planning under motion uncertainty and imperfect measurements', Intl. J. of Robotics Research 33(2), 268–304.
Besl, P. and McKay, N. D.
(1992), 'A method for registration of 3-D shapes', IEEE Trans. Pattern Anal. Machine Intell. 14(2), 239–256.
Bopardikar, S. D., Englot, B., Speranzon, A. and van den Berg, J. (2016), 'Robust belief space planning under intermittent sensing via a maximum eigenvalue-based bound', IJRR 35(13), 1609–1626.
Boyen, X. and Koller, D. (1998), Tractable inference for complex stochastic processes, in 'Proc. 14th Conf. on Uncertainty in AI (UAI)', Madison, WI, pp. 33–42.
Carlevaris-Bianco, N. and Eustice, R. M. (2014), Conservative edge sparsification for graph SLAM node removal, in 'IEEE Intl. Conf. on Robotics and Automation (ICRA)', pp. 854–860.
Carlevaris-Bianco, N., Kaess, M. and Eustice, R. M. (2014), 'Generic node removal for factor-graph SLAM', IEEE Trans. Robotics 30(6), 1371–1385.
Chaves, S. M. and Eustice, R. M. (2016), Efficient planning with the Bayes tree for active SLAM, in 'IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)', IEEE, pp. 4664–4671.
Davis, T. A. (2006), Direct Methods for Sparse Linear Systems, Fundamentals of Algorithms, Society for Industrial and Applied Mathematics, Philadelphia, PA, United States.
Davis, T., Gilbert, J., Larimore, S. and Ng, E. (2004), 'A column approximate minimum degree ordering algorithm', ACM Trans. Math. Softw. 30(3), 353–376.
Dellaert, F. (2012), Factor graphs and GTSAM: A hands-on introduction, Technical Report GT-RIM-CP&R-2012-002, Georgia Institute of Technology.
Dellaert, F. and Kaess, M. (2006), 'Square Root SAM: Simultaneous localization and mapping via square root information smoothing', Intl. J. of Robotics Research 25(12), 1181–1203.
Dellaert, F. and Kaess, M. (2017), 'Factor graphs for robot perception', Foundations and Trends in Robotics 6(1-2), 1–139.
Elimelech, K. (2021), Efficient Decision Making under Uncertainty in High-Dimensional State Spaces, PhD thesis, Technion – Israel Institute of Technology.
Elimelech, K.
and Indelman, V. (2017a), Consistent sparsification for efficient decision making under uncertainty in high dimensional state spaces, in 'IEEE Intl. Conf. on Robotics and Automation (ICRA)', pp. 3786–3791.
Elimelech, K. and Indelman, V. (2017b), Fast action elimination for efficient decision making and belief space planning using bounded approximations, in 'Proc. of the Intl. Symp. of Robotics Research (ISRR)'.
Elimelech, K. and Indelman, V. (2017c), Scalable sparsification for efficient decision making under uncertainty in high dimensional state spaces, in 'IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)', pp. 5668–5673.
Elimelech, K. and Indelman, V. (2019), Introducing PIVOT: Predictive incremental variable ordering tactic for efficient belief space planning, in 'Proc. of the Intl. Symp. of Robotics Research (ISRR)'.
Elimelech, K. and Indelman, V. (2021), 'Efficient modification of the upper triangular square root matrix on variable reordering', IEEE Robotics and Automation Letters (RA-L) 6(2), 675–682.
Frey, K. M., Steiner, T. J. and How, J. P. (2017), 'Complexity analysis and efficient measurement selection primitives for high-rate graph SLAM', arXiv preprint arXiv:1709.06821.
Hämmerlin, G. and Hoffmann, K.-H. (2012), Numerical Mathematics, Springer Science & Business Media.
Harville, D. A. (1998), 'Matrix algebra from a statistician's perspective', Technometrics 40(2), 164–164.
Hsiung, J., Hsiao, M., Westman, E., Valencia, R. and Kaess, M. (2018), Information sparsification in visual-inertial odometry, in 'IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)', pp. 1146–1153.
Huang, G., Kaess, M. and Leonard, J. (2012), Consistent sparsification for graph optimization, in 'Proc. of the European Conference on Mobile Robots (ECMR)', pp. 150–157.
Indelman, V. (2015), Towards information-theoretic decision making in a conservative information space, in 'American Control Conference', pp.
2420–2426.
Indelman, V. (2016), 'No correlations involved: Decision making under uncertainty in a conservative sparse information space', IEEE Robotics and Automation Letters (RA-L) 1(1), 407–414.
Indelman, V., Carlone, L. and Dellaert, F. (2015), 'Planning in the continuous domain: a generalized belief space approach for autonomous navigation in unknown environments', Intl. J. of Robotics Research 34(7), 849–882.
Kaelbling, L. P., Littman, M. L. and Cassandra, A. R. (1998), 'Planning and acting in partially observable stochastic domains', Artificial Intelligence 101(1), 99–134.
Kaess, M., Johannsson, H., Roberts, R., Ila, V., Leonard, J. and Dellaert, F. (2012), 'iSAM2: Incremental smoothing and mapping using the Bayes tree', Intl. J. of Robotics Research 31(2), 217–236.
Karaman, S. and Frazzoli, E. (2011), 'Sampling-based algorithms for optimal motion planning', Intl. J. of Robotics Research 30(7), 846–894.
Kavraki, L., Svestka, P., Latombe, J.-C. and Overmars, M. (1996), 'Probabilistic roadmaps for path planning in high-dimensional configuration spaces', IEEE Trans. Robot. Automat. 12(4), 566–580.
Kendall, M. G. (1948), Rank Correlation Methods, Griffin.
Khosoussi, K., Giamou, M., Sukhatme, G. S., Huang, S., Dissanayake, G. and How, J. P. (2018), 'Reliable graph topologies for SLAM', Intl. J. of Robotics Research.
Kim, A. and Eustice, R. M. (2014), 'Active visual SLAM for robotic area coverage: Theory and experiment', Intl. J. of Robotics Research 34(4-5), 457–475.
Kitanov, A. and Indelman, V. (2019), 'Topological information-theoretic belief space planning with optimality guarantees', arXiv preprint arXiv:1903.00927.
Koenig, N. and Howard, A. (2004), Design and use paradigms for Gazebo, an open-source multi-robot simulator, in 'IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)'.
Kopitkov, D. and Indelman, V.
(2017), 'No belief propagation required: Belief space planning in high-dimensional state spaces via factor graphs, matrix determinant lemma and re-use of calculation', Intl. J. of Robotics Research 36(10), 1088–1130.
Kretzschmar, H. and Stachniss, C. (2012), 'Information-theoretic compression of pose graphs for laser-based SLAM', Intl. J. of Robotics Research 31(11), 1219–1230.
Künzi, H.-P. A. (2001), Nonsymmetric distances and their associated topologies: about the origins of basic ideas in the area of asymmetric topology, in 'Handbook of the History of General Topology', Springer, pp. 853–968.
Manski, C. F. (1988), 'Ordinal utility models of decision making under uncertainty', Theory and Decision 25(1), 79–104.
McAllester, D. A. and Singh, S. (1999), Approximate planning for factored POMDPs using belief state simplification, in 'UAI', Morgan Kaufmann Publishers Inc., pp. 409–416.
Mu, B., Paull, L., Agha-Mohammadi, A.-A., Leonard, J. J. and How, J. P. (2017), 'Two-stage focused inference for resource-constrained minimal collision navigation', IEEE Trans. Robotics 33(1), 124–140.
Patil, S., Kahn, G., Laskey, M., Schulman, J., Goldberg, K. and Abbeel, P. (2014), Scaling up Gaussian belief space planning through covariance-free trajectory optimization and automatic differentiation, in 'Intl. Workshop on the Algorithmic Foundations of Robotics (WAFR)', pp. 515–533.
Pineau, J., Gordon, G. J. and Thrun, S. (2006), 'Anytime point-based approximations for large POMDPs', J. of Artificial Intelligence Research 27, 335–380.
Platt, R., Tedrake, R., Kaelbling, L. and Lozano-Pérez, T. (2010), Belief space planning assuming maximum likelihood observations, in 'Robotics: Science and Systems (RSS)', Zaragoza, Spain, pp. 587–593.
Porta, J. M., Vlassis, N., Spaan, M. T. and Poupart, P. (2006), 'Point-based value iteration for continuous POMDPs', J. of Machine Learning Research 7, 2329–2367.
Prentice, S. and Roy, N.
(2009), 'The belief roadmap: Efficient planning in belief space by factoring the covariance', Intl. J. of Robotics Research 28(11-12), 1448–1465.
Roy, N., Gordon, G. J. and Thrun, S. (2005), 'Finding approximate POMDP solutions through belief compression', J. Artif. Intell. Res. (JAIR) 23, 1–40.
Silver, D. and Veness, J. (2010), Monte-Carlo planning in large POMDPs, in 'Advances in Neural Information Processing Systems (NIPS)', pp. 2164–2172.
Stachniss, C., Haehnel, D. and Burgard, W. (2004), Exploration with active loop-closing for FastSLAM, in 'IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)'.
Thrun, S., Liu, Y., Koller, D., Ng, A., Ghahramani, Z. and Durrant-Whyte, H. (2004), 'Simultaneous localization and mapping with sparse extended information filters', Intl. J. of Robotics Research 23(7-8), 693–716.
Van Den Berg, J., Patil, S. and Alterovitz, R. (2012), 'Motion planning under uncertainty using iterative local optimization in belief space', Intl. J. of Robotics Research 31(11), 1263–1278.
Voss, C., Moll, M. and Kavraki, L. E. (2015), A heuristic approach to finding diverse short paths, in 'IEEE Intl. Conf. on Robotics and Automation (ICRA)', pp. 4173–4179.
Ye, N., Somani, A., Hsu, D. and Lee, W. S. (2017), 'DESPOT: Online POMDP planning with regularization', JAIR 58, 231–266.

Appendices

A Additional loss bounds

We present here additional techniques to bound the loss between a decision problem $\mathcal{P} \doteq (b, \mathcal{U}, J)$ and its simplified version $\mathcal{P}_s \doteq (b_s, \mathcal{U}, J)$, which uses a sparse belief approximation created with Algorithm 1.

A.1 Pre-solution guarantees: rank-1 updates

We remind again that, according to Lemmas 1 and 2 in Section 2.2, we can use (a bound of) the offset between the problem and its simplification to derive a loss bound. In Section 3.3.1, we proved that sparsification of the uninvolved variables always results in zero offset, and hence zero loss.
Now, we show that, under additional restrictions, we can derive an offset bound also when sparsifying involved variables. Assume that for every action $u \in \mathcal{U}$ the corresponding collective Jacobian $U \in \mathbb{R}^{1 \times N}$ contains only a single row, i.e., rank-1 information updates. This can be the case, for example, in sensor placement problems with scalar measurements (like temperature). Now, let us analyze the simplification offset:

$2 \cdot \delta(\mathcal{P}, \mathcal{P}_s, u)$ (45)
$= 2 \cdot \left| V(b, u) - V(b_s, u) \right|$ (46)
$= \left| \ln\left|\Lambda + U^\top U\right| - \ln\left|\Lambda_s + U^\top U\right| \right|$ (47)

(matrix determinant lemma; see Harville 1998)

$= \left| \ln\left( |\Lambda| \cdot \left(1 + U \Lambda^{-1} U^\top\right) \right) - \ln\left( |\Lambda_s| \cdot \left(1 + U \Lambda_s^{-1} U^\top\right) \right) \right|$ (48)

(Eq. (42))

$= \left| \ln\left(1 + U \Lambda^{-1} U^\top\right) - \ln\left(1 + U \Lambda_s^{-1} U^\top\right) \right|$ (49)
$= \left| \ln\left(1 + U \Lambda_s^{-1} U^\top + U (\Lambda^{-1} - \Lambda_s^{-1}) U^\top\right) - \ln\left(1 + U \Lambda_s^{-1} U^\top\right) \right| \doteq (\star)$ (50)

The logarithm is a monotonically increasing concave function; thus, every $a, b \in \mathbb{R}$ and $c \geq 0$ satisfy

$\left| \ln(a) - \ln(b) \right| \geq \left| \ln(a + c) - \ln(b + c) \right|$. (51)

In other words, the difference in the function value between a pair of inputs decreases when both inputs grow equally. Surely, $0 \leq U \Lambda_s^{-1} U^\top$, since $\Lambda_s^{-1}$ is positive semi-definite. Thus, we may choose $a = 1 + U (\Lambda^{-1} - \Lambda_s^{-1}) U^\top$, $b = 1$, and $c = U \Lambda_s^{-1} U^\top$. Therefore,

$(\star) \leq \left| \ln\left(1 + U (\Lambda^{-1} - \Lambda_s^{-1}) U^\top\right) - \ln(1) \right|$ (52)
$= \left| \ln\left(1 + U (\Lambda^{-1} - \Lambda_s^{-1}) U^\top\right) \right|$ (53)
$\leq \left| \ln\Big(1 + \alpha \cdot \sum_{i,j \in \mathcal{I}nv(u)} (\Lambda^{-1} - \Lambda_s^{-1})_{ij}\Big) \right|$, (54)

where $\mathcal{I}nv(u)$ is the set of (prior state) variables involved in $u$, and the scalar $\alpha$ complies with $\alpha \geq \max_i U_i^2$. We recall that $U_i$ is uninvolved $\iff U_i = 0$. When considering the involved variables among all the actions, and $\alpha$ is valid $\forall u \in \mathcal{U}$, this bound becomes independent of a specific action, and only a single expression needs to be calculated.
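The matrix-determinant-lemma step used in (47)-(48) is easy to verify numerically. The following sketch is illustrative only; the matrix size and random values are our own choices, not taken from the paper's experiments. It checks the rank-1 identity $\ln|\Lambda + U^\top U| = \ln|\Lambda| + \ln(1 + U \Lambda^{-1} U^\top)$ for a single-row Jacobian $U$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prior information matrix (symmetric positive definite)
N = 6
A = rng.standard_normal((N, N))
Lam = A @ A.T + N * np.eye(N)

# Single-row collective Jacobian: a rank-1 information update
U = rng.standard_normal((1, N))

# Left-hand side: ln|Lam + U^T U| computed directly
lhs = np.linalg.slogdet(Lam + U.T @ U)[1]

# Right-hand side via the matrix determinant lemma:
# |Lam + U^T U| = |Lam| * (1 + U Lam^{-1} U^T)
quad = (U @ np.linalg.solve(Lam, U.T)).item()
rhs = np.linalg.slogdet(Lam)[1] + np.log(1.0 + quad)

assert np.isclose(lhs, rhs)
```

The same identity is what allows the derivation to factor out the (equal) determinants of $\Lambda$ and $\Lambda_s$ in step (48)-(49).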
Overall, we can conclude the following bound on the offset:

$\Delta(\mathcal{P}, \mathcal{P}_s) \leq \frac{1}{2} \cdot \left| \ln\Big(1 + \alpha \cdot \sum_{i,j \in \mathcal{I}nv(\mathcal{U})} (\Lambda^{-1} - \Lambda_s^{-1})_{ij}\Big) \right|$. (55)

As we may notice, this symbolic bound depends on the initial beliefs of the original and simplified problems, yet not on their solution; it can hence be utilized before actually solving the problem. When calculating this bound, we considered only single-row collective Jacobians, but they were otherwise arbitrary. Although the considered assumption is restrictive, the concluded bound is indeed usable for certain problems, as evident in our follow-up work (Elimelech and Indelman 2017b). Guaranteed action consistency for the case of single-row Jacobians, which are also limited to a single non-zero entry, was previously shown by Indelman (2016).

A.2 Post-solution guarantees

We recall that the offset can also be bounded by utilizing domain-specific upper and lower bounds of the objective function ($\mathcal{UB}$, $\mathcal{LB}$, respectively), as indicated in (6). In addition to the topological objective bounds, which were presented in Section 3.3.2, we may also utilize alternative bounds, which rely on known determinant bounds. For the lower bound, we can use the Minkowski determinant inequality, which states that for positive semi-definite matrices $M_1, M_2 \in \mathbb{R}^{N \times N}$:

$\left| M_1 + M_2 \right|^{\frac{1}{N}} \geq \left| M_1 \right|^{\frac{1}{N}} + \left| M_2 \right|^{\frac{1}{N}}$, (56)
$\ln\left| M_1 + M_2 \right| \geq N \cdot \ln\left( \left| M_1 \right|^{\frac{1}{N}} + \left| M_2 \right|^{\frac{1}{N}} \right)$. (57)

Let us assign $M_1 \doteq \Lambda$, $M_2 \doteq U^\top U$; when $U^\top U$ is not a full-rank update (e.g., $U$ has fewer than $N$ rows), $\left| U^\top U \right| = 0$, and we are left with

$\ln\left| \Lambda + U^\top U \right| \geq \ln|\Lambda|$. (58)

For formality, it is easy to show that even if the prior state size is smaller than $N$, the validity of the conclusion is not compromised. For the upper bound, we can use the Hadamard inequality, which states that for a positive semi-definite matrix $M \in \mathbb{R}^{N \times N}$:

$|M| \leq \prod_{i=1}^{N} (M)_{ii}$. (59)

Let us assign $M$
$\doteq \Lambda + U^\top U$; then

$\left| \Lambda + U^\top U \right| \leq \prod_{i=1}^{N} \left( \Lambda + U^\top U \right)_{ii}$, (60)
$\ln\left| \Lambda + U^\top U \right| \leq \sum_{i=1}^{N} \ln\left[ \left( \Lambda + U^\top U \right)_{ii} \right]$. (61)

Overall, we get the following objective function bounds:

$\mathcal{LB}_{det}\{V(b, u)\} \doteq \ln|\Lambda| - N \cdot \ln(2 \pi e)$, (62)
$\mathcal{UB}_{det}\{V(b, u)\} \doteq \sum_{i=1}^{N} \ln\left[ \left( \Lambda + U^\top U \right)_{ii} \right] - N \cdot \ln(2 \pi e)$, (63)

where $\Lambda$ is the information matrix of the prior belief $b$, $U$ is the collective Jacobian of action $u$, and $N$ is the posterior state size. Unlike the bounds presented in Section 3.3.2, these bounds are extremely general, as they make no assumptions on the state or actions, besides the standard problem formulation. As expected, this advantage comes at the expense of tightness. Nonetheless, they may be especially useful when the matrix $\Lambda$ is diagonally dominant.

B Proofs

B.1 Lemma 1

Proof. Refer to the proof of the more general case, stated in Lemma 6.

B.2 Lemma 2

Proof. Refer to Elimelech (2021) for an extended discussion and formulation of this statement.

B.3 Lemma 3

Proof. The properties are trivially given from the definition of action consistency.

B.4 Lemma 4

Proof. Assume $f$ is a monotonically increasing function such that for every two actions $a_i, a_j \in \mathcal{A}$

$f(V_1(\xi_1, a_i)) = V_2(\xi_2, a_i), \quad f(V_1(\xi_1, a_j)) = V_2(\xi_2, a_j)$; (64)

then

$f(V_1(\xi_1, a_i)) < f(V_1(\xi_1, a_j)) \iff V_2(\xi_2, a_i) < V_2(\xi_2, a_j)$. (65)

Because $f$ is monotonically increasing, $f(x) < f(y) \iff x < y$, and

$V_1(\xi_1, a_i) < V_1(\xi_1, a_j) \iff V_2(\xi_2, a_i) < V_2(\xi_2, a_j)$. (66)

Meaning, $(\xi_1, \mathcal{A}, V_1) \simeq (\xi_2, \mathcal{A}, V_2)$. Now, to prove the opposite direction, assume $(\xi_1, \mathcal{A}, V_1) \simeq (\xi_2, \mathcal{A}, V_2)$; hence,

$V_1(\xi_1, a_i) < V_1(\xi_1, a_j) \iff V_2(\xi_2, a_i) < V_2(\xi_2, a_j)$. (67)

Let us define a new function $f$ on the domain $\{V_1(\xi_1, a) \mid a \in \mathcal{A}\}$ such that $f(V_1(\xi_1, a)) \doteq V_2(\xi_2, a)$.
Given this definition and the action consistency condition from (67), we can conclude that

$f(V_1(\xi_1, a_i)) < f(V_1(\xi_1, a_j)) \iff V_2(\xi_2, a_i) < V_2(\xi_2, a_j) \iff V_1(\xi_1, a_i) < V_1(\xi_1, a_j)$. (68)

Thus, $f$ is monotonically increasing on its domain.

B.5 Lemma 5

Proof. Both directions are a direct consequence of Lemma 4. Assume $\Delta^*(\mathcal{P}, \mathcal{P}_s) = 0$. Thus, a monotonically increasing function $f$ exists such that $\Delta(\mathcal{P}, \mathcal{P}^f_s) = 0$. Meaning, for every action $a \in \mathcal{A}$, $f(V_s(\xi_s, a)) = V(\xi, a)$. According to Lemma 4, this is sufficient to prove that $\mathcal{P} \simeq \mathcal{P}_s$. To prove the opposite direction, assume $\mathcal{P} \simeq \mathcal{P}_s$. Let us define a new function $f$ on the domain $\{V_s(\xi_s, a) \mid a \in \mathcal{A}\}$ such that $f(V_s(\xi_s, a)) \doteq V(\xi, a)$. From this definition, $\Delta(\mathcal{P}, \mathcal{P}^f_s) = 0$. Also, according to Lemma 4, this function $f$ is monotonically increasing, and thus $\Delta^*(\mathcal{P}, \mathcal{P}_s) = 0$.

B.6 Lemma 6

Proof. From the definition of the simplification offset, we know that for every monotonically increasing function $f$, the following is true:

$\left| V(\xi, a^*) - f(V_s(\xi_s, a^*)) \right| \leq \Delta(\mathcal{P}, \mathcal{P}^f_s)$, (69)
$\left| V(\xi, a^*_s) - f(V_s(\xi_s, a^*_s)) \right| \leq \Delta(\mathcal{P}, \mathcal{P}^f_s)$. (70)

Removing the absolute values surely does not compromise the inequalities:

$V(\xi, a^*) - f(V_s(\xi_s, a^*)) \leq \Delta(\mathcal{P}, \mathcal{P}^f_s)$, (71)
$f(V_s(\xi_s, a^*_s)) - V(\xi, a^*_s) \leq \Delta(\mathcal{P}, \mathcal{P}^f_s)$. (72)

By adding the two inequalities, and utilizing the definition of the loss, we get:

$\mathrm{loss}(\mathcal{P}, \mathcal{P}^f_s) + f(V_s(\xi_s, a^*_s)) - f(V_s(\xi_s, a^*)) \leq 2 \cdot \Delta(\mathcal{P}, \mathcal{P}^f_s)$. (73)

From the definition of $a^*_s$, we know that

$V_s(\xi_s, a^*_s) \geq V_s(\xi_s, a^*)$. (74)

Since $f$ is monotonically increasing, then also

$f(V_s(\xi_s, a^*_s)) \geq f(V_s(\xi_s, a^*))$, (75)
$f(V_s(\xi_s, a^*_s)) - f(V_s(\xi_s, a^*)) \geq 0$. (76)

Thus, we can infer that $\mathrm{loss}(\mathcal{P}, \mathcal{P}^f_s) \leq 2 \cdot \Delta(\mathcal{P}, \mathcal{P}^f_s)$.
Since the final statement is true for any monotonically increasing function $f$, we may conclude the desired upper bound over the loss:

$\mathrm{loss}(\mathcal{P}, \mathcal{P}_s) \leq 2 \cdot \Delta^*(\mathcal{P}, \mathcal{P}_s)$. (78)

B.7 Lemma 7

Proof. Let us examine three decision problems $\mathcal{P}_1, \mathcal{P}_2, \mathcal{P}_3$, where $\mathcal{P}_i \doteq (\xi_i, \mathcal{A}, V_i)$. First, let us define the notation $\delta(\mathcal{P}_i, \mathcal{P}_j, a) \doteq \left| V_i(\xi_i, a) - V_j(\xi_j, a) \right|$. Now, for each two problems $\mathcal{P}_i, \mathcal{P}_j$, we mark $a_{ij} \in \mathcal{A}$ as the action, and $f_{ij}$ as the balance function, for which $\Delta^*(\mathcal{P}_i, \mathcal{P}_j) \doteq \delta(\mathcal{P}_i, \mathcal{P}^{f_{ij}}_j, a_{ij})$ (the values can be chosen arbitrarily from all values which comply with the equation). According to this notation, we can conclude:

$\Delta^*(\mathcal{P}_1, \mathcal{P}_2) + \Delta^*(\mathcal{P}_2, \mathcal{P}_3)$
$\doteq \delta(\mathcal{P}_1, \mathcal{P}^{f_{12}}_2, a_{12}) + \delta(\mathcal{P}_2, \mathcal{P}^{f_{23}}_3, a_{23})$
$\geq \delta(\mathcal{P}_1, \mathcal{P}^{f_{12}}_2, a_{13}) + \delta(\mathcal{P}_2, \mathcal{P}^{f_{23}}_3, a_{13})$
$\doteq \left| V_1(\xi_1, a_{13}) - f_{12}(V_2(\xi_2, a_{13})) \right| + \left| V_2(\xi_2, a_{13}) - f_{23}(V_3(\xi_3, a_{13})) \right|$
$\geq \left| V_1(\xi_1, a_{13}) - f_{12}(V_2(\xi_2, a_{13})) + V_2(\xi_2, a_{13}) - f_{23}(V_3(\xi_3, a_{13})) \right| \doteq (\star\star)$. (79)

Let us define the following scalar function:

$F(x) \doteq f_{23}(x) + f_{12}(V_2(\xi_2, a_{13})) - V_2(\xi_2, a_{13}) = f_{23}(x) + \text{constant}$. (80)

Since $f_{23}$ is monotonically increasing, so is $F$, and

$(\star\star) = \left| V_1(\xi_1, a_{13}) - F(V_3(\xi_3, a_{13})) \right| \doteq \delta(\mathcal{P}_1, \mathcal{P}^{F}_3, a_{13}) \geq \delta(\mathcal{P}_1, \mathcal{P}^{f_{13}}_3, a_{13}) = \Delta^*(\mathcal{P}_1, \mathcal{P}_3)$. (81)

Hence, $\Delta^*$ satisfies the triangle inequality.

B.8 Corollary 1

Proof. Let us mark as $R^p_s$ the sparsified square root matrix, before permuting the variables back to their original order in line 8 of Algorithm 1. First, we show that applying the reverse permutation $R^p_s \mapsto P R^p_s P^\top$ indeed leads to a square root of the sparse information matrix $\Lambda_s$ (in the original order):

$(P R^p_s P^\top)^\top (P R^p_s P^\top) = P {R^p_s}^\top R^p_s P^\top = P \Lambda^p_s P^\top = \Lambda_s$, (82)

where $\Lambda^p_s$ is the sparsified information matrix, before permuting the variables back.
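Both parts of Corollary 1, the square-root identity (82) and the claim (proven next) that the back-permuted factor remains triangular, can be checked numerically. The following is a minimal sketch under illustrative assumptions: a 5-variable state in which variables 1 and 3 are the sparsified ones; these indices and all values are hypothetical, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical permuted order: sparsified variables 1 and 3 pushed to the front
perm = np.array([1, 3, 0, 2, 4])
n1, n2 = 2, 3

# R^p_s (structure (83)): diagonal block for the sparsified variables,
# upper-triangular block for the rest, zero off-diagonal blocks
D = np.diag(rng.uniform(1.0, 2.0, n1))
T = np.triu(rng.standard_normal((n2, n2))) + 2.0 * np.eye(n2)
Rps = np.block([[D, np.zeros((n1, n2))],
                [np.zeros((n2, n1)), T]])

# Permute the variables back to their original order: R_s = P R^p_s P^T
pos = np.argsort(perm)            # position of each original variable
Rs = Rps[np.ix_(pos, pos)]

# (82): R_s is a square root of the back-permuted sparsified information matrix
Lam_ps = Rps.T @ Rps
assert np.allclose(Rs.T @ Rs, Lam_ps[np.ix_(pos, pos)])

# Corollary 1: R_s is still upper triangular after the back-permutation
assert np.allclose(Rs, np.triu(Rs))
```

The triangularity holds because each sparsified row contains only its diagonal entry, so moving that row and its matching column simultaneously keeps the entry on the diagonal.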
Now, we want to examine the shape of the matrix $R_s \doteq P R^p_s P^\top$, and show that it is indeed triangular. According to Algorithm 1, before executing line 8, $R^p_s$ has the following structure:

$R^p_s = \begin{bmatrix} \text{diagonal} & 0 \\ 0 & \text{triangular} \end{bmatrix}$, (83)

where the rows of the diagonal block correspond to the sparsified variables. Without losing generality, we should only prove that applying a permutation of the form $p' : (1, \ldots, n) \mapsto (2, \ldots, i, 1, i+1, \ldots, n)$ on this matrix (i.e., "pushing forwards" one of the sparsified variables) does not break the triangular form. Hence, assuming $P^\top$ is the column permutation matrix matching such $p'$, let us look at $R_s \doteq P R^p_s P^\top$. The row permutation $P$ pushes the first row, whose only non-zero entry is the diagonal value $d$, down to the $i$-th row; the column permutation $P^\top$ then pushes the matching column forwards, placing $d$ back on the diagonal:

$R_s = \begin{bmatrix} \text{triangular} & * & * \\ 0 \cdots 0 & d & 0 \cdots 0 \\ 0 & 0 & \text{triangular} \end{bmatrix}$. (84)

Recursively utilizing this conclusion, for more intricate permutations, proves that $R_s$ is indeed triangular whenever permuting the sparsified variables back to their original order, as desired.

B.9 Theorem 1

Proof. Consider a belief $b = \mathcal{N}(X^*, \Lambda^{-1})$, where the state contains $n_1$ uninvolved variables and $n_2$ involved variables, such that $n = n_1 + n_2$ is the prior state size. Also consider the simplified belief $b_s = \mathcal{N}(X^*, \Lambda_s^{-1})$, in which all the uninvolved variables were sparsified by applying Algorithm 1. We mark with $P$ the (column) permutation matrix that positions all the involved variables at the end of the state. Now, let $R^p$ be the Cholesky factor of the permuted information matrix $\Lambda^p \doteq P^\top \Lambda P$, such that $\Lambda^p = {R^p}^\top R^p$. This $R^p$ can be divided into block form: $R^p$
$\doteq \begin{bmatrix} R^p_{11} & R^p_{12} \\ 0_{n_2 \times n_1} & R^p_{22} \end{bmatrix}$, (85)

where $R^p_{11} \in \mathbb{R}^{n_1 \times n_1}$ and $R^p_{22} \in \mathbb{R}^{n_2 \times n_2}$ are triangular sub-matrices, $R^p_{12} \in \mathbb{R}^{n_1 \times n_2}$, and $0_{n_2 \times n_1}$ is a zero matrix of the specified size. By following the steps of Algorithm 1, we realize that the returned sparsified information matrix $\Lambda_s$ is given as $\Lambda_s \doteq P {R^p_s}^\top R^p_s P^\top$ (or, equally, satisfies $P^\top \Lambda_s P \doteq {R^p_s}^\top R^p_s$), where

$R^p_s \doteq \begin{bmatrix} D^p_{11} & 0_{n_1 \times n_2} \\ 0_{n_2 \times n_1} & R^p_{22} \end{bmatrix}$, (86)

and $D^p_{11}$ is the diagonal matrix formed by copying the diagonal of $R^p_{11}$ (and assigning zero elsewhere). We would like to find the simplification offset between the two decision problems $\mathcal{P}$ and $\mathcal{P}_s$ (for which $b$ and $b_s$ are the initial beliefs, respectively). Let us consider a candidate action $u \in \mathcal{U}$ with a collective Jacobian $U \in \mathbb{R}^{h \times (n + m)}$, where $n + m$ is the posterior state size. We may derive the following from the definition of the offset and the objective function $V$:

$\delta(\mathcal{P}, \mathcal{P}_s, u) = \frac{1}{2} \cdot \left| \ln\left| \breve\Lambda + U^\top U \right| - \ln\left| \breve\Lambda_s + U^\top U \right| \right|$. (87)

Now, let us examine the following expression:

$\# \doteq \left| \breve\Lambda + U^\top U \right| - \left| \breve\Lambda_s + U^\top U \right|$. (88)

We know that (unitary) variable permutation does not affect the determinant of a matrix; thus

$\# = \left| \breve P^\top \breve\Lambda \breve P + (U \breve P)^\top (U \breve P) \right| - \left| \breve P^\top \breve\Lambda_s \breve P + (U \breve P)^\top (U \breve P) \right|$, (89)

where

$\breve P \doteq \begin{bmatrix} P & 0_{n \times m} \\ 0_{m \times n} & I_{m \times m} \end{bmatrix}$ (90)

is the augmented permutation matrix, which keeps the variables added in the update at the end of the state. Note that if the new variables were not originally added at the end of the state, the permutation $\breve P$ can easily be adapted to enforce this property. We can also augment the matrix $R^p$ with $m$ empty columns (and similarly for $R^p_s$): $\breve R^p$
$\doteq \begin{bmatrix} R^p_{11} & R^p_{12} & 0_{n_1 \times m} \\ 0_{n_2 \times n_1} & R^p_{22} & 0_{n_2 \times m} \end{bmatrix}$, (91)

and assign the result in $\#$, to yield:

$\# = \left| {\breve R^p}^\top \breve R^p + (U \breve P)^\top (U \breve P) \right| - \left| {\breve R^p_s}^\top \breve R^p_s + (U \breve P)^\top (U \breve P) \right|$. (92)

This expression can be reorganized into the following form:

$\# = \left| \begin{bmatrix} \breve R^p \\ U \breve P \end{bmatrix}^\top \begin{bmatrix} \breve R^p \\ U \breve P \end{bmatrix} \right| - \left| \begin{bmatrix} \breve R^p_s \\ U \breve P \end{bmatrix}^\top \begin{bmatrix} \breve R^p_s \\ U \breve P \end{bmatrix} \right|$. (93)

The two matrices which appear in this expression also follow a block form:

$\begin{bmatrix} \breve R^p \\ U \breve P \end{bmatrix} = \begin{bmatrix} R^p_{11} & R^p_{12} \;\; 0_{n_1 \times m} \\ 0_{(n_2 + h) \times n_1} & B \end{bmatrix}$, (94)

$\begin{bmatrix} \breve R^p_s \\ U \breve P \end{bmatrix} = \begin{bmatrix} D^p_{11} & 0_{n_1 \times (n_2 + m)} \\ 0_{(n_2 + h) \times n_1} & B \end{bmatrix}$, (95)

where

$B \doteq \begin{bmatrix} R^p_{22} \;\; 0_{n_2 \times m} \\ U_{inv} \end{bmatrix}$, (96)

and $U_{inv}$ is a sub-matrix of $U \breve P$, containing its right $n_2 + m$ columns. Since the left $n_1$ columns of $U \breve P$ correspond to uninvolved variables, we know they may only contain zeros. Thus, if we mark $\breve R^p_{12} \doteq \begin{bmatrix} R^p_{12} & 0_{n_1 \times m} \end{bmatrix}$, then the left term in (93) is:

$\left| \begin{bmatrix} \breve R^p \\ U \breve P \end{bmatrix}^\top \begin{bmatrix} \breve R^p \\ U \breve P \end{bmatrix} \right| = \left| \begin{bmatrix} {R^p_{11}}^\top R^p_{11} & {R^p_{11}}^\top \breve R^p_{12} \\ {\breve R^p_{12}}^\top R^p_{11} & {\breve R^p_{12}}^\top \breve R^p_{12} + B^\top B \end{bmatrix} \right|$. (97)

From the block-determinant formula (see Harville 1998), this equals

$\left| {R^p_{11}}^\top R^p_{11} \right| \cdot \left| {\breve R^p_{12}}^\top \breve R^p_{12} + B^\top B - {\breve R^p_{12}}^\top R^p_{11} {R^p_{11}}^{-1} {({R^p_{11}}^\top)}^{-1} {R^p_{11}}^\top \breve R^p_{12} \right| = \left| R^p_{11} \right|^2 \cdot \left| B^\top B \right|$. (98)

The right term in (93) is:

$\left| \begin{bmatrix} \breve R^p_s \\ U \breve P \end{bmatrix}^\top \begin{bmatrix} \breve R^p_s \\ U \breve P \end{bmatrix} \right| = \left| \begin{bmatrix} {D^p_{11}}^\top D^p_{11} & 0 \\ 0 & B^\top B \end{bmatrix} \right| = \left| D^p_{11} \right|^2 \cdot \left| B^\top B \right|$. (99)

Since $R^p_{11}$ and $D^p_{11}$ are triangular matrices with the same diagonal, their determinants are equal (to the product of the diagonal elements). Thus, $\# = 0$, and overall

$\left| \breve\Lambda + U^\top U \right| = \left| \breve\Lambda_s + U^\top U \right|$. (100)

This surely means that

$\ln\left| \breve\Lambda + U^\top U \right| - \ln\left| \breve\Lambda_s + U^\top U \right| = 0$. (101)

Finally, assigning this expression in (87) means that $\delta(\mathcal{P}, \mathcal{P}_s, u) = 0$.
Since the previous conclusion is true $\forall u \in \mathcal{U}$, this means that

$\Delta(\mathcal{P}, \mathcal{P}_s) \doteq \max_{u \in \mathcal{U}} \delta(\mathcal{P}, \mathcal{P}_s, u) = 0$, (103)

as desired.
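Theorem 1 can also be validated numerically. The sketch below mimics the construction in the proof under illustrative assumptions (a 5-variable state, of which variables 0 and 3 are involved, and $m = 0$ new variables; all indices and values are hypothetical): it builds $\Lambda_s$ by keeping only the diagonal of the uninvolved block of the Cholesky factor, and then confirms that the determinants, and hence the offset, coincide:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 5-variable prior; variables 0 and 3 are involved in the
# candidate action, and variables 1, 2, 4 are uninvolved (illustrative choice)
n = 5
inv, unv = [0, 3], [1, 2, 4]
A = rng.standard_normal((n, n))
Lam = A @ A.T + n * np.eye(n)          # prior information matrix (SPD)

# Permute the uninvolved variables to the front, as in the proof
order = unv + inv
Lam_p = Lam[np.ix_(order, order)]
R = np.linalg.cholesky(Lam_p).T        # upper-triangular factor: Lam_p = R^T R

# Sparsification (structure (86)): keep only the diagonal of the
# uninvolved block R^p_11, zero out R^p_12, keep R^p_22 intact
n1 = len(unv)
Rs = R.copy()
Rs[:n1, n1:] = 0.0
Rs[:n1, :n1] = np.diag(np.diag(R)[:n1])

# Permute back to the original variable order
pos = np.argsort(order)
Lam_s = (Rs.T @ Rs)[np.ix_(pos, pos)]

# A collective Jacobian whose uninvolved columns are zero
U = np.zeros((2, n))
U[:, inv] = rng.standard_normal((2, len(inv)))

# Theorem 1: the log-determinant objective is unchanged, so the offset is zero
lhs = np.linalg.slogdet(Lam + U.T @ U)[1]
rhs = np.linalg.slogdet(Lam_s + U.T @ U)[1]
assert np.isclose(lhs, rhs)
```

Because only the uninvolved block of the factor is altered, the determinant factorization in (97)-(99) goes through, and the objective values, and therefore the selected action, are provably unaffected by the sparsification.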
