Simplified decision making in the belief space using belief sparsification


Authors: Khen Elimelech, Vadim Indelman

Khen Elimelech¹ and Vadim Indelman²

Abstract
In this work, we introduce a new and efficient solution approach for the problem of decision making under uncertainty, which can be formulated as decision making in a belief space, over a possibly high-dimensional state space. Typically, to solve a decision problem, one should identify the optimal action from a set of candidates, according to some objective. We claim that one can often generate and solve an analogous yet simplified decision problem, which can be solved more efficiently. A wise simplification method can lead to the same action selection, or one for which the maximal loss in optimality can be guaranteed. Furthermore, such simplification is separated from the state inference and does not compromise its accuracy, as the selected action would finally be applied on the original state. First, we present the concept for general decision problems and provide a theoretical framework for a coherent formulation of the approach. We then practically apply these ideas to decision problems in the belief space, which can be simplified by considering a sparse approximation of their initial belief. The scalable belief sparsification algorithm we provide is able to yield solutions which are guaranteed to be consistent with the original problem. We demonstrate the benefits of the approach in the solution of a realistic active-SLAM problem and manage to significantly reduce computation time, with no loss in the quality of solution. This work is both fundamental and practical, and holds numerous possible extensions.

Keywords
Decision making under uncertainty, belief space planning, POMDP, sparse systems, sparsification, active SLAM

1 Introduction

1.1 Background
In this era, intelligent autonomous agents and robots can be found all around us.
They are designed for various functions, such as operating in remote domains, e.g., underwater and space; imitating humans and interacting with them; performing repetitive tasks; and ensuring safety of operations. They might be physically noticeable, e.g., personal-use drones, industrial robotic arms, and military vehicles; or less so, with the popularization of the internet of things (IoT), smart homes, and virtual assistants. Still, these agents share the same fundamental goal – to autonomously plan and execute their actions. Yet, the increasing demand for these "smart" systems presents new challenges: integration of robotic agents into everyday life requires them to operate in real time, using inexpensive hardware. In addition, when planning their actions, these agents should account for real-world uncertainty in order to achieve reliable and robust performance. There are multiple possible sources for such uncertainty, including dynamic environments, in which unpredictable events might occur; noisy or limited observations, such as an imprecise GPS signal; and inaccurate delivery of actions. Also, problems such as long-term autonomous navigation and sensor placement over large areas often involve optimization of numerous variables. These settings require reasoning over high-dimensional probabilistic states, known as "beliefs". Appropriately, the corresponding planning problem is known as Belief Space Planning (BSP). The objective in such a problem is to select "safe" actions, which account for the uncertainty of the agent's belief. Other relevant instantiations include active Simultaneous Localization and Mapping (SLAM), active sensing, robotic manipulation, and even cognitive tasks, such as dialogue management.
The BSP problem is often modeled as a Partially Observable Markov Decision Process (POMDP), according to which we shall propagate the belief, and evaluate the development of uncertainty, considering multiple courses of action (Kaelbling et al. 1998). Further, proper uncertainty measures, such as differential entropy, are expensive to calculate for high-dimensional and continuous beliefs. Overall, the computational complexity of the problem can turn exceptionally high, thus making it challenging for online systems, or when having limited processing power.

1.2 Objectives and approach overview
The previous discussion leads us to our main goal – allowing computationally efficient decision making. Note that in this study, we differentiate between planning and decision making. Planning is a broad concept, which takes into consideration many aspects, such as goal setting and balancing, generation of candidate actions, accounting for different planning horizons and future developments, coordination of agents, and so on. After refining these aspects, we eventually arrive at a decision problem: considering an initial state, and a given set of candidate actions (or action sequences), we use an objective function to measure the scalar values attained by applying each action on the initial state; to solve the problem, we shall identify the optimal candidate action, which generates the highest objective value. With this rudimentary viewpoint, we dismiss problem-specific attributes, which allows our formulation to address a wider range of problems. Nonetheless, our work heavily focuses on contributing to decision making in the belief space.

¹ Robotics and Autonomous Systems Program, Technion – Israel Institute of Technology.
² Department of Aerospace Engineering, Technion – Israel Institute of Technology.
Corresponding author: Khen Elimelech, Technion, Haifa 3200003, Israel. Email: khen@technion.ac.il
In these decision problems, the initial state is a belief over a (possibly) high-dimensional state, and the objective function is a belief-based information-theoretic value, measured from the propagated (updated) belief, after applying a candidate action. A traditional solution to the decision problem requires calculation of the objective function for each candidate action. We would like to reduce the cost of the solution by sparing this exhaustive calculation and comparison. Instead, we suggest identifying and solving a simplified decision problem, which leads to the same action selection, or one for which the loss in quality of solution can be bounded. A problem may be simplified by adapting each of its components – initial state, objective function, and candidate actions. To allow such analysis, we first provide a general theoretical framework, which does not depend on any problem-specific attributes; the framework allows us to formally quantify the effect of the simplification on the action selection, and to form optimality guarantees for it. We then show how these ideas can be practically applied to high-dimensional BSP problems. In this case, the problem is simplified by considering a sparse approximation of the initial belief, which can be efficiently propagated, in order to calculate the candidates' objective values. The resulting simplified problem can be solved in any desired manner, making our approach complementary to other solvers. Furthermore, while several works already utilize belief sparsification to allow long-term operation and tractable state inference, the novelty in our approach is the exploitation of sparsification exclusively and dedicatedly for efficient decision making. After solving the decision problem, the selected action is then applied on the original belief; by such, we do not compromise the accuracy of the estimated state.
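To make this workflow concrete, here is a toy sketch for a Gaussian belief in information form: candidate actions are ranked using a sparsified information matrix, while the selected action is finally applied on the original, dense one. This is not the paper's sparsification algorithm – the thresholding rule, the rank-1 "actions", and all names are illustrative assumptions.

```python
import numpy as np

def entropy_objective(Lambda):
    # For a Gaussian, differential entropy decreases as log det of the
    # information matrix grows, so we rank actions by log det.
    return np.linalg.slogdet(Lambda)[1]

def sparsify(Lambda, thresh=0.2):
    # Toy sparsification (NOT the paper's algorithm): zero weak
    # off-diagonal correlations of the information matrix.
    L = Lambda.copy()
    weak = np.abs(L) < thresh
    np.fill_diagonal(weak, False)
    L[weak] = 0.0
    return L

rng = np.random.default_rng(0)
n = 6
G = rng.normal(scale=0.1, size=(n, n))
Lambda = np.eye(n) + G @ G.T                  # original (dense) prior belief
candidates = {a: np.outer(e, e) for a, e in
              enumerate(np.eye(n))}           # each action observes one variable

Lambda_s = sparsify(Lambda)
# Select the action using the cheap, sparse belief...
a_star = max(candidates,
             key=lambda a: entropy_objective(Lambda_s + candidates[a]))
# ...but apply it on the ORIGINAL belief, so estimation accuracy is untouched.
posterior = Lambda + candidates[a_star]
```

Note that the sparsified matrix is only ever used to compare candidates; the posterior that the agent keeps is computed from the original belief.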
For clarity, we list the contributions of this work, in the order they are presented in the manuscript:

1. A theoretical framework supporting the concept of decision problem simplification;
2. Formulation of decision making in the belief space, and application of the concept to it;
3. A scalable belief sparsification algorithm;
4. Derivation of quality-of-solution guarantees;
5. Experimental demonstration in a highly realistic active-SLAM scenario, where a significant improvement in run-time is achieved.

Please note that this paper extends our previous publications (Elimelech and Indelman 2017a,b,c). Besides the expanded experimental evaluation, the belief sparsification algorithm, which was previously introduced, is now reformed into a more stable and efficient version. Also, the theoretical formulation includes several revisions and corrections to previously introduced definitions; the conclusive versions are those presented here. Finally, to allow fluid reading, proofs for all theorems, lemmas, and corollaries are given in the appendix.

1.3 Related work
Several works explore ideas similar to the ones presented here. In this section we do our best to provide an extensive review of such works, in comparison to ours. As mentioned, numerous methods consider sparsification for the probabilistic state inference problem, in order to limit the belief size and improve its tractability for long-term operation. Although this is a well-researched concept, these methods do not examine sparsification in the context of planning problems (influence over action selection, computational benefits, etc.). Thrun et al. (2004), for example, showed that in a SLAM scenario, when using the information filter, forcing a certain sparsity pattern on the belief's information matrix can lead to improved efficiency in belief update.
However, they emphasized that the approximation quality was not guaranteed and that certain scenarios could lead to significant divergence. Also, since Dellaert and Kaess (2006) demonstrated the equivalence between sparse matrices and (factor) graphs for belief representation, graph-based solutions for SLAM problems (which are often sparse) have become more popular. Accordingly, methods for graph sparsification have also gained relevance. For example, Huang et al. (2012) introduced a graph sparsification method, using node marginalization. The resulting graph is notably consistent, meaning the sparsified representation is not more confident than the original one. Several other approaches suggest sparsifying the graph using the Chow-Liu tree approximation, and show that the KL-divergence from the original graph remains low (Carlevaris-Bianco et al. 2014; Carlevaris-Bianco and Eustice 2014; Kretzschmar and Stachniss 2012). Hsiung et al. (2018) reach similar conclusions for fixed-lag Markov blankets. Notably, our sparsification method, which is presented both in matrix and graph forms, preserves the dimensionality of the belief, and only modifies the correlations between the variables. It is also guaranteed to exactly preserve the entropy of the belief. The approach described by Mu et al. (2017) separated the sparsification into two stages: problem-specific removal of nodes, and problem-agnostic removal of correlations. The authors then demonstrated the superiority of their scheme over agnostic graph optimization, in terms of collision percentage. This two-stage solution resembles the logic of our sparsification method: first, identifying variables with minimal contribution to the decision problem, and then sparsifying the corresponding elements. Of course, we use such sparsification for planning and not graph optimization. Exploiting sparsity to improve efficiency can also be done in other manners.
Fundamental works (e.g., Davis et al. 2004), alongside newer ones (e.g., Frey et al. 2017; Agarwal and Olson 2012), provide heuristics for variable elimination order or variable pruning order, in order to minimize fill-in during factorization of the information matrix (which is utilized during belief propagation).

In the context of planning under uncertainty and POMDPs, the research community has been extensively investigating solution methods to provide better scalability for real-world problems. Finding optimal solutions (policies) according to the POMDP formulation is often done by utilizing dynamic programming algorithms, such as value and policy iteration (e.g., Porta et al. 2006; Pineau et al. 2006). Such methods are extremely computationally demanding, especially when considering high-dimensional state spaces (i.e., search spaces). These methods are thus generally not suitable for "online" planning problems for autonomous agents, in which we want to infer a specific sequence of actions to be executed immediately. Instead, when considering "online" scenarios, we typically perform a forward search from the current belief, and are often forced to rely on approximated solutions. Standard online POMDP solvers (e.g., Silver and Veness 2010; Ye et al. 2017) often perform search in the state space, and not the belief space, as we care to do here. Works which do consider planning in the belief space typically focus on methods for alleviating the search. For example, some solution methods perform direct (localized) trajectory optimization (e.g., Indelman et al. 2015; Van Den Berg et al. 2012). Otherwise, while building on established motion planners (e.g., Karaman and Frazzoli 2011; Kavraki et al. 1996), works such as the Belief Roadmap (by Prentice and Roy 2009), FIRM (by Agha-Mohammadi et al. 2014), SLAP (by Agha-mohammadi et al. 2018), and others (e.g., by Patil et al.
2014) rely on sub-sampling a finite graph in the belief space, in which the solution can be searched. However, such methods are severely limited, by only allowing propagation of the belief over a single (most-recent) pose through the graph; i.e., they perform low-dimensional pose filtering, rather than high-dimensional belief smoothing, as we do. This forced marginalization of state variables surely compromises the accuracy of the estimation, and limits the applicability to (problems such as) active-SLAM, in which we often wish to examine the information (uncertainty) of the entire posterior state, including the map and/or executed trajectory (Stachniss et al. 2004; Kim and Eustice 2014). Nonetheless, we do not focus on the generation (or sampling) of candidates, but, instead, on efficient comparison of their objective values, by lowering the cost of belief updates. Hence, our approach is complementary to the aforementioned graph-based methods, which focus on generating feasible candidates. We demonstrated this compatibility in our experimental evaluation, where we used a graph-based motion planner (from the most recent pose) to simply generate a set of candidate actions; we then efficiently selected the optimal candidate by propagating the sparsified (high-dimensional) belief, and evaluating its posterior uncertainty. In that regard, we may mention additional works which similarly address the issue of high-dimensional belief propagation, in the context of active-SLAM (e.g., Chaves and Eustice 2016; Kopitkov and Indelman 2017). Also, closely related to our approach, several other works examine approximation of the state or the objective function in order to reduce the planning complexity. A recent approach (Bopardikar et al. 2016) suggested using a bound over the maximal eigenvalue of the covariance matrix as a cost function for planning, in an autonomous navigation scenario.
Benefits of using this cost function include easy computation, holding an optimal substructure property (incremental search), and the ability to account for misdetection of measurements. Yet, the actual quality of results in terms of final uncertainty, when measured with conventional methods, is unclear. Their usage of bounds in an attempt to improve planning efficiency resembles aspects of our work; however, we use bounds to quantify the quality of solution. As they mention in their discussion, an unanswered question is the difference in quality of solution between planning using the exact maximal eigenvalue, and planning using its bound. Our theoretical framework might be able to provide an answer to this question. Boyen and Koller (1998) suggested maintaining an approximation of the belief for efficient state inference. This approximation is done by dividing the state variables into a set number of classes, and then using a product of marginals, while treating each class of variables as a single "meta variable". A k-class belief simplification cuts the original exponential inference complexity by a factor of k. The study showed that in rapidly-mixing POMDPs the expectation of the error could be bounded. This simplification method was later examined under a restrictive planning scenario (McAllester and Singh 1999). There, the planning was performed using a planning-tree search, in which a constant number of possible observations was sampled for each tree level, and again assuming a rapidly-mixing POMDP; the error induced by planning in the approximated belief space can be bounded as well. This method shares similar objectives with our work, but examines a very specific scenario, which limits its generality. In the approach described by Roy et al. (2005), the authors attempted to find approximate POMDP solutions by utilizing belief compression, which was done with a PCA-based algorithm.
This key idea is similar to ours; yet, in that work, the objective value calculation (i.e., decision making) still relied on the original decompressed belief, instead of the simplified one. Thus, no apparent computational improvement was achieved in planning complexity. The paper also did not make a comparison of this nature, and only presented an analysis of the quality of compression. The work presented by Indelman (2015, 2016) contained the first explicit attempt to use belief sparsification to specifically achieve efficient planning. The papers showed that using a diagonal covariance approximation, a similar action selection could usually be maintained, while significantly reducing the complexity of the objective calculation. This claim, however, is most often not guaranteed. Optimal action selection was only proved under severely simplifying assumptions – when candidate actions and observations only update a single state variable, with a rank-1 update of the information. This attempt inspired our extensive research and in-depth, formal analysis. Finally, it is worth mentioning that the idea of examining only the order of candidate actions, instead of their cardinal objective values, sometimes appears in the context of economics under the term ordinal utility (e.g., Manski 1988); this term, however, is not prominent in the context of artificial intelligence. We examine a similar idea in our theoretical framework, to follow.

2 Simplified decision making
To begin with, let us consider a decision problem P, which we formally define in Definition 1.

Definition 1. A decision problem P is a 3-tuple (ξ, A, V), where ξ is the initial state, from which we examine a set of candidate actions A (finite or infinite), using an objective function V : {ξ} × A → ℝ. Solving the problem means selecting the optimal action a*, such that

a* = argmax_{a ∈ A} V(ξ, a).
(1)

According to our suggested solution approach, we wish to generate and solve a simplified yet analogous decision problem P_s := (ξ_s, A_s, V_s), which results in the same (or similar) action selection, but for which the solution is more computationally efficient. This can be achieved by altering or approximating any of the problem components – initial state, candidate actions, or objective function – in order to alleviate the calculation of the candidates' objective values. Nonetheless, approximating each of these components represents a different simplification approach. For example, there is a logical difference between simplifying the initial state (i.e., examining different states under the same objective function), and simplifying the objective function (i.e., examining the same state under different objectives); in the first case, we would like to maintain a certain relation between states, and in the second one, a relation between functions. Next, we will introduce additional ideas to help formalize our goal, and see how these can guide us towards designing effective simplification methods, which are guaranteed to preserve the quality of solution.

2.1 Analyzing simplifications

2.1.1 Simplification loss
Examining a simplified decision problem may lead to a loss in the quality of solution, when the selected action is not the real optimal action. We can express this loss with the following simplification quality measure:

Definition 2. The simplification loss between a decision problem P := (ξ, A, V) and its simplified version P_s := (ξ_s, A_s, V_s), due to sub-optimal action selection, is

loss(P, P_s) := V(ξ, a*) − V(ξ, a*_s),

where

a* = argmax_{a ∈ A} V(ξ, a),   a*_s = argmax_{a_s ∈ A_s} V_s(ξ_s, a_s).
(2)

To put it in words, this loss is the difference between the maximal objective value, attained by applying the optimal candidate action a* on ξ, and the value attained by applying a*_s (the action returned from the solution of the simplified problem) on ξ. This idea is illustrated in Fig. 1a. We implicitly assume that the original objective function V can accept actions from the simplified set of candidates A_s. When the solutions to the problems agree, loss(P, P_s) = 0. Most often, it is indeed possible to settle for a simplified decision problem formulation (which can lead to a sub-optimal action), in order to reduce the complexity of action selection; though, it is important to quantify and bound the potential loss, before applying the selected action, in order to guarantee that this solution can be relied on.

Figure 1. P_s is a simplified version of a decision problem P; the graphs show the objective values of each problem's candidate actions. (a) a*_s is the optimal action according to the simplified problem, and a* is the real optimal action; the difference between the (real) objective values of these two actions is the loss induced by the simplification. (b) The offset measures the maximal difference between respective objective values from the two problems, and does not require explicitly identifying a*/a*_s.

2.1.2 Simplification offset
To assess the simplification loss, we suggest identifying the simplification offset, which acts as an intuitive "distance" measure in the space of decision problems:

Definition 3. The simplification offset of a candidate a ∈ A, between a decision problem P := (ξ, A, V) and its simplified version P_s := (ξ_s, A, V_s), is

δ(P, P_s, a) := |V(ξ, a) − V_s(ξ_s, a)|.
(3)

Overall, the simplification offset between P and P_s is

∆(P, P_s) := max_{a ∈ A} δ(P, P_s, a).   (4)

Unlike the loss, the offset (which is illustrated in Fig. 1b) measures the maximal difference between respective objective values from the two problems, and does not require explicitly identifying the optimal actions. Further, for each candidate a ∈ A, the offset defines an interval around the respective approximated value V_s(ξ_s, a), in which the real value V(ξ, a) must lie, i.e.:

V_s(ξ_s, a) − δ(a) ≤ V(ξ, a) ≤ V_s(ξ_s, a) + δ(a).   (5)

Notably, the offset represents only the size of this interval, and not its location on the value axis (around V_s(ξ_s, a)). This means that the offset, in contrast to the loss, is a property of the simplification method, and does not depend on the solution of P nor P_s. It can thus potentially be examined without explicitly solving either of the problems, nor calculating V or V_s, as we shall see. Note that when defining the offset, we implicitly considered that the two problems examine the same set of candidate actions; this will be valid from now on, unless stated otherwise. Also, for brevity, we will no longer write the initial state as input to V/V_s, nor V, V_s as input to δ/∆, whenever the context is clear. Next, we will explain how we can utilize the offset to infer loss guarantees.

2.2 Optimality guarantees

2.2.1 Bounding the offset
Obviously, knowing the offset exactly for every action would be equivalent to having access to the original solution. We would thus usually rely on a bound of the offset to infer loss guarantees. As mentioned, the offset measures the difference between respective objective values from the original and simplified problems, and is independent of their solutions.
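Definitions 1-3 translate directly into code. The following is a minimal sketch with hypothetical names, assuming a finite candidate set and real-valued objective functions:

```python
def solve(xi, A, V):
    """Definition 1: pick the candidate maximizing V(xi, a)."""
    return max(A, key=lambda a: V(xi, a))

def loss(P, Ps):
    """Definition 2: optimality loss from trusting the simplified problem."""
    (xi, A, V), (xi_s, A_s, V_s) = P, Ps
    a_star = solve(xi, A, V)          # true optimum
    a_star_s = solve(xi_s, A_s, V_s)  # optimum of the simplified problem
    return V(xi, a_star) - V(xi, a_star_s)

def offset(P, Ps):
    """Definition 3 / Eq. (4): max |V(xi, a) - V_s(xi_s, a)| over candidates."""
    (xi, A, V), (xi_s, _, V_s) = P, Ps
    return max(abs(V(xi, a) - V_s(xi_s, a)) for a in A)

# Toy example: exact vs. perturbed objective over 4 candidate actions.
A = [0, 1, 2, 3]
V = lambda xi, a: [1.0, 4.0, 3.5, 2.0][a]
V_s = lambda xi, a: [1.2, 3.4, 3.6, 2.1][a]   # simplified (approximate) values
P, Ps = (None, A, V), (None, A, V_s)

print(loss(P, Ps))    # V picks a=1, V_s picks a=2, so loss = 4.0 - 3.5
print(offset(P, Ps))  # largest per-candidate gap, |4.0 - 3.4|
```

On this instance the loss (0.5) indeed stays within twice the offset (0.6), in agreement with the bound of Lemma 1, to follow.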
Thus, we can evaluate and attempt to bound the offset before solving the problem; by utilizing the general structure of problems in our domain, and knowing how they are affected by the simplification method in question, we can try to infer a symbolic formula for the offset, and draw conclusions from it. This type of analysis often allows us to draw general conclusions regarding the simplification method, rather than a specific problem. For example, in Section 3.2, we discuss a novel belief simplification method, used to reduce the cost of planning in the "belief space". By symbolically analyzing the offset (for any decision problem in this domain), we could identify the conditions under which its value is zero, and the simplification is guaranteed to induce no loss. This idea is later demonstrated in Section 3.3.1. Still, we note that providing completely general guarantees, which are valid for all the decision problems in the domain, is not always possible from pure symbolic analysis. Sometimes, to draw decisive conclusions, we must assign the properties of the specific decision problem we wish to solve. If we failed to reach valuable conclusions from such "pre-solution" symbolic analysis of the offset, we can try to bound it "post-solution", by utilizing the calculated (simplified) values, and (any) known bounds, or limits, for the real objective values; these limits should be selected based on domain knowledge of the specific problem. Then, the following can be easily derived from the definition of the simplification offset:

δ(a) ≤ max{ V_s(ξ_s, a) − LB{V(ξ, a)},  UB{V(ξ, a)} − V_s(ξ_s, a) },   (6)

where LB, UB stand for lower and upper bounds, respectively. We demonstrate how to practically utilize this idea in Section 3.3.2.

2.2.2 Bounding the loss
As discussed, our goal is to guarantee that relying on a certain simplification would not induce more than the acceptable loss.
As with the offset, bounding the loss can be done on two occasions: (i) pre-solution analysis – this type of analysis occurs before solving the simplified problem (based on the availability of "symbolic" offset bounds); and (ii) post-solution analysis – which occurs after solving the simplified problem (but before applying the selected action). Surely, we would prefer to know if the simplified solution is worthwhile before investing in it; for example, we may consider the case where action execution is costly (as measured with the objective function), and beyond a certain loss, improving the decision making efficiency is not worth the execution of a sub-optimal action. Nonetheless, post-solution guarantees are typically tighter, as we can also rely on the calculated values. The notion of offset allows us to seamlessly derive both types of guarantees, and easily improve them when refining the solution, or given access to new information. From the properties of the absolute value, it is also easy to infer that the offset is a valid metric (a distance measure) between decision problems. Indeed, Lemma 1 intuitively indicates that when the offset between a problem and its simplification is small, then the induced loss is also small, and the action selection stays "similar".

Lemma 1. For any two decision problems P and P_s,

0 ≤ loss(P, P_s) ≤ 2 · ∆(P, P_s).   (7)

This conclusion is potentially reachable in pre-solution analysis, as it does not rely on the simplified solution, i.e., the calculated objective values; when these become available, in post-solution analysis, this bound can be refined, as indicated in Lemma 2.

Lemma 2. For any two decision problems P and P_s,

loss(P, P_s) ≤ max{ 0,  2 · ∆(P, P_s) + max_{a ≠ a*_s} V_s(a) − V_s(a*_s) }.
(8)

For an extended discussion regarding the derivation of loss guarantees, including a proof of Lemma 2 and more intricate loss bounding techniques, please refer to Elimelech (2021). Specifically, when we do not have access to a symbolic formula for the offset, and instead rely on the "post-solution offset bound" (6), the expression in (8) simplifies to:

loss(P, P_s) ≤ max_{a ≠ a*_s} UB{V(a)} − LB{V(a*_s)}.   (9)

Notably, such post-solution analysis allows us to understand not only what the maximal possible loss is, but also which candidates are likely to cause it.

2.3 Reducing simplification bias
Previously, we suggested the simplification offset as a "distance measure" between decision problems, and recognized that it (or its bound) can be used to bound the simplification loss. However, this distance measure may be deceiving, as the problems may appear to be separated by a large offset, even when the simplification induces a small loss. Specifically, this can be the case when the simplification causes a large "bias" in the simplified objective values. In the following section, we introduce another concept to help us handle such scenarios.

2.3.1 Action consistency
We point out a key observation: to solve the decision problem, we only need to sort (or rank) the candidate actions in terms of their objective function value; changing the values themselves, without changing the order of actions, does not change the action selection. Hence, when two problems maintain the same order of candidate actions, their solution is equivalent. In this case, we can simply say that the two problems are action consistent, as demonstrated in Fig. 2a.

Definition 4. Two decision problems, P_1 := (ξ_1, A, V_1) and P_2 := (ξ_2, A, V_2), are action consistent, marked P_1 ≃ P_2, if the following applies ∀ a_i, a_j ∈ A:

V_1(ξ_1, a_i) < V_1(ξ_1, a_j) ⟺ V_2(ξ_2, a_i) < V_2(ξ_2, a_j).
(10)

If also V_1 ≡ V_2, we can simply say that ξ_1, ξ_2 are action consistent, and mark ξ_1 ≃ ξ_2. This relation holds several interesting properties.

Lemma 3. Action consistency (≃) is an equivalence relation; i.e., any three decision problems P_1, P_2, P_3 satisfy the following properties:
1. Reflexivity: P_1 ≃ P_1.
2. Symmetry: P_1 ≃ P_2 ⟺ P_2 ≃ P_1.
3. Transitivity: P_1 ≃ P_2 ∧ P_2 ≃ P_3 ⟹ P_1 ≃ P_3.

Lemma 3 implies that the entire space of decision problems is divided into separate equivalence classes of action consistent problems. Lemma 4 adds that we can transfer between action consistent problems using monotonically increasing functions. We remind again that all proofs are given in Appendix B.

Lemma 4. For any two decision problems P_1 and P_2,

P_1 ≃ P_2 ⟺ the mapping f : V_1(ξ_1, a) ↦ V_2(ξ_2, a) is monotonically increasing.   (11)

Meaning, if the (scalar) mapping of respective objective values between the two problems agrees with a monotonically increasing function (e.g., a constant shift, a linear transform, or a logarithmic function), then the problems are action consistent. If this mapping is not monotonically increasing, then the problems are not action consistent.

2.3.2 Unbiased simplification offset
The notion of action consistency can help us achieve better guarantees when utilizing our previously developed analysis approach. We now understand that, when deriving loss bounds, instead of

Figure 2. (a) Each graph represents the objective values of the candidate actions of a certain decision problem; although the values are different, all the graphs maintain the same trend among the actions, and therefore the problems are action consistent.
(b) The simplification offset ∆ between P and P_s is the maximal difference between the values of respective actions. The offset can be reduced by utilizing a monotonically increasing function f (here we used a constant shift), which leads to a less biased yet action-consistent problem P_s^f.

examining a simplified problem P_s, we can, equivalently, examine any other problem P_s^f that is action consistent with it. Further, such a problem will necessarily be of the form P_s^f ≐ (ξ_s, A, f ∘ V_s), where f is monotonically increasing. Accordingly, instead of examining the simplification offset, as considered thus far, we can examine the unbiased simplification offset:

Definition 5. The unbiased simplification offset between a decision problem P ≐ (ξ, A, V) and its simplified version P_s ≐ (ξ_s, A, V_s) is

∆*(P, P_s) ≐ min { ∆(P, P_s^f) | f : ℝ → ℝ is monotonically increasing ∧ P_s^f ≐ (ξ_s, A, f ∘ V_s) }. (12)

The unbiased offset is the minimal offset between P and any problem action consistent with P_s. A demonstrative example appears in Fig. 2b. Specifically, P ≃ P_s if and only if the unbiased offset is zero:

Lemma 5. For any two decision problems P and P_s,

P ≃ P_s ⟺ ∆*(P, P_s) = 0. (13)

Thankfully, our previous conclusions still hold, and we can use the unbiased simplification offset to bound the loss:

Lemma 6. For any two decision problems P and P_s,

0 ≤ loss(P, P_s) ≤ 2 · ∆*(P, P_s). (14)

Since ∆*(P, P_s) ≤ ∆(P, P_s^f) for any monotonically increasing f, we can symbolically develop ∆(P, P_s^f) for any such f that is convenient, in order to bound the loss; such a function should help "counter" the effect of the simplification on the objective values. We may also recognize that the unbiased offset satisfies the triangle inequality (like the standard offset):

Lemma 7.
For any three decision problems P_1, P_2, and P_3, the unbiased simplification offset satisfies the triangle inequality, i.e.,

∆*(P_1, P_2) + ∆*(P_2, P_3) ≥ ∆*(P_1, P_3). (15)

This property can potentially help in bounding the loss when applying multiple simplifications. However, unlike the standard offset, the unbiased offset is scaled according to the original objective values (like the loss), and is asymmetric in its input arguments. It is, therefore, not considered a metric*.

We may also note that the notions of action consistency and simplification offset are related to the concept of "rank correlation" – a scalar statistic which measures the correlation between two ranking vectors (see Kendall 1948). Yet, such ordinal vectors are oblivious to the cardinal objective values and, therefore, cannot be used to bound the simplification loss. The rank correlation coefficient mostly serves for statistical analysis, as its calculation requires perfect knowledge of the ranking vectors. Since the rank variables are not independent of each other, a change or addition of a single vector entry may subsequently lead to a change in all other entries, and require complete recalculation of the correlation coefficient. On the other hand, the concepts we introduced rely on a "local relation" between the problems: to check for action consistency, we only examine pairs of actions at a time; and to evaluate the offset – only pairs of respective objective values. Addition of candidates, for example, does not affect these relations between the existing candidates. As we explain next, this locality can be utilized to derive offset and loss bounds.

3 Decision making in the belief space

In the previous section, we examined the concept of decision problem simplification. We now wish to practically apply this idea to allow efficient decision making under uncertainty, which we formulate as decision making in the belief space.
In this domain, the initial state of the decision problem is actually a probability distribution ("belief"), and, as to be explained, the problem is simplified by considering a sparse approximation of it. We provide an appropriate sparsification algorithm, and then show that the induced loss can be bounded. First of all, we define the problem.

3.1 Problem definition

3.1.1 Belief propagation

We consider a sequential probabilistic process. At time-step k, an agent transitions from pose x_{k−1} to pose x_k, using a control u_k. It then receives an observation of the world z_k, based on its updated state. The agent's state vector X_k ≐ (x_0^T, ..., x_k^T, L_k^T)^T consists of the series of poses, and may also include external variables, which are introduced by the observations; for example, in a full-SLAM scenario, L_k can stand for the positions of maintained landmarks. Pose transition and observation are both probabilistic operations, which induce probabilistic constraints over the state variables, known as factors. Here, we assume the transition and observation models are described with the following dependencies:

x_k = g_k(x_{k−1}, u_k) + w_k,  w_k ∼ N(0, W_k), (16)
z_k = h_k(X_k) + v_k,  v_k ∼ N(0, V_k), (17)

where W_k, V_k are the covariance matrices of the respective normally-distributed (Gaussian) zero-mean noise models w_k, v_k, and g_k, h_k are deterministic functions. At each time-step, the agent maintains the posterior distribution over its current state vector X_k, given the controls and observations taken until that time; this distribution, which is defined by the product of these factors, is also known as its belief:

b_k ≐ P(X_k | u_{1:k}, z_{1:k}) ∝ ∏_{i=1}^{k} f_{u_i} f_{z_i}, (18)

where u_{1:k} ≐ {u_1, ..., u_k} and z_{1:k} ≐ {z_1, ..., z_k}, and f_{u_i}, f_{z_i} are the factors matching the respective controls and observations.
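As a concrete (and purely hypothetical) instance of the models in (16)–(17), one may picture a planar robot whose pose is its 2D position, with an additive odometry transition and a relative-position landmark observation; the following numpy sketch samples the zero-mean Gaussian noise terms accordingly. All dimensions and model choices here are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative planar models: pose x in R^2, landmark l in R^2.
W = 0.01 * np.eye(2)   # transition noise covariance W_k
V = 0.04 * np.eye(2)   # observation noise covariance V_k

def g(x_prev, u):
    # Transition model (16): a simple odometry step.
    return x_prev + u

def h(x, l):
    # Observation model (17): relative position of the landmark.
    return l - x

x0 = np.zeros(2)
u1 = np.array([1.0, 0.0])
l1 = np.array([2.0, 1.0])

# One propagation step, with sampled zero-mean Gaussian noise.
w = rng.multivariate_normal(np.zeros(2), W)
v = rng.multivariate_normal(np.zeros(2), V)
x1 = g(x0, u1) + w          # noisy pose transition
z1 = h(x1, l1) + v          # noisy observation of l1
```

Each executed control and received observation contributes one factor (f_{u_i} or f_{z_i}) to the product in (18).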
As widely considered, by utilizing local model linearization, we may conclude that, given the previously-defined models, the belief b_k is also normally-distributed (for the full derivation see Elimelech (2021)). Hence, to describe it, we can use a covariance matrix Σ_k, or equivalently, its inverse, the (Fisher) information matrix Λ_k:

b_k = N(X*_k, Σ_k) ≡ N(X*_k, Λ_k^{−1}). (19)

The matrices are symmetric, and the order of their rows and columns matches the specific order of variables in the state. We may now reason about a posterior belief b_{k+1}, after performing a control u_{k+1} and taking an observation z_{k+1}:

b_{k+1} ≐ P(X_{k+1} | u_{1:k+1}, z_{1:k+1}) ∝ b_k · P(x_{k+1} | x_k, u_{k+1}) · P(z_{k+1} | X_{k+1}). (20)

This belief remains normally-distributed and can be described with the following information matrix:

Λ_{k+1} = Λ̆_k + G_{k+1}^T W_{k+1}^{−1} G_{k+1} + H_{k+1}^T V_{k+1}^{−1} H_{k+1}, (21)

where the matrices G_{k+1} and H_{k+1} are the Jacobians ∇g_{k+1}|_{X_{k+1}} and ∇h_{k+1}|_{X_{k+1}}, respectively, around some initial estimate, and Λ̆_k is the augmented prior information matrix. Since controls and observations may introduce new variables to the state vector, its size at time-step k often does not match its size at time-step k+1. Hence, the prior information matrix Λ_k should be augmented to accommodate these new variables. We use the accent ˘ to indicate augmentation of the prior information matrix (with entries of zero) to match the posterior size. Adding new variables is possible at any index in the state, as long as we make sure the augmentation keeps the same variable order.

* Still, the aforementioned properties, along with the obvious non-negativity, make the unbiased offset a quasi-metric (or asymmetric metric), which induces an appropriate topology on the space of decision problems, as explained by Künzi (2001).
If the prior state is of size n, and we add m new variables to the end of it, then

Λ̆_k ≐ [ Λ_k^{n×n}  0^{n×m} ; 0^{m×n}  0^{m×m} ]. (22)

The expression in (21) can be written in a more compact form, by marking the collective Jacobian J_{k+1}^δ, which encapsulates the new information regarding the control and the succeeding observation:

Λ_{k+1} = Λ̆_k + J_{k+1}^{δ T} J_{k+1}^δ,  where  J_{k+1}^δ = [ W_{k+1}^{−1/2} G_{k+1} ; V_{k+1}^{−1/2} H_{k+1} ]. (23)

Each belief update can be described using a collective Jacobian of this form. Thanks to the additivity of the information, we can easily examine the information matrix of the posterior belief b_{k+T} after applying a sequence of T controls u ≐ u_{k+1:k+T}; the respective collective Jacobians of each control can simply be stacked to yield the collective Jacobian U of the entire sequence u:

Λ_{k+T} = Λ̆_k + ∑_{t=1}^{T} J_{k+t}^{δ T} J_{k+t}^δ ≐ Λ̆_k + U^T U,  where  U ≐ [ J_{k+1}^δ ; ... ; J_{k+T}^δ ]. (24)

3.1.2 Decision making

At time-step k, the agent performs a planning session. According to its current (prior) belief b_k, it wishes to select the control sequence which minimizes the expected uncertainty in the future (posterior) belief. To measure the uncertainty we use the differential entropy, which, for a normally-distributed belief b of state size n, with an information matrix Λ, is

H(b) = (1/2) · ln((2πe)^n / |Λ|) = −(1/2) · (ln|Λ| − n · ln(2πe)), (25)

where |·| represents the determinant operation. Although other uncertainty measures with a lower computational cost exist, e.g., the trace of the covariance matrix, the entropy bests those by taking inter-variable correlations into account; those can have a dramatic effect on the measured uncertainty, and are crucial for correct analysis.
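To make the update rule concrete, the sketch below (with arbitrary, illustrative dimensions and a synthetic prior) zero-augments a prior information matrix for m new variables as in (22), stacks two per-step collective Jacobians into U, and forms the posterior information Λ_{k+T} = Λ̆_k + UᵀU as in (24); the final assertion checks the additivity property the text relies on.

```python
import numpy as np

rng = np.random.default_rng(1)

n, m = 6, 2                        # prior state size, newly added variables
A = rng.standard_normal((n, n))
Lam = A @ A.T + n * np.eye(n)      # a synthetic, positive-definite prior information matrix

# Eq. (22): zero-augment the prior to the posterior size n + m.
Lam_aug = np.zeros((n + m, n + m))
Lam_aug[:n, :n] = Lam

# Collective Jacobians of two steps (rows are noise-whitened constraints).
J1 = rng.standard_normal((3, n + m))
J2 = rng.standard_normal((3, n + m))
U = np.vstack([J1, J2])            # Eq. (24): stack the per-step Jacobians

Lam_post = Lam_aug + U.T @ U       # posterior information matrix

# Additivity of the information: per-step contributions sum to the same result.
assert np.allclose(Lam_post, Lam_aug + J1.T @ J1 + J2.T @ J2)
```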
Thus, while utilizing the information update rule from (24), we define the following information-theoretic value, or objective function, which measures the expected information gain between the current and final beliefs:

Ṽ(b_k, u) ≐ E_Z [H(b_k) − H(b_{k+T})], (26)

where u is a candidate control sequence, and Z is the set of observations taken while performing this sequence. We may also take the common assumption of achieving the most likely observations, around the current mean (the "maximum likelihood" assumption, as examined by Platt et al. (2010)), which would allow us to drop the expectation from this expression. We will also drop the augmentation mark and time index from now on, for the sake of concise writing. Overall, from an initial belief b, and considering a given set of candidate control sequences U, we are interested in solving the decision problem P ≐ (b, U, V), where V is the objective function:

V(b, u) ≐ (1/2) · ( ln|Λ̆ + U^T U| − ln|Λ| − m · ln(2πe) ), (27)

Λ is the information matrix of the prior belief b, U is the collective Jacobian of u, and m is the number of variables added to the state when executing u (the difference between the number of columns in U and in Λ). For clarification, we described the process as sequential to conform to the common POMDP framework; we treat every planning session as a separate decision problem. Further, the "maximum likelihood" assumption is not essential, but is used to achieve a clear discussion, where each candidate control sequence can be described with a single collective Jacobian; for a generalized discussion, where this assumption is relaxed, and where we also allow examination of candidate policies, please see Elimelech (2021). Finally, we can use the information matrix to examine the future beliefs, even if the state inference process is not based on such an information smoother.
If the initial information matrix is not provided, it can be calculated by inverting the covariance matrix.

3.1.3 The square root matrix

An alternative way to represent the belief b_k (and propagate it) is using the upper triangular square root matrix R_k of the information matrix Λ_k, given (e.g.) by calculating the Cholesky factorization:

Λ_k = R_k^T R_k. (28)

Like Λ_k, the order of rows and columns of R_k also matches the order of variables in the state. Prominent state-of-the-art SLAM algorithms, e.g., iSAM2 (Kaess et al. 2012), rely on this representation, as it allows the calculation of the posterior mean (state inference) to be performed incrementally, while exploiting inherent sparsity. Our belief simplification method, as described in the following section, also relies on this representation. Unfortunately, in this form, the information update loses its convenient additivity property, and requires re-calculation (or update) of the factorization, in order to find the posterior square root matrix R_{k+T}, such that

R_{k+T}^T R_{k+T} = Λ_{k+T} = R̆_k^T R̆_k + U^T U, (29)

where U is defined as in (24), and R̆_k marks an appropriate augmentation of the prior root matrix:

R̆_k ≐ [ R_k^{n×n}  0^{n×m} ]. (30)

On the other hand, the determinant of the posterior information can be calculated in linear time – by multiplying the diagonal elements of this triangular matrix. The objective function (27) can thus be re-written as

V(b, u) ≡ (1/2) · ( ∑_{i=1}^{N} ln(R_{ii}^+)^2 − ∑_{i=1}^{n} ln(R_{ii})^2 − m · ln(2πe) ), (31)

where n is the prior state size, N is the posterior state size, R^+ marks the posterior square root matrix, and the subscript ·_{ij} marks the matrix element in the i-th row and j-th column.
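The equivalence of (27) and (31) is easy to verify numerically. The sketch below (using synthetic matrices, not the paper's data) evaluates the objective once via log-determinants of the information matrices, and once via the diagonals of their upper triangular Cholesky factors:

```python
import numpy as np

rng = np.random.default_rng(2)

n, m = 5, 2                        # prior size, added variables
A = rng.standard_normal((n, n))
Lam = A @ A.T + n * np.eye(n)      # prior information matrix (positive definite)
R = np.linalg.cholesky(Lam).T      # upper triangular root, Λ = RᵀR

# Zero-augmented prior and a random collective Jacobian U.
N = n + m
Lam_aug = np.zeros((N, N))
Lam_aug[:n, :n] = Lam
U = rng.standard_normal((N + 1, N))
Lam_post = Lam_aug + U.T @ U       # posterior information matrix
R_post = np.linalg.cholesky(Lam_post).T

c = m * np.log(2 * np.pi * np.e)

# Eq. (27): objective via log-determinants.
V_det = 0.5 * (np.linalg.slogdet(Lam_post)[1] - np.linalg.slogdet(Lam)[1] - c)

# Eq. (31): objective via the diagonals of the triangular roots
# (a linear-time determinant computation).
V_root = 0.5 * (np.sum(np.log(np.diag(R_post) ** 2))
                - np.sum(np.log(np.diag(R) ** 2)) - c)

assert np.isclose(V_det, V_root)
```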
As explained, using this form, the significant computational cost of calculating the objective value moves from the determinant calculation to the information update phase, though this can be performed incrementally.

3.2 Belief sparsification

We now wish to present a simplification method for the decision problem we have just formalized: P ≐ (b, U, V). We choose to keep the same objective function V and set U of candidate actions, and focus on simplifying the initial belief b. As stated, candidate actions here are actually control sequences for the agent; we assume the collective Jacobians for the set of actions are available. As we saw, calculation of the objective function (as defined in (27)) involves calculation of the determinant of the posterior information matrix, after performing an appropriate belief update for the candidate action. The cost of this calculation depends directly on the number of non-zero elements in the matrix, and is significantly lower for sparse matrices. Thanks to the additivity of the information, sparsifying the prior information matrix Λ could potentially lead to a sparser posterior information matrix Λ + U^T U, for every candidate action u with collective Jacobian U; notably, such sparsification of the prior is only calculated once, for any number of actions. We also note that in many problems, especially in navigation problems, the collective Jacobians are inherently sparse, and as the state grows, involve fewer variables in relation to its size. Hence, even after their addition to the sparsified prior information matrix, its sparsity shall be retained. Equivalently, we may seek to sparsify R, the square root of Λ, which is used in (31), in order to improve the efficiency of the factorization update process.
Overall, assuming the initial belief of the decision problem is b = N(X*, Λ^{−1}), our simplified problem shall rely instead on b_s = N(X*, Λ_s^{−1}) as the initial belief, where Λ_s is a sparse approximation of Λ. In the following section, we present a sparsification algorithm† for the information matrix (or its square root matrix). Fig. 3 summarizes the paradigm of belief sparsification for efficient decision making in the belief space; clarification regarding its steps is to follow.

[Figure 3 – flowchart. Inputs: the initial belief b, and the updates corresponding to each candidate action (the "collective Jacobians"). Steps: identify uninvolved variables → select a subset S of state variables to sparsify → find a sparse approximation b_s of the initial belief using Algorithm 1 → pre-solution analysis → calculate the objective values for all candidates using b_s → select the "optimal" candidate → post-solution analysis: derive loss bounds, to guarantee the quality of solution → apply the selected action on the original belief b.]

Figure 3. Belief sparsification for efficient decision making in the belief space. Essential steps are in dark blue; optional steps, in order to provide guarantees, are in light blue. Here, candidate actions represent control sequences for the agent.

3.2.1 The algorithm

Algorithm 1 summarizes our suggested method for belief sparsification. The algorithm may receive as input, and return as output, a belief represented using either the information matrix or its square root. This scalable algorithm depends on a pre-selected subset S of state variables, and wisely removes elements which correspond to these variables from the matrix. Approximations of different degrees can be generated using different variable selections S, as to be explained in Section 3.3.1. For a clear discussion, when S contains all the variables, we say this is a full sparsification; using any other partial selection of variables is a partial sparsification. Fig.
4 contains a visual demonstration of the algorithm steps. In the following section (Section 3.2.2), we provide an extended probabilistic analysis of the algorithm, and explain how it can also be applied to general (non-Gaussian) beliefs; a visual demonstration of such application, where we represent the belief using a generic factor graph, is given in Figure 5. An example of the algorithm output is provided in Figure 6.

Figure 4. The steps of Algorithm 1 (from left to right), for sparsification of a Gaussian belief (shown in Fig. 5a); the state variables are X ≐ [x_1, l_1, l_2, x_2, x_3, l_3]^T (in that order), and the subset of variables selected for sparsification is S = {x_1, l_2, x_2} (in green). (a) The sparsity pattern of the symmetric information matrix of the belief. (b) Reordering the variables, such that all the variables in S appear first; this is done by simply permuting the rows and columns of the matrix. (c) Calculating the upper triangular square root matrix chol(Λ^p) of the permuted information matrix; each row corresponds to a state variable. (d) Removing off-diagonal elements from rows corresponding to variables in S. (e) After the sparsification, we may permute the variables back to their original order directly in the square root matrix, without breaking its upper triangular shape. (f) Reforming the sparsified information matrix Λ_s ≐ R_s^T R_s; note that the process affects the values in the matrix, and may also introduce new non-zeros (marked in purple).

Algorithm 1: Scalable belief sparsification.
Inputs: a belief b = N(X*, Λ^{−1}), such that Λ = R^T R; a subset S of state variables to sparsify
Output: a sparsified belief b_s ≐ N(X*, Λ_s^{−1}), such that Λ_s ≐ R_s^T R_s
// reorder the state variables such that the variables in S are first in the state vector
1  P ← an appropriate (column) permutation matrix
2  if the algorithm input is Λ then
3      Λ^p ← P^T Λ P
4      R^p ← chol(Λ^p)
5  else if the algorithm input is R then
6      R^p ← modify R to convey the appropriate variable reordering (see the remark in the main text)
7  R^p_s ← zero the off-diagonal elements of R^p in rows matching variables in S   // sparsify R^p
8  R_s ← P R^p_s P^T   // return to the original variable order
9  if the algorithm output is Λ then
10     Λ_s ← R_s^T R_s   // reform the information matrix

Let us break down the algorithm steps. First, we should check if the variables are ordered properly, i.e., such that the variables we wish to sparsify (variables in S) appear first in the state. If not, we should reorder the variables accordingly. This requires appropriate modification of the input matrix. If the algorithm input is the symmetric matrix Λ (line 2), we shall simply permute its rows and columns by calculating the product P^T Λ P of the information matrix with an appropriate (column) permutation matrix P. After this permutation, we can derive R^p, the square root matrix of the permuted information matrix, using the Cholesky decomposition (line 4). If the algorithm input is the matrix R (line 5), the task of variable reordering is not trivial, as trying to modify R by permuting its rows and columns would break its triangular shape. Instead, this task (typically) requires re-factorization of Λ under the new variable order.

Remark. In our follow-up work (Elimelech and Indelman 2021), we provide an efficient modification algorithm for R, which is intended for the task of variable reordering, and can spare the matrix re-factorization; we can use this algorithm to efficiently derive R^p (line 6).

† Algorithm 1 is a revised version of the sparsification algorithm that appeared in our previous publication (Elimelech and Indelman 2017c).
If no reordering is required, and the algorithm input is Λ, we may directly calculate the Cholesky decomposition (line 4); if no reordering is required, and the input is R, we may skip directly to line 7. Specifically, when all of S is already at the beginning of the state, no reordering is needed. This situation particularly occurs when sparsifying all the variables (i.e., full sparsification). Next, in line 7, we zero off-diagonal elements in the permuted square root matrix R^p, in rows corresponding to variables in S, to yield the sparsified square root matrix R^p_s. Since the prior belief should be updated according to the predicted hypotheses, the variable order in the sparsified information matrix (or its square root) must match the variable order in the collective Jacobians. Thus, we should reorder the variables back to their original order (line 8). Though, we notice that after the sparsification, this permutation can be performed on the square root matrix directly, without resorting to the information matrix, and without breaking its triangular shape, by calculating P R^p_s P^T (note the reverse multiplication order). This claim is formalized in Corollary 1 (and proved in Appendix B).

Corollary 1. After sparsification of the square root matrix (line 7 of Algorithm 1), permutation of the variables back to their original order can be performed on the square root matrix directly, without breaking its triangular shape.

Finally, we may return the sparsified belief, represented either with R_s or Λ_s. In the latter case, this requires us to (easily) reconstruct the sparsified information matrix from its sparsified root (line 10). After the sparsification, the values of the non-zero (NZ) entries in the sparsified information matrix may differ from the corresponding entries in the original matrix (including the diagonal), and new NZs may be added in compensation for the removed entries (factors).
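A compact numpy rendition of Algorithm 1 may look as follows (our own sketch, not the authors' code): it permutes the variables in S to the front, factorizes, zeroes the off-diagonal entries of the rows matching S, and permutes back directly on the root, as licensed by Corollary 1. The assertions check two properties stated in the text: R_s remains upper triangular, and the determinant (and hence the entropy) is preserved.

```python
import numpy as np

def sparsify_belief(Lam, S):
    """Algorithm 1 (sketch): sparsify the state variables indexed by S.

    Lam -- prior information matrix (symmetric positive definite)
    S   -- list of variable indices to sparsify
    Returns (R_s, Lam_s): the sparsified root and information matrix.
    """
    n = Lam.shape[0]
    order = list(S) + [i for i in range(n) if i not in S]
    P = np.eye(n)[:, order]             # (column) permutation matrix (line 1)
    Lam_p = P.T @ Lam @ P               # variables in S first (line 3)
    R_p = np.linalg.cholesky(Lam_p).T   # upper triangular root (line 4)
    R_p_s = R_p.copy()
    for i in range(len(S)):             # zero off-diagonal entries in rows of S (line 7)
        R_p_s[i, i + 1:] = 0.0
    R_s = P @ R_p_s @ P.T               # back to the original order (line 8)
    Lam_s = R_s.T @ R_s                 # reform the information matrix (line 10)
    return R_s, Lam_s

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))
Lam = A @ A.T + 6 * np.eye(6)           # a synthetic, positive-definite prior

R_s, Lam_s = sparsify_belief(Lam, S=[0, 2, 3])

# Corollary 1: the permuted-back root is still upper triangular.
assert np.allclose(R_s, np.triu(R_s))
# Eq. (42): the determinant (and hence the entropy) is preserved.
assert np.isclose(np.linalg.slogdet(Lam)[1], np.linalg.slogdet(Lam_s)[1])
```

With S covering all the variables (full sparsification), the returned root and information matrix are diagonal, as noted in the text.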
Also, note that the permutation of variables back to their original order can potentially be skipped, by equivalently permuting the columns of all the candidate collective Jacobians, to match the altered order. The derivation of R^p (in line 4 or line 6), when conducted, is the costliest step of the algorithm, which defines its maximal computational complexity; we may recall that the complexity of the Cholesky decomposition is O(n^3), at worst, where n is the state size (Hämmerlin and Hoffmann 2012). In comparison, the computational cost of the remaining steps, i.e., matrix permutation (lines 3 and 8), removal of matrix elements (line 7), and reconstruction of the information matrix (line 10), is usually minor. Still, it should be noted that depending on the configuration, many of the steps are often not necessary. For example, as mentioned, when the input matrix is already in the desired order, the permutations can be skipped; this is specifically correct in full sparsification. In that case, if given the square root matrix as input, the algorithm holds an almost negligible complexity – we only need to extract the matrix's diagonal. Also, in full sparsification, the sparsified information matrix, if required, can be reconstructed from its root in linear complexity, as both R_s and Λ_s are diagonal. Nonetheless, we remind that the approach is meant to reduce the overall decision-making time, as the time spent on performing the sparsification (performed once) is lower than the time saved in performing the (multiple) belief updates. For example, since full sparsification leads to a diagonal approximation (of the information matrix or its root), and considering that the collective Jacobians are sparse, belief updates can be performed with an almost linear complexity.
Also, since the cost of sparsification does not depend on the number of candidates or hypotheses, as this number grows, the relative "investment" in calculating the sparsification becomes less significant.

3.2.2 Probabilistic analysis

Let us analyze the suggested sparsification algorithm from a wider perspective, using probabilistic graphical models. As explained, the belief b (18) is constructed as a product of factors – probabilistic constraints between variables, e.g., those induced by observations or constraints between poses. A belief can be graphically represented with a factor graph – where variable nodes are connected with edges to the factor nodes in which they are involved. In Fig. 5a, we can see an exemplary factor graph, which represents a belief b with six variables and eight factors:

b(X) ∝ f_{x_1} · f_{x_1 l_1} · f_{x_1 l_2} · f_{x_1 x_2} · f_{x_2 l_1} · f_{x_2 l_2} · f_{x_2 x_3} · f_{x_3 l_3}, (32)

where the state X ≐ [x_1, l_1, l_2, x_2, x_3, l_3]^T contains three poses and three landmarks, and f_{ij} is a factor between i and j. As explained, in the linear(ized) Gaussian system, the belief b is described with the information matrix Λ, as shown in Fig. 4a. Off-diagonal non-zero entries in the information matrix Λ indicate the existence of factors between the corresponding variables. The belief b can be factorized into a product of conditional probability distributions, in a process known as "variable elimination" (see Davis 2006):

b ∝ ∏_{i=1}^{n−1} P(X_i | d(X_i)) · P(X_n), (33)

where d(X_i) denotes the set of variables X_i is conditionally dependent on – a subset of the variables which follow X_i according to the variable (elimination) order. Practically, fixing the variable order in the state sets the decomposition of the belief. Thus, according to Algorithm 1, we begin the sparsification process by reordering the state variables, such that all variables in S appear first in the state.
This step requires us to permute the information matrix accordingly (as shown in Fig. 4b); here, we chose S = {x_1, l_2, x_2}. Note that variables can be conditionally dependent even if there is no factor between them. By starting the elimination with the variables in S, we force conditional separation of the variables for sparsification and the remaining variables, i.e.,

b ∝ P(S | ¬S) · P(¬S). (34)

This means that no variable in ¬S is conditionally dependent on a variable in S. The factorization of the belief into a product of conditional probabilities can be graphically represented with a Bayesian network ("Bayes net"), as shown in Fig. 5c. In this directed graph, the existence of an edge from node i to j indicates that i ∈ d(j). As established by Dellaert and Kaess (2006), this factorization is equivalent to the factorization of the (permuted) information matrix Λ^p into its upper triangular square root R^p (Fig. 4c). The conditional probability distribution of the i-th variable corresponds to the respective row of R^p. Off-diagonal entries in that row represent the conditional dependencies: if the off-diagonal entry R^p_{ij} is non-zero, then X_j is in d(X_i), and X_j is a parent of X_i in the Bayes net; specifically, if all elements on the i-th row, besides the diagonal entry, are zero, then X_i is not conditionally dependent on any variable (according to the elimination order), and has no parents in the Bayes net. For more details, see Dellaert and Kaess (2017). According to the next step in the algorithm, we shall now zero off-diagonal entries in R^p, in the rows which correspond to variables in S (Fig. 4d); equivalently, this process can be seen as removing edges from the Bayes net (Fig. 5d).
By removing all the off-diagonal entries from the i-th row, we replace the conditional probability distribution

P(X_i | d(X_i)) = N( μ(d(X_i)), (R^p_{ii}{}^T R^p_{ii})^{−1} ) (35)

with an independent probability distribution over X_i,

P_s(X_i) ≐ N( μ_i, (R^p_{ii}{}^T R^p_{ii})^{−1} ). (36)

Essentially, we fix the mean of X_i to a constant value, which is no longer dependent on other variables. We, of course, would like to preserve the mean of the overall belief, and therefore shall select μ_i = X*_i. It should be mentioned that this probability distribution is not the marginal distribution over X_i, which is given as N(X*_i, Σ_{ii}). The sparsified belief is thus given as the product

b_s ∝ ∏_{x∈S} P_s(x) · P(¬S). (37)

The chosen elimination order makes sure that the inner dependencies among the non-sparsified variables remain exact. Notably, the suggested sparsification is performed by manipulating the square root matrix, which is equivalent to manipulating the Bayes net. In contrast, traditional belief sparsification methods (as we reviewed) perform sparsification on Λ directly, or equivalently, on the factor graph. Still, we would like to understand what the factor decomposition corresponding to the sparsified belief is.

[Figure 5 – panels (from left to right): (a) Factor Graph; (b) (Partial) Variable Elimination; (c) Bayes Net; (d) Bayes Net; (e) Factor Graph.]

Figure 5. Visualizing the steps of Algorithm 1 (from left to right), for sparsification of a belief with probabilistic graphical models. (a) The factor graph of the prior belief b (matching Fig. 4a); the state variables are X ≐ [x_1, l_1, l_2, x_2, x_3, l_3]^T, and the subset of variables selected for sparsification is S = {x_1, l_2, x_2} (circled in green).
(b) Eliminating the variables in the factor graph in order to derive the corresponding Bayes net; the figure describes an intermediate step of the elimination process, after eliminating the variables in S: x_1, l_2, x_2 (in this order); note the added marginal factor (in purple). (c) The final Bayes net of b, after eliminating all the variables. (d) Removing all edges which lead to variables in S (green arrows); this is the Bayes net describing the sparsified belief b_s. (e) Reforming the factor graph of the sparsified belief b_s; the variables in S are now independent, and each is connected to a modified prior factor (in green); the remaining variables are inter-connected with the same factors which connected them originally (in black), alongside the marginal factors, which were added after elimination of S (in purple).

Let us look again at the exemplary belief, given in (32). We begin its factorization (after the initial reordering) by eliminating the variables in S (in order). First, x_1:

b ∝ P(x_1 | x_2, l_1, l_2) · f′_{x_2 l_1 l_2} · f_{x_2 l_1} · f_{x_2 l_2} · f_{x_2 x_3} · f_{x_3 l_3}. (38)

Then, l_2:

b ∝ P(x_1 | x_2, l_1, l_2) · P(l_2 | x_2, l_1) · f′_{x_2 l_1} · f_{x_2 l_1} · f_{x_2 x_3} · f_{x_3 l_3}. (39)

Finally, x_2:

b ∝ P(x_1 | x_2, l_1, l_2) · P(l_2 | x_2, l_1) · P(x_2 | l_1, x_3) · f′_{x_3 l_1} · f_{x_3 l_3}. (40)

This partial elimination is visualized in Fig. 5b. As we can see, after elimination of variables, new "marginal" factors (f′_{x_2 l_1 l_2}, f′_{x_2 l_1}, f′_{x_3 l_1}) may be introduced to the belief, representing new links among the non-eliminated variables; in our case, after eliminating all the sparsified variables, one marginal factor still remains: f′_{x_3 l_1}. According to the previous analysis, in the sparsification, each of the conditional distributions on the sparsified variables is replaced with an independent distribution.
These are, in fact, unary factors over these variables; here, we mark those as f''_{x_1}, f''_{l_2}, f''_{x_2}. The sparsified belief can thus be given as a product of these unary factors over the sparsified variables, the marginal factors introduced after eliminating these variables, and the remaining non-eliminated factors (here, f_{x_3 l_3}). Overall, in our example, this product is:

b^s \propto f''_{x_1} \cdot f''_{l_2} \cdot f''_{x_2} \cdot f'_{x_3 l_1} \cdot f_{x_3 l_3}.    (41)

The factor graph matching this belief is shown in Fig. 5e. It is clear that the sparsification does not affect the elimination of the remaining variables (the variables in \neg S). Continuing the elimination process from either b (40) or b^s (41) would result in the same distribution P(\neg S). To complete the analysis, we shall note that this sparsification method does not change the diagonal entries of the information root matrix, and, thus, the determinants of \Lambda and \Lambda_s remain the same:

|\Lambda| = |\Lambda^p| = \left|{R^p}^T R^p\right| = |R^p|^2 = \prod_{i=1}^{n} (R^p_{ii})^2 = |R^p_s|^2 = \left|{R^p_s}^T R^p_s\right| = |\Lambda^p_s| = |\Lambda_s|.    (42)

Hence, the sparsification method preserves the overall entropy of the belief (as defined in (25)), no matter which variables are sparsified. This is usually not guaranteed in the aforementioned traditional sparsification methods. Still, when incorporating new factors in the future, a divergence in entropy between the original and sparsified beliefs (i.e., a simplification offset) might indeed occur. This offset depends on the variables selected for sparsification, and can even be zero, as we shall discuss next. Since the sparsified variables become independent, if we wish to update our estimation after applying new actions, or after acquiring a new observation of an existing variable (i.e., a loop closure), information would no longer propagate from a sparsified variable to another variable, or vice versa, unless they are observed together.
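The two properties derived above can be checked numerically: fixing \mu_i = X^*_i preserves the overall belief mean, and leaving the diagonal of the root matrix untouched preserves |\Lambda| (eq. (42)). Below is a minimal numpy sketch of the row-sparsification step; the matrix R, right-hand side d, and the sparsified set S are small synthetic stand-ins, not data from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
# Hypothetical upper-triangular square-root information matrix R and rhs d,
# so that the belief mean mu satisfies R @ mu = d.
R = np.triu(rng.normal(size=(n, n)))
R[np.diag_indices(n)] = np.abs(R.diagonal()) + 1.0   # well-conditioned diagonal
d = rng.normal(size=n)
mu = np.linalg.solve(R, d)                           # MAP estimate X*

S = [0, 2, 3]                                        # illustrative sparsified variables
R_s, d_s = R.copy(), d.copy()
for i in S:
    R_s[i, i + 1:] = 0.0                             # drop the dependence of X_i on d(X_i)
    d_s[i] = R_s[i, i] * mu[i]                       # fix the mean of X_i to X*_i

# The overall belief mean is preserved:
assert np.allclose(np.linalg.solve(R_s, d_s), mu)
# The diagonal is untouched, so |Lambda| = |R|^2 is preserved, as in eq. (42):
assert np.allclose(np.linalg.det(R_s.T @ R_s), np.prod(R.diagonal()) ** 2)
```

Back substitution makes the mean-preservation property easy to see: the non-sparsified rows are unchanged, and each sparsified row now resolves directly to \mu_i.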
Notably, though, unlike simply marginalizing the sparsified variables out of the state, as done in filtering, they can still be updated in the future.

Figure 6. A square root matrix (taken from our experimental evaluation) and its sparse approximations generated with Algorithm 1, for different variable selections S. On the left, the original matrix; in the center, the matrix after partial sparsification, of only the uninvolved variables (here, about half of the variables); on the right, the matrix after full sparsification. The matrices on the left and in the center are guaranteed to be action consistent. Full sparsification results in a convenient diagonal approximation of the information. For all degrees of sparsification, the determinant of the matrix remains the same.

3.3 Optimality guarantees

3.3.1 Variable selection and pre-solution guarantees

Next, we shall present the conclusions of our symbolic analysis of the suggested simplification method (as explained in Section 2.2). In this evaluation, we utilized our knowledge of the decision problem formulation, and of Algorithm 1, in order to derive general guarantees for the simplification loss. More specifically, we shall explain which variables should be sparsified, such that the effect on the objective value of each candidate action (i.e., the simplification offset) is minimal. Considering a specific action, a state variable is involved if applying the action adds a constraint (factor) on it; i.e., if g or h, which define the relevant transition and observation models (defined in (16) and (17)), are affected by this variable. Practically, in the collective Jacobian of an action, each column corresponds to a state variable, and every row represents a constraint; a variable is involved if at least one of the entries in its matching column is non-zero; uninvolved variables correspond to columns of zeros.
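This column test is simple to state in code. The following sketch uses made-up collective Jacobians for two hypothetical candidate actions over a six-variable state; the specific entries and variable indices are illustrative only.

```python
import numpy as np

# Hypothetical collective Jacobians for two candidate actions; each row is a
# predicted constraint, each column matches a state variable.
A_right = np.array([[0., 0., 0., 1., 0., 1.],    # e.g., constrains variables 3 and 5
                    [0., 0., 0., 1., 0., 0.]])
A_left  = np.array([[0., 1., 0., 1., 0., 0.],    # e.g., constrains variables 1 and 3
                    [0., 0., 0., 1., 0., 0.]])

stacked = np.vstack([A_right, A_left])
# Uninvolved variables: columns of zeros in the Jacobians of ALL candidates.
uninvolved = np.flatnonzero(~stacked.any(axis=0))
print(uninvolved)                                # -> [0 2 4]
```

Stacking the candidates' Jacobians before the column test directly yields the set that is safe to sparsify for the whole candidate set.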
For example, in a navigation scenario, the landmarks we predict to observe by taking the action (along with the current pose) are involved; variables referring to landmarks from the past, which we do not predict to observe, are uninvolved. An illustration of this example is given in Fig. 7. We emphasize that since this is a planning problem, the collective Jacobians, the objective values, and the involved variables are determined based on our prediction of the outcome of each action. Further, these components can only be based on our current belief, and not on the ground truth, as it is unknown. Thus, although a landmark we identified as uninvolved might still be observed when applying the action (e.g., if the initial belief was distant from the ground truth), this is not a concern in the planning context. As explained, in our formulation, the objective function (27) relies on the "most likely" observation. In other words, we consider only the single "most likely" outcome of each action. Theoretically, we could consider multiple probabilistic outcomes for each action, each determining its own set of involved variables; as mentioned, this generalized discussion is brought by Elimelech (2021). We claim that for any given action, sparsifying the uninvolved variables from the prior belief b, before computing the posterior belief, would not affect the posterior entropy (which defines our objective function V). Hence, for a set of candidate actions U, we can sparsify from the prior belief all variables which are uninvolved in all of the actions, and use this sparsified belief b^s to compute the objective function, without affecting its values. Specifically, this means that the simplification offset is zero, and that this sparsified belief is action consistent with the original one: b \simeq b^s. This claim is formally expressed in Theorem 1. A proof for this claim is given in Appendix B.

Theorem 1. Consider a decision problem P
\doteq (b, U, V), where b is a (Gaussian) initial belief, and V is the objective function from (27). Considering a set S of state variables which are uninvolved in all of the candidates in U, Algorithm 1 returns a belief b^s, such that \Delta(P, P^s) = 0, where P^s \doteq (b^s, U, V).

In principle, only a single sparsification process is conducted for each decision problem (i.e., planning session), regardless of the number of candidate actions. Selecting variables which are uninvolved in all of the candidate actions allows us to maintain action consistency for the entire set of candidates. Still, it is possible to break the set of actions into several subsets of similar actions, and consider the uninvolved variables in each subset. For each subset we would create a custom prior approximation, and then select the best candidate in each of the subsets, before finding the overall best candidate among those. This can result in a more adapted sparsification for each subset. Yet, the calculation of the sparsification itself has a cost, which needs to be considered when trying to achieve the best performance. Here we examine the most general case: treating the set of actions as a whole.

Remark. We note that if we consider (1) sparsification of only uninvolved variables; (2) the output of Algorithm 1 to be the square root matrix; and (3) no requirement to maintain the original variable order after the sparsification

Figure 7. A factor graph representing the belief of an agent in an exemplary full-SLAM scenario. The current (prior) state consists of three poses x_1, x_2, x_3 (blue nodes), and the positions of three landmarks l_1, l_2, l_3 (yellow nodes), which were previously observed. Factors (black nodes) between poses mark motion constraints, and factors between a pose and a landmark mark observation constraints.
At the time of planning, the agent is at pose x_3, and wishes to infer which of the candidate paths U = {left, right} is the optimal one. If taking the right path, the agent predicts augmenting its state with two new poses x^{right}_4, x^{right}_5, with motion constraints connecting them to the current pose; based on its current state estimation, it also predicts observing landmark l_3 from x^{right}_4 (i.e., adding an observation constraint between l_3 and the new pose). The variables (from the prior state) involved with this action are those directly connected to any of the predicted new factors: x_3, l_3. If taking the left path, the agent predicts augmenting its state with two new poses x^{left}_4, x^{left}_5, and observing landmark l_1 from x^{left}_4. The variables involved with this action are x_3, l_1. The involved variables (in any of the actions) are marked with a black outline. Note that x_1, x_2, l_2 are never involved; these are marked with a dark green outline. Theorem 1 suggests that the uninvolved variables can be sparsified from the prior belief (via Algorithm 1), while maintaining action consistency.

(by, instead, reordering the collective Jacobians); then, there is no need to actually zero entries in the rows of the "sparsified" variables. The initial reordering is sufficient to make sure that these rows would not be updated when (incrementally) incorporating new constraints. An in-depth look at this variation was examined in our follow-up work (see Elimelech and Indelman 2019). We proved that sparsifying uninvolved variables does not affect the objective function values, and, therefore, they should always be included in the set S of variables for sparsification. It is possible to also sparsify involved variables, but then a "zero offset" and action consistency are not guaranteed.
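Theorem 1 can also be sanity-checked numerically: with the set S ordered first, and a candidate Jacobian whose columns over S are zero, the posterior determinant (and hence the posterior entropy) is identical whether computed from the original or from the sparsified prior. A small synthetic sketch (arbitrary matrices, not the paper's experimental data):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 3                                    # first k variables form S (uninvolved)
R = np.triu(rng.normal(size=(n, n))) + n * np.eye(n)   # prior root, S ordered first

A = rng.normal(size=(4, n))                    # hypothetical collective Jacobian
A[:, :k] = 0.0                                 # uninvolved variables: zero columns

R_s = R.copy()
for i in range(k):
    R_s[i, i + 1:] = 0.0                       # Algorithm-1-style sparsification of S

post   = np.linalg.det(R.T   @ R   + A.T @ A)  # posterior determinant, original prior
post_s = np.linalg.det(R_s.T @ R_s + A.T @ A)  # posterior determinant, sparsified prior
assert np.allclose(post, post_s)               # zero simplification offset
```

The equality follows from the block structure: the determinant factors into the (unchanged) diagonal block over S and the Schur complement over \neg S, which the zero Jacobian columns leave untouched.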
Intuitively, selecting more involved variables for S results in a sparser approximation, but potentially a larger divergence from the original objective values. In Appendix A.1, we show that under additional restrictions, we can symbolically derive offset (and loss) bounds also when sparsifying involved variables; these bounds are only applicable for "rank 1" updates, i.e., when the collective Jacobians are limited to a single row.

3.3.2 Post-solution guarantees

For a more general scenario, when sparsifying involved variables, and with actions possibly having multi-row collective Jacobians, we can try to bound the loss by performing post-solution analysis, as discussed in Section 2.2. Unlike before, such guarantees are derived after solving the simplified problem (but before applying the selected action). As explained, we can utilize the calculated (simplified) objective values, and domain-specific lower and upper bounds on the objective function (LB, UB, respectively), to yield offset bounds (6); from these offset bounds, we can then easily derive loss bounds (9). As our decision problem domain relies on beliefs, which, as we saw, can be represented with a (factor) graph, we can potentially exploit topological aspects to derive the desired objective bounds. For example, we can utilize conclusions from a recent work by Kitanov and Indelman (2019), which extends a previous work by Khosoussi et al. (2018). There, the following bounds on the information gain were proved, for when the corresponding factor graph contains only the agent's poses, and each pose consists of the position and the orientation of the agent (i.e., pose-SLAM):

LB_{top}\{V(b,u)\} \doteq 3 \ln t(b,u) + \mu + H(b),    (43)

UB_{top}\{V(b,u)\} \doteq LB_{top}\{V(b,u)\} + \sum_{i=2}^{n} \ln(d_i + \Psi) - \ln\left|\tilde{L}\right|,    (44)

where t(b,u) stands for the number of spanning trees in the factor graph of the posterior belief (b after applying u); n marks the graph size; \tilde{L} is the reduced Laplacian matrix of the graph; and the d_i's are the node degrees corresponding to \tilde{L}. They also assume that the factors between the poses are described with a constant diagonal noise covariance; \mu and \Psi are constants which depend on this noise model and on the posterior graph size (i.e., the length of the action sequence). In their demonstration, they show that when the ratio between the angular variance and the position variance is small, these bounds are empirically tight. This case can happen, for example, when a navigating agent is equipped with a compass, which reduces the angular noise. For a detailed derivation of these bounds, please refer to Kitanov and Indelman (2019). For different problem domains, it is possible to use various other objective bounds in a similar manner. For example, in Appendix A.2, we present additional bounds which exploit known determinant inequalities. These make no assumptions on the state structure, and are potentially useful when the matrix \Lambda is diagonally dominant.

4 Experimental results

4.1 The scenario

To demonstrate the advantages of the approach, we applied it to the solution of a highly realistic active-SLAM problem. In this scenario, a robotic agent navigates through a list of goals in an unknown indoor environment. We used the Gazebo simulation engine (Koenig and Howard 2004) to simulate the environment and the robot, a Pioneer 3-AT, which is a standard ground robot used in academic research worldwide. The robot is equipped with a lidar sensor, a Hokuyo UST-10LX. These components can be seen in Fig. 8. Despite examining a 2D navigation scenario, our method does not impose any restrictions on the pose size or the state structure.

Figure 8.
A Pioneer 3-AT robot in the simulated indoor environment. The robot is equipped with a lidar sensor, a Hokuyo UST-10LX, as visible on top of it.

We used the pose-SLAM paradigm, meaning the agent's state X_k \doteq (x_0^T, \ldots, x_k^T)^T consists only of poses kept along its entire trajectory. Each of these poses consists of three variables, representing the position and orientation. Our approach is highly relevant in this case, in which the state size grows quickly as the navigation progresses, making the planning more computationally challenging. The belief over the state is represented as a factor graph, and implemented using the GTSAM C++ library (Dellaert 2012). When adding a new pose to the graph, the sensor scans the environment in a range of 30 meters, and provides a point cloud of it. This point cloud is then matched to scans taken in previous poses using ICP matching (Besl and McKay 1992). If a match is found, a loop-closure factor (constraint) is added between these poses. To keep the computational cost of the scan matching feasible, and to avoid creating redundant constraints, we make sure to compare the current pose only to key poses within a certain range of (estimated) distances from it. Transition (motion) constraints are also created between every two consecutive poses. Both the observation and motion contain some Gaussian noise, which matches the real hardware's specs. The Robot Operating System (ROS) is used to run and coordinate the system components: state inference, decision making, sensing, etc. The full indoor map is unknown to the robot, and is incrementally approximated by it using the scans taken during the navigation. We do, however, rely on the full and exact map to produce collision-free candidate trajectories. We use the Probabilistic RoadMap (PRM) algorithm (Kavraki et al. 1996) to sample that map, and then use the K-diverse-paths algorithm (Voss et al.
2015) to build a set U of trajectories to the current goal. This usage of the map is irrelevant to the demonstration of our method; in our formulation, we consider the candidate actions as given. The complete indoor map is shown in Fig. 9, with the sampled PRM graph on it. Each trajectory matches, of course, a certain control sequence, and is translated to a series of factors and constraints to be added to the prior factor graph. Loop-closure constraints are added between poses in the new trajectory and poses in the previously-executed trajectory, according to their estimated locations (i.e., where we expect to add them when executing this trajectory). The corresponding collective Jacobians of the candidate trajectories are constructed as explained in Section 3.1. Since all trajectories lead to the goal, we only wish to optimize the "safety" of taking the path; meaning, keeping the uncertainty of the state low, by preferring a more informative trajectory. We use the aforementioned objective function V (from (31)) to compare between candidates. Under the "maximum likelihood" assumption, our method is only relevant to the computation of this information-theoretic measure, so for a more convenient discussion, we do not consider other objectives, such as the length of the trajectory. To cover its list of goals, the robot executes several planning sessions. In each session, the robot is provided with one goal, generates a set of candidate trajectories U to it, and selects the best candidate by solving a decision problem. The robot completes executing the entire selected trajectory before starting a new planning session to the next goal.

Figure 9. The entire indoor environment from a top view. Walls are colored in light blue. The PRM graph, from which trajectories are built, is colored in red and green. Each square on the map represents a 1m × 1m square in reality.
To evaluate our method, in each planning session, we solved three decision problems, each using another version of the initial belief. The robot's original initial belief accounts for the trajectory of poses executed up to that point (the entire inferred state). The other two versions are generated by sparsifying the original belief using Algorithm 1: one with partial sparsification, and one with full sparsification. Overall, in each session, the three configurations of the decision problem are as follows:

1. P = (b, U, V) – the original decision problem;
2. P_involved = (b_involved, U, V) – with sparsification of the uninvolved variables – an action consistent problem. We remind again that uninvolved variables correspond to columns of zeros in the collective Jacobians of all candidate actions, as explained in Section 3.3.1;
3. P_diagonal = (b_diagonal, U, V) – with sparsification of all variables, leading to a diagonal information matrix, but not necessarily an action consistent problem.

For each configuration, we measured the objective function calculation time for each candidate action, along with the one-time calculation of the sparsification itself for the latter two. On the whole, in each planning session, we measured the total decision making time for each of the three configurations. For a fair comparison of the problems, the objective function calculation was detached from the factor-graph-based implementation of the belief. From GTSAM, we extracted the square root matrix of the initial belief, and the collective Jacobians corresponding to (the factors added by) each candidate trajectory. Then, using Algorithm 1, we created the two additional versions of the prior matrix, as detailed before.
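The per-candidate objective evaluation reduces to updating the prior square-root matrix with the candidate's collective Jacobian and reading the determinant off the triangular factor. A minimal numpy sketch of such a QR update, with hypothetical matrices (GTSAM performs this incrementally in C++; this is only an illustration of the computation):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
R = np.triu(rng.normal(size=(n, n))) + n * np.eye(n)   # hypothetical prior root
A = rng.normal(size=(3, n))                            # hypothetical collective Jacobian

# Posterior square root via QR factorization of the stacked system [R; A]:
R_post = np.linalg.qr(np.vstack([R, A]), mode='r')

# ln|Lambda_post| from the triangular factor's diagonal, vs. direct computation:
logdet_qr = 2.0 * np.log(np.abs(R_post.diagonal())).sum()
logdet_direct = np.linalg.slogdet(R.T @ R + A.T @ A)[1]
assert np.allclose(logdet_qr, logdet_direct)
```

Working with the triangular factor keeps the determinant (and hence the entropy objective) available as a cheap sum of log-diagonal terms.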
For each of the three decision problems, i.e., using each version of the prior square root matrix, we calculated the corresponding posterior square root matrix (via a QR update); as explained in Section 3.1.3, we could then easily extract the determinants of these triangular matrices, to calculate the objective values. At the end of each session, we applied the action selected by configuration 1. Of course, in a real application we would only solve the problem using a single configuration; here we present a comparison of the results for the different configurations. We also did not invest in a smart selection of variables for sparsification, as even full sparsification achieved very accurate results.

4.2 Results

In the following section we present and analyze the results from a sequence of six planning sessions. Of course, these sessions took place after the robot had already executed a certain trajectory in the environment, in order to build a state of a substantial size, and a map; if the prior state is empty, examining its sparsification is vain. Figs. 10-15 showcase a summary of each of the planning sessions, and contain several components: (a) A screenshot of the scenario, which includes: the map estimation (blue occupancy grid); the current estimated position (yellow arrow-head) and goal (yellow circle); the trajectory taken up to that point (thin green line); the candidate trajectories from the current position to the goal (thick lines in various colors); and the selected trajectory (highlighted in bright green). (b) A comparison of the objective function values of the candidate actions (i.e., trajectories), considering each of the versions of the initial belief: P with the original belief in red; P_involved with sparsification of the uninvolved variables in blue; and P_diagonal with sparsification of all the variables in green. For scale, the comparison also contains the prior differential entropy, before applying any action.
This "prior value" is not affected by the sparsification, and is the same for the three configurations (see (42)). (c) A comparison of the solution times for the three decision problems. Again, P in red, P_involved in blue, and P_diagonal in green. The highlighted parts of the blue and green bars mark the cost of the sparsification itself out of the total solution time. (d) A comparison of the three versions of the triangular square root matrix. The figures indicate the non-zero entries in each matrix, i.e., their sparsity patterns. (e) The sparsity patterns of the collective Jacobians of the examined trajectories. Again, uninvolved variables are identified by having columns of zeros in all the Jacobians. For the first and last sessions we provide an in-depth inspection, including all the components. Since the structure of the belief and Jacobians in all the sessions is similar, for the intermediate sessions we only present a summarized version, with components (a)-(c). The square root matrix and its approximations, given previously in Fig. 6, are extracted from the third session. Additionally, the numerical data shown in the figures is summarized in Table 1. Further data regarding the loss is later given in Table 2.

4.2.1 Efficiency

As expected, the sparsification leads to a significant reduction in decision making time. The simplified problem P_diagonal consistently achieves the best performance, followed by P_involved, while both are vastly more efficient than the original problem P. Surely, a higher degree of sparsification (S containing more variables) leads to a greater improvement in computation time. As discussed in Section 3.2.1, full sparsification of the square root matrix has a particularly low cost: we only need to extract its diagonal. From Table 1 and the run-time comparison bar diagrams, it is clear that the cost of a partial sparsification is also minor in relation to the entire decision making.
In some of the diagrams, the highlighted section of the bar, which stands for the cost of the sparsification, is hardly visible. Also, since the sparsification cost does not depend on the number of candidate actions, the larger the set of actions is, the less significant the sparsification cost should become. We see a correlation between the ratio of uninvolved variables and the reduction in run time with P_involved. Variables corresponding to the executed trajectory become involved when a loop-closure factor is created between them and a candidate trajectory. Hence, the ratio of uninvolved variables represents the overlap of the candidate trajectories with the previously executed trajectory. In the first session, the executed trajectory is short, resulting in a relatively small state size, and a sparse root matrix, since not many loop closures were formed. As the sessions progress, the prior matrix becomes larger and denser, due to new loop closures, as apparent in the sixth session. In principle, we also notice a correlation between the state size and the relative improvement in performance, for both sparsification configurations. Updating the square root factorization, in order to calculate the posterior determinant, has, at worst, cubic complexity in relation to the matrix size. An update to a variable at the beginning of the state (i.e., a loop closure) may force us to recalculate the entire factorization, bearing this maximal computational cost. Sparsification of variables reduces the number of elements to update, and should thus be more beneficial when handling larger and denser beliefs.

4.2.2 Accuracy

Alongside the undeniable improvement in efficiency, we can also examine the quality of the selected action. According to Theorem 1, not only are P and P_involved action consistent, but they produce exactly the same objective values. Hence, solving P_involved always leads to the optimal action selection, and induces no loss.
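Whether a simplified configuration preserves the candidate ranking can be quantified by rank-correlating the objective values it produces with those of the original problem. A minimal sketch with hypothetical objective values (a coefficient of 1.0 indicates an identical ranking, i.e., action consistency):

```python
import numpy as np

def rank_correlation(a, b):
    """Rank correlation between two candidate-value vectors (1.0 = same ranking)."""
    ra = np.argsort(np.argsort(a))     # rank of each candidate under objective a
    rb = np.argsort(np.argsort(b))     # rank of each candidate under objective b
    return float(np.corrcoef(ra, rb)[0, 1])

V_original   = np.array([850., 920., 880., 1010.])   # hypothetical objective values
V_simplified = np.array([848., 918., 877., 1005.])   # same ordering after sparsification
rho = rank_correlation(V_original, V_simplified)
```

Here the simplified values differ slightly from the originals, but induce the same ranking, so rho evaluates to 1.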
P_diagonal is not always action consistent with the original problem, and maintaining the same action selection is not guaranteed; however, it is evident from Figs. 10-15 that even when sparsifying all the variables, the quality of the solution is maintained. Not only do the graphs of P and P_diagonal maintain a very similar trend, which practically leads to the same action selection, and zero loss, but the difference (offset) between them is also slim. This is also evident when examining the Pearson rank correlation coefficient ρ (which we mentioned in Section 2.1) between the solutions of the original and simplified decision problems. A value of ρ = 1 represents a perfect correlation of the candidate rankings (i.e., action consistency), and ρ = −1 represents exactly opposite rankings. Clearly, the calculated values, presented in Table 2, indicate that P_diagonal indeed resulted in an action consistent solution (or very close to it). We emphasize again that, regardless of the selected action, the inference of the next state remains unchanged, as it is done on the original belief.

Table 1. Numerical summary for all sessions. "Uninvolved var. ratio" represents the percentage of uninvolved variables in the prior state. "Run-time" represents the reduction in decision making time in the specified configuration, in comparison to the original problem. "Non-zeros" represents the reduction in the number of non-zero entries in the prior square root matrix, after using the sparsification. "Sparsification time" represents the cost of this one-time calculation, out of the entire problem run-time.

Session | Prior size | Uninvolved var. ratio | P_involved run-time | P_involved sparsification time | P_involved non-zeros | P_diagonal run-time | P_diagonal sparsification time | P_diagonal non-zeros
1 | 567 | 46% | -23% | 3% | -76% | -55% | 1% | -97%
2 | 762 | 74% | -34% | 4% | -77% | -67% | 1% | -98%
3 | 1182 | 60% | -66% | 1% | -83% | -85% | 1% | -99%
4 | 1269 | 69% | -70% | 2% | -86% | -86% | 2% | -99%
5 | 1341 | 65% | -67% | 2% | -84% | -82% | 2% | -99%
6 | 1392 | 44% | -52% | <1% | -61% | -80% | <1% | -99%

(a) A screenshot of the scenario, which includes: the map estimation (blue occupancy grid); the current estimated position (yellow arrow-head) and goal (yellow circle); the trajectory taken up to that point (thin green line); the candidate trajectories from the current position to the goal (thick lines in various colors); and the selected trajectory (highlighted in bright green). (b) Objective function comparison. (c) Run-time. (d) Original prior information root matrix and its sparse approximations. (e) Collective Jacobians of the candidate trajectories.
Figure 10. Results summary for planning session #1.

(a) The scenario. (b) Objective function comparison. (c) Run-time.
Figure 11. Results summary for planning session #2.

(a) The scenario. (b) Objective function comparison. (c) Run-time.
Figure 12. Results summary for planning session #3.

(a) The scenario. (b) Objective function comparison. (c) Run-time.
Figure 13. Results summary for planning session #4.

(a) The scenario. (b) Objective function comparison. (c) Run-time.
Figure 14.
Results summary for planning session #5.

(a) A screenshot of the scenario, which includes: the map estimation (blue occupancy grid); the current estimated position (yellow arrow-head) and goal (yellow circle); the trajectory taken up to that point (thin green line); the candidate trajectories from the current position to the goal (thick lines in various colors); and the selected trajectory (highlighted in bright green). (b) Objective function comparison. (c) Run-time. (d) Original prior information root matrix and its sparse approximations. (e) Collective Jacobians of the candidate trajectories.
Figure 15. Results summary for planning session #6.

Table 2. The loss induced by the two simplified configurations, alongside the bounds on the loss (of the diagonal configuration), for different noise models. The specified ratio for each bound represents the ratio between the angular variance and the position variance. No bound is calculated for the other configuration, since it is guaranteed to induce no loss. The loss and its bounds are brought as a percentage of the maximal approximated value in that session. Also shown is the Pearson rank correlation coefficient ρ.

Session | ρ(P, P_involved) | ρ(P, P_diagonal) | loss(P, P_involved) | loss(P, P_diagonal) | loss(P, P_diagonal) bound (0.01:1) | bound (0.25:1) | bound (0.85:1)
1 | 1 | 0.99 | 0% | 0% | 2% | 16% | 46%
2 | 1 | 1 | 0% | 0% | 2% | 16% | 47%
3 | 1 | 1 | 0% | 0% | 1% | 13% | 39%
4 | 1 | 0.99 | 0% | 0% | 1% | 15% | 43%
5 | 1 | 1 | 0% | 0% | 1% | 16% | 43%
6 | 1 | 0.99 | 0% | 0% | 1% | 15% | 41%

4.2.3 Guarantees

Throughout the experiment, it was possible to guarantee the quality of the solution for P_diagonal, by bounding loss(P, P_diagonal) in a post-solution evaluation – after solving each (simplified) planning session, and before applying the selected action. Obviously, no bound should be calculated for P_involved, since its loss was guaranteed to be zero in our pre-solution "offline" evaluation. As explained in Section 2.2, (9) provides a formula for the loss bound, given the solution of the simplified problem (which is available), and some domain-specific bounds/limits for the objective function. Here, we used the topological bounds from (43) and (44), and assigned them in the formula to provide guarantees during each planning session. The tightness of these topological bounds, which affects the tightness of the loss bound, depends on the ratio between the angular variance and the position variance with which we model the noise in factors between poses; the smaller the angular noise is, in relation to the latter, the tighter the bounds are (as analyzed by Khosoussi et al. (2018) and by Kitanov and Indelman (2019)). Hence, we calculated the loss bound assuming different noise models (different such ratios), and examined their effects. Such a change to the noise model has a minor effect on the objective evaluation, since it does not change the sparsity pattern of the matrices; thus, we only present the effect on the inferred loss bound, and not on the entire planning process. The bounds, which were calculated assuming the different noise ratios, are given in Table 2.
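The spanning-tree count t(b, u) appearing in these topological bounds can be computed via Kirchhoff's matrix-tree theorem, as the determinant of the reduced Laplacian of the pose graph. A minimal sketch on a toy 4-node cycle graph, which has exactly 4 spanning trees (the graph here is illustrative, not one of the paper's pose graphs):

```python
import numpy as np

# Toy pose graph: a 4-node cycle.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = 4
L = np.zeros((n, n))                   # graph Laplacian: degrees minus adjacency
for i, j in edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1

L_reduced = L[1:, 1:]                  # reduced Laplacian: delete one row and column
t = round(np.linalg.det(L_reduced))    # matrix-tree theorem: t = det(L_reduced)
print(t)                               # -> 4
```

This is the quantity whose logarithm enters the lower bound, alongside the node degrees and the reduced Laplacian determinant of the upper bound.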
The loss and its bounds are brought as a percentage of the maximal approximated objective function value in that session, to allow a correct comparison. In the scenario showcased before, the ratio of the angular variance to the position variance was 0.25:1. Indeed, changing the noise model has a significant influence on the tightness of the loss bounds. A ratio of 0.01:1 yields a very tight bound. It is not far-fetched that the angular variance would be this low in a navigation scenario, for example, when a compass is available, as mentioned before. Raising this ratio results in more conservative bounds, especially in comparison to the exact loss, which is zero. Yet, they can still be used to guarantee that the solution stays in an acceptable range. Developing tighter bounding methods for the objective function shall help make these guarantees less conservative. To clarify, this discussion, alongside any assumptions on the noise or state structure, is only brought in order to examine our ability to provide guarantees using this specific topological method; it is not essential in any way in order to apply the sparsification and improve the performance.

5 Conclusions

In an attempt to allow efficient autonomous decision making, and, specifically, decision making in the (high-dimensional) belief space, we introduced a new solution paradigm, which suggests performing a conscious simplification of the decision problem. Its impact is intended to be both conceptual and practical. Conceptually, we claimed that decision making, i.e., the identification of the best candidate action, can utilize a simplified representation or approximation of the initial state, without compromising the accuracy of the state inference process. After efficiently selecting a candidate action, it should be applied on the original state, which remains exact.
On top of that, we presented the simplification loss as a quality-of-solution measure, and explained how it can be bounded (e.g., using the simplification offset) in order to provide guarantees. We recognized that when the simplification maintains action consistency, i.e., when the trend of the objective function is maintained after the simplification, there is no loss. Practically, when applying the paradigm to the belief space, decision making can be conducted considering a sparse approximation of the prior belief. We provided a scalable algorithm for the generation of such approximations. This versatile algorithm can generate approximations of different degrees, based on the subset of state variables selected for the sparsification. Specifically, by identifying the problem's uninvolved variables, we can provide an action-consistent approximation, which is guaranteed to preserve the action selection. As explained in Section 3.2.2, our sparsification approach is original and intuitive, as it exploits the belief's underlying Bayes net structure. We presented an in-depth study of our approach, and demonstrated it in a highly realistic active-SLAM simulation. We showed that, using sparsification of uninvolved variables, planning time can be significantly reduced, while, as mentioned, guaranteeing no loss in the quality of solution. We then showed that planning time can be reduced even further when sparsifying all the state variables; in practice, for this configuration, we experienced no loss in the quality of solution as well. Nonetheless, we demonstrated how the theoretical loss in that case can be bounded. The proposed novel paradigm offers many possible future research directions. In general, other sparsification methods, besides the provided algorithm, can be used in similar ways; however, their impact on the action selection should be examined.
Potentially, existing (approximated) solution methods for POMDPs can also be evaluated with our theoretical framework, to provide a standard comparison tool for measuring the accuracy of planning algorithms. Also, this framework can be used to develop a scheme for the elimination of candidate actions; in fact, we have already developed a proof of concept for this idea (Elimelech and Indelman 2017b). We can also examine other simplification methods, such as altering the action set or the objective function. Developing simplification methods for more general beliefs, such as multi-modal Gaussians, can hold important practical significance. Derivation of tighter loss bounds is also of interest. Overall, with the versatility of these ideas, we expect the approach to yield a substantial contribution to the research community.

6 Acknowledgments

The authors would like to acknowledge Dr. Andrej Kitanov from the Faculty of Aerospace Engineering at the Technion – Israel Institute of Technology, for insightful discussions concerning Section 3.3.2, and his assistance with implementing the simulation.

7 Declaration of conflicting interest

The authors declare that there is no conflict of interest.

8 Funding

This work was supported by the Israel Science Foundation (grant 351/15).

References

Agarwal, P. and Olson, E. (2012), Variable reordering strategies for SLAM, in 'IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)', IEEE, pp. 3844–3850.
Agha-mohammadi, A.-a., Agarwal, S., Kim, S.-K., Chakravorty, S. and Amato, N. M. (2018), 'SLAP: Simultaneous localization and planning under uncertainty via dynamic replanning in belief space', IEEE Trans. Robotics 34(5), 1195–1214.
Agha-Mohammadi, A.-A., Chakravorty, S. and Amato, N. M. (2014), 'FIRM: Sampling-based feedback motion planning under motion uncertainty and imperfect measurements', Intl. J. of Robotics Research 33(2), 268–304.
Besl, P. and McKay, N. D.
(1992), 'A method for registration of 3-D shapes', IEEE Trans. Pattern Anal. Machine Intell. 14(2), 239–256.
Bopardikar, S. D., Englot, B., Speranzon, A. and van den Berg, J. (2016), 'Robust belief space planning under intermittent sensing via a maximum eigenvalue-based bound', IJRR 35(13), 1609–1626.
Boyen, X. and Koller, D. (1998), Tractable inference for complex stochastic processes, in 'Proc. 14th Conf. on Uncertainty in AI (UAI)', Madison, WI, pp. 33–42.
Carlevaris-Bianco, N. and Eustice, R. M. (2014), Conservative edge sparsification for graph SLAM node removal, in 'IEEE Intl. Conf. on Robotics and Automation (ICRA)', pp. 854–860.
Carlevaris-Bianco, N., Kaess, M. and Eustice, R. M. (2014), 'Generic node removal for factor-graph SLAM', IEEE Trans. Robotics 30(6), 1371–1385.
Chaves, S. M. and Eustice, R. M. (2016), Efficient planning with the Bayes tree for active SLAM, in 'IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)', IEEE, pp. 4664–4671.
Davis, T. A. (2006), Direct Methods for Sparse Linear Systems, Fundamentals of Algorithms, Society for Industrial and Applied Mathematics, Philadelphia, PA, United States.
Davis, T., Gilbert, J., Larimore, S. and Ng, E. (2004), 'A column approximate minimum degree ordering algorithm', ACM Trans. Math. Softw. 30(3), 353–376.
Dellaert, F. (2012), Factor graphs and GTSAM: A hands-on introduction, Technical Report GT-RIM-CP&R-2012-002, Georgia Institute of Technology.
Dellaert, F. and Kaess, M. (2006), 'Square Root SAM: Simultaneous localization and mapping via square root information smoothing', Intl. J. of Robotics Research 25(12), 1181–1203.
Dellaert, F. and Kaess, M. (2017), 'Factor graphs for robot perception', Foundations and Trends in Robotics 6(1-2), 1–139.
Elimelech, K. (2021), Efficient Decision Making under Uncertainty in High-Dimensional State Spaces, PhD thesis, Technion – Israel Institute of Technology.
Elimelech, K.
and Indelman, V. (2017a), Consistent sparsification for efficient decision making under uncertainty in high dimensional state spaces, in 'IEEE Intl. Conf. on Robotics and Automation (ICRA)', pp. 3786–3791.
Elimelech, K. and Indelman, V. (2017b), Fast action elimination for efficient decision making and belief space planning using bounded approximations, in 'Proc. of the Intl. Symp. of Robotics Research (ISRR)'.
Elimelech, K. and Indelman, V. (2017c), Scalable sparsification for efficient decision making under uncertainty in high dimensional state spaces, in 'IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)', pp. 5668–5673.
Elimelech, K. and Indelman, V. (2019), Introducing PIVOT: Predictive incremental variable ordering tactic for efficient belief space planning, in 'Proc. of the Intl. Symp. of Robotics Research (ISRR)'.
Elimelech, K. and Indelman, V. (2021), 'Efficient modification of the upper triangular square root matrix on variable reordering', IEEE Robotics and Automation Letters (RA-L) 6(2), 675–682.
Frey, K. M., Steiner, T. J. and How, J. P. (2017), 'Complexity analysis and efficient measurement selection primitives for high-rate graph SLAM', arXiv preprint arXiv:1709.06821.
Hämmerlin, G. and Hoffmann, K.-H. (2012), Numerical Mathematics, Springer Science & Business Media.
Harville, D. A. (1998), 'Matrix algebra from a statistician's perspective', Technometrics 40(2), 164–164.
Hsiung, J., Hsiao, M., Westman, E., Valencia, R. and Kaess, M. (2018), Information sparsification in visual-inertial odometry, in 'IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)', pp. 1146–1153.
Huang, G., Kaess, M. and Leonard, J. (2012), Consistent sparsification for graph optimization, in 'Proc. of the European Conference on Mobile Robots (ECMR)', pp. 150–157.
Indelman, V. (2015), Towards information-theoretic decision making in a conservative information space, in 'American Control Conference', pp.
2420–2426.
Indelman, V. (2016), 'No correlations involved: Decision making under uncertainty in a conservative sparse information space', IEEE Robotics and Automation Letters (RA-L) 1(1), 407–414.
Indelman, V., Carlone, L. and Dellaert, F. (2015), 'Planning in the continuous domain: a generalized belief space approach for autonomous navigation in unknown environments', Intl. J. of Robotics Research 34(7), 849–882.
Kaelbling, L. P., Littman, M. L. and Cassandra, A. R. (1998), 'Planning and acting in partially observable stochastic domains', Artificial Intelligence 101(1), 99–134.
Kaess, M., Johannsson, H., Roberts, R., Ila, V., Leonard, J. and Dellaert, F. (2012), 'iSAM2: Incremental smoothing and mapping using the Bayes tree', Intl. J. of Robotics Research 31(2), 217–236.
Karaman, S. and Frazzoli, E. (2011), 'Sampling-based algorithms for optimal motion planning', Intl. J. of Robotics Research 30(7), 846–894.
Kavraki, L., Svestka, P., Latombe, J.-C. and Overmars, M. (1996), 'Probabilistic roadmaps for path planning in high-dimensional configuration spaces', IEEE Trans. Robot. Automat. 12(4), 566–580.
Kendall, M. G. (1948), Rank Correlation Methods, Griffin.
Khosoussi, K., Giamou, M., Sukhatme, G. S., Huang, S., Dissanayake, G. and How, J. P. (2018), 'Reliable graph topologies for SLAM', Intl. J. of Robotics Research.
Kim, A. and Eustice, R. M. (2014), 'Active visual SLAM for robotic area coverage: Theory and experiment', Intl. J. of Robotics Research 34(4-5), 457–475.
Kitanov, A. and Indelman, V. (2019), 'Topological information-theoretic belief space planning with optimality guarantees', arXiv preprint arXiv:1903.00927.
Koenig, N. and Howard, A. (2004), Design and use paradigms for Gazebo, an open-source multi-robot simulator, in 'IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)'.
Kopitkov, D. and Indelman, V.
(2017), 'No belief propagation required: Belief space planning in high-dimensional state spaces via factor graphs, matrix determinant lemma and re-use of calculation', Intl. J. of Robotics Research 36(10), 1088–1130.
Kretzschmar, H. and Stachniss, C. (2012), 'Information-theoretic compression of pose graphs for laser-based SLAM', Intl. J. of Robotics Research 31(11), 1219–1230.
Künzi, H.-P. A. (2001), Nonsymmetric distances and their associated topologies: about the origins of basic ideas in the area of asymmetric topology, in 'Handbook of the History of General Topology', Springer, pp. 853–968.
Manski, C. F. (1988), 'Ordinal utility models of decision making under uncertainty', Theory and Decision 25(1), 79–104.
McAllester, D. A. and Singh, S. (1999), Approximate planning for factored POMDPs using belief state simplification, in 'UAI', Morgan Kaufmann Publishers Inc., pp. 409–416.
Mu, B., Paull, L., Agha-Mohammadi, A.-A., Leonard, J. J. and How, J. P. (2017), 'Two-stage focused inference for resource-constrained minimal collision navigation', IEEE Trans. Robotics 33(1), 124–140.
Patil, S., Kahn, G., Laskey, M., Schulman, J., Goldberg, K. and Abbeel, P. (2014), Scaling up Gaussian belief space planning through covariance-free trajectory optimization and automatic differentiation, in 'Intl. Workshop on the Algorithmic Foundations of Robotics (WAFR)', pp. 515–533.
Pineau, J., Gordon, G. J. and Thrun, S. (2006), 'Anytime point-based approximations for large POMDPs', J. of Artificial Intelligence Research 27, 335–380.
Platt, R., Tedrake, R., Kaelbling, L. and Lozano-Pérez, T. (2010), Belief space planning assuming maximum likelihood observations, in 'Robotics: Science and Systems (RSS)', Zaragoza, Spain, pp. 587–593.
Porta, J. M., Vlassis, N., Spaan, M. T. and Poupart, P. (2006), 'Point-based value iteration for continuous POMDPs', J. of Machine Learning Research 7, 2329–2367.
Prentice, S. and Roy, N.
(2009), 'The belief roadmap: Efficient planning in belief space by factoring the covariance', Intl. J. of Robotics Research 28(11-12), 1448–1465.
Roy, N., Gordon, G. J. and Thrun, S. (2005), 'Finding approximate POMDP solutions through belief compression', J. Artif. Intell. Res. (JAIR) 23, 1–40.
Silver, D. and Veness, J. (2010), Monte-Carlo planning in large POMDPs, in 'Advances in Neural Information Processing Systems (NIPS)', pp. 2164–2172.
Stachniss, C., Haehnel, D. and Burgard, W. (2004), Exploration with active loop-closing for FastSLAM, in 'IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)'.
Thrun, S., Liu, Y., Koller, D., Ng, A., Ghahramani, Z. and Durrant-Whyte, H. (2004), 'Simultaneous localization and mapping with sparse extended information filters', Intl. J. of Robotics Research 23(7-8), 693–716.
Van Den Berg, J., Patil, S. and Alterovitz, R. (2012), 'Motion planning under uncertainty using iterative local optimization in belief space', Intl. J. of Robotics Research 31(11), 1263–1278.
Voss, C., Moll, M. and Kavraki, L. E. (2015), A heuristic approach to finding diverse short paths, in 'IEEE Intl. Conf. on Robotics and Automation (ICRA)', pp. 4173–4179.
Ye, N., Somani, A., Hsu, D. and Lee, W. S. (2017), 'DESPOT: Online POMDP planning with regularization', JAIR 58, 231–266.

Appendices

A Additional loss bounds

We present here additional techniques to bound the loss between a decision problem $\mathcal{P} \doteq (b, \mathcal{U}, J)$ and its simplified version $\mathcal{P}_s \doteq (b_s, \mathcal{U}, J)$, which uses a sparse belief approximation created with Algorithm 1.

A.1 Pre-solution guarantees: rank-1 updates

We remind again that, according to Lemmas 1 and 2 in Section 2.2, we can use (a bound of) the offset between the problem and its simplification to derive a loss bound. In Section 3.3.1, we proved that sparsification of the uninvolved variables always results in zero offset, and hence zero loss.
Now, we show that, under additional restrictions, we can derive an offset bound also when sparsifying involved variables. Assume that for every action $u \in \mathcal{U}$ the corresponding collective Jacobian $U \in \mathbb{R}^{1 \times N}$ contains only a single row, i.e., rank-1 information updates. This can be the case, for example, in sensor placement problems with scalar measurements (like temperature). Now, let us analyze the simplification offset:

$2 \cdot \delta(\mathcal{P}, \mathcal{P}_s, u)$ (45)
$= 2 \cdot \left| V(b, u) - V(b_s, u) \right|$ (46)
$= \left| \ln\left|\Lambda + U^\top U\right| - \ln\left|\Lambda_s + U^\top U\right| \right|$ (47)

(matrix determinant lemma; see Harville 1998)

$= \left| \ln\left( |\Lambda| \cdot \left(1 + U \Lambda^{-1} U^\top\right) \right) - \ln\left( |\Lambda_s| \cdot \left(1 + U \Lambda_s^{-1} U^\top\right) \right) \right|$ (48)

(Eq. (42))

$= \left| \ln\left(1 + U \Lambda^{-1} U^\top\right) - \ln\left(1 + U \Lambda_s^{-1} U^\top\right) \right|$ (49)
$= \left| \ln\left(1 + U \Lambda_s^{-1} U^\top + U (\Lambda^{-1} - \Lambda_s^{-1}) U^\top\right) - \ln\left(1 + U \Lambda_s^{-1} U^\top\right) \right| \doteq (\star)$ (50)

The logarithm is a monotonically increasing concave function; thus, every $a, b \in \mathbb{R}$ and $c \geq 0$ satisfy

$\left| \ln(a) - \ln(b) \right| \geq \left| \ln(a + c) - \ln(b + c) \right|$. (51)

In other words, the difference in the function value between a pair of inputs decreases when both inputs grow equally. Surely, $0 \leq U \Lambda_s^{-1} U^\top$, since $\Lambda_s^{-1}$ is positive semi-definite. Thus, we may choose $a = 1 + U (\Lambda^{-1} - \Lambda_s^{-1}) U^\top$, $b = 1$, and $c = U \Lambda_s^{-1} U^\top$. Therefore,

$(\star) \leq \left| \ln\left(1 + U (\Lambda^{-1} - \Lambda_s^{-1}) U^\top\right) - \ln(1) \right|$ (52)
$= \left| \ln\left(1 + U (\Lambda^{-1} - \Lambda_s^{-1}) U^\top\right) \right|$ (53)
$\leq \left| \ln\Big(1 + \alpha \cdot \sum_{i,j \in \mathcal{I}nv(u)} (\Lambda^{-1} - \Lambda_s^{-1})_{ij}\Big) \right|$, (54)

where $\mathcal{I}nv(u)$ is the set of (prior state) variables involved in $u$, and the scalar $\alpha$ complies with $\alpha \geq \max_i U_i^2$. We recall that $U_i$ is uninvolved $\iff U_i = 0$. When considering the involved variables among all the actions, and $\alpha$ is valid $\forall u \in \mathcal{U}$, this bound becomes independent of a specific action, and only a single expression needs to be calculated.
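The matrix-determinant-lemma step used in (47)-(48) is easy to verify numerically. The following sketch is illustrative only; the matrix size and random values are our own choices, not taken from the paper's experiments. It checks the rank-1 identity $\ln|\Lambda + U^\top U| = \ln|\Lambda| + \ln(1 + U \Lambda^{-1} U^\top)$ for a single-row Jacobian $U$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prior information matrix (symmetric positive definite)
N = 6
A = rng.standard_normal((N, N))
Lam = A @ A.T + N * np.eye(N)

# Single-row collective Jacobian: a rank-1 information update
U = rng.standard_normal((1, N))

# Left-hand side: ln|Lam + U^T U| computed directly
lhs = np.linalg.slogdet(Lam + U.T @ U)[1]

# Right-hand side via the matrix determinant lemma:
# |Lam + U^T U| = |Lam| * (1 + U Lam^{-1} U^T)
quad = (U @ np.linalg.solve(Lam, U.T)).item()
rhs = np.linalg.slogdet(Lam)[1] + np.log(1.0 + quad)

assert np.isclose(lhs, rhs)
```

The same identity is what allows the derivation to factor out the (equal) determinants of $\Lambda$ and $\Lambda_s$ in step (48)-(49).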
Overall, we can conclude the following bound on the offset:

$\Delta(\mathcal{P}, \mathcal{P}_s) \leq \frac{1}{2} \cdot \left| \ln\Big(1 + \alpha \cdot \sum_{i,j \in \mathcal{I}nv(\mathcal{U})} (\Lambda^{-1} - \Lambda_s^{-1})_{ij}\Big) \right|$. (55)

As we may notice, this symbolic bound depends on the initial beliefs of the original and simplified problems, yet not on their solution; it can hence be utilized before actually solving the problem. When calculating this bound, we considered only single-row collective Jacobians, but they were otherwise arbitrary. Although the considered assumption is restrictive, the concluded bound is indeed usable for certain problems, as evident in our follow-up work (Elimelech and Indelman 2017b). Guaranteed action consistency for the case of single-row Jacobians, which are also limited to a single non-zero entry, was previously shown by Indelman (2016).

A.2 Post-solution guarantees

We recall that the offset can also be bounded by utilizing domain-specific upper and lower bounds of the objective function ($\mathcal{UB}$, $\mathcal{LB}$, respectively), as indicated in (6). In addition to the topological objective bounds, which were presented in Section 3.3.2, we may also utilize alternative bounds, which rely on known determinant bounds. For the lower bound, we can use the Minkowski determinant inequality, which states that for positive semi-definite matrices $M_1, M_2 \in \mathbb{R}^{N \times N}$:

$\left| M_1 + M_2 \right|^{\frac{1}{N}} \geq \left| M_1 \right|^{\frac{1}{N}} + \left| M_2 \right|^{\frac{1}{N}}$, (56)
$\ln\left| M_1 + M_2 \right| \geq N \cdot \ln\left( \left| M_1 \right|^{\frac{1}{N}} + \left| M_2 \right|^{\frac{1}{N}} \right)$. (57)

Let us assign $M_1 \doteq \Lambda$, $M_2 \doteq U^\top U$; when $U^\top U$ is not a full-rank update (e.g., $U$ has fewer than $N$ rows), $\left| U^\top U \right| = 0$, and we are left with

$\ln\left| \Lambda + U^\top U \right| \geq \ln|\Lambda|$. (58)

For formality, it is easy to show that even if the prior state size is smaller than $N$, the validity of the conclusion is not compromised. For the upper bound, we can use the Hadamard inequality, which states that for a positive semi-definite matrix $M \in \mathbb{R}^{N \times N}$:

$|M| \leq \prod_{i=1}^{N} (M)_{ii}$. (59)

Let us assign $M$
$\doteq \Lambda + U^\top U$; then

$\left| \Lambda + U^\top U \right| \leq \prod_{i=1}^{N} \left( \Lambda + U^\top U \right)_{ii}$, (60)
$\ln\left| \Lambda + U^\top U \right| \leq \sum_{i=1}^{N} \ln\left[ \left( \Lambda + U^\top U \right)_{ii} \right]$. (61)

Overall, we get the following objective function bounds:

$\mathcal{LB}_{det}\{V(b, u)\} \doteq \ln|\Lambda| - N \cdot \ln(2 \pi e)$, (62)
$\mathcal{UB}_{det}\{V(b, u)\} \doteq \sum_{i=1}^{N} \ln\left[ \left( \Lambda + U^\top U \right)_{ii} \right] - N \cdot \ln(2 \pi e)$, (63)

where $\Lambda$ is the information matrix of the prior belief $b$, $U$ is the collective Jacobian of action $u$, and $N$ is the posterior state size. Unlike the bounds presented in Section 3.3.2, these bounds are extremely general, as they make no assumptions on the state or actions, besides the standard problem formulation. As expected, this advantage comes at the expense of tightness. Nonetheless, they may be especially useful when the matrix $\Lambda$ is diagonally dominant.

B Proofs

B.1 Lemma 1

Proof. Refer to the proof of the more general case, stated in Lemma 6.

B.2 Lemma 2

Proof. Refer to Elimelech (2021) for an extended discussion and formulation of this statement.

B.3 Lemma 3

Proof. The properties are trivially given from the definition of action consistency.

B.4 Lemma 4

Proof. Assume $f$ is a monotonically increasing function such that for every two actions $a_i, a_j \in \mathcal{A}$

$f(V_1(\xi_1, a_i)) = V_2(\xi_2, a_i), \quad f(V_1(\xi_1, a_j)) = V_2(\xi_2, a_j)$; (64)

then

$f(V_1(\xi_1, a_i)) < f(V_1(\xi_1, a_j)) \iff V_2(\xi_2, a_i) < V_2(\xi_2, a_j)$. (65)

Because $f$ is monotonically increasing, $f(x) < f(y) \iff x < y$, and

$V_1(\xi_1, a_i) < V_1(\xi_1, a_j) \iff V_2(\xi_2, a_i) < V_2(\xi_2, a_j)$. (66)

Meaning, $(\xi_1, \mathcal{A}, V_1) \simeq (\xi_2, \mathcal{A}, V_2)$. Now, to prove the opposite direction, assume $(\xi_1, \mathcal{A}, V_1) \simeq (\xi_2, \mathcal{A}, V_2)$; hence,

$V_1(\xi_1, a_i) < V_1(\xi_1, a_j) \iff V_2(\xi_2, a_i) < V_2(\xi_2, a_j)$. (67)

Let us define a new function $f$ on the domain $\{V_1(\xi_1, a) \mid a \in \mathcal{A}\}$ such that $f(V_1(\xi_1, a)) \doteq V_2(\xi_2, a)$.
Given this definition and the action consistency condition from (67), we can conclude that

$f(V_1(\xi_1, a_i)) < f(V_1(\xi_1, a_j)) \iff V_2(\xi_2, a_i) < V_2(\xi_2, a_j) \iff V_1(\xi_1, a_i) < V_1(\xi_1, a_j)$. (68)

Thus, $f$ is monotonically increasing on its domain.

B.5 Lemma 5

Proof. Both directions are a direct consequence of Lemma 4. Assume $\Delta^*(\mathcal{P}, \mathcal{P}_s) = 0$. Thus, a monotonically increasing function $f$ exists such that $\Delta(\mathcal{P}, \mathcal{P}^f_s) = 0$. Meaning, for every action $a \in \mathcal{A}$, $f(V_s(\xi_s, a)) = V(\xi, a)$. According to Lemma 4, this is sufficient to prove that $\mathcal{P} \simeq \mathcal{P}_s$. To prove the opposite direction, assume $\mathcal{P} \simeq \mathcal{P}_s$. Let us define a new function $f$ on the domain $\{V_s(\xi_s, a) \mid a \in \mathcal{A}\}$ such that $f(V_s(\xi_s, a)) \doteq V(\xi, a)$. From this definition, $\Delta(\mathcal{P}, \mathcal{P}^f_s) = 0$. Also, according to Lemma 4, this function $f$ is monotonically increasing, and thus $\Delta^*(\mathcal{P}, \mathcal{P}_s) = 0$.

B.6 Lemma 6

Proof. From the definition of the simplification offset, we know that for every monotonically increasing function $f$, the following is true:

$\left| V(\xi, a^*) - f(V_s(\xi_s, a^*)) \right| \leq \Delta(\mathcal{P}, \mathcal{P}^f_s)$, (69)
$\left| V(\xi, a^*_s) - f(V_s(\xi_s, a^*_s)) \right| \leq \Delta(\mathcal{P}, \mathcal{P}^f_s)$. (70)

Removing the absolute values surely does not compromise the inequalities:

$V(\xi, a^*) - f(V_s(\xi_s, a^*)) \leq \Delta(\mathcal{P}, \mathcal{P}^f_s)$, (71)
$f(V_s(\xi_s, a^*_s)) - V(\xi, a^*_s) \leq \Delta(\mathcal{P}, \mathcal{P}^f_s)$. (72)

By adding the two inequalities, and utilizing the definition of the loss, we get:

$\mathrm{loss}(\mathcal{P}, \mathcal{P}^f_s) + f(V_s(\xi_s, a^*_s)) - f(V_s(\xi_s, a^*)) \leq 2 \cdot \Delta(\mathcal{P}, \mathcal{P}^f_s)$. (73)

From the definition of $a^*_s$, we know that

$V_s(\xi_s, a^*_s) \geq V_s(\xi_s, a^*)$. (74)

Since $f$ is monotonically increasing, then also

$f(V_s(\xi_s, a^*_s)) \geq f(V_s(\xi_s, a^*))$, (75)
$f(V_s(\xi_s, a^*_s)) - f(V_s(\xi_s, a^*)) \geq 0$. (76)

Thus, we can infer that $\mathrm{loss}(\mathcal{P}, \mathcal{P}^f_s) \leq 2 \cdot \Delta(\mathcal{P}, \mathcal{P}^f_s)$.
Since the final statement is true for any monotonically increasing function $f$, we may conclude the desired upper bound over the loss:

$\mathrm{loss}(\mathcal{P}, \mathcal{P}_s) \leq 2 \cdot \Delta^*(\mathcal{P}, \mathcal{P}_s)$. (78)

B.7 Lemma 7

Proof. Let us examine three decision problems $\mathcal{P}_1, \mathcal{P}_2, \mathcal{P}_3$, where $\mathcal{P}_i \doteq (\xi_i, \mathcal{A}, V_i)$. First, let us define the notation $\delta(\mathcal{P}_i, \mathcal{P}_j, a) \doteq \left| V_i(\xi_i, a) - V_j(\xi_j, a) \right|$. Now, for each two problems $\mathcal{P}_i, \mathcal{P}_j$, we mark $a_{ij} \in \mathcal{A}$ as the action, and $f_{ij}$ as the balance function, for which $\Delta^*(\mathcal{P}_i, \mathcal{P}_j) \doteq \delta(\mathcal{P}_i, \mathcal{P}^{f_{ij}}_j, a_{ij})$ (the values can be chosen arbitrarily from all values which comply with the equation). According to this notation, we can conclude:

$\Delta^*(\mathcal{P}_1, \mathcal{P}_2) + \Delta^*(\mathcal{P}_2, \mathcal{P}_3)$
$\doteq \delta(\mathcal{P}_1, \mathcal{P}^{f_{12}}_2, a_{12}) + \delta(\mathcal{P}_2, \mathcal{P}^{f_{23}}_3, a_{23})$
$\geq \delta(\mathcal{P}_1, \mathcal{P}^{f_{12}}_2, a_{13}) + \delta(\mathcal{P}_2, \mathcal{P}^{f_{23}}_3, a_{13})$
$\doteq \left| V_1(\xi_1, a_{13}) - f_{12}(V_2(\xi_2, a_{13})) \right| + \left| V_2(\xi_2, a_{13}) - f_{23}(V_3(\xi_3, a_{13})) \right|$
$\geq \left| V_1(\xi_1, a_{13}) - f_{12}(V_2(\xi_2, a_{13})) + V_2(\xi_2, a_{13}) - f_{23}(V_3(\xi_3, a_{13})) \right| \doteq (\star\star)$. (79)

Let us define the following scalar function:

$F(x) \doteq f_{23}(x) + f_{12}(V_2(\xi_2, a_{13})) - V_2(\xi_2, a_{13}) = f_{23}(x) + \text{constant}$. (80)

Since $f_{23}$ is monotonically increasing, so is $F$, and

$(\star\star) = \left| V_1(\xi_1, a_{13}) - F(V_3(\xi_3, a_{13})) \right| \doteq \delta(\mathcal{P}_1, \mathcal{P}^{F}_3, a_{13}) \geq \delta(\mathcal{P}_1, \mathcal{P}^{f_{13}}_3, a_{13}) = \Delta^*(\mathcal{P}_1, \mathcal{P}_3)$. (81)

Hence, $\Delta^*$ satisfies the triangle inequality.

B.8 Corollary 1

Proof. Let us mark as $R^p_s$ the sparsified square root matrix, before permuting the variables back to their original order in line 8 of Algorithm 1. First, we show that applying the reverse permutation $R^p_s \mapsto P R^p_s P^\top$ indeed leads to a square root of the sparse information matrix $\Lambda_s$ (in the original order):

$(P R^p_s P^\top)^\top (P R^p_s P^\top) = P {R^p_s}^\top R^p_s P^\top = P \Lambda^p_s P^\top = \Lambda_s$, (82)

where $\Lambda^p_s$ is the sparsified information matrix, before permuting the variables back.
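Both parts of Corollary 1, the square-root identity (82) and the claim (proven next) that the back-permuted factor remains triangular, can be checked numerically. The following is a minimal sketch under illustrative assumptions: a 5-variable state in which variables 1 and 3 are the sparsified ones; these indices and all values are hypothetical, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical permuted order: sparsified variables 1 and 3 pushed to the front
perm = np.array([1, 3, 0, 2, 4])
n1, n2 = 2, 3

# R^p_s (structure (83)): diagonal block for the sparsified variables,
# upper-triangular block for the rest, zero off-diagonal blocks
D = np.diag(rng.uniform(1.0, 2.0, n1))
T = np.triu(rng.standard_normal((n2, n2))) + 2.0 * np.eye(n2)
Rps = np.block([[D, np.zeros((n1, n2))],
                [np.zeros((n2, n1)), T]])

# Permute the variables back to their original order: R_s = P R^p_s P^T
pos = np.argsort(perm)            # position of each original variable
Rs = Rps[np.ix_(pos, pos)]

# (82): R_s is a square root of the back-permuted sparsified information matrix
Lam_ps = Rps.T @ Rps
assert np.allclose(Rs.T @ Rs, Lam_ps[np.ix_(pos, pos)])

# Corollary 1: R_s is still upper triangular after the back-permutation
assert np.allclose(Rs, np.triu(Rs))
```

The triangularity holds because each sparsified row contains only its diagonal entry, so moving that row and its matching column simultaneously keeps the entry on the diagonal.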
Now, we want to examine the shape of the matrix $R_s \doteq P R^p_s P^\top$, and show that it is indeed triangular. According to Algorithm 1, before executing line 8, $R^p_s$ has the following structure:

$R^p_s = \begin{bmatrix} \text{diagonal} & 0 \\ 0 & \text{triangular} \end{bmatrix}$, (83)

where the rows of the diagonal block correspond to the sparsified variables. Without losing generality, we should only prove that applying a permutation of the form $p' : (1, \ldots, n) \mapsto (2, \ldots, i, 1, i+1, \ldots, n)$ on this matrix (i.e., "pushing forwards" one of the sparsified variables) does not break the triangular form. Hence, assuming $P^\top$ is the column permutation matrix matching such $p'$, let us look at $R_s \doteq P R^p_s P^\top$. The row permutation $P$ pushes the first row, whose only non-zero entry is the diagonal value $d$, down to the $i$-th row; the column permutation $P^\top$ then pushes the matching column forwards, placing $d$ back on the diagonal:

$R_s = \begin{bmatrix} \text{triangular} & * & * \\ 0 \cdots 0 & d & 0 \cdots 0 \\ 0 & 0 & \text{triangular} \end{bmatrix}$. (84)

Recursively utilizing this conclusion, for more intricate permutations, proves that $R_s$ is indeed triangular whenever permuting the sparsified variables back to their original order, as desired.

B.9 Theorem 1

Proof. Consider a belief $b = \mathcal{N}(X^*, \Lambda^{-1})$, where the state contains $n_1$ uninvolved variables and $n_2$ involved variables, such that $n = n_1 + n_2$ is the prior state size. Also consider the simplified belief $b_s = \mathcal{N}(X^*, \Lambda_s^{-1})$, in which all the uninvolved variables were sparsified by applying Algorithm 1. We mark with $P$ the (column) permutation matrix that positions all the involved variables at the end of the state. Now, let $R^p$ be the Cholesky factor of the permuted information matrix $\Lambda^p \doteq P^\top \Lambda P$, such that $\Lambda^p = {R^p}^\top R^p$. This $R^p$ can be divided into block form: $R^p$
$\doteq \begin{bmatrix} R^p_{11} & R^p_{12} \\ 0_{n_2 \times n_1} & R^p_{22} \end{bmatrix}$, (85)

where $R^p_{11} \in \mathbb{R}^{n_1 \times n_1}$ and $R^p_{22} \in \mathbb{R}^{n_2 \times n_2}$ are triangular sub-matrices, $R^p_{12} \in \mathbb{R}^{n_1 \times n_2}$, and $0_{n_2 \times n_1}$ is a zero matrix of the specified size. By following the steps of Algorithm 1, we realize that the returned sparsified information matrix $\Lambda_s$ is given as $\Lambda_s \doteq P {R^p_s}^\top R^p_s P^\top$ (or, equally, satisfies $P^\top \Lambda_s P \doteq {R^p_s}^\top R^p_s$), where

$R^p_s \doteq \begin{bmatrix} D^p_{11} & 0_{n_1 \times n_2} \\ 0_{n_2 \times n_1} & R^p_{22} \end{bmatrix}$, (86)

and $D^p_{11}$ is the diagonal matrix formed by copying the diagonal of $R^p_{11}$ (and assigning zero elsewhere). We would like to find the simplification offset between the two decision problems $\mathcal{P}$ and $\mathcal{P}_s$ (for which $b$ and $b_s$ are the initial beliefs, respectively). Let us consider a candidate action $u \in \mathcal{U}$ with a collective Jacobian $U \in \mathbb{R}^{h \times (n + m)}$, where $n + m$ is the posterior state size. We may derive the following from the definition of the offset and the objective function $V$:

$\delta(\mathcal{P}, \mathcal{P}_s, u) = \frac{1}{2} \cdot \left| \ln\left| \breve\Lambda + U^\top U \right| - \ln\left| \breve\Lambda_s + U^\top U \right| \right|$. (87)

Now, let us examine the following expression:

$\# \doteq \left| \breve\Lambda + U^\top U \right| - \left| \breve\Lambda_s + U^\top U \right|$. (88)

We know that (unitary) variable permutation does not affect the determinant of a matrix; thus

$\# = \left| \breve P^\top \breve\Lambda \breve P + (U \breve P)^\top (U \breve P) \right| - \left| \breve P^\top \breve\Lambda_s \breve P + (U \breve P)^\top (U \breve P) \right|$, (89)

where

$\breve P \doteq \begin{bmatrix} P & 0_{n \times m} \\ 0_{m \times n} & I_{m \times m} \end{bmatrix}$ (90)

is the augmented permutation matrix, which keeps the variables added in the update at the end of the state. Note that if the new variables were not originally added at the end of the state, the permutation $\breve P$ can easily be adapted to enforce this property. We can also augment the matrix $R^p$ with $m$ empty columns (and similarly for $R^p_s$): $\breve R^p$
$\doteq \begin{bmatrix} R^p_{11} & R^p_{12} & 0_{n_1 \times m} \\ 0_{n_2 \times n_1} & R^p_{22} & 0_{n_2 \times m} \end{bmatrix}$, (91)

and assign the result in $\#$, to yield:

$\# = \left| {\breve R^p}^\top \breve R^p + (U \breve P)^\top (U \breve P) \right| - \left| {\breve R^p_s}^\top \breve R^p_s + (U \breve P)^\top (U \breve P) \right|$. (92)

This expression can be reorganized into the following form:

$\# = \left| \begin{bmatrix} \breve R^p \\ U \breve P \end{bmatrix}^\top \begin{bmatrix} \breve R^p \\ U \breve P \end{bmatrix} \right| - \left| \begin{bmatrix} \breve R^p_s \\ U \breve P \end{bmatrix}^\top \begin{bmatrix} \breve R^p_s \\ U \breve P \end{bmatrix} \right|$. (93)

The two matrices which appear in this expression also follow a block form:

$\begin{bmatrix} \breve R^p \\ U \breve P \end{bmatrix} = \begin{bmatrix} R^p_{11} & R^p_{12} \;\; 0_{n_1 \times m} \\ 0_{(n_2 + h) \times n_1} & B \end{bmatrix}$, (94)

$\begin{bmatrix} \breve R^p_s \\ U \breve P \end{bmatrix} = \begin{bmatrix} D^p_{11} & 0_{n_1 \times (n_2 + m)} \\ 0_{(n_2 + h) \times n_1} & B \end{bmatrix}$, (95)

where

$B \doteq \begin{bmatrix} R^p_{22} \;\; 0_{n_2 \times m} \\ U_{inv} \end{bmatrix}$, (96)

and $U_{inv}$ is a sub-matrix of $U \breve P$, containing its right $n_2 + m$ columns. Since the left $n_1$ columns of $U \breve P$ correspond to uninvolved variables, we know they may only contain zeros. Thus, if we mark $\breve R^p_{12} \doteq \begin{bmatrix} R^p_{12} & 0_{n_1 \times m} \end{bmatrix}$, then the left term in (93) is:

$\left| \begin{bmatrix} \breve R^p \\ U \breve P \end{bmatrix}^\top \begin{bmatrix} \breve R^p \\ U \breve P \end{bmatrix} \right| = \left| \begin{bmatrix} {R^p_{11}}^\top R^p_{11} & {R^p_{11}}^\top \breve R^p_{12} \\ {\breve R^p_{12}}^\top R^p_{11} & {\breve R^p_{12}}^\top \breve R^p_{12} + B^\top B \end{bmatrix} \right|$. (97)

From the block-determinant formula (see Harville 1998), this equals

$\left| {R^p_{11}}^\top R^p_{11} \right| \cdot \left| {\breve R^p_{12}}^\top \breve R^p_{12} + B^\top B - {\breve R^p_{12}}^\top R^p_{11} {R^p_{11}}^{-1} {({R^p_{11}}^\top)}^{-1} {R^p_{11}}^\top \breve R^p_{12} \right| = \left| R^p_{11} \right|^2 \cdot \left| B^\top B \right|$. (98)

The right term in (93) is:

$\left| \begin{bmatrix} \breve R^p_s \\ U \breve P \end{bmatrix}^\top \begin{bmatrix} \breve R^p_s \\ U \breve P \end{bmatrix} \right| = \left| \begin{bmatrix} {D^p_{11}}^\top D^p_{11} & 0 \\ 0 & B^\top B \end{bmatrix} \right| = \left| D^p_{11} \right|^2 \cdot \left| B^\top B \right|$. (99)

Since $R^p_{11}$ and $D^p_{11}$ are triangular matrices with the same diagonal, their determinants are equal (to the product of the diagonal elements). Thus, $\# = 0$, and overall

$\left| \breve\Lambda + U^\top U \right| = \left| \breve\Lambda_s + U^\top U \right|$. (100)

This surely means that

$\ln\left| \breve\Lambda + U^\top U \right| - \ln\left| \breve\Lambda_s + U^\top U \right| = 0$. (101)

Finally, assigning this expression in (87) means that $\delta(\mathcal{P}, \mathcal{P}_s, u) = 0$.
Since the previous conclusion is true $\forall u \in \mathcal{U}$, this means that

$\Delta(\mathcal{P}, \mathcal{P}_s) \doteq \max_{u \in \mathcal{U}} \delta(\mathcal{P}, \mathcal{P}_s, u) = 0$, (103)

as desired.
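Theorem 1 can also be validated numerically. The sketch below mimics the construction in the proof under illustrative assumptions (a 5-variable state, of which variables 0 and 3 are involved, and $m = 0$ new variables; all indices and values are hypothetical): it builds $\Lambda_s$ by keeping only the diagonal of the uninvolved block of the Cholesky factor, and then confirms that the determinants, and hence the offset, coincide:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 5-variable prior; variables 0 and 3 are involved in the
# candidate action, and variables 1, 2, 4 are uninvolved (illustrative choice)
n = 5
inv, unv = [0, 3], [1, 2, 4]
A = rng.standard_normal((n, n))
Lam = A @ A.T + n * np.eye(n)          # prior information matrix (SPD)

# Permute the uninvolved variables to the front, as in the proof
order = unv + inv
Lam_p = Lam[np.ix_(order, order)]
R = np.linalg.cholesky(Lam_p).T        # upper-triangular factor: Lam_p = R^T R

# Sparsification (structure (86)): keep only the diagonal of the
# uninvolved block R^p_11, zero out R^p_12, keep R^p_22 intact
n1 = len(unv)
Rs = R.copy()
Rs[:n1, n1:] = 0.0
Rs[:n1, :n1] = np.diag(np.diag(R)[:n1])

# Permute back to the original variable order
pos = np.argsort(order)
Lam_s = (Rs.T @ Rs)[np.ix_(pos, pos)]

# A collective Jacobian whose uninvolved columns are zero
U = np.zeros((2, n))
U[:, inv] = rng.standard_normal((2, len(inv)))

# Theorem 1: the log-determinant objective is unchanged, so the offset is zero
lhs = np.linalg.slogdet(Lam + U.T @ U)[1]
rhs = np.linalg.slogdet(Lam_s + U.T @ U)[1]
assert np.isclose(lhs, rhs)
```

Because only the uninvolved block of the factor is altered, the determinant factorization in (97)-(99) goes through, and the objective values, and therefore the selected action, are provably unaffected by the sparsification.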
