Misleading Stars: What Cannot Be Measured in the Internet?

Yvonne-Anne Pignolet (1), Stefan Schmid (2), Gilles Tredan (2)
(1) ABB Research, Switzerland; yvonne-anne.pignolet@ch.abb.com
(2) Deutsche Telekom Laboratories & TU Berlin, Germany; {stefan,gilles}@net.t-labs.tu-berlin.de

Abstract

Traceroute measurements are one of our main instruments to shed light onto the structure and properties of today’s complex networks such as the Internet. This paper studies the feasibility and infeasibility of inferring the network topology given traceroute data from a worst-case perspective, i.e., without any probabilistic assumptions on, e.g., the nodes’ degree distribution. We attend to a scenario where some of the routers are anonymous, and propose two fundamental axioms that model two basic assumptions on the traceroute data: (1) each trace corresponds to a real path in the network, and (2) the routing paths are at most a factor $1/\alpha$ off the shortest paths, for some parameter $\alpha \in (0, 1]$. In contrast to existing literature that focuses on the cardinality of the set of (often only minimal) inferrable topologies, we argue that a large number of possible topologies alone is often unproblematic, as long as the networks have a similar structure. We hence seek to characterize the set of topologies inferred with our axioms. We introduce the notion of star graphs whose colorings capture the differences among inferred topologies; it also allows us to construct inferred topologies explicitly. We find that in general, inferrable topologies can differ significantly in many important aspects, such as the nodes’ distances or the number of triangles. These negative results are complemented by a discussion of a scenario where the trace set is best possible, i.e., “complete”. It turns out that while some properties such as the node degrees are still hard to measure, a complete trace set can help to determine global properties such as the connectivity.
1 Introduction

Surprisingly little is known about the structure of many important complex networks such as the Internet. One reason is the inherent difficulty of performing accurate, large-scale and preferably synchronous measurements from a large number of different vantage points. Other reasons are privacy and information hiding issues: for example, network providers may seek to hide the details of their infrastructure to avoid tailored attacks.

Since knowledge of the network characteristics is crucial for many applications (e.g., RMTP [12], or PaDIS [13]), the research community implements measurement tools to analyze at least the main properties of the network. The results can then, e.g., be used to design more efficient network protocols in the future.

This paper focuses on the most basic characteristic of the network: its topology. The classic tool to study topological properties is traceroute. Traceroute allows us to collect traces from a given source node to a set of specified destination nodes. A trace between two nodes contains a sequence of identifiers describing the route traveled by the packet. However, not every node along such a path is configured to answer with its identifier. Rather, some nodes may be anonymous in the sense that they appear as stars (‘∗’) in a trace. Anonymous nodes complicate the exploration of a topology because already a small number of anonymous nodes may increase the spectrum of inferrable topologies that correspond to a trace set $\mathcal{T}$.

This paper is motivated by the observation that the mere number of inferrable topologies alone does not contradict the usefulness or feasibility of topology inference; if the set of inferrable topologies is homogeneous in the sense that the different topologies share many important properties, the generation of all possible graphs can be avoided: an arbitrary representative may characterize the underlying network accurately.
Therefore, we identify important topological metrics such as diameter or maximal node degree and examine how “close” the possible inferred topologies are with respect to these metrics.

1.1 Related Work

Arguably one of the most influential measurement studies on the Internet topology was conducted by the Faloutsos brothers [8] who show that the Internet exhibits a skewed structure: the nodes’ out-degree follows a power-law distribution. Moreover, this property seems to be invariant over time. These results complement discoveries of similar distributions of communication traffic which is often self-similar, and of the topologies of natural networks such as human respiratory systems. This property allows us to give good predictions not only on node degree distributions but also, e.g., on the expected number of nodes at a given hop-distance. Since [8] was published, many additional results have been obtained, e.g., by using a distributed computing approach to increase the number of measurement points [6]. However, our understanding remains preliminary, and the topic continues to attract much attention from the scientific communities. In contrast to these measurement studies, we pursue a more formal approach, and a complete review of the empirical results obtained over the last years is beyond the scope of this paper.

In the field of network tomography, topologies are explored using pairwise end-to-end measurements, without the cooperation of nodes along these paths. This approach is quite flexible and applicable in various contexts, e.g., in social networks [4]. For a good discussion of this approach as well as results for a routing model along shortest and second shortest paths see [4]. For example, [4] shows that for sparse random graphs, a relatively small number of cooperating participants is sufficient to discover a network fairly well.

The classic tool to discover Internet topologies is traceroute [7].
Unfortunately, there are several problems with this approach that render topology inference difficult, such as aliasing or load-balancing, which has motivated researchers to develop new tools such as Paris Traceroute [5, 10]. Another complication stems from the fact that routers may appear as stars in the trace since they are overloaded or since they are configured not to send out any ICMP responses. The lack of complete information in the trace set renders the accurate characterization of Internet topologies difficult.

This paper attends to the problem of anonymous nodes and assumes a conservative, “worst-case” perspective that does not rely on any assumptions on the underlying network. There are already several works on the subject. Yao et al. [15] initiated the study of possible candidate topologies for a given trace set and suggested computing the minimal topology, that is, the topology with the minimal number of anonymous nodes, which turns out to be NP-hard. Consequently, different heuristics have been proposed [9, 10].

Our work is motivated by a series of papers by Acharya and Gouda. In [3], a network tracing theory model is introduced where nodes are “irregular” in the sense that each node appears in at least one trace with its real identifier. In [1], hardness results are derived for this model. However, as pointed out by the authors themselves, the irregular node model (where nodes are anonymous due to high loads) is less relevant in practice and hence they consider strictly anonymous nodes in their follow-up studies [2]. As proved in [2], the problem is still hard (in the sense that there are many minimal networks corresponding to a trace set), even with only two anonymous nodes, symmetric routing and without aliasing.

In contrast to this line of research on cardinalities, we are interested in the network properties.
If the inferred topologies share the most important characteristics, the negative results in [1, 2] may be of little concern. Moreover, we believe that a study limited to minimal topologies only may miss important redundancy aspects of the Internet. Unlike [1, 2], our work is constructive in the sense that algorithms can be derived to compute inferred topologies.

1.2 Our Contribution

This paper initiates the study and characterization of topologies that can be inferred from a given trace set computed with the traceroute tool. While existing literature assuming a worst-case perspective has mainly focused on the cardinality of minimal topologies, we go one step further and examine specific topological graph properties.

We introduce a formal theory of topology inference by proposing basic axioms (i.e., assumptions on the trace set) that are used to guide the inference process. We present a novel and, we believe, appealing definition for the isomorphism of inferred topologies which is aware of traffic paths; it is motivated by the observation that although two topologies look equivalent up to a renaming of anonymous nodes, the same trace set may result in different paths. Moreover, we initiate the study of two extremes: in the first scenario, we only require that each link appears at least once in the trace set; interestingly, however, it turns out that this is often not sufficient, and we propose a “best case” scenario where the trace set is, in some sense, complete: it contains paths between all pairs of nodes.

The main result of the paper is a negative one. It is shown that already a small number of anonymous nodes in the network renders topology inference difficult. In particular, we prove that in general, the possible inferrable topologies differ in many crucial aspects.

We introduce the concept of the star graph of a trace set that is useful for the characterization of inferred topologies.
In particular, colorings of the star graphs allow us to constructively derive inferred topologies. (Although the general problem of computing the set of inferrable topologies is related to NP-hard problems such as minimal graph coloring and graph isomorphism, some important instances of inferrable topologies can be computed efficiently.) The minimal coloring (i.e., the chromatic number) of the star graph defines a lower bound on the number of anonymous nodes from which the stars in the traces could originate. The number of possible colorings of the star graph, a function of the chromatic polynomial of the star graph, gives an upper bound on the number of inferrable topologies. We show that this bound is tight in the sense that there are situations in which that many inferrable topologies indeed exist. In particular, there are problem instances where the cardinality of the set of inferrable topologies equals the Bell number. This insight complements (and generalizes to arbitrary, not only minimal, inferrable topologies) existing cardinality results.

Finally, we examine the scenario of fully explored networks for which “complete” trace sets are available. As expected, inferrable topologies are more homogeneous and can be characterized well with respect to many properties such as node distances. However, we also find that other properties are inherently difficult to estimate. Interestingly, our results indicate that full exploration is often useful for global properties (such as connectivity) while it does not help much for more local properties (such as node degree).

1.3 Organization

The remainder of this paper is organized as follows. Our theory of topology inference is introduced in Section 2. The main contribution is presented in Sections 3 and 4 where we derive bounds for general trace sets and fully explored networks, respectively. In Section 5, the paper concludes with a discussion of our results and directions for future research.
Due to space constraints, some proofs are moved to the appendix.

2 Model

Let $\mathcal{T}$ denote the set of traces obtained from probing (e.g., by traceroute) a (not necessarily connected and undirected) network $G_0 = (V_0, E_0)$ with nodes or vertices $V_0$ (the set of routers) and links or edges $E_0$. We assume that $G_0$ is static during the probing time (or that probing is instantaneous). Each trace $T(u, v) \in \mathcal{T}$ describes a path connecting two nodes $u, v \in V_0$; when $u$ and $v$ do not matter or are clear from the context, we simply write $T$. Moreover, let $d_T(u, v)$ denote the distance (number of hops) between two nodes $u$ and $v$ in trace $T$. We define $d_{G_0}(u, v)$ to be the corresponding shortest path distance in $G_0$. Note that a trace between two nodes $u$ and $v$ may not describe the shortest path between $u$ and $v$ in $G_0$.

The nodes in $V_0$ fall into two categories: anonymous nodes and non-anonymous (or shorter: named) nodes. Therefore, each trace $T \in \mathcal{T}$ describes a sequence of symbols representing anonymous and non-anonymous nodes. We make the natural assumption that the first and the last node in each trace $T$ is non-anonymous. Moreover, we assume that traces are given in a form where non-anonymous nodes appear with a unique, anti-aliased identifier (i.e., the multiple IP addresses corresponding to different interfaces of a node are resolved to one identifier); an anonymous node is represented as $*$ (“star”) in the traces. For our formal analysis, we assign to each star in a trace set $\mathcal{T}$ a unique identifier $i$: $*_i$. (Note that except for the numbering of the stars, we allow identical copies of $T$ in $\mathcal{T}$, and we do not make any assumptions on the implications of identical traces: they may or may not describe the same paths.)
Thus, a trace $T \in \mathcal{T}$ is a sequence of symbols taken from an alphabet $\Sigma = \mathcal{ID} \cup (\bigcup_i *_i)$, where $\mathcal{ID}$ is the set of non-anonymous node identifiers (IDs): $\Sigma$ is the union of the (anti-aliased) non-anonymous nodes and the set of all stars (with their unique identifiers) appearing in a trace set. The main challenge in topology inference is to determine which stars in the traces may originate from which anonymous nodes.

Henceforth, let $n = |\mathcal{ID}|$ denote the number of non-anonymous nodes and let $s = |\bigcup_i *_i|$ be the number of stars in $\mathcal{T}$; similarly, let $a$ denote the number of anonymous nodes in a topology. Let $N = n + s = |\Sigma|$ be the total number of symbols occurring in $\mathcal{T}$.

Clearly, the process of topology inference depends on the assumptions on the measurements. In the following, we postulate the fundamental axioms that guide the reconstruction. First, we make the assumption that each link of $G_0$ is visited by the measurement process, i.e., it appears as a transition in the trace set $\mathcal{T}$. In other words, we are only interested in inferring the (sub-)graph for which measurement data is available.

AXIOM 0 (Complete Cover): Each edge of $G_0$ appears at least once in some trace in $\mathcal{T}$.

The next fundamental axiom assumes that traces always represent paths on $G_0$.

AXIOM 1 (Reality Sampling): For every trace $T \in \mathcal{T}$, if the distance between two symbols $\sigma_1, \sigma_2 \in T$ is $d_T(\sigma_1, \sigma_2) = k$, then there exists a path (i.e., a walk without cycles) of length $k$ connecting two (named or anonymous) nodes $\sigma_1$ and $\sigma_2$ in $G_0$.

The following axiom captures the consistency of the routing protocol on which the traceroute probing relies. In the current Internet, policy routing is known to have an impact both on the route length [14] and on the convergence time [11].
AXIOM 2 ($\alpha$-(Routing) Consistency): There exists an $\alpha \in (0, 1]$ such that, for every trace $T \in \mathcal{T}$, if $d_T(\sigma_1, \sigma_2) = k$ for two entries $\sigma_1, \sigma_2$ in trace $T$, then the shortest path connecting the two (named or anonymous) nodes corresponding to $\sigma_1$ and $\sigma_2$ in $G_0$ has distance at least $\lceil \alpha k \rceil$.

Note that if $\alpha = 1$, the routing is a shortest path routing. Moreover, note that if $\alpha = 0$, there can be loops in the paths, and there are hardly any topological constraints, rendering almost any topology inferrable. (For example, the complete graph with one anonymous router is always a solution.)

A natural axiom to merge traces is the following.

AXIOM 3 (Trace Merging): For two traces $T_1, T_2 \in \mathcal{T}$ for which $\exists\, \sigma_1, \sigma_2, \sigma_3$, where $\sigma_2$ refers to a named node, such that $d_{T_1}(\sigma_1, \sigma_2) = i$ and $d_{T_2}(\sigma_2, \sigma_3) = j$, it holds that the distance in $G_0$ between the two nodes $u$ and $v$ corresponding to $\sigma_1$ and $\sigma_3$, respectively, is at most $i + j$, i.e., $d_{G_0}(\sigma_1, \sigma_3) \le i + j$.

Any topology $G$ which is consistent with these axioms (when applied to $\mathcal{T}$) is called inferrable from $\mathcal{T}$.

Definition 2.1 (Inferrable Topologies). A topology $G$ is ($\alpha$-consistently) inferrable from a trace set $\mathcal{T}$ if axioms AXIOM 0, AXIOM 1, AXIOM 2 (with parameter $\alpha$), and AXIOM 3 are fulfilled.

We will refer by $\mathcal{G}_{\mathcal{T}}$ to the set of topologies inferrable from $\mathcal{T}$. Please note the following important observation.

Remark 2.2. While we generally have that $G_0 \in \mathcal{G}_{\mathcal{T}}$, since $\mathcal{T}$ was generated from $G_0$ and AXIOM 0, AXIOM 1, AXIOM 2 and AXIOM 3 are fulfilled by definition, there can be situations where an $\alpha$-consistent trace set for $G_0$ contradicts AXIOM 0: some edges may not appear in $\mathcal{T}$. If this is the case, we will focus on the inferrable topologies containing the links we know, even if $G_0$ may have additional, hidden links that cannot be explored due to the high $\alpha$ value.
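To make the role of $\alpha$ in AXIOM 2 concrete, the following is a minimal Python sketch, not part of the original paper, that tests whether a single trace is $\alpha$-consistent with a candidate topology. The adjacency-dictionary representation and the names `bfs_distances` and `satisfies_axiom_2` are illustrative assumptions; the trace is assumed to be given as a list of node labels that have already been mapped to nodes of the candidate topology.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable in `adj`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def satisfies_axiom_2(adj, trace, alpha):
    """Check AXIOM 2 for one trace (hypothetical helper, assumed encoding).

    For every pair of trace entries at trace distance k, the shortest path
    between the corresponding nodes in the candidate topology `adj` must
    have length at least ceil(alpha * k).
    """
    for i, u in enumerate(trace):
        dist = bfs_distances(adj, u)
        for j in range(i + 1, len(trace)):
            k = j - i  # trace distance d_T between entries i and j
            if dist.get(trace[j], math.inf) < math.ceil(alpha * k):
                return False
    return True


# A triangle a-b-c: the trace (a, b, c) takes two hops although a and c
# are adjacent, so it is consistent for alpha = 0.5 but not for alpha = 1.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(satisfies_axiom_2(triangle, ["a", "b", "c"], 0.5))  # True
print(satisfies_axiom_2(triangle, ["a", "b", "c"], 1))    # False
```

Passing `alpha` as a `fractions.Fraction` avoids floating-point rounding in the ceiling when exact thresholds matter.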
The main objective of a topology inference algorithm ALG is to compute topologies which are consistent with these axioms. Concretely, ALG’s input is the trace set $\mathcal{T}$ together with the parameter $\alpha$ specifying the assumed routing consistency. Essentially, the goal of any topology inference algorithm ALG is to compute a mapping of the symbols $\Sigma$ (appearing in $\mathcal{T}$) to nodes in an inferred topology $G$; or, in case the input parameters $\alpha$ and $\mathcal{T}$ are contradictory, reject the input. This mapping of symbols to nodes implicitly describes the edge set of $G$ as well: the edge set is unique as all the transitions of the traces in $\mathcal{T}$ are now unambiguously tied to two nodes.

[Figure 1: Two non-isomorphic inferred topologies over the named nodes $u$ and $v$, one with anonymous nodes $*_{12}$ and $*_{34}$, the other with $*_{14}$ and $*_{23}$; i.e., different mapping functions lead to these topologies.]

So far, we have ignored an important and non-trivial question: When are two topologies $G_1, G_2 \in \mathcal{G}_{\mathcal{T}}$ different (and hence appear as two independent topologies in $\mathcal{G}_{\mathcal{T}}$)? In this paper, we pursue the following approach: We are not interested in purely topological isomorphisms, but we care about the identifiers of the non-anonymous nodes, i.e., we are interested in the locations of the non-anonymous nodes and their distance to other nodes. For anonymous nodes, the situation is slightly more complicated: one might think that as the nodes are anonymous, their “names” do not matter. Consider however the example in Figure 1: the two inferrable topologies have two anonymous nodes, once where $\{*_1, *_2\}$ plus $\{*_3, *_4\}$ are merged into one node each in the inferrable topology and once where $\{*_1, *_4\}$ plus $\{*_2, *_3\}$ are merged into one node each in the inferrable topology.
In this paper, we regard the two topologies as different, for the following reason: Assume that there are two paths in the network, one $u\, *_2\, v$ (e.g., during day time) and one $u\, *_3\, v$ (e.g., at night); clearly, this traffic has different consequences and hence we want to be able to distinguish between the two topologies described above. In other words, our notion of isomorphism of inferred topologies is path-aware.

It is convenient to introduce the following MAP function. Essentially, an inference algorithm computes such a mapping.

Definition 2.3 (Mapping Function MAP). Let $G = (V, E) \in \mathcal{G}_{\mathcal{T}}$ be a topology inferrable from $\mathcal{T}$. A topology inference algorithm describes a surjective mapping function $\mathrm{MAP}: \Sigma \to V$. For the set of non-anonymous nodes in $\Sigma$, the mapping function is bijective; and each star is mapped to exactly one node in $V$, but multiple stars may be assigned to the same node. Note that for any $\sigma \in \Sigma$, $\mathrm{MAP}(\sigma)$ uniquely identifies a node $v \in V$. More specifically, we assume that MAP assigns labels to the nodes in $V$: in case of a named node, the label is simply the node’s identifier; in case of anonymous nodes, the label is $*_\beta$, where $\beta$ is the concatenation of the sorted indices of the stars which are merged into node $*_\beta$.

With this definition, two topologies $G_1, G_2 \in \mathcal{G}_{\mathcal{T}}$ differ if and only if they do not describe the identical (MAP-)labeled topology. We will use this MAP function also for $G_0$, i.e., we will write $\mathrm{MAP}(\sigma)$ to refer to a symbol $\sigma$’s corresponding node in $G_0$.

In the remainder of this paper, we will often assume that AXIOM 0 is given. Moreover, note that AXIOM 3 is redundant. Therefore, in our proofs, we will not explicitly cover AXIOM 0, and it is sufficient to show that AXIOM 1 holds to prove that AXIOM 3 is satisfied.

Lemma 2.4. AXIOM 1 implies AXIOM 3.

Proof. Let $\mathcal{T}$ be a trace set, and $G \in \mathcal{G}_{\mathcal{T}}$. Let $\sigma_1, \sigma_2, \sigma_3$ s.t.
$\exists\, T_1, T_2 \in \mathcal{T}$ with $\sigma_1 \in T_1$, $\sigma_3 \in T_2$ and $\sigma_2 \in T_1 \cap T_2$. Let $i = d_{T_1}(\sigma_1, \sigma_2)$ and $j = d_{T_2}(\sigma_2, \sigma_3)$. Since any inferrable topology $G$ fulfills AXIOM 1, there is a path $\pi_1$ of length at most $i$ between the nodes corresponding to $\sigma_1$ and $\sigma_2$ in $G$ and a path $\pi_2$ of length at most $j$ between the nodes corresponding to $\sigma_2$ and $\sigma_3$ in $G$. The combined path can only be shorter, and hence the claim follows.

3 Inferrable Topologies

What insights can be obtained from topology inference with minimal assumptions, i.e., with our axioms? Or what is the structure of the inferrable topology set $\mathcal{G}_{\mathcal{T}}$? We first make some general observations and then examine different graph metrics in more detail.

3.1 Basic Observations

Although the generation of the entire topology set $\mathcal{G}_{\mathcal{T}}$ may be computationally hard, some instances of $\mathcal{G}_{\mathcal{T}}$ can be computed efficiently. The simplest possible inferrable topology is the so-called canonic graph $G_C$: the topology which assumes that all stars in the traces refer to different anonymous nodes. In other words, if a trace set $\mathcal{T}$ contains $n = |\mathcal{ID}|$ named nodes and $s$ stars, $G_C$ will contain $|V(G_C)| = N = n + s$ nodes.

Definition 3.1 (Canonic Graph $G_C$). The canonic graph is defined by $G_C(V_C, E_C)$ where $V_C = \Sigma$ is the set of (anti-aliased) nodes appearing in $\mathcal{T}$ (where each star is considered a unique anonymous node) and where $\{\sigma_1, \sigma_2\} \in E_C \Leftrightarrow \exists\, T \in \mathcal{T}$, $T = (\ldots, \sigma_1, \sigma_2, \ldots)$, i.e., $\sigma_1$ and $\sigma_2$ appear consecutively in some trace $T$ ($\sigma_1, \sigma_2 \in T$ can be either non-anonymous nodes or stars). Let $d_C(\sigma_1, \sigma_2)$ denote the canonic distance between two nodes, i.e., the length of a shortest path in $G_C$ between the nodes $\sigma_1$ and $\sigma_2$.

Note that $G_C$ is indeed an inferrable topology. In this case, $\mathrm{MAP}: \Sigma \to \Sigma$ is the identity function. The proof appears in the appendix.

Theorem 3.2. $G_C$ is inferrable from $\mathcal{T}$.
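Definition 3.1 translates almost directly into code. The following Python sketch is an illustration under assumed conventions rather than the authors' implementation: traces are lists of strings, the literal string `"*"` marks an anonymous hop, and the function name `canonic_graph` is hypothetical. Each star receives a fresh index, and every pair of consecutive trace entries becomes an undirected edge.

```python
def canonic_graph(traces):
    """Build the canonic graph G_C of a trace set (assumed encoding).

    `traces` is a list of sequences in which the string "*" marks an
    anonymous hop and any other string is a named node identifier.
    Returns (labeled_traces, edges): the traces with uniquely numbered
    stars ("*1", "*2", ...) and the undirected edge set of G_C.
    """
    labeled_traces = []
    edges = set()
    star_count = 0
    for trace in traces:
        labeled = []
        for symbol in trace:
            if symbol == "*":
                # Every star is treated as its own anonymous node.
                star_count += 1
                labeled.append(f"*{star_count}")
            else:
                labeled.append(symbol)
        labeled_traces.append(labeled)
        # Consecutive entries of a trace are linked in G_C.
        for a, b in zip(labeled, labeled[1:]):
            edges.add(frozenset((a, b)))
    return labeled_traces, edges


labeled, edges = canonic_graph([["u", "*", "v"], ["u", "*", "v"]])
print(labeled)  # [['u', '*1', 'v'], ['u', '*2', 'v']]
```

In this example the two traces yield $n + s = 2 + 2 = 4$ nodes and four edges; the single pass over the input reflects the linear running time of the construction.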
$G_C$ can be computed efficiently from $\mathcal{T}$: represent each non-anonymous node and star as a separate node, and for any pair of consecutive entries (i.e., nodes) in a trace, add the corresponding link. The time complexity of this construction is linear in the size of $\mathcal{T}$.

With the definition of the canonic graph, we can derive the following lemma which establishes a necessary condition, derived from constraints on the routing paths, for when two stars cannot represent the same node in $G_0$. This is useful for the characterization of inferred topologies.

Lemma 3.3. Let $*_1, *_2$ be two stars occurring in some traces in $\mathcal{T}$. $*_1, *_2$ cannot be mapped to the same node, i.e., $\mathrm{MAP}(*_1) \ne \mathrm{MAP}(*_2)$, without violating the axioms in the following conflict situations:

(i) if $*_1 \in T_1$ and $*_2 \in T_2$, and $T_1$ describes a too long path between anonymous node $\mathrm{MAP}(*_1)$ and non-anonymous node $u$, i.e., $\lceil \alpha \cdot d_{T_1}(*_1, u) \rceil > d_C(u, *_2)$.

(ii) if $*_1 \in T_1$ and $*_2 \in T_2$, and there exists a trace $T$ that contains a path between two non-anonymous nodes $u$ and $v$ and $\lceil \alpha \cdot d_T(u, v) \rceil > d_C(u, *_1) + d_C(v, *_2)$.

Proof. The first proof is by contradiction. Assume $\mathrm{MAP}(*_1) = \mathrm{MAP}(*_2)$ represents the same node $v$ of $G_0$, and that $\lceil \alpha \cdot d_{T_1}(v, u) \rceil > d_C(u, v)$. Then we know from AXIOM 2 that $d_C(v, u) \ge d_{G_0}(v, u) \ge \lceil \alpha \cdot d_{T_1}(u, v) \rceil > d_C(v, u)$, which yields the desired contradiction.

Similarly for the second proof. Assume for the sake of contradiction that $\mathrm{MAP}(*_1) = \mathrm{MAP}(*_2)$ represents the same node $w$ of $G_0$, and that $\lceil \alpha \cdot d_T(u, v) \rceil > d_C(u, w) + d_C(v, w)$. Due to the triangle inequality, we have that $d_C(u, w) + d_C(v, w) \ge d_C(u, v)$ and hence, $\lceil \alpha \cdot d_T(u, v) \rceil > d_C(u, v)$, which contradicts the fact that $G_C$ is inferrable (Theorem 3.2).
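The two conflict situations of Lemma 3.3 can be tested mechanically once the canonic distances $d_C$ are known. The Python sketch below is one possible reading of the lemma, under assumptions that are not in the paper: stars are already uniquely labeled as strings starting with `"*"` (e.g., `"*1"`), each trace visits a symbol at most once, and the helper names `canonic_distances` and `star_conflicts` are hypothetical. Both conditions are checked for both orderings of each star pair.

```python
import math
from collections import deque
from itertools import combinations


def canonic_distances(traces):
    """All-pairs hop distances in the canonic graph of labeled traces."""
    adj = {}
    for trace in traces:
        for a in trace:
            adj.setdefault(a, set())
        for a, b in zip(trace, trace[1:]):
            adj[a].add(b)
            adj[b].add(a)
    dist = {}
    for source in adj:
        d = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in d:
                    d[nb] = d[node] + 1
                    queue.append(nb)
        dist[source] = d
    return dist


def star_conflicts(traces, alpha):
    """Pairs of stars that Lemma 3.3 forbids to merge (illustrative)."""
    dist = canonic_distances(traces)

    def d_c(a, b):
        return dist[a].get(b, math.inf)

    stars = sorted(x for x in dist if x.startswith("*"))
    conflicts = set()
    for s1, s2 in combinations(stars, 2):
        for a, b in ((s1, s2), (s2, s1)):
            for trace in traces:
                named = [(i, x) for i, x in enumerate(trace)
                         if not x.startswith("*")]
                # Condition (i): the trace holds star `a` too far away
                # from a named node u that is close to star `b` in G_C.
                if a in trace:
                    pos = trace.index(a)
                    for i, u in named:
                        if math.ceil(alpha * abs(i - pos)) > d_c(u, b):
                            conflicts.add(frozenset((s1, s2)))
                # Condition (ii): merging would shortcut a traced path
                # between two named nodes u and v.
                for (i, u), (j, v) in combinations(named, 2):
                    if math.ceil(alpha * (j - i)) > d_c(u, a) + d_c(v, b):
                        conflicts.add(frozenset((s1, s2)))
    return conflicts
```

For instance, with $\alpha = 1$ the single trace `["u", "*1", "*2", "v"]` yields a conflict between `"*1"` and `"*2"`, since merging them would shorten a path that is assumed to be shortest, whereas the traces `["v", "*1", "v1", "u"]` and `["w", "*2", "w1", "u"]` yield no conflict. The sketch favors readability over speed; it recomputes the list of named positions for every star pair.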
Lemma 3.3 can be applied to show that a topology is not inferrable from a given trace set because it merges (i.e., maps to the same node) two stars in a manner that violates the axioms. Let us introduce a useful concept for our analysis: the star graph that describes the conflicts between stars.

Definition 3.4 (Star Graph $G_*$). The star graph $G_*(V_*, E_*)$ consists of vertices $V_*$ representing stars in traces, i.e., $V_* = \bigcup_i *_i$. Two vertices are connected if and only if they must differ according to Lemma 3.3, i.e., $\{*_1, *_2\} \in E_*$ if and only if at least one of the conditions of Lemma 3.3 holds for $*_1, *_2$.

Note that the star graph $G_*$ is unique and can be computed efficiently for a given trace set $\mathcal{T}$: Conditions (i) and (ii) can be checked by computing $G_C$. However, note that while $G_*$ specifies some stars which cannot be merged, the construction is not sufficient: as Lemma 3.3 is based on $G_C$, additional links might be needed to characterize the set of inferrable and $\alpha$-consistent topologies $\mathcal{G}_{\mathcal{T}}$ exactly. In other words, a topology $G$ obtained by merging stars that are adjacent in $G_*$ is never inferrable ($G \notin \mathcal{G}_{\mathcal{T}}$); however, merging non-adjacent stars does not guarantee that the resulting topology is inferrable.

What do star graphs look like? The answer is arbitrarily: the following lemma states that the set of possible star graphs is equivalent to the class of general graphs. This claim holds for any $\alpha$. The proof appears in the appendix.

Lemma 3.5. For any graph $G = (V, E)$, there exists a trace set $\mathcal{T}$ such that $G$ is the star graph for $\mathcal{T}$.

The problem of computing inferrable topologies is related to the vertex colorings of the star graphs. We will use the following definition which relates a vertex coloring of $G_*$ to an inferrable topology $G$ by contracting independent stars in $G_*$ to become one anonymous node in $G$.
For example, observe that a maximum coloring treating every star in the trace as a separate anonymous node describes the inferrable topology $G_C$.

Definition 3.6 (Coloring-Induced Graph). Let $\gamma$ denote a coloring of $G_*$ which assigns colors $1, \ldots, k$ to the vertices of $G_*$: $\gamma: V_* \to \{1, \ldots, k\}$. We require that $\gamma$ is a proper coloring of $G_*$, i.e., that different anonymous nodes are assigned different colors: $\{u, v\} \in E_* \Rightarrow \gamma(u) \ne \gamma(v)$. $G_\gamma$ is defined as the topology induced by $\gamma$. $G_\gamma$ describes the graph $G_C$ where nodes of the same color are contracted: two vertices $u$ and $v$ represent the same node in $G_\gamma$, i.e., $\mathrm{MAP}(*_i) = \mathrm{MAP}(*_j)$, if and only if $\gamma(*_i) = \gamma(*_j)$.

The following two lemmas establish an intriguing relationship between colorings of $G_*$ and inferrable topologies. Also note that Definition 3.6 implies that two different colorings of $G_*$ define two non-isomorphic inferrable topologies.

We first show that while a coloring-induced topology always fulfills AXIOM 1, the routing consistency is sacrificed. The proof appears in the appendix.

Lemma 3.7. Let $\gamma$ be a proper coloring of $G_*$. The coloring-induced topology $G_\gamma$ is a topology fulfilling AXIOM 2 with a routing consistency of $\alpha'$, for some positive $\alpha'$.

An inferrable topology always defines a proper coloring on $G_*$.

Lemma 3.8. Let $\mathcal{T}$ be a trace set and $G_*$ its corresponding star graph. If a topology $G$ is inferrable from $\mathcal{T}$, then $G$ induces a proper coloring on $G_*$.

Proof. For any $\alpha$-consistent inferrable topology $G$ there exists some mapping function MAP that assigns each symbol of $\mathcal{T}$ to a corresponding node in $G$ (cf. Definition 2.3), and this mapping function gives a coloring on $G_*$ (i.e., merged stars appear as nodes of the same color in $G_*$). The coloring must be proper: due to Lemma 3.3, an inferrable topology can never merge adjacent nodes of $G_*$.
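The contraction in Definition 3.6, together with the labeling convention of Definition 2.3, can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the function name `coloring_induced_graph` and the string encoding of stars (`"*1"`, `"*2"`, ...) are assumptions, and the caller is assumed to pass a proper coloring of the star graph, which the sketch does not verify.

```python
def coloring_induced_graph(traces, coloring):
    """Contract same-colored stars of the canonic graph (illustrative).

    `traces` are lists of labels in which stars are uniquely numbered
    strings such as "*1"; `coloring` maps every star label to a color.
    Returns (mapping, edges): the MAP labels of the stars and the edge
    set of the induced topology.
    """
    # Group the star indices by color.
    groups = {}
    for star, color in coloring.items():
        groups.setdefault(color, []).append(int(star[1:]))

    # Label each merged node with the concatenated sorted star indices,
    # following the convention of Definition 2.3.
    mapping = {}
    for indices in groups.values():
        label = "*" + "".join(str(i) for i in sorted(indices))
        for i in indices:
            mapping[f"*{i}"] = label

    # Re-read the traces through the mapping; named nodes map to
    # themselves, and consecutive entries become edges.
    edges = set()
    for trace in traces:
        nodes = [mapping.get(symbol, symbol) for symbol in trace]
        for a, b in zip(nodes, nodes[1:]):
            edges.add(frozenset((a, b)))
    return mapping, edges


traces = [["u", f"*{i}", "v"] for i in range(1, 5)]
mapping, _ = coloring_induced_graph(
    traces, {"*1": 0, "*2": 0, "*3": 1, "*4": 1})
print(mapping["*1"], mapping["*3"])  # *12 *34
```

With the four assumed traces `u *i v`, the coloring $\{*_1, *_2\}, \{*_3, *_4\}$ produces the anonymous nodes $*_{12}$ and $*_{34}$, while the coloring $\{*_1, *_4\}, \{*_2, *_3\}$ produces $*_{14}$ and $*_{23}$; the two labeled edge sets differ, matching the path-aware notion of isomorphism illustrated in Figure 1.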
The colorings of $G_*$ allow us to derive an upper bound on the cardinality of $\mathcal{G}_{\mathcal{T}}$.

Theorem 3.9. Given a trace set $\mathcal{T}$ sampled from a network $G_0$ and $\mathcal{G}_{\mathcal{T}}$, the set of topologies inferrable from $\mathcal{T}$, it holds that:

$$\sum_{k=\gamma(G_*)}^{|V_*|} P(G_*, k)/k! \;\ge\; |\mathcal{G}_{\mathcal{T}}|,$$

where $\gamma(G_*)$ is the chromatic number of $G_*$ and $P(G_*, k)$ is the number of colorings of $G_*$ with $k$ colors (known as the chromatic polynomial of $G_*$).

Proof. The proof follows directly from Lemma 3.8 which shows that each inferred topology has proper colorings, and the fact that a coloring of $G_*$ cannot result in two different inferred topologies, as the coloring uniquely describes which stars to merge (Lemma 3.7). In order to account for isomorphic colorings, we need to divide by the number of color permutations.

Note that the fact that $G_*$ can be an arbitrary graph (Lemma 3.5) implies that we cannot exploit some special properties of $G_*$ to compute colorings of $G_*$ and $\gamma(G_*)$. Also note that the exact computation of the upper bound is hard, since the minimal coloring as well as the chromatic polynomial of $G_*$ (in #P) is needed. To complement the upper bound, we note that star graphs with a small number of conflict edges can indeed result in a large number of inferred topologies.

Theorem 3.10. For any $\alpha > 0$, there is a trace set for which the number of non-isomorphic colorings of $G_*$ equals $|\mathcal{G}_{\mathcal{T}}|$, in particular $|\mathcal{G}_{\mathcal{T}}| = B_s$, where $\mathcal{G}_{\mathcal{T}}$ is the set of inferrable and $\alpha$-consistent topologies, $s$ is the number of stars in $\mathcal{T}$, and $B_s$ is the Bell number of $s$. Such a trace set can originate from a $G_0$ network with one anonymous node only.

Proof. Consider a trace set $\mathcal{T} = \{(\sigma_i, *_i, \sigma'_i)_{i=1,\ldots,s}\}$ (e.g., obtained from exploring a topology $G_0$ where one anonymous center node is connected to $2s$ named nodes).
The trace set does not impose any constraints on how the stars relate to each other, and hence, $G_*$ does not contain any edges at all; even when stars are merged, there are no constraints on how the stars relate to each other. Therefore, the star graph for $\mathcal{T}$ has $B_s = \sum_{j=0}^{s} S(s,j)$ colorings, where $S(s,j) = \frac{1}{j!} \sum_{\ell=0}^{j} (-1)^{\ell} \binom{j}{\ell} (j-\ell)^s$ is the number of ways to group $s$ nodes into $j$ different, disjoint non-empty subsets (known as the Stirling number of the second kind). Each of these colorings also describes a distinct inferrable topology as MAP assigns unique labels to anonymous nodes stemming from merging a group of stars (cf. Definition 2.3).

3.2 Properties

Even if the number of inferrable topologies is large, topology inference can still be useful if one is mainly interested in the properties of $G_0$ and if the ensemble $\mathcal{G}_{\mathcal{T}}$ is homogeneous with respect to these properties; for example, if “most” of the instances in $\mathcal{G}_{\mathcal{T}}$ are close to $G_0$, there may be an option to conduct an efficient sampling analysis on random representatives. Therefore, in the following, we will take a closer look at how much the members of $\mathcal{G}_{\mathcal{T}}$ differ.

Important metrics to characterize inferrable topologies are, for instance, the graph size, the diameter $\mathrm{DIAM}(\cdot)$, the number of triangles $C_3(\cdot)$ of $G$, and so on. In the following, let $G_1 = (V_1, E_1), G_2 = (V_2, E_2) \in \mathcal{G}_{\mathcal{T}}$ be two arbitrary representatives of $\mathcal{G}_{\mathcal{T}}$. As one might expect, the graph size can be estimated quite well.

Lemma 3.11. It holds that $|V_1| - |V_2| \le s - \gamma(G_*) \le s - 1$ and $|V_1|/|V_2| \le (n + s)/(n + \gamma(G_*)) \le (2 + s)/3$. Moreover, $|E_1| - |E_2| \le 2(s - \gamma(G_*))$ and $|E_1|/|E_2| \le (\nu + 2s)/(\nu + 2) \le s$, where $\nu$ denotes the number of edges between non-anonymous nodes. There are traces with inferrable topologies $G_1, G_2$ reaching these bounds.

Observe that inferrable topologies can also differ in the number of connected components.
This implies that the shortest distance between two named nodes can differ arbitrarily between two representatives in $\mathcal{G}_{\mathcal{T}}$.

Lemma 3.12. Let $\mathrm{COMP}(G)$ denote the number of connected components of a topology $G$. Then, $|\mathrm{COMP}(G_1) - \mathrm{COMP}(G_2)| \le n/2$. There are instances $G_1, G_2$ that reach this bound.

Proof. Consider the trace set $\mathcal{T} = \{T_i, i = 1 \ldots \lfloor n/2 \rfloor\}$ in which $T_i = \{n_{2i}, *_i, n_{2i+1}\}$. Since $i \ne j \Rightarrow T_i \cap T_j = \emptyset$, we have $|E_*| = 0$. Take $G_1$ as the 1-coloring of $G_*$: $G_1$ is a topology with one anonymous node connected to all named nodes. Take $G_2$ as the $\lfloor n/2 \rfloor$-coloring of the star graph: $G_2$ has $\lfloor n/2 \rfloor$ distinct connected components (each consisting of three nodes).

Upper bound: For the sake of contradiction, suppose $\exists \mathcal{T}$ s.t. $|\mathrm{COMP}(G_1) - \mathrm{COMP}(G_2)| > \lfloor n/2 \rfloor$. Let us assume that $G_1$ has the most connected components: $G_1$ has at least $\lfloor n/2 \rfloor + 1$ more connected components than $G_2$. Let $C$ refer to a connected component of $G_2$ whose nodes are not connected in $G_1$. This means that $C$ contains at least one anonymous node. Thus, $C$ contains at least two named nodes (since a trace $T$ cannot start or end with a star). There must exist at least $\lfloor n/2 \rfloor + 1$ such connected components $C$. Thus $G_2$ has to contain at least $2(\lfloor n/2 \rfloor + 1) \ge n + 1$ named nodes. Contradiction.

An important criterion for topology inference regards the distortion of shortest paths.

Definition 3.13 (Stretch). The maximal ratio of the distance of two non-anonymous nodes in $G_0$ and a connected topology $G$ is called the stretch $\rho$: $\rho = \max_{u,v \in \mathrm{ID}(G_0)} \max\{ d_{G_0}(u,v)/d_G(u,v),\; d_G(u,v)/d_{G_0}(u,v) \}$.

From Lemma 3.12 we already know that inferrable topologies can differ in the number of connected components, and hence, the distance and the stretch between nodes can be arbitrarily wrong. Hence, in the following, we will focus on connected graphs only.
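The construction used in the proof of Lemma 3.12 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`lemma_3_12_traces`, `infer_topology`, `count_components`), the node labels `a_i`/`b_i`, and the encoding of a star as a tuple `("*", i)` are hypothetical and not part of the paper. The sketch builds the trace set with one star per trace and compares the topology in which all stars are merged with the one in which every star stays separate.

```python
from collections import defaultdict


def lemma_3_12_traces(pairs):
    """Traces (a_i, *_i, b_i): one star between two fresh named nodes (labels assumed)."""
    return [(f"a{i}", ("*", i), f"b{i}") for i in range(1, pairs + 1)]


def infer_topology(traces, merge_stars):
    """Build an adjacency map from the traces.

    With merge_stars=True every star is mapped to one shared anonymous node
    (the 1-coloring of the edgeless star graph); otherwise each star keeps
    its own anonymous node.
    """
    adj = defaultdict(set)
    for trace in traces:
        nodes = [("*", 0) if merge_stars and isinstance(x, tuple) else x
                 for x in trace]
        for u, v in zip(nodes, nodes[1:]):
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def count_components(adj):
    """Count connected components with an iterative depth-first search."""
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return components


traces = lemma_3_12_traces(4)
merged = count_components(infer_topology(traces, merge_stars=True))
separate = count_components(infer_topology(traces, merge_stars=False))
print(merged, separate)
```

Both graphs are consistent with the same traces, yet one is connected while the other falls apart into one component per trace, which is why distances between named nodes cannot be trusted without further assumptions.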
However, even if two nodes are connected, their distance can be much longer or shorter than in $G_0$. Figure 2 gives an example. Both topologies are inferrable from the traces $T_1 = (v, *, v_1, \ldots, v_k, u)$ and $T_2 = (w, *, w_1, \ldots, w_k, u)$. One inferrable topology is the canonic graph $G_C$ (Figure 2 left), whereas the other topology merges the two anonymous nodes (Figure 2 right). The distances between $v$ and $w$ are $2(k + 2)$ and $2$, respectively, implying a stretch of $k + 2$.

Figure 2: Due to the lack of a trace between $v$ and $w$, the stretch of an inferred topology can be large. (Left: canonic graph with separate stars $*_1$ and $*_2$; right: both stars merged into one anonymous node.)

Lemma 3.14. Let $u$ and $v$ be two arbitrary named nodes in the connected topologies $G_1$ and $G_2$. Then, even for only two stars in the trace set, it holds for the stretch that $\rho \le (N - 1)/2$. There are instances $G_1, G_2$ that reach this bound.

We now turn our attention to the diameter and the degree.

Lemma 3.15. For connected topologies $G_1, G_2$ it holds that $\mathrm{DIAM}(G_1) - \mathrm{DIAM}(G_2) \le (s - 1)/s \cdot \mathrm{DIAM}(G_C) \le (s - 1)(N - 1)/s$ and $\mathrm{DIAM}(G_1)/\mathrm{DIAM}(G_2) \le s$, where $\mathrm{DIAM}$ denotes the graph diameter and $\mathrm{DIAM}(G_1) > \mathrm{DIAM}(G_2)$. There are instances $G_1, G_2$ that reach these bounds.

Proof. Upper bound: As $G_C$ does not merge any stars, it describes the network with the largest diameter. Let $\pi$ be a longest path between two nodes $u$ and $v$ in $G_C$. In the extreme case, $\pi$ is the only path determining the network diameter and $\pi$ contains all star nodes. Then, the graph where all $s$ stars are merged into one anonymous node has a minimal diameter of at least $\mathrm{DIAM}(G_C)/s$.

Example meeting the bound: Consider the trace set $\mathcal{T} = \{(u_1, \ldots, *_1, \ldots, u_2), (u_2, \ldots, *_2, \ldots, u_3), \ldots, (u_s, \ldots, *_s, \ldots, u_{s+1})\}$, where each trace contains $x$ named nodes and a star in the middle between $u_i$ and $u_{i+1}$ (assume $x$ to be even; $x$ does not include $u_i$ and $u_{i+1}$). It holds that $\mathrm{DIAM}(G_C) = s \cdot (x + 2)$ whereas in a graph $G$ where all stars are merged, $\mathrm{DIAM}(G) = x + 2$. There are $n = s(x + 1) + 1$ non-anonymous nodes, so $x = (n - s - 1)/s$. Figure 3 depicts an example.

Lemma 3.16. For the maximal node degree $\mathrm{DEG}$, we have $\mathrm{DEG}(G_1) - \mathrm{DEG}(G_2) \le 2(s - \gamma(G_*))$ and $\mathrm{DEG}(G_1)/\mathrm{DEG}(G_2) \le s - \gamma(G_*) + 1$. There are instances $G_1, G_2$ that reach these bounds.

Figure 3: Estimation error for diameter. (Top: canonic graph $G_C$ with chains of $x/2$ named nodes on either side of each star between $u_1$, $u_2$ and $u_3$; bottom: graph $G$ with the stars merged.)

Another important topology measure, which indicates how well meshed the network is, is the number of triangles.

Lemma 3.17. Let $C_3(G)$ be the number of cycles of length 3 of the graph $G$. It holds that $C_3(G_1) - C_3(G_2) \le 2s(s - 1)$, which can be reached. The relative error $C_3(G_1)/C_3(G_2)$ can be arbitrarily large unless the number of links between non-anonymous nodes exceeds $n^2/4$, in which case the ratio is upper bounded by $2s(s - 1) + 1$.

Proof. Upper bound: Each node which is part of a triangle has at least two incident edges. Thus, a node $v$ can be part of at most $\binom{\mathrm{DEG}(v)}{2}$ triangles, where $\mathrm{DEG}(v)$ denotes $v$'s degree. As a consequence, the number of triangles containing an anonymous node in an inferrable topology with $a$ anonymous nodes $u_1, \ldots, u_a$ is at most $\sum_{j=1}^{a} \binom{\mathrm{DEG}(u_j)}{2}$. Given $s$, this sum is maximized if $a = 1$ and $\mathrm{DEG}(u_1) = 2s$, as $2s$ is the maximum degree possible due to Lemma 3.16. Thus there can be at most $s \cdot (2s - 1)$ triangles containing an anonymous node in $G_1$. The number of triangles with at least one anonymous node is minimized in $G_C$ because in the canonic graph the degrees of the anonymous nodes are minimized, i.e., they are always exactly two.
As a consequence there cannot be more than $s$ such triangles in $G_C$. If the number of such triangles in $G_C$ is smaller by $x$, then the number of triangles with at least one anonymous node in the topology $G_1$ is upper bounded by $s \cdot (2s - 1) - x$. The difference between the triangles in $G_1$ and $G_2$ is thus at most $s(2s - 1) - x - s + x = 2s(s - 1)$.

Example meeting this bound: If the non-anonymous nodes form a complete graph and all star nodes can be merged into one node in $G_1$ and $G_2 = G_C$, then the difference in the number of triangles matches the upper bound.

Consequently, the ratio of triangles with anonymous nodes does not exceed $(s(2s - 1) - x)/(s - x)$. Thus the ratio can be infinite, as $x$ can reach $s$. However, if the number of links between $n$ non-anonymous nodes exceeds $n^2/4$ then there is at least one triangle, as the densest complete bipartite graph contains at most $n^2/4$ links.

4 Full Exploration

So far, we assumed that the trace set $\mathcal{T}$ contains each node and link of $G_0$ at least once. At first sight, this seems to be the best we can hope for. However, sometimes traces exploring the vicinity of anonymous nodes in different ways yield additional information that helps to characterize $\mathcal{G}_{\mathcal{T}}$ better.

This section introduces the concept of fully explored networks: $\mathcal{T}$ contains sufficiently many traces such that the distances between non-anonymous nodes can be estimated accurately.

Definition 4.1 (Fully Explored Topologies). A topology $G_0$ is fully explored by a trace set $\mathcal{T}$ if it contains all nodes and links of $G_0$ and for each pair $\{u, v\}$ of non-anonymous nodes in the same component of $G_0$ there exists a trace $T \in \mathcal{T}$ containing both nodes $u \in T$ and $v \in T$.

In some sense, a trace set for a fully explored network is the best we can hope for. Properties that cannot be inferred well under the fully explored topology model are infeasible to infer without additional assumptions on $G_0$.
In this sense, this section provides upper bounds on what can be learned from topology inference. In the following, we will constrain ourselves to routing along shortest paths only ($\alpha = 1$).

Let us again study the properties of the family of inferrable topologies fully explored by a trace set. Obviously, all the upper bounds from Section 3 are still valid for fully explored topologies. In the following, let $G_1, G_2 \in \mathcal{G}_{\mathcal{T}}$ be arbitrary representatives of $\mathcal{G}_{\mathcal{T}}$ for a fully explored trace set $\mathcal{T}$. A direct consequence of Definition 4.1 concerns the number of connected components and the stretch. (Recall that the stretch is defined with respect to named nodes only, and since $\alpha = 1$, a 1-consistent inferrable topology cannot include a shorter path between $u$ and $v$ than the one that must appear in a trace of $\mathcal{T}$.)

Lemma 4.2. It holds that $\mathrm{COMP}(G_1) = \mathrm{COMP}(G_2)$ ($= \mathrm{COMP}(G_0)$) and the stretch is 1.

The proofs of the claims in the following lemmata are analogous to our former proofs, as the main difference is the fact that there might be more conflicts, i.e., edges in $G_*$.

Lemma 4.3. For fully explored networks it holds that $|V_1| - |V_2| \le s - \gamma(G_*) \le s - 1$ and $|V_1|/|V_2| \le (n + s)/(n + \gamma(G_*)) \le (2 + s)/3$. Moreover, $|E_1| - |E_2| \le 2(s - \gamma(G_*))$ and $|E_1|/|E_2| \le (\nu + 2s)/(\nu + 2) \le s$, where $\nu$ denotes the number of links between non-anonymous nodes. There are traces with inferrable topologies $G_1, G_2$ reaching these bounds.

Lemma 4.4. For the maximal node degree, we have $\mathrm{DEG}(G_1) - \mathrm{DEG}(G_2) \le 2(s - \gamma(G_*))$ and $\mathrm{DEG}(G_1)/\mathrm{DEG}(G_2) \le s - \gamma(G_*) + 1$. There are instances $G_1, G_2$ that reach these bounds.

From Lemma 4.2 we know that fully explored scenarios yield a perfect stretch of one. However, regarding the diameter, the situation is different in the sense that distances between anonymous nodes play a role.

Lemma 4.5.
For connected topologies $G_1, G_2$ it holds that $\mathrm{DIAM}(G_1)/\mathrm{DIAM}(G_2) \le 2$, where $\mathrm{DIAM}$ denotes the graph diameter and $\mathrm{DIAM}(G_1) > \mathrm{DIAM}(G_2)$. There are instances $G_1, G_2$ that reach this bound. Moreover, there are instances with $\mathrm{DIAM}(G_1) - \mathrm{DIAM}(G_2) = s/2$.

The number of triangles with anonymous nodes can still not be estimated accurately in the fully explored scenario.

Lemma 4.6. There exist graphs where $C_3(G_1) - C_3(G_2) = s(s - 1)/2$, and the relative error $C_3(G_1)/C_3(G_2)$ can be arbitrarily large.

5 Conclusion

We understand our work as a first step to shed light onto the similarity of inferrable topologies based on most basic axioms and without any assumptions on power-law properties, i.e., in the worst case. Using our formal framework we show that the topologies for a given trace set may differ significantly. Thus, it is impossible to accurately characterize topological properties of complex networks. To complement the general analysis, we propose the notion of fully explored networks or trace sets, as a “best possible scenario”. As expected, we find that fully exploring traces allow us to determine several properties of the network more accurately; however, it also turns out that even in this scenario, other topological properties are inherently hard to compute. Our results are summarized in Figure 4.

Our work opens several directions for future research. On a theoretical side, one may study whether the minimal inferrable topologies considered in, e.g., [1, 2], are more similar in nature. More importantly, while this paper presented results for the general worst-case, it would be interesting to devise algorithms that compute, for a given trace set, worst-case bounds for the properties under consideration. For example, such approximate bounds would be helpful to decide whether additional measurements are needed.
Moreover, such algorithms may even give advice on the locations at which such measurements would be most useful.

Acknowledgments

We would like to thank H. B. Acharya and Steve Uhlig.

| Property/Scenario | Arbitrary: $G_1 - G_2$ | Arbitrary: $G_1/G_2$ | Fully Explored ($\alpha = 1$): $G_1 - G_2$ | Fully Explored ($\alpha = 1$): $G_1/G_2$ |
| # of nodes | $\le s - \gamma(G_*)$ | $\le (n + s)/(n + \gamma(G_*))$ | $\le s - \gamma(G_*)$ | $\le (n + s)/(n + \gamma(G_*))$ |
| # of links | $\le 2(s - \gamma(G_*))$ | $\le (\nu + 2s)/(\nu + 2)$ | $\le 2(s - \gamma(G_*))$ | $\le (\nu + 2s)/(\nu + 2)$ |
| # of connected components | $\le n/2$ | $\le n/2$ | $= 0$ | $= 1$ |
| Stretch | - | $\le (N - 1)/2$ | - | $= 1$ |
| Diameter | $\le (s - 1)/s \cdot (N - 1)$ | $\le s$ | $s/2$ (¶) | $2$ |
| Max. Deg. | $\le 2(s - \gamma(G_*))$ | $\le s - \gamma(G_*) + 1$ | $\le 2(s - \gamma(G_*))$ | $\le s - \gamma(G_*) + 1$ |
| Triangles | $\le 2s(s - 1)$ | $\infty$ | $\le 2s(s - 1)/2$ | $\infty$ |

Figure 4: Summary of our bounds on the properties of inferrable topologies. $s$ denotes the number of stars in the traces, $n$ is the number of named nodes, $N = n + s$, and $\nu$ denotes the number of links between named nodes. Note that trace sets meeting these bounds exist for all properties for which we have tight or upper bounds. For the two entries marked with (¶), only “lower bounds” are derived, i.e., examples that yield at least the corresponding accuracy; as the upper bounds from the arbitrary scenario do not match, how to close the gap remains an open question.

References

[1] H. Acharya and M. Gouda. The weak network tracing problem. In Proc. Int. Conference on Distributed Computing and Networking (ICDCN), pages 184–194, 2010.
[2] H. Acharya and M. Gouda. On the hardness of topology inference. In Proc. Int. Conference on Distributed Computing and Networking (ICDCN), pages 251–262, 2011.
[3] Hrishikesh B. Acharya and Mohamed G. Gouda. A theory of network tracing. In Proc. 11th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS), pages 62–74, 2009.
[4] Animashree Anandkumar, Avinatan Hassidim, and Jonathan Kelner.
Topology discovery of sparse random graphs with few participants. In Proc. SIGMETRICS, 2011.
[5] Brice Augustin, Xavier Cuvellier, Benjamin Orgogozo, Fabien Viger, Timur Friedman, Matthieu Latapy, Clémence Magnien, and Renata Teixeira. Avoiding traceroute anomalies with paris traceroute. In Proc. 6th ACM SIGCOMM Conference on Internet Measurement (IMC), pages 153–158, 2006.
[6] Mark Buchanan. Data-bots chart the internet. Science, 813(3), 2005.
[7] Bill Cheswick, Hal Burch, and Steve Branigan. Mapping and visualizing the internet. In Proc. USENIX Annual Technical Conference (ATEC), 2000.
[8] Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. On power-law relationships of the internet topology. In Proc. SIGCOMM, pages 251–262, 1999.
[9] M. Gunes and K. Sarac. Resolving anonymous routers in internet topology measurement studies. In Proc. INFOCOM, 2008.
[10] Xing Jin, W.-P. K. Yiu, S.-H. G. Chan, and Yajun Wang. Network topology inference based on end-to-end measurements. IEEE Journal on Selected Areas in Communications, 24(12):2182–2195, 2006.
[11] Craig Labovitz, Abha Ahuja, Srinivasan Venkatachary, and Roger Wattenhofer. The impact of internet policy and topology on delayed routing convergence. In Proc. 20th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), 2001.
[12] S. Paul, K. K. Sabnani, J. C. Lin, and S. Bhattacharyya. Reliable multicast transport protocol (rmtp). IEEE Journal on Selected Areas in Communications, 5(3), 1997.
[13] Ingmar Poese, Benjamin Frank, Bernhard Ager, Georgios Smaragdakis, and Anja Feldmann. Improving content delivery using provider-aided distance information. In Proc. ACM IMC, 2010.
[14] H. Tangmunarunkit, R. Govindan, S. Shenker, and D. Estrin. The impact of routing policy on internet paths. In Proc. INFOCOM, volume 2, pages 736–742, 2002.
[15] Bin Yao, Ramesh Viswanathan, Fangzhe Chang, and Daniel Waddington.
Topology inference in the presence of anonymous routers. In Proc. IEEE INFOCOM, pages 353–363, 2003.

A Deferred Proofs

A.1 Proof of Theorem 3.2

Fix $\mathcal{T}$. We have to prove that $G_C$ fulfills AXIOM 0, AXIOM 1 (which implies AXIOM 3) and AXIOM 2.

AXIOM 0: The axiom holds trivially: only edges from the traces are used in $G_C$.

AXIOM 1: Let $T \in \mathcal{T}$ and $\sigma_1, \sigma_2 \in T$. Let $k = d_T(\sigma_1, \sigma_2)$. We show that $G_C$ fulfills AXIOM 1, namely, there exists a path of length $k$ in $G_C$. Induction on $k$: ($k = 1$:) By the definition of $G_C$, $\{\sigma_1, \sigma_2\} \in E_C$, thus there exists a path of length one between $\sigma_1$ and $\sigma_2$. ($k > 1$:) Suppose AXIOM 1 holds up to $k - 1$. Let $\sigma'_1, \ldots, \sigma'_{k-1}$ be the intermediary nodes between $\sigma_1$ and $\sigma_2$ in $T$: $T = (\ldots, \sigma_1, \sigma'_1, \ldots, \sigma'_{k-1}, \sigma_2, \ldots)$. By the induction hypothesis, in $G_C$ there is a path of length $k - 1$ between $\sigma_1$ and $\sigma'_{k-1}$. Let $\pi$ be this path. By definition of $G_C$, $\{\sigma'_{k-1}, \sigma_2\} \in E_C$. Thus appending $(\sigma'_{k-1}, \sigma_2)$ to $\pi$ yields the desired path of length $k$ linking $\sigma_1$ and $\sigma_2$: AXIOM 1 thus holds up to $k$.

AXIOM 2: We have to show that $d_T(\sigma_1, \sigma_2) = k \Rightarrow d_C(\sigma_1, \sigma_2) \ge \lceil \alpha \cdot k \rceil$. By contradiction, suppose that $G_C$ does not fulfill AXIOM 2 with respect to $\alpha$. So there exist $k' < \lceil \alpha \cdot k \rceil$ and $\sigma_1, \sigma_2 \in V_C$ such that $d_C(\sigma_1, \sigma_2) = k'$. Let $\pi$ be a shortest path between $\sigma_1$ and $\sigma_2$ in $G_C$. Let $(T_1, \ldots, T_\ell)$ be the corresponding (maybe repeating) traces covering this path $\pi$ in $G_C$. Let $T_i \in (T_1, \ldots, T_\ell)$, and let $s_i$ and $e_i$ be the corresponding start and end nodes of $\pi$ in $T_i$. We will show that this path $\pi$ implies the existence of a path in $G_0$ which violates $\alpha$-consistency. Since $G_0$ is inferrable, $G_0$ fulfills AXIOM 2, thus we have: $d_C(\sigma_1, \sigma_2) = \sum_{i=1}^{\ell} d_{T_i}(s_i, e_i) = k' < \lceil \alpha \cdot k \rceil \le d_{G_0}(\sigma_1, \sigma_2)$ since $G_0$ is $\alpha$-consistent.
However, $G_0$ also fulfills AXIOM 1, thus $d_{T_i}(s_i, e_i) \ge d_{G_0}(s_i, e_i)$. Thus $\sum_{i=1}^{\ell} d_{G_0}(s_i, e_i) \le \sum_{i=1}^{\ell} d_{T_i}(s_i, e_i) < d_{G_0}(\sigma_1, \sigma_2)$: we have constructed a path from $\sigma_1$ to $\sigma_2$ in $G_0$ whose length is shorter than the distance between $\sigma_1$ and $\sigma_2$ in $G_0$, leading to the desired contradiction.

A.2 Proof of Lemma 3.5

First we construct a topology $G_0 = (V_0, E_0)$ and then describe a trace set on this graph that generates the star graph $G = (V, E)$. The node set $V_0$ consists of $|V|$ anonymous nodes and $|V| \cdot (1 + \tau)$ named nodes, where $\tau = \lceil 3/(2\alpha) - 1/2 \rceil$. The first building block of $G_0$ is a copy of $G$. To each node $v_i$ in the copy of $G$ we add a chain consisting of $2 + \tau$ nodes, first appending $\tau$ non-anonymous nodes $w_{(i,k)}$ where $1 \le k \le \tau$, followed by an anonymous node $u_i$ and finally a named node $w_{(i,\tau+1)}$. More formally we can describe the link set as $E_0 = E \cup \bigcup_{i=1}^{|V|} \left\{ \{v_i, w_{(i,1)}\}, \{w_{(i,1)}, w_{(i,2)}\}, \ldots, \{w_{(i,\tau)}, u_i\}, \{u_i, w_{(i,\tau+1)}\} \right\}$.

The trace set $\mathcal{T}$ consists of the following $|V| + |E|$ shortest path traces: the traces $T_\ell$ for $\ell \in \{1, \ldots, |V|\}$ are given by $T_\ell(w_{(\ell,\tau)}, w_{(\ell,\tau+1)})$ (for each node in $V$), and the traces $T_\ell$ for $\ell \in \{|V| + 1, \ldots, |V| + |E|\}$ are given by $T_\ell(w_{(i,\tau)}, w_{(j,\tau)})$ for each link $\{v_i, v_j\}$ in $E$. Note that $G_0 = G_C$ as each star appears as a separate anonymous node.

The star graph $G_*$ corresponding to this trace set contains the $|V|$ nodes $*_i$ (corresponding to $u_i$). In order to prove the claim of the lemma we have to show that two nodes $*_i, *_j$ are conflicting according to Lemma 3.3 if and only if there is a link $\{v_i, v_j\}$ in $E$.

Case (i) does not apply because the minimum distance between any two nodes in the canonic graph is at least one, and $\lceil \alpha \cdot d_{T_i}(*_i, w_{(i,\tau)}) \rceil = 1$ and $\lceil \alpha \cdot d_{T_i}(*_i, w_{(i,\tau+1)}) \rceil = 1$.
It remains to examine Case (ii): “$\Rightarrow$”: if $\mathrm{MAP}(*_i) = \mathrm{MAP}(*_j)$ there would be a path of length two between $w_{(i,\tau)}$ and $w_{(j,\tau)}$ in the topology generated by MAP; the trace set however contains a trace $T_\ell(w_{(i,\tau)}, w_{(j,\tau)})$ of length $2\tau + 1$. So $\lceil \alpha \cdot d_{T_\ell}(w_{(i,\tau)}, w_{(j,\tau)}) \rceil = \lceil \alpha \cdot (2\tau + 1) \rceil = \lceil \alpha \cdot (2 \lceil 3/(2\alpha) - 1/2 \rceil + 1) \rceil \ge 3$, which violates the $\alpha$-consistency (Lemma 3.3 (ii)) and hence $\{*_i, *_j\} \in E_*$ and $\{v_i, v_j\} \in E$.

“$\Leftarrow$”: if $\{v_i, v_j\} \notin E$, there is no trace $T(w_{(i,\tau)}, w_{(j,\tau)})$, thus we have to prove that no trace $T_\ell(w_{(i',\tau)}, w_{(j',\tau)})$ with $i' \ne i$ and $j' \ne j$ and $j' \ne i$ leads to a conflict between $*_i$ and $*_j$. We show that an even more general statement is true, namely that for any pair of distinct non-anonymous nodes $x_1, x_2$, where $x_1, x_2 \in \{v_{i'}, v_{j'}, w_{(i',k)}, w_{(j',k)} \mid 1 \le k \le \tau + 1, i' \ne i, j' \ne i, j' \ne j\}$, it holds that $\lceil \alpha \cdot d_C(x_1, x_2) \rceil \le d_C(x_1, *_i) + d_C(x_2, *_j)$. Since $G_C = G_0$ and the traces contain shortest paths only, the trace distance between two nodes in the same trace is the same as the distance in $G_C$. The following tables contain the relevant lower bounds on distances in $G_C$ and $\mu(x_1, x_2) = d_C(x_1, *_i) + d_C(x_2, *_j)$.
| $d_C(\cdot,\cdot) \ge$ | $v_{i'}$ | $v_{j'}$ | $w_{(i',k_1)}$ | $w_{(j',k_1)}$ |
| $v_{i'}$ | $0$ | $1$ | $k_1$ | $k_1 + 1$ |
| $v_{j'}$ | $1$ | $0$ | $k_1 + 1$ | $k_1$ |
| $w_{(i',k_2)}$ | $k_2$ | $k_2 + 1$ | $|k_2 - k_1|$ | $k_1 + 1 + k_2$ |
| $w_{(j',k_2)}$ | $k_2 + 1$ | $k_2$ | $k_1 + 1 + k_2$ | $|k_2 - k_1|$ |
| $*_i$ | $\tau + 2$ | $\tau + 1$ | $2 + \tau + k_1$ | $\tau - k_1 + 1$ |
| $*_j$ | $\tau + 2$ | $\tau + 2$ | $2 + \tau + k_1$ | $2 + \tau + k_1$ |

| $\mu(\cdot,\cdot) \ge$ | $v_{i'}$ | $v_{j'}$ | $w_{(i',k_1)}$ | $w_{(j',k_1)}$ |
| $v_{i'}$ | $2\tau + 4$ | $2\tau + 3$ | $4 + 2\tau + k_1$ | $4 + 2\tau + k_1$ |
| $v_{j'}$ | $2\tau + 3$ | $2\tau + 4$ | $2\tau + 3 + k_1$ | $3 + 2\tau + k_1$ |
| $w_{(i',k_2)}$ | $4 + 2\tau + k_2$ | $4 + 2\tau + k_2$ | $4 + 2\tau + k_1 + k_2$ | $4 + 2\tau + k_1 + k_2$ |
| $w_{(j',k_2)}$ | $2\tau - k_2 + 3$ | $2\tau - k_2 + 3$ | $2\tau + 3 + k_1 - k_2$ | $2\tau + k_1 - k_2 + 3$ |

Table 1: Proof of Lemma 3.5: lower bounds for the distances in $G_C$, and lower bounds for $\mu(x_1, x_2) = d_C(x_1, *_i) + d_C(x_2, *_j)$.

If $x_1 \ne w_{(j',k_2)}$ then it holds for all $x_1, x_2$ that $d_{T_\ell}(x_1, x_2) \le 2\tau + 1$ whereas $\mu(x_1, x_2) = d_C(x_1, *_i) + d_C(x_2, *_j) \ge 2\tau + 2$. In all other cases it holds at least that $d_C(x_1, x_2) < \mu(x_1, x_2)$. Thus $\lceil \alpha \cdot d_C(x_1, x_2) \rceil \le d_C(x_1, *_i) + d_C(x_2, *_j)$. Consequently, we have conflicts if and only if $\{v_i, v_j\} \in E$, which concludes the proof.

A.3 Proof of Lemma 3.7

Figure 5: Visualization for proof of Lemma 3.7. Solid lines denote links, dashed lines denote paths (of annotated length).

We have to show that the paths in the traces correspond to paths in $G_\gamma$. Let $T \in \mathcal{T}$, and $\sigma_1, \sigma_2 \in T$. Let $\pi$ be the sequence of nodes in $T$ connecting $\sigma_1$ and $\sigma_2$. This is also a path in $G_\gamma$: since $\alpha > 0$, for any two symbols $\sigma_1, \sigma_2 \in T$, it holds that $\mathrm{MAP}(\sigma_1) \ne \mathrm{MAP}(\sigma_2)$.

We now construct an example showing that the $\alpha'$ for which $G_\gamma$ fulfills AXIOM 2 can be arbitrarily small. Consider the graph represented in Figure 5. Let $T_1 = (s, \ldots, t)$, $T_2 = (s, *_1, \ldots, m_1)$, $T_3 = (m_1, \ldots, *_2, m_2)$, $T_4 = (m_2, *_3, \ldots, m_3)$, $T_5 = (m_3, \ldots, *_4, t)$. We assume $\alpha = 1$. By changing the parameters $k = d_C(s, t)$ and $k' = d_C(m_1, *_1) = d_C(m_1, *_2) = d_C(m_3, *_3) = d_C(m_3, *_4)$, we can modulate the links of the corresponding star graph $G_*$. Using $d_{T_1}(s, t) = k$, observe that $k > 2 \Leftrightarrow \{*_1, *_4\} \in E_*$. Similarly, $k > 2(k' + 1) \Leftrightarrow \{*_1, *_3\} \in E_* \wedge \{*_2, *_4\} \in E_*$ and $k > 2(k' + 2) \Leftrightarrow \{*_1, *_2\} \in E_* \wedge \{*_3, *_4\} \in E_*$. Taking $k = 2k' + 4$, we thus have $E_* = \{\{*_1, *_3\}, \{*_2, *_4\}, \{*_1, *_4\}\}$. Thus, we here construct a situation where $*_1$ and $*_2$ as well as $*_3$ and $*_4$ can be merged without breaking the consistency requirement, but where merging both simultaneously leads to a topology $G'$ that is only $4/k$-consistent, since $d_{G'}(s, t) = 4$. This ratio can be made arbitrarily small provided we choose $k' = (k - 4)/2$.

A.4 Proof of Lemma 3.11

In the worst case, each star in the trace represents a different node in $G_1$, so the maximal number of nodes in any topology in $\mathcal{G}_{\mathcal{T}}$ is the total number of non-anonymous nodes plus the total number of stars in $\mathcal{T}$. This number of nodes is reached in the topology $G_C$. According to Definition 3.4, only non-adjacent stars in $G_*$ can represent the same node in an inferrable topology. Thus, the stars in trace $T$ must originate from at least $\gamma(G_*)$ different nodes. As a consequence $|V_1| - |V_2| \le s - \gamma(G_*)$, which can reach $s - 1$ for a trace set $\mathcal{T} = \{T_i = (v, *_i, w) \mid 1 \le i \le s\}$. Analogously, $|V_1|/|V_2| \le (n + s)/(n + \gamma(G_*)) \le (2 + s)/3$.

Observe that each occurrence of a node in a trace describes at most two edges. If all anonymous nodes are merged into $\gamma(G_*)$ nodes in $G_1$ and are separate nodes in $G_2$, the difference in the number of edges is at most $2(s - \gamma(G_*))$. Analogously, $|E_1|/|E_2| \le (\nu + 2s)/(\nu + 2) \le s$. The trace set $\mathcal{T} = \{T_i = (v, *_i, w) \mid 1 \le i \le s\}$ reaches this bound.
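The tight example from this proof is small enough to check numerically. The following Python sketch is illustrative only: the function names and the encoding of a star as a tuple `("*", i)` are assumptions made for this example. It counts nodes and edges of the canonic graph and of the graph with all stars merged for the trace set $\{(v, *_i, w) \mid 1 \le i \le s\}$, where no stars conflict, so $\gamma(G_*) = 1$ and $\nu = 0$.

```python
def lemma_3_11_traces(s):
    """Trace set {(v, *_i, w) | 1 <= i <= s}; stars are encoded as ("*", i)."""
    return [("v", ("*", i), "w") for i in range(1, s + 1)]


def topology_size(traces, merge_stars):
    """Return (#nodes, #edges) of the topology described by the traces.

    merge_stars=True maps every star to one shared anonymous node,
    merge_stars=False keeps one anonymous node per star (canonic graph).
    """
    nodes, edges = set(), set()
    for trace in traces:
        mapped = [("*", 0) if merge_stars and isinstance(x, tuple) else x
                  for x in trace]
        nodes.update(mapped)
        # Undirected links between consecutive symbols of the trace.
        edges.update(frozenset(pair) for pair in zip(mapped, mapped[1:]))
    return len(nodes), len(edges)


s = 5
canonic_nodes, canonic_edges = topology_size(lemma_3_11_traces(s), merge_stars=False)
merged_nodes, merged_edges = topology_size(lemma_3_11_traces(s), merge_stars=True)
print(canonic_nodes - merged_nodes, canonic_edges - merged_edges)
print(canonic_nodes / merged_nodes, canonic_edges / merged_edges)
```

For this trace set the node difference is $s - 1$, the node ratio is $(2 + s)/3$, the edge difference is $2(s - 1)$ and the edge ratio is $s$, matching the bounds of Lemma 3.11 with $n = 2$, $\nu = 0$ and $\gamma(G_*) = 1$.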
A.5 Proof of Lemma 3.14

A “lower bound” example follows from Figure 2. Essentially, this is also the worst case: note that the difference in the shortest distance between a pair of nodes $u$ and $v$ in $G_1$ and $G_2$ is only greater than 0 if the shortest path between them involves at least one anonymous node. Hence the shortest distance between such a pair is two. The longest shortest distance between the same pair of nodes in another inferred topology visits all nodes in the network, i.e., its length is bounded by $N - 1$.

A.6 Proof of Lemma 3.16

Each occurrence of a node in a trace describes at most two links incident to this node. For the degree difference we only have to consider the links incident to at least one anonymous node, as the number of links between non-anonymous nodes is the same in $G_1$ and $G_2$. If all anonymous nodes can be merged into $\gamma(G_*)$ nodes in $G_1$ and all anonymous nodes are separate in $G_2$, the difference in the maximum degree is thus at most $2(s - \gamma(G_*))$, as there can be at most $s - \gamma(G_*) + 1$ nodes merged into one node and the minimal maximum degree of a node in $G_2$ is two. This bound is tight, as the trace set $T_i = \{v_i, *, w_i\}$ for $1 \le i \le s$ containing $s$ stars can be represented by a graph with one anonymous node of degree $2s$ or by a graph with $s$ anonymous nodes of degree two each.

For the ratio of the maximal degree we can ignore links between non-anonymous nodes as well, as these only decrease the ratio. The highest number of links incident at node $v$ with one endpoint in the set of anonymous nodes is $s - \gamma(G_*) + 1$ for non-anonymous nodes and $2(s - \gamma(G_*) + 1)$ for anonymous nodes, whereas the lowest number is two.

A.7 Proof of Lemma 4.4

The proof for the upper bound is analogous to the case without full exploration.
To prove that this bound can be reached, we need to add traces to the trace set that ensure that all pairs of named nodes appear in a trace without changing the degrees of the anonymous nodes. To this end, for each pair $\{v, w\}$ that is not yet in the trace set, we add a named node $u$ to $G_0$ and a trace $T = \{v, u, w\}$. This does not increase the maximum degree and guarantees full exploration.

A.8 Proof of Lemma 4.5

We first prove the upper bound for the relative case. Note that the maximal distance between two anonymous nodes $\mathrm{MAP}(*_1)$ and $\mathrm{MAP}(*_2)$ in an inferred topology component cannot be larger than twice the distance of two named nodes $u$ and $v$: from Definition 4.1 we know that there must be a trace in $\mathcal{T}$ connecting $u$ and $v$, and the maximal distance $\delta$ of a pair of named nodes is given by the path of the trace that includes $u$ and $v$. Therefore, and since any trace starts and ends with a named node, any star can be at a distance of at most $\delta/2$ from a named node. Therefore, the maximal distance between $\mathrm{MAP}(*_1)$ and $\mathrm{MAP}(*_2)$ is $\delta/2 + \delta/2$ to get to the corresponding closest named nodes, plus $\delta$ for the connection between the named nodes. As, according to Lemma 4.2, the distance between named nodes is the same in all inferred topologies, the diameter of inferred topologies can vary at most by a factor of two.

We now construct an example that reaches this bound. Consider a topology consisting of a center node $c$ and four rays of length $k$. Let $u_1, u_2, u_3, u_4$ be the “end nodes” of each ray. We assume that all these nodes are named. Now add two chains of anonymous nodes of length $2k + 1$ between nodes $u_1$ and $u_2$, and between nodes $u_3$ and $u_4$ to the topology. The trace set consists of the minimal trace set to obtain a fully explored topology: six traces of length $2k + 1$ between each pair of end nodes $u_1, u_2, u_3, u_4$.
Now we add two traces of length $2k + 1$ between nodes $u_1$ and $u_2$, and between nodes $u_3$ and $u_4$. These traces explore the anonymous chains and have the following shape: $T_7 = (u_1, *_1, \ldots, *_k, \sigma, *_{k+1}, \ldots, *_{2k}, u_2)$ and $T_8 = (u_3, *_{2k+1}, \ldots, *_{3k}, \sigma', *_{3k+1}, \ldots, *_{4k}, u_4)$, where $\sigma$ and $\sigma'$ are stars. Let $G_1 = G_C$ and $G_2$ be the inferrable graph where $\sigma$ and $\sigma'$ are merged. The resulting diameters are $\mathrm{DIAM}(G_1) = 4k + 2$ and $\mathrm{DIAM}(G_2) = 2k + 1$. Since $s = 4k + 2$, the difference can thus be as large as $s/2$. Note that this construction also yields the bound on the relative difference: $\mathrm{DIAM}(G_1)/\mathrm{DIAM}(G_2) = (4k + 2)/(2k + 1) = 2$.

A.9 Proof of Lemma 4.6

Given the number of stars $s$, we construct a trace set $\mathcal{T}$ with two inferrable graphs such that in one graph the number of triangles with anonymous nodes is $s(s - 1)/2$ and in the other graph there are no such triangles. As a first step we add $s$ traces $T_i = (v_i, *_i, w)$ to the trace set $\mathcal{T}$, where $1 \le i \le s$. To make this trace set fully explored we add traces for each pair $v_i, v_j$ to $\mathcal{T}$ as a second step, i.e., traces $T_{i,j} = (v_i, v_j)$ for $1 \le i \le s$ and $1 \le j \le s$. The resulting trace set contains $s$ stars and none of the stars are in conflict with each other. Thus the graph $G_1$ merging all stars into one anonymous node is inferrable from this trace set, and the number of triangles containing the anonymous node is $s(s - 1)/2$. Let $G_2$ be the canonic graph of this trace set. This graph does not contain any triangles with anonymous nodes and hence the difference $C_3(G_1) - C_3(G_2)$ is $s(s - 1)/2$.

To see that the ratio can be unbounded, look at the trace set $\{(v, *_1, w), (u, *_2, w), (u, v)\}$. This set is fully explored since all pairs of named nodes appear in a trace. The graph where the two stars are merged has one triangle and the canonic graph has no triangle.
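Both triangle examples from the proof of Lemma 4.6 can be checked with a brute-force count. The Python sketch below is only an illustration under stated assumptions: the helper names and the tuple encoding `("*", i)` for stars are hypothetical, and the pair traces $T_{i,j}$ are generated for $i < j$ only, which yields the same set of undirected links.

```python
from itertools import combinations


def lemma_4_6_traces(s):
    """Traces (v_i, *_i, w) plus (v_i, v_j) for all pairs i < j (assumed encoding)."""
    star_traces = [(f"v{i}", ("*", i), "w") for i in range(1, s + 1)]
    pair_traces = [(f"v{i}", f"v{j}")
                   for i, j in combinations(range(1, s + 1), 2)]
    return star_traces + pair_traces


def edge_set(traces, merge_stars):
    """Undirected links of the topology; optionally merge all stars into one node."""
    edges = set()
    for trace in traces:
        mapped = [("*", 0) if merge_stars and isinstance(x, tuple) else x
                  for x in trace]
        edges.update(frozenset(pair) for pair in zip(mapped, mapped[1:]))
    return edges


def count_triangles(edges):
    """Brute-force count of 3-cycles; fine for the small examples used here."""
    nodes = set().union(*edges)
    return sum(
        1
        for a, b, c in combinations(nodes, 3)
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edges
    )


s = 4
traces = lemma_4_6_traces(s)
difference = (count_triangles(edge_set(traces, merge_stars=True))
              - count_triangles(edge_set(traces, merge_stars=False)))
print(difference)

small = [("v", ("*", 1), "w"), ("u", ("*", 2), "w"), ("u", "v")]
print(count_triangles(edge_set(small, merge_stars=True)),
      count_triangles(edge_set(small, merge_stars=False)))
```

Triangles among named nodes appear in both topologies and cancel in the difference, so only the triangles through the merged anonymous node remain; in the three-trace example the canonic graph has no triangle at all, which is what makes the ratio unbounded.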
