Safety Verification of Deep Neural Networks



Xiaowei Huang, Marta Kwiatkowska, Sen Wang and Min Wu
Department of Computer Science, University of Oxford

Abstract. Deep neural networks have achieved impressive experimental results in image classification, but can surprisingly be unstable with respect to adversarial perturbations, that is, minimal changes to the input image that cause the network to misclassify it. With potential applications including perception modules and end-to-end controllers for self-driving cars, this raises concerns about their safety. We develop a novel automated verification framework for feed-forward multi-layer neural networks based on Satisfiability Modulo Theory (SMT). We focus on safety of image classification decisions with respect to image manipulations, such as scratches or changes to camera angle or lighting conditions that would result in the same class being assigned by a human, and define safety for an individual decision in terms of invariance of the classification within a small neighbourhood of the original image. We enable exhaustive search of the region by employing discretisation, and propagate the analysis layer by layer. Our method works directly with the network code and, in contrast to existing methods, can guarantee that adversarial examples, if they exist, are found for the given region and family of manipulations. If found, adversarial examples can be shown to human testers and/or used to fine-tune the network. We implement the techniques using Z3 and evaluate them on state-of-the-art networks, including regularised and deep learning networks. We also compare against existing techniques to search for adversarial examples and estimate network robustness.

(This work is supported by the EPSRC Programme Grant on Mobile Autonomy (EP/M019918/1). Part of this work was done while MK was visiting the Simons Institute for the Theory of Computing.)

1 Introduction

Deep neural networks have achieved impressive experimental results in image classification, matching the cognitive ability of humans [23] in complex tasks with thousands of classes. Many applications are envisaged, including their use as perception modules and end-to-end controllers for self-driving cars [15]. Let $\mathbb{R}^n$ be a vector space of images (points) that we wish to classify, and assume that $f : \mathbb{R}^n \to C$, where $C$ is a (finite) set of class labels, models the human perception capability; then a neural network classifier is a function $\hat{f}(x)$ which approximates $f(x)$ from $M$ training examples $\{(x_i, c_i)\}_{i=1,..,M}$. For example, a perception module of a self-driving car may input an image from a camera and must correctly classify the type of object in its view, irrespective of aspects such as the angle of its vision and image imperfections. Therefore, though they clearly include imperfections, all four pairs of images in Figure 1 should arguably be classified as automobiles, since they appear so to a human eye.

Classifiers employed in vision tasks are typically multi-layer networks, which propagate the input image through a series of linear and non-linear operators. They are high-dimensional, often with millions of dimensions, non-linear and potentially discontinuous: even a small network, such as that trained to classify hand-written images of digits 0-9, has over 60,000 real-valued parameters and 21,632 neurons (dimensions) in its first layer.
At the same time, the networks are trained on a finite data set and expected to generalise to previously unseen images. To increase the probability of correctly classifying such an image, regularisation techniques such as dropout are typically used, which improves the smoothness of the classifiers, in the sense that images that are close (within $\epsilon$ distance) to a training point are assigned the same class label.

Fig. 1. Automobile images (classified correctly) and their perturbed images (classified wrongly): automobile to bird, automobile to frog, automobile to airplane, automobile to horse.

Unfortunately, it has been observed in [13,36] that deep neural networks, including highly trained and smooth networks optimised for vision tasks, are unstable with respect to so-called adversarial perturbations. Such adversarial perturbations are (minimal) changes to the input image, often imperceptible to the human eye, that cause the network to misclassify the image. Examples include not only artificially generated random perturbations, but also (more worryingly) modifications of camera images [22] that correspond to resizing, cropping or change in lighting conditions. They can be devised without access to the training set [29] and are transferable [19], in the sense that an example misclassified by one network is also misclassified by a network with a different architecture, even if it is trained on different data. Figure 1 gives adversarial perturbations of automobile images that are misclassified as a bird, frog, airplane or horse by a highly trained state-of-the-art network. This obviously raises potential safety concerns for applications such as autonomous driving and calls for automated verification techniques that can verify the correctness of their decisions.

Safety of AI systems is receiving increasing attention, to mention [33,10], in view of their potential to cause harm in safety-critical situations such as autonomous driving. Typically, decision making in such systems is either solely based on machine learning, through end-to-end controllers, or involves some combination of logic-based reasoning and machine learning components, where an image classifier produces a classification, say speed limit or a stop sign, that serves as input to a controller. A recent trend towards "explainable AI" has led to approaches that learn not only how to assign the classification labels, but also additional explanations of the model, which can take the form of a justification explanation (why this decision has been reached, for example identifying the features that supported the decision) [17,31]. In all these cases, the safety of a decision can be reduced to ensuring the correct behaviour of a machine learning component. However, safety assurance and verification methodologies for machine learning are little studied.

The main difficulty with image classification tasks, which play a critical role in perception modules of autonomous driving controllers, is that they do not have a formal specification in the usual sense: ideally, the performance of a classifier should match the perception ability and class labels assigned by a human. Traditionally, the correctness of a neural network classifier is expressed in terms of risk [37], defined as the probability of misclassification of a given image, weighted with respect to the input distribution $\mu$ of images.
Similar (statistical) robustness properties of deep neural network classifiers, which compute the average minimum distance to a misclassification and are independent of the data point, have been studied and can be estimated using tools such as DeepFool [25] and cleverhans [27]. However, we are interested in the safety of an individual decision, and to this end focus on the key property of the classifier being invariant to perturbations at a given point. This notion is also known as pointwise robustness [18,12] or local adversarial robustness [21].

Contributions. In this paper we propose a general framework for automated verification of safety of classification decisions made by feed-forward deep neural networks. Although we work concretely with image classifiers, the techniques can be generalised to other settings. For a given image $x$ (a point in a vector space), we assume that there is a (possibly infinite) region $\eta$ around that point that incontrovertibly supports the decision, in the sense that all points in this region must have the same class. This region is specified by the user and can be given as a small diameter, or the set of all points whose salient features are of the same type. We next assume that there is a family of operations $\Delta$, which we call manipulations, that specify modifications to the image under which the classification decision should remain invariant in the region $\eta$. Such manipulations can represent, for example, camera imprecisions, change of camera angle, or replacement of a feature. We define a network decision to be safe for input $x$ and region $\eta$ with respect to the set of manipulations $\Delta$ if applying the manipulations on $x$ will not result in a class change for $\eta$. We employ discretisation to enable a finite exhaustive search of the high-dimensional region $\eta$ for adversarial misclassifications. The discretisation approach is justified in the case of image classifiers since they are typically represented as vectors of discrete pixels (vectors of 8 bit RGB colours). To achieve scalability, we propagate the analysis layer by layer, mapping the region and manipulations to the deeper layers. We show that this propagation is sound, and is complete under the additional assumption of minimality of manipulations, which holds in discretised settings. In contrast to existing approaches [36,28], our framework can guarantee that a misclassification is found if it exists. Since we reduce verification to a search for adversarial examples, we can achieve safety verification (if no misclassifications are found for all layers) or falsification (in which case the adversarial examples can be used to fine-tune the network or shown to a human tester).

We implement the techniques using Z3 [8] in a tool called DLV (Deep Learning Verification) [2] and evaluate them on state-of-the-art networks, including regularised and deep learning networks. This includes image classification networks trained for classifying hand-written images of digits 0-9 (MNIST), 10 classes of small colour images (CIFAR10), 43 classes of the German Traffic Sign Recognition Benchmark (GTSRB) [35] and 1000 classes of colour images used for the well-known ImageNet large-scale visual recognition challenge (ILSVRC) [4]. We also perform a comparison of the DLV falsification functionality on the MNIST dataset against the methods of [36] and [28], focusing on the search strategies and statistical robustness estimation.
The perturbed images in Figure 1 are found automatically using our tool for the network trained on the CIFAR10 dataset. This invited paper is an extended and improved version of [20], where an extended version including appendices can also be found.

2 Background on Neural Networks

We consider feed-forward multi-layer neural networks [14], henceforth abbreviated as neural networks. Perceptrons (neurons) in a neural network are arranged in disjoint layers, with each perceptron in one layer connected to the next layer, but no connection between perceptrons in the same layer. Each layer $L_k$ of a network is associated with an $n_k$-dimensional vector space $D_{L_k} \subseteq \mathbb{R}^{n_k}$, in which each dimension corresponds to a perceptron. We write $P_k$ for the set of perceptrons in layer $L_k$, and $n_k = |P_k|$ is the number of perceptrons (dimensions) in layer $L_k$.

Formally, a (feed-forward and deep) neural network $N$ is a tuple $(L, T, \Phi)$, where $L = \{L_k \mid k \in \{0, ..., n\}\}$ is a set of layers such that layer $L_0$ is the input layer and $L_n$ is the output layer, $T \subseteq L \times L$ is a set of sequential connections between layers such that, except for the input and output layers, each layer has an incoming connection and an outgoing connection, and $\Phi = \{\phi_k \mid k \in \{1, ..., n\}\}$ is a set of activation functions $\phi_k : D_{L_{k-1}} \to D_{L_k}$, one for each non-input layer. Layers other than input and output layers are called the hidden layers.

The network is fed an input $x$ (a point in $D_{L_0}$) through its input layer, which is then propagated through the layers by successive application of the activation functions. An activation for point $x$ in layer $k$ is the value of the corresponding function, denoted $\alpha_{x,k} = \phi_k(\phi_{k-1}(...\phi_1(x))) \in D_{L_k}$, where $\alpha_{x,0} = x$. For perceptron $p \in P_k$ we write $\alpha_{x,k}(p)$ for the value of its activation on input $x$. For every activation $\alpha_{x,k}$ and layer $k' < k$, we define $Pre_{k'}(\alpha_{x,k}) = \{\alpha_{y,k'} \in D_{L_{k'}} \mid \alpha_{y,k} = \alpha_{x,k}\}$ to be the set of activations in layer $k'$ whose corresponding activation in layer $L_k$ is $\alpha_{x,k}$. The classification decision is made based on the activations in the output layer by, e.g., assigning to $x$ the class $\arg\max_{p \in P_n} \alpha_{x,n}(p)$. For simplicity, we use $\alpha_{x,n}$ to denote the class assigned to input $x$, and thus $\alpha_{x,n} = \alpha_{y,n}$ expresses that two inputs $x$ and $y$ have the same class.

The neural network classifier $N$ represents a function $\hat{f}(x)$ which approximates $f(x) : D_{L_0} \to C$, a function that models the human perception capability in labelling images with labels from $C$, from $M$ training examples $\{(x_i, c_i)\}_{i=1,..,M}$. Image classification networks, for example convolutional networks, may contain many layers, which can be non-linear, and work in high dimensions, which for image classification problems can be of the order of millions. Digital images are represented as 3D tensors of pixels (width, height and depth, the latter to represent colour), where each pixel is a discrete value in the range 0..255. The training process determines real values for weights used as filters that are convolved with the activation functions. Since it is difficult to approximate $f$ with few samples in the sparsely populated high-dimensional space, to increase the probability of classifying correctly a previously unseen image, various regularisation techniques such as dropout are employed. They improve the smoothness of the classifier, in the sense that points that are $\epsilon$-close to a training point (potentially infinitely many of them) are classified the same.
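To make the notation concrete, the following minimal sketch (illustrative only, not the DLV implementation; the class and helper names are ours) realises a fully-connected network as successive applications of the activation functions and computes the class $\arg\max_{p \in P_n} \alpha_{x,n}(p)$:

```python
import numpy as np

class FeedForwardNetwork:
    """A network N = (L, T, Phi) with fully-connected layers: the pair
    (weights[k], biases[k]) determines the activation function
    phi_{k+1} : D_{L_k} -> D_{L_{k+1}}."""

    def __init__(self, weights, biases):
        self.weights = weights          # weights[k] has shape (n_k, n_{k+1})
        self.biases = biases
        self.num_layers = len(weights)  # n: number of non-input layers

    def activation(self, x, k):
        """alpha_{x,k} = phi_k(phi_{k-1}(... phi_1(x))), alpha_{x,0} = x."""
        a = np.asarray(x, dtype=float)
        for i, (W, b) in enumerate(zip(self.weights[:k], self.biases[:k])):
            a = a @ W + b
            if i < self.num_layers - 1:  # hidden layers use ReLU
                a = np.maximum(0.0, a)
        return a

    def classify(self, x):
        """Class assigned to x: argmax_{p in P_n} alpha_{x,n}(p)."""
        return int(np.argmax(self.activation(x, self.num_layers)))
```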
In this paper, we work with the code of the network and its trained weights.

3 Safety Analysis of Classification Decisions

In this section we define our notion of safety of classification decisions for a neural network, based on the concept of a manipulation of an image, essentially perturbations that a human observer would classify the same as the original image. Safety is defined for an individual classification decision and is parameterised by the class of manipulations and a neighbouring region around a given image. To ensure finiteness of the search of the region for adversarial misclassifications, we introduce so-called "ladders", nondeterministically branching and iterated applications of successive manipulations, and state the conditions under which the search is exhaustive.

Safety and Robustness. Our method assumes the existence of a (possibly infinite) region $\eta$ around a data point (image) $x$ such that all points in the region are indistinguishable by a human, and therefore have the same true class. This region is understood as supporting the classification decision and can usually be inferred from the type of the classification problem. For simplicity, we identify such a region via its diameter $d$ with respect to some user-specified norm, which intuitively measures the closeness to the point $x$. As defined in [18], a network $\hat{f}$ approximating human capability $f$ is said to be not robust at $x$ if there exists a point $y$ in the region $\eta = \{z \in D_{L_0} \mid \|z - x\| \le d\}$ of the input layer such that $\hat{f}(x) \ne \hat{f}(y)$. The point $y$, at a minimal distance from $x$, is known as an adversarial example. Our definition of safety for a classification decision (abbreviated safety at a point) follows the same intuition, except that we work layer by layer, and therefore will identify such a region $\eta_k$, a subspace of $D_{L_k}$, at each layer $L_k$, for $k \in \{0, ..., n\}$, and successively refine the regions through the deeper layers. We justify this choice based on the observation [11,23,24] that deep neural networks are thought to compute progressively more powerful invariants as the depth increases. In other words, they gradually transform images into a representation in which the classes are separable by a linear classifier.

Assumption 1. For each activation $\alpha_{x,k}$ of point $x$ in layer $L_k$, the region $\eta_k(\alpha_{x,k})$ contains activations that the human observer believes to be so close to $\alpha_{x,k}$ that they should be classified the same as $x$.

Intuitively, safety for network $N$ at a point $x$ means that the classification decision is robust at $x$ against perturbations within the region $\eta_k(\alpha_{x,k})$. Note that, while the perturbation is applied in layer $L_k$, the classification decision is based on the activation in the output layer $L_n$.

Definition 1. [General Safety] Let $\eta_k(\alpha_{x,k})$ be a region in layer $L_k$ of a neural network $N$ such that $\alpha_{x,k} \in \eta_k(\alpha_{x,k})$. We say that $N$ is safe for input $x$ and region $\eta_k(\alpha_{x,k})$, written as $N, \eta_k \models x$, if for all activations $\alpha_{y,k}$ in $\eta_k(\alpha_{x,k})$ we have $\alpha_{y,n} = \alpha_{x,n}$.

We remark that, unlike the notions of risk [37] and robustness of [18,12], we work with safety for a specific point and do not account for the input distribution, but such expectation measures can be considered; see Section 6 for comparison.
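On a discretised region, Definition 1 can in principle be checked by exhaustive enumeration. The following naive sketch (our own helper, reusing the `classify` method of the earlier sketch and ignoring the layer-by-layer machinery developed below) illustrates the idea for a grid-discretised region at the input layer:

```python
import itertools
import numpy as np

def is_safe_brute_force(net, x, diameter, step):
    """Naive check of Definition 1 at the input layer: enumerate a grid
    discretisation of eta = {z : ||z - x||_inf <= diameter} and test
    that every point keeps the class of x."""
    x = np.asarray(x, dtype=float)
    original_class = net.classify(x)
    offsets = np.arange(-diameter, diameter + step / 2, step)
    for delta in itertools.product(offsets, repeat=x.size):
        y = x + np.asarray(delta)
        if net.classify(y) != original_class:
            return False, y          # adversarial example found
    return True, None                # safe w.r.t. this discretisation
```

The number of grid points grows exponentially with the input dimension, which is precisely the blow-up that the feature decomposition of Section 4.3 is designed to avoid.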
Manipulations. A key concept of our framework is the notion of a manipulation, an operator that intuitively models image perturbations, for example bad angles, scratches or weather conditions, the idea being that the classification decisions in a region of images close to it should be invariant under such manipulations. The choice of the type of manipulation is application-dependent and user-defined, reflecting knowledge of the classification problem to model perturbations that should or should not be allowed. Judicious choice of families of such manipulations and appropriate distance metrics is particularly important. For simplicity, we work with operators $\delta_k : D_{L_k} \to D_{L_k}$ over the activations in the vector space of layer $k$, and consider the Euclidean ($L^2$) and Manhattan ($L^1$) norms to measure the distance between an image and its perturbation through $\delta_k$, but the techniques generalise to other norms discussed in [18,19,12].

More specifically, applying a manipulation $\delta_k(\alpha_{x,k})$ to an activation $\alpha_{x,k}$ will result in another activation such that the values of some or all dimensions are changed. We therefore represent a manipulation as a hyper-rectangle, defined for two activations $\alpha_{x,k}$ and $\alpha_{y,k}$ of layer $L_k$ by $rec(\alpha_{x,k}, \alpha_{y,k}) = \times_{p \in P_k} [\min(\alpha_{x,k}(p), \alpha_{y,k}(p)), \max(\alpha_{x,k}(p), \alpha_{y,k}(p))]$. The main challenge for verification is the fact that the region $\eta_k$ contains a potentially uncountable number of activations. Our approach relies on discretisation in order to enable a finite exploration of the region to discover and/or rule out adversarial perturbations.

For an activation $\alpha_{x,k}$ and a set $\Delta$ of manipulations, we denote by $rec(\Delta, \alpha_{x,k})$ the polyhedron which includes all hyper-rectangles that result from applying some manipulation in $\Delta$ on $\alpha_{x,k}$, i.e., $rec(\Delta, \alpha_{x,k}) = \bigcup_{\delta \in \Delta} rec(\alpha_{x,k}, \delta(\alpha_{x,k}))$. Let $\Delta_k$ be the set of all possible manipulations for layer $L_k$. To ensure region coverage, we define valid manipulations as follows.

Definition 2. Given an activation $\alpha_{x,k}$, a set of manipulations $V(\alpha_{x,k}) \subseteq \Delta_k$ is valid if $\alpha_{x,k}$ is an interior point of $rec(V(\alpha_{x,k}), \alpha_{x,k})$, i.e., $\alpha_{x,k}$ is in $rec(V(\alpha_{x,k}), \alpha_{x,k})$ and does not belong to the boundary of $rec(V(\alpha_{x,k}), \alpha_{x,k})$.

Figure 2 presents an example of valid manipulations in two-dimensional space: each arrow represents a manipulation, each dashed box represents a (hyper-)rectangle of the corresponding manipulation, and activation $\alpha_{x,k}$ is an interior point of the space formed by the dashed boxes.

Since we work with discretised spaces, which is a reasonable assumption for images, we introduce the notion of a minimal manipulation. When applying a minimal manipulation, it suffices to check for misclassification just at the end points, that is, $\alpha_{x,k}$ and $\delta_k(\alpha_{x,k})$. This allows an exhaustive, albeit impractical, exploration of the region in unit steps.

A manipulation $\delta_k^1(\alpha_{y,k})$ is finer than $\delta_k^2(\alpha_{x,k})$, written as $\delta_k^1(\alpha_{y,k}) \le \delta_k^2(\alpha_{x,k})$, if any activation in the hyper-rectangle of the former is also in the hyper-rectangle of the latter. It is implied in this definition that $\alpha_{y,k}$ is an activation in the hyper-rectangle of $\delta_k^2(\alpha_{x,k})$.
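As an illustration, hyper-rectangles and the interiority test of Definition 2 can be sketched as follows (our own helper names; for simplicity the test checks strict interiority of the bounding box of the union, a necessary condition that coincides with Definition 2 for axis-aligned examples like Figure 2):

```python
import numpy as np

def rec(a, b):
    """Hyper-rectangle rec(a, b): per-dimension [min, max] bounds."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.minimum(a, b), np.maximum(a, b)

def is_valid(manipulations, alpha):
    """Approximate check of Definition 2: alpha must be strictly inside
    the bounding box of the union of rec(alpha, delta(alpha))."""
    alpha = np.asarray(alpha, float)
    lows, highs = [], []
    for delta in manipulations:          # each delta maps activations
        lo, hi = rec(alpha, delta(alpha))
        lows.append(lo)
        highs.append(hi)
    lo = np.min(lows, axis=0)            # bounding box of the union
    hi = np.max(highs, axis=0)
    return bool(np.all(lo < alpha) and np.all(alpha < hi))
```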
Moreover, we write $\delta_{k,k'}(\alpha_{x,k})$ for $\phi_{k'}(...\phi_{k+1}(\delta_k(\alpha_{x,k})))$, representing the corresponding activation in layer $k' \ge k$ after applying manipulation $\delta_k$ on the activation $\alpha_{x,k}$, where $\delta_{k,k}(\alpha_{x,k}) = \delta_k(\alpha_{x,k})$.

Fig. 2. Example of a set $\{\delta_1, \delta_2, \delta_3, \delta_4\}$ of valid manipulations in a 2-dimensional space.

Definition 3. A manipulation $\delta_k$ on an activation $\alpha_{x,k}$ is minimal if there do not exist manipulations $\delta_k^1$ and $\delta_k^2$ and an activation $\alpha_{y,k}$ such that $\delta_k^1(\alpha_{x,k}) \le \delta_k(\alpha_{x,k})$, $\alpha_{y,k} = \delta_k^1(\alpha_{x,k})$, $\delta_k(\alpha_{x,k}) = \delta_k^2(\alpha_{y,k})$, and $\alpha_{y,n} \ne \alpha_{x,n}$ and $\alpha_{y,n} \ne \delta_{k,n}(\alpha_{x,k})$.

Intuitively, a minimal manipulation does not have a finer manipulation that results in a different classification. However, it is possible to have different classifications before and after applying the minimal manipulation, i.e., it is possible that $\delta_{k,n}(\alpha_{x,k}) \ne \alpha_{x,n}$. It is not hard to see that the minimality of a manipulation implies that the class change in its associated hyper-rectangle can be detected by checking the class of the end points $\alpha_{x,k}$ and $\delta_k(\alpha_{x,k})$.

Bounded Variation. Recall that we apply manipulations in layer $L_k$, but check the classification decisions in the output layer. To ensure finite, exhaustive coverage of the region, we introduce a continuity assumption on the mapping from space $D_{L_k}$ to the output space $D_{L_n}$, adapted from the concept of bounded variation [9]. Given an activation $\alpha_{x,k}$ with its associated region $\eta_k(\alpha_{x,k})$, we define a "ladder" on $\eta_k(\alpha_{x,k})$ to be a set $ld$ of activations containing $\alpha_{x,k}$ and finitely many, possibly zero, activations from $\eta_k(\alpha_{x,k})$. The activations in a ladder can be arranged into an increasing order $\alpha_{x,k} = \alpha_{x_0,k} < \alpha_{x_1,k} < ... < \alpha_{x_j,k}$ such that every activation $\alpha_{x_t,k} \in ld$ appears once and has a successor $\alpha_{x_{t+1},k}$ such that $\alpha_{x_{t+1},k} = \delta_k(\alpha_{x_t,k})$ for some manipulation $\delta_k \in V(\alpha_{x_t,k})$. For the greatest element $\alpha_{x_j,k}$, its successor should be outside the region $\eta_k(\alpha_{x,k})$, i.e., $\alpha_{x_{j+1},k} \notin \eta_k(\alpha_{x,k})$. Given a ladder $ld$, we write $ld(t)$ for its $(t+1)$-th activation, $ld[0..t]$ for the prefix of $ld$ up to the $(t+1)$-th activation, and $last(ld)$ for the greatest element of $ld$. Figure 3 gives a diagrammatic explanation of ladders.

Definition 4. Let $\mathcal{L}(\eta_k(\alpha_{x,k}))$ be the set of ladders in $\eta_k(\alpha_{x,k})$. Then the total variation of the region $\eta_k(\alpha_{x,k})$ on the neural network with respect to $\mathcal{L}(\eta_k(\alpha_{x,k}))$ is

$$V(N; \eta_k(\alpha_{x,k})) = \sup_{ld \in \mathcal{L}(\eta_k(\alpha_{x,k}))} \sum_{\alpha_{x_t,k} \in ld \setminus \{last(ld)\}} \mathrm{diff}_n(\alpha_{x_t,n}, \alpha_{x_{t+1},n})$$

where $\mathrm{diff}_n : D_{L_n} \times D_{L_n} \to \{0, 1\}$ is given by $\mathrm{diff}_n(\alpha_{x,n}, \alpha_{y,n}) = 0$ if $\alpha_{x,n} = \alpha_{y,n}$ and 1 otherwise. We say that the region $\eta_k(\alpha_{x,k})$ is a bounded variation if $V(N; \eta_k(\alpha_{x,k})) < \infty$, and are particularly interested in the case when $V(N; \eta_k(\alpha_{x,k})) = 0$, which is called a 0-variation.

Fig. 3. Examples of ladders in region $\eta_k(\alpha_{x,k})$. Starting from $\alpha_{x,k} = \alpha_{x_0,k}$, the activations $\alpha_{x_1,k}, ..., \alpha_{x_j,k}$ form a ladder such that each consecutive activation results from some valid manipulation $\delta_k$ applied to a previous activation, and the successor $\alpha_{x_{j+1},k}$ of the final activation $\alpha_{x_j,k}$ is outside the region $\eta_k(\alpha_{x,k})$.
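For an explicitly enumerated finite set of ladders, the supremum in Definition 4 becomes a maximum, so the total variation can be computed directly. A sketch for ladders of input-layer activations ($k = 0$), reusing the `classify` helper from the earlier sketch:

```python
def total_variation(net, ladders):
    """Definition 4 over an explicitly enumerated finite set of ladders
    (the supremum becomes a maximum). Each ladder is an ordered list of
    input-layer activations alpha_{x_0}, ..., alpha_{x_j}."""
    def diff_n(a, b):                       # 0 iff the output classes agree
        return 0 if net.classify(a) == net.classify(b) else 1

    return max(
        sum(diff_n(ld[t], ld[t + 1]) for t in range(len(ld) - 1))
        for ld in ladders
    )

def is_zero_variation(net, ladders):
    """Safety wrt manipulations (Definition 5 below) asks for 0-variation
    of a complete and covering ladder set."""
    return total_variation(net, ladders) == 0
```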
The set $\mathcal{L}(\eta_k(\alpha_{x,k}))$ is complete if, for any ladder $ld \in \mathcal{L}(\eta_k(\alpha_{x,k}))$ of $j+1$ activations, any element $ld(t)$ for $0 \le t \le j$, and any manipulation $\delta_k \in V(ld(t))$, there exists a ladder $ld' \in \mathcal{L}(\eta_k(\alpha_{x,k}))$ such that $ld'[0..t] = ld[0..t]$ and $ld'(t+1) = \delta_k(ld(t))$. Intuitively, a complete ladder set is a complete tree, on which each node represents an activation and each branch of a node corresponds to a valid manipulation. From the root $\alpha_{x,k}$, every path of the tree leading to a leaf is a ladder. Moreover, the set $\mathcal{L}(\eta_k(\alpha_{x,k}))$ is covering if the polyhedra of all activations in it cover the region $\eta_k(\alpha_{x,k})$, i.e.,

$$\eta_k(\alpha_{x,k}) \subseteq \bigcup_{ld \in \mathcal{L}(\eta_k(\alpha_{x,k}))} \bigcup_{\alpha_{x_t,k} \in ld \setminus \{last(ld)\}} rec(V(\alpha_{x_t,k}), \alpha_{x_t,k}). \tag{1}$$

Based on the above, we have the following definition of safety with respect to a set of manipulations. Intuitively, we iteratively and nondeterministically apply manipulations to explore the region $\eta_k(\alpha_{x,k})$, and safety means that no class change is observed by successive application of such manipulations.

Definition 5. [Safety wrt Manipulations] Given a neural network $N$, an input $x$ and a set $\Delta_k$ of manipulations, we say that $N$ is safe for input $x$ with respect to the region $\eta_k$ and manipulations $\Delta_k$, written as $N, \eta_k, \Delta_k \models x$, if the region $\eta_k(\alpha_{x,k})$ is a 0-variation for the set $\mathcal{L}(\eta_k(\alpha_{x,k}))$ of its ladders, which is complete and covering.

It is straightforward to note that general safety in the sense of Definition 1 implies safety wrt manipulations in the sense of Definition 5.

Theorem 1. Given a neural network $N$, an input $x$, and a region $\eta_k$, we have that $N, \eta_k \models x$ implies $N, \eta_k, \Delta_k \models x$ for any set of manipulations $\Delta_k$.

In the opposite direction, we require the minimality assumption on manipulations.

Theorem 2. Given a neural network $N$, an input $x$, a region $\eta_k(\alpha_{x,k})$ and a set $\Delta_k$ of manipulations, we have that $N, \eta_k, \Delta_k \models x$ implies $N, \eta_k \models x$ if the manipulations in $\Delta_k$ are minimal.

Theorem 2 means that, under the minimality assumption over the manipulations, an exhaustive search through the complete and covering ladder tree from $\mathcal{L}(\eta_k(\alpha_{x,k}))$ can find adversarial examples, if any, and enable us to conclude that the network is safe at a given point if none are found. Though computing minimal manipulations is not practical, in discrete spaces, by iterating over increasingly refined manipulations, we are able to rule out the existence of adversarial examples in the region. This contrasts with partial exploration according to, e.g., [25,12]; for comparison see Section 7.

4 The Verification Framework

In this section we propose a novel framework for automated verification of safety of classification decisions, which is based on a search for an adversarial misclassification within a given region. The key distinctive features of our framework compared to existing work are: a guarantee that a misclassification is found if it exists; the propagation of the analysis layer by layer; and working with hidden layers, in addition to input and output layers.
Since we reduce verification to a search for adversarial examples, we can achieve safety verification (if no misclassifications are found for all layers) or falsification (in which case the adversarial examples can be used to fine-tune the network or shown to a human tester).

4.1 Layer-by-Layer Analysis

We first consider how to propagate the analysis layer by layer, which will involve refining manipulations through the hidden layers. To facilitate such analysis, in addition to the activation function $\phi_k : D_{L_{k-1}} \to D_{L_k}$ we also require a mapping $\psi_k : D_{L_k} \to D_{L_{k-1}}$ in the opposite direction, to represent how a manipulated activation of layer $L_k$ affects the activations of layer $L_{k-1}$. We can simply take $\psi_k$ as the inverse function of $\phi_k$. In order to propagate safety of regions $\eta_k(\alpha_{x,k})$ at a point $x$ into deeper layers, we assume the existence of functions $\eta_k$ that map activations to regions, and impose the following restrictions on the functions $\phi_k$ and $\psi_k$, shown diagrammatically in Figure 4.

Definition 6. The functions $\{\eta_0, \eta_1, ..., \eta_n\}$ and $\{\psi_1, ..., \psi_n\}$ mapping activations to regions are such that

1. $\eta_k(\alpha_{x,k}) \subseteq D_{L_k}$, for $k = 0, ..., n$,
2. $\alpha_{x,k} \in \eta_k(\alpha_{x,k})$, for $k = 0, ..., n$, and
3. $\eta_{k-1}(\alpha_{x,k-1}) \subseteq \psi_k(\eta_k(\alpha_{x,k}))$ for all $k = 1, ..., n$.

Intuitively, the first two conditions state that each function $\eta_k$ assigns a region around the activation $\alpha_{x,k}$, and the last condition states that mapping the region $\eta_k$ from layer $L_k$ to $L_{k-1}$ via $\psi_k$ should cover the region $\eta_{k-1}$. The aim is to compute functions $\eta_{k+1}, ..., \eta_n$ based on $\eta_k$ and the neural network.

Fig. 4. Layer-by-layer analysis according to Definition 6.

The size and complexity of a deep neural network generally mean that determining whether a given set $\Delta_k$ of manipulations is minimal is intractable. To partially counter this, we define a refinement relation between safety wrt manipulations for consecutive layers, in the sense that $N, \eta_k, \Delta_k \models x$ is a refinement of $N, \eta_{k-1}, \Delta_{k-1} \models x$ if all manipulations $\delta_{k-1}$ in $\Delta_{k-1}$ are refined by a sequence of manipulations $\delta_k$ from the set $\Delta_k$. Therefore, although we cannot theoretically confirm the minimality of $\Delta_k$, the manipulations are refined layer by layer and, in discrete settings, this process can be bounded from below by the unit step. Moreover, we can work gradually from a specific layer inwards until an adversarial example is found, finishing processing when reaching the output layer.

The refinement framework is given in Figure 5. The arrows represent the implication relations between the safety notions and are labelled with conditions if needed. The goal of the refinements is to find a chain of implications to justify $N, \eta_0 \models x$. The fact that $N, \eta_k \models x$ implies $N, \eta_{k-1} \models x$ is due to the constraints in Definition 6 when $\psi_k = \phi_k^{-1}$. The fact that $N, \eta_k \models x$ implies $N, \eta_k, \Delta_k \models x$ follows from Theorem 1. The implication from $N, \eta_k, \Delta_k \models x$ to $N, \eta_k \models x$ under the condition that $\Delta_k$ is minimal is due to Theorem 2.

We now define the notion of refinability of manipulations between layers.
Intuitively, a manipulation in layer $L_{k-1}$ is refinable in layer $L_k$ if there exists a sequence of manipulations in layer $L_k$ that implements the manipulation in layer $L_{k-1}$.

Definition 7. A manipulation $\delta_{k-1}(\alpha_{y,k-1})$ is refinable in layer $L_k$ if there exist activations $\alpha_{x_0,k}, ..., \alpha_{x_j,k} \in D_{L_k}$ and valid manipulations $\delta_k^1 \in V(\alpha_{x_0,k}), ..., \delta_k^j \in V(\alpha_{x_{j-1},k})$ such that $\alpha_{y,k} = \alpha_{x_0,k}$, $\delta_{k-1,k}(\alpha_{y,k-1}) = \alpha_{x_j,k}$, and $\alpha_{x_t,k} = \delta_k^t(\alpha_{x_{t-1},k})$ for $1 \le t \le j$. Given a neural network $N$ and an input $x$, the manipulations $\Delta_k$ are a refinement by layer of $\eta_{k-1}, \Delta_{k-1}$ and $\eta_k$ if, for all $\alpha_{y,k-1} \in \eta_{k-1}(\alpha_{x,k-1})$, all its valid manipulations $\delta_{k-1}(\alpha_{y,k-1})$ are refinable in layer $L_k$.

Fig. 5. Refinement framework.

We have the following theorem stating that the refinement of safety notions is implied by the "refinement by layer" relation.

Theorem 3. Assume a neural network $N$ and an input $x$. For all layers $k \ge 1$, if manipulations $\Delta_k$ are a refinement by layer of $\eta_{k-1}, \Delta_{k-1}$ and $\eta_k$, then we have that $N, \eta_k, \Delta_k \models x$ implies $N, \eta_{k-1}, \Delta_{k-1} \models x$.

We note that any adversarial example of safety wrt manipulations $N, \eta_k, \Delta_k \models x$ is also an adversarial example for general safety $N, \eta_k \models x$. However, an adversarial example $\alpha_{x,k}$ for $N, \eta_k \models x$ at layer $k$ needs to be checked to see if it is an adversarial example of $N, \eta_0 \models x$, i.e., for the input layer. Recall that $Pre_0(\alpha_{x,k})$ is not necessarily unique. This is equivalent to checking the emptiness of $Pre_0(\alpha_{x,k}) \cap \eta_0(\alpha_{x,0})$. If we start the analysis with a hidden layer $k > 0$ and there is no specification for $\eta_0$, we can instead consider checking the emptiness of $\{\alpha_{y,0} \in Pre_0(\alpha_{x,k}) \mid \alpha_{y,n} \ne \alpha_{x,n}\}$.

4.2 The Verification Method

We summarise the theory developed thus far as a search-based recursive verification procedure given below. The method is parameterised by the region $\eta_k$ around a given point and a family of manipulations $\Delta_k$. The manipulations are specified by the user for the classification problem at hand, or alternatively can be selected automatically, as described in Section 4.4. The vector norm to identify the region can also be specified by the user and can vary by layer. The method can start in any layer, with analysis propagated into deeper layers, and terminates when a misclassification is found. If an adversarial example is found by manipulating a hidden layer, it can be mapped back to the input layer; see Section 4.5.

Algorithm 1. Given a neural network $N$ and an input $x$, recursively perform the following steps, starting from some layer $l \ge 0$. Let $k \ge l$ be the current layer under consideration.

1. Determine a region $\eta_k$ such that if $k > l$ then $\eta_k$ and $\eta_{k-1}$ satisfy Definition 6.
2. Determine a manipulation set $\Delta_k$ such that if $k > l$ then $\Delta_k$ is a refinement by layer of $\eta_{k-1}, \Delta_{k-1}$ and $\eta_k$ according to Definition 7.
3. Verify whether $N, \eta_k, \Delta_k \models x$:
   (a) if $N, \eta_k, \Delta_k \models x$ then
       i. report that $N$ is safe at $x$ with respect to $\eta_k(\alpha_{x,k})$ and $\Delta_k$, and
       ii. continue to layer $k + 1$;
   (b) if $N, \eta_k, \Delta_k \not\models x$, then report an adversarial example.
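The control flow of Algorithm 1 can be summarised in the following Python skeleton (the helper functions are placeholders standing in for the constructions of Definitions 6 and 7 and the 0-variation check, not DLV's actual API):

```python
def verify(net, x, start_layer=0):
    """Search-based recursive verification (Algorithm 1 sketch).
    select_region, refine_region, select_manipulations,
    refine_manipulations and check_zero_variation are assumed helpers
    standing in for the constructions of Sections 3 and 4.4."""
    eta_prev, delta_prev = None, None
    for k in range(start_layer, net.num_layers + 1):
        if k == start_layer:
            eta = select_region(net, x, k)                       # step 1
            delta = select_manipulations(net, x, k, eta)         # step 2
        else:
            eta = refine_region(net, x, k, eta_prev)             # Def. 6
            delta = refine_manipulations(net, x, k,
                                         eta, eta_prev, delta_prev)  # Def. 7
        counterexample = check_zero_variation(net, x, k, eta, delta)  # step 3
        if counterexample is not None:
            return "unsafe", counterexample    # adversarial example found
        eta_prev, delta_prev = eta, delta      # safe at layer k; go inwards
    return "safe", None
```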
We implement Algorithm 1 by utilising satisfiability modulo theories (SMT) solvers. The SMT problem is a decision problem for logical formulas with respect to combinations of background theories expressed in classical first-order logic with equality. For checking refinement by layer, we use the theory of linear real arithmetic with existential and universal quantifiers, and for verification within a layer (0-variation) we use the same theory but without universal quantification. The details of the encoding and the approach taken to compute the regions and manipulations are included in Section 4.4. To enable practical verification of deep neural networks, we employ a number of heuristics described in the remainder of this section.

4.3 Feature Decomposition and Discovery

While Theorems 1 and 2 provide a finite way to verify safety of neural network classification decisions, the high dimensionality of the region $\eta_k(\alpha_{x,k})$ can make any computational approach impractical. We therefore use the concept of a feature to partition the region $\eta_k(\alpha_{x,k})$ into a set of features, and exploit their independence and low dimensionality. This allows us to work with state-of-the-art networks that have hundreds, and even thousands, of dimensions.

Intuitively, a feature defines for each point in the high-dimensional space $D_{L_k}$ the most explicit salient feature it has, e.g., the red-coloured frame of a street sign in Figure 10. Formally, for each layer $L_k$, a feature function $f_k : D_{L_k} \to \mathcal{P}(D_{L_k})$ assigns a small region to each activation $\alpha_{x,k}$ in the space $D_{L_k}$, where $\mathcal{P}(D_{L_k})$ is the set of subspaces of $D_{L_k}$. The region $f_k(\alpha_{x,k})$ may have lower dimension than that of $D_{L_k}$. It has been argued, e.g. in [16] for natural images, that natural data, for example natural images and sound, forms a high-dimensional manifold, which embeds tangled manifolds to represent their features. Feature manifolds usually have lower dimension than the data manifold, and a classification algorithm is to separate a set of tangled manifolds.

By assuming that the appearance of features is independent, we can manipulate them one by one regardless of the manipulation order, and thus reduce a problem of size $O(2^{d_1 + ... + d_m})$ into a set of smaller problems of size $O(2^{d_1}), ..., O(2^{d_m})$. The analysis of activations in hidden layers, as performed by our method, provides an opportunity to discover the features automatically. Moreover, defining the feature $f_k$ on each activation as a single region corresponding to a specific feature is without loss of generality: although an activation may include multiple features, the independence relation between features suggests the existence of a total relation between these features. The function $f_k$ essentially defines for each activation one particular feature, subject to certain criteria such as explicit knowledge, but features can also be explored in parallel.

Every feature $f_k(\alpha_{y,k})$ is identified by a pre-specified number $dims_{k,f}$ of dimensions. Let $dims_k(f_k(\alpha_{y,k}))$ be the set of dimensions selected according to some heuristic. Then we have that

$$f_k(\alpha_{y,k})(p) = \begin{cases} \eta_k(\alpha_{x,k})(p) & \text{if } p \in dims_k(f_k(\alpha_{y,k})) \\ [\alpha_{y,k}(p), \alpha_{y,k}(p)] & \text{otherwise.} \end{cases} \tag{2}$$
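Equation (2) freezes every dimension outside the feature at its current activation value; a small sketch under our own representation of regions as per-dimension bounds:

```python
import numpy as np

def feature_region(eta_bounds, alpha, feature_dims):
    """Eq. (2) sketch: restrict the region to the feature's dimensions.
    eta_bounds maps a dimension p to its (lo, hi) interval in
    eta_k(alpha_{x,k}); all other dimensions are frozen at alpha(p)."""
    lo = np.array(alpha, dtype=float)
    hi = np.array(alpha, dtype=float)
    for p in feature_dims:
        lo[p], hi[p] = eta_bounds[p]   # feature dimension keeps its interval
    return lo, hi                      # [alpha(p), alpha(p)] elsewhere
```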
Moreover, we need a set of features to partition the region $\eta_k(\alpha_{x,k})$, as follows.

Definition 8. A set $\{f_1, ..., f_m\}$ of regions is a partition of $\eta_k(\alpha_{x,k})$, written as $\pi(\eta_k(\alpha_{x,k}))$, if $dims_{k,f}(f_i) \cap dims_{k,f}(f_j) = \emptyset$ for $i \ne j \in \{1, ..., m\}$ and $\eta_k(\alpha_{x,k}) = \times_{i=1}^m f_i$.

Given such a partition $\pi(\eta_k(\alpha_{x,k}))$, we define a function $acts(x, k)$ by

$$acts(x, k) = \{\alpha_{y,k} \in f \mid f \in \pi(\eta_k(\alpha_{x,k}))\} \tag{3}$$

which contains one point for each feature. Then, we reduce the checking of 0-variation of a region $\eta_k(\alpha_{x,k})$ to the following problems:

– checking whether the points in $acts(x, k)$ have the same class as $\alpha_{x,k}$, and
– checking the 0-variation of all features in $\pi(\eta_k(\alpha_{x,k}))$.

In the above procedure, the checking of points in $acts(x, k)$ can be conducted either by following a pre-specified sequential order (single-path search) or by exhaustively searching all possible orders (multi-path search). In Section 5 we demonstrate that single-path search according to the prominence of features can enable us to find adversarial examples, while multi-path search may find other examples whose distance to the original input image is smaller.

4.4 Selection of Regions and Manipulations

The procedure summarised in Algorithm 1 is typically invoked for a given image in the input layer, but, provided insight about hidden layers is available, it can start from any layer $L_l$ in the network. The selection of regions can be automated, as described below.

For the first layer to be considered, i.e., $k = l$, the region $\eta_k(\alpha_{x,k})$ is defined by first selecting the subset of $dims_k$ dimensions from $P_k$ whose activation values are furthest away from the average activation value of the layer. (We also considered other approaches, including computing derivatives up to several layers, but for the experiments we conduct they are less effective.) Intuitively, the knowledge represented by these activations is more explicit than the knowledge represented by the other dimensions, and manipulations over more explicit knowledge are more likely to result in a class change. Let $avg_k = (\sum_{p \in P_k} \alpha_{x,k}(p)) / n_k$ be the average activation value of layer $L_k$. We let $dims_k(\eta_k(\alpha_{x,k}))$ be the first $dims_k$ dimensions $p \in P_k$ with the greatest values $|\alpha_{x,k}(p) - avg_k|$ among all dimensions, and then define

$$\eta_k(\alpha_{x,k}) = \times_{p \in dims_k(\eta_k(\alpha_{x,k}))} [\alpha_{x,k}(p) - s_p * m_p, \alpha_{x,k}(p) + s_p * m_p] \tag{4}$$

i.e., a $dims_k$-polytope containing the activation $\alpha_{x,k}$, where $s_p$ represents a small span and $m_p$ represents the number of such spans. Let $V_k = \{s_p, m_p \mid p \in dims_k(\eta_k(\alpha_{x,k}))\}$ be a set of variables.

Let $d$ be a function mapping from $dims_k(\eta_k(\alpha_{x,k}))$ to $\{-1, 0, +1\}$ such that $d(p) \ne 0$ for at least one $p \in dims_k(\eta_k(\alpha_{x,k}))$, and let $D(dims_k(\eta_k(\alpha_{x,k})))$ be the set of such functions. Let a manipulation $\delta_k^d$ be

$$\delta_k^d(\alpha_{y,k})(p) = \begin{cases} \alpha_{y,k}(p) - s_p & \text{if } d(p) = -1 \\ \alpha_{y,k}(p) & \text{if } d(p) = 0 \\ \alpha_{y,k}(p) + s_p & \text{if } d(p) = +1 \end{cases} \tag{5}$$

for activation $\alpha_{y,k} \in \eta_k(\alpha_{x,k})$. That is, each manipulation changes a subset of the dimensions by the span $s_p$, according to the directions given in $d$. The set $\Delta_k$ is defined by collecting the set of all such manipulations. Based on this, we can define a set $\mathcal{L}(\eta_k(\alpha_{x,k}))$ of ladders, which is complete and covering.
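Equations (4) and (5) translate directly into code; the sketch below (our own helper names; the activation is a NumPy vector and the spans $s_p$ are given as a dictionary) selects the dimensions furthest from the layer average and enumerates the manipulations $\delta_k^d$:

```python
import itertools
import numpy as np

def select_dims(activation, num_dims):
    """Eq. (4) dimension selection: the num_dims dimensions whose values
    are furthest from the layer average avg_k."""
    avg = activation.mean()
    return list(np.argsort(-np.abs(activation - avg))[:num_dims])

def manipulations(dims, span):
    """Eq. (5): one manipulation per direction function d mapping each
    selected dimension to -1, 0 or +1 (the all-zero d is excluded)."""
    for d in itertools.product((-1, 0, 1), repeat=len(dims)):
        if any(d):
            yield {p: di * span[p] for p, di in zip(dims, d)}

def apply_manipulation(activation, move):
    """Apply delta_k^d: shift each chosen dimension by its signed span."""
    out = activation.copy()
    for p, step in move.items():
        out[p] += step
    return out
```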
Determining the region $\eta_k$ according to $\eta_{k-1}$. Given $\eta_{k-1}(\alpha_{x,k-1})$ and the functions $\phi_k$ and $\psi_k$, we can automatically determine a region $\eta_k(\alpha_{x,k})$ satisfying Definition 6 using the following approach. According to the function $\phi_k$, the activation value $\alpha_{x,k}(p)$ of perceptron $p \in P_k$ is computed from the activation values of a subset of perceptrons in $P_{k-1}$. We let $Vars(p) \subseteq P_{k-1}$ be such a set of perceptrons. The selection of dimensions in $dims_k(\eta_k(\alpha_{x,k}))$ depends on $dims_{k-1}(\eta_{k-1}(\alpha_{x,k-1}))$ and $\phi_k$, by requiring that, for every $p' \in dims_{k-1}(\eta_{k-1}(\alpha_{x,k-1}))$, there is at least one dimension $p \in dims_k(\eta_k(\alpha_{x,k}))$ such that $p' \in Vars(p)$. We let

$$dims_k(\eta_k(\alpha_{x,k})) = \{\arg\max_{p \in P_k} \{|\alpha_{x,k}(p) - avg_k| \mid p' \in Vars(p)\} \mid p' \in dims_{k-1}(\eta_{k-1}(\alpha_{x,k-1}))\} \tag{6}$$

Therefore, the restriction of Definition 6 can be expressed with the following formula:

$$\forall \alpha_{y,k-1} \in \eta_{k-1}(\alpha_{x,k-1}) : \alpha_{y,k-1} \in \psi_k(\eta_k(\alpha_{x,k})). \tag{7}$$

We omit the details of rewriting $\alpha_{y,k-1} \in \eta_{k-1}(\alpha_{x,k-1})$ and $\alpha_{y,k-1} \in \psi_k(\eta_k(\alpha_{x,k}))$ into Boolean expressions, which follow from standard techniques. Note that this expression includes variables in $V_k$, $V_{k-1}$ and $\alpha_{y,k-1}$. The variables in $V_{k-1}$ are fixed for a given $\eta_{k-1}(\alpha_{x,k-1})$. Because such a region $\eta_k(\alpha_{x,k})$ always exists, a simple iterative procedure can be invoked to gradually increase the size of the region represented with the variables in $V_k$ to eventually satisfy the expression.

Determining the manipulation set $\Delta_k$ according to $\eta_k(\alpha_{x,k})$, $\eta_{k-1}(\alpha_{x,k-1})$, and $\Delta_{k-1}$. The values of the variables $V_k$ obtained from the satisfiability of Eqn (7) yield a definition of manipulations using Eqn (5). However, the obtained values for the span variables $s_p$ do not necessarily satisfy the "refinement by layer" relation as defined in Definition 7. Therefore, we need to adapt the values of the variables $V_k$ while, at the same time, retaining the region $\eta_k(\alpha_{x,k})$. To do so, we could rewrite the constraint in Definition 7 into a formula, which can then be solved by an SMT solver. But, in practice, we notice that such precise computations easily lead to overly small spans $s_p$, which in turn result in an unacceptable amount of computation needed to verify the relation $N, \eta_k, \Delta_k \models x$. To reduce the computational cost, we work with a weaker "refinable in layer $L_k$" notion, parameterised with respect to precision $\varepsilon$. Given two activations $\alpha_{y,k}$ and $\alpha_{m,k}$, we use $dist(\alpha_{y,k}, \alpha_{m,k})$ to represent their distance.

Definition 9. A manipulation $\delta_{k-1}(\alpha_{y,k-1})$ is refinable in layer $L_k$ with precision $\varepsilon > 0$ if there exists a sequence of activations $\alpha_{x_0,k}, ..., \alpha_{x_j,k} \in D_{L_k}$ and valid manipulations $\delta_k^1 \in V(\alpha_{x_0,k}), ..., \delta_k^j \in V(\alpha_{x_{j-1},k})$ such that $\alpha_{y,k} = \alpha_{x_0,k}$, $\delta_{k-1,k}(\alpha_{y,k-1}) \in rec(\alpha_{x_{j-1},k}, \alpha_{x_j,k})$, $dist(\alpha_{x_{j-1},k}, \alpha_{x_j,k}) \le \varepsilon$, and $\alpha_{x_t,k} = \delta_k^t(\alpha_{x_{t-1},k})$ for $1 \le t \le j$. Given a neural network $N$ and an input $x$, the manipulations $\Delta_k$ are a refinement by layer of $\eta_k$, $\eta_{k-1}$, $\Delta_{k-1}$ with precision $\varepsilon$ if, for all $\alpha_{y,k-1} \in \eta_{k-1}(\alpha_{x,k-1})$, all its valid manipulations $\delta_{k-1}(\alpha_{y,k-1})$ are refinable in layer $L_k$ with precision $\varepsilon$.
Comparing with Definition 7, the above definition replaces $\delta_{k-1,k}(\alpha_{y,k-1}) = \alpha_{x_j,k}$ with $\delta_{k-1,k}(\alpha_{y,k-1}) \in rec(\alpha_{x_{j-1},k}, \alpha_{x_j,k})$ and $dist(\alpha_{x_{j-1},k}, \alpha_{x_j,k}) \le \varepsilon$. Intuitively, instead of requiring a manipulation to reach the activation $\delta_{k-1,k}(\alpha_{y,k-1})$ precisely, this definition allows each $\delta_{k-1,k}(\alpha_{y,k-1})$ to be within the hyper-rectangle $rec(\alpha_{x_{j-1},k}, \alpha_{x_j,k})$. To find suitable values for $V_k$ according to the approximate "refinement by layer" relation, we use a variable $h$ to represent the maximal number of manipulations of layer $L_k$ used to express a manipulation in layer $L_{k-1}$. The value of $h$ (and the variables $s_p$ and $m_p$ in $V_k$) are automatically adapted to ensure the satisfiability of the following formula, which expresses the constraints of Definition 9:

$$\forall \alpha_{y,k-1} \in \eta_{k-1}(\alpha_{x,k-1})\ \forall d \in D(dims_{k-1}(\eta_{k-1}(\alpha_{x,k-1})))\ \forall \delta_{k-1}^d \in V_{k-1}(\alpha_{y,k-1})\ \exists \alpha_{y_0,k}, ..., \alpha_{y_h,k} \in \eta_k(\alpha_{x,k}) :\quad \alpha_{y_0,k} = \alpha_{y,k} \wedge \bigwedge_{t=0}^{h-1} \alpha_{y_{t+1},k} = \delta_k^d(\alpha_{y_t,k}) \wedge \bigvee_{t=0}^{h-1} \left(\delta_{k-1,k}^d(\alpha_{y,k-1}) \in rec(\alpha_{y_t,k}, \alpha_{y_{t+1},k}) \wedge dist(\alpha_{y_t,k}, \alpha_{y_{t+1},k}) \le \varepsilon\right). \tag{8}$$

Note that $s_p$ and $m_p$ for $p \in dims_k(\eta_k(\alpha_{x,k}))$ are employed when expressing $\delta_k^d$. The manipulation $\delta_k^d$ is obtained from $\delta_{k-1}^d$ by considering the corresponding relation between dimensions in $dims_k(\eta_k(\alpha_{x,k}))$ and $dims_{k-1}(\eta_{k-1}(\alpha_{x,k-1}))$. Adversarial examples shown in Figures 8, 9, and 10 were found using single-path search and automatic selection of regions and manipulations.

4.5 Mapping Back to Input Layer

When manipulating the hidden layers, we may need to map an activation in layer $k$ back to the input layer to obtain an input image that resulted in misclassification, which involves computation of $Pre_0(\alpha_{y,k})$, described next. To check the 0-variation of a region $\eta_k(\alpha_{x,k})$, we need to compute $\mathrm{diff}_n(\alpha_{x,n}, \alpha_{y,n})$ for many points $\alpha_{y,k}$ in $\eta_k(\alpha_{x,k})$. Because $\alpha_{x,n}$ is known, we only need to compute $\alpha_{y,n}$. We can compute $\alpha_{y,n}$ by finding a point $\alpha_{y,0} \in Pre_0(\alpha_{y,k})$ and then using the neural network to predict the value $\alpha_{y,n}$. It should be noted that, although $Pre_0(\alpha_{y,k})$ may include more than one point, all points have the same class, so any point in $Pre_0(\alpha_{y,k})$ is sufficient for our purpose.

To compute $\alpha_{y,0}$ from $\alpha_{y,k}$, we use the functions $\psi_k, \psi_{k-1}, ..., \psi_1$ and compute points $\alpha_{y,k-1}, \alpha_{y,k-2}, ..., \alpha_{y,0}$ such that $\alpha_{y,j-1} = \psi_j(\alpha_{y,j}) \wedge \alpha_{y,j-1} \in \eta_{j-1}(\alpha_{x,j-1})$ for $1 \le j \le k$. The computation relies on an SMT solver to encode the functions $\psi_k, \psi_{k-1}, ..., \psi_1$ if they are piecewise linear functions, and on taking the corresponding inverse functions directly if they are sigmoid functions. It is possible that, for some $1 \le j \le k$, no point can be found by the SMT solver, which means that the point $\alpha_{y,k}$ does not have any corresponding point in the input layer. We can safely discard such points.

The maxpooling function selects from every $m * m$ dimensions the maximal element, for some $m > 0$. The computation of the maxpooling layer $\psi_{j-1}$ is combined with the computation of the next layer $\psi_j$, that is, we find $\alpha_{y,j-2}$ with the following expression:

$$\exists \alpha_{y,j-1} : \alpha_{y,j-2} = \psi_{j-1}(\psi_j(\alpha_{y,j})) \wedge \alpha_{y,j-1} \in \eta_{j-1}(\alpha_{x,j-1}) \wedge \alpha_{y,j-2} \in \eta_{j-2}(\alpha_{x,j-2})$$

This is to ensure that in the expression $\alpha_{y,j-2} = \psi_{j-1}(\psi_j(\alpha_{y,j}))$ we can reuse $m * m - 1$ elements in $\alpha_{x,j-2}$ and only need to replace the maximal element.
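As a simplified illustration of the mapping back, consider fully-connected sigmoid layers, for which $\psi_j$ can be taken as the analytical inverse of the sigmoid followed by a linear solve (a sketch under that assumption only; DLV instead encodes piecewise-linear layers such as ReLU with the SMT solver, and the membership constraints $\alpha_{y,j-1} \in \eta_{j-1}(\alpha_{x,j-1})$ are omitted here):

```python
import numpy as np

def invert_sigmoid_layer(W, b, alpha_j):
    """psi_j for a fully-connected sigmoid layer alpha_j = sigmoid(a @ W + b):
    invert the sigmoid analytically, then solve the linear system in a
    least-squares sense (a sketch; region constraints are not enforced)."""
    z = np.log(alpha_j / (1.0 - alpha_j))        # logit: inverse of sigmoid
    a, *_ = np.linalg.lstsq(W.T, z - b, rcond=None)
    return a

def map_back_to_input(layers, alpha_k):
    """Compute a candidate point alpha_{y,0} in Pre_0(alpha_{y,k}) by
    applying the inverse mappings psi_k, ..., psi_1 in turn; layers is
    the list of (weights, bias) pairs for layers 1..k."""
    a = alpha_k
    for W, b in reversed(layers):
        a = invert_sigmoid_layer(W, b, a)
    return a
```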
Figures 8, 9, and 10 show images obtained by mapping back from the first hidden layer to the input layer.

5 Experimental Results

The proposed framework has been implemented as a software tool called DLV (Deep Learning Verification) [2] written in Python; see the Appendix of [20] for details of the input parameters and how to use the tool. The SMT solver we employ is Z3 [8], which has Python APIs. The neural networks are built from the widely-used neural network library Keras [3] with the deep learning package Theano [6] as its backend.

We validate DLV on a set of experiments performed for neural networks trained for classification based on a predefined multi-dimensional surface (small size networks), as well as image classification (medium size networks). These networks respectively use two representative types of layers: fully connected layers and convolutional layers. They may also use other types of layers, e.g., the ReLU layer, the pooling layer, the zero-padding layer, and the dropout layer. The first three experiments demonstrate the single-path search functionality on the Euclidean ($L^2$) norm, whereas the fourth (GTSRB) demonstrates multi-path search for the $L^1$ and $L^2$ norms. The experiments are conducted on a MacBook Pro laptop, with 2.7 GHz Intel Core i5 CPU and 8 GB memory.

Two-Dimensional Point Classification Network. To demonstrate exhaustive verification facilitated by our framework, we consider a neural network trained for classifying points above and below a two-dimensional curve shown in red in Figure 6 and Figure 7. The network has three fully-connected hidden layers with the ReLU activation function. The input layer has two perceptrons, every hidden layer has 20 perceptrons, and the output layer has two perceptrons. The network is trained with 5,000 points sampled from the provided two-dimensional space, and has an accuracy of more than 99%.

For a given input $x = (3.59, 1.11)$, we start from the input layer and define a region around this point by taking unit steps in both directions:

$$\eta_0(\alpha_{x,0}) = [3.59 - 1.0, 3.59 + 1.0] \times [1.11 - 1.0, 1.11 + 1.0] = [2.59, 4.59] \times [0.11, 2.11]$$

The manipulation set $\Delta_0$ is shown in Figure 6: there are 9 points, of which the point in the middle represents the activation $\alpha_{x,0}$ and the other 8 points represent the activations resulting from applying one of the manipulations in $\Delta_0$ on $\alpha_{x,0}$. Note that, although there are class changes in the region $\eta_0(\alpha_{x,0})$, the manipulation set $\Delta_0$ is not able to detect such changes. Therefore, we have that $N, \eta_0, \Delta_0 \models x$.

Now consider layer $k = 1$. To obtain the region $\eta_1(\alpha_{x,1})$, the tool selects two dimensions $p_{1,17}, p_{1,19} \in P_1$ in layer $L_1$ with indices 17 and 19 and computes

$$\eta_1(\alpha_{x,1}) = [\alpha_{x,1}(p_{1,17}) - 3.6, \alpha_{x,1}(p_{1,17}) + 3.6] \times [\alpha_{x,1}(p_{1,19}) - 3.52, \alpha_{x,1}(p_{1,19}) + 3.52]$$

The manipulation set $\Delta_1$, after mapping back to the input layer with function $\psi_1$, is shown in Figure 7. Note that $\eta_1$ and $\eta_0$ satisfy Definition 6, and $\Delta_1$ is a refinement by layer of $\eta_0, \Delta_0$ and $\eta_1$. We can see that a class change can be detected (represented as the red coloured point in Figure 7). Therefore, we have that $N, \eta_1, \Delta_1 \not\models x$.

Fig. 6. Input layer. Fig. 7. First hidden layer.
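For orientation, a network of the stated shape can be reconstructed in Keras, the library DLV builds on (a plausible sketch only: the curve, the sampling range and the training setup below are our assumptions, not the paper's):

```python
# A plausible Keras reconstruction of the two-dimensional point
# classification network (2 inputs, three hidden layers of 20 ReLU
# perceptrons, 2 outputs); training details are assumptions.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils

def curve(x):  # hypothetical stand-in for the red curve in Figs. 6-7
    return 1.0 + 0.5 * np.sin(x)

X = np.random.uniform(0.0, 6.0, size=(5000, 2))
labels = (X[:, 1] > curve(X[:, 0])).astype(int)   # class: above/below
Y = np_utils.to_categorical(labels, 2)

model = Sequential([
    Dense(20, activation='relu', input_shape=(2,)),
    Dense(20, activation='relu'),
    Dense(20, activation='relu'),
    Dense(2, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(X, Y, epochs=50, verbose=0)
```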
Image Classification Network for the MNIST Handwritten Image Dataset. The well-known MNIST image dataset contains images of size 28 × 28 with one channel, and the network is trained with the source code given in [5]. The trained network is of medium size with 600,810 parameters, has an accuracy of more than 99%, and is state-of-the-art. It has 12 layers, within which there are 2 convolutional layers, as well as layers such as ReLU, dropout, fully-connected layers and a softmax layer. The images are preprocessed to make the value of each pixel within the bound $[0, 1]$.

Given an image $x$, we start with layer $k = 1$ and the parameter set to at most 150 dimensions (there are 21,632 dimensions in layer $L_1$). All $\eta_k, \Delta_k$ for $k \ge 2$ are computed according to the simple heuristic mentioned in Section 4.2 and satisfy Definition 6 and Definition 7. For the region $\eta_1(\alpha_{x,1})$, we allow changes to the activation value of each selected dimension that are within $[-1, 1]$. The set $\Delta_1$ includes manipulations that can change the activation value for a subset of the 150 dimensions, by incrementing or decrementing the value for each dimension by 1. The experimental results show that for most of the examples we can find a class change within 100 dimensional changes in layer $L_1$, by comparing the number of pixels that have changed, and some of them need fewer than 30 dimensional changes. Figure 8 presents examples of such class changes for layer $L_1$. We also experiment on images with up to 40 dimensional changes in layer $L_1$; the tool is able to check the entire network, reaching the output layer and claiming that $N, \eta_k, \Delta_k \models x$ for all $k \ge 1$. While training of the network takes half an hour, finding an adversarial example takes up to several minutes.

Fig. 8. Adversarial examples for a neural network trained on MNIST: 8 to 0, 2 to 1, 4 to 2, 2 to 3, 9 to 4, 6 to 5, 4 to 6, 9 to 7, 0 to 8, 7 to 9.

Image Classification Network for the CIFAR-10 Small Image Dataset. We work with a medium size neural network, trained with the source code from [1] for more than 12 hours on the well-known CIFAR10 dataset. The inputs to the network are images of size 32 × 32 with three channels. The trained network has 1,250,858 real-valued parameters and includes convolutional layers, ReLU layers, max-pooling layers, dropout layers, fully-connected layers, and a softmax layer.

As an illustration of the type of perturbations that we are investigating, consider the images in Figure 9, which correspond to the parameter setting of up to 25, 45, 65, 85, 105, 125, 145 dimensions, respectively, for layer $k = 1$. The manipulations change the activation values of these dimensions. Each image is obtained by mapping back from the first hidden layer and represents a point close to the boundary of the corresponding region. The relation $N, \eta_1, \Delta_1 \models x$ holds for the first 7 images, but fails for the last one, and the image is classified as a truck. Intuitively, our choice of the region $\eta_1(\alpha_{x,1})$ identifies the subset of dimensions with the most extreme activations, taking advantage of the analytical capability of the first hidden layer.
A higher number of selected dimensions implies a larger region in which we apply manipulations, and, more importantly, suggests a more dramatic change to the knowledge represented by the activations when moving to the boundary of the region.

Fig. 9. An illustrative example of mapping back to the input layer from the CIFAR-10 dataset: the last image classifies as a truck.

We also work with 500 dimensions and otherwise the same experimental parameters as for MNIST. Figure 13 in the Appendix of [20] gives 16 pairs of original images (classified correctly) and perturbed images (classified wrongly). We found that, while the manipulations lead to human-recognisable modifications to the images, the perturbed images can be classified wrongly by the network. For each image, finding an adversarial example ranges from seconds to 20 minutes.

Image Classification Network for the ImageNet Dataset. We also conduct experiments on a large image classification network trained on the popular ImageNet dataset. The images are of size 224 × 224 and have three channels. The network is the model of the 16-layer network [34], called VGG16, used by the VGG team in the ILSVRC-2014 competition, downloaded from [7]. The trained network has 138,357,544 real-valued parameters and includes convolutional layers, ReLU layers, zero-padding layers, dropout layers, max-pooling layers, fully-connected layers, and a softmax layer. The experimental parameters are the same as for the previous two experiments, except that we work with 20,000 dimensions.

Several additional pairs of original and perturbed images are included in Figure 14 in the Appendix of [20]. In Figure 10 we also give two examples of street sign images. The image on the left is reported unsafe for the second layer with 6,346 dimensional changes (0.2% of the 3,211,264 dimensions of layer $L_2$). The one on the right is reported safe for 20,000 dimensional changes of layer $L_2$. It appears that more complex manipulations, involving more dimensions (perceptrons), are needed in this case to cause a class change.

Fig. 10. Street sign images. An adversarial example was found for the left image (class changed into bird house), but no adversarial example could be found for the right image for 20,000 dimensions.

5.1 The German Traffic Sign Recognition Benchmark (GTSRB)

We evaluate DLV on the GTSRB dataset (by resizing images into size 32 × 32), which has 43 classes. Figure 11 presents the results for the multi-path search. The first case (approx. 20 minutes to manipulate) is a stop sign (confidence 1.0) changed into a speed limit of 30 miles, with an $L^1$ distance of 0.045 and an $L^2$ distance of 0.19. The confidence of the manipulated image is 0.79. The second, easy, case (seconds to manipulate) is a speed limit of 80 miles (confidence 0.999964) changed into a speed limit of 30 miles, with an $L^1$ distance of 0.004 and an $L^2$ distance of 0.06. The confidence of the manipulated image is 0.99 (a very high confidence of misclassification). Also, a "go right" sign can be easily manipulated into a sign classified as "go straight". Figure 16 in [20] presents additional adversarial examples obtained when selecting single-path search.

Fig. 11. Adversarial examples for the network trained on the GTSRB dataset by multi-path search: "stop" to "30m speed limit", "80m speed limit" to "30m speed limit", "go right" to "go straight".

6 Comparison

We compare our approach with two existing approaches for finding adversarial examples, i.e., the fast gradient sign method (FGSM) [36] and the Jacobian saliency map algorithm (JSMA) [28].
6 Comparison

We compare our approach with two existing approaches for finding adversarial examples, namely the fast gradient sign method (FGSM) [36] and the Jacobian saliency map algorithm (JSMA) [28]. FGSM calculates the optimal attack for a linear approximation of the network cost, whereas DLV explores a proportion of the dimensions in the feature space of the input or hidden layers. JSMA finds a set of dimensions in the input layer to manipulate, according to a linear approximation (obtained by computing the Jacobian matrix) of the model from the current output to a nominated target output. Intuitively, the difference between DLV's manipulation and JSMA is that DLV manipulates features discovered in the activations of the hidden layer, while JSMA manipulates according to the partial derivatives, which depend on the parameters of the network.

Experiment 1. We randomly select an image from the MNIST dataset. Figure 12 shows some intermediate and final images obtained by running the three approaches: FGSM, JSMA and DLV. FGSM has a single parameter, ε, where a greater ε represents a greater perturbation along the gradient of the cost function. Given an ε, a perturbed example is returned for each input example, and we test whether it is an adversarial example by checking for misclassification against the original image. We gradually increase the parameter through ε = 0.05, 0.1, 0.2, 0.3, 0.4, with the last image (i.e., ε = 0.4) witnessing a class change; see the images in the top row of Figure 12. FGSM can efficiently manipulate a set of images, but it requires a relatively large manipulation to find a misclassification.

For the JSMA approach, we conduct the experiment with parameters ε = 0.1 and θ = 1.0. The parameter ε = 0.1 means that we only consider adversarial examples changing no more than 10% of all the pixels, which is sufficient here. As stated in [29], the parameter θ = 1.0, which allows a maximum change to every pixel, ensures that fewer pixels need to be changed. The approach takes a series of manipulations that gradually lead to a misclassification; see the images in the middle row of Figure 12. The misclassified image has an L2 (Euclidean) distance of 0.17 and an L1 (Manhattan) distance of 0.03 from the original image. While JSMA can find adversarial examples with a smaller distance from the original image, it takes longer to manipulate a set of images.

Both FGSM and JSMA follow their specific heuristics to deterministically explore the space of images. However, in some cases the heuristics may miss better adversarial examples. In the experiment for DLV, instead of giving features a specific order and manipulating them sequentially, we allow the program to choose features nondeterministically. This is currently done by MCTS (Monte Carlo Tree Search), which has a theoretical guarantee of convergence for infinite sampling. Therefore, the high-dimensional space is explored by following many different paths.

Fig. 12. FGSM vs. JSMA vs. DLV, where FGSM and JSMA search a single path and DLV multiple paths. Top row: original image (7) perturbed deterministically by FGSM with ε = 0.05, 0.1, 0.2, 0.3, 0.4, with the final image (i.e., ε = 0.4) misclassified as 9. Middle row: original image (7) perturbed deterministically by JSMA with ε = 0.1 and θ = 1.0; we show the even-numbered images of the 12 produced by JSMA, with the final image misclassified as 3.
Bottom row: original image (7) perturbed nondeterministically by DLV, for the same manipulation on a single pixel as that of JSMA (i.e., s_p · m_p = 1.0) and working in the input layer, with the final image misclassified as 3.

By taking the same manipulation on a single pixel as that of JSMA (i.e., s_p · m_p = 1.0) and working on the input layer, DLV is able to find another perturbed image that is also classified as 3 but has a smaller distance from the original image (the L2 distance is 0.14 and the L1 distance is 0.02); see the images in the last row of Figure 12. In terms of the time taken to find an adversarial example, DLV may take longer than JSMA, since it searches over many different paths.

Experiment 2. Table 1 gives a comparison of the robustness evaluation of the three approaches on the MNIST dataset. For FGSM, we vary the input parameter ε over the values {0.1, 0.2, 0.4}. For DLV, we select regions as defined in Section 4.4 on a single path (by defining a specific order on the features and manipulating them sequentially) for the first hidden layer. The experiment is parameterised by varying the maximal number of dimensions to be changed, i.e., dims_l ∈ {75, 150, 450}. For each input image, an adversarial example is returned, if found, by manipulating fewer than the maximal number of dimensions. When the maximal number has been reached, DLV reports failure and returns the last perturbed example. For JSMA, the experiment is conducted by letting θ take values in the set {0.1, 0.4} and setting ε to 1.0.

        FGSM                      DLV                          JSMA
        ε=0.1   ε=0.2   ε=0.4     dims_l=75   150     450      θ=0.1   θ=0.4
L2      0.08    0.15    0.32      0.19        0.22    0.27     0.11    0.11
L1      0.06    0.12    0.25      0.04        0.06    0.09     0.02    0.02
%       17.5%   70.9%   97.2%     52.3%       79%     98%      92%     99%

Table 1. FGSM vs. DLV (on a single path) vs. JSMA.

We collect three statistics: the average L1 distance over the adversarial examples, the average L2 distance over the adversarial examples, and the success rate of finding adversarial examples. Let L_d(x, δ(x)) for d ∈ {1, 2} be the distance between an input x and the returned perturbed image δ(x), and let diff(x, δ(x)) ∈ {0, 1} be a Boolean value representing whether x and δ(x) have different classes. We let

L_d = \frac{\sum_{x \in \text{test set}} \mathrm{diff}(x, \delta(x)) \times L_d(x, \delta(x))}{\sum_{x \in \text{test set}} \mathrm{diff}(x, \delta(x))}
\qquad \text{and} \qquad
\% = \frac{\sum_{x \in \text{test set}} \mathrm{diff}(x, \delta(x))}{\text{number of examples in the test set}}.

We note that the approaches yield different perturbed examples δ(x). The test set consists of 500 images selected randomly. DLV takes 1–2 minutes to manipulate each input image in MNIST. JSMA takes about 10 minutes for each image, but it works for 10 classes, so its running time is similar to that of DLV. FGSM works with a set of images, so it is the fastest per image. For the cases where the success rates are very high, i.e., 97.2% for FGSM with ε = 0.4, 98% for DLV with dims_l = 450, and 99% for JSMA with θ = 0.4, JSMA has the smallest average distances, followed by DLV, which has smaller average distances than FGSM on both the L1 and L2 metrics.

We note that a smaller distance leading to a misclassification may result in a lower rate of transferability [29], meaning that a misclassification can be harder to witness on another model trained on the same dataset (or a small subset of it).
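A minimal sketch of how these statistics could be computed from the output of any of the three attacks, assuming a classifier's predict function and a distance function such as those sketched earlier (all names here are hypothetical):

```python
import numpy as np

def robustness_stats(originals, perturbed, predict, distance):
    """Average distance over the successful adversarial examples and the
    success rate, following the definitions of L_d and % above."""
    dists = []
    successes = 0
    for x, dx in zip(originals, perturbed):
        if predict(x) != predict(dx):   # diff(x, delta(x)) = 1
            successes += 1
            dists.append(distance(x, dx))
    avg_dist = float(np.mean(dists)) if dists else float("nan")
    return avg_dist, successes / len(originals)
```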
7 Related Work

AI safety is recognised as an important problem; see, e.g., [33,10]. An early verification approach for neural networks was proposed in [30], where, using the notation of this paper, safety is defined as the existence, for all inputs in a region η0 ∈ D_L0, of a corresponding output in another region ηn ⊆ D_Ln. They encode the entire network as a set of constraints, approximating the sigmoid using constraints, which can then be solved by a SAT solver, but their approach only works with 6 neurons (3 hidden neurons). A similar idea is presented in [32]. In contrast, we work layer by layer and obtain much greater scalability. Since the first version of this paper appeared [20], another constraint-based method has been proposed in [21], which improves on [30]. While they consider more general correctness properties than this paper, they can only handle ReLU activation functions, by extending the Simplex method to work with the piecewise-linear ReLU functions, which cannot be expressed using linear programming. This necessitates a search tree (instead of a search path, as in Simplex), for which a heuristic search is proposed and shown to be complete. The approach is demonstrated on networks with 300 ReLU nodes, but as it encodes the full network, it is unclear whether it can be scaled to practical deep neural networks: for example, the MNIST network has 630,016 ReLU nodes. They also handle continuous spaces directly without discretisation, the benefits of which are not yet clear, since it is argued in [19] that linear behaviour in high-dimensional spaces is sufficient to cause adversarial examples.

Concerns about the instability of neural networks with respect to adversarial examples were first raised in [13,36], where optimisation is used to identify misclassifications. A method for computing the perturbations is also proposed, which is based on box-constrained optimisation and is approximate in view of the non-convexity of the search space. This work is followed by [19], which introduced the much faster FGSM method, and [22], which employed a compromise between the two (iterative, but with a smaller number of iterations than [36]). In our notation, [19] uses a deterministic, iterative manipulation

\delta(x) = x + \epsilon \, \mathrm{sign}(\nabla_x J(x, \alpha_{x,n})),

where x is an image in matrix representation, ε is a hyper-parameter that can be tuned to obtain different manipulated images, and J(x, αx,n) is the cross-entropy cost function of the neural network on input x and class αx,n. Their approach therefore tests a set of discrete points in the region η0(αx,0) of the input layer, so these manipulations traverse a lasso-type ladder tree (i.e., a ladder tree without branches) L(ηk(αx,k)), which does not satisfy the covering property.

In [26], instead of working with a single image, an evolutionary algorithm is employed for a population of images. For each individual image in the current population, the manipulation is a mutation and/or a crossover. While mutations can be nondeterministic, the manipulations of an individual image again follow a lasso-type ladder tree, which is not covering. We also mention that [38] uses several distortions, such as JPEG compression, thumbnail resizing and random cropping, to test the robustness of the trained network. These distortions can be understood as manipulations.
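For concreteness, here is a sketch of the FGSM manipulation defined above, written against modern TensorFlow/Keras (the original experiments predate this API; the fgsm name and the details are ours):

```python
import tensorflow as tf

def fgsm(model, x, label, epsilon):
    """delta(x) = x + epsilon * sign(grad_x J(x, alpha_{x,n})), where J is
    the cross-entropy cost of the network on input x with class `label`."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.sparse_categorical_crossentropy(
            label, model(x), from_logits=False)
    grad = tape.gradient(loss, x)
    # Keep the perturbed image in the valid pixel range [0, 1].
    return tf.clip_by_value(x + epsilon * tf.sign(grad), 0.0, 1.0)
```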
None of these attacks leverages specific properties of the model family, and none guarantees that it will find a misclassified image in the constraint region, even if such an image exists.

The notion of robustness studied in [18] has some similarities to our definition of safety, except that the authors work with values averaged over the input distribution µ, which is difficult to estimate accurately in high dimensions. As in [36,22], they use optimisation without convergence guarantees, and as a result compute only an approximation to the minimal perturbation. In [12] pointwise robustness is adopted, which corresponds to our general safety; they also use a constraint solver, but represent the full constraint system by reduction to a convex LP problem, and only verify an approximation of the property. In contrast, we work directly with activations, rather than with an encoding of activation functions, and our method exhaustively searches through the complete ladder tree for an adversarial example by iterative and nondeterministic application of manipulations. Further, our definition of a manipulation is more flexible, since it allows us to select a subset of dimensions, and each such subset can have a different region diameter computed with respect to a different norm.

8 Conclusions

This paper presents an automated verification framework for checking the safety of deep neural networks, based on a systematic exploration of a region around a data point in search of adversarial manipulations of a given type, with the analysis propagated into deeper layers. Though we focus on the classification task, the approach also generalises to other types of networks. We have implemented the approach using SMT and validated it on several state-of-the-art neural network classifiers for realistic images. The results are encouraging, with adversarial examples found in some cases in a matter of seconds when working with few dimensions, but the verification process itself is exponential in the number of features and has prohibitive complexity for larger images. The performance and scalability of our method can be significantly improved through parallelisation. It would be interesting to see whether the notions of regularity suggested in [24] permit a symbolic approach, and whether an abstraction-refinement framework can be formulated to improve scalability and computational performance.

Acknowledgements. This paper has greatly benefited from discussions with several researchers. We are particularly grateful to Martin Fraenzle, Ian Goodfellow and Nicolas Papernot.

References

1. CIFAR10 model for Keras. https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py.
2. DLV. https://github.com/verideep/dlv.
3. Keras. https://keras.io.
4. Large scale visual recognition challenge. http://www.image-net.org/challenges/LSVRC/.
5. MNIST CNN network. https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py.
6. Theano. http://deeplearning.net/software/theano/.
7. VGG16 model for Keras. https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3.
8. Z3. http://rise4fun.com/z3.
9. Luigi Ambrosio, Nicola Fusco, and Diego Pallara. Functions of Bounded Variation and Free Discontinuity Problems. Oxford Mathematical Monographs. Oxford University Press, 2000.
10. Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. CoRR, abs/1606.06565, 2016.
11. Fabio Anselmi, Joel Z. Leibo, Lorenzo Rosasco, Jim Mutch, Andrea Tacchetti, and Tomaso Poggio. Unsupervised learning of invariant representations. Theoretical Computer Science, 633:112–121, 2016.
12. Osbert Bastani, Yani Ioannou, Leonidas Lampropoulos, Dimitrios Vytiniotis, Aditya Nori, and Antonio Criminisi. Measuring neural net robustness with constraints. CoRR, abs/1605.07262, 2016. To appear in NIPS.
13. Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Srndic, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In ECML/PKDD 2013, pages 387–402, 2013.
14. Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
15. Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, and Karol Zieba. End to end learning for self-driving cars. 2016.
16. Gunnar E. Carlsson, Tigran Ishkhanov, Vin de Silva, and Afra Zomorodian. On the local behavior of spaces of natural images. International Journal of Computer Vision, 76(1), 2008.
17. Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Bernt Schiele, Trevor Darrell, and Marcus Rohrbach. Attentive explanations: Justifying decisions and pointing to the evidence. arxiv.org/abs/1612.04757, 2016.
18. Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Analysis of classifiers' robustness to adversarial perturbations. CoRR, abs/1502.02590, 2015.
19. Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014.
20. Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. Safety verification of deep neural networks. https://arxiv.org/abs/1610.06940, 2016.
21. Guy Katz, Clark Barrett, David Dill, Kyle Julian, and Mykel Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In CAV 2017, 2017. To appear.
22. Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. 2016.
23. Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521:436–444, 2015.
24. Stéphane Mallat. Understanding deep convolutional networks. Philosophical Transactions of the Royal Society A, 2016.
25. Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. CoRR, abs/1511.04599, 2015.
26. Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Computer Vision and Pattern Recognition (CVPR '15), 2015.
27. Nicolas Papernot, Ian Goodfellow, Ryan Sheatsley, Reuben Feinman, and Patrick McDaniel. cleverhans v1.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768, 2016.
28. Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In Proceedings of the 1st IEEE European Symposium on Security and Privacy, 2015.
29. Nicolas Papernot, Patrick Drew McDaniel, Ian J. Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against deep learning systems using adversarial examples. CoRR, abs/1602.02697, 2016.
30. Luca Pulina and Armando Tacchella. An abstraction-refinement approach to verification of artificial neural networks. In CAV 2010, pages 243–257, 2010.
31. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), 2016.
32. Karsten Scheibler, Leonore Winterer, Ralf Wimmer, and Bernd Becker. Towards verification of artificial neural networks. In 18th Workshop on Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen (MBMV), pages 30–40, 2015.
33. Sanjit A. Seshia and Dorsa Sadigh. Towards verified artificial intelligence. CoRR, abs/1606.08514, 2016.
34. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. 2014.
35. J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, 32:323–332, 2012.
36. Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR 2014), 2014.
37. Vladimir Vapnik. Principles of risk minimization for learning theory. In Advances in Neural Information Processing Systems 4 (NIPS 1991, Denver, Colorado, USA, December 2–5, 1991), pages 831–838, 1991.
38. Stephan Zheng, Yang Song, Thomas Leung, and Ian Goodfellow. Improving the robustness of deep neural networks via stability training. In CVPR 2016, 2016.

A Input Parameters and Experimental Setup

The DLV tool accepts as input a network N and an image x, and has the following input parameters:
– an integer l ∈ [0, n] indicating the starting layer L_l,
– an integer dims_l ≥ 1 indicating the maximal number of dimensions that need to be considered in layer L_l,
– the values of the variables s_p and m_p in V_l; for simplicity, we require that, for all dimensions p that will be selected by the automated procedure, s_p and m_p have the same values,
– the precision ε ∈ [0, ∞),
– an integer dims_{k,f} indicating the number of dimensions for each feature; for simplicity, we require that every feature has the same number of dimensions and that dims_{k,f} = dims_{k′,f} for all layers k and k′, and
– the type of search: either heuristic (single-path) or Monte Carlo Tree Search (MCTS) (multi-path).

The concrete settings for the five networks are given below.

A.1 Two-Dimensional Point Classification Network
– l = 0,
– dims_l = 2,
– s_p = 1.0 and m_p = 1.0,
– ε = 0.1, and
– dims_{k,f} = 2.

A.2 Network for the MNIST Dataset
– l = 1,
– dims_l = 150,
– s_p = 1.0 and m_p = 1.0,
– ε = 1.0, and
– dims_{k,f} = 5.

A.3 Network for the CIFAR-10 Dataset
– l = 1,
– dims_l = 500,
– s_p = 1.0 and m_p = 1.0,
– ε = 1.0, and
– dims_{k,f} = 5.

A.4 Network for the GTSRB Dataset
– l = 1,
– dims_l = 1000,
– s_p = 1.0 and m_p = 1.0,
– ε = 1.0, and
– dims_{k,f} = 5.

A.5 Network for the ImageNet Dataset
– l = 2,
– dims_l = 20,000,
– s_p = 1.0 and m_p = 1.0,
– ε = 1.0, and
– dims_{k,f} = 5.
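Purely as an illustration, these per-dataset settings could be organised programmatically as below; this is not DLV's actual interface, and the dictionary layout and names are hypothetical:

```python
# Hypothetical encoding of the Appendix A parameter sets (not DLV's real API).
DLV_PARAMS = {
    "two_dim":  dict(l=0, dims_l=2,     s_p=1.0, m_p=1.0, eps=0.1, dims_k_f=2),
    "mnist":    dict(l=1, dims_l=150,   s_p=1.0, m_p=1.0, eps=1.0, dims_k_f=5),
    "cifar10":  dict(l=1, dims_l=500,   s_p=1.0, m_p=1.0, eps=1.0, dims_k_f=5),
    "gtsrb":    dict(l=1, dims_l=1000,  s_p=1.0, m_p=1.0, eps=1.0, dims_k_f=5),
    "imagenet": dict(l=2, dims_l=20000, s_p=1.0, m_p=1.0, eps=1.0, dims_k_f=5),
}
# The search type ("heuristic" for single-path, "mcts" for multi-path) is
# chosen separately from these numerical parameters.
```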
B Additional Adversarial Examples Found for the CIFAR-10, ImageNet, and MNIST Networks

Figure 13 and Figure 14 present additional adversarial examples for the CIFAR-10 and ImageNet networks obtained by single-path search. Figure 15 presents adversarial examples for the MNIST network obtained by multi-path search.

Fig. 13. Adversarial examples for a neural network trained on the CIFAR-10 dataset by single-path search (class changes: automobile to bird, automobile to frog, automobile to airplane, automobile to horse, airplane to dog, airplane to deer, airplane to truck, airplane to cat, truck to frog, truck to cat, ship to bird, ship to airplane, ship to truck, horse to cat, horse to automobile, horse to truck).

C Additional Adversarial Examples for the German Traffic Sign Recognition Benchmark (GTSRB)

Figure 16 presents adversarial examples obtained when selecting single-path search.

Fig. 14. Adversarial examples for the VGG16 network trained on the ImageNet dataset by single-path search (labrador to life boat, rhodesian ridgeback to malinois, boxer to rhodesian ridgeback, great pyrenees to kuvasz).

Fig. 15. Adversarial examples for the network trained on the MNIST dataset by multi-path search (class changes: 9 to 4, 8 to 3, 5 to 3, 4 to 9, 5 to 3, 7 to 3, 9 to 4, 9 to 4, 2 to 3, 1 to 8, 8 to 5, 0 to 3, 7 to 2, 8 to 3, 3 to 2, 9 to 7, 3 to 2, 4 to 9, 6 to 4, 3 to 5, 9 to 4, 0 to 2, 2 to 3, 9 to 8, 4 to 2).

Fig. 16. Adversarial examples for the GTSRB dataset by single-path search (class changes: speed limit 50 (prohibitory) to speed limit 80 (prohibitory), restriction ends (other) to restriction ends (80), no overtaking (trucks) (prohibitory) to speed limit 80 (prohibitory), give way (other) to priority road (other), priority road (other) to speed limit 30 (prohibitory), speed limit 70 (prohibitory) to speed limit 120 (prohibitory), no overtaking (prohibitory) to go straight (mandatory), speed limit 50 (prohibitory) to stop (other), road narrows (danger) to construction (danger), restriction ends 80 (other) to speed limit 80 (prohibitory), no overtaking (trucks) (prohibitory) to speed limit 80 (prohibitory), no overtaking (prohibitory) to restriction ends (overtaking (trucks)) (other), priority at next intersection (danger) to speed limit 30 (prohibitory), uneven road (danger) to traffic signal (danger), danger (danger) to school crossing (danger)).

D Architectures of Neural Networks

Figure 17, Figure 18, Figure 19, and Figure 20 present the architectures of the networks we work with in this paper. The network for the ImageNet dataset is from [34].

Fig. 17. Architecture of the neural network for two-dimensional point classification.
Fig. 18. Architecture of the neural network for the MNIST dataset.
Fig. 19. Architecture of the neural network for the CIFAR-10 dataset.
Fig. 20. Architecture of the neural network for the GTSRB dataset.
