Deriving Sorting Algorithms

José Bacelar Almeida and Jorge Sousa Pinto
Departamento de Informática, Universidade do Minho
4710-057 Braga, Portugal
{jba,jsp}@di.uminho.pt

Techn. Report DI-PURe-06.04.01, April 2006

PURe: Program Understanding and Re-engineering: Calculi and Applications (Project POSI/ICHS/44304/2002)
Departamento de Informática da Universidade do Minho, Campus de Gualtar, Braga, Portugal

Abstract

This paper shows how 3 well-known sorting algorithms can be derived by similar sequences of transformation steps from a common specification. Each derivation uses an auxiliary algorithm based on insertion into an intermediate structure. The proofs given involve both inductive and coinductive reasoning, which are here expressed in the same program calculation framework, based on unicity properties.

Abstract. This paper proposes new derivations of three well-known sorting algorithms, in their functional formulation. The approach we use is based on three main ingredients: first, the algorithms are derived from a simpler algorithm, i.e. the specification is already a solution to the problem (in this sense our derivations are program transformations). Secondly, a mixture of inductive and coinductive arguments is used in a uniform, algebraic style in our reasoning. Finally, the approach uses structural invariants so as to strengthen the equational reasoning with logical arguments that cannot be captured in the algebraic framework.

1 Introduction

This paper presents new derivations of three well-known sorting algorithms, in the functional setting.
Our approach can be summarized as follows:

1. It is based on program transformation, in the sense that we start from a specification that is already a (not very efficient) algorithm for solving the problem. Traditional derivations of sorting algorithms (building on the work of Burstall and Darlington) formalize the "is sorted" property on lists. Instead, we take the insertion sort algorithm to be a specification of sorting, and derive, by sequences of correct steps, more efficient algorithms from it.
2. The algorithms that we derive follow the divide-and-conquer strategy and as such are not structurally recursive on their arguments. For this reason a combination of inductive and coinductive reasoning must be used. We adhere here to the equational style of reasoning usually known to functional programmers as program calculation, which relies on uniqueness properties of certain recursion patterns. Although the proofs are independent of this choice, we find that this allows for greater uniformity between the inductive and coinductive arguments.
3. In two of our three derivations, the equational reasoning must be strengthened by using invariants on certain intermediate data-structures, since some of the equalities one needs to prove are not universal for a given data-type. For instance, it is not true that the in-order traversal of any binary tree produces a sorted list. This is however true for trees produced in a certain way. As far as we know there is very little work on program calculation strengthened with invariants.
4. The algorithms are derived as hylomorphisms, i.e. as explicit compositions of a recursive function with a co-recursive one, with an intermediate data-structure of a tree type, which can be deforested to produce the standard formulation of the algorithms. Sorting algorithms have been defined as hylomorphisms elsewhere [1].
The paper is structured as follows: Section 2 reviews standard material on sorting in the functional setting, including the algorithms that will be considered in the main sections of the paper. Section 3 contains background material on program calculation, based on unicity (or universal) properties of recursion pattern operators. Section 4 then introduces two generic algorithms for sorting, based on insertion into an intermediate structure of a container type (in a leftwards and rightwards fashion respectively). Sections 5, 6, and 7 present the derivations of merge sort, quicksort, and heapsort, which are based on instantiations of the generic algorithms. Finally we conclude the paper in Section 8.

2 Sorting Homomorphisms and Divide-and-conquer Algorithms

Consider a very simple algorithm for sorting a list, usually known under the name of insertion sort. We give it here written in Haskell.

    isort [] = []
    isort (x:xs) = insert x (isort xs)

where insert inserts an element in a sorted list. This is certainly a natural way of sorting a list in a traditional functional language: since the list structurally consists of a head element x and a tail sublist xs, it is natural to recursively sort xs and then combine this sorted list with x. This pattern of recursion can be captured by the foldr operator, resulting in the following definition where explicit recursion has been removed.

    isort = foldr insert []

Actually, any sorting function is a list homomorphism [3,6], which means that if the initial unsorted list is split at any point and the two resulting sublists are recursively sorted, there exists a binary operator ⊙ that can combine the two results to give the final sorted list.
    isort (l1 ++ l2) = (isort l1) ⊙ (isort l2)

This ⊙ operator is of course the (linear time) function of type [a] → [a] → [a] that merges two sorted lists:

    merge [] l = l
    merge l [] = l
    merge (h1:t1) (h2:t2)
      | (h1<=h2)  = h1:(merge t1 (h2:t2))
      | otherwise = h2:(merge (h1:t1) t2)

The operator ⊙ is associative with the empty list as unit, forming a monoid over lists. It is also commutative. insert can be defined in terms of ⊙ as follows:

    insert x l = [x] ⊙ l        (1)

Insertion sort runs in quadratic time. Most well-known efficient sorting algorithms perform recursion twice, on subsequences obtained from the input sequence, and then combine the results (for this reason they are called divide-and-conquer algorithms). As such, they do not fit the simple iteration pattern captured by foldr. In the following we describe three different divide-and-conquer algorithms.

Heapsort. The principle behind heapsort is to traverse the list to obtain, in linear time, its minimum element y and a pair of lists of approximately equal size, containing the remaining elements (function haux). The lists are then recursively sorted and merged together, and y pasted at the head of the resulting list.

    haux x [] = (x,[],[])
    haux x (y:ys) = let (z,l,r) = haux y ys in if x [...]

    unfoldBTree :: (b -> (Either (a,b,b) ())) -> b -> BTree a
    unfoldBTree g x = case (g x) of
        Right () -> Empty
        Left (y,l,r) -> Node y (unfoldBTree g l) (unfoldBTree g r)

    foldBTree :: (a -> b -> b -> b) -> b -> BTree a -> b
    foldBTree f e Empty = e
    foldBTree f e (Node x l r) = f x (foldBTree f e l) (foldBTree f e r)

    unfoldLTree :: (b -> (Either (b,b) (Maybe a))) -> b -> LTree a
    unfoldLTree g x = case (g x) of
        Right y -> Leaf y
        Left (l,r) -> Branch (unfoldLTree g l) (unfoldLTree g r)

    foldLTree :: (b->b->b) -> ((Maybe a)->b) -> LTree a -> b
    foldLTree f e (Leaf x) = e x
    foldLTree f e (Branch l r) = f (foldLTree f e l) (foldLTree f e r)

Table 1.
Types and recursion patterns for binary trees

The fold recursion pattern can be generalized for any regular type; in the context of the algebraic theory of data-types folds are datatype-generic (in the sense that they are parameterized by the base functor of the type), and usually called catamorphisms. The result of a fold on a node of some tree data-type is a combination of the results of recursively processing each subtree (and the contents of the node, if not empty).

The dual notion is the unfold (also called anamorphism): a function that constructs (possibly infinite) trees in the most natural way, in the sense that the subtrees of a node are recursively constructed by unfolding.

In the present paper we will need to work with two flavours of binary trees: leaf-labelled (for merge sort) and node-labelled trees (for the remaining algorithms). These types, and the corresponding recursion patterns, are defined in Table 1.

In principle, a fold is a recursive function whose domain is a type defined as a least fixpoint (an initial algebra), and an unfold is a recursive function whose codomain is defined as a greatest fixpoint (a final coalgebra). However, in lazy languages such as Haskell, least and greatest fixpoints coincide, and are simply called recursive types.

At an abstract level, folds (as well as other structured forms of recursion, such as primitive recursion) enjoy an initiality property among the algebras of the base functor of the domain type. In concrete terms, this makes possible the use of induction as a proof technique. Dually, unfolds are final coalgebras; techniques for reasoning about unfolds include fixpoint induction and coinduction [5].

Unicity. The program calculation approach is based on the use of initiality and finality directly as an equational proof principle.
Both properties can be formulated in the same framework, as universal or unicity properties. In this paper we generally adhere to the equational style for proofs, but often resort to induction for the sake of simplicity (in particular when none of the sides of the equality one wants to prove is directly expressed using a recursion pattern, applying a unicity property may require substantial manipulation of the expressions). See [4] for a study of program calculation carried out purely by using fusion, including an adequate treatment of strictness conditions.

We give below the unicity properties that we shall require in the rest of the paper, for the foldr, unfoldLTree, and unfoldBTree operators. A weaker fusion law for foldr is also shown, which is easily derived from unicity.

    f = foldr g e
  ⇔   { unicity-foldr }
    f [] = e
    for all x, xs:  f (x:xs) = g x (f xs)

    h ∘ foldr g e = foldr g' e'
  ⇐   { foldr-fusion }
    h e = e'
    h ∘ (g x) = (g' x) ∘ h

    f = unfoldLTree g
  ⇔   { unicity-unfoldLTree }
    for all x:  f x = case (g x) of
                        Right y    → Leaf y
                        Left (l,r) → Branch (f l) (f r)

    f = unfoldBTree g
  ⇔   { unicity-unfoldBTree }
    for all x:  f x = case (g x) of
                        Right ()     → Empty
                        Left (y,l,r) → Node y (f l) (f r)

Hylomorphisms. The composition of a fold over a regular type T with an unfold of that type is a recursive function whose recursion tree is shaped in the same way as T. Such a definition can be deforested [10], i.e. the construction of the intermediate data-structures can be eliminated, yielding a direct recursive definition. As an example, the definition

    h = (foldLTree f e) ∘ (unfoldLTree g)

can be deforested to give:

    h x = case (g x) of
            Right y -> e y
            Left (l,r) -> f (h l) (h r)

This corresponds to a new generic recursion pattern, called a hylomorphism.
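The equivalence between the forested and deforested forms can be tried out with a small self-contained sketch. The recursion patterns follow Table 1; the data declaration for LTree, the names hyloForested and hylo, and the list-summing instance (halve, total) are assumptions introduced only for illustration and do not appear in the paper.

```haskell
-- Leaf-labelled binary trees. The declaration is an assumption, chosen to
-- match the types of the recursion patterns in Table 1.
data LTree a = Leaf (Maybe a) | Branch (LTree a) (LTree a)

unfoldLTree :: (b -> Either (b, b) (Maybe a)) -> b -> LTree a
unfoldLTree g x = case g x of
  Right y     -> Leaf y
  Left (l, r) -> Branch (unfoldLTree g l) (unfoldLTree g r)

foldLTree :: (b -> b -> b) -> (Maybe a -> b) -> LTree a -> b
foldLTree _ e (Leaf x)     = e x
foldLTree f e (Branch l r) = f (foldLTree f e l) (foldLTree f e r)

-- Forested form: the intermediate tree is built and then consumed.
hyloForested :: (c -> c -> c) -> (Maybe a -> c)
             -> (b -> Either (b, b) (Maybe a)) -> b -> c
hyloForested f e g = foldLTree f e . unfoldLTree g

-- Deforested form: the same recursion without the intermediate tree.
hylo :: (c -> c -> c) -> (Maybe a -> c)
     -> (b -> Either (b, b) (Maybe a)) -> b -> c
hylo f e g = h
  where h x = case g x of
                Right y     -> e y
                Left (l, r) -> f (h l) (h r)

-- Hypothetical instance: summing a list by repeatedly halving it.
halve :: [Int] -> Either ([Int], [Int]) (Maybe Int)
halve []  = Right Nothing
halve [x] = Right (Just x)
halve xs  = Left (splitAt (length xs `div` 2) xs)

total :: Maybe Int -> Int
total = maybe 0 id

sumForested, sumDeforested :: [Int] -> Int
sumForested   = hyloForested (+) total halve
sumDeforested = hylo (+) total halve
```

Both functions should compute the same result on any finite list; only the second avoids allocating the tree. Summation was chosen over sorting here so that the sketch does not anticipate the derivations of the later sections.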
Hylomorphisms do not possess a unicity property, but they are still useful for reasoning about programs, using the properties of their fold and unfold components. In particular, hylomorphisms are useful for capturing the structure of functions that are not directly defined by structured recursion or co-recursion, as is the case of the divide-and-conquer sorting algorithms: the unfold component takes the unsorted list and constructs a tree; the fold iterates over this structure to produce the sorted list. The sorting algorithms introduced in the previous section were studied as hylomorphisms in [1]. In the present paper we use this hylomorphic structure to calculate these algorithms from a common specification.

4 Sorting by Insertion

In the rest of the paper we will repeatedly apply the following principles. Consider a type constructor C and the following functions:

    istC : a → C a → C a
    C2list : C a → [a]

The idea is that C a is a container type for elements of type a (typically a tree-shaped type); istC inserts an element in a container to give a new container; and C2list converts a container into a sorted list of type a. A generic sorting algorithm can then be defined, with a container acting as intermediate data-structure. The idea is that elements are inserted one by one by folding over the list; a sorted list is then obtained using C2list. ε :: C a is an appropriate "empty value".

    isortC = C2list ∘ (foldr istC ε)        (2)

It is easy to see that the algorithm is correct if the intermediate data-structure contains exactly the same elements as the initial list, and C2list somehow produces a sorted list from the elements in the intermediate structure. This can be formalized by constructing a proof of equivalence to insertion sort, which gives necessary conditions for the algorithm to be correct.
    C2list ∘ (foldr istC ε) = foldr insert []
  ⇔   { unicity-foldr }
    C2list (foldr istC ε []) = []
    C2list (foldr istC ε (x:xs)) = insert x (C2list (foldr istC ε xs))
  ⇔   { def. foldr }
    C2list ε = []
    C2list (istC x (foldr istC ε xs)) = insert x (C2list (foldr istC ε xs))

Alternatively one can use fusion, which leads to stronger conditions:

    C2list ∘ (foldr istC ε) = foldr insert []
  ⇐   { foldr-fusion }
    C2list ε = []
    C2list ∘ (istC x) = (insert x) ∘ C2list

Thus for each concrete container type it is sufficient to prove equation 3 and one of 4 or 5 to establish that the corresponding function isortC is indeed a sorting algorithm:

    C2list ε = []                                                              (3)
    C2list ∘ (istC x) = (insert x) ∘ C2list                                    (4)
    C2list (istC x (foldr istC ε xs)) = insert x (C2list (foldr istC ε xs))    (5)

Note that together, equations 3 and 4 mean that C2list is a homomorphism between the structures (C a, istC, ε) and ([a], insert, []).

    isortLT = LT2list ∘ buildLT
    buildLT = foldr istLT (Leaf Nothing)
    LT2list = foldLTree (⊙) t
      where t Nothing = []
            t (Just x) = [x]
    istLT x (Leaf Nothing) = Leaf (Just x)
    istLT x (Leaf (Just y)) = Branch (Leaf (Just x)) (Leaf (Just y))
    istLT x (Branch l r) = Branch (istLT x r) l

Table 2. Sorting by insertion in a leaf tree

Observe that the above algorithm constructs the intermediate structure by inserting the elements from right to left. A tail-recursive version of isortC can be derived by a standard transformation based on fusion [2]. This will construct the intermediate structure in a rightwards fashion. We start by writing a specification for this function isortCt.

    isortCt : [a] → C a → [a]
    isortCt l y = (isort l) ⊙ (C2list y)

The tail-recursive function uses an extra accumulator argument of the chosen container type.
In the call isortCt l y, l is the list that remains to be sorted, and the accumulator y contains elements already inserted in the container. The right-hand side of the equality states how the final result can be obtained using insertion sort and the conversion of y to a list. The following definition satisfies the specification (proof is given in Appendix A.1).

    isortCt = foldr istC' C2list
      where istC' x f y = f (istC x y)

Then isortCt l ε = isort l holds as an immediate consequence of the specification and eq. (3) above. An alternative version of this can be defined, which separates the tail-recursive construction of the intermediate structure from its conversion to a sorted list (note ap ε f = f ε):

    isortC' = C2list ∘ (ap ε) ∘ (foldr istC' id)        (6)

It is straightforward to establish that isortC' l = isortCt l ε, thus isortC' = isort.

In the next sections, the container type and its empty value, together with the functions istC and C2list, will be instantiated to produce three different insertion-based algorithms, using schemes 2 and 6. Each algorithm will be proved correct by calculating eqs. 3, and 4 or 5 above. The next step will be to transform each algorithm into a hylomorphism that can then be deforested, resulting in a well-known sorting algorithm. For this, it will suffice to transform the function that constructs the intermediate tree into co-recursive form.

5 A Derivation of Merge Sort

Our first concrete sorting algorithm based on insertion into an intermediate structure uses leaf-labelled binary trees. This is given in Table 2. We remark that to cover the case of the empty list, a Maybe type is used in the leaves of the trees.

Proposition 1. isortLT is a sorting algorithm.

Proof. We instantiate eqs. (3) and (4). Note that the empty value here is ε = Leaf Nothing.
    LT2list (Leaf Nothing) = []
    LT2list ∘ (istLT x) = (insert x) ∘ LT2list

The first equality is true by definition; the second can be proved by induction, or alternatively using fusion. The latter proof is given in Appendix A.2.

Together these equations establish that LT2list is a homomorphism between the structures (LTree a, istLT, Leaf Nothing) and ([a], insert, []). It is also easy to see that the intermediate tree is balanced: the difference between the heights of the subtrees of a node is never greater than one, since subtrees are swapped at each insertion step. Note that the insertion function istLT was carefully designed with efficiency in mind, which grants execution in time O(N lg N); other solutions would still lead to sorting algorithms, albeit less efficient.

Proposition 2. The trees constructed by buildLT are balanced.

Proof. It can be proved by induction on the structure of the argument list that either the subtrees of the constructed tree have the same height, or the height of the left subtree is greater than the height of the right subtree by one unit. The function istLT preserves this invariant.

The next transformation step applies to the function that constructs the intermediate tree. An alternative way of constructing a balanced tree is by unfolding: the initial list is traversed and its elements placed alternately in two subsequences, which are then used as arguments to recursively construct the subtrees. Note that the sequences will have approximately the same length. For singleton and empty lists, leaves are returned.

    unfoldmsort = unfoldLTree g
      where g [] = Right Nothing
            g [x] = Right (Just x)
            g xs = Left (maux xs)
    maux [] = ([],[])
    maux (h:t) = (h:b, a)
      where (a,b) = maux t

Proposition 3.
The above function constructs the same intermediate trees as those obtained by folding over the argument list:

    foldr istLT (Leaf Nothing) = unfoldmsort

Proof. We use the unicity property of leaf-tree unfolds, writing f for foldr istLT (Leaf Nothing):

    foldr istLT (Leaf Nothing) = unfoldLTree g
  ⇔   { unicity-unfoldLTree }
    for all x:  (foldr istLT (Leaf Nothing)) x = case (g x) of
                                                   Right y    → Leaf y
                                                   Left (l,r) → Branch (f l) (f r)
  ⇔   { by cases }
    Leaf Nothing = Leaf Nothing                          if x = []
    Leaf (Just h) = Leaf (Just h)                        if x = [h]
    istLT h1 (foldr istLT (Leaf Nothing) (h2:t))
      = Branch (foldr istLT (Leaf Nothing) l) (foldr istLT (Leaf Nothing) r)
          where (l,r) = maux (h1:h2:t)                   if x = h1:h2:t

And the last equality can be easily proved by induction on the structure of t.

Substituting this in the definition of isortLT yields a hylomorphism that is of course still equivalent to insertion sort. It is immediate to see that this can be deforested, and the result is merge sort:

    LT2list ∘ unfoldmsort = msort

6 A Derivation of Heapsort

    isortH = H2list ∘ buildH
    buildH = foldr istH Empty
    H2list = foldBTree aux []
      where aux x l r = x : (l ⊙ r)
    istH x Empty = Node x Empty Empty
    istH x (Node y l r) | x < y = Node x (istH y r) l
                        | otherwise = Node y (istH x r) l

Table 3. Sorting by insertion in a heap

In the heapsort algorithm, one computes the minimum of the list prior to the recursive calls. This determines that each node of the intermediate structure (the recursion tree) holds this minimum for some tree; it is thus a binary node-labelled tree. We repeat the program followed for the derivation of merge sort: we design a function that inserts a single element in the intermediate tree (istH), iterate this function over the initial list (buildH) and then provide a function that recovers the ordered list from the tree (H2list).
These functions are shown in Table 3.

Proposition 4. isortH is a sorting algorithm.

Proof. We instantiate eqs. (3) and (5). We set ε = Empty, and thus eq. (3) results directly from the definition. For eq. (5), we need to prove that for every list l,

    insert x (H2list (buildH l)) = H2list (istH x (buildH l)).

In order to prove this, we rely on the fact that trees generated by buildH are always heaps, i.e. the root element is the least of the tree. The complete derivation is presented in Appendix B (Propositions 8 and 1).

Note that in order to prove the correctness of this algorithm, we cannot rely on the stronger hypothesis given by eq. 4 (obtained from the use of the fusion law) as we have done for merge sort. The reason for this is that, for an arbitrary tree t,

    insert x (H2list t) ≠ H2list (istH x t).

On the other hand, the weaker requisite given by eq. 5 (obtained by the use of unicity or induction) retains the information that we restrict our attention to trees constructed by buildH, and these will satisfy the required equality.

We also note that the intermediate tree is again balanced (essentially by the same argument used for merge sort). This means that this sorting algorithm also executes in time O(N lg N).

It remains to show that the intermediate tree can be constructed coinductively. For that, consider the following function:

    unfoldhsort = unfoldBTree g
      where g [] = Right ()
            g (x:xs) = Left (haux x xs)
    haux x [] = (x,[],[])
    haux x (y:ys) | x < m = (x, m:b, a)
                  | otherwise = (m, x:b, a)
      where (m,a,b) = haux y ys

Proposition 5. The above function constructs the same intermediate trees as those obtained by folding over the argument list:

    buildH = unfoldhsort

Proof.
    unfoldhsort = buildH
  ⇔   { unicity-unfoldBTree }
    buildH [] = Empty
    buildH (x:xs) = Node z (buildH a) (buildH b)
        where (z,a,b) = haux x xs
  ⇔   { definitions }
    Empty = Empty
    istH x (buildH xs) = Node z (buildH a) (buildH b)
        where (z,a,b) = haux x xs

The second equality is proved by structural induction on xs. For the base case (xs = []), it follows directly from evaluating the definitions. For the inductive step (xs = y:ys), let us assume that (z,a,b) = (haux y ys). The definition of haux tells us that

    (z',a',b') = (haux x (y:ys)) = (x, z:b, a)   if x < z,
                                   (z, x:b, a)   if x ≥ z.

Thus,

    istH x (buildH (y:ys))
  =   { def. buildH }
    istH x (istH y (buildH ys))
  =   { induction hypothesis }
    istH x (Node z (buildH a) (buildH b))
  =   { def. istH }
    Node x (istH z (buildH b)) (buildH a)   if x < z,
    Node z (istH x (buildH b)) (buildH a)   if x ≥ z.
  =   { def. of z', a', b' }
    Node z' (buildH a') (buildH b')

As would be expected, the hylomorphism obtained replacing buildH by unfoldhsort can be deforested, and the result is the original hsort.

    H2list ∘ unfoldhsort = hsort

7 A Derivation of Quicksort

    isortBST = BST2list ∘ buildBST
    buildBST = (ap Empty) ∘ bAcc
      where ap x f = f x
            bAcc = foldr aux id
            aux x f a = f (istBST x a)
    BST2list = foldBTree aux []
      where aux x l r = l ++ (x:r)
    istBST x Empty = Node x Empty Empty
    istBST x (Node y l r) | x < y = Node y (istBST x l) r
                          | otherwise = Node y l (istBST x r)

Table 4. Sorting by insertion in a binary search tree

In the quicksort algorithm, the activity performed prior to the recursive calls is different from that in heapsort: instead of finding the minimum of the list, the head of the list is used as a pivot for splitting the tail. Again, the intermediate structure is a node-labelled binary tree.
But now, its ordering properties are different: the constructed trees will be binary search trees, and it suffices to traverse these trees in-order to produce the desired sorted list. Following the same line as in the derivation of the previous algorithms, we define an algorithm that iteratively inserts elements from a list into a binary tree and then reconstructs the list by the in-order traversal. This algorithm is given in Table 4. Observe that this algorithm iterates on the initial list from left to right (we may think of it as using the Haskell foldl operator, but we write it as a higher-order function using foldr, to exploit the application of the rules presented earlier). This will become evident below when we replace this function by one that constructs the intermediate tree corecursively. For the correctness argument, we know that the order of traversal for the initial list is irrelevant (as shown in Section 4).

Proposition 6. isortBST is a sorting algorithm.

Proof. We instantiate eqs. 3 and 5. We set ε = Empty, and thus Equation 3 results directly from the definition. For Equation 5, we need to prove that for every list l,

    insert x (BST2list (buildBST l)) = BST2list (istBST x (buildBST l)).

In order to prove this, we rely on the fact that trees generated by buildBST are always binary search trees. The complete derivation is presented in Appendix B (Propositions 10 and 2).

To obtain the well-known quicksort algorithm, we need to replace the iterated insertion function buildBST by an unfold.

    unfoldqsort = unfoldBTree g
      where g [] = Right ()
            g (x:xs) = Left (qaux x xs)
    qaux x [] = (x,[],[])
    qaux x (y:ys) | y < x = (x, y:a, b)
                  | otherwise = (x, a, y:b)
      where (_,a,b) = qaux x ys

Proposition 7.
The above function constructs the same intermediate trees as those obtained by folding over the argument list:

    buildBST = unfoldqsort

Proof.

    unfoldqsort = buildBST
  ⇔   { unicity-unfoldBTree }
    buildBST [] = Empty
    buildBST (x:xs) = Node x (bAcc a Empty) (bAcc b Empty)
        where (_,a,b) = qaux x xs
  ⇔   { definitions }
    bAcc [] Empty = Empty
    bAcc (x:xs) Empty = Node x (bAcc a Empty) (bAcc b Empty)
        where (_,a,b) = qaux x xs
  ⇔   { simplification }
    Empty = Empty
    bAcc xs (Node x Empty Empty) = Node x (bAcc a Empty) (bAcc b Empty)
        where (_,a,b) = qaux x xs

We prove the second equality in a slightly strengthened formulation. For every tree (Node x l r) and list xs,

    bAcc xs (Node x l r) = Node x (bAcc a l) (bAcc b r)
        where (_,a,b) = qaux x xs

By induction on the structure of xs. For the base case (xs = []), it follows directly from evaluating the definitions. For the inductive step (xs = y:ys), we reason by cases. If y ≥ x, then

    bAcc (y:ys) (Node x l' r') = Node x (bAcc a' l') (bAcc b' r')
        where (_,a',b') = qaux x (y:ys)
  ⇔   { definitions of bAcc and qaux, y ≥ x }
    bAcc ys (istBST y (Node x l' r')) = Node x (bAcc a l') (bAcc (y:b) r')
        where (_,a,b) = qaux x ys
  ⇔   { definitions of istBST and bAcc, y ≥ x }
    bAcc ys (Node x l' (istBST y r')) = Node x (bAcc a l') (bAcc b (istBST y r'))
        where (_,a,b) = qaux x ys
  ⇔   { induction hypothesis }
    bAcc ys (Node x l' (istBST y r')) = bAcc ys (Node x l' (istBST y r'))

Similarly for the case (y < x). This concludes the proof.

We conclude with the statement that the hylomorphism obtained is, as intended, the forested version of the original quicksort algorithm.
    BST2list ∘ unfoldqsort = qsort

8 Conclusion

This paper illustrates the strengths of the "program calculation" style of reasoning, in particular the simplicity of using the unicity property of unfolds as an alternative to using coinductive principles based on bisimulations, and more generally the structural aspects of proofs. Inductive proofs are however often much simpler to carry out than proofs in the equational style, so we are not dogmatic about the style in which proofs are presented.

Apart from the proofs of correctness, which as far as we know are new, the contributions of this paper include (two versions of) a generic sorting algorithm, of which 3 concretizations are used.

The role played by structural invariants in this study should also be emphasized. Even when they are not crucial to the calculations, invariants provide a much more natural setting for conducting them. Moreover, efficiency properties of the algorithms, which we have left out of this study, can only be established using well-balancing invariants on the intermediate trees (these invariants can easily be proved by induction for both isortLT and isortH, which run in time O(N lg N)).

Another application of invariants would come up in a generic programming setting: the C2list functions would have a single definition for every tree type: the function would merge together the lists resulting from recursive calls with the (wrapped) contents of nodes and leaves. For each concrete intermediate type, the structural invariants would then allow us to refine the definition into the one given in this paper. This study opens the way to a richer interplay between invariants and recursion patterns, a topic that is not explored in this paper, but is being currently investigated by the authors.
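As a compact summary of the generic scheme and its three concretizations, the following sketch transcribes schemes (2) and (6) and the insertion functions of Tables 2, 3 and 4 into plain Haskell. Several details are assumptions made for the sketch rather than the paper's own text: the two data declarations, the explicitly recursive conversion functions (written without the fold operators), the reading of the heap insertion in which istH is used in both branches, and the helper agreesWithSort, which compares against Data.List.sort purely as a check.

```haskell
import Data.List (sort)

-- Assumed declarations, matching the recursion patterns of Table 1.
data LTree a = Leaf (Maybe a) | Branch (LTree a) (LTree a)
data BTree a = Empty | Node a (BTree a) (BTree a)

-- The operator written ⊙ in the text.
merge :: Ord a => [a] -> [a] -> [a]
merge [] l = l
merge l [] = l
merge (h1:t1) (h2:t2)
  | h1 <= h2  = h1 : merge t1 (h2:t2)
  | otherwise = h2 : merge (h1:t1) t2

-- Scheme (2): elements are inserted from right to left.
isortC :: (c -> [a]) -> (a -> c -> c) -> c -> [a] -> [a]
isortC c2list istC eps = c2list . foldr istC eps

-- Scheme (6): elements are inserted from left to right.
isortC' :: (c -> [a]) -> (a -> c -> c) -> c -> [a] -> [a]
isortC' c2list istC eps = c2list . ($ eps) . foldr istC' id
  where istC' x f y = f (istC x y)

-- Leaf trees (Table 2).
istLT :: a -> LTree a -> LTree a
istLT x (Leaf Nothing)  = Leaf (Just x)
istLT x (Leaf (Just y)) = Branch (Leaf (Just x)) (Leaf (Just y))
istLT x (Branch l r)    = Branch (istLT x r) l

lt2list :: Ord a => LTree a -> [a]
lt2list (Leaf Nothing)  = []
lt2list (Leaf (Just x)) = [x]
lt2list (Branch l r)    = merge (lt2list l) (lt2list r)

-- Heaps (Table 3).
istH :: Ord a => a -> BTree a -> BTree a
istH x Empty = Node x Empty Empty
istH x (Node y l r)
  | x < y     = Node x (istH y r) l
  | otherwise = Node y (istH x r) l

h2list :: Ord a => BTree a -> [a]
h2list Empty        = []
h2list (Node x l r) = x : merge (h2list l) (h2list r)

-- Binary search trees (Table 4).
istBST :: Ord a => a -> BTree a -> BTree a
istBST x Empty = Node x Empty Empty
istBST x (Node y l r)
  | x < y     = Node y (istBST x l) r
  | otherwise = Node y l (istBST x r)

bst2list :: BTree a -> [a]
bst2list Empty        = []
bst2list (Node x l r) = bst2list l ++ (x : bst2list r)

isortLT, isortH, isortBST :: Ord a => [a] -> [a]
isortLT  = isortC  lt2list  istLT  (Leaf Nothing)
isortH   = isortC  h2list   istH   Empty
isortBST = isortC' bst2list istBST Empty

-- Check only: all three instantiations agree with the library sort.
agreesWithSort :: [Int] -> Bool
agreesWithSort xs = all (== sort xs) [isortLT xs, isortH xs, isortBST xs]
```

Note that h2list is only meaningful on heaps and bst2list on binary search trees, which is exactly the role played by the structural invariants discussed above; the sketch tests the three algorithms on sample inputs and proves nothing.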
Finally, we have left completely out of the paper a study of stability of the sorting algorithms, an important property in the presence of data-types for which the order is not total. Some of the algorithms derived are stable and others are not, which means that under this premise, which invalidates commutativity of ⊙, they are not all equivalent.

References

1. Lex Augusteijn. Sorting morphisms. In S. Swierstra, P. Henriques, and J. Oliveira, editors, Advanced Functional Programming, LNCS Tutorials, pages 1–27. Springer-Verlag, 1998.
2. Richard Bird. The promotion and accumulation strategies in transformational programming. ACM Trans. Program. Lang. Syst., 6(4):487–504, 1984.
3. Richard Bird. An Introduction to the Theory of Lists. In M. Broy, editor, Logic of Programming and Calculi of Discrete Design. Springer-Verlag, 1987.
4. Alcino Cunha and Jorge Sousa Pinto. Point-free program transformation. Fundamenta Informaticae, 66(4), April-May 2005. Special Issue on Program Transformation.
5. J. Gibbons and G. Hutton. Proof Methods for Structured Corecursive Programs. In Proceedings of the 1st Scottish Functional Programming Workshop, 1999.
6. Jeremy Gibbons. The Third Homomorphism Theorem. Journal of Functional Programming, 1995. Functional Pearl.
7. Jeremy Gibbons. Calculating Functional Programs. In Proceedings of ISRG/SERG Research Colloquium. School of Computing and Mathematical Sciences, Oxford Brookes University, 1997.
8. Lambert Meertens. Paramorphisms. Technical Report RUU-CS-90-4, Utrecht University, Department of Computer Science, 1990.
9. Erik Meijer, Maarten Fokkinga, and Ross Paterson. Functional programming with bananas, lenses, envelopes and barbed wire. In J. Hughes, editor, Proceedings of the 5th ACM Conference on Functional Programming Languages and Computer Architecture (FPCA'91), volume 523 of LNCS.
Springer-V erlag, 1991. 10. P . W adler. Deforestation: T ransforming p rograms to eliminate trees. In ESOP ’ 88: the se c ond Eur op e an Symp osium on Pr o gr ammi ng , pages 344–358, 1988. A Pro o fs and Calculations A.1 The function iso rtC t = f oldr is tC ′ C2list where istC ′ x f y = f ( istC x y ) satisﬁes the s p eciﬁcation iso rtC t : [ a ] → C a → [ a ] iso rtC t l y = ( is o rt l ) ⊙ ( C2list y ) Pr o of. The sp eciﬁcat ion can b e rewritten as iso rtC t l y = ( iso rt l ) ⊕ y or iso rtC t = ( ⊕ ) ◦ isort with the ⊕ op erator deﬁned as s ⊕ y = s ⊙ ( C2list y ) This app eals to the use of the fusion law since is o rt is d eﬁned as a fold. iso rtC t = ( ⊕ ) ◦ isort ⇔ { d eﬁ nitions } foldr i stC ′ C2list = ( ⊕ ) ◦ ( foldr insert [ ]) ⇐ { foldr fusion, with ⊕ str ict }  ( ⊕ ) [ ] = C2list ( ⊕ ) ◦ ( insert x ) = ( ist C ′ x ) ◦ ( ⊕ ) ⇔ { η -expansion }  [ ] ⊕ y = C2list y ( insert x l ) ⊕ y = istC ′ x (( ⊕ ) l ) y ⇔ { d ef. ⊕ , prop erties of ⊙ , def. istC ’ }  C2list y = C2list y ( insert x l ) ⊕ y = l ⊕ ( istC x y ) ⇔ { eq.(1 ), def. ⊕} [ x ] ⊙ l ⊙ ( C2list y ) = l ⊙ ( C2list ( istC x y )) ⇔ { eq. (4) } [ x ] ⊙ l ⊙ ( C2list y ) = l ⊙ ( insert x ( C2list y )) ⇔ { eq.(1 ), prop erties of ⊙} [ x ] ⊙ l ⊙ ( C2list y ) = [ x ] ⊙ l ⊙ ( C2list y ) A.2 W e pr o v e L T2list ◦ ( ist L T x ) = ( insert x ) ◦ L T2list ﬁrst b y calculation, and then using indu ction. paraLTre e :: ((LTree a)->b->(LTree a)->b->b)-> ((Maybe a)->b)- > LTree a-> b paraLTre e f g (Leaf x) = g x paraLTre e f g (Branch l r) = f l (paraLT ree f g l) r (paraLTree f g r) h = paraL T ree f g ⇔ { un icit y-p araL T ree } 8 < : h ◦ Leaf = g for all l , r, h ( Branch l r ) = f l ( h l ) r ( h r ) h ◦ ( paraL T ree f g ) = paraL T ree a b ⇔ { paraL T ree-fusion } h strict ∧ h ◦ g = b ∧ h ( f l l ′ r r ′ ) = a l ( h l ′ ) r ( h r ′ ) T able 5. The list paramorphism recursion pattern and la ws Pr o of by Calculation. 
It is easy to see that the insertion function istLT cannot be written as a fold over trees, since it uses one of the subtrees unchanged (insertion will proceed recursively in the other subtree). This is a typical example of a situation where iteration is not sufficient: primitive recursion is required. This has been studied as the paramorphism recursion pattern [8]. The operator in Table 5 embodies this pattern for leaf-trees. The corresponding unicity property and fusion law [9] are also shown in the table.

The function istLT x can now be written as the following paramorphism of leaf trees:

    istLT x = paraLTree f g
      where g Nothing    = Leaf (Just x)
            g (Just y)   = Branch (Leaf (Just x)) (Leaf (Just y))
            f l l′ r r′  = Branch r′ l

We use the following strategy: we apply fusion to prove the left-hand side of the equality equivalent to a new paramorphism; subsequently we prove by unicity that the right-hand side of the equality is also equivalent to this paramorphism.

    LT2list ◦ (istLT x) = paraLTree a b
  ⇔ { def. of istLT x as a paramorphism }
    LT2list ◦ (paraLTree f g) = paraLTree a b
  ⇐ { paraLTree-fusion, with LT2list strict }
    LT2list ◦ g = b
    LT2list (f l l′ r r′) = a l (LT2list l′) r (LT2list r′)
  ⇔ { η-expansion, def. f, g }
    LT2list (Leaf (Just x)) = b Nothing
    LT2list (Branch (Leaf (Just x)) (Leaf (Just y))) = b (Just y)
    LT2list (Branch r′ l) = a l (LT2list l′) r (LT2list r′)
  ⇔ { def. LT2list }
    [x] = b Nothing
    [x] ⊙ [y] = b (Just y)
    (LT2list r′) ⊙ (LT2list l) = a l (LT2list l′) r (LT2list r′)

We are thus led to define

    b Nothing    = [x]
    b (Just y)   = [x] ⊙ [y]
    a l l″ r r″  = r″ ⊙ (LT2list l)

It remains to prove (insert x) ◦ LT2list = paraLTree a b.
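The first half of the calculation can be checked on concrete inputs with a small Haskell sketch. It assumes that ⊙ is the usual merge of sorted lists (its definition is not restated in this appendix), takes `insert` from the case analysis used in the proof of Proposition 8, and uses the lower-case names `merge`, `lt2list`, `fused` and `buildLT` purely for illustration.

```haskell
-- Leaf trees whose leaves hold an optional element.
data LTree a = Leaf (Maybe a) | Branch (LTree a) (LTree a)

-- The paramorphism operator of Table 5.
paraLTree :: (LTree a -> b -> LTree a -> b -> b) -> (Maybe a -> b) -> LTree a -> b
paraLTree _ g (Leaf x)     = g x
paraLTree f g (Branch l r) = f l (paraLTree f g l) r (paraLTree f g r)

-- Assumption: the operator ⊙ is the merge of two sorted lists.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- Insertion into a sorted list, following the cases x < y and x >= y.
insert :: Ord a => a -> [a] -> [a]
insert x [] = [x]
insert x (y:ys)
  | x < y     = x : y : ys
  | otherwise = y : insert x ys

-- LT2list, written with explicit recursion instead of foldLTree.
lt2list :: Ord a => LTree a -> [a]
lt2list (Leaf Nothing)  = []
lt2list (Leaf (Just x)) = [x]
lt2list (Branch l r)    = merge (lt2list l) (lt2list r)

-- istLT x as a paramorphism: insert into the right subtree and swap.
istLT :: a -> LTree a -> LTree a
istLT x = paraLTree f g
  where
    g Nothing   = Leaf (Just x)
    g (Just y)  = Branch (Leaf (Just x)) (Leaf (Just y))
    f l _ _ r'  = Branch r' l

-- The paramorphism "paraLTree a b" obtained by fusion.
fused :: Ord a => a -> LTree a -> [a]
fused x = paraLTree a b
  where
    b Nothing    = [x]
    b (Just y)   = merge [x] [y]
    a l _ _ r''  = merge r'' (lt2list l)

-- Hypothetical helper: build a leaf tree by repeated insertion.
buildLT :: [a] -> LTree a
buildLT = foldr istLT (Leaf Nothing)
```

In GHCi, `lt2list (istLT 2 (buildLT [3,1,2]))`, `fused 2 (buildLT [3,1,2])` and `insert 2 (lt2list (buildLT [3,1,2]))` all evaluate to `[1,2,2,3]`, which is what the equalities being proved predict for this input.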
Again we proceed by using fusion; the trick is now to write the fold LT2list as a paramorphism (this is always possible since it is a particular case).

    (insert x) ◦ LT2list = paraLTree a b
  ⇔ { unicity-paraLTree, with (insert x) strict }
    (insert x) ◦ LT2list ◦ Leaf = b
    (insert x) (LT2list (Branch l r)) = a l (insert x (LT2list l)) r (insert x (LT2list r))
  ⇔ { def. of LT2list }
    (insert x) ◦ g = b
    insert x (f l (LT2list l) r (LT2list r)) = a l (insert x (LT2list l)) r (insert x (LT2list r))
      where g Nothing    = [ ]
            g (Just y)   = [y]
            f l l′ r r′  = l′ ⊙ r′
  ⇔ { η-expansion, def. of f, g, a, b }
    insert x [ ] = [x]
    insert x [y] = [x] ⊙ [y]
    insert x (LT2list l ⊙ LT2list r) = (insert x (LT2list r)) ⊙ (LT2list l)
  ⇔ { (1) and properties of ⊙ }
    [x] = [x]
    [x] ⊙ [y] = [x] ⊙ [y]
    [x] ⊙ (LT2list l) ⊙ (LT2list r) = [x] ⊙ (LT2list l) ⊙ (LT2list r)

Proof by Induction.

1. c = Leaf Nothing

    LT2list (istLT x (Leaf Nothing)) = insert x (LT2list (Leaf Nothing))
  ⇔ { def. istLT, LT2list }
    LT2list (Leaf (Just x)) = insert x [ ]
  ⇔ { def. LT2list, insert }
    [x] = [x]

2. c = Leaf (Just y)

    LT2list (istLT x (Leaf (Just y))) = insert x (LT2list (Leaf (Just y)))
  ⇔ { def. istLT, LT2list }
    LT2list (Branch (Leaf (Just x)) (Leaf (Just y))) = insert x [y]
  ⇔ { def. LT2list, Spec. theorem }
    (LT2list (Leaf (Just x))) ⊙ (LT2list (Leaf (Just y))) = (wrap x) ⊙ [y]
  ⇔ { def. LT2list, wrap }
    [x] ⊙ [y] = [x] ⊙ [y]

3. c = Branch l r

    LT2list (istLT x (Branch l r)) = insert x (LT2list (Branch l r))
  ⇔ { def. istLT, LT2list }
    LT2list (Branch (istLT x r) l) = insert x ((LT2list l) ⊙ (LT2list r))
  ⇔ { def. LT2list, Spec. theorem }
    (LT2list (istLT x r)) ⊙ (LT2list l) = (wrap x) ⊙ ((LT2list l) ⊙ (LT2list r))
  ⇔ { induction, commut. ⊙ }
    (insert x (LT2list r)) ⊙ (LT2list l) = (wrap x) ⊙ ((LT2list r) ⊙ (LT2list l))
  ⇔ { Spec. theorem, assoc. ⊙ }
    (wrap x) ⊙ (LT2list r) ⊙ (LT2list l) = (wrap x) ⊙ (LT2list r) ⊙ (LT2list l)

B Tree Invariants

In order to prove certain equalities, it is convenient to introduce a notion of invariant that captures properties satisfied by the intermediate structures. These invariants are defined structurally on the data types. For every predicate p : A → Bool, we consider the following inductive predicates:

    (AllL p [ ])
    ∀ x, xs. (p x) ∧ (AllL p xs) ⇒ (AllL p (x : xs))

    (AllT p Empty)
    ∀ x, l, r. (p x) ∧ (AllT p l) ∧ (AllT p r) ⇒ (AllT p (Node x l r))

    (BST Empty)
    ∀ x, l, r. (AllT (< x) l) ∧ (AllT (≥ x) r) ∧ (BST l) ∧ (BST r) ⇒ (BST (Node x l r))

    (HEAP Empty)
    ∀ x, l, r. (AllT (≥ x) l) ∧ (AllT (≥ x) r) ∧ (HEAP l) ∧ (HEAP r) ⇒ (HEAP (Node x l r))

Let us start by stating some simple properties concerning lists and trees.

Lemma 1. For all values x, y and lists l₁, l₂, we have:
1. x < y ⇒ insert x (l₁ ++ [y] ++ l₂) = (insert x l₁) ++ [y] ++ l₂
2. (AllL (≤ x) l₁) ⇒ insert x (l₁ ++ l₂) = l₁ ++ (insert x l₂)
3. (∀ x. p x ⇒ q x) ⇒ AllL p l₁ ⇒ AllL q l₁
4. (AllL p (l₁ ++ l₂)) ⇔ (AllL p l₁) ∧ (AllL p l₂)
5. (AllL (≥ x) l₁) ⇒ insert x l₁ = x : l₁

Proof. Simple induction on l₁.

Lemma 2. For every tree t and value x,
1. (AllT p t) ⇒ (AllL p (BST2list t))
2. (AllT p t) ⇒ (AllL p (H2list t))
3. (p x) ∧ (AllT p t) ⇒ (AllT p (istBST x t))
4. (p x) ∧ (AllT p t) ⇒ (AllT p (istH x t))

Proof. Induction on t.

We are now able to prove the required properties. For the heapsort algorithm, we explore the fact that the intermediate tree is a heap (its root keeps the least element).
Proposition 8. For every value x and tree t,

    (HEAP t) ⇒ insert x (H2list t) = H2list (istH x t)

Proof. By induction on the structure of t. The base case follows immediately from the definitions. For the induction step we have:

    insert x (H2list (Node y l r))
  = { def. H2list }
    insert x (y : ((H2list l) ⊙ (H2list r)))
  = { def. insert }
    x : y : ((H2list l) ⊙ (H2list r))            if x < y,
    y : (insert x ((H2list l) ⊙ (H2list r)))     if x ≥ y
  = { lemma 1 (5) }
    x : (insert y ((H2list l) ⊙ (H2list r)))     if x < y,
    y : (insert x ((H2list l) ⊙ (H2list r)))     if x ≥ y
  = { eq. (1), commutativity and associativity of ⊙ }
    x : (([y] ⊙ (H2list r)) ⊙ (H2list l))        if x < y,
    y : (([x] ⊙ (H2list r)) ⊙ (H2list l))        if x ≥ y
  = { induction hypotheses }
    x : ((H2list (istH y r)) ⊙ (H2list l))       if x < y,
    y : ((H2list (istH x r)) ⊙ (H2list l))       if x ≥ y
  = { def. H2list }
    H2list (Node x (istH y r) l)                 if x < y,
    H2list (Node y (istH x r) l)                 if x ≥ y
  = { def. istH }
    H2list (istH x (Node y l r))

To prove that the intermediate tree is actually a heap, we prove that insertion of elements preserves the invariant.

Proposition 9. For every value x and tree t, (HEAP t) ⇒ (HEAP (istH x t)).

Proof. Induction on t. The base case follows immediately from the definitions. For the induction step we have:

    (HEAP (istH x (Node y l r)))
  ⇔ { def. istH }
    (HEAP (Node x (istH y r) l))    if x < y,
    (HEAP (Node y (istH x r) l))    if x ≥ y

In fact, when x < y we have:

    (AllT (≥ x) (istH y r))    by lemma 2 (4) and (HEAP (Node y l r))
    (AllT (≥ x) l)             by (HEAP (Node y l r))
    (HEAP (istH y r))          by induction hypotheses and (HEAP (Node y l r))
    (HEAP l)                   by (HEAP (Node y l r))

We reason similarly when x ≥ y.

And now, the required result follows directly by induction.

Corollary 1. For every list l, (HEAP (buildH l)).

Proof. Simple induction on l.
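Proposition 8, Proposition 9 and Corollary 1 can be exercised on small inputs with the following Haskell sketch. The recursive case of `istH` is read off the proof of Proposition 9; the base case `istH x Empty`, the definition of `buildH` as a fold, and the use of list merge for ⊙ are assumptions, since none of them is restated in this appendix. All lower-case names are illustrative.

```haskell
data BTree a = Empty | Node a (BTree a) (BTree a)

-- Assumption: the operator ⊙ is the merge of two sorted lists.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- Insertion into a sorted list.
insert :: Ord a => a -> [a] -> [a]
insert x [] = [x]
insert x (y:ys)
  | x < y     = x : y : ys
  | otherwise = y : insert x ys

-- AllT p t: every element stored in t satisfies p.
allT :: (a -> Bool) -> BTree a -> Bool
allT _ Empty        = True
allT p (Node x l r) = p x && allT p l && allT p r

-- The HEAP invariant: every root is a least element of its subtree.
isHeap :: Ord a => BTree a -> Bool
isHeap Empty        = True
isHeap (Node x l r) = allT (>= x) l && allT (>= x) r && isHeap l && isHeap r

-- Heap insertion: the smaller value stays at the root, the larger one
-- goes into the right subtree, and the subtrees are swapped.
istH :: Ord a => a -> BTree a -> BTree a
istH x Empty = Node x Empty Empty          -- assumed base case
istH x (Node y l r)
  | x < y     = Node x (istH y r) l
  | otherwise = Node y (istH x r) l

-- H2list: the root first, then the merged subtrees.
h2list :: Ord a => BTree a -> [a]
h2list Empty        = []
h2list (Node x l r) = x : merge (h2list l) (h2list r)

-- Assumed definition: build the heap by repeated insertion.
buildH :: Ord a => [a] -> BTree a
buildH = foldr istH Empty
```

For `t = buildH [5,3,8,1]`, the expression `isHeap t` holds, `isHeap (istH 4 t)` holds, and `insert 4 (h2list t)` agrees with `h2list (istH 4 t)`, as the two propositions state.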
For the quicksort algorithm, we explore the fact that the intermediate tree is a binary search tree.

Proposition 10. For every value x and tree t,

    (BST t) ⇒ insert x (BST2list t) = BST2list (istBST x t)

Proof. By induction on the structure of t. The base case follows immediately from the definitions. For the induction step we have:

    insert x (BST2list (Node y l r))
  = { def. BST2list }
    insert x ((BST2list l) ++ [y] ++ (BST2list r))
  = { lemma 1 (1,2), lemma 2 (1) and hypothesis (BST (Node y l r)) }
    (insert x (BST2list l)) ++ [y] ++ (BST2list r)     if x < y,
    (BST2list l) ++ [y] ++ (insert x (BST2list r))     if x ≥ y
  = { induction hypotheses }
    (BST2list (istBST x l)) ++ [y] ++ (BST2list r)     if x < y,
    (BST2list l) ++ [y] ++ (BST2list (istBST x r))     if x ≥ y
  = { def. BST2list }
    BST2list (Node y (istBST x l) r)                   if x < y,
    BST2list (Node y l (istBST x r))                   if x ≥ y
  = { def. istBST }
    BST2list (istBST x (Node y l r))

Again, we note that the insertion function preserves the invariant.

Proposition 11. For every value x and tree t, (BST t) ⇒ (BST (istBST x t)).

Proof. Induction on t. The base case follows immediately from the definitions. For the induction step we have:

    (BST (istBST x (Node y l r)))
  ⇔ { def. istBST }
    (BST (Node y (istBST x l) r))    if x < y,
    (BST (Node y l (istBST x r)))    if x ≥ y

In fact, when x < y we have:

    (AllT (< y) (istBST x l))    by lemma 2 (3) and (BST (Node y l r))
    (AllT (≥ y) r)               by (BST (Node y l r))
    (BST (istBST x l))           by induction hypotheses and (BST (Node y l r))
    (BST r)                      by (BST (Node y l r))

We reason similarly when x ≥ y.

And the required result follows directly by induction.

Corollary 2. For every list l, (BST (buildBST l)).

Proof. Simple induction on l.
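The binary search tree counterpart can be sketched in the same way. The recursive case of `istBST` follows the proof of Proposition 11; the base case, the fold defining `buildBST`, and all lower-case names are assumptions made for illustration.

```haskell
data BTree a = Empty | Node a (BTree a) (BTree a)

-- Insertion into a sorted list.
insert :: Ord a => a -> [a] -> [a]
insert x [] = [x]
insert x (y:ys)
  | x < y     = x : y : ys
  | otherwise = y : insert x ys

-- AllT p t: every element stored in t satisfies p.
allT :: (a -> Bool) -> BTree a -> Bool
allT _ Empty        = True
allT p (Node x l r) = p x && allT p l && allT p r

-- The BST invariant: smaller elements to the left, the rest to the right.
isBST :: Ord a => BTree a -> Bool
isBST Empty        = True
isBST (Node x l r) = allT (< x) l && allT (>= x) r && isBST l && isBST r

-- Insertion descends to the left when x < y and to the right otherwise.
istBST :: Ord a => a -> BTree a -> BTree a
istBST x Empty = Node x Empty Empty        -- assumed base case
istBST x (Node y l r)
  | x < y     = Node y (istBST x l) r
  | otherwise = Node y l (istBST x r)

-- BST2list: the in-order traversal.
bst2list :: BTree a -> [a]
bst2list Empty        = []
bst2list (Node x l r) = bst2list l ++ [x] ++ bst2list r

-- Assumed definition: build the tree by repeated insertion.
buildBST :: Ord a => [a] -> BTree a
buildBST = foldr istBST Empty
```

With `t = buildBST [5,3,8,1]`, both `isBST t` and `isBST (istBST 4 t)` hold, and `insert 4 (bst2list t)` coincides with `bst2list (istBST 4 t)`. The in-order traversal of a tree that violates the invariant, such as `Node 1 (Node 2 Empty Empty) Empty`, is not sorted, which is why Proposition 10 carries the hypothesis (BST t).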
C Alternative Derivation

In this appendix we present a slight variation on the strategy for deriving the sorting algorithms. This variation clarifies the role of the invariants on intermediate structures in the correctness argument of these algorithms.

When we compare the proof effort required to establish the correctness of the "sorting by insertion" algorithms, we note that there is a significant difference between isortLT and the other two algorithms (isortH and isortBST). As explained in the main text, this is because the correctness of the last two algorithms depends on properties of the intermediate structure. However, we can explain that difference at a more abstract level: one might argue that isortLT is closer to the specification of a generic insertion sort presented in Section 4. To illustrate this point, let us recall the definition of these algorithms (we omit the definitions not relevant for this discussion):

    isortLT  = LT2list ◦ buildLT
    isortH   = H2list ◦ buildH
    isortBST = BST2list ◦ buildBST

    LT2list = foldLTree (⊙) t
      where t Nothing  = [ ]
            t (Just x) = [x]

    H2list = foldBTree aux [ ]
      where aux x l r = x : (l ⊙ r)

    BST2list = foldBTree aux [ ]
      where aux x l r = l ++ (x : r)

We observe that LT2list uses only ⊙ to construct (nontrivial) lists. On the other hand, H2list and BST2list make use of other functions (namely (:) and (++)). That distinction makes the latter two sensitive to the ordering attributes of the intermediate tree.

Let us take one step back and define the following variants of the isortH and isortBST algorithms:

    isortH′   = BT2list ◦ buildH
    isortBST′ = BT2list ◦ buildBST

    BT2list = foldBTree aux [ ]
      where aux x l r = [x] ⊙ (l ⊙ r)

Now, the conversion of binary trees into lists (BT2list) does not assume any ordering constraints on these trees. In fact, BT2list and LT2list should be read as two instances of the same polytypic function.
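The variants isortH′ and isortBST′ can be sketched in Haskell as follows. As before, list merge stands in for ⊙, the base cases of the insertion functions and the folds defining `buildH` and `buildBST` are assumptions, and `bt2list` is written with explicit recursion rather than through a fold operator on binary trees.

```haskell
data BTree a = Empty | Node a (BTree a) (BTree a)

-- Assumption: the operator ⊙ is the merge of two sorted lists.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- BT2list uses only merge, so it never relies on how the tree is ordered.
bt2list :: Ord a => BTree a -> [a]
bt2list Empty        = []
bt2list (Node x l r) = merge [x] (merge (bt2list l) (bt2list r))

-- Heap insertion (base case assumed).
istH :: Ord a => a -> BTree a -> BTree a
istH x Empty = Node x Empty Empty
istH x (Node y l r)
  | x < y     = Node x (istH y r) l
  | otherwise = Node y (istH x r) l

-- Binary search tree insertion (base case assumed).
istBST :: Ord a => a -> BTree a -> BTree a
istBST x Empty = Node x Empty Empty
istBST x (Node y l r)
  | x < y     = Node y (istBST x l) r
  | otherwise = Node y l (istBST x r)

-- Assumed definitions of the tree builders as folds over the input list.
buildH, buildBST :: Ord a => [a] -> BTree a
buildH   = foldr istH Empty
buildBST = foldr istBST Empty

-- The two variants: same conversion function, different builders.
isortH', isortBST' :: Ord a => [a] -> [a]
isortH'   = bt2list . buildH
isortBST' = bt2list . buildBST
```

Both `isortH' [3,1,2,3,0]` and `isortBST' [3,1,2,3,0]` evaluate to `[0,1,2,3,3]`. More tellingly, `bt2list (Node 2 (Node 9 Empty Empty) (Node 1 Empty Empty))` evaluates to `[1,2,9]` even though the tree is neither a heap nor a search tree, which illustrates why no invariant is needed at this stage.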
It is interesting to verify that, for these modified functions, the correctness argument is essentially the same as for isortLT.

Proposition 12. isortH′ and isortBST′ are sort algorithms.

Proof. We instantiate eqs. (3) and (4) for both functions. We set ε = Empty, and thus eq. (3) results directly from the definition. For eq. (4), we need to prove that for every binary tree t and value x,

    (BT2list ◦ (istH x)) t = ((insert x) ◦ BT2list) t
    (BT2list ◦ (istBST x)) t = ((insert x) ◦ BT2list) t

These are proved by induction on the structure of t. We show the proof of the first one (the second is similar). The base case is trivial. For the induction step we have:

    BT2list (istH x (Node y l r))
  = { def. istH }
    BT2list (Node x (istH y r) l)                        if x < y,
    BT2list (Node y (istH x r) l)                        if x ≥ y
  = { def. BT2list }
    [x] ⊙ ((BT2list (istH y r)) ⊙ (BT2list l))           if x < y,
    [y] ⊙ ((BT2list (istH x r)) ⊙ (BT2list l))           if x ≥ y
  = { induction hypotheses }
    [x] ⊙ (([y] ⊙ (BT2list r)) ⊙ (BT2list l))            if x < y,
    [y] ⊙ (([x] ⊙ (BT2list r)) ⊙ (BT2list l))            if x ≥ y
  = { commutativity and associativity of ⊙ }
    [x] ⊙ ([y] ⊙ ((BT2list l) ⊙ (BT2list r)))
  = { def. BT2list }
    [x] ⊙ (BT2list (Node y l r))
  = { eq. (1) }
    insert x (BT2list (Node y l r))

In order to refine isortH′ and isortBST′ to heapsort and quicksort, we should now proceed along two independent paths:

– to show that the construction of the intermediate tree can be performed co-inductively (i.e. buildH and buildBST are equal to unfoldhsort and unfoldqsort respectively);
– to show that the conversion of the tree into the resulting list can be simplified to its standard formulation (i.e. BT2list can be replaced by H2list for heapsort and by BST2list for quicksort).

The first point was addressed in the main text (cf. Propositions 5 and 7).
The second is the one that should consider the ordering properties induced by the building process in each case. More precisely, one proves:

    BT2list ◦ buildH = H2list ◦ buildH
    BT2list ◦ buildBST = BST2list ◦ buildBST

As in Appendix B, it is convenient to make explicit the structural invariants possessed by the intermediate structures in each case. That is,

    (HEAP t) ⇒ BT2list t = H2list t
    (BST t) ⇒ BT2list t = BST2list t

The proof requires a simple lemma relating ⊙ with ordering predicates.

Lemma 3. For every value x and lists l₁, l₂,
1. (AllL (x ≤) l₁) ⇒ [x] ⊙ l₁ = x : l₁
2. (AllL (x >) l₁) ⇒ l₁ ⊙ (x : l₂) = l₁ ++ (x : l₂)
3. (AllL p l₁) ∧ (AllL p l₂) ⇒ (AllL p (l₁ ⊙ l₂))

Proof. The first two are proved by simple induction on the structure of l₁. The third by mutual induction on l₁ and l₂.

Now, the required properties follow by simple induction. The base case is, in both cases, trivial. For the induction step, we have for H2list:

    BT2list (Node x l r)
  = { def. BT2list }
    [x] ⊙ ((BT2list l) ⊙ (BT2list r))
  = { induction hypotheses }
    [x] ⊙ ((H2list l) ⊙ (H2list r))
  = { def. of HEAP and lemma 3 (1,3) }
    x : ((H2list l) ⊙ (H2list r))
  = { def. H2list }
    H2list (Node x l r)

and for BST2list:

    BT2list (Node x l r)
  = { def. BT2list }
    [x] ⊙ ((BT2list l) ⊙ (BT2list r))
  = { induction hypotheses }
    [x] ⊙ ((BST2list l) ⊙ (BST2list r))
  = { commutativity and associativity of ⊙ }
    (BST2list l) ⊙ ([x] ⊙ (BST2list r))
  = { def. of BST and lemma 3 (1) }
    (BST2list l) ⊙ (x : (BST2list r))
  = { def. of BST and lemma 3 (2) }
    (BST2list l) ++ (x : (BST2list r))
  = { def. BST2list }
    BST2list (Node x l r)
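The two implications above, and the reason their hypotheses cannot be dropped, can be illustrated with a short Haskell sketch. List merge again stands in for ⊙; the base cases of the insertion functions and the fold-based builders are assumptions, and every lower-case name is illustrative.

```haskell
data BTree a = Empty | Node a (BTree a) (BTree a)

-- Assumption: the operator ⊙ is the merge of two sorted lists.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- Order-agnostic conversion: only merge is used.
bt2list :: Ord a => BTree a -> [a]
bt2list Empty        = []
bt2list (Node x l r) = merge [x] (merge (bt2list l) (bt2list r))

-- Conversion that trusts the HEAP invariant (root is least).
h2list :: Ord a => BTree a -> [a]
h2list Empty        = []
h2list (Node x l r) = x : merge (h2list l) (h2list r)

-- Conversion that trusts the BST invariant (in-order traversal).
bst2list :: BTree a -> [a]
bst2list Empty        = []
bst2list (Node x l r) = bst2list l ++ (x : bst2list r)

-- Insertion functions (base cases assumed) and fold-based builders.
istH, istBST :: Ord a => a -> BTree a -> BTree a
istH x Empty = Node x Empty Empty
istH x (Node y l r)
  | x < y     = Node x (istH y r) l
  | otherwise = Node y (istH x r) l
istBST x Empty = Node x Empty Empty
istBST x (Node y l r)
  | x < y     = Node y (istBST x l) r
  | otherwise = Node y l (istBST x r)

buildH, buildBST :: Ord a => [a] -> BTree a
buildH   = foldr istH Empty
buildBST = foldr istBST Empty

-- Two hand-made trees that violate the respective invariants.
notHeap, notBST :: BTree Int
notHeap = Node 2 (Node 1 Empty Empty) Empty
notBST  = Node 1 (Node 2 Empty Empty) Empty
```

On trees produced by the builders the conversions agree: `bt2list (buildH xs)` equals `h2list (buildH xs)` and `bt2list (buildBST xs)` equals `bst2list (buildBST xs)` for, say, `xs = [5,3,8,1]`. On the hand-made trees they differ: `h2list notHeap` is `[2,1]` and `bst2list notBST` is `[2,1]`, whereas `bt2list` returns `[1,2]` for both, so the refinement of BT2list into the standard conversions genuinely depends on the invariants.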
