Towards Realistic Vehicular Network Modeling Using Planet-scale Public Webcams

Gautam S. Thakur ‡§, Pan Hui ‡, Hamed Ketabdar ‡, Ahmed Helmy §
§ CISE, University of Florida, Gainesville; ‡ Deutsche Telekom Laboratories, Berlin
gsthakur@cise.ufl.edu, pan.hui@telekom.de, hamed.ketabdar@telekom.de, helmy@cise.ufl.edu

ABSTRACT

Realistic modeling of vehicular mobility has been particularly challenging due to a lack of large libraries of measurements in the research community. In this paper we introduce a novel method for large-scale monitoring, analysis, and identification of spatio-temporal models for vehicular mobility using the freely available online webcams in cities across the globe. We collect vehicular mobility traces from 2,700 traffic webcams in 10 different cities for several months and generate a mobility dataset of 7.5 terabytes consisting of 125 million images. To the best of our knowledge, this is the largest dataset ever used in such a study. To process and analyze this data, we propose an efficient and scalable algorithm to estimate traffic density based on background image subtraction. Initial results show that, in each of four cities, at least 82% of individual cameras follow a log-logistic distribution with less than 5% deviation, and that 94% of cameras in Toronto follow a gamma distribution. The aggregate results from each city also demonstrate that the log-logistic and gamma distributions pass the KS-test with 95% confidence. Furthermore, many of the camera traces exhibit long range dependence, with self-similarity evident in the aggregates of traffic (per city). We believe our novel data collection method and dataset provide a much needed contribution to the research community for realistic modeling of vehicular networks and mobility.

1. INTRODUCTION

Research in the area of vehicular networks has increased dramatically in recent years.
With the proliferation of mobile networking technologies and their integration with the automobile industry, various forms of vehicular networks are being realized. These networks include vehicle-to-vehicle, vehicle-to-roadside, and vehicle-to-roadside-to-vehicle architectures. Realistic modeling, simulation and informed design of such networks face several challenges, mainly due to the lack of large-scale community-wide libraries of vehicular data measurements and of representative models of vehicular mobility. Earlier studies in this area have clearly established a direct link between the vehicular density distribution and the performance [16, 3] of vehicular network primitives and mechanisms, including broadcast and geocast protocols [1]. Although good initial efforts have been made to capture realistic vehicular density distributions, such efforts were limited by the availability of sensed vehicular data [20]. Hence, there is a real need to conduct vehicular density modeling using larger-scale and more comprehensive datasets. Furthermore, commonly used assumptions, such as the exponential distribution [19] of vehicular inter-arrival times [1], have been used to derive many theories and conduct several analyses, the validity of which bears further investigation.

In this study, we provide a novel framework for the systematic monitoring, measurement, analysis and modeling of vehicular density distributions at a large scale. To avoid the limitations of sensed vehicular data, we instead utilize the existing global infrastructure of tens of thousands of video cameras providing a continuous stream of street images from dozens of cities around the world.
Millions of images captured from publicly available traffic web cameras are processed using a novel density estimation algorithm, to help investigate and understand the traffic patterns of cities and major highways. Our algorithm employs simple, scalable, and effective background subtraction techniques to process the images and build an extensive library of spatio-temporal vehicular density data.

As a first step toward realistic vehicular network modeling, we aim to provide a comprehensive view of the fundamental statistical characteristics of the vehicular traffic density exhibited by the data from four major cities over 45 days. Two main sets of statistical analyses are conducted. The first is an investigation of the best-fit distribution for the arrival process using various cameras and aggregate city data, while the second is a study of the long range dependence (LRD) and self-similarity observed in the data. Our early analysis shows two main results: (i) the empirical distributions of vehicular densities in most of the cameras and cities follow 'log-logistic' and 'gamma' distributions; (ii) the data consistently show a high degree of self-similarity over orders of magnitude of time scales, in all cities and for many cameras. This suggests a long-range-dependent process governing the vehicular arrival process in many realistic scenarios. Such a result is in sharp contrast to the assumption of memoryless processes commonly used for vehicular mobility.

The contributions of this work are manifold. (i) To the best of our knowledge, we provide by far the largest and most extensive library of vehicular density data, based on processing millions of images obtained from ten main cities and thousands of cameras. This addresses a severe shortage of such datasets in the community.
The library will be made available to the research community in the future. (ii) We propose a fast algorithm for traffic density estimation to efficiently process millions of image files. (iii) We establish log-logistic and gamma distributions as the most suitable fits for the vehicular density distribution and provide early evidence of self-similarity exhibited by the traffic at various time scales.

The rest of the document is outlined as follows. Section 2 discusses related work. In Section 3, we discuss our vehicular dataset. In Section 4, we discuss our background subtraction algorithm, and the detection and removal of outliers. Statistical analysis of the measurements and modeling is presented in Section 5. Finally, we conclude the paper in Section 6 and give insight into future work.

2. RELATED WORK

Large-scale mobility datasets are very important for the mobile network and computing research community, but collecting them is challenging and usually expensive [8]. In this paper, we propose an inexpensive method to collect global-scale vehicular mobility traces using thousands of freely available webcams that provide continuous and fine-grained monitoring of vehicular traffic.

Existing studies in transportation sciences focus on improving road traffic and on the use of structural engineering methods to resolve issues of congestion, evacuation, and mitigation plans. Initial work [5] mainly focused on developing infrastructure for the movement of vehicles on roads and bridges. In recent times [7], however, much focus has been given to the use of sensor data. The latter helps to engineer better traffic conditions, ensuring safety and management of traffic. For example, inductive loop detectors are installed to monitor traffic flows. However, the data generated by these sensors is not readily available to the general public.
Second, studies [4] do not necessarily focus on vehicular networks, traffic modeling, and characterization. In spite of these data availability problems, there is, surprisingly, a large deployment of publicly available online web cameras, which can be used for monitoring and modeling traffic. In our work, we take advantage of these free webcams. To our knowledge we are the first to identify the power and usability of these free web cameras for the purpose of modeling and characterizing traffic across the globe.

Simulation tools like CORSIM [7] and VISSIM [11] are geared to model specific scenarios for planning future traffic conditions at a micro-mobility, small-scale level. In this work, we focus on macro-mobility, modeling vehicular movements in the form of flow densities to analyze traffic on a very large scale. From a networking perspective, studies of mobility models [4, 14] and routing techniques [21] investigate how mobility impacts the performance of routing protocols [2]. If the mobility model is unrealistic, then the routing performance is questionable. We therefore need models inspired by real datasets. By way of this work, we believe a comprehensive set of parameters can be extracted to develop such models.

In a recent work, Bai et al. [1] analyzed spatio-temporal variations in vehicular traffic for the purpose of inter-vehicle communications. Data collected from realistic scenarios show the effectiveness of the exponential model for highway vehicle traffic. Along the same line, the quantitative characteristics of vehicle arrival patterns on highways are studied in [13]. Using real highway traffic data, that study examines the existence of self-similarity in vehicle arrival data and finds that the time headway of vehicles on highways follows a heavy-tailed distribution.
These findings enrich traffic modeling, but they were carried out on a very small sample of data, mainly localized to one or two locations. In our study, we use 45 days of vehicular imagery data from four cities to model traffic and characterize the density distribution.

A principal activity related to our work is image processing and the efficient retrieval of traffic information from these images. Many studies [5] have looked into aspects of both background subtraction [15, 17] and object detection [10]. In the former methods [6], the difference between the current and a reference frame is used to identify objects. In detection approaches [18], learned object features (shape, size, etc.) are used to detect and classify objects. In our work, we use a temporal method for background subtraction to calculate a relative numerical value instead of counting cars. We find background subtraction to be much faster than object detection, as discussed in detail in a later section.

Table 1: Global Webcam Datasets

City          # of Cameras  Duration               Interval  Records        Database Size
Bangalore     160           30/Nov/10 - 01/Mar/11  180 sec.  2.8 million    357 GB
Beaufort      70            30/Nov/10 - 01/Mar/11  30 sec.   24.2 million   1150 GB
Connecticut   120           21/Nov/10 - 20/Jan/11  20 sec.   7.2 million    435 GB
Georgia       777           30/Nov/10 - 02/Feb/11  60 sec.   32 million     1400 GB
London        182           11/Oct/10 - 22/Nov/10  60 sec.   1 million      201 GB
London (BBC)  723           30/Nov/10 - 01/Mar/11  60 sec.   20 million     1050 GB
New York      160           20/Oct/10 - 13/Jan/11  15 sec.   26 million     1200 GB
Seattle       121           30/Nov/10 - 01/Mar/11  60 sec.   8.2 million    600 GB
Sydney        67            11/Oct/10 - 05/Dec/10  30 sec.   2.0 million    350 GB
Toronto       89            21/Nov/10 - 20/Jan/11  30 sec.   1.8 million    325 GB
Washington    240           30/Nov/10 - 01/Mar/11  60 sec.   5 million      400 GB
Total         2709          -                      -         125.2 million  7468 GB

Figure 1: Infrastructure for measurement collection.

Figure 2: Traffic cameras in (a) London and (b) Sydney. The red dots show the locations of the deployed cameras.

3. DATA COLLECTION

There are thousands, if not millions, of outdoor cameras currently connected to the Internet, placed by governments, companies, conservation societies, national parks, universities, and private citizens. Outdoor webcams are usually mounted on a roadside pole for easy accessibility, installation and maintenance, and they have seen wide application not only in adaptive traffic control and information systems, but also in monitoring weather conditions, advertising the beauty of a particular beach or mountain, or providing a view of animal or plant life at a particular location. We view the connected global network of webcams as a highly versatile platform with untapped potential to monitor global trends, or changes, in the flow of the city, and to provide large-scale data to realistically model vehicular, or even human, mobility.

In this section, we introduce the methodology for the data collection and give high-level statistics of the data traces. We collect vehicular mobility traces from online webcams using our crawler. A majority of these webcams are deployed by the Department of Transportation (DoT) in each city. They are used to provide real-time information about road traffic conditions to the general public via online traffic web cameras. These cameras are typically installed on traffic signal poles, facing the roads at prominent intersections throughout the city and on highways. At regular intervals, each camera captures still pictures of the ongoing road traffic and sends them as feeds to the DoT's media server.
For the purpose of this study, we chose 10 cities with a large number of webcams and obtained permission from the concerned DoTs to collect this vehicular imagery data for several months. We cover cities in North America, Europe, Asia, and Australia. In Fig. 1, we show our experimental infrastructure to download and maintain the image data. Since these cameras provide better imagery during the daytime, we limit our study to downloading and analyzing images only during such hours. On average, we download 15 gigabytes of imagery data per day from over 4700 traffic web cameras, with an overall dataset of 6.5 terabytes containing around 120 million images.

Table 1 shows the high-level statistics of the datasets we collected. Each city has a different number of deployed cameras and a different interval at which images are captured. For example, cameras for the city of Sydney capture images at an interval of one minute, while for the state of Connecticut the interval between two consecutive snapshots is only 20 seconds. These cameras are deployed over a wide geographical area, covering major sections of cities and highways. Fig. 2 gives an example of the camera deployments in the cities of London and Sydney, obtained by mapping the Global Positioning System (GPS) locations of the cameras onto Google Maps. The area covered by the cameras in London is 950 km² and that in Sydney is 1500 km². Hence, we believe our study will be comprehensive and will reflect major trends in the traffic movement of cities.

4. ALGORITHM TO EXTRACT TRAFFIC DENSITIES

We aim to estimate traffic density on roads by considering the number of vehicles or pedestrians crossing the road. We have a sequence of images $I_1(x, y), I_2(x, y), \ldots, I_z(x, y)$ captured by a webcam. For our problem, we have to be able to separate the information we need, e.g. the number of vehicles and pedestrians, from the background image, which is normally the road and the surrounding buildings. The main factor that distinguishes vehicles from the background (road, buildings) is that vehicles are not stationary for a long period of time, whereas the background is. The solution then is to apply a kind of high-pass filter over the sequence of images captured by a webcam over time. The high-pass filter removes the stationary part of the images (road, buildings, etc.) and keeps the moving components (mainly vehicles).

To implement such a high-pass filter, we subtract the result of a low-pass filter over a sequence of images from each still image. This is practically equivalent to implementing a high-pass filter over the sequence of images. To obtain the low-pass filtering effect, we run a moving average filter over a time sequence of images obtained from one webcam. The duration of the moving average filter can be adjusted in an ad hoc way. The moving average filter is implemented simply by averaging the intensity maps of several images within a certain duration: at the output of the filter, the intensity of each pixel is the average intensity of the corresponding pixels over the interval. The output of the moving average filter (low-pass filter) is normally the required background image, which is a still image of the street and buildings. Therefore, subtracting the output of the low-pass filter from each image gives us the moving components (e.g. vehicles). This is in fact the high-pass component of the image over time.

Given the high-pass component of the image, the vehicles stand out from the background. One may then use regular object detection techniques to identify and count the number of vehicles in the high-pass filtered image.
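The low-pass and high-pass steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: frames are taken to be equally sized grayscale images stored as nested lists, the function names `background` and `high_pass` are hypothetical, and the noise threshold `tau` anticipates the thresholding step defined later in this section.

```python
def background(frames):
    """Low-pass step: pixel-wise mean over a sequence of grayscale frames.

    `frames` is a list of equally sized 2-D lists of pixel intensities
    (an assumed representation, chosen to keep the sketch dependency-free).
    """
    z = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / z for c in range(cols)]
            for r in range(rows)]


def high_pass(frame, bg, tau=0.0):
    """High-pass step: frame minus background, keeping only residuals above tau.

    Residuals at or below the (assumed) threshold tau are set to 0 so that
    small fluctuations do not count as moving components.
    """
    out = []
    for frame_row, bg_row in zip(frame, bg):
        out.append([(p - b) if (p - b) > tau else 0
                    for p, b in zip(frame_row, bg_row)])
    return out
```

For example, `high_pass(frames[-1], background(frames), tau=5)` returns the residual of the most recent frame against the averaged background; a real deployment would likely use an array library rather than nested lists.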
However, applying object detection techniques may require heavy computation, and at the same time it can be unnecessary. As an alternative, we simply count the number of active pixels (pixels with a value higher than a certain threshold). Such a process can be much faster than detecting and counting objects in an image. At the same time, it can be more effective, because we are looking for the percentage of the street (road) that is covered by vehicles, as an indicator of how crowded the street is, rather than the number of vehicles. The number of vehicles is not necessarily a good indicator of crowdedness, as a long vehicle may introduce more traffic than a small one. Second, this approach overcomes the issues that object detection algorithms face under severe congestion, one of which is the visibility of the boundary contours used to separate objects from one another. In contrast, counting the number of active pixels indicates what percentage of the road is covered, no matter how many vehicles are on the road.

That said, consider that an image can be represented as

$$I(x, y) = L(x, y) + T(x, y) + N(x, y)$$

where $I(x, y)$ is the captured image, $L(x, y)$ is the output of our low-pass filter, and $T(x, y)$ and $N(x, y)$ are respectively the traffic and the noise associated with the image. In the first step, we generate the low-pass filter output using the aforementioned moving average technique. We average a given pixel with its right and left neighbors in the image sequence. For the purpose of this study, we kept the number of neighbors at $z = 100$. The averaging removes the dominant trends, which are $T(x, y)$ and $N(x, y)$. The low-pass filter output remains constant for one camera:

$$L(x, y) = \frac{I_1(x, y) + I_2(x, y) + \cdots + I_z(x, y)}{z}$$

To get the traffic density associated with an image, we subtract the low-pass filter output and set a threshold $\tau$, rejecting any resulting pixel value below it so as to reduce the effect of the noise $N(x, y)$ (shadows etc.). In summary,

$$I'(x, y) = I(x, y) - L(x, y) \quad \text{such that} \quad I'(x, y) > \tau$$

Later, we convert the image to grayscale, $I''(x, y)$, and sum the pixels to get the traffic density $d$:

$$d = \sum_{x=0}^{m} \sum_{y=0}^{n} I''(x, y)$$

Outliers Detection and Removal

Collecting images on such a large scale requires automated processes to manage the data and extract useful information. As mentioned, different cameras have different refresh rates, so we have to continuously download images at a specific time interval for each camera. To ensure that we do not miss even a single traffic snapshot, we keep our download interval a little shorter than the camera refresh interval. However, this results in a few duplicate images, which we filter out as a first step towards outlier detection and removal.

Figure 3: Outliers detection and removal. (a) Outliers present: outliers are detected and encircled. (b) Outliers removed: the factual traffic density distribution. Both panels plot traffic densities against image count.

Normally, the downloaded dataset contains images that are snapshots of vehicular traffic on the roads. But in many instances, the images are corrupted, either zero-sized or containing extraneous bytes (noise). In addition, if the camera is non-functional or has mechanical errors, the traffic monitoring server replaces the current traffic snapshot with an error notification image.
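The first filtering pass described above (dropping duplicate downloads and zero-sized files) could look roughly like the sketch below. It rests on assumptions that the text does not state: a duplicate is taken to be byte-identical to the previously kept snapshot from the same camera, and the function name `drop_duplicates_and_empty` is hypothetical. Error-notification images and partially corrupted files are not handled here.

```python
import hashlib


def drop_duplicates_and_empty(snapshots):
    """Filter one camera's downloads, given as (name, raw_bytes) in download order.

    Assumption: a duplicate is byte-identical to the last kept snapshot,
    which is what downloading slightly faster than the camera refreshes
    would produce. Zero-sized files are treated as corrupted and skipped.
    """
    kept = []
    last_digest = None
    for name, data in snapshots:
        if len(data) == 0:
            continue  # zero-sized download
        digest = hashlib.sha256(data).hexdigest()
        if digest == last_digest:
            continue  # same bytes as the previous kept snapshot
        last_digest = digest
        kept.append((name, data))
    return kept
```

Comparing only against the previous snapshot, rather than against every image seen so far, keeps memory constant and avoids discarding a legitimately repeated scene hours later.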
The challenge here is to detect all such errors and remove them before modeling and statistical analysis. The analysis becomes more complex because we do not know the underlying distribution, and hence statistical techniques that rely on an assumed distribution (boxplot etc.) cannot be used. We used semi-supervised learning and data mining to overcome the challenges of outlier detection and removal in millions of traffic images. In our case, we treat the dataset containing all types of images as $X = \{x_1, x_2, x_3, \ldots, x_n\}$. We then divide this set into two parts. The first contains the data points $X_l = \{x_1, x_2, x_3, \ldots, x_l\}$, mapped to labels $Y_l = \{y_1, y_2, y_3, \ldots, y_l\}$. The input features provided for detecting outliers include, but are not limited to, image size, color depth, multi-channel color arrays and image segmentation standard errors. The second part contains the points with unknown labels, represented as $X_u = \{x_{l+1}, x_{l+2}, x_{l+3}, \ldots, x_{l+u}\}$ such that $u \gg l$. The already known and learned labeled points are later used to find cluster boundaries and assign a class to each cluster.

In this case, we used the low-density separation assumption, which helps to cut the dataset into clusters. The identified clusters that lie far from the regular traffic density data are separated out as outliers. In Fig. 3, we compare the results before and after detecting and removing the outliers.

Figure 4: A series of pictures of the same intersection with varying traffic intensities: (a) low, d = 2023, 0.28; (b) medium, d = 5400, 0.55; (c) high, d = 9230, 0.93. This variation is captured by the density parameter d. The first value is the result of background subtraction and the second is the normalized value.

Figure 5: Traffic arrival process on an hourly basis for 45 days. A regular pattern of high traffic intensity during morning and evening hours is evident.

5. TOWARD REALISTIC VEHICULAR NETWORK MODELING

As a first step toward realistic modeling of vehicular communication networks, we focus on two studies of the traffic arrival process in this paper: modeling the densities ($d$) against well-known probability distributions, and analyzing the typical traffic burstiness using self-similarity analysis. The objective of this study is thus to help understand the underlying statistical patterns and to model the arrival processes. The models are selected based on their applicability in everyday statistical analysis and on several iterations of modeling which showed that the traffic closely follows (with small deviation) one or more of the discussed probability distributions. Due to the page limit, and as this is an early study, in this section we only present results from 4 representative cities (London, Sydney, Toronto, and Connecticut), with a total of 458 cameras and 12 million images. An important underlying fact is that the traffic densities approximate the relative traffic on the roads. This differs from counting cars using loop detectors or other sensors. In Fig. 4, we depict three traffic scenarios of varying intensity, from low to fully congested, at the same intersection and camera, as captured by the density parameter ($d$).

Figure 6: Modeling the distribution of aggregate traffic densities for (a) Connecticut, (b) London, (c) Sydney and (d) Toronto. Each panel plots the empirical CDF (CI 95%) of the traffic data against fitted Exponential, Gamma, Log-logistic, Normal and Weibull distributions.

Table 2: Dominant distributions as best fits [by ranking]

City         1st Best Fit  2nd Best Fit  3rd Best Fit
Connecticut  L[87%]        G[11%]        E[0.5%]
London       L[42%]        G[39%]        W[16%]
Sydney       L[62%]        G[32%]        N[2%]
Toronto      G[46%]        W[31%]        L[21%]

E = Exponential, G = Gamma, L = Log-logistic, N = Normal, W = Weibull

Table 3: Dominant distributions as best fits [by % deviation, KS-test]

City         ≤ 3%                               ≤ 5%
Connecticut  L[62%], G[15%], W[3%]              L[94%], G[44%], W[19%]
London       G[34%], L[34%], W[10%], N[0.5%]    L[82%], G[70%], W[47%], N[7%]
Sydney       L[88%], G[61%], W[4%], N[2%]       L[98%], G[88%], W[44%], N[18%]
Toronto      G[75%], W[58%], L[34%]             G[94%], W[88%], L[87%], E[4%], N[1%]

Figure 7: Cumulative plots for three varying traffic intensities captured per city: panels (a)-(c) Connecticut, (d)-(f) London, (g)-(i) Sydney and (j)-(l) Toronto. The individual flows are characterized by Low (L), Medium (M) and High (H) traffic intensities. Each panel plots the empirical CDF (CI 95%) against fitted Exponential, Gamma, Log-logistic, Normal and Weibull distributions.

Figure 8: The percentage of cameras from all four cities covered by each distribution model (average % as 1st best fit). The values in the boxes show the percentage deviation error from the empirical data.

5.1 Traffic Flow Characterization

To investigate the nature of traffic, we take a holistic approach and systematically extract individual and aggregate flows of traffic densities from the images. Each individual flow constitutes a distribution of traffic densities that shows the flow of traffic as viewed from an individual camera. This helps us better understand traffic intensity at the microscopic level of each intersection. The aggregate traffic combines the flows from all the cameras in time order. The main advantage of analyzing aggregate traffic is that it reveals emergent properties and helps to model and profile the city, and to make intelligent guesses about a different city based on this aggregate.
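As a small illustration of combining individual flows into an aggregate flow in time order, the sketch below merges per-camera series of (timestamp, density) records. The record layout and the function name `aggregate_flow` are assumptions made for this example; each per-camera list is assumed to be already sorted by timestamp.

```python
import heapq


def aggregate_flow(camera_flows):
    """Merge per-camera flows into one time-ordered aggregate flow.

    `camera_flows` maps a camera id to a list of (timestamp, density)
    records sorted by timestamp (an assumed layout). Records are merged
    lazily by timestamp without re-sorting the whole dataset.
    """
    return list(heapq.merge(*camera_flows.values(), key=lambda rec: rec[0]))
```

Because `heapq.merge` only relies on each input being sorted, the same call works for any number of cameras and could be consumed as a stream instead of being materialized into a list.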
When analyzing the traffic, an important step is to choose the granularity of the traffic according to the purpose. For example, hourly patterns provide a good estimate of the nature of congestion during morning and evening times, which the flow at the individual density level may not show. On the other hand, a finer granularity helps to understand sudden spikes in the traffic flow and to plan congestion mitigation. In this work, we choose to look into all these patterns by modeling flows against well-known probability distributions. Fig. 5 gives an example of the traffic density on an hourly basis for one of the cameras in Sydney. We can observe that there is, in general, high traffic density during the peak hours and low traffic density between 10am and 2pm (off-peak time), which provides positive confirmation that our algorithm can effectively detect traffic.

Fig. 7 shows the cumulative distribution function of the traffic for three individual cameras in each city, with low, medium and high average traffic. We can see that traffic at individual cameras can vary a lot, but in general the log-logistic, gamma and Weibull distributions capture some of the key features of the data. Log-logistic is the best approximation for the individual camera traffic in three of the four cities, with gamma ranking first in Toronto. Table 2 gives the detailed fitting statistics, ranking for each city the distributions that showed the least deviation under the KS-test. In Table 3, we measure the deviation from the empirical data and report the share of cameras at the 3% and 5% error levels. In Fig. 8, the results show the average dominance of each distribution. We find that even at the level of individual cameras, the log-logistic distribution provides a good estimate of the empirical data. As is evident, log-logistic and gamma closely match the empirical data distribution.

Finally, Fig. 6 shows the cumulative statistics for the aggregated traffic for each city.
We can observe that different cities have different aggregated traffic; for example, London in general has more traffic than Connecticut.

5.2 Long Range Dependence

In [9, 12], the authors demonstrate the existence of long range dependence and the self-similar nature of Ethernet traffic, which has serious implications for the design and analysis of computer networks. Inspired by this study of the arrival process of Ethernet packets in wired networks, we also characterize the nature of vehicular traffic and investigate long range dependence. Self-similarity means that aggregate traffic statistics show long range dependence and that the correlation decays more slowly than exponentially. In Fig. 9(a-d), we show time series plots at four different chronological resolutions for the city of Sydney. Initially, we plot with a time interval unit of one minute. Each subsequent plot is derived from the previous one by coarsening the time interval by one order of magnitude. Significant burstiness is present at every time resolution, from the finest to the coarsest. We also observed this behavior in other cities, and we will investigate it further in future work using different types of Hurst estimation [10].

6. CONCLUSION AND FUTURE WORK

In this paper we introduced a novel method to collect large-scale vehicular network datasets using the always available online traffic webcams. These webcams are already deployed by governments, companies, or private parties, and hence they offer an inexpensive way to collect data. They provide 24-hour monitoring at the data collection points and have refresh intervals as short as seconds, which is very desirable for fine-grained data collection. We collected 7.5 TB of vehicular image data from more than 4,500 cameras distributed in 10 cities over 4 continents.
We believe this large amount of data will be very important for mobile network researchers to understand the dynamics of global cities, and a key step towards realistic modeling of vehicular communication networks. Our results strongly suggest revisiting the general assumption of an exponential pattern as the modeling distribution for vehicular traffic. Finally, the implications of long range dependence point to the effect of traffic on the infrastructure of road networks.

Figure 9: Traffic density at different time scales on the Sydney dataset. Panels (a)-(d) plot density against chronological time at resolutions of 1, 10, 100 and 1000 minutes.

Acknowledgment

We are thankful to Georgios Smaragdakis, Harold Chi Liu, Maria Gonzalez Garcia, Ranjan Pal and Shiva Sundaram for their insightful comments.

7. REFERENCES

[1] F. Bai and B. Krishnamachari. Spatio-temporal variations of vehicle traffic in VANETs: facts and implications. In Proceedings of the Sixth ACM International Workshop on VehiculAr InterNETworking, VANET '09, pages 43–52, New York, NY, USA, 2009. ACM.
[2] F. Bai, N. Sadagopan, and A. Helmy. IMPORTANT: A framework to systematically analyze the impact of mobility on performance of routing protocols for adhoc networks. In INFOCOM, 2003.
[3] L. Briesemeister, L. Schafers, and G. Hommel. Disseminating messages among highly mobile hosts based on inter-vehicle communication. In Intelligent Vehicles Symposium, 2000. IV 2000. Proceedings of the IEEE, pages 522–527, 2000.
[4] V. Bychkovsky, B. Hull, A. K. Miu, H. Balakrishnan, and S. Madden. A Measurement Study of Vehicular Internet Access Using In Situ Wi-Fi Networks. In 12th ACM MOBICOM Conf., Los Angeles, CA, September 2006.
[5] R. E. Chandler, R. Herman, and E. W. Montroll. Traffic dynamics: Studies in car following. Operations Research, 6(2):165–184, 1958.
[6] A. Elgammal, D. Harwood, and L. Davis. Non-parametric model for background subtraction. In D. Vernon, editor, Computer Vision ECCV 2000, volume 1843 of Lecture Notes in Computer Science, pages 751–767. Springer Berlin / Heidelberg, 2000.
[7] A. Halati, H. Lieu, and S. Walker. CORSIM-corridor traffic simulation model. In Proceedings of the Traffic Congestion and Traffic Safety in the 21st Century Conference, pages 570–576, 1997.
[8] P. Hui, R. Mortier, M. Piórkowski, T. Henderson, and J. Crowcroft. Planet-scale human mobility measurement. In Proceedings of the 2nd ACM International Workshop on Hot Topics in Planet-scale Measurement, HotPlanet '10, pages 1:1–1:5, New York, NY, USA, 2010. ACM.
[9] W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson. On the self-similar nature of ethernet traffic (extended version). IEEE/ACM Trans. Netw., 2:1–15, February 1994.
[10] R. Lienhart and J. Maydt. An extended set of haar-like features for rapid object detection. In Image Processing. 2002. Proceedings. 2002 International Conference on, volume 1, pages I–900 – I–903 vol.1, 2002.
[11] N. E. Lownes and R. B. Machemehl. Vissim: a multi-parameter sensitivity analysis. In Proceedings of the 38th Conference on Winter Simulation, WSC '06, pages 1406–1413. Winter Simulation Conference, 2006.
[12] B. Mandelbrot and J. W. Van Ness. Fractional Brownian Motions, Fractional Noises and Applications. SIAM Review, 10(4):422–437, 1968.
[13] Q. Meng and H. L. Khoo. Self-similar characteristics of vehicle arrival pattern on highways. Journal of Transportation Engineering, 135(11):864–872, 2009.
[14] J. Ott and D. Kutscher. Drive-thru internet: IEEE 802.11b for "automobile" users. In INFOCOM 2004. Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies, volume 1, pages 4 vol. (xxxv+2866), March 2004.
[15] M. Piccardi. Background subtraction techniques: a review. In Systems, Man and Cybernetics, 2004 IEEE International Conference on, volume 4, pages 3099–3104 vol.4, Oct. 2004.
[16] J. Singh, N. Bambos, B. Srinivasan, and D. Clawin. Wireless LAN performance under varied stress conditions in vehicular traffic scenarios. In Vehicular Technology Conference, 2002. Proceedings. VTC 2002-Fall. 2002 IEEE 56th, volume 2, pages 743–747 vol.2, 2002.
[17] C. Stauffer and W. Grimson. Adaptive background mixture models for real-time tracking. In Computer Vision and Pattern Recognition, 1999. IEEE Computer Society Conference on., volume 2, pages 2 vol. (xxiii+637+663), 1999.
[18] Z. Sun, G. Bebis, and R. Miller. On-road vehicle detection: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28:694–711, 2006.
[19] N. Wisitpongphan, F. Bai, P. Mudalige, V. Sadekar, and O. Tonguz. Routing in sparse vehicular ad hoc wireless networks. Selected Areas in Communications, IEEE Journal on, 25(8):1538–1556, Oct. 2007.
[20] J. Yeo, D. Kotz, and T. Henderson. CRAWDAD: a community resource for archiving wireless data at Dartmouth. SIGCOMM Comput. Commun. Rev., 36:21–22, April 2006.
[21] X. Zhang, J. Kurose, B. N. Levine, D. Towsley, and H. Zhang. Study of a bus-based disruption-tolerant network: mobility modeling and impact on routing. In Proceedings of the 13th Annual ACM International Conference on Mobile Computing and Networking, MobiCom '07, pages 195–206, New York, NY, USA, 2007. ACM.
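
As a supplementary illustration of the multi-scale aggregation in Section 5.2 (Fig. 9), the sketch below coarsens a per-minute series by factors of ten and derives a Hurst estimate with the aggregated-variance method. This is only one possible estimator and is not claimed to be the one used in this work. Averaging within blocks (rather than summing), the chosen scales, the synthetic input, and all function names are assumptions made for illustration.

```python
import math
import random


def aggregate(series, m):
    """Average non-overlapping blocks of m samples; a trailing partial block is dropped."""
    n_blocks = len(series) // m
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


def variance(xs):
    """Population variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def hurst_aggregated_variance(series, scales=(1, 10, 100, 1000)):
    """Estimate H from the least-squares slope of log Var(X^(m)) against log m.

    For a self-similar process Var(X^(m)) ~ m^(2H - 2), so H = 1 + slope / 2.
    """
    xs = [math.log(m) for m in scales]
    ys = [math.log(variance(aggregate(series, m))) for m in scales]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 1.0 + slope / 2.0


# Synthetic, uncorrelated per-minute series (not data from the paper).
rng = random.Random(1)
per_minute = [rng.random() for _ in range(100_000)]
h_iid = hurst_aggregated_variance(per_minute)
print(f"H estimate for uncorrelated noise: {h_iid:.2f}")
```

An uncorrelated series should give an estimate close to 0.5, whereas a long range dependent series gives a value between 0.5 and 1, because the variance of its block averages decays more slowly than 1/m.
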
