Benchmark Dataset for Timetable Optimization of Bus Routes in the City of New Delhi

Benchmark Dataset for T imetable Optimization of Bus Routes in the City of Ne w Delhi Anubhav Jain, A vdesh Kumar , Saumya Balodi, Pra vesh Biyani IIIT Delhi, India { anubhav15129, avdesh15135, saumya15172, praveshb } @iiitd.ac.in Abstract —Public transport is one of the major f orms of transportation in the world. This makes it vital to ensure that public transport is efﬁcient. This resear ch presents a novel real-time GPS bus transit data for over 500 routes of buses operating in New Delhi. The data can be used for modeling various timetable optimization tasks as well as in other domains such as trafﬁc management, travel time estimation, etc. The paper also presents an approach to reduce the waiting time of Delhi buses by analyzing the trafﬁc beha vior and pr oposing a timetable. This algorithm serv es as a benchmark f or the dataset. The algorithm uses a constrained clustering algorithm for classiﬁcation of trips. It further analyses the data statistically to provide a timetable which is efﬁcient in learning the inter - and intra-month variations. Index T erms —Timetable Optimization, Bus Scheduling, Data Analytics, Bus T ransit Dataset I . I N T RO D U C T I O N Public transport is one of the most popular means of transportation in v arious metropolitan cities across the globe. According to the Economic Surve y of Delhi 2005-06, buses account for nearly 60% of the total demand. While it is well understood that the public transport helps in combating air pollution and congestion caused due to single-occupancy vehicles, the usage of b uses in Delhi and other cities in India has seen a nominal decline while the overall trav el demand has simultaneously increased. One of the main reasons for this decline in the city of Delhi (and other cities in India) is the lack of reliability of the bus routes. The timetable is often not made by the transit authorities. Moreov er , it is often outdated soon due to the rapid change in infrastructure and the trafﬁc conditions resulting in degradation in the reliability of buses. Finally , this decrease in reliability leads to unknown waiting times at the b us stops for the passengers. Due to the absence of a timetable, most bus trips are operated in an ad-hoc fashion making it extremely difﬁcult for the passengers to trust the public transport network leading to a decrease in passenger trips. Interestingly , the various trips in a given bus-route still follow a certain pattern in a gi ven day thanks to the pattern in the trafﬁc conditions throughout the day . In other words, ev en when the transit operators do not follow an “e xplicit” timetable, there is an “implicit” timetable that is followed which is not completely random. The main goal of this work is to unearth this pattern and dev elop an operational timetable of the bus routes in the city of Delhi. The efﬁcac y of this “suggested” timetable is measured in terms of the average waiting time a passenger has to endure at various stops in the giv en route assuming she follows the timetable. This waiting time should ideally be lower than when the passenger does not follow the suggested timetable arri ves at the same stop in a random fashion. The paper presents a nov el database containing real-time data for ov er 500 routes of buses operating in Delhi. T o arriv e at the timetable for benchmarking the dataset, the paper samples two routes operated by the Delhi transport corporation. Out of the two routes, one is frequent, while the other operates at an average frequency of thirty minutes. The paper uses the collected GPS feed of the buses. The paper presents an approach which simpliﬁes the prob- lem statement to propose an efﬁcient algorithm for deﬁning the timetable based on GPS bus data collected o ver a period of two months. I I . R E L A T E D W O R K Researchers hav e been exploring the problem of reducing waiting time as well as reducing b unching in b uses. Patnaik et al. [1] applied data mining on data collected using Automatic Passenger Counters. The paper uses clustering to divide data points into different headways based on a decision tree. The method informs whether the existing headway would w ork for the buses. W ang et al. [2] used a sequential clustering method to ensure a ﬁxed order for buses so that time division could be properly applied. They calculated the travel time by incorporating bus dwell time at stops which depends on the number of riders and also the road and intersection time depending on the trafﬁc on the road at that time. Kornfeld et al. [3] proposed an approach to minimize waiting time using two models which are based on the assumption of how people arriv e at the bus stop. Y ang et al. [4] proposed a method to optimize the timetable for the subway system by proposing an integer programming model to maximize the o verlap time with the headway time. W ihartiko et al. [5] also proposed an integer programming model which uses a modiﬁed generic algorithm for timetabling of buses. Chakroborty et al. [6] and Deb et al. [7], a mixed integer nonlinear programming model was proposed for timetabling of buses where the objecti ve was to minimize the total waiting time of passengers. Saarguna wathy et al. [8] studied Open trafﬁc platform which analysis trafﬁc data linked to open street map. They used it to analyze the traf ﬁc conditions in K uala Lumpur . They listed dif ferent roads and junctions with hea vy trafﬁc ﬂow at different times. Sunil et al. [9] proposed a dynamic GPS based time-tabling algorithm for public transport that can predict the waiting time considering the real-time location of the user and the bus, and thus predicting estimated time of arri val. The major issue with current research in the domain of timetable optimization for buses is that there is not a standard dataset which is being used by researchers. V arious algorithms are applied on various datasets, which does not provide conclusiv e e vidence of improvement in the state of the art. The proposed algorithm used for benchmarking the pro- posed dataset goes be yond the ones in literature in two ways. Firstly , literature does not take the simplicity of the algorithm into consideration. This is vital primarily as the IT Department of most transportation corporations are not educated enough to work with complex algorithms. The proposed algorithm closely re volves around the traf ﬁc behavior to provide the most optimal timetable. This algorithm is being deployed to update the timetable of over a hundred buses in New Delhi, India which would affect the li ves of millions of daily bus users. Fig. 1. Route for bus no. 425 I I I . P R O P O SE D D A TAB A S E The proposed bus transit database 1 contains (i) Static and (ii) Dynamic data for b uses operating in Delhi. The dynamic data for three months has been stored separately which can be used for applications which require pre vious data. This is the ﬁrst of its kind database which provides access to real-time transit data. This dataset would provide means for standardization of timetable optimization algorithms which was not possible earlier as researchers were using classiﬁed datasets which weren’t av ailable publicly . This information can easily be used for building real-time applications which span from building timetable optimization algorithms, which 1 https://bus-data.github .io/ Fig. 2. Route for bus no. 534 Seconds Probability Density Fig. 3. Probability distrib ution of the arriv al time at stop 17 for route no. 425 at different times of the day . has been benchmarked in this paper along with v arious other applications such as li ve GPS tracked monitoring of trafﬁc, tracking anomalies in live traf ﬁc data for security purposes, trav el time estimation and optimization. As the liv e GPS data is sampled at e very 10s for all the routes, it provides substantial information about different roads on in the city and the expected tra vel speed possible on those routes. For creating the database, the paper has pre-processed the data using static information about the bus stops for that particular route. Through the initial step of pre-processing, the data was di vided into rele vant b us stops which were considered to be nodes. T o do so, ev ery raw data point which lied within a 50-meter radius of the coordinate of a bus stop was mapped to the stop. The 50 meter threshold is chosen keeping in mind situations where the bus tra veling at a high speed might not store a data point close to the bus stop. By doing so, the raw data was narrowed down to the stop based data where all the nodes were the stops. This brought uniformity for further processing of the data. The bus routes are represented in ﬁgure 1 and ﬁgure 2 for routes 425 and 534 respecti vely . The route from the ﬁrst stop to the last is considered as the up route, and the opposite direction is referred to as the down route. A. Dataset Statistics The dataset captures the following two categories, men- tioned belo w . 1) Dynamic Data: The data for all 543 routes and 3464 stops are provided real-time. The data was in the form of Protocol Buffers and consists of information about all the buses operating at that time. The data provided has a frequency of 10s, where each data packet contains the date, time, latitude and longitude values for each b us route along with bus-speciﬁc details like number plate, route number, direction of route (up/ down). This can easily be accessed for real-time applications and can be stored and used as statistical data. The data tracks all the buses throughout the time they are active. 2) Static Data: The static data pro vides information about the routes, stops, trips and stops times for all the buses which are being modeled dynamically . The information contains speciﬁc information to use dynamic data and extract the required information. The dynamic data is coded using the information provided in the static data. This compresses the dynamic data packets and makes it more efﬁcient for real-time applications. T o use this data for optimizing timetable, the dynamic data has been stored as for ov er three months in a database which can be directly used for the application of timetable optimization algorithms. The data has been decoded and is provided in a directly usable format. This has been done for routes 425 and 534. These datasets hav e also been benchmarks using the proposed approach. I V . P R O P O S E D A P P R OA C H Current b us timetables are formulated without considering the trafﬁc behavior based on time of the day and the day of the week. It has been observed from the data that these are signiﬁcant components which af fect the arriv al time of the bus at a particular stop which needs to be carefully scrutinized to provide a more ef ﬁcient timetable. The paper provides benchmarks for two routes- 425 and 534. These buses were speciﬁcally chosen as they differ quite signiﬁcantly in their frequency and reliability . T o model an efﬁcient timetable, the paper looks at these tw o buses where 534 is extremely frequent and has an a verage waiting time of around 5 minutes while buses on route 425 are more unreliable and ha ve an average waiting time close to 22 minutes. The v ariables are deﬁned as: • T otal bus stops N are present on the route. • The data has been taken for d days. • i represents the trip variable, ∈ (0 , x k ) . • j denotes the stop variable, ∈ (0 , N ) . • k represents the day v ariable ∈ (0 , d ) . • x k represents the number of trips on the k th day . • t k i,j represents the arri val time of the bus at the j th stop for the i th trip on the k th day . T o reduce the effect of noise in the data, the algorithm samples the data at every 3 rd stop. The algorithm can be summarized in two stages, deﬁnition of the starting time at the ﬁrst stop and calculation of arriv al time for the follo wing stops based on the ﬁrst stop. The optimization problem ﬁnds the most optimal b t i,j such that, b t i,j = min V ar [ t k i,j ] (1) A. Deﬁning the starting time As the timetable depends on the departure time of the bus from the ﬁrst stop, the algorithm uses K-means clustering [10] for this step. Unlike the standard clustering algorithm, which minimizes the inter-class v ariance, this approach minimizes the variances while keeping a minimum distance between the clusters. The minimum distance between two clusters is equiv alent to the standard frequency of the b uses as suggested by the Transportation Department. Each of the set of data points in each cluster created using this approach is represented by C ( n ) at iteration n which contains M ( n ) data points. The centroid of the l th set of cluster is denoted by µ ( n ) l at iteration n and the total number of clusters are c . Each of these cluster sets can be mathematically represented using equation 2, where the new data point being classiﬁed is tp . C ( n ) l = { tp : || tp − µ ( n ) l || 2 < || tp − µ ( n ) m || 2 , ∀ m ; 1 < m < c } (2) The deﬁnition of new clusters is conditioned on equation 3, where T 1 is the frequency of the bus. {|| µ ( n ) l − µ ( n ) m || > T 1 , ∀ l , m ; 1 < l, m < c } (3) The mean is updated using the equation 4. µ ( n +1) l = P t k, ( n ) i,j ∈ C ( n ) l t ( n ) ,k i,j | C ( n ) l | = µ ( n ) l ∗ M ( n ) + tp ( n +1) M ( n ) + 1 (4) In the ﬁrst step, the av erage departure time for a bus is calculated. While considering each new bus, the nearest mean was searched, and if the difference between the departure time and the nearest mean is less than the threshold T 1 , then the mean of that cluster was updated by including this data point as well. Otherwise, this would be considered as a new cluster . Finally , only if the total number of points in a cluster is greater than a threshold T 2 , the starting time is considered valid. Otherwise, the data point is considered to be an outlier case and is not considered for further calculations. Due to noise in the data and inconsistency in the starting time of the ﬁrst bus, the paper grid searches for the best threshold to ﬁnd the appropriate starting time which reduces the waiting time as well as the b unching of buses. The threshold T 2 is taken as ten days which is one-third of the total number of days for which the data is used. B. Adjacent Stops For ev ery subsequent bus stop, the av erage time to reach that bus stop is calculated from the ﬁrst node for ev ery 15- minute interval. The timetable for these nodes is deﬁned as the start time plus the av erage time taken to reach that stop (in the particular time range). The optimal time for ev ery adjacent stop can be written as, b t i,j = b t i, 1 + P d k =1 ( t k i,j − b t i, 1 ) d ∀ j ∈ (2 , N ) (5) Figure 3 shows that the arriv al time of the b us at each stop is in the form of a Gaussian distribution which has higher variance as the day progresses. By taking the mean in the abov e approach, the algorithm minimizes the waiting time, which is deﬁned as the v ariance of this distrib ution thus giving the most optimal timetable. The data is ﬁrst divided into 15-minute time slots for more accurate calculation and estimation of trafﬁc behavior . This dynamic time-based timetable takes into consideration the regular changes which come in the trafﬁc behavior e very 15 minutes. C. Calculating W aiting T ime The pre-timetable waiting time (preWT) is deﬁned as the expected time a person will w ait if he/she arriv es at a b us stop at any random time. If the ne xt bus arriv es after N 0 minutes from the pre vious bus then the expected waiting time for the next bus at that stop will be: pr eW T = N 0 X n =1 n/ N 0 = ( N 0 + 1) / 2 (6) The algorithm av erages this over all the trips for a bus route to get the mean waiting time for different stops. For calculating the post-timetable waiting time, equation 7 has been used. Where the mentioned instantaneous post- timetable waiting time (ipostWT(i,j,k)) has been averaged o ver all the days/ trips for various experiments. ipostW T ( i, j, k ) = min x ( t k i,j − b t x,j ) s.t. t k i,j > b t x,j (7) V . R E S U LT S The paper proposes an approach to create a timetable for the two b uses using a relatively direct procedure which resulted in a signiﬁcant decrease in the waiting time which has been applied to the tw o bus routes - 425 and 534. It has been applied separately to the up and down routes of the buses. The following experiments have been performed to show the ef ﬁciency of the algorithm. Bus Route Number Waiting Time (Minutes) 0 5 10 15 20 25 425-Up 425-Down 534-Up 534-Down Pre-Timetable Waiting Time Post-Timetable Waiting Time Fig. 4. Comparison of W aiting Time at the 1 st Node for all the Buses. Time Waiting Time (Minutes) 0 10 20 30 4-8 8-12 12-16 16-20 20-24 Pre-Timetable Waiting Time Post_Timetable Waiting Time Fig. 5. Comparison of waiting time for b us no. 425 Do wn Route before and after the timetable when training and testing on alternate days. Time Waiting Time (Minutes) 0 10 20 30 4-8 8-12 12-16 16-20 20-24 Pre-Timetable Waiting Time Post-Timetable Waiting Time Fig. 6. Comparison of waiting time for b us no. 425 Up Route before and after the timetable when training and testing on alternate days. • The ﬁrst experiment shows the performance of using K- Time Waiting Time (Minutes) 0 2 4 6 8 4-8 8-12 12-16 16-20 20-24 Pre-Timetable Waiting Time Post-Timetable Waiting Time Fig. 7. Comparison of waiting time for bus no. 534 down route when training and testing on alternate days. Time Waiting Time (Minutes) 0 2 4 6 8 4-8 8-12 12-16 16-20 Pre-TimeTable Waiting Time Post-TimeTable Waiting Time Fig. 8. Comparison of waiting time for bus no. 534 up route when training and testing on alternate days. Time Waiting Time (Minutes) 0 5 10 15 20 25 4-8 8-12 12-16 16-20 20-24 Pre-Timetable Waiting Time Post-Timetable Waiting Time Fig. 9. Comparison of waiting time for b us no. 425 Do wn Route before and after the timetable when testing on second month data. Time Waiting Time (Minutes) 0 10 20 30 4-8 8-12 12-16 16-20 20-24 Pre-Timetable Waiting Time Post-Timetable Waiting Time Fig. 10. Comparison of waiting time for bus no. 425 Up Route before and after the timetable when testing on second month data. Time Waiting Time (Minutes) 0 2 4 6 8 4-8 8-12 12-16 16-20 20-24 Pre-Timetable Waiting Time Post-Timetable Waiting Time Fig. 11. Comparison of waiting time for bus no. 534 down route when testing on second month data. means clustering for classiﬁcation of the data into trips. This experiment is speciﬁc to the ﬁrst node, i.e., the starting point of the bus. The algorithm has been trained on the ﬁrst-month data, and it has been tested on second- month data. • T o judge the model for the formation of the timetable of the remaining stops as well as learning intra-month variations, the model was trained and tested on alternate days of the ﬁrst and second month. The data was trained on all the odd days of the two months, and it was tested on e ven days. • For learning inter-month variations in a data which already contains high randomness, the algorithm was trained on ﬁrst-month data and tested on the second- month data. The results have been presented in the form of bar graphs showcasing the difference brought by the introduction of the proposed timetable. The impact of clustering for the initial node, by following the ﬁrst protocol, has been presented in ﬁgure 4. This sho ws that there a signiﬁcant amount of randomness in the starting time of these buses, as the current set timetable is not follo wed. The proposed approach has Time Waiting Time (Minutes) 0 2 4 6 8 10 4-8 8-12 12-16 16-20 Pre-Timetable Waiting Time Post-Timetable Waiting Time Fig. 12. Comparison of waiting time for bus no. 534 up route when testing on second month data. been able to reduce this randomness of the starting time. If this is strictly follo wed it w ould yield much more promising results for the subsequent stops as well. As the resultant increase in waiting time of the following bus stops is the addition of the randomness in the 1 st stop along with the randomness in the trafﬁc behavior . The ﬁrst can be controlled by proper implementation of the timetable by public/ pri vate transportation departments. For the second experiment, the paper looks at the formation of a timetable by looking at the inter-month variations in the data. In doing so, the model sho ws its ef ﬁciency in learning the randomness which is present within the constraints of the same month. Figure 7 shows a comparison of the waiting time without an y timetable with the waiting time post the new timetable for b us no. 534. Figure 8 sho ws the graph for the the up-route while ﬁgure 7 sho ws the graph for the do wn-route. Both the graphs compare the waiting time at different time intervals. Lastly , the model has been tested on a completely unseen data after being trained on the ﬁrst-month data. The results upon being tested on the second-month data have been pre- sented in ﬁgure 9 and ﬁgure 10 for the up and down route of bus number 425 respectively . The impact of a timetable is more in the case of bus route 425 due to higher irre gularity . Due to the high frequency of b us number 534 introducing large changes is not possible. During the calculation of ﬁnal waiting time, the randomness in the starting time due to human constraints hav e been taken into consideration. As the ﬁnal waiting time is calculated using the same probabilistic distrib ution of the starting stop. Thus, showing that e ven if the timetable is not followed closely , it would still impact the waiting time considerably . These situations make this essential for real-world applications where human constraints are signiﬁcant. V I . C O N C L U S I O N A N D F U T U R E W O R K The paper proposes a nov el and ﬁrst of its kind freely av ailable benchmark dataset for analyzing real-time b us transit. The data captures real-time v ariations in traf ﬁc, and moreo ver , it can be used for standardization of timetable optimization algorithms, which are currently applied on a variety of datasets which aren’ t publicly av ailable. For benchmarking the dataset, the paper presents an approach to propose a timetable taking into consideration trafﬁc variations across time. The paper shows the efﬁcienc y of this algorithm in reducing the waiting time of passengers on Delhi bus routes. The algorithm is applied to the Delhi T ransportation Corporation buses number 425 and 534. While this approach looks at the efﬁciency with Delhi based buses, it w ould be interesting to see the implementation of buses from different cities. The paper assumes that the trafﬁc behavior does not v ary to a large extent across days. This is an- other factor which can be added while deﬁning the timetable. Another exciting area of research would be incorporating a limit on the number of passengers. The timetable can be established to keep a check on such factors which inﬂuence passenger satisfaction. R E F E R E N C E S [1] J. Patnaik, S. Chien, and A. Bladikas, “Estimation of bus arriv al times using APC data, ” Journal of public transportation , vol. 7, no. 1, p. 1, 2004. [2] J. W ang and Y . Cao, “Operating time division for a bus route based on the recovery of GPS data, ” Journal of Sensors , vol. 2017, 2017. [3] S. K ornfeld, W . Ma, and A. Resnikoff, “Optimizing bus schedules to minimize waiting time, ” Operations Resear ch II , pp. 21–392, 2014. [4] X. Y ang, X. Li, Z. Gao, H. W ang, and T . T ang, “ A cooperative scheduling model for timetable optimization in subway systems, ” IEEE T ransactions on Intelligent T ransportation Systems , vol. 14, pp. 438– 447, March 2013. [5] F . Wihartik o, A. Buono, and B. Silalahi, “Integer programming model for optimizing bus timetable using genetic algorithm, ” IOP Conference Series: Materials Science and Engineering , vol. 166, no. 1, p. 012016, 2017. [6] P . Chakroborty , K. Deb, and B. Sriniv as, “Network wide optimal scheduling of transit systems using genetic algorithms, ” Computer Aided Civil and Infrastructur e Engineering , vol. 13, pp. 363 – 376, 12 2002. [7] K. Deb and P . Chakroborty , “Time scheduling of transit systems with transfer considerations using genetic algorithms, ” Evolutionary compu- tation , vol. 6, pp. 1–24, 02 1998. [8] S. Manogaran, M. Ali, K. M. Y usof, and R. Suhaili, “ Analysis of vehicular trafﬁc ﬂo w in the major areas of Kuala Lumpur utilizing open-trafﬁc, ” in AIP Conference Pr oceedings , vol. 1883, p. 020013, AIP Publishing, 2017. [9] N. Gunjal Sunil, V . Joshi Ajinkya, C. Gosavi Swapnil, and B. Kshir- sagar Vyanktesh, “Dynamic bus timetable using GPS, ” International Journal of Advanced Researc h in Computer Engineering & T echnology (IJ ARCET) V ol , vol. 3, no. 3, pp. 775–778, 2014. [10] J. A. Hartigan and M. A. W ong, “ Algorithm as 136: A k-means clustering algorithm, ” J ournal of the Royal Statistical Society . Series C (Applied Statistics) , v ol. 28, no. 1, pp. 100–108, 1979.

Benchmark Dataset for Timetable Optimization of Bus Routes in the City of New Delhi

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment