Crowds, Bluetooth, and Rock-n-Roll. Understanding Music Festival Participant Behavior
In this paper we present a study of sensing and analyzing an offline social network of participants at a large-scale music festival (8 days, 130,000+ participants). We place 33 fixed-location Bluetooth scanners in strategic spots around the festival …
Authors: Jakob Eg Larsen, Piotr Sapiezynski, Arkadiusz Stopczynski
Crowds, Bluetooth and Rock'n'Roll: Understanding Music Festival Participant Behavior

Jakob Eg Larsen (jaeg@dtu.dk)
Piotr Sapiezynski (pisa@dtu.dk)
Arkadiusz Stopczynski (arks@dtu.dk)
Morten Mørup (mm@imm.dtu.dk)
Rasmus Theodorsen (ras.the@gmail.com)
Technical University of Denmark

Figure 1. Unique Bluetooth devices observed throughout the 8-day festival by 33 proximity-based scanners, with color intensity corresponding to the number of observations in one-hour time windows. The scanners are grouped by stage; the scanners at the main stages were deployed on day 4.

ABSTRACT
In this paper we present a study of sensing and analyzing an offline social network of participants at a large-scale music festival (8 days and 130,000+ participants). Spatio-temporal traces of participant mobility and interactions were collected from 33 Bluetooth scanners placed at strategic locations in the festival area to discover Bluetooth-enabled mobile phones carried by the participants. We analyze the data on two levels. On the micro level, we use a community detection algorithm to reveal a variety of groups formed by the participants. On the macro level, we employ an Infinite Relational Model (IRM) to recover the structure of the network related to participants' music preferences. The obtained structure, in the form of clusters of concerts and participants, is then interpreted using meta-information about music genres, band origins, stages, and dates of the performances. We show that the concert clusters can be described by one or more of the meta-features, effectively revealing the preferences of participants. Finally, we discuss the possibility of employing the described methods and techniques for creating user-oriented applications and of extending the sensing capabilities during large-scale events by introducing user involvement.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

INTRODUCTION
Mobile phones have become increasingly ubiquitous and an integral part of our everyday lives in recent years. This has opened a number of new possibilities for studying human mobility, behavior, and interactions, as mobile phones can now be used to track people's activity. The area has recently received increased attention through studies of mobility based on large phone data sets [25, 7] or on sensor data collected with modern smartphones [6, 15]. These studies have reported insights into fundamental human mobility patterns, with results indicating very high levels of predictability.

In this paper we present a study of the mobility, group formation, and music preferences of participants at a large music festival in Denmark attended by more than one hundred thousand people, using Bluetooth probing to discover mobile phones carried by festival participants around the festival area.

The use of Bluetooth technology as a way to gain insights into human behavior and mobility has also received increased attention recently [29]. Bluetooth technology has been applied in several different domains, using different schemes. In a study of mobility by Hui et al. [12], participants were given a small active Bluetooth device to carry throughout a conference in order to map participant mobility and the events they attended.
Most commonly, Bluetooth scanners have been placed at fixed locations to probe for discoverable Bluetooth devices in proximity, which is also the approach taken in this paper. This method has been used for different applications, including estimating queue length, expressed as waiting time, in airport security areas [3, 8]. Large-scale studies of mobility by means of Bluetooth probing have also included tracking vehicles to study traffic patterns [10] and monitoring large-scale race events [26]. In a related study, O'Neill, Kostakos et al. [22, 19] concentrate on the mobility and interactions of participants with regard to the semantic meaning of the locations where the Bluetooth scanners were deployed. They show profoundly different patterns of presence in places with different social functions, for example a busy street versus a bar. Unfortunately, even though they deployed more than 90 scanners, they refer to only four categories of locations: a street, the university entrance, an office, and a bar. It is not clear how much insight they gained into the social structure of other locations.

In the context of human mobility in festival settings, a study by Versichele et al. [30] also applies proximity-based Bluetooth tracking to study mobility patterns. Their study used 22 scanners over a duration of 10 days with 1.5 million participants. However, the general trend in their study is that participants visit the festival only briefly (typically for one day), whereas the participants in this study are present in the festival area for up to 8 days and choose among 160 music concerts and multiple other events for the duration of the festival.
Whereas existing studies applying Bluetooth probing have focused on describing mobility patterns, this study involves a richer semantic context, with information about concerts, music, genres, scenes, events, and participants, allowing a more detailed contextual analysis of participant behavior and mobility.

More recently, mobile sensor frameworks have been made available [1, 18], enabling the collection of richer data sets capturing human behavior and mobility, as well as data for mapping social interaction through multiple channels. An advantage of having a mobile sensor framework on the smartphone is the potential to combine data from multiple sensors to obtain finer-grained information and more robust estimates. For instance, data from sensors such as GPS, WiFi, GSM, and the accelerometer can be combined to build a location estimator that works in different contexts (outdoors and inside buildings) with higher accuracy than any single one of these sensors can provide [21]. However, a challenge in these studies is deployment, which involves a mobile application running on participants' devices. These studies have therefore typically been carried out on smaller, selected populations, but often over longer periods of time. As a result, the observational conditions, and especially the population sampling, may introduce unknown biases. Although a mobile client (smartphone) may yield very rich data sets, this methodology faces a different set of deployment challenges at the festival, including supporting multiple clients and requiring participants to actively install an application containing mobile sensing components. In this study the event lasts only 8 days, but the Bluetooth probing technique gives us access to a larger population.
In the following sections we describe the methodology, limitations, and challenges of data collection using a Bluetooth scanning system in an environment with limited and short-lived technical infrastructure. Next, we present the data acquired during the 8 days of the festival and discuss the results of the Bluetooth discovery process. The paper concludes with a discussion of potential applications and of the insights that can be obtained from studying the spatio-temporal data acquired through Bluetooth probing.

ROSKILDE FESTIVAL
Roskilde Festival is one of the six biggest annual music festivals in Europe and is held south of Roskilde in Denmark. It started in 1971 and since 2009 has attracted more than 100,000 participants annually (up to 30% of them volunteers). In 2011 it gathered an estimated 130,000+ people. The festival lasts 8 full days, starting on Saturday evening and finishing on Sunday at midnight. For the first 4 days only the camping grounds and a small festival zone are open, including a single stage (Pavilion Junior) featuring up-and-coming Nordic bands. On Thursday afternoon the main grounds open, and the major music events start and last for the next 4 days. The main festival grounds cover about 0.2 km² with 6 stages of various sizes. The festival campsite, located south of the main festival grounds, covers nearly 1 km². In addition to the stages, the grounds include cultural zones, shops, restaurants, artistic installations, etc. Participants can move freely through the grounds during the daytime; once the concerts have finished for the day, the main grounds are closed, and they reopen in the morning hours of the next day. Some areas of the main grounds, such as backstage areas or technical areas behind merchandise passages, are off limits to participants.
In 2011 the participants consisted of 77,500 festival guests, around 3,000 press representatives, 3,000 artists, 30,000 volunteers, and 20,000 one-day guests, plus an unknown number of guests over 60 and under 10 years old; we estimate that at least 130,000 people were present at the festival during the 8 days in total. 54% of the population were women, and approximately 22% of the audience visited the festival for the first time. The average age was 23 years, and a typical participant was a student living in a Scandinavian city. 80% of the participants came from Denmark, 8% from Norway, 4% from Sweden, and 8% from other countries¹.

The six stages host concerts of different sizes and genres:
• Orange stage, capacity 60,000+, all genres
• Arena stage, capacity 17,000, all genres
• Cosmopol stage, capacity 6,000, hip-hop, electronica, urban world music
• Odeon stage, capacity 5,000, mixed, mostly rock
• Pavilion stage, capacity 2,000, mixed, mostly rock
• Gloria stage, capacity 1,000, mixed, experimental

¹ Source: http://roskilde-festival.dk/

METHODOLOGY
Our study of human mobility in the festival setting relies on discovering Bluetooth-enabled devices operating in discoverable mode. Because Bluetooth is a short-range, low-power protocol for implementing Wireless Personal Area Networks (WPAN), the range within which Bluetooth-enabled devices can be discovered is limited. It operates in the 2.4 GHz Industrial, Scientific and Medical (ISM) frequency band [23]. Communication always happens in master-slave mode, and a connection between new devices is established by a master device sending inquiry packets to discover nearby devices that are in the inquiry scan substate (discoverable). Discoverability of a device commonly needs to be set manually by the user and can be either limited in time or unlimited.
It is worth noting that, for instance, Android-based smartphones (until recent versions) allow only time-limited discoverability, while iOS devices (iPhone, iPod, etc.) and Windows Phone smartphones are discoverable only while the user is interacting with the Bluetooth menu. While this significantly limits the number of phones we can potentially discover, we show that there are still many discoverable devices.

In the present study, the Bluetooth scanners functioned as master devices, broadcasting inquiry messages (scanning) continuously. Responses from devices in proximity were silently logged, without any active participation on the user side. This is similar to the approach described in [11], where tracking individuals in a non-invasive way is considered more suitable for large-scale studies. The received signal strength indication (RSSI) of the responses was not registered. Although it is technically possible to use RSSI to calculate the position of a discovered device through multilateration [2, 16], the accuracy of this approach varies depending on the environment. Moreover, due to the limited range of Bluetooth, we considered the position accuracy obtained from a single scanner (i.e. a radius of around 10 meters for class 2 Bluetooth devices) sufficient.

Bluetooth Scanner Device
Off-the-shelf Nokia N900 smartphones were used as Bluetooth scanners, with custom software built for detecting Bluetooth-enabled devices in proximity. Off-the-shelf hardware was a relatively simple solution, providing 3G communication (necessary for obtaining results in real time from the large festival area), data storage, battery power (to bridge short power outages), GPS for tracking the device in case it was lost, and finally a Bluetooth module. The data from the scans was stored in a local SQLite database on the device and additionally uploaded to a server, depending on network availability.
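The on-device storage described above can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch, not the scanner software used in the study: the table layout, the `uploaded` flag, and the `send` callback are hypothetical, chosen only to show how scan results can be kept locally until an upload succeeds.

```python
import sqlite3
import time

# Hypothetical schema: one row per discovered device per scan, plus a flag
# recording whether the row has already been uploaded to the server.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts REAL NOT NULL,
    scanner_id INTEGER NOT NULL,
    mac TEXT NOT NULL,
    uploaded INTEGER NOT NULL DEFAULT 0
)
"""


def open_store(path=":memory:"):
    """Open (or create) the local event database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_scan(conn, scanner_id, macs, ts=None):
    """Store every device found in one inquiry scan as a separate event."""
    ts = time.time() if ts is None else ts
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO events (ts, scanner_id, mac) VALUES (?, ?, ?)",
            [(ts, scanner_id, mac) for mac in macs],
        )


def upload_pending(conn, send, batch_size=500):
    """Try to upload not-yet-sent rows; keep them locally if `send` fails.

    `send` is a hypothetical callback that returns True when the server
    acknowledged the batch (e.g. False while there is no 3G connection).
    """
    rows = conn.execute(
        "SELECT id, ts, scanner_id, mac FROM events "
        "WHERE uploaded = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    if not rows or not send(rows):
        return 0
    with conn:
        conn.executemany(
            "UPDATE events SET uploaded = 1 WHERE id = ?",
            [(row[0],) for row in rows],
        )
    return len(rows)
```

Because rows are marked as uploaded only after an acknowledged send, data gathered while the network is unavailable stays on the device and is sent on a later attempt.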
Scanner and uploader applications ran on the smartphone, and additional background processes restarted them when required, to ensure the highest possible availability and robustness of the system. A scan for discoverable devices typically takes about 30 seconds, so scanning as frequently as possible results in approximately two scans per minute. Devices that did not upload data to the server for a prolonged period of time were rebooted, either by issuing a command via Bluetooth or by manually turning them off and on. To reduce the need for such interventions, the software enforced a periodic reboot every 24 hours.

The collected data is a time series of events. Each event is described by the time, the scanner ID, and the Bluetooth MAC address of a discovered device. This information does not enable us to link the device to a person (for example by name or personal identification). Thus, the Danish Data Inspectorate considered the information handled in this project to be non-sensitive information about the participants, enabling the observations to be made without special permissions or informed consent from the participants. To ensure that not even the detected devices were identifiable afterwards, the MAC addresses were hashed after extracting information about the vendor. The human-readable identifiers (Bluetooth friendly names) of the devices were not retrieved, in order to reduce scanning time and to ensure the anonymity of the participants.

OBSERVATIONAL STUDY
The data was captured by 33 Bluetooth scanners placed in strategic positions around the festival site, as shown in Figure 2. The scanners were placed in the vicinity of the stages, as those were the most interesting, semantically rich spots.
However, since the availability of power sources was crucial in choosing the exact locations, and the infrastructure at the festival is only temporary, the scanners were mainly located in shops, beer booths (close to the counters), and the mixing areas of the stages. These locations provided sufficient coverage of the relevant areas to discover patterns in participants' mobility.

Figure 2. Map of the Roskilde Festival inner area, indicating the locations of the Bluetooth scanners. The orange areas indicate the audience areas of the respective stages.

The Bluetooth scanning data was uploaded in real time via the 3G network for real-time processing, but uploads were of course subject to network availability. Problems with the mobile network connections occurred due to the high number of mobile phones in a relatively small physical area, especially during large concerts. Therefore, for some scanners the collected data was uploaded once a connection became available (typically in the early morning hours). 7 of the 33 devices ran without manual intervention for the whole period of the festival, but the rest had to be maintained one or more times during the festival. For instance, if the power had been switched off for more than about 7 hours (typically in the early morning), the devices had to be turned on manually.

The range of Bluetooth is limited to about 10 meters for the transmitters used in most mobile phones (class 2). This makes it possible to pinpoint the location of the observed devices, but makes it a challenge to collect representative data over a large area, as the area will be only partly covered. The devices observed by a scanner could belong to a person only passing by; on the other hand, a person staying just outside the coverage radius, even for a whole concert, might not be discovered.

Figure 3. Nokia N900 Bluetooth scanner in a protective box attached under a beer booth counter.

DATA COLLECTION AND ANALYSIS
The deployed Bluetooth scanners collected 1,204,725 observations during the 8 days of festival activities. These included a total of 8,534 unique devices, meaning an average of 141 observations per device during the festival. Overall, this corresponds to approximately 6.5% of the people at the festival, providing a window into festival participant behavior, mobility, and interactions. Table 1 provides an overview of the observations from the 33 scanners used in the study. As can be seen from the table, scanners 9, 10, 23, 24, and 25, all located around the largest stage, where most participants would be expected to be seen, each discovered more than 4,000 unique devices. An overview of the unique devices observed throughout the 8 days of the festival is shown in Figure 1 on the first page.

Beyond serving as a unique identifier of the device, the MAC address is structured so that the vendor of the device can be determined from the first three octets (24 bits) of the address, formally known as the Organizationally Unique Identifier (OUI) [13]. The list of assigned OUIs is managed by the IEEE, designated by the ISO Council to act as the registration authority. Some identifiers found in the devices may not correspond directly to the end-product manufacturers, as they may be registered to a subcontractor. In total, around 70 unique vendors were discovered; however, the 7 largest vendors account for 96% of all unique devices and 99% of all observations, as shown in Figure 4.
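The vendor extraction and anonymization described above can be illustrated with a short Python sketch. The text does not say which hash function was used; this sketch assumes a keyed HMAC-SHA256, because an unkeyed hash of a 48-bit MAC address whose 24-bit OUI is known could be reversed by brute-force enumeration. The `vendor_of` table (OUI to vendor name) is a hypothetical stand-in for the IEEE registry, and `top_vendor_share` shows how shares such as the 96% and 99% figures can be computed from (OUI, hashed device) events.

```python
import hashlib
import hmac
from collections import Counter


def normalize(mac):
    """Canonical 'AA:BB:CC:DD:EE:FF' form of a MAC address."""
    octets = mac.replace("-", ":").upper().split(":")
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(octets)


def oui(mac):
    """First three octets (24 bits) of a MAC address, e.g. '00:1A:2B'."""
    return normalize(mac)[:8]


def anonymize(mac, key):
    """Return (OUI, keyed hash of the full MAC); the OUI is kept for vendor stats."""
    canonical = normalize(mac)
    digest = hmac.new(key, canonical.encode("ascii"), hashlib.sha256).hexdigest()
    return oui(canonical), digest


def top_vendor_share(events, vendor_of, k=7):
    """Share of unique devices and of observations covered by the k vendors
    with the most unique devices. `events` yields (oui, device_hash) pairs."""
    observations = Counter()
    devices = {}
    for prefix, device in events:
        vendor = vendor_of.get(prefix, "unknown")
        observations[vendor] += 1
        devices.setdefault(vendor, set()).add(device)
    device_counts = Counter({v: len(d) for v, d in devices.items()})
    top = [v for v, _ in device_counts.most_common(k)]
    device_share = sum(device_counts[v] for v in top) / sum(device_counts.values())
    obs_share = sum(observations[v] for v in top) / sum(observations.values())
    return top, device_share, obs_share
```

Keeping the OUI in clear text while hashing the full address is what allows vendor statistics to be computed after the raw MAC addresses have been discarded.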
Scanner  # Obs   Uniq    O  |  Scanner  # Obs   Uniq    O
  1      77145   3607    1  |    18     28844   2302    3
  2      44224   1880    9  |    19     32773   2245    8
  3      53706   3091   11  |    20     34264   4753   15
  4      31836   1801   15  |    21     22022   3473   20
  5      33167   3265   16  |    22     20003   1901    2
  6      38834   2120   20  |    23     43784   4372   29
  7      28440   1102    3  |    24     53695   4404   27
  8      40648    893    0  |    25     55025   4429   51
  9      49852   4316    2  |    26     61706   3290   12
 10      45813   4116   28  |    27     24714   1900   16
 11      21714   3467    3  |    28     32512   1651    8
 12      30027   3433   31  |    29     27944   1491    5
 13      60276   2770   11  |    30     32067   2411   22
 14      34202   3159    8  |    31     15616   2514   21
 15      36293   2582    5  |    32     19190   2553   22
 16      22044   1809    3  |    33     25578   2934   18
 17      20280   1227   13  |

Table 1. Overview of the 33 scanners: number of observations (# Obs), unique devices per scanner (Uniq), and unique devices discovered only by that scanner (O). The total number of unique devices was 8,534.

Figure 4. Number of observations and unique devices per vendor (log scale). The 7 largest vendors account for 96% of the devices and 99% of the observations.

MODELING
One of the interesting questions regarding events such as music festivals is the internal structure of the crowd: whether people move alone or in groups, and how groups differ. The influence of music taste on collective group decisions about concert selection is also of interest, as is the mobility of the groups. To understand what insights into these issues we can gain from the data obtained in this study, we analyze the data at two levels. First, we concentrate on the micro level by running a community discovery algorithm. Then, we investigate the macro level to determine general attendance trends and relate the findings to the available meta-information regarding the festival schedule and types of artists.

Micro Groups Modeling
We understand micro groups as sets of people frequently co-occurring in spatio-temporal bins.
We divide the timeline of the entire festival into 1,076 temporal bins of 10 minutes each; 10-meter-radius areas around the scanners form the spatial bins, as shown in Figure 2. A similar technique of inferring social links from spatio-temporal co-occurrences is described in [4]. Of the 8,534 discovered devices, we rejected those seen in fewer than 10 temporal bins or fewer than 3 spatial bins. These devices were considered either to belong to participants for whom we do not have sufficient data or to be stationary devices (such as crew laptops). After this processing, 5,339 devices (63%) remained. For all of these, the common co-occurrences were calculated. The weight of the link from participant A to participant B (the A-to-B edge) was calculated as the number of co-occurrences of A with B divided by the total number of occurrences of A. This creates a directed graph in which A can be important to B but not necessarily the other way around, which accounts for the asymmetry in participants' activity and the different natures of their relations. For the visualization and subsequent analysis, only links that occurred in at least 3 different locations and had a weight of at least 0.5 (i.e. B was seen with A in at least 50% of all observations of A) were chosen.

Figure 5. Directed graph of discovered micro groups of participants. Participants frequently seen together form couples, triangles, and larger structures, providing insights into the internal structure of the festival crowd.

The final constraints on the discovered micro groups are strong: out of 130,000 participants, two people must be seen within the same 10-minute bin and within a radius of approximately 10 meters in at least half of all the times one of them is observed, and in at least 3 different locations, to ensure sufficient entropy for meaningful modeling. The constraints are imposed on each edge separately, hence the directed edges. This should ensure that the discovered motifs are in fact people moving around together.
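The co-occurrence graph construction described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' code: spatial bins are identified with scanner IDs, an occurrence of a device is a distinct (10-minute bin, scanner) cell, and the weight of the edge A to B is the number of cells A shares with B divided by the number of cells of A.

```python
from collections import defaultdict
from itertools import combinations


def micro_group_edges(observations, bin_minutes=10, min_tbins=10, min_sbins=3,
                      min_weight=0.5, min_locations=3):
    """Directed co-occurrence edges {(A, B): weight}.

    observations: iterable of (device, minutes_since_start, scanner_id).
    """
    cells = defaultdict(set)  # device -> {(time_bin, scanner)}
    for device, minute, scanner in observations:
        cells[device].add((int(minute // bin_minutes), scanner))

    # Drop devices with too little data (few temporal bins) or that are
    # likely stationary (few spatial bins).
    kept = {d: c for d, c in cells.items()
            if len({t for t, _ in c}) >= min_tbins
            and len({s for _, s in c}) >= min_sbins}

    # Invert: spatio-temporal cell -> devices present in it.
    by_cell = defaultdict(set)
    for device, device_cells in kept.items():
        for cell in device_cells:
            by_cell[cell].add(device)

    shared = defaultdict(set)  # unordered device pair -> shared cells
    for cell, devices in by_cell.items():
        for a, b in combinations(sorted(devices), 2):
            shared[(a, b)].add(cell)

    edges = {}
    for (a, b), common in shared.items():
        if len({s for _, s in common}) < min_locations:
            continue  # seen together in too few distinct locations
        for src, dst in ((a, b), (b, a)):
            weight = len(common) / len(kept[src])  # asymmetric normalization
            if weight >= min_weight:
                edges[(src, dst)] = weight
    return edges
```

Normalizing by the source device's own number of occurrences is what makes the graph directed: a frequently observed person can be important to a rarely observed one without the reverse holding.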
It was found that 12 nodes (devices) formed structures (pairs and a square) with perfectly correlated occurrences; we consider these to be devices belonging to the same person. Based on the discovered groups, a directed graph can be constructed, with edges indicating the discovered friendships. In total, 574 nodes with 448 edges were detected. The motifs can be seen in Figure 5. The most interesting are the structures with high connectivity, indicating groups of participants observed to often move together.

The baseline for micro group detection was calculated using a rewiring algorithm [20], shuffling the participants across spatio-temporal bins. Over N = 35 tests, on average μ = 5 nodes (σ = 4.32) and μ = 3 edges (σ = 2.60) were discovered. This indicates that the recovered structures are not an effect of random movement of participants but reflect an actual underlying structure.

The star structure visible in the upper left corner of Figure 5, with multiple inbound edges and no outbound ones, is an interesting artifact showing a person working in a shop in an area covered by several scanners. This person was frequently picked up by 3 scanners (1, 2, and 3), with customers also picked up there, but independently of each other. Similar artifacts appeared in larger numbers when the threshold on common co-occurrence locations was set to 2, since some of the long beer booths had two scanners placed in them. Such star structures with inbound links can be summarized by saying that those places (represented by the people working in them) were important to participants, but the participants were not significant for them.

The presented algorithm for detecting micro groups, and the structures it discovers, is a simple example of the potentially very fine-grained analysis possible with the collected data. Even with extremely small spatio-temporal bins, we still recover over 500 people moving around while belonging to a particular structure.
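A randomized baseline of the kind described above can be sketched as a degree-preserving rewiring of the bipartite device-to-cell membership graph: each device keeps its number of occupied cells and each cell keeps its number of devices, while who is where is shuffled by repeated edge swaps. How the rewiring of [20] was applied in the study is not specified, so this procedure is an assumption. The rewired memberships would then go through the same edge-detection step, and repeating this (e.g. 35 times) yields baseline node and edge counts.

```python
import random


def rewire_memberships(cells, n_swaps=None, seed=None):
    """Degree-preserving shuffle of device-to-cell memberships.

    cells: dict device -> set of (time_bin, scanner) cells.
    Returns a new dict with the same number of cells per device and the
    same number of devices per cell, but randomized co-occurrences.
    """
    rng = random.Random(seed)
    edges = [(d, c) for d, device_cells in cells.items() for c in device_cells]
    if not edges:
        return {d: set() for d in cells}
    present = set(edges)
    if n_swaps is None:
        n_swaps = 10 * len(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (d1, c1), (d2, c2) = edges[i], edges[j]
        # Skip swaps that are no-ops or would duplicate a membership.
        if d1 == d2 or c1 == c2 or (d1, c2) in present or (d2, c1) in present:
            continue
        present -= {(d1, c1), (d2, c2)}
        present |= {(d1, c2), (d2, c1)}
        edges[i], edges[j] = (d1, c2), (d2, c1)
    rewired = {d: set() for d in cells}
    for d, c in edges:
        rewired[d].add(c)
    return rewired
```

Preserving both degree sequences keeps each device's activity level and each cell's crowd size intact, so any structure that survives the real data but not the rewired data is attributable to who co-occurs with whom.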
MACRO GROUPS MODELING
We combine the spatio-temporal traces with the band schedule to find out which concerts each participant attended. Next, we assign a set of meta-information to each show. In this way we establish a richer semantic context and can analyze the guests' motivations for choosing particular concerts. The metadata consists of:
• genre – based on available Last.fm tags, each band is manually assigned one genre label from the following: electronic, rock/pop, folk/world, hip-hop/rap, metal/punk/hardcore, other
• playcount – number of times Last.fm users listened to music by a band
• country of origin – from the Roskilde Festival schedule; the countries have been grouped into the following categories: Denmark, Other Nordic, USA, Western Europe, Other
• scene – from the Roskilde Festival schedule
• date – from the Roskilde Festival schedule

Intuitively, the number of people at a concert would be highly correlated with how intensively people listen to the band, i.e. its playcount. To verify this assumption, we calculate Pearson's correlation between the number of unique devices found during each concert and the logarithm of the band's playcount, grouping the concerts according to the size of the stage they were performed at. As shown in Table 2, there is at most a small positive correlation between the popularity of a band and the number of discovered devices. This shows that people's choices of which concerts to attend cannot be fully accounted for in this way, and that more complex modeling is needed to reveal more interesting patterns.

Size of stage   Small    Medium   Big
ρ               0.2462   0.0351   0.3427
p-value         0.0333   0.8593   0.0091

Table 2. Correlation between the popularity of the band (log playcount) and the number of unique devices.

Data pre-processing
Our Bluetooth traces are a time series of events, each of which contains the participant ID, scanner ID, and time.
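The pre-processing and outlier-removal rules of this subsection can be sketched in Python, given events of the form (participant, scanner, time in minutes). The window from 10 minutes before to 1 hour 45 minutes after a concert's start, the minimum of three concerts, and the 70% / twice-as-many rule follow the text. The observation threshold `min_obs` is hypothetical, since its value is not given, and assigning an event in two overlapping windows to the first matching concert is our assumption.

```python
from collections import Counter, defaultdict

BEFORE, AFTER = 10, 105  # window: 10 min before to 1 h 45 min after the start


def attendance(events, scanner_stage, concerts, min_obs=2):
    """Binary attendance: participant -> set of attended concert ids.

    events: iterable of (participant, scanner_id, minute)
    scanner_stage: scanner_id -> stage name
    concerts: list of (concert_id, stage, start_minute)
    min_obs: hypothetical threshold on scans per concert.
    """
    counts = Counter()  # (participant, concert) -> number of scans
    for person, scanner, minute in events:
        stage = scanner_stage.get(scanner)
        for cid, c_stage, start in concerts:
            if c_stage == stage and start - BEFORE <= minute <= start + AFTER:
                counts[(person, cid)] += 1
                break  # assign each event to at most one concert
    attended = defaultdict(set)
    for (person, cid), n in counts.items():
        if n >= min_obs:
            attended[person].add(cid)
    return dict(attended)


def remove_outliers(attended, concerts, min_concerts=3):
    """Drop participants with too few concerts and stage-bound devices."""
    stage_of = {cid: stage for cid, stage, _ in concerts}
    concerts_per_stage = Counter(stage_of.values())
    kept = {}
    for person, cids in attended.items():
        if len(cids) < min_concerts:
            continue
        by_stage = Counter(stage_of[c] for c in cids)
        # >= 70% of a stage's concerts and >= 2x all other stages combined.
        stage_bound = any(
            n >= 0.7 * concerts_per_stage[s] and n >= 2 * (len(cids) - n)
            for s, n in by_stage.items()
        )
        if not stage_bound:
            kept[person] = cids
    return kept
```

The nested loop over events and concerts is quadratic and kept only for readability; sorting concerts per stage and using binary search would scale better.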
The goal of the pre-processing stage is to transform the behavioral time-series data into a binary attendance table, which maps each participant to the concerts she attended. For each event, we assign the scanner to the stage where it was located. We then assume that scans taken between 10 minutes before the starting time of a concert and 1 hour 45 minutes after it were taken "during" that concert. In this way we determine during which concert, if any, each event happened. This results in a matrix in which each element is the number of times a participant was scanned at a given concert. To indicate whether a given participant actually attended a concert, we binarize the table by setting a threshold on the number of observations.

Outlier detection
The binary table created in pre-processing contains two categories of outliers. First, there are guests who attended fewer than three concerts and are thus irrelevant to the analysis. The second category consists of Bluetooth devices that were recorded at the same location throughout the festival, such as employees' cell phones or laptops at a particular stage. These are defined as entities that attended at least 70% of the concerts at one stage and at least twice as many concerts at that stage as at all the other stages combined. After removing outliers, 5,127 attendees are left for further analysis.

Metadata pre-processing
We obtain the community-assigned tags for each band from Last.fm. More than 400 unique tags are associated with the participating bands, and for our modeling purposes we need to significantly reduce the dimensionality of this data. Based on the most significant tags and manual verification, we assign each band to one particular genre: electronic, rock/pop, folk/world, hip-hop/rap, metal/punk/hardcore, other.
Such a categorization is, of course, highly simplified, but it provides a satisfactory representation of the kinds of music performed at the Roskilde Festival.

The Infinite Relational Model
We fit an Infinite Relational Model [17, 31] to the binary attendance matrix to reveal the underlying patterns of people's behavior at the festival. Note that the model is oblivious to the accompanying meta-information, such as genre, the band's country of origin, and the date and location of each show.

The infinite relational model (IRM) is a model for binary relational data (graphs) and can be characterized by the following generative process for bipartite graphs. First, each of the row and column nodes is assigned to a cluster according to the Chinese restaurant process (CRP). The CRP is an analogy for building a partition from the ground up: the first node (a customer in a restaurant) is assigned to a table, and each subsequent node (a customer arriving at the restaurant) is assigned either to an existing table, i.e. cluster, with probability proportional to the number of customers already seated there, or to a new table, i.e. cluster, with probability proportional to the parameter $\alpha$. Customers thereby tend to sit at the most popular tables, making those tables even more popular, an effect known as "the rich get richer". The partition of the nodes induced by the CRP is exchangeable, in that the order in which the customers arrive does not influence the probability of the partition [24]. Next, link probabilities are generated that specify the probability of observing a link between clusters; finally, the links in the network are generated according to these probabilities. For a bipartite graph we have the following generative process:

$$ \mathbf{z}^{(1)} \sim \mathrm{CRP}(\alpha^{(1)}) \qquad \text{(row cluster assignment)} \tag{1} $$
$$ \mathbf{z}^{(2)} \sim \mathrm{CRP}(\alpha^{(2)}) \qquad \text{(column cluster assignment)} \tag{2} $$
$$ \eta_{\ell m} \sim \mathrm{Beta}(\beta, \beta) \qquad \text{(between-cluster link probability)} \tag{3} $$
$$ A_{ij} \sim \mathrm{Bernoulli}\big(\eta_{z^{(1)}_i z^{(2)}_j}\big) \qquad \text{(link)} \tag{4} $$

Inference in the IRM, i.e. determining the posterior distribution of the cluster assignments, entails marginalizing over the link probabilities, which can be done analytically. This is a major advantage of the IRM, enabling inference by Markov chain Monte Carlo (MCMC) sampling over the cluster assignments alone. Marginalizing over the link probabilities $\eta$, we obtain the following joint posterior likelihood:

$$
\begin{aligned}
p(\mathbf{A}, \mathbf{z}^{(1)}, \mathbf{z}^{(2)} \mid \beta, \alpha^{(1)}, \alpha^{(2)})
&= p(\mathbf{A} \mid \mathbf{z}^{(1)}, \mathbf{z}^{(2)}, \beta)\; p(\mathbf{z}^{(1)} \mid \alpha^{(1)})\; p(\mathbf{z}^{(2)} \mid \alpha^{(2)}) \\
&= \left[ \prod_{\ell m} \frac{\mathrm{Beta}(N^{+}_{\ell m} + \beta,\; N^{-}_{\ell m} + \beta)}{\mathrm{Beta}(\beta, \beta)} \right]
\times \left[ \frac{(\alpha^{(1)})^{L^{(1)}}\, \Gamma(\alpha^{(1)})}{\Gamma(I + \alpha^{(1)})} \prod_{\ell=1}^{L^{(1)}} \Gamma(M^{(1)}_{\ell}) \right]
\times \left[ \frac{(\alpha^{(2)})^{L^{(2)}}\, \Gamma(\alpha^{(2)})}{\Gamma(J + \alpha^{(2)})} \prod_{\ell=1}^{L^{(2)}} \Gamma(M^{(2)}_{\ell}) \right],
\end{aligned}
$$

where $L^{(k)}$ is the number of clusters in mode $k$, $M^{(k)}_{\ell}$ is the number of nodes in the $\ell$th cluster of mode $k$, $I$ and $J$ are the numbers of row and column nodes, and $N^{+}_{\ell m}$ and $N^{-}_{\ell m}$ are the numbers of links and non-links between nodes in cluster $\ell$ and cluster $m$. Using Bayes' theorem, the conditional distribution of the cluster assignment of a single node is given by

$$
p(z^{(1)}_i = \ell \mid \mathbf{A}, \mathbf{z}^{(1)} \setminus z^{(1)}_i, \alpha^{(1)}, \mathbf{z}^{(2)}, \beta)
\propto \left[ \prod_{m} \frac{\mathrm{Beta}(N^{+}_{\ell m} + \beta,\; N^{-}_{\ell m} + \beta)}{\mathrm{Beta}(N^{+\setminus i}_{\ell m} + \beta,\; N^{-\setminus i}_{\ell m} + \beta)} \right] q^{(1)}_{\ell},
$$
$$
p(z^{(2)}_j = m \mid \mathbf{A}, \mathbf{z}^{(2)} \setminus z^{(2)}_j, \alpha^{(2)}, \mathbf{z}^{(1)}, \beta)
\propto \left[ \prod_{\ell} \frac{\mathrm{Beta}(N^{+}_{\ell m} + \beta,\; N^{-}_{\ell m} + \beta)}{\mathrm{Beta}(N^{+\setminus j}_{\ell m} + \beta,\; N^{-\setminus j}_{\ell m} + \beta)} \right] q^{(2)}_{m},
$$

such that

$$
q^{(k)}_{\ell} = \begin{cases} w^{(k)}_{\ell} & \text{if } w^{(k)}_{\ell} > 0, \\ \alpha^{(k)} & \text{otherwise,} \end{cases}
$$

where $w^{(k)}_{\ell}$ is the number of nodes already assigned to cluster $\ell$, and $N^{+\setminus i}_{\ell m}$ and $N^{-\setminus i}_{\ell m}$ denote the numbers of links and non-links between nodes in cluster $\ell$ and cluster $m$, not counting any links from node $i$ of mode one ($\setminus j$ similarly denotes not counting any links from node $j$ of mode two). Hence, a new cluster is generated according to the CRP with probability proportional to $\alpha^{(k)}$. By (Gibbs) sampling each node assignment of the row ($z^{(1)}_i$) and column ($z^{(2)}_j$) clusters in turn from the above posterior distribution, we can infer $\mathbf{z}^{(1)}$ and $\mathbf{z}^{(2)}$.
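The collapsed quantities above translate fairly directly into code. The following sketch, written for clarity rather than speed, evaluates the marginal likelihood, the CRP prior, the joint posterior, one Gibbs sweep over the row assignments, and the posterior mean of the link probabilities. It omits the split-merge moves, and all function names are ours, not from an existing library. Column assignments can be sampled with the same function applied to the transposed matrix, e.g. `gibbs_sweep_rows([list(c) for c in zip(*A)], z2, z1, alpha2)`.

```python
import math
import random
from collections import Counter


def log_beta(a, b):
    """log Beta(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def block_counts(A, z1, z2):
    """Links N+ and total pairs per (row cluster, column cluster) block."""
    pos, tot = Counter(), Counter()
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            tot[(z1[i], z2[j])] += 1
            pos[(z1[i], z2[j])] += a
    return pos, tot


def log_lik(A, z1, z2, beta):
    """log p(A | z1, z2, beta) with the link probabilities integrated out."""
    pos, tot = block_counts(A, z1, z2)
    return sum(log_beta(pos[k] + beta, tot[k] - pos[k] + beta)
               - log_beta(beta, beta) for k in tot)


def log_crp(z, alpha):
    """log probability of the partition z under CRP(alpha)."""
    sizes = list(Counter(z).values())
    return (len(sizes) * math.log(alpha) + math.lgamma(alpha)
            - math.lgamma(len(z) + alpha) + sum(math.lgamma(m) for m in sizes))


def log_joint(A, z1, z2, alpha1, alpha2, beta=1.0):
    """log p(A, z1, z2 | beta, alpha1, alpha2)."""
    return log_lik(A, z1, z2, beta) + log_crp(z1, alpha1) + log_crp(z2, alpha2)


def gibbs_sweep_rows(A, z1, z2, alpha1, beta=1.0, rng=random):
    """One collapsed Gibbs sweep over the integer row labels z1 (in place).

    The full likelihood is recomputed per candidate for clarity; an
    efficient version only updates the blocks touched by node i.
    """
    for i in range(len(z1)):
        sizes = Counter(z for k, z in enumerate(z1) if k != i)  # w_l without i
        candidates = list(sizes) + [max(z1) + 1]  # existing clusters + a new one
        scores = []
        for label in candidates:
            z1[i] = label
            q = sizes[label] if sizes[label] > 0 else alpha1
            scores.append(log_lik(A, z1, z2, beta) + math.log(q))
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # stable normalization
        z1[i] = rng.choices(candidates, weights=weights)[0]
    return z1


def posterior_mean_eta(A, z1, z2, beta=1.0):
    """<eta_lm> = (N+ + beta) / (N+ + N- + 2 beta) for every non-empty block."""
    pos, tot = block_counts(A, z1, z2)
    return {k: (pos[k] + beta) / (tot[k] + 2 * beta) for k in tot}
```

Computing the full likelihood for each candidate is equivalent, up to a constant, to the ratio of Beta functions in the conditional above, because only the blocks involving node $i$ change between candidates.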
The inference thereby also estimates the number of groups in each mode from the data. We note that this posterior likelihood can be calculated efficiently by considering only those parts of the computation of $N^{+}_{\ell m}$ and $N^{-}_{\ell m}$, and only those evaluations of the Beta function, that are affected by the considered assignment change. Note also that the expected value of the link probabilities $\eta$ given the node assignments $z^{(1)}$ and $z^{(2)}$ is

$$\langle \eta_{\ell m} \rangle = \frac{N^{+}_{\ell m} + \beta}{N^{+}_{\ell m} + N^{-}_{\ell m} + 2\beta}.$$

Apart from the above Gibbs sampling, we also include so-called split-merge moves to improve the inference [14]. The split-merge procedure was implemented with three restricted Gibbs sampling sweeps initialized by the sequential allocation procedure of [5]. The IRM can be applied efficiently to large datasets using GPU computing [9], which could allow for real-time applications. Here we set $\beta = 1$, $\alpha^{(1)} = \log(I)$ and $\alpha^{(2)} = \log(J)$, where $I$ is the number of unique devices and $J$ is the number of concerts.

Robustness of the model
We use a number of measures to evaluate the generalizability of the results and the robustness of the model. The model estimation procedure is run 110 times; each time, 2.5% of the links and an equal number of non-links are treated as missing and then used for prediction. First, the normalized mutual information (NMI) is calculated between each pair of estimated models. Note that $0 \le \mathrm{NMI} \le 1$, where 0 indicates no relationship between the two assignment matrices and 1 indicates a perfect correspondence [9]. The NMI scores for the concert assignment matrices average 0.91 with a standard deviation of 0.03, while the scores for the attendee assignment matrices have a mean of 0.45 and a standard deviation of 0.02. The relatively low NMI for the clusters of participants is related to the fact that the model forces the assignment of each attendee to exactly one cluster.
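The robustness protocol (holding out 2.5% of the links plus an equal number of non-links, comparing runs pairwise with NMI, and scoring held-out pairs with AUC, as the text reports) can be sketched as below. This is a minimal sketch under stated assumptions: each run is represented as a list of hard cluster labels, function names are hypothetical, and the NMI normalization is assumed to be $2I(a;b)/(H(a)+H(b))$, one common variant that lies in $[0, 1]$; the text defers the exact definition to [9]. How held-out pairs are scored is also not stated; using the posterior mean link probability of each pair's cluster pair would be a natural choice. AUC uses the rank (Mann-Whitney) formulation, with ties counted as one half.

```python
import math
import random
from collections import Counter
from itertools import combinations


def holdout_pairs(A, frac, rng):
    """Pick a fraction `frac` of the links and an equal number of non-links to treat as missing."""
    links = [(i, j) for i, row in enumerate(A) for j, a in enumerate(row) if a]
    nonlinks = [(i, j) for i, row in enumerate(A) for j, a in enumerate(row) if not a]
    k = min(round(frac * len(links)), len(nonlinks))
    return rng.sample(links, k), rng.sample(nonlinks, k)


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    return sum(c / n * math.log(c * n / (ca[x] * cb[y]))
               for (x, y), c in Counter(zip(a, b)).items())


def nmi(a, b):
    """Assumed normalization 2 I(a; b) / (H(a) + H(b)); invariant to relabelling clusters."""
    h = entropy(a) + entropy(b)
    return 1.0 if h == 0 else 2 * mutual_information(a, b) / h


def pairwise_nmi(runs):
    """NMI between every pair of runs (each run is one list of cluster labels)."""
    return [nmi(a, b) for a, b in combinations(runs, 2)]


def auc(pos_scores, neg_scores):
    """P(score of a held-out link > score of a held-out non-link); ties count 1/2."""
    scored = sorted([(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores])
    ranks = [0.0] * len(scored)
    i = 0
    while i < len(scored):
        j = i
        while j + 1 < len(scored) and scored[j + 1][0] == scored[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank of a block of tied scores
        i = j + 1
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    rank_sum = sum(r for r, (_, is_link) in zip(ranks, scored) if is_link)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    rng = random.Random(0)
    A = [[int(rng.random() < 0.1) for _ in range(50)] for _ in range(40)]
    held_links, held_nonlinks = holdout_pairs(A, 0.025, rng)
    print(len(held_links), len(held_nonlinks))
    print(pairwise_nmi([[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]))
    print(auc([0.9, 0.7, 0.4], [0.5, 0.2, 0.1]))
```

The rank formulation avoids comparing every link with every non-link, which matters when the held-out set is a few percent of a 5127 × 160 matrix.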
Many different assignments of attendees to single clusters can be equally valid, so the final participant groups vary from run to run. Since the concert cluster assignments are significantly more stable, they are the focus of the further analysis.
The predictive performance of the model is measured using the Area Under the Curve (AUC) of the Receiver Operating Characteristic. AUC evaluates how well the distributions of links and non-links are separated. Note that $0 \le \mathrm{AUC} \le 1$, where 0.5 indicates a separation no better than a random guess and 1 indicates a perfect separation. This measure is not sensitive to the class imbalance problem [28]. The average AUC over the 110 models is 0.81 with a standard deviation of 0.01. Finally, Figure 6 shows that after 150 iterations the log probability of the model converges to a stable value across the 110 runs. It is important to emphasize that this stability is achieved for models trained on incomplete datasets (in each run, 2.5% of the links and an equal number of non-links were randomly held out for prediction). As shown in Figure 6, the model is robust to random initialization conditions as well as to partially missing data.

[Figure 6 plot: min, max and mean values of log probability in 110 models; x-axis: iterations (0 to about 350); y-axis: log probability (×10^5, about −1.60 to −1.42).]
Figure 6. Robustness: independently of the random initialization conditions and of the parts of the data used for cross-validation, the final value of the log likelihood is stable across 110 trained models.

Results
Having established the stability and generalizability of the method, we calculate further models based on the full attendance table, without treating any part of the data as missing. The model with the highest log probability is used for further investigation.
As shown in Figure 7, this model groups 5127 people into 16 clusters and the 160 concerts into 25 clusters. The color-coded value of $\eta$ indicates the between-cluster link probability. In the subsequent sections these values are interpreted and related to the available meta-information.

[Figure 7 plot: η sorted; x-axis: concert cluster no. (1–25); y-axis: user cluster no. (1–16); color scale 0.1–0.6.]
Figure 7. Between-cluster link probability for the estimated 16 clusters of attendees and 25 clusters of concerts, with clusters sorted by size in descending order. Preferences regarding the choice of concerts can be observed; for example, user cluster 5 is strongly associated with concert clusters 15, 16, 17, 20, and 21, meaning that many people in cluster 5 attended concerts from these clusters.

Relating chosen concert clusters to available metadata
This section describes particular findings which further justify the use of the chosen technology and provide additional insight into the audience dynamics.
Figures 8–12 show the distribution of concerts in the created clusters in relation to particular features. We only consider the first 10 clusters, containing between 24 (cluster 1) and 7 (cluster 10) concerts. Together they capture 72.5% of all concerts at the festival; with fewer concerts per cluster, it becomes increasingly hard to provide a meaningful interpretation. We use a χ² test to compare the distribution in each cluster against the overall distribution, to understand whether the cluster bears any meaning in relation to the particular feature. It should, however, be emphasized that the results are not rock-solid: with such a small number of concerts in the clusters, they serve as guidance in relating the clusters to the available metadata rather than as a quantification of the findings. Still, we can note that the model produces interpretable results, giving insight into the festival structure.
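The χ² comparison can be sketched as a goodness-of-fit test of one cluster's category counts (dates, genres, origins, or stages) against expected counts proportional to the overall distribution. The exact variant used in the study is not stated, so treating it this way is an assumption, and the function names are hypothetical. To stay dependency-free, the p-value is computed from the regularized incomplete gamma function (a series for small arguments, a continued fraction otherwise, as in standard numerical references). With clusters of 7 to 24 concerts, many expected counts are likely to fall below 5, where the χ² approximation is weak, which matches the caution in the text.

```python
import math


def _reg_gamma_upper(s, t, eps=1e-14, max_iter=1000):
    """Regularized upper incomplete gamma Q(s, t)."""
    if t <= 0:
        return 1.0
    log_prefactor = -t + s * math.log(t) - math.lgamma(s)
    if t < s + 1:
        # Series for the lower function P(s, t); Q = 1 - P
        term = total = 1.0 / s
        ap = s
        for _ in range(max_iter):
            ap += 1
            term *= t / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Modified Lentz continued fraction for Q(s, t), valid for t >= s + 1
    tiny = 1e-300
    b = t + 1.0 - s
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_prefactor) * h


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with df degrees of freedom."""
    return _reg_gamma_upper(df / 2.0, x / 2.0)


def chi2_against_overall(cluster_counts, overall_counts):
    """Goodness of fit of one cluster's category counts against the overall proportions."""
    n, total = sum(cluster_counts), sum(overall_counts)
    stat, k = 0.0, 0
    for obs, ref in zip(cluster_counts, overall_counts):
        if ref == 0:
            continue  # category never occurs overall, so it has no expected mass
        expected = n * ref / total
        stat += (obs - expected) ** 2 / expected
        k += 1
    df = k - 1
    p_value = chi2_sf(stat, df) if df > 0 else 1.0
    return stat, df, p_value


if __name__ == "__main__":
    # Hypothetical counts over four concert dates: one 8-concert cluster vs. all 160 concerts.
    stat, df, p = chi2_against_overall([6, 1, 0, 1], [40, 40, 40, 40])
    print(round(stat, 2), df, round(p, 4))
```

A cluster would then be marked as "explained" by a feature when its p-value falls below a chosen significance level; which level the study used is not stated here.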
Figures 8–11 show the distribution of concerts from clusters 1–10 across the available meta-information. The last column in each figure indicates whether the distribution in that cluster is significantly different from the overall distribution: if so, the cluster can be considered meaningful and explained by this feature.
Figure 8 shows that the clusters are quite structured in terms of dates. This is intuitive: concerts are attended by the festival participants present on that particular day. As shown in Figure 9, only two clusters have a genre distribution different from the overall distribution. These two clusters clearly point to the electronic and folk/world genres. Figure 10 deals with the distribution of the bands' origin and shows three clusters with a well-pronounced grouping of the bands: Danish, Danish+Nordic, and USA.
Figure 11 indicates that most of the clusters display a strong grouping of the concerts based on the stage where they took place. This may be related to the fact that concerts of a similar type (if not necessarily the same genre) are planned at the same stage; also, participants' mobility is limited, and a common behavior may be to stay at the same stage. The summary shown in Figure 12 makes it clear that the model produces clusters primarily based on the stages where the concerts took place. Interestingly, however, we also see the influence of the date of the concert, the origin of the band, and the genre. Although the presented results are not very strong statistically, we conclude that the model does produce clusters that relate to features of the concerts/bands. We can describe the produced clusters (1–10) based on their relations to features:
1. Electronic concerts from the main days of the festival, at three stages (Cosmopol, Gloria, Odeon).
2. Danish bands playing on the warm-up days at the Pavilion Junior stage.
3. Various genres from the first days of the main festival, at three stages (Cosmopol, Gloria, Odeon).
4. Concerts from the first days of the main festival at the Pavilion stage.
5. Mainly concerts from the second (largest) day of the main festival, at various stages.
6. Danish and other Nordic bands, entirely from the warm-up days.
7. Folk and world bands from the main days of the festival, mostly at the smallest stage, Gloria.
8. Bands from the US playing various genres on the last day at different stages.
9. Various bands from the main days playing at Pavilion.
10. Concerts on the last day, possibly capturing one-day-ticket participants.

[Figure 8 chart; labels: Concert Date, Explained, Overall.]
Figure 8. Distribution of concert dates in clusters. 7 out of 10 clusters have a date distribution significantly different from the overall one.

Between cluster link probability matrix
As shown in Figure 7, there are several clusters of participants which show very specific preferences regarding the concerts. For example, participant group 5 (392 persons) only attended concerts from clusters 8, 10, 15, 16, 17, 20, and 21. Nearly all of the concerts in these clusters took place on the 3rd of July (the last day of the festival, with major bands performing). Participants from group 4 (475 persons) showed a similar preference on that day but also attended concerts on other days. Participant group 6 (352 persons) behaved like participant group 4 on days other than the 3rd of July but showed no interest in the concerts on that day.

[Figure 9 chart; labels: Band Genre (Electronic, Rock/pop, Folk/world, Rap/hiphop, Punk/metal/core, Other), Explained, Overall.]
Figure 9. Distribution of concert genres in clusters. Two clusters have a distribution significantly different from the overall one.
[Figure 10 chart; labels: Band Origin (Denmark, Other Nordic, US, Western Europe, Other), Explained, Overall.]
Figure 10. Band origin distribution in the clusters. Three clusters show a significant grouping of bands: Denmark, Denmark + Other Nordic, and US.
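Statements such as "user cluster 5 is strongly associated with concert clusters 15, 16, 17, 20, and 21" amount to reading off the large entries in a row of the $\eta$ matrix. A minimal sketch, assuming $\eta$ is available as a list of rows (user clusters) and using an arbitrary cut-off, since the text does not define "strongly associated" numerically; the function name and the toy values are hypothetical, not the study's results.

```python
def strong_associations(eta, threshold):
    """Map each user cluster to the concert clusters whose link probability exceeds `threshold`.

    `eta` is a list of rows (user clusters) of between-cluster link probabilities;
    cluster numbers are returned 1-based, as in Figure 7.
    """
    return {u + 1: [c + 1 for c, p in enumerate(row) if p > threshold]
            for u, row in enumerate(eta)}


if __name__ == "__main__":
    # Toy 3 x 4 matrix (not the paper's values); 0.3 is an arbitrary cut-off.
    eta = [[0.05, 0.40, 0.10, 0.35],
           [0.20, 0.10, 0.50, 0.05],
           [0.30, 0.30, 0.30, 0.30]]
    print(strong_associations(eta, 0.3))  # {1: [2, 4], 2: [3], 3: []}
```

Cross-referencing the returned concert clusters with each concert's date or stage is what turns such a row into an interpretation like "group 5 attends the last day".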
Another participant group which shows a clear pattern in concert attendance is group 12 (91 persons), which has high link probabilities with concert clusters 4, 9, 16, and 24. It turns out that all of the concerts in these clusters took place at the Pavilion stage.

DISCUSSION
Our study has demonstrated that the discovery of Bluetooth devices at large-scale events can provide interesting insights into participant behavior, group formation, and music preferences. The analysis of the collected Bluetooth data has demonstrated how spatio-temporal data, when combined with additional contextual metadata describing the concerts and music genres, can reveal underlying structures. In the present study we found that over the duration of the festival 6–7% of the participants appear to have had Bluetooth switched on and in discoverable mode. However, based on the available data it is not possible to draw conclusions about the reasons for this or about the actual usage of Bluetooth. Moreover, we were able to observe the distribution of vendors of the discovered devices, but this distribution may not correspond directly to the actual distribution of mobile phones at the festival. In other words, the Bluetooth-discoverable devices may not be representative; for instance, most Android-based smartphones only allow time-limited discoverability. As such, we would expect to observe fewer Android devices in our dataset than there actually were at the Festival.

[Figure 11 chart; labels: Concert Stage (Arena, Cosmopol, Gloria, Odeon, Orange, Pavilion, Pavilion Junior), Explained, Overall.]
Figure 11. Distribution of concert stages. Most of the clusters display a significant grouping of the concerts according to the stage.
[Figure 12 chart; labels: Summary; Origin, Stage, Date, Genre.]
Figure 12. Summary of the clusters and features where their distribution is significantly different from the overall one.
The increasing adoption of Android smartphones could perhaps account for the lower penetration of Bluetooth-discoverable devices in the crowd compared to [30]. The spatio-temporal data allow for analysis of co-occurrences of participants, thereby giving indications of group formation among the festival participants. Furthermore, an advantage of the Bluetooth methodology for a participant census is that it reveals the identity of devices. With this, it is possible not only to estimate the number of people present at different concerts but also to determine patterns in the selection of concerts across the entire festival, based on music profiles derived from the spatio-temporal data. The analysis of these data has therefore provided insights into the underlying structures, namely the discovery of groups with specific behaviors (music preferences) in terms of choosing concerts.
Our analysis shows that the allocation of artists in terms of stage and festival day is a crucial issue. We find that many people are not willing to move around the festival area; instead, participants tend to spend much of their time around a particular stage. We also show that, for those who do attend concerts at different locations, the country of origin of a band is an important factor when selecting gigs. Furthermore, we do not find clusters of fans of a particular music genre, which suggests that participants are open to different kinds of performances. Such information can be very valuable for the Festival organizers in the process of booking and allocating bands to stages.
As the collected data were uploaded continuously by the scanners, it was possible to create a near real-time visualization of the location of participants at the festival.
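Both the co-occurrence analysis and the per-zone counts of unique devices behind this visualization can be derived from the raw scanner records. The sketch below assumes hypothetical records of the form (device_id, scanner_id, unix_time) and fixed, non-overlapping time windows; the 30-minute default mirrors the half-hour windows used for the visualization, but how co-occurrence was defined in the study is not specified, and all function names are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def bin_observations(observations, window_s=1800):
    """Group (device, scanner, unix_time) records into cells keyed by (scanner, time bin)."""
    cells = defaultdict(set)
    for device, scanner, t in observations:
        cells[(scanner, int(t // window_s))].add(device)
    return cells


def unique_devices_per_cell(cells):
    """Number of distinct devices seen by each scanner in each time window."""
    return {cell: len(devices) for cell, devices in cells.items()}


def cooccurrence_counts(cells):
    """For each device pair, the number of (scanner, time bin) cells where both were seen."""
    pairs = defaultdict(int)
    for devices in cells.values():
        for a, b in combinations(sorted(devices), 2):
            pairs[(a, b)] += 1
    return dict(pairs)


if __name__ == "__main__":
    obs = [("a", "s1", 0), ("b", "s1", 100), ("c", "s2", 50),
           ("a", "s1", 2000), ("b", "s1", 2100)]
    cells = bin_observations(obs)
    print(unique_devices_per_cell(cells))  # {('s1', 0): 2, ('s2', 0): 1, ('s1', 1): 2}
    print(cooccurrence_counts(cells))      # {('a', 'b'): 2}
```

Pairs that co-occur in many cells are candidates for groups; the number of pairs per cell grows quadratically with the number of devices seen there, which is worth keeping in mind for scanners at the main stages.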
The real-time visualization displayed the activity as the number of unique devices seen in half-hour time windows in different zones of the festival and mapped this information onto a 3D model of the festival area. The rotating 3D model was displayed on a 46-inch monitor located in the so-called Social Zone of the festival and ran in continuous loops, showing a sped-up replay of the activity from the first day until the current moment, see Figure 13. This way of visualizing the activity data made normally slow-changing patterns appear highly dynamic, gave an easy overview of the festival activity so far, and made it possible to incorporate past data that were only uploaded later (in case scanners did not have a network connection). This setup also allowed us to test the feasibility of obtaining the Bluetooth data in real time over the regular cellular 3G network, as a basis for building end-user applications on top of the system.
At the festival we were able to observe participants as they experienced the visualization of the Bluetooth data. Initially, they were attracted by the animation, bright colors, and high dynamics; only subsequently did they understand what was shown in the visualization. In the setup deployed at this festival, the interaction through the 3D visualization of the Bluetooth devices in the festival areas was indirect. The analysis of the data has demonstrated that even more sophisticated participant feedback could be included in such a visualization, even in real time. Furthermore, it could allow for more direct interaction through mobile social apps on participant smartphones, for instance to locate groups, participants, or relevant events as they happen at the festival.
Figure 13. The 3D real-time animated visualization shown to participants on a large display situated inside a cubic installation that also hosted a silent disco.
The 3D model of the festival area was continuously rotating and replaying the visualization of the collected Bluetooth data from the beginning of the festival up to the current moment.
As mentioned in the introduction, sensor frameworks for smartphones have received increased attention recently. Future studies could further improve data collection at large-scale events through the richer datasets that can be obtained from sensors embedded in smartphones. By distributing the scanning across multiple client devices, the inherent limitation of the present short-range, proximity-based probing approach may be addressed: in the current setup it is challenging to cover a large physical area, on top of the general challenges of deploying the system, including the limited availability of power and network connectivity in the festival setting. However, a challenge in the distributed scanning approach is deploying enough client devices to obtain continuous coverage of the area. The initial steps in the direction of distributed Bluetooth scanning were taken by Stopczyński et al. [27].
We believe that the results obtained with this Bluetooth probing methodology may also be useful to the festival organizers on multiple levels. The data can help the organizers assess participant reactions to the music selection and its distribution over the different stages. A more detailed analysis of participant mobility may also help the organizers plan the layout of the festival area for future festivals.

CONCLUSIONS
In this paper we have shown that proximity-based Bluetooth sensing is a useful method for obtaining spatio-temporal data in a large-scale event setting. The data can be analyzed with mathematical models that account for sparsity and missing data, revealing meaningful patterns of participant behavior, including mobility, group formation, and music preferences.
We have also demonstrated the feasibility of capturing Bluetooth data from a large crowd and visualizing the resulting spatio-temporal data in real time. Finally, we have proposed how the Bluetooth probing methodology may serve as a framework for creating future mobile social interaction applications for such large-scale events.

ACKNOWLEDGMENT
We would like to thank the Roskilde Festival organizers. Also thanks to Nokia for partly sponsoring the mobile phones used as part of the study. Finally, thanks to Krzysztof Siejkowski, Marcin Ignac, and Søren Rosenbak.

REFERENCES
1. Aharony, N., Pan, W., Ip, C., Khayal, I., and Pentland, A. Social fMRI: Investigating and shaping social mechanisms in the real world. Pervasive and Mobile Computing (2011).
2. Bensky, A. Wireless Positioning Technologies and Applications. Artech House, Inc., 2007.
3. Bullock, D., Haseman, R., Wasson, J., and Spitler, R. Automated measurement of wait times at airport security. Journal of the Transportation Research Board 2177, -1 (2010), 60–68.
4. Crandall, D. J., Backstrom, L., Cosley, D., Suri, S., Huttenlocher, D., and Kleinberg, J. Inferring social ties from geographic coincidences. Proc. of the National Academy of Sciences 107, 52 (2010), 22436–22441.
5. Dahl, D. B. Sequentially-allocated merge-split sampler for conjugate and nonconjugate Dirichlet process mixture models. Tech. rep., Texas A&M University, 2005.
6. Eagle, N., and Pentland, A. Reality mining: sensing complex social systems. Personal and Ubiquitous Computing 10, 4 (2006), 255–268.
7. Gonzalez, M., Hidalgo, C., and Barabasi, A. Understanding individual human mobility patterns. Nature 453, 7196 (2008), 779–782.
8. Hansen, J., Alapetite, A., Andersen, H., Malmborg, L., and Thommesen, J. Location-based services and privacy in airports. Human-Computer Interaction–INTERACT 2009 (2009), 168–181.
9. Hansen, T., Morup, M., and Hansen, L. Non-parametric co-clustering of large scale sparse bipartite networks on the GPU. In IEEE Int. Workshop on Machine Learning for Signal Processing (MLSP) (2011), 1–6.
10. Haseman, R., Wasson, J., and Bullock, D. Real time measurement of work zone travel time delay and evaluation metrics using Bluetooth probe tracking. Journal of the Transportation Research Board (2010).
11. Hay, S., and Harle, R. Bluetooth tracking without discoverability. Location and Context Awareness (2009), 120–137.
12. Hui, P., Chaintreau, A., Scott, J., Gass, R., Crowcroft, J., and Diot, C. Pocket switched networks and human mobility in conference environments. In Proceedings of the 2005 ACM SIGCOMM Workshop on Delay-Tolerant Networking, ACM (2005), 244–251.
13. IEEE. Public OUI and Company ID Assignments. http://standards.ieee.org/develop/regauth/oui/.
14. Jain, S., and Neal, R. M. A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of Computational and Graphical Statistics 13, 1 (2004), 158–182.
15. Jensen, B., Larsen, J., Jensen, K., Larsen, J., and Hansen, L. Estimating human predictability from mobile sensor data. In IEEE Int. Workshop on Machine Learning for Signal Processing (MLSP) (2010), 196–201.
16. Kelly, D. Minimal Infrastructure Radio Frequency Home Localisation Systems. PhD thesis, National University of Ireland, 2010.
17. Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T., and Ueda, N. Learning systems of concepts with an infinite relational model. In Proc. of the National AAAI Conf. on Artificial Intelligence (2006).
18. Kiukkonen, N., Blom, J., Dousse, O., Gatica-Perez, D., and Laurila, J. Towards rich mobile phone datasets: Lausanne data collection campaign. Proc. ICPS (2010).
19. Kostakos, V., O’Neill, E., Penn, A., Roussos, G., and Papadongonas, D. Brief encounters: Sensing, modeling and visualizing urban mobility and copresence networks. ACM Trans. Comput.-Hum. Interact. 17, 1 (Apr. 2010), 2:1–2:38.
20. Maslov, S., Sneppen, K., and Zaliznyak, A. Detection of topological patterns in complex networks: correlation profile of the internet. Physica A: Statistical Mechanics and its Applications 333 (2004), 529–540.
21. Montoliu, R., and Gatica-Perez, D. Discovering human places of interest from multimodal mobile phone data. In Proceedings of the 9th International Conference on Mobile and Ubiquitous Multimedia, ACM (2010), 12.
22. O’Neill, E., Kostakos, V., Kindberg, T., Schiek, A. F. g., Penn, A., Fraser, D. S., and Jones, T. Instrumenting the city: developing methods for observing and understanding the digital cityscape. In Proceedings of the 8th International Conference on Ubiquitous Computing, UbiComp’06, Springer-Verlag (Berlin, Heidelberg, 2006), 315–332.
23. Peterson, B. S., Baldwin, R. O., and Kharoufeh, J. P. Bluetooth inquiry time characterization and selection. IEEE Transactions on Mobile Computing 5, 9 (2006), 1173–1187.
24. Pitman, J. Combinatorial Stochastic Processes, vol. 1875. Springer-Verlag, 2006.
25. Song, C., Qu, Z., Blumm, N., and Barabási, A. Limits of predictability in human mobility. Science 327, 5968 (2010), 1018.
26. Stange, H., Liebig, T., Hecker, D., Andrienko, G., and Andrienko, N. Analytical workflow of monitoring human mobility in big event settings using Bluetooth. In Proc. of the 3rd ACM SIGSPATIAL Int. Workshop on Indoor Spatial Awareness, ACM (2011), 51–58.
27. Stopczynski, A., Larsen, J., Lehmann, S., L., D., and M., F. Participatory Bluetooth Sensing: A Method for Acquiring Spatio-Temporal Data about Participant Mobility and Interactions at Large Scale Events. In Pervasive Computing and Communications Workshops, 2013. PerCom Workshops ’13 (2013).
28. Tan, P.-N., Steinbach, M., and Kumar, V. Introduction to Data Mining (First Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2006.
29. Versichele, M., Delafontaine, M., Neutens, T., and Van de Weghe, N. Potential and implications of Bluetooth proximity-based tracking in moving object research. In 1st Int. Workshop on Movement Pattern Analysis (MPA) in conj. with the 6th Int. Conf. on Geographic Information Science (2010).
30. Versichele, M., Neutens, T., Delafontaine, M., and Van de Weghe, N. The use of Bluetooth for analysing spatiotemporal dynamics of human movement at mass events: A case study of the Ghent Festivities. Applied Geography 32, 2 (2012), 208–220.
31. Xu, Z., Tresp, V., Yu, K., and Kriegel, H.-P. Learning infinite hidden relational models. Uncertainty in Artificial Intelligence (UAI2006) (2006).