Tuesday 1 July 2014

Using Twitter to Estimate the Dynamics of Activities and Travel Demand

Recently a couple of colleagues (Ying Jin and Steve Denman) and I have been exploring the potential of geo-located tweets for understanding spatiotemporal urban dynamics. The analysis is based on a passively generated dataset of 25 million geo-located tweets collected over a period of 7 months (10/2013 to 05/2014), covering an area that extends beyond London to most of the Greater Southeast of England.



The analysis is performed against detailed activity datasets containing information about the location of domestic and non-domestic premises, their size and value. We use two activity datasets: the first contains all the non-domestic hereditaments in England in 2010 (around 7 million), with their use types, floorspace sizes and rent rates; the second contains the number of households and persons and the average dwelling price by type (2013, where available) at the full postcode level (200 thousand postcodes in the London Postal District).

A cross-cluster analysis of activity-points against tweets is used to calculate the reach and accessibility measures for each activity point at different times of day and distance ranges. This shows how tweets concentrate around particular activities, during different times in the day. A relatively broad range of distance bands (50m to 1km) highlights the significance (primary or secondary role) of each activity type for each location/time combination. 

Cross-cluster coefficient between Twitter presence and workplace employment (x-axis: distance band, y-axis: time of day, colour: clustering level relative to a random distribution)
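To make the idea of a cross-cluster measure more concrete, here is a minimal sketch in Python. It is not our actual implementation: it assumes projected coordinates in metres, a uniform random baseline over a known study area, and it ignores edge effects. The function name `cross_cluster_ratio` and its parameters are hypothetical.

```python
import math
from collections import defaultdict

def cross_cluster_ratio(activities, tweets, bands, area_m2):
    """Return {(hour, band): observed / expected tweets near an activity point}.

    activities: non-empty list of (x, y) in metres
    tweets:     list of (x, y, hour)
    bands:      distance thresholds in metres, e.g. [50, 100, 250, 500, 1000]
    area_m2:    study area, used for the uniform random baseline (assumption)
    """
    # Group tweet locations by hour of day.
    by_hour = defaultdict(list)
    for x, y, hour in tweets:
        by_hour[hour].append((x, y))

    result = {}
    for hour, points in by_hour.items():
        density = len(points) / area_m2  # tweets per square metre in this hour
        for band in bands:
            # Mean number of tweets within `band` metres of an activity point.
            observed = sum(
                1
                for ax, ay in activities
                for tx, ty in points
                if math.hypot(ax - tx, ay - ty) <= band
            ) / len(activities)
            # Count expected in a disc of that radius if tweets were uniformly random.
            expected = density * math.pi * band ** 2
            result[(hour, band)] = observed / expected
    return result
```

A ratio above 1 suggests tweets concentrate around the activity points more than a random distribution would at that hour and distance band. The brute-force pairing is fine for a toy example; a dataset of millions of points would need a spatial index.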




Conversely, a cross-cluster analysis of tweets against activity points confirms the findings of the former analysis by showing how the reach of floorspace by activity type varies with time and space. The analysis results in a probability-based one-to-many link-map between Twitter instances (tweets) and activities. As such, each tweet is associated with a set of activities.
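As an illustration of what a probability-based one-to-many link could look like, the sketch below shares a single tweet across nearby activity types. The weighting (floorspace discounted by distance) and the 1 km cut-off are assumptions made for this example, not the weights used in the analysis; `link_tweet` is a hypothetical name.

```python
import math

def link_tweet(tweet_xy, activities, max_dist=1000.0):
    """Return {activity_type: probability} for one tweet.

    tweet_xy:   (x, y) in metres
    activities: list of (activity_type, x, y, floorspace_m2)
    max_dist:   activities further away than this are ignored (assumption)
    """
    tx, ty = tweet_xy
    weights = {}
    for kind, ax, ay, floorspace in activities:
        d = math.hypot(ax - tx, ay - ty)
        if d <= max_dist:
            # Hypothetical weight: floorspace discounted by distance.
            # The extra 1 m avoids division by zero for co-located points.
            weights[kind] = weights.get(kind, 0.0) + floorspace / (d + 1.0)
    total = sum(weights.values())
    # Normalise so the shares for one tweet sum to 1.
    return {kind: w / total for kind, w in weights.items()} if total else {}
```

Summing these shares over all tweets, by hour and by location, would give an estimated number of tweets per activity type.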



Geolocated Tweets - Weekday, London (Red: Offices, Yellow: Retail, Green: Leisure, Cyan: Rail)

This proportional association is used to estimate spatiotemporal distributions of tweets for each activity type. These are then used in three largely unexplored directions:


1. The spatial distribution of tweets associated with domestic spaces is correlated against a series of socio-demographic variables of the population (from Census 2011) to establish correlations between Twitter use and population profiles. 



Correlation between number of tweets and persons by Socioeconomic Class (Lower Super Output Area level, 2011 Census)
Correlation between number of tweets and persons by Age Group (Lower Super Output Area level, 2011 Census)

2. The temporal distribution of tweets associated with travel (air, road network and surface rail) is compared with data from the National Travel Survey (NTS 2008-2012). The aim is to explore the extent to which biases associated with the temporal intensity of Twitter usage are reflected in the tweets associated with a particular activity, and how this affects the efficacy of the dataset in establishing the temporal dimensions of travel demand.


Comparison between the temporal distribution of tweets associated with surface rail and respective data from the National Travel Survey (x-axis: time of day, y-axis: proportional concentration, 100 = day average)
3. The spatiotemporal distribution of tweets around retail and office floorspace is correlated against floorspace valuation (from the Valuation Office Agency, 2010) to explore the relation between presence and rent rate, i.e. the effect that the number of people, at various distances and times of day, has on the estimated potential for generating revenue.


Correlation between number of Twitter users and rent rate for retail premises (x-axis: time of day, y-axis: correlation). Thickness of line represents distance from the retail premises (e.g. 100 metres represents the correlation between the rent rate of a shop and the number of Twitter users within 100 metres of the shop)
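The correlation described in the third direction can be sketched as follows for a single distance band. This assumes a plain Pearson coefficient computed across premises for each hour, which is a choice made for this example. The input layout and the name `rent_correlation_by_hour` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def rent_correlation_by_hour(rent_rates, user_counts):
    """Correlate rent rates with nearby Twitter users, hour by hour.

    rent_rates:  one rent rate per premises
    user_counts: {hour: [users near each premises]}, aligned with rent_rates
    """
    return {hour: pearson(counts, rent_rates) for hour, counts in user_counts.items()}
```

Running this once per distance band (for example 100 m, 250 m and so on) would give one curve per band.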
This exercise is based on the combined analysis of information about the dynamic location of people and the types of activities that they either carry out or have available in the vicinity. It makes it possible to quantify the effect of the location of residential and non-residential buildings on the spatiotemporal patterns of the physical presence of their users, and to estimate the impact of changes in their location on these patterns. It also offers insight into how floorspace rent rates for different uses are affected by the number of people in the area (reach and accessibility) at different times of day.


Geolocated Tweets - Weekday, London (Red: Non-domestic, Yellow: Domestic, Cyan: Transport, Green: Other)