Tuesday, 1 July 2014

Using Twitter to Estimate the Dynamics of Activities and Travel Demand

Recently a couple of colleagues (Ying Jin and Steve Denman) and I have been exploring the potential of geo-located tweets for understanding spatiotemporal urban dynamics. The analysis is based on a passively generated dataset of 25 million geo-located tweets collected over a period of 7 months (10/2013 to 05/2014). It covers an area that extends beyond London to include most of the Greater Southeast of England.



The analysis is performed against detailed activity datasets containing information about the location of domestic and non-domestic premises, their size and value. We use two activity datasets. The first contains all the non-domestic hereditaments in England in 2010 (around 7 million), with their use types, floorspace sizes and rent rates. The second contains the number of households and persons and the average dwelling price by type (2013, where available) at the full postcode level (200 thousand postcodes in the London Postal District).

A cross-cluster analysis of activity-points against tweets is used to calculate the reach and accessibility measures for each activity point at different times of day and distance ranges. This shows how tweets concentrate around particular activities, during different times in the day. A relatively broad range of distance bands (50m to 1km) highlights the significance (primary or secondary role) of each activity type for each location/time combination. 

Cross-cluster coefficient between Twitter presence and workplace employment (x-axis: distance band, y-axis: time of day, colour: clustering level relative to a random distribution)
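To make the idea of a clustering level relative to a random distribution more concrete, here is a minimal sketch in Python. It is not the code used for the analysis: the function name `cross_cluster_ratio`, the use of projected coordinates in metres and the assumption of a uniform random baseline over a known study area are all illustrative choices, and there is no edge correction. Tweets would be filtered by time of day before calling it, once per distance band.

```python
import math

def cross_cluster_ratio(activity_pts, tweet_pts, r_inner, r_outer, study_area):
    """Observed / expected number of tweets in a ring around activity points.

    activity_pts, tweet_pts: lists of (x, y) in metres (assumed projected).
    r_inner, r_outer: limits of the distance band in metres.
    study_area: area of the study region in square metres (assumed known).
    A value above 1 means more tweets than a uniform random pattern would give.
    """
    ring_area = math.pi * (r_outer ** 2 - r_inner ** 2)
    # Expected tweets per activity point if tweets were spread uniformly.
    expected = len(tweet_pts) / study_area * ring_area

    observed = 0
    for ax, ay in activity_pts:
        for tx, ty in tweet_pts:
            d = math.hypot(tx - ax, ty - ay)
            if r_inner <= d < r_outer:
                observed += 1

    mean_observed = observed / len(activity_pts)
    return mean_observed / expected


# Toy example: one workplace and four tweets in a 1 km x 1 km area.
workplaces = [(0.0, 0.0)]
tweets = [(10.0, 0.0), (0.0, 20.0), (500.0, 500.0), (900.0, 0.0)]
ratio_0_50 = cross_cluster_ratio(workplaces, tweets, 0.0, 50.0, 1_000_000.0)
ratio_50_100 = cross_cluster_ratio(workplaces, tweets, 50.0, 100.0, 1_000_000.0)
```

The brute-force double loop is only workable for small samples; with millions of tweets a spatial index would be needed.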




On the other hand, a cross-cluster analysis of tweets against activity points confirms the findings of the former analysis by showing how the reach of floorspace by activity type varies with time and space. The analysis results in a probability-based one-to-many link-map between Twitter instances (tweets) and activities. As such, each tweet is associated with a set of activities. 
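One simple way to picture such a one-to-many link is sketched below in Python. The weighting rule is an assumption made purely for illustration (each activity type is weighted by its floorspace within a fixed radius of the tweet), and the function name `activity_probabilities` and the tuple layout of `premises` are hypothetical. The actual analysis derives the probabilities from the cross-cluster results rather than from this rule.

```python
import math
from collections import defaultdict

def activity_probabilities(tweet_xy, premises, radius):
    """Share of nearby floorspace by activity type for one tweet.

    tweet_xy: (x, y) of the tweet in metres (assumed projected).
    premises: list of (x, y, activity_type, floorspace) tuples.
    radius: search radius in metres.
    Returns a dict {activity_type: probability}; empty if nothing is in range.
    """
    tx, ty = tweet_xy
    totals = defaultdict(float)
    for x, y, kind, floorspace in premises:
        if math.hypot(x - tx, y - ty) <= radius:
            totals[kind] += floorspace

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {kind: value / grand_total for kind, value in totals.items()}


# Toy example: the leisure premise is too far away to be linked.
premises = [
    (0.0, 0.0, "office", 300.0),
    (10.0, 0.0, "retail", 100.0),
    (1000.0, 0.0, "leisure", 500.0),
]
links = activity_probabilities((5.0, 0.0), premises, radius=50.0)
```

Summing these shares over all tweets, by hour and by area, would give a proportional count of tweets per activity type.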



Geolocated Tweets - Weekday, London (Red: Offices, Yellow: Retail, Green: Leisure, Cyan: Rail)

This proportional association is used to estimate spatiotemporal distributions of tweets for each activity type. These distributions are used in three largely unexplored directions:


1. The spatial distribution of tweets associated with domestic spaces is compared against a series of socio-demographic variables of the population (from Census 2011) to establish correlations between Twitter use and population profiles.



Correlation between number of tweets and persons by Socioeconomic Class (Lower Super-output Level, 2011 Census)
Correlation between number of tweets and persons by Age Group (Lower Super-output Level, 2011 Census)

2. The temporal distribution of tweets associated with travel (air, road network and surface rail) is compared with data from the National Travel Survey (NTS 2008-2012). The aim is to explore the extent to which biases in the temporal intensity of Twitter usage are reflected in the tweets associated with each travel activity, and how this affects the usefulness of the dataset for establishing the temporal dimensions of travel demand.


Comparison between the temporal distribution of tweets associated with surface rail and respective data from the National Travel Survey (x-axis: time of day, y-axis: proportional concentration, 100 = day average)
3. The spatiotemporal distribution of tweets around retail and office floorspace is correlated against floorspace valuation (from the Valuation Office Agency, 2010) to explore the relation between presence and rent-rate; i.e. the effect that the number of people - at various distances from and times of day - has on the estimated potential for generation of revenue. 


Correlation between number of Twitter users and rent-rate for retail premises (x-axis: time of day, y-axis: correlation). The thickness of each line represents the distance from the retail premises (e.g. 100 metres represents the correlation between the rent rate of a shop and the number of Twitter users within 100 metres of the shop)
This exercise is based on the combined analysis of information about the dynamic location of people and the types of activities that they either carry out or that are available in the vicinity. It makes it possible to quantify the effect of the location of residential and non-residential buildings on the spatiotemporal patterns of the physical presence of their users, and to estimate the impact of changes in their location on these patterns. It also offers insight into how floorspace rent-rates for different uses are affected by the number of people in the area (reach and accessibility) at different times of the day.


Geolocated Tweets - Weekday, London (Red: Non-domestic, Yellow: Domestic, Cyan: Transport, Green: Other)

Wednesday, 5 December 2012

Spatiotemporal patterns and the dynamics of the city

One more post on GPS traces from taxis, this time from San Francisco. The video follows the origins (green dots) and destinations (red dots) of passenger trips made by taxi in San Francisco. It is based on GPS traces from 500 taxis over a full month. 

Each record contains information about location, timestamp and, crucially, whether the taxi is occupied or not. This makes it possible to mine approximations of passenger trips and, therefore, their origins and destinations. Every new green dot lighting up represents the origin of a passenger trip made by taxi (the dot subsequently fades out over the next 15 minutes). Every red dot lighting up represents arrival at a destination by taxi.
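The way trips can be approximated from the occupancy flag is easy to sketch in Python. This is an illustration rather than the code behind the video: the record layout `(timestamp, lat, lon, occupied)` and the function name `extract_trips` are assumptions, and the records are assumed to be sorted by time for a single taxi. A trip starts at the first occupied record after a vacant one and ends at the last occupied record before the taxi becomes vacant again.

```python
def extract_trips(records):
    """Approximate passenger trips from one taxi's time-ordered GPS records.

    records: list of (timestamp, lat, lon, occupied) tuples, sorted by time.
    Returns a list of (origin, destination) pairs, each a (timestamp, lat, lon).
    Trips already under way at the first record, or still open at the last
    one, are skipped because their origin or destination is unknown.
    """
    trips = []
    origin = None
    previous = None
    for rec in records:
        occupied = rec[3]
        if occupied and origin is None:
            # Only a vacant -> occupied change marks a known pick-up point.
            if previous is not None and not previous[3]:
                origin = rec
        elif not occupied and origin is not None:
            # The previous record is the last one with a passenger on board.
            trips.append((origin[:3], previous[:3]))
            origin = None
        previous = rec
    return trips


# Toy example: one complete trip, followed by one that never finishes.
records = [
    (0, 37.70, -122.40, 0),
    (10, 37.71, -122.41, 1),
    (20, 37.72, -122.42, 1),
    (30, 37.73, -122.43, 0),
    (40, 37.74, -122.44, 1),
]
trips = extract_trips(records)
```

The precision of each origin and destination is limited by the sampling interval of the GPS traces, which is why these are approximations of the real pick-up and drop-off points.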


The video illustrates the tempo of the city. During the course of 24 hours (midnight to midnight), the locations of the origins and the destinations change. Areas gradually shift from being collectors of trips to being distributors and vice versa. The spatial concentration of destinations (red dots) between 7.00 and 9.00 in the business district is followed by dispersion in the evening. However, this expected pattern is not the only one observed. Other areas demonstrate their unique time profiles, which, in some cases, are dominated by relatively short peaks. 

GPS traces make it possible to identify spatiotemporal patterns that were impossible to observe until recently. As a next step, the temporal profile of each area will be compared with its land use characteristics in order to identify correlations between flows and activities.

Friday, 30 November 2012

3D simulation with JMonkeyEngine 3.0

I have been looking for a reasonably high-level 3D library for Java for some time. Recently I have been using JME 3.0 (www.jmonkeyengine.org) and it seems to be doing the trick. Here are a couple of pics, one from an accessibility model of Cambridge, UK and one from an on-the-fly pedestrian simulation.


The accessibility model has been produced by a simple tool that takes a shapefile (.shp) or other GIS polygon file and extrudes each polygon based on one of its attributes. The colour of each resulting prism may represent the same or a different attribute.


This second one is from a thin 3D viewer sitting on top of a pedestrian simulation suite. The 3D viewer receives information about the position, speed and direction of each pedestrian from the pedestrian simulation and updates the positions and kinematics of the skeletons accordingly. The colour of each avatar may represent any of the metrics associated with their motion (e.g. speed, estimated distance to destination, local density, delay etc.).

GPS traces of Taxis in Beijing

For the last couple of weeks I have been playing around with a dataset of taxi GPS traces from Beijing. This includes positions and timestamps from 10,000 individual taxis for a week.

Here is a map of speeds (red low, green high). It has been created by calculating the distance between consecutive pairs of traces from the same vehicle. When the time difference between the timestamps of such a pair is less than 60 seconds, the pair is used to estimate a speed value from the Euclidean distance and the time difference between the two traces.
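The speed calculation can be sketched in a few lines of Python. This is a simplified illustration, not the code used for the map: the function name `segment_speeds` is hypothetical, and the positions are assumed to have been projected to metres already so that the Euclidean distance is meaningful. Each estimate is placed at the midpoint of the pair, which is one possible choice for mapping.

```python
import math

def segment_speeds(trace, max_gap_s=60):
    """Estimate speeds from consecutive GPS traces of one vehicle.

    trace: list of (t, x, y) with t in seconds and x, y in metres
           (coordinates are assumed to be projected, not lat/lon).
    max_gap_s: pairs further apart in time than this are ignored.
    Returns a list of (mid_x, mid_y, speed_m_per_s).
    """
    ordered = sorted(trace)  # sort by timestamp
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(ordered, ordered[1:]):
        dt = t1 - t0
        # Skip duplicate timestamps and gaps of 60 seconds or more.
        if 0 < dt < max_gap_s:
            distance = math.hypot(x1 - x0, y1 - y0)
            speeds.append(((x0 + x1) / 2, (y0 + y1) / 2, distance / dt))
    return speeds


# Toy example: the 90 second gap in the middle is ignored.
trace = [(0, 0.0, 0.0), (10, 30.0, 40.0), (100, 30.0, 40.0), (110, 30.0, 140.0)]
speeds = segment_speeds(trace)
```

Because the distance is measured in a straight line, the estimate is a lower bound on the speed actually driven along the road between the two traces.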


Datasets like this one are becoming increasingly available and they provide new, information-rich ways to explore the spatial and temporal dynamics of urban systems. We are planning to evaluate this specific dataset using spatiotemporal autocorrelation and, if the results are encouraging, we will use it to estimate the speeds of the links of a transport model of Beijing.

I will come back to this to discuss the autocorrelation method we devised and share some results when we have made some more progress.