I’ve recently been experimenting with data visualisation in R. As part of that, I’ve put together a little bit of (probably error-ridden and redundant) code to help with mapping Australia.
First, my code is built on a foundation from [Luke’s guide to building maps of Australia in R](http://lukesingham.com/map-of-australia-using-osm-psma-and-shiny/), and [this guide to making pretty maps in R](https://timogrossenbacher.ch/2016/12/beautiful-thematic-maps-with-ggplot2-only/).
The problem is that a lot of datasets, particularly administrative ones, come with postcode as the only geographic information. And postcodes aren’t a very useful geographic structure: there’s no defined aggregation structure, they’re inconsistent in size, and they’re heavily dependent on history.
For instance, a postcode level map of Australia looks like this:
Way too messy to be useful.
The ABS has a [nice set of statistical geography](http://www.abs.gov.au/AUSSTATS/abs@.nsf/Lookup/1270.0.55.001Main+Features1July%202016?OpenDocument) that will let me fix this problem by changing the aggregation level, but first I need to convert the data into another file.
Fortunately, the ABS also publishes concordances between postcodes and the Statistical Geography, so all I need to do is take those concordances and use them to mangle my data lightly. First, I used those concordances to make some CSV input files (a sketch of the columns I’m assuming for each follows the list):
Postcode to Statistical Area 2 level (2011)
Concordance from SA2 (2011) to SA2(2016)
Statistical Geography hierarchy to convert to SA3 and SA4
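These files just need to carry the columns the script below expects. Here’s an illustrative sketch of that layout – the codes and ratios are made up, and the SA3/SA4 column names in the third file are my assumption, so match them to whatever your extract actually uses:
## Illustrative only -- the real files come from the ABS correspondences.
## The column names match what the script below expects; the codes and
## ratios here are made up.
example_PCODE_SA2 <- data.frame(
  POSTCODE          = c(2000, 2000),
  SA2_MAINCODE_2011 = c(117031337, 117031338),  # hypothetical SA2 codes
  Ratio             = c(0.6, 0.4)               # share of the postcode falling in each SA2
)
example_SA2_2011_2016 <- data.frame(
  SA2_MAINCODE_2011 = 117031337,
  SA2_MAINCODE_2016 = 117031645,                # hypothetical 2016 code
  Ratio             = 1.0
)
## One row per SA2 (2016) with its parent SA3 and SA4 codes.
## I'm assuming these column names -- adjust to whatever your extract uses.
example_SA2_3_4 <- data.frame(
  SA2_MAINCODE_2016 = 117031645,
  SA3_CODE_2016     = 11703,
  SA4_CODE_2016     = 117
)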
Then a little R coding. First, convert from postcode to SA2 (2011). SA level 2 is around the same level of detail as postcodes, so the conversion won’t lose much accuracy. Then convert to 2016 and add the rest of the geography:
## Convert postcode-level data to the ABS Statistical Geography hierarchy
## Quick hack job, January 2017
## Robert Ewing
require(dplyr)
## Read in original data file, clean as needed.
## This data file is expected to have a variable 'post' for the postcode,
## and a data series called 'smsf' for the numbers.
data_PCODE <- read.csv("SMSF2.csv", stringsAsFactors = FALSE)
## This code is designed to read in only one series. If you need more than one,
## you'll need to change the aggregate() calls below.
## Change the next line to reflect the name of the data series in your file.
data_PCODE$x <- as.numeric(data_PCODE$smsf)
data_PCODE$x[is.na(data_PCODE$x)] <- 0
data_PCODE$POA_CODE16 <- sprintf("%04d", data_PCODE$post)
## Read in concordance from Postcode to SA2 (2011)
concordance <- read.csv("PCODE_SA2.csv", stringsAsFactors = FALSE)
concordance$POA_CODE16 <- sprintf("%04d", concordance$POSTCODE)
## Join the files
working_data <- concordance %>% left_join(data_PCODE, by = "POA_CODE16")
working_data$x[is.na(working_data$x)] <- 0
## Adjust for partial coverage ratios
working_data$x_adj <- working_data$x * working_data$Ratio
## And produce the SA2_2011 version of the dataset. Data is in x.
data_SA2_2011 <- aggregate(working_data$x_adj,list(SA2_MAINCODE_2011 = working_data$SA2_MAINCODE_2011),sum)
## Now read in the concordance from SA2_2011 to SA2_2016
concordance <- read.csv("SA2_2011_2016.csv", stringsAsFactors = FALSE)
## Join it.
working_data <- concordance %>% left_join(data_SA2_2011, by = "SA2_MAINCODE_2011")
working_data$x[is.na(working_data$x)] <- 0
## Adjust for partial coverage ratios
working_data$x_adj <- working_data$x * working_data$Ratio
## And produce aggregate in SA2_2016
data_SA2_2016 <- aggregate(working_data$x_adj,list(SA2_MAINCODE_2016 = working_data$SA2_MAINCODE_2016),sum)
## And finally join the SA2 data to the rest of the hierarchy to allow on-the-fly aggregation to higher levels.
statgeo <- read.csv("SA2_3_4.csv", stringsAsFactors = FALSE)
data_SA2_2016 <- data_SA2_2016 %>% left_join(statgeo)
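From here, rolling the SA2 (2016) data up to SA3 or SA4 is just a group-and-sum. A minimal sketch, assuming the SA3 and SA4 code columns in SA2_3_4.csv are called SA3_CODE_2016 and SA4_CODE_2016 (rename to match your file):
## Roll up to SA3 / SA4 -- the SA3/SA4 column names are assumptions,
## so match them to whatever SA2_3_4.csv actually contains.
data_SA3 <- data_SA2_2016 %>%
  group_by(SA3_CODE_2016) %>%
  summarise(x = sum(x, na.rm = TRUE))
data_SA4 <- data_SA2_2016 %>%
  group_by(SA4_CODE_2016) %>%
  summarise(x = sum(x, na.rm = TRUE))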
The end result is a data set that can be aggregated to a higher level. Here's the chart from above, but this time using SA3 rather than postcodes: