Locating Charging Stations for Electric Vehicles (Spatial Clustering) – Part 3
This post follows up on the previous post in this series.
Refer to Post 1 and Post 2 for an overall understanding of the approach.
Useful links :
Dataset used : http://archive.ics.uci.edu/ml/datasets/Taxi+Service+Trajectory+-+Prediction+Challenge%2C+ECML+PKDD+2015
Code files repository : https://github.com/palashgoyal1/Taxi_Service_Trajectory
Part 3 of this blog series covers the implementation of the improvements mentioned in the previous post, along with an awesome set of references 🙂
(To help the reader follow along, I have included code snippets wherever required and pointed to the relevant file in the GitHub repository. If anything is missing, feel free to comment below and I will add the relevant snippet to the post.)
The clusters on which sub-clusters should (or should not) be defined can be selected by building a heatmap of the present clusters. Hierarchical clustering over the present clusters is another option.
The heatmap approach is implemented below by aggregating the coordinates, which also reduces the dataset size.
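As a rough illustration of the hierarchical clustering option, the sketch below groups a handful of cluster centres with base R's hclust. The centre coordinates are made up for illustration (they are not taken from the dataset), and plain Euclidean distance on degrees is assumed to be good enough at city scale.

```r
# Hypothetical cluster centres (lon, lat); not taken from the dataset.
centres <- data.frame(
  lon = c(-8.620, -8.618, -8.622, -8.580, -8.583, -8.579),
  lat = c(41.150, 41.152, 41.149, 41.170, 41.171, 41.168)
)

# Agglomerative clustering on pairwise Euclidean distances (in degrees).
hc <- hclust(dist(centres), method = "complete")

# Cut the dendrogram into two groups; each group is a candidate for
# merging or for being treated as one larger cluster.
groups <- cutree(hc, k = 2)
split(centres, groups)
```

Calling plot(hc) draws the dendrogram, which can help in choosing where to cut instead of fixing k up front.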
Below is an example using the file ‘B_A_start.csv’, which holds the starting points of all trips in the category CALL_TYPE=‘B’ and DAY_TYPE=‘A’. The file has 134,582 unique starting coordinates.
(I have done this for one file type, the starting points of the ‘B_A_start’ combination; the same process can be replicated for the other data point types as well.)
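# Packages the snippets in this post rely on
library(sp); library(maptools); library(spatstat); library(rgeos)
# Plotting the above coordinates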
ba_start_spdf <- SpatialPointsDataFrame(coords = ba_start__, data = ba_start_, proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
plot(ba_start_spdf, pch=1, cex=0.5)
Figures: Start Coordinates; Point Density Plot
To identify clusters in the above distribution of cab starting coordinates, we convert the points to a continuous field of point density (see the density plot above).
# Convert points to ppp class
sp_obj <- as(SpatialPoints(ba_start_spdf), "ppp")
dense_obj <- density(sp_obj, adjust = 0.2) # create density object of class 'im' (Image)
The density plot shows two main clusters in the distribution of the starting coordinates.
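To make the idea of a point-density field concrete, here is a minimal base R sketch of a Gaussian kernel density estimate on a regular grid. It does not reproduce spatstat's density(); the synthetic points, the kde_grid helper, and the bandwidth values are all assumptions chosen for illustration.

```r
# Synthetic stand-in for the start coordinates: two dense blobs plus noise.
# (Hypothetical data; the real points come from B_A_start.csv.)
set.seed(42)
x <- c(rnorm(300, -8.62, 0.005), rnorm(300, -8.58, 0.005), runif(100, -8.66, -8.54))
y <- c(rnorm(300, 41.15, 0.005), rnorm(300, 41.17, 0.005), runif(100, 41.12, 41.20))

# Gaussian kernel density on a regular grid with bandwidth h (in degrees).
kde_grid <- function(x, y, h, n_grid = 60) {
  gx <- seq(min(x), max(x), length.out = n_grid)
  gy <- seq(min(y), max(y), length.out = n_grid)
  kx <- outer(gx, x, function(g, xi) dnorm(g, mean = xi, sd = h))  # n_grid x n
  ky <- outer(gy, y, function(g, yi) dnorm(g, mean = yi, sd = h))  # n_grid x n
  # Product kernel summed over all points, averaged over the sample size.
  list(x = gx, y = gy, z = kx %*% t(ky) / length(x))
}

fine   <- kde_grid(x, y, h = 0.004)  # small bandwidth: sharp, tall peaks
coarse <- kde_grid(x, y, h = 0.02)   # large bandwidth: smoother, flatter surface
```

The result can be viewed with image(fine$x, fine$y, fine$z). Comparing it with the coarse version shows how the bandwidth controls whether the two blobs appear as separate peaks or blur together.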
The density information can also be represented with contour lines of equal density. The main aim is to save the high-density locations in both a spatial data format and a raster format, and then to extract the polygons that will hold the aggregated points.
# Density information via contour plots
contour(density(sp_obj, adjust = 0.15), nlevels = 4)
# adjust scales the smoothing bandwidth: larger values give a smoother surface, smaller values show finer detail
Figures: Equal Value Contours; Contour Lines
From the above contour plots, we can extract polygons for the individual clusters, each holding the aggregated data points. These density polygons are extracted from the SpatialLinesDataFrame object (refer to the code in Cluster_aggregation.R for a fuller explanation).
do_sgdf <- as(dense_obj, "SpatialGridDataFrame") # density object to spatialGridDF conversion
im_sgdf <- as.image.SpatialGridDataFrame(do_sgdf) # image conversion
con_lines <- contourLines(im_sgdf, nlevels = 9) # contour creation
con_sldf <- ContourLines2SLDF(con_lines, CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")) # SpatialLinesDataFrame conversion
plot(con_sldf, col = terrain.colors(8))
Figure: Density Polygons
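The step from contour lines to polygons can be sketched without the spatial packages, using only contourLines from grDevices. The surface below is a made-up pair of Gaussian bumps standing in for the density image, and the 0.5 level is an arbitrary choice; the idea is simply that a contour whose first and last points coincide forms a closed ring that can be treated as a polygon.

```r
# Hypothetical density surface: two Gaussian bumps on a regular grid.
gx <- seq(-3, 3, length.out = 121)
gy <- seq(-3, 3, length.out = 121)
bump <- function(x, y, cx, cy, s) exp(-((x - cx)^2 + (y - cy)^2) / (2 * s^2))
z <- outer(gx, gy, function(x, y) bump(x, y, -1.2, 0, 0.5) + bump(x, y, 1.2, 0.3, 0.5))

# Contour lines at one chosen level; each element has $level, $x and $y.
lines_at <- contourLines(gx, gy, z, levels = 0.5)

# Keep only closed contours (first point equals last point).
is_closed <- function(cl) {
  n <- length(cl$x)
  isTRUE(all.equal(cl$x[1], cl$x[n])) && isTRUE(all.equal(cl$y[1], cl$y[n]))
}
rings <- Filter(is_closed, lines_at)

# Shoelace formula for the area enclosed by each ring (in grid units squared).
shoelace <- function(cl) {
  abs(sum(cl$x * c(cl$y[-1], cl$y[1]) - c(cl$x[-1], cl$x[1]) * cl$y)) / 2
}
areas <- vapply(rings, shoelace, numeric(1))
```

Raising the level shrinks the rings towards the peaks, while lowering it eventually merges them into one, which is the same trade-off as picking a contour index in con_sldf.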
Now aggregate the coordinates within each high-density zone, i.e. summarize the data within each polygon. These are the most densely distributed points. This dataset can be used to create further sub-clusters if the charging station service capacity or time constraints are not met.
Since the cluster points are densely packed, only the leftover points are shown in the next graph. These coordinates can be handled separately by creating separate charging stations for them.
To check the efficiency of this process, we can compute the percentage of coordinates that fall inside the clusters and have therefore been removed: count(coord_in_cluster)/count(coord_overall) = 52.497%
So we have removed around 52.5% of the coordinates, and the apClusterK algorithm can now be run on the leftover coordinates, and also on the two clusters if any sub-clusters are to be made. Roughly half of the data is already clustered, and we have to find a proper distribution for the remaining half to arrive at the optimal number of charging stations.
multi_cluster <- gPolygonize(con_sldf[5,])
garea_mc <- gArea(multi_cluster, byid = T)/10000
multi_cluster <- SpatialPolygonsDataFrame(multi_cluster, data = data.frame(garea_mc), match.ID = F)
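# Assumption: ba_aggr is not defined in this snippet. One plausible definition
# aggregates the points over the cluster polygons (check Cluster_aggregation.R):
# ba_aggr <- aggregate(ba_start_spdf, by = multi_cluster, FUN = length)
# Coordinates inside the high-density clusters
coord_in <- ba_start_spdf[ba_aggr, ]
# Coordinates outside the clusters
coord_out <- ba_start_spdf[!row.names(ba_start_spdf) %in% row.names(coord_in), ]
# Plot the sparse (low-density) coordinates, then overlay the cluster boundaries
plot(coord_out, pch=1, cex=0.5)
plot(ba_aggr, border = "red", lwd = 3, add = T)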
Figures: High Density Clusters zone; Data points in Low Density zones
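The inside/outside split and the percentage quoted above can be illustrated with a small base R sketch. It uses a hand-written ray-casting test rather than the sp subsetting shown in the post, and the two square "cluster" polygons and five sample points are hypothetical values picked so the result is easy to check by eye.

```r
# Ray-casting point-in-polygon test, vectorised over the points.
# (vx, vy) are the polygon vertices in order; the ring is closed implicitly.
# Points lying exactly on an edge are not handled specially.
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- rep(FALSE, length(px))
  j <- n
  for (i in seq_len(n)) {
    # Does a horizontal ray going right from the point cross edge (j, i)?
    crosses <- ((vy[i] > py) != (vy[j] > py)) &
      (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])
    inside <- xor(inside, crosses)  # an odd number of crossings means inside
    j <- i
  }
  inside
}

# Hypothetical cluster polygons (two unit squares) and sample points.
clusters <- list(
  list(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1)),
  list(x = c(2, 3, 3, 2), y = c(0, 0, 1, 1))
)
pts <- data.frame(x = c(0.5, 2.5, 1.5, -0.3, 2.2),
                  y = c(0.5, 0.5, 0.5, 0.2, 0.8))

# A point counts as clustered if it falls inside any cluster polygon.
in_cluster <- Reduce(`|`, lapply(clusters, function(p) {
  point_in_polygon(pts$x, pts$y, p$x, p$y)
}))

share_removed <- 100 * sum(in_cluster) / nrow(pts)  # percentage inside clusters
leftover <- pts[!in_cluster, ]                      # points still to be clustered
```

Here three of the five points fall inside a square, so share_removed plays the role of the 52.497% figure and leftover corresponds to coord_out.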
This brings us to the end of the blog series on ‘Locating Charging Stations for Electric Vehicles’. I hope you all liked it.
Please feel free to leave comments/suggestions in the comment box below!
Cheers!! Long live data!!
– Palash Goyal