# Locating Charging Stations for Electric Vehicles (Spatial Clustering) – Part 3

This post follows on from the previous posts in this series.

Refer to Post 1 and Post 2 for an overall understanding of the approach.

**Useful links:**

Dataset used: http://archive.ics.uci.edu/ml/datasets/Taxi+Service+Trajectory+-+Prediction+Challenge%2C+ECML+PKDD+2015

Code files repository: https://github.com/palashgoyal1/Taxi_Service_Trajectory

Part 3 of this blog series covers the *Implementation of the Improvements* mentioned in the previous post, along with an awesome set of *References* 🙂

Below are the links to the previous sections of this series.

Contents

- Data Pre-processing
- Algorithm Selection, Advantages and Challenges
- Improvements in the Selected Algorithm
- Further scope of improvement

(To help the reader follow along, I have included code snippets where needed and shared links to the relevant files in the GitHub repository. If anything is missing, feel free to comment below and I will add the relevant snippet to the post.)

### Improvements implementation

The clusters on which sub-clusters should (or should not) be defined can be selected by making a **heatmap** of the present clusters. We can also apply hierarchical clustering over the present clusters.

This approach has been implemented by aggregating the coordinates so as to reduce the dataset size.
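The exact aggregation used in the repository is not shown in this post. As a minimal sketch of the idea, assuming that snapping coordinates to a grid by rounding is acceptable, duplicate locations can be collapsed into one row with a count. The data below is synthetic, and the column names and the choice of 3 decimals are assumptions made for illustration only.

```r
# Synthetic stand-in for trip start points (hypothetical data, not the taxi dataset)
set.seed(42)
pts <- data.frame(
  lon = c(rnorm(500, -8.61, 0.005), rnorm(500, -8.58, 0.005)),
  lat = c(rnorm(500, 41.15, 0.005), rnorm(500, 41.17, 0.005))
)

# Snap each coordinate to a grid by rounding
# (3 decimal places is roughly 100 m; an assumed choice)
pts$lon_r <- round(pts$lon, 3)
pts$lat_r <- round(pts$lat, 3)

# One row per occupied grid cell, with the number of trips that started there
pts$n <- 1L
agg <- aggregate(n ~ lon_r + lat_r, data = pts, FUN = sum)

nrow(pts)   # rows before aggregation
nrow(agg)   # rows after aggregation (fewer)
sum(agg$n)  # total trips is preserved
```

Keeping the count column means later density estimates can still weight each aggregated coordinate by the number of trips it represents.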

Below is an example using the file 'B_A_start.csv', which holds the starting points of all trips in the category CALL_TYPE='B' and DAY_TYPE='A'. The file has 134,582 unique starting coordinates.

(I have done this for one file type, the starting points for the 'B_A_start' combination; the same process can be replicated for the data points of the other types as well.)

```r
# Plotting the above coordinates
ba_start_spdf <- SpatialPointsDataFrame(coords = ba_start__, data = ba_start_, proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
plot(ba_start_spdf, pch=1, cex=0.5)
```

*Figures: Start Coordinates; Point Density Plot*

__Density Plot__

To identify the clusters from the above distribution of the cabs' starting coordinates, we convert the points to a continuous field of point density (see the point density plot above).

```r
# Convert points to ppp class
sp_obj <- as(SpatialPoints(ba_start_spdf), "ppp")
dense_obj <- density(sp_obj, adjust = 0.2) # create density object of class 'im' (Image)
plot(dense_obj)
```

The density plot shows that there are mainly two clusters for the above distribution of the coordinates.
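To make the idea of a "continuous field of point density" concrete, here is a minimal base R sketch of a Gaussian kernel density evaluated on a regular grid. It is not the spatstat implementation used above; the helper names `kde_grid` and `dens_at`, the synthetic points and the bandwidth `h = 0.3` are all assumptions for illustration.

```r
# Synthetic points forming two groups (hypothetical data, not the taxi dataset)
set.seed(1)
x <- c(rnorm(300, mean = 0, sd = 0.5), rnorm(300, mean = 4, sd = 0.5))
y <- c(rnorm(300, mean = 0, sd = 0.5), rnorm(300, mean = 3, sd = 0.5))

# Gaussian kernel density on a regular grid; h is the bandwidth (assumed value)
kde_grid <- function(x, y, h, n_grid = 60) {
  gx <- seq(min(x), max(x), length.out = n_grid)
  gy <- seq(min(y), max(y), length.out = n_grid)
  # Kernel weight of every point at every grid coordinate, per axis
  kx <- outer(gx, x, function(g, p) dnorm(g - p, sd = h))
  ky <- outer(gy, y, function(g, p) dnorm(g - p, sd = h))
  # Average of the product kernel over all points
  list(x = gx, y = gy, z = kx %*% t(ky) / length(x))
}

dens <- kde_grid(x, y, h = 0.3)

# Density at the grid node closest to a given location
dens_at <- function(d, px, py) {
  d$z[which.min(abs(d$x - px)), which.min(abs(d$y - py))]
}

dens_at(dens, 0, 0)    # near the first group
dens_at(dens, 4, 3)    # near the second group
dens_at(dens, 2, 1.5)  # the low-density gap between the groups
```

The two group centres come out as high-density regions separated by a near-empty gap, which is the same pattern the density plot above shows for the starting coordinates. A smaller bandwidth keeps more local detail, while a larger one smooths the surface.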

__Contours__

The density information can also be represented with contours of almost equal value. The main aim is to save the high-density locations in a spatial data format and a raster format, and then to extract the polygons, which will hold the aggregated points.

```r
# Density information via contour plots
contour(density(sp_obj, adjust = 0.15), nlevels = 4)
# 'adjust' scales the smoothing bandwidth: larger values give a smoother surface, smaller values reveal finer detail
```

*Figures: Equal Value Contours; Contour Lines*

From the above contour plots, we can extract the polygons of the multiple clusters, which will hold the aggregated data points. These are the density polygons, extracted from the SpatialLinesDataFrame object (refer to the code in __Cluster_aggregation.R__ for a fuller explanation).

```r
do_sgdf <- as(dense_obj, "SpatialGridDataFrame") # density object to spatialGridDF conversion
im_sgdf <- as.image.SpatialGridDataFrame(do_sgdf) # image conversion
con_lines <- contourLines(im_sgdf, nlevels = 9) # contour creation
con_sldf <- ContourLines2SLDF(con_lines, CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")) # SpatialLinesDataFrame conversion
plot(con_sldf, col = terrain.colors(8))
```

*Figure: Density Polygons*
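The step from contour lines to polygons can be illustrated with base R alone. The sketch below is only an approximation of the idea, not the code from the repository: it uses the built-in `volcano` matrix as a stand-in for the density image, keeps the contour lines that close on themselves, and computes the enclosed area of each ring with the shoelace formula. The helper name `ring_area` is made up for this example.

```r
# 'volcano' (a built-in elevation matrix) stands in for the density image here
z <- volcano
con_lines <- contourLines(x = seq_len(nrow(z)), y = seq_len(ncol(z)),
                          z = z, nlevels = 9)

# A contour line forms a closed ring when its first and last vertices coincide
is_closed <- vapply(con_lines, function(cl) {
  n <- length(cl$x)
  n > 3 && cl$x[1] == cl$x[n] && cl$y[1] == cl$y[n]
}, logical(1))
rings <- con_lines[is_closed]

# Shoelace formula: area enclosed by a closed ring (in grid-cell units)
ring_area <- function(cl) {
  n <- length(cl$x)
  abs(sum(cl$x[-n] * cl$y[-1] - cl$x[-1] * cl$y[-n])) / 2
}

areas <- vapply(rings, ring_area, numeric(1))
levels <- vapply(rings, function(cl) cl$level, numeric(1))
data.frame(level = levels, area = areas)
```

Open contour lines that run off the edge of the grid are dropped here, since they do not enclose a region on their own. Higher levels enclose smaller, denser cores, which is why a single contour level is chosen when defining the cluster polygons.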

Now aggregate the coordinates within each high-density zone, i.e. summarize the data within each polygon. These are the most densely distributed points. This dataset can be used to create further sub-clusters if the charging station service capacity or time constraints are not met.

Since the cluster points are densely distributed, only the leftover points are shown in the next graph. These coordinates can be dealt with separately by creating separate charging stations.

To check the efficiency of this process, we can compute the percentage of coordinates that belong to the clusters and have therefore been removed: count(coord_in_cluster)/count(coord_overall) = 52.497%

So we have removed around 52.5% of the coordinates, and the **apClusterK** algorithm can now be run on the leftover coordinates, and also on the two clusters if any sub-clusters are to be made. This shows that roughly half of the data is already clustered, and we have to find a proper distribution of the remaining half to arrive at the optimal number of charging stations.

```r
multi_cluster <- gPolygonize(con_sldf[5,])
garea_mc <- gArea(multi_cluster, byid = TRUE)/10000
multi_cluster <- SpatialPolygonsDataFrame(multi_cluster, data = data.frame(garea_mc), match.ID = FALSE)
plot(multi_cluster)

...
...

# Within and outside coordinates
coord_in <- ba_start_spdf[ba_aggr, ] # Points inside the clusters
coord_out <- ba_start_spdf[!row.names(ba_start_spdf) %in% row.names(coord_in), ] # Points outside the clusters
plot(coord_out, pch=1, cex=0.5) # Sparse distribution of coordinates (or low density)
plot(ba_aggr, border = "red", lwd = 3, add = TRUE)

nrow(coord_in)/nrow(ba_start_spdf) # 0.52497
```

*Figures: High Density Clusters zone; Data points in Low Density zones*
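The inside/outside split and the coverage ratio can also be sketched in base R. The example below is a simplified stand-in rather than the code from the repository: it uses a ray-casting point-in-polygon test on synthetic points and a hand-made square polygon, and the names `point_in_poly`, `poly_x` and `poly_y` are hypothetical.

```r
# Ray-casting point-in-polygon test, vectorised over the points.
# px, py are the polygon vertices in order, without repeating the first one.
point_in_poly <- function(x, y, px, py) {
  n <- length(px)
  inside <- rep(FALSE, length(x))
  j <- n
  for (i in seq_len(n)) {
    # Does a horizontal ray from the point cross the edge (j -> i)?
    crosses <- ((py[i] > y) != (py[j] > y)) &
      (x < (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i])
    inside <- xor(inside, crosses)
    j <- i
  }
  inside
}

# Synthetic points and a square "high-density" polygon (both hypothetical)
set.seed(7)
pts <- data.frame(x = runif(2000, 0, 4), y = runif(2000, 0, 4))
poly_x <- c(1, 3, 3, 1)
poly_y <- c(1, 1, 3, 3)

in_cluster <- point_in_poly(pts$x, pts$y, poly_x, poly_y)
coord_in  <- pts[in_cluster, ]   # points inside the polygon
coord_out <- pts[!in_cluster, ]  # leftover points outside the polygon

nrow(coord_in) / nrow(pts)  # share of points covered by the polygon
```

The leftover set `coord_out` is the part that would then be passed on to the clustering step, while the points in `coord_in` are already accounted for by the high-density zone.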

### References

##### Papers

##### Summaries

https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/OverviewCoordinateReferenceSystems.pdf

##### Seminars

http://blog.danielemaasit.com/2016/02/15/webinar-getting-started-with-spatial-data-analysis-with-r/

**Packages:**

- APCluster

https://cran.r-project.org/web/packages/apcluster/vignettes/apcluster.pdf

http://www.bioinf.jku.at/software/apcluster/APCluster-Webinar.pdf

- spatstat

https://cran.r-project.org/web/packages/spatstat/spatstat.pdf

https://cran.r-project.org/web/packages/spatstat/vignettes/getstart.pdf

- SPODT

https://www.jstatsoft.org/article/view/v063i16/v63i16.pdf

http://horizon.documentation.ird.fr/exl-doc/pleins_textes/divers16-07/010067259.pdf

- sp

https://cran.r-project.org/web/packages/sp/vignettes/intro_sp.pdf

- rgdal

https://cran.r-project.org/web/packages/rgdal/index.html

https://cran.r-project.org/web/packages/rgdal/vignettes/OGR_shape_encoding.pdf

- raster

https://cran.r-project.org/web/packages/raster/index.html

https://cran.r-project.org/web/packages/raster/vignettes/Raster.pdf

- rgeos

https://cran.r-project.org/web/packages/rgeos/index.html

- maptools

https://cran.r-project.org/web/packages/maptools/index.html

https://cran.r-project.org/web/packages/maptools/vignettes/combine_maptools.pdf

This brings us to the end of the blog series on 'Locating Charging Stations for Electric Vehicles'. I hope you all liked it.

Please feel free to leave comments/suggestions in the comment box below!

Cheers!! Long live data!!

– Palash Goyal
