GeoCommons
Mapping Minnesota's Accident Hot-spots
Exploring the Dynamics of Traffic Accidents in Minnesota
Spatial Data Scientist: Luke Zaruba
Degree Program: Master of Geographic Information Science
Abstract
This project details the development and implementation of a near real-time geospatial data pipeline for fatal and serious traffic accidents across the State of Minnesota. Data is scraped and preprocessed by a serverless, automated pipeline in the cloud, and spatial and temporal analytics are then run on the resulting data to provide insights into crash hotspots and clusters across both space and time. The novelty of this work lies in three areas: the creation of an open, near real-time geospatial web application for viewing serious traffic crashes in Minnesota; an investigation into integrating generative artificial intelligence (AI) into the geocoding process; and the use of simple spatial analyses to study the stability of point clustering footprints over time. The project yields several insights into the historical patterns of car crashes across the State, specifically concerning: 1) the need for advances in transportation engineering and increased enforcement and emergency services in areas of high concern; and 2) the association between the presence of road construction projects and changes in accident trends.
Fig. 1. Data pipeline for near real-time traffic accident prediction.
Data and Pipeline
The data used in this project comes from the Minnesota State Patrol Crash Updates site, which contains records, stretching back to 2017, of all serious and fatal crashes within the State for which the Minnesota State Patrol is the primary reporting law enforcement agency. These records represent fewer than 5% of all crashes within the State each year. The data is scraped from the web, transformed, and loaded into a relational database. The records contain no explicit geospatial attributes; an unstructured location description is the sole source of information about where an incident occurred. This description is fed into a geocoder to create a point for each crash. Quality assurance and quality control (QA/QC) processes are implemented to ensure that the data is adequate for downstream use and analysis. The near real-time pipeline composed of these steps is deployed in the cloud and executes autonomously (Fig. 1).
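One QA/QC step that a pipeline like this could include is a sanity check on geocoder output. The following is a minimal sketch only: the bounding-box values are approximate, the record layout and the function names `passes_bbox_check` and `split_by_quality` are hypothetical, and none of it is taken from the project's actual implementation.

```python
# Approximate bounding box for Minnesota in WGS84 degrees.
# Assumed values, rounded outward; not taken from the project.
MN_BBOX = {"min_lon": -97.3, "max_lon": -89.4, "min_lat": 43.4, "max_lat": 49.5}


def passes_bbox_check(lon, lat, bbox=MN_BBOX):
    """Return True if a geocoded point has coordinates inside the box."""
    if lon is None or lat is None:
        return False  # the geocoder returned no match
    return (bbox["min_lon"] <= lon <= bbox["max_lon"]
            and bbox["min_lat"] <= lat <= bbox["max_lat"])


def split_by_quality(records):
    """Separate records into (accepted, flagged) lists for later review."""
    accepted, flagged = [], []
    for rec in records:
        ok = passes_bbox_check(rec.get("lon"), rec.get("lat"))
        (accepted if ok else flagged).append(rec)
    return accepted, flagged


# Hypothetical geocoder output; ids and coordinates are illustrative only.
sample = [
    {"id": 1, "lon": -93.265, "lat": 44.978},  # Minneapolis area
    {"id": 2, "lon": -87.630, "lat": 41.878},  # well outside the box
    {"id": 3, "lon": None, "lat": None},       # failed geocode
]
accepted, flagged = split_by_quality(sample)
```

A bounding box is a coarse filter: it catches failed or wildly misplaced geocodes, but a point just across the state border would still pass, so a point-in-polygon test against the state boundary would be stricter.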
Methods
The focus of this project was to compare competing methods and algorithms, to understand whether they tell consistent stories about the underlying dynamics of crashes across the State. First, the Local Indicators of Spatial Association (LISA) method was used to identify spatial clusters and outliers of accidents, aggregated by the Cities, Townships, and Unorganized Territories that make up Minnesota (Fig. 2). Second, the Approximate DBSCAN (A-DBSCAN) method was used to identify robust spatial clusters directly from the point patterns (Fig. 3). Third, to understand the temporal robustness and stability of the clusters created by A-DBSCAN, the algorithm was run on yearly subsets of the data, and the resulting cluster footprints for each run were overlaid to show which areas are consistently included within clusters across different years (Fig. 4). Finally, ad hoc analyses were conducted on the global time series, as well as on the individual time series of different A-DBSCAN clusters, to understand trends in the various signals (Fig. 5). Specifically, offline change point detection algorithms were used to identify discontinuities within the time series signals.
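To illustrate the idea behind offline change point detection, the sketch below finds the single split of a series that minimizes the within-segment squared error, which corresponds to one shift in the mean. This is a simple stand-in chosen for illustration; the project does not specify which algorithms it used, and the function name `best_mean_shift` and the sample counts are hypothetical.

```python
def best_mean_shift(series):
    """Return (k, cost) for the split series[:k] / series[k:] with lowest cost.

    The cost is the summed squared deviation of each segment from its own
    mean, so the best split marks the most likely single shift in the mean.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")

    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best_k, best_cost = None, float("inf")
    for k in range(1, n):  # both segments must be non-empty
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


# Illustrative monthly crash counts for one cluster (made-up numbers).
monthly_counts = [4, 5, 3, 4, 5, 4, 9, 10, 8, 9, 11, 10]
change_index, cost = best_mean_shift(monthly_counts)
```

For the made-up series above, the lowest-cost split falls at index 6, where the counts jump from around 4 to around 10. Detecting several change points, or deciding whether any change is present at all, requires a penalty term or a significance test that this sketch leaves out.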
Fig. 2. LISA accident clusters.
Fig. 3. A-DBSCAN Convex Hulls.
Fig. 4. Temporal Overlay A-DBSCAN.
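The temporal overlay can be thought of as counting, for each location, the number of years in which it falls inside a cluster footprint. The sketch below simplifies the footprints to sets of grid cells rather than polygons, which is an assumption made to keep the example dependency-free; the names `overlay_counts` and `stable_cells` and the sample footprints are hypothetical.

```python
from collections import Counter


def overlay_counts(footprints_by_year):
    """Count, per grid cell, the number of years it lies within a footprint."""
    counts = Counter()
    for cells in footprints_by_year.values():
        counts.update(set(cells))  # each cell counts at most once per year
    return counts


def stable_cells(footprints_by_year, min_years):
    """Return the cells covered by a footprint in at least min_years years."""
    counts = overlay_counts(footprints_by_year)
    return {cell for cell, n in counts.items() if n >= min_years}


# Made-up yearly footprints, expressed as (column, row) grid cells.
footprints = {
    2021: {(10, 4), (10, 5), (11, 5)},
    2022: {(10, 5), (11, 5), (12, 5)},
    2023: {(10, 5), (11, 5), (30, 2)},
}
core = stable_cells(footprints, min_years=3)
```

Cells that appear in every year form the stable core of a cluster, while cells that appear only once are candidates for transient or noise-driven clustering.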
Results
The methods used each offer a different perspective on the underlying data, but all tell a consistent story of where accidents are occurring. Based on the analyses conducted in this project, there are around 7 to 15 distinct accident corridors across the State. Qualitative evidence in the form of news articles also supports these findings. Lastly, the news articles and the time series analysis together suggest that road construction may be a driver of accident frequency; however, more thorough work is needed to investigate this relationship.
Next Steps
With a large amount of data and analytical output produced by the data pipeline, an API and web application were created to enable both technical and non-technical users to interact with the data for various purposes. One future goal is to expand the capabilities of these applications. In addition, with an easy-to-use data source established, applying other analytical methods becomes more attainable. Network-based analysis, spatiotemporal prediction, and agent-based modeling are all potential avenues for further exploration.
Fig. 5. Time Series of Clusters with Major Construction Projects.
Project Advising
Dr. Bryan Runck, Associate Director, GeoCommons
Award
This project was awarded first place at the graduate student competition held by the Minnesota GIS/LIS Consortium at their 2023 annual conference.