Geospatial Data in Python – Interactive Visualization



Geospatial data is data about objects, events, or phenomena that have a location on the surface of the earth. Geospatial data combines location information (usually coordinates on the earth), attribute information (the characteristics of the object, event, or phenomena concerned).
Kristin Stock, Hans Guesgen, in Automating Open Source Intelligence, 2016

Geospatial data is the core component of Spatial Data Science (SDS), which is a subset of Data Science. Location, distance and spatial interactions are the core aspects of SDS, and they are treated with specialized methods and software to analyze, visualize and learn from spatial data.

In this tutorial, you’ll learn how to work with geospatial data and visualize it on an interactive Leaflet map using Python and the Folium library.

Folium is a powerful library that combines the strength of Python in data processing with the strength of Leaflet.js in mapping. Its ease of use allows you to create interactive maps and populate them with data in just a few lines of code.

To start, let’s get the tools ready!

I’m using the latest versions at the time of writing.

Python: 3.9
Folium: 0.12.1
JupyterLab: 3.2.5

You can use Google Colab or Kaggle Kernels, where Folium is already installed for you. If you’re using JupyterLab locally, you can install Folium with the following command:

pip install folium

In this workshop, we’re going to work with hospital locations in the US, using the dataset from the HIFLD open data portal, which is released under a public domain license.

  • Source: HIFLD | Download from here or here.
  • Content: Hospitals in USA. (Locations and other information).
  • Records: 7596.
  • Last update: 8th Dec 2020.
  • License: Public Domain.

The source data is available in a variety of formats (PDFs, tables, web pages, etc.), and it includes a wide variety of useful features. For the purpose of this workshop, we’ll use the .csv format and focus on only a few columns:

ADDRESS: string
STATE: string
TYPE: string
STATUS: string ("OPEN" or "CLOSED")
POPULATION: integer
LATITUDE: decimal
LONGITUDE: decimal

The (LATITUDE, LONGITUDE) pairs are used to place the locations on the map, other columns like STATE, TYPE and STATUS are used for filtering, and ADDRESS and POPULATION are used as metadata for customizing the markers on the map. (If you’re practicing with the code in this lab on your own, you can include more columns if you want to apply further customizations to the map.)

Now, let’s start coding!

Let’s first define some useful constants like the list containing our targeted column names WORKING_COLS, the file path FILE_PATH, and the state name we’re working on STATE.

FILE_PATH = "__PATH__TO__CSV__FILE__"
WORKING_COLS = ["ADDRESS", "STATE", "TYPE", "STATUS", "POPULATION", "LATITUDE", "LONGITUDE"]
STATE = "CA"

Then, load the data and keep only the columns listed above for the given state.

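import pandas as pd

# Load the full CSV, then keep only the rows for STATE and the working columns.
hosp_df = pd.read_csv(FILE_PATH)
hosp_df = hosp_df.loc[hosp_df["STATE"] == STATE, WORKING_COLS]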

Here’s what the data looks like after loading:

Screenshot 2021-12-25 213559.jpg
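As a variation on the loading step, pandas can also be told to parse only the working columns while reading. The sketch below is a minimal illustration that assumes a tiny in-memory CSV in place of the real file; the two rows, the extra NAME column and the names sample_csv, sample_df and ca_df are made up for the example.

```python
import io

import pandas as pd

WORKING_COLS = ["ADDRESS", "STATE", "TYPE", "STATUS", "POPULATION", "LATITUDE", "LONGITUDE"]

# Hypothetical stand-in for the real CSV file: two made-up rows and one extra column.
sample_csv = io.StringIO(
    "NAME,ADDRESS,STATE,TYPE,STATUS,POPULATION,LATITUDE,LONGITUDE\n"
    "Hospital A,1 Example St,CA,GENERAL ACUTE CARE,OPEN,120,34.05,-118.25\n"
    "Hospital B,2 Sample Ave,NY,CRITICAL ACCESS,CLOSED,-999,40.71,-74.00\n"
)

# usecols makes pandas parse only the listed columns, so NAME is never loaded.
sample_df = pd.read_csv(sample_csv, usecols=WORKING_COLS)

# A boolean mask keeps the rows for a single state.
ca_df = sample_df[sample_df["STATE"] == "CA"]
print(ca_df)
```

Reading only the needed columns can reduce memory use on wide files, while the result is the same as loading everything and selecting columns afterwards.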

Before we start using the data, let’s explore it and see if we need to clean it or to apply some preprocessing.

Missing values

First, we can check whether there are any missing values (NaN). Here, hosp_df.isna().sum().sum() returns 0, which means there are no missing values.
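When the total is not zero, the same check can be broken down per column to see where the gaps are. A minimal sketch on a small made-up dataframe (demo_df and its values are illustrative and not taken from the dataset):

```python
import pandas as pd

# Made-up rows with two deliberately missing values.
demo_df = pd.DataFrame({
    "ADDRESS": ["1 Example St", None, "3 Test Rd"],
    "POPULATION": [120, 45, None],
})

# isna() marks missing cells; the first sum() counts them per column.
missing_per_column = demo_df.isna().sum()

# Summing the per-column counts gives the total used in the text.
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print(total_missing)
```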

Numeric values

To check the consistency of numeric values, some useful statistics on numerical features like POPULATION, LATITUDE and LONGITUDE can be computed using the dataframe’s describe method hosp_df.describe():

Screenshot 2021-12-25 220158.jpg

It’s noticeable that the POPULATION column has some negative values, since min = -999. A population cannot be negative, so this must be fixed, either by setting those values to 0 or by dropping the rows that contain them. Here we drop them:

hosp_df = hosp_df[hosp_df["POPULATION"] >= 0]
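For the other option mentioned above, setting negative values to 0 instead of dropping the rows, one possible sketch uses the pandas clip method. The dataframe below is made up for illustration.

```python
import pandas as pd

# Made-up values; -999 mimics the negative entries seen in the dataset.
demo_df = pd.DataFrame({"POPULATION": [120, -999, 45]})

# Work on a copy so the original values stay available for comparison.
adjusted_df = demo_df.copy()

# clip(lower=0) raises every value below 0 to 0 and leaves the rest unchanged.
adjusted_df["POPULATION"] = adjusted_df["POPULATION"].clip(lower=0)

print(adjusted_df)
```

This keeps every hospital on the map, at the cost of treating unknown populations as zero, so the choice between the two approaches depends on what the map is meant to show.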

Categorical values

The STATUS column contains two unique values, "CLOSED" and "OPEN". This can be verified with hosp_df["STATUS"].unique().

Folium provides several useful features to plot and customize interactive maps. We’ll start by plotting a basic map, then customize it with our data and more advanced features.

Plotting a basic map with Folium

Folium provides the class folium.Map, whose location parameter takes a [latitude, longitude] pair and generates a map centered on that location. To center the map on our data automatically, we can pass the mean latitude and longitude of the data:

import folium

# Center the map on the mean coordinates of the data.
m = folium.Map(
    location=[hosp_df["LATITUDE"].mean(), hosp_df["LONGITUDE"].mean()],
    zoom_start=6)
m

The generated map is interactive: you can zoom in and out using the buttons in the top-left corner or the mouse wheel.

Screenshot 2021-12-27 230500.jpg

Adding tiles to a map

The default tileset in Folium is OpenStreetMap. For different representations, we can add layers with different tiles such as Stamen Terrain, Stamen Watercolor, CartoDB Positron, and more. Each tileset highlights different features of a map.

It’s possible to add multiple tile layers to a single map in Folium using the class folium.TileLayer, and switch between them interactively using a layer control panel folium.LayerControl.

m=folium.Map(
    location=[hosp_df["LATITUDE"].mean(), hosp_df["LONGITUDE"].mean()],
    zoom_start=6)
folium.TileLayer('cartodbdark_matter').add_to(m)
folium.TileLayer('cartodbpositron').add_to(m)
folium.TileLayer('Stamen Terrain').add_to(m)
folium.TileLayer('Stamen Toner').add_to(m)
folium.TileLayer('Stamen Water Color').add_to(m)
folium.LayerControl().add_to(m)
m

The folium.LayerControl provides an icon on the top-right corner to pop up a radio group for switching between different layers.

Screenshot 2021-12-28 003545.jpg

Adding markers to a map

Markers are important in an interactive map for pointing out a location. Folium provides the folium.Marker class to create a marker at a given location that can then be added to a map.

Basic Markers

It’s possible to plot all the points in our data by calling the dataframe’s apply method with a lambda function that creates a marker for each row and adds it to the map.

m=folium.Map(
    location=[hosp_df["LATITUDE"].mean(), hosp_df["LONGITUDE"].mean()],
    zoom_start=8)

hosp_df.apply(
    lambda row: folium.Marker(
        location=[row['LATITUDE'], row['LONGITUDE']]
        ).add_to(m),
    axis=1)
m

Screenshot 2021-12-29 013233.jpg

Customized markers

To customize the markers, more parameters can be passed to folium.Marker:

  • popup: folium.Popup or str (can be html) content to be displayed when clicking on a marker.
  • tooltip: folium.Tooltip or str (can be html) content to be displayed when hovering on a marker.
  • icon: folium.CustomIcon, folium.Icon or folium.DivIcon – the Icon plugin to use to render the marker.

The popup and tooltip content can be either plain text or an HTML block. The marker’s appearance can be changed by passing an instance of folium.CustomIcon, folium.Icon or folium.DivIcon to the icon parameter. The folium.Icon class takes an icon name, the icon provider’s prefix ("glyphicon", which is the default, or "fa") and a color name or code. The list of glyphicons can be found here.

m=folium.Map(
    location=[hosp_df['LATITUDE'].mean(), hosp_df['LONGITUDE'].mean()],
    zoom_start=8)

def get_icon(status):
  if status == "OPEN":
    return folium.Icon(icon='heart',
                       color='black',
                       icon_color='#2ecc71'
                       )
  else:
    return folium.Icon(icon='glyphicon-off',
                       color='red')

hosp_df.apply(
    lambda row: folium.Marker(
        location=[row['LATITUDE'], row['LONGITUDE']],
        popup=row['ADDRESS'],
        tooltip='<h5>Click here for more info</h5>',
        icon=get_icon(row['STATUS']),
        ).add_to(m),
    axis=1)
m

Screenshot 2021-12-29 013653.jpg
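Because popup accepts an HTML string, several columns can be combined into one popup. The helper below is a hypothetical sketch (build_popup_html is not part of Folium) that builds such a string from an address and a population value and escapes the text so that characters like & or < display correctly.

```python
from html import escape


def build_popup_html(address, population):
    """Return a small HTML snippet suitable for a marker popup."""
    # escape() converts characters such as & and < into HTML entities.
    safe_address = escape(str(address))
    return (
        f"<b>Address:</b> {safe_address}<br>"
        f"<b>Population:</b> {int(population)}"
    )


popup_html = build_popup_html("1 Example St & Main", 120)
print(popup_html)
```

The resulting string could then be passed as popup=build_popup_html(row['ADDRESS'], row['POPULATION']) in the marker code above.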

Bubble map

To represent numeric values on a map, we can plot circles of different sizes by binding each circle’s radius to a value in the dataset. In our case, we represent the population covered by each hospital with a circle whose radius is proportional to its POPULATION value.

The folium.CircleMarker class takes a radius parameter, in addition to several inherited parameters that customize its appearance.

  • radius: number – Radius of the circle marker, in pixels. (default: 10)
  • stroke: boolean – Whether to draw a stroke along the path or not. (default: True)
  • color: str – Stroke color.
  • weight: number – The width of the stroke in pixels. (default: 3)
  • opacity: number – Stroke opacity from 0 to 1.0. (default: 1.0)
  • fill: boolean – Whether to fill the path with color. (default: False in Folium, unless fill_color is set)
  • fill_color: str – Fill color. Defaults to the value of the color parameter.
  • fill_opacity: number – Fill opacity. (default: 0.2)

Note: for simplicity, I’m just dividing the population values by 20. A more reliable approach would be to map the population values onto a specific radius range.

m=folium.Map(
    location=[hosp_df['LATITUDE'].mean(), hosp_df['LONGITUDE'].mean()],
    zoom_start=8)

def get_radius(pop):
  return int(pop / 20)

hosp_df.apply(
    lambda row: folium.CircleMarker(
        location=[row['LATITUDE'], row['LONGITUDE']],
        radius=get_radius(row['POPULATION']),
        popup=row['ADDRESS'],
        tooltip='<h5>Click here for more info</h5>',
        stroke=True,
        weight=1,
        color="#3186cc",
        fill=True,
        fill_color="#3186cc",
        opacity=0.9,
        fill_opacity=0.3,
        ).add_to(m),
    axis=1)
m

Screenshot 2021-12-30 192721.jpg
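The note above about mapping population values onto a radius range can be sketched with a simple linear (min-max) scaling. The function name scale_radius and the default range of 3 to 30 pixels are assumptions chosen for illustration.

```python
def scale_radius(value, v_min, v_max, r_min=3.0, r_max=30.0):
    """Linearly map value from [v_min, v_max] onto [r_min, r_max]."""
    # If all values are equal there is nothing to scale, so use the midpoint.
    if v_max == v_min:
        return (r_min + r_max) / 2
    fraction = (value - v_min) / (v_max - v_min)
    # Clamp so that values outside the input range stay within the radius range.
    fraction = min(max(fraction, 0.0), 1.0)
    return r_min + fraction * (r_max - r_min)


# Made-up population values standing in for hosp_df["POPULATION"].
populations = [0, 50, 100]
radii = [scale_radius(p, min(populations), max(populations)) for p in populations]
print(radii)
```

With the real data, v_min and v_max would be hosp_df["POPULATION"].min() and hosp_df["POPULATION"].max(), and scale_radius would replace get_radius in the CircleMarker call. This keeps every circle within a predictable size regardless of how large the population values are.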

Marker Clusters

When working with a dense map, it can be useful to use marker clusters to avoid the clutter caused by many nearby markers overlapping each other.

Folium provides an easy way to set up marker clusters, so instead of adding markers directly to the map, they are added to a folium.plugins.MarkerCluster instance which is then added to the map.

from folium.plugins import MarkerCluster

m = folium.Map(
    location=[hosp_df['LATITUDE'].mean(), hosp_df['LONGITUDE'].mean()],
    zoom_start=8)

cluster = MarkerCluster(name="Hospitals")

def get_icon(status):
  if status == "OPEN":
    return folium.Icon(icon='heart',
                       color='black',
                       icon_color='#2ecc71'
                       )
  else:
    return folium.Icon(icon='glyphicon-off',
                       color='red')

hosp_df.apply(
    lambda row: folium.Marker(
        location=[row['LATITUDE'], row['LONGITUDE']],
        popup=row['ADDRESS'],
        tooltip='<h5>Click here for more info</h5>',
        icon=get_icon(row['STATUS']),
        ).add_to(cluster),
    axis=1)
cluster.add_to(m)
m

marker_cluster.png

When hovering over a cluster, the map shows the bounds of the area covered by that cluster. This default behaviour can be disabled by setting the showCoverageOnHover option to false as follows:

cluster = MarkerCluster(name="Hospitals", options={"showCoverageOnHover": False})

Folium provides many more options for visualizing your geospatial data on interactive maps, so it’s worth reading the documentation and getting hands-on experience for your future projects.

You can find the jupyter notebook here to reproduce the results in this tutorial.


