An interactive time-lapse of the Wealth & Health of nations using Streamlit and Docker

Introduction

In this article we are going to recreate and analyze the famous world health chart by Hans Rosling and Gapminder using Streamlit. We will also deploy it to the cloud using Docker so we can share it with anyone.

The code is available on GitHub, and the finished chart can be viewed on Streamlit Cloud or as a standalone HTML chart.

The Process

To deploy the Dashboard we will perform the following steps.

Figure: Streamlit app process overview.

1. Download the data from Gapminder

To collect the data we will use Gapminder's Systema Globalis GitHub repo, which aims to compile all public statistics (social, economic and environmental) into a single comparable dataset.

### path -> /functions.py
import pandas as pd

# Download data
# https://www.gapminder.org/data/
# https://github.com/open-numbers/ddf--gapminder--systema_globalis

# Total Population of world countries with projection between 1800-2100.
total_population_url = "https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis/master/countries-etc-datapoints/ddf--datapoints--population_total--by--geo--time.csv"

# Life Expectancy in years of world countries with projections between 1800-2100.
# Life Expectancy is defined as : The average number of years a newborn child would live if current mortality patterns were to stay the same
life_expectancy_years_url = "https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis/master/countries-etc-datapoints/ddf--datapoints--life_expectancy_years--by--geo--time.csv"

# Income per person (GDP/capita - inflation adjusted - PPP$ based on 2017 ICP) of world countries with projections between 1800-2050.
income_per_person_url = "https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis/master/countries-etc-datapoints/ddf--datapoints--income_per_person_gdppercapita_ppp_inflation_adjusted--by--geo--time.csv"

# Details for each world country.
geo_countries_url = "https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis/master/ddf--entities--geo--country.csv"


def download_data():
    total_population = pd.read_csv(total_population_url)
    life_expectancy_years = pd.read_csv(life_expectancy_years_url)
    income_per_person = pd.read_csv(income_per_person_url)
    geo_countries = pd.read_csv(geo_countries_url)
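
Each datapoints file is a long-format CSV: one row per country code (`geo`) and year (`time`), plus a single indicator column. The merge step relies on those two keys. Below is a minimal sketch of that layout, using a few made-up rows instead of the real download (the values are placeholders, not Gapminder figures):

```python
import io

import pandas as pd

# Made-up rows mimicking the layout of a Systema Globalis datapoints file.
# The numbers are placeholders for illustration, not real Gapminder values.
sample_csv = io.StringIO(
    "geo,time,population_total\n"
    "aaa,1800,1000\n"
    "aaa,1801,1010\n"
    "bbb,1800,5000\n"
)

sample = pd.read_csv(sample_csv)

# One row per (geo, time) pair, with the indicator in its own column.
print(sample.columns.tolist())
print(sample.groupby("geo")["time"].agg(["min", "max"]))
```

Running the same two `print` calls on one of the real dataframes is a quick way to check the year range each file covers before merging.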

2. Transform data

To create our final dataset we will have to: i) keep the years 1800 to 2022 (data after 2021 are estimates), ii) handle missing and null values, and iii) merge the different datasets into one.

### path -> /functions.py

    # Transform data
    # Create initial dataframes to handle missing values
    countries = pd.DataFrame(life_expectancy_years.geo.unique(), columns=['geo'])
    years = pd.DataFrame(list(range(1800, 2023)), columns=['time'])
    init = countries.merge(years, how='cross')

    # Other transformations
    geo_countries['world_6region'] = geo_countries['world_6region'] \
        .astype(str) \
        .apply(lambda x: x.replace("_", " ").title())
    geo_countries['world_4region'] = geo_countries['world_4region'] \
        .astype(str) \
        .apply(lambda x: x.replace("_", " ").title())

    # Merge dataframe
    merged_data = init.merge(life_expectancy_years, how='left', on=["geo", "time"])
    merged_data = merged_data.merge(income_per_person, how='left', on=["geo", "time"])
    merged_data = merged_data.merge(total_population, how='left', on=["geo", "time"])
    merged_data = merged_data.merge(geo_countries[['country', 'income_groups', 'name',
                                                   'world_4region', 'world_6region']],
                                    how='left', left_on=["geo"], right_on=["country"])
    merged_data.drop(['country', 'geo'], axis=1, inplace=True)

    # Rename columns
    merged_data.rename(
        columns={'name': 'country',
                 'time': 'year',
                 'population_total': 'population',
                 'income_per_person_gdppercapita_ppp_inflation_adjusted': 'gdpPercap',
                 'life_expectancy_years': 'lifeExp',
                 'world_4region': 'continent',
                 'world_6region': 'region'},
        inplace=True)

    merged_data = merged_data.sort_values(by=['year'], ascending=True)

    # Drop countries with null values
    countries_with_missing_data = merged_data[(merged_data['lifeExp'].isna()) |
                                              (merged_data['gdpPercap'].isna()) |
                                              (merged_data['population'].isna())]['country'].unique()

    merged_data = merged_data[~merged_data.country.isin(countries_with_missing_data)]
    return merged_data
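
To see why the scaffold of every country and year matters, here is a small sketch of the same idea on toy data. The column name `income` and all values are made up for illustration. Because the scaffold contains every (country, year) pair, a gap in any source shows up as an explicit NaN after the left merge, and the whole country can then be dropped:

```python
import pandas as pd

# Toy inputs (made-up values): country "bbb" has no income figure for 1801.
life = pd.DataFrame({
    "geo": ["aaa", "aaa", "bbb", "bbb"],
    "time": [1800, 1801, 1800, 1801],
    "life_expectancy_years": [30.0, 31.0, 35.0, 36.0],
})
income = pd.DataFrame({
    "geo": ["aaa", "aaa", "bbb"],
    "time": [1800, 1801, 1800],
    "income": [900, 910, 1200],
})

# Scaffold with every (country, year) combination, so gaps become explicit NaNs.
countries = pd.DataFrame(life.geo.unique(), columns=["geo"])
years = pd.DataFrame([1800, 1801], columns=["time"])
scaffold = countries.merge(years, how="cross")

merged = (
    scaffold
    .merge(life, how="left", on=["geo", "time"])
    .merge(income, how="left", on=["geo", "time"])
)

# Drop every country that has at least one missing value in any year.
incomplete = merged.loc[merged.isna().any(axis=1), "geo"].unique()
complete = merged[~merged.geo.isin(incomplete)]

print(complete)
```

Dropping whole countries rather than single rows keeps each remaining bubble present in every animation frame, at the cost of losing countries with partial histories.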

3. Create Streamlit App

Next we are going to create a Streamlit app. We will use an animated scatter plot from the Plotly Express library to visualize the time-lapse.

### path -> /main.py

import pandas as pd
import plotly.express as px
import streamlit as st
from os.path import exists
from functions import download_data

# Get data

if exists("data.csv"):
    df = pd.read_csv("data.csv")
else:
    df = download_data()
    # Skip the index so reloading the file does not add an "Unnamed: 0" column
    df.to_csv("data.csv", index=False)

st.set_page_config(layout="wide")

# Upper bound for the population slider

max_population = int(df.population.max())

# Add filters to page

pcol = st.columns(5, gap="large")
with pcol[0]:
    continent = st.multiselect('Continent', df.continent.unique())
with pcol[1]:
    country = st.multiselect('Country', df.country.unique())
with pcol[2]:
    years = st.slider('Year Span', 1800, 2022, (1800, 2022), step=10)
with pcol[3]:
    population = st.slider('Population', 1000, max_population, (10000, max_population), step=100000)
with pcol[4]:
    income = st.multiselect('Current Income', df.income_groups.unique())

# Apply filters to dataframe

df = df[(df.year >= years[0]) & (df.year <= years[1])]
df = df[(df.population >= population[0]) & (df.population <= population[1])]
if continent:
    df = df[df.continent.isin(continent)]
if country:
    df = df[df.country.isin(country)]
if income:
    df = df[df.income_groups.isin(income)]

# Plot

fig = px.scatter(df, x='gdpPercap', y='lifeExp', color='region', size='population', size_max=130,
                 hover_name='country', log_x=True, animation_frame='year',
                 animation_group='country', range_x=[300, 100000], range_y=[0, 115],
                 labels=dict(population="Population", gdpPercap="Income per person (GDP/capita, PPP$ inflation-adjusted)", lifeExp="Life Expectancy (Years)"))

fig.update_layout({
    'autosize': False,
    'width': 1200,
    'height': 500,
    'paper_bgcolor': 'rgba(0, 0, 0, 0)',
    'plot_bgcolor': 'rgba(0, 0, 0, 0)',
    'xaxis': dict(showgrid=False),
    'yaxis': dict(showgrid=False),
    'legend': dict(
        title=None,
        yanchor="bottom",
        xanchor="right",
        y=0.1,
        x=1
    )
})

fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 30
fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 5

# Export to html
# fig.write_html("data.html")

st.plotly_chart(fig, use_container_width=True)
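
The filtering logic can also be pulled out of the Streamlit script so it can be checked without running the app. The sketch below is one possible way to do that; `apply_filters` is a hypothetical helper that is not part of the original app, and the toy values are made up. It builds a single boolean mask, and an empty selection means "no filter", mirroring how an empty multiselect behaves in the script:

```python
import pandas as pd


def apply_filters(df, years, population, continent=(), country=(), income=()):
    """Return the rows of df matching the selected ranges and categories.

    `years` and `population` are (low, high) tuples, as returned by a range
    slider. An empty category selection leaves that column unfiltered.
    """
    mask = df.year.between(years[0], years[1])
    mask &= df.population.between(population[0], population[1])
    if continent:
        mask &= df.continent.isin(continent)
    if country:
        mask &= df.country.isin(country)
    if income:
        mask &= df.income_groups.isin(income)
    return df[mask]


# Toy data (made-up values) with the same column names as the dashboard dataset.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "continent": ["Europe", "Europe", "Asia", "Asia"],
    "income_groups": ["high_income", "high_income", "low_income", "low_income"],
    "year": [1800, 1900, 1800, 1900],
    "population": [1_000_000, 2_000_000, 50_000, 80_000],
})

selected = apply_filters(
    toy, years=(1850, 2022), population=(10_000, 5_000_000), continent=["Asia"]
)
print(selected)
```

In the app, the slider and multiselect return values could be passed straight into such a helper.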

4. Dockerize it

As a next step we will package everything into a Docker image using a Dockerfile. A Dockerfile contains the instructions that describe how to build our Docker image.

### path -> /app.Dockerfile

# The FROM command tells Docker to create a base layer.
# In this case, we have used the Python v3.10 image available in Docker Hub
# as the base image on which our application will be built
FROM python:3.10

# COPY duplicates the local requirements.txt file (which contains the package names and versions
# required by the app) into the Docker image's 'app' folder
COPY requirements.txt app/requirements.txt

# WORKDIR sets the Working Directory within the Docker image.
# In this case, it is creating an 'app' folder where all files will be stored
WORKDIR /app

# The RUN command specifies what Command Line Interface (CLI) commands to run while building the Docker image.
# In this Dockerfile, it installs all packages and dependencies listed in the requirements.txt file
RUN pip install --no-cache-dir --upgrade -r requirements.txt

# COPY command is duplicating all local files into the Docker image’s ‘app’ folder
COPY . /app

# EXPOSE documents the port on which our application listens inside the container.
# Streamlit serves on port 8501 by default
EXPOSE 8501

# CMD is used to specify the commands to be run when the Docker container is started
CMD streamlit run main.py

Then we will create a docker-compose.yml file that defines how to run the Docker image as a container.

### path -> /docker-compose.yml

version: "3.7"

services:
  streamlit-wealth-and-health:
    build:
      context: .
      dockerfile: app.Dockerfile
    ports:
      - "8501:8501"
    volumes:
      - .:/app
    networks:
      - stnet

networks:
  stnet:

5. Deploy it to custom server

First we need to install Docker and Docker Compose on the server (see the official Docker installation instructions). Then we will run the following command from the project root:

### path -> /

docker-compose up -d

Our dashboard will be available at http://localhost:8501 (or replace localhost with the server's external IPv4 address).
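
A quick way to confirm that the container is actually serving the dashboard is to request the page and check the HTTP status. The sketch below uses only the standard library; `dashboard_is_up` is a hypothetical helper name, and the default URL assumes the port mapping from the docker-compose.yml above:

```python
import urllib.error
import urllib.request


def dashboard_is_up(url="http://localhost:8501", timeout=5):
    """Return True if an HTTP GET on `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False
```

Calling `dashboard_is_up()` shortly after `docker-compose up -d` may return False while the container is still starting, so it can be worth retrying after a few seconds.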

6. Deploy it to Streamlit cloud

Just sign up to Streamlit Cloud, connect your GitHub account and make sure to select the correct organization from the top-right drop-down menu (1). Then click on the new app button (2) and enter the GitHub URL of the Streamlit Python file (in this case https://github.com/justdataplease/wealth-and-health-of-nations/blob/master/main.py). Streamlit will take it from here and manage the deployment process. After about 2 minutes you will be able to visit your dashboard at the link provided.

Figure: Hosting the app on Streamlit Cloud, showing the organization drop-down menu (1) and the new app button (2).

Analysis

Now, using our dashboard and with the help of Gapminder's article, we will try to briefly answer the question: how have the wealth and health of nations changed over the centuries?

A country can be either rich and sick (bottom right), rich and healthy (top right), poor and sick (bottom left) or poor and healthy (top left).

Through the centuries, income and health appear to be correlated: countries tend to move along a diagonal from the bottom left to the top right. People in richer countries live longer (top right), while people in poorer countries live shorter lives (bottom left). There appear to be no low-income countries with a long life expectancy (top left) or high-income countries with a short life expectancy (bottom right).
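
This visual impression can be checked numerically. The sketch below computes the Pearson correlation between the logarithm of income and life expectancy for a single year, using the column names from the dashboard dataset. `income_health_correlation` is a hypothetical helper, and the toy rows are synthetic values chosen only to exercise the function, not real figures. Income is log-transformed because the chart also uses a logarithmic x-axis:

```python
import numpy as np
import pandas as pd


def income_health_correlation(df, year):
    """Pearson correlation between log10(income) and life expectancy in `year`."""
    frame = df[df.year == year]
    return np.log10(frame.gdpPercap).corr(frame.lifeExp)


# Synthetic rows (not real data): life expectancy rises linearly with log income.
toy = pd.DataFrame({
    "year": [2000, 2000, 2000],
    "gdpPercap": [1_000, 10_000, 100_000],
    "lifeExp": [40.0, 60.0, 80.0],
})

print(income_health_correlation(toy, 2000))
```

With the real dataframe, calling the function for a few different years gives a rough sense of how the relationship has held up over time.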

In 1800 all countries were poor and sick, located in the bottom left corner, with life expectancy below 40 years and income per capita below 5k. At that time the best-performing countries were the Netherlands and the UK. During the Industrial Revolution (1760-1840), mainly European countries began to pull away from the rest by becoming richer and healthier, while colonized countries in Asia and Africa remained stuck. After 1890 Western countries continued their upward trend until WW1 (1914-1918), when we can see a big drop. After that, and in spite of the Great Depression, Western countries kept improving, except during WW2 (1939-1945).

African and some Asian countries were static until 1950, but that started to change as former colonies gained independence and, over time, became wealthier and healthier. Around 1980 a few economies in Asia and Latin America started to emerge, and around 1990 they caught up with Western countries. In contrast, several countries in Sub-Saharan Africa could not follow because of civil wars and diseases such as HIV. Between 1990 and 2000 we can also observe a surge in population (the bubbles grow large), which is to be expected given the high levels of life expectancy.

Today the most prosperous country is Luxembourg, while the least prosperous is Congo. Fortunately, most countries are in the middle, but the differences between countries are still noticeable. There is also a huge difference in life expectancy between countries at the same income level, depending on how the money is distributed.

It is also interesting to isolate a specific country and examine its history. For instance, we can notice a drop in Russia's wealth and health during the Russian Revolution (1917), in Germany's around the end of World War II (1945) and in China's during the Great Chinese Famine (1959-1961).
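
Such drops can also be located programmatically instead of by scrubbing through the animation. The sketch below is one way to do it, assuming the column names of the dashboard dataset. `largest_drops` is a hypothetical helper, and the toy series is made up to exercise it:

```python
import pandas as pd


def largest_drops(df, country, column="lifeExp", n=3):
    """Return the n years with the steepest year-over-year fall in `column`."""
    history = df[df.country == country].sort_values("year").set_index("year")
    change = history[column].diff()
    return change.nsmallest(n)


# Made-up series for a fictional country, only to show the mechanics.
toy = pd.DataFrame({
    "country": ["X"] * 7,
    "year": [1940, 1941, 1942, 1943, 1944, 1945, 1946],
    "lifeExp": [60.0, 61.0, 55.0, 50.0, 52.0, 58.0, 62.0],
})

drops = largest_drops(toy, "X", n=2)
print(drops)
```

Passing `column="gdpPercap"` would apply the same idea to income instead of life expectancy.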