Building a fully automated Medium stats pipeline to track my writing performance

Combining Prefect, Selenium, Pandas and Metabase to measure and manage my writing success on Medium

Motivation and Background

The project was mostly inspired by two talks from the conference, which helped me come up with the general idea and architecture:

1) Modern data stacks

This talk presented the most popular options for data ingestion, storage, transformation and visualization, as well as some examples of how Project A portfolio companies built their data stack. During the talk they also shared two excellent articles, which I can highly recommend:

  1. The startup data stack starter pack (2020)

2) From zero to hero — A marketing sophistication framework

In this talk the authors presented six marketing sophistication stages and the respective approaches to acquisition and engagement, analytics setup, tool infrastructure and organisation.

You can’t manage what you don’t measure. — Peter Drucker

Hence, I started this project to better measure and manage my Medium writing.

Building the Architecture

Since this project is one of the more complex ones, I used my favorite visualization tool draw.io to show you a high-level overview of the architecture that I came up with:

Target Architecture for a fully automated Medium stats pipeline

1. Scraping Medium Stats with Selenium

In order to get the stats out of Medium, I chose Selenium for these two reasons:

  1. A lot of rendered JavaScript and a GraphQL endpoint that I was not able to fetch with requests (see the scraping sketch below)

Showcasing of Medium Stats Scraper
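To illustrate the idea, here is a minimal sketch of what such a Selenium scraper could look like. The stats URL, the CSS selectors and the output path are assumptions made for this post; the real Medium stats page requires a logged-in session and its structure changes over time.

```python
# Minimal sketch of a Selenium-based Medium stats scraper.
# URL, selectors and output path are illustrative assumptions only.
import csv
from selenium import webdriver
from selenium.webdriver.common.by import By

def scrape_stats(output_path="data/raw/medium_stats.csv"):
    driver = webdriver.Chrome()  # assumes a local chromedriver is available
    try:
        # Assumed stats URL; in practice this needs an authenticated session
        driver.get("https://medium.com/me/stats")
        rows = driver.find_elements(By.CSS_SELECTOR, "table tr")  # hypothetical selector
        with open(output_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["post", "views", "reads", "fans"])
            for row in rows:
                cells = [cell.text for cell in row.find_elements(By.TAG_NAME, "td")]
                if cells:
                    writer.writerow(cells)
    finally:
        driver.quit()

if __name__ == "__main__":
    scrape_stats()
```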

2. Setting up the Extract, Transform, Load Pipeline with Pandas

With the scraped data, I used pandas to set up a very simple ETL pipeline that extracts the data from all CSVs, transforms it and loads it into a SQLite database. With my learnings from the Udacity Data Engineering Nanodegree, I set up a star schema model with one dimension table for posts and two fact tables for post stats and external views (a minimal code sketch follows the schema figure below):

Medium Star Schema Model
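As a rough illustration of this step, the sketch below reads all scraped CSVs with pandas and writes a dimension table and one of the fact tables into SQLite. The file locations, column names and table names are assumptions and will differ from the actual project.

```python
# Minimal ETL sketch: extract scraped CSVs, transform into star schema tables,
# and load them into a local SQLite database. Names and paths are assumptions.
import glob
import sqlite3
import pandas as pd

def run_etl(csv_dir="data/raw", db_path="medium_stats.db"):
    # Extract: read every scraped CSV into one DataFrame
    frames = [pd.read_csv(path) for path in glob.glob(f"{csv_dir}/*.csv")]
    stats = pd.concat(frames, ignore_index=True)

    # Transform: split into a dimension table and a fact table
    dim_posts = stats[["post_id", "title", "published_at"]].drop_duplicates("post_id")
    fact_post_stats = stats[["post_id", "scraped_at", "views", "reads", "fans", "earnings"]]

    # Load: write both tables into SQLite
    with sqlite3.connect(db_path) as conn:
        dim_posts.to_sql("dim_posts", conn, if_exists="replace", index=False)
        fact_post_stats.to_sql("fact_post_stats", conn, if_exists="append", index=False)

if __name__ == "__main__":
    run_etl()
```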

3. Orchestrating workflows with Prefect

To make this project really work, the orchestration and scheduling of the scraping and ETL jobs were key. While I learned some of the basics of Airflow during my Udacity Nanodegree, I found Airflow difficult and annoying to set up and get running, which is why I opted for Prefect here (a minimal flow sketch follows the screenshots below).

Example run of my Medium stats scraper workflow.
Flow run schematic of the scraping job.
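To give an idea of what the orchestration code can look like, here is a minimal sketch using Prefect's @task and @flow decorators (the Prefect 2.x API). The imported scraper and ETL functions refer to the hypothetical sketches above, not to the actual project modules.

```python
# Minimal Prefect 2.x sketch chaining the scraping and ETL steps.
from prefect import flow, task

# Hypothetical imports: the functions sketched in the earlier snippets.
from medium_scraper import scrape_stats
from medium_etl import run_etl

@task(retries=2, retry_delay_seconds=60)
def scrape_task():
    scrape_stats()  # run the Selenium scraper and write fresh CSVs

@task
def etl_task():
    run_etl()  # load the fresh CSVs into SQLite

@flow(name="medium-stats-pipeline")
def medium_stats_flow():
    scrape_task()
    etl_task()

if __name__ == "__main__":
    # A daily schedule could be attached via a deployment; running ad hoc here.
    medium_stats_flow()
```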

4. Visualization of Results

For the visualization of results I chose Metabase, an open-source BI tool I work with every day at N26. The tool is not only free, but it also lets you set up some very nice-looking dashboards with filters in a very short time:

Main Dashboard View

1. Total Stats

This section helps me to see the “bigger picture” of my writing and shows me, at a high level, how my main key performance indicators develop every day. I particularly love to see my total earnings (not a millionaire yet) as well as how much time Medium members spent reading my blog posts.

Medium Main KPIs

2. Daily KPI Series Changes

In this section I wanted to better understand how KPIs shift from day to day. With the sorted stacked bar chart, I can easily see how different KPIs perform day by day (the biggest ones are sorted at the bottom). As I collect more and more data, I also plan to include some weekly aggregations in the future.

Time Series Stats on my Blog Post Performance
Before clicking on the Title.
After clicking on the Title.

Outlook and some even more exciting ideas

This dashboard is the first part of a much bigger automation plan. A key chart that I wanted to share is the external views chart:

Total Medium Blog Post Views by External Viewing Source

A smart Twitter Bot

For the next big automation step, I plan to scrape tweets about relevant topics that I have written Medium blog posts on before. For example, if someone asks for advice on a data science portfolio project, I want to automatically share my blog post A step-by-step guide for creating an authentic data science portfolio project via Twitter.
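As a rough sketch of how such a bot could work, the snippet below uses the Tweepy v2 client to search recent tweets for a portfolio-project question and reply with a link. The credentials, the search query and the blog post URL are placeholders; this is only an idea sketch, not the final bot.

```python
# Idea sketch for the planned Twitter bot, assuming the Tweepy v2 Client API.
# Credentials, query terms and the blog post URL are placeholders.
import os
import tweepy

BLOG_POST_URL = "https://medium.com/..."  # link to the portfolio-project article

client = tweepy.Client(
    bearer_token=os.environ["TWITTER_BEARER_TOKEN"],
    consumer_key=os.environ["TWITTER_API_KEY"],
    consumer_secret=os.environ["TWITTER_API_SECRET"],
    access_token=os.environ["TWITTER_ACCESS_TOKEN"],
    access_token_secret=os.environ["TWITTER_ACCESS_SECRET"],
)

def reply_with_blog_post():
    # Search for recent tweets asking about data science portfolio projects
    results = client.search_recent_tweets(
        query='"data science portfolio" project -is:retweet', max_results=10
    )
    for tweet in results.data or []:
        # Reply with a pointer to the relevant blog post
        client.create_tweet(
            text=f"This step-by-step guide might help: {BLOG_POST_URL}",
            in_reply_to_tweet_id=tweet.id,
        )

if __name__ == "__main__":
    reply_with_blog_post()
```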
