Explore your Medium.com articles with Knowledge Graphs and NLP


After writing this article, I developed an open-source app named Medium-Sky, which is available on GitHub. You can also check out a live demo, or the live demos of popular Medium.com personas such as Barack Obama, which are also available on GitHub.

As a Medium writer, I have always wanted an overview of my writing.

Initially, I thought it would be awesome to have a map of my articles with some statistics integrated into it. To bring this idea to life, I decided to use knowledge graphs. I used stars (⭐) to represent my main articles and planets (⚪) to represent external website domains. The lines between stars and planets represent direct references. In addition, the size of a star shows how popular the article is (determined by its number of voters), while the size of a planet reflects the number of unique references it receives from the main articles.

That was it: I had my map, which I decided to name Medium-Sky.

[Figure: Medium-Sky of @justdataplease]
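The star-and-planet mapping described above can be sketched in a few lines of Python. This is not Medium-Sky's actual code: the `Article` record, its `voters` and `links` fields and the `build_sky` function are hypothetical names, and "planet size" is assumed here to mean the number of distinct main articles that link to a domain.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Article:
    # Hypothetical minimal article record; field names are illustrative only.
    title: str
    voters: int
    links: list[str] = field(default_factory=list)


def build_sky(articles):
    """Return star sizes, planet sizes and the star-planet edges of the map."""
    stars = {}                      # article title -> star size (number of voters)
    planet_refs = defaultdict(set)  # domain -> titles of articles that link to it
    edges = set()                   # (article title, domain): a direct reference
    for article in articles:
        stars[article.title] = article.voters
        for url in article.links:
            domain = urlparse(url).netloc.removeprefix("www.")
            if domain:
                planet_refs[domain].add(article.title)
                edges.add((article.title, domain))
    # Assumed planet size: number of distinct main articles referencing the domain.
    planets = {domain: len(titles) for domain, titles in planet_refs.items()}
    return stars, planets, edges


# Illustrative input, not real Medium data.
articles = [
    Article("Knowledge graphs", voters=120,
            links=["https://github.com/a", "https://www.wikipedia.org/wiki/Graph"]),
    Article("NLP basics", voters=45, links=["https://github.com/b"]),
]
stars, planets, edges = build_sky(articles)
print(planets)  # {'github.com': 2, 'wikipedia.org': 1}
```

Sizes are kept as raw counts here; scaling them to drawing radii is left to whatever renders the graph.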

I also wanted to create a profile of myself as a writer and determine the subjects that interest me the most. To achieve this, I used basic NLP techniques as well as ChatGPT (inevitably, since it has invaded our lives) to extract keywords from my articles. By combining both sets of keywords, I was able to determine the most frequently used ones. Furthermore, to complete my profile, I calculated some metrics that describe my writing behavior, such as my preferred time of day and day of the week for publishing, publishing frequency, article length, external domains used, and part-of-speech (POS) usage.
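As a rough illustration of combining the two keyword sources, the sketch below ranks keywords by the number of articles they appear in, merging a naive word-frequency extractor with an optional per-article list of ChatGPT keywords. The stop-word list, the `nlp_keywords` and `combined_keywords` helpers, and the "count each keyword once per article" rule are all assumptions, not the Medium-Sky implementation.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on", "with", "my"}


def nlp_keywords(text, top_n=5):
    """Naive NLP keywords: the most frequent non-stop-word tokens of one article."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def combined_keywords(articles, top_n=3):
    """Rank keywords by how many articles mention them.

    `articles` is a list of (text, llm_keywords) pairs; llm_keywords holds the
    keywords returned by ChatGPT and may be empty when it is not used.
    Each keyword counts at most once per article.
    """
    totals = Counter()
    for text, llm_keywords in articles:
        keywords = set(nlp_keywords(text)) | {k.lower() for k in llm_keywords}
        totals.update(keywords)
    # Sort by count, then alphabetically, so ties are reported deterministically.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


articles = [
    ("Knowledge graphs and NLP for Medium articles", ["knowledge graphs", "NLP"]),
    ("NLP keywords with ChatGPT", ["ChatGPT", "NLP"]),
    ("Plotting Medium stats", []),
]
print(combined_keywords(articles, top_n=2))  # [('medium', 2), ('nlp', 2)]
```

Counting a keyword once per article keeps a single keyword-heavy article from dominating the profile.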

To keep this article short and interesting, I will not go into much detail about how each metric is calculated. If you are interested, you can check out the GitHub repository, where I have provided detailed documentation.

Generate your own Medium-Sky

To create your own HTML knowledge graph with Medium-Sky:

1) Clone the GitHub repository:

git clone https://github.com/justdataplease/medium-sky.git

2) To download Medium articles, you need to subscribe to the Medium.com API on RapidAPI. At the time of writing, the free plan includes 150 requests per month, so analyzing all of your articles is free as long as you have fewer than 148 articles.

3) [OPTIONAL] To calculate the ChatGPT metrics, you need an OpenAI account.

4) Copy .env_sample to .env, then paste in your X-RapidAPI-Key from your RapidAPI account and, [OPTIONAL], your OpenAI API key from your OpenAI account.

5) Install the requirements:

pip install -r requirements.txt

6) To analyze all your articles (without ChatGPT integration), run:

python kgraph -u=<username>

To analyze only your 10 most recent articles, run:

python kgraph -u=<username> -l=10

To also use ChatGPT for the summary and keyword metrics, run:

python kgraph -u=<username> -l=10 -ai

7) Finally, you can find the generated HTML file in the output folder:

<username>_m.html

Let's try it out

To test how Medium-Sky works, we’ll explore some Medium profiles (Last updated 2023-04-05). I chose some Medium users that I follow, and I also searched for some popular people on Medium or in real life.

We will analyze the latest 30 articles from these authors and draw conclusions based on them.

The following analysis is just a showcase of Medium-Sky based on article content, and it may or may not reflect reality. If you are one of the authors mentioned below and wish to have your Medium-Sky removed, please send me a message.

Barack Obama, Tony Stubblebine, Darius Foroux, Anne Bonfert, Benjamin Sledge, Cassie Kozyrkov, The PyCoach, Nikos Kafritsas, Dmytro Iakubovskyi, TDS Editors, Dagster Blog

Profiles are listed in random order.

@barackobama (200k)

[Figure: Medium-Sky of @barackobama]

The former President of the USA has a highly popular profile, with 200k followers and a good score of 12 claps per person.
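The claps-per-person score can be read as total claps divided by total voters (readers who clapped). The sketch below assumes that definition and aggregates over all analysed articles as a ratio of sums; `claps_per_voter` and the `claps`/`voters` keys are hypothetical. Since Medium lets a reader give up to 50 claps per story, the score ranges from 1 to 50, and a lower value means the claps come from more distinct readers, which matches how the scores are rated in this article.

```python
def claps_per_voter(articles):
    """Total claps divided by total voters across the analysed articles."""
    total_claps = sum(a["claps"] for a in articles)
    total_voters = sum(a["voters"] for a in articles)
    if total_voters == 0:
        return 0.0
    return round(total_claps / total_voters, 1)


# Illustrative numbers, not real statistics.
stats = [{"claps": 1200, "voters": 100}, {"claps": 300, "voters": 25}]
print(claps_per_voter(stats))  # 12.0
```

A ratio of sums weights popular articles more heavily than averaging each article's own ratio would.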

The profile appears to be active, as the author last published an article one month ago, with an average publishing frequency of 12 days.

The most popular article on this profile is a lighthearted piece about the author’s favorite books, movies, and music of 2022.

He rarely uses external domains in his articles, with a very low score of 0.7 references per article (exceptions include whitehouse.gov, obama.org, nytimes.com and theatlantic.com). This could imply that the author relies on his own expertise rather than on external authority.

The author writes lengthy articles, most of them published during working hours (between 12:00 and 19:00), and he does not publish in any Medium publications.

Finally, a keyword analysis of the author’s content shows that he frequently writes about democracy, elections, the pandemic, guns and safety, healthcare, and also about his wife, Michelle!

@coachtony (39k)

[Figure: Medium-Sky of @coachtony]

The current CEO of Medium.com has a popular profile, with nearly 40k followers and a moderate score of 22 claps per person.

This profile seems to be active, since the author last published 43 days ago, although he does not write very often: his average publishing frequency is 78 days (the 30th most recent article dates from 2017).
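The publishing frequency can be understood as the mean gap, in days, between consecutive articles in the analysed window. With 30 articles there are 29 gaps, so a window that reaches back several years can produce a long average even if recent posts are closer together. The sketch below assumes that definition; `publishing_stats` is a hypothetical helper.

```python
from datetime import date


def publishing_stats(publish_dates, today):
    """Return days since the latest article and the mean gap between articles."""
    if not publish_dates:
        return None, None
    dates = sorted(publish_dates)
    days_since_last = (today - dates[-1]).days
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    # The mean of consecutive gaps equals (last - first) / (n - 1).
    avg_gap = sum(gaps) / len(gaps) if gaps else None
    return days_since_last, avg_gap


dates = [date(2023, 1, 1), date(2023, 1, 13), date(2023, 1, 25), date(2023, 2, 6)]
print(publishing_stats(dates, today=date(2023, 3, 6)))  # (28, 12.0)
```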

He doesn’t use many external domains in his articles, with a score of 3.5 references per article (an average driven mainly by 2 articles). His favorite publication is Better Humans: he shows his support by mentioning other writers from it and promoting it in his articles with a link to subscribe.

The most popular article on this profile talks about productivity and explains how to optimize iPhone settings to create a distraction-free work environment.

The author writes normal-length articles, most of them published at the start or end of the workday, mainly on Wednesdays.

Finally, a keyword analysis of the author’s content shows that he frequently writes about productivity, creativity, inspiration, habits and of course Medium!

@dariusforoux (261k)

[Figure: Medium-Sky of @dariusforoux]

A famous book author with a very popular profile: 261k followers and a great score of 9.4 claps per person.

This profile seems to be highly active since the author last appeared 5 days ago, with an average publishing frequency of 2.6 days.

He doesn’t use many external domains in his articles, with a score of 2.6 references per article, and the one site he promotes in every article is his personal website. This could imply that the author relies on his own expertise rather than on external authority. Furthermore, he does not publish in any publications.

The most popular article on this profile emphasizes the importance of eliminating unnecessary tasks in life in order to become more productive.

The author writes some lengthy articles, mainly in the morning (08:00-12:00) and on Fridays or Wednesdays.
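The preferred publishing time and day boil down to counting timestamps. The sketch below assumes all timestamps are in one consistent timezone and groups hours into fixed three-hour blocks; the block size and the `preferred_publishing` name are assumptions, and the windows quoted in this article vary from author to author.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def preferred_publishing(timestamps, block_hours=3):
    """Return the busiest block of the day (e.g. '12:00-15:00') and the busiest weekday."""
    blocks = Counter(ts.hour // block_hours for ts in timestamps)
    days = Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)
    start = blocks.most_common(1)[0][0] * block_hours
    end = min(start + block_hours, 24)
    return f"{start:02d}:00-{end:02d}:00", days.most_common(1)[0][0]


published = [
    datetime(2023, 3, 1, 13, 5),   # Wednesday
    datetime(2023, 3, 8, 14, 30),  # Wednesday
    datetime(2023, 3, 10, 9, 0),   # Friday
]
print(preferred_publishing(published))  # ('12:00-15:00', 'Wednesday')
```

A fixed weekday list is used instead of `strftime("%A")` so the day names do not depend on the system locale.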

Finally, a keyword analysis of the author’s content shows that he frequently writes about writing strategies, financial success and time management.

@anne.bonfert (163k)

[Figure: Medium-Sky of @anne.bonfert]

A famous traveler and book author with a very popular profile: 163k followers and a bad score of 33 claps per person (or a good one, if it reflects many devoted fans).

This profile seems to be highly active since the author last appeared yesterday, with an average publishing frequency of 20 days.

She doesn’t use many external domains, except for her own resources, which are referenced in almost every article (a score of 4.5 references per article). She promotes her newsletter, her book on Amazon and her YouTube channel. Furthermore, her favorite publication is Globetrotters.

The most popular article on this profile talks about the writer’s journey of discovering her passion for writing, nature, travel, and photography.

The author writes normal-length articles, mainly in the afternoon and evening (15:00-21:00) and on Wednesdays.

Finally, a keyword analysis of the author’s content shows that she frequently writes about Africa, travelling, photography, wildlife and nature.

@benjaminsledge (44k)

[Figure: Medium-Sky of @benjaminsledge]

A famous war veteran and book author with a popular profile: 44k followers and a moderate score of 17 claps per person.

This profile seems to be active since the author last appeared 16 days ago, with an average publishing frequency of 16 days.

He uses several external domains, with a score of 6 references per article. His primary sources of reference include his book on Amazon, Wikipedia for event-related information, news sites for factual data, and religious content.
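A references-per-article score like this one can be computed by extracting the domain of every outbound link, dropping links back to Medium itself, and averaging the per-article counts. Whether Medium-Sky counts every link or only unique domains is not stated here, so the sketch below simply counts every external link; `reference_stats` and its `own_domain` filter are assumptions. Because it is a plain mean, a couple of link-heavy articles can raise the score noticeably, as noted for @coachtony.

```python
from collections import Counter
from urllib.parse import urlparse


def reference_stats(articles_links, own_domain="medium.com", top_n=3):
    """Mean number of external links per article and the most linked domains."""
    per_article = []
    domains = Counter()
    for links in articles_links:
        hosts = [urlparse(url).netloc.removeprefix("www.") for url in links]
        # Keep only external hosts: drop empty hosts and Medium (sub)domains.
        external = [h for h in hosts
                    if h and h != own_domain and not h.endswith("." + own_domain)]
        per_article.append(len(external))
        domains.update(external)
    mean = sum(per_article) / len(per_article) if per_article else 0.0
    return mean, domains.most_common(top_n)


# Illustrative link lists for three articles.
links = [
    ["https://en.wikipedia.org/wiki/Ukraine", "https://medium.com/@someone/post"],
    [],
    ["https://example.org/book", "https://en.wikipedia.org/wiki/Veteran"],
]
print(reference_stats(links))  # (1.0, [('en.wikipedia.org', 2), ('example.org', 1)])
```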

The most popular article on this profile discusses Russia’s invasion of Ukraine and explores the geopolitical factors and historical reasons behind it. Most of the time, he does not publish in any publication.

The author writes lengthy articles, mainly in the afternoon (12:00-18:00) and on Tuesdays.

Finally, a keyword analysis of the author’s content shows that he frequently writes about war, mental health and religion.

@kozyrkov (135k)

[Figure: Medium-Sky of @kozyrkov]

A famous scientist at Google with a very popular profile: 135k followers and a great score of 7 claps per person.

This profile seems to be active on a weekly basis, since the author last published 7 days ago, with an average publishing frequency of 7 days.

In her articles, she typically references around 6 external sources. However, she relies heavily on her own resources, such as her Substack blog, YouTube channel, LinkedIn and Twitter account, as the primary sources for her content. She also makes use of Google Cloud Platform and Wikipedia for her research.

The most popular article on this profile introduces ChatGPT as a controversial AI tool. Her favorite publication seems to be Towards Data Science.

The author writes lengthy articles, mainly in the afternoon (12:00-18:00) and on Wednesdays or Sundays.

Finally, a keyword analysis of the author’s content shows that she frequently writes about data, decision science, machine learning and statistics.

@frank-andrade (44k)

[Figure: Medium-Sky of @frank-andrade]

A famous Medium tech author, known as “The PyCoach”, with a popular profile: 44k followers and an impressive score of 6 claps per person.

This profile seems to be highly active since the author last appeared 3 days ago, with an average publishing frequency of 3 days.

His articles generally reference around 4 external sources, with his own previous Medium articles, personal blog, and YouTube channel being his primary sources of reference.

The most popular article on this profile discusses how to use ChatGPT effectively by using prompts to guide its behavior. His favorite publication is Artificial Corner, which he also owns.

The author typically writes both medium-length and lengthy articles, published during the early afternoon (12:00-15:00) on every day except Saturdays and Mondays.

Finally, a keyword analysis of the author’s content shows that he frequently writes about ChatGPT and Python.

@nikoskafritsas (2k)

[Figure: Medium-Sky of @nikoskafritsas]

A data science author with 2k followers and a moderate score of 17 claps per person.

This profile seems to be active since the author last appeared 12 days ago, with an average publishing frequency of 19 days.

He generally uses around 5 external sources in his articles and mentions his LinkedIn Profile, a few GitHub repositories, and Kaggle. It’s worth noting that he also mentions arxiv.org, which highlights his identity as a researcher.

The most popular article on this profile discusses the use of deep learning models for time series forecasting. His favorite publication seems to be Towards Data Science.

The author writes some lengthy or very lengthy articles, publishing them primarily on weekdays.

Finally, a keyword analysis of the author’s content shows that he frequently writes about deep learning and time series forecasting.

@dima806 (4.5k)

[Figure: Medium-Sky of @dima806]

A data science author with 4.5k followers and a moderate score of 19 claps per person.

This profile seems to be highly active since the author last appeared 5 days ago, with an average publishing frequency of 3 days.

He typically includes a large number of outside sources in his articles, 8 on average. He often mentions his LinkedIn profile and commonly refers to reputable science-related domains such as Kaggle, scikit-learn and CatBoost, which highlights his identity as a practical data scientist.

The most popular article on this profile discusses how a story can become viral and shares insights on Medium’s core functionality and algorithm. His favorite publication seems to be Data And Beyond.

The author writes normal-length articles, mainly in the evening, on Wednesdays or weekends.

According to a keyword analysis of the author’s content, he often discusses machine learning, SHAP values, and data preprocessing. Furthermore, he often uses movie ratings as the data for his analyses.

@towardsdatascience (56k)

[Figure: Medium-Sky of @towardsdatascience]

A very popular Data Science Publication with 57k followers and an impressive score of 7 claps per person.

This profile seems to be highly active since they last appeared 6 days ago, with an average publishing frequency of 4 days.

They don’t use external sources apart from referencing their own publication.

The most popular article on this profile delivers a message to TDS writers, emphasizing the significance of human authorship over AI-generated text.

They write normal-length articles, and the majority of them are published between 12:00 and 15:00 on Thursdays. Furthermore, their word uniqueness metric (29%) is noteworthy, indicating their proficiency as writers.
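The word uniqueness metric is presumably a type-token ratio: distinct words divided by total words. The sketch below assumes that definition and a simple lowercase tokenizer; `word_uniqueness` is a hypothetical helper, not Medium-Sky's code. This ratio tends to fall as texts get longer, so it is best compared across authors with a similar amount of text, such as the latest 30 articles used here.

```python
import re


def word_uniqueness(texts):
    """Percentage of distinct words among all words (type-token ratio)."""
    words = [w for text in texts for w in re.findall(r"[a-z']+", text.lower())]
    if not words:
        return 0.0
    return round(100 * len(set(words)) / len(words), 1)


print(word_uniqueness(["The cat sat on the mat"]))  # 83.3
```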

According to a keyword analysis of their content, they write about data science and machine learning.

@dagster-io (303)

[Figure: Medium-Sky of @dagster-io]

The publication of Dagster, our beloved open-source pipeline orchestrator, with 303 followers and an impressive score of 6 claps per person.

This profile seems to be active, since they last appeared 12 days ago, with an average publishing frequency of one article per month.

While they use external sources, they primarily promote their GitHub and website.

The most popular article on this profile discusses the use of PostgreSQL as a message queue in the Dagster Cloud system.

They write lengthy articles and do not appear to have a consistent publishing pattern with regard to day and time, except that they do not publish on weekends.

According to a keyword analysis of their content, they write about their tool and data pipelines.

Conclusion

In conclusion, we analyzed multiple Medium.com personas using various techniques, including knowledge graphs, basic natural language processing (NLP), and ChatGPT. By presenting a summary of their articles and extracting key metrics, we gained insights into their writing patterns and identified the topics they like to write about the most. Moreover, we demonstrated how to create a personalized Medium-Sky in HTML format. I plan to host mine on my blog. How about you?


If you liked my article and want to support me as a writer, please subscribe!