Today, I want to go over how I’ve built the data pipelines that deliver the insights you see on this website, and what sets them apart from other resources out there. I’ll cover all the major steps and explain why I chose each option, while trying not to bore you along the way.

First of all, where does the data come from?

For the Twitch-related data, there are two main sources. The streaming data comes from… Twitch! The rest, meaning the information about the games themselves, comes from IGDB. As for the data unrelated to Twitch, I’ll cover that in a future blog post.

But… how is it done?

Alright, here’s the important part. I’ll break down the steps below.

PYTHON

Using Python, I connect to the Twitch and IGDB APIs, authenticate, request the data, and later process it.
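To give a rough idea of what that involves, here’s a minimal sketch using only Python’s standard library. It’s an illustration, not the exact code behind this site. It assumes a registered Twitch application (client ID and secret), since IGDB authenticates with the same Twitch credentials. The function names, the IGDB fields selected, and the page size are illustrative choices; the endpoints and headers follow Twitch’s and IGDB’s public API documentation.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

TWITCH_TOKEN_URL = "https://id.twitch.tv/oauth2/token"
HELIX_STREAMS_URL = "https://api.twitch.tv/helix/streams"
IGDB_GAMES_URL = "https://api.igdb.com/v4/games"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """App access token via the OAuth client-credentials grant."""
    body = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
    }).encode()
    return urllib.request.Request(TWITCH_TOKEN_URL, data=body, method="POST")


def auth_headers(client_id: str, access_token: str) -> dict:
    """Helix and IGDB both expect the app's Client-Id plus a Bearer token."""
    return {"Client-Id": client_id, "Authorization": f"Bearer {access_token}"}


def build_streams_request(client_id: str, access_token: str,
                          first: int = 100, after: str | None = None) -> urllib.request.Request:
    """One page of live streams; `after` is the cursor from the previous page."""
    params = {"first": first}
    if after:
        params["after"] = after
    url = f"{HELIX_STREAMS_URL}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers=auth_headers(client_id, access_token))


def build_igdb_games_request(client_id: str, access_token: str,
                             game_ids: list[int]) -> urllib.request.Request:
    """IGDB takes an Apicalypse query as the POST body (ids here are IGDB ids)."""
    ids = ",".join(str(i) for i in game_ids)
    query = f"fields name,first_release_date,genres; where id = ({ids});"
    return urllib.request.Request(IGDB_GAMES_URL, data=query.encode(),
                                  headers=auth_headers(client_id, access_token),
                                  method="POST")


def send(request: urllib.request.Request):
    """Perform the HTTP call and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def summarize_streams(page: dict) -> tuple[list[dict], str | None]:
    """Keep only the fields needed downstream, plus the next-page cursor."""
    rows = [{"user_login": s["user_login"], "game_id": s["game_id"],
             "viewer_count": s["viewer_count"]} for s in page.get("data", [])]
    return rows, page.get("pagination", {}).get("cursor")
```

In practice you’d call `send(build_token_request(...))` once, reuse the returned `access_token` until it expires, and keep requesting stream pages with the returned cursor until no cursor comes back.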

GOOGLE CLOUD PLATFORM

I chose Google Cloud Platform (GCP) as the main suite for every service in this pipeline for practical reasons: I already use it for services beyond the ones mentioned here (besides GCP being awesome).

[For those who don’t know, Google offers credits so you can try out different options and see whether the platform fits your needs. By the way, those credits apply to all Google Cloud Platform services.]

VIRTUAL MACHINES

I have a couple of instances running, handling both data retrieval and processing. One instance has to be dedicated to retrieval so that data is collected 24/7; the rest handle a mix of retrieval and processing.
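Since the details are skipped here, the following is only a hypothetical sketch of what a dedicated 24/7 retrieval loop could look like: it fetches on a fixed cadence, hands each snapshot off for storage, and survives individual failures. The `fetch` and `handle` callables and the five-minute interval are assumptions, not the actual setup; `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
from __future__ import annotations

import logging
import time
from datetime import datetime, timezone
from typing import Any, Callable

log = logging.getLogger("collector")


def poll(fetch: Callable[[], Any],
         handle: Callable[[datetime, Any], None],
         interval_seconds: float = 300.0,      # hypothetical 5-minute cadence
         max_iterations: int | None = None,    # None means run forever
         sleep: Callable[[float], None] = time.sleep,
         clock: Callable[[], float] = time.monotonic) -> int:
    """Call `fetch` on a fixed cadence and pass each snapshot to `handle`.

    Returns the number of successful snapshots.
    """
    done = ok = 0
    while max_iterations is None or done < max_iterations:
        started = clock()
        try:
            snapshot = fetch()
            handle(datetime.now(timezone.utc), snapshot)
            ok += 1
        except Exception:
            # A failed poll (rate limit, network blip) must not kill a 24/7 collector.
            log.exception("poll failed; retrying on the next tick")
        done += 1
        # Subtract the time the poll itself took so the cadence stays steady.
        sleep(max(0.0, interval_seconds - (clock() - started)))
    return ok
```

On a VM, something like this would typically run under a process manager (for example, a systemd service) so it restarts if the process itself dies.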

There’s a lot to unpack here on the technical side, so I’ll skip the details so I don’t lose you! I might write a separate blog post on this in the future.

STORAGE

BigQuery. Some might argue that other options on the market are better, but for my use case, BigQuery serves me well.
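For illustration, here’s a minimal sketch of pushing polled stream data into BigQuery with the official `google-cloud-bigquery` client’s `insert_rows_json` streaming insert. The table ID, the column names, and the choice of streaming inserts over batch load jobs are assumptions for the example; the client is passed in so the functions can be exercised without a GCP project.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any, Iterable


def to_rows(snapshot_ts: datetime, streams: Iterable[dict]) -> list[dict]:
    """Flatten one polling snapshot into JSON-serializable rows.

    Column names are hypothetical; BigQuery accepts ISO 8601 strings
    for TIMESTAMP columns.
    """
    ts = snapshot_ts.astimezone(timezone.utc).isoformat()
    return [{"snapshot_ts": ts,
             "user_login": s["user_login"],
             "game_id": s["game_id"],
             "viewer_count": int(s["viewer_count"])}
            for s in streams]


def load_rows(client: Any, table_id: str, rows: list[dict]) -> int:
    """Stream rows into BigQuery; `client` is a google.cloud.bigquery.Client."""
    if not rows:
        return 0
    errors = client.insert_rows_json(table_id, rows)  # empty list on success
    if errors:
        raise RuntimeError(f"BigQuery rejected {len(errors)} row(s): {errors}")
    return len(rows)
```

Usage would look something like `load_rows(bigquery.Client(), "my-project.twitch.stream_snapshots", to_rows(ts, streams))`, where the table ID is hypothetical. For large, infrequent batches, a load job via `Client.load_table_from_json` is an alternative to streaming inserts.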

DATA VISUALIZATION

The last step is the visualization tool: Power BI. Power BI takes the already processed data stored in BigQuery and turns it into interactive, visually appealing dashboards. Besides the daily influx of streaming data, the dashboards also draw on several other datasets, such as games and brands datasets.

The data in the dashboards is updated at least once a day to ensure that you always have complete data from yesterday and earlier.
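As a rough sketch of the “complete data from yesterday and earlier” rule, the function below aggregates snapshots into a daily metric and drops the current, still-incomplete day. It assumes UTC day boundaries, per-snapshot per-game viewer totals as input, and peak concurrent viewers as the metric; all of these are illustrative choices, not necessarily what the dashboards use.

```python
from __future__ import annotations

from datetime import date, datetime, timezone


def daily_peak_viewers(rows: list[dict], now: datetime) -> dict[tuple[date, str], int]:
    """Peak concurrent viewers per (UTC day, game), for complete days only.

    `rows` are hypothetical per-snapshot, per-game totals:
    {"snapshot_ts": timezone-aware datetime, "game_id": str, "viewers": int}.
    """
    today = now.astimezone(timezone.utc).date()
    peaks: dict[tuple[date, str], int] = {}
    for r in rows:
        day = r["snapshot_ts"].astimezone(timezone.utc).date()
        if day >= today:
            continue  # today is still in progress, so leave it out
        key = (day, r["game_id"])
        peaks[key] = max(peaks.get(key, 0), r["viewers"])
    return peaks
```

Taking the peak keeps the example simple; an average would also need to account for snapshots in which a game had no live streams.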

I chose to present the data in Power BI dashboards, rather than static charts like those found elsewhere, because I wanted something you can interact with. The whole idea of GGs Analytics is to provide personalized insights, and Power BI allows for exactly that. You can filter entire dashboards using the available filters, or click directly on the charts to filter the whole dashboard on the fly and better understand the trends, all on the same page.

[For those who missed the first post, my focus is on Data Science. Even though I’m continuously working to make this website more appealing and functional, this is still a project with limited funds, so I have to put what few resources I have into the content itself. Once I’m happy with the content, I’ll focus on the packaging.]

Ongoing endeavour

Since I’m only showcasing part of the data I have, I’ll be adding more dashboards in the coming weeks and months. I’m totally open to feedback, so if there’s anything you’d like to see here, feel free to send me a message.

On top of that, I’ll be offering some other pretty neat features in the future (both dashboard-related and not) that might be exactly what you need.

Well, that’s all folks. Catch you in the next one!

Gus