Senior Data Engineer at Gojek
I have been working at Gojek as a Senior Data Engineer since September 2017, developing streaming and batch data pipelines that support business data analysis.
"ELT using DBT Python, so data engineers can build modular data pipelines with automatically generated data lineage and documentation."
*About Me*
I have spent two years at GO-JEK as a Senior Data Engineer. My main role is to build pipelines that ingest streaming and batch data and transform it so that analysts can consume it directly. I have more than three years of experience using Python for data analysis, data transformation, daily/hourly reporting, and web service APIs.
*Talk Experiences*
Cloud and Big Data Seminar - Jakarta (May 2019), Faculty of Economics, Universitas Indonesia
Google Cloud Next 2018 - San Francisco (July 2018), Google Cloud
Google Cloud Onboard 2018 - Jakarta (February 2018), Google Cloud
Python Conference 2018 - Jakarta (January 2018), PyCon Indonesia
# Data Engineer Pain Points
Maintaining data transformation pipelines as a data engineer comes with several pain points. Because data quality is our top priority, we need to make sure our data is 100% correct, valid, complete, and unique per key. On top of that quality requirement, we also want to deploy our daily jobs as efficiently as possible. Moreover, when data volumes are huge and the relationships between tables are complex, it is hard to trace the parent-child relationships between tables and the transformations that connect them. This is why we need data lineage.
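Two of the quality requirements above, "complete" and "unique per key", map onto the built-in `not_null` and `unique` tests that dbt provides. In dbt those tests run as SQL queries against the warehouse; the following is only a minimal plain-Python sketch of the same logic, assuming a table is held as a list of row dicts. The function names and sample data are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def not_null_failures(rows: List[Row], column: str) -> List[Row]:
    """Return rows where `column` is missing or None (a completeness check)."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows: List[Row], column: str) -> List[Any]:
    """Return key values that appear more than once (a uniqueness-per-key check)."""
    # Null keys are left to the not_null check, so they are skipped here.
    counts = Counter(row.get(column) for row in rows if row.get(column) is not None)
    return [value for value, count in counts.items() if count > 1]


def check_table(rows: List[Row], key: str) -> Dict[str, int]:
    """Count failures per check; zero everywhere means the table passes."""
    return {
        "not_null": len(not_null_failures(rows, key)),
        "unique": len(unique_failures(rows, key)),
    }


if __name__ == "__main__":
    # Hypothetical sample data: order_id should be present and unique.
    orders = [
        {"order_id": 1, "amount": 10_000},
        {"order_id": 2, "amount": 25_000},
        {"order_id": 2, "amount": 25_000},  # duplicate key
        {"order_id": None, "amount": 5_000},  # missing key
    ]
    print(check_table(orders, "order_id"))
```

Running the sample reports one failure for each check, which is the kind of signal that should stop a daily job before bad data reaches analysts.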
# Data Build Tool Python
Nowadays, many analytics consultants are looking for ways to build better data transformation workflows. One result is Data Build Tool (DBT), a tool written in Python. Because Python is easy to read for everyone from data engineers to data analysts, the tool gained traction and is now used by many companies around the world.
DBT is a Python-based command-line tool that lets data engineers write transformations in SQL. With DBT, data engineers own the entire data transformation workflow in a single repository: writing the transformation code, deploying it, checking data quality, and documenting data lineage.
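In dbt, a model selects from another model with `{{ ref('model_name') }}`, and dbt uses those references to build the dependency graph behind its run order and the lineage view in its generated documentation. The following is a minimal Python sketch of that idea, assuming models are held in a dict mapping model name to SQL text. The function names are hypothetical, and a regular expression stands in for dbt's real Jinja parsing (it only handles the single-argument form of `ref`).

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Matches {{ ref('model') }} or {{ ref("model") }} with optional whitespace.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"]([\w.]+)['"]\s*\)\s*\}\}""")


def parents_of(models: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each model to the set of models it selects from via ref()."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def children_of(parents: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert the parent map to answer 'which models are downstream of X?'."""
    children: Dict[str, Set[str]] = {name: set() for name in parents}
    for child, upstream in parents.items():
        for parent in upstream:
            children.setdefault(parent, set()).add(child)
    return children


def build_order(parents: Dict[str, Set[str]]) -> List[str]:
    """Order models so every parent is built before its children.

    TopologicalSorter takes a mapping of node -> predecessors and raises
    graphlib.CycleError if the models reference each other in a loop.
    """
    return list(TopologicalSorter(parents).static_order())


if __name__ == "__main__":
    # Hypothetical models: two staging tables feeding one fact table.
    models = {
        "stg_orders": "select * from raw.orders",
        "stg_customers": "select * from raw.customers",
        "fct_orders": (
            "select o.*, c.city from {{ ref('stg_orders') }} o "
            "join {{ ref('stg_customers') }} c using (customer_id)"
        ),
    }
    parents = parents_of(models)
    print(parents["fct_orders"])
    print(children_of(parents)["stg_orders"])
    print(build_order(parents))
```

Because lineage falls out of the `ref()` calls that engineers already write, the parent-child map stays in sync with the code instead of being maintained by hand.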
# Data Transformation in Gojek
We always run a *PoC* (proof of concept) with the latest technologies and concepts to find the best way to maintain our data transformation jobs. Based on that work, we think DBT Python is a better way to develop our data pipelines, with higher quality and better documentation.