"Data is the new oil. This talk will present ways to handle the 3 V's of Big Data using Python. The V's to cover are Velocity, Volume, and Variety."
An AI Engineer focused on building machine learning platforms by applying big data and big compute methods to deep learning architectures. He has 15 years of working experience and more than 20 online certificates, and recently completed the Deep Learning Nanodegree Program from Udacity. He currently works with TensorFlow, PyTorch, Spark, Kafka, and Kubernetes.
We will start by describing the 3 V's (Velocity, Volume, and Variety) of big data and explaining why they are important. Then, we will show how to handle them using Python, with demos, benchmark results, and pitfalls to avoid.
For Velocity, we will use Kafka. For Volume, we will use Spark and TensorFlow's data pipeline API (tf.data). Finally, for Variety, we will evaluate several serialization formats (such as ProtoBuf, NPY, Pickle, and HDF5) as well as file formats (such as RDD, JSONL, Parquet, and ORC).
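As a taste of the Variety comparison, here is a minimal sketch contrasting three of the listed formats (NPY, Pickle, and JSONL) on the same NumPy array. ProtoBuf and HDF5 are omitted because they need extra libraries (protobuf, h5py); the array contents and sizes here are illustrative, not the talk's benchmark results.

```python
import io
import json
import pickle

import numpy as np

# A small batch of feature vectors to serialize.
data = np.arange(12, dtype=np.float32).reshape(3, 4)

# NPY: NumPy's binary format; preserves dtype and shape exactly.
buf = io.BytesIO()
np.save(buf, data)
npy_bytes = buf.getvalue()

# Pickle: general-purpose Python serialization; compact but Python-only.
pkl_bytes = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# JSONL: one JSON document per row; human-readable, but the dtype is lost.
jsonl_bytes = "\n".join(json.dumps(row.tolist()) for row in data).encode()

for name, blob in [("npy", npy_bytes), ("pickle", pkl_bytes), ("jsonl", jsonl_bytes)]:
    print(f"{name}: {len(blob)} bytes")

# Round-trip check: the binary formats restore the exact array.
assert np.array_equal(np.load(io.BytesIO(npy_bytes)), data)
assert np.array_equal(pickle.loads(pkl_bytes), data)
```

Running a comparison like this on your own payloads is a quick way to see the size and fidelity trade-offs before committing to a format.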