Learning/teaching data science, machine learning, and artificial intelligence. Lead data scientist at Airy. Adjunct lecturer at Universitas Al Azhar Indonesia.
"This talk will compare and contrast dimensionality reduction techniques, such as PCA, autoencoder, t-SNE, and UMAP to visualise high-dimensional data."
Data usually have high dimensionality, especially unstructured data. Extracting [unigrams](https://en.wikipedia.org/wiki/N-gram) from text data, for example, usually yields hundreds to thousands of features. With that many features, even a model trained for a supervised learning task is usually hard to interpret.
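As a minimal sketch of how quickly unigram extraction inflates the feature count, here is what scikit-learn's `CountVectorizer` does on a toy corpus (the corpus and variable names are illustrative, not from the talk):

```python
# Illustrative sketch: unigram extraction turns each unique token into a feature.
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "high dimensional data is hard to visualize",
    "dimensionality reduction helps us see structure",
    "unigram features grow with the vocabulary size",
]

vectorizer = CountVectorizer()        # default analyzer extracts unigrams
X = vectorizer.fit_transform(corpus)  # sparse document-term matrix

# Each unique token becomes one column; real corpora easily reach
# hundreds to thousands of columns.
print(X.shape)
```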
A workaround for this is to visualize the data in 2D, so that we can spot interesting patterns in the data. I will show some interesting figures produced by applying dimensionality reduction techniques, such as [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html), [autoencoder](https://blog.keras.io/building-autoencoders-in-keras.html), [t-SNE](https://distill.pub/2016/misread-tsne/), and [UMAP](https://github.com/lmcinnes/umap), to some data. I will also introduce [Altair](https://altair-viz.github.io/) as the go-to library for interacting with the resulting visualizations.
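The sketch below shows the general shape of such a workflow: project a high-dimensional dataset to 2D (here with PCA) and explore the result interactively with Altair. The digits dataset and column names are assumptions chosen for illustration, not the speaker's actual data, and the same pattern applies if PCA is swapped for t-SNE or UMAP.

```python
# Hedged sketch: reduce 64-dimensional digit images to 2D and plot with Altair.
import altair as alt
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()                         # 64-dimensional pixel features
coords = PCA(n_components=2).fit_transform(digits.data)

df = pd.DataFrame(coords, columns=["pc1", "pc2"])
df["label"] = digits.target.astype(str)

chart = (
    alt.Chart(df)
    .mark_circle(size=20)
    .encode(x="pc1", y="pc2", color="label", tooltip=["label"])
    .interactive()                             # pan and zoom in the rendered chart
)
chart.save("digits_pca.html")                  # open in a browser to explore
```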