Software Engineer @ ScrapingHub
Software engineer, interested in web mining and web data extraction.
"Web scraping tasks can become complicated. I’ll explain how Scrapy makes it easier for developers to create web scrapers."
* Software engineer at ScrapingHub (2014-present), mostly working on developing Scrapy spiders.
* Experienced in web data extraction, both in industry and academic research.
* Re-implemented some automatic web data extraction algorithms:
** AutoRM (Shi et al., 2015) and DAG-MTM (Shi et al., 2014): https://github.com/seagatesoft/webdext
** DEPTA (Zhai and Liu, 2006): https://github.com/seagatesoft/sde
* Previous jobs:
** Software Engineer, Wego.com (2012-2014)
** Technical Consultant, Jatis Solutions (2010-2012)
* Education:
** Master of Computer Science, Universitas Gadjah Mada (2015-2018)
*** Thesis: XPath Wrapper Development from AutoRM and DAG-MTM's Extraction Result
** Bachelor of Computer Science, Universitas Gadjah Mada (2005-2009)
*** Thesis: Web Information Extraction System using Automatic Pattern Discovery Method based on Tree Matching
### Web scraping tasks
* Crawling task: how to get the web pages
* Extraction task: how to extract data from the web pages
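
To make the split between the two tasks concrete, here is a minimal sketch that uses only the Python standard library, before any framework is involved. The names `fetch`, `TitleExtractor`, and `extract_title` are hypothetical helpers chosen for illustration; they are not part of the talk or of Scrapy.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url: str) -> str:
    """Crawling task: download one page and return its HTML as text."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TitleExtractor(HTMLParser):
    """Extraction task: collect the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None  # stays None if the page has no <title>

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False


def extract_title(html: str):
    """Return the stripped page title, or None when there is none."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None
```

A call such as `extract_title(fetch(some_url))` chains the two tasks. Even this small version has to deal with encodings, timeouts, and parser state by hand, which is the kind of work a framework is meant to take over.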
### Scrapy: web scraping framework
* Scrapy architecture
* Scrapy built-in features
* Crawling: sending HTTP requests
* Extraction: extracting data from HTTP responses
### Demo
* Inspect web pages
* Experiment using the Scrapy shell
* Write and run Scrapy spider
### Ethics and legal concerns