Scrapy: Web Scraping Made Easy

Indonesia Talk

Sigit Dewanto

Software Engineer @ ScrapingHub

Software engineer, interested in web mining and web data extraction.

"Web scraping tasks can become complicated. I’ll explain how Scrapy makes it easier for developers to create web scrapers."

* Software engineer at ScrapingHub (2014-present), mostly working on developing Scrapy spiders.
* Experienced in web data extraction in both industry and academic research.
* Re-implemented some automatic web data extraction algorithms:
** AutoRM (Shi et al., 2015) and DAG-MTM (Shi et al., 2014): https://github.com/seagatesoft/webdext
** DEPTA (Zhai and Liu, 2006): https://github.com/seagatesoft/sde
* Previous jobs:
** Software Engineer, Wego.com (2012-2014)
** Technical Consultant, Jatis Solutions (2010-2012)
* Education:
** Master of Computer Science, Universitas Gadjah Mada (2015-2018)
*** Thesis: XPath Wrapper Development from AutoRM and DAG-MTM's Extraction Result
** Bachelor of Computer Science, Universitas Gadjah Mada (2005-2009)
*** Thesis: Web Information Extraction System using Automatic Pattern Discovery Method based on Tree Matching