Presented by Ian Littman
June 24, 2021
It may be date('Y'), but plenty of data that you, or your company, may need isn't in a database you own or an API you can access...but *is* available via a website somewhere. With the right techniques, you can automate your way into accessing that data via various means that folks call "web scraping."
Ian Littman has done his fair share of scraping over the years. He's interacted with dozens of data sources ranging from basic, server-rendered HTML, to modern web apps actually underpinned by a private API, plus a weird edge case or two along the way, using a variety of tools in the process. One potentially surprising finding is that you can do a lot more scraping than you'd think with plain, CPU- and RAM-efficient PHP code, as long as you approach the problem from the right direction. In this presentation, he'll share some tips, tricks, and tools to give you the best shot at reliably, efficiently getting the data you need from the interfaces you can access. These tips will be brought to life with a couple of examples, themselves ranging from basic to rather involved, transcending HTML and JSON into other data formats you're likely to see while trawling the web.