Most of us are adept enough with computers to know what they can easily do and what they can’t. Invent a new flavor of ice cream? Not easy. Grab the news headlines related to Arduinos from your favorite news feed? Relatively easy. But, of course, the devil is in the details. FreeCodeCamp has a 3-hour course from [Frank Andrade] that dives into the gory details of automating web tasks using Python and a variety of libraries like Path, XPath, and Selenium. You can watch the course below.
Topics start off with grabbing tables from websites and PDFs, but the course quickly graduates to general-purpose web scraping and even web automation. These techniques can be very useful for testing browser-based applications, too.
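If you want a taste of what that looks like before committing three hours, here’s a minimal sketch of the two workhorse techniques: pandas for HTML tables, and Selenium with an XPath query for pages that need a live browser. This isn’t the course’s code; the URL and XPath are invented for illustration.

```python
# Sketch: pandas for table scraping, Selenium + XPath for browser automation.
# The URL and XPath below are placeholders, not from the course.
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

# pandas.read_html returns one DataFrame per <table> on the page
# (it needs an HTML parser like lxml installed)
tables = pd.read_html("https://example.com/page-with-tables")
print(tables[0].head())

# For JavaScript-heavy pages, drive a real browser and query with XPath
driver = webdriver.Chrome()
driver.get("https://example.com/news")
headlines = driver.find_elements(By.XPATH, "//h2[@class='headline']/a")
for h in headlines:
    print(h.text, h.get_attribute("href"))
driver.quit()
```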
By the end, you’ve created an executable that grabs news every day and automatically generates an Excel report. There’s also a little wind-down about WhatsApp automation. A little something for everyone. We also greatly approved of [Frank]’s workspace, which appears in the background. Looks like he would enjoy reading Hackaday.
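The report step itself is only a few lines. Here’s a hedged sketch, not [Frank]’s exact code: it assumes the scraped headlines end up in a pandas DataFrame, the column names and output path are invented, and writing .xlsx files requires openpyxl.

```python
# Sketch of the daily-report idea: dump scraped headlines to an Excel file.
# Column names and the output path are invented for illustration.
import datetime
import pandas as pd

headlines = [
    {"title": "Example headline", "link": "https://example.com/story"},
]
df = pd.DataFrame(headlines)

today = datetime.date.today().isoformat()
df.to_excel(f"news_report_{today}.xlsx", index=False)  # needs openpyxl
```

Bundle a script like that with a tool such as PyInstaller and you get the kind of click-to-run executable the course builds up to.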
Honestly, while we’ve seen easier methods of automating the browser, there’s something appealing about having the control something like Python affords. Sure beats building hardware to simulate a human-in-the-loop.
Why not Ruby? It’s faster than Python.
Not really true, and not really relevant for web scraping like this. The speed of the language is completely swamped by network requests and browser interaction.
But it’s cool to say Python is slow!
WebDriver on Windows is awful; JS makes automation really hard.
Generates an Excel report? How quaint.
My thoughts exactly, like using a Ferrari to pull a farm wagon.
A Lambo would be a better choice…
https://www.driving.co.uk/news/diversions/clarksons-farm/jeremy-clarkson-bought-lamborghini/
But a Lamborghini can!
B^)
https://www.lamborghini-tractors.com/en-eu/
This is actually useful for a lot of people, since a lot of businesses still organize their data in Excel.
What should I be learning instead to keep up with the times?
In this case, I’d probably generate some sort of dynamic web thing.
C and libcurl. It’s probably a lot harder, but you’d be programming in a real language, and it’s a lot more powerful. Once you start integrating your data into real programs you’ll feel the difference.
C and libcurl to keep up with the times? That’s crazy overkill and keeping up with decades ago.
If you want to overkill and keep up with the times use rust or go.
Great timing, I need a refresher on this stuff. Excel or CSV ain’t quaint; they’re just useful “universal” human-readable formats that can be easily converted or input to something else.
Agree
Agreed. CSV files are more compact and easier to process than JSON. They also fit really well into an SQL pipeline with little added boilerplate or translation.
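For what it’s worth, that point is easy to demonstrate with nothing but the Python standard library. A sketch, with the file name and column names made up:

```python
# Sketch: load a CSV straight into SQLite using only the standard library.
# The file name and column names are invented for illustration.
import csv
import sqlite3

conn = sqlite3.connect("news.db")
conn.execute("CREATE TABLE IF NOT EXISTS headlines (title TEXT, link TEXT)")

with open("headlines.csv", newline="", encoding="utf-8") as f:
    rows = [(r["title"], r["link"]) for r in csv.DictReader(f)]

conn.executemany("INSERT INTO headlines VALUES (?, ?)", rows)
conn.commit()
conn.close()
```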
Yes of course, scrape your favorite news topics from a website, then put it into everybody’s favorite media reader, Excel :)
These comments remind me why I always avoid evangelists when I hire programmers.
Nothing wrong with good old LWP::UserAgent ;-)