
How to crawl websites with Selenide and JDK 14+

Sometimes we need data that would otherwise have to be fetched manually from a website. As developers, automation is our friend, so instead of searching for all this information ourselves, we can write code that crawls the website for us. I’ve recorded a video in which I fetch some data from my blog and transform it into CSV format, using Selenide and some newer Java features such as Records.

Please be a good citizen and only use such techniques on websites and in situations where you’re allowed to do so, and where your actions don’t disrupt any service.

You can find the code example on GitHub: Selenium Playground

What we’re doing is using Selenide, with its helpful queries and methods, together with Java Records and Streams to map the entries of my blog to the desired output format. Unlike with a web API, we have to be a bit more creative in how we identify and extract the individual parts, since the data is not necessarily structured for automated consumption.
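The mapping step can be sketched with plain JDK features. This is a minimal sketch and not the code from the video or the repository: the BlogEntry record, its fields, the CSV header, and the sample entries are all assumptions made for illustration. In a real crawler, the values would come from Selenide element queries rather than a hard-coded list. Records were a preview feature in JDK 14 and 15 and became final in JDK 16.

```java
import java.util.List;
import java.util.stream.Collectors;

class BlogCsvExport {

    // Hypothetical shape of one scraped blog entry; the field names are assumptions.
    record BlogEntry(String title, String link, String date) {

        // Renders this entry as one CSV line, quoting every field.
        String toCsvLine() {
            return List.of(title, link, date).stream()
                    .map(BlogCsvExport::quote)
                    .collect(Collectors.joining(","));
        }
    }

    // Wraps a value in double quotes and doubles any embedded quotes.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Maps all entries to CSV text: a header line followed by one line per entry.
    static String toCsv(List<BlogEntry> entries) {
        return entries.stream()
                .map(BlogEntry::toCsvLine)
                .collect(Collectors.joining("\n", "title,link,date\n", ""));
    }

    public static void main(String[] args) {
        // Sample data standing in for values that would be read from the page.
        List<BlogEntry> entries = List.of(
                new BlogEntry("First post", "https://example.com/first", "2020-01-01"),
                new BlogEntry("Say \"hi\"", "https://example.com/hi", "2020-02-01"));
        System.out.println(toCsv(entries));
    }
}
```

Keeping the record free of any browser code means the CSV formatting can be checked without starting a browser, and only the part that fills the list depends on the page structure.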

Published on Java Code Geeks with permission by Sebastian Daschner, partner at our JCG program. See the original article here: How to crawl websites with Selenide and JDK 14+


Sebastian Daschner

Sebastian Daschner is a self-employed Java consultant and trainer. He is the author of the book 'Architecting Modern Java EE Applications'. Sebastian is a Java Champion, Oracle Developer Champion and JavaOne Rockstar.