How To Scrape Data From a Webpage Using Vanilla JavaScript


In this tutorial, we’ll take a look at how to use JavaScript in a browser’s dev tools to scrape data from any webpage.

If you’ve ever had to manually collate data from a webpage into a different format, like a spreadsheet or a data object, you’ll know it’s a very repetitive and tiresome process!

Luckily, most browsers include tools that allow you to manipulate any webpage as much as you’d like. These tools are called developer tools (commonly referred to as dev tools) and are usually used by web developers to debug websites. We’ll be using them in this tutorial.

This tutorial requires prior knowledge of JavaScript, as we’ll be writing JavaScript code to interact with the webpage and collect the data.

1. Accessing the Dev Tools

There are different ways to access dev tools depending on the browser you’re using: Chrome, Safari, Firefox or Microsoft Edge. The most common way is to right-click (or Control + click) on the webpage and select the Inspect option.

Once we have our dev tools open, the two tabs we’ll be using are the Elements tab and the Console tab.

The dev tools panel open on a webpage

The Elements panel shows us all the HTML elements present on a webpage and the Console panel allows us to write JavaScript code directly in the webpage.

2. Identifying the Elements

The next step is to identify which elements we want to scrape from the webpage.

For example, let’s say we wanted to get a list of tutorials written by a Tuts+ author. We’d open dev tools on the author page and identify which element we wanted to scrape by using the inspect selector tool.

The inspect tool allows you to select the element you want to inspect on the webpage.

3. Targeting the Element

The next step is to target the element from the Console panel using JavaScript. There are multiple ways to target elements using JavaScript and in this tutorial we’ll be using the methods querySelectorAll() and querySelector().
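As a rough illustration of how the two methods differ: querySelector returns the first matching element (or null when nothing matches), while querySelectorAll returns a NodeList of every match. The helper name compareSelectors is made up for this sketch, and its root parameter defaults to the page’s document when run in the Console.

```javascript
// Hypothetical helper: run both lookups for one CSS selector.
function compareSelectors(selector, root = document) {
  const first = root.querySelector(selector);   // first match, or null
  const all = root.querySelectorAll(selector);  // every match (may be empty)
  return { first, count: all.length };
}

// In the Console panel, for example:
//   compareSelectors('h1');
```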

In the example above, we want to target all elements with the class name posts__post. We can do this by typing the following command in the Console panel:
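A minimal sketch of such a command follows. It assumes the class name is posts__post and wraps the call in a small helper, selectPosts (a name chosen for this example), so the selector is easy to swap for other pages.

```javascript
// Hypothetical helper: return every element matching `selector`.
// `root` defaults to the page's document, so no argument is needed in the Console.
function selectPosts(selector = '.posts__post', root = document) {
  return root.querySelectorAll(selector);
}

// In the Console panel:
//   let posts = selectPosts();
// This is the same as calling document.querySelectorAll('.posts__post') directly.
```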

Now we have a variable posts that contains the elements that we want to collect data from.

4. Manipulating Elements with JavaScript

Since we’re trying to scrape data from a webpage, we need to identify what data we want to collect. In this example, let’s collect the title and description of each tutorial.

Let’s write a function that allows us to collect the title and description from each li.posts__post in our posts variable.

Going back to our webpage and inspecting the elements again, we see that the title text is contained in an h1 tag and the description text is contained in a div with the class name posts__post-teaser.

To target these elements, we’ll write the following command into the Console:
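The sketch below is one way to write it, assuming the h1 and posts__post-teaser selectors described above. It is wrapped in a helper named collectPosts (a name chosen for this example) that takes the posts collection from the previous step.

```javascript
// Build an array of { title, description } objects from the post elements.
function collectPosts(posts) {
  // Spread the NodeList into an array so that map is available.
  return [...posts].map((post) => {
    const title = post.querySelector('h1').innerText;
    const description = post.querySelector('.posts__post-teaser').innerText;
    return { title, description };
  });
}

// In the Console panel:
//   let postsObj = collectPosts(posts);
```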

Let’s break down what’s happening in the above code:

  • Create a new variable postsObj to store the manipulated data.
  • Use the spread syntax [...] to convert our posts variable from a NodeList to an array.
  • Use the map function to loop through the posts array and carry out the manipulation on each post.
  • Target the h1 and posts__post-teaser elements inside the post and store their innerText values inside the object keys title and description.
  • Return an object value that contains the key and value pairs defined.
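One caveat about the steps above: querySelector returns null when nothing matches, so reading innerText on a post that lacks an h1 or a teaser throws a TypeError. A slightly more defensive sketch, using optional chaining and an empty-string fallback, could look like this (the helper name collectPostsSafely is made up for this example):

```javascript
// Same idea as before, but a missing element yields '' instead of an error.
function collectPostsSafely(posts) {
  return [...posts].map((post) => ({
    // ?. short-circuits to undefined when the element is missing; ?? supplies ''.
    title: post.querySelector('h1')?.innerText.trim() ?? '',
    description: post.querySelector('.posts__post-teaser')?.innerText.trim() ?? '',
  }));
}

// In the Console panel:
//   let postsObj = collectPostsSafely(posts);
```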

This is what our postsObj value will contain:
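The actual titles and descriptions depend on the page being scraped. As an illustration of the shape only, with made-up placeholder strings rather than real scraped data, the result is an array holding one object per post:

```javascript
// Placeholder data standing in for the scraped result; real values come from the page.
const examplePostsObj = [
  { title: 'First tutorial title', description: 'First tutorial teaser text' },
  { title: 'Second tutorial title', description: 'Second tutorial teaser text' },
];

// Serialise to JSON so the result can be pasted into another tool.
const examplePostsJson = JSON.stringify(examplePostsObj, null, 2);
console.log(examplePostsJson);
```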

5. Conclusion

To recap, in order to scrape data from a page, we:

  1. Access the browser dev tools
  2. Identify the element using the inspect tool
  3. Use the Console panel to target and collect data from the elements
  4. Store the data in a JavaScript object using the map method

Of course, manually writing JavaScript code in dev tools isn’t the only way to scrape data from a webpage. There are many web scraper extensions that offer the same functionality without the need to write code.

However, this method is very useful for getting familiar with the developer tools in a browser and understanding how to manipulate data with JavaScript. 
