How to Make a Web Scraper Using JavaScript and NodeJS

Daniyal Akbar

15 June, 2022

Hey developers! In this article, we are going to see how to make a web scraper using JavaScript and NodeJS. For this example, we are going to use NodeJS with Express as the framework, along with a few common NPM libraries such as Axios and Cheerio. For demonstration purposes, we are going to extract data from this website, Programatically.

What is a Web Scraper

A web scraping tool is an easy and convenient way of extracting data and content from a website. Instead of tediously copying and pasting or jotting it down manually, a web scraper extracts the data you are looking for and saves it in the format you want. You just need to target the fields and values using CSS classes and IDs, and it will start scraping data.
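As an illustration of the "saves it in the format you want" part, here is a minimal sketch that writes scraped records to disk as both JSON and CSV using only Node's built-in fs, os, and path modules. It assumes records shaped like the { blog_title, blog_link } objects this article's scraper produces; the helper names (toCsv, saveRecords), the results.json and results.csv file names, and the sample data are hypothetical.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Quote a CSV field only when it contains a comma, quote, or line break,
// doubling any embedded quotes (the usual CSV escaping rule).
function toCsvField(value) {
  const text = value === undefined || value === null ? '' : String(value)
  return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text
}

// Convert a list of flat objects to CSV text, using the first object's keys as the header row.
function toCsv(records) {
  if (records.length === 0) return ''
  const headers = Object.keys(records[0])
  const lines = [headers.map(toCsvField).join(',')]
  for (const record of records) {
    lines.push(headers.map(h => toCsvField(record[h])).join(','))
  }
  return lines.join('\n') + '\n'
}

// Hypothetical helper: write the same records as pretty-printed JSON and as CSV into `dir`.
function saveRecords(records, dir) {
  const jsonPath = path.join(dir, 'results.json')
  const csvPath = path.join(dir, 'results.csv')
  fs.writeFileSync(jsonPath, JSON.stringify(records, null, 2))
  fs.writeFileSync(csvPath, toCsv(records))
  return { jsonPath, csvPath }
}

// Made-up records in the { blog_title, blog_link } shape, saved to a fresh temp folder.
const sample = [
  { blog_title: 'Hello, world', blog_link: 'https://example.com/hello' },
  { blog_title: 'Say "hi"', blog_link: 'https://example.com/hi' },
]
const outDir = fs.mkdtempSync(path.join(os.tmpdir(), 'scraper-'))
const saved = saveRecords(sample, outDir)
console.log(`Saved to ${saved.jsonPath} and ${saved.csvPath}`)
```

JSON keeps the objects exactly as scraped, while CSV opens directly in a spreadsheet; in the scraper itself you would pass the list variable instead of the sample data.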

Tl;dr

Just initialize an NPM project using these commands:

```bash
npm init -y
npm i
npm i express
npm i cheerio
npm i axios
```

Then create an “index.js” file in the project directory and paste in the following code.

```javascript
const axios = require('axios')
const express = require('express')
const cheerio = require('cheerio')

const PORT = 8000
const app = express()

// Start the Express server (no routes are defined yet)
app.listen(PORT, () => console.log(`server is running on PORT ${PORT}`))

// Fetch the page, parse it with Cheerio, and collect every post title and link
axios('https://programatically.com')
  .then(response => {
    const html = response.data
    const $ = cheerio.load(html)
    const list = []

    $('.heading-title-text').each(function () {
      const blog_title = $(this).text()
      const blog_link = $(this).find('a').attr('href')
      list.push({ blog_title, blog_link })
    })
    console.log(list)
  })
  .catch(err => console.error(err))
```

After pasting all the above code into “index.js”, run this command to start the web scraper:

```bash
node index.js
```

You can find the GitHub repository link for the complete project at the bottom of this article.

Prerequisites

– NodeJS installed
– Basic knowledge of NPM

Table of Contents

Configure a Web Scraper Project
Installing NPM Libraries Used in Web Scraping Project
Create a Basic Web Server Using NodeJS
Scraping All Html Script
Explanation of JavaScript Code
The Big Picture

STEP 1: Configure a Web Scraper Project

To begin with, create a folder called “Web-Scraper”. Open it in VS Code or any other IDE you like, then open a terminal (or CMD) and run this command:

```bash
npm init
```

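After you run the above command, npm will ask you a series of questions. Simply press Enter to accept the default answer for each one (running “npm init -y” skips the questions entirely). When it finishes, npm creates a “package.json” file in the project folder.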

Next, create a new file called “index.js” in the same project folder. After that, execute the following command in the terminal or CMD:

```bash
npm i
```

The “package.json” file created by “npm init” keeps a list of all the dependencies that we’ll be using in this web scraper, and “npm i” installs whatever is listed there (nothing yet, since we haven’t added any libraries). The image below shows the package.json file:

STEP 2: Installing NPM Libraries Used in Web Scraping Project

This is a very simple step. We need three NPM libraries in our web scraping project: Express, Cheerio, and Axios.

Express is a very popular NodeJS framework.

Cheerio is used for traversing and targeting elements in an HTML document. Its syntax is very similar to jQuery's.

Axios is a popular library for making and handling HTTP requests.

Execute the following commands to install these libraries:

```bash
npm i express
npm i cheerio
npm i axios
```

STEP 3: Create a Basic Web Server Using NodeJS

Moving on, it’s time to create a basic web server using NodeJS and Express for our web scraper. Open the “index.js” file that we created earlier and add the following code.

```javascript
const axios = require('axios')
const express = require('express')
const cheerio = require('cheerio')

const PORT = 8000
const app = express()

// Start listening on port 8000 and log a message once the server is up
app.listen(PORT, () => console.log(`server is running on PORT ${PORT}`))
```

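This creates a basic web server listening on port 8000, which is where incoming HTTP requests would be received. No routes are defined yet, so the server does not serve any data itself; the scraping in the next step is done by an outgoing Axios request. To check that this basic web server is working, run the following command in the terminal:

```bash
node index.js
```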

Note that you need to stop the project and rerun the above command whenever you change your code. Now check the console: it should print “server is running on PORT 8000”, as shown in the image below.

STEP 4: Scraping All Html Script

Finally, since our basic web server is up and running, it’s time to create the web scraper itself. Add the following code to your “index.js” file after the web server code that we wrote in the previous step. Then stop the running project and restart it with the “node index.js” command.

```javascript
axios('https://programatically.com')
  .then(response => {
    const html = response.data
    const $ = cheerio.load(html)
    const list = []

    // Here I am targeting a CSS class and its attributes to fetch the data.
    // Make your changes here to scrape different elements.
    $('.heading-title-text').each(function () {
      const blog_title = $(this).text()
      const blog_link = $(this).find('a').attr('href')
      list.push({ blog_title, blog_link })
    })
    console.log(list)
  })
  .catch(err => console.error(err))
```
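Whatever elements you target, the raw values often need some tidying before you use them: .text() includes any whitespace from the page's markup, .attr('href') may return a relative path, and it returns undefined when no matching link is found. The sketch below shows one possible post-processing step, a hypothetical normalizePosts helper (not part of the original project) that you could call on the list before logging it. It uses only Node's global URL class; the sample data and example.com URLs are made up for illustration.

```javascript
// Hypothetical post-processing helper (not part of the original project).
// It expects the { blog_title, blog_link } objects built by the scraper and:
// - collapses and trims the whitespace that .text() picks up from indented markup
// - skips entries without a link (.attr('href') returns undefined when nothing matched)
// - resolves relative links against the page URL with Node's global URL class
// - drops duplicate links, keeping the first occurrence
function normalizePosts(rawList, pageUrl) {
  const seen = new Set()
  const posts = []
  for (const { blog_title, blog_link } of rawList) {
    if (!blog_link) continue
    const link = new URL(blog_link, pageUrl).href
    if (seen.has(link)) continue
    seen.add(link)
    posts.push({ blog_title: (blog_title || '').replace(/\s+/g, ' ').trim(), blog_link: link })
  }
  return posts
}

// Made-up input standing in for a scraped list.
const raw = [
  { blog_title: '\n    How to Make Web Scraper   \n', blog_link: '/web-scraper/' },
  { blog_title: 'Duplicate entry', blog_link: 'https://example.com/web-scraper/' },
  { blog_title: 'Heading without a link', blog_link: undefined },
]

// posts holds a single object:
// { blog_title: 'How to Make Web Scraper', blog_link: 'https://example.com/web-scraper/' }
const posts = normalizePosts(raw, 'https://example.com/blog/')
console.log(posts)
```

In the scraper, a call such as normalizePosts(list, 'https://programatically.com') just before console.log would apply the same cleanup to the real results.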

STEP 5: Explanation of JavaScript Code

```javascript
const html = response.data
const $ = cheerio.load(html)
const list = []
```

The first line reads ALL the raw HTML content of the page: Axios has already fetched the website link that we gave it, and response.data holds the body of that response. The next line uses Cheerio to parse the raw HTML so that we can target specific elements jQuery-style. The last line creates an empty list so that all the data we extract can be stored in it.

```javascript
$('.heading-title-text').each(function () {
  const blog_title = $(this).text()
  const blog_link = $(this).find('a').attr('href')
  list.push({ blog_title, blog_link })
})
console.log(list)
```

The first line uses Cheerio syntax to select every element with the CSS class “heading-title-text”, which holds the article titles in the list of blogs on my website. “.each” runs the callback once for each matched heading. Inside it, $(this).text() reads the title text, and $(this).find('a').attr('href') reads the “href” attribute of the link inside that heading. I store the title and link in two separate constants and push them into the list that we created earlier. Wrapping them in “{ }” creates an object with blog_title and blog_link properties, so the result is a list of objects. Lastly, I print the list to the console.
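The callback passed to “.each” is a regular function rather than an arrow function because Cheerio, like jQuery, calls it with this set to the current element and also passes (index, element) as arguments. The sketch below imitates that calling convention in plain JavaScript; the each helper and the heading objects are hypothetical stand-ins, not Cheerio's own code, but they show why $(this) works in a regular function and why an arrow callback should use the element parameter instead.

```javascript
// Hypothetical stand-in that mimics the calling convention Cheerio documents for .each():
// the callback runs with `this` set to the current element and receives (index, element).
// Returning false from the callback stops the loop early.
function each(elements, callback) {
  for (let i = 0; i < elements.length; i++) {
    if (callback.call(elements[i], i, elements[i]) === false) break
  }
  return elements
}

// Plain objects standing in for the matched heading elements.
const headings = [{ title: 'First post' }, { title: 'Second post' }, { title: 'Third post' }]

// Regular function: `this` is the current element, which is why the article can write $(this).
const viaThis = []
each(headings, function () {
  viaThis.push(this.title)
})

// Arrow function: .call() cannot rebind `this` for arrows, so read the element parameter instead.
const viaParam = []
each(headings, (index, element) => {
  viaParam.push(`${index}: ${element.title}`)
})

// Returning false stops the iteration; here it ends after the second element.
const firstTwo = []
each(headings, (index, element) => {
  firstTwo.push(element.title)
  return index < 1
})

console.log(viaThis, viaParam, firstTwo)
```

With Cheerio itself, the arrow-function form of the article's loop would use $(el) in place of $(this), for example .each((i, el) => { const blog_title = $(el).text() }).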

The Big Picture

And we’re done.

I hope this article helps you learn how to make a web scraper using JavaScript and NodeJS. Feel free to download and use the web scraper project that I have uploaded to my GitHub account. If there is any particular topic you want me to cover, just drop a message in the comments and hit the like button. Have a great one!