Remain Unblocked Using The Puppeteer Extra Plugin Stealth

Do you frequently use Puppeteer for web scraping? If so, you’re likely familiar with the all-too-common challenge of getting blocked. Luckily, the Puppeteer-Extra-Plugin-Stealth can help you avoid this issue.

What is the Puppeteer Extra Stealth Plugin, how does it work, and how can you use it during your web scraping process? Find the answer to these and other common questions below.

Try Our Residential Proxies Today!

What Is Puppeteer?

Let’s start from the beginning and explain Puppeteer.

Puppeteer is a Node.js library developed by the Chrome team at Google. It provides an API (Application Programming Interface) that can control headless browsers or full browsers.

The Puppeteer library is commonly used for web scraping because developers can use it to extract data from websites by navigating through pages and pulling information. You can also use it for other tasks, though, including the following:

  • Headless Browsing: Puppeteer allows you to run browsers in headless mode, meaning without a graphical user interface. This is useful for automated tasks and server-side operations.
  • Automation: Puppeteer enables you to interact with web pages programmatically, fill out forms, click buttons, navigate through pages, and more.
  • Screenshots and PDF Generation: Puppeteer can capture screenshots and generate PDFs of web pages, which is helpful for various purposes such as monitoring website appearance or generating reports.
  • Testing: Puppeteer is often used for automated testing of web pages. It can simulate user interactions and behavior to ensure that web applications work as expected.

Puppeteer is built on top of the Chrome DevTools Protocol, the protocol that tools use to instrument, inspect, and control Chrome and other Chromium-based browsers. While Puppeteer is most commonly associated with Google Chrome, it can also drive other Chromium-based browsers, such as Opera and Microsoft Edge.

Common Puppeteer Web Scraping Challenges

Puppeteer is a powerful tool for web scraping. However, there are particular challenges and considerations that users may encounter, including these:

Dynamic content

Websites often use JavaScript to load content dynamically after the initial page load. Puppeteer may need to wait for specific elements or events to ensure that the required content is present before scraping. Properly handling dynamic content is crucial to ensure complete and accurate data.

Anti-scraping measures

Some websites employ anti-scraping techniques to prevent automated access. This can include CAPTCHAs, IP blocking, or other methods. Developers using Puppeteer should be mindful of these measures and implement strategies to bypass or handle them appropriately.

Page load performance

Pages that rely heavily on dynamic content loading can take a long time to finish rendering in Puppeteer, because every script, image, and stylesheet is still downloaded and processed, even in headless mode. This can increase the time it takes to scrape data. Optimizing page load performance is essential for efficient web scraping.
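
One common way to shorten page loads is to skip resources the scraper does not need. The following is a minimal sketch using Puppeteer's request interception; the set of blocked resource types and the function names are illustrative choices, not a recommendation for every site.

```javascript
// Resource types skipped in this sketch; adjust to what your target pages need.
const BLOCKED_TYPES = new Set(['image', 'media', 'font', 'stylesheet']);

// Pure helper so the blocking rule is easy to check on its own.
function shouldBlock(resourceType) {
  return BLOCKED_TYPES.has(resourceType);
}

// Enable request interception on a Puppeteer page and abort blocked requests.
async function blockHeavyResources(page) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (shouldBlock(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });
}
```

Call blockHeavyResources(page) once, right after browser.newPage() and before page.goto(). Some pages need their stylesheets to lay out content correctly, so it may be worth removing 'stylesheet' from the set if selectors stop matching.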

Handling cookies and sessions

Puppeteer keeps cookies only for the lifetime of a browser session. By default, each launch starts with a fresh profile, so nothing persists between runs. Developers need to save and restore cookies, or reuse a user data directory, to maintain a consistent state throughout the scraping process.
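
As a rough sketch of what that manual management can look like, the helpers below write a page's cookies to a JSON file and restore them on a later run. The file name cookies.json and the helper names are assumptions for illustration. Recent Puppeteer releases mark the page-level cookie methods as deprecated in favor of browser-level equivalents, so check the documentation for the version you use.

```javascript
const fs = require('node:fs/promises');

// Drop cookies that have already expired. Puppeteer reports `expires` in
// Unix seconds and uses -1 for session cookies, which are kept here.
function freshCookies(cookies, nowSeconds = Date.now() / 1000) {
  return cookies.filter(
    (c) => c.expires === undefined || c.expires === -1 || c.expires > nowSeconds
  );
}

// Save the page's current cookies to a JSON file (the path is an assumption).
async function saveCookies(page, file = 'cookies.json') {
  const cookies = await page.cookies();
  await fs.writeFile(file, JSON.stringify(cookies, null, 2));
}

// Restore previously saved cookies; call this before navigating to the site.
async function loadCookies(page, file = 'cookies.json') {
  const saved = JSON.parse(await fs.readFile(file, 'utf8'));
  const cookies = freshCookies(saved);
  if (cookies.length > 0) {
    await page.setCookie(...cookies);
  }
}
```

A typical flow would be to call loadCookies(page) right after browser.newPage() and saveCookies(page) just before browser.close().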

Crawling and pagination

When scraping multiple pages, dealing with pagination and navigating through different sections of a website can be challenging. Developers need to implement logic for crawling through paginated content and handling different URL structures.
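
A minimal sketch of such crawling logic is shown below. It assumes the site paginates with a `page` query parameter, which is only one of many possible URL structures; the function names and the selector passed in are hypothetical.

```javascript
// Build the URL for a given page number, assuming the site paginates with a
// `page` query parameter (an assumption; inspect the real site's URLs first).
function buildPageUrl(baseUrl, pageNumber) {
  const url = new URL(baseUrl);
  url.searchParams.set('page', String(pageNumber));
  return url.toString();
}

// Visit pages 1..maxPages and collect the text of every element matching
// `itemSelector`. Stops early when a page yields no items.
async function crawlPages(page, baseUrl, itemSelector, maxPages) {
  const results = [];
  for (let n = 1; n <= maxPages; n++) {
    await page.goto(buildPageUrl(baseUrl, n), { waitUntil: 'domcontentloaded' });
    const items = await page.$$eval(itemSelector, (els) =>
      els.map((el) => el.textContent.trim())
    );
    if (items.length === 0) break;
    results.push(...items);
  }
  return results;
}
```

Sites that paginate with a "next" link or infinite scroll need a different loop, for example clicking the link and waiting for the new content instead of building URLs.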

Website structure changes

Websites frequently undergo updates and changes in their structure, which can break existing scraping scripts. Regular maintenance is necessary to adapt to any modifications in the HTML structure, CSS classes, or other elements of the target website.

Resource intensiveness

Puppeteer can be resource-intensive, especially when running headless browsers. Users should be mindful of memory and CPU usage, especially when scraping large datasets or running multiple instances concurrently.
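
One simple way to keep memory and CPU usage in check is to cap how many pages are open at the same time. The sketch below processes URLs in fixed-size batches; the default of three concurrent pages and the caller-supplied scrapeOne function are assumptions for illustration.

```javascript
// Split a list into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Scrape URLs in batches so that at most `concurrency` pages are open at once.
// `scrapeOne(page, url)` is a caller-supplied function (hypothetical here).
async function scrapeInBatches(browser, urls, scrapeOne, concurrency = 3) {
  const results = [];
  for (const batch of chunk(urls, concurrency)) {
    const batchResults = await Promise.all(
      batch.map(async (url) => {
        const page = await browser.newPage();
        try {
          return await scrapeOne(page, url);
        } finally {
          await page.close(); // free the tab's memory even if scraping failed
        }
      })
    );
    results.push(...batchResults);
  }
  return results;
}
```

Batching waits for the slowest page in each batch before starting the next one. A worker-pool approach would keep all slots busy, at the cost of slightly more code.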

What Is the Puppeteer Extra Extension?

Puppeteer Extra is an extension of the Puppeteer library. It is essentially a modular plugin system built on top of Puppeteer, and it provides additional features and functionalities.

The primary goal of Puppeteer Extra is to make it easier to extend Puppeteer with various plugins, enabling users to add extra capabilities to their browser automation and web scraping workflows. These plugins can include additional browser automation tools, anti-detection measures, and other enhancements.

The following are some of the most well-known features that Puppeteer Extra offers:

  • Plugin Architecture: Puppeteer Extra is designed with a modular plugin system. Users can choose which plugins to use based on their specific needs, making it a flexible and customizable solution.
  • Additional Functionality: Plugins in Puppeteer Extra can provide extra features that may not be available in the core Puppeteer library. For example, there might be plugins for handling cookies, managing sessions, or automating specific interactions.
  • Anti-Detection Measures: Some plugins within Puppeteer Extra focus on making the browser automation less detectable by websites. This can include user agent spoofing, IP rotation, and other measures to mimic human-like behavior.
  • Community Contributions: Puppeteer Extra benefits from contributions from the open-source community. Users can create and share their plugins, expanding the capabilities of Puppeteer for various use cases.

What Is the Puppeteer-Extra-Plugin-Stealth?

Puppeteer Stealth is also known as puppeteer-extra-plugin-stealth. It is an extension built on top of Puppeteer Extra that uses different techniques to hide properties that would otherwise flag your request as a bot. As a result, Puppeteer Stealth makes it harder for websites to detect your scraping technology.

When you web scrape with a headless browser — a web browser without a graphical user interface (GUI) — you have browser-like access, but websites also get code execution access. This access means they can use tools like browser fingerprinting scripts to collect data that could identify an automated browser.

Here’s where Puppeteer Stealth comes in handy. This plugin masks some of the default headless browser giveaways, such as the HeadlessChrome token in the User-Agent, navigator.webdriver being true, and telltale request headers, allowing you to crawl more surreptitiously.
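
To see what a fingerprinting script might observe, you can read a few of these properties yourself. The sketch below is illustrative only: the checks in suspiciousSignals are simple heuristics chosen for this example, not the rules any particular detection vendor uses.

```javascript
// Runs inside the page: collect a few properties that detection scripts often read.
function readFingerprint() {
  return {
    webdriver: navigator.webdriver,
    userAgent: navigator.userAgent,
    pluginCount: navigator.plugins.length,
    languages: Array.from(navigator.languages || []),
  };
}

// Runs in Node: list the values in a fingerprint that look like automation.
function suspiciousSignals(fp) {
  const signals = [];
  if (fp.webdriver === true) signals.push('navigator.webdriver is true');
  if (/HeadlessChrome/.test(fp.userAgent)) signals.push('user agent contains HeadlessChrome');
  if (fp.pluginCount === 0) signals.push('no browser plugins reported');
  if (fp.languages.length === 0) signals.push('navigator.languages is empty');
  return signals;
}

// Usage with an open Puppeteer page:
//   const fp = await page.evaluate(readFingerprint);
//   console.log(suspiciousSignals(fp));
```

Running this once with plain Puppeteer and once with the Stealth plugin registered gives a quick, if incomplete, view of what the plugin changes.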

How does Puppeteer Stealth work?

Puppeteer Stealth masks headless browser properties with the help of built-in evasion modules.

Built-in evasion modules are the pre-packaged plugins that power the Puppeteer Stealth plugin. Because base Puppeteer has leaks and properties that can get it flagged as a bot, Stealth aims to plug those leaks.

Each of Puppeteer Stealth’s built-in evasion modules is designed to plug a specific leak. Here are some examples:

  • iframe.contentWindow fixes the HEADCHR_iframe detection by modifying window.top and window.frameElement
  • media.codecs modifies codecs to support the same things as Chrome
  • navigator.hardwareConcurrency sets the number of logical processors to four
  • navigator.languages modifies the “languages” property, allowing for custom languages
  • navigator.plugins mocks navigator.mimeTypes and navigator.plugins with functional mocks to match the standard Chrome that humans use
  • navigator.permissions masks the “permissions” property, allowing you to pass the permissions test
  • navigator.vendor lets you customize the navigator.vendor property
  • navigator.webdriver masks navigator.webdriver
  • sourceurl hides the sourceurl attribute of the Puppeteer script
  • user-agent-override modifies specific User-Agent components
  • webgl.vendor changes the WebGL Vendor/Renderer property from Google (the default for Puppeteer headless)
  • window.outerdimensions adds the missing window.outerWidth and window.outerHeight properties
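
If you want to check which of these modules are active in your own setup, the plugin instance exposes two sets, availableEvasions and enabledEvasions. The helper below is a small sketch that compares them; the helper name is hypothetical.

```javascript
// Given a stealth plugin instance, return the evasions that are available
// but not currently enabled, sorted by name.
function disabledEvasions(stealth) {
  return [...stealth.availableEvasions]
    .filter((name) => !stealth.enabledEvasions.has(name))
    .sort();
}

// Usage (requires puppeteer-extra-plugin-stealth to be installed):
//   const StealthPlugin = require('puppeteer-extra-plugin-stealth');
//   const stealth = StealthPlugin();
//   console.log([...stealth.enabledEvasions]);
//   console.log(disabledEvasions(stealth));
```

The exact list of evasions depends on the plugin version, so printing it is more reliable than relying on any fixed list.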

How to Use the Puppeteer-Extra-Plugin-Stealth

At this point, you might be thinking that you could benefit from the Puppeteer Extra Stealth plugin. You might also be wondering how to use the puppeteer-extra-plugin-stealth as part of your web scraping strategy.

Here’s a step-by-step breakdown of how to install and use the plugin for more effective web scraping:

Step 1: Install Puppeteer Extra and the Stealth plugin

The first step, of course, is to install Puppeteer Extra and the Stealth plugin. Assuming Puppeteer itself is already installed in your project, run the following command:

npm install puppeteer-extra puppeteer-extra-plugin-stealth

Step 2: Set up Puppeteer Extra and register the stealth plugin

From here, you’ll be ready to get set up and registered. Start by replacing the puppeteer import statement with the following, and import StealthPlugin from puppeteer-extra-plugin-stealth:

import puppeteer from "puppeteer-extra"
import StealthPlugin from "puppeteer-extra-plugin-stealth"

Keep in mind that if you use CommonJS, you’ll need these lines of code instead:

const puppeteer = require("puppeteer-extra")
const StealthPlugin = require("puppeteer-extra-plugin-stealth")

At this point, you can register the Stealth plugin by passing it to the puppeteer object using the use() method:

puppeteer.use(StealthPlugin())

After you’ve taken this step, you’ll have added the default evasion capabilities that the plugin supports.

Remember that StealthPlugin() accepts an optional object whose enabledEvasions property is a set of strings naming the evasions to enable:

// enable only a few evasion techniques
puppeteer.use(StealthPlugin({
  enabledEvasions: new Set(["chrome.app", "chrome.csi", "defaultArgs", "navigator.plugins"])
}))

You can also use the code showcased below to remove a specific evasion strategy from the Stealth plugin before the browser launches:

const stealthPlugin = StealthPlugin()
puppeteer.use(stealthPlugin)
// …
// remove the "user-agent-override" evasion method
stealthPlugin.enabledEvasions.delete("user-agent-override")

Step 3: Integrate

From here, you can put everything together using this code snippet:

import puppeteer from "puppeteer-extra"
import StealthPlugin from "puppeteer-extra-plugin-stealth"

(async () => {
  // configure the stealth plugin
  puppeteer.use(StealthPlugin())

  // set up the browser and launch it
  const browser = await puppeteer.launch()

  // open a new blank page
  const page = await browser.newPage()

  // navigate the page to the target page
  await page.goto("https://arh.antoinevastel.com/bots/areyouheadless")

  // extract the message of the test result
  const resultElement = await page.$("#res")
  const message = await resultElement.evaluate(e => e.textContent)

  // print the resulting message
  console.log(`The result of the test is "%s"`, message)

  // close the current browser session
  await browser.close()
})()

Running this snippet should print this message:

The result of the test is "You are not Chrome headless"

When you get to this point, assuming you’ve done everything correctly, this test page no longer marks your Puppeteer automated script as a bot. Keep in mind that more sophisticated bot detection systems check many other signals and may still flag it.

How to Set Up Puppeteer Extra Stealth for Advanced Scraping

The steps shared above will help you navigate several basic web scraping tasks. If you want to take more advanced steps, though, you’ll need to do some additional work.

Here are a few examples of how you can configure Puppeteer Extra for advanced scenarios:

AJAX requests

AJAX (Asynchronous JavaScript and XML) is a technique used in web development to create dynamic and interactive user interfaces. AJAX allows web pages to request and send data to a server asynchronously without requiring the entire page to be refreshed. This enables the development of more responsive and user-friendly web applications.

AJAX requests are typically initiated using JavaScript and can be used to fetch or send data to a server in the background. This process occurs asynchronously, meaning that the rest of the web page can continue to function while the request is being processed, and the user does not have to wait for a full page to reload.

You can use the following tools with Puppeteer Stealth to help you navigate AJAX requests:

  • page.waitForSelector('.ajax-loaded-element'): Makes the script wait until an element with the class ajax-loaded-element appears before it proceeds. This waiting period creates time for AJAX requests to complete and makes sure dynamically loaded content is ready to be extracted.
  • page.$eval('.ajax-loaded-element', data => data.textContent): Once the element is present, extracts data from it. The $eval method runs the callback on the matched element and, here, returns the element’s text content.
  • console.log('AJAX-Loaded Data:', ajaxData): Logs the extracted data to the console so it can be verified and analyzed.

Here’s an example of the code in action:

// Waiting for AJAX requests to complete
await page.waitForSelector('.ajax-loaded-element');

// Extracting data from the loaded content
const ajaxData = await page.$eval('.ajax-loaded-element', data => data.textContent);
console.log('AJAX-Loaded Data:', ajaxData);

await browser.close();
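
Waiting for a selector works when the data ends up in the DOM. When you would rather read the data the page fetched, one option is to wait for the network response itself with page.waitForResponse. The sketch below assumes a hypothetical '/api/items' endpoint that returns JSON; the helper names are illustrative.

```javascript
// Build a predicate for page.waitForResponse: matches successful responses
// whose URL contains `urlPart`.
function matchesApiResponse(urlPart) {
  return (response) => response.url().includes(urlPart) && response.status() === 200;
}

// Navigate to the page and capture the JSON body of the matching AJAX response.
// '/api/items' is a hypothetical endpoint used only for illustration.
async function fetchAjaxJson(page, pageUrl, urlPart = '/api/items') {
  const [response] = await Promise.all([
    page.waitForResponse(matchesApiResponse(urlPart)),
    page.goto(pageUrl),
  ]);
  return response.json();
}
```

Starting the wait before the navigation, inside Promise.all, avoids missing a response that arrives while the page is still loading.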

Form interactions

During web scraping processes, automating form interactions allows you to navigate through protected areas and initiate search queries. You can use the following Puppeteer calls alongside Puppeteer Stealth to make form interactions more manageable:

  • await page.type('input#username', 'your-username'): Fills the username input field with the provided value.
  • await page.type('input#password', 'your-password'): Simulates typing the provided password into the password input field.
  • await page.click('button#submit-button'): Simulates a click on the submit button, triggering form submission.

The code might look like this:

// Filling and submitting a form
await page.type('input#username', 'your-username');
await page.type('input#password', 'your-password');
await page.click('button#submit-button');

await browser.close();

Navigation events

Navigation events refer to specific occurrences or actions related to navigating a web page.

In the context of web development, these events are typically associated with changes in the browser’s location or web page loading and unloading. These events allow developers to capture and respond to various stages of the navigation process, enabling them to create more dynamic and interactive web applications.

Some examples of common navigation events include a BeforeUnload Event, which is fired just before a page is unloaded, and a DOMContentLoaded Event, which is fired when an initial HTML document has been completely loaded and parsed.

This code snippet shows how to handle navigation events in Puppeteer during web scraping. Puppeteer reports navigations through the page’s framenavigated event.

// Listening for navigation events
page.on('framenavigated', (frame) => {
  // only report navigations of the top-level frame
  if (frame === page.mainFrame()) {
    console.log('Page Navigated:', frame.url());
    // Additional logic after each navigation event
  }
});

Here’s a quick breakdown of what’s happening in this code snippet:

  • page.on('framenavigated', (frame) => { … }): Sets up an event listener that fires whenever a frame in the page navigates to a new URL. The check against page.mainFrame() ignores navigations inside embedded iframes.
  • console.log('Page Navigated:', frame.url()): After each event, logs the URL, which helps you understand the website structure and make decisions for future scraping actions.

Can You Use Puppeteer-Extra-Plugin-Stealth Python?

Technically, you cannot use the exact Puppeteer-Extra-Plugin-Stealth in Python. Don’t give up hope yet, though.

There is an unofficial Python port of Puppeteer known as Pyppeteer, which allows you to experience similar benefits.

Pyppeteer automates a Chromium browser with code and allows Python developers to access JavaScript-rendering capabilities, interact with modern websites, and better simulate human behavior.

Pyppeteer comes with a headless browser mode, giving the full functionality of a browser but without a graphical user interface, which increases speed and saves memory.

You can also use Pyppeteer Stealth, which functions similarly to Puppeteer Stealth.

Puppeteer Extra Plugin Stealth Not Working: What to Do?

It’s normal to run into some challenges when you first start using Puppeteer-Extra-Plugin-Stealth. The following are some of the most common obstacles you might face, with tips on how to troubleshoot them:

Anti-scraping mechanism detection

Say your scraping activities have been detected by anti-scraping mechanisms, resulting in restrictions or blocks. This issue could stem from insufficient stealth measures.

To combat it, you can vary the User-Agent your sessions present. The Stealth plugin’s user-agent-override evasion removes the headless marker from the User-Agent, but it does not rotate it, so sending the same string in every session can still stand out.

Here’s an example of the code you can use to set a different User-Agent on each new page:

// userAgents is your own array of realistic User-Agent strings
await page.setUserAgent(userAgents[Math.floor(Math.random() * userAgents.length)]);

Unsuccessful form interactions

Websites often use anti-bot measures to detect when you are using automated tools that fill out forms. If you add small delays between keystrokes, you can make these interactions seem more human-like and reduce your chances of being flagged as a bot.

To optimize form interaction code and add those minute delays between keystrokes, you can use the following code:

await page.type('input#username', 'your-username', { delay: 50 });
await page.type('input#password', 'your-password', { delay: 50 });
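
A fixed delay produces perfectly even keystrokes, which is itself a little unnatural. As a sketch of one possible refinement, the helper below types one character at a time with a randomized pause after each key. The 40 to 160 millisecond range and the function names are assumptions for illustration.

```javascript
// Random integer delay in [minMs, maxMs]; `rng` is injectable for testing.
function jitter(minMs, maxMs, rng = Math.random) {
  return minMs + Math.floor(rng() * (maxMs - minMs + 1));
}

// Type text one character at a time with a different pause after each key.
async function typeLikeHuman(page, selector, text, minMs = 40, maxMs = 160) {
  await page.focus(selector);
  for (const char of text) {
    await page.keyboard.type(char);
    await new Promise((resolve) => setTimeout(resolve, jitter(minMs, maxMs)));
  }
}
```

For example, await typeLikeHuman(page, 'input#username', 'your-username') would stand in for the first page.type call above.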

Challenges with IP blocking and CAPTCHAs

Websites sometimes block IP addresses that have been associated with scraping activities. Proxy rotation helps you avoid IP restrictions, and CAPTCHA-solving services help with handling other challenges that can arise during the scraping process.

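Here’s an example that routes each browser launch through a randomly chosen proxy using Chromium’s --proxy-server flag:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

// placeholder proxy addresses; substitute your own
const proxies = ['http://proxy1:8000', 'http://proxy2:8000'];
const proxy = proxies[Math.floor(Math.random() * proxies.length)];

// route this browser instance through the chosen proxy (run inside an async function)
const browser = await puppeteer.launch({ args: [`--proxy-server=${proxy}`] });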
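
Many paid proxies require a username and password. Chromium’s --proxy-server flag does not accept credentials, so they are usually supplied separately with page.authenticate(). The sketch below shows one way to do that; the helper names are hypothetical, and any proxy URL you pass in is your own.

```javascript
// Split a proxy URL into the part Chromium accepts via --proxy-server and
// the credentials, which must be supplied separately.
function parseProxy(proxyUrl) {
  const url = new URL(proxyUrl);
  return {
    server: `${url.protocol}//${url.host}`,
    username: decodeURIComponent(url.username),
    password: decodeURIComponent(url.password),
  };
}

// Launch a browser through the proxy and authenticate the first page.
// `puppeteer` is the puppeteer-extra object with plugins already registered.
async function launchWithProxy(puppeteer, proxyUrl) {
  const { server, username, password } = parseProxy(proxyUrl);
  const browser = await puppeteer.launch({ args: [`--proxy-server=${server}`] });
  const page = await browser.newPage();
  if (username) {
    await page.authenticate({ username, password });
  }
  return { browser, page };
}
```

Credentials set with page.authenticate() apply to that page only, so repeat the call for every additional page you open on the same browser.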

Failures with page-loading

Slow-loading pages and network issues can result in page load failures. By increasing the timeout beyond Puppeteer’s 30-second default and configuring the waitUntil option, you can ensure the script allows sufficient time for successful page-loading.

You can use this code when dealing with this issue:

const page = await browser.newPage();
await page.goto('https://example.com', { waitUntil: 'domcontentloaded', timeout: 60000 });
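
When failures are intermittent, retrying the navigation can help more than a longer timeout alone. The following is a minimal sketch of a retry loop with exponential backoff; the three attempts, the one-second base delay, and the function names are assumptions for illustration.

```javascript
// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... for attempt 0, 1, 2, ...
function backoffDelay(attempt, baseMs = 1000) {
  return baseMs * 2 ** attempt;
}

// Try page.goto up to `attempts` times, waiting longer after each failure.
async function gotoWithRetry(
  page,
  url,
  attempts = 3,
  options = { waitUntil: 'domcontentloaded', timeout: 60000 }
) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await page.goto(url, options);
    } catch (error) {
      lastError = error;
      if (attempt < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
      }
    }
  }
  throw lastError; // every attempt failed; surface the last error to the caller
}
```

With the defaults, the script waits one second after the first failure and two seconds after the second before giving up.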

Issues with element selection

If your selectors are too generic or don’t wait for certain elements to be present, you can encounter element selection issues. By using specific and robust selectors, you can ensure more reliable interaction with targeted elements, which in turn reduces selection issues.

This code may help you navigate this particular challenge:

await page.waitForSelector('div#targetElement', { timeout: 5000 });
const targetElement = await page.$('div#targetElement');

Other Helpful Puppeteer Extra Plugins

Beyond Puppeteer-Extra-Plugin-Stealth, Puppeteer Extra has numerous other plugins that add extra functionalities. Here are some other useful ones you might want to utilize:

  • puppeteer-extra-plugin-recaptcha: Solves reCAPTCHAs and hCaptchas automatically. It does this by automating third-party CAPTCHA-solving services like 2Captcha.
  • puppeteer-extra-plugin-proxy: Allows you to route requests through proxies, which helps avoid rate limiting in web scraping, and keeps proxy setup out of your scraping logic.
  • puppeteer-extra-plugin-adblocker: Removes ads and trackers, reducing bandwidth and load times.
  • puppeteer-extra-plugin-devtools: Makes browser debugging possible from anywhere by creating a secure tunnel to the DevTools.
  • puppeteer-extra-plugin-repl: Improves debugging with an interactive REPL (Read-Eval-Print Loop) interface, which allows you to execute Puppeteer commands straight from the command line.
  • puppeteer-extra-plugin-block-resources: Dynamically blocks page resources, such as images, media, CSS, and JS files.
  • puppeteer-extra-plugin-anonymize-ua: Anonymizes the User-Agent header on page navigation.
  • puppeteer-extra-plugin-user-preferences: Sets custom Chrome/Chromium user preferences, including enabling geolocation.

Conclusion

There you have it — everything you need to know about using the Puppeteer-Extra-Plugin-Stealth to stay unblocked while web scraping and carrying out other tasks.

Follow the guidelines shared above to simplify these processes and make your web scraping more effective and efficient.

If you need extra help with web scraping, Rayobyte has got you covered. We are an award-winning proxy provider — the largest one based in the United States — with a firm commitment to reliability and ethics. We offer a variety of services, from multiple proxy services to boutique data scraping.

Start your free trial today to see what our services can do for you.

The information contained within this article, including information posted by official staff, guest-submitted material, message board postings, or other third-party material is presented solely for the purposes of education and furtherance of the knowledge of the reader. All trademarks used in this publication are hereby acknowledged as the property of their respective owners.
