
  • How to scrape search results using a DuckDuckGo proxy with JavaScript?

    Posted by Raza Kenya on 12/10/2024 at 9:33 am

    Scraping search results through a DuckDuckGo proxy can be a powerful way to gather information without revealing your identity. JavaScript with Puppeteer is an excellent tool for the job: it automates a real browser and can route requests through a proxy server. Start by configuring a proxy in Puppeteer to route your traffic. Then navigate to the DuckDuckGo search page, perform a search query, and extract the data you need, such as titles, URLs, and snippets. Managing request headers and delays helps your scraper mimic human behavior and avoid detection. Here’s an example using Puppeteer to scrape search results through a proxy:

    const puppeteer = require('puppeteer');
    
    (async () => {
        // Launch Chromium and route all traffic through the proxy server
        const browser = await puppeteer.launch({
            headless: true,
            args: ['--proxy-server=http://your-proxy-server:port']
        });
        const page = await browser.newPage();
        await page.goto('https://duckduckgo.com/');
    
        // Type the query and submit it, waiting for the results page to load
        await page.type('input[name="q"]', 'web scraping tools');
        await Promise.all([
            page.waitForNavigation(),
            page.keyboard.press('Enter'),
        ]);
        await page.waitForSelector('.result');
    
        // Extract title, link, and snippet from each result
        // (selectors may need updating if DuckDuckGo changes its markup)
        const results = await page.evaluate(() => {
            return Array.from(document.querySelectorAll('.result')).map(result => ({
                title: result.querySelector('.result__title')?.innerText.trim(),
                link: result.querySelector('.result__url')?.href,
                snippet: result.querySelector('.result__snippet')?.innerText.trim(),
            }));
        });
        console.log(results);
        await browser.close();
    })();
    
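    The snippet above routes traffic through the proxy but doesn’t yet set custom headers or pace its actions. As a minimal sketch of those two ideas (the User-Agent string and the delay range are illustrative placeholders, not required values), you could add something like this right after creating the page:

    // Hedged sketch: present a realistic User-Agent and extra headers,
    // then pause for a random interval to mimic human pacing.
    await page.setUserAgent(
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
        '(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'
    );
    await page.setExtraHTTPHeaders({ 'Accept-Language': 'en-US,en;q=0.9' });
    
    // Random delay helper; call it between actions (e.g. before typing the query)
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
    await sleep(1000 + Math.random() * 2000); // wait 1-3 seconds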

    Using a proxy helps bypass geographic restrictions and avoid rate limiting, especially for repeated or automated searches. Waiting for dynamically loaded content before extracting ensures you capture the full set of results. How do you handle websites with strict anti-scraping measures like DuckDuckGo?

  • 1 Reply
  • Lena Celsa

    Member
    02/11/2025 at 9:49 am

    Scraping DuckDuckGo search results through a proxy is a great way to gather data while maintaining anonymity. While many opt for Puppeteer (a headless browser automation tool), it can be resource-intensive. A more lightweight and efficient approach is using Python’s requests library with a proxy, combined with BeautifulSoup for parsing the HTML.
    Why Use a Proxy?
    • Avoid IP blocks – DuckDuckGo may limit repeated queries from the same IP.
    • Bypass geographic restrictions – Useful if you want results from different regions.
    • Improve anonymity – Keeps your real IP hidden.
    A Python Approach with requests and BeautifulSoup
    Instead of driving a headless browser, you can send requests directly to DuckDuckGo’s HTML endpoint (html.duckduckgo.com) and parse the results. Here’s how:

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin
    
    # Define the search query (requests URL-encodes it via the params argument)
    query = "web scraping tools"
    duckduckgo_url = "https://html.duckduckgo.com/html/"
    
    # Set up a proxy
    proxies = {
        "http": "http://your-proxy-server:port",
        "https": "http://your-proxy-server:port",
    }
    
    # Custom headers to mimic a real browser
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
    }
    
    # Send a request via the proxy
    response = requests.get(duckduckgo_url, params={"q": query}, headers=headers, proxies=proxies)
    response.raise_for_status()
    
    # Parse the response using BeautifulSoup
    soup = BeautifulSoup(response.text, "html.parser")
    
    # Extract search results (urljoin resolves relative or protocol-relative hrefs)
    results = []
    for result in soup.select(".result"):
        title = result.select_one(".result__title")
        link = result.select_one(".result__url")
        snippet = result.select_one(".result__snippet")
        if title and link and snippet:
            results.append({
                "title": title.text.strip(),
                "link": urljoin("https://duckduckgo.com/", link.get("href")),
                "snippet": snippet.text.strip(),
            })
    
    # Print extracted results
    for r in results:
        print(r)
    

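    One thing to watch: on the html.duckduckgo.com endpoint, result hrefs are often redirect links through duckduckgo.com/l/ with the real target URL in a uddg query parameter. If you see that pattern, a small helper like this can recover the direct link (the function name is mine, not a library API, and the redirect format may change):

    from urllib.parse import urlparse, parse_qs
    
    def resolve_ddg_redirect(href):
        # If href is a duckduckgo.com/l/ redirect, return the real target URL;
        # parse_qs already percent-decodes the uddg value.
        parsed = urlparse(href)
        if parsed.path.startswith("/l/"):
            target = parse_qs(parsed.query).get("uddg")
            if target:
                return target[0]
        return href  # already a direct link
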
    Why Use This Approach Instead of Puppeteer?
    • Faster Execution – No need to load an entire browser.
    • Lower Resource Usage – Uses simple HTTP requests instead of launching a Chromium instance.
    • Less Detectable – Looks more like a real user than a headless browser bot.

    Handling Anti-Scraping Measures
    DuckDuckGo is relatively scraper-friendly, but for tougher sites, consider:
    • Rotating User-Agents – Change headers to advertise different browsers (see the sketch below).
    • Using Residential Proxies – More trustworthy than data center IPs.
    • Introducing Random Delays – Mimic human behavior to avoid rate limiting.
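
    As a minimal sketch of the rotating User-Agents and random delays points, assuming the plain-requests setup from above (the User-Agent strings, delay bounds, and the polite_get helper name are illustrative, not required values):

    import random
    import time
    import requests
    
    # A small pool of User-Agent strings to rotate through (illustrative values)
    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
        "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
    ]
    
    def polite_get(url, proxies=None):
        # Pick a fresh User-Agent per request and pause a random 2-6 seconds
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        time.sleep(random.uniform(2, 6))
        return requests.get(url, headers=headers, proxies=proxies, timeout=30)
    
    # Usage: response = polite_get("https://html.duckduckgo.com/html/?q=web+scraping+tools")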

