

Lena Celsa
Forum Replies Created
-
Lena Celsa
Member
02/11/2025 at 9:49 am in reply to: How to scrape search results using a DuckDuckGo proxy with JavaScript?
Scraping DuckDuckGo search results through a proxy is a great way to gather data while maintaining anonymity. While many opt for Puppeteer (a headless browser automation tool), it can be resource-intensive. A more lightweight and efficient approach is using Python’s requests library with a proxy, combined with BeautifulSoup for parsing the HTML.
Why Use a Proxy?
Avoid IP blocks – DuckDuckGo may limit repeated queries from the same IP.
Bypass geographic restrictions – Useful if you want results from different regions.
Improve anonymity – Keeps your real IP hidden.
A Python Approach with requests and BeautifulSoup
Instead of using a headless browser, you can send requests directly to DuckDuckGo’s HTML-only search page and parse the results. Here’s how:

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Define the search query
query = "web scraping tools"
duckduckgo_url = "https://html.duckduckgo.com/html/"

# Set up a proxy (replace the placeholder host and port with your own)
proxies = {
    "http": "http://your-proxy-server:port",
    "https": "http://your-proxy-server:port",
}

# Custom headers to mimic a real browser
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
}

# Send a request via the proxy; passing the query through params URL-encodes it
response = requests.get(
    duckduckgo_url,
    params={"q": query},
    headers=headers,
    proxies=proxies,
    timeout=10,
)
response.raise_for_status()  # Stop on an HTTP error instead of parsing an error page

# Parse the response using BeautifulSoup
soup = BeautifulSoup(response.text, "html.parser")

# Extract search results
results = []
for result in soup.select(".result"):
    title = result.select_one(".result__title")
    link = result.select_one(".result__url")
    snippet = result.select_one(".result__snippet")
    if title and link and snippet:
        results.append({
            "title": title.text.strip(),
            # hrefs may be relative or protocol-relative, so resolve them against the site root
            "link": urljoin("https://duckduckgo.com", link.get("href", "")),
            "snippet": snippet.text.strip(),
        })

# Print extracted results
for r in results:
    print(r)
```
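One thing to watch for is that the href values on a result are not always the final target URL. As a minimal sketch, assuming the HTML endpoint sometimes wraps links in a redirect of the form `//duckduckgo.com/l/?uddg=<encoded target>` (an assumption about the current markup, so check it against a live response), a small helper can unwrap them. The name `resolve_result_link` is hypothetical and only for illustration.

```python
from urllib.parse import parse_qs, urljoin, urlparse

DDG_BASE = "https://duckduckgo.com"  # assumed base for relative hrefs


def resolve_result_link(href: str) -> str:
    """Return the target URL for a result href, unwrapping a redirect if present."""
    # Turn relative and protocol-relative hrefs into absolute URLs
    absolute = urljoin(DDG_BASE, href)
    parsed = urlparse(absolute)
    # Assumption: redirect links look like /l/?uddg=<percent-encoded target>
    if parsed.netloc.endswith("duckduckgo.com") and parsed.path.startswith("/l/"):
        target = parse_qs(parsed.query).get("uddg")
        if target:
            return target[0]  # parse_qs has already percent-decoded the value
    return absolute
```

You would call it on each extracted href, for example `resolve_result_link(link.get("href", ""))`, before storing the result.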
Why Use This Approach Instead of Puppeteer?
Faster Execution – No need to load an entire browser.
Lower Resource Usage – Uses simple HTTP requests instead of launching a Chromium instance.
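Less Detectable – With realistic headers, plain requests to the HTML-only endpoint avoid the headless-browser fingerprints that some bot detectors look for, though sites that rely on JavaScript checks can still flag them.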
Handling Anti-Scraping Measures
DuckDuckGo is relatively scraper-friendly, but for tougher sites, consider:
Rotating User-Agents – Change headers with different browsers.
Using Residential Proxies – More trustworthy than data center IPs.
Introducing Random Delays – Mimic human behavior to avoid rate limiting.
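Here is a rough sketch of how rotating User-Agents and random delays could fit together. The User-Agent strings are illustrative placeholders, the delay range is an arbitrary assumption, and `pick_headers` and `fetch_politely` are hypothetical names. The `session` argument is anything with a requests-style `get()`, such as `requests.Session()`.

```python
import random
import time

# Illustrative pool only; swap in User-Agent strings you have checked yourself
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
]


def pick_headers():
    """Return request headers with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}


def fetch_politely(session, urls, proxies=None, min_delay=2.0, max_delay=6.0,
                   sleep=time.sleep):
    """Fetch each URL with a fresh User-Agent, pausing randomly between requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait a random interval before every request after the first
            sleep(random.uniform(min_delay, max_delay))
        response = session.get(url, headers=pick_headers(), proxies=proxies, timeout=10)
        response.raise_for_status()
        pages.append(response.text)
    return pages
```

With requests installed, a call might look like `fetch_politely(requests.Session(), urls, proxies=proxies)`. To rotate proxies as well, you could pass a different `proxies` dict per batch of URLs.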
This reply was modified 1 week, 4 days ago by Lena Celsa.
-
Lena Celsa
Member
11/14/2024 at 8:04 am in reply to: Should I learn PHP or Node.js for backend development in 2024?
If you’re working with legacy systems or building a CMS, PHP might be a better choice, but Node.js is more popular for modern web applications.
-
Lena Celsa
Member
11/14/2024 at 8:04 am in reply to: How does the performance of Go compare to Python in web applications?
If you’re building high-traffic web applications, Go will scale better with its efficient concurrency model.
-
Lena Celsa
Member
11/14/2024 at 8:03 am in reply to: Why is Rust becoming so popular for systems programming?
It’s ideal for developers who need the speed of C++ but with more safety guarantees, especially in embedded systems and operating systems.
-
Lena Celsa
Member
11/14/2024 at 8:02 am in reply to: How does C# compare to Java for enterprise-level applications?
Both languages have excellent frameworks for building large systems, but Java’s platform independence gives it a slight edge for multi-environment deployments.