Replies – Discussions – Lena Celsa

Lena Celsa

Member

02/11/2025 at 9:49 am in reply to: How to scrape search results using a DuckDuckGo proxy with JavaScript?

Scraping DuckDuckGo search results through a proxy lets you gather data without exposing your own IP address. Many people reach for Puppeteer (a headless browser automation tool), but launching a full browser for every query can be resource-intensive. If you are not tied to JavaScript, a lighter approach is Python’s requests library with a proxy, combined with BeautifulSoup for parsing the HTML.
Why Use a Proxy?
Avoid IP blocks – DuckDuckGo may limit repeated queries from the same IP.
Bypass geographic restrictions – Useful if you want results from different regions.
Improve anonymity – Keeps your real IP hidden.
A Python Approach with requests and BeautifulSoup
Instead of using a headless browser, you can send requests directly to DuckDuckGo’s search page and parse the results. Here’s how:

python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Define the search query
query = "web scraping tools"
duckduckgo_url = "https://html.duckduckgo.com/html/"

# Set up a proxy (replace the placeholders with your proxy's host and port)
proxies = {
    "http": "http://your-proxy-server:port",
    "https": "http://your-proxy-server:port",
}

# Custom headers to mimic a real browser
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
}

# Send a request via the proxy; params= URL-encodes the query for us
response = requests.get(
    duckduckgo_url,
    params={"q": query},
    headers=headers,
    proxies=proxies,
    timeout=10,
)
# Stop early on HTTP errors instead of parsing an error page
response.raise_for_status()

# Parse the response using BeautifulSoup
soup = BeautifulSoup(response.text, "html.parser")

# Extract search results
results = []
for result in soup.select(".result"):
    title = result.select_one(".result__title")
    link = result.select_one(".result__url")
    snippet = result.select_one(".result__snippet")
    if title and link and snippet:
        results.append({
            "title": title.text.strip(),
            # href may be relative or protocol-relative; resolve it against the site root
            "link": urljoin("https://duckduckgo.com", link.get("href", "")),
            "snippet": snippet.text.strip(),
        })

# Print extracted results
for r in results:
    print(r)
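
One thing to watch for: the link values collected from the results page may be DuckDuckGo redirect URLs rather than the destination itself. The sketch below assumes the redirect has the form //duckduckgo.com/l/?uddg=<encoded target>, which is worth checking against the HTML you actually receive. The helper name resolve_ddg_link is made up for this example.

```python
from urllib.parse import urlparse, parse_qs


def resolve_ddg_link(href: str) -> str:
    """Return the target of a DuckDuckGo redirect link, else the link itself
    (with a scheme added if it was protocol-relative)."""
    # Protocol-relative links ("//duckduckgo.com/...") need a scheme before parsing
    if href.startswith("//"):
        href = "https:" + href
    parsed = urlparse(href)
    # Assumption: redirect links live under /l/ and carry the target in "uddg"
    if parsed.netloc.endswith("duckduckgo.com") and parsed.path.startswith("/l/"):
        target = parse_qs(parsed.query).get("uddg")
        if target:
            return target[0]  # parse_qs has already percent-decoded the value
    return href
```

You could call resolve_ddg_link(link.get("href", "")) when building each result so the stored link points at the real page.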

Why Use This Approach Instead of Puppeteer?
Faster Execution – No need to load an entire browser.
Lower Resource Usage – Uses simple HTTP requests instead of launching a Chromium instance.
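Fewer Automation Fingerprints – Plain HTTP requests avoid the tell-tale signs of a headless Chromium instance, although a site can still flag them through header, TLS, or missing-JavaScript checks.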
Handling Anti-Scraping Measures
DuckDuckGo is relatively scraper-friendly, but for tougher sites, consider:
Rotating User-Agents – Change headers with different browsers.
Using Residential Proxies – More trustworthy than data center IPs.
Introducing Random Delays – Mimic human behavior to avoid rate limiting.
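
A rough sketch of the first and third points, using only the standard library. The User-Agent strings are examples I picked for illustration, and random_headers and polite_delay are hypothetical helper names, not part of requests.

```python
import random
import time

# Illustrative pool; swap in current, real browser strings for actual use
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
]


def random_headers() -> dict:
    """Build request headers with a User-Agent picked at random from the pool."""
    return {"User-Agent": random.choice(USER_AGENTS)}


def polite_delay(min_s: float = 2.0, max_s: float = 6.0) -> float:
    """Sleep for a random time between min_s and max_s seconds and return it."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay
```

In a scraping loop you would pass headers=random_headers() to requests.get and call polite_delay() between queries. The 2 to 6 second range is a guess; tune it to whatever the target site tolerates.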

This reply was modified 1 month, 3 weeks ago by Lena Celsa.

Lena Celsa

Member

11/14/2024 at 8:04 am in reply to: Should I learn PHP or Node.js for backend development in 2024?

If you’re working with legacy systems or building a CMS, PHP might be a better choice, but Node.js is more popular for modern web applications.

Lena Celsa

Member

11/14/2024 at 8:04 am in reply to: How does the performance of Go compare to Python in web applications?

If you’re building high-traffic web applications, Go will scale better with its efficient concurrency model.

Lena Celsa

Member

11/14/2024 at 8:03 am in reply to: Why is Rust becoming so popular for systems programming?

It’s ideal for developers who need the speed of C++ but with more safety guarantees, especially in embedded systems and operating systems.

Lena Celsa

Member

11/14/2024 at 8:02 am in reply to: How does C# compare to Java for enterprise-level applications?

Both languages have excellent frameworks for building large systems, but Java’s platform independence gives it a slight edge for multi-environment deployments.
