Build a Bing Scraper in Python to Extract Search Results
Bing is a major search engine whose results hold valuable data for market analysis. In this tutorial, we’ll show you how to build a Bing scraper using Python. You’ll learn how to extract search results, including titles, URLs, and snippets, to analyze Bing’s SERPs and gain insights into search performance.
Table of Contents
Introduction
Why Scrape Bing?
Prerequisites
Step 1: Import the Libraries and Configure Logging
Step 2: Initialize the Browser
Step 3: Search Bing
Step 4: Scrape Search Results
Step 5: Save Results to CSV
Step 6: Putting It All Together
Expected Output
Best Practices for Scraping
Conclusion
Introduction
Bing is often overshadowed by Google but remains an essential source of information and insights for businesses and researchers alike. Scraping Bing lets you gather data for tracking search trends, analyzing competitors, and measuring content performance. In this guide, we will use Playwright, a browser automation library, to build a Bing scraper.
Why Scrape Bing?
Scraping Bing can provide numerous benefits, such as:
- Market Analysis: Understanding what competitors are doing.
- SEO Insights: Analyzing keywords and their performance.
- Content Strategy: Identifying trending topics and gaps in the market.
By extracting data from Bing’s search engine results pages (SERPs), businesses can make informed decisions based on current search data.
Prerequisites
Before diving into the code, ensure you have the following tools installed:
- Python 3.8 or higher
- The following Python libraries:
- playwright
- pandas
- logging (part of Python's standard library, so it does not need to be installed)
Install the two third-party libraries using pip:

```shell
pip install playwright pandas
```

Note: Initialize Playwright with the following command to download the necessary browser binaries:

```shell
playwright install
```
Step 1: Import the Libraries and Configure Logging
Start by importing the necessary libraries and configuring logging:
```python
from playwright.sync_api import sync_playwright
import pandas as pd
import logging

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
```
This sets up logging to capture important events during the scraping process.
Step 2: Initialize the Browser
Next, create a function that initializes the Playwright browser:
```python
def initialize_browser():
    playwright = sync_playwright().start()
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    return playwright, browser, page
```
This function starts Playwright, launches a Chromium browser instance, and opens a new page for navigation.
Step 3: Search Bing
To scrape search results from Bing, you need to navigate to its search URL:
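```python
from urllib.parse import quote_plus

def search_bing(page, query):
    # quote_plus turns spaces into "+" and escapes characters such as "&" and "#"
    url = f"https://www.bing.com/search?q={quote_plus(query)}"
    page.goto(url)
    page.wait_for_timeout(5000)  # Wait for results to load
```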
This function constructs the URL based on the provided query and navigates to it.
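Replacing spaces with "+" works for simple queries, but characters such as "&", "#", or "+" have special meaning in a URL and can corrupt the query string. The sketch below contrasts that approach with urllib.parse.quote_plus from the standard library; the helper names naive_bing_url and encoded_bing_url are hypothetical and exist only for this comparison.

```python
from urllib.parse import quote_plus

BING_SEARCH = "https://www.bing.com/search?q="

def naive_bing_url(query):
    # Only spaces are handled; "&" and "+" pass through unescaped
    return BING_SEARCH + query.replace(" ", "+")

def encoded_bing_url(query):
    # quote_plus percent-encodes reserved characters and turns spaces into "+"
    return BING_SEARCH + quote_plus(query)

query = "c++ & rust"
print(naive_bing_url(query))
print(encoded_bing_url(query))
```

In the naive URL (q=c+++&+rust), the "+" signs decode to spaces and the "&" starts a new parameter, so Bing would receive essentially just "c" as the query. The encoded URL (q=c%2B%2B+%26+rust) keeps the whole query intact.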
Step 4: Scrape Search Results
After performing a search, you can extract titles, URLs, and snippets from the results:
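```python
from urllib.parse import quote_plus

def scrape_results(page, query, max_pages=5):
    results = []
    for current_page in range(max_pages):
        # Bing appears to treat "first" as the 1-based position of the first result:
        # 1 for page one, 11 for page two, and so on (about 10 results per page)
        start_index = current_page * 10 + 1

        # Construct the URL for the current page
        url = f"https://www.bing.com/search?q={quote_plus(query)}&first={start_index}"
        page.goto(url)
        page.wait_for_timeout(5000)  # Wait for results to load
        logger.info(f"Scraping results from page {current_page + 1}...")

        # Each organic result is an <li class="b_algo"> element
        for element in page.locator("li.b_algo").all():
            # .first avoids strict-mode errors when a selector matches several elements;
            # count() guards against results without a title link or snippet paragraph
            title_el = element.locator("h2").first
            link_el = element.locator("h2 a").first
            snippet_el = element.locator("p").first

            title = title_el.inner_text() if title_el.count() else ""
            link = link_el.get_attribute("href") if link_el.count() else ""
            snippet = snippet_el.inner_text() if snippet_el.count() else ""

            results.append({"Title": title, "URL": link, "Snippet": snippet})
    return results
```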
This function scrapes up to five pages of search results. For each page, it builds a URL with Bing's first parameter, which sets the position of the first result shown on that page, and then collects the title, URL, and snippet of every organic result.
Note: By default, this scraper stops after page 5. Increase max_pages to scrape more pages.
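In practice, Bing seems to treat the first parameter as a 1-based position, so page one starts at first=1, page two at first=11, and so on. That is an observation about Bing's current behavior rather than a documented contract, so treat it as an assumption. Under that assumption, a hypothetical helper that turns max_pages into the offsets to request could look like this:

```python
def first_offsets(max_pages, per_page=10):
    """Return the 1-based 'first' value for each results page.

    Assumes Bing shows roughly `per_page` organic results per page.
    """
    return [page * per_page + 1 for page in range(max_pages)]

print(first_offsets(5))
```

Because Bing does not always show exactly ten organic results, these offsets are approximate and consecutive pages can still overlap.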
Step 5: Save Results to CSV
Once you’ve extracted the data, save it in a structured format like CSV for further analysis:
```python
def save_results_to_csv(results, filename="bing_results.csv"):
    df = pd.DataFrame(results)
    df.to_csv(filename, index=False)
    logger.info(f"Results saved to {filename}")
```
This function converts the scraped data into a DataFrame and saves it as a CSV file.
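pandas is convenient here, but if you only need to write the file, Python's built-in csv module can do the same job without the extra dependency. A sketch under that assumption; save_results_with_csv_module is a hypothetical name, and it expects the same list of dictionaries (with Title, URL, and Snippet keys) that scrape_results() returns:

```python
import csv

def save_results_with_csv_module(results, filename="bing_results.csv"):
    # Column order matches the keys produced by scrape_results()
    fieldnames = ["Title", "URL", "Snippet"]
    # newline="" lets the csv writer control line endings; utf-8 keeps non-ASCII titles intact
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results)
```

The csv writer quotes any field that contains a comma or a double quote, which is why titles like "Technology News, Research & Innovations - SciTechDaily" come out wrapped in quotes.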
Step 6: Putting It All Together
Finally, tie everything together in a main function:
```python
def main():
    query = "latest technology news"  # Replace with your desired search query

    # Initialize browser
    playwright, browser, page = initialize_browser()

    try:
        # Search by navigating directly to the URL of the first page
        search_bing(page, query)

        # Scrape up to 5 pages using constructed URLs
        results = scrape_results(page, query, max_pages=5)

        # Save results to CSV file
        save_results_to_csv(results)
    except Exception as e:
        logger.error(f"Unexpected error: {e}")
    finally:
        # Close the browser after a short delay
        page.wait_for_timeout(5000)
        browser.close()
        playwright.stop()


if __name__ == "__main__":
    main()
```
This main function orchestrates all previous steps by initializing the browser, performing a search on Bing, scraping results across multiple pages, and saving them to a CSV file.
Expected Output
Once you run the scraper, a CSV file named bing_results.csv will be created in your working directory. Since we set max_pages=5, you can expect around 50 results (roughly 10 per page across 5 pages), although the exact count can vary because Bing does not always return exactly 10 organic results per page. Here’s an example of what the contents of the CSV file might look like:
```csv
Title,URL,Snippet
Tech News | Today's Latest Technology News | Reuters,https://www.reuters.com/technology/,"23 hours ago · Find latest technology news from every corner of the globe at Reuters.com, your online source for breaking international news coverage."
"Technology News, Research & Innovations - SciTechDaily",https://scitechdaily.com/news/technology/,"Stay updated on the latest technology news and breakthroughs from various fields, such as artificial intelligence, robotics, green technologies, and more. SciTechDaily covers the most …"
Google News - Technology - Latest,https://news.google.com/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRGRqTVhZU0FtVnVHZ0pWVXlnQVAB,"Find out the latest news and trends in technology, from mobile gadgets and internet to artificial intelligence and computing. Browse headlines, videos and articles from various sources and …"
Tech - The Verge,https://www.theverge.com/tech,"From top companies like Google and Apple to tiny startups vying for your attention, Verge Tech has the latest in what matters in technology daily. Among other newly announced changes, a..."
,https://www.livemint.com/technology/latest-technology-news-today-on-december-13-2024-live-updates-11734048917828.html,
The Latest News in Technology | PCMag,https://www.pcmag.com/news,"Get the latest technology news and in-depth analysis from the expert analysts at PCMag. Share your opinions about the software and services you use for personal finances, stock trading, paying..."
,https://www.gadgets360.com/news,
,https://apnews.com/technology,
Technology | Latest News & Updates - BBC,https://www.bbc.co.uk/news/technology,"Get all the latest news, live updates and content about Technology from across the BBC."
TechCrunch | Startup and Technology News,https://techcrunch.com/,"Nov 16, 2024 · TechCrunch | Reporting on the business of technology, startups, venture capital funding, and Silicon Valley"
Engadget | Technology News & Reviews,https://www.engadget.com/,"2 days ago · Find the latest technology news and expert tech product reviews. Learn about the latest gadgets and consumer tech products for entertainment, gaming, lifestyle and more."
10 Breakthrough Technologies 2024 | MIT Technology …,https://www.technologyreview.com/2024/01/08/1085094/10-breakthrough-technologies-2024/,"Jan 8, 2024 · The latest iteration of a legacy. Founded at the Massachusetts Institute of Technology in 1899, MIT Technology Review is a world-renowned, independent media company whose insight, analysis ..."
The Verge,https://www.theverge.com/,"The Verge is about technology and how it makes us feel. Founded in 2011, we offer our audience everything from breaking news to reviews to award-winning features and investigations, on our site ..."
TechCrunch | Startup and Technology News,https://techcrunch.com/,"Nov 16, 2024 · TechCrunch | Reporting on the business of technology, startups, venture capital funding, and Silicon Valley"
Technology | Latest News & Updates - BBC,https://www.bbc.co.uk/news/technology,"Get all the latest news, live updates and content about Technology from across the BBC."
Latest tech news - TNW,https://thenextweb.com/latest,"Latest tech news Upvest — which powers stock trading on Revolut, N26, Bunq — secures €100M. Siôn Geschwindt; 1 minute ago; What we learned after taking part in a 100-day innovation sprint ..."
TechRadar news and features | TechRadar,https://www.techradar.com/news,"17 hours ago · All of the latest technology news and features from TechRadar | TechRadar. Skip to main content. Open menu Close menu ... Sign up for breaking news, reviews, opinion, top tech deals, and more."
Latest News - TechCrunch,https://techcrunch.com/latest/,"1 day ago · Latest News. Headlines Only You need to be logged in to use this feature. Load More Climate. ... Humba Ventures raises $40M fund to invest in deep tech, defense tech ..."
Tech News - Latest Technology and Gadget News | Sky News,https://news.sky.com/science-climate-tech,"Sky News technology provides you with all the latest tech and gadget news, game reviews, Internet and web news across the globe. Visit us today."
News - Tom's Hardware,https://www.tomshardware.com/news,"2 days ago · Read the latest computer hardware news, analysis and opinions on Tom's Hardware and get a glimpse into the future of cutting edge tech. | Tom's Hardware"
Trending Tech News - TechSpot,https://www.techspot.com/trending/,"Discover the latest in display technology as we compare LCD vs. OLED, IPS vs. VA, and QD-OLED vs. WOLED. This explainer breaks down how these technologies work and which is right for you."
Tech News: Latest Technology News and Updates,https://timesofindia.indiatimes.com/technology/tech-news,"1 day ago · Stay updated with the latest technology news and updates from Times Of India. Read breaking news, reviews, and in-depth analysis of gadgets and tech trends."
The Latest Mobile Phone News and Analysis | PCMag,https://www.pcmag.com/news/categories/mobile-phones,"6 days ago · Latest Mobile Phone News Google Gives Android Users New Ways to Find, Stop Unwanted Trackers Users can now pause their phone's location sharing for up to 24 hours and track down the exact location ..."
Google News - Technology - Latest,https://news.google.com/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRGRqTVhZU0FtVnVHZ0pWVXlnQVAB,"Get the latest news from around the world with Google News, a personalized news aggregator."
Technology News: Get the latest Tech News with updates and …,https://www.livemint.com/technology,"Dec 6, 2024 · Technology News: Stay up-to-date with the latest in technology. Find out about launch dates and prices in India for iPhones, Samsung phones, iPads, and MacBooks"
Science News | The latest news from all areas of science,https://www.sciencenews.org/,"Nov 30, 2024 · Science News was founded in 1921 as an independent, nonprofit source of accurate information on the latest news of science, medicine and technology."
ScienceDaily: Your source for the latest research news,https://www.sciencedaily.com/,"2 days ago · ScienceDaily features breaking news about the latest discoveries in science, health, the environment, technology, and more -- from leading universities, scientific journals, and research ..."
"Technology News: Latest Smartphones, Tech Deals Today, New …",https://indianexpress.com/section/technology/tech-news-technology/,"3 days ago · Technology News, Latest Smartphones, Tech Deals Today, New Mobile Phones Launch India: Get trending tech news, mobile phones, laptops, news, software updates, video games, internet and other technology updates on gadgets at indianexpress.com"
technology latest news & coverage - CNA,https://www.channelnewsasia.com/topic/technology,"Oct 10, 2024 · Follow the latest news and comprehensive coverage on technology at CNA"
AI News & Artificial Intelligence - TechCrunch,https://techcrunch.com/category/artificial-intelligence/,"Read the latest on artificial intelligence and machine learning tech, the companies that are building them, and the ethical issues AI raises today."
Technology News & Reviews - Engadget,https://www.engadget.com/news/,"Find the latest technology news and expert tech product reviews. Learn about the latest gadgets and consumer tech products for entertainment, gaming, lifestyle and more."
Latest Cyber Security & Tech News - Cybernews,https://cybernews.com/news/,1 day ago · Representatives of five major tech companies have been invited to meet with US President-elect Donald Trump’s team in mid December and discuss ways to curb online sales of drugs. Read more about Team Trump wants to talk online drug sales with big tech
"Technology News - Latest Tech News Today, New …",https://www.ndtv.com/tech,"3 days ago · Latest Technology News and Daily Updates on Gadgets. Get trending tech news, mobile phones, laptops, reviews, software updates, video games, internet and other technology updates on gadgets from ..."
Gizmodo | The Future Is Here,https://gizmodo.com/,"1 day ago · Dive into cutting-edge tech, reviews and the latest trends with the expert team at Gizmodo. Your ultimate source for all things tech. ... Get the best tech, science, and culture news in your inbox ..."
"Tech News, Latest Technology, Mobiles, Laptops - Gadgets 360",https://www.gadgets360.com/,"Tech News, Latest technology news daily, new best tech gadgets reviews which include mobiles, tablets, laptops, video games. Being a tech news site we cover the latest tech news daily online from India and around the world, reviews, updates on technology today from companies like google, apple, samsung and others also new and upcoming mobiles, cameras, laptops, video …"
"Gadgets | Latest gadget news, updates & reviews on TechCrunch",https://techcrunch.com/category/gadgets/,"Read the latest news, updates and reviews on the latest gadgets in tech. Coverage includes smartphones, wearables, laptops, drones and consumer electronics."
How the top 10 emerging technologies of 2024 will impact the world,https://www.weforum.org/stories/2024/06/top-10-emerging-technologies-of-2024-impact-world/,"Jun 25, 2024 · The World Economic Forum's latest Top 10 Emerging Technologies of 2024 report – launched today and produced in collaboration with Frontiers ... Immersive technology for the built world and AI-driven blended reality tools could have critical parts to play in its cleaner future, helping anticipate challenges and optimize projects for delivery ..."
Lowyat.NET | Technology News Malaysia,https://www.lowyat.net/,"Malaysia's leading online publication delivering breaking tech news, gadgets and mobile phone reviews, internet technologies and much more. ... LATEST NEWS. Gaming. First Trailer Of The Witcher IV Breaks Cover At The Game Awards 2024. …"
Tech Xplore - Technology and Engineering news,https://techxplore.com/,"3 days ago · Google unveils latest AI model, Gemini 2.0. Google on Wednesday announced the launch of Gemini 2.0, its most advanced artificial intelligence model to date, as the world's tech giants race to take the lead in the fast developing technology."
Tech | CNN Business,https://www.cnn.com/business/tech,"View the latest technology headlines, gadget and smartphone trends, and insights from tech industry leaders. ... Latest Market News . Stanley recalls more than 2.5 million travel mugs over ..."
GeekWire – Breaking News in Technology & Business,https://www.geekwire.com/,"GeekWire is a fast-growing, national technology news site with strong roots in the Seattle region and a large audience of loyal, tech-savvy readers around the globe, who follow the site for ..."
টেকজুম.TV | বিজ্ঞান ও প্রযুক্তির খবর,https://techzoom.tv/,"বিজ্ঞান ও প্রযুক্তির সর্বশেষ খবর, দেশ ও বিদেশের নতুন পুরাতন প্রযুক্তি পণ্য, প্রযুক্তি সম্পর্কিত তথ্য এবং সুযোগ-সুবিধা সংক্রান্ত নিউজ পেতে ভিজিট করুন ..."
"Technology | Latest News, Photos & Videos | WIRED",https://www.wired.com/tag/technology/,"Find the latest Technology news from WIRED. See related science and technology articles, photos, slideshows and videos."
Technology - WSJ.com,https://www.wsj.com/tech,"Read the latest Technology news covering smartphone trends, AI, insights from industry leaders and Personal Tech columnists from the Wall Street Journal."
"The Latest Technology Product Reviews, News, Tips, and Deals",https://www.pcmag.com/,"PCMag is your complete guide to computers, peripherals and upgrades. We test and review tech products and services, report technology news and trends, and provide shopping advice with price ..."
Tech News - Latest Technology News - CCN.com,https://www.ccn.com/news/technology/,"The most recent news about the tech industry at CCN.com. Latest news about artificial intelligence, social media, autonomous driving and more."
Computing News - TechRadar,https://www.techradar.com/computing/news,"6 days ago · The MiniMate SSD offers fast, affordable storage expansion for Mac Mini users, seamlessly connecting via Thunderbolt, starting at $139.99. Get the best Black Friday deals direct to your inbox ..."
"Tech News: Latest Technology News, Smartphone, Mobiles, Gadget News ...",https://www.indiatoday.in/technology/news,"India Today Technology News: Find latest technology news, new mobile launch details, upcoming smartphones, news on laptops, cameras and gadgets at India Today"
China Tech News and Headlines - Caixin Global,https://www.caixinglobal.com/live/,"Dec 4, 2024 · Latest China tech news from Caixin Global. Caixin App; Newsletter; Go. Sections ... Tech Roundup: Huawei Takes a Bite of Apple’s Market Share, Nvidia Plunges Despite Big Earnings Didi invests $94 million in smart cockpit developer, Meituan shares jump 10% after results defy China consumer slump."
News - Computerworld,https://www.computerworld.com/news/,"2 days ago · Discover the latest enterprise technology news and analysis at Computerworld, the trusted source for IT professionals since 1967."
```
This CSV file will allow you to analyze the search results easily and extract valuable insights for your research or business needs.
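The sample output above lists some pages more than once: https://techcrunch.com/, https://www.bbc.co.uk/news/technology, and the Google News technology topic each appear twice. A minimal sketch of one way to clean that up before saving, assuming you treat the URL as the identity of a result; dedupe_results is a hypothetical helper, not part of the scraper above:

```python
def dedupe_results(results):
    """Keep the first occurrence of each URL and drop rows that have no URL."""
    seen = set()
    unique = []
    for row in results:
        url = row.get("URL")
        # Skip rows without a URL and URLs we have already kept
        if not url or url in seen:
            continue
        seen.add(url)
        unique.append(row)
    return unique
```

Inside main(), you could then call save_results_to_csv(dedupe_results(results)). Keeping the first occurrence preserves Bing's original ranking order.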
Best Practices for Scraping
- Add Delays Between Requests
Websites often monitor the frequency of requests to prevent scraping. Adding small delays between actions can mimic human behavior and reduce the risk of getting blocked.
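The scraper above waits a fixed 5000 ms after each navigation, which produces perfectly regular timing. A minimal sketch of a randomized pause instead, assuming a uniform random delay between two bounds suits your use case; the function name polite_pause and its default bounds are illustrative, not values from this tutorial:

```python
import random
import time

def polite_pause(min_seconds=2.0, max_seconds=6.0):
    """Sleep for a random duration between the bounds and return the delay used."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay
```

In scrape_results(), you could call polite_pause() right after page.goto(url). Because time.sleep blocks the thread, this fits the synchronous Playwright API used here; with Playwright's async API you would await asyncio.sleep instead.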
- Use Proxy Rotation
For scraping a large number of search queries or result pages, proxy rotation is essential. It spreads your requests across different IP addresses, which reduces the chance of detection and blocking. Rayobyte offers reliable proxy services.
Proxy Rotation Setup with Rayobyte.
Below is a script to integrate proxies into the Playwright setup:
```python
def initialize_browser(proxy_config=None):
    playwright = sync_playwright().start()

    # Default launch options
    launch_options = {"headless": False}

    # Add proxy configuration if provided
    if proxy_config:
        launch_options["proxy"] = proxy_config

    browser = playwright.chromium.launch(**launch_options)
    context = browser.new_context()
    page = context.new_page()
    return playwright, browser, page
```
Also modify the main() function:
```python
import os

def main():
    # Proxy configuration - can be set via environment variables or hardcoded
    proxy_server = os.getenv('PROXY_SERVER')
    proxy_port = os.getenv('PROXY_PORT')
    proxy_username = os.getenv('PROXY_USERNAME')
    proxy_password = os.getenv('PROXY_PASSWORD')

    # Prepare proxy configuration
    proxy_config = None
    if proxy_server and proxy_port:
        proxy_config = {
            "server": f"{proxy_server}:{proxy_port}"
        }
        # Add authentication if credentials are provided
        if proxy_username and proxy_password:
            proxy_config["username"] = proxy_username
            proxy_config["password"] = proxy_password

    query = "latest technology news"  # Replace with your desired search query

    # Initialize browser with optional proxy
    playwright, browser, page = initialize_browser(proxy_config)

    try:
        # Search by navigating directly to the URL of the first page
        search_bing(page, query)

        # Scrape up to 5 pages using constructed URLs
        results = scrape_results(page, query, max_pages=5)

        # Save results to CSV file
        save_results_to_csv(results)
    except Exception as e:
        logger.error(f"Unexpected error: {e}")
    finally:
        # Close the browser after a short delay
        page.wait_for_timeout(5000)
        browser.close()
        playwright.stop()


if __name__ == "__main__":
    main()
```
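The environment-variable handling in main() can be pulled into its own function so it can be checked without launching a browser. A sketch of that refactor, using the hypothetical name build_proxy_config; it accepts any mapping (os.environ by default) and returns the same dictionary shape that main() builds:

```python
import os

def build_proxy_config(env=os.environ):
    """Build a Playwright proxy dict from PROXY_* variables, or return None if unset."""
    server = env.get("PROXY_SERVER")
    port = env.get("PROXY_PORT")
    if not (server and port):
        return None

    config = {"server": f"{server}:{port}"}

    # Credentials are only added when both are present, matching main()
    username = env.get("PROXY_USERNAME")
    password = env.get("PROXY_PASSWORD")
    if username and password:
        config["username"] = username
        config["password"] = password
    return config
```

Inside main(), proxy_config = build_proxy_config() would then replace the four os.getenv() calls and the if-block that follows them.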
To include the proxy, you have two main options:
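i. Set environment variables before running the script. One convenient way to keep them together is a .env file:

```
PROXY_SERVER=your_proxy_server
PROXY_PORT=your_proxy_port
PROXY_USERNAME=your_proxy_username
PROXY_PASSWORD=your_proxy_password
```

Note that os.getenv() only reads variables that are already set in the process environment; it does not read a .env file on its own. Either export these values in your shell before running the script, or load the file at startup with a package such as python-dotenv (call its load_dotenv() function before main() reads the variables).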
ii. Modify the proxy_config directly in the main() function:
```python
proxy_config = {
    "server": "proxy_server:port",
    "username": "your_username",
    "password": "your_password",
}
```
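The setup above launches the browser with a single proxy for the whole run, so any rotation has to happen on the provider's side (for example, a gateway endpoint that assigns a new IP per connection). If you instead have a list of individual proxies, one possible client-side approach is to cycle through them and launch a fresh browser per query. A minimal sketch of the cycling part; the endpoints and credentials below are placeholders, not real servers:

```python
from itertools import cycle

# Placeholder proxy configs in the same shape as proxy_config above
PROXIES = [
    {"server": "http://proxy1.example.com:8000", "username": "user", "password": "pass"},
    {"server": "http://proxy2.example.com:8000", "username": "user", "password": "pass"},
]

def proxy_rotator(proxies):
    """Return an iterator that hands out the proxies round-robin, indefinitely."""
    return cycle(proxies)

rotator = proxy_rotator(PROXIES)
for query in ["latest technology news", "python web scraping", "cloud computing trends"]:
    proxy_config = next(rotator)
    # In the real script: playwright, browser, page = initialize_browser(proxy_config)
    print(f"{query!r} would use {proxy_config['server']}")
```

Launching a new browser per proxy is slower than reusing one, but it keeps each proxy's cookies and session state separate.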
- Respect Website Terms of Service
Before scraping, always check the website’s Terms of Service. Use the data responsibly and ensure compliance with local laws and regulations.
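Alongside the Terms of Service, many sites publish machine-readable crawling rules in a robots.txt file, and Python's standard library can evaluate them with urllib.robotparser. The sketch below parses an inline sample instead of fetching a live file, so the rules are illustrative and are not Bing's actual robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only; for a real site, call set_url(".../robots.txt") and then read()
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Allow: /public/
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS.splitlines())

for url in ["https://example.com/public/page", "https://example.com/private/data"]:
    print(url, parser.can_fetch("*", url))
```

A False from can_fetch() is a clear signal to stop, but a True does not replace reading the Terms of Service, which can impose restrictions that robots.txt does not express.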
Conclusion
Building a Bing scraper using Python and Playwright is straightforward and effective for extracting valuable data from one of the world’s largest search engines. By following this tutorial step-by-step, you can create your own scraper tailored to your specific needs and gain insights that can drive your business decisions forward.
If you encounter any issues or have questions while following this tutorial, feel free to leave a comment below. I’d be happy to help you troubleshoot and provide additional guidance!
Happy scraping!