{"id":1040,"date":"2024-09-21T23:11:25","date_gmt":"2024-09-21T23:11:25","guid":{"rendered":"https:\/\/rayobyte.com\/community\/?post_type=scraping_project&#038;p=1040"},"modified":"2024-10-08T17:31:22","modified_gmt":"2024-10-08T17:31:22","slug":"web-scraping-ebay","status":"publish","type":"scraping_project","link":"https:\/\/rayobyte.com\/community\/scraping-project\/web-scraping-ebay\/","title":{"rendered":"Web Scraping eBay"},"content":{"rendered":"<h1><span style=\"font-weight: 400;\">How to Build an eBay Scraper using Puppeteer, a Powerful Node.js Library<\/span><\/h1>\n<p style=\"text-align: center;\"><iframe loading=\"lazy\" title=\"YouTube video player\" src=\"https:\/\/www.youtube.com\/embed\/OHUC9rvofwo?si=GwPOWmf2wNcpWnAA\" width=\"560\" height=\"315\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<p style=\"text-align: left;\"><span style=\"font-weight: 400;\">eBay, one of the largest online marketplaces, contains a wealth of product data, including prices, descriptions, and seller ratings. In this tutorial, we\u2019ll walk you through web scraping eBay using Python. You\u2019ll learn how to extract product listings, prices, and seller information, enabling you to compare products, track pricing trends, and gather valuable insights from eBay\u2019s vast marketplace. 
This guide includes the necessary source code and techniques for scraping eBay efficiently.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">You can find the complete source code for this project on my GitHub: <\/span><a href=\"https:\/\/github.com\/Anas12312\/eBay-Scraping\" rel=\"nofollow noopener\" target=\"_blank\"><span style=\"font-weight: 400;\">GitHub Repo<\/span><\/a><\/p>\n<h1><span style=\"font-weight: 400;\">Table of Contents<\/span><\/h1>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><a href=\"#intro\">Introduction<\/a><\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><a href=\"#what-is-puppeteer\">What is Puppeteer<\/a><\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><a href=\"#prerequisites\">Prerequisites<\/a><\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><a href=\"#setting\">Setting Up Puppeteer<\/a><\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><a href=\"#navigating\">Navigating and Executing Search Query<\/a><\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><a href=\"#extract\">Extracting Product Data<\/a><\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><a href=\"#more-data\">Getting More Data<\/a><\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><a href=\"#save\">Saving Data to CSV<\/a><\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><a href=\"#wrap\">Wrapping Things Up<\/a><\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><a href=\"#conclusion\">Conclusion<\/a><\/span><\/li>\n<\/ul>\n<h2 id=\"intro\"><span style=\"font-weight: 400;\">Introduction<\/span><\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1043 size-full aligncenter\" 
src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/cover-4.jpg\" alt=\"\" width=\"686\" height=\"386\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/cover-4.jpg 686w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/cover-4-300x169.jpg 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/cover-4-624x351.jpg 624w\" sizes=\"auto, (max-width: 686px) 100vw, 686px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">eBay is one of the largest e-commerce platforms, offering an immense variety of products ranging from everyday essentials to rare collectibles. For businesses, researchers, and data enthusiasts, eBay\u2019s marketplace is a valuable source of information on product trends, pricing strategies, customer behavior, and more. Whether you&#8217;re interested in tracking the latest sales data, comparing prices, or analyzing product listings, having access to this data can help drive informed decisions.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">However, manually extracting this information is a daunting task given the volume of products listed on eBay. That\u2019s where web scraping comes into play.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In this tutorial, we will show you how to build an eBay scraper using Puppeteer, a powerful Node.js library that provides a high-level API to control headless Chrome or Chromium browsers. 
By the end of this guide, you will have a fully functional scraper capable of automating the extraction of product data from eBay, allowing you to easily gather valuable insights.<\/span><\/p>\n<h2 id=\"what-is-puppeteer\"><span style=\"font-weight: 400;\">What is Puppeteer<\/span><\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-882 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/puppeteer.png\" alt=\"\" width=\"893\" height=\"666\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/puppeteer.png 893w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/puppeteer-300x224.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/puppeteer-768x573.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/puppeteer-624x465.png 624w\" sizes=\"auto, (max-width: 893px) 100vw, 893px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Puppeteer is a popular Node.js library developed by Google that provides a high-level API to control Chrome or Chromium browsers via the DevTools Protocol. It is commonly used for web scraping, automated testing, and browser automation, allowing developers to interact with web pages programmatically.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">With Puppeteer, you can automate tasks that would otherwise require a manual browser interaction, such as navigating to pages, clicking buttons, filling out forms, and capturing data from web elements. 
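<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For instance, here is a minimal sketch of a typical Puppeteer script, just to show the general shape of the API. It is not part of this project\u2019s scraper; it simply visits example.com (a stand-in URL) and saves a screenshot:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"js\">const puppeteer = require('puppeteer');\r\n\r\n(async () =&gt; {\r\n    \/\/ Launch a headless browser and open a new tab\r\n    const browser = await puppeteer.launch();\r\n    const page = await browser.newPage();\r\n\r\n    \/\/ Navigate and capture a screenshot of the rendered page\r\n    await page.goto('https:\/\/example.com', { waitUntil: 'domcontentloaded' });\r\n    await page.screenshot({ path: 'example.png' });\r\n\r\n    await browser.close();\r\n})();<\/pre>\n<p><span style=\"font-weight: 400;\">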
One of the key advantages of Puppeteer is its ability to run in a <\/span><b>headless mode<\/b><span style=\"font-weight: 400;\">, meaning it operates without a visible UI, which makes it faster and more resource-efficient for scraping tasks.<\/span><\/p>\n<h3><b>Key Features of Puppeteer:<\/b><\/h3>\n<ul>\n<li style=\"font-weight: 400;\"><b>Headless browser automation:<\/b><span style=\"font-weight: 400;\"> Perform tasks without a browser UI, improving performance.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Accurate rendering:<\/b><span style=\"font-weight: 400;\"> Scrape web pages as they appear in a real browser, ensuring you&#8217;re working with up-to-date and rendered data.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Flexible navigation control:<\/b><span style=\"font-weight: 400;\"> Simulate user interactions, handle redirects, and automate navigation through dynamic pages.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Built-in network interception:<\/b><span style=\"font-weight: 400;\"> Modify requests and responses, handle API calls, and interact with server-side data.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Puppeteer\u2019s flexibility and reliability make it an excellent choice for scraping dynamic websites like eBay, where product listings change frequently and JavaScript-driven content needs to be handled smoothly.<\/span><\/p>\n<h2 id=\"prerequisites\"><span style=\"font-weight: 400;\">Prerequisites<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Before we dive into the code, let&#8217;s make sure you have everything set up. Here&#8217;s what you&#8217;ll need:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><b>Node.js<\/b><span style=\"font-weight: 400;\">: Puppeteer runs on Node.js, so make sure it&#8217;s installed on your machine. 
You can download it from<\/span><a href=\"https:\/\/nodejs.org\/\" rel=\"nofollow noopener\" target=\"_blank\"> <span style=\"font-weight: 400;\">here<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Puppeteer<\/b><span style=\"font-weight: 400;\">: Puppeteer is a Node.js library that provides a high-level API for controlling Chrome or Chromium.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Objects-to-CSV<\/b><span style=\"font-weight: 400;\">: This library will help convert the scraped data into a CSV file.<\/span><\/li>\n<\/ul>\n<h3><b>Installing Dependencies<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">First, let\u2019s set up our project and install the required dependencies.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"bash\"># Initialize a new Node.js project\r\nnpm init -y\r\n\r\n# Install Puppeteer and Objects-to-CSV\r\nnpm install puppeteer objects-to-csv<\/pre>\n<p><span style=\"font-weight: 400;\">Now that we have our environment ready, let\u2019s move on to setting up Puppeteer.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">With Puppeteer installed, let\u2019s create the main script that will handle all our web scraping logic.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In the root of your project, create a file named index.js.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1051 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-202710.png\" alt=\"\" width=\"339\" height=\"182\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-202710.png 339w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-202710-300x161.png 300w\" sizes=\"auto, (max-width: 339px) 100vw, 339px\" \/><\/p>\n<h2 id=\"setting\"><span style=\"font-weight: 400;\">Setting Up 
Puppeteer<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">In the index.js file, start by requiring the Puppeteer package we just installed:<\/span><\/p>\n<p><code class=\"EnlighterJSRAW\" data-enlighter-language=\"js\">const puppeteer = require('puppeteer')<\/code><\/p>\n<p><span style=\"font-weight: 400;\">Next, we\u2019ll create an asynchronous function where all our scraping logic will live. This function will take a searchQuery as a parameter, which we\u2019ll use to search for items on eBay:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"js\">async function run(searchQuery) {\r\n   \/\/ All of our logic goes here!\r\n}\r\nrun(\"Nike Air Jordan\")<\/pre>\n<p><span style=\"font-weight: 400;\">Let\u2019s bring the browser to life! We\u2019ll use Puppeteer\u2019s launch function to create a new browser instance:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"js\">const browser = await puppeteer.launch({\r\n    headless: false \/\/ We want to see what's happening!\r\n})\r\n<\/pre>\n<p><span style=\"font-weight: 400;\">The headless: false option keeps the browser UI visible so you can watch Puppeteer in action. 
It\u2019s super helpful when you\u2019re just getting started.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Next, we\u2019ll create a new page where all our browser actions will take place:<\/span><\/p>\n<p><code class=\"EnlighterJSRAW\" data-enlighter-language=\"js\">const page = await browser.newPage()<\/code><\/p>\n<p><span style=\"font-weight: 400;\">Now your code should look something like this:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"js\">const puppeteer = require('puppeteer')\r\nasync function run(searchQuery) {\r\n    const browser = await puppeteer.launch({\r\n        headless: false,\r\n        defaultViewport: false\r\n    })\r\n    const page = await browser.newPage()\r\n}\r\n\r\nrun(\"Nike Air Jordan\")<\/pre>\n<p><span style=\"font-weight: 400;\">You\u2019ve set everything up. Let\u2019s see it in action! Run the project by typing the following command in your terminal: <code class=\"EnlighterJSRAW\" data-enlighter-language=\"powershell\">node index.js<\/code><\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1068 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-231548.png\" alt=\"\" width=\"1920\" height=\"1020\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-231548.png 1920w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-231548-300x159.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-231548-1024x544.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-231548-768x408.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-231548-1536x816.png 1536w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-231548-624x332.png 624w\" sizes=\"auto, 
(max-width: 1920px) 100vw, 1920px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Now that we\u2019ve got our project up and running, it\u2019s time to navigate to the eBay website and start scraping some data.<\/span><\/p>\n<h2 id=\"navigating\"><span style=\"font-weight: 400;\">Navigating and Executing Search Query<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">After setting up the browser, the next step is to navigate to the eBay website, perform a search, and extract relevant data like product titles, prices, and links. For this example, we\u2019ll search for &#8220;Nike Air Jordan&#8221;.<\/span><\/p>\n<p><b>Step 1: Navigate to the main website<\/b><\/p>\n<p><span style=\"font-weight: 400;\">To get there, we\u2019ll use the page.goto() function. This nifty function takes the URL of the page we want to visit and an options object that controls how we navigate. We\u2019re particularly interested in the waitUntil property, which we\u2019ll set to &#8220;domcontentloaded&#8221; to ensure that all the DOM elements are fully loaded before we proceed.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"js\">await page.goto(\"https:\/\/www.ebay.com\/\", {\r\n    waitUntil: \"domcontentloaded\"\r\n})\r\n<\/pre>\n<p><b><i>Pro Tip:<\/i><\/b><i><span style=\"font-weight: 400;\"> Don\u2019t forget to add the await keyword before any asynchronous function calls, like page.goto(). This ensures that your code waits for the navigation to complete before moving on to the next step.<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">With the page loaded, we\u2019re all set to start interacting with the eBay website and search for the products we\u2019re interested in.<\/span><\/p>\n<p><b>Step 2: Executing the search query<\/b><\/p>\n<p><span style=\"font-weight: 400;\">First, open your local browser and navigate to the eBay page we\u2019ve been working with. 
To interact with the page elements, we\u2019ll need to use Chrome\u2019s DevTools, which you can easily open by pressing F12.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1057 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-203353.png\" alt=\"\" width=\"1915\" height=\"990\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-203353.png 1915w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-203353-300x155.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-203353-1024x529.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-203353-768x397.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-203353-1536x794.png 1536w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-203353-624x323.png 624w\" sizes=\"auto, (max-width: 1915px) 100vw, 1915px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">With DevTools open, select the inspector tool (the little arrow icon at the top left of the DevTools window) and click on the search button on the page. 
This will highlight the element in the DOM tree.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1046 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/inspector.png\" alt=\"\" width=\"507\" height=\"215\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/inspector.png 507w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/inspector-300x127.png 300w\" sizes=\"auto, (max-width: 507px) 100vw, 507px\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1062 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-203518.png\" alt=\"\" width=\"1158\" height=\"264\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-203518.png 1158w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-203518-300x68.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-203518-1024x233.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-203518-768x175.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-203518-624x142.png 624w\" sizes=\"auto, (max-width: 1158px) 100vw, 1158px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Now we can copy either the ID or the class name for the element we just selected<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1066 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-203658.png\" alt=\"\" width=\"377\" height=\"115\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-203658.png 377w, 
https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/Screenshot-2024-09-21-203658-300x92.png 300w\" sizes=\"auto, (max-width: 377px) 100vw, 377px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">We can repeat this process for any element we want to interact with or extract data from.<\/span><\/p>\n<p><b>Step 3: Performing a Search<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Here\u2019s how you can simulate entering a search query and clicking the search button:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"js\">await page.type('#gh-ac', searchQuery);\r\nawait page.click('#gh-btn');\r\nawait page.waitForNavigation();<\/pre>\n<p><em><b>page.type()<\/b><span style=\"font-weight: 400;\">: Types the search query (in this case, &#8220;Nike Air Jordan&#8221;) into eBay\u2019s search bar.<\/span><\/em><\/p>\n<p><em><b>page.click()<\/b><span style=\"font-weight: 400;\">: Simulates clicking the search button.<\/span><\/em><\/p>\n<p><em><b>page.waitForNavigation()<\/b><span style=\"font-weight: 400;\">: Ensures the page fully loads after the search is performed.<\/span><\/em><\/p>\n<h2 id=\"extract\"><span style=\"font-weight: 400;\">Extracting Product Data<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">We now want to scrape data from the search results, including the product title, price, link, and additional details like the item&#8217;s location and shipping price.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"js\">const items = await page.evaluate(() =&gt; {\r\n    return Array.from(document.querySelectorAll('.s-item')).map(item =&gt; ({\r\n        title: item.querySelector('.s-item__title')?.innerText,\r\n        price: item.querySelector('.s-item__price')?.innerText,\r\n        link: item.querySelector('.s-item__link')?.href,\r\n        image: item.querySelector('img')?.src,\r\n        status: item.querySelector('span.SECONDARY_INFO')?.innerText,\r\n        location: 
item.querySelector('.s-item__itemLocation')?.innerText.slice(5),\r\n        shippingPrice: item.querySelector('.s-item__shipping')?.innerText\r\n    }));\r\n});\r\n<\/pre>\n<p><span style=\"font-weight: 400;\">We start by evaluating the page and querying its items. Using the DevTools inspector, we notice that each item listed on the page has the class name \u201c.s-item\u201d, so we query all of the items on the page using document.querySelectorAll().<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Then we map each item to a response object containing all the details we need to scrape, querying within the item itself using the class names extracted from the website\u2019s DOM.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The extracted data should look something like this:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"json\">{\r\n    title: 'Nike Air Jordan 12 Retro Neoprene Nylon Size 11 130690-004',\r\n    price: '$39.99',\r\n    link: 'https:\/\/www.ebay.com\/itm\/146043984554?_skw=Nike+Air+Jordan&amp;epid=3039805884&amp;itmmeta=01J8B67X1KF2T9X9SEYCA36PMP&amp;hash=item2200e65aaa:g:-gkAAOSwvRNiJVFH&amp;itmprp=enc%3AAQAJAAAA8HoV3kP08IDx%2BKZ9MfhVJKln8UADzlkgSZ0aU6TLWzt99BeJHA0IFazs9V%2FJYxXTuAWc%2BD7Hb%2F5qJW96HOFfRt6kV2RuFlSWxTiZPtmnnrFl2KasCUsxQ%2FP%2FVxEqaTZbFwpkTTjB9tBFrhOYNUejqJqLXWZc16hrE3dmgoF%2F8HI9hHLMdIsxSLH7C9%2BBDPiuE7sJ8%2FBm6cGz1hhI0sxGKf4%2BcojTVtkMemnYFMI73q9RrYmCtPlOP6iRHMk9G4zJVbUzkgzDqKPZc%2BOhp86K4h4ahNVJcPCLVhQFayCTcKLWfow4u5sDXlT7l54M6p1dVw%3D%3D%7Ctkp%3ABk9SR4bRn-bCZA',\r\n    image: 'https:\/\/i.ebayimg.com\/images\/g\/-gkAAOSwvRNiJVFH\/s-l140.webp',\r\n    status: 'Pre-Owned',\r\n    location: 'United States',\r\n    shippingPrice: '+$89.85 shipping'\r\n  }\r\n<\/pre>\n<p><b><i>document.querySelectorAll<\/i><\/b><i><span style=\"font-weight: 400;\">: Targets all elements with the class <\/span><\/i><i><span style=\"font-weight: 400;\">.s-item<\/span><\/i><i><span style=\"font-weight: 400;\"> (which are the product listings).<\/span><\/i><\/p>\n<p><b><i>map<\/i><\/b><i><span style=\"font-weight: 400;\">: Iterates over each item and extracts the desired information, like the title, price, and link.<\/span><\/i><\/p>\n<h2 id=\"more-data\"><span style=\"font-weight: 400;\">Getting More Data<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">When scraping eBay, the default search results page displays only a limited number of products, typically 60 per page. However, you can collect more data by implementing two strategies:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\"><b>Scraping Multiple Pages<\/b><span style=\"font-weight: 400;\">: By navigating through multiple pages of search results, you can gather product information across different listings, significantly increasing the dataset for your analysis.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Changing the Number of Products per Page<\/b><span style=\"font-weight: 400;\">: eBay allows users to change the number of items displayed per page. By adjusting the query parameters in the URL, you can increase the number of products listed on each page, enabling you to scrape more data in a single request. In our case, we modify the query parameter <\/span><span style=\"font-weight: 400;\">_ipg<\/span><span style=\"font-weight: 400;\"> to display up to 240 items per page.<\/span><\/li>\n<\/ol>\n<h3><b>Understanding Query Parameters<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Query parameters are key-value pairs that are appended to the URL after a question mark (<\/span><span style=\"font-weight: 400;\">?<\/span><span style=\"font-weight: 400;\">). They allow websites to customize the content displayed based on user input, such as search terms, filters, and pagination. 
For example, when navigating eBay search results, query parameters like <\/span><span style=\"font-weight: 400;\">_ipg<\/span><span style=\"font-weight: 400;\"> control the number of items per page, and <\/span><span style=\"font-weight: 400;\">_pgn<\/span><span style=\"font-weight: 400;\"> controls which page of the results is displayed.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In our eBay scraper, we take advantage of these parameters to:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><b>_ipg<\/b><span style=\"font-weight: 400;\">: Specify how many products to display per page (e.g., 60, 120, or 240).<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>_pgn<\/b><span style=\"font-weight: 400;\">: Move through different pages of search results by incrementing the page number.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">By combining these two techniques, scraping multiple pages and increasing the number of items per page, you can significantly increase the volume of data collected from eBay.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Let\u2019s get to work. First, we need to move the extraction logic into a separate function so we can reuse it for multiple pages. This function will take the page object and the pageNumber as parameters:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"js\">async function extractItemsPerPage(page, pageNumber) {\r\n    const items = await page.evaluate(() =&gt; {\r\n        return Array.from(document.querySelectorAll('.s-item')).map(item =&gt; ({\r\n            title: item.querySelector('.s-item__title')?.innerText,\r\n            price: item.querySelector('.s-item__price')?.innerText,\r\n            link: item.querySelector('.s-item__link')?.href,\r\n            image: item.querySelector('img')?.src,\r\n            status: item.querySelector('span.SECONDARY_INFO')?.innerText,\r\n            location: 
item.querySelector('.s-item__itemLocation')?.innerText.slice(5),\r\n            shippingPrice: item.querySelector('.s-item__shipping')?.innerText\r\n        }));\r\n    });\r\n    return items\r\n}\r\n<\/pre>\n<p><span style=\"font-weight: 400;\">Now let&#8217;s manipulate the URL to change the query parameters, and navigate to the new URL<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"js\">let currentUrl = page.url();\r\n\r\nlet newUrl = new URL(currentUrl);\r\n\r\nnewUrl.searchParams.set('_ipg', '240');\r\nnewUrl.searchParams.set('_pgn', pageNumber);\r\n\r\nawait page.goto(newUrl.toString(), {\r\n   waitUntil: \"domcontentloaded\"\r\n})\r\n<\/pre>\n<p><span style=\"font-weight: 400;\">Finally, the function should look like this:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"js\">async function extractItemsPerPage(page, pageNumber) {\r\n    let currentUrl = page.url();\r\n\r\n    let newUrl = new URL(currentUrl);\r\n\r\n    newUrl.searchParams.set('_ipg', '240');\r\n    newUrl.searchParams.set('_pgn', pageNumber);\r\n\r\n    await page.goto(newUrl.toString(), {\r\n        waitUntil: \"domcontentloaded\"\r\n    })\r\n\r\n    const items = await page.evaluate(() =&gt; {\r\n        return Array.from(document.querySelectorAll('.s-item')).map(item =&gt; ({\r\n            title: item.querySelector('.s-item__title')?.innerText,\r\n            price: item.querySelector('.s-item__price')?.innerText,\r\n            link: item.querySelector('.s-item__link')?.href,\r\n            image: item.querySelector('img')?.src,\r\n            status: item.querySelector('span.SECONDARY_INFO')?.innerText,\r\n            location: item.querySelector('.s-item__itemLocation')?.innerText.slice(5),\r\n            shippingPrice: item.querySelector('.s-item__shipping')?.innerText\r\n        }));\r\n    });\r\n    return items\r\n}\r\n<\/pre>\n<p><span style=\"font-weight: 400;\">Now we can call this function multiple times with different page 
values in the run function, concatenating all the results into one array.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"js\">let items = []\r\n\r\nfor (let pageNumber = 1; pageNumber &lt;= 3; pageNumber++) {\r\n    try {\r\n        const pageItems = await extractItemsPerPage(page, pageNumber)\r\n        items = items.concat(pageItems)\r\n    } catch (e) {\r\n        console.error(\"error extracting data page:\", pageNumber)\r\n    }\r\n}\r\n<\/pre>\n<p><i><span style=\"font-weight: 400;\">Tip: Wrap each call in a try-catch block so that an error on one page doesn\u2019t crash the whole program.<\/span><\/i><\/p>\n<h2 id=\"save\"><span style=\"font-weight: 400;\">Saving Data to CSV<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">After scraping the data, we\u2019ll store it in a CSV file for easy access and analysis. 
For this, we\u2019ll use the <\/span><b>objects-to-csv<\/b><span style=\"font-weight: 400;\"> npm library.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We will need to require it at the top of the file and create a function that takes an array as input and saves it to a CSV file called data.csv<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"js\">const ObjectsToCSV = require('objects-to-csv')\r\n\r\nasync function saveToCSV(items) {\r\n    const csv = new ObjectsToCSV(items);\r\n    await csv.toDisk('.\/data.csv', {\r\n        allColumns: true\r\n    });\r\n}\r\n<\/pre>\n<p><span style=\"font-weight: 400;\">Now the data should be saved into a CSV file in the project root directory<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1044 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/csv-1.png\" alt=\"\" width=\"335\" height=\"243\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/csv-1.png 335w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/csv-1-300x218.png 300w\" sizes=\"auto, (max-width: 335px) 100vw, 335px\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1070 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/csvss.png\" alt=\"\" width=\"1147\" height=\"212\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/csvss.png 1147w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/csvss-300x55.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/csvss-1024x189.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/csvss-768x142.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/09\/csvss-624x115.png 624w\" sizes=\"auto, (max-width: 1147px) 100vw, 1147px\" \/><\/p>\n<h2 id=\"wrap\"><span style=\"font-weight: 400;\">Wrapping Things 
Up<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">The last thing we need to do is close the browser once the scraping is done:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"js\">await browser.close()<\/pre>\n<p><em><span style=\"font-weight: 400;\">Tip: once the project is done and tested, switch headless mode to true; running without a visible UI is faster and less resource-intensive.<\/span><\/em><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"js\">const browser = await puppeteer.launch({\r\n   headless: true,\r\n   defaultViewport: false\r\n})\r\n<\/pre>\n<h2 id=\"conclusion\"><span style=\"font-weight: 400;\">Conclusion<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Congratulations! You\u2019ve just built a web scraper that extracts product information from eBay using Puppeteer. By following this tutorial, you\u2019ve learned how to:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Set up Puppeteer to navigate websites<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Extract data from eBay product listings<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Handle pagination for multiple pages<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Save the scraped data to a CSV file for future use<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">You can find the complete source code on my GitHub: <\/span><a href=\"https:\/\/github.com\/Anas12312\/eBay-Scraping\" rel=\"nofollow noopener\" target=\"_blank\"><span style=\"font-weight: 400;\">GitHub Repo<\/span><\/a><\/p>\n<p>You can also watch the full tutorial on YouTube <a href=\"https:\/\/youtu.be\/OHUC9rvofwo\" rel=\"nofollow noopener\" target=\"_blank\">here<\/a><\/p>\n<p><span style=\"font-weight: 400;\">Feel free to experiment with different search queries and extend this project further by adding more 
features or extracting additional details.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to Build an eBay Scraper using Puppeteer, a Powerful Node.js Library eBay, one of the largest online marketplaces, contains a wealth of product data,&hellip;<\/p>\n","protected":false},"author":24,"featured_media":1191,"comment_status":"open","ping_status":"closed","template":"","meta":{"rank_math_lock_modified_date":false},"categories":[],"class_list":["post-1040","scraping_project","type-scraping_project","status-publish","has-post-thumbnail","hentry"],"_links":{"self":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/scraping_project\/1040","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/scraping_project"}],"about":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/types\/scraping_project"}],"author":[{"embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/users\/24"}],"replies":[{"embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/comments?post=1040"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/media\/1191"}],"wp:attachment":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/media?parent=1040"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/categories?post=1040"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}