Web Scraping With Laravel
Laravel offers many benefits to PHP developers, and if you have some experience with it, complicated tasks like web scraping are easier than you may expect, thanks in part to the powerful libraries available in its ecosystem. If you do not have much experience with the framework, or you want to make the most of web scraping with Laravel, this step-by-step guide will help you.
What Is Laravel?
The first step is to understand what Laravel is and why it is used. The creator calls it a PHP framework for “web artisans,” and that is a fair description. Laravel is not a programming language of its own; it is a framework written in PHP whose elegant, expressive syntax gives you a great deal of versatility. The code you write with it tends to be clean and simple, yet the available functionality is broad. For many developers, it is the perfect combination of performance and expressiveness for writing code.
What is Laravel used for, then? It is a PHP-based back-end framework for web application development. You can use it to build complete web applications or individual components of them, and it suits almost any type of PHP project.
In this guide, we focus on using Laravel specifically for web scraping. Web scraping is the process of automatically extracting information from websites. You can use it to monitor prices, capture inventory, watch your competitors, and handle various other tasks.
We have provided numerous web scraping guides in the past, including web scraping with Scrapy, as well as a full list of tools for web scraping (the best of 2024 is a not-to-miss list). Now, let’s take a closer look at Laravel and how it can be applied to web scraping.
How to Use Laravel for Web Scraping
Laravel is an excellent PHP framework thanks to its expressive syntax. It lets you create APIs for scraping data on the web without much setup time. Laravel does not ship with scraping features of its own, so you rely on libraries: packages that make it easier to fetch pages and extract information with little code.
Laravel is easy to integrate with other tools you may already be using, and you can scale it with ease. There is also a good amount of community support available when you have very specific concerns. If you are creating complex or large web scraping projects, the MVC architecture the framework is built around is also very helpful.
To use Laravel for web scraping, you need libraries. Several are excellent and worth a closer look. Once you have installed Laravel, consider the following:
- BrowserKit: This Symfony component simulates the API of a web browser interacting with HTML documents. It uses DomCrawler to navigate and scrape them. If you are going to extract data from static pages using PHP, this is a library you need.
- HttpClient: This Symfony component sends HTTP requests and integrates easily with BrowserKit, which makes the two a good pair.
- Guzzle: This is another HTTP client that can send web requests to servers and handle their responses. It is a solid choice for downloading the HTML documents behind web pages.
- Panther: The final library is for dynamic sites, like those that need JavaScript rendering. It drives a real browser in headless mode, which makes it well suited to scraping such pages.
It is a good idea to be familiar with all of these, and we recommend them because of their functionality. However, there are many others out there. Once you have PHP 8+ and Composer installed (we also recommend Visual Studio Code with the PHP extension), you can follow the steps below.
Step-by-Step Web Scraping API in Laravel
In this project, we will build a web scraping API that pulls quotes from the Quotes to Scrape sandbox site. The goal is to select the quote HTML elements on the page, extract the data from them, and return the scraped data as JSON.
How do we do that? Follow these steps.
Step 1: Set up a Laravel project. Open the terminal and run the Composer create-project command to initialize your Laravel web scraping app:
composer create-project laravel/laravel laravel-scraper
Once you do this, you have a blank Laravel project. Open it in Visual Studio Code or your preferred PHP IDE, and you will see the default Laravel file structure.
Step 2: Initialize your scraping API. Run the following Artisan command to add a new Laravel controller to your project:
php artisan make:controller ScrapingController
This will then create the following ScrapingController.php file within the /app/Http/Controllers directory:
<?php
namespace App\Http\Controllers;
use Illuminate\Http\Request;
class ScrapingController extends Controller
{
//
}
Next, add the following scrapeQuotes() method to the ScrapingController class:
public function scrapeQuotes(): JsonResponse
{
    // scraping logic...
    return response()->json('Hello, World!');
}
You will also need to add the following import:
use Illuminate\Http\JsonResponse;
For now, the method only returns the placeholder “Hello, World!” as a JSON message. That may not seem like much, but it is exactly what you need to get started; the scraping logic will replace it later.
Next, associate the scrapeQuotes() method with a dedicated endpoint by adding the following lines to routes/api.php (on Laravel 11 and later, this file may not exist until you run php artisan install:api):
use App\Http\Controllers\ScrapingController;
Route::get('/v1/scraping/scrape-quotes', [ScrapingController::class, 'scrapeQuotes']);
This is the point where you need to test to make sure the process works for you. One thing to remember is that the Laravel APIs are found under the /api path. The complete API endpoint is:
/api/v1/scraping/scrape-quotes.
To verify that the Laravel scraping API is working, you will need to follow these steps. First, launch your Laravel application using this command:
php artisan serve
Your server is now listening locally on port 8000. Use cURL to make a GET request to the /api/v1/scraping/scrape-quotes endpoint:
curl -X GET 'http://localhost:8000/api/v1/scraping/scrape-quotes'
If all went well, this returns the “Hello, World!” result, which means the web scraping API is wired up correctly. The next step is to define the scraping logic.
Step 3: Install your scraping libraries. In this step, you choose the libraries that fit your specific project. We provided some examples above, but there are many options to choose from, which is one of the best things about developing with Laravel.
Before you go further, let’s explore how to find out which web scraping libraries are the right ones for your project objectives.
Start by going to the target site you want to scrape. Right-click on the page and click “Inspect” to open the Developer Tools. Switch to the “Network” tab, reload the page, and open the “Fetch/XHR” section.
If the page makes no AJAX requests, it is most likely a static page, and all of the data you want is in the HTML document the server returns. Pages that load their content through such requests are dynamic and call for a headless browser library instead, such as Panther. As noted, though, there are other libraries that can work well for this.
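One way to double-check the Network tab finding from code is to test whether the element you care about already appears in the raw HTML the server returns. The sketch below is only an illustration: it uses PHP's built-in DOM extension rather than the libraries from this guide, and the helper name rawHtmlContainsClass() and the two sample pages are assumptions made for the example.

```php
<?php
// Hypothetical helper: true when the raw (unrendered) HTML already contains
// at least one element carrying the given CSS class.
function rawHtmlContainsClass(string $html, string $class): bool
{
    $dom = new DOMDocument();
    // Suppress the warnings that imperfect real-world HTML tends to trigger.
    @$dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    // Common XPath 1.0 idiom for "the class attribute contains this whole word".
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    return $xpath->query($query)->length > 0;
}

// Sample markup standing in for two downloaded pages.
$staticPage  = '<html><body><div class="quote"><span class="text">Hi</span></div></body></html>';
$dynamicPage = '<html><body><div id="app"></div><script src="/app.js"></script></body></html>';

var_dump(rawHtmlContainsClass($staticPage, 'quote'));  // the data is in the HTML
var_dump(rawHtmlContainsClass($dynamicPage, 'quote')); // the data would need JavaScript
```

If the check fails for a page that clearly shows the data in your browser, that is a hint the content is rendered client-side and a headless browser is the safer choice.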
In this case, we are going to use BrowserKit and HttpClient. Add them to your project with this command:
composer require symfony/browser-kit symfony/http-client
Step 4: Download the target page. Now that we’re set up and ready to go, the next part is where things start to really come together for you. We need to import BrowserKit and HttpClient in ScrapingController. To do that, use this code:
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
Next, inside scrapeQuotes(), initialize a new HttpBrowser object by using the following code:
$browser = new HttpBrowser(HttpClient::create());
With this object, you can make HTTP requests that behave much like those of a normal browser, with cookies and history handled for you. However, it does not render pages or execute JavaScript the way a real browser does.
Now, we need to use the request() method to perform an HTTP GET request to the URL of the target page. Remember to update this code to reflect the page you are targeting:
$crawler = $browser->request('GET', 'https://quotes.toscrape.com/');
The result is a Crawler object, which parses the HTML document returned by the server and provides node selection and data extraction methods. You can use the following call to extract the HTML of the page from the crawler:
$html = $crawler->outerHtml();
At this point, the scrapeQuotes() function will look like this if you have followed every step so far:
public function scrapeQuotes(): JsonResponse
{
    // initialize a browser-like HTTP client
    $browser = new HttpBrowser(HttpClient::create());

    // download and parse the HTML of the target page
    $crawler = $browser->request('GET', 'https://quotes.toscrape.com/');

    // get the page outer HTML and return it
    $html = $crawler->outerHtml();

    return response()->json($html);
}
Run it and see what comes out. The response should contain the HTML of the page, encoded as a JSON string, which begins like this:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Quotes to Scrape</title>
    <link rel="stylesheet" href="/static/bootstrap.min.css">
    <link rel="stylesheet" href="/static/main.css">
</head>
<!-- omitted for brevity ... -->
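The request in Step 4 assumes the server answers successfully. If it does not, an error page may be parsed like any other page, so it can help to check the status code before scraping. A minimal sketch, assuming the HttpBrowser instance from this step; the helper name fetchPageOrFail() is hypothetical, and getResponse() is BrowserKit's accessor for the last response received:

```php
<?php
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\DomCrawler\Crawler;

// Hypothetical helper: fetch a page and fail loudly on a non-200 status
// instead of silently parsing an error page.
function fetchPageOrFail(HttpBrowser $browser, string $url): Crawler
{
    $crawler = $browser->request('GET', $url);

    // getResponse() exposes the last response BrowserKit received.
    $status = $browser->getResponse()->getStatusCode();
    if ($status !== 200) {
        throw new RuntimeException("Unexpected HTTP status {$status} for {$url}");
    }

    return $crawler;
}
```

Inside scrapeQuotes(), you could then call fetchPageOrFail($browser, 'https://quotes.toscrape.com/') in place of the direct request() call and decide how the API should respond when the exception is thrown.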
Step 5: Inspect the page content. With the page downloaded, the next task is to define the data extraction logic. To do that, examine the HTML structure of the target page. Open Quotes to Scrape in your browser, right-click on a quote HTML element, and click “Inspect.”
In the markup that appears, we are looking for several things. Each quote card is a .quote HTML node. It contains a .text element with the quote text, an .author node with the name of the person who provided the quote, and one or more .tag elements, each displaying a single tag.
You now have the basics you need to put web scraping with Laravel to work. The following steps cover the specifics of extracting data from the DOM elements of interest. We will keep working with the quotes site, but remember that you can change your target to fit any need.
Step 6: Set up web scraping. Let’s assume that the page you are targeting has more than one quote. You want to capture them all. To do that, you need to create a data structure that will store the scraped data for you. For example, use:
$quotes = [];
You can then use the filter() method from the Crawler class to select the quote elements. Use this:
$quote_html_elements = $crawler->filter('.quote');
This returns all of the DOM nodes on the page that match the specified .quote CSS selector. Next, iterate over them and apply the data extraction logic to each one. Use this:
foreach ($quote_html_elements as $quote_html_element) {
    // create a new quote crawler
    $quote_crawler = new Crawler($quote_html_element);
    // scraping logic...
}
One key detail here is that iterating over the result of filter() yields plain DOMNode objects, which don’t offer methods for node selection. That means you must build a local Crawler instance scoped to each HTML quote element. For the code above to work, you will also need to add the following import:
use Symfony\Component\DomCrawler\Crawler;
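As an alternative to wrapping each DOMNode by hand, DomCrawler's each() method passes a ready-made Crawler for every matched node to a callback and collects the return values. The following is a minimal sketch of that approach, assuming the same .quote, .text, and .author selectors; the function name extractQuoteCards() is hypothetical and not part of the guide's controller.

```php
<?php
use Symfony\Component\DomCrawler\Crawler;

// Hypothetical helper: returns one ['text' => ..., 'author' => ...] entry per quote card.
function extractQuoteCards(Crawler $page): array
{
    // each() calls the closure once per matched node and collects what it returns.
    return $page->filter('.quote')->each(function (Crawler $quote): array {
        return [
            'text'   => $quote->filter('.text')->text(),
            'author' => $quote->filter('.author')->text(),
        ];
    });
}
```

Either style works; the foreach loop used in this guide keeps each step visible, while each() is more compact once the extraction logic is settled.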
Step 7: Implement the data scraping. Inside the loop, we need to do the following. Start by extracting the data you want from the .text, .author, and .tag elements identified previously. Then, populate a new $quote array with them and add it to $quotes.
Here’s how this breaks down:
Select the .text element inside the HTML quote element, then use the text() method to extract its inner text:
$text_html_element = $quote_crawler->filter('.text');
$raw_text = $text_html_element->text();
Each of the quotes is enclosed by the \u201c and \u201d special characters (curly quotation marks). You can remove them with this code:
$text = str_replace(["\u{201c}", "\u{201d}"], '', $raw_text);
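The quoting style in that call matters. The short sketch below, which uses a made-up sample string rather than real scraped text, shows that PHP only expands \u{...} escapes inside double-quoted strings, so the same call written with single quotes would leave the text untouched.

```php
<?php
// Sample raw text as it might come back from the .text element (assumed value).
$raw_text = "\u{201c}Hello, world.\u{201d}";

// Double-quoted strings are required: PHP only expands \u{...} escapes inside them.
$text = str_replace(["\u{201c}", "\u{201d}"], '', $raw_text);

echo $text, PHP_EOL;

// In single quotes the escape stays literal, so nothing is removed.
$unchanged = str_replace(['\u{201c}', '\u{201d}'], '', $raw_text);
var_dump($unchanged === $raw_text);
```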
You can do the same with the other information you need, such as the author name and the tags. Tags take slightly more work, since a quote may have more than one. For tags, use the following code:
$tag_html_elements = $quote_crawler->filter('.tag');
$tags = [];
foreach ($tag_html_elements as $tag_html_element) {
    $tag = $tag_html_element->textContent;
    $tags[] = $tag;
}
If you have followed along using our target site, the body of the loop should now look like this:
// create a new quote crawler
$quote_crawler = new Crawler($quote_html_element);

// perform the data extraction logic
$text_html_element = $quote_crawler->filter('.text');
$raw_text = $text_html_element->text();
// remove special characters from the raw text information
$text = str_replace(["\u{201c}", "\u{201d}"], '', $raw_text);

$author_html_element = $quote_crawler->filter('.author');
$author = $author_html_element->text();

$tag_html_elements = $quote_crawler->filter('.tag');
$tags = [];
foreach ($tag_html_elements as $tag_html_element) {
    $tag = $tag_html_element->textContent;
    $tags[] = $tag;
}
Step 8: Return the data. With the data extracted, the next task is to return it. Still inside the loop, create a $quote array from the scraped data and add it to $quotes:
$quote = [
    'text' => $text,
    'author' => $author,
    'tags' => $tags
];
$quotes[] = $quote;
Then, after the loop, update the API response to include the $quotes list:
return response()->json(['quotes' => $quotes]);
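To get a feel for what clients of the API will receive, you can preview the structure outside Laravel. The sketch below uses plain json_encode() as a stand-in for response()->json(), and the quote values are placeholders rather than scraped data.

```php
<?php
// Sample data in the same shape the loop builds (values are placeholders).
$quotes = [
    ['text' => 'Sample quote one.', 'author' => 'Author A', 'tags' => ['tag-1', 'tag-2']],
    ['text' => 'Sample quote two.', 'author' => 'Author B', 'tags' => []],
];

// Plain json_encode() as a stand-in for response()->json(), just to preview the structure.
$json = json_encode(['quotes' => $quotes], JSON_PRETTY_PRINT);

echo $json, PHP_EOL;
```

Each entry becomes a JSON object with text, author, and tags keys, and a quote without tags yields an empty JSON array.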
Ready to check your work? If everything went right, $quotes will contain the following data at the end of the scraping loop (shown here as var_dump() output):
array(10) {
  [0]=>
  array(3) {
    ["text"]=>
    string(113) "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking."
    ["author"]=>
    string(15) "Albert Einstein"
    ["tags"]=>
    array(4) {
      [0]=>
      string(6) "change"
      [1]=>
      string(13) "deep-thoughts"
      [2]=>
      string(8) "thinking"
      [3]=>
      string(5) "world"
    }
  }
  // omitted for brevity...
  [9]=>
  array(3) {
    ["text"]=>
    string(48) "A day without sunshine is like, you know, night."
    ["author"]=>
    string(12) "Steve Martin"
    ["tags"]=>
    array(3) {
      [0]=>
      string(5) "humor"
      [1]=>
      string(7) "obvious"
      [2]=>
      string(6) "simile"
    }
  }
}
This data is then converted to JSON and returned by the Laravel scraping API you created.
Step 9: Put it all together. We recommend trying this method on the sandbox site first and then moving on to your real targets. Your final ScrapingController code should look like this:
<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;
use Illuminate\Http\JsonResponse;
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
use Symfony\Component\DomCrawler\Crawler;

class ScrapingController extends Controller
{
    public function scrapeQuotes(): JsonResponse
    {
        // initialize a browser-like HTTP client
        $browser = new HttpBrowser(HttpClient::create());

        // download and parse the HTML of the target page
        $crawler = $browser->request('GET', 'https://quotes.toscrape.com/');

        // where to store the scraped data
        $quotes = [];

        // select all quote HTML elements on the page
        $quote_html_elements = $crawler->filter('.quote');

        // iterate over each quote HTML element and apply
        // the scraping logic
        foreach ($quote_html_elements as $quote_html_element) {
            // create a new quote crawler
            $quote_crawler = new Crawler($quote_html_element);

            // perform the data extraction logic
            $text_html_element = $quote_crawler->filter('.text');
            $raw_text = $text_html_element->text();
            // remove special characters from the raw text information
            $text = str_replace(["\u{201c}", "\u{201d}"], '', $raw_text);

            $author_html_element = $quote_crawler->filter('.author');
            $author = $author_html_element->text();

            $tag_html_elements = $quote_crawler->filter('.tag');
            $tags = [];
            foreach ($tag_html_elements as $tag_html_element) {
                $tag = $tag_html_element->textContent;
                $tags[] = $tag;
            }

            // create a new quote object
            // with the scraped data
            $quote = [
                'text' => $text,
                'author' => $author,
                'tags' => $tags
            ];

            // add the quote object to the quotes array
            $quotes[] = $quote;
        }

        // var_dump($quotes); // debugging only: its output would end up in the response body ahead of the JSON

        return response()->json(['quotes' => $quotes]);
    }
}
At this point, you can go ahead and test the process. Give it a try now and find out what shows up.
Now that you know how to build a Laravel scraping project, customize it to fit your objectives and goals. Laravel handles the back end and works with whatever front end you choose, so the scraper can fit into the rest of your application. It is always a good idea to build a project of your own from scratch so you can gauge how well you have learned the framework.
If you want to learn more about Laravel, there are numerous tutorials available at Laravel.com, and there is a strong support community, too. This Laravel basic task list is a good place to start.
You Can Do a Great Deal with Laravel and Rayobyte
You can add in web crawling, schedule your web scraping tasks, and integrate a proxy to use with Laravel. When it comes to building a strong web scraping tool, Laravel works well.
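As a starting point for crawling, the scraper needs a way to find the next page to visit. The sketch below is one dependency-free way to do that with PHP's built-in DOM extension instead of DomCrawler. It assumes the pagination markup is a link inside an li element with the class next and a root-relative href, which you should confirm in your browser's inspector; the helper name nextPageUrl() and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: returns the absolute URL of the next page, or null on the last page.
// Assumes pagination markup such as <li class="next"><a href="/page/2/">Next</a></li>.
function nextPageUrl(string $html, string $baseUrl): ?string
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // tolerate imperfect markup

    $xpath = new DOMXPath($dom);
    $links = $xpath->query(
        "//li[contains(concat(' ', normalize-space(@class), ' '), ' next ')]/a/@href"
    );

    if ($links->length === 0) {
        return null;
    }

    // Only root-relative hrefs such as "/page/2/" are handled in this sketch.
    return rtrim($baseUrl, '/') . $links->item(0)->nodeValue;
}

// Sample markup standing in for the pager of a downloaded page.
$sample = '<ul class="pager"><li class="next"><a href="/page/2/">Next</a></li></ul>';
echo nextPageUrl($sample, 'https://quotes.toscrape.com/'), PHP_EOL;
```

A crawl loop would request the returned URL, scrape it, and repeat until the helper returns null, ideally with a page limit and a delay between requests.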
One of the benefits of web scraping is capturing data from a wide range of sources that can influence the decisions you make. However, today’s target websites are very good at spotting automated requests and limiting your access to the information you need. That is where Rayobyte can help you.
You can integrate our residential and data center proxies to help you navigate around some of the most challenging limitations: the anti-bot tools on many target websites. Check out how Rayobyte is helping organizations get better data faster and get around everything from CAPTCHAs to location blocks.
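On the Laravel side, a proxy can be passed to Symfony HttpClient through its proxy option when the client is created. The following is a minimal sketch, not a definitive setup: the helper name scraperClientOptions(), the User-Agent string, and the proxy URL are all placeholders, and the class_exists() guard is only there so the file also runs outside a project where the Symfony packages are installed.

```php
<?php
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;

// Hypothetical helper: builds HttpClient options, adding a proxy only when one is given.
function scraperClientOptions(?string $proxyUrl = null, float $timeoutSeconds = 10.0): array
{
    $options = [
        'timeout' => $timeoutSeconds,
        'headers' => ['User-Agent' => 'laravel-scraper-demo/1.0'], // placeholder value
    ];

    if ($proxyUrl !== null) {
        $options['proxy'] = $proxyUrl;
    }

    return $options;
}

// Placeholder endpoint; substitute the host, port, and credentials from your provider.
$options = scraperClientOptions('http://username:password@proxy.example.com:8000');

// Guarded so the sketch also runs where the Symfony packages are not installed.
if (class_exists(HttpClient::class)) {
    $browser = new HttpBrowser(HttpClient::create($options));
}
```

In the controller, the resulting $browser would replace the one created with HttpClient::create() and no arguments; keeping the proxy URL in an environment variable rather than in code is usually the safer choice.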