Web Scraping in Laravel using Goutte

Web scraping means extracting HTML content from a web page. It is an easy way to display information from another website on your own site. There are plenty of libraries for scraping data; in this post you will learn to scrape data with Goutte in Laravel. Goutte is a web scraping and web crawling library. With Goutte you can also select data by a particular element, e.g. by class, id, count, etc. Goutte was built by the Symfony developers on top of Guzzle and Symfony components. In this post I will show you how to scrape a post by its URL `https://usingphp.com/post/get-next-and-previous-post-link-in-post-page`

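First, install Goutte via the Composer package manager. The facade and service provider used below come from the weidner/goutte package, which pulls in fabpot/goutte as a dependency:

composer require weidner/goutte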

After the package installs successfully, add the service provider to the providers array and the facade to the aliases array in config/app.php

ServiceProvider

Weidner\Goutte\GoutteServiceProvider::class,

Facades

'Goutte' => Weidner\Goutte\GoutteFacade::class,

Create a controller by running the command below in your terminal

php artisan make:controller TestController

Add a route in routes/web.php

Route::get('test', 'TestController@index');

Add the code below to TestController

<?php

namespace App\Http\Controllers;

use Goutte;

class TestController extends Controller
{
    public function index()
    {
        $url = 'https://usingphp.com/post/get-next-and-previous-post-link-in-post-page';

        // Fetch the page; request() returns a crawler for the response HTML
        $crawler = Goutte::request('GET', $url);

        // The page title is in the h1 tag
        echo $crawler->filter('h1')->first()->text();

        // Get the content of the page
        echo $crawler->filter('.single-page-content')->text();
    }
}
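One thing to watch for: as far as I know, text() throws an InvalidArgumentException when the selector matches nothing, so it can be worth checking count() first. Below is a minimal sketch of that check. It uses Symfony's DomCrawler class directly (the type of object that request() returns) on a made-up HTML string, so it runs without a network call. The textOrDefault helper and the sample HTML are hypothetical, and the autoloader path assumes the script sits in the project root.

```php
<?php
// Assumes Composer's autoloader is at this path (the default in a Laravel project root).
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical helper: return the text of the first match, or a fallback value.
function textOrDefault(Crawler $crawler, string $selector, string $default = ''): string
{
    $nodes = $crawler->filter($selector);

    // count() is 0 when nothing matched; calling text() in that case would throw.
    return $nodes->count() > 0 ? $nodes->first()->text() : $default;
}

// Made-up HTML standing in for a fetched page.
$html = '<html><body><h1>Post title</h1>'
      . '<div class="single-page-content">Body text</div></body></html>';
$crawler = new Crawler($html);

$title   = textOrDefault($crawler, 'h1');                    // selector matches
$missing = textOrDefault($crawler, '.does-not-exist', 'n/a'); // selector matches nothing
```

In the controller you could pass the crawler returned by Goutte::request() to the same helper instead of building one from a string.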

You can also get all the child elements of a parent using each() with Goutte. For example, if you want all the <li> elements in a <ul>, you can loop over them with each().

<?php
$crawler->filter('ul li')->each(function ($node) { echo $node->text(); });
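Instead of echoing inside the loop, you can collect the values, because each() returns an array of whatever the closure returns. Here is a small sketch of that idea on a made-up HTML list. It builds a DomCrawler object from a string rather than fetching a page, and it assumes Composer's autoloader is at vendor/autoload.php.

```php
<?php
// Assumes Composer's autoloader is at this path (the default in a Laravel project root).
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Made-up HTML standing in for part of a fetched page.
$html = '<ul><li>First</li><li>Second</li><li>Third</li></ul>';
$crawler = new Crawler($html);

// The closure receives each matched node (wrapped in a Crawler) and its zero-based index.
$items = $crawler->filter('ul li')->each(function (Crawler $node, int $i) {
    return ($i + 1) . '. ' . $node->text();
});

// $items is now a plain PHP array with one string per <li>.
print_r($items);
```

Returning the array from your controller (or passing it to a view) is usually easier to work with than echoing from inside the closure.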

Similarly, if you want to scrape all the images from a particular page, you can use the code below in your controller. It returns all the image sources.

<?php
// each() returns an array of the closure's return values
$sources = $crawler->filter('img')->each(function ($node) {
    // $node is a crawler, so read the attribute with attr()
    return $node->attr('src');
});
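If you only want images that actually have a src attribute, one option is to let the CSS selector do the filtering with img[src]. The sketch below shows this on a made-up HTML snippet, again using DomCrawler on a string instead of a live page. The sample markup and the autoloader path are assumptions.

```php
<?php
// Assumes Composer's autoloader is at this path (the default in a Laravel project root).
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Made-up HTML: the second image has no src attribute.
$html = '<div>'
      . '<img src="/a.png" alt="A">'
      . '<img alt="no source">'
      . '<img src="/b.jpg">'
      . '</div>';
$crawler = new Crawler($html);

// The attribute selector img[src] matches only <img> tags that carry a src.
$sources = $crawler->filter('img[src]')->each(function (Crawler $node) {
    return $node->attr('src');
});

print_r($sources);
```

Note that the values come back exactly as written in the HTML, so relative paths like /a.png still need the site's base URL in front of them before you can use them elsewhere.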

You can also select an element by its position among the matches, such as the first or the last one. For example, if you want to get the first paragraph from a page, use the code below.

<?php
$crawler->filter('p')->eq(0); //first paragraph
$crawler->filter('p')->eq(1); //second paragraph
$crawler->filter('p')->eq(2); //third paragraph
// ...and so on; eq() is zero-based
$crawler->filter('p')->eq($n - 1); //nth paragraph
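For the first and last match there are also the first() and last() shortcuts, and count() tells you how many elements matched. Here is a quick sketch on a made-up HTML string. As before, it uses DomCrawler directly and assumes the autoloader lives at vendor/autoload.php.

```php
<?php
// Assumes Composer's autoloader is at this path (the default in a Laravel project root).
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Made-up HTML standing in for a fetched page.
$html = '<div><p>One</p><p>Two</p><p>Three</p></div>';
$crawler = new Crawler($html);

$paragraphs = $crawler->filter('p');

$total  = $paragraphs->count();          // number of matched <p> elements
$first  = $paragraphs->first()->text();  // same element as eq(0)
$second = $paragraphs->eq(1)->text();    // eq() is zero-based
$last   = $paragraphs->last()->text();   // same element as eq($total - 1)

echo "$total paragraphs: $first, $second, $last";
```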

Features of Goutte:

  • Suitable for large projects
  • Better parsing speed
  • It is an object-oriented library
  • Scrape data simply, based on HTML elements

Hi, I'm Saurav, the developer behind usingphp. Donate to help me keep usingphp free and maintained.

Please let me know your thoughts or comments on this article. If you have any suggestions or have found a mistake in this article, please let me know.
