edulazaro/larascraper

Latest stable version: 1.1.4

Composer install command:

composer require edulazaro/larascraper

Package description

Scraper for Laravel

README

Introduction

Larascraper allows you to scrape any URL using Laravel. It uses Puppeteer under the hood. Unlike Spatie Crawler or Browsershot, this scraper focuses on simplicity. While Spatie Crawler can leave many Chromium instances open, filling your server's memory, Larascraper starts the scraping process using Node and makes sure the Chromium instance is closed before exiting.

Unlike Spatie Crawler, it supports proxy authentication and is generally faster.

Install

Run this command via Composer:

composer require edulazaro/larascraper

Then install the required Node dependencies:

npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

These packages are required for the internal Puppeteer script to run.

Note that when you run the scraper from a scheduled task, a non-interactive shell is probably used. Node is usually available there, but it may not be if Node was installed via NVM. In that case, see the Issues section at the end.

Basic Usage

Create a scraper class (manually or via the built-in command):

php artisan make:scraper BikeScraper

This generates a file like:

namespace App\Scrapers;

use EduLazaro\Larascraper\Scraper;

class BikeScraper extends Scraper
{
    protected function handle(): array
    {
        return [
            'title' => $this->crawler->filter('title')->text('')
        ];
    }
}
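The empty string passed to text('') in the generated class reads like a fallback value for pages where the selector matches nothing. The following is a minimal sketch of that "first match or default" idea, assuming $this->crawler follows Symfony DomCrawler semantics (the README does not state this). It uses PHP's built-in DOM extension with an XPath expression instead of a CSS selector, and firstText is a hypothetical helper, not part of Larascraper.

```php
<?php
// Illustration only: built on PHP's DOM extension, not on Larascraper itself.
// firstText() is a hypothetical helper name introduced for this sketch.

/**
 * Returns the trimmed text of the first node matching $xpath,
 * or $default when the document is empty or nothing matches.
 */
function firstText(string $html, string $xpath, string $default = ''): string
{
    if ($html === '') {
        return $default;
    }

    $dom = new DOMDocument();

    // Real-world HTML is often invalid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($dom))->query($xpath);
    if ($nodes === false || $nodes->length === 0) {
        return $default;
    }

    return trim($nodes->item(0)->textContent);
}

$html = '<html><head><title> Bikes </title></head><body></body></html>';

$title   = firstText($html, '//title'); // the page has a <title>
$heading = firstText($html, '//h1');    // no <h1>, so the default '' is returned
```

Returning a default instead of failing keeps handle() safe to run against pages that lack an expected element.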

You can now scrape a URL like this:

use App\Scrapers\BikeScraper;

$data = BikeScraper::scrape('https://whatever.com/bikes/4')
    ->proxy('ip:port', 'username', 'password') // Optional
    ->timeout(10000) // Optional timeout in ms
    ->headers(['Accept-Language' => 'en']) // Optional headers
    ->run();

dd($data);

You can pass parameters to the run method, as long as the handle method declares them:

namespace App\Scrapers;

use EduLazaro\Larascraper\Scraper;

class BikeScraper extends Scraper
{
    protected function handle(string $name): array
    {
        return [
            'title' => $this->crawler->filter($name)->text('')
        ];
    }
}

And then you can do:

use App\Scrapers\BikeScraper;

BikeScraper::scrape('https://whatever.com/bikes/4')->run(name: 'title');

Proxy Support

Larascraper supports proxies with or without authentication:

->proxy('200.20.14.84:40200')

Or if using authentication:

->proxy('200.20.14.84:40200', 'username', 'password')

Timeout

To set a custom timeout (the default is 20000 ms):

->timeout(10000) // Timeout in milliseconds

Headers

To append custom headers:

->headers([
    'Accept-Language' => 'en',
    'X-Custom-Header' => 'Hello'
])

Retry logic

You can set the number of attempts and the number of seconds to wait between attempts:

->retry(3, 5)

This retries 3 times, waiting 5 seconds between attempts. Note that only responses with status codes 408, 429, 500, 502, 503 and 504 are retried.
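The retry rule can be pictured with a small standalone helper. This is a sketch of the described behaviour, not Larascraper's actual implementation: retryOnStatus and RETRYABLE_STATUSES are hypothetical names, and it assumes the first argument counts total attempts, including the first one.

```php
<?php
// Hypothetical helper, not part of Larascraper: models the retry rule above.

// Status codes that the README lists as retryable.
const RETRYABLE_STATUSES = [408, 429, 500, 502, 503, 504];

/**
 * Calls $request until it returns a non-retryable status or attempts run out.
 *
 * @param callable      $request     Performs one attempt and returns an HTTP status code.
 * @param int           $attempts    Maximum number of attempts (assumed to include the first).
 * @param int           $waitSeconds Pause between attempts, in seconds.
 * @param callable|null $sleep       Optional replacement for sleep(), handy in tests.
 */
function retryOnStatus(callable $request, int $attempts, int $waitSeconds, ?callable $sleep = null): int
{
    $sleep ??= 'sleep';
    $status = 0;

    for ($attempt = 1; $attempt <= $attempts; $attempt++) {
        $status = $request();

        // Any status outside the retryable list ends the loop immediately.
        if (!in_array($status, RETRYABLE_STATUSES, true)) {
            return $status;
        }

        // Wait only if another attempt will follow.
        if ($attempt < $attempts) {
            $sleep($waitSeconds);
        }
    }

    return $status;
}

// Usage: a fake request that fails twice with retryable codes, then succeeds.
$responses = [503, 429, 200];
$calls = 0;
$final = retryOnStatus(
    function () use (&$responses, &$calls): int {
        $calls++;
        return array_shift($responses);
    },
    3,
    5,
    function (int $seconds): void {
        // No-op sleeper so the example finishes instantly.
    }
);
```

A status such as 404 would be returned after the first attempt, since it is not in the retryable list.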

Artisan Commands

You can generate a scraper class with:

php artisan make:scraper MyScraper

List all scrapers in the app/Scrapers directory:

php artisan list:scrapers

Testing a scraper

You can easily test a scraper with Tinker:

php artisan tinker

And then run:

$data = \App\Scrapers\TestScraper::scrape('https://whatever.com')->run();
dd($data);

Issues

This section contains common configuration issues.

Using Node via NVM

If you use Node via NVM and run the scraper from a scheduled task, Node may not be available. To make it available, edit your ~/.bash_profile with an editor such as Vi, Vim or Nano:

nano ~/.bash_profile

Then make sure this is included at the top:

export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"  # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"  # This loads nvm bash_completion

Save the file and run:

source ~/.bash_profile

Now Node will be available in non-interactive terminals and the scraping process should run successfully.

In general, using NVM in production environments is not recommended.
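To check whether Node is visible to the PHP process that runs your scheduled task, a small diagnostic like the one below can help. It is a sketch under assumptions: findOnPath is a hypothetical helper that is not part of Larascraper, and it only inspects the PATH environment variable of the current process.

```php
<?php
// Hypothetical diagnostic helper, not part of Larascraper.

/**
 * Returns the full path of $binary if it is an executable file in one of
 * the directories listed in $path (defaults to this process's PATH), else null.
 */
function findOnPath(string $binary, ?string $path = null): ?string
{
    // getenv() returns false when PATH is unset; the cast turns that into ''.
    $path ??= (string) getenv('PATH');

    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue;
        }

        $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $binary;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }

    return null;
}

$node = findOnPath('node');
echo $node === null
    ? "node is not on PATH for this process\n"
    : "node found at {$node}\n";
```

Running this from inside the scheduled task, rather than from an interactive shell, shows the environment the scraper actually gets.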

Statistics

  • Total downloads: 27
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 1
  • Clicks: 0
  • Dependents: 0
  • Suggesters: 0

GitHub information

  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Language: PHP

Other information

  • License: MIT
  • Last updated: 2025-03-23
