dmoraschi/sitemap-common

Composer install command:

composer require dmoraschi/sitemap-common

Package description

Sitemap generator and crawler library

README

This package provides all of the components needed to crawl a website and to build and write sitemap files.

Example of console application using the library: dmoraschi/sitemap-app

Installation

Run the following command and provide the latest stable version (e.g. v1.0.0):

composer require dmoraschi/sitemap-common

Or add the following to your composer.json file:

"dmoraschi/sitemap-common": "1.0.*"

SiteMapGenerator

Basic usage

$generator = new SiteMapGenerator(
    new FileWriter($outputFileName),
    new XmlTemplate()
);

Add a URL:

$generator->addUrl($url, $frequency, $priority);

Add a single SiteMapUrl object, or an array of them:

$siteMapUrl = new SiteMapUrl(
    new Url($url), $frequency, $priority
);

$generator->addSiteMapUrl($siteMapUrl);

$generator->addSiteMapUrls([
    $siteMapUrl, $siteMapUrl2
]);

Set the URLs of the sitemap via SiteMapUrlCollection:

$siteMapUrl = new SiteMapUrl(
    new Url($url), $frequency, $priority
);

$collection = new SiteMapUrlCollection([
    $siteMapUrl, $siteMapUrl2
]);

$generator->setCollection($collection);

Generate the sitemap:

$generator->execute();
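
For orientation, the sketch below shows the kind of file a sitemap generator typically writes, assuming the output follows the standard sitemaps.org format (the README does not show the output itself). It uses PHP's built-in XMLWriter and is not the library's XmlTemplate or FileWriter; the function name renderSiteMap and the entry array keys are hypothetical.

```php
<?php
// Illustrative only: renders a sitemaps.org-style document with PHP's
// built-in XMLWriter. This is NOT the library's XmlTemplate implementation.
// renderSiteMap() and the 'loc'/'changefreq'/'priority' keys are hypothetical.
function renderSiteMap(array $entries): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');

    $xml->startElement('urlset');
    $xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    foreach ($entries as $entry) {
        $xml->startElement('url');
        $xml->writeElement('loc', $entry['loc']);
        // The protocol allows: always, hourly, daily, weekly, monthly, yearly, never.
        $xml->writeElement('changefreq', $entry['changefreq']);
        // The protocol allows priorities from 0.0 to 1.0.
        $xml->writeElement('priority', number_format($entry['priority'], 1, '.', ''));
        $xml->endElement(); // closes <url>
    }

    $xml->endElement(); // closes <urlset>
    $xml->endDocument();

    return $xml->outputMemory();
}

echo renderSiteMap([
    ['loc' => 'https://example.com/',      'changefreq' => 'daily',  'priority' => 1.0],
    ['loc' => 'https://example.com/about', 'changefreq' => 'yearly', 'priority' => 0.3],
]);
```

The changefreq and priority fields in this sketch correspond to the $frequency and $priority arguments above only by analogy; check the library source for the exact values it accepts.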

Crawler

Basic usage

$crawler = new Crawler(
    new Url($baseUrl),
    new RegexBasedLinkParser(),
    new HttpClient()
);

You can tell the Crawler not to visit certain URLs by adding policies. Below are the default policies provided by the library:

$crawler->setPolicies([
    'host' => new SameHostPolicy($baseUrl),
    'url'  => new UniqueUrlPolicy(),
    'ext'  => new ValidExtensionPolicy(),
]);
// or
$crawler->setPolicy('host', new SameHostPolicy($baseUrl));

SameHostPolicy, UniqueUrlPolicy, and ValidExtensionPolicy are provided with the library. You can define your own policies by implementing the Policy interface.
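
A custom policy might look roughly like the sketch below. The README does not show the Policy interface, so the interface declared here is a hypothetical stand-in: the method name shouldVisit and its string parameter are assumptions, and the library's real interface may differ (for example, it may take a Url object). ExcludePathPolicy is likewise an invented example class.

```php
<?php
// Hypothetical stand-in for the library's Policy interface. The real
// method name and parameter type are not shown in the README and may differ.
interface Policy
{
    public function shouldVisit(string $url): bool;
}

// Example policy (invented for illustration): skip any URL whose path
// starts with one of the excluded prefixes.
final class ExcludePathPolicy implements Policy
{
    /** @var string[] */
    private $prefixes;

    public function __construct(array $prefixes)
    {
        $this->prefixes = $prefixes;
    }

    public function shouldVisit(string $url): bool
    {
        $path = parse_url($url, PHP_URL_PATH);
        if (!is_string($path)) {
            $path = '/'; // no path component (or unparsable URL): treat as root
        }

        foreach ($this->prefixes as $prefix) {
            if (strpos($path, $prefix) === 0) {
                return false; // path begins with an excluded prefix
            }
        }

        return true;
    }
}

$policy = new ExcludePathPolicy(['/admin', '/cart']);
var_dump($policy->shouldVisit('https://example.com/admin/login')); // bool(false)
var_dump($policy->shouldVisit('https://example.com/blog/post-1')); // bool(true)
```

If the real interface matches, such a class could presumably be registered with setPolicy() under its own key, alongside the default policies.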

Calling crawl() starts from the base URL passed to the constructor and crawls web pages up to the depth given as its argument. It returns an array of all unique visited URLs:

$urls = $crawler->crawl($deep);
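
To make the depth argument concrete, here is a minimal sketch of a depth-limited, breadth-first crawl over an in-memory link graph. It is not the library's Crawler: the function crawlToDepth and the $links array are hypothetical, and it assumes that depth counts link hops from the base URL, which the README does not spell out.

```php
<?php
// Illustrative only: depth-limited breadth-first traversal over a link graph
// held in an array (page => links found on that page). No HTTP is performed.
// Assumption: depth counts link hops from the base URL.
function crawlToDepth(array $links, string $baseUrl, int $maxDepth): array
{
    $visited  = [$baseUrl => true]; // keys give uniqueness, like UniqueUrlPolicy
    $frontier = [$baseUrl];         // pages discovered at the current depth

    for ($depth = 0; $depth < $maxDepth && $frontier !== []; $depth++) {
        $next = [];
        foreach ($frontier as $url) {
            foreach ($links[$url] ?? [] as $target) {
                if (!isset($visited[$target])) {
                    $visited[$target] = true;
                    $next[] = $target;
                }
            }
        }
        $frontier = $next;
    }

    return array_keys($visited);
}

$links = [
    '/'    => ['/a', '/b'],
    '/a'   => ['/a/1', '/'],
    '/b'   => ['/a'],
    '/a/1' => ['/a/1/x'],
];

print_r(crawlToDepth($links, '/', 1)); // '/', '/a', '/b'
print_r(crawlToDepth($links, '/', 2)); // '/', '/a', '/b', '/a/1'
```

In this sketch, pages found at the maximum depth are recorded but their own links are not followed, which is why '/a/1/x' never appears at depth 2.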

You can also instruct the Crawler to collect custom data while visiting the web pages by adding collectors to the main object:

$crawler->setCollectors([
    'images' => new ImageCollector()
]);
// or
$crawler->setCollector('images', new ImageCollector());

Then retrieve the collected data:

$crawler->crawl($deep);

$imageCollector = $crawler->getCollector('images');
$data = $imageCollector->getCollectedData();

ImageCollector is provided by the library. You can define your own collector by implementing the Collector interface.
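
As a rough idea of what a custom collector could look like, the sketch below gathers page titles. Only getCollectedData() appears in the README; the Collector interface declared here is a hypothetical stand-in, and the collect() method with its (url, html) parameters is an assumption about how the crawler hands page content to a collector. TitleCollector is an invented example class.

```php
<?php
// Hypothetical stand-in for the library's Collector interface. Only
// getCollectedData() is confirmed by the README; collect() is an assumption.
interface Collector
{
    public function collect(string $url, string $html): void;

    public function getCollectedData(): array;
}

// Example collector (invented for illustration): records each page's <title>.
final class TitleCollector implements Collector
{
    /** @var array<string, string> page URL => page title */
    private $titles = [];

    public function collect(string $url, string $html): void
    {
        // A regex is enough for a sketch; a DOM parser is more robust.
        if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $match) === 1) {
            $title = html_entity_decode($match[1], ENT_QUOTES, 'UTF-8');
            $this->titles[$url] = trim($title);
        }
    }

    public function getCollectedData(): array
    {
        return $this->titles;
    }
}

$collector = new TitleCollector();
$collector->collect(
    'https://example.com/',
    '<html><head><title> Home &amp; News </title></head></html>'
);
print_r($collector->getCollectedData());
```

Keying the collected data by page URL keeps one title per page even if the same page is passed in more than once.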

Statistics

  • Total downloads: 106
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 0
  • Clicks: 0
  • Dependents: 0
  • Suggesters: 0

GitHub information

  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Language: PHP

Other information

  • License: MIT
  • Last updated: 2016-08-18
