yoozi/miner

Composer install command:

composer require yoozi/miner

Package description

PHP library to extract the metadata from a public web page and/or summarize it.

README

This library is part of Project Golem, see yoozi/golem for more info.

Miner is a PHP library that extracts metadata and interesting text content (such as the author and a summary) from HTML pages. It acts like a simplified version of the HTML metadata parser in Apache Tika.

WTF is Miner?

Ta-da! Consider the screenshot taken from LinkedIn below:

[Image: screenshot of a LinkedIn link preview]

When you post a link to your connections on LinkedIn, it automatically extracts the title, summary, and even a cover image for you. Miner is typically used for tasks like this.

Installation

The easiest way to install the Miner package is with Composer.

  1. Open your composer.json and add the following to the require section:

    "yoozi/miner": "1.0.*"

  2. Run Composer to install or update the package dependencies:

    composer install

    or

    composer update

Usage

Parsers

  • Meta: Summarize a webpage by parsing its HTML meta tags. In most cases it favors Open Graph (OG) markup, and will fall back to standard meta tags if necessary.
  • Readability: Summarize a webpage using Arc90's Readability algorithm. All credit goes to @feelinglucky's PHP port.
  • Hybrid: Combines the two parsers above, using Readability as the primary parser and Meta as its fallback.

Hybrid is enabled by default. You can change parsers to best fit your needs:

// Use the Readability Parser.
$extractor->getConfig()->set('parser', 'readability');

// Or...use the Hybrid Parser.
// $extractor->getConfig()->set('parser', 'hybrid');
// Or...use the Meta Parser.
// $extractor->getConfig()->set('parser', 'meta');
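
To make the "favor Open Graph, fall back to standard meta tags" idea concrete, here is a minimal sketch built only on PHP's bundled DOM extension. It is not Miner's actual implementation: the function name `sketchMetaSummary` and the choice of fields are assumptions made for illustration.

```php
<?php

/**
 * Illustrative only (hypothetical helper, not part of Miner):
 * prefer Open Graph tags, then fall back to standard tags.
 */
function sketchMetaSummary(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally; real-world HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $og = [];    // <meta property="og:..."> values
    $named = []; // <meta name="..."> values
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $content = trim($tag->getAttribute('content'));
        if ($content === '') {
            continue;
        }
        if ($tag->hasAttribute('property')) {
            $og[strtolower($tag->getAttribute('property'))] = $content;
        } elseif ($tag->hasAttribute('name')) {
            $named[strtolower($tag->getAttribute('name'))] = $content;
        }
    }

    $titleNode = $doc->getElementsByTagName('title')->item(0);
    $pageTitle = $titleNode !== null ? trim($titleNode->textContent) : null;

    return [
        // Open Graph first, standard markup second.
        'title'       => $og['og:title'] ?? $pageTitle,
        'description' => $og['og:description'] ?? $named['description'] ?? null,
        'image'       => $og['og:image'] ?? null,
    ];
}

$html = '<html><head><title>Plain title</title>'
    . '<meta property="og:title" content="OG title">'
    . '<meta name="description" content="Standard description">'
    . '</head><body></body></html>';

$summary = sketchMetaSummary($html);
var_dump($summary);
```

In this sample the title comes from the Open Graph tag, while the description falls back to the standard `description` meta tag because no `og:description` is present.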

Example

We can fetch a remote URL and extract its metadata directly.

<?php

use Yoozi\Miner\Extractor;
use Buzz\Client\Curl;

$extractor = new Extractor();

// Use the Hybrid Parser.
$extractor->getConfig()->set('parser', 'hybrid');
// Strip all HTML tags in the description we parsed.
$extractor->getConfig()->set('strip_tags', true);

$meta = $extractor->fromUrl('http://www.example.com/', new Curl)->run();
var_dump($meta);

Data returned:

array(9) {
  ["title"]=>
  string(14) "Example Domain"
  ["author"]=>
  NULL
  ["keywords"]=>
  array(0) {
  }
  ["description"]=>
  string(220) "
    Example Domain
    This domain is established to be used for illustrative examples in documents. You may use this
    domain in examples without prior coordination or asking for permission.
    More information...
"
  ["image"]=>
  NULL
  ["url"]=>
  string(23) "http://www.example.com/"
  ["host"]=>
  string(22) "http://www.example.com"
  ["domain"]=>
  string(11) "example.com"
  ["favicon"]=>
  string(52) "http://www.google.com/s2/favicons?domain=example.com"
}
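
As a rough illustration of the link-preview use case described earlier, the sketch below turns an array shaped like the result above into a short plain-text preview. The helper name `sketchLinkPreview` and its `$maxLength` parameter are hypothetical and not part of Miner; truncation is byte-based, so multibyte text would need `mb_substr` instead.

```php
<?php

/**
 * Hypothetical helper (not part of Miner): build a plain-text preview
 * from a metadata array with 'title', 'description', 'url' and 'domain' keys.
 */
function sketchLinkPreview(array $meta, int $maxLength = 80): string
{
    // Fall back to the URL when no title was extracted.
    $title = $meta['title'] ?? $meta['url'] ?? '';

    // Collapse the newlines and indentation seen in the raw description.
    $description = trim(preg_replace('/\s+/', ' ', (string) ($meta['description'] ?? '')));

    // Byte-based truncation; fine for ASCII input.
    if (strlen($description) > $maxLength) {
        $description = rtrim(substr($description, 0, $maxLength)) . '...';
    }

    return sprintf("%s (%s)\n%s", $title, $meta['domain'] ?? '', $description);
}

// Sample data copied from the shape of the result shown above.
$meta = [
    'title'       => 'Example Domain',
    'description' => "\n    Example Domain\n    This domain is established to be used for illustrative examples in documents.\n",
    'url'         => 'http://www.example.com/',
    'domain'      => 'example.com',
];

$preview = sketchLinkPreview($meta, 40);
echo $preview, PHP_EOL;
```

Normalizing whitespace before truncating keeps the preview on one line even when the `strip_tags` option leaves behind the original page's indentation.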

Statistics

  • Total downloads: 77
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 15
  • Clicks: 0
  • Dependents: 0
  • Suggesters: 0

GitHub information

  • Stars: 14
  • Watchers: 12
  • Forks: 6
  • Language: PHP

Other information

  • License: MIT
  • Last updated: 2014-04-08
