jbo/pdf-extractor

Latest stable version: 1.0.0

Composer install command:

composer require jbo/pdf-extractor

Package description

This library helps extract content from PDF files.

README

A PHP library for extracting text content from PDF files with multiple extraction methods.

Overview

This library provides a flexible way to extract text from PDF files using different extraction methods. It currently supports:

  1. SmalotPdfParser - A PHP-based PDF parser
  2. Pdftotext - Command-line utility from Poppler tools

Requirements

  • PHP 8.1 or higher
  • Composer
  • For Pdftotext extractor: Poppler tools installed on your system
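The Pdftotext extractor only works if the pdftotext binary is present. The following sketch checks for it before choosing an extractor. It assumes a hypothetical helper, `find_executable()`, which is not part of this library. The helper scans a PATH-style list of directories and appends `.exe` on Windows.

```php
<?php
/**
 * Hypothetical helper (not part of jbo/pdf-extractor): search the directories
 * in a PATH-style string for an executable such as pdftotext.
 * Returns the full path, or null when it is not installed.
 */
function find_executable(string $name, ?string $searchPath = null): ?string
{
    $searchPath ??= (string) getenv('PATH');
    // Windows executables carry an .exe suffix; elsewhere the bare name is used.
    $candidate = PHP_OS_FAMILY === 'Windows' ? $name . '.exe' : $name;

    foreach (explode(PATH_SEPARATOR, $searchPath) as $dir) {
        if ($dir === '') {
            continue;
        }
        $full = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $candidate;
        if (is_file($full) && is_executable($full)) {
            return $full;
        }
    }
    return null;
}

$pdftotext = find_executable('pdftotext');
echo $pdftotext ?? 'pdftotext not found; use SmalotPdfParserExtractor instead', PHP_EOL;
```

If the helper returns a path, you could pass it to the `PdftotextExtractor` constructor. Otherwise, the pure-PHP extractor remains available.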

Installation

Install via Composer:

composer require jbo/pdf-extractor

Usage

Basic Usage

<?php
require 'vendor/autoload.php';

use Jbo\PdfExtractor\PdfTextExtractor;
use Jbo\PdfExtractor\Extractor\SmalotPdfParserExtractor;

// 1. Choose an extractor
$extractor = new SmalotPdfParserExtractor();

// 2. Initialize the service
$service = new PdfTextExtractor($extractor);

// 3. Extract text from a PDF file
try {
    $text = $service->extract('/path/to/document.pdf');
    echo $text;
} catch (\Exception $e) {
    echo "Error: " . $e->getMessage() . PHP_EOL;
}

Using Pdftotext Extractor (Windows)

<?php
require 'vendor/autoload.php';

use Jbo\PdfExtractor\PdfTextExtractor;
use Jbo\PdfExtractor\Extractor\PdftotextExtractor;

// Specify the path to pdftotext.exe from Poppler for Windows
$extractor = new PdftotextExtractor('C:\\path\\to\\poppler\\bin\\pdftotext.exe');
$service = new PdfTextExtractor($extractor);

// Extract text
$text = $service->extract('/path/to/document.pdf');

Extractors

SmalotPdfParserExtractor

Uses the smalot/pdfparser library to extract text from PDF files. This is a pure-PHP solution that does not require any external binaries.

PdftotextExtractor

Uses the pdftotext command-line utility from Poppler tools to extract text. This method may provide better results for certain PDF files but requires the Poppler tools to be installed on your system.
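The following sketch shows how a PHP wrapper might call pdftotext. It is not the library's actual code, and `build_pdftotext_command()` and `run_pdftotext()` are hypothetical names. It relies on documented pdftotext behaviour: `-enc UTF-8` selects the output encoding, and passing `-` as the output file writes the text to stdout. A non-zero exit code is turned into a `RuntimeException`, in line with the error contract described under Error Handling.

```php
<?php
/**
 * Hypothetical sketch: build the pdftotext command line.
 * escapeshellarg() quotes each argument, so spaces or shell
 * metacharacters in paths cannot break or inject into the command.
 */
function build_pdftotext_command(string $binary, string $pdfPath): string
{
    return escapeshellarg($binary) . ' -enc UTF-8 ' . escapeshellarg($pdfPath) . ' -';
}

/** Hypothetical sketch: run pdftotext and return the text it prints to stdout. */
function run_pdftotext(string $binary, string $pdfPath): string
{
    // stderr goes to a temporary file, so a flood of warnings cannot fill a
    // pipe buffer and stall the process while stdout is being read.
    $stderr = tmpfile();
    $descriptors = [1 => ['pipe', 'w'], 2 => $stderr];
    $process = proc_open(build_pdftotext_command($binary, $pdfPath), $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start pdftotext');
    }
    $text = (string) stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    // A non-zero exit code means extraction failed; report pdftotext's stderr.
    $exitCode = proc_close($process);
    if ($exitCode !== 0) {
        rewind($stderr);
        $message = trim((string) stream_get_contents($stderr));
        throw new RuntimeException("pdftotext exited with code $exitCode: $message");
    }
    return $text;
}
```

On Windows you would pass the full path to `pdftotext.exe` as `$binary`, just as the Windows example passes it to the `PdftotextExtractor` constructor.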

Error Handling

The library throws exceptions in the following cases:

  • InvalidArgumentException: When the PDF file doesn't exist or isn't readable
  • RuntimeException: When text extraction fails
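The following sketch shows one way to handle the two documented exception types separately. It stubs the extraction step as a callable, so it runs without the library or a real PDF. `extract_text_checked()` and `describe_result()` are hypothetical helpers that mirror the documented contract. They are not the library's own functions.

```php
<?php
/**
 * Hypothetical helper mirroring the documented contract:
 * InvalidArgumentException for a missing or unreadable file,
 * RuntimeException when the extraction step itself fails.
 */
function extract_text_checked(string $path, callable $extract): string
{
    if (!is_file($path) || !is_readable($path)) {
        throw new InvalidArgumentException("PDF file not found or not readable: $path");
    }
    try {
        return $extract($path);
    } catch (Throwable $e) {
        // Wrap low-level failures so callers only need to handle one type.
        throw new RuntimeException('Text extraction failed: ' . $e->getMessage(), 0, $e);
    }
}

/** Hypothetical caller that reacts differently to each exception type. */
function describe_result(string $path, callable $extract): string
{
    try {
        return 'OK: ' . extract_text_checked($path, $extract);
    } catch (InvalidArgumentException $e) {
        return 'Bad input: ' . $e->getMessage();
    } catch (RuntimeException $e) {
        return 'Extraction error: ' . $e->getMessage();
    }
}
```

In PHP's SPL hierarchy, `InvalidArgumentException` extends `LogicException`, not `RuntimeException`. The two catch blocks therefore never overlap, and their order does not matter.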

License

This library is licensed under the MIT License. See the LICENSE file for details.

Author

Jens Bourry

Statistics

  • Total downloads: 2
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 0
  • Views: 2
  • Dependents: 0
  • Suggesters: 0

GitHub information

  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Language: PHP

Other information

  • License: MIT
  • Last updated: 2025-05-09
