1tomany/pdf-pack
Latest stable version: v0.7.2
Composer install command:
composer require 1tomany/pdf-pack
Package description
A simple PHP library that makes rasterizing pages and extracting text from PDFs for large language models easy
README
pdf-pack is a simple PHP library that makes it easy to rasterize pages and extract text from PDFs for use with large language models. It has a single dependency, the Symfony Process Component, which it uses to call the Poppler command-line tools (Poppler is derived from the Xpdf code base).
Installation
Install the library using Composer:
composer require 1tomany/pdf-pack
Installing Poppler
Before beginning, ensure the pdfinfo, pdftoppm, and pdftotext binaries are installed and located in your $PATH.
macOS
brew install poppler
Debian and Ubuntu
apt-get install poppler-utils
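After installing, it can help to confirm that all three binaries resolve on your $PATH. The following is a minimal sketch; the helper name missing_binaries is made up for this example and is not part of pdf-pack.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every given command
# that cannot be found on $PATH.
missing_binaries() {
    for bin in "$@"; do
        command -v "$bin" >/dev/null 2>&1 || printf '%s\n' "$bin"
    done
}

# Check the three Poppler tools this library relies on.
missing=$(missing_binaries pdfinfo pdftoppm pdftotext)
if [ -n "$missing" ]; then
    echo "Missing Poppler tools:" $missing
else
    echo "All Poppler tools found"
fi
```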
Usage
This library has three main features:
- Read PDF metadata such as the number of pages
- Rasterize one or more pages to JPEG or PNG images
- Extract text from one or more pages
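Each feature maps onto one of the Poppler tools listed above. The sketch below shows roughly what the equivalent command-line calls look like. The exact flags pdf-pack passes are not documented here, so treat the function names, the 150 DPI resolution, and the output prefix as assumptions made for illustration.

```shell
#!/bin/sh
# Extract the page count from `pdfinfo` output read on stdin.
parse_page_count() {
    awk '/^Pages:/ { print $2 }'
}

# Rasterize a single page to PNG (use -jpeg for JPEG output).
# Usage: rasterize_page <pdf> <page> <output-prefix>
rasterize_page() {
    pdftoppm -f "$2" -l "$2" -r 150 -png "$1" "$3"
}

# Print the text of a single page to stdout.
# Usage: extract_page_text <pdf> <page>
extract_page_text() {
    pdftotext -f "$2" -l "$2" "$1" -
}

# Only run when a PDF path is supplied, e.g. `sh poppler.sh document.pdf`.
if [ -n "${1:-}" ] && [ -f "$1" ]; then
    pdfinfo "$1" | parse_page_count   # metadata: number of pages
    rasterize_page "$1" 1 page        # image of page 1
    extract_page_text "$1" 1          # text of page 1
fi
```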
Extracted data is held in memory and can be written to the filesystem or converted to a data: URI. Because extracted data is held in memory, the library returns a \Generator that yields each extracted or rasterized page in turn instead of loading every page at once.
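For readers unfamiliar with data: URIs, the format is simply a MIME type followed by the Base64-encoded bytes. pdf-pack performs this conversion in PHP; the shell sketch below only illustrates the resulting format, and the to_data_uri helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read bytes on stdin and print a data: URI
# for the given MIME type.
# Usage: to_data_uri <mime-type> < file
to_data_uri() {
    printf 'data:%s;base64,%s\n' "$1" "$(base64 | tr -d '\n')"
}

# Example with inline text; for a rasterized page you would run
# something like: to_data_uri image/png < page.png
printf 'hi' | to_data_uri text/plain
# prints: data:text/plain;base64,aGk=
```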
Using the library is easy, and you have two ways to interact with it:
- Direct: Instantiate the OneToMany\PdfPack\Client\Poppler\PopplerClient class and call its methods directly. This approach is easier to use, but it makes your application less flexible and harder to test.
- Actions: Create a container of OneToMany\PdfPack\Contract\Client\ClientInterface objects, and use the OneToMany\PdfPack\Factory\ClientFactory class to instantiate them.
Note: A Symfony bundle is available if you wish to integrate this library into your Symfony applications with autowiring and configuration support.
Direct usage
See examples/direct.php.
Credits
License
The MIT License
Other information
- License: MIT
- Last updated: 2026-03-05