opencat/filter-docx

Composer install command:

composer require opencat/filter-docx

Package description

DOCX (.docx) file filter for the OpenCAT Framework

README

Microsoft Word DOCX file filter for the CAT Framework.

Installation

composer require opencat/filter-docx

Requires ext-dom, ext-libxml, and ext-zip.

Usage

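use CatFramework\FilterDocx\DocxFilter;
// Segment must be imported as well. Its namespace is not shown in this README,
// so look it up in the framework's core package.

$filter = new DocxFilter();

// Extract translatable segments ('en' is the source language, 'fr' the target)
$document = $filter->extract('report.docx', 'en', 'fr');

foreach ($document->getSegmentPairs() as $pair) {
    // $translatedText stands in for your translation of this pair's source segment
    $pair->target = new Segment('seg-t', [$translatedText]);
}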

// Write the translated DOCX
$filter->rebuild($document, 'report.fr.docx');

What gets extracted

Each non-empty <w:p> paragraph in the document is one segment. Adjacent runs with identical formatting (<w:rPr>) are merged before extraction, reducing the number of inline code placeholders a translator sees.
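
The run-merging step can be illustrated with a minimal sketch. It assumes that "identical formatting" means the two <w:rPr> elements serialize to the same XML string; mergeAdjacentRuns() and runFormattingKey() are hypothetical helpers written for this example, not part of the package's API, and the filter's actual implementation may differ.

```php
<?php
// Sketch only: merges adjacent <w:r> runs whose <w:rPr> serializes identically.
// Both functions are hypothetical helpers, not taken from the package.

const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

function runFormattingKey(DOMElement $run): string
{
    $rPr = $run->getElementsByTagNameNS(W_NS, 'rPr')->item(0);
    // Runs without <w:rPr> share the empty key.
    return $rPr === null ? '' : $run->ownerDocument->saveXML($rPr);
}

function mergeAdjacentRuns(DOMElement $paragraph): void
{
    $previous = null;
    // Copy to an array first: removing nodes while walking a live node list skips entries.
    foreach (iterator_to_array($paragraph->childNodes) as $node) {
        if (!$node instanceof DOMElement) {
            continue; // ignore whitespace and other non-element nodes
        }
        if ($node->namespaceURI !== W_NS || $node->localName !== 'r') {
            $previous = null; // any other element between runs breaks adjacency
            continue;
        }
        if ($previous !== null && runFormattingKey($previous) === runFormattingKey($node)) {
            $prevText = $previous->getElementsByTagNameNS(W_NS, 't')->item(0);
            $thisText = $node->getElementsByTagNameNS(W_NS, 't')->item(0);
            if ($prevText !== null && $thisText !== null) {
                // Append this run's text to the previous run, then drop this run.
                $prevText->appendChild(
                    $paragraph->ownerDocument->createTextNode($thisText->textContent)
                );
                $paragraph->removeChild($node);
                continue;
            }
        }
        $previous = $node;
    }
}
```

With this approach, a word that Word split into two bold runs ("Hel" + "lo") collapses into a single bold run, so the translator sees one placeholder pair instead of two.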

Extracted locations (in order):

  1. word/document.xml — main body
  2. word/header1.xml through word/header10.xml — headers
  3. word/footer1.xml through word/footer10.xml — footers
  4. word/footnotes.xml, word/endnotes.xml — notes
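
The ordering above can be sketched as a small function that picks the translatable parts out of a list of ZIP entry names. translatableParts() is a hypothetical helper for illustration, and the regular expressions are an assumption derived from the list, not the package's own code.

```php
<?php
// Sketch only: selects the parts listed above, in the same order.
// translatableParts() is a hypothetical name, not part of the package.

function translatableParts(array $entryNames): array
{
    // One pattern per location; header and footer numbers are limited to 1..10.
    $patterns = [
        '#^word/document\.xml$#',
        '#^word/header(?:[1-9]|10)\.xml$#',
        '#^word/footer(?:[1-9]|10)\.xml$#',
        '#^word/footnotes\.xml$#',
        '#^word/endnotes\.xml$#',
    ];

    $ordered = [];
    foreach ($patterns as $pattern) {
        $matches = preg_grep($pattern, $entryNames);
        natsort($matches); // natural order puts header2.xml before header10.xml
        foreach ($matches as $name) {
            $ordered[] = $name;
        }
    }
    return $ordered;
}
```

The entry names could come from ZipArchive::getNameIndex() when walking the DOCX archive; anything not matched (styles, settings, media) is left untouched.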

Formatting runs within a paragraph become InlineCode pairs so translators see {<bold>}translated text{</bold>} instead of raw XML.
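
As a rough illustration of that placeholder notation, the sketch below renders a list of runs into the string a translator would see. The array shape ('text' plus an optional 'format' key) and renderWithPlaceholders() are assumptions made for this example; the package's InlineCode class is not shown in this README and is not used here.

```php
<?php
// Sketch only: turns formatted runs into translator-facing placeholder text.
// The run array shape and the function name are hypothetical.

function renderWithPlaceholders(array $runs): string
{
    $out = '';
    foreach ($runs as $run) {
        $format = $run['format'] ?? null;
        if ($format === null) {
            $out .= $run['text']; // unformatted text passes through as-is
        } else {
            // Wrap formatted text in an opening/closing placeholder pair.
            $out .= '{<' . $format . '>}' . $run['text'] . '{</' . $format . '>}';
        }
    }
    return $out;
}
```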

RTL support

When the target language is Arabic, Hebrew, Farsi, Urdu, or another RTL language, <w:rtl/> is injected into each run's <w:rPr> and <w:bidi/> is added to paragraph properties on rebuild.

Supported RTL language prefixes: ar, he, fa, ur, yi, dv, ps, sd.
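
A language check along these lines might look like the sketch below. It assumes "prefix" means the primary subtag of a language code such as ar-SA or fa_IR; isRtlLanguage() and RTL_PREFIXES are hypothetical names, and the package may match prefixes differently.

```php
<?php
// Sketch only: decides whether a target language code is right-to-left.
// The constant and function names are hypothetical, not the package's API.

const RTL_PREFIXES = ['ar', 'he', 'fa', 'ur', 'yi', 'dv', 'ps', 'sd'];

function isRtlLanguage(string $languageCode): bool
{
    // Take the primary subtag: "ar-SA" and "ar_SA" both yield "ar".
    $primary = strtolower(preg_split('/[-_]/', $languageCode)[0]);
    return in_array($primary, RTL_PREFIXES, true);
}
```

Comparing the whole primary subtag rather than the first two characters avoids false positives for three-letter codes that merely begin with an RTL prefix, such as arn.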

Skeleton format

The skeleton is a temporary DOCX file written to the system temp directory at extract time:

['path' => '/tmp/cat-<uniqid>.skl']

The skeleton file is a copy of the original DOCX ZIP with paragraph content replaced by {{SEG:NNN}} tokens. Do not delete it between extract() and rebuild() calls. It is not automatically cleaned up.
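
To make the token mechanism concrete, here is a minimal sketch of substituting translations back into skeleton XML. It assumes NNN is a decimal segment index and that plain escaped text is inserted; fillSkeletonTokens() is a hypothetical helper, and the real rebuild step presumably regenerates runs and inline formatting rather than bare text.

```php
<?php
// Sketch only: replaces {{SEG:NNN}} tokens with XML-escaped translations.
// fillSkeletonTokens() is a hypothetical helper, not part of the package.

function fillSkeletonTokens(string $xml, array $translations): string
{
    return preg_replace_callback(
        '/\{\{SEG:(\d+)\}\}/',
        function (array $match) use ($translations): string {
            $index = (int) $match[1]; // "001" becomes 1
            if (!array_key_exists($index, $translations)) {
                return $match[0]; // no translation yet: keep the token
            }
            // Escape so characters like & and < stay valid inside XML.
            return htmlspecialchars($translations[$index], ENT_XML1 | ENT_QUOTES, 'UTF-8');
        },
        $xml
    );
}
```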

Limitations

  • Tables: cell text is extracted as individual paragraph segments; table structure is preserved in the skeleton.
  • Text boxes and shapes: content inside drawing anchors is not currently extracted.
  • Comments and revisions: tracked changes and comment text are not extracted.
  • Skeleton lifetime: the .skl temp file must survive between extract() and rebuild(). For long-lived workflows, persist $document->skeleton['path'] and ensure the file is not cleaned up by the OS.
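
For the skeleton lifetime point, one possible approach is to copy the skeleton out of the temp directory right after extraction. The sketch below assumes the skeleton is the array shown earlier (a 'path' key pointing at the .skl file); persistSkeleton() is a hypothetical helper, and whether $document->skeleton can be reassigned is an assumption to verify against the framework.

```php
<?php
// Sketch only: copies a skeleton file to a durable directory and returns
// the skeleton array with its 'path' updated. Hypothetical helper.

function persistSkeleton(array $skeleton, string $durableDir): array
{
    if (!is_dir($durableDir) && !mkdir($durableDir, 0700, true) && !is_dir($durableDir)) {
        throw new RuntimeException("Cannot create directory: $durableDir");
    }

    $destination = rtrim($durableDir, '/\\') . DIRECTORY_SEPARATOR . basename($skeleton['path']);
    if (!copy($skeleton['path'], $destination)) {
        throw new RuntimeException("Cannot copy skeleton to: $destination");
    }

    $skeleton['path'] = $destination;
    return $skeleton;
}
```

A caller would store the returned array alongside the job record and, once rebuild() has succeeded, remove the copy with unlink(), since the filter does not clean up skeleton files itself.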

GitHub information

  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Language: PHP

Other information

  • License: MIT
  • Last updated: 2026-05-09
