opencat/filter-html

Composer install command:

composer require opencat/filter-html

Package description

HTML file filter for the OpenCAT Framework

README

HTML file filter for the CAT Framework.

Installation

composer require catframework/filter-html

Requires ext-dom and ext-libxml.

Usage

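use CatFramework\FilterHtml\HtmlFilter;
// Segment must be imported as well; its namespace is not shown in this README.

$filter = new HtmlFilter();

// Extract translatable segments from page.html (here 'en' to 'fr')
$document = $filter->extract('page.html', 'en', 'fr');

foreach ($document->getSegmentPairs() as $pair) {
    // $translatedText is the translation of this pair's source, supplied by your own code
    $pair->target = new Segment('seg-t', [$translatedText]);
}

// Write the translated file
$filter->rebuild($document, 'page.fr.html');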

What gets extracted

The filter uses a block element taxonomy to decide segmentation boundaries:

Block elements (each becomes at most one segment): <p>, <div>, <h1> to <h6>, <li>, <td>, <th>, <dt>, <dd>, <blockquote>, <figcaption>, <caption>

  • A block element with only text / inline children → extracted as one segment.
  • A block element that itself contains other block elements → recursed into (not extracted as a whole).
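
The two rules above can be illustrated with a small stand-alone sketch built on ext-dom. It is not the filter's actual implementation: the names BLOCK_TAGS, hasBlockChild and collectSegments are hypothetical, only direct children are checked for block elements, and text that sits directly next to a nested block is ignored for brevity.

```php
<?php
// Illustrative sketch only; not the library's code.
const BLOCK_TAGS = ['p', 'div', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'li', 'td',
                    'th', 'dt', 'dd', 'blockquote', 'figcaption', 'caption'];

// True when $el has at least one direct child that is a block element.
function hasBlockChild(DOMElement $el): bool
{
    foreach ($el->childNodes as $child) {
        if ($child instanceof DOMElement
            && in_array(strtolower($child->nodeName), BLOCK_TAGS, true)) {
            return true;
        }
    }
    return false;
}

// Walk the tree: a block with only text/inline children becomes one segment,
// anything else is recursed into.
function collectSegments(DOMNode $node, array &$segments): void
{
    foreach ($node->childNodes as $child) {
        if (!$child instanceof DOMElement) {
            continue;
        }
        $isBlock = in_array(strtolower($child->nodeName), BLOCK_TAGS, true);
        if ($isBlock && !hasBlockChild($child)) {
            $text = trim($child->textContent);
            if ($text !== '') {            // whitespace-only blocks are skipped
                $segments[] = $text;
            }
            continue;
        }
        collectSegments($child, $segments);
    }
}

$html = '<html><body><div><p>Hello <b>world</b></p><p>   </p></div>'
      . '<ul><li>Item</li></ul></body></html>';
$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);

$segments = [];
$body = $doc->getElementsByTagName('body')->item(0);
if ($body !== null) {
    collectSegments($body, $segments);
}
print_r($segments);
```

In this sketch the outer <div> is recursed into because it contains <p> children, while each <p> and the <li> are candidates for a single segment.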

Inline elements inside a segment become InlineCode pairs so translators see placeholders rather than raw HTML tags: <b>, <strong>, <i>, <em>, <a>, <span>, <sub>, <sup>, <code>, <abbr>, <u>, <small>, <mark>

Void elements (<br>, <img>, <input>, etc.) become standalone InlineCode placeholders.
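
To make the placeholder idea concrete, here is a minimal sketch of how inline and void elements inside a block could be flattened into numbered placeholders. The {1}…{/1} and {3/} syntax, the VOID_TAGS list and the renderWithPlaceholders function are assumptions made for illustration; the README does not show how InlineCode objects are actually represented.

```php
<?php
// Illustrative sketch only; placeholder syntax is hypothetical.
const VOID_TAGS = ['br', 'img', 'input', 'hr', 'wbr'];

// Render the children of $node as text, replacing each element with a
// numbered placeholder: paired for inline elements, standalone for void ones.
function renderWithPlaceholders(DOMNode $node, int &$counter): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $out .= $child->nodeValue;
        } elseif ($child instanceof DOMElement) {
            $id = ++$counter;
            if (in_array(strtolower($child->nodeName), VOID_TAGS, true)) {
                $out .= '{' . $id . '/}';
            } else {
                $out .= '{' . $id . '}'
                      . renderWithPlaceholders($child, $counter)
                      . '{/' . $id . '}';
            }
        }
    }
    return $out;
}

$doc = new DOMDocument();
$doc->loadHTML(
    '<html><body><p>Click <a href="/x"><b>here</b></a><br>now</p></body></html>',
    LIBXML_NOERROR | LIBXML_NOWARNING
);
$p = $doc->getElementsByTagName('p')->item(0);

$counter = 0;
$placeholderText = renderWithPlaceholders($p, $counter);
echo $placeholderText, PHP_EOL; // Click {1}{2}here{/2}{/1}{3/}now
```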

Whitespace-only blocks are silently skipped.

Skeleton format

[
    'html'    => string,   // serialized DOMDocument with {{SEG:NNN}} tokens in place of block content
    'seg_map' => [         // segId => token string
        'seg-1' => '{{SEG:001}}',
        // …
    ],
]
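
As a rough illustration of how a skeleton of this shape can be turned back into HTML, the sketch below substitutes each token with its translated text. The fillSkeleton function and the sample data are hypothetical, and translations are treated as plain text and escaped; the filter's own rebuild() may work differently, in particular when segments contain inline codes.

```php
<?php
// Illustrative sketch only; sample data mirrors the skeleton shape above.
$skeleton = [
    'html'    => '<html><body><h1>{{SEG:001}}</h1><p>{{SEG:002}}</p></body></html>',
    'seg_map' => ['seg-1' => '{{SEG:001}}', 'seg-2' => '{{SEG:002}}'],
];
$translations = ['seg-1' => 'Bonjour', 'seg-2' => 'Le monde & vous'];

// Replace every token that has a translation; untranslated tokens stay in place.
function fillSkeleton(array $skeleton, array $translations): string
{
    $replacements = [];
    foreach ($skeleton['seg_map'] as $segId => $token) {
        if (isset($translations[$segId])) {
            $replacements[$token] = htmlspecialchars(
                $translations[$segId],
                ENT_QUOTES | ENT_HTML5,
                'UTF-8'
            );
        }
    }
    if ($replacements === []) {
        return $skeleton['html'];
    }
    return strtr($skeleton['html'], $replacements);
}

$output = fillSkeleton($skeleton, $translations);
echo $output, PHP_EOL;
```

Building one replacement map and calling strtr once avoids a token being rewritten twice if a translation happens to contain text that looks like another token.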

Limitations

  • Full HTML documents only: the filter expects a <body> element. Fragment-only strings (no wrapping body) will produce no segments.
  • Structural elements outside the body (<head>, <title>, <meta>) are not extracted.
  • Unknown non-block elements are treated as inline and wrapped as InlineCode pairs.
  • Invalid nesting (block element inside an inline context) is silently ignored.
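
Given the first limitation, one possible workaround for fragment input is to wrap it in a minimal document before handing it to the filter. The helper below is a sketch under that assumption; ensureFullDocument is a hypothetical name, not part of the package, and whether the wrapped file is then segmented as expected has not been verified against the filter.

```php
<?php
// Illustrative sketch only; ensureFullDocument is not part of the package.
function ensureFullDocument(string $html): string
{
    // Leave input alone if it already has a <body> element.
    if (preg_match('/<body[\s>]/i', $html) === 1) {
        return $html;
    }
    return "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\"></head><body>\n"
         . $html
         . "\n</body></html>";
}

$fragment = '<p>Hello <b>world</b></p>';
$wrapped  = ensureFullDocument($fragment);
echo $wrapped, PHP_EOL;

// The wrapped string would then be written to a file and that path
// passed to extract(), which takes a file path in the Usage example.
```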

Other information

  • Language: PHP
  • License: MIT
  • Last updated: 2026-05-09
