opencat/filter-xml

Composer install command:

composer require opencat/filter-xml

Package description

Generic XML file filter for the OpenCAT Framework

README

Generic XML file filter for the CAT Framework.

Works with any well-formed XML file: Android string resources, app config files, custom XML formats, etc.

Installation

composer require catframework/filter-xml

Usage

use CatFramework\FilterXml\XmlFilter;

$filter = new XmlFilter();

// Extract translatable segments from an XML file
$document = $filter->extract('strings.xml', 'en', 'fr');

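foreach ($document->getSegmentPairs() as $pair) {
    echo $pair->source->getPlainText() . PHP_EOL;
    // … send to MT, TM lookup, or human translator to obtain $translatedText …
    // Note: Segment is not imported above; add the `use` statement for the framework's Segment class.
    $pair->target = new Segment('seg-t', [$translatedText]);
}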

// Write the translated XML file
$filter->rebuild($document, 'strings.fr.xml');

Extraction heuristic

The filter uses a structural heuristic to decide what to extract:

  • Translatable element — has at least one non-whitespace direct text node. Its full content (text + any child elements) is extracted as one segment.
  • Container element — has only element children. Recursed into; not extracted itself.

Child elements inside a translatable segment are represented as InlineCode pairs so translators see placeholders (<b>, </b>) rather than raw markup.

Example — given:

<resources>
    <string name="greeting">Hello <b>world</b></string>
    <container>
        <item>First item</item>
    </container>
</resources>

Two segments are extracted: Hello {<b>}world{</b>} and First item.
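
The heuristic can be approximated with PHP's DOM extension alone. The sketch below is not the filter's actual implementation: the function names (hasDirectText, renderWithPlaceholders, collectSegments) are hypothetical, and the {<tag>} placeholder rendering stands in for the real InlineCode objects. It only illustrates the translatable-versus-container decision on the sample document.

```php
<?php
declare(strict_types=1);

/** True when the element has at least one non-whitespace direct text node. */
function hasDirectText(DOMElement $el): bool
{
    foreach ($el->childNodes as $child) {
        if ($child instanceof DOMText && trim((string) $child->nodeValue) !== '') {
            return true;
        }
    }
    return false;
}

/** Render an element's content, showing child tags as {<tag>} placeholders. */
function renderWithPlaceholders(DOMElement $el): string
{
    $out = '';
    foreach ($el->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $out .= '{<' . $child->nodeName . '>}'
                . renderWithPlaceholders($child)
                . '{</' . $child->nodeName . '>}';
        } elseif ($child instanceof DOMText) {
            $out .= (string) $child->nodeValue;
        }
    }
    return $out;
}

/**
 * Extract translatable elements as one segment each; recurse into containers.
 *
 * @return string[]
 */
function collectSegments(DOMElement $el): array
{
    if (hasDirectText($el)) {
        return [renderWithPlaceholders($el)];
    }
    $segments = [];
    foreach ($el->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $segments = array_merge($segments, collectSegments($child));
        }
    }
    return $segments;
}

$xml = <<<XML
<resources>
    <string name="greeting">Hello <b>world</b></string>
    <container>
        <item>First item</item>
    </container>
</resources>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$segments = collectSegments($dom->documentElement);
print_r($segments);
```

Here <resources> and <container> hold only whitespace text plus child elements, so they are recursed into, while <string> and <item> each become one segment.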

Skeleton format

The skeleton stored in BilingualDocument::$skeleton is:

[
    'xml'     => string,    // full DOMDocument::saveXML() output with tokens in place of segment text
    'seg_map' => [          // segId => token string
        'seg-1' => '{{SEG:001}}',
        'seg-2' => '{{SEG:002}}',
        // …
    ],
]

Tokens are valid XML character data, so the skeleton is always parseable XML.
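
To make the skeleton idea concrete, here is a minimal sketch of token substitution. It assumes a hand-written skeleton in the shape shown above and plain-text translations. The $translations array and the escaping step are illustrative assumptions, not the internals of rebuild().

```php
<?php
declare(strict_types=1);

// Hypothetical skeleton, shaped like the array described above.
$skeleton = [
    'xml' => '<?xml version="1.0"?>' . "\n"
        . '<resources><string name="greeting">{{SEG:001}}</string>'
        . '<item>{{SEG:002}}</item></resources>' . "\n",
    'seg_map' => [
        'seg-1' => '{{SEG:001}}',
        'seg-2' => '{{SEG:002}}',
    ],
];

// The tokens are plain character data, so the skeleton itself parses.
$check = new DOMDocument();
$skeletonIsXml = $check->loadXML($skeleton['xml']);

// Hypothetical plain-text translations keyed by segment id.
$translations = [
    'seg-1' => 'Bonjour & bienvenue',
    'seg-2' => 'Premier élément',
];

// Map each token to its XML-escaped translation, then substitute in one pass.
$replacements = [];
foreach ($skeleton['seg_map'] as $segId => $token) {
    $replacements[$token] = htmlspecialchars(
        $translations[$segId],
        ENT_XML1 | ENT_NOQUOTES,
        'UTF-8'
    );
}
$rebuilt = strtr($skeleton['xml'], $replacements);

// The result should parse again and carry the translated text.
$result = new DOMDocument();
$rebuiltIsXml = $result->loadXML($rebuilt);
$firstText = $result->getElementsByTagName('string')->item(0)->textContent;

echo $rebuilt;
```

Escaping matters in this sketch because the replacement is done on the serialized string: an unescaped & or < in a translation would make the rebuilt file ill-formed.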

Limitations

  • Generic heuristic: the filter has no knowledge of application-specific schemas. Elements that should not be translated (e.g. <version>, <id>) will be extracted if they contain text. For schema-aware extraction, subclass XmlFilter and override walkElement().
  • Whitespace-only nodes: text nodes containing only whitespace (indentation, newlines) are silently skipped.
  • CDATA sections: treated as text content by the DOM; extracted and re-encoded as regular text on rebuild.
  • XML namespace prefixes are preserved in InlineCode data as-is.
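
The CDATA limitation follows from how PHP's DOM extension handles such sections, and can be reproduced without the filter. The sketch below uses only standard DOM calls. The step that swaps in a text node is an assumption about what a rebuild might do, included to show why the CDATA wrapper does not survive.

```php
<?php
declare(strict_types=1);

$dom = new DOMDocument();
$dom->loadXML('<note><![CDATA[Use <b> & enjoy]]></note>');
$note = $dom->documentElement;

// The DOM exposes CDATA content as ordinary text.
$text = $note->textContent;

// Replace the CDATA section with a regular text node carrying the same text.
while ($note->firstChild !== null) {
    $note->removeChild($note->firstChild);
}
$note->appendChild($dom->createTextNode($text));

// Serialization now entity-escapes the markup characters instead of using CDATA.
$serialized = $dom->saveXML($note);

// Parsing the output again yields the same text content.
$roundTrip = new DOMDocument();
$roundTrip->loadXML($serialized);
$roundTripText = $roundTrip->documentElement->textContent;

echo $serialized . PHP_EOL;
```

The text content is the same after the round trip. Only the serialized form changes, which matters for consumers that compare files byte for byte.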

GitHub information

  • Language: PHP

Other information

  • License: MIT
  • Last updated: 2026-05-09
