opencat/filter-plaintext
Composer install command:
composer require opencat/filter-plaintext
Package description
Plain text (.txt) file filter for the OpenCAT Framework
README
Plain text (.txt) file filter for the CAT Framework.
Installation
composer require catframework/filter-plaintext
Usage
use CatFramework\FilterPlaintext\PlainTextFilter;

$filter = new PlainTextFilter();

// Extract translatable segments
$document = $filter->extract('article.txt', 'en', 'fr');

foreach ($document->getSegmentPairs() as $pair) {
    // $translatedText stands for your translation of this pair's source text.
    // Segment must also be imported; its namespace is not shown in this README.
    $pair->target = new Segment('seg-t', [$translatedText]);
}

// Write the translated file
$filter->rebuild($document, 'article.fr.txt');
How segments are split
The filter splits on two or more consecutive newlines (blank-line paragraph breaks). Each non-whitespace block becomes one segment. Single newlines within a block are preserved as-is and are part of the segment text.
First paragraph.     → segment 1
                     → (separator, not a segment)
Second paragraph.    → segment 2
                     → (separator, not a segment)
Third paragraph.     → segment 3
Whitespace-only blocks (e.g. multiple blank lines between paragraphs) are passed through unchanged and do not become segments.
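The splitting rule described above can be approximated with a single regular expression. The following is only a sketch, not the filter's actual implementation: splitParagraphs is a hypothetical helper, and it assumes line endings are plain "\n" (files using "\r\n" would need normalizing first).

```php
<?php
// Hypothetical helper for illustration; not part of the package's API.
function splitParagraphs(string $text): array
{
    // The capturing group keeps each run of 2+ newlines as its own part,
    // so separators can later be written back unchanged.
    $parts = preg_split('/(\n{2,})/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Only parts containing non-whitespace characters count as segments.
    $segments = array_values(array_filter(
        $parts,
        static fn (string $part): bool => trim($part) !== ''
    ));

    return ['parts' => $parts, 'segments' => $segments];
}

$result = splitParagraphs("First paragraph.\n\nLine one\nline two.\n\n\nThird paragraph.");
```

Here the single newline inside the second block stays part of that segment's text, while the two separator runs are kept as parts but never become segments.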
Encoding
The input encoding is auto-detected from UTF-8, ISO-8859-1, and Windows-1252. All output is written as UTF-8. If detection fails, the file is treated as UTF-8.
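A detect-then-convert step along these lines could look like the sketch below. It assumes the mbstring extension is loaded; toUtf8 is a hypothetical name, and the candidate order and strict flag are assumptions rather than the filter's confirmed settings.

```php
<?php
// Hypothetical helper for illustration; not part of the package's API.
function toUtf8(string $bytes): string
{
    // Strict mode rejects candidates for which the bytes are not valid.
    $encoding = mb_detect_encoding($bytes, ['UTF-8', 'ISO-8859-1', 'Windows-1252'], true);

    if ($encoding === false || $encoding === 'UTF-8') {
        // Detection failed or the input is already UTF-8: pass it through.
        return $bytes;
    }

    return mb_convert_encoding($bytes, 'UTF-8', $encoding);
}

// "\xE9" is "é" in both ISO-8859-1 and Windows-1252, and is not valid UTF-8 on its own.
$converted = toUtf8("caf\xE9");
```

Because ISO-8859-1 accepts any byte sequence, a fallback like this rarely reports failure; files in an unlisted encoding such as Shift-JIS would be misread rather than rejected.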
Skeleton format
[
'parts' => string[], // file split by paragraph boundaries, separators included
'seg_map' => [int => string], // parts array index => segId
]
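To illustrate how a skeleton of this shape supports rebuilding, here is a minimal sketch. buildSkeleton and rebuildText are hypothetical functions, the "seg-N" id format is an assumption, and the real filter may construct and consume its skeleton differently.

```php
<?php
// Hypothetical helpers for illustration; not part of the package's API.
function buildSkeleton(string $text): array
{
    // Split on blank-line breaks, keeping the separators as parts.
    $parts = preg_split('/(\n{2,})/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $segMap = [];
    $counter = 0;
    foreach ($parts as $index => $part) {
        if (trim($part) !== '') {
            // Assumed id scheme: seg-1, seg-2, ...
            $segMap[$index] = 'seg-' . (++$counter);
        }
    }

    return ['parts' => $parts, 'seg_map' => $segMap];
}

function rebuildText(array $skeleton, array $translations): string
{
    $output = '';
    foreach ($skeleton['parts'] as $index => $part) {
        $segId = $skeleton['seg_map'][$index] ?? null;
        // Mapped parts are replaced by their translation when one exists;
        // separators and untranslated parts are copied through unchanged.
        $output .= ($segId !== null && isset($translations[$segId]))
            ? $translations[$segId]
            : $part;
    }

    return $output;
}

$skeleton = buildSkeleton("Hello.\n\nWorld.");
$rebuilt = rebuildText($skeleton, ['seg-1' => 'Bonjour.', 'seg-2' => 'Monde.']);
```

Keeping separators inside parts means the rebuilt file preserves the original paragraph spacing exactly, with only the mapped indexes swapped out.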
Limitations
- No inline markup support: the entire segment is plain text; no InlineCode elements are produced.
- No sentence-level segmentation: each paragraph is one segment regardless of length. Use catframework/segmentation for sentence splitting.
- Encoding detection relies on mb_detect_encoding; unusual encodings (e.g. Shift-JIS) are not supported.
Other information
- License: MIT
- Last updated: 2026-05-09