iserter/php-ai-text-sanitizer
Composer install command:
composer require iserter/php-ai-text-sanitizer
Package description
A dependency-free PHP library that detects and removes invisible watermarks and normalizes tell-tale typography from AI-generated text.
README
A small, dependency-free PHP library that detects and removes invisible watermarks — and other tell-tale characters — that AI providers leave in generated text.
Large language models routinely emit characters that render as nothing but
survive copy/paste: zero-width spaces and joiners, Unicode tag characters
(U+E0000–E007F) and variation selectors used for steganographic
watermarking, bidirectional controls, and unusual spaces such as the narrow
no-break space (U+202F). This library strips them, and can optionally
normalize the visible typography (smart quotes, dashes, ellipses) that also
signals machine authorship.
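As a rough illustration of the stripping step described above, the following is a minimal sketch that uses only PCRE with the `/u` modifier. It is not the library's implementation: the function name `strip_invisible_sketch` is hypothetical, and the character classes cover only a subset of the code points listed above.

```php
<?php
/**
 * Hypothetical helper (not part of the library): removes a subset of the
 * invisible code points named in the paragraph above.
 */
function strip_invisible_sketch(string $text): string
{
    $pattern = '/['
        . '\x{200B}-\x{200D}\x{2060}\x{FEFF}'                  // zero-width space/joiners, word joiner, BOM
        . '\x{E0000}-\x{E007F}'                                // Unicode Tags block
        . '\x{FE00}-\x{FE0F}\x{E0100}-\x{E01EF}'               // variation selectors
        . '\x{200E}\x{200F}\x{202A}-\x{202E}\x{2066}-\x{2069}' // bidirectional controls
        . ']/u';

    // preg_replace() returns null on invalid UTF-8; fall back to the input.
    $stripped = preg_replace($pattern, '', $text) ?? $text;

    // Fold the narrow no-break space (U+202F) to an ordinary space.
    return str_replace("\u{202F}", ' ', $stripped);
}

echo strip_invisible_sketch("Hel\u{200B}lo\u{202F}world"), "\n";
```

Note that this sketch removes every zero-width joiner and variation selector, so it would break emoji sequences that depend on them; the `keep_emoji` option listed under Options exists to avoid exactly that.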
- Zero dependencies. Pure PHP 8.1+, PCRE with the `/u` modifier, and `mbstring`.
- Framework-agnostic. Drop it into any PSR-4 project and adjust the namespace.
- Detect and clean. Use it as a filter, or as a "was this AI-watermarked?" check.
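The runtime requirements in the first bullet can be checked before installing. The sketch below is an assumption about how one might do that with standard PHP functions; `missing_requirements_sketch` is a hypothetical name and is not shipped with the library.

```php
<?php
/**
 * Hypothetical pre-flight check: returns the requirements that are not met.
 * An empty array means the environment looks suitable.
 */
function missing_requirements_sketch(): array
{
    $missing = [];

    if (PHP_VERSION_ID < 80100) {
        $missing[] = 'PHP 8.1 or newer';
    }
    if (!extension_loaded('mbstring')) {
        $missing[] = 'ext-mbstring';
    }
    // When PCRE lacks Unicode support, compiling a /u pattern fails and
    // preg_match() returns false (the warning is suppressed here).
    if (@preg_match('/\x{200B}/u', '') === false) {
        $missing[] = 'PCRE with Unicode (/u) support';
    }

    return $missing;
}

foreach (missing_requirements_sketch() as $item) {
    echo "Missing: {$item}\n";
}
```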
Install
composer require iserter/php-ai-text-sanitizer
Usage
One-liner
use iSerter\AiTextSanitizer\AITextSanitizer;

$clean = AITextSanitizer::clean_text($aiOutput);
Configured instance
$sanitizer = new AITextSanitizer([
    'normalize_smart_quotes' => true,
    'normalize_dashes' => true,
]);

$clean = $sanitizer->sanitize($aiOutput);
Clean with a report of what changed
$result = $sanitizer->clean($aiOutput);

if ($result->wasModified()) {
    error_log($result->getReport()->summary());
}

echo $result->getText();
Detect only (don't modify)
$report = $sanitizer->detect($aiOutput);

if ($report->hasWatermarks()) {
    foreach ($report->getFindings() as $f) {
        printf("%s %s [%s] x%d\n", $f->notation(), $f->name, $f->category, $f->count);
    }
}
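To make the idea of a detect-only pass concrete, here is a small standalone sketch that counts invisible code points without modifying the text. It does not reproduce the library's report objects: `count_invisible_sketch` is a hypothetical function, the pattern covers only zero-width characters and the Tags block, and it assumes `mbstring` is available for `mb_ord()`.

```php
<?php
/**
 * Hypothetical detector: maps "U+XXXX" notation to the number of times that
 * invisible code point occurs in $text. The input is never modified.
 */
function count_invisible_sketch(string $text): array
{
    $pattern = '/[\x{200B}-\x{200D}\x{2060}\x{FEFF}\x{E0000}-\x{E007F}]/u';

    // preg_match_all() returns 0 when nothing matches and false on invalid UTF-8.
    if (!preg_match_all($pattern, $text, $matches)) {
        return [];
    }

    $counts = [];
    foreach ($matches[0] as $char) {
        $key = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }

    return $counts;
}

foreach (count_invisible_sketch("a\u{200B}b\u{200B}c\u{FEFF}") as $notation => $count) {
    printf("%s x%d\n", $notation, $count);
}
```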
Options
| Option | Default | Effect |
|---|---|---|
| `remove_zero_width` | `true` | Remove ZWSP, ZWNJ, ZWJ, word joiner, BOM (U+200B–200D, U+2060, U+FEFF). |
| `remove_bidi` | `true` | Remove bidirectional marks/controls (U+200E/200F, U+202A–202E, U+2066–2069, U+061C). |
| `remove_invisible_math` | `true` | Remove invisible math operators (U+2061–2064). |
| `remove_variation_selectors` | `true` | Remove variation selectors (U+FE00–FE0F, U+E0100–E01EF). |
| `remove_tag_chars` | `true` | Remove the Unicode Tags block (U+E0000–E007F). |
| `remove_format_chars` | `true` | Remove soft hyphen, CGJ, Hangul fillers, interlinear annotations, etc. |
| `remove_braille_blank` | `true` | Remove U+2800 BRAILLE PATTERN BLANK. |
| `strip_control` | `true` | Remove stray C0/C1 control characters (keeps TAB, LF, CR). |
| `keep_emoji` | `true` | Preserve ZWJ and variation selectors when they are part of a valid emoji sequence. |
| `remove_citations` | `false` | Remove AI-generated citations like `(oaicite:5){index=5}` or `【13†source】`. |
| `normalize_homoglyphs` | `false` | Apply NFKC normalization to neutralize visually identical homoglyphs (requires `ext-intl`). |
| `normalize_spaces` | `true` | Fold unusual spaces (NBSP, thin space, ideographic space, …) to U+0020. |
| `normalize_line_separators` | `true` | U+2028 → `\n`, U+2029 → `\n\n`. |
| `normalize_smart_quotes` | `false` | Curly/angle quotes → straight `'` and `"`. |
| `normalize_dashes` | `false` | Dashes and minus (U+2010–2015, U+2212) → `-`. |
| `normalize_ellipsis` | `false` | U+2026 → `...`. |
| `collapse_whitespace` | `false` | Collapse runs of horizontal whitespace, trim line-end spaces, cap blank lines. |
| `trim` | `false` | Trim leading/trailing whitespace of the whole string. |
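The typography and line-separator mappings in the table can be pictured with a short standalone sketch. This is an illustration built on `strtr()` and `preg_replace()`, not the library's code: `normalize_typography_sketch` is a hypothetical name, and it handles only curly quotes (no angle quotes), the dash range, the ellipsis, and the two line separators.

```php
<?php
/**
 * Hypothetical normalizer mirroring a few rows of the options table:
 * smart quotes, dashes, ellipsis, and Unicode line/paragraph separators.
 */
function normalize_typography_sketch(string $text): string
{
    $text = strtr($text, [
        "\u{2018}" => "'",    // left single quotation mark
        "\u{2019}" => "'",    // right single quotation mark
        "\u{201C}" => '"',    // left double quotation mark
        "\u{201D}" => '"',    // right double quotation mark
        "\u{2026}" => '...',  // horizontal ellipsis
        "\u{2028}" => "\n",   // line separator
        "\u{2029}" => "\n\n", // paragraph separator
    ]);

    // Dashes U+2010 to U+2015 and the minus sign U+2212 become hyphen-minus.
    // preg_replace() returns null on invalid UTF-8, so fall back to $text.
    return preg_replace('/[\x{2010}-\x{2015}\x{2212}]/u', '-', $text) ?? $text;
}

echo normalize_typography_sketch("\u{201C}Hi\u{201D} \u{2014} ok\u{2026}"), "\n";
```

In the library these rewrites are opt-in (`normalize_smart_quotes`, `normalize_dashes`, and `normalize_ellipsis` default to `false`), presumably because they change visible text rather than only removing invisible characters.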
Docker
docker build -t php-ai-text-sanitizer .
docker run --rm php-ai-text-sanitizer
Demo
php src/examples/demo.php
Statistics
- Total downloads: 0
- Monthly downloads: 0
- Daily downloads: 0
- Favorites: 0
- Clicks: 3
- Dependents: 0
- Suggesters: 0
Other information
- License: MIT
- Last updated: 2026-07-04