masakielastic/striter

Latest stable version: 0.1.0

Install command (PIE):

pie install masakielastic/striter

Package description

PHP extension for string iteration with grapheme, codepoint, and byte modes

README

A PHP extension for iterating over UTF-8 strings by grapheme cluster, Unicode codepoint, or byte.

Features

  • Grapheme Cluster Iteration: Iterate over grapheme clusters (user-perceived characters) using PCRE2
  • Unicode Codepoint Iteration: Iterate over individual Unicode codepoints
  • Byte-level Iteration: Iterate over individual bytes for low-level string processing
  • UTF-8 Safe: Multibyte UTF-8 characters are kept whole in grapheme and codepoint modes
  • Standard PHP Interfaces: Implements IteratorAggregate and Countable, so iterators work directly with foreach and count()

Installation

Requirements

  • PHP 8.1 or higher
  • PCRE2 library (libpcre2-dev)

Using PIE (Recommended)

PIE (PHP Installer for Extensions) is the recommended way to install this extension.

# Install PIE if you haven't already
composer global require php/pie

# Install the extension
pie install masakielastic/striter

PIE automatically handles building and enabling the extension.

Build from Source

# Install dependencies (Ubuntu/Debian)
sudo apt-get install libpcre2-dev

# Build extension
cd ext
phpize
./configure --enable-striter
make
sudo make install

Enable Extension

If you built from source, add this line to your php.ini (PIE enables the extension for you):

extension=striter.so

Usage

Basic Usage

<?php
// Create a string iterator
$iterator = str_iter("Hello World");

// Iterate using foreach
foreach ($iterator as $index => $char) {
    echo "[$index] => '$char'\n";
}

Iteration Modes

Grapheme Mode (Default)

Iterates over grapheme clusters (user-perceived characters):

<?php
$text = "Hello🌍";
$iterator = str_iter($text, "grapheme");

foreach ($iterator as $index => $char) {
    echo "[$index] => '$char'\n";
}
// Output:
// [0] => 'H'
// [1] => 'e'
// [2] => 'l'
// [3] => 'l'
// [4] => 'o'
// [5] => '🌍'

Codepoint Mode

Iterates over individual Unicode codepoints:

<?php
$text = "Hello🌍";
$iterator = str_iter($text, "codepoint");

foreach ($iterator as $index => $char) {
    echo "[$index] => '$char'\n";
}

Byte Mode

Iterates over individual bytes:

<?php
$text = "Hello";
$iterator = str_iter($text, "byte");

foreach ($iterator as $index => $byte) {
    echo "[$index] => '" . ord($byte) . "'\n";
}

Using Countable Interface

<?php
$text = "Hello🌍";
$iterator = str_iter($text, "grapheme");

echo "Total characters: " . count($iterator) . "\n"; // Output: Total characters: 6

Using IteratorAggregate Interface

<?php
$text = "ABC";
$iterator = str_iter($text);

// Get inner iterator for advanced operations
$innerIterator = $iterator->getIterator();
foreach ($innerIterator as $key => $value) {
    echo "[$key] => '$value'\n";
}

API Reference

Functions

str_iter(string $str, string $mode = "grapheme")

Creates a new string iterator.

Parameters:

  • $str (string): The string to iterate over
  • $mode (string, optional): Iteration mode - "grapheme", "codepoint", or "byte"

Returns: _StrIterIterator object

Iterator Methods

The returned iterator implements PHP's IteratorAggregate and Countable interfaces:

IteratorAggregate Methods:

  • getIterator(): Returns the iterator itself for nested iteration

Countable Methods:

  • count(): Returns the total number of elements in the iterator
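
To make the interface contract above concrete, here is a minimal userland sketch of an object that behaves the way the API reference describes. The class name StrIterSketch is hypothetical and exists only for illustration; the real extension is written in C, and its internals and error handling may differ. The sketch relies only on PHP's built-in PCRE functions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland stand-in for the object returned by str_iter().
 * Illustration only: this is not the extension's implementation.
 */
final class StrIterSketch implements IteratorAggregate, Countable
{
    /** @var list<string> Units produced by the selected mode. */
    private array $units;

    public function __construct(string $str, string $mode = 'grapheme')
    {
        $this->units = match ($mode) {
            // \X matches one extended grapheme cluster in UTF-8 mode.
            'grapheme'  => self::splitUtf8('/\X/u', $str, true),
            // An empty pattern in UTF-8 mode splits between codepoints.
            'codepoint' => self::splitUtf8('//u', $str, false),
            // str_split('') returns [''] before PHP 8.2, so guard it.
            'byte'      => $str === '' ? [] : str_split($str),
            default     => throw new ValueError("Unknown mode: $mode"),
        };
    }

    /** @return list<string> */
    private static function splitUtf8(string $pattern, string $str, bool $match): array
    {
        if ($match) {
            $ok = preg_match_all($pattern, $str, $m);
            $units = $ok === false ? false : $m[0];
        } else {
            $units = preg_split($pattern, $str, -1, PREG_SPLIT_NO_EMPTY);
        }
        // Both PCRE calls return false when $str is not valid UTF-8.
        // Assumption: the sketch rejects such input instead of falling back.
        if ($units === false) {
            throw new ValueError('Input is not valid UTF-8');
        }
        return $units;
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->units);
    }

    public function count(): int
    {
        return count($this->units);
    }
}

// "e" + COMBINING ACUTE ACCENT + "!" is 2 graphemes, 3 codepoints, 4 bytes.
$text = "e\u{0301}!";
foreach (['grapheme', 'codepoint', 'byte'] as $mode) {
    printf("%-9s %d\n", $mode, count(new StrIterSketch($text, $mode)));
}
```

Because the object is Traversable, helpers such as iterator_to_array() accept it as well as foreach.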

Examples

Working with Emoji and Complex Characters

<?php
// A ZWJ family sequence followed by a waving hand with a skin tone modifier
$text = "👨‍👩‍👧‍👦👋🏽";
$iterator = str_iter($text, "grapheme");

foreach ($iterator as $index => $char) {
    echo "Grapheme $index: '$char'\n";
}

Processing Japanese Text

<?php
$text = "こんにちは世界";
$iterator = str_iter($text, "grapheme");

foreach ($iterator as $index => $char) {
    echo "Character $index: '$char'\n";
}

Binary Data Processing

<?php
$data = "\x48\x65\x6C\x6C\x6F"; // "Hello" in hex
$iterator = str_iter($data, "byte");

foreach ($iterator as $index => $byte) {
    echo "Byte $index: 0x" . dechex(ord($byte)) . "\n";
}

Technical Details

Grapheme Cluster Detection

The extension uses PCRE2's \X pattern to detect grapheme clusters, which properly handles:

  • Base characters with combining marks
  • Emoji sequences
  • Regional indicator sequences
  • Hangul syllable sequences
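
The behavior of \X can be tried from plain PHP, whose preg functions are also backed by PCRE2. The sketch below does not use the extension; it only shows that several codepoints can form a single cluster for three of the cases listed above. Emoji ZWJ sequences are left out on purpose, because their treatment can depend on the PCRE2 and Unicode version PHP was built against.

```php
<?php
// Each sample consists of two codepoints that \X reports as one cluster.
$samples = [
    'combining mark'     => "e\u{0301}",          // e + COMBINING ACUTE ACCENT
    'regional indicator' => "\u{1F1EF}\u{1F1F5}", // regional indicators J + P
    'Hangul jamo'        => "\u{1100}\u{1161}",   // leading consonant + vowel
];

$clusterCounts = [];
foreach ($samples as $label => $text) {
    // preg_match_all() returns the number of matches, i.e. clusters.
    $clusterCounts[$label] = preg_match_all('/\X/u', $text);
    // Splitting on an empty UTF-8 pattern yields one element per codepoint.
    $codepoints = count(preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY));

    printf(
        "%-20s clusters=%d codepoints=%d bytes=%d\n",
        $label,
        $clusterCounts[$label],
        $codepoints,
        strlen($text)
    );
}
```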

UTF-8 Validation

The extension validates its UTF-8 input. Invalid sequences do not abort iteration; they are treated as individual bytes.
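
One way to read this fallback is sketched below in plain PHP: well-formed UTF-8 characters are kept whole, and any byte that does not belong to a well-formed sequence becomes its own element. The helper name split_utf8_lenient is hypothetical, and the exact rules the extension applies (for example to truncated sequences) are an assumption here, not something the README specifies.

```php
<?php
declare(strict_types=1);

/**
 * Split $str into well-formed UTF-8 characters, emitting every byte that is
 * not part of a well-formed sequence as a separate element.
 * Hypothetical helper for illustration; not the extension's C code.
 *
 * @return list<string>
 */
function split_utf8_lenient(string $str): array
{
    // Byte-oriented pattern (no /u flag): the well-formed UTF-8 byte
    // sequences, followed by a single-byte fallback for everything else.
    $pattern = '~
          [\x00-\x7F]                        # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # 2-byte sequences
        | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, no overlong forms
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3-byte, general case
        | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, no surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, no overlong forms
        | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte, general case
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, up to U+10FFFF
        | [\x80-\xFF]                        # fallback: one stray byte
    ~x';

    preg_match_all($pattern, $str, $matches);

    return $matches[0];
}

// "A", a stray 0xFF byte, then "é" encoded as C3 A9.
$units = split_utf8_lenient("A\xFF\xC3\xA9");
echo count($units), "\n"; // 3
```

Every byte matches some alternative, so the pieces always concatenate back to the original string.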

Memory Management

The extension releases the string copies and PCRE2 objects it allocates, so iterators do not leak memory.

Testing

Run the included test files:

php tests/test_basic.php
php tests/test_grapheme.php
php tests/test_byte_mode.php
php tests/test_emoji_bug.php
php tests/test_invalid_utf8.php

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

This project is open source. Please refer to the project's license file for details.

Changelog

Version 0.1.0

  • Initial release
  • Support for grapheme, codepoint, and byte iteration modes
  • PCRE2 integration for proper grapheme cluster detection
  • Full Iterator interface implementation

Statistics

  • Total downloads: 1
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 0
  • Views: 6
  • Dependents: 0
  • Suggesters: 0

GitHub

  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Language: C

Other information

  • License: Unknown
  • Last updated: 2026-02-24
