Text classification for PHP

Lightweight zero-dependency text classification library for PHP 8.1+.

This package provides an implementation of the Naive Bayes algorithm for text classification, sentiment analysis and spam filtering. It includes built-in support for N-Grams, stop-word filtering and Laplace smoothing.
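To make the Laplace smoothing mentioned above concrete, here is a minimal sketch of a smoothed word likelihood. It is an illustration only and not the package's internal code: the function name `smoothedLikelihood`, the `$alpha` parameter and the toy counts are all assumptions made for this example.

```php
<?php
// Illustrative only: a toy Laplace-smoothed likelihood, not the package's implementation.

/**
 * Estimate P(word | class) with additive (Laplace) smoothing.
 *
 * @param array<string,int> $wordCounts Word frequencies observed for one class.
 */
function smoothedLikelihood(string $word, array $wordCounts, int $vocabularySize, float $alpha = 1.0): float
{
    $count = $wordCounts[$word] ?? 0;
    $total = array_sum($wordCounts);

    // Adding $alpha to every count keeps unseen words from yielding a zero probability,
    // which would otherwise wipe out the whole product of likelihoods.
    return ($count + $alpha) / ($total + $alpha * $vocabularySize);
}

$spamCounts = ['free' => 3, 'prize' => 2, 'click' => 1]; // hypothetical counts for the "spam" class
$vocabularySize = 10;                                    // hypothetical number of distinct words overall

$seen = smoothedLikelihood('free', $spamCounts, $vocabularySize);      // (3 + 1) / (6 + 10) = 0.25
$unseen = smoothedLikelihood('meeting', $spamCounts, $vocabularySize); // (0 + 1) / (6 + 10) = 0.0625
```

Without smoothing, the unseen word "meeting" would have a likelihood of zero for the spam class regardless of every other word in the message.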

Installation

Install the package through Composer.

composer require albertvankiel/classification

Basic usage

The most common use for Naive Bayes is binary classification, such as spam detection or sentiment analysis.

Training the model

To train the model, you must provide an array of text samples and an array of corresponding labels:

use AlbertvanKiel\Classification\Classifiers\NaiveBayes;

$classifier = new NaiveBayes();

// 1. Prepare your training data
$samples = [
    "Win a FREE iPhone today! Click here to claim your prize.",
    "Cheap medication for sale, limited time offer!",
    "Hey John, are we still on for the marketing meeting at 10?",
    "Can you please send me the Q3 financial report by Friday?"
];
$labels = [
    "spam",
    "spam",
    "not_spam",
    "not_spam"
];

// 2. Train the classifier
$classifier->train($samples, $labels);
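Because samples and labels are passed as two parallel arrays, it can help to check that they line up before training. The helper below is a hypothetical sketch in plain PHP (`summarizeTrainingData` is not part of the package); it verifies that both arrays have the same length and reports how many samples each label has.

```php
<?php
// Hypothetical helper, not provided by the package.

/**
 * Validate parallel sample/label arrays and return the number of samples per label.
 *
 * @param string[] $samples
 * @param string[] $labels
 * @return array<string,int>
 */
function summarizeTrainingData(array $samples, array $labels): array
{
    if (count($samples) !== count($labels)) {
        throw new InvalidArgumentException('Each sample needs exactly one label.');
    }

    // array_count_values() maps each distinct label to its number of occurrences.
    return array_count_values($labels);
}

$samples = [
    "Win a FREE iPhone today! Click here to claim your prize.",
    "Cheap medication for sale, limited time offer!",
    "Hey John, are we still on for the marketing meeting at 10?",
    "Can you please send me the Q3 financial report by Friday?",
];
$labels = ["spam", "spam", "not_spam", "not_spam"];

$distribution = summarizeTrainingData($samples, $labels);
// $distribution is ['spam' => 2, 'not_spam' => 2]
```

A heavily skewed distribution is worth knowing about up front, since class frequencies in the training data influence which label the classifier favors.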

Making predictions

Once trained, you can use the classifier to predict the category of text:

// Predict the single most likely category
$prediction = $classifier->predict("Click here to get your free gift card!");
echo $prediction; // Outputs: 'spam'

// Get the probability of each category (values between 0 and 1)
$probabilities = $classifier->predictProbabilities("Are we meeting tomorrow?");
print_r($probabilities);
// Example output: ['not_spam' => 0.98, 'spam' => 0.02]
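The probabilities can also drive a more cautious decision than always taking the top category. The sketch below assumes the result has the label => probability shape shown above; the `decide` function, the 0.8 threshold and the 'uncertain' fallback are hypothetical choices for illustration, not part of the package.

```php
<?php
// Hypothetical helper: accept the top category only when it is confident enough.

/**
 * @param array<string,float> $probabilities Label => probability, as in the example output above.
 */
function decide(array $probabilities, float $threshold = 0.8, string $fallback = 'uncertain'): string
{
    if ($probabilities === []) {
        return $fallback;
    }

    // Sort by probability, highest first, while keeping the label keys.
    arsort($probabilities);
    $label = array_key_first($probabilities);

    return $probabilities[$label] >= $threshold ? (string) $label : $fallback;
}

$confident = decide(['not_spam' => 0.98, 'spam' => 0.02]);  // 'not_spam'
$borderline = decide(['not_spam' => 0.55, 'spam' => 0.45]); // 'uncertain'
```

A fallback label like this is one way to route borderline messages to manual review instead of filtering them automatically.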

Loading training datasets from JSON

To load a larger training dataset from a JSON file, you can use the Dataset factory. The file should contain an array of objects, each with a text key and a label key, for example:

dataset.json

[
  {
    "text": "Win a FREE iPhone today! Click here to claim your prize.",
    "label": "spam"
  },
  {
    "text": "Hey John, are we still on for the marketing meeting at 10?",
    "label": "not_spam"
  },
  {
    "text": "Cheap medication for sale, limited time offer!",
    "label": "spam"
  }
]

Loading the dataset in PHP:

use AlbertvanKiel\Classification\Data\Json;

// Load the data from the file
$dataset = Json::fromFile('/path/to/dataset.json');

// Extract the data and train the classifier
$classifier->train($dataset->getSamples(), $dataset->getLabels());
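To show how the JSON format maps onto the two arrays that train() expects, here is a rough plain-PHP equivalent. It is a sketch under assumptions, not how Json::fromFile is implemented; `splitDataset` is a hypothetical name and the validation rules are this example's own.

```php
<?php
// Hypothetical stand-in that mirrors the documented file format; not the package's loader.

/**
 * Split a JSON array of {"text": ..., "label": ...} objects into samples and labels.
 *
 * @return array{0: string[], 1: string[]}
 */
function splitDataset(string $json): array
{
    // JSON_THROW_ON_ERROR turns malformed JSON into a JsonException instead of a silent null.
    $rows = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($rows)) {
        throw new InvalidArgumentException('The dataset must be a JSON array of objects.');
    }

    $samples = [];
    $labels = [];
    foreach ($rows as $index => $row) {
        if (!is_array($row) || !isset($row['text'], $row['label'])) {
            throw new InvalidArgumentException("Row {$index} must contain \"text\" and \"label\" keys.");
        }
        $samples[] = (string) $row['text'];
        $labels[] = (string) $row['label'];
    }

    return [$samples, $labels];
}

$json = '[
  {"text": "Cheap medication for sale, limited time offer!", "label": "spam"},
  {"text": "Can you please send me the Q3 financial report by Friday?", "label": "not_spam"}
]';

[$samples, $labels] = splitDataset($json);
// $labels is ['spam', 'not_spam'], in the same order as $samples
```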

Saving and loading models

You can train the model once, save it to disk, and load it later:

// Save the trained model to a file
$classifier->save('/path/to/storage/spam_model.txt');

// Later, load it without training
$fastClassifier = new NaiveBayes();
$fastClassifier->load('/path/to/storage/spam_model.txt');
$result = $fastClassifier->predict($_POST['message']);

Customizing the tokenizer

By default, the built-in tokenizer filters out common English stop words (such as "the", "and", "is") and uses Unigrams (single words).

You can inject a custom tokenizer to support other languages, or use N-Grams to give the algorithm context about word order:

use AlbertvanKiel\Classification\Tokenizer\Tokenizer;
use AlbertvanKiel\Classification\Tokenizer\StopWords;

// Example 1: Use Spanish stop words
$spanishStopWords = ['el', 'la', 'los', 'las', 'un', 'una', 'y', 'o', 'pero'];
$spanishTokenizer = new Tokenizer($spanishStopWords);

// Example 2: Use Bigrams (pairs of words) for better context
// "not good" becomes "not_good" instead of ["not", "good"]
$bigramTokenizer = new Tokenizer(StopWords::english(), 2);

// Example 3: Disable stop word filtering entirely
$rawTokenizer = new Tokenizer([]);

// Inject the custom tokenizer into the classifier
$classifier = new NaiveBayes($bigramTokenizer);
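To illustrate what N-Gram tokenization produces, the following standalone sketch builds underscore-joined N-Grams after removing stop words. It is not the package's Tokenizer: the `ngrams` function, its splitting rule and its stop-word handling are assumptions made for this example, and it relies on PHP's mbstring extension.

```php
<?php
// Illustrative sketch only; the package's Tokenizer may split and filter text differently.

/**
 * Turn text into N-Grams joined by underscores, after dropping stop words.
 *
 * @param string[] $stopWords Lowercase words to remove before building N-Grams.
 * @return string[]
 */
function ngrams(string $text, int $n = 2, array $stopWords = []): array
{
    if ($n < 1) {
        throw new InvalidArgumentException('N must be at least 1.');
    }

    // Lowercase, then split on anything that is not a letter, digit or apostrophe.
    $words = preg_split("/[^\p{L}\p{N}']+/u", mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $words = array_values(array_diff($words, $stopWords));

    // Slide a window of $n words across the remaining tokens.
    $grams = [];
    for ($i = 0; $i + $n <= count($words); $i++) {
        $grams[] = implode('_', array_slice($words, $i, $n));
    }

    return $grams;
}

$bigrams = ngrams('This movie is not good', 2, ['this', 'is']);
// $bigrams is ['movie_not', 'not_good']

$unigrams = ngrams('This movie is not good', 1, ['this', 'is']);
// $unigrams is ['movie', 'not', 'good']
```

With unigrams, "not" and "good" are counted independently, so the negation is lost; the bigram "not_good" keeps it as a single feature.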

License

The MIT License (MIT). Please see License File for more information.

