byjg/text-classifier

Latest stable version: 6.0.0

Composer install command:

composer require byjg/text-classifier

Package description

A PHP text classifier supporting binary spam filtering (Robinson-Fisher Bayesian) and multi-class Naive Bayes classification, with optional LLM-assisted active learning fallback.

README

text-classifier — Bayesian Text Classifier

A PHP library for statistical text classification. Provides two independent engines:

  • BinaryClassifier — Binary Robinson-Fisher Bayesian filter. Classifies text as spam or ham. Designed for high-accuracy two-class filtering with word degeneration support.
  • NaiveBayes — Multi-class Naive Bayes classifier. Classifies text into any number of user-defined categories. Suitable for language detection, topic tagging, content routing, and similar tasks.
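As a rough illustration of what a Robinson-Fisher filter computes, the sketch below smooths a per-token spam probability (Robinson) and combines several token probabilities with Fisher's chi-square method. It is a generic, self-contained sketch and not the library's implementation: the function names `robinsonF`, `chi2Q`, and `fisherCombine`, and the default values for `$s` and `$x`, are assumptions chosen for this example.

```php
<?php
declare(strict_types=1);

/**
 * Robinson's smoothing: pull a raw token probability $p towards a neutral
 * prior $x when the token has been seen only $n times. $s is the prior's weight.
 */
function robinsonF(float $p, int $n, float $s = 1.0, float $x = 0.5): float
{
    return ($s * $x + $n * $p) / ($s + $n);
}

/** Chi-square survival function for an even number of degrees of freedom. */
function chi2Q(float $x2, int $df): float
{
    $m = $x2 / 2.0;
    $term = exp(-$m);
    $sum = $term;
    for ($i = 1; $i < intdiv($df, 2); $i++) {
        $term *= $m / $i;
        $sum += $term;
    }
    return min($sum, 1.0);
}

/**
 * Combine token probabilities (each strictly between 0 and 1) into one score:
 * values near 1 suggest spam, values near 0 suggest ham, 0.5 is undecided.
 */
function fisherCombine(array $probs): float
{
    $n = count($probs);
    if ($n === 0) {
        return 0.5; // no evidence either way
    }
    $lnP = 0.0;
    $lnQ = 0.0;
    foreach ($probs as $p) {
        $lnP += log($p);
        $lnQ += log(1.0 - $p);
    }
    $h = chi2Q(-2.0 * $lnP, 2 * $n);
    $s = chi2Q(-2.0 * $lnQ, 2 * $n);
    return (1.0 + $h - $s) / 2.0;
}
```

Calling `fisherCombine([0.99, 0.95, 0.9])` yields a score close to 1, while a list of neutral probabilities such as `[0.5, 0.5]` stays at 0.5.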

Both engines return a ClassificationResult with the winning category, confidence score, and all category scores. Both support optional LLM injection for automatic escalation when the statistical model is uncertain — the LLM decision is fed back as training data, improving the model over time (active learning).
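The escalation pattern described above can be sketched without any library code. The helper below is hypothetical and not part of this package: `classifyWithFallback`, its callable parameters, and the 0.8 threshold are assumptions made for illustration. It asks the statistical model first, consults a fallback only when confidence is low, and feeds the fallback's answer back as training data.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the library's API).
 *
 * $classify    takes a text and returns ['choice' => string, 'score' => float]
 * $askFallback takes a text and returns a label (for example from an LLM)
 * $learn       takes a text and a label and stores them as training data
 */
function classifyWithFallback(
    string $text,
    callable $classify,
    callable $askFallback,
    callable $learn,
    float $minScore = 0.8
): string {
    $result = $classify($text);
    if ($result['score'] >= $minScore) {
        return $result['choice']; // the statistical model is confident enough
    }
    $label = $askFallback($text); // escalate the uncertain case
    $learn($text, $label);        // active learning: remember the decision
    return $label;
}

// Toy stand-ins so the sketch runs on its own.
$trained = [];
$calls = 0;

$classify = function (string $t) use (&$trained): array {
    return isset($trained[$t])
        ? ['choice' => $trained[$t], 'score' => 1.0]
        : ['choice' => 'ham', 'score' => 0.5];
};
$askFallback = function (string $t) use (&$calls): string {
    $calls++;
    return str_contains($t, 'pills') ? 'spam' : 'ham';
};
$learn = function (string $t, string $label) use (&$trained): void {
    $trained[$t] = $label;
};

$first  = classifyWithFallback('cheap pills', $classify, $askFallback, $learn);
$second = classifyWithFallback('cheap pills', $classify, $askFallback, $learn);
```

In this toy run the fallback is consulted only for the first call; the second call is answered from what was learned.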

Both engines share the same tokenisation pipeline (StandardLexer, StandardDegenerator) and support pluggable storage backends (in-memory, SQLite, MySQL, PostgreSQL, GDBM).
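To give a feel for what a lexer and a degenerator do, here is a minimal sketch in the spirit of b8-style token degeneration. It does not reproduce StandardLexer or StandardDegenerator: the functions `tokenize` and `degenerate` and the exact variant rules are assumptions, and the case handling is ASCII-only to avoid depending on extra extensions.

```php
<?php
declare(strict_types=1);

/** Split text into whitespace-separated tokens. */
function tokenize(string $text): array
{
    return preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

/**
 * Produce fallback variants of a token that can be looked up when the
 * exact token has never been seen: case variants, with and without
 * trailing punctuation. The original token itself is excluded.
 */
function degenerate(string $token): array
{
    $variants = [];
    $stripped = rtrim($token, '!?.');
    foreach ([$token, $stripped] as $base) {
        if ($base === '') {
            continue;
        }
        $variants[] = strtolower($base);
        $variants[] = strtoupper($base);
        $variants[] = ucfirst(strtolower($base));
    }
    return array_values(array_diff(array_unique($variants), [$token]));
}

$tokens   = tokenize('Buy cheap pills NOW!!!');
$variants = degenerate('NOW!!!');
```

Here `$tokens` holds four tokens, and `$variants` contains forms such as `now` and `Now!!!` that a storage lookup could try when `NOW!!!` itself is unknown.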

Installation

composer require byjg/text-classifier

Requires PHP >=8.3. The GDBM storage backend additionally requires ext-dba.
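A small preflight check along these lines can confirm the requirements before choosing a backend. It uses only standard PHP functions; the helper names `phpVersionOk` and `gdbmAvailable` are made up for this sketch, and whether the library needs the `gdbm` handler specifically (rather than any dba handler) is an assumption.

```php
<?php
declare(strict_types=1);

/** True when the running PHP meets the stated minimum version. */
function phpVersionOk(): bool
{
    return version_compare(PHP_VERSION, '8.3.0', '>=');
}

/** True when ext-dba is loaded and reports a gdbm handler. */
function gdbmAvailable(): bool
{
    return extension_loaded('dba') && in_array('gdbm', dba_handlers(), true);
}

if (!phpVersionOk()) {
    fwrite(STDERR, "PHP 8.3 or newer is required\n");
}
if (!gdbmAvailable()) {
    fwrite(STDERR, "GDBM storage unavailable; pick another backend\n");
}
```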

Quick Example

Spam filter:

use ByJG\TextClassifier\BinaryClassifier;
use ByJG\TextClassifier\ConfigBinaryClassifier;
use ByJG\TextClassifier\Lexer\StandardLexer;
use ByJG\TextClassifier\Lexer\ConfigLexer;
use ByJG\TextClassifier\Degenerator\StandardDegenerator;
use ByJG\TextClassifier\Degenerator\ConfigDegenerator;
use ByJG\TextClassifier\Storage\Rdbms;
use ByJG\Util\Uri;

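// SQLite-backed storage; createDatabase() prepares the database before first use
$storage = new Rdbms(
    new Uri('sqlite:///tmp/spam.db'),
    new StandardDegenerator(new ConfigDegenerator())
);
$storage->createDatabase();

// Wire the classifier: configuration, storage, and lexer
$classifier = new BinaryClassifier(
    new ConfigBinaryClassifier(),
    $storage,
    new StandardLexer(new ConfigLexer())
);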

$classifier->learn('Buy cheap pills now!!!', BinaryClassifier::SPAM);
$classifier->learn('Meeting at 3pm in the conference room', BinaryClassifier::HAM);

$result = $classifier->classify('buy pills online cheap');
// $result->choice === 'spam'
// $result->score  is close to 1.0

Multi-class classifier:

use ByJG\TextClassifier\NaiveBayes\NaiveBayes;
use ByJG\TextClassifier\NaiveBayes\Storage\Memory;
use ByJG\TextClassifier\Lexer\StandardLexer;
use ByJG\TextClassifier\Lexer\ConfigLexer;

$nb = new NaiveBayes(new Memory(), new StandardLexer(new ConfigLexer()));

$nb->train('PHP is a programming language', 'tech');
$nb->train('The cat sat on the mat', 'animals');

$result = $nb->classify('programming language');
// $result->choice          === 'tech'
// $result->score           === 0.93
// $result->scores          === ['tech' => 0.93, 'animals' => 0.07]
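For readers curious how per-category scores like these can arise, the following is a tiny multinomial Naive Bayes with add-one (Laplace) smoothing. It is a generic teaching sketch, not the library's NaiveBayes class: `TinyNaiveBayes`, its tokeniser, and its normalisation step are assumptions, so its numbers will not necessarily match the values shown above.

```php
<?php
declare(strict_types=1);

final class TinyNaiveBayes
{
    private array $wordCounts = []; // category => [word => count]
    private array $docCounts = [];  // category => number of training texts
    private array $vocab = [];      // set of all known words

    public function train(string $text, string $category): void
    {
        $this->docCounts[$category] = ($this->docCounts[$category] ?? 0) + 1;
        foreach ($this->tokens($text) as $w) {
            $this->wordCounts[$category][$w] = ($this->wordCounts[$category][$w] ?? 0) + 1;
            $this->vocab[$w] = true;
        }
    }

    /** Returns category => normalised score, highest first. */
    public function scores(string $text): array
    {
        if ($this->docCounts === []) {
            return [];
        }
        $totalDocs = array_sum($this->docCounts);
        $v = max(1, count($this->vocab));
        $words = $this->tokens($text);
        $logs = [];
        foreach ($this->docCounts as $cat => $docs) {
            $counts = $this->wordCounts[$cat] ?? [];
            $total = array_sum($counts);
            $log = log($docs / $totalDocs); // log prior
            foreach ($words as $w) {
                // add-one smoothing keeps unseen words from zeroing the product
                $log += log((($counts[$w] ?? 0) + 1) / ($total + $v));
            }
            $logs[$cat] = $log;
        }
        // Convert log scores into values that sum to 1
        $max = max($logs);
        $exp = array_map(fn(float $l): float => exp($l - $max), $logs);
        $sum = array_sum($exp);
        $scores = array_map(fn(float $e): float => $e / $sum, $exp);
        arsort($scores);
        return $scores;
    }

    private function tokens(string $text): array
    {
        return preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    }
}

$tiny = new TinyNaiveBayes();
$tiny->train('PHP is a programming language', 'tech');
$tiny->train('The cat sat on the mat', 'animals');
$scores = $tiny->scores('programming language');
```

With these two training texts, `tech` ranks first for the query and the scores sum to 1.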

Documentation

| Section | Description |
|---|---|
| Getting Started | Installation, requirements, first working example |
| Guides: Spam Filter | Training, classifying, choosing storage |
| Guides: Multi-class | Training categories, classifying, persistence |
| Guide: LLM-Assisted Classification | Automatic LLM fallback and active learning |
| Concepts | How the algorithms work, architecture overview |
| Reference | Full API, configuration parameters, error codes |

Acknowledgements

This library is inspired by the original b8 spam filter written by Tobias Leupold. The core algorithm, Robinson-Fisher probability model, token degeneration approach, and the tc* internal variable convention all originate from his work. This project modernises the codebase for PHP 8.3+, replaces the storage layer with byjg/micro-orm and byjg/migration, and adds a multi-class NaiveBayes engine built on the same tokenisation pipeline.

Dependencies

flowchart TD
    byjg/text-classifier --> byjg/micro-orm
    byjg/text-classifier --> byjg/migration
    byjg/text-classifier --> byjg/llm-api-objects
    byjg/text-classifier --> openai-php/client

Statistics

  • Total downloads: 10
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 9
  • Clicks: 8
  • Dependents: 0
  • Suggesters: 0

GitHub information

  • Stars: 9
  • Watchers: 1
  • Forks: 3
  • Language: PHP

Other information

  • License: Unknown
  • Last updated: 2026-03-07
