Package overview

Linguistics and language tools in PHP.

README

NEW --> Support for NYSIIS encoding
What's next? --> Support for Caverphone, Arpabet

This package aims to provide a comprehensive set of functions and methods for the linguistics and phonetics algorithms commonly used in software development and information technology. While PHP already offers native functions to encode strings with the Metaphone and Soundex algorithms, several other useful algorithms are not available natively.

The package also ships a dictionary that converts almost any English word from text to IPA phonetic symbols. For the moment only en_US is available, but there are plans to include other languages or dialects eventually.

Installation

The easiest way to use this package is to require it with Composer, although the package can also simply be cloned and used, as long as the namespaces are respected.

composer require carloswph/linguistics
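
For the clone-and-use route, "respecting the namespaces" in practice means mapping the Linguistics\ prefix to the directory that holds the class files. The following is a minimal sketch of a PSR-4 style autoloader built on PHP's spl_autoload_register(). The helper name registerPsr4Autoloader() and the ./linguistics/src path are assumptions made for illustration, not something the package documents, so adjust them to the actual repository layout.

```php
<?php
declare(strict_types=1);

/**
 * Registers a minimal PSR-4 style autoloader for one namespace prefix.
 * Hypothetical helper: it is not part of carloswph/linguistics.
 */
function registerPsr4Autoloader(string $prefix, string $baseDir): void
{
    spl_autoload_register(function (string $class) use ($prefix, $baseDir): void {
        // Ignore classes outside the given namespace prefix.
        if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
            return;
        }

        // Map the remaining namespace segments to a file path.
        $relative = substr($class, strlen($prefix));
        $file = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR
            . str_replace('\\', DIRECTORY_SEPARATOR, $relative) . '.php';

        // Only load the file if it exists, so other autoloaders can still run.
        if (is_file($file)) {
            require $file;
        }
    });
}

// Assumed layout: the repository was cloned into ./linguistics with classes under src/.
registerPsr4Autoloader('Linguistics\\', __DIR__ . '/linguistics/src');
```

With Composer none of this is needed, since vendor/autoload.php performs the same mapping.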

Usage

The package is organized into independent classes. The first class, Phonetics, provides three methods. The symbols() method converts a string to IPA phonetic symbols. If a longer string is provided, the class splits it into words and returns the symbols for each word, excluding repetitions. The class also provides a bridge to PHP's existing metaphone() and soundex() functions.

All methods offer three response formats: TXT, JSON or PHP array. TXT is the default, so to get a different format, pass it as an additional argument to the method. A few examples will make this clearer:

use Linguistics\Phonetics;

require __DIR__ . '/vendor/autoload.php';

$str = 'To be or not to be, that is the question';

Phonetics::symbols($str);
/*
Returns:

[ to ] => /ˈtu/, /tə/, /tɪ/
[ be ] => /ˈbi/, /bi/
[ or ] => /ˈɔɹ/, /ɝ/
[ not ] => /ˈnɑt/
[ that ] => /ˈðæt/, /ðət/
[ is ] => /ˈɪz/, /ɪz/
[ the ] => /ˈðə/, /ðə/, /ði/
[ question ] => /ˈkwɛstʃən/, /ˈkwɛʃən/
*/

Phonetics::soundex($str);
/*
Returns:

[ to ] => T000
[ be ] => B000
[ or ] => O600
[ not ] => N300
[ that ] => T300
[ is ] => I200
[ the ] => T000
[ question ] => Q235
*/

Phonetics::metaphone($str);
/*
Returns:

[ to ] => T
[ be ] => B
[ or ] => OR
[ not ] => NT
[ that ] => 0T
[ is ] => IS
[ the ] => 0
[ question ] => KSXN
*/

Phonetics::symbols($str, 'array');
/*
Returns:

array(8) { ["to"]=> array(3) { [0]=> string(6) "/ˈtu/" [1]=> string(6) " /tə/" [2]=> string(6) " /tɪ/" } ["be"]=> array(2) { [0]=> string(6) "/ˈbi/" [1]=> string(5) " /bi/" } ["or"]=> array(2) { [0]=> string(8) "/ˈɔɹ/" [1]=> string(5) " /ɝ/" } ["not"]=> array(1) { [0]=> string(8) "/ˈnɑt/" } ["that"]=> array(2) { [0]=> string(9) "/ˈðæt/" [1]=> string(8) " /ðət/" } ["is"]=> array(2) { [0]=> string(7) "/ˈɪz/" [1]=> string(6) " /ɪz/" } ["the"]=> array(3) { [0]=> string(8) "/ˈðə/" [1]=> string(7) " /ðə/" [2]=> string(6) " /ði/" } ["question"]=> array(2) { [0]=> string(15) "/ˈkwɛstʃən/" [1]=> string(14) " /ˈkwɛʃən/" } }
*/

Phonetics::symbols($str, 'json');
/*
Returns:

string(410) "{"to":["\/\u02c8tu\/"," \/t\u0259\/"," \/t\u026a\/"],"be":["\/\u02c8bi\/"," \/bi\/"],"or":["\/\u02c8\u0254\u0279\/"," \/\u025d\/"],"not":["\/\u02c8n\u0251t\/"],"that":["\/\u02c8\u00f0\u00e6t\/"," \/\u00f0\u0259t\/"],"is":["\/\u02c8\u026az\/"," \/\u026az\/"],"the":["\/\u02c8\u00f0\u0259\/"," \/\u00f0\u0259\/"," \/\u00f0i\/"],"question":["\/\u02c8kw\u025bst\u0283\u0259n\/"," \/\u02c8kw\u025b\u0283\u0259n\/"]}"
*/
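
To make the behavior above easier to picture, here is a rough sketch of how a sentence can be split into unique words, encoded with PHP's native soundex() or metaphone(), and returned as TXT, JSON or an array. It is not the package's implementation: encodeWords() is a hypothetical helper, and its word-splitting rule (lowercase letters and apostrophes) is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of carloswph/linguistics): splits a sentence
 * into unique lowercase words and encodes each one with the given callable.
 *
 * @return array|string An array for 'array', a string for 'json' and 'txt'.
 */
function encodeWords(string $text, callable $encoder, string $format = 'txt')
{
    // Assumption: letters and apostrophes form words; anything else separates them.
    $words = preg_split("/[^a-z']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // array_unique() keeps the first occurrence of each word, preserving order.
    $codes = [];
    foreach (array_unique($words) as $word) {
        $codes[$word] = $encoder($word);
    }

    switch ($format) {
        case 'array':
            return $codes;
        case 'json':
            return json_encode($codes);
        default:
            // TXT: one "[ word ] => code" line per unique word.
            $lines = [];
            foreach ($codes as $word => $code) {
                $lines[] = sprintf('[ %s ] => %s', $word, $code);
            }
            return implode(PHP_EOL, $lines);
    }
}

$str = 'To be or not to be, that is the question';

echo encodeWords($str, 'soundex'), PHP_EOL;           // TXT, the default format
echo encodeWords($str, 'metaphone', 'json'), PHP_EOL; // JSON string
```

Passing the encoder as a callable keeps the splitting and formatting logic in one place, so any other word-level encoder can be plugged in the same way.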

NYSIIS encoding

Since v1.1.0, the Phonetics class includes an additional method that returns the New York State Identification and Intelligence System (NYSIIS) phonetic code for every word in a sentence (excluding repeated words). Usage follows the same logic as the previous static methods.

Phonetics::nysiis($str);
/*
Returns:

[ to ] => T
[ be ] => B
[ or ] => AR
[ not ] => NAT
[ that ] => THAT
[ is ] => A
[ the ] => TH
[ question ] => GAASTAAN
*/

Underway

Three other classes are currently underway, and a fourth is being considered:

An encoding class for the Caverphone algorithm, versions 1.0 and 2.0
An encoding class implementing the Match Rating Approach for string comparison and encoding
An interesting legacy encoding class based on the Arpabet algorithm
Roger Root encoding?

The next stable version, 1.2.0, should include the Caverphone class, at least with support for encoding version 1.0 of the algorithm.

About carloswph/linguistics

carloswph/linguistics is a Composer package written in PHP. It has about 1.04k downloads and 10 GitHub stars to date, and was last updated on 2021-03-20.

Keywords: language, Algorithm, soundex, ipa, metaphone, Phonetics.

Other packages related to carloswph/linguistics

Frequently downloaded PHP Composer packages in the same area or with the same keywords, listed for comparison:

dg/texy 159

Texy converts plain text in easy to read Texy syntax into structurally valid (X)HTML. It supports adding of images, links, nested lists, tables and has full support for CSS. Texy supports hyphenation of long words (which reflects language rules), clickable emails and URL (emails are obfuscated again

delfimov/translate 16

Easy to use i18n translation PHP class for multi-language websites

awes-io/localization-helper 35

Package for convenient work with Laravel's localization features

novius/laravel-translation-loader 1

Store your language lines in the database, yaml or other sources

keller/soundex-fr-bundle 6

Compute a french soundex of a string.

cmsexperts/link2language 17

Set Links with a specific language parameter