ssola/monachus

Composer install command:

composer require ssola/monachus

Package description

Library to handle texts, includes: Spell checker, Stemmer, Language detection

README

Monachus is a library that helps you work with text in any language. Monachus is Latin for "monk", which I think is a good name for this library: monks used to work a lot with books (strings) in a wide range of languages.

This library was written with these PHP versions in mind: 5.5, 5.4, 5.3.

Install

The simplest way is with Composer. Add these lines to your composer.json:

"repositories": [
    {
        "type": "git",
        "url": "https://github.com/ssola/monachus.git"
    }
]

How it works

String

The first thing to know is how to use the String class. It wraps a given text in an object and keeps that text in the UTF-8 charset throughout.

include_once("./vendor/autoload.php");

use Monachus\String as String;

$text = new String("Hello World!");
echo $text;

This code creates a new String object with a value and then prints it.

Then you can do things like:

include_once("./vendor/autoload.php");

use Monachus\String as String;

$text = new String("Hello World!");
echo $text->length();
echo $text->find("World");
echo $text->toUppercase();

if ($text->equals("Hello World!")) {
    echo $text->toLowercase();
}

Objects of this kind are used extensively throughout the library so that every operation runs with the proper charset.
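To illustrate why a charset-aware wrapper matters, here is a minimal sketch in plain PHP. It does not use Monachus at all and says nothing about how the String class is implemented internally; it only assumes the standard mbstring extension is installed, and shows how byte-oriented and UTF-8-aware functions disagree on the same text.

```php
<?php
// Plain PHP, not Monachus code: byte-oriented vs UTF-8-aware string functions.
// Assumes the mbstring extension is available.

$text = "は太平洋側"; // five Japanese characters, three bytes each in UTF-8

$byteLength = strlen($text);             // counts bytes
$charLength = mb_strlen($text, 'UTF-8'); // counts characters

echo $byteLength, "\n"; // 15
echo $charLength, "\n"; // 5

// Case conversion also needs the charset to handle non-ASCII letters.
$upper = mb_strtoupper("héllo wörld", 'UTF-8');
echo $upper, "\n"; // HÉLLO WÖRLD
```

A wrapper object that always carries its charset saves you from choosing the right function at every call site.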

Tokenizer

Do you need to tokenize a string? Monachus can do it for you! We support a lot of languages, Japanese included! But if your language is not supported... relax! You can create your own adapters in order to tokenize different languages.

Let's do a simple example:

include_once("./vendor/autoload.php");

use Monachus\String as String;
use Monachus\Tokenizer as Tokenizer;

$text = new String("This is a text");
$tokenizer = new Tokenizer();

var_dump($tokenizer->tokenize($text));

// Now imagine you need to tokenize a Japanese text
$textJp = new String("は太平洋側を中心に晴れた所が多いが");
$tokenizerJp = new Tokenizer(new Monachus\Tokenizers\Japanase());

var_dump($tokenizerJp->tokenize($textJp));

As you have seen, we can use our own adapters to tokenize complex languages like Japanese or Chinese. Now it's time to explain how to create these adapters.

class MyTokenizer implements Monachus\Interfaces\TokenizerInterface
{
  public function tokenize(Monachus\String $string)
  {
    // your awesome code!
  }
}

$tokenizer = new Monachus\Tokenizer(new MyTokenizer());
var_dump($tokenizer->tokenize(new Monachus\String("Поиск информации в интернете")));
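As a rough idea of what could go inside a custom adapter's tokenize() method, here is a minimal sketch of a Unicode-aware word splitter. It is standalone PHP: splitWords() is a hypothetical helper, not part of Monachus, and it takes a plain PHP string because the methods for reading the text out of a Monachus\String object are not shown here.

```php
<?php
// Hypothetical helper, not part of Monachus: splits UTF-8 text into word tokens.
function splitWords($text)
{
    // \p{L} matches any letter and \p{N} any digit; the "u" flag makes
    // the pattern and the subject be treated as UTF-8.
    return preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(splitWords("Поиск информации в интернете"));
// Four tokens: Поиск, информации, в, интернете
```

Splitting on non-letter characters works for languages that separate words with spaces or punctuation. It would not work for Japanese or Chinese, which is why those languages need a dedicated adapter.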

N-Gram

Yeah! Monachus can generate N-gram sequences of different sizes, for example bigrams or trigrams. Let's see how it works.

include_once("./vendor/autoload.php");

use Monachus\String as String;
use Monachus\Ngram as Ngram;
use Monachus\Config as Config;

$text = new String("This is an awesome text");

$config = new Config();
$config->max = 3; // we're creating trigrams.

$ngram = new Ngram($config);
var_dump($ngram->parse($text));
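For readers new to the term, here is a minimal sketch of what a word-level trigram looks like. It is standalone PHP with a hypothetical wordNgrams() helper, not the Monachus parser; whether Monachus builds its N-grams from words or characters, and whether max means "exactly" or "up to", is not stated here, so treat the output as an illustration of the concept only.

```php
<?php
// Hypothetical helper, not part of Monachus: builds word-level n-grams of a fixed size.
function wordNgrams($text, $n)
{
    if ($n < 1) {
        return array();
    }

    $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $ngrams = array();

    // Slide a window of $n words across the word list.
    for ($i = 0; $i + $n <= count($words); $i++) {
        $ngrams[] = implode(' ', array_slice($words, $i, $n));
    }

    return $ngrams;
}

print_r(wordNgrams("This is an awesome text", 3));
// "This is an", "is an awesome", "an awesome text"
```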

Do you need your own N-gram parser? No problem! You can create your own parsers as well.

class MyParser implements Monachus\Interfaces\NgramParserInterface
{
  public function parse(Monachus\String $string, $level)
  {
    // your awesome code!
  }
}

And then...

include_once("./vendor/autoload.php");

use Monachus\String as String;
use Monachus\Ngram as Ngram;
use Monachus\Config as Config;

$text = new String("This is an awesome text");

$config = new Config();
$config->max = 3; // we're creating trigrams.

$ngram = new Ngram($config);
$ngram->setParser(new MyParser());
var_dump($ngram->parse($text));

Statistics

  • Total downloads: 35
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 4
  • Clicks: 0
  • Dependents: 0
  • Suggesters: 0

GitHub info

  • Stars: 4
  • Watchers: 1
  • Forks: 0
  • Language: PHP

Other info

  • License: Unknown
  • Last updated: 2014-03-14