ttree/contentinsight

Composer installation command:

composer require ttree/contentinsight

Package description

Tools to extract basic content inventory information from an existing website

README

This TYPO3 Flow package provides a CLI tool to extract a content inventory CSV from an existing website.

This package is under development and should be considered beta. It requires Flow 2.3.

Features

  • Extract the website structure and basic metadata
  • Support for crawling presets
  • Flexible report building (a CSV report builder is included, and you can register your own report builders)
  • Skip URIs matching regular expressions
  • Sort the inventory based on the document tree structure

Todos

  • Generate human-readable page IDs (e.g. 1, 1.1, 1.2, 2, 2.1, 2.2, ...)
  • Update report / multiple index support
  • Get analytics data from Google Analytics
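A small Python sketch can illustrate tree-ordered sorting and the planned hierarchical IDs together. It assumes, for illustration only, that the document tree follows URL path segments; the package may derive its tree differently, the ID feature is still a Todo, and path_segments and number_pages are hypothetical names.

```python
from urllib.parse import urlparse

# Illustrative sketch only: hypothetical helpers, not the package's API.
# Assumption: the document tree is inferred from URL path segments.


def path_segments(url):
    """Split a URL path into segments; the site root becomes an empty tuple."""
    return tuple(part for part in urlparse(url).path.split("/") if part)


def number_pages(urls):
    """Return (page_id, url) pairs in tree order, e.g. 1, 1.1, 1.2, 2."""
    ids = {}       # path segments -> assigned ID
    counters = {}  # parent segments -> children numbered so far
    numbered = []
    # Tuples sort depth-first: ("a",) < ("a", "b") < ("b",)
    for url in sorted(urls, key=path_segments):
        segments = path_segments(url)
        if not segments or segments in ids:
            continue  # root page and duplicate paths get no ID in this sketch
        # Attach to the nearest ancestor that is itself in the inventory.
        parent = segments[:-1]
        while parent and parent not in ids:
            parent = parent[:-1]
        counters[parent] = counters.get(parent, 0) + 1
        prefix = ids[parent] + "." if parent else ""
        ids[segments] = f"{prefix}{counters[parent]}"
        numbered.append((ids[segments], url))
    return numbered
```

Missing intermediate pages are skipped over, so /a/b/c is numbered as a child of /a when /a/b was never crawled.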

Configuration

See Configuration/Settings.yaml for the detailed configuration options.

By default, this package caches all raw HTTP requests for one day. You can change this in your own Settings.yaml and Caches.yaml.
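One day corresponds to a cache lifetime of 86400 seconds (24 × 60 × 60). The Python sketch below only illustrates that expiry rule with a hypothetical in-memory TtlCache class; the package itself uses Flow's caching framework, so the real lifetime and backend are whatever your Settings.yaml and Caches.yaml configure.

```python
import time

# Illustrative sketch only: a hypothetical cache, not Flow's caching framework.


class TtlCache:
    """In-memory cache where every entry expires after a fixed lifetime."""

    def __init__(self, lifetime=86400, clock=time.monotonic):
        self.lifetime = lifetime  # seconds; 86400 = one day
        self.clock = clock        # injectable so expiry can be tested
        self._entries = {}        # key -> (expires_at, value)

    def set(self, key, value):
        self._entries[key] = (self.clock() + self.lifetime, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: the next crawl would refetch it
            return None
        return value
```

Injecting the clock lets the expiry be demonstrated without waiting a full day.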

Base Preset

The base preset ('*') is automatically merged into every preset. You can enable or disable any property with the setting presets.[preset_name].properties.[property_name].enabled.

Ttree:
  ContentInsight:
    presets:
      '*':
        properties:
          'pageTitle':
            enabled: TRUE
          'navigationTitle':
            enabled: TRUE
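Conceptually, the effective configuration of a preset is the '*' preset with the named preset's values layered on top, after which only properties whose enabled flag is TRUE are kept. This Python sketch models that as a recursive dictionary merge; merge_presets and enabled_properties are hypothetical helpers, and Flow's own settings merge may differ in edge cases.

```python
# Illustrative sketch only: hypothetical helpers modelling the preset merge.


def merge_presets(base, preset):
    """Recursively merge `preset` over `base`; values in `preset` win."""
    merged = dict(base)
    for key, value in preset.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_presets(merged[key], value)
        else:
            merged[key] = value
    return merged


def enabled_properties(presets, name):
    """Return the enabled property names for preset `name` after merging '*'."""
    effective = merge_presets(presets.get("*", {}), presets.get(name, {}))
    return [
        prop
        for prop, options in effective.get("properties", {}).items()
        if options.get("enabled", False)
    ]


presets = {
    "*": {"properties": {
        "pageTitle": {"enabled": True},
        "navigationTitle": {"enabled": True},
    }},
    "custom": {"properties": {
        # Inherits enabled: True from '*' and only adds a processor class.
        "pageTitle": {"class": "Your\\Package\\CrawlerProcessor\\PageTitleProcessor"},
        "navigationTitle": {"enabled": False},  # disabled for this preset only
        "metaDescription": {"enabled": True},
    }},
}
```

Here enabled_properties(presets, "custom") keeps pageTitle (enabled by the base preset) and metaDescription, while navigationTitle is switched off for that preset only.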

Custom Preset

You can define custom presets to crawl different kinds of information. With the class setting, you can use your own processor implementation to extract information from the current URI. Your processor must implement Ttree\ContentInsight\CrawlerProcessor\ProcessorInterface:

Ttree:
  ContentInsight:
    presets:
      'custom':
        properties:
          'pageTitle':
            class: 'Your\Package\CrawlerProcessor\PageTitleProcessor'
          'metaDescription':
            enabled: TRUE
          'metaKeywords':
            enabled: TRUE
          'firstLevelHeader':
            enabled: TRUE
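In PHP, the contract is Ttree\ContentInsight\CrawlerProcessor\ProcessorInterface; its method signatures are not shown here, so the Python sketch below is not a template for implementing it. It only illustrates what a page title processor does: parse the fetched HTML and return a value keyed by the property name. The PageTitleProcessor class and its process(uri, html) method are hypothetical.

```python
from html.parser import HTMLParser

# Illustrative sketch only: not ProcessorInterface, whose PHP API is not shown.


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


class PageTitleProcessor:
    """Hypothetical processor returning {'pageTitle': ...} for one page."""

    def process(self, uri, html):
        parser = _TitleParser()
        parser.feed(html)
        parser.close()
        # Collapse runs of whitespace; a missing title becomes None.
        title = " ".join("".join(parser.parts).split())
        return {"pageTitle": title or None}
```

The returned key, pageTitle, is the one the report configuration refers to.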

How to build a report?

The package supports CSV reporting, and you can register your own report builder. See Settings.yaml:

Ttree:
  ContentInsight:
    presets:
      'custom':
        reportConfigurations:
          'csv':
            enabled: TRUE
            renderType: 'Csv'
            renderTypeOptions:
              displayColumnHeaders: TRUE
            reportPath: '%FLOW_PATH_DATA%Reports/Ttree.ContentInsight'
            reportPrefix: 'content-inventory-report'
            properties:
              'id':
                label: 'ID'
              'pageTitle':
                label: 'Page Title'
              'navigationTitle':
                label: 'Navigation Title'
              'externalLink':
                label: 'External Link'
                postProcessor: 'Boolean'
              'currentUri':
                label: 'URL'
              'metaDescription':
                label: 'Meta Description'
              'metaKeywords':
                label: 'Meta Keywords'
              'firstLevelHeaderCount':
                label: 'Main Header Count (H1)'
              'firstLevelHeaderContent':
                label: 'Main Header Content (H1)'
              'remark':
                label: 'Crawling Remark'

The keys in the properties section must match the keys produced by the CrawlerProcessor objects.
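The Python sketch below illustrates that mapping with the standard csv module: each configured key is looked up in the values produced for a page and written under its label. It assumes each crawled page is a dictionary of processor results; build_csv_report is a hypothetical name, display_column_headers mirrors the displayColumnHeaders option, and a key no processor produced becomes an empty cell in this sketch.

```python
import csv
import io

# Illustrative sketch only: build_csv_report is hypothetical, not the package API.


def build_csv_report(pages, columns, display_column_headers=True):
    """Write one CSV row per page; `columns` maps property key -> label."""
    out = io.StringIO()
    writer = csv.writer(out)
    if display_column_headers:
        writer.writerow(columns.values())
    for page in pages:
        # Keys must match what the processors produced; missing keys stay empty.
        writer.writerow(page.get(key, "") for key in columns)
    return out.getvalue()


columns = {"id": "ID", "pageTitle": "Page Title", "currentUri": "URL"}
pages = [
    {"id": 1, "pageTitle": "Home", "currentUri": "http://www.domain.com/"},
    {"id": 2, "currentUri": "http://www.domain.com/products"},  # no title found
]
report = build_csv_report(pages, columns)
```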

The position of each column can be specified with position: '<position-string>'. The <position-string> supports one of the following forms:

    start (<weight>)
    end (<weight>)
    before <key> (<weight>)
    after <key> (<weight>)
    <numerical-order>

Example

Ttree:
  ContentInsight:
    presets:
      'custom':
        reportConfigurations:
          'csv':
            enabled: TRUE
            renderType: 'Csv'
            renderTypeOptions:
              displayColumnHeaders: TRUE
            reportPath: '%FLOW_PATH_DATA%Reports/Ttree.ContentInsight'
            reportPrefix: 'content-inventory-report'
            properties:
              'id':
                label: 'ID'
                position: '<position-string>'
              'pageTitle':
                label: 'Page Title'
                position: '<position-string>'
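The ordering rules can be sketched in Python as below. This is a simplified illustration, not Flow's own positional sorting: the treatment of weights (higher start weights further forward, higher end weights further back), the placement of numbered columns before unpositioned ones, and the handling of unknown references are assumptions, and weights on before/after are ignored. sort_columns is a hypothetical helper.

```python
import re

# Illustrative sketch only: a simplified, hypothetical position sorter.
# Accepted forms: "start [w]", "end [w]", "before <key> [w]", "after <key> [w]", "<n>"
_POSITION = re.compile(
    r"(?P<edge>start|end)(?:\s+(?P<weight>-?\d+))?"
    r"|(?P<rel>before|after)\s+(?P<ref>\S+)(?:\s+-?\d+)?"
    r"|(?P<num>-?\d+)"
)


def sort_columns(columns):
    """Return column keys ordered by their 'position' option."""
    start, middle, end, plain, relative = [], [], [], [], []
    for index, (key, options) in enumerate(columns.items()):
        position = str(options.get("position", "")).strip()
        match = _POSITION.fullmatch(position)
        if match is None:
            plain.append(key)  # no or unrecognised position: keep config order
        elif match["edge"] == "start":
            # Assumption: a higher start weight moves the column further forward.
            start.append((-int(match["weight"] or 0), index, key))
        elif match["edge"] == "end":
            # Assumption: a higher end weight moves the column further back.
            end.append((int(match["weight"] or 0), index, key))
        elif match["rel"]:
            relative.append((match["rel"], match["ref"], key))  # weight ignored
        else:
            middle.append((int(match["num"]), index, key))

    ordered = [key for _, _, key in sorted(start)]
    ordered += [key for _, _, key in sorted(middle)] + plain
    ordered += [key for _, _, key in sorted(end)]

    # Place before/after columns once their reference is placed; this also
    # resolves chains such as "after a" where "a" is itself "after b".
    last_after = {}  # reference key -> last column inserted after it
    pending = relative
    while pending:
        remaining = []
        for rel, ref, key in pending:
            if ref not in ordered:
                remaining.append((rel, ref, key))
            elif rel == "before":
                ordered.insert(ordered.index(ref), key)
            else:
                ordered.insert(ordered.index(last_after.get(ref, ref)) + 1, key)
                last_after[ref] = key
        if len(remaining) == len(pending):
            ordered += [key for _, _, key in remaining]  # unknown refs or cycles
            break
        pending = remaining
    return ordered
```

For example, with id at 'start', currentUri at '10', pageTitle 'after currentUri', remark at 'end' and metaDescription unpositioned, the sketch yields id, currentUri, pageTitle, metaDescription, remark.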

A single crawling preset can register multiple reports if required. For each property, you can register a post processor to transform its value in the report; see BooleanPostProcessor for a basic example.
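A post processor transforms a single value just before it is written to a report. The source does not specify what BooleanPostProcessor outputs, so the Python sketch below assumes, for illustration only, that it renders values as 'Yes' or 'No'; boolean_post_processor, POST_PROCESSORS and apply_post_processors are hypothetical names.

```python
# Illustrative sketch only: hypothetical names; the 'Yes'/'No' output is assumed.


def boolean_post_processor(value):
    """Stand-in for BooleanPostProcessor (output strings are an assumption)."""
    return "Yes" if value else "No"


# Maps the postProcessor setting (e.g. 'Boolean') to a callable.
POST_PROCESSORS = {"Boolean": boolean_post_processor}


def apply_post_processors(page, properties):
    """Return a copy of `page` with each configured post processor applied."""
    result = dict(page)
    for key, options in properties.items():
        name = options.get("postProcessor")
        if name and key in result:
            result[key] = POST_PROCESSORS[name](result[key])
    return result
```

Working on a copy keeps the raw crawl data intact, so several reports for the same preset can post-process it differently.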

How to skip specific URIs?

You can define invalid URI patterns in your crawling presets:

Ttree:
  ContentInsight:
    presets:
      'custom':
        invalidUriPatterns:
          'javascript':
            pattern: '@^javascript\:void\(0\)$@'
          'mailto':
            pattern: '@^mailto\:.*@'
          'anchor':
            pattern: '@^#.*@'
            message: 'Link to anchor'

If a pattern defines a message, every URL matching that pattern is logged. Without a message, the crawler skips matching URLs silently.
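The patterns above use PCRE-style delimiters (@...@), as used by PHP's preg functions. The Python sketch below applies the same configuration: it strips the delimiters, searches each link against the patterns in order, and logs only when the matching pattern has a message. The helper names are hypothetical, and Python's re syntax only matches PCRE for simple patterns like these.

```python
import logging
import re

# Illustrative sketch only: hypothetical helpers, not the package's crawler.
log = logging.getLogger("contentinsight.sketch")

INVALID_URI_PATTERNS = {
    "javascript": {"pattern": r"@^javascript\:void\(0\)$@"},
    "mailto": {"pattern": r"@^mailto\:.*@"},
    "anchor": {"pattern": r"@^#.*@", "message": "Link to anchor"},
}


def compile_delimited(delimited):
    """Turn a '@...@' pattern (no trailing flags supported) into a Python regex."""
    delimiter = delimited[0]
    return re.compile(delimited[1:delimited.rindex(delimiter)])


COMPILED = {
    name: (compile_delimited(cfg["pattern"]), cfg.get("message"))
    for name, cfg in INVALID_URI_PATTERNS.items()
}


def should_skip(uri):
    """Return True if `uri` matches an invalid pattern; log only if it has a message."""
    for name, (regex, message) in COMPILED.items():
        if regex.search(uri):
            if message:
                log.info("Skipped %s (%s): %s", uri, name, message)
            return True
    return False
```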

Usage

To get the complete website inventory:

# flow contentinventor:extract --base-url http://www.domain.com

Or, to limit the crawler to part of the website:

# flow contentinventor:extract --base-url http://www.domain.com/products

You can select a crawling preset:

# flow contentinventor:extract --base-url http://www.domain.com/products --preset default
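The source does not spell out how the crawler decides that a link belongs to the part of the site given by --base-url. A plausible rule, sketched here in Python purely as an assumption, keeps HTTP(S) links on the same host whose path lies under the base URL path; in_scope is a hypothetical helper.

```python
from urllib.parse import urljoin, urlparse

# Illustrative sketch only: the scoping rule is an assumption, not the package's.


def in_scope(base_url, link, current_page=None):
    """Resolve `link` against the current page and test it against base_url."""
    absolute = urljoin(current_page or base_url, link)
    base, target = urlparse(base_url), urlparse(absolute)
    # Compare with a trailing slash so /productsale is not under /products.
    base_path = base.path.rstrip("/") + "/"
    return (
        target.scheme in ("http", "https")
        and target.netloc == base.netloc
        and (target.path.rstrip("/") + "/").startswith(base_path)
    )
```

Comparing paths with a trailing slash keeps /productsale out of a crawl of /products.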

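Statistics

  • Total downloads: 289
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 2
  • Clicks: 4
  • Dependent packages: 0
  • Recommendations: 0

GitHub information

  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Language: PHP

Other information

  • License: Unknown
  • Last updated: 2014-11-13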
