SeoSanitizer::deleteSegmentsContainingSitename() - Code Metrics - Inspection of "style, cosmetic" - Dispositif/Wikibot

Passed

Push — master ( 2556d0...3ec723 )

by Dispositif

created 2023-04-08 13:54 UTC

SeoSanitizer::deleteSegmentsContainingSitename() A

↳ Parent: SeoSanitizer

Complexity

Conditions	3
Paths	1

Size

Total Lines	20
Code Lines	13

Duplication

Lines	0
Ratio	0 %

Importance

Changes	1
Bugs	0	Features	0

Metric	Value
cc	3
eloc	13
nc	1
nop	2
dl	0
loc	20
rs	9.8333
c	1
b	0
f	0

<?php
/*
 * This file is part of dispositif/wikibot application (@github)
 * 2019-2023 © Philippe M./Irønie  <[email protected]>
 * For the full copyright and MIT license information, view the license file.
 */

declare(strict_types=1);

namespace App\Domain\Publisher;

use App\Domain\Utils\TextUtil;

class SeoSanitizer
{
    private const MAX_LENGTH_FIRST_SEG_ALLOWING_SECOND_SEG = 30;
    private const REBUILD_SEPARATOR = ' - ';

    /**
     * Naive SEO sanitization of a web page title.
     * $prettyDomainName is the bare domain name, e.g. "google.com" or "google.co.uk".
     */
    public function cleanSEOTitle(string $prettyDomainName, ?string $title): ?string
    {
        if (null === $title || empty(trim($title))) {
            return null;
        }
        $title = str_replace(['–', '—', '\\'], ['-', '-', '/'], $title); // replace en/em dashes with hyphens, backslashes with slashes

        $seoSegments = $this->extractSEOSegments($title);
        // No SEO segmentation found
        if (count($seoSegments) < 2) {
            return $title;
        }
        $seoSegmentsFiltered = $this->deleteSegmentsContainingSitename($prettyDomainName, $seoSegments);
        if (count($seoSegmentsFiltered) === 0) {
            return trim($seoSegments[0]);
        }

        // if only one segment or first segment is long enough, return it
        if (
            count($seoSegmentsFiltered) === 1
            || mb_strlen($seoSegmentsFiltered[0]) >= self::MAX_LENGTH_FIRST_SEG_ALLOWING_SECOND_SEG
        ) {
            return trim($seoSegmentsFiltered[0]);
        }

        // rebuild the title, keeping only the first 2 SEO segments
        return trim($seoSegmentsFiltered[0]) . self::REBUILD_SEPARATOR . trim($seoSegmentsFiltered[1]);
    }

    private function extractSEOSegments(string $title): array
    {
        $seoSeparator = $this->getSEOSeparator($title);
        if (null === $seoSeparator) {
            return [$title];
        }

        return explode($seoSeparator, $title);
    }

    private function getSEOSeparator(string $title): ?string
    {
        if (strpos($title, ' | ') !== false) {
            return ' | ';
        }
        if (strpos($title, ' / ') !== false) {
            return ' / ';
        }
        if (strpos($title, ' - ') !== false) {
            return ' - ';
        }
        if (strpos($title, ' : ') !== false) {
            return ' : ';
        }

        return null;
    }

    /**
     * Remove SEO segments that contain the website's domain name or site name.
     */
    private function deleteSegmentsContainingSitename(string $prettyDomainName, array $seoSegments): array
    {
        // strip the last dot-separated label of prettyDomainName: blabla.com => blabla
        $siteName = preg_replace('/\.[^.]*$/', '', $prettyDomainName);
        // strip a remaining label of exactly 2 chars: blabla.co.uk => blabla.co => blabla
        $siteName = preg_replace('/\.[^.]{2}$/', '', $siteName);
        $siteName = TextUtil::stripPunctuation($siteName); // bla-bla => blabla

        return array_values(array_filter(
            $seoSegments,
            function ($segment) use ($prettyDomainName, $siteName) {
                $strippedSegment = str_replace(
                    [' ', '-'],
                    '',
                    mb_strtolower(TextUtil::stripPunctuation(TextUtil::stripAccents($segment)))
                );

                return !empty(trim($segment))
                    && false === strpos($strippedSegment, str_replace(['.', '-'], '', $prettyDomainName))
                    && false === strpos($strippedSegment, str_replace(['.', '-'], '', $siteName));
            }
        ));
    }
}
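
The site-name derivation and segment filtering above can be tried in isolation with the sketch below. It is not the project's code: `siteNameFromDomain()`, `normalizeSegment()` and `dropSiteSegments()` are hypothetical standalone functions, and `normalizeSegment()` is only an assumed, simplified stand-in for `TextUtil::stripAccents()` and `TextUtil::stripPunctuation()`, whose implementations are not shown here (it does not fold accents). Unlike the original closure, the sketch also skips empty needles, since `strpos()` with an empty needle matches at offset 0 in PHP 8.

```php
<?php

declare(strict_types=1);

// Hypothetical standalone helpers, not part of the original SeoSanitizer class.

/** Mirrors the two preg_replace() calls that derive the site name. */
function siteNameFromDomain(string $prettyDomainName): string
{
    // drop the last label: "blabla.com" => "blabla", "blabla.co.uk" => "blabla.co"
    $siteName = (string) preg_replace('/\.[^.]*$/', '', $prettyDomainName);

    // drop a remaining 2-character label: "blabla.co" => "blabla"
    return (string) preg_replace('/\.[^.]{2}$/', '', $siteName);
}

/** ASSUMPTION: simplified stand-in for TextUtil::stripAccents()/stripPunctuation(). */
function normalizeSegment(string $segment): string
{
    // lowercase, then keep only letters and digits (accents are NOT folded here)
    return (string) preg_replace('/[^\p{L}\p{N}]+/u', '', mb_strtolower($segment));
}

function dropSiteSegments(string $prettyDomainName, array $seoSegments): array
{
    // needles: "lemonde.fr" => ["lemondefr", "lemonde"]; empty needles are skipped
    $needles = array_filter(
        [
            str_replace(['.', '-'], '', $prettyDomainName),
            str_replace(['.', '-'], '', siteNameFromDomain($prettyDomainName)),
        ],
        static fn (string $needle): bool => $needle !== ''
    );

    return array_values(array_filter(
        $seoSegments,
        static function (string $segment) use ($needles): bool {
            $normalized = normalizeSegment($segment);
            if ($normalized === '') {
                return false; // blank or punctuation-only segment
            }
            foreach ($needles as $needle) {
                if (strpos($normalized, $needle) !== false) {
                    return false; // segment mentions the site
                }
            }

            return true;
        }
    ));
}
```

With this sketch, `siteNameFromDomain('google.co.uk')` yields `google`, whereas `siteNameFromDomain('example.com.au')` yields `example.com`, because the second pattern only strips a label of exactly two characters. Calling `dropSiteSegments('lemonde.fr', ["Titre de l'article", 'Le Monde'])` keeps only the first segment.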
