Passed
Push — develop ( 6d5733...09225f )
by Dylan
02:55
created

SEOToolboxControllerExtension::beforeParseField()   A

Complexity

Conditions 1
Paths 1

Size

Total Lines 3
Code Lines 1

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
dl 0
loc 3
rs 10
c 0
b 0
f 0
cc 1
eloc 1
nc 1
nop 2
1
<?php
2
/**
3
 * Plugin: SEOToolbox
4
 * Author: Dylan Grech
5
 * Copyright: 2016
6
 *
7
 * SEOtoolbox Controller Extension decorates the Content Controller
8
 * to add the auotamted links to a page where needed
9
 *
10
 * @see AutomatedLink
11
 */
12
class SEOToolboxControllerExtension extends Extension {
1 ignored issue
show
Coding Style Compatibility introduced by
PSR1 recommends that each class must be in a namespace of at least one level to avoid collisions.

You can fix this by adding a namespace to your class:

namespace YourVendor;

class YourClass { }

When choosing a vendor namespace, try to pick something that is not too generic to avoid conflicts with other libraries.

Loading history...
13
14
    private $maxLinksPerPage;
15
    private $settings       = null;
16
    private $linkCount      = 0;
17
    private $addLinks       = true;
18
    private $excludeTags    = array();
19
    private $maxLinks       = 0;
20
21
    public function index(){
22
        $this->addAutomatedLinks();
23
24
        // If we have a crawl request check the CrawlID so we're sure we didn't hit another SS site running our module
25
        if( $crawl_id = $this->owner->request->getHeader('X-Crawl-Id') ){
26
            return( $crawl_id == GlobalAutoLinkSettings::get_current()->CrawlID )
27
                ? $this->crawl_response()
28
                : $this->owner->redirect(SEOTestSiteTreeController::getPermissionDeniedPage()->Link());
29
        }
30
31
        return array();
32
    }
33
34
    private function crawl_response(){
35
        // Encoded version to detect which fields are being used
36
        $customize = array();
37
        foreach( Config::inst()->get($this->owner->ClassName, 'db') as $field => $type ){
0 ignored issues
show
Bug introduced by
The expression \Config::inst()->get($th...owner->ClassName, 'db') of type array|integer|double|string|boolean is not guaranteed to be traversable. How about adding an additional type check?

There are different options of fixing this problem.

  1. If you want to be on the safe side, you can add an additional type-check:

    $collection = json_decode($data, true);
    if ( ! is_array($collection)) {
        throw new \RuntimeException('$collection must be an array.');
    }
    
    foreach ($collection as $item) { /** ... */ }
    
  2. If you are sure that the expression is traversable, you might want to add a doc comment cast to improve IDE auto-completion and static analysis:

    /** @var array $collection */
    $collection = json_decode($data, true);
    
    foreach ($collection as $item) { /** .. */ }
    
  3. Mark the issue as a false-positive: Just hover the remove button, in the top-right corner of this issue for more options.

Loading history...
38
            if( strtolower( $type ) == 'htmltext' ){
39
                $data = ($this->owner->hasMethod($field)) ? $this->owner->$field() : $this->owner->$field;
40
                if( !$data ) {
41
                    continue;
42
                }
43
                $tmp = new HTMLText('tmp');
44
                $tmp->setValue($data);
45
                $data = base64_encode($tmp->forTemplate());
46
                $customize[$field] = "[**[$field]**[$data]**]";
47
            }
48
        }
49
50
        if (in_array($this->owner->ClassName, ClassInfo::subclassesFor('ErrorPage'))) {
51
            header("HTTP/1.0 405 Instance of ErrorPage");
52
            die();
1 ignored issue
show
Coding Style Compatibility introduced by
The method crawl_response() contains an exit expression.

An exit expression should only be used in rare cases. For example, if you write a short command line script.

In most cases however, using an exit expression makes the code untestable and often causes incompatibilities with other libraries. Thus, unless you are absolutely sure it is required here, we recommend to refactor your code to avoid its usage.

Loading history...
53
        }
54
55
        // Clean out the html before sending it back to minimize response size
56
        die(
1 ignored issue
show
Coding Style Compatibility introduced by
The method crawl_response() contains an exit expression.

An exit expression should only be used in rare cases. For example, if you write a short command line script.

In most cases however, using an exit expression makes the code untestable and often causes incompatibilities with other libraries. Thus, unless you are absolutely sure it is required here, we recommend to refactor your code to avoid its usage.

Loading history...
57
            preg_replace(array(
58
                '/<style(.*?)[>]/im',
59
                '/<(script|noscript)(.*?)<\/(script|noscript)[>]/im',
60
                '/<!--(.*?)-->/im',
61
            ), '', $this->owner->customise($customize)->render())
62
        );
63
    }
64
65
    /**
66
     * Get the global settings and check if we should be adding
67
     * links to this page
68
     *
69
     * @return GlobalAutoLinkSettings
70
     */
71
    private function getSettings() {
72
        if ($this->settings === null) {
73
            $this->settings = GlobalAutoLinkSettings::get_current();
74
            if (!$this->settings) return $this->addLinks = false;
0 ignored issues
show
Bug Best Practice introduced by
The return type of return $this->addLinks = false; (false) is incompatible with the return type documented by SEOToolboxControllerExtension::getSettings of type GlobalAutoLinkSettings.

If you return a value from a function or method, it should be a sub-type of the type that is given by the parent type f.e. an interface, or abstract method. This is more formally defined by the Lizkov substitution principle, and guarantees that classes that depend on the parent type can use any instance of a child type interchangably. This principle also belongs to the SOLID principles for object oriented design.

Let’s take a look at an example:

class Author {
    private $name;

    public function __construct($name) {
        $this->name = $name;
    }

    public function getName() {
        return $this->name;
    }
}

abstract class Post {
    public function getAuthor() {
        return 'Johannes';
    }
}

class BlogPost extends Post {
    public function getAuthor() {
        return new Author('Johannes');
    }
}

class ForumPost extends Post { /* ... */ }

function my_function(Post $post) {
    echo strtoupper($post->getAuthor());
}

Our function my_function expects a Post object, and outputs the author of the post. The base class Post returns a simple string and outputting a simple string will work just fine. However, the child class BlogPost which is a sub-type of Post instead decided to return an object, and is therefore violating the SOLID principles. If a BlogPost were passed to my_function, PHP would not complain, but ultimately fail when executing the strtoupper call in its body.

Loading history...
75
76
            $this->excludeTags = (array) $this->settings->ExcludeTags();
77
            $this->maxLinks = (int) ($this->settings->MaxLinksPerPage) ? $this->settings->MaxLinksPerPage : PHP_INT_MAX;
78
79
            if (!in_array($this->owner->ClassName, $this->settings->AllowedIn())) $this->addLinks = false;
80
        }
81
82
        return $this->settings;
83
    }
84
85
    /**
86
     * Goes through all the automated link settings and adds
87
     * the links where necessary
88
     *
89
     * @return void
90
     */
91
    public function addAutomatedLinks(){
92
        if( GlobalAutoLinkSettings::$enabled && $this->owner->class != 'RedirectorPage' ) {
93
            $this->getSettings();
94
            if( !$this->addLinks ) {
95
                return;
96
            }
97
98
            foreach( $this->getSettings()->IncludeInFields() as $field ){
99
                // Check that the field provided by user exists in this object, is of type HTMLText and has content
100
                if( AutomatedLink::isFieldParsable( $this->owner->data(), $field ) ){
101
102
                    // Create dummy object so we can parse the HTML
103
                    $dummy = new HTMLText( $field );
104
                    $dummy->setValue( $this->owner->$field );
105
                    // Create DOMDocument Object
106
                    $content = mb_convert_encoding( $dummy->forTemplate(), 'html-entities', GlobalAutoLinkSettings::$encoding );
107
108 View Code Duplication
                    if( class_exists( 'HTML5_Parser' ) ){
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

Loading history...
109
                        $dom = HTML5_Parser::parse( $content );
110
                    } else{
111
                        $dom = new DOMDocument();
112
                        $dom->loadHTML( $content );
113
                    }
114
115
                    // Check current link count and if it's already exceeded do nothing
116
                    $this->linkCount += (int) $dom->getElementsByTagName( 'a' )->length;
0 ignored issues
show
Bug introduced by
The method getElementsByTagName does only exist in DOMDocument, but not in DOMNodeList.

It seems like the method you are trying to call exists only in some of the possible types.

Let’s take a look at an example:

class A
{
    public function foo() { }
}

class B extends A
{
    public function bar() { }
}

/**
 * @param A|B $x
 */
function someFunction($x)
{
    $x->foo(); // This call is fine as the method exists in A and B.
    $x->bar(); // This method only exists in B and might cause an error.
}

Available Fixes

  1. Add an additional type-check:

    /**
     * @param A|B $x
     */
    function someFunction($x)
    {
        $x->foo();
    
        if ($x instanceof B) {
            $x->bar();
        }
    }
    
  2. Only allow a single type to be passed if the variable comes from a parameter:

    function someFunction(B $x) { /** ... */ }
    
Loading history...
117
                    if( $this->linkCount >= $this->maxLinks ) {
118
                        return;
119
                    }
120
121
                    $parsed = $this->parseField( $dom, $field );
0 ignored issues
show
Bug introduced by
It seems like $dom defined by \HTML5_Parser::parse($content) on line 109 can also be of type object<DOMNodeList>; however, SEOToolboxControllerExtension::parseField() does only seem to accept object<DOMDocument>, maybe add an additional type check?

If a method or function can return multiple different values and unless you are sure that you only can receive a single value in this context, we recommend to add an additional type check:

/**
 * @return array|string
 */
function returnsDifferentValues($x) {
    if ($x) {
        return 'foo';
    }

    return array();
}

$x = returnsDifferentValues($y);
if (is_array($x)) {
    // $x is an array.
}

If this a common case that PHP Analyzer should handle natively, please let us know by opening an issue.

Loading history...
122
                    $this->owner->data()->$field = $parsed;
123
                    $this->owner->$field         = $parsed;
124
                }
125
            }
126
        }
127
    }
128
129
    /**
130
     * Parse the provided field and add the necessary links
131
     *
132
     * @param DOMDocument $html
133
     * @param String $field
134
     * @return string
135
     */
136
    private function parseField( DOMDocument $html, $field ){
137
        $this->owner->extend( 'beforeParseField', $html, $field );
138
139
        // Remove Tags from Content we wown't be using
140
        $excluded = array();
141
        foreach( $this->excludeTags as $eTag ){
142
            while( $tags = $html->getElementsByTagName( $eTag ) ){
143
                if( !$tags->length ) break 1;
144
                $tag	= $tags->item(0);
145
                $value  = $html->saveHTML( $tag );
146
                $key    = (string) crc32( $value );
147
148
                // Convert back children nodes of this node if they were already hashed
149
                $excluded[$key] = str_replace( array_keys( $excluded ), array_values( $excluded ), $value );
150
151
                $tag->parentNode->replaceChild( $html->createTextNode( $key ), $tag );
152
            }
153
        }
154
155
        $body    = (string)$html->saveHTML( $html->getElementsByTagName('body')->item(0) );
156
        $content = preg_replace( array( '/\<body\>/is', '/\<\/body\>/is' ), '', $body, 1 );
157
158
        // Create the links
159
        $links = AutomatedLink::get()->sort('Priority');
160
        foreach( $links as $link ){
161
            // Check if self-linking is allowed and if current pagetype is allowed
162
            if( !$link->canBeAdded( $this->owner, $field ) ) continue;
163
164
            $max    = (int) ( $link->MaxLinksPerPage > 0 ) ? $link->MaxLinksPerPage : PHP_INT_MAX;
165
            $escape = (string) preg_quote( $link->Phrase, '/' );
166
            $regex  = (string) ( $link->CaseSensitive ) ? "/(\b{$escape}\b)/" : "/(\b{$escape}\b)/i";
167
168
            // Count the matches
169
            preg_match_all( $regex, $content, $count );
170
            $count = ( is_array( $count ) && isset( $count[0] ) ) ? count( $count[0] ) : 0;
171
            if( $count < 1 ) continue;
172
173
            if( isset( $this->maxLinksPerPage[ $link->ID ] ) )
174
                $max -= $this->maxLinksPerPage[ $link->ID ];
175
            else
176
                $this->maxLinksPerPage[ $link->ID ] = 0;
177
178
            for( $x = 0; $x < $count; $x++ ){
179
                // Stop adding links if we reached the link or page limit
180
                if( $x >= $max || $this->linkCount >= $this->maxLinks ) break;
181
182
                // Check if there is anything else to replace else stop
183
                preg_match( $regex, $content, $match );
184
                if( !is_array( $match ) || !count( $match ) ) break;
185
186
                if( !$html = (string) $link->getHTML( $match[0] ) ) continue;
187
                $key              = (string) crc32( $html );
188
                $excluded[ $key ] = (string) $html;
189
190
                $content = preg_replace( $regex, $key, $content, 1 );
191
                $this->linkCount++;
192
                $this->maxLinksPerPage[ $link->ID ]++;
193
            }
194
195
            // Stop Adding links if we reached the page limit
196
            if( $this->linkCount >= $this->maxLinks ) break;
197
        }
198
199
        // Re-add the excluded Tags
200
        $content = str_replace( array_keys( $excluded ), array_values( $excluded ), $content );
201
202
        return $content;
203
    }
204
}
205