Completed
Push — master ( 60ba61...279e03 )
by Auke
87:58 queued 69:03 created

Html2Text (rating: B)

Complexity

Total Complexity 44

Size/Duplication

Total Lines 197
Duplicated Lines 0 %

Coupling/Cohesion

Components 1
Dependencies 1
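
Metric   Value    Meaning
dl       0        duplicated lines
loc      197      lines of code
rs       8.3396
wmc      44       weighted methods per class
lcom     1        lack of cohesion of methods
cbo      1        coupling between objects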

8 Methods

Rating   Name              Duplication   Size   Complexity
A        __construct()     0             4      1
A        convert()         0             8      2
B        getLine()         0             21     5
C        addWordToLine()   0             42     11
C        getWord()         0             68     18
A        splitWords()      0             7      2
A        htmlDecode()      0             5      1
A        getIndentation()  0             11     4
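The class-level wmc (weighted methods per class) and Total Complexity are both 44. Assuming the analyzer simply sums per-method complexity, that is the total of the Complexity column in the method table:

```latex
\mathrm{WMC} = \sum_{m \in \text{methods}} c_m = 1 + 2 + 5 + 11 + 18 + 2 + 1 + 4 = 44
```

Two methods, getWord() (18) and addWordToLine() (11), account for 29 of the 44 points. They are the natural targets when splitting the class.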

How to fix: Complexity

Complex Class

Complex classes like Html2Text often do many different things. To break such a class down, we need to identify a cohesive component within it. A common way to find such a component is to look for fields and methods that share the same prefixes or suffixes. You can also look at the cohesion graph to spot any unconnected or weakly connected components.

Once you have determined which fields belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate and is often faster to apply.
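For Html2Text, one candidate component is the line-building state. addWordToLine() and getIndentation() use iCurrentLine, iMaxColumns and iListLevel together. getLine() flushes iCurrentLine, and getWord() only touches iListLevel to track list nesting.

Below is a minimal sketch of Extract Class under that assumption, for PHP 8.0+ (constructor property promotion, typed properties). The class name LineWrapper and all of its methods are hypothetical and not part of the original code.

```php
<?php
// Hypothetical class extracted from Html2Text: it owns the line-building
// state (current line, column limit, list nesting). Requires PHP 8.0+.
final class LineWrapper
{
    private string $currentLine = "";
    private int $listLevel = 0;

    public function __construct(private int $maxColumns) {}

    public function enterList(): void
    {
        $this->listLevel++;
    }

    public function leaveList(): void
    {
        if ($this->listLevel > 0) {
            $this->listLevel--;
        }
    }

    // Same rule as getIndentation(): two spaces per nesting level beyond the
    // first, then "- " for a list item or "  " for other lines in a list.
    public function indentation(bool $hasLI): string
    {
        if ($this->listLevel <= 0) {
            return "";
        }
        return str_repeat("  ", $this->listLevel - 1) . ($hasLI ? "- " : "  ");
    }

    // Appends $word if it fits and returns false when the line is full.
    // addWordToLine() also reports overflow when the line holds only list
    // indentation; here overflow is reported only for a line that already
    // has content, so an over-long word is always placed on a fresh line.
    public function tryAdd(string $word, bool $startsListItem = false): bool
    {
        $prefix = $this->currentLine !== ""
            ? $this->currentLine . " "
            : $this->indentation($startsListItem);
        $candidate = $prefix . $word;
        if ($this->currentLine !== "" && strlen($candidate) > $this->maxColumns) {
            return false;
        }
        $this->currentLine = $candidate;
        return true;
    }

    // Returns the finished line and starts an empty one.
    public function flush(): string
    {
        $line = $this->currentLine;
        $this->currentLine = "";
        return $line;
    }
}
```

Html2Text would then keep the parser loop in getWord(). It would call enterList(), leaveList(), tryAdd() and flush() where it now manipulates those fields directly.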

While breaking up the class, it is a good idea to analyze how other classes use Html2Text, and based on these observations, apply Extract Interface, too.
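The class comment describes using Html2Text only through its constructor and convert(), so the extracted interface can be that single method. Below is a sketch assuming PHP 8.0+.

TextConverter, StripTagsConverter and renderPlainText() are hypothetical names. The stand-in implementation uses PHP's strip_tags() and wordwrap() rather than the Html2Text algorithm.

```php
<?php
// Hypothetical interface matching how Html2Text is meant to be used:
// construct it with the input, then call convert() to get plain text.
interface TextConverter
{
    /** @return string the converted plain text */
    public function convert();
}

// Illustrative stand-in only, NOT the Html2Text algorithm: it strips tags
// and wraps with PHP built-ins instead of walking the HTML parser.
final class StripTagsConverter implements TextConverter
{
    public function __construct(private string $html, private int $maxColumns) {}

    public function convert(): string
    {
        return wordwrap(trim(strip_tags($this->html)), $this->maxColumns, "\r\n");
    }
}

// Client code depends only on the interface, so any implementation
// (including Html2Text) can be passed in.
function renderPlainText(TextConverter $converter): string
{
    return $converter->convert();
}
```

The interface method declares no return type. As a result, the existing Html2Text could add `implements TextConverter` without further changes, because its convert() is already public.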

<?php

/*
 * Copyright (c) 2003 Jose Solorzano.  All rights reserved.
 * Redistribution of source must retain this copyright notice.
 */

include ("htmlparser.inc");

/**
 * Class Html2Text. (HtmlParser example.)
 * Converts HTML to ASCII attempting to preserve
 * document structure.
 * To use, create an instance of Html2Text passing
 * the text to convert and the desired maximum
 * number of characters per line. Then invoke
 * convert() which returns ASCII text.
 */
class Html2Text {

    // Private fields

    var $iCurrentLine = "";
    var $iCurrentWord = "";
    var $iCurrentWordArray;
    var $iCurrentWordIndex;
    var $iInScript;
    var $iListLevel = 0;
    var $iHtmlText;
    var $iMaxColumns;
    var $iHtmlParser;

    // Constants

    var $TOKEN_BR       = 0;
    var $TOKEN_P        = 1;
    var $TOKEN_LI       = 2;
    var $TOKEN_AFTERLI  = 3;
    var $TOKEN_UL       = 4;
    var $TOKEN_ENDUL    = 5;

    function __construct ($aHtmlText, $aMaxColumns) {
        $this->iHtmlText = $aHtmlText;
        $this->iMaxColumns = $aMaxColumns;
    }

    function convert() {
        $this->iHtmlParser = new h2tHtmlParser($this->iHtmlText);
        $wholeText = "";
        while (($line = $this->getLine()) !== false) {
            $wholeText .= ($line . "\r\n");
        }
        return $wholeText;
    }

    function getLine() {
        while (true) {
            if (!$this->addWordToLine($this->iCurrentWord)) {
                $retvalue = $this->iCurrentLine;
                $this->iCurrentLine = "";
                return $retvalue;
            }
            $word = $this->getWord();
            if ($word === false) {
                if ($this->iCurrentLine == "") {
                    break;
                }
                $retvalue = $this->iCurrentLine;
                $this->iCurrentLine = "";
                $this->iInText = false;

Bug
The property iInText does not exist. Did you maybe forget to declare it?

In PHP it is possible to write to properties without declaring them (PHP 8.2 and later emit a deprecation notice when this happens). For example, the following is valid PHP code:

class MyClass { }

$x = new MyClass();
$x->foo = true;

Generally, it is good practice to declare properties explicitly, to avoid accidental typos and to enable IDE auto-completion:

class MyClass {
    public $foo;
}

$x = new MyClass();
$x->foo = true;
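In Html2Text the missing declaration matters beyond style. getWord() reads `$this->iInText` before anything has assigned it, so the first text node triggers an "Undefined property" warning (a notice before PHP 8.0). PHP 8.2+ also deprecates creating the property dynamically.

The fix is to add `var $iInText = false;` next to the other private fields. Below is a sketch using a hypothetical cut-down class that keeps only that flag and the same read-before-write pattern.

```php
<?php
// Hypothetical cut-down class: only the iInText flag and the
// read-before-write pattern from getWord() are kept.
class TextFlagDemo
{
    var $iInText = false; // declared and initialised, like iListLevel in Html2Text

    // Returns true when a new text node starts, mirroring the
    // "if (!$this->iInText) { ...; $this->iInText = true; }" branch.
    function startTextIfNeeded() {
        if (!$this->iInText) {
            $this->iInText = true;
            return true;
        }
        return false;
    }
}
```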

                $this->iCurrentWord = "";
                return $retvalue;
            }
        }
        return false;
    }

    function addWordToLine ($word) {
        if ($this->iInScript) {
            return true;
        }
        $prevLine = $this->iCurrentLine;
        if ($word === $this->TOKEN_BR) {
            $this->iCurrentWord = "";
            return false;
        }
        if ($word === $this->TOKEN_P) {
            $this->iCurrentWord = $this->TOKEN_BR;

Documentation Bug
The property $iCurrentWord was declared of type string, but $this->TOKEN_BR is of type integer. Maybe add a type cast?

This check looks for assignments to scalar types that may be of the wrong type.

To ensure the code behaves as expected, it may be a good idea to add an explicit type cast.

$answer = 42;

$correct = false;

$correct = (bool) $answer;
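Here the suggested cast would change behavior rather than fix it. The string type appears to be inferred only from the initial value `""`. The TOKEN_* values are deliberately integers, and every token check in addWordToLine(), such as `$word === $this->TOKEN_BR`, relies on `===` to tell a token apart from a text word. Cast to a string, TOKEN_BR would become "0" and collide with a literal text word "0".

Below is a sketch of an alternative that keeps the integers and widens the declared type instead. It assumes PHP 8.0+ for the `string|int` property type, and the class name Tokens is hypothetical.

```php
<?php
// Hypothetical sketch (PHP 8.0+): tokens as class constants, and the current
// word typed string|int so both text words and token markers are valid.
final class Tokens
{
    public const TOKEN_BR = 0;
    public const TOKEN_P = 1;
    public const TOKEN_LI = 2;
    public const TOKEN_AFTERLI = 3;
    public const TOKEN_UL = 4;
    public const TOKEN_ENDUL = 5;

    /** @var string|int a text word, or one of the TOKEN_* markers */
    public string|int $iCurrentWord = "";

    // Strict comparison keeps the int marker 0 distinct from the text word "0".
    public function isLineBreak(): bool
    {
        return $this->iCurrentWord === self::TOKEN_BR;
    }
}
```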

            return false;
        }
        if ($word === $this->TOKEN_UL) {
            $this->iCurrentWord = $this->TOKEN_BR;
            return false;
        }
        if ($word === $this->TOKEN_ENDUL) {
            $this->iCurrentWord = $this->TOKEN_BR;
            return false;
        }
        if ($word === $this->TOKEN_LI) {
            $this->iCurrentWord = $this->TOKEN_AFTERLI;

Documentation Bug
The property $iCurrentWord was declared of type string, but $this->TOKEN_AFTERLI is of type integer. Maybe add a type cast?

            return false;
        }
        $toAdd = $word;
        if ($word === $this->TOKEN_AFTERLI) {
            $toAdd = "";
        }
        if ($prevLine != "") {
            $prevLine .= " ";
        }
        else {
            $prevLine = $this->getIndentation($word === $this->TOKEN_AFTERLI);
        }
        $candidateLine = $prevLine . $toAdd;
        if (strlen ($candidateLine) > $this->iMaxColumns && $prevLine != "") {
            return false;
        }
        $this->iCurrentLine = $candidateLine;
        return true;
    }

    function getWord() {
        while (true) {
            if ($this->iHtmlParser->iNodeType == NODE_TYPE_TEXT) {

Bug
The property iNodeType cannot be accessed from this context as it is declared private in class h2tHtmlParser.

This check looks for access to properties that are not accessible from the current context.

If you need to make a property accessible to another context you can either raise its visibility level or provide an accessible getter in the defining class.
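The h2tHtmlParser source is not part of this listing, so the following is only a sketch. A stand-in class keeps iNodeType, iNodeName and iNodeValue private and exposes read-only getters. A helper then shows how getWord()'s tag checks could go through those getters.

ParserNode, its getter names and isElementNamed() are hypothetical, not the real parser API. The sketch assumes PHP 8.0+.

```php
<?php
// Hypothetical stand-in for the parser's node state: fields stay private,
// callers read them through getters. Requires PHP 8.0+.
final class ParserNode
{
    public function __construct(
        private int $iNodeType,
        private string $iNodeName,
        private string $iNodeValue = ""
    ) {}

    public function getNodeType(): int { return $this->iNodeType; }
    public function getNodeName(): string { return $this->iNodeName; }
    public function getNodeValue(): string { return $this->iNodeValue; }
}

// Caller side, mirroring the case-insensitive tag checks in getWord().
// The element type is passed in because the real NODE_TYPE_* values are not shown here.
function isElementNamed(ParserNode $node, int $elementType, string $name): bool
{
    return $node->getNodeType() === $elementType
        && strcasecmp($node->getNodeName(), $name) === 0;
}
```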

                if (!$this->iInText) {
                    $words = $this->splitWords($this->iHtmlParser->iNodeValue);

Bug
The property iNodeValue cannot be accessed from this context as it is declared private in class h2tHtmlParser.

                    $this->iCurrentWordArray = $words;
                    $this->iCurrentWordIndex = 0;
                    $this->iInText = true;
                }
                if ($this->iCurrentWordIndex < count($this->iCurrentWordArray)) {
                    $this->iCurrentWord = $this->iCurrentWordArray[$this->iCurrentWordIndex++];
                    return $this->iCurrentWord;
                }
                else {
                    $this->iInText = false;
                }
            }
            else if ($this->iHtmlParser->iNodeType == NODE_TYPE_ELEMENT) {

Bug
The property iNodeType cannot be accessed from this context as it is declared private in class h2tHtmlParser.

                if (strcasecmp ($this->iHtmlParser->iNodeName, "br") == 0) {

Bug
The property iNodeName cannot be accessed from this context as it is declared private in class h2tHtmlParser.

                    $this->iHtmlParser->parse();
                    $this->iCurrentWord = $this->TOKEN_BR;

Documentation Bug
The property $iCurrentWord was declared of type string, but $this->TOKEN_BR is of type integer. Maybe add a type cast?

                    return $this->iCurrentWord;
                }
                else if (strcasecmp ($this->iHtmlParser->iNodeName, "p") == 0) {

Bug
The property iNodeName cannot be accessed from this context as it is declared private in class h2tHtmlParser.

                    $this->iHtmlParser->parse();
                    $this->iCurrentWord = $this->TOKEN_P;

Documentation Bug
The property $iCurrentWord was declared of type string, but $this->TOKEN_P is of type integer. Maybe add a type cast?

                    return $this->iCurrentWord;
                }
                else if (strcasecmp ($this->iHtmlParser->iNodeName, "script") == 0) {

Bug
The property iNodeName cannot be accessed from this context as it is declared private in class h2tHtmlParser.

                    $this->iHtmlParser->parse();
                    $this->iCurrentWord = "";
                    $this->iInScript = true;
                    return $this->iCurrentWord;
                }
                else if (strcasecmp ($this->iHtmlParser->iNodeName, "ul") == 0 || strcasecmp ($this->iHtmlParser->iNodeName, "ol") == 0) {

Bug
The property iNodeName cannot be accessed from this context as it is declared private in class h2tHtmlParser.

                    $this->iHtmlParser->parse();
                    $this->iCurrentWord = $this->TOKEN_UL;

Documentation Bug
The property $iCurrentWord was declared of type string, but $this->TOKEN_UL is of type integer. Maybe add a type cast?

                    $this->iListLevel++;
                    return $this->iCurrentWord;
                }
                else if (strcasecmp ($this->iHtmlParser->iNodeName, "li") == 0) {

Bug
The property iNodeName cannot be accessed from this context as it is declared private in class h2tHtmlParser.

                    $this->iHtmlParser->parse();
                    $this->iCurrentWord = $this->TOKEN_LI;

Documentation Bug
The property $iCurrentWord was declared of type string, but $this->TOKEN_LI is of type integer. Maybe add a type cast?

                    return $this->iCurrentWord;
                }
            }
            else if ($this->iHtmlParser->iNodeType == NODE_TYPE_ENDELEMENT) {

Bug
The property iNodeType cannot be accessed from this context as it is declared private in class h2tHtmlParser.

                if (strcasecmp ($this->iHtmlParser->iNodeName, "script") == 0) {

Bug
The property iNodeName cannot be accessed from this context as it is declared private in class h2tHtmlParser.

                    $this->iHtmlParser->parse();
                    $this->iCurrentWord = "";
                    $this->iInScript = false;
                    return $this->iCurrentWord;
                }
                else if (strcasecmp ($this->iHtmlParser->iNodeName, "ul") == 0 || strcasecmp ($this->iHtmlParser->iNodeName, "ol") == 0) {

Bug
The property iNodeName cannot be accessed from this context as it is declared private in class h2tHtmlParser.

                    $this->iHtmlParser->parse();
                    $this->iCurrentWord = $this->TOKEN_ENDUL;

Documentation Bug
The property $iCurrentWord was declared of type string, but $this->TOKEN_ENDUL is of type integer. Maybe add a type cast?

                    if ($this->iListLevel > 0) {
                        $this->iListLevel--;
                    }
                    return $this->iCurrentWord;
                }
            }
            if (!$this->iHtmlParser->parse()) {
                break;
            }
        }
        return false;
    }

    function splitWords ($text) {
        $words = preg_split ("/[ \t\r\n]+/", $text);
        for ($idx = 0; $idx < count($words); $idx++) {

Performance Best Practice
It seems like you are calling the size function count() as part of the test condition. You might want to compute the size beforehand, and not on each iteration.

If the size of the collection does not change during the iteration, it is generally good practice to compute it beforehand rather than on each iteration:

for ($i=0; $i<count($array); $i++) { // calls count() on each iteration
}

// Better
for ($i=0, $c=count($array); $i<$c; $i++) { // calls count() just once
}
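In splitWords() the loop only maps each word through htmlDecode(). The index and the repeated count() can therefore go away entirely with array_map().

Below is a sketch as free functions. The decoder is reproduced here only to keep the example self-contained. Like the original, splitWords() keeps the empty string that preg_split() produces for leading or trailing whitespace.

```php
<?php
// Free-function version of htmlDecode(), included only so the sketch runs on its own.
function htmlDecode(string $text): string
{
    return str_replace(
        ['&nbsp;', '&amp;', '&lt;', '&gt;', '&quot;'],
        [' ', '&', '<', '>', '"'],
        $text
    );
}

// Split on runs of ASCII whitespace, as the original does, then decode each
// word; array_map() replaces the index loop, so count() is never re-evaluated.
function splitWords(string $text): array
{
    return array_map('htmlDecode', preg_split('/[ \t\r\n]+/', $text));
}
```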

            $words[$idx] = $this->htmlDecode($words[$idx]);
        }
        return $words;
    }

    function htmlDecode ($text) {
        $srcTable = Array('&nbsp;', '&amp;', '&lt;', '&gt;', '&quot;'); // &quot; is the HTML entity for a double quote
        $dstTable = Array(' ', '&', '<', '>', '"');
        return str_replace($srcTable, $dstTable, $text);
    }

    function getIndentation ($hasLI) {
        $indent = "";
        $idx = 0;

Unused Code
$idx is not used, you could remove the assignment.

This check looks for variable assignments that are either overwritten by other assignments or whose variable is not used afterwards.

$myVar = 'Value';
$higher = false;

if (rand(1, 6) > 3) {
    $higher = true;
} else {
    $higher = false;
}

Both the $myVar assignment in line 1 and the $higher assignment in line 2 are dead: the first because $myVar is never used, and the second because $higher is always overwritten on every possible execution path.
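Applied to getIndentation(), the dead `$idx = 0;` can simply be dropped, and the padding loop can become a single str_repeat() call.

Below is a sketch as a free function. It takes the list level as a parameter instead of reading `$this->iListLevel`, a deliberate change so the function can be tested on its own.

```php
<?php
// Free-function sketch of getIndentation(): no unused $idx, no loop.
function getIndentation(int $listLevel, bool $hasLI): string
{
    // Outside any list there is no indentation (also avoids a negative str_repeat count).
    if ($listLevel <= 0) {
        return "";
    }
    // Two spaces per nesting level beyond the first, then the bullet or its blank stand-in.
    return str_repeat("  ", $listLevel - 1) . ($hasLI ? "- " : "  ");
}
```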

        for ($idx = 0; $idx < ($this->iListLevel - 1); $idx++) {
            $indent .= "  ";
        }
        if ($this->iListLevel > 0) {
            $indent = $hasLI ? ($indent . "- ") : ($indent . "  ");
        }
        return $indent;
    }
}
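htmlDecode() handles only five entities with a hand-maintained table. Because str_replace() applies the pairs in sequence, an escaped entity such as `&amp;lt;` is decoded twice, to `<` instead of `&lt;`.

Below is a sketch of an alternative built on PHP's html_entity_decode(), which decodes each entity once and covers named and numeric references. It assumes UTF-8 text, and the function name decodeHtmlText is hypothetical.

```php
<?php
// Hypothetical replacement for htmlDecode(), assuming UTF-8 text.
function decodeHtmlText(string $text): string
{
    // Decode every named/numeric entity exactly once; ENT_QUOTES also decodes
    // single-quote references such as &#039;, ENT_HTML5 selects the HTML5 table.
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // &nbsp; decodes to U+00A0; map it to a plain space as the original does.
    return str_replace("\u{00A0}", ' ', $decoded);
}
```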