This project does not seem to handle request data directly; as such, no vulnerable execution paths were found.
<?php
/******************************************************************************
 * Copyright (c) 2010 Jevon Wright and others.
 * All rights reserved. This program and the accompanying materials
 * are made available under the terms of the Eclipse Public License v1.0
 * which accompanies this distribution, and is available at
 * http://www.eclipse.org/legal/epl-v10.html
 *
 * Contributors:
 * Jevon Wright - initial API and implementation
 ****************************************************************************/

/**
 * Tries to convert the given HTML into a plain text format - best suited for
 * e-mail display, etc.
 *
 * <p>In particular, it tries to maintain the following features:
 * <ul>
 * <li>Links are maintained, with the 'href' copied over
 * <li>Information in the <head> is lost
 * </ul>
 *
 * @param string $html the input HTML
 *
 * @throws Html2TextException
 * @return string HTML converted, as best as possible, to text
 */
function convert_html_to_text($html)
{
    $html = fix_newlines($html);

    $doc = new DOMDocument();
    if (!$doc->loadHTML($html)) {
        throw new Html2TextException('Could not load HTML - badly formed?', $html);
    }

    $output = iterate_over_node($doc);

    // remove leading and trailing spaces on each line
    $output = preg_replace("/[ \t]*\n[ \t]*/im", "\n", $output);

    // remove leading and trailing whitespace
    $output = trim($output);

    return $output;
}
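
Since convert_html_to_text() depends on DOMDocument::loadHTML(), real-world markup may trigger libxml warnings during parsing. The following is a minimal sketch, not part of the reviewed file, of one way to collect those warnings internally while loading. The helper name load_html_quietly() and the use of RuntimeException are assumptions made for illustration only.

```php
<?php
// Hypothetical helper (not in the reviewed file): load HTML while keeping
// libxml parse warnings out of PHP's normal warning output.
function load_html_quietly(string $html): DOMDocument
{
    // Collect libxml errors internally; remember the caller's previous setting.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new RuntimeException('Could not load HTML');
    }

    return $doc;
}

// Usage: the unclosed <b> is tolerated by the parser.
$doc = load_html_quietly('<p>Hello <b>world</p>');
echo trim($doc->documentElement->textContent), "\n";
```

Restoring the previous libxml setting keeps the helper from changing error handling for unrelated code that runs afterwards.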

/**
 * Unify newlines; in particular, \r\n becomes \n, and
 * then \r becomes \n. This means that newlines of every style
 * (Unix, Windows, Mac) become \n.
 *
 * @param mixed $text
 *
 * @return string fixed text
 */
function fix_newlines($text)
{
    // replace \r\n with \n
    $text = str_replace("\r\n", "\n", $text);
    // replace any remaining \r with \n
    $text = str_replace("\r", "\n", $text);

    return $text;
}
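
The two-step replacement in fix_newlines() can also be written as a single call, because str_replace() applies an array of search strings in order. This is a small self-contained sketch; the function name normalize_newlines() is hypothetical and chosen only to avoid clashing with the reviewed code.

```php
<?php
// Hypothetical equivalent of fix_newlines(): "\r\n" is replaced first,
// then any remaining lone "\r", so no newline is doubled.
function normalize_newlines(string $text): string
{
    return str_replace(["\r\n", "\r"], "\n", $text);
}

$sample = "one\r\ntwo\rthree\nfour";
var_dump(normalize_newlines($sample) === "one\ntwo\nthree\nfour"); // bool(true)
```

The order of the search array matters: replacing "\r" first would turn each "\r\n" into two newlines.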

Issue (Code Duplication), flagged on next_child_name().

/**
 * @param DOMNode $node
 *
 * @return null|string
 */
function next_child_name($node)
{
    // find the next sibling that is an element
    $nextNode = $node->nextSibling;
    while (null !== $nextNode) {
        if ($nextNode instanceof DOMElement) {
            break;
        }
        $nextNode = $nextNode->nextSibling;
    }
    $nextName = null;
    if ($nextNode instanceof DOMElement) {
        $nextName = mb_strtolower($nextNode->nodeName);
    }

    return $nextName;
}

Issue (Code Duplication), flagged on prev_child_name(): This function seems to be duplicated in your project.
Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation. You can also find more detailed suggestions in the “Code” section of your repository.

/**
 * @param DOMNode $node
 *
 * @return null|string
 */
function prev_child_name($node)
{
    // find the previous sibling that is an element
    $nextNode = $node->previousSibling;
    while (null !== $nextNode) {
        if ($nextNode instanceof DOMElement) {
            break;
        }
        $nextNode = $nextNode->previousSibling;
    }
    $nextName = null;
    if ($nextNode instanceof DOMElement) {
        $nextName = mb_strtolower($nextNode->nodeName);
    }

    return $nextName;
}
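
One possible way to address the duplication flagged on next_child_name() and prev_child_name() is to fold both into a single helper that takes the direction as a parameter. The sketch below is an illustration under that assumption, not the project's code: the name sibling_element_name() and the $forward flag are hypothetical, and it uses strtolower() rather than mb_strtolower() on the assumption that HTML tag names are ASCII.

```php
<?php
// Hypothetical merged helper: returns the lower-cased tag name of the nearest
// sibling element in the chosen direction, or null if there is none.
function sibling_element_name(DOMNode $node, bool $forward = true): ?string
{
    $sibling = $forward ? $node->nextSibling : $node->previousSibling;

    // Skip text nodes, comments, and other non-element siblings.
    while (null !== $sibling && !($sibling instanceof DOMElement)) {
        $sibling = $forward ? $sibling->nextSibling : $sibling->previousSibling;
    }

    return null === $sibling ? null : strtolower($sibling->nodeName);
}

$doc = new DOMDocument();
$doc->loadHTML('<div><p>a</p> text <h1>b</h1></div>');
$p = $doc->getElementsByTagName('p')->item(0);

var_dump(sibling_element_name($p));        // string(2) "h1"
var_dump(sibling_element_name($p, false)); // NULL
```

With such a helper, the two existing functions could become thin wrappers, which would keep their call sites unchanged.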

/**
 * @param DOMNode $node
 *
 * @return string
 */
function iterate_over_node($node)
{
    if ($node instanceof DOMText) {
        return preg_replace('/\\s+/im', ' ', $node->wholeText);
    }
    if ($node instanceof DOMDocumentType) {
        // ignore
        return '';
    }

    $nextName = next_child_name($node);
    $prevName = prev_child_name($node);

Issue: $prevName is not used; you could remove the assignment.
This check looks for variable assignments that are either overwritten by other assignments or where the variable is not used subsequently. For example:

    $myVar = 'Value';
    $higher = false;
    if (rand(1, 6) > 3) {
        $higher = true;
    } else {
        $higher = false;
    }

    $name = mb_strtolower($node->nodeName);

    // start whitespace
    switch ($name) {
        case 'hr':
            return "------\n";
        case 'style':
        case 'head':
        case 'title':
        case 'meta':
        case 'script':
            // ignore these tags
            return '';
        case 'h1':
        case 'h2':
        case 'h3':
        case 'h4':
        case 'h5':
        case 'h6':
            // add one newline before the heading (another is added after its contents)
            $output = "\n";
            break;
        case 'p':
        case 'div':
            // add one line
            $output = "\n";
            break;
        default:
            // print out contents of unknown tags
            $output = '';
            break;
    }

    // debug
    //$output .= "[$name,$nextName]";

    for ($i = 0; $i < $node->childNodes->length; ++$i) {
        $n = $node->childNodes->item($i);

        $text = iterate_over_node($n);

        $output .= $text;
    }

    // end whitespace
    switch ($name) {
        case 'style':
        case 'head':
        case 'title':
        case 'meta':
        case 'script':
            // ignore these tags
            return '';
        case 'h1':
        case 'h2':
        case 'h3':
        case 'h4':
        case 'h5':
        case 'h6':
            $output .= "\n";
            break;
        case 'p':
        case 'br':
            // add one line
            if ('div' !== $nextName) {
                $output .= "\n";
            }
            break;
        case 'div':
            // add one line only if the next sibling element isn't a div
            if ('div' !== $nextName && null !== $nextName) {
                $output .= "\n";
            }
            break;
        case 'a':
            // links are returned in [text](link) format
            $href = $node->getAttribute('href');
            // DOMElement::getAttribute() returns '' (never null) for a missing attribute
            if ('' === $href) {
                // it doesn't link anywhere
                if ('' !== $node->getAttribute('name')) {
                    $output = "[$output]";
                }
            } elseif ($href !== $output) {
                // replace it; a link whose text is its own address is left as-is
                $output = "[$output]($href)";
            }

            // does the next node require additional whitespace?
            switch ($nextName) {
                case 'h1':
                case 'h2':
                case 'h3':
                case 'h4':
                case 'h5':
                case 'h6':
                    $output .= "\n";
                    break;
            }

            // no break
        default:
            // do nothing
    }

    return $output;
}
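
The link handling in iterate_over_node() distinguishes three cases: an anchor with no href, a link whose text equals its address, and a link that is rewritten as [text](link). Because DOMElement::getAttribute() returns an empty string for a missing attribute, DOMElement::hasAttribute() is one way to make the "no href" case explicit. The following sketch isolates that logic; the function name format_link() is hypothetical and the example URL is a placeholder.

```php
<?php
// Hypothetical helper isolating the link-formatting rules described above.
function format_link(DOMElement $a, string $text): string
{
    if (!$a->hasAttribute('href')) {
        // Not a link: bracket the text only for named anchors.
        return $a->hasAttribute('name') ? "[$text]" : $text;
    }

    $href = $a->getAttribute('href');

    // A link whose text is its own address is shown once, not twice.
    return $href === $text ? $text : "[$text]($href)";
}

$doc = new DOMDocument();
$doc->loadHTML('<a href="http://example.com">site</a><a name="top">anchor</a>');

foreach ($doc->getElementsByTagName('a') as $a) {
    echo format_link($a, $a->textContent), "\n";
}
```

Note that hasAttribute() treats an explicitly empty href="" as present, which differs slightly from comparing the attribute value with an empty string.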

/**
 * Class Html2TextException
 */
class Html2TextException extends Exception
{
    public $more_info;

    /**
     * @param string $message
     * @param string $more_info
     */
    public function __construct($message = '', $more_info = '')
    {
        parent::__construct($message);
        $this->more_info = $more_info;
    }
}
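
Html2TextException carries the offending input in its public more_info property, which a caller can inspect when conversion fails. The sketch below shows that pattern with stand-ins so it runs on its own: DemoConversionException and demo_convert() are hypothetical names that only mimic the shape of the reviewed class and are not part of the project.

```php
<?php
// Stand-in with the same shape as Html2TextException (hypothetical name).
class DemoConversionException extends Exception
{
    public $more_info;

    public function __construct($message = '', $more_info = '')
    {
        parent::__construct($message);
        $this->more_info = $more_info;
    }
}

// Hypothetical converter: rejects blank input, otherwise strips tags.
function demo_convert(string $html): string
{
    if ('' === trim($html)) {
        throw new DemoConversionException('Could not load HTML - badly formed?', $html);
    }

    return strip_tags($html);
}

try {
    demo_convert("  \n");
} catch (DemoConversionException $e) {
    // more_info holds the input that could not be converted.
    printf("%s (input length: %d)\n", $e->getMessage(), strlen($e->more_info));
}
```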