Passed
Pull Request — master (#349)
by unknown
01:53
created

Parser   B

Complexity

Total Complexity 46

Size/Duplication

Total Lines 279
Duplicated Lines 0 %

Test Coverage

Coverage 95.49%

Importance

Changes 21
Bugs 4 Features 2
Metric  Value
eloc    129
c       21
b       4
f       2
dl      0
loc     279
ccs     127
cts     133
cp      0.9549
rs      8.72
wmc     46

7 Methods

Rating  Name                  Duplication  Size  Complexity
A       parseContent()        0            26    4
A       parseFile()           0            14    1
A       parseTrailer()        0            20    5
A       __construct()         0            3     1
A       parseHeader()         0            14    2
C       parseObject()         0            86    15
D       parseHeaderElement()  0            57    18

How to fix   Complexity   

Complex Class

Complex classes like Parser often do a lot of different things. To break such a class down, we need to identify a cohesive component within it. A common way to find such a component is to look for fields or methods that share the same prefix or suffix.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
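As a rough illustration of Extract Class, the sketch below pulls the index-splitting logic out of the `ObjStm` branch of `parseObject()` into its own class. The class name `ObjectStreamIndex` and its `split()` method are hypothetical and not part of PdfParser. The regular expressions are the ones used in the listing; splitting each pair on any whitespace (rather than a single space) is an assumption made here for robustness.

```php
<?php

/**
 * Hypothetical helper (not part of PdfParser): separates the leading
 * "id offset" pairs of an object stream from the payload that follows.
 */
final class ObjectStreamIndex
{
    /**
     * @param string $content decoded object stream content
     *
     * @return array two items: [offset => object id] sorted by offset, then the payload
     */
    public static function split(string $content): array
    {
        $match = [];
        // Same pattern as in parseObject(): index pairs first, then everything else.
        preg_match('/^((\d+\s+\d+\s*)*)(.*)$/s', $content, $match);

        $pairs = preg_split(
            '/(\d+\s+\d+\s*)/s',
            $match[1],
            -1,
            PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
        );

        $table = [];
        foreach ($pairs as $pair) {
            // Assumption: tolerate any whitespace between id and offset.
            [$id, $offset] = preg_split('/\s+/', trim($pair));
            $table[(int) $offset] = $id;
        }
        ksort($table);

        return [$table, $match[3]];
    }
}

// Usage: two objects (ids 11 and 12) stored at offsets 0 and 5 of the payload.
[$table, $payload] = ObjectStreamIndex::split("11 0 12 5 <<A>><<B>>");
// $table is [0 => '11', 5 => '12'] and $payload is '<<A>><<B>>'.
```

With a helper like this, `parseObject()` would only iterate over the returned table, which also avoids reusing `$id` and `$position` inside the loop.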

While breaking up the class, it is a good idea to analyze how other classes use Parser, and based on these observations, apply Extract Interface, too.
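A minimal sketch of Extract Interface, assuming callers only rely on the two public entry points `parseFile()` and `parseContent()`. The names `DocumentParser`, `NullDocumentParser`, and `loadDocument()` are hypothetical, and `Document` is a stand-in for `Smalot\PdfParser\Document` so the example runs on its own.

```php
<?php

// Stand-in for Smalot\PdfParser\Document (assumption, for a self-contained example).
final class Document
{
}

// Hypothetical interface covering the public methods shown in the listing.
interface DocumentParser
{
    public function parseFile(string $filename): Document;

    public function parseContent(string $content): Document;
}

// Hypothetical test double: returns an empty Document without reading any PDF data.
final class NullDocumentParser implements DocumentParser
{
    public function parseFile(string $filename): Document
    {
        return $this->parseContent('');
    }

    public function parseContent(string $content): Document
    {
        return new Document();
    }
}

// Client code depends on the interface, not on the concrete Parser class.
function loadDocument(DocumentParser $parser, string $content): Document
{
    return $parser->parseContent($content);
}

$document = loadDocument(new NullDocumentParser(), '%PDF-1.4');
```

Depending on an interface like this lets callers swap in a stub during tests while the concrete `Parser` is being split up.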

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 * @date    2017-01-03
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\Element\ElementArray;
use Smalot\PdfParser\Element\ElementBoolean;
use Smalot\PdfParser\Element\ElementDate;
use Smalot\PdfParser\Element\ElementHexa;
use Smalot\PdfParser\Element\ElementName;
use Smalot\PdfParser\Element\ElementNull;
use Smalot\PdfParser\Element\ElementNumeric;
use Smalot\PdfParser\Element\ElementString;
use Smalot\PdfParser\Element\ElementXRef;
use Smalot\PdfParser\RawData\RawDataParser;

/**
 * Class Parser
 */
class Parser
{
    /**
     * @var PDFObject[]
     */
    protected $objects = [];

    protected $rawDataParser;

    public function __construct($cfg = [])
    {
        $this->rawDataParser = new RawDataParser($cfg);
    }

    /**
     * @param string $filename
     *
     * @return Document
     *
     * @throws \Exception
     */
    public function parseFile($filename)
    {
        $content = file_get_contents($filename);
        /*
         * 2018/06/20 @doganoo: as users have complained multiple
         * times that the parseFile() method dies silently,
         * it is a better option to remove the error control
         * operator (@) and let users know that the method
         * throws an exception by adding a @throws tag to the PHPDoc.
         *
         * See here for an example: https://github.com/smalot/pdfparser/issues/204
         */
        return $this->parseContent($content);
    }

    /**
     * @param string $content PDF content to parse
     *
     * @return Document
     *
     * @throws \Exception if secured PDF file was detected
     * @throws \Exception if no object list was found
     */
    public function parseContent($content)
    {
        // Create structure from raw data.
        list($xref, $data) = $this->rawDataParser->parseData($content);

        if (isset($xref['trailer']['encrypt'])) {
            throw new \Exception('Secured pdf file are currently not supported.');
        }

        if (empty($data)) {
            throw new \Exception('Object list not found. Possible secured file.');
        }

        // Create destination object.
        $document = new Document();
        $this->objects = [];

        foreach ($data as $id => $structure) {
            $this->parseObject($id, $structure, $document);
            unset($data[$id]);
        }

        $document->setTrailer($this->parseTrailer($xref['trailer'], $document));
        $document->setObjects($this->objects);

        return $document;
    }

    protected function parseTrailer($structure, $document)
    {
        $trailer = [];

        foreach ($structure as $name => $values) {
            $name = ucfirst($name);

            if (is_numeric($values)) {
                $trailer[$name] = new ElementNumeric($values);
            } elseif (\is_array($values)) {
                $value = $this->parseTrailer($values, null);
                $trailer[$name] = new ElementArray($value, null);
Bug: $value of type Smalot\PdfParser\Header is incompatible with the type string expected by parameter $value of Smalot\PdfParser\Element...entArray::__construct(). (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type annotation:

                $trailer[$name] = new ElementArray(/** @scrutinizer ignore-type */ $value, null);

            } elseif (false !== strpos($values, '_')) {
                $trailer[$name] = new ElementXRef($values, $document);
            } else {
                $trailer[$name] = $this->parseHeaderElement('(', $values, $document);
            }
        }

        return new Header($trailer, $document);
    }

    /**
     * @param string   $id
     * @param array    $structure
     * @param Document $document
     */
    protected function parseObject($id, $structure, $document)
    {
        $header = new Header([], $document);
        $content = '';

        foreach ($structure as $position => $part) {
            if (\is_int($part)) {
                $part = [null, null];
            }
            switch ($part[0]) {
                case '[':
                    $elements = [];

                    foreach ($part[1] as $sub_element) {
Bug: The expression $part[1] of type null is not traversable.

                        $sub_type = $sub_element[0];
                        $sub_value = $sub_element[1];
                        $elements[] = $this->parseHeaderElement($sub_type, $sub_value, $document);
                    }

                    $header = new Header($elements, $document);
Bug: It seems like $elements can also be of type Smalot\PdfParser\Header[]; however, parameter $elements of Smalot\PdfParser\Header::__construct() does only seem to accept Smalot\PdfParser\Element[], maybe add an additional type check? (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type annotation:

                    $header = new Header(/** @scrutinizer ignore-type */ $elements, $document);

                    break;

                case '<<':
                    $header = $this->parseHeader($part[1], $document);
                    break;

                case 'stream':
                    $content = isset($part[3][0]) ? $part[3][0] : $part[1];

                    if ($header->get('Type')->equals('ObjStm')) {
                        $match = [];

                        // Split xrefs and contents.
                        preg_match('/^((\d+\s+\d+\s*)*)(.*)$/s', $content, $match);
                        $content = $match[3];

                        // Extract xrefs.
                        $xrefs = preg_split(
                            '/(\d+\s+\d+\s*)/s',
                            $match[1],
                            -1,
                            PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
                        );
                        $table = [];

                        foreach ($xrefs as $xref) {
                            list($id, $position) = explode(' ', trim($xref));
                            $table[$position] = $id;
                        }

                        ksort($table);

                        $ids = array_values($table);
                        $positions = array_keys($table);

                        foreach ($positions as $index => $position) {
Comprehensibility Bug: $position is overwriting a variable from outer foreach loop.

                            $id = $ids[$index].'_0';
                            $next_position = isset($positions[$index + 1]) ? $positions[$index + 1] : \strlen($content);
                            $sub_content = substr($content, $position, (int) $next_position - (int) $position);

                            $sub_header = Header::parse($sub_content, $document);
                            $object = PDFObject::factory($document, $sub_header, '');
                            $this->objects[$id] = $object;
                        }

                        // It is not necessary to store this content.
                        $content = '';
Unused Code: The assignment to $content is dead and can be removed.

                        return;
                    }
                    break;

                default:
                    if ('null' != $part) {
                        $element = $this->parseHeaderElement($part[0], $part[1], $document);

                        if ($element) {
                            $header = new Header([$element], $document);
Bug: array($element) of type array<integer,Smalot\PdfParser\Header> is incompatible with the type Smalot\PdfParser\Element[] expected by parameter $elements of Smalot\PdfParser\Header::__construct(). (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type annotation:

                            $header = new Header(/** @scrutinizer ignore-type */ [$element], $document);

                        }
                    }
                    break;
            }
        }

        if (!isset($this->objects[$id])) {
            $this->objects[$id] = PDFObject::factory($document, $header, $content);
        }
    }

    /**
     * @param array    $structure
     * @param Document $document
     *
     * @return Header
     *
     * @throws \Exception
     */
    protected function parseHeader($structure, $document)
    {
        $elements = [];
        $count = \count($structure);

        for ($position = 0; $position < $count; $position += 2) {
            $name = $structure[$position][1];
            $type = $structure[$position + 1][0];
            $value = $structure[$position + 1][1];

            $elements[$name] = $this->parseHeaderElement($type, $value, $document);
        }

        return new Header($elements, $document);
    }

    /**
     * @param string       $type
     * @param string|array $value
     * @param Document     $document
     *
     * @return Element|Header|null
     *
     * @throws \Exception
     */
    protected function parseHeaderElement($type, $value, $document)
    {
        switch ($type) {
            case '<<':
            case '>>':
                $header = $this->parseHeader($value, $document);
Bug: It seems like $value can also be of type string; however, parameter $structure of Smalot\PdfParser\Parser::parseHeader() does only seem to accept array, maybe add an additional type check? (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type annotation:

                $header = $this->parseHeader(/** @scrutinizer ignore-type */ $value, $document);

                PDFObject::factory($document, $header, null);

                return $header;

            case 'numeric':
                return new ElementNumeric($value);
Bug: It seems like $value can also be of type array; however, parameter $value of Smalot\PdfParser\Element...tNumeric::__construct() does only seem to accept string, maybe add an additional type check? (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type annotation:

                return new ElementNumeric(/** @scrutinizer ignore-type */ $value);

            case 'boolean':
                return new ElementBoolean($value);
Bug: It seems like $value can also be of type array; however, parameter $value of Smalot\PdfParser\Element...tBoolean::__construct() does only seem to accept boolean|string, maybe add an additional type check? (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type annotation:

                return new ElementBoolean(/** @scrutinizer ignore-type */ $value);

            case 'null':
                return new ElementNull();

            case '(':
                if ($date = ElementDate::parse('('.$value.')', $document)) {
Bug: Are you sure $value of type array|string can be used in concatenation? (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type annotation:

                if ($date = ElementDate::parse('('./** @scrutinizer ignore-type */ $value.')', $document)) {

                    return $date;
                }

                return ElementString::parse('('.$value.')', $document);
Bug, Best Practice: The expression return Smalot\PdfParser\...value . ')', $document) could also return false which is incompatible with the documented return type Smalot\PdfParser\Element...t\PdfParser\Header|null. Did you maybe forget to handle an error condition?

If the returned type also contains false, it is an indicator that maybe an error condition leading to the specific return statement remains unhandled.

            case '<':
                return $this->parseHeaderElement('(', ElementHexa::decode($value, $document), $document);
Bug: It seems like $value can also be of type array; however, parameter $value of Smalot\PdfParser\Element\ElementHexa::decode() does only seem to accept string, maybe add an additional type check? (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type annotation:

                return $this->parseHeaderElement('(', ElementHexa::decode(/** @scrutinizer ignore-type */ $value, $document), $document);

            case '/':
                return ElementName::parse('/'.$value, $document);
Bug, Best Practice: The expression return Smalot\PdfParser\.../' . $value, $document) could also return false which is incompatible with the documented return type Smalot\PdfParser\Element...t\PdfParser\Header|null. Did you maybe forget to handle an error condition?

If the returned type also contains false, it is an indicator that maybe an error condition leading to the specific return statement remains unhandled.

            case 'ojbref': // old mistake in tcpdf parser
            case 'objref':
                return new ElementXRef($value, $document);

            case '[':
                $values = [];

                if (\is_array($value)) {
                    foreach ($value as $sub_element) {
                        $sub_type = $sub_element[0];
                        $sub_value = $sub_element[1];
                        $values[] = $this->parseHeaderElement($sub_type, $sub_value, $document);
                    }
                }

                return new ElementArray($values, $document);
Bug: $values of type Smalot\PdfParser\Header[]|array is incompatible with the type string expected by parameter $value of Smalot\PdfParser\Element...entArray::__construct(). (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type annotation:

                return new ElementArray(/** @scrutinizer ignore-type */ $values, $document);

            case 'endstream':
            case 'obj': //I don't know what it means but got my project fixed.
            case '':
                // Nothing to do with.
                return null;

            default:
                throw new \Exception('Invalid type: "'.$type.'".');
        }
    }
}
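
Several of the findings above suggest "an additional type check" or an unhandled `false` return. The sketch below shows one possible shape for such guards. The function names `requireString()` and `requireParsed()` are hypothetical and not part of PdfParser, and throwing is only one option: whether a failed parse should throw or return null is a design decision for the maintainers.

```php
<?php

/**
 * Hypothetical guard: returns $value unchanged when it is a string and throws
 * otherwise, so scalar element constructors never receive an array.
 */
function requireString($value, string $type): string
{
    if (!\is_string($value)) {
        throw new \InvalidArgumentException(
            sprintf('Expected a string value for type "%s", got %s.', $type, \gettype($value))
        );
    }

    return $value;
}

/**
 * Hypothetical guard for parse results: a static parse() that returns false
 * is turned into an exception instead of leaking false to the caller.
 */
function requireParsed($element, string $type)
{
    if (false === $element) {
        throw new \UnexpectedValueException(
            sprintf('Could not parse an element of type "%s".', $type)
        );
    }

    return $element;
}

// Usage: a numeric token passes through, an array is rejected.
$numeric = requireString('42', 'numeric');

try {
    requireString(['nested'], 'numeric');
    $rejected = false;
} catch (\InvalidArgumentException $e) {
    $rejected = true;
}
```

Inside `parseHeaderElement()`, calls such as `new ElementNumeric(requireString($value, $type))` would make the `string|array` ambiguity explicit without resorting to ignore-type annotations.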