Passed
Push — master ( a216cc...768d1d )
by Konrad
02:19
created

Parser::parseObject()   C

Complexity

Conditions 15
Paths 58

Size

Total Lines 85
Code Lines 51

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 51
CRAP Score 15.0015

Importance

Changes 5
Bugs 2 Features 0
Metric Value
cc 15
eloc 51
c 5
b 2
f 0
nc 58
nop 3
dl 0
loc 85
ccs 51
cts 52
cp 0.9808
crap 15.0015
rs 5.9166

How to fix   Long Method    Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
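The idea of turning a comment into a method name can be sketched with a small, self-contained example. The functions below are hypothetical and unrelated to the PdfParser code base; they only show a commented step being extracted into a method whose name comes from the comment.

```php
<?php

// Before: one function does everything; comments mark the individual steps.
function summarizeBefore(string $text): array
{
    // normalize whitespace
    $text = trim(preg_replace('/\s+/', ' ', $text));

    // count words
    $count = '' === $text ? 0 : count(explode(' ', $text));

    return ['text' => $text, 'words' => $count];
}

// After: each commented step becomes a function named after its comment.
function normalizeWhitespace(string $text): string
{
    return trim(preg_replace('/\s+/', ' ', $text));
}

function countWords(string $normalized): int
{
    return '' === $normalized ? 0 : count(explode(' ', $normalized));
}

function summarizeAfter(string $text): array
{
    $normalized = normalizeWhitespace($text);

    return ['text' => $normalized, 'words' => countWords($normalized)];
}
```

Both versions return the same result for the same input; the second one reads as a list of named steps and no longer needs the comments.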

Commonly applied refactorings include Extract Method.

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 * @date    2017-01-03
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\Element\ElementArray;
use Smalot\PdfParser\Element\ElementBoolean;
use Smalot\PdfParser\Element\ElementDate;
use Smalot\PdfParser\Element\ElementHexa;
use Smalot\PdfParser\Element\ElementName;
use Smalot\PdfParser\Element\ElementNull;
use Smalot\PdfParser\Element\ElementNumeric;
use Smalot\PdfParser\Element\ElementString;
use Smalot\PdfParser\Element\ElementXRef;
use Smalot\PdfParser\RawData\RawDataParser;

/**
 * Class Parser
 */
class Parser
{
    /**
     * @var Config
     */
    private $config;

    /**
     * @var PDFObject[]
     */
    protected $objects = [];

    protected $rawDataParser;

    public function __construct($cfg = [], ?Config $config = null)
    {
        $this->config = $config ?: new Config();
        $this->rawDataParser = new RawDataParser($cfg, $this->config);
    }

    public function getConfig(): Config
    {
        return $this->config;
    }

    /**
     * @throws \Exception
     */
    public function parseFile(string $filename): Document
    {
        $content = file_get_contents($filename);
        /*
         * 2018/06/20 @doganoo: as users have complained multiple
         * times that the parseFile() method dies silently,
         * it is a better option to remove the error control
         * operator (@) and let users know that the method
         * throws an exception by adding a @throws tag
         * to the PHPDoc.
         *
         * See here for an example: https://github.com/smalot/pdfparser/issues/204
         */
        return $this->parseContent($content);
    }

    /**
     * @param string $content PDF content to parse
     *
     * @throws \Exception if secured PDF file was detected
     * @throws \Exception if no object list was found
     */
    public function parseContent(string $content): Document
    {
        // Create structure from raw data.
        list($xref, $data) = $this->rawDataParser->parseData($content);

        if (isset($xref['trailer']['encrypt'])) {
            throw new \Exception('Secured pdf file are currently not supported.');
        }

        if (empty($data)) {
            throw new \Exception('Object list not found. Possible secured file.');
        }

        // Create destination object.
        $document = new Document();
        $this->objects = [];

        foreach ($data as $id => $structure) {
            $this->parseObject($id, $structure, $document);
            unset($data[$id]);
        }

        $document->setTrailer($this->parseTrailer($xref['trailer'], $document));
        $document->setObjects($this->objects);

        return $document;
    }

    protected function parseTrailer(array $structure, ?Document $document)
    {
        $trailer = [];

        foreach ($structure as $name => $values) {
            $name = ucfirst($name);

            if (is_numeric($values)) {
                $trailer[$name] = new ElementNumeric($values);
            } elseif (\is_array($values)) {
                $value = $this->parseTrailer($values, null);
                $trailer[$name] = new ElementArray($value, null);
            } elseif (false !== strpos($values, '_')) {
                $trailer[$name] = new ElementXRef($values, $document);
            } else {
                $trailer[$name] = $this->parseHeaderElement('(', $values, $document);
            }
        }

        return new Header($trailer, $document);
    }

    protected function parseObject(string $id, array $structure, ?Document $document)
    {
        $header = new Header([], $document);
        $content = '';

        foreach ($structure as $position => $part) {
            if (\is_int($part)) {
                $part = [null, null];
            }
            switch ($part[0]) {
                case '[':
                    $elements = [];

                    // [Scrutinizer: Bug] The expression $part[1] of type null is not traversable.
                    foreach ($part[1] as $sub_element) {
                        $sub_type = $sub_element[0];
                        $sub_value = $sub_element[1];
                        $elements[] = $this->parseHeaderElement($sub_type, $sub_value, $document);
                    }

                    $header = new Header($elements, $document);
                    break;

                case '<<':
                    // [Scrutinizer: Bug] $part[1] of type null is incompatible with the type array
                    // expected by parameter $structure of Smalot\PdfParser\Parser::parseHeader()
                    // (ignorable via @scrutinizer ignore-type).
                    $header = $this->parseHeader($part[1], $document);
                    break;

                case 'stream':
                    $content = isset($part[3][0]) ? $part[3][0] : $part[1];

                    if ($header->get('Type')->equals('ObjStm')) {
                        $match = [];

                        // Split xrefs and contents.
                        preg_match('/^((\d+\s+\d+\s*)*)(.*)$/s', $content, $match);
                        $content = $match[3];

                        // Extract xrefs.
                        $xrefs = preg_split(
                            '/(\d+\s+\d+\s*)/s',
                            $match[1],
                            -1,
                            \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE
                        );
                        $table = [];

                        foreach ($xrefs as $xref) {
                            list($id, $position) = preg_split("/\s+/", trim($xref));
                            $table[$position] = $id;
                        }

                        ksort($table);

                        $ids = array_values($table);
                        $positions = array_keys($table);

                        // [Scrutinizer: Comprehensibility, Bug] $position is overwriting a variable
                        // from outer foreach loop.
                        foreach ($positions as $index => $position) {
                            $id = $ids[$index].'_0';
                            $next_position = isset($positions[$index + 1]) ? $positions[$index + 1] : \strlen($content);
                            $sub_content = substr($content, $position, (int) $next_position - (int) $position);

                            // [Scrutinizer: Bug] It seems like $document can also be of type null; however,
                            // parameter $document of Smalot\PdfParser\Header::parse() does only seem to accept
                            // Smalot\PdfParser\Document, maybe add an additional type check?
                            // (ignorable via @scrutinizer ignore-type)
                            $sub_header = Header::parse($sub_content, $document);
                            // [Scrutinizer: Bug] It seems like $document can also be of type null; however,
                            // parameter $document of Smalot\PdfParser\PDFObject::factory() does only seem to accept
                            // Smalot\PdfParser\Document, maybe add an additional type check?
                            // (ignorable via @scrutinizer ignore-type)
                            $object = PDFObject::factory($document, $sub_header, '', $this->config);
                            $this->objects[$id] = $object;
                        }

                        // It is not necessary to store this content.

                        return;
                    }
                    break;

                default:
                    if ('null' != $part) {
                        $element = $this->parseHeaderElement($part[0], $part[1], $document);

                        if ($element) {
                            $header = new Header([$element], $document);
                        }
                    }
                    break;
            }
        }

        if (!isset($this->objects[$id])) {
            $this->objects[$id] = PDFObject::factory($document, $header, $content, $this->config);
        }
    }

    /**
     * @throws \Exception
     */
    protected function parseHeader(array $structure, ?Document $document): Header
    {
        $elements = [];
        $count = \count($structure);

        for ($position = 0; $position < $count; $position += 2) {
            $name = $structure[$position][1];
            $type = $structure[$position + 1][0];
            $value = $structure[$position + 1][1];

            $elements[$name] = $this->parseHeaderElement($type, $value, $document);
        }

        return new Header($elements, $document);
    }

    /**
     * @param string|array $value
     *
     * @return Element|Header|null
     *
     * @throws \Exception
     */
    protected function parseHeaderElement(?string $type, $value, ?Document $document)
    {
        switch ($type) {
            case '<<':
            case '>>':
                // [Scrutinizer: Bug] It seems like $value can also be of type string; however,
                // parameter $structure of Smalot\PdfParser\Parser::parseHeader() does only seem to
                // accept array, maybe add an additional type check?
                // (ignorable via @scrutinizer ignore-type)
                $header = $this->parseHeader($value, $document);
                // [Scrutinizer: Bug] It seems like $document can also be of type null; however,
                // parameter $document of Smalot\PdfParser\PDFObject::factory() does only seem to
                // accept Smalot\PdfParser\Document, maybe add an additional type check?
                // (ignorable via @scrutinizer ignore-type)
                PDFObject::factory($document, $header, null, $this->config);

                return $header;

            case 'numeric':
                // [Scrutinizer: Bug] It seems like $value can also be of type array; however,
                // parameter $value of Smalot\PdfParser\Element...tNumeric::__construct() does only
                // seem to accept string, maybe add an additional type check?
                // (ignorable via @scrutinizer ignore-type)
                return new ElementNumeric($value);

            case 'boolean':
                // [Scrutinizer: Bug] It seems like $value can also be of type array; however,
                // parameter $value of Smalot\PdfParser\Element...tBoolean::__construct() does only
                // seem to accept boolean|string, maybe add an additional type check?
                // (ignorable via @scrutinizer ignore-type)
                return new ElementBoolean($value);

            case 'null':
                return new ElementNull();

            case '(':
                // [Scrutinizer: Bug] Are you sure $value of type array|string can be used in
                // concatenation? (ignorable via @scrutinizer ignore-type)
                if ($date = ElementDate::parse('('.$value.')', $document)) {
                    return $date;
                }

                // [Scrutinizer: Bug, Best Practice] The expression
                // return Smalot\PdfParser\...value . ')', $document) could also return false, which is
                // incompatible with the documented return type
                // Smalot\PdfParser\Element...t\PdfParser\Header|null. Did you maybe forget to handle
                // an error condition? If the returned type also contains false, it is an indicator
                // that an error condition leading to this return statement may remain unhandled.
                return ElementString::parse('('.$value.')', $document);

            case '<':
                // [Scrutinizer: Bug] It seems like $value can also be of type array; however,
                // parameter $value of Smalot\PdfParser\Element\ElementHexa::decode() does only seem
                // to accept string, maybe add an additional type check?
                // (ignorable via @scrutinizer ignore-type)
                return $this->parseHeaderElement('(', ElementHexa::decode($value), $document);

            case '/':
                // [Scrutinizer: Bug, Best Practice] The expression
                // return Smalot\PdfParser\.../' . $value, $document) could also return false, which is
                // incompatible with the documented return type
                // Smalot\PdfParser\Element...t\PdfParser\Header|null. Did you maybe forget to handle
                // an error condition? If the returned type also contains false, it is an indicator
                // that an error condition leading to this return statement may remain unhandled.
                return ElementName::parse('/'.$value, $document);

            case 'ojbref': // old mistake in tcpdf parser
            case 'objref':
                return new ElementXRef($value, $document);

            case '[':
                $values = [];

                if (\is_array($value)) {
                    foreach ($value as $sub_element) {
                        $sub_type = $sub_element[0];
                        $sub_value = $sub_element[1];
                        $values[] = $this->parseHeaderElement($sub_type, $sub_value, $document);
                    }
                }

                return new ElementArray($values, $document);

            case 'endstream':
            case 'obj': //I don't know what it means but got my project fixed.
            case '':
                // Nothing to do with.
                return null;

            default:
                throw new \Exception('Invalid type: "'.$type.'".');
        }
    }
}
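
As a sketch of how the Long Method advice could apply to parseObject() in the listing above, the two commented steps in its 'stream' branch ("Split xrefs and contents" and "Extract xrefs") are candidates for extraction into a helper. The function name splitObjectStream() and its return shape are assumptions made for illustration and are not part of the PdfParser library. The helper reuses the leading-pairs regular expression from the listing but collects the pairs with preg_match_all() instead of preg_split(), and relies only on core PHP functions.

```php
<?php

/**
 * Hypothetical helper (not part of PdfParser): splits the decoded body of an
 * object stream into a map of "<id>_0" => raw object content.
 */
function splitObjectStream(string $content): array
{
    // Split xrefs and contents: leading "id offset" pairs, then the object data.
    if (!preg_match('/^((\d+\s+\d+\s*)*)(.*)$/s', $content, $match)) {
        return [];
    }
    $body = $match[3];

    // Extract xrefs into an offset => id table and order it by offset.
    $table = [];
    preg_match_all('/(\d+)\s+(\d+)/', $match[1], $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        $table[(int) $pair[2]] = $pair[1];
    }
    ksort($table);

    // Each object runs from its own offset to the next offset (or end of body).
    $offsets = array_keys($table);
    $objects = [];
    foreach ($offsets as $index => $offset) {
        $end = $offsets[$index + 1] ?? strlen($body);
        $objects[$table[$offset].'_0'] = substr($body, $offset, $end - $offset);
    }

    return $objects;
}

// Usage: two objects, ids 1 and 2, at offsets 0 and 5 of the data part.
$objects = splitObjectStream("1 0 2 5 <<A>><</B>>");
```

With such a helper, the 'stream' branch would only need to loop over the returned map and build a PDFObject per entry. Because the helper uses its own local variable names, it would also avoid reusing $position from the outer foreach loop, which is one of the issues flagged in the listing.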