FilterHelper::decodeFilter()   B

Complexity

Conditions 11
Paths 11

Size

Total Lines 30
Code Lines 23

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 20
CRAP Score 11.0908

Importance

Changes 1
Bugs 0 Features 1
Metric Value
cc 11
eloc 23
c 1
b 0
f 1
nc 11
nop 3
dl 0
loc 30
ccs 20
cts 22
cp 0.9091
crap 11.0908
rs 7.3166
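
The CRAP score in the table combines cyclomatic complexity with coverage. As a rough cross-check, and assuming the commonly cited formula CRAP = cc^2 * (1 - coverage)^3 + cc, the reported value can be reproduced approximately from cc = 11 and 20 of 22 covered statements. The helper name crapScore is hypothetical and not part of the analysis tool or of PdfParser; small differences in the last digit are presumably due to rounding of the coverage value.

```php
<?php

/**
 * Hypothetical helper: CRAP score from cyclomatic complexity and coverage.
 *
 * @param int   $complexity cyclomatic complexity ("cc" in the table)
 * @param float $coverage   covered fraction between 0.0 and 1.0 ("cp" in the table)
 */
function crapScore(int $complexity, float $coverage): float
{
    // complexity^2 * (uncovered fraction)^3 + complexity
    return $complexity ** 2 * (1.0 - $coverage) ** 3 + $complexity;
}

// cc = 11, covered statements 20 of 22 ("ccs" / "cts" in the table)
printf("%.4f\n", crapScore(11, 20 / 22));
```

With full coverage the second term vanishes and the score equals the complexity itself, so for this method the score is dominated by the 11 branches rather than by missing tests.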

How to fix: Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
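
A minimal sketch of that idea follows. All function names are hypothetical and chosen only for illustration; the "before" version explains its steps with comments, and the "after" version turns each comment into the name of a small function.

```php
<?php

// Before: one function whose steps are only explained by comments.
function normalizeHexBefore(string $data): string
{
    // all white-space characters shall be ignored
    $data = preg_replace('/\s/', '', $data) ?? $data;
    // remove the EOD marker '>' and anything after it
    $eod = strpos($data, '>');
    if (false !== $eod) {
        $data = substr($data, 0, $eod);
    }

    return $data;
}

// After: each commented step becomes a small, named function.
function stripWhitespace(string $data): string
{
    return preg_replace('/\s/', '', $data) ?? $data;
}

function truncateAtEod(string $data, string $marker = '>'): string
{
    $eod = strpos($data, $marker);

    return false === $eod ? $data : substr($data, 0, $eod);
}

function normalizeHexAfter(string $data): string
{
    return truncateAtEod(stripWhitespace($data));
}
```

Both variants behave the same; the second one reads as a sequence of named steps and lets each step be tested on its own.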

Commonly applied refactorings include Extract Method.

<?php

/**
 * This file is based on code of tecnickcom/TCPDF PDF library.
 *
 * Original author Nicola Asuni ([email protected]) and
 * contributors (https://github.com/tecnickcom/TCPDF/graphs/contributors).
 *
 * @see https://github.com/tecnickcom/TCPDF
 *
 * Original code was licensed on the terms of the LGPL v3.
 *
 * ------------------------------------------------------------------------------
 *
 * @file This file is part of the PdfParser library.
 *
 * @author  Konrad Abicht <[email protected]>
 *
 * @date    2020-01-06
 *
 * @license LGPLv3
 *
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser\RawData;

use Smalot\PdfParser\Exception\NotImplementedException;

class FilterHelper
{
    protected $availableFilters = ['ASCIIHexDecode', 'ASCII85Decode', 'LZWDecode', 'FlateDecode', 'RunLengthDecode'];

    /**
     * Decode data using the specified filter type.
     *
     * @param string $filter Filter name
     * @param string $data   Data to decode
     *
     * @return string Decoded data string
     *
     * @throws \Exception
     * @throws \Smalot\PdfParser\Exception\NotImplementedException if a certain decode function is not implemented yet
     */
    public function decodeFilter(string $filter, string $data, int $decodeMemoryLimit = 0): string
    {
        switch ($filter) {
            case 'ASCIIHexDecode':
                return $this->decodeFilterASCIIHexDecode($data);

            case 'ASCII85Decode':
                return $this->decodeFilterASCII85Decode($data);

            case 'LZWDecode':
                return $this->decodeFilterLZWDecode($data);

            case 'FlateDecode':
                return $this->decodeFilterFlateDecode($data, $decodeMemoryLimit);
Issue (Bug, Best Practice): The expression return $this->decodeFilt...ta, $decodeMemoryLimit) could return the type null, which is incompatible with the type-hinted return type string. Consider adding an additional type check to rule this out.

            case 'RunLengthDecode':
                return $this->decodeFilterRunLengthDecode($data);

            case 'CCITTFaxDecode':
                throw new NotImplementedException('Decode CCITTFaxDecode not implemented yet.');
            case 'JBIG2Decode':
                throw new NotImplementedException('Decode JBIG2Decode not implemented yet.');
            case 'DCTDecode':
                throw new NotImplementedException('Decode DCTDecode not implemented yet.');
            case 'JPXDecode':
                throw new NotImplementedException('Decode JPXDecode not implemented yet.');
            case 'Crypt':
                throw new NotImplementedException('Decode Crypt not implemented yet.');
            default:
                return $data;
        }
    }

    /**
     * ASCIIHexDecode
     *
     * Decodes data encoded in an ASCII hexadecimal representation, reproducing the original binary data.
     *
     * @param string $data Data to decode
     *
     * @return string data string
     *
     * @throws \Exception
     */
    protected function decodeFilterASCIIHexDecode(string $data): string
    {
        // all white-space characters shall be ignored
        $data = preg_replace('/[\s]/', '', $data);
        // check for EOD character: GREATER-THAN SIGN (3Eh)
        $eod = strpos($data, '>');
        if (false !== $eod) {
            // remove EOD and extra data (if any)
            $data = substr($data, 0, $eod);
            $eod = true;
        }
        // get data length
        $data_length = \strlen($data);
        if (0 != ($data_length % 2)) {
            // odd number of hexadecimal digits
            if ($eod) {
                // EOD shall behave as if a 0 (zero) followed the last digit
                $data = substr($data, 0, -1).'0'.substr($data, -1);
            } else {
                throw new \Exception('decodeFilterASCIIHexDecode: invalid code');
            }
        }
        // check for invalid characters
        if (preg_match('/[^a-fA-F\d]/', $data) > 0) {
            throw new \Exception('decodeFilterASCIIHexDecode: invalid code');
        }
        // get one byte of binary data for each pair of ASCII hexadecimal digits
        $decoded = pack('H*', $data);

        return $decoded;
    }

    /**
     * ASCII85Decode
     *
     * Decodes data encoded in an ASCII base-85 representation, reproducing the original binary data.
     *
     * @param string $data Data to decode
     *
     * @return string data string
     *
     * @throws \Exception
     */
    protected function decodeFilterASCII85Decode(string $data): string
    {
        // initialize string to return
        $decoded = '';
        // all white-space characters shall be ignored
        $data = preg_replace('/[\s]/', '', $data);
        // remove start sequence 2-character sequence <~ (3Ch)(7Eh)
        if (0 === strpos($data, '<~')) {
            // remove EOD and extra data (if any)
            $data = substr($data, 2);
        }
        // check for EOD: 2-character sequence ~> (7Eh)(3Eh)
        $eod = strpos($data, '~>');
        if (\strlen($data) - 2 === $eod) {
            // remove EOD and extra data (if any)
            $data = substr($data, 0, $eod);
        }
        // data length
        $data_length = \strlen($data);
        // check for invalid characters
        if (preg_match('/[^\x21-\x75,\x74]/', $data) > 0) {
            throw new \Exception('decodeFilterASCII85Decode: invalid code');
        }
        // z sequence
        $zseq = \chr(0).\chr(0).\chr(0).\chr(0);
        // position inside a group of 4 bytes (0-3)
        $group_pos = 0;
        $tuple = 0;
        $pow85 = [85 * 85 * 85 * 85, 85 * 85 * 85, 85 * 85, 85, 1];

        // for each byte
        for ($i = 0; $i < $data_length; ++$i) {
            // get char value
            $char = \ord($data[$i]);
            if (122 == $char) { // 'z'
                if (0 == $group_pos) {
                    $decoded .= $zseq;
                } else {
                    throw new \Exception('decodeFilterASCII85Decode: invalid code');
                }
            } else {
                // the value represented by a group of 5 characters should never be greater than 2^32 - 1
                $tuple += (($char - 33) * $pow85[$group_pos]);
                if (4 == $group_pos) {
                    $decoded .= \chr($tuple >> 24).\chr($tuple >> 16).\chr($tuple >> 8).\chr($tuple);
                    $tuple = 0;
                    $group_pos = 0;
                } else {
                    ++$group_pos;
                }
            }
        }
        if ($group_pos > 1) {
            $tuple += $pow85[$group_pos - 1];
        }
        // last tuple (if any)
        switch ($group_pos) {
            case 4:
                $decoded .= \chr($tuple >> 24).\chr($tuple >> 16).\chr($tuple >> 8);
                break;

            case 3:
                $decoded .= \chr($tuple >> 24).\chr($tuple >> 16);
                break;

            case 2:
                $decoded .= \chr($tuple >> 24);
                break;

            case 1:
                throw new \Exception('decodeFilterASCII85Decode: invalid code');
        }

        return $decoded;
    }

    /**
     * FlateDecode
     *
     * Decompresses data encoded using the zlib/deflate compression method, reproducing the original text or binary data.
     *
     * @param string $data              Data to decode
     * @param int    $decodeMemoryLimit Memory limit on deflation
     *
     * @return string data string
     *
     * @throws \Exception
     */
    protected function decodeFilterFlateDecode(string $data, int $decodeMemoryLimit): ?string
    {
        // Uncatchable E_WARNING for "data error" is @ suppressed
        // so execution may proceed with an alternate decompression
        // method.
        $decoded = @gzuncompress($data, $decodeMemoryLimit);

        if (false === $decoded) {
            // If gzuncompress() failed, try again using the compress.zlib://
            // wrapper to decode it in a file-based context.
            // See: https://www.php.net/manual/en/function.gzuncompress.php#79042
            // Issue: https://github.com/smalot/pdfparser/issues/592
            $ztmp = tmpfile();
            if (false != $ztmp) {
                fwrite($ztmp, "\x1f\x8b\x08\x00\x00\x00\x00\x00".$data);
                $file = stream_get_meta_data($ztmp)['uri'];
                if (0 === $decodeMemoryLimit) {
                    $decoded = file_get_contents('compress.zlib://'.$file);
                } else {
                    $decoded = file_get_contents('compress.zlib://'.$file, false, null, 0, $decodeMemoryLimit);
                }
                fclose($ztmp);
            }
        }

        if (false === \is_string($decoded) || '' === $decoded) {
            // If the decoded string is empty, that means decoding failed.
            throw new \Exception('decodeFilterFlateDecode: invalid data');
        }

        return $decoded;
Issue (Bug, Best Practice): The expression return $decoded returns the type false, which is incompatible with the type-hinted return type null|string.
    }

    /**
     * LZWDecode
     *
     * Decompresses data encoded using the LZW (Lempel-Ziv-Welch) adaptive compression method, reproducing the original text or binary data.
     *
     * @param string $data Data to decode
     *
     * @return string Data string
     */
    protected function decodeFilterLZWDecode(string $data): string
    {
        // initialize string to return
        $decoded = '';
        // data length
        $data_length = \strlen($data);
        // convert string to binary string
        $bitstring = '';
        for ($i = 0; $i < $data_length; ++$i) {
            $bitstring .= \sprintf('%08b', \ord($data[$i]));
        }
        // get the number of bits
        $data_length = \strlen($bitstring);
        // initialize code length in bits
        $bitlen = 9;
        // initialize dictionary index
        $dix = 258;
        // initialize the dictionary (with the first 256 entries).
        $dictionary = [];
        for ($i = 0; $i < 256; ++$i) {
            $dictionary[$i] = \chr($i);
        }
        // previous val
        $prev_index = 0;
        // while we encounter EOD marker (257), read code_length bits
        while (($data_length > 0) && (257 != ($index = bindec(substr($bitstring, 0, $bitlen))))) {
            // remove read bits from string
            $bitstring = substr($bitstring, $bitlen);
            // update number of bits
            $data_length -= $bitlen;
            if (256 == $index) { // clear-table marker
                // reset code length in bits
                $bitlen = 9;
                // reset dictionary index
                $dix = 258;
                $prev_index = 256;
                // reset the dictionary (with the first 256 entries).
                $dictionary = [];
                for ($i = 0; $i < 256; ++$i) {
                    $dictionary[$i] = \chr($i);
                }
            } elseif (256 == $prev_index) {
                // first entry
                $decoded .= $dictionary[$index];
                $prev_index = $index;
            } else {
                // check if index exist in the dictionary
                if ($index < $dix) {
                    // index exist on dictionary
                    $decoded .= $dictionary[$index];
                    $dic_val = $dictionary[$prev_index].$dictionary[$index][0];
                    // store current index
                    $prev_index = $index;
                } else {
                    // index do not exist on dictionary
                    $dic_val = $dictionary[$prev_index].$dictionary[$prev_index][0];
                    $decoded .= $dic_val;
                }
                // update dictionary
                $dictionary[$dix] = $dic_val;
                ++$dix;
                // change bit length by case
                if (2047 == $dix) {
                    $bitlen = 12;
                } elseif (1023 == $dix) {
                    $bitlen = 11;
                } elseif (511 == $dix) {
                    $bitlen = 10;
                }
            }
        }

        return $decoded;
    }

    /**
     * RunLengthDecode
     *
     * Decompresses data encoded using a byte-oriented run-length encoding algorithm.
     *
     * @param string $data Data to decode
     */
    protected function decodeFilterRunLengthDecode(string $data): string
    {
        // initialize string to return
        $decoded = '';
        // data length
        $data_length = \strlen($data);
        $i = 0;
        while ($i < $data_length) {
            // get current byte value
            $byte = \ord($data[$i]);
            if (128 == $byte) {
                // a length value of 128 denote EOD
                break;
            } elseif ($byte < 128) {
                // if the length byte is in the range 0 to 127
                // the following length + 1 (1 to 128) bytes shall be copied literally during decompression
                $decoded .= substr($data, $i + 1, $byte + 1);
                // move to next block
                $i += ($byte + 2);
            } else {
                // if length is in the range 129 to 255,
                // the following single byte shall be copied 257 - length (2 to 128) times during decompression
                $decoded .= str_repeat($data[$i + 1], 257 - $byte);
                // move to next block
                $i += 2;
            }
        }

        return $decoded;
    }

    /**
     * @return array list of available filters
     */
    public function getAvailableFilters(): array
    {
        return $this->availableFilters;
    }
}
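
One way the branch count of decodeFilter() might be reduced is a lookup table in place of the switch. The following is a standalone sketch, not the library's implementation: the function decodeWithMap and its parameters are hypothetical, the single decoder is a simplified stand-in (hex2bin() instead of the full ASCIIHexDecode logic), and \RuntimeException stands in for NotImplementedException.

```php
<?php

/**
 * Hypothetical table-driven dispatcher.
 *
 * @param array<string, callable> $decoders    filter name => decoder callable
 * @param string[]                $unsupported known filter names without a decoder
 */
function decodeWithMap(string $filter, string $data, array $decoders, array $unsupported): string
{
    if (isset($decoders[$filter])) {
        return $decoders[$filter]($data);
    }

    if (\in_array($filter, $unsupported, true)) {
        // Stand-in for Smalot\PdfParser\Exception\NotImplementedException.
        throw new \RuntimeException('Decode '.$filter.' not implemented yet.');
    }

    // Unknown filters pass the data through unchanged, like the default branch.
    return $data;
}

$decoders = [
    // Simplified stand-in: hex2bin() does not handle whitespace or the '>' marker.
    'ASCIIHexDecode' => static function (string $data): string {
        return (string) hex2bin($data);
    },
];
$unsupported = ['CCITTFaxDecode', 'JBIG2Decode', 'DCTDecode', 'JPXDecode', 'Crypt'];

echo decodeWithMap('ASCIIHexDecode', '4869', $decoders, $unsupported), "\n";
```

With this shape the dispatcher has three decision points regardless of how many filters are registered, at the cost of moving the filter list into data that has to be kept in sync with the decoder methods.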
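
The two type findings reported in the listing could be addressed by narrowing the result before it is returned, so that the method can declare a plain string return type instead of ?string. A minimal sketch under that assumption: inflateOrFail is a hypothetical standalone function, it omits the compress.zlib:// fallback and the memory limit, and it requires the zlib extension.

```php
<?php

/**
 * Hypothetical simplified variant of the FlateDecode step.
 *
 * @throws \Exception if the data cannot be decompressed
 */
function inflateOrFail(string $data): string
{
    // gzuncompress() returns false on failure; @ suppresses the E_WARNING.
    $decoded = @gzuncompress($data);

    // After this check $decoded can only be a non-empty string,
    // so neither null nor false can reach the return statement.
    if (!\is_string($decoded) || '' === $decoded) {
        throw new \Exception('inflateOrFail: invalid data');
    }

    return $decoded;
}

echo inflateOrFail(gzcompress('hello')), "\n";
```

Because every failure path throws, callers such as decodeFilter() can keep their string return type without an extra null check.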