FilterHelper::decodeFilterLZWDecode()   C

Complexity

Conditions 12
Paths 48

Size

Total Lines 73
Code Lines 42

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 0
CRAP Score 156

Importance

Changes 1
Bugs 0 Features 1
Metric Value
cc 12
eloc 42
c 1
b 0
f 1
nc 48
nop 1
dl 0
loc 73
ccs 0
cts 41
cp 0
crap 156
rs 6.9666
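
The crap value above can be reproduced from the other rows. This is a minimal sketch, assuming the commonly cited CRAP formula (complexity squared, times the uncovered fraction cubed, plus complexity), and assuming that cc is the cyclomatic complexity and cp the covered fraction. The function name crapScore is illustrative and not part of the report or the library.

```php
<?php

/**
 * CRAP score under the commonly cited definition:
 * complexity^2 * (1 - coverage)^3 + complexity.
 * $coverage is a fraction between 0.0 (untested) and 1.0 (fully covered).
 */
function crapScore(int $complexity, float $coverage): float
{
    return $complexity ** 2 * (1.0 - $coverage) ** 3 + $complexity;
}

// With cc = 12 and no coverage, the score matches the table: 144 + 12.
echo crapScore(12, 0.0), PHP_EOL; // 156

// With full coverage, only the complexity term remains.
echo crapScore(12, 1.0), PHP_EOL; // 12
```

Under this formula, covering the method with tests lowers the score much faster than small reductions in complexity would.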

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, particularly when combined with a good name. A small method is also usually much easier to name well.

For example, if you find yourself adding comments to a method's body, that is usually a good sign that the commented part should be extracted into a new method, with the comment serving as a starting point for the new method's name.
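
As an illustration of that advice, decodeFilterLZWDecode() contains commented steps such as "convert string to binary string" and "initialize the dictionary (with the first 256 entries)". The sketch below shows how two such steps might be extracted. The helper names toBitString and initialLzwDictionary are hypothetical and do not exist in the library; they are written as standalone functions here only so the example runs on its own.

```php
<?php

/**
 * Hypothetical helper: expand each byte of $data into eight '0'/'1' characters.
 * Name derived from the comment "convert string to binary string".
 */
function toBitString(string $data): string
{
    $bits = '';
    $length = \strlen($data);
    for ($i = 0; $i < $length; ++$i) {
        $bits .= \sprintf('%08b', \ord($data[$i]));
    }

    return $bits;
}

/**
 * Hypothetical helper: build the 256 single-byte entries the table starts with.
 * Name derived from the comment "initialize the dictionary".
 */
function initialLzwDictionary(): array
{
    $dictionary = [];
    for ($i = 0; $i < 256; ++$i) {
        $dictionary[$i] = \chr($i);
    }

    return $dictionary;
}

echo toBitString('A'), PHP_EOL;               // 01000001
echo \count(initialLzwDictionary()), PHP_EOL; // 256
```

Because the dictionary is built both at the start of the method and again on the clear-table marker, extracting it would also remove a repeated loop from the method body.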

Commonly applied refactorings include:

<?php

/**
 * This file is based on code of tecnickcom/TCPDF PDF library.
 *
 * Original author Nicola Asuni ([email protected]) and
 * contributors (https://github.com/tecnickcom/TCPDF/graphs/contributors).
 *
 * @see https://github.com/tecnickcom/TCPDF
 *
 * Original code was licensed on the terms of the LGPL v3.
 *
 * ------------------------------------------------------------------------------
 *
 * @file This file is part of the PdfParser library.
 *
 * @author  Konrad Abicht <[email protected]>
 *
 * @date    2020-01-06
 *
 * @license LGPLv3
 *
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser\RawData;

use Smalot\PdfParser\Exception\NotImplementedException;

class FilterHelper
{
    protected $availableFilters = ['ASCIIHexDecode', 'ASCII85Decode', 'LZWDecode', 'FlateDecode', 'RunLengthDecode'];

    /**
     * Decode data using the specified filter type.
     *
     * @param string $filter Filter name
     * @param string $data   Data to decode
     *
     * @return string Decoded data string
     *
     * @throws \Exception
     * @throws \Smalot\PdfParser\Exception\NotImplementedException if a certain decode function is not implemented yet
     */
    public function decodeFilter(string $filter, string $data, int $decodeMemoryLimit = 0): string
    {
        switch ($filter) {
            case 'ASCIIHexDecode':
                return $this->decodeFilterASCIIHexDecode($data);

            case 'ASCII85Decode':
                return $this->decodeFilterASCII85Decode($data);

            case 'LZWDecode':
                return $this->decodeFilterLZWDecode($data);

            case 'FlateDecode':
                // Static-analysis note (bug, best practice): decodeFilterFlateDecode() could
                // return null, which is incompatible with the type-hinted return string.
                // Consider adding an additional type check to rule it out.
                return $this->decodeFilterFlateDecode($data, $decodeMemoryLimit);

            case 'RunLengthDecode':
                return $this->decodeFilterRunLengthDecode($data);

            case 'CCITTFaxDecode':
                throw new NotImplementedException('Decode CCITTFaxDecode not implemented yet.');
            case 'JBIG2Decode':
                throw new NotImplementedException('Decode JBIG2Decode not implemented yet.');
            case 'DCTDecode':
                throw new NotImplementedException('Decode DCTDecode not implemented yet.');
            case 'JPXDecode':
                throw new NotImplementedException('Decode JPXDecode not implemented yet.');
            case 'Crypt':
                throw new NotImplementedException('Decode Crypt not implemented yet.');
            default:
                return $data;
        }
    }

    /**
     * ASCIIHexDecode
     *
     * Decodes data encoded in an ASCII hexadecimal representation, reproducing the original binary data.
     *
     * @param string $data Data to decode
     *
     * @return string data string
     *
     * @throws \Exception
     */
    protected function decodeFilterASCIIHexDecode(string $data): string
    {
        // all white-space characters shall be ignored
        $data = preg_replace('/[\s]/', '', $data);
        // check for EOD character: GREATER-THAN SIGN (3Eh)
        $eod = strpos($data, '>');
        if (false !== $eod) {
            // remove EOD and extra data (if any)
            $data = substr($data, 0, $eod);
            $eod = true;
        }
        // get data length
        $data_length = \strlen($data);
        if (0 != ($data_length % 2)) {
            // odd number of hexadecimal digits
            if ($eod) {
                // EOD shall behave as if a 0 (zero) followed the last digit,
                // so the padding digit is appended after it (e.g. "ABC" becomes "ABC0")
                $data .= '0';
            } else {
                throw new \Exception('decodeFilterASCIIHexDecode: invalid code');
            }
        }
        // check for invalid characters
        if (preg_match('/[^a-fA-F\d]/', $data) > 0) {
            throw new \Exception('decodeFilterASCIIHexDecode: invalid code');
        }
        // get one byte of binary data for each pair of ASCII hexadecimal digits
        $decoded = pack('H*', $data);

        return $decoded;
    }

    /**
     * ASCII85Decode
     *
     * Decodes data encoded in an ASCII base-85 representation, reproducing the original binary data.
     *
     * @param string $data Data to decode
     *
     * @return string data string
     *
     * @throws \Exception
     */
    protected function decodeFilterASCII85Decode(string $data): string
    {
        // initialize string to return
        $decoded = '';
        // all white-space characters shall be ignored
        $data = preg_replace('/[\s]/', '', $data);
        // remove start sequence 2-character sequence <~ (3Ch)(7Eh)
        if (0 === strpos($data, '<~')) {
            // remove the start sequence
            $data = substr($data, 2);
        }
        // check for EOD: 2-character sequence ~> (7Eh)(3Eh)
        $eod = strpos($data, '~>');
        if (\strlen($data) - 2 === $eod) {
            // remove EOD and extra data (if any)
            $data = substr($data, 0, $eod);
        }
        // data length
        $data_length = \strlen($data);
        // check for invalid characters: only '!' (21h) to 'u' (75h) and
        // the all-zero shorthand 'z' (7Ah) are allowed
        if (preg_match('/[^\x21-\x75\x7a]/', $data) > 0) {
            throw new \Exception('decodeFilterASCII85Decode: invalid code');
        }
        // z sequence
        $zseq = \chr(0).\chr(0).\chr(0).\chr(0);
        // position inside a group of 5 characters (0-4)
        $group_pos = 0;
        $tuple = 0;
        $pow85 = [85 * 85 * 85 * 85, 85 * 85 * 85, 85 * 85, 85, 1];

        // for each byte
        for ($i = 0; $i < $data_length; ++$i) {
            // get char value
            $char = \ord($data[$i]);
            if (122 == $char) { // 'z'
                if (0 == $group_pos) {
                    $decoded .= $zseq;
                } else {
                    throw new \Exception('decodeFilterASCII85Decode: invalid code');
                }
            } else {
                // the value represented by a group of 5 characters should never be greater than 2^32 - 1
                $tuple += (($char - 33) * $pow85[$group_pos]);
                if (4 == $group_pos) {
                    $decoded .= \chr($tuple >> 24).\chr($tuple >> 16).\chr($tuple >> 8).\chr($tuple);
                    $tuple = 0;
                    $group_pos = 0;
                } else {
                    ++$group_pos;
                }
            }
        }
        if ($group_pos > 1) {
            $tuple += $pow85[$group_pos - 1];
        }
        // last tuple (if any)
        switch ($group_pos) {
            case 4:
                $decoded .= \chr($tuple >> 24).\chr($tuple >> 16).\chr($tuple >> 8);
                break;

            case 3:
                $decoded .= \chr($tuple >> 24).\chr($tuple >> 16);
                break;

            case 2:
                $decoded .= \chr($tuple >> 24);
                break;

            case 1:
                throw new \Exception('decodeFilterASCII85Decode: invalid code');
        }

        return $decoded;
    }

    /**
     * FlateDecode
     *
     * Decompresses data encoded using the zlib/deflate compression method, reproducing the original text or binary data.
     *
     * @param string $data              Data to decode
     * @param int    $decodeMemoryLimit Memory limit on deflation
     *
     * @return string data string
     *
     * @throws \Exception
     */
    protected function decodeFilterFlateDecode(string $data, int $decodeMemoryLimit): ?string
    {
        // Uncatchable E_WARNING for "data error" is @ suppressed
        // so execution may proceed with an alternate decompression
        // method.
        $decoded = @gzuncompress($data, $decodeMemoryLimit);

        if (false === $decoded) {
            // If gzuncompress() failed, try again using the compress.zlib://
            // wrapper to decode it in a file-based context.
            // See: https://www.php.net/manual/en/function.gzuncompress.php#79042
            // Issue: https://github.com/smalot/pdfparser/issues/592
            $ztmp = tmpfile();
            if (false != $ztmp) {
                fwrite($ztmp, "\x1f\x8b\x08\x00\x00\x00\x00\x00".$data);
                $file = stream_get_meta_data($ztmp)['uri'];
                if (0 === $decodeMemoryLimit) {
                    $decoded = file_get_contents('compress.zlib://'.$file);
                } else {
                    $decoded = file_get_contents('compress.zlib://'.$file, false, null, 0, $decodeMemoryLimit);
                }
                fclose($ztmp);
            }
        }

        if (false === \is_string($decoded) || '' === $decoded) {
            // If the decoded string is empty, that means decoding failed.
            throw new \Exception('decodeFilterFlateDecode: invalid data');
        }

        // Static-analysis note (bug, best practice): the expression below is reported as
        // returning the type false, which is incompatible with the type-hinted return null|string.
        return $decoded;
    }

    /**
     * LZWDecode
     *
     * Decompresses data encoded using the LZW (Lempel-Ziv-Welch) adaptive compression method, reproducing the original text or binary data.
     *
     * @param string $data Data to decode
     *
     * @return string Data string
     */
    protected function decodeFilterLZWDecode(string $data): string
    {
        // initialize string to return
        $decoded = '';
        // data length
        $data_length = \strlen($data);
        // convert string to binary string
        $bitstring = '';
        for ($i = 0; $i < $data_length; ++$i) {
            $bitstring .= \sprintf('%08b', \ord($data[$i]));
        }
        // get the number of bits
        $data_length = \strlen($bitstring);
        // initialize code length in bits
        $bitlen = 9;
        // initialize dictionary index
        $dix = 258;
        // initialize the dictionary (with the first 256 entries).
        $dictionary = [];
        for ($i = 0; $i < 256; ++$i) {
            $dictionary[$i] = \chr($i);
        }
        // previous val
        $prev_index = 0;
        // until we encounter the EOD marker (257), read $bitlen bits at a time
        while (($data_length > 0) && (257 != ($index = bindec(substr($bitstring, 0, $bitlen))))) {
            // remove read bits from string
            $bitstring = substr($bitstring, $bitlen);
            // update number of bits
            $data_length -= $bitlen;
            if (256 == $index) { // clear-table marker
                // reset code length in bits
                $bitlen = 9;
                // reset dictionary index
                $dix = 258;
                $prev_index = 256;
                // reset the dictionary (with the first 256 entries).
                $dictionary = [];
                for ($i = 0; $i < 256; ++$i) {
                    $dictionary[$i] = \chr($i);
                }
            } elseif (256 == $prev_index) {
                // first entry
                $decoded .= $dictionary[$index];
                $prev_index = $index;
            } else {
                // check if the index exists in the dictionary
                if ($index < $dix) {
                    // index exists in the dictionary
                    $decoded .= $dictionary[$index];
                    $dic_val = $dictionary[$prev_index].$dictionary[$index][0];
                    // store current index
                    $prev_index = $index;
                } else {
                    // index does not exist in the dictionary
                    $dic_val = $dictionary[$prev_index].$dictionary[$prev_index][0];
                    $decoded .= $dic_val;
                }
                // update dictionary
                $dictionary[$dix] = $dic_val;
                ++$dix;
                // change bit length by case
                if (2047 == $dix) {
                    $bitlen = 12;
                } elseif (1023 == $dix) {
                    $bitlen = 11;
                } elseif (511 == $dix) {
                    $bitlen = 10;
                }
            }
        }

        return $decoded;
    }

    /**
     * RunLengthDecode
     *
     * Decompresses data encoded using a byte-oriented run-length encoding algorithm.
     *
     * @param string $data Data to decode
     */
    protected function decodeFilterRunLengthDecode(string $data): string
    {
        // initialize string to return
        $decoded = '';
        // data length
        $data_length = \strlen($data);
        $i = 0;
        while ($i < $data_length) {
            // get current byte value
            $byte = \ord($data[$i]);
            if (128 == $byte) {
                // a length value of 128 denotes EOD
                break;
            } elseif ($byte < 128) {
                // if the length byte is in the range 0 to 127
                // the following length + 1 (1 to 128) bytes shall be copied literally during decompression
                $decoded .= substr($data, $i + 1, $byte + 1);
                // move to next block
                $i += ($byte + 2);
            } else {
                // if length is in the range 129 to 255,
                // the following single byte shall be copied 257 - length (2 to 128) times during decompression
                $decoded .= str_repeat($data[$i + 1], 257 - $byte);
                // move to next block
                $i += 2;
            }
        }

        return $decoded;
    }

    /**
     * @return array list of available filters
     */
    public function getAvailableFilters(): array
    {
        return $this->availableFilters;
    }
}