Completed
Push — feature/remove-tcpdf-lib ( d32e07 )
by Konrad
02:34
created

FilterHelper::decodeFilter()   B

Complexity

Conditions 11
Paths 11

Size

Total Lines 35
Code Lines 28

Duplication

Lines 0
Ratio 0 %

Importance

Changes 1
Bugs 0 Features 1
Metric  Value
cc      11
eloc    28
c       1
b       0
f       1
nc      11
nop     2
dl      0
loc     35
rs      7.3166

How to fix   Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
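
As a rough illustration of that advice, the sketch below takes two commented steps of the kind found in the listing further down ("all white-space characters shall be ignored", "remove EOD and extra data") and turns each into a small named function. The function names removeWhitespace, truncateAtEndOfData and prepareHexInput are hypothetical and are not part of the reviewed class.

```php
<?php

// Hypothetical helpers: each one replaces an inline step that used to be
// introduced by a comment, and takes its name from that comment.

/** Was: "all white-space characters shall be ignored". */
function removeWhitespace(string $data): string
{
    return preg_replace('/\s/', '', $data);
}

/** Was: "remove EOD and extra data (if any)". */
function truncateAtEndOfData(string $data, string $marker): string
{
    $position = strpos($data, $marker);

    // Without a marker the data is returned unchanged.
    return false === $position ? $data : substr($data, 0, $position);
}

/** The calling code now reads as a sequence of named steps. */
function prepareHexInput(string $data): string
{
    return truncateAtEndOfData(removeWhitespace($data), '>');
}
```

With these helpers, prepareHexInput("48 65\n6C>zz") yields '48656C', and the body of the calling method shrinks to the steps that are specific to it.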

Commonly applied refactorings include Extract Method.

<?php

/**
 * This file is based on code of tecnickcom/TCPDF PDF library.
 *
 * Original author Nicola Asuni ([email protected]) and
 * contributors (https://github.com/tecnickcom/TCPDF/graphs/contributors).
 *
 * @see https://github.com/tecnickcom/TCPDF
 *
 * Original code was licensed on the terms of the LGPL v3.
 *
 * ------------------------------------------------------------------------------
 *
 * @file This file is part of the PdfParser library.
 *
 * @author  Konrad Abicht <[email protected]>
 * @date    2020-01-06
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser\RawData;

use Exception;

class FilterHelper
{
    protected $availableFilters = ['ASCIIHexDecode', 'ASCII85Decode', 'LZWDecode', 'FlateDecode', 'RunLengthDecode'];

    /**
     * Decode data using the specified filter type.
     *
     * @param string $filter Filter name
     * @param string $data   Data to decode
     *
     * @return Decoded data string
Issue (Bug): The type Smalot\PdfParser\RawData\Decoded was not found. Maybe you did not declare it correctly or list all dependencies?

Here the likely cause is the docblock itself: "@return Decoded data string" names no type, so "Decoded" is read as a class name in the current namespace. Writing "@return string Decoded data string" would resolve it.

The issue could also be caused by a filter entry in the build configuration. If the path has been excluded in your configuration, e.g. excluded_paths: ["lib/*"], you can move it to the dependency path list as follows:

filter:
    dependency_paths: ["lib/*"]

For further information see https://scrutinizer-ci.com/docs/tools/php/php-scrutinizer/#list-dependency-paths
     *
     * @throws Exception if a certain decode function is not implemented yet.
     */
    public function decodeFilter($filter, $data)
    {
        switch ($filter) {
            case 'ASCIIHexDecode':
                return $this->decodeFilterASCIIHexDecode($data);
Issue (Bug, Best Practice): The expression return $this->decodeFilterASCIIHexDecode($data) returns the type string which is incompatible with the documented return type Smalot\PdfParser\RawData\Decoded.
                break;
Issue (Unused Code): break is not strictly necessary here and could be removed.

The break statement is not necessary if it is preceded, for example, by a return statement:

switch ($x) {
    case 1:
        return 'foo';
        break; // This break is not necessary and can be left off.
}

If you would like to keep this construct to be consistent with other case statements, you can safely mark this issue as a false-positive.

            case 'ASCII85Decode':
                return $this->decodeFilterASCII85Decode($data);
Issue (Bug, Best Practice): The expression return $this->decodeFilterASCII85Decode($data) returns the type string which is incompatible with the documented return type Smalot\PdfParser\RawData\Decoded.
                break;

            case 'LZWDecode':
                return $this->decodeFilterLZWDecode($data);
Issue (Bug, Best Practice): The expression return $this->decodeFilterLZWDecode($data) returns the type string which is incompatible with the documented return type Smalot\PdfParser\RawData\Decoded.
                break;

            case 'FlateDecode':
                return $this->decodeFilterFlateDecode($data);
Issue (Bug, Best Practice): The expression return $this->decodeFilterFlateDecode($data) returns the type string which is incompatible with the documented return type Smalot\PdfParser\RawData\Decoded.
                break;

            case 'RunLengthDecode':
                return $this->decodeFilterRunLengthDecode($data);
Issue (Bug, Best Practice): The expression return $this->decodeFilterRunLengthDecode($data) returns the type string which is incompatible with the documented return type Smalot\PdfParser\RawData\Decoded.
                break;

            case 'CCITTFaxDecode':
                throw new Exception('Decode CCITTFaxDecode not implemented yet.');
            case 'JBIG2Decode':
                throw new Exception('Decode JBIG2Decode not implemented yet.');
            case 'DCTDecode':
                throw new Exception('Decode DCTDecode not implemented yet.');
            case 'JPXDecode':
                throw new Exception('Decode JPXDecode not implemented yet.');
            case 'Crypt':
                throw new Exception('Decode Crypt not implemented yet.');
            default:
                return $data;
Issue (Bug, Best Practice): The expression return $data returns the type string which is incompatible with the documented return type Smalot\PdfParser\RawData\Decoded.
        }
    }

    /**
     * ASCIIHexDecode
     *
     * Decodes data encoded in an ASCII hexadecimal representation, reproducing the original binary data.
     *
     * @param string $data Data to decode
     *
     * @return string data string
     */
    public function decodeFilterASCIIHexDecode($data)
    {
        // initialize string to return
        $decoded = '';
Issue (Unused Code): The assignment to $decoded is dead and can be removed.
        // all white-space characters shall be ignored
        $data = preg_replace('/[\s]/', '', $data);
        // check for EOD character: GREATER-THAN SIGN (3Eh)
        $eod = strpos($data, '>');
        if (false !== $eod) {
            // remove EOD and extra data (if any)
            $data = substr($data, 0, $eod);
            $eod = true;
        }
        // get data length
        $data_length = \strlen($data);
        if (0 != ($data_length % 2)) {
            // odd number of hexadecimal digits
            if ($eod) {
                // EOD shall behave as if a 0 (zero) followed the last digit
                $data .= '0';
            } else {
                throw new Exception('decodeFilterASCIIHexDecode: invalid code');
            }
        }
        // check for invalid characters
        if (preg_match('/[^a-fA-F\d]/', $data) > 0) {
            throw new Exception('decodeFilterASCIIHexDecode: invalid code');
        }
        // get one byte of binary data for each pair of ASCII hexadecimal digits
        $decoded = pack('H*', $data);

        return $decoded;
    }

    /**
     * ASCII85Decode
     *
     * Decodes data encoded in an ASCII base-85 representation, reproducing the original binary data.
     *
     * @param string $data Data to decode
     *
     * @return string data string
     */
    public function decodeFilterASCII85Decode($data)
    {
        // initialize string to return
        $decoded = '';
        // all white-space characters shall be ignored
        $data = preg_replace('/[\s]/', '', $data);
        // remove the optional 2-character start sequence <~ (3Ch)(7Eh), only if it leads the data
        if (0 === strpos($data, '<~')) {
            $data = substr($data, 2);
        }
        // check for EOD: 2-character sequence ~> (7Eh)(3Eh)
        $eod = strpos($data, '~>');
        if (false !== $eod) {
            // remove EOD and extra data (if any)
            $data = substr($data, 0, $eod);
        }
        // data length
        $data_length = \strlen($data);
        // check for invalid characters: only ! (21h) to u (75h) and z (7Ah) are allowed
        if (preg_match('/[^\x21-\x75\x7A]/', $data) > 0) {
            throw new Exception('decodeFilterASCII85Decode: invalid code');
        }
        // z sequence
        $zseq = \chr(0).\chr(0).\chr(0).\chr(0);
        // position inside a group of 5 characters (0-4)
        $group_pos = 0;
        $tuple = 0;
        $pow85 = [(85 * 85 * 85 * 85), (85 * 85 * 85), (85 * 85), 85, 1];
        $last_pos = ($data_length - 1);
Issue (Unused Code): The assignment to $last_pos is dead and can be removed.
        // for each byte
        for ($i = 0; $i < $data_length; ++$i) {
            // get char value
            $char = \ord($data[$i]);
            if (122 == $char) { // 'z'
                if (0 == $group_pos) {
                    $decoded .= $zseq;
                } else {
                    throw new Exception('decodeFilterASCII85Decode: invalid code');
                }
            } else {
                // the value represented by a group of 5 characters should never be greater than 2^32 - 1
                $tuple += (($char - 33) * $pow85[$group_pos]);
                if (4 == $group_pos) {
                    $decoded .= \chr($tuple >> 24).\chr($tuple >> 16).\chr($tuple >> 8).\chr($tuple);
                    $tuple = 0;
                    $group_pos = 0;
                } else {
                    ++$group_pos;
                }
            }
        }
        if ($group_pos > 1) {
            $tuple += $pow85[($group_pos - 1)];
        }
        // last tuple (if any)
        switch ($group_pos) {
            case 4:
                $decoded .= \chr($tuple >> 24).\chr($tuple >> 16).\chr($tuple >> 8);
                break;

            case 3:
                $decoded .= \chr($tuple >> 24).\chr($tuple >> 16);
                break;

            case 2:
                $decoded .= \chr($tuple >> 24);
                break;

            case 1:
                throw new Exception('decodeFilterASCII85Decode: invalid code');
        }

        return $decoded;
    }

    /**
     * FlateDecode
     *
     * Decompresses data encoded using the zlib/deflate compression method, reproducing the original text or binary data.
     *
     * @param string $data Data to decode
     *
     * @return string data string
     */
    public function decodeFilterFlateDecode($data)
    {
        // initialize string to return
        $decoded = @gzuncompress($data);
        if (false === $decoded) {
            throw new Exception('decodeFilterFlateDecode: invalid code');
        }

        return $decoded;
    }

    /**
     * LZWDecode
     *
     * Decompresses data encoded using the LZW (Lempel-Ziv-Welch) adaptive compression method, reproducing the original text or binary data.
     *
     * @param string $data Data to decode
     *
     * @return string Data string
     */
    public function decodeFilterLZWDecode($data)
    {
        // initialize string to return
        $decoded = '';
        // data length
        $data_length = \strlen($data);
        // convert string to binary string
        $bitstring = '';
        for ($i = 0; $i < $data_length; ++$i) {
            $bitstring .= sprintf('%08b', \ord($data[$i]));
        }
        // get the number of bits
        $data_length = \strlen($bitstring);
        // initialize code length in bits
        $bitlen = 9;
        // initialize dictionary index
        $dix = 258;
        // initialize the dictionary (with the first 256 entries).
        $dictionary = [];
        for ($i = 0; $i < 256; ++$i) {
            $dictionary[$i] = \chr($i);
        }
        // previous val
        $prev_index = 0;
        // read $bitlen bits at a time until the EOD marker (257) is encountered
        while (($data_length > 0) and (257 != ($index = bindec(substr($bitstring, 0, $bitlen))))) {
            // remove read bits from string
            $bitstring = substr($bitstring, $bitlen);
            // update number of bits
            $data_length -= $bitlen;
            if (256 == $index) { // clear-table marker
                // reset code length in bits
                $bitlen = 9;
                // reset dictionary index
                $dix = 258;
                $prev_index = 256;
                // reset the dictionary (with the first 256 entries).
                $dictionary = [];
                for ($i = 0; $i < 256; ++$i) {
                    $dictionary[$i] = \chr($i);
                }
            } elseif (256 == $prev_index) {
                // first entry
                $decoded .= $dictionary[$index];
                $prev_index = $index;
            } else {
                // check if the index exists in the dictionary
                if ($index < $dix) {
                    // the index exists in the dictionary
                    $decoded .= $dictionary[$index];
                    $dic_val = $dictionary[$prev_index].$dictionary[$index][0];
                    // store current index
                    $prev_index = $index;
                } else {
                    // the index does not exist in the dictionary
                    $dic_val = $dictionary[$prev_index].$dictionary[$prev_index][0];
                    $decoded .= $dic_val;
                }
                // update dictionary
                $dictionary[$dix] = $dic_val;
                ++$dix;
                // change bit length by case
                if (2047 == $dix) {
                    $bitlen = 12;
                } elseif (1023 == $dix) {
                    $bitlen = 11;
                } elseif (511 == $dix) {
                    $bitlen = 10;
                }
            }
        }

        return $decoded;
    }

    /**
     * RunLengthDecode
     *
     * Decompresses data encoded using a byte-oriented run-length encoding algorithm.
     *
     * @param string $data Data to decode
     *
     * @return string
     */
    public function decodeFilterRunLengthDecode($data)
    {
        // initialize string to return
        $decoded = '';
        // data length
        $data_length = \strlen($data);
        $i = 0;
        while ($i < $data_length) {
            // get current byte value
            $byte = \ord($data[$i]);
            if (128 == $byte) {
                // a length value of 128 denotes EOD
                break;
            } elseif ($byte < 128) {
                // if the length byte is in the range 0 to 127,
                // the following length + 1 (1 to 128) bytes shall be copied literally during decompression
                $decoded .= substr($data, ($i + 1), ($byte + 1));
                // move to next block
                $i += ($byte + 2);
            } else {
                // if the length byte is in the range 129 to 255,
                // the following single byte shall be copied 257 - length (2 to 128) times during decompression
                $decoded .= str_repeat($data[($i + 1)], (257 - $byte));
                // move to next block
                $i += 2;
            }
        }

        return $decoded;
    }

    /**
     * @return array list of available filters
     */
    public function getAvailableFilters()
    {
        return $this->availableFilters;
    }
}
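
One possible way to lower the condition count reported for decodeFilter() is to replace the switch with two lookup tables. The sketch below is not the library's implementation: the class name SketchFilterDispatcher, the constructor-injected callables and the pack('H*', ...) stand-in decoder are all assumptions made for illustration. It keeps the behaviour seen in the listing above, where known filters are decoded, the five unimplemented ones throw, and anything else is passed through unchanged.

```php
<?php

// Sketch only: SketchFilterDispatcher and its two tables are hypothetical names.
final class SketchFilterDispatcher
{
    /** @var array<string, callable> filter name => decoder taking and returning a string */
    private $decoders;

    /** @var string[] filters that are recognised but not implemented */
    private $unsupported = ['CCITTFaxDecode', 'JBIG2Decode', 'DCTDecode', 'JPXDecode', 'Crypt'];

    public function __construct(array $decoders)
    {
        $this->decoders = $decoders;
    }

    public function decode(string $filter, string $data): string
    {
        // One table lookup stands in for the five "return" cases.
        if (isset($this->decoders[$filter])) {
            return ($this->decoders[$filter])($data);
        }

        // One membership test stands in for the five "throw" cases.
        if (\in_array($filter, $this->unsupported, true)) {
            throw new Exception('Decode '.$filter.' not implemented yet.');
        }

        // Unknown filters pass the data through, like the default branch.
        return $data;
    }
}

// Simplified stand-in decoder; the real class would register its own methods,
// for example [$this, 'decodeFilterASCIIHexDecode'].
$dispatcher = new SketchFilterDispatcher([
    'ASCIIHexDecode' => function (string $data): string {
        return pack('H*', $data);
    },
]);
```

With this setup, $dispatcher->decode('ASCIIHexDecode', '48656C6C6F') returns 'Hello', and adding a filter becomes a new table entry instead of a new case label.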