Passed
Pull Request — master (#299)
by Konrad
01:46
created

FilterHelper   B

Complexity

Total Complexity 48

Size/Duplication

Total Lines 324
Duplicated Lines 0 %

Importance

Changes 2
Bugs 0 Features 1
Metric Value
eloc 146
c 2
b 0
f 1
dl 0
loc 324
rs 8.5599
wmc 48

7 Methods

Rating   Name   Duplication   Size   Complexity  
B decodeFilter() 0 30 11
C decodeFilterLZWDecode() 0 73 12
C decodeFilterASCII85Decode() 0 75 13
A decodeFilterRunLengthDecode() 0 29 4
A decodeFilterFlateDecode() 0 9 2
A getAvailableFilters() 0 3 1
A decodeFilterASCIIHexDecode() 0 30 5

How to fix   Complexity   

Complex Class

Complex classes like FilterHelper often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields or methods that share the same prefix or suffix.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use FilterHelper, and based on these observations, apply Extract Interface, too.
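As a rough illustration of the Extract Class and Extract Interface advice, the sketch below pulls one decoder out of FilterHelper and dispatches through a small registry instead of a switch. All names here (`StreamFilter`, `RunLengthDecodeFilter`, `FilterRegistry`) are hypothetical and do not exist in PdfParser; the sketch assumes PHP 7.4 or later for typed properties, and it leaves out the filters that throw "not implemented" exceptions.

```php
<?php

// Hypothetical interface: one implementation per PDF stream filter.
interface StreamFilter
{
    public function decode(string $data): string;
}

// Hypothetical class extracted from decodeFilterRunLengthDecode(); same logic.
final class RunLengthDecodeFilter implements StreamFilter
{
    public function decode(string $data): string
    {
        $decoded = '';
        $length = strlen($data);
        $i = 0;
        while ($i < $length) {
            $byte = ord($data[$i]);
            if (128 === $byte) {
                break; // a length byte of 128 marks EOD
            }
            if ($byte < 128) {
                // copy the next $byte + 1 bytes literally
                $decoded .= substr($data, $i + 1, $byte + 1);
                $i += $byte + 2;
            } else {
                // repeat the next byte 257 - $byte times
                $decoded .= str_repeat($data[$i + 1], 257 - $byte);
                $i += 2;
            }
        }

        return $decoded;
    }
}

// Hypothetical dispatcher that would replace the switch in decodeFilter().
final class FilterRegistry
{
    /** @var array<string, StreamFilter> */
    private array $filters = [];

    public function register(string $name, StreamFilter $filter): void
    {
        $this->filters[$name] = $filter;
    }

    public function decode(string $name, string $data): string
    {
        // Unregistered filter names pass the data through unchanged,
        // mirroring the default branch of decodeFilter().
        return isset($this->filters[$name])
            ? $this->filters[$name]->decode($data)
            : $data;
    }
}

$registry = new FilterRegistry();
$registry->register('RunLengthDecode', new RunLengthDecodeFilter());
// "\x02abc" copies 3 literal bytes, "\xFEz" repeats 'z' 3 times, "\x80" is EOD.
echo $registry->decode('RunLengthDecode', "\x02abc\xFEz\x80"), "\n"; // abczzz
```

With this shape each decoder carries its own complexity score, and adding a filter means registering a class rather than growing the switch.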

<?php

/**
 * This file is based on code of tecnickcom/TCPDF PDF library.
 *
 * Original author Nicola Asuni ([email protected]) and
 * contributors (https://github.com/tecnickcom/TCPDF/graphs/contributors).
 *
 * @see https://github.com/tecnickcom/TCPDF
 *
 * Original code was licensed on the terms of the LGPL v3.
 *
 * ------------------------------------------------------------------------------
 *
 * @file This file is part of the PdfParser library.
 *
 * @author  Konrad Abicht <[email protected]>
 * @date    2020-01-06
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser\RawData;

use Exception;

class FilterHelper
{
    protected $availableFilters = ['ASCIIHexDecode', 'ASCII85Decode', 'LZWDecode', 'FlateDecode', 'RunLengthDecode'];

    /**
     * Decode data using the specified filter type.
     *
     * @param string $filter Filter name
     * @param string $data   Data to decode
     *
     * @return string Decoded data string
     *
     * @throws Exception if a certain decode function is not implemented yet
     */
    public function decodeFilter($filter, $data)
    {
        switch ($filter) {
            case 'ASCIIHexDecode':
                return $this->decodeFilterASCIIHexDecode($data);

            case 'ASCII85Decode':
                return $this->decodeFilterASCII85Decode($data);

            case 'LZWDecode':
                return $this->decodeFilterLZWDecode($data);

            case 'FlateDecode':
                return $this->decodeFilterFlateDecode($data);

            case 'RunLengthDecode':
                return $this->decodeFilterRunLengthDecode($data);

            case 'CCITTFaxDecode':
                throw new Exception('Decode CCITTFaxDecode not implemented yet.');
            case 'JBIG2Decode':
                throw new Exception('Decode JBIG2Decode not implemented yet.');
            case 'DCTDecode':
                throw new Exception('Decode DCTDecode not implemented yet.');
            case 'JPXDecode':
                throw new Exception('Decode JPXDecode not implemented yet.');
            case 'Crypt':
                throw new Exception('Decode Crypt not implemented yet.');
            default:
                return $data;
        }
    }

    /**
     * ASCIIHexDecode
     *
     * Decodes data encoded in an ASCII hexadecimal representation, reproducing the original binary data.
     *
     * @param string $data Data to decode
     *
     * @return string data string
     */
    protected function decodeFilterASCIIHexDecode($data)
    {
        // all white-space characters shall be ignored
        $data = preg_replace('/[\s]/', '', $data);
        // check for EOD character: GREATER-THAN SIGN (3Eh)
        $eod = strpos($data, '>');
        if (false !== $eod) {
            // remove EOD and extra data (if any)
            $data = substr($data, 0, $eod);
            $eod = true;
        }
        // get data length
        $data_length = \strlen($data);
        if (0 != ($data_length % 2)) {
            // odd number of hexadecimal digits
            if ($eod) {
                // EOD shall behave as if a 0 (zero) followed the last digit,
                // so the padding digit is appended, not inserted before it
                $data .= '0';
            } else {
                throw new Exception('decodeFilterASCIIHexDecode: invalid code');
            }
        }
        // check for invalid characters
        if (preg_match('/[^a-fA-F\d]/', $data) > 0) {
            throw new Exception('decodeFilterASCIIHexDecode: invalid code');
        }
        // get one byte of binary data for each pair of ASCII hexadecimal digits
        $decoded = pack('H*', $data);

        return $decoded;
    }

    /**
     * ASCII85Decode
     *
     * Decodes data encoded in an ASCII base-85 representation, reproducing the original binary data.
     *
     * @param string $data Data to decode
     *
     * @return string data string
     */
    protected function decodeFilterASCII85Decode($data)
    {
        // initialize string to return
        $decoded = '';
        // all white-space characters shall be ignored
        $data = preg_replace('/[\s]/', '', $data);
        // remove the optional start sequence <~ (3Ch)(7Eh), but only when it leads the data
        if (0 === strpos($data, '<~')) {
            $data = substr($data, 2);
        }
        // check for EOD: 2-character sequence ~> (7Eh)(3Eh)
        $eod = strpos($data, '~>');
        if (false !== $eod) {
            // remove EOD and extra data (if any)
            $data = substr($data, 0, $eod);
        }
        // data length
        $data_length = \strlen($data);
        // check for invalid characters: only '!' to 'u' (21h-75h) and 'z' (7Ah) are allowed
        if (preg_match('/[^\x21-\x75\x7A]/', $data) > 0) {
            throw new Exception('decodeFilterASCII85Decode: invalid code');
        }
        // z sequence
        $zseq = \chr(0).\chr(0).\chr(0).\chr(0);
        // position inside a group of 5 characters (0-4)
        $group_pos = 0;
        $tuple = 0;
        $pow85 = [(85 * 85 * 85 * 85), (85 * 85 * 85), (85 * 85), 85, 1];

        // for each byte
        for ($i = 0; $i < $data_length; ++$i) {
            // get char value
            $char = \ord($data[$i]);
            if (122 == $char) { // 'z'
                if (0 == $group_pos) {
                    $decoded .= $zseq;
                } else {
                    throw new Exception('decodeFilterASCII85Decode: invalid code');
                }
            } else {
                // the value represented by a group of 5 characters should never be greater than 2^32 - 1
                $tuple += (($char - 33) * $pow85[$group_pos]);
                if (4 == $group_pos) {
                    $decoded .= \chr($tuple >> 24).\chr($tuple >> 16).\chr($tuple >> 8).\chr($tuple);
                    $tuple = 0;
                    $group_pos = 0;
                } else {
                    ++$group_pos;
                }
            }
        }
        if ($group_pos > 1) {
            $tuple += $pow85[($group_pos - 1)];
        }
        // last tuple (if any)
        switch ($group_pos) {
            case 4:
                $decoded .= \chr($tuple >> 24).\chr($tuple >> 16).\chr($tuple >> 8);
                break;

            case 3:
                $decoded .= \chr($tuple >> 24).\chr($tuple >> 16);
                break;

            case 2:
                $decoded .= \chr($tuple >> 24);
                break;

            case 1:
                // a final group of a single character cannot encode any byte
                throw new Exception('decodeFilterASCII85Decode: invalid code');
        }

        return $decoded;
    }

    /**
     * FlateDecode
     *
     * Decompresses data encoded using the zlib/deflate compression method, reproducing the original text or binary data.
     *
     * @param string $data Data to decode
     *
     * @return string data string
     */
    protected function decodeFilterFlateDecode($data)
    {
        // decompress the zlib stream; the warning is suppressed because failure is reported below
        $decoded = @gzuncompress($data);
        if (false === $decoded) {
            throw new Exception('decodeFilterFlateDecode: invalid code');
        }

        return $decoded;
    }

    /**
     * LZWDecode
     *
     * Decompresses data encoded using the LZW (Lempel-Ziv-Welch) adaptive compression method, reproducing the original text or binary data.
     *
     * @param string $data Data to decode
     *
     * @return string Data string
     */
    protected function decodeFilterLZWDecode($data)
    {
        // initialize string to return
        $decoded = '';
        // data length
        $data_length = \strlen($data);
        // convert string to binary string
        $bitstring = '';
        for ($i = 0; $i < $data_length; ++$i) {
            $bitstring .= sprintf('%08b', \ord($data[$i]));
        }
        // get the number of bits
        $data_length = \strlen($bitstring);
        // initialize code length in bits
        $bitlen = 9;
        // initialize dictionary index
        $dix = 258;
        // initialize the dictionary (with the first 256 entries).
        $dictionary = [];
        for ($i = 0; $i < 256; ++$i) {
            $dictionary[$i] = \chr($i);
        }
        // previous val
        $prev_index = 0;
        // until we encounter the EOD marker (257), read $bitlen bits at a time
        while (($data_length > 0) && (257 != ($index = bindec(substr($bitstring, 0, $bitlen))))) {
            // remove read bits from string
            $bitstring = substr($bitstring, $bitlen);
            // update number of bits
            $data_length -= $bitlen;
            if (256 == $index) { // clear-table marker
                // reset code length in bits
                $bitlen = 9;
                // reset dictionary index
                $dix = 258;
                $prev_index = 256;
                // reset the dictionary (with the first 256 entries).
                $dictionary = [];
                for ($i = 0; $i < 256; ++$i) {
                    $dictionary[$i] = \chr($i);
                }
            } elseif (256 == $prev_index) {
                // first entry
                $decoded .= $dictionary[$index];
                $prev_index = $index;
            } else {
                // check if the index exists in the dictionary
                if ($index < $dix) {
                    // the index exists in the dictionary
                    $decoded .= $dictionary[$index];
                    $dic_val = $dictionary[$prev_index].$dictionary[$index][0];
                    // store current index
                    $prev_index = $index;
                } else {
                    // the index does not exist in the dictionary yet
                    $dic_val = $dictionary[$prev_index].$dictionary[$prev_index][0];
                    $decoded .= $dic_val;
                }
                // update dictionary
                $dictionary[$dix] = $dic_val;
                ++$dix;
                // widen the code length once the dictionary reaches each threshold
                if (2047 == $dix) {
                    $bitlen = 12;
                } elseif (1023 == $dix) {
                    $bitlen = 11;
                } elseif (511 == $dix) {
                    $bitlen = 10;
                }
            }
        }

        return $decoded;
    }

    /**
     * RunLengthDecode
     *
     * Decompresses data encoded using a byte-oriented run-length encoding algorithm.
     *
     * @param string $data Data to decode
     *
     * @return string
     */
    protected function decodeFilterRunLengthDecode($data)
    {
        // initialize string to return
        $decoded = '';
        // data length
        $data_length = \strlen($data);
        $i = 0;
        while ($i < $data_length) {
            // get current byte value
            $byte = \ord($data[$i]);
            if (128 == $byte) {
                // a length value of 128 denotes EOD
                break;
            } elseif ($byte < 128) {
                // if the length byte is in the range 0 to 127,
                // the following length + 1 (1 to 128) bytes shall be copied literally during decompression
                $decoded .= substr($data, ($i + 1), ($byte + 1));
                // move to next block
                $i += ($byte + 2);
            } else {
                // if the length byte is in the range 129 to 255,
                // the following single byte shall be copied 257 - length (2 to 128) times during decompression
                $decoded .= str_repeat($data[($i + 1)], (257 - $byte));
                // move to next block
                $i += 2;
            }
        }

        return $decoded;
    }

    /**
     * @return array list of available filters
     */
    public function getAvailableFilters()
    {
        return $this->availableFilters;
    }
}
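The base-85 arithmetic in decodeFilterASCII85Decode() can be checked by hand on a single group. The sketch below is only an illustration: `decodeAscii85Group()` is a hypothetical helper that is not part of FilterHelper, and it assumes a complete five-character group (no `z` shortcut, no partial final group, no validation).

```php
<?php

// Hypothetical helper: decode exactly one full 5-character ASCII85 group.
function decodeAscii85Group(string $group): string
{
    $tuple = 0;
    for ($i = 0; $i < 5; ++$i) {
        // each character is a base-85 digit offset by 33 ('!')
        $tuple = $tuple * 85 + (ord($group[$i]) - 33);
    }

    // emit the 32-bit value as 4 bytes, most significant byte first
    return pack('N', $tuple);
}

// '9jqo^' has digits 24, 73, 80, 78, 61, which sum to 1298230816 = 0x4D616E20.
echo decodeAscii85Group('9jqo^'), "\n"; // "Man " (with a trailing space)
```

The class method accumulates the same value with a table of precomputed powers of 85 and extracts the bytes with shifts; `pack('N', ...)` is used here only to keep the example short.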