Passed
Pull Request — master (#562)
by Konrad
04:47 queued 02:15
created

Document (rating A)

Complexity

Total Complexity 42

Size/Duplication

Total Lines 258
Duplicated Lines 0 %

Test Coverage

Coverage 94%

Importance

Changes 10
Bugs 3 Features 0
Metric  Value
eloc    85
c       10
b       3
f       0
dl      0
loc     258
ccs     94
cts     100
cp      0.94
rs      9.0399
wmc     42

17 Methods

Rating  Name                Duplication  Size  Complexity
A       getObjectsByType()  0            15    4
A       getPages()          0            35    6
A       buildDictionary()   0            25    6
A       init()              0            10    2
A       getObjects()        0            3     1
A       __construct()       0            3     1
A       setObjects()        0            5     1
A       getObjectById()     0            7     2
A       getFonts()          0            3     1
A       buildDetails()      0            25    5
A       getFirstFont()      0            8     2
A       hasObjectsByType()  0            3     1
A       getDictionary()     0            3     1
A       getDetails()        0            3     1
A       getTrailer()        0            3     1
A       setTrailer()        0            3     1
A       getText()           0            23    6

How to fix: Complexity

Complex Class

Complex classes like Document often do many different things. To break such a class down, first identify a cohesive component within it. A common way to find one is to look for fields and methods that share a prefix or suffix. In Document, for example, getObjects(), setObjects(), getObjectsByType() and hasObjectsByType() all share the stem Objects.
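The prefix/suffix heuristic can be applied mechanically by grouping method names on the word that follows a leading verb. The sketch below is only an illustration: the function name groupMethodsByStem(), the verb list and the grouping rule are assumptions made for this example, not part of PdfParser or of this report. The method names are the 17 listed in the table above.

```php
<?php

/**
 * Group method names by the first CamelCase word after a leading verb.
 * The verb list (get/set/has/build) is an assumption for this sketch.
 *
 * @param string[] $methods
 *
 * @return array<string, string[]>
 */
function groupMethodsByStem(array $methods): array
{
    $groups = [];

    foreach ($methods as $method) {
        // Drop a leading verb, if there is one.
        $rest = preg_replace('/^(get|set|has|build)/', '', $method);

        // Take the first CamelCase word of what remains, e.g. "Objects".
        if (1 !== preg_match('/^[A-Z][a-z]*/', $rest, $match)) {
            continue; // init() and __construct() have no stem to group by
        }

        $groups[$match[0]][] = $method;
    }

    return $groups;
}

$groups = groupMethodsByStem([
    'getObjectsByType', 'getPages', 'buildDictionary', 'init', 'getObjects',
    '__construct', 'setObjects', 'getObjectById', 'getFonts', 'buildDetails',
    'getFirstFont', 'hasObjectsByType', 'getDictionary', 'getDetails',
    'getTrailer', 'setTrailer', 'getText',
]);

print_r($groups);
```

For these 17 methods the largest group is Objects (getObjectsByType(), getObjects(), setObjects(), hasObjectsByType()), which suggests the object registry as a first candidate for extraction.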

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
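As a rough illustration of Extract Class, the type dictionary that Document builds in buildDictionary() and queries in getObjectsByType() and hasObjectsByType() could move into a class of its own. Everything named below (ObjectIndex, find(), has() and the two callables) is hypothetical and does not exist in PdfParser. Plain arrays stand in for PDFObject instances so that the sketch runs on its own, and the null checks are strict, which differs from the loose comparisons used in Document.

```php
<?php

/**
 * Hypothetical class extracted from Document: indexes objects by type
 * and subtype. Names and signatures are illustrative only.
 */
final class ObjectIndex
{
    /** @var array<string, array> type => ['all' => [...], 'subtype' => [...]] */
    private $byType = [];

    /**
     * @param array    $objects   id => object
     * @param callable $typeOf    returns the object's type, or null
     * @param callable $subtypeOf returns the object's subtype, or null
     */
    public function __construct(array $objects, callable $typeOf, callable $subtypeOf)
    {
        foreach ($objects as $id => $object) {
            $type = $typeOf($object);
            if (null === $type) {
                continue; // untyped objects are not indexed
            }

            $this->byType[$type]['all'][$id] = $object;

            $subtype = $subtypeOf($object);
            if (null !== $subtype) {
                $this->byType[$type]['subtype'][$subtype][$id] = $object;
            }
        }
    }

    public function find(string $type, ?string $subtype = null): array
    {
        if (null === $subtype) {
            return $this->byType[$type]['all'] ?? [];
        }

        return $this->byType[$type]['subtype'][$subtype] ?? [];
    }

    public function has(string $type, ?string $subtype = null): bool
    {
        return [] !== $this->find($type, $subtype);
    }
}

// Usage with plain arrays standing in for PDF objects.
$index = new ObjectIndex(
    [
        '1_0' => ['Type' => 'Font', 'Subtype' => 'Type1'],
        '2_0' => ['Type' => 'Font', 'Subtype' => 'TrueType'],
        '3_0' => ['Type' => 'Page'],
        '4_0' => [],
    ],
    function (array $object) { return $object['Type'] ?? null; },
    function (array $object) { return $object['Subtype'] ?? null; }
);

var_dump($index->has('Font', 'Type1')); // bool(true)
var_dump($index->has('Catalog'));       // bool(false)
```

With such a class in place, Document would keep one ObjectIndex field and delegate its lookups to it, leaving page and text extraction as the remaining responsibilities.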

While breaking up the class, it is a good idea to analyze how other classes use Document, and based on these observations, apply Extract Interface, too.
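A minimal sketch of Extract Interface, assuming that callers mostly need the lookup methods: the three signatures below are copied from Document, while the interface name ObjectLookup, the stand-in class InMemoryLookup and the client function countFonts() are assumptions made for this example.

```php
<?php

/**
 * Hypothetical interface; the three signatures are copied from Document.
 */
interface ObjectLookup
{
    public function getObjectById(string $id);

    public function getObjectsByType(string $type, ?string $subtype = null): array;

    public function hasObjectsByType(string $type, ?string $subtype = null): bool;
}

/**
 * Array-backed stand-in, used only to exercise the interface here.
 * Each object is a plain array with optional 'Type' and 'Subtype' keys.
 */
final class InMemoryLookup implements ObjectLookup
{
    /** @var array<string, array> */
    private $objects;

    public function __construct(array $objects)
    {
        $this->objects = $objects;
    }

    public function getObjectById(string $id)
    {
        return $this->objects[$id] ?? null;
    }

    public function getObjectsByType(string $type, ?string $subtype = null): array
    {
        return array_filter(
            $this->objects,
            function (array $object) use ($type, $subtype): bool {
                if (($object['Type'] ?? null) !== $type) {
                    return false;
                }

                return null === $subtype || ($object['Subtype'] ?? null) === $subtype;
            }
        );
    }

    public function hasObjectsByType(string $type, ?string $subtype = null): bool
    {
        return [] !== $this->getObjectsByType($type, $subtype);
    }
}

/**
 * A client that depends on the interface rather than on a concrete class.
 */
function countFonts(ObjectLookup $lookup): int
{
    return count($lookup->getObjectsByType('Font'));
}

$lookup = new InMemoryLookup([
    '1_0' => ['Type' => 'Font', 'Subtype' => 'Type1'],
    '2_0' => ['Type' => 'Font', 'Subtype' => 'TrueType'],
    '3_0' => ['Type' => 'Page'],
]);

echo countFonts($lookup), "\n"; // 2
```

Because the signatures match, Document could in principle declare that it implements such an interface without further changes; which methods belong in it depends on how other classes actually use Document.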

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 *
 * @date    2017-01-03
 *
 * @license LGPLv3
 *
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

/**
 * Technical references :
 * - http://www.mactech.com/articles/mactech/Vol.15/15.09/PDFIntro/index.html
 * - http://framework.zend.com/issues/secure/attachment/12512/Pdf.php
 * - http://www.php.net/manual/en/ref.pdf.php#74211
 * - http://cpansearch.perl.org/src/JV/PostScript-Font-1.10.02/lib/PostScript/ISOLatin1Encoding.pm
 * - http://cpansearch.perl.org/src/JV/PostScript-Font-1.10.02/lib/PostScript/ISOLatin9Encoding.pm
 * - http://cpansearch.perl.org/src/JV/PostScript-Font-1.10.02/lib/PostScript/StandardEncoding.pm
 * - http://cpansearch.perl.org/src/JV/PostScript-Font-1.10.02/lib/PostScript/WinAnsiEncoding.pm
 *
 * Class Document
 */
class Document
{
    /**
     * @var PDFObject[]
     */
    protected $objects = [];

    /**
     * @var array
     */
    protected $dictionary = [];

    /**
     * @var Header
     */
    protected $trailer = null;

    /**
     * @var array
     */
    protected $details = null;

    public function __construct()
    {
        $this->trailer = new Header([], $this);
    }

    public function init()
    {
        $this->buildDictionary();

        $this->buildDetails();

        // Propagate init to objects.
        foreach ($this->objects as $object) {
            $object->getHeader()->init();
            $object->init();
        }
    }

    /**
     * Build dictionary based on type header field.
     */
    protected function buildDictionary()
    {
        // Build dictionary.
        $this->dictionary = [];

        foreach ($this->objects as $id => $object) {
            // Cache objects by type and subtype
            $type = $object->getHeader()->get('Type')->getContent();

            if (null != $type) {
                if (!isset($this->dictionary[$type])) {
                    $this->dictionary[$type] = [
                        'all' => [],
                        'subtype' => [],
                    ];
                }

                $this->dictionary[$type]['all'][$id] = $object;

                $subtype = $object->getHeader()->get('Subtype')->getContent();
                if (null != $subtype) {
                    if (!isset($this->dictionary[$type]['subtype'][$subtype])) {
                        $this->dictionary[$type]['subtype'][$subtype] = [];
                    }
                    $this->dictionary[$type]['subtype'][$subtype][$id] = $object;
                }
            }
        }
    }

    /**
     * Build details array.
     */
    protected function buildDetails()
    {
        // Build details array.
        $details = [];

        // Extract document info
        if ($this->trailer->has('Info')) {
            /** @var PDFObject $info */
            $info = $this->trailer->get('Info');
            // This could be an ElementMissing object, so we need to check for
            // the getHeader method first.
            if (null !== $info && method_exists($info, 'getHeader')) {
                $details = $info->getHeader()->getDetails();
            }
        }

        // Retrieve the page count
        try {
            $pages = $this->getPages();
            $details['Pages'] = \count($pages);
        } catch (\Exception $e) {
            $details['Pages'] = 0;
        }

        $this->details = $details;
    }

    public function getDictionary(): array
    {
        return $this->dictionary;
    }

    /**
     * @param PDFObject[] $objects
     */
    public function setObjects($objects = [])
    {
        $this->objects = (array) $objects;

        $this->init();
    }

    /**
     * @return PDFObject[]
     */
    public function getObjects()
    {
        return $this->objects;
    }

    /**
     * @return PDFObject|Font|Page|Element|null
     */
    public function getObjectById(string $id)
    {
        if (isset($this->objects[$id])) {
            return $this->objects[$id];
        }

        return null;
    }

    public function hasObjectsByType(string $type, ?string $subtype = null): bool
    {
        return 0 < \count($this->getObjectsByType($type, $subtype));
    }

    public function getObjectsByType(string $type, ?string $subtype = null): array
    {
        if (!isset($this->dictionary[$type])) {
            return [];
        }

        // Review issue (bug): it seems like you are loosely comparing $subtype
        // of type null|string against null; this is ambiguous if the string
        // can be empty. Consider using a strict comparison !== instead.
        if (null != $subtype) {
            if (!isset($this->dictionary[$type]['subtype'][$subtype])) {
                return [];
            }

            return $this->dictionary[$type]['subtype'][$subtype];
        }

        return $this->dictionary[$type]['all'];
    }

    /**
     * @return Font[]
     */
    public function getFonts()
    {
        return $this->getObjectsByType('Font');
    }

    public function getFirstFont(): ?Font
    {
        $fonts = $this->getFonts();
        if ([] === $fonts) {
            return null;
        }

        return reset($fonts);
    }

    /**
     * @return Page[]
     *
     * @throws \Exception
     */
    public function getPages()
    {
        if ($this->hasObjectsByType('Catalog')) {
            // Search for catalog to list pages.
            $catalogues = $this->getObjectsByType('Catalog');
            $catalogue = reset($catalogues);

            /** @var Pages $object */
            $object = $catalogue->get('Pages');
            if (method_exists($object, 'getPages')) {
                return $object->getPages(true);
            }
        }

        if ($this->hasObjectsByType('Pages')) {
            // Search for pages to list kids.
            $pages = [];

            /** @var Pages[] $objects */
            $objects = $this->getObjectsByType('Pages');
            foreach ($objects as $object) {
                $pages = array_merge($pages, $object->getPages(true));
            }

            return $pages;
        }

        if ($this->hasObjectsByType('Page')) {
            // Search for 'page' (unordered pages).
            $pages = $this->getObjectsByType('Page');

            return array_values($pages);
        }

        throw new \Exception('Missing catalog.');
    }

    public function getText(?int $pageLimit = null): string
    {
        $texts = [];
        $pages = $this->getPages();

        // Only use the first X number of pages if $pageLimit is set and numeric.
        if (\is_int($pageLimit) && 0 < $pageLimit) {
            $pages = \array_slice($pages, 0, $pageLimit);
        }

        foreach ($pages as $index => $page) {
            /**
             * In some cases, the $page variable may be null.
             */
            if (null === $page) {
                continue;
            }
            if ($text = trim($page->getText())) {
                $texts[] = $text;
            }
        }

        return implode("\n\n", $texts);
    }

    public function getTrailer(): Header
    {
        return $this->trailer;
    }

    public function setTrailer(Header $trailer)
    {
        $this->trailer = $trailer;
    }

    public function getDetails(): array
    {
        return $this->details;
    }
}
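The loose comparison flagged in getObjectsByType() can be illustrated outside the class. The function below is a hypothetical stand-alone variant (objectsByType() is not part of PdfParser) that takes a dictionary shaped like the one buildDictionary() produces and uses a strict null check. Note that this changes behavior for an empty-string subtype: the loose check treats '' like null and returns every object of the type, while the strict check looks up the subtype '' and returns an empty array. Which of the two is intended is a decision for the maintainers.

```php
<?php

/**
 * Stand-alone variant of the type lookup with a strict null check.
 * $dictionary mirrors the shape built by buildDictionary():
 * type => ['all' => [...], 'subtype' => [subtype => [...]]].
 */
function objectsByType(array $dictionary, string $type, ?string $subtype = null): array
{
    if (!isset($dictionary[$type])) {
        return [];
    }

    // Strict check: only null means "no subtype filter".
    if (null !== $subtype) {
        return $dictionary[$type]['subtype'][$subtype] ?? [];
    }

    return $dictionary[$type]['all'];
}

// Sample data; strings stand in for PDF objects.
$dictionary = [
    'Font' => [
        'all' => ['1_0' => 'fontA', '2_0' => 'fontB'],
        'subtype' => ['Type1' => ['1_0' => 'fontA']],
    ],
];

// Why the loose check is ambiguous: '' compares equal to null.
var_dump('' == null);  // bool(true)
var_dump('' === null); // bool(false)

var_dump(count(objectsByType($dictionary, 'Font')));     // int(2)
var_dump(count(objectsByType($dictionary, 'Font', ''))); // int(0)
```

The same loose pattern (null != $type, null != $subtype) appears in buildDictionary(), so a change here is worth reviewing together with that method.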