Reader   B

Complexity

Total Complexity 46

Size/Duplication

Total Lines 486
Duplicated Lines 0 %

Coupling/Cohesion

Components 2
Dependencies 6

Importance

Changes 0
Metric Value
dl 0
loc 486
rs 8.72
c 0
b 0
f 0
wmc 46
lcom 2
cbo 6

23 Methods

Rating  Name                             Duplication  Size  Complexity
A       __construct()                    0            6     1
A       getFlavor()                      0            4     1
A       hasHeader()                      0            4     1
A       current()                        0            4     1
A       next()                           0            7     1
A       valid()                          0            4     1
A       key()                            0            4     1
A       rewind()                         0            12    2
A       header()                         0            4     1
A       addFilter()                      0            6     1
A       addFilters()                     0            8     2
A       filter()                         0            4     1
A       toArray()                        0            6     1
A       setFlavor()                      0            17    4
A       setSource()                      0            9     2
B       load()                           0            20    6
B       readLine()                       0            23    6
B       inQuotedString()                 0            21    6
A       replaceQuotedSpecialChars()      0            9     1
A       undoReplaceQuotedSpecialChars()  0            9     2
A       unQuote()                        0            8     2
A       unEscape()                       0            4     1
A       parse()                          0            13    1

How to fix: Complexity

Complex Class

Complex classes like Reader often do a lot of different things. To break such a class down, identify a cohesive component within it. A common approach is to look for fields and methods that share the same prefixes or suffixes. You can also look at the cohesion graph to spot unconnected or weakly connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
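
As a rough illustration of Extract Class applied to the listing below: the `$open` and `$escape` fields are only touched by `inQuotedString()`, so they look like one candidate component. The sketch moves them into a separate class. `QuoteTracker` and its `feed()` method are hypothetical names chosen for this example and are not part of CSVelte.

```php
<?php

/**
 * Hypothetical helper extracted from Reader (illustrative only): tracks
 * whether the text fed to it so far ends inside an open quoted string.
 */
final class QuoteTracker
{
    /** @var bool True while inside a quoted string */
    private $open = false;

    /** @var bool True if the previous character was the escape character */
    private $escape = false;

    /** @var string */
    private $quoteChar;

    /** @var string */
    private $escapeChar;

    public function __construct($quoteChar = '"', $escapeChar = '\\')
    {
        $this->quoteChar  = $quoteChar;
        $this->escapeChar = $escapeChar;
    }

    /**
     * Feed one physical line and report whether a quoted string is still open.
     * State carries over between calls, as it does in Reader::inQuotedString().
     */
    public function feed($line)
    {
        $length = strlen($line);
        for ($i = 0; $i < $length; $i++) {
            $c = $line[$i];
            if ($this->escape) {
                // The previous character escaped this one, so skip it.
                $this->escape = false;
                continue;
            }
            $this->escape = ($c === $this->escapeChar);
            if ($c === $this->quoteChar) {
                $this->open = !$this->open;
            }
        }

        return $this->open;
    }
}

// Usage: a field that spans two physical lines.
$tracker = new QuoteTracker();
var_dump($tracker->feed('1,"first half of a')); // quote opened, not yet closed
var_dump($tracker->feed('two-line field",3'));  // closing quote seen
```

With something like this in place, `Reader::readLine()` could keep reading lines while `feed()` returns true, and the two state fields would no longer live on `Reader` itself.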

While breaking up the class, it is a good idea to analyze how other classes use Reader, and based on these observations, apply Extract Interface, too.
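
One way to read the Extract Interface advice: callers that only iterate rows and look at the header do not need the filter or flavor methods. The sketch below names that narrower contract and shows a consumer typed against it. `RowSource`, `ArrayRowSource`, and `countRows()` are hypothetical names used for illustration; none of them exist in CSVelte, and the in-memory class merely stands in for the real `Reader`.

```php
<?php

/**
 * Hypothetical narrow contract: something that yields rows and may have a header.
 */
interface RowSource extends Iterator
{
    /** @return array|null Column names, or null when there is no header */
    public function header();
}

/**
 * Tiny in-memory stand-in used here instead of the real Reader.
 */
final class ArrayRowSource extends ArrayIterator implements RowSource
{
    /** @var array|null */
    private $header;

    public function __construct(array $rows, $header = null)
    {
        parent::__construct($rows);
        $this->header = $header;
    }

    public function header()
    {
        return $this->header;
    }
}

/** Depends only on the interface, not on a concrete reader class. */
function countRows(RowSource $source)
{
    $count = 0;
    foreach ($source as $row) {
        $count++;
    }

    return $count;
}

$source = new ArrayRowSource([['a', 1], ['b', 2]], ['name', 'n']);
echo countRows($source), "\n";
```

Code written against the interface can be tested with a stand-in like `ArrayRowSource`, without a stream or a flavor.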

<?php

/*
 * CSVelte: Slender, elegant CSV for PHP
 * Inspired by Python's CSV module and Frictionless Data and the W3C's CSV
 * standardization efforts, CSVelte was written in an effort to take all the
 * suck out of working with CSV.
 *
 * @version   {version}
 * @copyright Copyright (c) 2016 Luke Visinoni <[email protected]>
 * @author    Luke Visinoni <[email protected]>
 * @license   https://github.com/deni-zen/csvelte/blob/master/LICENSE The MIT License (MIT)
 */
namespace CSVelte;

use CSVelte\Contract\Streamable;
use CSVelte\Exception\EndOfFileException;
use CSVelte\Reader\FilteredIterator as FilteredReader;
use CSVelte\Table\HeaderRow;
use CSVelte\Table\Row;

use function CSVelte\streamize;

/**
 * CSV Reader.
 *
 * Reads CSV data from any object that implements CSVelte\Contract\Streamable.
 *
 * @package CSVelte
 * @subpackage Reader
 *
 * @since v0.1
 *
 * @todo Also, is there any way to do some kind of caching or something? Probably
 *     not but if you could that would be a cool feature...
 */
class Reader implements \Iterator
{
    const PLACEHOLDER_DELIM   = '[=[__DLIM__]=]';
    const PLACEHOLDER_NEWLINE = '[=[__NWLN__]=]';

    /**
     * This class supports any source of input that implements this interface.
     * This way I can read from local files, streams, FTP, any class that implements
     * the "Streamable" interface.
     *
     * @var Contract\Streamable
     */
    protected $source;

    /**
     * @var Flavor The "flavor" or format of the CSV being read
     */
    protected $flavor;

    /**
     * @var Table\Row|null Row currently loaded into memory
     */
    protected $current;

    /**
     * @var int The current line being read (from input source)
     */
    protected $line = 0;

    /**
     * @var Table\HeaderRow The header row (if any)
     */
    protected $header;

    /**
     * @var array An array of callback functions
     */
    protected $filters = [];

    /**
     * @var bool True if current line ended while inside a quoted string
     */
    protected $open = false;

    /**
     * @var bool True if last character read was the escape character
     */
    protected $escape = false;

    /**
     * Reader Constructor.
     * Initializes a reader object using an input source and optionally a flavor.
     *
     * @param mixed             $input  The source of our CSV data
     * @param Flavor|array|null $flavor The "flavor" or format specification object
     */
    public function __construct($input, $flavor = null)
    {
        $this->setSource($input)
             ->setFlavor($flavor)
             ->rewind();
    }

    /**
     * Flavor Getter.
     *
     * Retrieve the "flavor" object being used by the reader.
     *
     * @return Flavor
     */
    public function getFlavor()
    {
        return $this->flavor;
    }

    /**
     * Check if flavor object defines header.
     *
     * Determine whether or not the input source's CSV data contains a header
     * row. Unless you explicitly specify so within your Flavor object,
     * this method is a logical best guess. The CSV format does not
     * provide metadata of any kind and therefore does not provide this info.
     *
     * @return bool True if the input source has a header row (or, to be more
     *              accurate, if the flavor SAYS it has a header row)
     *
     * @todo Rather than always reading in Taster::SAMPLE_SIZE, read in ten lines at a time until
     *     whatever method it is has enough data to make a reliable decision/guess
     */
    public function hasHeader()
    {
        return $this->getFlavor()->header;
    }

    /**
     * Retrieve current row.
     *
     * @return Table\Row The current row
     */
    public function current()
    {
        return $this->current;
    }

    /**
     * Advance to the next row.
     *
     * @return Table\Row|null The current row (if there is one)
     */
    public function next()
    {
        $this->current = null;
        $this->load();

        return $this->current;
    }

    /**
     * Determine if current position has valid row.
     *
     * @return bool True if current row is valid
     */
    public function valid()
    {
        return (bool) $this->current;
    }

    /**
     * Retrieve current row key (line number).
     *
     * @return int The current line number
     */
    public function key()
    {
        return $this->line;
    }

    /**
     * Rewind to the beginning of the dataset.
     *
     * @return Table\Row|null The current row
     */
    public function rewind()
    {
        $this->line = 0;
        $this->source->rewind();
        $this->current = null;
        $this->load();
        if ($this->hasHeader()) {
            $this->next();
        }

        return $this->current();
    }

    /**
     * Retrieve header row.
     *
     * @return Table\HeaderRow The header row if there is one
     */
    public function header()
    {
        return $this->header;
    }

    /**
     * Add anonymous function as filter.
     *
     * Add an anonymous function that accepts the current row as its only argument.
     * Return true from the function to keep that row, false otherwise.
     *
     * @param callable $filter An anonymous function to filter out rows by certain criteria
     *
     * @return $this
     */
    public function addFilter(callable $filter)
    {
        $this->filters[] = $filter;

        return $this;
    }

    /**
     * Add multiple filters at once.
     *
     * Add an array of anonymous functions to filter out certain rows.
     *
     * @param array $filters An array of anonymous functions
     *
     * @return $this
     */
    public function addFilters(array $filters)
    {
        foreach ($filters as $filter) {
            $this->addFilter($filter);
        }

        return $this;
    }

    /**
     * Returns an iterator with rows from user-supplied filter functions removed.
     *
     * @return FilteredReader An iterator with filtered rows
     */
    public function filter()
    {
        return new FilteredReader($this, $this->filters);
    }

    /**
     * Retrieve the contents of the dataset as an array of arrays.
     *
     * @return array An array of arrays of CSV content
     */
    public function toArray()
    {
        return array_map(function ($row) {
            return $row->toArray();
        }, iterator_to_array($this));
    }

    /**
     * Set the flavor.
     *
     * Set the ``CSVelte\Flavor`` object, used to determine CSV format.
     *
     * @param Flavor|array|null $flavor Either an array or a flavor object
     *
     * @return $this
     */
    protected function setFlavor($flavor = null)
    {
        if (is_array($flavor)) {
            $flavor = new Flavor($flavor);
        }
        // @todo put this inside a try/catch
        if (is_null($flavor)) {
            $flavor = taste($this->source);
        }
        if (is_null($flavor->header)) {
            // Flavor is immutable, so copy it with header set to taste_has_header()'s return value
            $flavor = $flavor->copy(['header' => taste_has_header($this->source)]);
        }
        $this->flavor = $flavor;

        return $this;
    }

    /**
     * Set the reader source.
     *
     * The reader can accept anything that implements Streamable and is actually
     * readable (can be read). This will make sure that whatever is passed to
     * the reader meets these expectations and set $this->source. It can also
     * accept any string (or any object with a __toString() method), or an
     * SplFileObject, so long as it represents a file rather than a directory.
     *
     * @param mixed $input See description
     *
     * @return $this
     */
    protected function setSource($input)
    {
        if (!($input instanceof Streamable)) {
            $input = streamize($input);
        }
        $this->source = $input;

        return $this;
    }

    /**
     * Load a line into memory.
     */
    protected function load()
    {
        if (is_null($this->current)) {
            try {
                $line = $this->readLine();
                $this->line++;
                $parsed = $this->parse($line);
                if ($this->hasHeader() && $this->line === 1) {
                    $this->header = new HeaderRow($parsed);
                } else {
                    $this->current = new Row($parsed);
                    if ($this->header) {
                        $this->current->setHeaderRow($this->header);
                    }
                }
            } catch (EndOfFileException $e) {
                $this->current = null;
            }
        }
    }

    /**
     * Read single line from CSV data source (stream, file, etc.), taking into
     * account CSV's de-facto quoting rules with respect to designated line
     * terminator character when they fall within quoted strings.
     *
     * @throws Exception\EndOfFileException when eof has been reached
     *                                      and the read buffer has all been returned
     *
     * @return string A CSV row (could possibly span multiple lines depending on
     *                quoting and escaping)
     */
    protected function readLine()
    {
        $f     = $this->getFlavor();
        $eol   = $f->lineTerminator;
        $lines = [];
        try {
            do {
                if (false === ($line = $this->source->readLine($eol))) {
                    throw new EndOfFileException('End of file reached');
                }
                $lines[] = rtrim($line, $eol);
            } while ($this->inQuotedString(end($lines), $f->quoteChar, $f->escapeChar));
        } catch (EndOfFileException $e) {
            // only rethrow the exception if we don't already have lines in the buffer
            if (!count($lines)) {
                throw $e;
            }
        }

        return rtrim(implode($eol, $lines), $eol);
    }

    /**
     * Determine whether last line ended while a quoted string was still "open".
     *
     * This method is used in a loop to determine if each line being read ends
     * while a quoted string is still "open". The open/escape state is kept on
     * the object, so it carries over from one line to the next.
     *
     * @param string $line       Line of csv to analyze
     * @param string $quoteChar  The quote/enclosure character to use
     * @param string $escapeChar The escape char/sequence to use
     *
     * @return bool True if currently within a quoted string
     */
    protected function inQuotedString($line, $quoteChar, $escapeChar)
    {
        $length = strlen($line);
        for ($i = 0; $i < $length; $i++) {
            $c = $line[$i];
            if ($this->escape) {
                // previous character was the escape character, so skip this one
                $this->escape = false;
                continue;
            }
            $this->escape = ($c == $escapeChar);
            if ($c == $quoteChar) {
                $this->open = !$this->open;
            }
        }

        return $this->open;
    }

    /**
     * Temporarily replace special characters within a quoted string.
     *
     * Replace all instances of newlines and whatever character you specify (as
     * the delimiter) that are contained within quoted text. The replacements are
     * simply a special placeholder string. This is done so that I can use the
     * very unsmart "explode" function and not have to worry about it exploding
     * on delimiters or newlines within quotes. Once I have exploded, I typically
     * sub back in the real characters before doing anything else.
     *
     * @param string $data  The string to do the replacements on
     * @param string $delim The delimiter character to replace
     * @param string $quo   The quote character
     * @param string $eol   Line terminator character/sequence
     *
     * @return string The data with replacements performed
     *
     * @internal
     *
     * @todo I could probably pass in (maybe optionally) the newline character I
     *     want to replace as well. I'll do that if I need to.
     * @todo Create a regex class so you can do $regex->escape() rather than
     *     preg_quote
     */
    protected function replaceQuotedSpecialChars($data, $delim, $quo, $eol)
    {
        return preg_replace_callback('/([' . preg_quote($quo, '/') . '])(.*)\1/imsU', function ($matches) use ($delim, $eol) {
            $ret = str_replace($eol, self::PLACEHOLDER_NEWLINE, $matches[0]);
            $ret = str_replace($delim, self::PLACEHOLDER_DELIM, $ret);

            return $ret;
        }, $data);
    }

    /**
     * Undo temporary special char replacements.
     *
     * Replace the special character placeholders with the characters they
     * originally substituted.
     *
     * @param string $data  The data to undo replacements in
     * @param string $delim The delimiter character
     * @param string $eol   The character or string of characters used to terminate lines
     *
     * @return string The data with placeholders replaced with original characters
     *
     * @internal
     */
    protected function undoReplaceQuotedSpecialChars($data, $delim, $eol)
    {
        // str_replace applies the pairs in order and always returns a string
        return str_replace(
            [self::PLACEHOLDER_DELIM, self::PLACEHOLDER_NEWLINE],
            [$delim, $eol],
            $data
        );
    }

    /**
     * Remove quotes wrapping text.
     *
     * @param string $data The data to unquote
     *
     * @return string The data with quotes stripped from the outside of it
     *
     * @internal
     */
    protected function unQuote($data)
    {
        $escapeChar = $this->getFlavor()->doubleQuote ? $this->getFlavor()->quoteChar : $this->getFlavor()->escapeChar;
        $quoteChar  = $this->getFlavor()->quoteChar;
        $data       = $this->unEscape($data, $escapeChar, $quoteChar);

        return preg_replace('/^(["\'])(.*)\1$/ms', '\2', $data);
    }

    /**
     * "Unescape" a string.
     *
     * Replaces escaped characters with their unescaped versions.
     *
     * @internal
     *
     * @param string $str The string to unescape
     * @param string $esc The escape character used
     * @param string $quo The quote character used
     *
     * @return mixed The string with characters unescaped
     *
     * @todo This actually shouldn't even be necessary. Characters should be read
     *     in one at a time and a quote that follows another should just be ignored,
     *     making this unnecessary.
     */
    protected function unEscape($str, $esc, $quo)
    {
        return str_replace($esc . $quo, $quo, $str);
    }

    /**
     * Parse a line of CSV data into an array of columns.
     *
     * @param string $line A line of CSV data to parse
     *
     * @return array An array of columns
     *
     * @internal
     */
    protected function parse($line)
    {
        $f        = $this->getFlavor();
        $replaced = $this->replaceQuotedSpecialChars($line, $f->delimiter, $f->quoteChar, $f->lineTerminator);
        $columns  = explode($f->delimiter, $replaced);

        // closures defined in a method are bound to $this, so no alias is needed
        return array_map(function ($val) use ($f) {
            $undone = $this->undoReplaceQuotedSpecialChars($val, $f->delimiter, $f->lineTerminator);

            return $this->unQuote($undone);
        }, $columns);
    }
}
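
The placeholder approach that `parse()` relies on can be tried in isolation. The following is a minimal standalone sketch, not the library's code: `splitCsvLine()` and `CSV_PLACEHOLDER_DELIM` are hypothetical names, it handles only the delimiter (not embedded newlines), and it ignores doubled or escaped quotes.

```php
<?php

// Placeholder copied from Reader::PLACEHOLDER_DELIM for this illustration.
const CSV_PLACEHOLDER_DELIM = '[=[__DLIM__]=]';

/**
 * Split one CSV line into columns using the mask / explode / unmask idea.
 */
function splitCsvLine($line, $delim = ',', $quote = '"')
{
    $q = preg_quote($quote, '/');

    // 1. Hide delimiters that sit between a pair of quote characters.
    $masked = preg_replace_callback('/' . $q . '.*?' . $q . '/s', function ($m) use ($delim) {
        return str_replace($delim, CSV_PLACEHOLDER_DELIM, $m[0]);
    }, $line);

    // 2. A plain explode is now safe.
    $columns = explode($delim, $masked);

    // 3. Restore the delimiter and strip one pair of wrapping quotes.
    return array_map(function ($col) use ($delim, $quote) {
        $col = str_replace(CSV_PLACEHOLDER_DELIM, $delim, $col);
        $len = strlen($col);
        if ($len >= 2 && $col[0] === $quote && $col[$len - 1] === $quote) {
            $col = substr($col, 1, -1);
        }

        return $col;
    }, $columns);
}

print_r(splitCsvLine('1,"Smith, Jane",NY'));
```

For the sample line, the quoted comma survives the `explode()` call and the middle column comes back as `Smith, Jane`. The trade-off of this technique is that input already containing the placeholder text would be corrupted, which is why the placeholder is deliberately unusual.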