Completed
Pull Request — master (#183)
by Luke
09:18 queued 06:34
created

Sniffer::setPossibleDelimiters()   A

Complexity

Conditions 1
Paths 1

Size

Total Lines 11

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 8
CRAP Score 1

Importance

Changes 0
Metric Value
cc 1
nc 1
nop 1
dl 0
loc 11
ccs 8
cts 8
cp 1
crap 1
rs 9.9
c 0
b 0
f 0
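
The CRAP score of 1 reported above is consistent with the commonly cited CRAP formula, comp^2 * (1 - cov)^3 + comp, where comp is the cyclomatic complexity and cov is the coverage as a fraction. The sketch below assumes this analysis uses that formula and that `cc` and `cp` in the table are the complexity and the coverage fraction. `crapScore()` is a hypothetical helper, not part of CSVelte.

```php
<?php

/**
 * Hypothetical helper: CRAP score from cyclomatic complexity and
 * coverage expressed as a fraction between 0.0 and 1.0.
 */
function crapScore(int $complexity, float $coverage): float
{
    // Uncovered share of the method, cubed so poor coverage is penalised heavily.
    $uncovered = (1.0 - $coverage) ** 3;

    return ($complexity ** 2) * $uncovered + $complexity;
}

// Values from the metric table: cc = 1, cp = 1 (fully covered).
echo crapScore(1, 1.0), PHP_EOL;
```

With full coverage the first term vanishes, so the score equals the complexity, which is 1 for this method.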
<?php
/**
 * CSVelte: Slender, elegant CSV for PHP
 *
 * Inspired by Python's CSV module and Frictionless Data and the W3C's CSV
 * standardization efforts, CSVelte was written in an effort to take all the
 * suck out of working with CSV.
 *
 * @copyright Copyright (c) 2018 Luke Visinoni
 * @author    Luke Visinoni <[email protected]>
 * @license   See LICENSE file (MIT license)
 */
namespace CSVelte;

use CSVelte\Contract\Streamable;

use CSVelte\Exception\SnifferException;
use CSVelte\Sniffer\SniffDelimiterByConsistency;
use CSVelte\Sniffer\SniffDelimiterByDistribution;
use CSVelte\Sniffer\SniffLineTerminatorByCount;
use CSVelte\Sniffer\SniffQuoteAndDelimByAdjacency;
use Noz\Collection\Collection;
use function Noz\to_array;
use RuntimeException;

use function Noz\collect;
use function Stringy\create as s;

class Sniffer
{
    /** CSV data sample size - sniffer will use this many bytes to make its determinations */
    const SAMPLE_SIZE = 2500;

    /**
     * ASCII character codes for "invisibles".
     */
    const HORIZONTAL_TAB  = 9;
    const LINE_FEED       = 10;
    const CARRIAGE_RETURN = 13;
    const SPACE           = 32;

    /**
     * @var array A list of possible delimiters to check for (in order of preference)
     */
    protected $delims = [',', "\t", ';', '|', ':', '-', '_', '#', '/', '\\', '$', '+', '=', '&', '@'];

    /**
     * @var Streamable A stream of the sample data
     */
    protected $stream;

    /**
     * Sniffer constructor.
     *
     * @param Streamable $stream The data to sniff
     * @param array $delims A list of possible delimiter characters in order of preference
     */
    public function __construct(Streamable $stream, $delims = null)
    {
        $this->stream = $stream;
        if (!is_null($delims)) {
            $this->setPossibleDelimiters($delims);
        }
    }

    /**
     * Set possible delimiter characters
     *
     * @param array $delims A list of possible delimiter characters
     *
     * @return self
     */
    public function setPossibleDelimiters(array $delims)
    {
        $this->delims = collect($delims)
            ->filter(function($val) {
                return s($val)->length() == 1;
            })
            ->values()
            ->toArray();

        return $this;
    }

    /**
     * Get list of possible delimiter characters
     *
     * @return array
     */
    public function getPossibleDelimiters()
    {
        return $this->delims;
    }

    /**
     * Sniff CSV data (determine its dialect)
     *
     * Since CSV is less a format than a collection of similar formats, you can never be certain how a particular CSV
     * file is formatted. This method inspects CSV data and returns its "dialect", an object that can be passed to
     * either a `CSVelte\Reader` or `CSVelte\Writer` object to tell it what "dialect" of CSV to use.
     *
     * @todo look into which other Dialect attributes you can sniff for
     *
     * @return Dialect
     */
    public function sniff()
    {
        $sample = $this->stream->read(static::SAMPLE_SIZE);
        $lineTerminator = $this->sniffLineTerminator($sample);
Security Bug
It seems like $sample, defined by $this->stream->read(static::SAMPLE_SIZE) in sniff(), can also be of type false; however, CSVelte\Sniffer::sniffLineTerminator() only seems to accept a string. Did you maybe forget to handle an error condition?

This check looks for type mismatches where the missing type is false. This is usually indicative of an error condition.

Consider the following example:

<?php

function getDate($date)
{
    if ($date !== null) {
        return new DateTime($date);
    }

    return false;
}

This function either returns a new DateTime object or false, if there was an error. This is a typical pattern in PHP programming to show that an error has occurred without raising an exception. The calling code should check for this returned false before passing on the value to another function or method that may not be able to handle a false.
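
Applied to `sniff()`, the check this issue asks for could look like the sketch below. It is an illustration only: `readSample()` and the in-memory `php://memory` stream are stand-ins chosen so the snippet runs without CSVelte, and throwing a `RuntimeException` is an assumption about how the error might be reported, not the project's actual fix.

```php
<?php

/**
 * Hypothetical helper: read up to $length bytes from a stream resource and
 * fail loudly instead of passing false on to code that expects a string.
 */
function readSample($handle, int $length): string
{
    $sample = fread($handle, $length);
    if ($sample === false) {
        throw new RuntimeException('Could not read sample data from stream');
    }

    return $sample;
}

// Usage with an in-memory stream standing in for the Streamable object.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "a,b,c\n1,2,3\n");
rewind($handle);

$sample = readSample($handle, 2500);
fclose($handle);
```

Checking with `=== false` rather than a loose comparison matters here, because an empty string is a legitimate (if unhelpful) sample and should not be treated as a read failure.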

        try {
            list($quoteChar, $delimiter) = $this->sniffQuoteAndDelim($sample, $lineTerminator);
Security Bug
It seems like $sample can also be of type false; however, CSVelte\Sniffer::sniffQuoteAndDelim() only seems to accept a string. Did you maybe forget to handle an error condition?
        } catch (SnifferException $e) {
            if ($e->getCode() !== SnifferException::ERR_QUOTE_AND_DELIM) {
                throw $e;
            }
            $quoteChar = '"';
            $delimiter = $this->sniffDelimiter($sample, $lineTerminator);
        }
        /**
         * @todo Should this be null? Because doubleQuote = true means this = null
         */
        $escapeChar = '\\';
        $quoteStyle = $this->sniffQuotingStyle($delimiter, $lineTerminator);
        $header     = $this->sniffHeader($delimiter, $lineTerminator);
        $encoding   = s($sample)->getEncoding();

        return new Dialect(compact('quoteChar', 'escapeChar', 'delimiter', 'lineTerminator', 'quoteStyle', 'header', 'encoding'));
    }

    /**
     * Sniff sample data for line terminator character
     *
     * @param string $data The sample data
     *
     * @return string
     */
    protected function sniffLineTerminator($data)
    {
        $sniffer = new SniffLineTerminatorByCount();
        return $sniffer->sniff($data);
    }

    /**
     * Sniff quote and delimiter chars
     *
     * The best way to determine quote and delimiter characters is to look at
     * quoted columns, where you can often find a pattern of delim, quote, stuff,
     * quote, delim. This only works if the data has quoted columns. If it doesn't,
     * these characters have to be determined some other way (see sniffDelimiter).
     *
     * @throws SnifferException
     *
     * @param string $data The data to analyze
     * @param string $lineTerminator The line terminator char/sequence
     *
     * @return array A two-element array containing quotechar, delimchar
     */
    protected function sniffQuoteAndDelim($data, $lineTerminator)
    {
        $sniffer = new SniffQuoteAndDelimByAdjacency(compact('lineTerminator'));
        return $sniffer->sniff($data);
    }

    /**
     * @todo To make this class more oop and test-friendly, implement strategy pattern here with each delim sniffing method implemented in its own strategy class.
     */
    protected function sniffDelimiter($data, $lineTerminator)
    {
        $delimiters = $this->getPossibleDelimiters();
        $consistency = new SniffDelimiterByConsistency(compact('lineTerminator', 'delimiters'));
        $winners = $consistency->sniff($data);
        if (count($winners) > 1) {
            $delimiters = $winners;
            return (new SniffDelimiterByDistribution(compact('lineTerminator', 'delimiters')))
                ->sniff($data);
        }
        return current($winners);
    }

    protected function sniffQuotingStyle($delimiter, $eols)
Unused Code
The parameters $delimiter and $eols are not used and could be removed.

This check looks for parameters that have been defined for a function or method, but which are not used in the method body.
    {
        return Dialect::QUOTE_MINIMAL;
    }

    protected function sniffHeader($delimiter, $eols)
Unused Code
The parameters $delimiter and $eols are not used and could be removed.

This check looks for parameters that have been defined for a function or method, but which are not used in the method body.
186
    {
187
        return true;
188
    }
189
}
190