<?php
/**
 * CSVelte: Slender, elegant CSV for PHP
 *
 * Inspired by Python's CSV module and Frictionless Data and the W3C's CSV
 * standardization efforts, CSVelte was written in an effort to take all the
 * suck out of working with CSV.
 *
 * @copyright Copyright (c) 2018 Luke Visinoni
 * @author    Luke Visinoni <[email protected]>
 * @license   See LICENSE file (MIT license)
 */
namespace CSVelte;

use CSVelte\Contract\Streamable;
use CSVelte\Exception\SnifferException;
use CSVelte\Sniffer\SniffDelimiterByConsistency;
use CSVelte\Sniffer\SniffDelimiterByDistribution;
use CSVelte\Sniffer\SniffLineTerminatorByCount;
use CSVelte\Sniffer\SniffQuoteAndDelimByAdjacency;
use Noz\Collection\Collection;
use RuntimeException;

use function Noz\collect;
use function Noz\to_array;
use function Stringy\create as s;
            
class Sniffer
{
    /** CSV data sample size - sniffer will use this many bytes to make its determinations */
    const SAMPLE_SIZE = 2500;

    /**
     * ASCII character codes for "invisibles".
     */
    const HORIZONTAL_TAB  = 9;
    const LINE_FEED       = 10;
    const CARRIAGE_RETURN = 13;
    const SPACE           = 32;

    /**
     * @var array A list of possible delimiters to check for (in order of preference)
     */
    protected $delims = [',', "\t", ';', '|', ':', '-', '_', '#', '/', '\\', '$', '+', '=', '&', '@'];

    /**
     * @var Streamable A stream of the sample data
     */
    protected $stream;

    /**
     * Sniffer constructor.
     *
     * @param Streamable $stream The data to sniff
     * @param array|null $delims A list of possible delimiter characters in order of preference
     *                           (null keeps the default list)
     */
    public function __construct(Streamable $stream, $delims = null)
    {
        $this->stream = $stream;
        if (!is_null($delims)) {
            $this->setPossibleDelimiters($delims);
        }
    }
            
    /**
     * Set possible delimiter characters
     *
     * Values that are not exactly one character long are discarded, and the
     * remaining values are re-indexed in their original order.
     *
     * @param array $delims A list of possible delimiter characters
     *
     * @return self
     */
    public function setPossibleDelimiters(array $delims)
    {
        $this->delims = collect($delims)
            ->filter(function($val) {
                return s($val)->length() === 1;
            })
            ->values()
            ->toArray();

        return $this;
    }
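
    // Illustration only: the filtering done by setPossibleDelimiters() can be
    // sketched in plain PHP, without the Noz and Stringy helpers. This is not
    // part of the class. It assumes the mbstring extension is available and
    // that mb_strlen() counts characters the same way Stringy's length() does
    // for these inputs; $candidates and $delims are hypothetical names.
    //
    //     $candidates = [',', '||', "\t", '', ';'];
    //     $delims = array_values(array_filter($candidates, function ($val) {
    //         // keep only values that are exactly one character long
    //         return mb_strlen($val) === 1;
    //     }));
    //     // $delims is now [',', "\t", ';']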
            
                                                                                                            
                            
            
                                    
            
            
    /**
     * Get list of possible delimiter characters
     *
     * @return array
     */
    public function getPossibleDelimiters()
    {
        return $this->delims;
    }
            
                | 95 |  |  |     /** | 
            
                                                                                                            
                            
            
                                    
            
            
                | 96 |  |  |      * Sniff CSV data (determine its dialect) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 97 |  |  |      * | 
            
                                                                                                            
                            
            
                                    
            
            
                | 98 |  |  |      * Since CSV is less a format than a collection of similar formats, you can never be certain how a particular CSV | 
            
                                                                                                            
                            
            
                                    
            
            
                | 99 |  |  |      * file is formatted. This method inspects CSV data and returns its "dialect", an object that can be passed to | 
            
                                                                                                            
                            
            
                                    
            
            
                | 100 |  |  |      * either a `CSVelte\Reader` or `CSVelte\Writer` object to tell it what "dialect" of CSV to use. | 
            
                                                                                                            
                            
            
                                    
            
            
                | 101 |  |  |      * | 
            
                                                                                                            
                            
            
                                    
            
            
                | 102 |  |  |      * @todo look into which other Dialect attributes you can sniff for | 
            
                                                                                                            
                            
            
                                    
            
            
                | 103 |  |  |      * | 
            
                                                                                                            
                            
            
                                    
            
            
                | 104 |  |  |      * @return Dialect | 
            
                                                                                                            
                            
            
                                    
            
            
                | 105 |  |  |      */ | 
            
                                                                                                            
                            
            
                                    
            
            
                | 106 |  |  |     public function sniff() | 
            
                                                                                                            
                            
            
                                    
            
            
    {
        $sample = $this->stream->read(static::SAMPLE_SIZE);
        $lineTerminator = $this->sniffLineTerminator($sample);
        try {
            list($quoteChar, $delimiter) = $this->sniffQuoteAndDelim($sample, $lineTerminator);
        } catch (SnifferException $e) {
            if ($e->getCode() !== SnifferException::ERR_QUOTE_AND_DELIM) {
                throw $e;
            }
            // No quoted columns to go by: assume a double quote and sniff the delimiter on its own
            $quoteChar = '"';
            $delimiter = $this->sniffDelimiter($sample, $lineTerminator);
        }
        /**
         * @todo Should this be null? Because doubleQuote = true means this = null
         */
        $escapeChar = '\\';
        $quoteStyle = $this->sniffQuotingStyle($delimiter, $lineTerminator);
        $header     = $this->sniffHeader($delimiter, $lineTerminator);
        $encoding   = s($sample)->getEncoding();

        return new Dialect(compact('quoteChar', 'escapeChar', 'delimiter', 'lineTerminator', 'quoteStyle', 'header', 'encoding'));
    }

    /**
     * Sniff sample data for line terminator character
     *
     * @param string $data The sample data
     *
     * @return string
     */
    protected function sniffLineTerminator($data)
    {
        $sniffer = new SniffLineTerminatorByCount();
        return $sniffer->sniff($data);
    }

    /**
     * Sniff quote and delimiter chars
     *
     * Quote and delimiter characters are easiest to determine when columns are
     * quoted: you can often seek out a pattern of delim, quote, stuff, quote, delim.
     * This only works if you have quoted columns. If you don't, you have to
     * determine these characters some other way (see sniffDelimiter).
     *
     * @throws SnifferException
     *
     * @param string $data The data to analyze
     * @param string $lineTerminator The line terminator char/sequence
     *
     * @return array A two-element array containing quotechar, delimchar
     */
    protected function sniffQuoteAndDelim($data, $lineTerminator)
    {
        $sniffer = new SniffQuoteAndDelimByAdjacency(compact('lineTerminator'));
        return $sniffer->sniff($data);
    }

    /**
     * Sniff the delimiter by consistency, falling back to distribution when
     * more than one candidate remains.
     *
     * @todo To make this class more oop and test-friendly, implement strategy pattern here with each delim sniffing method implemented in its own strategy class.
     */
    protected function sniffDelimiter($data, $lineTerminator)
    {
        $delimiters = $this->getPossibleDelimiters();
        $consistency = new SniffDelimiterByConsistency(compact('lineTerminator', 'delimiters'));
        $winners = $consistency->sniff($data);
        if (count($winners) > 1) {
            // Several candidates tied on consistency: break the tie by distribution
            $delimiters = $winners;
            return (new SniffDelimiterByDistribution(compact('lineTerminator', 'delimiters')))
                ->sniff($data);
        }
        return current($winners);
    }

    // Currently always returns Dialect::QUOTE_MINIMAL; both parameters are unused.
    protected function sniffQuotingStyle($delimiter, $eols)
    {
        return Dialect::QUOTE_MINIMAL;
    }

    // Currently always returns true; both parameters are unused.
    protected function sniffHeader($delimiter, $eols)
    {
        return true;
    }
}
            
                        
This check looks for type mismatches where the missing type is false. This is usually indicative of an error condition.

Consider a function that returns either a new DateTime object or false if there was an error. This is a typical pattern in PHP programming to show that an error has occurred without raising an exception. The calling code should check for a returned false before passing the value on to another function or method that may not be able to handle a false.
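
As a minimal sketch of this pattern, the example below uses PHP's built-in DateTime::createFromFormat(), which returns false when the input cannot be parsed. The helper name formatDate and the sample strings are assumptions made for illustration; they are not taken from the code under inspection.

```php
<?php

/**
 * Hypothetical helper: convert a "Y-m-d" date string to "d/m/Y".
 *
 * @throws InvalidArgumentException if the input cannot be parsed
 */
function formatDate(string $input): string
{
    // createFromFormat() returns a DateTime on success or false on failure.
    $date = DateTime::createFromFormat('Y-m-d', $input);

    // Check for false before using the value; calling ->format() on false
    // would be an error.
    if ($date === false) {
        throw new InvalidArgumentException("Unparseable date: {$input}");
    }

    return $date->format('d/m/Y');
}

echo formatDate('2017-03-15'), PHP_EOL; // prints 15/03/2017

try {
    formatDate('not a date');
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL; // prints Unparseable date: not a date
}
```

Throwing an exception is only one way to handle the false case; returning a default value or null would work equally well, as long as the false is never passed on unchecked.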