Completed
Push — develop ( 29208e...50a0ec )
by Adrien
05:34
created

Csv::inferSeparator()   F

Complexity

Conditions 17
Paths 1041

Size

Total Lines 76
Code Lines 45

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 44
CRAP Score 17.0237

Importance

Changes 0
Metric  Value
cc      17
eloc    45
nc      1041
nop     0
dl      0
loc     76
rs      2.2706
c       0
b       0
f       0
ccs     44
cts     46
cp      0.9565
crap    17.0237
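
The CRAP score above can be reproduced from the other metrics. The following is a rough sketch, assuming the commonly cited Change Risk Anti-Patterns formula (complexity squared, times the uncovered fraction cubed, plus complexity), and assuming that `cc` is the cyclomatic complexity and that `ccs` and `cts` are the covered and total statement counts. The function name `crapScore` is made up for illustration.

```php
<?php

/**
 * Hypothetical helper: CRAP = comp^2 * (1 - coverage)^3 + comp.
 * $coverage is a fraction between 0 and 1.
 */
function crapScore(int $complexity, float $coverage): float
{
    return $complexity ** 2 * (1 - $coverage) ** 3 + $complexity;
}

// Values taken from the metrics above: cc = 17, ccs = 44, cts = 46.
$coverage = 44 / 46;
$crap = crapScore(17, $coverage);

printf("coverage: %.4f, CRAP: %.5f\n", $coverage, $crap);
```

Under these assumptions the result comes to roughly 17.024, consistent with the reported 17.0237. With coverage this high, almost the entire score comes from the complexity term, so reducing complexity would likely lower it more than adding tests.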

How to fix   Long Method    Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:
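
Extract Method is the refactoring the paragraphs above describe. Here is a minimal sketch of how it might apply to the statistics part of inferSeparator() in the listing below. The function names `median()` and `meanSquareDeviation()` are hypothetical and not part of PhpSpreadsheet, and the code only approximates what the original loop computes.

```php
<?php

/**
 * Median of a list of numbers (hypothetical helper).
 * Returns 0 for an empty list so callers can skip it.
 */
function median(array $series)
{
    $count = count($series);
    if ($count === 0) {
        return 0;
    }
    sort($series);
    $middle = intdiv($count - 1, 2);

    return ($count % 2 === 1)
        ? $series[$middle]
        : ($series[$middle] + $series[$middle + 1]) / 2;
}

/**
 * Mean of the squared deviations from $center (hypothetical helper).
 * Expects a non-empty $series.
 */
function meanSquareDeviation(array $series, $center)
{
    $sum = 0;
    foreach ($series as $value) {
        $sum += ($value - $center) ** 2;
    }

    return $sum / count($series);
}

// Delimiter counts per line for two candidate delimiters.
$commaCounts = [3, 3, 3, 3];
$spaceCounts = [1, 5, 2, 8];

$commaScore = meanSquareDeviation($commaCounts, median($commaCounts));
$spaceScore = meanSquareDeviation($spaceCounts, median($spaceCounts));
```

With the statistics pulled out like this, the body of the delimiter loop could shrink to a couple of calls, and each helper could be tested on its own. The comma, which appears the same number of times on every line, gets the lower score.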

<?php

namespace PhpOffice\PhpSpreadsheet\Reader;

use PhpOffice\PhpSpreadsheet\Cell;
use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
use PhpOffice\PhpSpreadsheet\Spreadsheet;

class Csv extends BaseReader implements IReader
{
    /**
     * Input encoding.
     *
     * @var string
     */
    private $inputEncoding = 'UTF-8';

    /**
     * Delimiter.
     *
     * @var string
     */
    private $delimiter = null;

    /**
     * Enclosure.
     *
     * @var string
     */
    private $enclosure = '"';

    /**
     * Sheet index to read.
     *
     * @var int
     */
    private $sheetIndex = 0;

    /**
     * Load rows contiguously.
     *
     * @var bool
     */
    private $contiguous = false;

    /**
     * Row counter for loading rows contiguously.
     *
     * @var int
     */
    private $contiguousRow = -1;

    /**
     * Create a new CSV Reader instance.
     */
    public function __construct()
    {
        $this->readFilter = new DefaultReadFilter();
    }

    /**
     * Set input encoding.
     *
     * @param string $pValue Input encoding, eg: 'UTF-8'
     */
    public function setInputEncoding($pValue)
    {
        $this->inputEncoding = $pValue;

        return $this;
    }

    /**
     * Get input encoding.
     *
     * @return string
     */
    public function getInputEncoding()
    {
        return $this->inputEncoding;
    }

    /**
     * Move filepointer past any BOM marker.
     */
    protected function skipBOM()
    {
        rewind($this->fileHandle);

        switch ($this->inputEncoding) {
            case 'UTF-8':
Duplication introduced by
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

                fgets($this->fileHandle, 4) == "\xEF\xBB\xBF" ?
                    fseek($this->fileHandle, 3) : fseek($this->fileHandle, 0);
                break;
            case 'UTF-16LE':
                fgets($this->fileHandle, 3) == "\xFF\xFE" ?
                    fseek($this->fileHandle, 2) : fseek($this->fileHandle, 0);
                break;
            case 'UTF-16BE':
                fgets($this->fileHandle, 3) == "\xFE\xFF" ?
                    fseek($this->fileHandle, 2) : fseek($this->fileHandle, 0);
                break;
            case 'UTF-32LE':
                fgets($this->fileHandle, 5) == "\xFF\xFE\x00\x00" ?
                    fseek($this->fileHandle, 4) : fseek($this->fileHandle, 0);
                break;
            case 'UTF-32BE':
                fgets($this->fileHandle, 5) == "\x00\x00\xFE\xFF" ?
                    fseek($this->fileHandle, 4) : fseek($this->fileHandle, 0);
                break;
            default:
                break;
        }
    }

    /**
     * Identify any separator that is explicitly set in the file.
     */
    protected function checkSeparator()
    {
        $line = fgets($this->fileHandle);
        if ($line === false) {
            return;
        }

        if ((strlen(trim($line, "\r\n")) == 5) && (stripos($line, 'sep=') === 0)) {
            $this->delimiter = substr($line, 4, 1);

            return;
        }

        return $this->skipBOM();
    }

    /**
     * Infer the separator if it isn't explicitly set in the file or specified by the user.
     */
    protected function inferSeparator()
    {
        if ($this->delimiter !== null) {
            return;
        }

        $potentialDelimiters = [',', ';', "\t", '|', ':', ' '];
        $counts = [];
        foreach ($potentialDelimiters as $delimiter) {
            $counts[$delimiter] = [];
        }

        // Count how many times each of the potential delimiters appears in each line
        $numberLines = 0;
        while (($line = fgets($this->fileHandle)) !== false && (++$numberLines < 1000)) {
            $countLine = [];
            for ($i = strlen($line) - 1; $i >= 0; --$i) {
                $char = $line[$i];
                if (isset($counts[$char])) {
                    if (!isset($countLine[$char])) {
                        $countLine[$char] = 0;
                    }
                    ++$countLine[$char];
                }
            }
            foreach ($potentialDelimiters as $delimiter) {
                $counts[$delimiter][] = isset($countLine[$delimiter])
                    ? $countLine[$delimiter]
                    : 0;
            }
        }

        // Calculate the mean square deviations for each delimiter (ignoring delimiters that haven't been found consistently)
        $meanSquareDeviations = [];
        $middleIdx = floor(($numberLines - 1) / 2);

        foreach ($potentialDelimiters as $delimiter) {
            $series = $counts[$delimiter];
            sort($series);

            $median = ($numberLines % 2)
                ? $series[$middleIdx]
                : ($series[$middleIdx] + $series[$middleIdx + 1]) / 2;

            if ($median === 0) {
                continue;
            }

            $meanSquareDeviations[$delimiter] = array_reduce(
                $series,
                function ($sum, $value) use ($median) {
                    return $sum + pow($value - $median, 2);
                }
            ) / count($series);
        }

        // ... and pick the delimiter with the smallest mean square deviation (in case of ties, the order in potentialDelimiters is respected)
        $min = INF;
        foreach ($potentialDelimiters as $delimiter) {
            if (!isset($meanSquareDeviations[$delimiter])) {
                continue;
            }

            if ($meanSquareDeviations[$delimiter] < $min) {
                $min = $meanSquareDeviations[$delimiter];
                $this->delimiter = $delimiter;
            }
        }

        // If no delimiter could be detected, fall back to the default
        if ($this->delimiter === null) {
            $this->delimiter = reset($potentialDelimiters);
Documentation Bug introduced by
It seems like reset($potentialDelimiters) can also be of type false. However, the property $delimiter is declared as type string. Maybe add an additional type check?

Our type inference engine has found a suspicious assignment of a value to a property. This check raises an issue when a value that can be of a mixed type is assigned to a property that is type hinted more strictly.

For example, imagine you have a variable $accountId that can either hold an Id object or false (if there is no account id yet). Your code now assigns that value to the id property of an instance of the Account class. This class holds a proper account, so the id value must no longer be false.

Either this assignment is in error or a type check should be added for that assignment.

class Id
{
    public $id;

    public function __construct($id)
    {
        $this->id = $id;
    }
}

class Account
{
    /** @var  Id $id */
    public $id;
}

$account_id = false;

if (starsAreRight()) {
    $account_id = new Id(42);
}

$account = new Account();
if ($account_id instanceof Id)
{
    $account->id = $account_id;
}
        }

        return $this->skipBOM();
    }

    /**
     * Return worksheet info (Name, Last Column Letter, Last Column Index, Total Rows, Total Columns).
     *
     * @param string $pFilename
     *
     * @throws Exception
     */
    public function listWorksheetInfo($pFilename)
    {
        // Open file
        if (!$this->canRead($pFilename)) {
            throw new Exception($pFilename . ' is an Invalid Spreadsheet file.');
        }
        $this->openFile($pFilename);
        $fileHandle = $this->fileHandle;

        // Skip BOM, if any
        $this->skipBOM();
        $this->checkSeparator();
        $this->inferSeparator();

        $worksheetInfo = [];
        $worksheetInfo[0]['worksheetName'] = 'Worksheet';
        $worksheetInfo[0]['lastColumnLetter'] = 'A';
        $worksheetInfo[0]['lastColumnIndex'] = 0;
        $worksheetInfo[0]['totalRows'] = 0;
        $worksheetInfo[0]['totalColumns'] = 0;

        // Loop through each line of the file in turn
        while (($rowData = fgetcsv($fileHandle, 0, $this->delimiter, $this->enclosure)) !== false) {
            ++$worksheetInfo[0]['totalRows'];
            $worksheetInfo[0]['lastColumnIndex'] = max($worksheetInfo[0]['lastColumnIndex'], count($rowData) - 1);
        }

        $worksheetInfo[0]['lastColumnLetter'] = Cell::stringFromColumnIndex($worksheetInfo[0]['lastColumnIndex']);
        $worksheetInfo[0]['totalColumns'] = $worksheetInfo[0]['lastColumnIndex'] + 1;

        // Close file
        fclose($fileHandle);

        return $worksheetInfo;
    }

    /**
     * Loads Spreadsheet from file.
     *
     * @param string $pFilename
     *
     * @throws Exception
     *
     * @return Spreadsheet
     */
    public function load($pFilename)
    {
        // Create new Spreadsheet
        $spreadsheet = new Spreadsheet();

        // Load into this instance
        return $this->loadIntoExisting($pFilename, $spreadsheet);
    }

    /**
     * Loads PhpSpreadsheet from file into PhpSpreadsheet instance.
     *
     * @param string $pFilename
     * @param Spreadsheet $spreadsheet
     *
     * @throws Exception
     *
     * @return Spreadsheet
     */
    public function loadIntoExisting($pFilename, Spreadsheet $spreadsheet)
    {
        $lineEnding = ini_get('auto_detect_line_endings');
        ini_set('auto_detect_line_endings', true);

        // Open file
        if (!$this->canRead($pFilename)) {
            throw new Exception($pFilename . ' is an Invalid Spreadsheet file.');
        }
        $this->openFile($pFilename);
        $fileHandle = $this->fileHandle;

        // Skip BOM, if any
        $this->skipBOM();
        $this->checkSeparator();
        $this->inferSeparator();

        // Create new PhpSpreadsheet object
        while ($spreadsheet->getSheetCount() <= $this->sheetIndex) {
            $spreadsheet->createSheet();
        }
        $sheet = $spreadsheet->setActiveSheetIndex($this->sheetIndex);

        // Set our starting row based on whether we're in contiguous mode or not
        $currentRow = 1;
        if ($this->contiguous) {
            $currentRow = ($this->contiguousRow == -1) ? $sheet->getHighestRow() : $this->contiguousRow;
        }

        // Loop through each line of the file in turn
        while (($rowData = fgetcsv($fileHandle, 0, $this->delimiter, $this->enclosure)) !== false) {
            $columnLetter = 'A';
            foreach ($rowData as $rowDatum) {
                if ($rowDatum != '' && $this->readFilter->readCell($columnLetter, $currentRow)) {
                    // Convert encoding if necessary
                    if ($this->inputEncoding !== 'UTF-8') {
                        $rowDatum = StringHelper::convertEncoding($rowDatum, 'UTF-8', $this->inputEncoding);
                    }

                    // Set cell value
                    $sheet->getCell($columnLetter . $currentRow)->setValue($rowDatum);
                }
                ++$columnLetter;
            }
            ++$currentRow;
        }

        // Close file
        fclose($fileHandle);

        if ($this->contiguous) {
            $this->contiguousRow = $currentRow;
        }

        ini_set('auto_detect_line_endings', $lineEnding);

        // Return
        return $spreadsheet;
    }

    /**
     * Get delimiter.
     *
     * @return string
     */
    public function getDelimiter()
    {
        return $this->delimiter;
    }

    /**
     * Set delimiter.
     *
     * @param string $delimiter Delimiter, eg: ','
     *
     * @return CSV
     */
    public function setDelimiter($delimiter)
    {
        $this->delimiter = $delimiter;

        return $this;
    }

    /**
     * Get enclosure.
     *
     * @return string
     */
    public function getEnclosure()
    {
        return $this->enclosure;
    }

    /**
     * Set enclosure.
     *
     * @param string $enclosure Enclosure, defaults to "
     *
     * @return CSV
     */
    public function setEnclosure($enclosure)
    {
        if ($enclosure == '') {
            $enclosure = '"';
        }
        $this->enclosure = $enclosure;

        return $this;
    }

    /**
     * Get sheet index.
     *
     * @return int
     */
    public function getSheetIndex()
    {
        return $this->sheetIndex;
    }

    /**
     * Set sheet index.
     *
     * @param int $pValue Sheet index
     *
     * @return CSV
     */
    public function setSheetIndex($pValue)
    {
        $this->sheetIndex = $pValue;

        return $this;
    }

    /**
     * Set Contiguous.
     *
     * @param bool $contiguous
     */
    public function setContiguous($contiguous)
    {
        $this->contiguous = (bool) $contiguous;
        if (!$contiguous) {
            $this->contiguousRow = -1;
        }

        return $this;
    }

    /**
     * Get Contiguous.
     *
     * @return bool
     */
    public function getContiguous()
    {
        return $this->contiguous;
    }

    /**
     * Can the current IReader read the file?
     *
     * @param string $pFilename
     *
     * @throws Exception
     *
     * @return bool
     */
    public function canRead($pFilename)
    {
        // Check if file exists
        try {
            $this->openFile($pFilename);
        } catch (Exception $e) {
            return false;
        }

        fclose($this->fileHandle);

        return true;
    }
}
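
The five near-identical case blocks in skipBOM() are what the duplication notice points at. One possible way to remove the repetition, sketched here as a standalone function, is to drive the check from a lookup table. The function name `skipBomFor()` and the use of fread() in place of fgets() are assumptions of this sketch, not the library's code.

```php
<?php

/**
 * Hypothetical table-driven variant of skipBOM().
 * Positions $handle just after the BOM for $encoding, or at offset 0
 * when no matching BOM is present. Returns the resulting offset.
 */
function skipBomFor($handle, string $encoding): int
{
    // Byte order marks for the encodings handled in the original switch.
    $boms = [
        'UTF-8' => "\xEF\xBB\xBF",
        'UTF-16LE' => "\xFF\xFE",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-32BE' => "\x00\x00\xFE\xFF",
    ];

    rewind($handle);
    if (!isset($boms[$encoding])) {
        return 0;
    }

    $bom = $boms[$encoding];
    $offset = (fread($handle, strlen($bom)) === $bom) ? strlen($bom) : 0;
    fseek($handle, $offset);

    return $offset;
}

// Usage: an in-memory stream that starts with a UTF-8 BOM.
$handle = fopen('php://memory', 'w+b');
fwrite($handle, "\xEF\xBB\xBFa,b,c\n");
$offset = skipBomFor($handle, 'UTF-8');
$firstLine = fgets($handle);
fclose($handle);
```

Supporting another encoding would then mean adding one table entry rather than another case block.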
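
The Documentation Bug note on the fallback assignment in inferSeparator() arises because reset() returns false for an empty array. Below is a small sketch of one way to satisfy the suggested type check, using a hypothetical helper `firstDelimiter()` that is not part of the library.

```php
<?php

/**
 * Hypothetical helper: first candidate delimiter as a string,
 * or $default when the candidate list is empty.
 */
function firstDelimiter(array $candidates, string $default = ','): string
{
    $first = reset($candidates);

    // reset() returns false on an empty array, so check before using it.
    return is_string($first) ? $first : $default;
}

$fallback = firstDelimiter([',', ';', "\t", '|', ':', ' ']);
$emptyFallback = firstDelimiter([]);
```

In the reviewed method the candidate list is a non-empty literal, so the false branch cannot occur in practice. The check mainly makes the string type explicit for the analyser.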