Completed
Push — develop ( 0bd3a9...3ee9cc )
by Adrien
26:43
created

Csv::inferSeparator()   F

Complexity

Conditions 17
Paths 1041

Size

Total Lines 76
Code Lines 45

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 44
CRAP Score 17.0031

Importance

Changes 0

Metric  Value
cc      17
eloc    45
nc      1041
nop     0
dl      0
loc     76
ccs     44
cts     45
cp      0.9778
crap    17.0031
rs      2.2706
c       0
b       0
f       0

How to fix   Long Method    Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
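
As a minimal sketch of the extraction described above: the method under review contains the comment "Count how many times each of the potential delimiters appears in each line", which suggests a helper of roughly that name. The function `countDelimitersInLine()` is hypothetical and is not part of PhpSpreadsheet; it also uses `substr_count()` instead of the character loop in the original, so treat it as an illustration rather than a drop-in replacement.

```php
<?php

declare(strict_types=1);

/**
 * Count how many times each candidate delimiter appears in one line.
 *
 * Hypothetical helper: its name is derived from the comment in the
 * original method body, as the advice above suggests.
 *
 * @param string   $line       a single line read from the file
 * @param string[] $candidates non-empty delimiter candidates
 *
 * @return array<string, int> candidate => number of occurrences
 */
function countDelimitersInLine(string $line, array $candidates): array
{
    $counts = [];
    foreach ($candidates as $candidate) {
        // substr_count() counts non-overlapping occurrences of the needle.
        $counts[$candidate] = substr_count($line, $candidate);
    }

    return $counts;
}
```

With a helper like this, the counting loop in the long method could shrink to one call per line, and the comment becomes redundant because the function name carries the same information.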

Commonly applied refactorings include Extract Method.

<?php

namespace PhpOffice\PhpSpreadsheet\Reader;

use PhpOffice\PhpSpreadsheet\Spreadsheet;

/**
 * Copyright (c) 2006 - 2016 PhpSpreadsheet.
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 *
 * @category   PhpSpreadsheet
 *
 * @copyright  Copyright (c) 2006 - 2016 PhpSpreadsheet (https://github.com/PHPOffice/PhpSpreadsheet)
 * @license    http://www.gnu.org/licenses/old-licenses/lgpl-2.1.txt    LGPL
 */
class Csv extends BaseReader implements IReader
{
    /**
     * Input encoding.
     *
     * @var string
     */
    private $inputEncoding = 'UTF-8';

    /**
     * Delimiter.
     *
     * @var string
     */
    private $delimiter = null;

    /**
     * Enclosure.
     *
     * @var string
     */
    private $enclosure = '"';

    /**
     * Sheet index to read.
     *
     * @var int
     */
    private $sheetIndex = 0;

    /**
     * Load rows contiguously.
     *
     * @var bool
     */
    private $contiguous = false;

    /**
     * Row counter for loading rows contiguously.
     *
     * @var int
     */
    private $contiguousRow = -1;

    /**
     * Create a new CSV Reader instance.
     */
    public function __construct()
    {
        $this->readFilter = new DefaultReadFilter();
    }

    /**
     * Set input encoding.
     *
     * @param string $pValue Input encoding, eg: 'UTF-8'
     */
    public function setInputEncoding($pValue)
    {
        $this->inputEncoding = $pValue;

        return $this;
    }

    /**
     * Get input encoding.
     *
     * @return string
     */
    public function getInputEncoding()
    {
        return $this->inputEncoding;
    }

    /**
     * Move filepointer past any BOM marker.
     */
    protected function skipBOM()
    {
        rewind($this->fileHandle);

        switch ($this->inputEncoding) {
            case 'UTF-8':
                fgets($this->fileHandle, 4) == "\xEF\xBB\xBF" ?
                    fseek($this->fileHandle, 3) : fseek($this->fileHandle, 0);
                break;

View Code Duplication
This code seems to be duplicated across your project. The same issue is flagged on each of the UTF-16LE, UTF-16BE, UTF-32LE and UTF-32BE cases.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

            case 'UTF-16LE':
                fgets($this->fileHandle, 3) == "\xFF\xFE" ?
                    fseek($this->fileHandle, 2) : fseek($this->fileHandle, 0);
                break;
            case 'UTF-16BE':
                fgets($this->fileHandle, 3) == "\xFE\xFF" ?
                    fseek($this->fileHandle, 2) : fseek($this->fileHandle, 0);
                break;
            case 'UTF-32LE':
                fgets($this->fileHandle, 5) == "\xFF\xFE\x00\x00" ?
                    fseek($this->fileHandle, 4) : fseek($this->fileHandle, 0);
                break;
            case 'UTF-32BE':
                fgets($this->fileHandle, 5) == "\x00\x00\xFE\xFF" ?
                    fseek($this->fileHandle, 4) : fseek($this->fileHandle, 0);
                break;
            default:
                break;
        }
    }

    /**
     * Identify any separator that is explicitly set in the file.
     */
    protected function checkSeparator()
    {
        $line = fgets($this->fileHandle);
        if ($line === false) {
            return;
        }

        if ((strlen(trim($line, "\r\n")) == 5) && (stripos($line, 'sep=') === 0)) {
            $this->delimiter = substr($line, 4, 1);

            return;
        }

        return $this->skipBOM();
    }

    /**
     * Infer the separator if it isn't explicitly set in the file or specified by the user.
     */
    protected function inferSeparator()
    {
        if ($this->delimiter !== null) {
            return;
        }

        $potentialDelimiters = [',', ';', "\t", '|', ':', ' '];
        $counts = [];
        foreach ($potentialDelimiters as $delimiter) {
            $counts[$delimiter] = [];
        }

        // Count how many times each of the potential delimiters appears in each line
        $numberLines = 0;
        while (($line = fgets($this->fileHandle)) !== false && (++$numberLines < 1000)) {
            $countLine = [];
            for ($i = strlen($line) - 1; $i >= 0; --$i) {
                $char = $line[$i];
                if (isset($counts[$char])) {
                    if (!isset($countLine[$char])) {
                        $countLine[$char] = 0;
                    }
                    ++$countLine[$char];
                }
            }
            foreach ($potentialDelimiters as $delimiter) {
                $counts[$delimiter][] = isset($countLine[$delimiter])
                    ? $countLine[$delimiter]
                    : 0;
            }
        }

        // Calculate the mean square deviations for each delimiter (ignoring delimiters that haven't been found consistently)
        $meanSquareDeviations = [];
        $middleIdx = floor(($numberLines - 1) / 2);

        foreach ($potentialDelimiters as $delimiter) {
            $series = $counts[$delimiter];
            sort($series);

            $median = ($numberLines % 2)
                ? $series[$middleIdx]
                : ($series[$middleIdx] + $series[$middleIdx + 1]) / 2;

            if ($median === 0) {
                continue;
            }

            $meanSquareDeviations[$delimiter] = array_reduce(
                $series,
                function ($sum, $value) use ($median) {
                    return $sum + pow($value - $median, 2);
                }
            ) / count($series);
        }

        // ... and pick the delimiter with the smallest mean square deviation (in case of ties, the order in potentialDelimiters is respected)
        $min = INF;
        foreach ($potentialDelimiters as $delimiter) {
            if (!isset($meanSquareDeviations[$delimiter])) {
                continue;
            }

            if ($meanSquareDeviations[$delimiter] < $min) {
                $min = $meanSquareDeviations[$delimiter];
                $this->delimiter = $delimiter;
            }
        }

        // If no delimiter could be detected, fall back to the default
        if ($this->delimiter === null) {
            $this->delimiter = reset($potentialDelimiters);
Documentation Bug
It seems like reset($potentialDelimiters) can also be of type false. However, the property $delimiter is declared as type string. Maybe add an additional type check?

Our type inference engine has found a suspicious assignment of a value to a property. This check raises an issue when a value that can be of a mixed type is assigned to a property that is type hinted more strictly.

For example, imagine you have a variable $accountId that can either hold an Id object or false (if there is no account id yet). Your code now assigns that value to the id property of an instance of the Account class. This class holds a proper account, so the id value must no longer be false.

Either this assignment is in error or a type check should be added for that assignment.

class Id
{
    public $id;

    public function __construct($id)
    {
        $this->id = $id;
    }
}

class Account
{
    /** @var  Id $id */
    public $id;
}

$account_id = false;

if (starsAreRight()) {
    $account_id = new Id(42);
}

$account = new Account();
// Check the value being assigned, not the object receiving it.
if ($account_id instanceof Id) {
    $account->id = $account_id;
}
        }

        return $this->skipBOM();
    }

    /**
     * Return worksheet info (Name, Last Column Letter, Last Column Index, Total Rows, Total Columns).
     *
     * @param string $pFilename
     *
     * @throws Exception
     */
    public function listWorksheetInfo($pFilename)
    {
        // Open file
        if (!$this->canRead($pFilename)) {
            throw new Exception($pFilename . ' is an Invalid Spreadsheet file.');
        }
        $this->openFile($pFilename);
        $fileHandle = $this->fileHandle;

        // Skip BOM, if any
        $this->skipBOM();
        $this->checkSeparator();
        $this->inferSeparator();

        $worksheetInfo = [];
        $worksheetInfo[0]['worksheetName'] = 'Worksheet';
        $worksheetInfo[0]['lastColumnLetter'] = 'A';
        $worksheetInfo[0]['lastColumnIndex'] = 0;
        $worksheetInfo[0]['totalRows'] = 0;
        $worksheetInfo[0]['totalColumns'] = 0;

        // Loop through each line of the file in turn
        while (($rowData = fgetcsv($fileHandle, 0, $this->delimiter, $this->enclosure)) !== false) {
            ++$worksheetInfo[0]['totalRows'];
            $worksheetInfo[0]['lastColumnIndex'] = max($worksheetInfo[0]['lastColumnIndex'], count($rowData) - 1);
        }

        $worksheetInfo[0]['lastColumnLetter'] = \PhpOffice\PhpSpreadsheet\Cell::stringFromColumnIndex($worksheetInfo[0]['lastColumnIndex']);
        $worksheetInfo[0]['totalColumns'] = $worksheetInfo[0]['lastColumnIndex'] + 1;

        // Close file
        fclose($fileHandle);

        return $worksheetInfo;
    }

    /**
     * Loads Spreadsheet from file.
     *
     * @param string $pFilename
     *
     * @throws Exception
     *
     * @return \PhpOffice\PhpSpreadsheet\Spreadsheet
     */
    public function load($pFilename)
    {
        // Create new Spreadsheet
        $spreadsheet = new \PhpOffice\PhpSpreadsheet\Spreadsheet();

        // Load into this instance
        return $this->loadIntoExisting($pFilename, $spreadsheet);
    }

    /**
     * Loads PhpSpreadsheet from file into PhpSpreadsheet instance.
     *
     * @param string $pFilename
     * @param Spreadsheet $spreadsheet
     *
     * @throws Exception
     *
     * @return Spreadsheet
     */
    public function loadIntoExisting($pFilename, Spreadsheet $spreadsheet)
    {
        $lineEnding = ini_get('auto_detect_line_endings');
        ini_set('auto_detect_line_endings', true);

        // Open file
        if (!$this->canRead($pFilename)) {
            throw new Exception($pFilename . ' is an Invalid Spreadsheet file.');
        }
        $this->openFile($pFilename);
        $fileHandle = $this->fileHandle;

        // Skip BOM, if any
        $this->skipBOM();
        $this->checkSeparator();
        $this->inferSeparator();

        // Create new PhpSpreadsheet object
        while ($spreadsheet->getSheetCount() <= $this->sheetIndex) {
            $spreadsheet->createSheet();
        }
        $sheet = $spreadsheet->setActiveSheetIndex($this->sheetIndex);

        // Set our starting row based on whether we're in contiguous mode or not
        $currentRow = 1;
        if ($this->contiguous) {
            $currentRow = ($this->contiguousRow == -1) ? $sheet->getHighestRow() : $this->contiguousRow;
        }

        // Loop through each line of the file in turn
        while (($rowData = fgetcsv($fileHandle, 0, $this->delimiter, $this->enclosure)) !== false) {
            $columnLetter = 'A';
            foreach ($rowData as $rowDatum) {
                if ($rowDatum != '' && $this->readFilter->readCell($columnLetter, $currentRow)) {
                    // Convert encoding if necessary
                    if ($this->inputEncoding !== 'UTF-8') {
                        $rowDatum = \PhpOffice\PhpSpreadsheet\Shared\StringHelper::convertEncoding($rowDatum, 'UTF-8', $this->inputEncoding);
                    }

                    // Set cell value
                    $sheet->getCell($columnLetter . $currentRow)->setValue($rowDatum);
                }
                ++$columnLetter;
            }
            ++$currentRow;
        }

        // Close file
        fclose($fileHandle);

        if ($this->contiguous) {
            $this->contiguousRow = $currentRow;
        }

        ini_set('auto_detect_line_endings', $lineEnding);

        // Return
        return $spreadsheet;
    }

    /**
     * Get delimiter.
     *
     * @return string
     */
    public function getDelimiter()
    {
        return $this->delimiter;
    }

    /**
     * Set delimiter.
     *
     * @param string $delimiter Delimiter, eg: ','
     *
     * @return CSV
     */
    public function setDelimiter($delimiter)
    {
        $this->delimiter = $delimiter;

        return $this;
    }

    /**
     * Get enclosure.
     *
     * @return string
     */
    public function getEnclosure()
    {
        return $this->enclosure;
    }

    /**
     * Set enclosure.
     *
     * @param string $enclosure Enclosure, defaults to "
     *
     * @return CSV
     */
    public function setEnclosure($enclosure)
    {
        if ($enclosure == '') {
            $enclosure = '"';
        }
        $this->enclosure = $enclosure;

        return $this;
    }

    /**
     * Get sheet index.
     *
     * @return int
     */
    public function getSheetIndex()
    {
        return $this->sheetIndex;
    }

    /**
     * Set sheet index.
     *
     * @param int $pValue Sheet index
     *
     * @return CSV
     */
    public function setSheetIndex($pValue)
    {
        $this->sheetIndex = $pValue;

        return $this;
    }

    /**
     * Set Contiguous.
     *
     * @param bool $contiguous
     */
    public function setContiguous($contiguous)
    {
        $this->contiguous = (bool) $contiguous;
        if (!$contiguous) {
            $this->contiguousRow = -1;
        }

        return $this;
    }

    /**
     * Get Contiguous.
     *
     * @return bool
     */
    public function getContiguous()
    {
        return $this->contiguous;
    }

    /**
     * Can the current IReader read the file?
     *
     * @param string $pFilename
     *
     * @throws Exception
     *
     * @return bool
     */
    public function canRead($pFilename)
    {
        // Check if file exists
        try {
            $this->openFile($pFilename);
        } catch (Exception $e) {
            return false;
        }

        fclose($this->fileHandle);

        return true;
    }
}
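
The report flags `Csv::inferSeparator()` for both length and complexity (17 conditions, 1041 paths). One possible way to split the statistical part of that method into small named pieces is sketched below. The functions `median()`, `meanSquareDeviation()` and `pickDelimiter()` are hypothetical names, not part of PhpSpreadsheet. The sketch also assumes that a median of zero should exclude a candidate whether it is computed as an integer or a float, which is a deliberate simplification and not necessarily what the code above does.

```php
<?php

declare(strict_types=1);

/** Median of a non-empty list of numbers. */
function median(array $values): float
{
    sort($values);
    $n = count($values);
    $mid = intdiv($n, 2);

    return ($n % 2 === 1)
        ? (float) $values[$mid]
        : ($values[$mid - 1] + $values[$mid]) / 2;
}

/** Mean of the squared deviations of the values from their median. */
function meanSquareDeviation(array $values): float
{
    $median = median($values);
    $sum = 0.0;
    foreach ($values as $value) {
        $sum += ($value - $median) ** 2;
    }

    return $sum / count($values);
}

/**
 * Pick the candidate whose per-line counts are most consistent.
 *
 * @param array<string, int[]> $countsPerDelimiter candidate => per-line counts
 * @param string               $fallback           returned if nothing qualifies
 */
function pickDelimiter(array $countsPerDelimiter, string $fallback): string
{
    $best = $fallback;
    $min = INF;
    foreach ($countsPerDelimiter as $delimiter => $series) {
        // Skip candidates with no data or with a median of zero occurrences.
        if ($series === [] || median($series) === 0.0) {
            continue;
        }
        $deviation = meanSquareDeviation($series);
        // Strict "<" keeps the earlier candidate when two deviations tie.
        if ($deviation < $min) {
            $min = $deviation;
            $best = (string) $delimiter;
        }
    }

    return $best;
}
```

Each helper has a single exit condition to test, so the branching that drives the path count is spread over several short functions instead of one 76-line method.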
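
The duplication flagged in `skipBOM()` comes from five cases that differ only in the marker bytes and their length. A table-driven version is one way to remove it; the following is a rough sketch, not the library's implementation. The constant `BOM_BY_ENCODING` and the free function `skipBom()` are hypothetical names introduced here, and the sketch reads with `fread()` where the original uses `fgets()`.

```php
<?php

declare(strict_types=1);

/** Byte-order marks keyed by encoding name (values are raw bytes). */
const BOM_BY_ENCODING = [
    'UTF-8' => "\xEF\xBB\xBF",
    'UTF-16LE' => "\xFF\xFE",
    'UTF-16BE' => "\xFE\xFF",
    'UTF-32LE' => "\xFF\xFE\x00\x00",
    'UTF-32BE' => "\x00\x00\xFE\xFF",
];

/**
 * Rewind $handle and leave it just past the BOM for $encoding, if present.
 *
 * @param resource $handle   a seekable stream opened for reading
 * @param string   $encoding encoding name, e.g. 'UTF-8'
 */
function skipBom($handle, string $encoding): void
{
    rewind($handle);

    $bom = BOM_BY_ENCODING[$encoding] ?? '';
    if ($bom === '') {
        return; // unknown encoding: stay at the start of the stream
    }

    // fread() returns exactly the requested bytes when available and does
    // not stop at a newline byte the way fgets() does.
    if (fread($handle, strlen($bom)) !== $bom) {
        rewind($handle); // no BOM found: go back to the start
    }
}
```

Adding support for another encoding then means adding one table entry, with no new branch in the function.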