Passed
Pull Request — 4 (#10237)
by Maxime
07:41
created

CsvBulkLoader::processAll()   D

Complexity

Conditions 16
Paths 172

Size

Total Lines 80
Code Lines 49

Duplication

Lines 0
Ratio 0 %

Importance

Changes 1
Bugs 0 Features 0
Metric Value
cc 16
eloc 49
c 1
b 0
f 0
nc 172
nop 2
dl 0
loc 80
rs 4.9666

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
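The extraction described above can be sketched as follows. This is a simplified illustration, not the code under review: the function names importRowsBefore(), importRowsAfter() and stripLeadingTabs() are hypothetical, and the tab-stripping rule is deliberately cruder than the one in CsvBulkLoader.

```php
<?php

// Before: the loop body needs a comment to explain what it does.
function importRowsBefore(array $rows): array
{
    $clean = [];
    foreach ($rows as $row) {
        // strip leading tabs from every cell
        foreach ($row as $key => $cell) {
            $row[$key] = ltrim($cell, "\t");
        }
        $clean[] = $row;
    }
    return $clean;
}

// After: the commented part is its own function, named after the comment.
function stripLeadingTabs(array $row): array
{
    foreach ($row as $key => $cell) {
        $row[$key] = ltrim($cell, "\t");
    }
    return $row;
}

function importRowsAfter(array $rows): array
{
    // array_map() applies the extracted function to each row in order.
    return array_map('stripLeadingTabs', $rows);
}
```

Both versions return the same rows; the second one simply reads without the comment because the name carries the intent.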

Commonly applied refactorings include:

<?php

namespace SilverStripe\Dev;

use League\Csv\MapIterator;
use League\Csv\Reader;
use SilverStripe\Control\Director;
use SilverStripe\ORM\DataObject;

/**
 * Utility class to facilitate complex CSV-imports by defining column-mappings
 * and custom converters.
 *
 * Uses the fgetcsv() function to process CSV input. Accepts a file-handler as
 * input.
 *
 * @see http://tools.ietf.org/html/rfc4180
 *
 * @todo Support for deleting existing records not matched in the import
 * (through relation checks)
 */
class CsvBulkLoader extends BulkLoader
{

    /**
     * Delimiter character (Default: comma).
     *
     * @var string
     */
    public $delimiter = ',';

    /**
     * Enclosure character (Default: doublequote)
     *
     * @var string
     */
    public $enclosure = '"';

    /**
     * Identifies if the CSV has a header row.
     *
     * @var boolean
     */
    public $hasHeaderRow = true;

    /**
     * Number of lines to split large CSV files into.
     *
     * @var int
     *
     * @config
     */
    private static $lines = 1000;

    /**
     * @inheritDoc
     */
    public function preview($filepath)
    {
        return $this->processAll($filepath, true);
    }

    /**
     * @param string $filepath
     * @param boolean $preview
     *
     * @return null|BulkLoader_Result
     */
    protected function processAll($filepath, $preview = false)
    {
        $this->extend('onBeforeProcessAll', $filepath, $preview);

        $result = BulkLoader_Result::create();

        try {
            $filepath = Director::getAbsFile($filepath);
            $csvReader = Reader::createFromPath($filepath, 'r');
            $csvReader->setDelimiter($this->delimiter);

            // league/csv 9
            if (method_exists($csvReader, 'skipInputBOM')) {
                $csvReader->skipInputBOM();
            // league/csv 8
            } else {
                $csvReader->stripBom(true);

Bug: true of type true is incompatible with the type Iterator expected by parameter $iterator of League\Csv\Reader::stripBOM(). (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                $csvReader->stripBom(/** @scrutinizer ignore-type */ true);

Bug: The call to League\Csv\Reader::stripBOM() has too few arguments starting with bom. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

                $csvReader->/** @scrutinizer ignore-call */
                            stripBom(true);

This check compares calls to functions or methods with their respective definitions. If the call has fewer arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.

            }

            $tabExtractor = function ($row, $rowOffset) {
                foreach ($row as &$item) {
                    // [SS-2017-007] Ensure all cells with leading tab and then [@=+] have the tab removed on import
                    if (preg_match("/^\t[\-@=\+]+.*/", $item)) {
                        $item = ltrim($item, "\t");
                    }
                }
                return $row;
            };

            if ($this->columnMap) {

Bug, Best Practice: The expression $this->columnMap of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using !empty($expr) instead to make it clear that you intend to check for an array without elements.

This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered to be equal (but not identical) to false, this is not always apparent.

Consider making the comparison explicit by using empty(...) or !empty(...) instead.

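A minimal sketch of the explicit comparison suggested above. hasColumnMap() is a hypothetical helper introduced only for illustration; it is not part of CsvBulkLoader.

```php
<?php

// Hypothetical helper: spells out "the array has at least one element"
// instead of relying on PHP's implicit array-to-boolean conversion.
function hasColumnMap(array $columnMap): bool
{
    return !empty($columnMap);
}
```

The result is the same as the implicit conversion for arrays; the gain is only that the intent is visible at the call site.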
                $headerMap = $this->getNormalisedColumnMap();

                $remapper = function ($row, $rowOffset) use ($headerMap, $tabExtractor) {
                    $row = $tabExtractor($row, $rowOffset);
                    foreach ($headerMap as $column => $renamedColumn) {
                        if ($column == $renamedColumn) {
                            continue;
                        }
                        if (array_key_exists($column, $row)) {
                            if (strpos($renamedColumn, '_ignore_') !== 0) {
                                $row[$renamedColumn] = $row[$column];
                            }
                            unset($row[$column]);
                        }
                    }
                    return $row;
                };
            } else {
                $remapper = $tabExtractor;
            }

            if ($this->hasHeaderRow) {
                if (method_exists($csvReader, 'fetchAssoc')) {
                    $rows = $csvReader->fetchAssoc(0, $remapper);
                } else {
                    $csvReader->setHeaderOffset(0);
                    $rows = new MapIterator($csvReader->getRecords(), $remapper);
                }
            } elseif ($this->columnMap) {

Bug, Best Practice: The expression $this->columnMap of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using !empty($expr) instead to make it clear that you intend to check for an array without elements.

                if (method_exists($csvReader, 'fetchAssoc')) {
                    $rows = $csvReader->fetchAssoc($headerMap, $remapper);

Comprehensibility, Best Practice: The variable $headerMap does not seem to be defined for all execution paths leading up to this point.

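One way to address this kind of finding is to assign the variable on every path before any branch reads it. The sketch below assumes that approach; buildHeaderMap() is a hypothetical, simplified stand-in for the column-map normalisation and omits the "->" callback case handled by the real method.

```php
<?php

// Hypothetical helper: always returns an array, so the caller's
// $headerMap is defined whether or not a column map is configured.
function buildHeaderMap(array $columnMap): array
{
    $map = [];
    foreach ($columnMap as $column => $newColumn) {
        // null targets are renamed so they can be dropped later
        $map[$column] = $newColumn === null ? '_ignore_' . $column : $newColumn;
    }
    return $map;
}
```

Calling such a helper once, before the if/elseif chain, would leave no path on which the map is undefined.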
                } else {
                    $rows = new MapIterator($csvReader->getRecords($headerMap), $remapper);
                }
            }

            foreach ($rows as $row) {

Comprehensibility, Best Practice: The variable $rows does not seem to be defined for all execution paths leading up to this point.

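The path the checker is pointing at is the one with no header row and no column map, where neither branch assigns $rows. A minimal sketch of one possible fix, giving $rows an empty default: selectRows() and its parameters are hypothetical and use plain arrays instead of the league/csv reader.

```php
<?php

// Hypothetical, simplified stand-in for the branch in processAll().
function selectRows(bool $hasHeaderRow, array $columnMap, array $records): array
{
    $rows = []; // default: defined on every path, so a later foreach is safe

    if ($hasHeaderRow) {
        // first record supplies the keys for the remaining records
        $header = array_shift($records);
        foreach ($records as $record) {
            $rows[] = array_combine($header, $record);
        }
    } elseif (!empty($columnMap)) {
        // no header row: the column map supplies the keys instead
        foreach ($records as $record) {
            $rows[] = array_combine($columnMap, $record);
        }
    }

    return $rows;
}
```

With neither a header row nor a column map, the function returns an empty array rather than leaving the variable unset.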
                $this->processRecord($row, $this->columnMap, $result, $preview);
            }
        } catch (\Exception $e) {
            $failedMessage = sprintf("Failed to parse %s", $filepath);
            if (Director::isDev()) {
                $failedMessage = sprintf($failedMessage . " because %s", $e->getMessage());
            }
            print $failedMessage . PHP_EOL;
        }

        $this->extend('onAfterProcessAll', $result, $preview);

        return $result;
    }

    protected function getNormalisedColumnMap()
    {
        $map = [];
        foreach ($this->columnMap as $column => $newColumn) {
            if (is_string($newColumn) && strpos($newColumn, "->") === 0) {
                $map[$column] = $column;
            } elseif (is_null($newColumn)) {
                // the column map must consist of unique scalar values
                // `null` can be present multiple times and is not scalar
                // so we name it in a standard way so we can remove it later
                $map[$column] = '_ignore_' . $column;
            } else {
                $map[$column] = $newColumn;
            }
        }
        return $map;
    }

    /**
     * Splits a large file up into many smaller files.
     *
     * @param string $path Path to large file to split
     * @param int $lines Number of lines per file
     *
     * @return array List of file paths
     */
    protected function splitFile($path, $lines = null)
    {
        Deprecation::notice('5.0', 'splitFile is deprecated, please process files using a stream');

        if (!is_int($lines)) {
            $lines = $this->config()->get("lines");
        }

        $new = $this->getNewSplitFileName();

        $to = fopen($new, 'w+');
        $from = fopen($path, 'r');

        $header = null;

        if ($this->hasHeaderRow) {
            $header = fgets($from);
            fwrite($to, $header);
        }

        $files = [];
        $files[] = $new;

        $count = 0;

        while (!feof($from)) {
            fwrite($to, fgets($from));

            $count++;

            if ($count >= $lines) {
                fclose($to);

                // get a new temporary file name, to write the next lines to
                $new = $this->getNewSplitFileName();

                $to = fopen($new, 'w+');

                if ($this->hasHeaderRow) {
                    // add the headers to the new file
                    fwrite($to, $header);

Bug: It seems like $header can also be of type null; however, parameter $data of fwrite() only seems to accept string. Maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                    fwrite($to, /** @scrutinizer ignore-type */ $header);

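As an alternative to the ignore annotation, the additional type check the message asks for could look roughly like this. writeHeader() is a hypothetical helper, not part of the class under review.

```php
<?php

// Hypothetical helper: only calls fwrite() when a header string exists.
// Returns the number of bytes written (0 when there is nothing to write).
function writeHeader($handle, ?string $header): int
{
    if ($header === null || $header === '') {
        return 0;
    }
    $written = fwrite($handle, $header);
    return $written === false ? 0 : $written;
}
```

With this guard, fwrite() never receives null, which is the case the checker flags.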
                }

                $files[] = $new;

                $count = 0;
            }
        }

        fclose($to);

        return $files;
    }

    /**
     * @return string
     */
    protected function getNewSplitFileName()
    {
        Deprecation::notice('5.0', 'getNewSplitFileName is deprecated, please name your files yourself');
        return TEMP_PATH . DIRECTORY_SEPARATOR . uniqid(str_replace('\\', '_', static::class), true) . '.csv';
    }

    /**
     * @param string $filepath
     * @param boolean $preview
     *
     * @return BulkLoader_Result
     */
    protected function processChunk($filepath, $preview = false)
    {
        Deprecation::notice('5.0', 'processChunk is deprecated, please process rows individually');
        $results = BulkLoader_Result::create();

        $csv = new CSVParser(
            $filepath,
            $this->delimiter,
            $this->enclosure
        );

        // ColumnMap has two uses, depending on whether hasHeaderRow is set
        if ($this->columnMap) {

Bug, Best Practice: The expression $this->columnMap of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using !empty($expr) instead to make it clear that you intend to check for an array without elements.

            // if the map goes to a callback, use the same key value as the map
            // value, rather than function name as multiple keys may use the
            // same callback
            $map = [];
            foreach ($this->columnMap as $k => $v) {
                if (strpos($v, "->") === 0) {
                    $map[$k] = $k;
                } else {
                    $map[$k] = $v;
                }
            }

            if ($this->hasHeaderRow) {
                $csv->mapColumns($map);
            } else {
                $csv->provideHeaderRow($map);
            }
        }

        foreach ($csv as $row) {
            $this->processRecord($row, $this->columnMap, $results, $preview);
        }

        return $results;
    }

    /**
     * @todo Better messages for relation checks and duplicate detection
     * Note that columnMap isn't used.
     *
     * @param array $record
     * @param array $columnMap
     * @param BulkLoader_Result $results
     * @param boolean $preview
     *
     * @return int
     */
    protected function processRecord($record, $columnMap, &$results, $preview = false)
    {
        $class = $this->objectClass;

        // find existing object, or create new one
        $existingObj = $this->findExistingObject($record, $columnMap);
        /** @var DataObject $obj */
        $obj = ($existingObj) ? $existingObj : new $class();

Issue: $existingObj is of type SilverStripe\ORM\DataObject, thus it always evaluates to true.

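This finding appears to follow from findExistingObject() being documented as returning DataObject while its last statement is return false, so the checker trusts the docblock. A sketch of a docblock that declares both outcomes, using a hypothetical findExisting() function and plain objects rather than the SilverStripe ORM:

```php
<?php

/**
 * Hypothetical lookup mirroring the shape of findExistingObject().
 *
 * @param array $record CSV row, keyed by column name
 * @param array $known  Existing objects indexed by e-mail (illustrative only)
 * @return object|false The matching object, or false when nothing matches
 */
function findExisting(array $record, array $known)
{
    $email = $record['Email'] ?? '';
    return $known[$email] ?? false;
}
```

Once the false case is part of the documented return type, a truthiness check on the result is no longer reported as always true.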
        $schema = DataObject::getSchema();

        // first run: find/create any relations and store them on the object
        // we can't combine runs, as other columns might rely on the relation being present
        foreach ($record as $fieldName => $val) {
            // don't bother querying if value is not set
            if ($this->isNullValue($val)) {
                continue;
            }

            // checking for existing relations
            if (isset($this->relationCallbacks[$fieldName])) {
                // trigger custom search method for finding a relation based on the given value
                // and write it back to the relation (or create a new object)
                $relationName = $this->relationCallbacks[$fieldName]['relationname'];
                /** @var DataObject $relationObj */
                $relationObj = null;
                if ($this->hasMethod($this->relationCallbacks[$fieldName]['callback'])) {
                    $relationObj = $this->{$this->relationCallbacks[$fieldName]['callback']}($obj, $val, $record);
                } elseif ($obj->hasMethod($this->relationCallbacks[$fieldName]['callback'])) {
                    $relationObj = $obj->{$this->relationCallbacks[$fieldName]['callback']}($val, $record);
                }
                if (!$relationObj || !$relationObj->exists()) {
                    $relationClass = $schema->hasOneComponent(get_class($obj), $relationName);
                    $relationObj = new $relationClass();
                    //write if we aren't previewing
                    if (!$preview) {
                        $relationObj->write();
                    }
                }
                $obj->{"{$relationName}ID"} = $relationObj->ID;
                //write if we are not previewing
                if (!$preview) {
                    $obj->write();
                    $obj->flushCache(); // avoid relation caching confusion
                }
            } elseif (strpos($fieldName, '.') !== false) {
                // we have a relation column with dot notation
                [$relationName, $columnName] = explode('.', $fieldName);
                // always gives us a component (either empty or existing)
                $relationObj = $obj->getComponent($relationName);
                if (!$preview) {
                    $relationObj->write();
                }
                $obj->{"{$relationName}ID"} = $relationObj->ID;

                //write if we are not previewing
                if (!$preview) {
                    $obj->write();
                    $obj->flushCache(); // avoid relation caching confusion
                }
            }
        }

        // second run: save data

        foreach ($record as $fieldName => $val) {
            // break out of the loop if we are previewing
            if ($preview) {
                break;
            }

            // look up the mapping to see if this needs to map to callback
            $mapped = $this->columnMap && isset($this->columnMap[$fieldName]);

Bug, Best Practice: The expression $this->columnMap of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using !empty($expr) instead to make it clear that you intend to check for an array without elements.

            if ($mapped && strpos($this->columnMap[$fieldName], '->') === 0) {
                $funcName = substr($this->columnMap[$fieldName], 2);

                $this->$funcName($obj, $val, $record);
            } elseif ($obj->hasMethod("import{$fieldName}")) {
                $obj->{"import{$fieldName}"}($val, $record);
            } else {
                $obj->update([$fieldName => $val]);
            }
        }

        $isChanged = $obj->isChanged();

        // write record
        if (!$preview) {
            $obj->write();
        }

        // @todo better message support
        $message = '';

        // save to results
        if ($existingObj) {

Issue: $existingObj is of type SilverStripe\ORM\DataObject, thus it always evaluates to true.

            // We mark as updated regardless of isChanged, since custom formatters and importers
            // might have affected relationships and other records.
            $results->addUpdated($obj, $message);
        } else {
            $results->addCreated($obj, $message);
        }

        $this->extend('onAfterProcessRecord', $obj, $preview, $isChanged);

        $objID = $obj->ID;

        $obj->destroy();

        // memory usage
        unset($existingObj, $obj);

        return $objID;
    }

    /**
     * Find an existing object based on one or more uniqueness columns
     * specified via {@link self::$duplicateChecks}.
     *
     * @todo support $columnMap
     *
     * @param array $record CSV data column
     * @param array $columnMap
     * @return DataObject
     */
    public function findExistingObject($record, $columnMap = [])
    {
        $SNG_objectClass = singleton($this->objectClass);
        // checking for existing records (only if not already found)

        foreach ($this->duplicateChecks as $fieldName => $duplicateCheck) {
            $existingRecord = null;
            if (is_string($duplicateCheck)) {
                // Skip current duplicate check if field value is empty
                if (empty($record[$duplicateCheck])) {
                    continue;
                }

                // Check existing record with this value
                $dbFieldValue = $record[$duplicateCheck];
                $existingRecord = DataObject::get($this->objectClass)
                    ->filter($duplicateCheck, $dbFieldValue)
                    ->first();

                if ($existingRecord) {
                    return $existingRecord;
                }
            } elseif (is_array($duplicateCheck) && isset($duplicateCheck['callback'])) {
                if ($this->hasMethod($duplicateCheck['callback'])) {
                    $existingRecord = $this->{$duplicateCheck['callback']}($record[$fieldName], $record);
                } elseif ($SNG_objectClass->hasMethod($duplicateCheck['callback'])) {
                    $existingRecord = $SNG_objectClass->{$duplicateCheck['callback']}($record[$fieldName], $record);
                } else {
                    throw new \RuntimeException(
                        "CsvBulkLoader::processRecord():"
                        . " {$duplicateCheck['callback']} not found on importer or object class."
                    );
                }

                if ($existingRecord) {
                    return $existingRecord;
                }
            } else {
                throw new \InvalidArgumentException(
                    'CsvBulkLoader::processRecord(): Wrong format for $duplicateChecks'
                );
            }
        }

        return false;
    }

    /**
     * Determine whether any loaded files should be parsed with a
     * header-row (otherwise we rely on {@link self::$columnMap}).
     *
     * @return boolean
     */
    public function hasHeaderRow()
    {
        return ($this->hasHeaderRow || isset($this->columnMap));
    }
}