Completed
Push — 4 ( 3202ef...3e5a74 )
by Maxime
36s queued 27s
created

CsvBulkLoader::processAll()   D

Complexity

Conditions 16
Paths 172

Size

Total Lines 80
Code Lines 49

Duplication

Lines 0
Ratio 0 %

Importance

Changes 1
Bugs 0 Features 0
Metric Value
cc 16
eloc 49
c 1
b 0
f 0
nc 172
nop 2
dl 0
loc 80
rs 4.9666

How to fix   Long Method    Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
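As a minimal sketch of that advice, the hypothetical functions below (none of these names come from `CsvBulkLoader`) show a commented function body being split into small, named functions, with each comment serving as the starting point for a name.

```php
<?php

// Hypothetical example for illustration only; not taken from the analysed class.

/**
 * Before: one function whose steps are marked by comments.
 */
function summariseOrderBefore(array $prices, float $taxRate): string
{
    // calculate the subtotal
    $subtotal = 0.0;
    foreach ($prices as $price) {
        $subtotal += $price;
    }

    // add tax
    $total = $subtotal * (1 + $taxRate);

    // format the summary line
    return sprintf('Total: %.2f', $total);
}

/**
 * After: the "calculate the subtotal" comment became a function name.
 */
function calculateSubtotal(array $prices): float
{
    return (float) array_sum($prices);
}

/**
 * After: the "add tax" comment became a function name.
 */
function addTax(float $subtotal, float $taxRate): float
{
    return $subtotal * (1 + $taxRate);
}

/**
 * After: the original function now only composes the extracted steps.
 */
function summariseOrderAfter(array $prices, float $taxRate): string
{
    return sprintf('Total: %.2f', addTax(calculateSubtotal($prices), $taxRate));
}
```

Both versions return the same string for the same input; the second one simply gives each step a name that can be read and tested on its own.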

Commonly applied refactorings include:

<?php

namespace SilverStripe\Dev;

use League\Csv\MapIterator;
use League\Csv\Reader;
use SilverStripe\Control\Director;
use SilverStripe\ORM\DataObject;

/**
 * Utility class to facilitate complex CSV-imports by defining column-mappings
 * and custom converters.
 *
 * Uses the fgetcsv() function to process CSV input. Accepts a file-handler as
 * input.
 *
 * @see http://tools.ietf.org/html/rfc4180
 *
 * @todo Support for deleting existing records not matched in the import
 * (through relation checks)
 */
class CsvBulkLoader extends BulkLoader
{

    /**
     * Delimiter character (Default: comma).
     *
     * @var string
     */
    public $delimiter = ',';

    /**
     * Enclosure character (Default: doublequote)
     *
     * @var string
     */
    public $enclosure = '"';

    /**
     * Identifies if the csv has a header row.
     *
     * @var boolean
     */
    public $hasHeaderRow = true;

    /**
     * Number of lines to split large CSV files into.
     *
     * @var int
     *
     * @config
     */
    private static $lines = 1000;

    /**
     * @inheritDoc
     */
    public function preview($filepath)
    {
        return $this->processAll($filepath, true);
    }

    /**
     * @param string $filepath
     * @param boolean $preview
     *
     * @return null|BulkLoader_Result
     */
    protected function processAll($filepath, $preview = false)
    {
        $this->extend('onBeforeProcessAll', $filepath, $preview);

        $result = BulkLoader_Result::create();

        try {
            $filepath = Director::getAbsFile($filepath);
            $csvReader = Reader::createFromPath($filepath, 'r');
            $csvReader->setDelimiter($this->delimiter);

            // league/csv 9
            if (method_exists($csvReader, 'skipInputBOM')) {
                $csvReader->skipInputBOM();
            // league/csv 8
            } else {
                $csvReader->stripBom(true);
Bug introduced by
true of type true is incompatible with the type Iterator expected by parameter $iterator of League\Csv\Reader::stripBOM(). ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-type annotation

                $csvReader->stripBom(/** @scrutinizer ignore-type */ true);
Bug introduced by
The call to League\Csv\Reader::stripBOM() has too few arguments starting with bom. ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-call annotation

                $csvReader->/** @scrutinizer ignore-call */
                            stripBom(true);

This check compares calls to functions or methods with their respective definitions. If the call has fewer arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.
            }

            $tabExtractor = function ($row, $rowOffset) {
                foreach ($row as &$item) {
                    // [SS-2017-007] Ensure all cells with leading tab and then [@=+] have the tab removed on import
                    if (preg_match("/^\t[\-@=\+]+.*/", $item)) {
                        $item = ltrim($item, "\t");
                    }
                }
                return $row;
            };

            if ($this->columnMap) {
Bug Best Practice introduced by
The expression $this->columnMap of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.

This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered to be equal (but not identical) to false, this is not always apparent.

Consider making the comparison explicit by using empty(..) or ! empty(...) instead.
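A minimal sketch of the explicit form this check recommends is shown below. The function name `hasColumnMap()` is hypothetical and only stands in for a condition such as the one flagged above; it is not part of the analysed class.

```php
<?php

/**
 * Hypothetical helper for illustration: reports whether a column map
 * contains at least one entry, without relying on the implicit
 * array-to-boolean conversion.
 */
function hasColumnMap(array $columnMap): bool
{
    // Explicit check: true only when the array has one or more elements.
    return !empty($columnMap);
}
```

For a variable that is always an array, `!empty($columnMap)` gives the same result as the implicit conversion; the difference is only that the intent is visible to the reader.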

                $headerMap = $this->getNormalisedColumnMap();

                $remapper = function ($row, $rowOffset) use ($headerMap, $tabExtractor) {
                    $row = $tabExtractor($row, $rowOffset);
                    foreach ($headerMap as $column => $renamedColumn) {
                        if ($column == $renamedColumn) {
                            continue;
                        }
                        if (array_key_exists($column, $row)) {
                            if (strpos($renamedColumn, '_ignore_') !== 0) {
                                $row[$renamedColumn] = $row[$column];
                            }
                            unset($row[$column]);
                        }
                    }
                    return $row;
                };
            } else {
                $remapper = $tabExtractor;
            }

            if ($this->hasHeaderRow) {
                if (method_exists($csvReader, 'fetchAssoc')) {
                    $rows = $csvReader->fetchAssoc(0, $remapper);
                } else {
                    $csvReader->setHeaderOffset(0);
                    $rows = new MapIterator($csvReader->getRecords(), $remapper);
                }
            } elseif ($this->columnMap) {
Bug Best Practice introduced by
The expression $this->columnMap of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.

                if (method_exists($csvReader, 'fetchAssoc')) {
                    $rows = $csvReader->fetchAssoc($headerMap, $remapper);
Comprehensibility Best Practice introduced by
The variable $headerMap does not seem to be defined for all execution paths leading up to this point.
                } else {
                    $rows = new MapIterator($csvReader->getRecords($headerMap), $remapper);
                }
            }

            foreach ($rows as $row) {
Comprehensibility Best Practice introduced by
The variable $rows does not seem to be defined for all execution paths leading up to this point.
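One common way to address this kind of finding is to initialise the variable before the branches, so it exists on every path. The sketch below is a simplified, hypothetical stand-in (`selectRows()` is not part of the analysed class and does not use league/csv); it assumes every record has as many cells as there are header or column-map keys, and it assumes that an empty result is acceptable when neither branch applies. Throwing an exception in that case would be an equally reasonable choice.

```php
<?php

/**
 * Hypothetical helper for illustration: builds associative rows from raw
 * records and always returns an array, whichever branch is taken.
 */
function selectRows(array $records, bool $hasHeaderRow, array $columnMap): array
{
    // Initialise first, so $rows is defined even if no branch below runs.
    $rows = [];

    if ($hasHeaderRow) {
        // The first record supplies the keys for all following records.
        $header = array_shift($records) ?? [];
        foreach ($records as $record) {
            $rows[] = array_combine($header, $record);
        }
    } elseif (!empty($columnMap)) {
        // Without a header row, the column map keys supply the keys.
        $keys = array_keys($columnMap);
        foreach ($records as $record) {
            $rows[] = array_combine($keys, $record);
        }
    }

    return $rows;
}
```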
                $this->processRecord($row, $this->columnMap, $result, $preview);
            }
        } catch (\Exception $e) {
            $failedMessage = sprintf("Failed to parse %s", $filepath);
            if (Director::isDev()) {
                $failedMessage = sprintf($failedMessage . " because %s", $e->getMessage());
            }
            print $failedMessage . PHP_EOL;
        }

        $this->extend('onAfterProcessAll', $result, $preview);

        return $result;
    }

    protected function getNormalisedColumnMap()
    {
        $map = [];
        foreach ($this->columnMap as $column => $newColumn) {
            if (strpos($newColumn, "->") === 0) {
                $map[$column] = $column;
            } elseif (is_null($newColumn)) {
                // the column map must consist of unique scalar values
                // `null` can be present multiple times and is not scalar
                // so we name it in a standard way so we can remove it later
                $map[$column] = '_ignore_' . $column;
            } else {
                $map[$column] = $newColumn;
            }
        }
        return $map;
    }

    /**
     * Splits a large file up into many smaller files.
     *
     * @param string $path Path to large file to split
     * @param int $lines Number of lines per file
     *
     * @return array List of file paths
     */
    protected function splitFile($path, $lines = null)
    {
        Deprecation::notice('5.0', 'splitFile is deprecated, please process files using a stream');

        if (!is_int($lines)) {
            $lines = $this->config()->get("lines");
        }

        $new = $this->getNewSplitFileName();

        $to = fopen($new, 'w+');
        $from = fopen($path, 'r');

        $header = null;

        if ($this->hasHeaderRow) {
            $header = fgets($from);
            fwrite($to, $header);
        }

        $files = [];
        $files[] = $new;

        $count = 0;

        while (!feof($from)) {
            fwrite($to, fgets($from));

            $count++;

            if ($count >= $lines) {
                fclose($to);

                // get a new temporary file name, to write the next lines to
                $new = $this->getNewSplitFileName();

                $to = fopen($new, 'w+');

                if ($this->hasHeaderRow) {
                    // add the headers to the new file
                    fwrite($to, $header);
Bug introduced by
It seems like $header can also be of type null; however, parameter $data of fwrite() does only seem to accept string, maybe add an additional type check? ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-type  annotation

                    fwrite($to, /** @scrutinizer ignore-type */ $header);
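As an alternative to the annotation, the additional type check that the message suggests could be sketched as follows. `writeHeader()` is a hypothetical helper, not part of the analysed class; it also covers the `false` that `fgets()` returns when no line could be read.

```php
<?php

/**
 * Hypothetical helper for illustration: writes a header line to a stream
 * only when the header is actually a string.
 *
 * @param resource $handle Writable stream
 * @param mixed $header Header line, or null/false when none was read
 * @return int Number of bytes written (0 when nothing was written)
 */
function writeHeader($handle, $header): int
{
    // Guard against null (never read) and false (fgets() failure).
    if (!is_string($header)) {
        return 0;
    }

    $written = fwrite($handle, $header);

    // fwrite() returns false on failure; report that as 0 bytes.
    return $written === false ? 0 : $written;
}
```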
                }

                $files[] = $new;

                $count = 0;
            }
        }
        fclose($to);

        return $files;
    }

    /**
     * @return string
     */
    protected function getNewSplitFileName()
    {
        Deprecation::notice('5.0', 'getNewSplitFileName is deprecated, please name your files yourself');
        return TEMP_PATH . DIRECTORY_SEPARATOR . uniqid(str_replace('\\', '_', static::class), true) . '.csv';
    }

    /**
     * @param string $filepath
     * @param boolean $preview
     *
     * @return BulkLoader_Result
     */
    protected function processChunk($filepath, $preview = false)
    {
        Deprecation::notice('5.0', 'processChunk is deprecated, please process rows individually');
        $results = BulkLoader_Result::create();

        $csv = new CSVParser(
            $filepath,
            $this->delimiter,
            $this->enclosure
        );

        // ColumnMap has two uses, depending on whether hasHeaderRow is set
        if ($this->columnMap) {
Bug Best Practice introduced by
The expression $this->columnMap of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.

            // if the map goes to a callback, use the same key value as the map
            // value, rather than function name as multiple keys may use the
            // same callback
            $map = [];
            foreach ($this->columnMap as $k => $v) {
                if (strpos($v, "->") === 0) {
                    $map[$k] = $k;
                } else {
                    $map[$k] = $v;
                }
            }

            if ($this->hasHeaderRow) {
                $csv->mapColumns($map);
            } else {
                $csv->provideHeaderRow($map);
            }
        }

        foreach ($csv as $row) {
            $this->processRecord($row, $this->columnMap, $results, $preview);
        }

        return $results;
    }

    /**
     * @todo Better messages for relation checks and duplicate detection
     * Note that columnMap isn't used.
     *
     * @param array $record
     * @param array $columnMap
     * @param BulkLoader_Result $results
     * @param boolean $preview
     *
     * @return int
     */
    protected function processRecord($record, $columnMap, &$results, $preview = false)
    {
        $class = $this->objectClass;

        // find existing object, or create new one
        $existingObj = $this->findExistingObject($record, $columnMap);
        /** @var DataObject $obj */
        $obj = ($existingObj) ? $existingObj : new $class();
introduced by
$existingObj is of type SilverStripe\ORM\DataObject, thus it always evaluated to true.
        $schema = DataObject::getSchema();

        // first run: find/create any relations and store them on the object
        // we can't combine runs, as other columns might rely on the relation being present
        foreach ($record as $fieldName => $val) {
            // don't bother querying if value is not set
            if ($this->isNullValue($val)) {
                continue;
            }

            // checking for existing relations
            if (isset($this->relationCallbacks[$fieldName])) {
                // trigger custom search method for finding a relation based on the given value
                // and write it back to the relation (or create a new object)
                $relationName = $this->relationCallbacks[$fieldName]['relationname'];
                /** @var DataObject $relationObj */
                $relationObj = null;
                if ($this->hasMethod($this->relationCallbacks[$fieldName]['callback'])) {
                    $relationObj = $this->{$this->relationCallbacks[$fieldName]['callback']}($obj, $val, $record);
                } elseif ($obj->hasMethod($this->relationCallbacks[$fieldName]['callback'])) {
                    $relationObj = $obj->{$this->relationCallbacks[$fieldName]['callback']}($val, $record);
                }
                if (!$relationObj || !$relationObj->exists()) {
                    $relationClass = $schema->hasOneComponent(get_class($obj), $relationName);
                    $relationObj = new $relationClass();
                    //write if we aren't previewing
                    if (!$preview) {
                        $relationObj->write();
                    }
                }
                $obj->{"{$relationName}ID"} = $relationObj->ID;
                //write if we are not previewing
                if (!$preview) {
                    $obj->write();
                    $obj->flushCache(); // avoid relation caching confusion
                }
            } elseif (strpos($fieldName, '.') !== false) {
                // we have a relation column with dot notation
                [$relationName, $columnName] = explode('.', $fieldName);
                // always gives us a component (either empty or existing)
                $relationObj = $obj->getComponent($relationName);
                if (!$preview) {
                    $relationObj->write();
                }
                $obj->{"{$relationName}ID"} = $relationObj->ID;

                //write if we are not previewing
                if (!$preview) {
                    $obj->write();
                    $obj->flushCache(); // avoid relation caching confusion
                }
            }
        }

        // second run: save data

        foreach ($record as $fieldName => $val) {
            // break out of the loop if we are previewing
            if ($preview) {
                break;
            }

            // look up the mapping to see if this needs to map to callback
            $mapped = $this->columnMap && isset($this->columnMap[$fieldName]);
Bug Best Practice introduced by
The expression $this->columnMap of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.


            if ($mapped && strpos($this->columnMap[$fieldName], '->') === 0) {
                $funcName = substr($this->columnMap[$fieldName], 2);

                $this->$funcName($obj, $val, $record);
            } elseif ($obj->hasMethod("import{$fieldName}")) {
                $obj->{"import{$fieldName}"}($val, $record);
            } else {
                $obj->update([$fieldName => $val]);
            }
        }

        $isChanged = $obj->isChanged();

        // write record
        if (!$preview) {
            $obj->write();
        }

        // @todo better message support
        $message = '';

        // save to results
        if ($existingObj) {
introduced by
$existingObj is of type SilverStripe\ORM\DataObject, thus it always evaluated to true.
            // We mark as updated regardless of isChanged, since custom formatters and importers
            // might have affected relationships and other records.
            $results->addUpdated($obj, $message);
        } else {
            $results->addCreated($obj, $message);
        }

        $this->extend('onAfterProcessRecord', $obj, $preview, $isChanged);

        $objID = $obj->ID;

        $obj->destroy();

        // memory usage
        unset($existingObj, $obj);

        return $objID;
    }

    /**
     * Find an existing object based on one or more uniqueness columns
     * specified via {@link self::$duplicateChecks}.
     *
     * @todo support $columnMap
     *
     * @param array $record CSV data column
     * @param array $columnMap
     * @return DataObject
     */
    public function findExistingObject($record, $columnMap = [])
    {
        $SNG_objectClass = singleton($this->objectClass);
        // checking for existing records (only if not already found)

        foreach ($this->duplicateChecks as $fieldName => $duplicateCheck) {
            $existingRecord = null;
            if (is_string($duplicateCheck)) {
                // Skip current duplicate check if field value is empty
                if (empty($record[$duplicateCheck])) {
                    continue;
                }

                // Check existing record with this value
                $dbFieldValue = $record[$duplicateCheck];
                $existingRecord = DataObject::get($this->objectClass)
                    ->filter($duplicateCheck, $dbFieldValue)
                    ->first();

                if ($existingRecord) {
                    return $existingRecord;
                }
            } elseif (is_array($duplicateCheck) && isset($duplicateCheck['callback'])) {
                if ($this->hasMethod($duplicateCheck['callback'])) {
                    $existingRecord = $this->{$duplicateCheck['callback']}($record[$fieldName], $record);
                } elseif ($SNG_objectClass->hasMethod($duplicateCheck['callback'])) {
                    $existingRecord = $SNG_objectClass->{$duplicateCheck['callback']}($record[$fieldName], $record);
                } else {
                    throw new \RuntimeException(
                        "CsvBulkLoader::processRecord():"
                        . " {$duplicateCheck['callback']} not found on importer or object class."
                    );
                }

                if ($existingRecord) {
                    return $existingRecord;
                }
            } else {
                throw new \InvalidArgumentException(
                    'CsvBulkLoader::processRecord(): Wrong format for $duplicateChecks'
                );
            }
        }

        return false;
    }

    /**
     * Determine whether any loaded files should be parsed with a
     * header-row (otherwise we rely on {@link self::$columnMap}).
     *
     * @return boolean
     */
    public function hasHeaderRow()
    {
        return ($this->hasHeaderRow || isset($this->columnMap));
    }
}