Completed
Pull Request: master (#376), by ignace nyamagana, created 01:47

EmptyEscapeParser::filterDocument()   A

Complexity

Conditions 3
Paths 2

Size

Total Lines 13

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 3
CRAP Score 5.1971

Importance

Changes 0

Metric  Value
cc      3
nc      2
nop     1
dl      0
loc     13
ccs     3
cts     8
cp      0.375
crap    5.1971
rs      9.8333
c       0
b       0
f       0
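The coverage ratio and CRAP score in the table can be cross-checked by hand. The sketch below assumes that ccs and cts are the covered and total statement counts (3 / 8 reproduces the reported cp of 0.375) and applies the standard CRAP formula, comp^2 * (1 - cov)^3 + comp, with cc = 3 as the complexity. `crapScore()` is a hypothetical helper, not part of the tool.

```php
<?php
declare(strict_types=1);

// Standard CRAP formula: comp^2 * (1 - cov)^3 + comp, where cov is a 0..1 ratio.
function crapScore(int $complexity, float $coverage): float
{
    return $complexity ** 2 * (1 - $coverage) ** 3 + $complexity;
}

// Assumption: cp = ccs / cts, i.e. 3 covered statements out of 8.
$coverage = 3 / 8;
$score = crapScore(3, $coverage); // cc = 3

printf("cp = %.3f, crap = %.4f\n", $coverage, $score); // cp = 0.375, crap = 5.1973
```

The formula gives about 5.1973; the reported 5.1971 differs only in the last digit, presumably because of how the tool rounds intermediate values.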
<?php

/**
 * League.Csv (https://csv.thephpleague.com)
 *
 * (c) Ignace Nyamagana Butera <[email protected]>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */

declare(strict_types=1);

namespace League\Csv\Polyfill;

use League\Csv\Stream;
use function explode;
use function get_class;
use function in_array;
use function ltrim;
use function rtrim;
use function sprintf;
use function str_replace;
use function substr;

/**
 * A Polyfill to PHP's \SplFileObject to enable parsing the CSV document
 * without taking into account the escape character.
 *
 * @see https://php.net/manual/en/function.fgetcsv.php
 * @see https://php.net/manual/en/function.fgets.php
 * @see https://tools.ietf.org/html/rfc4180
 * @see http://edoceo.com/utilitas/csv-file-format
 *
 * @internal used internally to parse a CSV document without using the escape character
 */
final class EmptyEscapeParser
{
    /**
     * @internal
     */
    const FIELD_BREAKS = [false, '', "\r\n", "\n", "\r"];

    /**
     * @var \SplFileObject|Stream
     */
    private static $document;

    /**
     * @var string
     */
    private static $delimiter;

    /**
     * @var string
     */
    private static $enclosure;

    /**
     * @var string
     */
    private static $trim_mask;

    /**
     * @var string|false
     */
    private static $line;

    /**
     * @codeCoverageIgnore
     */
    private function __construct()
    {
    }

    /**
     * Converts the document into a CSV record iterator.
     *
     * In PHP 7.4+ you can do
     *
     * <code>
     * $file = new \SplFileObject('/path/to/file.csv', 'r');
     * $file->setFlags(\SplFileObject::READ_CSV | \SplFileObject::READ_AHEAD | \SplFileObject::SKIP_EMPTY);
     * $file->setCsvControl($delimiter, $enclosure, '');
     * foreach ($file as $record) {
     *    //the escape mechanism is disabled by the empty escape string
     * }
     * </code>
     *
     * In PHP 7.3 and earlier you can do
     *
     * <code>
     * $file = new \SplFileObject('/path/to/file.csv', 'r');
     * $it = EmptyEscapeParser::parse($file); //parsing will be done while ignoring the escape character value.
     * foreach ($it as $record) {
     *    //fgetcsv is not used directly, hence the escape char is not taken into account
     * }
     * </code>
     *
     * Each record is an array of strings; as with fgetcsv, an empty line yields [null].
     *
     * @param \SplFileObject|Stream $document
     *
     * @return \Generator|array[]
     */
    public static function parse($document): \Generator
    {
        self::$document = self::filterDocument($document);
Documentation introduced by
$document is of type object<SplFileObject>|object<League\Csv\Stream>, but the function expects an object<League\Csv\Polyfill\object>.

It seems like the type of the argument is not accepted by the function/method which you are calling.

In some cases, in particular if PHP’s automatic type-juggling kicks in, this might be fine. In other cases, however, this might be a bug.

We suggest adding an explicit type cast like in the following example:

function acceptsInteger($int) { }

$x = '123'; // string "123"

// Instead of
acceptsInteger($x);

// we recommend using
acceptsInteger((integer) $x);
        list(self::$delimiter, self::$enclosure, ) = self::$document->getCsvControl();
        self::$trim_mask = str_replace([self::$delimiter, self::$enclosure], '', " \t\0\x0B");
        self::$document->setFlags(0);
        self::$document->rewind();
        while (self::$document->valid()) {
            $record = self::extractRecord();
            if ([null] === $record || !in_array(null, $record, true)) {
                yield $record;
            }
        }
    }

    /**
     * Ensures the submitted document is a Stream or a \SplFileObject instance.
     *
     * @throws \TypeError if the document is neither a Stream nor a \SplFileObject
     *
     * @return \SplFileObject|Stream
     */
    private static function filterDocument(object $document)
    {
        if ($document instanceof Stream || $document instanceof \SplFileObject) {
            return $document;
        }

        throw new \TypeError(sprintf(
            '%s::parse expects parameter 1 to be a %s or a \SplFileObject object, %s given',
            self::class,
            Stream::class,
            get_class($document)
        ));
    }

    /**
     * Extracts a record from the CSV document.
     */
    private static function extractRecord(): array
    {
        $record = [];
        self::$line = self::$document->fgets();
        do {
            $is_field_enclosed = false;
            $buffer = '';
            if (false !== self::$line) {
                $buffer = ltrim(self::$line, self::$trim_mask);
            }

            if (($buffer[0] ?? '') === self::$enclosure) {
                $is_field_enclosed = true;
                self::$line = $buffer;
            }

            $record[] = $is_field_enclosed ? self::extractEnclosedFieldContent() : self::extractFieldContent();
        } while (false !== self::$line);

        return $record;
    }

    /**
     * Extracts the content from a field without enclosure.
     *
     * - Field content cannot span multiple document lines.
     * - Content must be preserved.
     * - Trailing line breaks must be removed.
     *
     * @return string|null
     */
    private static function extractFieldContent()
    {
        if (in_array(self::$line, self::FIELD_BREAKS, true)) {
            self::$line = false;

            return null;
        }

        /** @var array<string> $result */
        $result = explode(self::$delimiter, self::$line, 2);
        /** @var string $content */
        [$content, $remainder] = $result + [1 => false];
Bug introduced by
The variable $content does not exist. Did you forget to declare it?

This check marks access to variables or properties that have not been declared yet. While PHP has no explicit notion of declaring a variable, accessing it before a value is assigned to it is most likely a bug.

Bug introduced by
The variable $remainder does not exist. Did you forget to declare it?

This check marks access to variables or properties that have not been declared yet. While PHP has no explicit notion of declaring a variable, accessing it before a value is assigned to it is most likely a bug.

        /** @var string|false $remainder */
        self::$line = $remainder;
        if (false === self::$line) {
            return rtrim($content, "\r\n");
        }

        return $content;
    }

    /**
     * Extracts the content from a field with enclosure.
     *
     * - Field content can span multiple document lines.
     * - Content between consecutive enclosure characters must be preserved.
     * - A doubled enclosure sequence must be replaced by a single enclosure character.
     * - Trailing line breaks must be removed if they are not part of the field content.
     * - Invalid field content is handled the same way fgetcsv handles it.
     *
     * @return string|null
     */
    private static function extractEnclosedFieldContent()
    {
        if (false !== self::$line && self::$line[0] === self::$enclosure) {
            self::$line = substr(self::$line, 1);
        }

        $content = '';
        while (false !== self::$line) {
            /** @var array $result */
            $result = explode(self::$enclosure, self::$line, 2);
            [$buffer, $remainder] = $result + [1 => false];
Bug introduced by
The variable $buffer does not exist. Did you forget to declare it?

This check marks access to variables or properties that have not been declared yet. While PHP has no explicit notion of declaring a variable, accessing it before a value is assigned to it is most likely a bug.

Bug introduced by
The variable $remainder does not exist. Did you forget to declare it?

This check marks access to variables or properties that have not been declared yet. While PHP has no explicit notion of declaring a variable, accessing it before a value is assigned to it is most likely a bug.
            $content .= $buffer;
            self::$line = $remainder;
            if (false !== self::$line) {
                break;
            }

            if (self::$document->valid()) {
                self::$line = self::$document->fgets();
                continue;
            }

            if ($buffer === rtrim($content, "\r\n")) {
                return null;
            }
        }

        if (in_array(self::$line, self::FIELD_BREAKS, true)) {
            self::$line = false;
            if (!self::$document->valid()) {
                return $content;
            }

            return rtrim($content, "\r\n");
        }

        $char = self::$line[0] ?? '';
        if ($char === self::$delimiter) {
            self::$line = substr(self::$line, 1);

            return $content;
        }

        if ($char === self::$enclosure) {
            return $content.self::$enclosure.self::extractEnclosedFieldContent();
        }

        return $content.self::extractFieldContent();
    }
}
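The docblock of parse() notes that on PHP 7.4+ the polyfill is not needed, because an empty string can be passed as the escape argument of setCsvControl(). A minimal sketch of that native path, assuming PHP >= 7.4: `readWithoutEscape()` and the sample CSV are hypothetical, chosen so that one field ends with a backslash right before its closing enclosure (the case the default "\\" escape misreads) and another uses doubled enclosures.

```php
<?php
declare(strict_types=1);

// Sketch only: assumes PHP >= 7.4, where setCsvControl() accepts '' as the
// escape argument. readWithoutEscape() is a hypothetical helper name.
function readWithoutEscape(string $csv, string $delimiter = ',', string $enclosure = '"'): array
{
    // In-memory file so the example does not need a path on disk.
    $file = new \SplTempFileObject();
    $file->fwrite($csv);
    $file->rewind();

    // Same flags and control characters as the PHP 7.4+ snippet in parse()'s docblock.
    $file->setFlags(\SplFileObject::READ_CSV | \SplFileObject::READ_AHEAD | \SplFileObject::SKIP_EMPTY);
    $file->setCsvControl($delimiter, $enclosure, ''); // '' disables the escape character

    $records = [];
    foreach ($file as $record) {
        $records[] = $record;
    }

    return $records;
}

// Record "1" ends a field with a backslash just before the closing enclosure;
// record "2" uses doubled enclosures, which RFC 4180 treats as a literal quote.
$records = readWithoutEscape("id,path\n1,\"C:\\dir\\\"\n2,\"say \"\"hi\"\"\"");
```

With the escape disabled, the backslash stays part of the field content and the enclosure that follows it closes the field, which is the behavior EmptyEscapeParser reproduces on older PHP versions.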
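The "variable does not exist" findings in the review concern array destructuring such as `[$content, $remainder] = $result + [1 => false];`, which does assign both variables: `explode()` with a limit of 2 returns one or two elements, and the array union fills index 1 with `false` when the separator is absent. The hypothetical helpers below isolate that idiom, along with the `$trim_mask` expression from parse(), so both can be checked on their own.

```php
<?php
declare(strict_types=1);

// Hypothetical helper mirroring the idiom in extractFieldContent():
// index 1 is only added by the union when explode() found no delimiter.
function splitOnce(string $delimiter, string $line): array
{
    [$head, $tail] = explode($delimiter, $line, 2) + [1 => false];

    return [$head, $tail];
}

// Same expression as the $trim_mask assignment in parse(): the delimiter and
// the enclosure are removed from the mask so they are never trimmed away.
function trimMask(string $delimiter, string $enclosure): string
{
    return str_replace([$delimiter, $enclosure], '', " \t\0\x0B");
}

$withDelimiter = splitOnce(',', "a,b,c\n");   // ['a', "b,c\n"]
$withoutDelimiter = splitOnce(',', "last\n"); // ["last\n", false]
$tabMask = trimMask("\t", '"');               // tab is not trimmed for TSV input
```

A `false` remainder is what lets extractFieldContent() recognise the last field of a line and strip its trailing line break.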