Complex classes like Taster often do many different things. To break such a class down, first identify a cohesive component within it. A common approach is to look for fields and methods that share the same prefix or suffix. You can also inspect the cohesion graph to spot unconnected or weakly connected components.
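As a rough illustration of the prefix approach, the sketch below buckets member names by their leading word. The helper `groupByPrefix` is hypothetical and not part of CSVelte; the sample names are taken from the Taster listing further down.

```php
<?php
// Hypothetical helper (not part of CSVelte): bucket member names by leading word.
function groupByPrefix(array $names): array
{
    $groups = [];
    foreach ($names as $name) {
        // Split CONSTANT_CASE on "_" and camelCase before each capitalised word.
        $words = preg_split('/_|(?=[A-Z][a-z])/', $name, -1, PREG_SPLIT_NO_EMPTY);
        $groups[strtolower($words[0])][] = $name;
    }
    // A prefix is only interesting when at least two members share it.
    return array_filter($groups, function (array $members) {
        return count($members) > 1;
    });
}

$candidates = groupByPrefix([
    'EOL_UNIX', 'EOL_TRS80', 'EOL_WINDOWS',
    'TYPE_DATE', 'TYPE_TIME',
    'lickLineEndings', 'lickDelimiter', 'lickHeader',
    'isQuoted',
]);
// $candidates now has one entry per shared prefix: "eol", "type" and "lick".
```

A shared prefix is only a hint: the members of a group still need to be checked for whether they actually use the same fields before they are moved together.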
Once you have determined which fields and methods belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often faster.
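A minimal sketch of Extract Class, assuming the line-ending detection is the component being pulled out. The class name `LineEndingDetector` and its `detect` method are hypothetical. The counting logic follows `lickLineEndings()` in the listing below, but it skips that method's `removeQuotedStrings()` step to stay self-contained.

```php
<?php
// Hypothetical extracted class; the counting logic follows Taster::lickLineEndings().
final class LineEndingDetector
{
    /** Candidate terminators; "\r\n" comes first so it wins ties against "\n" and "\r". */
    private static $candidates = ["\r\n", "\n", "\r"];

    public function detect(string $sample): string
    {
        $best = PHP_EOL; // same fallback as the original method
        $bestCount = 0;
        foreach (self::$candidates as $eol) {
            $count = substr_count($sample, $eol);
            if ($count > $bestCount) {
                $bestCount = $count;
                $best = $eol;
            }
        }
        return $best;
    }
}

$detector = new LineEndingDetector();
$eol = $detector->detect("a,b\r\nc,d\r\n"); // Windows-style sample
```

Taster could then delegate to an instance of this class instead of carrying the EOL constants and the detection loop itself.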
While breaking up the class, it is a good idea to analyze how other classes use Taster and, based on these observations, apply Extract Interface too.
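One possible shape for Extract Interface, assuming callers only need the two public methods visible in the listing (`lick()` and `lickHeader()`). The names `FlavorTaster`, `FixedTaster` and `describe` are hypothetical, and a plain array stands in for the real `\CSVelte\Flavor` object.

```php
<?php
// Hypothetical interface name; the two methods are the public ones in the listing.
interface FlavorTaster
{
    /** @return mixed Flavor metadata (a \CSVelte\Flavor in the real class) */
    public function lick();

    /** @return bool Whether the data most likely has a header row */
    public function lickHeader($delim, $eol);
}

// Stand-in implementation for tests or callers that need canned answers.
final class FixedTaster implements FlavorTaster
{
    public function lick()
    {
        return ['delimiter' => ',', 'lineTerminator' => "\n"];
    }

    public function lickHeader($delim, $eol)
    {
        return true;
    }
}

// Client code depends on the interface, not on the concrete Taster.
function describe(FlavorTaster $taster): string
{
    $flavor = $taster->lick();
    $hasHeader = $taster->lickHeader($flavor['delimiter'], $flavor['lineTerminator']);
    return sprintf('delimiter=%s header=%s', $flavor['delimiter'], $hasHeader ? 'yes' : 'no');
}
```

With the interface in place, the protected helper methods can be moved into other classes without touching any caller.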
1 | <?php |
||
41 | class Taster |
||
42 | { |
||
43 | /** |
||
44 | * End-of-line constants |
||
45 | */ |
||
46 | const EOL_UNIX = 'lf'; |
||
47 | const EOL_TRS80 = 'cr'; |
||
48 | const EOL_WINDOWS = 'crlf'; |
||
49 | |||
50 | /** |
||
51 | * ASCII character codes for "invisibles" |
||
52 | */ |
||
53 | const HORIZONTAL_TAB = 9; |
||
54 | const LINE_FEED = 10; |
||
55 | const CARRIAGE_RETURN = 13; |
||
56 | const SPACE = 32; |
||
57 | |||
58 | /** |
||
59 | * Data types -- Used within the lickQuotingStyle method |
||
60 | */ |
||
61 | const DATA_NONNUMERIC = 'nonnumeric'; |
||
62 | const DATA_SPECIAL = 'special'; |
||
63 | const DATA_UNKNOWN = 'unknown'; |
||
64 | |||
65 | /** |
||
66 | * Placeholder strings -- hold the place of newlines and delimiters contained |
||
67 | * within quoted text so that the explode method doesn't split incorrectly |
||
68 | */ |
||
69 | const PLACEHOLDER_NEWLINE = '[__NEWLINE__]'; |
||
70 | const PLACEHOLDER_DELIM = '[__DELIM__]'; |
||
71 | |||
72 | /** |
||
73 | * Recommended data sample size |
||
74 | */ |
||
75 | const SAMPLE_SIZE = 2500; |
||
76 | |||
77 | /** |
||
78 | * Column data types -- used within the lickHeader method to determine |
||
79 | * whether the first row contains different types of data than the rest of |
||
80 | * the rows (and thus, is likely a header row) |
||
81 | */ |
||
82 | // +-987 |
||
83 | const TYPE_NUMBER = 'number'; |
||
84 | // +-12.387 |
||
85 | const TYPE_DOUBLE = 'double'; |
||
86 | // I am a string. I can contain all kinds of stuff. |
||
87 | const TYPE_STRING = 'string'; |
||
88 | // 10-Jul-15, 9/1/2007, April 1st, 2006, etc. |
||
89 | const TYPE_DATE = 'date'; |
||
90 | // 10:00pm, 5pm, 13:08, etc. |
||
91 | const TYPE_TIME = 'time'; |
||
92 | // $98.96, ¥12389, £6.08, €87.00 |
||
93 | const TYPE_CURRENCY = 'currency'; |
||
94 | // 12ab44m1n2_asdf |
||
95 | const TYPE_ALNUM = 'alnum'; |
||
96 | // abababab |
||
97 | const TYPE_ALPHA = 'alpha'; |
||
98 | |||
99 | /** |
||
100 | * @var \CSVelte\Contract\Readable The source of data to examine |
||
101 | * @access protected |
||
102 | */ |
||
103 | protected $input; |
||
104 | |||
105 | /** |
||
106 | * Sample of CSV data to use for tasting (determining CSV flavor) |
||
107 | * @var string |
||
108 | */ |
||
109 | protected $sample; |
||
110 | |||
111 | /** |
||
112 | * Class constructor--accepts a CSV input source |
||
113 | * |
||
114 | * @param \CSVelte\Contract\Readable The source of CSV data |
||
115 | * @todo It may be a good idea to skip the first line or two for the sample |
||
116 | * so that the header line(s) don't throw things off (with the exception |
||
117 | * of lickHeader() obviously) |
||
118 | */ |
||
119 | 22 | public function __construct(Readable $input) |
|
120 | { |
||
121 | 22 | $this->input = $input; |
|
122 | 22 | if (!$this->sample = $input->read(self::SAMPLE_SIZE)) { |
|
123 | 1 | throw new TasterException("Invalid input, cannot read sample.", TasterException::ERR_INVALID_SAMPLE); |
|
124 | } |
||
125 | 21 | } |
|
126 | |||
127 | /** |
||
128 | * Examine the input source and determine what "Flavor" of CSV it contains. |
||
129 | * The CSV format, while having an RFC (https://tools.ietf.org/html/rfc4180), |
||
130 | * doesn't necessarily always conform to it. And it doesn't provide metadata |
||
131 | * such as the delimiting character, quote character, or what types of data |
||
132 | * are quoted. |
||
133 | * |
||
134 | * @return \CSVelte\Flavor The metadata that the CSV format doesn't provide |
||
135 | * @access public |
||
136 | * @todo Implement a lickQuote method for when lickQuoteAndDelim method fails |
||
137 | * @todo Should there be a lickEscapeChar method? The Python module that inspired |
||
138 | * this library doesn't include one... |
||
139 | * @todo This should cache the results and only regenerate if $this->sample |
||
140 | * changes (or $this->input) |
||
141 | */ |
||
142 | 13 | public function lick() |
|
143 | { |
||
144 | 13 | $lineTerminator = $this->lickLineEndings(); |
|
145 | try { |
||
146 | 13 | list($quoteChar, $delimiter) = $this->lickQuoteAndDelim(); |
|
147 | 13 | } catch (TasterException $e) { |
|
148 | 5 | if ($e->getCode() !== TasterException::ERR_QUOTE_AND_DELIM) throw $e; |
|
149 | 5 | $quoteChar = '"'; |
|
150 | 5 | $delimiter = $this->lickDelimiter($lineTerminator); |
|
151 | } |
||
152 | /** |
||
153 | * @todo Should this be null? Because doubleQuote = true means this = null |
||
154 | */ |
||
155 | 13 | $escapeChar = '\\'; |
|
156 | 13 | $quoteStyle = $this->lickQuotingStyle($delimiter, $lineTerminator); |
|
157 | 13 | $header = $this->lickHeader($delimiter, $lineTerminator); |
|
158 | 13 | return new Flavor(compact('quoteChar', 'escapeChar', 'delimiter', 'lineTerminator', 'quoteStyle', 'header')); |
|
159 | } |
||
160 | |||
161 | /** |
||
162 | * Replaces all quoted columns with a blank string. I was using this method |
||
163 | * to prevent explode() from incorrectly splitting at delimiters and newlines |
||
164 | * within quotes when parsing a file. But this was before I wrote the |
||
165 | * replaceQuotedSpecialChars method which (at least to me) makes more sense. |
||
166 | * |
||
167 | * @param string The string to replace quoted strings within |
||
168 | * @return string The input string with quoted strings removed |
||
169 | * @access protected |
||
170 | * @todo Replace code that uses this method with the replaceQuotedSpecialChars |
||
171 | * method instead. I think it's cleaner. |
||
172 | */ |
||
173 | 13 | protected function removeQuotedStrings($data) |
|
174 | { |
||
175 | 13 | return preg_replace($pattern = '/(["\'])(?:(?=(\\\\?))\2.)*?\1/sm', $replace = '', $data); |
|
176 | } |
||
177 | |||
178 | /** |
||
179 | * Examine the input source to determine which character(s) are being used |
||
180 | * as the end-of-line character |
||
181 | * |
||
182 | * @return string The end-of-line char for the input data |
||
183 | * @access protected |
||
184 | * @credit pulled from stackoverflow thread *tips hat to username "Harm"* |
||
185 | * @todo This should throw an exception if it cannot determine the line ending |
||
186 | * @todo I probably will make this method protected when I'm done with testing... |
||
187 | * @todo If there is any way for this method to fail (for instance if a file |
||
188 | * is totally empty or contains no line breaks), then it needs to throw |
||
189 | * a relevant TasterException |
||
190 | * @todo Use replaceQuotedSpecialChars rather than removeQuotedStrings() |
||
191 | */ |
||
192 | 13 | protected function lickLineEndings() |
|
193 | { |
||
194 | 13 | $str = $this->removeQuotedStrings($this->sample); |
|
195 | $eols = [ |
||
196 | 13 | self::EOL_WINDOWS => "\r\n", // 0x0D - 0x0A - Windows, DOS OS/2 |
|
197 | 13 | self::EOL_UNIX => "\n", // 0x0A - - Unix, OSX |
|
198 | 13 | self::EOL_TRS80 => "\r", // 0x0D - - Apple ][, TRS80 |
|
199 | 13 | ]; |
|
200 | |||
201 | 13 | $curCount = 0; |
|
202 | // @todo This should return a default maybe? |
||
203 | 13 | $curEol = PHP_EOL; |
|
204 | 13 | foreach($eols as $k => $eol) { |
|
205 | 13 | if( ($count = substr_count($str, $eol)) > $curCount) { |
|
206 | 13 | $curCount = $count; |
|
207 | 13 | $curEol = $eol; |
|
208 | 13 | } |
|
209 | 13 | } |
|
210 | 13 | return $curEol; |
|
211 | } |
||
212 | |||
213 | /** |
||
214 | * The best way to determine quote and delimiter characters is when columns |
||
215 | * are quoted, often you can seek out a pattern of delim, quote, stuff, quote, delim |
||
216 | * but this only works if you have quoted columns. If you don't you have to |
||
217 | * determine these characters some other way... (see lickDelimiter) |
||
218 | * |
||
219 | * @return array A two-element array containing quotechar, delimchar |
||
220 | * @access protected |
||
221 | * @todo make protected |
||
222 | * @todo This should throw an exception if it cannot determine the delimiter |
||
223 | * this way. |
||
224 | * @todo This should check for any line endings not just \n |
||
225 | */ |
||
226 | 13 | protected function lickQuoteAndDelim() |
|
227 | { |
||
228 | /** |
||
229 | * @var array An array of pattern matches |
||
230 | */ |
||
231 | 13 | $matches = null; |
|
232 | /** |
||
233 | * @var array An array of patterns (regex) |
||
234 | */ |
||
235 | 13 | $patterns = []; |
|
236 | // delim can be anything but line breaks, quotes, alphanumeric, underscore, backslash, or any type of spaces |
||
237 | 13 | $antidelims = implode(array("\r", "\n", "\w", preg_quote('"', '/'), preg_quote("'", '/')/*, preg_quote('\\', '/')*/, preg_quote(chr(self::SPACE), '/'))); |
|
238 | 13 | $delim = '(?P<delim>[^' . $antidelims . '])'; |
|
239 | 13 | $quote = '(?P<quoteChar>"|\'|`)'; // @todo I think MS Excel uses some strange encoding for fancy open/close quotes |
|
240 | 13 | $patterns[] = '/' . $delim . ' ?' . $quote . '.*?\2\1/ms'; // ,"something", - anything but whitespace or quotes followed by a possible space followed by a quote followed by anything followed by same quote, followed by same anything but whitespace |
|
241 | 13 | $patterns[] = '/(?:^|\n)' . $quote . '.*?\1' . $delim . ' ?/ms'; // 'something', - beginning of line or line break, followed by quote followed by anything followed by quote followed by anything but whitespace or quotes |
|
242 | 13 | $patterns[] = '/' . $delim . ' ?' . $quote . '.*?\2(?:^|\n)/ms'; // ,'something' - anything but whitespace or quote followed by possible space followed by quote followed by anything followed by quote, followed by end of line |
|
243 | 13 | $patterns[] = '/(?:^|\n)' . $quote . '.*?\2(?:$|\n)/ms'; // 'something' - beginning of line followed by quote followed by anything followed by quote followed by same quote followed by end of line |
|
244 | 13 | foreach ($patterns as $pattern) { |
|
245 | // @todo I had to add the error suppression char here because it was |
||
246 | // causing undefined offset errors with certain data sets. strange... |
||
247 | 13 | if (@preg_match_all($pattern, $this->sample, $matches) && $matches) break; |
|
248 | 13 | } |
|
249 | 13 | if ($matches) { |
|
250 | 13 | $quotes = array_count_values($matches['quoteChar']); |
|
251 | 13 | arsort($quotes); |
|
252 | 13 | $quotes = array_flip($quotes); |
|
253 | 13 | if ($theQuote = array_shift($quotes)) { |
|
254 | 10 | $delims = array_count_values($matches['delim']); |
|
255 | 10 | arsort($delims); |
|
256 | 10 | $delims = array_flip($delims); |
|
257 | 10 | $theDelim = array_shift($delims); |
|
258 | 10 | return array($theQuote, $theDelim); |
|
259 | } |
||
260 | 5 | } |
|
261 | 5 | throw new TasterException("quoteChar and delimiter cannot be determined", TasterException::ERR_QUOTE_AND_DELIM); |
|
262 | } |
||
263 | |||
264 | /** |
||
265 | * Take a list of likely delimiter characters and find the one that occurs |
||
266 | * the most consistent number of times within the provided data. |
||
267 | * |
||
268 | * @param string The character(s) used for newlines |
||
269 | * @return string The delimiter character that occurs most consistently |
||
270 | * @see \CSVelte\Flavor for possible quote style constants |
||
271 | * @access protected |
||
272 | * @todo Refactor this method--It needs more thorough testing against a wider |
||
273 | * variety of CSV data to be sure it works reliably. And I'm sure there |
||
274 | * are many performance and logic improvements that could be made. This |
||
275 | * is essentially a first draft. |
||
276 | * @todo Use replaceQuotedSpecialChars rather than removeQuotedStrings |
||
277 | */ |
||
278 | 5 | protected function lickDelimiter($eol = "\n") |
|
279 | { |
||
280 | 5 | $delimiters = array(",", "\t", "|", ":", ";", "/", '\\'); |
|
281 | 5 | $lines = explode($eol, $this->removeQuotedStrings($this->sample)); |
|
282 | 5 | $start = 0; |
|
283 | 5 | $charFrequency = array(); |
|
284 | 5 | while ($start < count($lines)) { |
|
285 | 5 | foreach ($lines as $key => $line) { |
|
286 | 5 | if (!trim($line)) continue; |
|
287 | 5 | foreach ($delimiters as $char) { |
|
288 | 5 | $freq = substr_count($line, $char); |
|
289 | 5 | $charFrequency[$char][$key] = $freq; |
|
290 | 5 | } |
|
291 | 5 | } |
|
292 | 5 | $start++; |
|
293 | 5 | } |
|
294 | 5 | $averages = Utils::array_average($charFrequency); |
|
295 | 5 | $modes = Utils::array_mode($charFrequency); |
|
296 | 5 | $consistencies = array(); |
|
297 | 5 | foreach ($averages as $achar => $avg) { |
|
298 | 5 | foreach ($modes as $mchar => $mode) { |
|
299 | 5 | if ($achar == $mchar) { |
|
300 | 5 | if ($mode) { |
|
301 | 5 | $consistencies[$achar] = $avg / $mode; |
|
302 | 5 | } else { |
|
303 | 5 | $consistencies[$achar] = 0; |
|
304 | } |
||
305 | 5 | break; |
|
306 | } |
||
307 | 5 | } |
|
308 | 5 | } |
|
309 | 5 | if (empty($consistencies)) { |
|
310 | throw new TasterException('Cannot determine delimiter character', TasterException::ERR_DELIMITER); |
||
311 | } |
||
312 | 5 | arsort($consistencies); |
|
313 | 5 | return key($consistencies); |
|
314 | } |
||
315 | |||
316 | /** |
||
317 | * Determine the "style" of data quoting. The CSV format, while having an RFC |
||
318 | * (https://tools.ietf.org/html/rfc4180), doesn't necessarily always conform |
||
319 | * to it. And it doesn't provide metadata such as the delimiting character, |
||
320 | * quote character, or what types of data are quoted. So this method makes a |
||
321 | * logical guess by finding which columns have been quoted (if any) and |
||
322 | * examining their data type. Most often, CSV files will only use quotes |
||
323 | * around columns that contain special characters such as the delimiter, |
||
324 | * the quoting character, newlines, etc. (we refer to this style as |
||
325 | * QUOTE_MINIMAL), but some quote all columns that contain nonnumeric data |
||
326 | * (QUOTE_NONNUMERIC). Then there are CSV files that quote all columns |
||
327 | * (QUOTE_ALL) and those that quote none (QUOTE_NONE). |
||
328 | * |
||
329 | * @param string $delim The character used as the column delimiter |
||
330 | * @param string $eol The character used for newlines |
||
331 | * @return string One of the four Flavor::QUOTE_* constants--see this |
||
332 | * method's description for more info. |
||
333 | * @access protected |
||
334 | * @todo Refactor this method--It needs more thorough testing against a wider |
||
335 | * variety of CSV data to be sure it works reliably. And I'm sure there |
||
336 | * are many performance and logic improvements that could be made. This |
||
337 | * is essentially a first draft. |
||
338 | */ |
||
339 | 13 | protected function lickQuotingStyle($delim, $eol) |
|
340 | { |
||
341 | 13 | $data = $this->replaceQuotedSpecialChars($this->sample, $delim); |
|
342 | |||
343 | $quoting_styles = array( |
||
344 | 13 | Flavor::QUOTE_ALL => 0, |
|
345 | 13 | Flavor::QUOTE_NONE => 0, |
|
346 | 13 | Flavor::QUOTE_MINIMAL => 0, |
|
347 | 13 | Flavor::QUOTE_NONNUMERIC => 0, |
|
348 | 13 | ); |
|
349 | |||
350 | 13 | $lines = explode($eol, $data); |
|
351 | $freq = array( |
||
352 | 13 | 'quoted' => array(), |
|
353 | 13 | 'unquoted' => array() |
|
354 | 13 | ); |
|
355 | |||
356 | 13 | foreach ($lines as $key => $line) { |
|
357 | // now we can sub back in the correct newlines |
||
358 | 13 | $line = str_replace(self::PLACEHOLDER_NEWLINE, $eol, $line); |
|
359 | 13 | $cols = explode($delim, $line); |
|
360 | 13 | foreach ($cols as $colkey => $col) { |
|
361 | // now we can sub back in the correct delim characters |
||
362 | 13 | $col = str_replace(self::PLACEHOLDER_DELIM, $delim, $col); |
|
363 | 13 | if ($this->isQuoted($col)) { |
|
364 | 10 | $col = $this->unQuote($col); |
|
365 | 10 | $type = $this->lickDataType($col); |
|
366 | // we can remove this guy altogether since at least one column is quoted |
||
367 | 10 | unset($quoting_styles[Flavor::QUOTE_NONE]); |
|
368 | 10 | $freq['quoted'][] = $type; |
|
369 | 10 | } else { |
|
370 | 13 | $type = $this->lickDataType($col); |
|
371 | // we can remove this guy altogether since at least one column is unquoted |
||
372 | 13 | unset($quoting_styles[Flavor::QUOTE_ALL]); |
|
373 | 13 | $freq['unquoted'][] = $type; |
|
374 | } |
||
375 | 13 | } |
|
376 | 13 | } |
|
377 | 13 | $types = array_unique($freq['quoted']); |
|
378 | // if quoting_styles still has QUOTE_ALL or QUOTE_NONE, then that's the one to return |
||
379 | 13 | if (array_key_exists(Flavor::QUOTE_ALL, $quoting_styles)) return Flavor::QUOTE_ALL; |
|
380 | 13 | if (array_key_exists(Flavor::QUOTE_NONE, $quoting_styles)) return Flavor::QUOTE_NONE; |
|
381 | 10 | if (count($types) == 1) { |
|
382 | 10 | if (current($types) == self::DATA_SPECIAL) return Flavor::QUOTE_MINIMAL; |
|
383 | elseif (current($types) == self::DATA_NONNUMERIC) return Flavor::QUOTE_NONNUMERIC; |
||
384 | } else { |
||
385 | 1 | if (array_key_exists(self::DATA_NONNUMERIC, array_flip($types))) { |
|
386 | // allow for a SMALL amount of error here |
||
387 | 1 | $counts = array(self::DATA_SPECIAL => 0, self::DATA_NONNUMERIC => 0); |
|
388 | array_walk($freq['quoted'], function ($val) use (&$counts) { |
||
389 | 1 | $counts[$val]++; |
|
390 | 1 | }); |
|
391 | 1 | arsort($counts); |
|
392 | 1 | $most = current($counts); |
|
393 | 1 | $least = end($counts); |
|
394 | 1 | $err_margin = $least / $most; |
|
395 | 1 | if ($err_margin < 1) return Flavor::QUOTE_NONNUMERIC; |
|
396 | } |
||
397 | } |
||
398 | return Flavor::QUOTE_MINIMAL; |
||
399 | } |
||
400 | |||
401 | /** |
||
402 | * Remove quotes around a piece of text (if there are any) |
||
403 | * |
||
404 | * @param string The data to "unquote" |
||
405 | * @return string The data passed in, only with quotes stripped (off the edges) |
||
406 | * @access protected |
||
407 | */ |
||
408 | 17 | protected function unQuote($data) |
|
412 | |||
413 | /** |
||
414 | * Determine whether a particular string of data has quotes around it. |
||
415 | * |
||
416 | * @param string The data to check |
||
417 | * @return boolean Whether the data is quoted or not |
||
418 | * @access protected |
||
419 | */ |
||
420 | 13 | protected function isQuoted($data) |
|
424 | |||
425 | /** |
||
426 | * Determine what type of data is contained within a variable |
||
427 | * Possible types: |
||
428 | * - nonnumeric - contains characters other than numbers |
||
429 | * - special - contains characters that could potentially need to be quoted (possible delimiter characters) |
||
430 | * - unknown - everything else |
||
431 | * This method is really only used within the "lickQuotingStyle" method to |
||
432 | * help determine whether a particular column has been quoted due to it being |
||
433 | * nonnumeric or because it has some special character in it such as a delimiter |
||
434 | * or newline or quote. |
||
435 | * |
||
436 | * @param string The data to determine the type of |
||
437 | * @return string The type of data (one of the "DATA_" constants above) |
||
438 | * @access protected |
||
439 | * @todo I could probably eliminate this method and use an anonymous function |
||
440 | * instead. It isn't used anywhere else and its name could be misleading. |
||
441 | * Especially since I also have a lickType method that is used within the |
||
442 | * lickHeader method. |
||
443 | */ |
||
444 | 13 | protected function lickDataType($data) |
|
455 | |||
456 | /** |
||
457 | * Replace all instances of newlines and whatever character you specify (as |
||
458 | * the delimiter) that are contained within quoted text. The replacements are |
||
459 | * simply a special placeholder string. This is done so that I can use the |
||
460 | * very unsmart "explode" function and not have to worry about it exploding |
||
461 | * on delimiters or newlines within quotes. Once I have exploded, I typically |
||
462 | * sub back in the real characters before doing anything else. Although |
||
463 | * currently there is no dedicated method for doing so I just use str_replace |
||
464 | * |
||
465 | * @param string The string to do the replacements on |
||
466 | * @param string The delimiter character to replace |
||
467 | * @return string The data with replacements performed |
||
468 | * @access protected |
||
469 | * @todo I could probably pass in (maybe optionally) the newline character I |
||
470 | * want to replace as well. I'll do that if I need to. |
||
471 | */ |
||
472 | protected function replaceQuotedSpecialChars($data, $delim) |
||
480 | |||
481 | /** |
||
482 | * Determine the "type" of a particular string of data. Used for the lickHeader |
||
483 | * method to assign a type to each column to try to determine whether the |
||
484 | * first row differs from an otherwise consistent column type. |
||
485 | * |
||
486 | * @todo As I'm writing this method I'm beginning to realize how expensive |
||
487 | * the lickHeader method is going to end up being since it has to apply all |
||
488 | * these regexes (potentially) to every column. I may end up writing a much |
||
489 | * simpler type-checking method than this if it proves to be too expensive |
||
490 | * to be practical. |
||
491 | * |
||
492 | * @param string The string of data to check the type of |
||
493 | * @return string One of the TYPE_ string constants above |
||
494 | * @access protected |
||
495 | * @uses \Carbon\Carbon date/time library/class |
||
496 | */ |
||
497 | 17 | protected function lickType($data) |
|
536 | |||
537 | /** |
||
538 | * Examines the contents of the CSV data to make a determination of whether |
||
539 | * or not it contains a header row. To make this determination, it creates |
||
540 | * an array of each column's (in each row)'s data type and length and then |
||
541 | * compares them. If all of the rows except the header look similar, it will |
||
542 | * return true. This is only a guess though. There is no programmatic way to |
||
543 | * determine 100% whether a CSV file has a header. The format does not |
||
544 | * provide metadata such as that. |
||
545 | * |
||
546 | * @param string $delim The CSV data's delimiting char (can be a variety of chars, but |
||
547 | * typically $delim is either a comma or a tab, sometimes a pipe) |
||
548 | * @param string $eol The CSV data's end-of-line char(s) (\n \r or \r\n) |
||
549 | * @return boolean True if the data (most likely) contains a header row |
||
550 | * @access public |
||
551 | * @todo This method needs a total refactor. It's not necessary to loop twice |
||
552 | * You could get away with one loop and that would allow for me to do |
||
553 | * something like only examining enough rows to get to a particular |
||
554 | * "hasHeader" score (+-100 for instance) & then just return true|false |
||
555 | * @todo Also, break out of the first loop after a certain (perhaps even a |
||
556 | * configurable) amount of lines (you only need to examine so much data |
||
557 | * to reliably make a determination and this is an expensive method) |
||
558 | * @todo Because the header isn't actually part of the "flavor", |
||
559 | * I could remove the need for quote, delim, and eol by "licking" the |
||
560 | * data sample provided in the first argument. Also, I could actually |
||
561 | * create a Reader object to read the data here. |
||
562 | */ |
||
563 | 18 | public function lickHeader($delim, $eol) |
|
600 | } |
||
601 |
Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.
The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.
This check looks for comments that seem to be mostly valid code and reports them.
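To give a feel for what such a check might do, here is a deliberately naive approximation. It is not the analyzer's actual implementation. The function `findCommentedOutCode` and its heuristic are assumptions made for illustration: a comment counts as code when it ends like a statement or contains something shaped like a function call.

```php
<?php
// Naive approximation of such a check; NOT the analyzer's actual implementation.
function findCommentedOutCode(string $source): array
{
    $hits = [];
    foreach (token_get_all($source) as $token) {
        // Only plain comments; docblocks (T_DOC_COMMENT) are skipped.
        if (!is_array($token) || $token[0] !== T_COMMENT) {
            continue;
        }
        // Strip the comment markers (//, #, /* and */).
        $text = trim(preg_replace('~^(?://|#|/\*)|\*/$~', '', trim($token[1])));
        // Heuristic: ends like a statement, or contains a call such as foo(...).
        if (preg_match('/[;{}]$|\w+\(.*\)/', $text)) {
            $hits[] = ['line' => $token[2], 'text' => $text];
        }
    }
    return $hits;
}

$source = <<<'PHP'
<?php
// 10:00pm, 5pm, 13:08, etc.
$parts = array("\r", "\n"/*, preg_quote('\\', '/')*/);
// $debug = true;
PHP;

// Each hit carries the line number and the comment text without its markers.
$hits = findCommentedOutCode($source);
```

A heuristic this simple will misfire on prose comments that happen to end in a semicolon or contain parenthesized calls, so flagged comments still need a human decision about whether to delete or restore them.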