Soundex::daitchMokotoffWord()   F

Complexity

Conditions 24
Paths 1344

Size

Total Lines 111
Code Lines 62

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric  Value
cc      24
eloc    62
nc      1344
nop     1
dl      0
loc     111
rs      0
c       0
b       0
f       0

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
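As a minimal sketch of that advice, the snippet below takes a commented step ("Only return codes from recognisable sounds") and turns the comment into the name of a small helper. The function names `isRecognisableSound` and `russellCodes` are hypothetical and are not part of the class under review; only PHP's built-in `soundex()`, `preg_split()`, `array_map()` and `array_filter()` are assumed.

```php
<?php
// Hypothetical helper: its name is derived from the original comment
// "Only return codes from recognisable sounds".
function isRecognisableSound(string $soundex): bool
{
    return $soundex !== '' && $soundex !== '0000';
}

// Hypothetical, simplified per-word step: split the text on whitespace,
// encode each word, and keep only the recognisable codes.
function russellCodes(string $text): array
{
    $words = preg_split('/\s/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $codes = array_map('soundex', $words);

    return array_values(array_filter($codes, 'isRecognisableSound'));
}
```

With the filter extracted, the calling method no longer needs the inline comment, because the helper's name carries the same information.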

Commonly applied refactorings include:

<?php
/**
 * webtrees: online genealogy
 * Copyright (C) 2019 webtrees development team
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 * You should have received a copy of the GNU General Public License
 * along with this program. If not, see <http://www.gnu.org/licenses/>.
 */
namespace Fisharebest\Webtrees;

/**
 * Phonetic matching of strings.
 */
class Soundex
{
    /**
     * Which algorithms are supported.
     *
     * @return string[]
     */
    public static function getAlgorithms()
    {
        return array(
            'std' => /* I18N: http://en.wikipedia.org/wiki/Soundex */ I18N::translate('Russell'),
            'dm'  => /* I18N: http://en.wikipedia.org/wiki/Daitch–Mokotoff_Soundex */ I18N::translate('Daitch-Mokotoff'),
        );
    }

    /**
     * Is there a match between two soundex codes?
     *
     * @param string $soundex1
     * @param string $soundex2
     *
     * @return bool
     */
    public static function compare($soundex1, $soundex2)
    {
        if ($soundex1 && $soundex2) {
            foreach (explode(':', $soundex1) as $code) {
                if (strpos($soundex2, $code) !== false) {
                    return true;
                }
            }
        }

        return false;
    }

    /**
     * Generate Russell soundex codes for a given text.
     *
     * @param string $text
     *
     * @return string
     */
    public static function russell($text)
    {
        $words         = preg_split('/\s/', $text, -1, PREG_SPLIT_NO_EMPTY);
        $soundex_array = array();
        foreach ($words as $word) {
            $soundex = soundex($word);
            // Only return codes from recognisable sounds
            if ($soundex !== '0000') {
                $soundex_array[] = $soundex;
            }
        }
        // Combine words, e.g. “New York” as “Newyork”
        if (count($words) > 1) {
            // strtr($text, ' ', '') leaves $text unchanged when the replacement is empty, so join the words instead
            $soundex_array[] = soundex(implode('', $words));
        }
        // A varchar(255) column can only hold 51 4-character codes (plus 50 delimiters)
        $soundex_array = array_slice(array_unique($soundex_array), 0, 51);

        if ($soundex_array) {
Issue (Bug, Best Practice): The expression $soundex_array of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.

This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered to be equal (but not identical) to false, this is not always apparent.

Consider making the comparison explicit by using empty(...) or ! empty(...) instead.
            return implode(':', $soundex_array);
        } else {
            return '';
        }
    }

    /**
     * Generate Daitch–Mokotoff soundex codes for a given text.
     *
     * @param string $text
     *
     * @return string
     */
    public static function daitchMokotoff($text)
    {
        $words         = preg_split('/\s/', $text, -1, PREG_SPLIT_NO_EMPTY);
        $soundex_array = array();
        foreach ($words as $word) {
            $soundex_array = array_merge($soundex_array, self::daitchMokotoffWord($word));
        }
        // Combine words, e.g. “New York” as “Newyork”
        if (count($words) > 1) {
            // strtr($text, ' ', '') leaves $text unchanged when the replacement is empty, so join the words instead
            $soundex_array = array_merge($soundex_array, self::daitchMokotoffWord(implode('', $words)));
        }
        // A varchar(255) column can only hold 36 6-character codes (plus 35 delimiters)
        $soundex_array = array_slice(array_unique($soundex_array), 0, 36);

        if ($soundex_array) {
Issue (Bug, Best Practice): The expression $soundex_array of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.
            return implode(':', $soundex_array);
        } else {
            return '';
        }
    }

    // Determine the Daitch–Mokotoff Soundex code for a word
    // Original implementation by Gerry Kroll, and analysis by Meliza Amity

    // Max. table key length (in ASCII bytes -- NOT in UTF-8 characters!)
    const MAXCHAR = 7;

    /**
     * Name transformation arrays.
     * Used to transform the Name string to simplify the "sounds like" table.
     * This is especially useful in Hebrew.
     *
     * Each array entry defines the "from" and "to" arguments of a preg($from, $to, $text)
     * function call to achieve the desired transformations.
     *
     * Note about the use of "\x01":
     * This code, which can’t legitimately occur in the kind of text we're dealing with,
     * is used as a place-holder so that conditional string replacements can be done.
     *
     * @var string[][]
     */
    private static $transformNameTable = array(
        // Force Yiddish ligatures to be treated as separate letters
        array('װ', 'וו'),
        array('ײ', 'יי'),
        array('ױ', 'וי'),
        array('בו', 'בע'),
        array('פו', 'פע'),
        array('ומ', 'עמ'),
        array('ום', 'עם'),
        array('ונ', 'ענ'),
        array('ון', 'ען'),
        array('וו', 'ב'),
        array("\x01", ''),
        array('ייה$', "\x01ה"),
        array('ייע$', "\x01ע"),
        array('יי', 'ע'),
        array("\x01", 'יי'),
    );

    /**
     * The DM sound coding table is organized this way:
     * key: a variable-length string that corresponds to the UTF-8 character sequence
     * represented by the table entry. Currently, that string can be up to 7
     * bytes long. This maximum length is defined by the class constant
     * MAXCHAR.
     *
     * value: an array as follows:
     * [0]:  zero if not a vowel
     * [1]:  sound value when this string is at the beginning of the word
     * [2]:  sound value when this string is followed by a vowel
     * [3]:  sound value for other cases
     * [1],[2],[3] can be repeated several times to create branches in the code
     * an empty sound value means "ignore in this state"
     *
     * @var string[][]
     */
    private static $dmsounds = array(
        'A'       => array('1', '0', '', ''),
        'À'       => array('1', '0', '', ''),
        'Á'       => array('1', '0', '', ''),
        'Â'       => array('1', '0', '', ''),
        'Ã'       => array('1', '0', '', ''),
        'Ä'       => array('1', '0', '1', '', '0', '', ''),
        'Å'       => array('1', '0', '', ''),
        'Ă'       => array('1', '0', '', ''),
        'Ą'       => array('1', '', '', '', '', '', '6'),
        'Ạ'       => array('1', '0', '', ''),
        'Ả'       => array('1', '0', '', ''),
        'Ấ'       => array('1', '0', '', ''),
        'Ầ'       => array('1', '0', '', ''),
        'Ẩ'       => array('1', '0', '', ''),
        'Ẫ'       => array('1', '0', '', ''),
        'Ậ'       => array('1', '0', '', ''),
        'Ắ'       => array('1', '0', '', ''),
        'Ằ'       => array('1', '0', '', ''),
        'Ẳ'       => array('1', '0', '', ''),
        'Ẵ'       => array('1', '0', '', ''),
        'Ặ'       => array('1', '0', '', ''),
        'AE'      => array('1', '0', '1', ''),
        'Æ'       => array('1', '0', '1', ''),
        'AI'      => array('1', '0', '1', ''),
        'AJ'      => array('1', '0', '1', ''),
        'AU'      => array('1', '0', '7', ''),
        'AV'      => array('1', '0', '7', '', '7', '7', '7'),
        'ÄU'      => array('1', '0', '1', ''),
        'AY'      => array('1', '0', '1', ''),
        'B'       => array('0', '7', '7', '7'),
        'C'       => array('0', '5', '5', '5', '34', '4', '4'),
        'Ć'       => array('0', '4', '4', '4'),
        'Č'       => array('0', '4', '4', '4'),
        'Ç'       => array('0', '4', '4', '4'),
        'CH'      => array('0', '5', '5', '5', '34', '4', '4'),
        'CHS'     => array('0', '5', '54', '54'),
        'CK'      => array('0', '5', '5', '5', '45', '45', '45'),
        'CCS'     => array('0', '4', '4', '4'),
        'CS'      => array('0', '4', '4', '4'),
        'CSZ'     => array('0', '4', '4', '4'),
        'CZ'      => array('0', '4', '4', '4'),
        'CZS'     => array('0', '4', '4', '4'),
        'D'       => array('0', '3', '3', '3'),
        'Ď'       => array('0', '3', '3', '3'),
        'Đ'       => array('0', '3', '3', '3'),
        'DRS'     => array('0', '4', '4', '4'),
        'DRZ'     => array('0', '4', '4', '4'),
        'DS'      => array('0', '4', '4', '4'),
        'DSH'     => array('0', '4', '4', '4'),
        'DSZ'     => array('0', '4', '4', '4'),
        'DT'      => array('0', '3', '3', '3'),
        'DDZ'     => array('0', '4', '4', '4'),
        'DDZS'    => array('0', '4', '4', '4'),
        'DZ'      => array('0', '4', '4', '4'),
        'DŹ'      => array('0', '4', '4', '4'),
        'DŻ'      => array('0', '4', '4', '4'),
        'DZH'     => array('0', '4', '4', '4'),
        'DZS'     => array('0', '4', '4', '4'),
        'E'       => array('1', '0', '', ''),
        'È'       => array('1', '0', '', ''),
        'É'       => array('1', '0', '', ''),
        'Ê'       => array('1', '0', '', ''),
        'Ë'       => array('1', '0', '', ''),
        'Ĕ'       => array('1', '0', '', ''),
        'Ė'       => array('1', '0', '', ''),
        'Ę'       => array('1', '', '', '6', '', '', ''),
        'Ẹ'       => array('1', '0', '', ''),
        'Ẻ'       => array('1', '0', '', ''),
        'Ẽ'       => array('1', '0', '', ''),
        'Ế'       => array('1', '0', '', ''),
        'Ề'       => array('1', '0', '', ''),
        'Ể'       => array('1', '0', '', ''),
        'Ễ'       => array('1', '0', '', ''),
        'Ệ'       => array('1', '0', '', ''),
        'EAU'     => array('1', '0', '', ''),
        'EI'      => array('1', '0', '1', ''),
        'EJ'      => array('1', '0', '1', ''),
        'EU'      => array('1', '1', '1', ''),
        'EY'      => array('1', '0', '1', ''),
        'F'       => array('0', '7', '7', '7'),
        'FB'      => array('0', '7', '7', '7'),
        'G'       => array('0', '5', '5', '5', '34', '4', '4'),
        'Ğ'       => array('0', '', '', ''),
        'GGY'     => array('0', '5', '5', '5'),
        'GY'      => array('0', '5', '5', '5'),
        'H'       => array('0', '5', '5', '', '5', '5', '5'),
        'I'       => array('1', '0', '', ''),
        'Ì'       => array('1', '0', '', ''),
        'Í'       => array('1', '0', '', ''),
        'Î'       => array('1', '0', '', ''),
        'Ï'       => array('1', '0', '', ''),
        'Ĩ'       => array('1', '0', '', ''),
        'Į'       => array('1', '0', '', ''),
        'İ'       => array('1', '0', '', ''),
        'Ỉ'       => array('1', '0', '', ''),
        'Ị'       => array('1', '0', '', ''),
        'IA'      => array('1', '1', '', ''),
        'IE'      => array('1', '1', '', ''),
        'IO'      => array('1', '1', '', ''),
        'IU'      => array('1', '1', '', ''),
        'J'       => array('0', '1', '', '', '4', '4', '4', '5', '5', ''),
        'K'       => array('0', '5', '5', '5'),
        'KH'      => array('0', '5', '5', '5'),
        'KS'      => array('0', '5', '54', '54'),
        'L'       => array('0', '8', '8', '8'),
        'Ľ'       => array('0', '8', '8', '8'),
        'Ĺ'       => array('0', '8', '8', '8'),
        'Ł'       => array('0', '7', '7', '7', '8', '8', '8'),
        'LL'      => array('0', '8', '8', '8', '58', '8', '8', '1', '8', '8'),
        'LLY'     => array('0', '8', '8', '8', '1', '8', '8'),
        'LY'      => array('0', '8', '8', '8', '1', '8', '8'),
        'M'       => array('0', '6', '6', '6'),
        'MĔ'      => array('0', '66', '66', '66'),
        'MN'      => array('0', '66', '66', '66'),
        'N'       => array('0', '6', '6', '6'),
        'Ń'       => array('0', '6', '6', '6'),
        'Ň'       => array('0', '6', '6', '6'),
        'Ñ'       => array('0', '6', '6', '6'),
        'NM'      => array('0', '66', '66', '66'),
        'O'       => array('1', '0', '', ''),
        'Ò'       => array('1', '0', '', ''),
        'Ó'       => array('1', '0', '', ''),
        'Ô'       => array('1', '0', '', ''),
        'Õ'       => array('1', '0', '', ''),
        'Ö'       => array('1', '0', '', ''),
        'Ø'       => array('1', '0', '', ''),
        'Ő'       => array('1', '0', '', ''),
        'Œ'       => array('1', '0', '', ''),
        'Ơ'       => array('1', '0', '', ''),
        'Ọ'       => array('1', '0', '', ''),
        'Ỏ'       => array('1', '0', '', ''),
        'Ố'       => array('1', '0', '', ''),
        'Ồ'       => array('1', '0', '', ''),
        'Ổ'       => array('1', '0', '', ''),
        'Ỗ'       => array('1', '0', '', ''),
        'Ộ'       => array('1', '0', '', ''),
        'Ớ'       => array('1', '0', '', ''),
        'Ờ'       => array('1', '0', '', ''),
        'Ở'       => array('1', '0', '', ''),
        'Ỡ'       => array('1', '0', '', ''),
        'Ợ'       => array('1', '0', '', ''),
        'OE'      => array('1', '0', '', ''),
        'OI'      => array('1', '0', '1', ''),
        'OJ'      => array('1', '0', '1', ''),
        'OU'      => array('1', '0', '', ''),
        'OY'      => array('1', '0', '1', ''),
        'P'       => array('0', '7', '7', '7'),
        'PF'      => array('0', '7', '7', '7'),
        'PH'      => array('0', '7', '7', '7'),
        'Q'       => array('0', '5', '5', '5'),
        'R'       => array('0', '9', '9', '9'),
        'Ř'       => array('0', '4', '4', '4'),
        'RS'      => array('0', '4', '4', '4', '94', '94', '94'),
        'RZ'      => array('0', '4', '4', '4', '94', '94', '94'),
        'S'       => array('0', '4', '4', '4'),
        'Ś'       => array('0', '4', '4', '4'),
        'Š'       => array('0', '4', '4', '4'),
        'Ş'       => array('0', '4', '4', '4'),
        'SC'      => array('0', '2', '4', '4'),
        'ŠČ'      => array('0', '2', '4', '4'),
        'SCH'     => array('0', '4', '4', '4'),
        'SCHD'    => array('0', '2', '43', '43'),
        'SCHT'    => array('0', '2', '43', '43'),
        'SCHTCH'  => array('0', '2', '4', '4'),
        'SCHTSCH' => array('0', '2', '4', '4'),
        'SCHTSH'  => array('0', '2', '4', '4'),
        'SD'      => array('0', '2', '43', '43'),
        'SH'      => array('0', '4', '4', '4'),
        'SHCH'    => array('0', '2', '4', '4'),
        'SHD'     => array('0', '2', '43', '43'),
        'SHT'     => array('0', '2', '43', '43'),
        'SHTCH'   => array('0', '2', '4', '4'),
        'SHTSH'   => array('0', '2', '4', '4'),
        'ß'       => array('0', '', '4', '4'),
        'ST'      => array('0', '2', '43', '43'),
        'STCH'    => array('0', '2', '4', '4'),
        'STRS'    => array('0', '2', '4', '4'),
        'STRZ'    => array('0', '2', '4', '4'),
        'STSCH'   => array('0', '2', '4', '4'),
        'STSH'    => array('0', '2', '4', '4'),
        'SSZ'     => array('0', '4', '4', '4'),
        'SZ'      => array('0', '4', '4', '4'),
        'SZCS'    => array('0', '2', '4', '4'),
        'SZCZ'    => array('0', '2', '4', '4'),
        'SZD'     => array('0', '2', '43', '43'),
        'SZT'     => array('0', '2', '43', '43'),
        'T'       => array('0', '3', '3', '3'),
        'Ť'       => array('0', '3', '3', '3'),
        'Ţ'       => array('0', '3', '3', '3', '4', '4', '4'),
        'TC'      => array('0', '4', '4', '4'),
        'TCH'     => array('0', '4', '4', '4'),
        'TH'      => array('0', '3', '3', '3'),
        'TRS'     => array('0', '4', '4', '4'),
        'TRZ'     => array('0', '4', '4', '4'),
        'TS'      => array('0', '4', '4', '4'),
        'TSCH'    => array('0', '4', '4', '4'),
        'TSH'     => array('0', '4', '4', '4'),
        'TSZ'     => array('0', '4', '4', '4'),
        'TTCH'    => array('0', '4', '4', '4'),
        'TTS'     => array('0', '4', '4', '4'),
        'TTSCH'   => array('0', '4', '4', '4'),
        'TTSZ'    => array('0', '4', '4', '4'),
        'TTZ'     => array('0', '4', '4', '4'),
        'TZ'      => array('0', '4', '4', '4'),
        'TZS'     => array('0', '4', '4', '4'),
        'U'       => array('1', '0', '', ''),
        'Ù'       => array('1', '0', '', ''),
        'Ú'       => array('1', '0', '', ''),
        'Û'       => array('1', '0', '', ''),
        'Ü'       => array('1', '0', '', ''),
        'Ũ'       => array('1', '0', '', ''),
        'Ū'       => array('1', '0', '', ''),
        'Ů'       => array('1', '0', '', ''),
        'Ű'       => array('1', '0', '', ''),
        'Ų'       => array('1', '0', '', ''),
        'Ư'       => array('1', '0', '', ''),
        'Ụ'       => array('1', '0', '', ''),
        'Ủ'       => array('1', '0', '', ''),
        'Ứ'       => array('1', '0', '', ''),
        'Ừ'       => array('1', '0', '', ''),
        'Ử'       => array('1', '0', '', ''),
        'Ữ'       => array('1', '0', '', ''),
        'Ự'       => array('1', '0', '', ''),
        'UE'      => array('1', '0', '', ''),
        'UI'      => array('1', '0', '1', ''),
        'UJ'      => array('1', '0', '1', ''),
        'UY'      => array('1', '0', '1', ''),
        'UW'      => array('1', '0', '1', '', '0', '7', '7'),
        'V'       => array('0', '7', '7', '7'),
        'W'       => array('0', '7', '7', '7'),
        'X'       => array('0', '5', '54', '54'),
        'Y'       => array('1', '1', '', ''),
        'Ý'       => array('1', '1', '', ''),
        'Ỳ'       => array('1', '1', '', ''),
        'Ỵ'       => array('1', '1', '', ''),
        'Ỷ'       => array('1', '1', '', ''),
        'Ỹ'       => array('1', '1', '', ''),
        'Z'       => array('0', '4', '4', '4'),
        'Ź'       => array('0', '4', '4', '4'),
        'Ż'       => array('0', '4', '4', '4'),
        'Ž'       => array('0', '4', '4', '4'),
        'ZD'      => array('0', '2', '43', '43'),
        'ZDZ'     => array('0', '2', '4', '4'),
        'ZDZH'    => array('0', '2', '4', '4'),
        'ZH'      => array('0', '4', '4', '4'),
        'ZHD'     => array('0', '2', '43', '43'),
        'ZHDZH'   => array('0', '2', '4', '4'),
        'ZS'      => array('0', '4', '4', '4'),
        'ZSCH'    => array('0', '4', '4', '4'),
        'ZSH'     => array('0', '4', '4', '4'),
        'ZZS'     => array('0', '4', '4', '4'),
        // Cyrillic alphabet
        'А'   => array('1', '0', '', ''),
        'Б'   => array('0', '7', '7', '7'),
        'В'   => array('0', '7', '7', '7'),
        'Г'   => array('0', '5', '5', '5'),
        'Д'   => array('0', '3', '3', '3'),
        'ДЗ'  => array('0', '4', '4', '4'),
        'Е'   => array('1', '0', '', ''),
        'Ё'   => array('1', '0', '', ''),
        'Ж'   => array('0', '4', '4', '4'),
        'З'   => array('0', '4', '4', '4'),
        'И'   => array('1', '0', '', ''),
        'Й'   => array('1', '1', '', '', '4', '4', '4'),
        'К'   => array('0', '5', '5', '5'),
        'Л'   => array('0', '8', '8', '8'),
        'М'   => array('0', '6', '6', '6'),
        'Н'   => array('0', '6', '6', '6'),
        'О'   => array('1', '0', '', ''),
        'П'   => array('0', '7', '7', '7'),
        'Р'   => array('0', '9', '9', '9'),
        'РЖ'  => array('0', '4', '4', '4'),
        'С'   => array('0', '4', '4', '4'),
        'Т'   => array('0', '3', '3', '3'),
        'У'   => array('1', '0', '', ''),
        'Ф'   => array('0', '7', '7', '7'),
        'Х'   => array('0', '5', '5', '5'),
        'Ц'   => array('0', '4', '4', '4'),
        'Ч'   => array('0', '4', '4', '4'),
        'Ш'   => array('0', '4', '4', '4'),
        'Щ'   => array('0', '2', '4', '4'),
        'Ъ'   => array('0', '', '', ''),
        'Ы'   => array('0', '1', '', ''),
        'Ь'   => array('0', '', '', ''),
        'Э'   => array('1', '0', '', ''),
        'Ю'   => array('0', '1', '', ''),
        'Я'   => array('0', '1', '', ''),
        // Greek alphabet
        'Α'   => array('1', '0', '', ''),
        'Ά'   => array('1', '0', '', ''),
        'ΑΙ'  => array('1', '0', '1', ''),
        'ΑΥ'  => array('1', '0', '1', ''),
        'Β'   => array('0', '7', '7', '7'),
        'Γ'   => array('0', '5', '5', '5'),
        'Δ'   => array('0', '3', '3', '3'),
        'Ε'   => array('1', '0', '', ''),
        'Έ'   => array('1', '0', '', ''),
        'ΕΙ'  => array('1', '0', '1', ''),
        'ΕΥ'  => array('1', '1', '1', ''),
        'Ζ'   => array('0', '4', '4', '4'),
        'Η'   => array('1', '0', '', ''),
        'Ή'   => array('1', '0', '', ''),
        'Θ'   => array('0', '3', '3', '3'),
        'Ι'   => array('1', '0', '', ''),
        'Ί'   => array('1', '0', '', ''),
        'Ϊ'   => array('1', '0', '', ''),
        'ΐ'   => array('1', '0', '', ''),
        'Κ'   => array('0', '5', '5', '5'),
        'Λ'   => array('0', '8', '8', '8'),
        'Μ'   => array('0', '6', '6', '6'),
        'ΜΠ'  => array('0', '7', '7', '7'),
        'Ν'   => array('0', '6', '6', '6'),
        'ΝΤ'  => array('0', '3', '3', '3'),
        'Ξ'   => array('0', '5', '54', '54'),
        'Ο'   => array('1', '0', '', ''),
        'Ό'   => array('1', '0', '', ''),
        'ΟΙ'  => array('1', '0', '1', ''),
        'ΟΥ'  => array('1', '0', '1', ''),
        'Π'   => array('0', '7', '7', '7'),
        'Ρ'   => array('0', '9', '9', '9'),
        'Σ'   => array('0', '4', '4', '4'),
        'ς'   => array('0', '', '', '4'),
        'Τ'   => array('0', '3', '3', '3'),
        'ΤΖ'  => array('0', '4', '4', '4'),
        'ΤΣ'  => array('0', '4', '4', '4'),
        'Υ'   => array('1', '1', '', ''),
        'Ύ'   => array('1', '1', '', ''),
        'Ϋ'   => array('1', '1', '', ''),
        'ΰ'   => array('1', '1', '', ''),
        'ΥΚ'  => array('1', '5', '5', '5'),
        'ΥΥ'  => array('1', '65', '65', '65'),
        'Φ'   => array('0', '7', '7', '7'),
        'Χ'   => array('0', '5', '5', '5'),
        'Ψ'   => array('0', '7', '7', '7'),
        'Ω'   => array('1', '0', '', ''),
        'Ώ'   => array('1', '0', '', ''),
        // Hebrew alphabet
        'א'     => array('1', '0', '', ''),
        'או'    => array('1', '0', '7', ''),
        'אג'    => array('1', '4', '4', '4', '5', '5', '5', '34', '34', '34'),
        'בב'    => array('0', '7', '7', '7', '77', '77', '77'),
        'ב'     => array('0', '7', '7', '7'),
        'גג'    => array('0', '4', '4', '4', '5', '5', '5', '45', '45', '45', '55', '55', '55', '54', '54', '54'),
        'גד'    => array('0', '43', '43', '43', '53', '53', '53'),
        'גה'    => array('0', '45', '45', '45', '55', '55', '55'),
        'גז'    => array('0', '44', '44', '44', '45', '45', '45'),
        'גח'    => array('0', '45', '45', '45', '55', '55', '55'),
        'גכ'    => array('0', '45', '45', '45', '55', '55', '55'),
        'גך'    => array('0', '45', '45', '45', '55', '55', '55'),
        'גצ'    => array('0', '44', '44', '44', '45', '45', '45'),
        'גץ'    => array('0', '44', '44', '44', '45', '45', '45'),
        'גק'    => array('0', '45', '45', '45', '54', '54', '54'),
        'גש'    => array('0', '44', '44', '44', '54', '54', '54'),
        'גת'    => array('0', '43', '43', '43', '53', '53', '53'),
        'ג'     => array('0', '4', '4', '4', '5', '5', '5'),
        'דז'    => array('0', '4', '4', '4'),
        'דד'    => array('0', '3', '3', '3', '33', '33', '33'),
        'דט'    => array('0', '33', '33', '33'),
        'דש'    => array('0', '4', '4', '4'),
        'דצ'    => array('0', '4', '4', '4'),
        'דץ'    => array('0', '4', '4', '4'),
        'ד'     => array('0', '3', '3', '3'),
        'הג'    => array('0', '54', '54', '54', '55', '55', '55'),
        'הכ'    => array('0', '55', '55', '55'),
        'הח'    => array('0', '55', '55', '55'),
        'הק'    => array('0', '55', '55', '55', '5', '5', '5'),
        'הה'    => array('0', '5', '5', '', '55', '55', ''),
        'ה'     => array('0', '5', '5', ''),
        'וי'    => array('1', '', '', '', '7', '7', '7'),
        'ו'     => array('1', '7', '7', '7', '7', '', ''),
        'וו'    => array('1', '7', '7', '7', '7', '', ''),
        'וופ'   => array('1', '7', '7', '7', '77', '77', '77'),
        'זש'    => array('0', '4', '4', '4', '44', '44', '44'),
        'זדז'   => array('0', '2', '4', '4'),
        'ז'     => array('0', '4', '4', '4'),
        'זג'    => array('0', '44', '44', '44', '45', '45', '45'),
        'זז'    => array('0', '4', '4', '4', '44', '44', '44'),
        'זס'    => array('0', '44', '44', '44'),
        'זצ'    => array('0', '44', '44', '44'),
        'זץ'    => array('0', '44', '44', '44'),
        'חג'    => array('0', '54', '54', '54', '53', '53', '53'),
        'חח'    => array('0', '5', '5', '5', '55', '55', '55'),
        'חק'    => array('0', '55', '55', '55', '5', '5', '5'),
        'חכ'    => array('0', '45', '45', '45', '55', '55', '55'),
        'חס'    => array('0', '5', '54', '54'),
        'חש'    => array('0', '5', '54', '54'),
        'ח'     => array('0', '5', '5', '5'),
        'טש'    => array('0', '4', '4', '4'),
        'טד'    => array('0', '33', '33', '33'),
        'טי'    => array('0', '3', '3', '3', '4', '4', '4', '3', '3', '34'),
        'טת'    => array('0', '33', '33', '33'),
        'טט'    => array('0', '3', '3', '3', '33', '33', '33'),
        'ט'     => array('0', '3', '3', '3'),
        'י'     => array('1', '1', '', ''),
        'יא'    => array('1', '1', '', '', '1', '1', '1'),
        'כג'    => array('0', '55', '55', '55', '54', '54', '54'),
        'כש'    => array('0', '5', '54', '54'),
        'כס'    => array('0', '5', '54', '54'),
        'ככ'    => array('0', '5', '5', '5', '55', '55', '55'),
        'כך'    => array('0', '5', '5', '5', '55', '55', '55'),
        'כ'     => array('0', '5', '5', '5'),
        'כח'    => array('0', '55', '55', '55', '5', '5', '5'),
        'ך'     => array('0', '', '5', '5'),
577
        'ל'     => array('0', '8', '8', '8'),
578
        'לל'    => array('0', '88', '88', '88', '8', '8', '8'),
579
        'מנ'    => array('0', '66', '66', '66'),
580
        'מן'    => array('0', '66', '66', '66'),
581
        'ממ'    => array('0', '6', '6', '6', '66', '66', '66'),
582
        'מם'    => array('0', '6', '6', '6', '66', '66', '66'),
583
        'מ'     => array('0', '6', '6', '6'),
584
        'ם'     => array('0', '', '6', '6'),
585
        'נמ'    => array('0', '66', '66', '66'),
586
        'נם'    => array('0', '66', '66', '66'),
587
        'ננ'    => array('0', '6', '6', '6', '66', '66', '66'),
588
        'נן'    => array('0', '6', '6', '6', '66', '66', '66'),
589
        'נ'     => array('0', '6', '6', '6'),
590
        'ן'     => array('0', '', '6', '6'),
591
        'סתש'   => array('0', '2', '4', '4'),
592
        'סתז'   => array('0', '2', '4', '4'),
593
        'סטז'   => array('0', '2', '4', '4'),
594
        'סטש'   => array('0', '2', '4', '4'),
595
        'סצד'   => array('0', '2', '4', '4'),
596
        'סט'    => array('0', '2', '4', '4', '43', '43', '43'),
597
        'סת'    => array('0', '2', '4', '4', '43', '43', '43'),
598
        'סג'    => array('0', '44', '44', '44', '4', '4', '4'),
599
        'סס'    => array('0', '4', '4', '4', '44', '44', '44'),
600
        'סצ'    => array('0', '44', '44', '44'),
601
        'סץ'    => array('0', '44', '44', '44'),
602
        'סז'    => array('0', '44', '44', '44'),
603
        'סש'    => array('0', '44', '44', '44'),
604
        'ס'     => array('0', '4', '4', '4'),
605
        'ע'     => array('1', '0', '', ''),
606
        'פב'    => array('0', '7', '7', '7', '77', '77', '77'),
607
        'פוו'   => array('0', '7', '7', '7', '77', '77', '77'),
608
        'פפ'    => array('0', '7', '7', '7', '77', '77', '77'),
609
        'פף'    => array('0', '7', '7', '7', '77', '77', '77'),
610
        'פ'     => array('0', '7', '7', '7'),
611
        'ף'     => array('0', '', '7', '7'),
612
        'צג'    => array('0', '44', '44', '44', '45', '45', '45'),
613
        'צז'    => array('0', '44', '44', '44'),
614
        'צס'    => array('0', '44', '44', '44'),
615
        'צצ'    => array('0', '4', '4', '4', '5', '5', '5', '44', '44', '44', '54', '54', '54', '45', '45', '45'),
616
        'צץ'    => array('0', '4', '4', '4', '5', '5', '5', '44', '44', '44', '54', '54', '54'),
617
        'צש'    => array('0', '44', '44', '44', '4', '4', '4', '5', '5', '5'),
        'צ'     => array('0', '4', '4', '4', '5', '5', '5'),
        'ץ'     => array('0', '', '4', '4'),
        'קה'    => array('0', '55', '55', '5'),
        'קס'    => array('0', '5', '54', '54'),
        'קש'    => array('0', '5', '54', '54'),
        'קק'    => array('0', '5', '5', '5', '55', '55', '55'),
        'קח'    => array('0', '55', '55', '55'),
        'קכ'    => array('0', '55', '55', '55'),
        'קך'    => array('0', '55', '55', '55'),
        'קג'    => array('0', '55', '55', '55', '54', '54', '54'),
        'ק'     => array('0', '5', '5', '5'),
        'רר'    => array('0', '99', '99', '99', '9', '9', '9'),
        'ר'     => array('0', '9', '9', '9'),
        'שטז'   => array('0', '2', '4', '4'),
        'שתש'   => array('0', '2', '4', '4'),
        'שתז'   => array('0', '2', '4', '4'),
        'שטש'   => array('0', '2', '4', '4'),
        'שד'    => array('0', '2', '43', '43'),
        'שז'    => array('0', '44', '44', '44'),
        'שס'    => array('0', '44', '44', '44'),
        'שת'    => array('0', '2', '43', '43'),
        'שג'    => array('0', '4', '4', '4', '44', '44', '44', '4', '43', '43'),
        'שט'    => array('0', '2', '43', '43', '44', '44', '44'),
        'שצ'    => array('0', '44', '44', '44', '45', '45', '45'),
        'שץ'    => array('0', '44', '', '44', '45', '', '45'),
        'שש'    => array('0', '4', '4', '4', '44', '44', '44'),
        'ש'     => array('0', '4', '4', '4'),
        'תג'    => array('0', '34', '34', '34'),
        'תז'    => array('0', '34', '34', '34'),
        'תש'    => array('0', '4', '4', '4'),
        'תת'    => array('0', '3', '3', '3', '4', '4', '4', '33', '33', '33', '44', '44', '44', '34', '34', '34', '43', '43', '43'),
        'ת'     => array('0', '3', '3', '3', '4', '4', '4'),
        // Arabic alphabet
        'ا'   => array('1', '0', '', ''),
        'ب'   => array('0', '7', '7', '7'),
        'ت'   => array('0', '3', '3', '3'),
        'ث'   => array('0', '3', '3', '3'),
        'ج'   => array('0', '4', '4', '4'),
        'ح'   => array('0', '5', '5', '5'),
        'خ'   => array('0', '5', '5', '5'),
        'د'   => array('0', '3', '3', '3'),
        'ذ'   => array('0', '3', '3', '3'),
        'ر'   => array('0', '9', '9', '9'),
        'ز'   => array('0', '4', '4', '4'),
        'س'   => array('0', '4', '4', '4'),
        'ش'   => array('0', '4', '4', '4'),
        'ص'   => array('0', '4', '4', '4'),
        'ض'   => array('0', '3', '3', '3'),
        'ط'   => array('0', '3', '3', '3'),
        'ظ'   => array('0', '4', '4', '4'),
        'ع'   => array('1', '0', '', ''),
        'غ'   => array('0', '0', '', ''),
        'ف'   => array('0', '7', '7', '7'),
        'ق'   => array('0', '5', '5', '5'),
        'ك'   => array('0', '5', '5', '5'),
        'ل'   => array('0', '8', '8', '8'),
        'لا'  => array('0', '8', '8', '8'),
        'م'   => array('0', '6', '6', '6'),
        'ن'   => array('0', '6', '6', '6'),
        'هن'  => array('0', '66', '66', '66'),
        'ه'   => array('0', '5', '5', ''),
        'و'   => array('1', '', '', '', '7', '', ''),
        'ي'   => array('0', '1', '', ''),
        'آ'   => array('0', '1', '', ''),
        'ة'   => array('0', '', '', '3'),
        'ی'   => array('0', '1', '', ''),
        'ى'   => array('1', '1', '', ''),
    );

    /**
     * Calculate the Daitch-Mokotoff soundex for a word.
     *
     * @param string $name
     *
     * @return string[] List of possible DM codes for the word.
     */
    private static function daitchMokotoffWord($name)
    {
        // Apply special transformation rules to the input string
        $name = I18N::strtoupper($name);
        foreach (self::$transformNameTable as $transformRule) {
            $name = str_replace($transformRule[0], $transformRule[1], $name);
        }

        // Initialize
        $name_script = I18N::textScript($name);
        $noVowels    = ($name_script == 'Hebr' || $name_script == 'Arab');

        $lastPos         = strlen($name) - 1;
        $currPos         = 0;
        $state           = 1; // 1: start of input string, 2: before vowel, 3: other
        $result          = array(); // accumulate complete 6-digit D-M codes here
        $partialResult   = array(); // accumulate incomplete D-M codes here
        $partialResult[] = array('!'); // initialize 1st partial result  ('!' stops "duplicate sound" check)

        // Loop through the input string.
        // Stop when the string is exhausted or when no more partial results remain
        while (count($partialResult) !== 0 && $currPos <= $lastPos) {
            // Find the DM coding table entry for the chunk at the current position
            $thisEntry = substr($name, $currPos, self::MAXCHAR); // Get maximum length chunk
            while ($thisEntry != '') {
                if (isset(self::$dmsounds[$thisEntry])) {
                    break;
                }
                $thisEntry = substr($thisEntry, 0, -1); // Not in table: try a shorter chunk
            }
            if ($thisEntry === '') {
                $currPos++; // Not in table: advance pointer to next byte
                continue; // and try again
            }

            $soundTableEntry = self::$dmsounds[$thisEntry];
            $workingResult   = $partialResult;
            $partialResult   = array();
            $currPos += strlen($thisEntry);

            // Not at beginning of input string
            if ($state != 1) {
                if ($currPos <= $lastPos) {
                    // Determine whether the next chunk is a vowel
                    $nextEntry = substr($name, $currPos, self::MAXCHAR); // Get maximum length chunk
                    while ($nextEntry != '') {
                        if (isset(self::$dmsounds[$nextEntry])) {
                            break;
                        }
                        $nextEntry = substr($nextEntry, 0, -1); // Not in table: try a shorter chunk
                    }
                } else {
                    $nextEntry = '';
                }
                if ($nextEntry != '' && self::$dmsounds[$nextEntry][0] != '0') {
                    // Next chunk is a vowel
                    $state = 2;
                } else {
                    // Next chunk is not a vowel, or there is no next chunk
                    $state = 3;
                }
            }

            while ($state < count($soundTableEntry)) {
                // empty means 'ignore this sound in this state'
                if ($soundTableEntry[$state] == '') {
                    foreach ($workingResult as $workingEntry) {
                        $tempEntry = $workingEntry;
                        $tempEntry[count($tempEntry) - 1] .= '!'; // Prevent false 'doubles'
                        $partialResult[] = $tempEntry;
                    }
                } else {
                    foreach ($workingResult as $workingEntry) {
                        if ($soundTableEntry[$state] !== $workingEntry[count($workingEntry) - 1]) {
                            // Incoming sound isn't a duplicate of the previous sound
                            $workingEntry[] = $soundTableEntry[$state];
                        } else {
                            // Incoming sound is a duplicate of the previous sound
                            // For Hebrew and Arabic, we need to create a pair of D-M sound codes,
                            // one of the pair with only a single occurrence of the duplicate sound,
                            // the other with both occurrences
                            if ($noVowels) {
                                $workingEntry[] = $soundTableEntry[$state];
                            }
                        }
                        if (count($workingEntry) < 7) {
                            $partialResult[] = $workingEntry;
                        } else {
                            // This is the 6th code in the sequence
                            // We're looking for 7 entries because the first is '!' and doesn't count
                            $tempResult = str_replace('!', '', implode('', $workingEntry));
                            // Only return codes from recognisable sounds
                            if ($tempResult) {
                                $result[] = substr($tempResult . '000000', 0, 6);
                            }
                        }
                    }
                }
                $state = $state + 3; // Advance to next triplet while keeping the same basic state
            }
        }

        // Zero-fill and copy all remaining partial results
        foreach ($partialResult as $workingEntry) {
            $tempResult = str_replace('!', '', implode('', $workingEntry));
            // Only return codes from recognisable sounds
            if ($tempResult) {
                $result[] = substr($tempResult . '000000', 0, 6);
            }
        }

        return $result;
    }
}
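
The listing above repeats the same "take the longest chunk, then shorten it until it is a key of the sound table" loop twice, once for the current chunk and once for the look-ahead. As a minimal sketch of the Extract Method refactoring suggested for long methods, that loop could be moved into a helper. The function name `longestTableMatch`, its parameters, and the toy table below are assumptions made for illustration; they are not part of webtrees.

```php
<?php
/**
 * Hypothetical helper (not part of webtrees): return the longest chunk of
 * $text, starting at byte offset $pos and at most $maxLen bytes long, that
 * is a key of $table. Returns '' when no prefix of the chunk is a key.
 */
function longestTableMatch(string $text, int $pos, array $table, int $maxLen): string
{
    // Cast to string because substr() may return false on older PHP versions
    $chunk = (string) substr($text, $pos, $maxLen);

    // Drop one byte from the end until the chunk is a table key or is empty
    while ($chunk !== '' && !isset($table[$chunk])) {
        $chunk = (string) substr($chunk, 0, -1);
    }

    return $chunk;
}

// Toy table for illustration only; the values are placeholders, not D-M codes
$toyTable = array(
    'SCH' => true,
    'S'   => true,
    'C'   => true,
);

$first   = longestTableMatch('SCHMIDT', 0, $toyTable, 7); // 'SCH' wins over 'S'
$noMatch = longestTableMatch('SCHMIDT', 3, $toyTable, 7); // 'M' is not a key
```

With such a helper, both lookups in `daitchMokotoffWord()` would reduce to a single call each, which removes two nested loops from the method body without changing how chunks are matched.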