# -*- coding: utf-8 -*-

# Copyright 2014-2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.stemmer.

The stemmer module defines word stemmers including:

    - the Lovins stemmer
    - the Porter and Porter2 (Snowball English) stemmers
    - Snowball stemmers for German, Dutch, Norwegian, Swedish, and Danish
    - CLEF German, German plus, and Swedish stemmers
    - Caumanns' German stemmer
"""

from __future__ import unicode_literals

import unicodedata

from six import text_type
from six.moves import range


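# Illustrative sketch only (not part of the original Abydos source). The
# nested cond_* helpers inside lovins() below all share one calling
# convention: each receives the whole word plus the length of a candidate
# suffix, and word[-suffix_len-1] is then the last letter of the stem that
# would remain if that suffix were removed. The helper name
# _stem_tail_sketch is hypothetical, introduced here just to make that
# indexing explicit; it assumes 1 <= suffix_len < len(word).
def _stem_tail_sketch(word, suffix_len):
    """Return the remaining stem and its last letter (illustration only)."""
    stem = word[:-suffix_len]            # what is left after stripping
    last_letter = word[-suffix_len - 1]  # same index the cond_* helpers use
    return stem, last_letter

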
def lovins(word):
    """Return Lovins stem.

    Lovins stemmer

    The Lovins stemmer is described in Julie Beth Lovins's article at:
    http://www.mt-archive.info/MT-1968-Lovins.pdf

    :param word: the word to stem
    :returns: word stem
    :rtype: string

    >>> lovins('reading')
    'read'
    >>> lovins('suspension')
    'suspens'
    >>> lovins('elusiveness')
    'elus'
    """
    # pylint: disable=too-many-branches, too-many-locals

    # lowercase, normalize, and compose
    word = unicodedata.normalize('NFC', text_type(word.lower()))

    def cond_b(word, suffix_len):
        """Return Lovins' condition B: minimum stem length of 3."""
        return len(word)-suffix_len >= 3

    def cond_c(word, suffix_len):
        """Return Lovins' condition C: minimum stem length of 4."""
        return len(word)-suffix_len >= 4

    def cond_d(word, suffix_len):
        """Return Lovins' condition D: minimum stem length of 5."""
        return len(word)-suffix_len >= 5

    def cond_e(word, suffix_len):
        """Return Lovins' condition E: do not remove ending after e."""
        return word[-suffix_len-1] != 'e'

    def cond_f(word, suffix_len):
        """Return Lovins' condition F: stem length >= 3, not after e."""
        return (len(word)-suffix_len >= 3 and
                word[-suffix_len-1] != 'e')

    def cond_g(word, suffix_len):
        """Return Lovins' condition G: stem length >= 3, only after f."""
        return (len(word)-suffix_len >= 3 and
                word[-suffix_len-1] == 'f')

    def cond_h(word, suffix_len):
        """Return Lovins' condition H: remove ending only after t or ll."""
        return (word[-suffix_len-1] == 't' or
                word[-suffix_len-2:-suffix_len] == 'll')

    def cond_i(word, suffix_len):
        """Return Lovins' condition I: do not remove ending after o or e."""
        return word[-suffix_len-1] not in {'e', 'o'}

    def cond_j(word, suffix_len):
        """Return Lovins' condition J: do not remove ending after a or e."""
        return word[-suffix_len-1] not in {'a', 'e'}

    def cond_k(word, suffix_len):
        """Return Lovins' condition K: stem length >= 3, after l, i, or u*e."""
        return (len(word)-suffix_len >= 3 and
                (word[-suffix_len-1] in {'i', 'l'} or
                 (word[-suffix_len-3] == 'u' and word[-suffix_len-1] == 'e')))

    def cond_l(word, suffix_len):
        """Return Lovins' condition L: not after u, x, or s (unless os)."""
        # Compare the last two stem letters with 'os'; a single character
        # can never equal a two-character string.
        return (word[-suffix_len-1] not in {'s', 'u', 'x'} or
                word[-suffix_len-2:-suffix_len] == 'os')

    def cond_m(word, suffix_len):
        """Return Lovins' condition M."""
        return word[-suffix_len-1] not in {'a', 'c', 'e', 'm'}

    def cond_n(word, suffix_len):
        """Return Lovins' condition N."""
        if len(word)-suffix_len >= 3:
            if word[-suffix_len-3] == 's':
                if len(word)-suffix_len >= 4:
                    return True
            else:
                return True
        return False

    def cond_o(word, suffix_len):
        """Return Lovins' condition O."""
        return word[-suffix_len-1] in {'i', 'l'}

    def cond_p(word, suffix_len):
        """Return Lovins' condition P."""
        return word[-suffix_len-1] != 'c'
            
    def cond_q(word, suffix_len):
        """Return Lovins' condition Q."""
        # Minimum stem length of 3, and the stem must not end in l or n.
        return (len(word)-suffix_len >= 3 and
                word[-suffix_len-1] not in {'l', 'n'})
            
    def cond_r(word, suffix_len):
        """Return Lovins' condition R."""
        return word[-suffix_len-1] in {'n', 'r'}

    def cond_s(word, suffix_len):
        """Return Lovins' condition S."""
        return (word[-suffix_len-2:-suffix_len] == 'dr' or
                (word[-suffix_len-1] == 't' and
                 word[-suffix_len-2:-suffix_len] != 'tt'))

    def cond_t(word, suffix_len):
        """Return Lovins' condition T."""
        return (word[-suffix_len-1] in {'s', 't'} and
                word[-suffix_len-2:-suffix_len] != 'ot')

    def cond_u(word, suffix_len):
        """Return Lovins' condition U."""
        return word[-suffix_len-1] in {'l', 'm', 'n', 'r'}

    def cond_v(word, suffix_len):
        """Return Lovins' condition V."""
        return word[-suffix_len-1] == 'c'

    def cond_w(word, suffix_len):
        """Return Lovins' condition W."""
        return word[-suffix_len-1] not in {'s', 'u'}
            
    def cond_x(word, suffix_len):
        """Return Lovins' condition X."""
        # Remove the ending only after i, l, or u*e (u, any letter, then e).
        # The 'u' test must index a single character; a three-character
        # slice could never equal 'u'.
        return (word[-suffix_len-1] in {'i', 'l'} or
                (word[-suffix_len-3] == 'u' and
                 word[-suffix_len-1] == 'e'))
            
    def cond_y(word, suffix_len):
        """Return Lovins' condition Y."""
        return word[-suffix_len-2:-suffix_len] == 'in'

    def cond_z(word, suffix_len):
        """Return Lovins' condition Z."""
        return word[-suffix_len-1] != 'f'

    def cond_aa(word, suffix_len):
        """Return Lovins' condition AA."""
        return (word[-suffix_len-1] in {'d', 'f', 'l', 't'} or
                word[-suffix_len-2:-suffix_len] in {'ph', 'th', 'er', 'or',
                                                    'es'})

    def cond_bb(word, suffix_len):
        """Return Lovins' condition BB."""
        return (len(word)-suffix_len >= 3 and
                word[-suffix_len-3:-suffix_len] != 'met' and
                word[-suffix_len-4:-suffix_len] != 'ryst')

    def cond_cc(word, suffix_len):
        """Return Lovins' condition CC."""
        return word[-suffix_len-1] == 'l'
            
                | 196 |  |  |     suffix = {'alistically': cond_b, 'arizability': None, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 197 |  |  |               'izationally': cond_b, 'antialness': None, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 198 |  |  |               'arisations': None, 'arizations': None, 'entialness': None, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 199 |  |  |               'allically': cond_c, 'antaneous': None, 'antiality': None, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 200 |  |  |               'arisation': None, 'arization': None, 'ationally': cond_b, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 201 |  |  |               'ativeness': None, 'eableness': cond_e, 'entations': None, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 202 |  |  |               'entiality': None, 'entialize': None, 'entiation': None, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 203 |  |  |               'ionalness': None, 'istically': None, 'itousness': None, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 204 |  |  |               'izability': None, 'izational': None, 'ableness': None, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 205 |  |  |               'arizable': None, 'entation': None, 'entially': None, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 206 |  |  |               'eousness': None, 'ibleness': None, 'icalness': None, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 207 |  |  |               'ionalism': None, 'ionality': None, 'ionalize': None, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 208 |  |  |               'iousness': None, 'izations': None, 'lessness': None, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 209 |  |  |               'ability': None, 'aically': None, 'alistic': cond_b, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 210 |  |  |               'alities': None, 'ariness': cond_e, 'aristic': None, | 
            
                                                                                                            
                            
            
                                    
            
            
              'arizing': None, 'ateness': None, 'atingly': None,
              'ational': cond_b, 'atively': None, 'ativism': None,
              'elihood': cond_e, 'encible': None, 'entally': None,
              'entials': None, 'entiate': None, 'entness': None,
              'fulness': None, 'ibility': None, 'icalism': None,
              'icalist': None, 'icality': None, 'icalize': None,
              'ication': cond_g, 'icianry': None, 'ination': None,
              'ingness': None, 'ionally': None, 'isation': None,
              'ishness': None, 'istical': None, 'iteness': None,
              'iveness': None, 'ivistic': None, 'ivities': None,
              'ization': cond_f, 'izement': None, 'oidally': None,
              'ousness': None, 'aceous': None, 'acious': cond_b,
              'action': cond_g, 'alness': None, 'ancial': None,
              'ancies': None, 'ancing': cond_b, 'ariser': None,
              'arized': None, 'arizer': None, 'atable': None,
              'ations': cond_b, 'atives': None, 'eature': cond_z,
              'efully': None, 'encies': None, 'encing': None,
              'ential': None, 'enting': cond_c, 'entist': None,
              'eously': None, 'ialist': None, 'iality': None,
              'ialize': None, 'ically': None, 'icance': None,
              'icians': None, 'icists': None, 'ifully': None,
              'ionals': None, 'ionate': cond_d, 'ioning': None,
              'ionist': None, 'iously': None, 'istics': None,
              'izable': cond_e, 'lessly': None, 'nesses': None,
              'oidism': None, 'acies': None, 'acity': None,
              'aging': cond_b, 'aical': None, 'alist': None,
              'alism': cond_b, 'ality': None, 'alize': None,
              'allic': cond_bb, 'anced': cond_b, 'ances': cond_b,
              'antic': cond_c, 'arial': None, 'aries': None,
              'arily': None, 'arity': cond_b, 'arize': None,
              'aroid': None, 'ately': None, 'ating': cond_i,
              'ation': cond_b, 'ative': None, 'ators': None,
              'atory': None, 'ature': cond_e, 'early': cond_y,
              'ehood': None, 'eless': None, 'elity': None,
              'ement': None, 'enced': None, 'ences': None,
              'eness': cond_e, 'ening': cond_e, 'ental': None,
              'ented': cond_c, 'ently': None, 'fully': None,
              'ially': None, 'icant': None, 'ician': None,
              'icide': None, 'icism': None, 'icist': None,
              'icity': None, 'idine': cond_i, 'iedly': None,
              'ihood': None, 'inate': None, 'iness': None,
              'ingly': cond_b, 'inism': cond_j, 'inity': cond_cc,
              'ional': None, 'ioned': None, 'ished': None,
              'istic': None, 'ities': None, 'itous': None,
              'ively': None, 'ivity': None, 'izers': cond_f,
              'izing': cond_f, 'oidal': None, 'oides': None,
              'otide': None, 'ously': None, 'able': None, 'ably': None,
              'ages': cond_b, 'ally': cond_b, 'ance': cond_b, 'ancy': cond_b,
              'ants': cond_b, 'aric': None, 'arly': cond_k, 'ated': cond_i,
              'ates': None, 'atic': cond_b, 'ator': None, 'ealy': cond_y,
              'edly': cond_e, 'eful': None, 'eity': None, 'ence': None,
              'ency': None, 'ened': cond_e, 'enly': cond_e, 'eous': None,
              'hood': None, 'ials': None, 'ians': None, 'ible': None,
              'ibly': None, 'ical': None, 'ides': cond_l, 'iers': None,
              'iful': None, 'ines': cond_m, 'ings': cond_n, 'ions': cond_b,
              'ious': None, 'isms': cond_b, 'ists': None, 'itic': cond_h,
              'ized': cond_f, 'izer': cond_f, 'less': None, 'lily': None,
              'ness': None, 'ogen': None, 'ward': None, 'wise': None,
              'ying': cond_b, 'yish': None, 'acy': None, 'age': cond_b,
              'aic': None, 'als': cond_bb, 'ant': cond_b, 'ars': cond_o,
              'ary': cond_f, 'ata': None, 'ate': None, 'eal': cond_y,
              'ear': cond_y, 'ely': cond_e, 'ene': cond_e, 'ent': cond_c,
              'ery': cond_e, 'ese': None, 'ful': None, 'ial': None,
              'ian': None, 'ics': None, 'ide': cond_l, 'ied': None,
              'ier': None, 'ies': cond_p, 'ily': None, 'ine': cond_m,
              'ing': cond_n, 'ion': cond_q, 'ish': cond_c, 'ism': cond_b,
              'ist': None, 'ite': cond_aa, 'ity': None, 'ium': None,
              'ive': None, 'ize': cond_f, 'oid': None, 'one': cond_r,
              'ous': None, 'ae': None, 'al': cond_bb, 'ar': cond_x,
              'as': cond_b, 'ed': cond_e, 'en': cond_f, 'es': cond_e,
              'ia': None, 'ic': None, 'is': None, 'ly': cond_b,
              'on': cond_s, 'or': cond_t, 'um': cond_u, 'us': cond_v,
              'yl': cond_r, '\'s': None, 's\'': None, 'a': None,
              'e': None, 'i': None, 'o': None, 's': cond_w, 'y': cond_b}

    # Remove the longest listed ending (11 letters down to 1) whose
    # condition, if any, is met and that leaves a stem of 2+ letters.
    for suffix_len in range(11, 0, -1):
        ending = word[-suffix_len:]
        if (ending in suffix and
                len(word)-suffix_len >= 2 and
                (suffix[ending] is None or
                 suffix[ending](word, suffix_len))):
            word = word[:-suffix_len]
            break

    def recode9(stem):
        """Apply Lovins' recode rule 9: ul -> l, except after a, i, or o."""
        if stem[-3:-2] in {'a', 'i', 'o'}:
            return stem
        return stem[:-2]+'l'

    def recode24(stem):
        """Apply Lovins' recode rule 24: end -> ens, except after s."""
        if stem[-4:-3] == 's':
            return stem
        return stem[:-1]+'s'

    def recode28(stem):
        """Apply Lovins' recode rule 28: her -> hes, except after p or t."""
        if stem[-4:-3] in {'p', 't'}:
            return stem
        return stem[:-1]+'s'

    def recode30(stem):
        """Apply Lovins' recode rule 30: ent -> ens, except after m."""
        if stem[-4:-3] == 'm':
            return stem
        return stem[:-1]+'s'

                | 319 |  |  |     def recode32(stem): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 320 |  |  |         """Return Lovins' conditional recode rule 32: et -> es, except after n.""" | 
            
                                                                                                            
                            
            
                                    
            
            
                | 321 |  |  |         if stem[-3:-2] == 'n': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 322 |  |  |             return stem | 
            
                                                                                                            
                            
            
                                    
            
            
                | 323 |  |  |         return stem[:-1] + 's' | 
            
                                                                                                            
                            
            
                                    
            
            
                | 324 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 325 |  |  |     if word[-2:] in {'bb', 'dd', 'gg', 'll', 'mm', 'nn', 'pp', 'rr', 'ss', | 
            
                                                                                                            
                            
            
                                    
            
            
                | 326 |  |  |                      'tt'}: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 327 |  |  |         word = word[:-1] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 328 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 329 |  |  |     recode = (('iev', 'ief'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 330 |  |  |               ('uct', 'uc'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 331 |  |  |               ('umpt', 'um'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 332 |  |  |               ('rpt', 'rb'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 333 |  |  |               ('urs', 'ur'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 334 |  |  |               ('istr', 'ister'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 335 |  |  |               ('metr', 'meter'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 336 |  |  |               ('olv', 'olut'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 337 |  |  |               ('ul', recode9), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 338 |  |  |               ('bex', 'bic'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 339 |  |  |               ('dex', 'dic'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 340 |  |  |               ('pex', 'pic'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 341 |  |  |               ('tex', 'tic'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 342 |  |  |               ('ax', 'ac'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 343 |  |  |               ('ex', 'ec'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 344 |  |  |               ('ix', 'ic'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 345 |  |  |               ('lux', 'luc'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 346 |  |  |               ('uad', 'uas'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 347 |  |  |               ('vad', 'vas'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 348 |  |  |               ('cid', 'cis'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 349 |  |  |               ('lid', 'lis'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 350 |  |  |               ('erid', 'eris'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 351 |  |  |               ('pand', 'pans'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 352 |  |  |               ('end', recode24), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 353 |  |  |               ('ond', 'ons'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 354 |  |  |               ('lud', 'lus'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 355 |  |  |               ('rud', 'rus'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 356 |  |  |               ('her', recode28), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 357 |  |  |               ('mit', 'mis'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 358 |  |  |               ('ent', recode30), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 359 |  |  |               ('ert', 'ers'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 360 |  |  |               ('et', recode32), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 361 |  |  |               ('yt', 'ys'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 362 |  |  |               ('yz', 'ys')) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 363 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 364 |  |  |     for ending, replacement in recode: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 365 |  |  |         if word.endswith(ending): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 366 |  |  |             if callable(replacement): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 367 |  |  |                 word = replacement(word) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 368 |  |  |             else: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 369 |  |  |                 word = word[:-len(ending)] + replacement | 
            
                                                                                                            
                            
            
                                    
            
            
                | 370 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 371 |  |  |     return word | 
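
The recode step above is table-driven: each ending maps either to a literal replacement or to a callable that holds the conditional rule. The following is a minimal standalone sketch of that pattern, using only three of the rules; `RULES` and `apply_recode` are hypothetical names introduced for illustration and are not part of the module.

```python
def recode30(stem):
    """Lovins' rule 30: ent -> ens, except when the ending follows 'm'."""
    if stem[-4:-3] == 'm':
        return stem
    return stem[:-1] + 's'


# A small subset of the recode table (illustrative only).
RULES = (('iev', 'ief'),
         ('metr', 'meter'),
         ('ent', recode30))


def apply_recode(word, rules=RULES):
    """Apply each matching rule in table order, as the loop above does."""
    for ending, replacement in rules:
        if word.endswith(ending):
            if callable(replacement):
                # Conditional rule: the function decides whether to recode.
                word = replacement(word)
            else:
                # Unconditional rule: swap the ending for its replacement.
                word = word[:-len(ending)] + replacement
    return word


print(apply_recode('believ'))    # belief
print(apply_recode('geometr'))   # geometer
print(apply_recode('consent'))   # consens
print(apply_recode('cement'))    # cement (unchanged: 'ent' follows 'm')
```

Storing callables next to plain strings keeps the exceptions ("except after m") beside the rules they modify, at the cost of a `callable()` check in the loop.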
            
                                                                                                            
                            
            
                                    
            
            
                | 372 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 373 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 374 |  |  | def _m_degree(term, vowels): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 375 |  |  |     """Return the m-degree of a term (Porter stemmer helper). | 
            
                                                                                                            
                            
            
                                    
            
            
                | 376 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 377 |  |  |     The m-degree is the number of vowel-to-consonant (V to C) transitions. | 
            
                                                                                                            
                            
            
                                    
            
            
                | 378 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 379 |  |  |     :param term: the word for which to calculate the m-degree | 
            
                                                                                                            
                            
            
                                    
            
            
                | 380 |  |  |     :param vowels: the set of vowels in the language | 
            
                                                                                                            
                            
            
                                    
            
            
                | 381 |  |  |     :returns: the m-degree as defined in the Porter stemmer definition | 
            
                                                                                                            
                            
            
                                    
            
            
                | 382 |  |  |     """ | 
            
                                                                                                            
                            
            
                                    
            
            
                | 383 |  |  |     mdeg = 0 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 384 |  |  |     last_was_vowel = False | 
            
                                                                                                            
                            
            
                                    
            
            
                | 385 |  |  |     for letter in term: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 386 |  |  |         if letter in vowels: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 387 |  |  |             last_was_vowel = True | 
            
                                                                                                            
                            
            
                                    
            
            
                | 388 |  |  |         else: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 389 |  |  |             if last_was_vowel: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 390 |  |  |                 mdeg += 1 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 391 |  |  |             last_was_vowel = False | 
            
                                                                                                            
                            
            
                                    
            
            
                | 392 |  |  |     return mdeg | 
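
As a quick check of the transition count, the sketch below restates the helper as a standalone function so that it runs on its own. The name `m_degree` and the vowel set `VOWELS` are assumptions made for this example; the module passes its own vowel set in.

```python
def m_degree(term, vowels):
    """Count vowel-to-consonant transitions (standalone copy for illustration)."""
    mdeg = 0
    last_was_vowel = False
    for letter in term:
        if letter in vowels:
            last_was_vowel = True
        else:
            if last_was_vowel:
                # A consonant directly after a vowel closes one VC pair.
                mdeg += 1
            last_was_vowel = False
    return mdeg


VOWELS = set('aeiou')  # assumed vowel set for this example

print(m_degree('tree', VOWELS))     # 0: no consonant follows the vowels
print(m_degree('trouble', VOWELS))  # 1: 'ou' -> 'b'
print(m_degree('oaten', VOWELS))    # 2: 'oa' -> 't', 'e' -> 'n'
```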
            
                                                                                                            
                            
            
                                    
            
            
                | 393 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 394 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 395 |  |  | def _sb_has_vowel(term, vowels): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 396 |  |  |     """Return whether the term contains a vowel (Porter stemmer helper). | 
            
                                                                                                            
                            
            
                                    
            
            
                | 397 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 398 |  |  |     :param term: the word to scan for vowels | 
            
                                                                                                            
                            
            
                                    
            
            
                | 399 |  |  |     :param vowels: the set of vowels in the language | 
            
                                                                                                            
                            
            
                                    
            
            
                | 400 |  |  |     :returns: true iff a vowel exists in the term (as defined in the Porter | 
            
                                                                                                            
                            
            
                                    
            
            
                | 401 |  |  |         stemmer definition) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 402 |  |  |     """ | 
            
                                                                                                            
                            
            
                                    
            
            
                | 403 |  |  |     for letter in term: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 404 |  |  |         if letter in vowels: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 405 |  |  |             return True | 
            
                                                                                                            
                            
            
                                    
            
            
                | 406 |  |  |     return False | 
            
                                                                                                            
                            
            
                                    
            
            
                | 407 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 408 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 409 |  |  | def _ends_in_doubled_cons(term, vowels): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 410 |  |  |     """Return whether the term ends in a doubled consonant (Porter stemmer helper). | 
            
                                                                                                            
                            
            
                                    
            
            
                | 411 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 412 |  |  |     :param term: the word to check for a final doubled consonant | 
            
                                                                                                            
                            
            
                                    
            
            
                | 413 |  |  |     :param vowels: the set of vowels in the language | 
            
                                                                                                            
                            
            
                                    
            
            
                | 414 |  |  |     :returns: true iff the stem ends in a doubled consonant (as defined in the | 
            
                                                                                                            
                            
            
                                    
            
            
                | 415 |  |  |         Porter stemmer definition) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 416 |  |  |     """ | 
            
                                                                                                            
                            
            
                                    
            
            
                | 417 |  |  |     if len(term) > 1 and term[-1] not in vowels and term[-2] == term[-1]: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 418 |  |  |         return True | 
            
                                                                                                            
                            
            
                                    
            
            
                | 419 |  |  |     return False | 
            
                                                                                                            
                            
            
                                    
            
            
                | 420 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 421 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 422 |  |  | def _ends_in_cvc(term, vowels): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 423 |  |  |     """Return whether the term ends in consonant-vowel-consonant (Porter stemmer helper). | 
            
                                                                                                            
                            
            
                                    
            
            
                | 424 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 425 |  |  |     :param term: the word to check for a final consonant-vowel-consonant sequence | 
            
                                                                                                            
                            
            
                                    
            
            
                | 426 |  |  |     :param vowels: the set of vowels in the language | 
            
                                                                                                            
                            
            
                                    
            
            
                | 427 |  |  |     :returns: true iff the stem ends in cvc (as defined in the Porter stemmer | 
            
                                                                                                            
                            
            
                                    
            
            
                | 428 |  |  |         definition) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 429 |  |  |     """ | 
            
                                                                                                            
                            
            
                                    
            
            
                | 430 |  |  |     if len(term) > 2 and (term[-1] not in vowels and | 
            
                                                                                                            
                            
            
                                    
            
            
                | 431 |  |  |                           term[-2] in vowels and | 
            
                                                                                                            
                            
            
                                    
            
            
                | 432 |  |  |                           term[-3] not in vowels and | 
            
                                                                                                            
                            
            
                                    
            
            
                | 433 |  |  |                           term[-1] not in tuple('wxY')): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 434 |  |  |         return True | 
            
                                                                                                            
                            
            
                                    
            
            
                | 435 |  |  |     return False | 
            
                                                                                                            
                            
            
                                    
            
            
                | 436 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 437 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
def porter(word, early_english=False):
    """Return Porter stem.

    The Porter stemmer is defined at:
    http://snowball.tartarus.org/algorithms/porter/stemmer.html

    :param word: the word to calculate the stem of
    :param early_english: set to True in order to remove -est & -eth (2nd & 3rd
        person singular verbal agreement suffixes)
    :returns: word stem
    :rtype: str

    >>> porter('reading')
    'read'
    >>> porter('suspension')
    'suspens'
    >>> porter('elusiveness')
    'elus'

    >>> porter('eateth', early_english=True)
    'eat'
    """
    # pylint: disable=too-many-branches

    # lowercase, normalize, and compose
    word = unicodedata.normalize('NFC', text_type(word.lower()))

    # Return the word unchanged if it is shorter than 3 characters
    if len(word) < 3:
        return word

    _vowels = {'a', 'e', 'i', 'o', 'u', 'y'}
    # Re-map consonantal y to Y (Y will be C, y will be V)
    if word[0] == 'y':
        word = 'Y' + word[1:]
    for i in range(1, len(word)):
        if word[i] == 'y' and word[i-1] in _vowels:
            word = word[:i] + 'Y' + word[i+1:]

    # Step 1a: plural suffixes (sses -> ss, ies -> i, ss -> ss, s -> '')
    if word[-1] == 's':
        if word[-4:] == 'sses':
            word = word[:-2]
        elif word[-3:] == 'ies':
            word = word[:-2]
        elif word[-2:] == 'ss':
            pass
        else:
            word = word[:-1]

    # Step 1b: -eed, -ed, and -ing (plus -est and -eth if early_english)
    step1b_flag = False
    if word[-3:] == 'eed':
        if _m_degree(word[:-3], _vowels) > 0:
            word = word[:-1]
    elif word[-2:] == 'ed':
        if _sb_has_vowel(word[:-2], _vowels):
            word = word[:-2]
            step1b_flag = True
    elif word[-3:] == 'ing':
        if _sb_has_vowel(word[:-3], _vowels):
            word = word[:-3]
            step1b_flag = True
    elif early_english:
        if word[-3:] == 'est':
            if _sb_has_vowel(word[:-3], _vowels):
                word = word[:-3]
                step1b_flag = True
        elif word[-3:] == 'eth':
            if _sb_has_vowel(word[:-3], _vowels):
                word = word[:-3]
                step1b_flag = True

    # If a suffix was removed above, restore a final e or undouble a final
    # consonant
    if step1b_flag:
        if word[-2:] in {'at', 'bl', 'iz'}:
            word += 'e'
        elif (_ends_in_doubled_cons(word, _vowels) and
              word[-1] not in {'l', 's', 'z'}):
            word = word[:-1]
        elif _m_degree(word, _vowels) == 1 and _ends_in_cvc(word, _vowels):
            word += 'e'

    # Step 1c
    if word[-1] in {'Y', 'y'} and _sb_has_vowel(word[:-1], _vowels):
        word = word[:-1] + 'i'

                | 524 |  |  |     # Step 2 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 525 |  |  |     if len(word) > 1: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 526 |  |  |         if word[-2] == 'a': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 527 |  |  |             if word[-7:] == 'ational': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 528 |  |  |                 if _m_degree(word[:-7], _vowels) > 0: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 529 |  |  |                     word = word[:-5] + 'e' | 
            
                                                                                                            
                            
            
                                    
            
            
                | 530 |  |  |             elif word[-6:] == 'tional': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 531 |  |  |                 if _m_degree(word[:-6], _vowels) > 0: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 532 |  |  |                     word = word[:-2] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 533 |  |  |         elif word[-2] == 'c': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 534 |  |  |             if word[-4:] in {'enci', 'anci'}: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 535 |  |  |                 if _m_degree(word[:-4], _vowels) > 0: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 536 |  |  |                     word = word[:-1] + 'e' | 
            
                                                                                                            
                            
            
                                    
            
            
        elif word[-2] == 'e':
            # -izer -> -ize
            if word[-4:] == 'izer':
                if _m_degree(word[:-4], _vowels) > 0:
                    word = word[:-1]
        elif word[-2] == 'g':
            # -logi -> -log
            if word[-4:] == 'logi':
                if _m_degree(word[:-4], _vowels) > 0:
                    word = word[:-1]
        elif word[-2] == 'l':
            # -bli -> -ble, -alli -> -al, -entli -> -ent, -eli -> -e,
            # -ousli -> -ous
            if word[-3:] == 'bli':
                if _m_degree(word[:-3], _vowels) > 0:
                    word = word[:-1] + 'e'
            elif word[-4:] == 'alli':
                if _m_degree(word[:-4], _vowels) > 0:
                    word = word[:-2]
            elif word[-5:] == 'entli':
                if _m_degree(word[:-5], _vowels) > 0:
                    word = word[:-2]
            elif word[-3:] == 'eli':
                if _m_degree(word[:-3], _vowels) > 0:
                    word = word[:-2]
            elif word[-5:] == 'ousli':
                if _m_degree(word[:-5], _vowels) > 0:
                    word = word[:-2]
        elif word[-2] == 'o':
            # -ization -> -ize, -ation -> -ate, -ator -> -ate
            # (the longer suffix -ization is tested before -ation)
            if word[-7:] == 'ization':
                if _m_degree(word[:-7], _vowels) > 0:
                    word = word[:-5] + 'e'
            elif word[-5:] == 'ation':
                if _m_degree(word[:-5], _vowels) > 0:
                    word = word[:-3] + 'e'
            elif word[-4:] == 'ator':
                if _m_degree(word[:-4], _vowels) > 0:
                    word = word[:-2] + 'e'
        elif word[-2] == 's':
            # -alism -> -al, -iveness -> -ive, -fulness -> -ful,
            # -ousness -> -ous
            if word[-5:] == 'alism':
                if _m_degree(word[:-5], _vowels) > 0:
                    word = word[:-3]
            elif word[-7:] in {'iveness', 'fulness', 'ousness'}:
                if _m_degree(word[:-7], _vowels) > 0:
                    word = word[:-4]
        elif word[-2] == 't':
            # -aliti -> -al, -iviti -> -ive, -biliti -> -ble
            if word[-5:] == 'aliti':
                if _m_degree(word[:-5], _vowels) > 0:
                    word = word[:-3]
            elif word[-5:] == 'iviti':
                if _m_degree(word[:-5], _vowels) > 0:
                    word = word[:-3] + 'e'
            elif word[-6:] == 'biliti':
                if _m_degree(word[:-6], _vowels) > 0:
                    word = word[:-5] + 'le'

    # Step 3
    # Each rule applies only when the stem before the suffix has measure > 0:
    # -icate -> -ic, -ative -> (removed), -alize -> -al, -iciti -> -ic,
    # -ical -> -ic, -ful -> (removed), -ness -> (removed)
    if word[-5:] == 'icate':
        if _m_degree(word[:-5], _vowels) > 0:
            word = word[:-3]
    elif word[-5:] == 'ative':
        if _m_degree(word[:-5], _vowels) > 0:
            word = word[:-5]
    elif word[-5:] in {'alize', 'iciti'}:
        if _m_degree(word[:-5], _vowels) > 0:
            word = word[:-3]
    elif word[-4:] == 'ical':
        if _m_degree(word[:-4], _vowels) > 0:
            word = word[:-2]
    elif word[-3:] == 'ful':
        if _m_degree(word[:-3], _vowels) > 0:
            word = word[:-3]
    elif word[-4:] == 'ness':
        if _m_degree(word[:-4], _vowels) > 0:
            word = word[:-4]

    # Step 4
    # Each suffix is removed only when the remaining stem has measure > 1.
    if word[-2:] == 'al':
        if _m_degree(word[:-2], _vowels) > 1:
            word = word[:-2]
    elif word[-4:] == 'ance':
        if _m_degree(word[:-4], _vowels) > 1:
            word = word[:-4]
    elif word[-4:] == 'ence':
        if _m_degree(word[:-4], _vowels) > 1:
            word = word[:-4]
    elif word[-2:] == 'er':
        if _m_degree(word[:-2], _vowels) > 1:
            word = word[:-2]
    elif word[-2:] == 'ic':
        if _m_degree(word[:-2], _vowels) > 1:
            word = word[:-2]
    elif word[-4:] == 'able':
        if _m_degree(word[:-4], _vowels) > 1:
            word = word[:-4]
    elif word[-4:] == 'ible':
        if _m_degree(word[:-4], _vowels) > 1:
            word = word[:-4]
                | 631 |  |  |     elif word[-3:] == 'ant': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 632 |  |  |         if _m_degree(word[:-3], _vowels) > 1: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 633 |  |  |             word = word[:-3] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 634 |  |  |     elif word[-5:] == 'ement': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 635 |  |  |         if _m_degree(word[:-5], _vowels) > 1: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 636 |  |  |             word = word[:-5] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 637 |  |  |     elif word[-4:] == 'ment': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 638 |  |  |         if _m_degree(word[:-4], _vowels) > 1: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 639 |  |  |             word = word[:-4] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 640 |  |  |     elif word[-3:] == 'ent': | 
            
                                                                                                            
                            
            
                                    
            
            
        if _m_degree(word[:-3], _vowels) > 1:
            word = word[:-3]
    elif word[-4:] in {'sion', 'tion'}:
        if _m_degree(word[:-3], _vowels) > 1:
            word = word[:-3]
    elif word[-2:] == 'ou':
        if _m_degree(word[:-2], _vowels) > 1:
            word = word[:-2]
    elif word[-3:] == 'ism':
        if _m_degree(word[:-3], _vowels) > 1:
            word = word[:-3]
    elif word[-3:] == 'ate':
        if _m_degree(word[:-3], _vowels) > 1:
            word = word[:-3]
    elif word[-3:] == 'iti':
        if _m_degree(word[:-3], _vowels) > 1:
            word = word[:-3]
    elif word[-3:] == 'ous':
        if _m_degree(word[:-3], _vowels) > 1:
            word = word[:-3]
    elif word[-3:] == 'ive':
        if _m_degree(word[:-3], _vowels) > 1:
            word = word[:-3]
    elif word[-3:] == 'ize':
        if _m_degree(word[:-3], _vowels) > 1:
            word = word[:-3]

    # Step 5a: drop a final 'e' if m > 1, or if m == 1 and the remaining
    # stem does not pass the CVC test
    if word[-1] == 'e':
        if _m_degree(word[:-1], _vowels) > 1:
            word = word[:-1]
        elif (_m_degree(word[:-1], _vowels) == 1 and
              not _ends_in_cvc(word[:-1], _vowels)):
            word = word[:-1]

    # Step 5b: reduce a final 'll' to 'l' if m > 1
    if word[-2:] == 'll' and _m_degree(word, _vowels) > 1:
        word = word[:-1]

    # Change 'Y' back to 'y' if it survived stemming
    word = word.replace('Y', 'y')

    return word


def _sb_r1(term, vowels, r1_prefixes=None):
    """Return the start index of the R1 region.

    (...as defined in the Porter2 specification.)
    """
    vowel_found = False
    # Exception prefixes, if supplied, fix the start of R1 directly
    if hasattr(r1_prefixes, '__iter__'):
        for prefix in r1_prefixes:
            if term[:len(prefix)] == prefix:
                return len(prefix)

    # R1 starts after the first non-vowel that follows a vowel
    for i in range(len(term)):
        if not vowel_found and term[i] in vowels:
            vowel_found = True
        elif vowel_found and term[i] not in vowels:
            return i + 1
    # No such non-vowel: R1 is the empty region at the end of the term
    return len(term)


def _sb_r2(term, vowels, r1_prefixes=None):
    """Return the start index of the R2 region.

    (...as defined in the Porter2 specification.)
    """
    # R2 is found by applying the R1 rule again within R1 itself
    r1_start = _sb_r1(term, vowels, r1_prefixes)
    return r1_start + _sb_r1(term[r1_start:], vowels)


def _sb_ends_in_short_syllable(term, vowels, codanonvowels):
    """Return True iff term ends in a short syllable.

    (...according to the Porter2 specification.)

    NB: This is akin to the CVC test from the Porter stemmer. The
    specification's description of a short syllable is unfortunately
    ambiguous.
    """
    if not term:
        return False
    if len(term) == 2:
        # Two-letter term: a vowel followed by a non-vowel
        if term[-2] in vowels and term[-1] not in vowels:
            return True
    elif len(term) >= 3:
        # Non-vowel, vowel, then a non-vowel that is allowed as a coda
        if ((term[-3] not in vowels and term[-2] in vowels and
             term[-1] in codanonvowels)):
            return True
    return False


def _sb_short_word(term, vowels, codanonvowels, r1_prefixes=None):
    """Return True iff term is a short word.

    (...according to the Porter2 specification.)
    """
    return (_sb_r1(term, vowels, r1_prefixes) == len(term) and
            _sb_ends_in_short_syllable(term, vowels, codanonvowels))


def porter2(word, early_english=False):
    """Return the Porter2 (Snowball English) stem.

    The Porter2 (Snowball English) stemmer is defined at:
    http://snowball.tartarus.org/algorithms/english/stemmer.html

    :param word: the word to stem
    :param early_english: set to True in order to remove -eth & -est (3rd & 2nd
        person singular verbal agreement suffixes)
    :returns: word stem
    :rtype: str

    >>> porter2('reading')
    'read'
    >>> porter2('suspension')
    'suspens'
    >>> porter2('elusiveness')
    'elus'

    >>> porter2('eateth', early_english=True)
    'eat'
    """
    # pylint: disable=too-many-branches
    # pylint: disable=too-many-return-statements
                            
            
                                    
            
            
                | 765 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 766 |  |  |     _vowels = {'a', 'e', 'i', 'o', 'u', 'y'} | 
            
                                                                                                            
                            
            
                                    
            
            
                | 767 |  |  |     _codanonvowels = {"'", 'b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', | 
            
                                                                                                            
                            
            
                                    
            
            
                | 768 |  |  |                       'n', 'p', 'q', 'r', 's', 't', 'v', 'z'} | 
            
                                                                                                            
                            
            
                                    
            
            
                | 769 |  |  |     _doubles = {'bb', 'dd', 'ff', 'gg', 'mm', 'nn', 'pp', 'rr', 'tt'} | 
            
                                                                                                            
                            
            
                                    
            
            
                | 770 |  |  |     _li = {'c', 'd', 'e', 'g', 'h', 'k', 'm', 'n', 'r', 't'} | 
            
                                                                                                            
                            
            
                                    
            
            
                | 771 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 772 |  |  |     # R1 prefixes should be in order from longest to shortest to prevent | 
            
                                                                                                            
                            
            
                                    
            
            
                | 773 |  |  |     # masking | 
            
                                                                                                            
                            
            
                                    
            
            
                | 774 |  |  |     _r1_prefixes = ('commun', 'gener', 'arsen') | 
            
                                                                                                            
                            
            
                                    
            
            
                | 775 |  |  |     _exception1dict = {  # special changes: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 776 |  |  |         'skis': 'ski', 'skies': 'sky', 'dying': 'die', | 
            
                                                                                                            
                            
            
                                    
            
            
                | 777 |  |  |         'lying': 'lie', 'tying': 'tie', | 
            
                                                                                                            
                            
            
                                    
            
            
                | 778 |  |  |         # special -LY cases: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 779 |  |  |         'idly': 'idl', 'gently': 'gentl', 'ugly': 'ugli', | 
            
                                                                                                            
                            
            
                                    
            
            
                | 780 |  |  |         'early': 'earli', 'only': 'onli', 'singly': 'singl'} | 
            
                                                                                                            
                            
            
                                    
            
            
                | 781 |  |  |     _exception1set = {'sky', 'news', 'howe', 'atlas', 'cosmos', 'bias', | 
            
                                                                                                            
                            
            
                                    
            
            
                | 782 |  |  |                       'andes'} | 
            
                                                                                                            
                            
            
                                    
            
            
                | 783 |  |  |     _exception2set = {'inning', 'outing', 'canning', 'herring', 'earring', | 
            
                                                                                                            
                            
            
                                    
            
            
                | 784 |  |  |                       'proceed', 'exceed', 'succeed'} | 
            
                                                                                                            
                            
            
                                    
            
            
                | 785 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 786 |  |  |     # lowercase, normalize, and compose | 
            
                                                                                                            
                            
            
                                    
            
            
                | 787 |  |  |     word = unicodedata.normalize('NFC', text_type(word.lower())) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 788 |  |  |     # replace apostrophe-like characters with U+0027, per | 
            
                                                                                                            
                            
            
                                    
            
            
                | 789 |  |  |     # http://snowball.tartarus.org/texts/apostrophe.html | 
            
                                                                                                            
                            
            
                                    
            
            
                | 790 |  |  |     word = word.replace('’', '\'') | 
            
                                                                                                            
                            
            
                                    
            
            
                | 791 |  |  |     word = word.replace('‘', '\'') | 
            
                                                                                                            
                            
            
                                    
            
            
                | 792 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 793 |  |  |     # Exceptions 1 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 794 |  |  |     if word in _exception1dict: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 795 |  |  |         return _exception1dict[word] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 796 |  |  |     elif word in _exception1set: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 797 |  |  |         return word | 
            
                                                                                                            
                            
            
                                    
            
            
                | 798 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 799 |  |  |     # Return word if stem is shorter than 3 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 800 |  |  |     if len(word) < 3: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 801 |  |  |         return word | 
            
                                                                                                            
                            
            
                                    
            
            
                | 802 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 803 |  |  |     # Remove initial ', if present. | 
            
                                                                                                            
                            
            
                                    
            
            
                | 804 |  |  |     while word and word[0] == '\'': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 805 |  |  |         word = word[1:] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 806 |  |  |         # Return word if stem is shorter than 2 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 807 |  |  |         if len(word) < 2: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 808 |  |  |             return word | 
            
                                                                                                            
                            
            
                                    
            
            
                | 809 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 810 |  |  |     # Re-map consonantal y to Y (Y will be C, y will be V) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 811 |  |  |     if word[0] == 'y': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 812 |  |  |         word = 'Y' + word[1:] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 813 |  |  |     for i in range(1, len(word)): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 814 |  |  |         if word[i] == 'y' and word[i-1] in _vowels: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 815 |  |  |             word = word[:i] + 'Y' + word[i+1:] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 816 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 817 |  |  |     r1_start = _sb_r1(word, _vowels, _r1_prefixes) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 818 |  |  |     r2_start = _sb_r2(word, _vowels, _r1_prefixes) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 819 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 820 |  |  |     # Step 0 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 821 |  |  |     if word[-3:] == '\'s\'': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 822 |  |  |         word = word[:-3] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 823 |  |  |     elif word[-2:] == '\'s': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 824 |  |  |         word = word[:-2] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 825 |  |  |     elif word[-1:] == '\'': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 826 |  |  |         word = word[:-1] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 827 |  |  |     # Return word if stem is shorter than 3 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 828 |  |  |     if len(word) < 3: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 829 |  |  |         return word | 
            
                                                                                                            
                            
            
                                    
            
            
                | 830 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 831 |  |  |     # Step 1a | 
            
                                                                                                            
                            
            
                                    
            
            
                | 832 |  |  |     if word[-4:] == 'sses': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 833 |  |  |         word = word[:-2] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 834 |  |  |     elif word[-3:] in {'ied', 'ies'}: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 835 |  |  |         if len(word) > 4: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 836 |  |  |             word = word[:-2] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 837 |  |  |         else: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 838 |  |  |             word = word[:-1] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 839 |  |  |     elif word[-2:] in {'us', 'ss'}: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 840 |  |  |         pass | 
            
                                                                                                            
                            
            
                                    
            
            
                | 841 |  |  |     elif word[-1] == 's': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 842 |  |  |         if _sb_has_vowel(word[:-2], _vowels): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 843 |  |  |             word = word[:-1] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 844 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 845 |  |  |     # Exceptions 2 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 846 |  |  |     if word in _exception2set: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 847 |  |  |         return word | 
            
                                                                                                            
                            
            
                                    
            
            
                | 848 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 849 |  |  |     # Step 1b | 
            
                                                                                                            
                            
            
                                    
            
            
                | 850 |  |  |     step1b_flag = False | 
            
                                                                                                            
                            
            
                                    
            
            
                | 851 |  |  |     if word[-5:] == 'eedly': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 852 |  |  |         if len(word[r1_start:]) >= 5: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 853 |  |  |             word = word[:-3] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 854 |  |  |     elif word[-5:] == 'ingly': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 855 |  |  |         if _sb_has_vowel(word[:-5], _vowels): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 856 |  |  |             word = word[:-5] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 857 |  |  |             step1b_flag = True | 
            
                                                                                                            
                            
            
                                    
            
            
                | 858 |  |  |     elif word[-4:] == 'edly': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 859 |  |  |         if _sb_has_vowel(word[:-4], _vowels): | 
            
                                                                                                            
                            
            
                                    
            
            
            word = word[:-4]
            step1b_flag = True
    elif word[-3:] == 'eed':
        if len(word[r1_start:]) >= 3:
            word = word[:-1]
    elif word[-3:] == 'ing':
        if _sb_has_vowel(word[:-3], _vowels):
            word = word[:-3]
            step1b_flag = True
    elif word[-2:] == 'ed':
        if _sb_has_vowel(word[:-2], _vowels):
            word = word[:-2]
            step1b_flag = True
    elif early_english:
        if word[-3:] == 'est':
            if _sb_has_vowel(word[:-3], _vowels):
                word = word[:-3]
                step1b_flag = True
        elif word[-3:] == 'eth':
            if _sb_has_vowel(word[:-3], _vowels):
                word = word[:-3]
                step1b_flag = True

    if step1b_flag:
        if word[-2:] in {'at', 'bl', 'iz'}:
            word += 'e'
        elif word[-2:] in _doubles:
            word = word[:-1]
        elif _sb_short_word(word, _vowels, _codanonvowels, _r1_prefixes):
            word += 'e'

    # Step 1c: replace a final y/Y with i when it follows a non-vowel
    # and the word is longer than two characters
    if ((len(word) > 2 and word[-1] in {'Y', 'y'} and
         word[-2] not in _vowels)):
        word = word[:-1] + 'i'

    # Step 2
    if word[-2] == 'a':
        if word[-7:] == 'ational':
            if len(word[r1_start:]) >= 7:
                word = word[:-5] + 'e'
        elif word[-6:] == 'tional':
            if len(word[r1_start:]) >= 6:
                word = word[:-2]
    elif word[-2] == 'c':
        if word[-4:] in {'enci', 'anci'}:
            if len(word[r1_start:]) >= 4:
                word = word[:-1] + 'e'
    elif word[-2] == 'e':
        if word[-4:] == 'izer':
            if len(word[r1_start:]) >= 4:
                word = word[:-1]
    elif word[-2] == 'g':
        if word[-3:] == 'ogi':
            if ((r1_start >= 1 and len(word[r1_start:]) >= 3 and
                 word[-4] == 'l')):
                word = word[:-1]
    elif word[-2] == 'l':
        if word[-6:] == 'lessli':
            if len(word[r1_start:]) >= 6:
                word = word[:-2]
        elif word[-5:] in {'entli', 'fulli', 'ousli'}:
            if len(word[r1_start:]) >= 5:
                word = word[:-2]
        elif word[-4:] == 'abli':
            if len(word[r1_start:]) >= 4:
                word = word[:-1] + 'e'
        elif word[-4:] == 'alli':
            if len(word[r1_start:]) >= 4:
                word = word[:-2]
        elif word[-3:] == 'bli':
            if len(word[r1_start:]) >= 3:
                word = word[:-1] + 'e'
        elif word[-2:] == 'li':
            if ((r1_start >= 1 and len(word[r1_start:]) >= 2 and
                 word[-3] in _li)):
                word = word[:-2]
    elif word[-2] == 'o':
        if word[-7:] == 'ization':
            if len(word[r1_start:]) >= 7:
                word = word[:-5] + 'e'
        elif word[-5:] == 'ation':
            if len(word[r1_start:]) >= 5:
                word = word[:-3] + 'e'
        elif word[-4:] == 'ator':
            if len(word[r1_start:]) >= 4:
                word = word[:-2] + 'e'
    elif word[-2] == 's':
        if word[-7:] in {'fulness', 'ousness', 'iveness'}:
            if len(word[r1_start:]) >= 7:
                word = word[:-4]
        elif word[-5:] == 'alism':
            if len(word[r1_start:]) >= 5:
                word = word[:-3]
    elif word[-2] == 't':
        if word[-6:] == 'biliti':
            if len(word[r1_start:]) >= 6:
                word = word[:-5] + 'le'
        elif word[-5:] == 'aliti':
            if len(word[r1_start:]) >= 5:
                word = word[:-3]
        elif word[-5:] == 'iviti':
            if len(word[r1_start:]) >= 5:
                word = word[:-3] + 'e'

                | 965 |  |  |     # Step 3 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 966 |  |  |     if word[-7:] == 'ational': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 967 |  |  |         if len(word[r1_start:]) >= 7: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 968 |  |  |             word = word[:-5] + 'e' | 
            
                                                                                                            
                            
            
                                    
            
            
    elif word[-6:] == 'tional':
        # tional -> tion, if the suffix lies in R1
        if len(word[r1_start:]) >= 6:
            word = word[:-2]
    elif word[-5:] in {'alize', 'icate', 'iciti'}:
        # alize -> al, icate -> ic, iciti -> ic, if the suffix lies in R1
        if len(word[r1_start:]) >= 5:
            word = word[:-3]
    elif word[-5:] == 'ative':
        # ative is deleted only if it lies in R2
        if len(word[r2_start:]) >= 5:
            word = word[:-5]
    elif word[-4:] == 'ical':
        # ical -> ic, if the suffix lies in R1
        if len(word[r1_start:]) >= 4:
            word = word[:-2]
    elif word[-4:] == 'ness':
        # ness is deleted if it lies in R1
        if len(word[r1_start:]) >= 4:
            word = word[:-4]
    elif word[-3:] == 'ful':
        # ful is deleted if it lies in R1
        if len(word[r1_start:]) >= 3:
            word = word[:-3]

    # Step 4: delete the first (longest) matching suffix if it lies in R2
    for suffix in ('ement', 'ance', 'ence', 'able', 'ible', 'ment', 'ant',
                   'ent', 'ism', 'ate', 'iti', 'ous', 'ive', 'ize', 'al', 'er',
                   'ic'):
        if word[-len(suffix):] == suffix:
            if len(word[r2_start:]) >= len(suffix):
                word = word[:-len(suffix)]
            break
    else:
        # for/else: reached only when none of the suffixes above matched.
        # 'ion' is deleted if it lies in R2 and is preceded by 's' or 't'.
        if word[-3:] == 'ion':
            if (len(word[r2_start:]) >= 3 and len(word) >= 4 and
                    word[-4] in {'s', 't'}):
                word = word[:-3]

    # Step 5: delete a final 'e' if it lies in R2, or if it lies in R1 and
    # is not preceded by a short syllable; delete a final 'l' if it lies in
    # R2 and is preceded by another 'l'
    if word[-1] == 'e':
        if (len(word[r2_start:]) >= 1 or
                (len(word[r1_start:]) >= 1 and
                 not _sb_ends_in_short_syllable(word[:-1], _vowels,
                                                _codanonvowels))):
            word = word[:-1]
    elif word[-1] == 'l':
        if len(word[r2_start:]) >= 1 and word[-2] == 'l':
            word = word[:-1]

    # Change 'Y' back to 'y' if it survived stemming
    word = word.replace('Y', 'y')

    return word


def sb_german(word, alternate_vowels=False):
    """Return Snowball German stem.

    The Snowball German stemmer is defined at:
    http://snowball.tartarus.org/algorithms/german/stemmer.html

    :param str word: the word to calculate the stem of
    :param bool alternate_vowels: if True, composes ae as ä, oe as ö, and
        ue as ü (except within que) before running the algorithm
    :returns: word stem
    :rtype: str

    >>> sb_german('lesen')
    'les'
    >>> sb_german('graues')
    'grau'
    >>> sb_german('buchstabieren')
    'buchstabi'
    """
    # pylint: disable=too-many-branches

    _vowels = {'a', 'e', 'i', 'o', 'u', 'y', 'ä', 'ö', 'ü'}
    _s_endings = {'b', 'd', 'f', 'g', 'h', 'k', 'l', 'm', 'n', 'r', 't'}
    _st_endings = {'b', 'd', 'f', 'g', 'h', 'k', 'l', 'm', 'n', 't'}

    # lowercase, normalize, and compose
    word = unicodedata.normalize('NFC', word.lower())
    word = word.replace('ß', 'ss')

    # Uppercase 'u' and 'y' between vowels so they are not treated as vowels
    if len(word) > 2:
        for i in range(2, len(word)):
            if word[i] in _vowels and word[i-2] in _vowels:
                if word[i-1] == 'u':
                    word = word[:i-1] + 'U' + word[i:]
                elif word[i-1] == 'y':
                    word = word[:i-1] + 'Y' + word[i:]

    if alternate_vowels:
        word = word.replace('ae', 'ä')
        word = word.replace('oe', 'ö')
        # Shield 'que' so that its 'ue' is not composed into 'ü'
        word = word.replace('que', 'Q')
        word = word.replace('ue', 'ü')
        word = word.replace('Q', 'que')

    # R1 starts no earlier than index 3, so at least 3 letters precede it
    r1_start = max(3, _sb_r1(word, _vowels))
    r2_start = _sb_r2(word, _vowels)

                | 1068 |  |  |     # Step 1 | 
            
                                                                        
                            
            
                                    
            
            
                | 1069 |  |  |     niss_flag = False | 
            
                                                                        
                            
            
                                    
            
            
                | 1070 |  |  |     if word[-3:] == 'ern': | 
            
                                                                        
                            
            
                                    
            
            
                | 1071 |  |  |         if len(word[r1_start:]) >= 3: | 
            
                                                                        
                            
            
                                    
            
            
                | 1072 |  |  |             word = word[:-3] | 
            
                                                                        
                            
            
                                    
            
            
                | 1073 |  |  |     elif word[-2:] == 'em': | 
            
                                                                        
                            
            
                                    
            
            
                | 1074 |  |  |         if len(word[r1_start:]) >= 2: | 
            
                                                                        
                            
            
                                    
            
            
                | 1075 |  |  |             word = word[:-2] | 
            
                                                                        
                            
            
                                    
            
            
                | 1076 |  |  |     elif word[-2:] == 'er': | 
            
                                                                        
                            
            
                                    
            
            
                | 1077 |  |  |         if len(word[r1_start:]) >= 2: | 
            
                                                                        
                            
            
                                    
            
            
                | 1078 |  |  |             word = word[:-2] | 
            
                                                                        
                            
            
                                    
            
            
                | 1079 |  |  |     elif word[-2:] == 'en': | 
            
                                                                        
                            
            
                                    
            
            
                | 1080 |  |  |         if len(word[r1_start:]) >= 2: | 
            
                                                                        
                            
            
                                    
            
            
                | 1081 |  |  |             word = word[:-2] | 
            
                                                                        
                            
            
                                    
            
            
                | 1082 |  |  |             niss_flag = True | 
            
                                                                        
                            
            
                                    
            
            
                | 1083 |  |  |     elif word[-2:] == 'es': | 
            
                                                                        
                            
            
                                    
            
            
                | 1084 |  |  |         if len(word[r1_start:]) >= 2: | 
            
                                                                        
                            
            
                                    
            
            
                | 1085 |  |  |             word = word[:-2] | 
            
                                                                        
                            
            
                                    
            
            
            niss_flag = True
    elif word[-1:] == 'e':
        if len(word[r1_start:]) >= 1:
            word = word[:-1]
            niss_flag = True
    elif word[-1:] == 's':
        if ((len(word[r1_start:]) >= 1 and len(word) >= 2 and
             word[-2] in _s_endings)):
            word = word[:-1]

    if niss_flag and word[-4:] == 'niss':
        word = word[:-1]

    # Step 2
    if word[-3:] == 'est':
        if len(word[r1_start:]) >= 3:
            word = word[:-3]
    elif word[-2:] == 'en':
        if len(word[r1_start:]) >= 2:
            word = word[:-2]
    elif word[-2:] == 'er':
        if len(word[r1_start:]) >= 2:
            word = word[:-2]
    elif word[-2:] == 'st':
        if ((len(word[r1_start:]) >= 2 and len(word) >= 6 and
             word[-3] in _st_endings)):
            word = word[:-2]

    # Step 3
    if word[-4:] == 'isch':
        if len(word[r2_start:]) >= 4 and word[-5] != 'e':
            word = word[:-4]
    elif word[-4:] in {'lich', 'heit'}:
        if len(word[r2_start:]) >= 4:
            word = word[:-4]
            if ((word[-2:] in {'er', 'en'} and
                 len(word[r1_start:]) >= 2)):
                word = word[:-2]
    elif word[-4:] == 'keit':
        if len(word[r2_start:]) >= 4:
            word = word[:-4]
            if word[-4:] == 'lich' and len(word[r2_start:]) >= 4:
                word = word[:-4]
            elif word[-2:] == 'ig' and len(word[r2_start:]) >= 2:
                word = word[:-2]
    elif word[-3:] in {'end', 'ung'}:
        if len(word[r2_start:]) >= 3:
            word = word[:-3]
            if ((word[-2:] == 'ig' and len(word[r2_start:]) >= 2 and
                 word[-3] != 'e')):
                word = word[:-2]
    elif word[-2:] in {'ig', 'ik'}:
        if len(word[r2_start:]) >= 2 and word[-3] != 'e':
            word = word[:-2]

    # Change 'Y' and 'U' back to lowercase if they survived stemming
    word = word.replace('Y', 'y').replace('U', 'u')

    # Remove umlauts
    _umlauts = dict(zip((ord(_) for _ in 'äöü'), 'aou'))
    word = word.translate(_umlauts)

    return word
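

# Illustrative sketch, not part of Abydos: the umlaut-removal step above uses
# str.translate with a table keyed on code points, so it only matches
# precomposed characters. Assuming Python 3, and using the hypothetical names
# UMLAUTS and strip_umlauts, the snippet below shows why composing to NFC
# before translating (as sb_dutch does) matters.

```python
import unicodedata

# Translation table keyed on code points, mirroring the `_umlauts` dict above.
UMLAUTS = dict(zip((ord(ch) for ch in 'äöü'), 'aou'))


def strip_umlauts(text):
    """Compose to NFC, then map precomposed ä/ö/ü to a/o/u (hypothetical helper)."""
    return unicodedata.normalize('NFC', text).translate(UMLAUTS)


composed = 'm\u00fcde'      # 'ü' as the single code point U+00FC
decomposed = 'mu\u0308de'   # 'u' followed by U+0308 COMBINING DIAERESIS

print(strip_umlauts(composed))
print(strip_umlauts(decomposed))
# Without NFC composition the table has nothing to match:
print(decomposed.translate(UMLAUTS) == 'mude')
```

# Both strip_umlauts calls should yield 'mude'; the last line compares the
# untranslated decomposed form, which keeps its combining mark.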


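# Illustrative sketch, not part of Abydos: the stemmers in this module guard
# each suffix removal with a test such as len(word[r1_start:]) >= n, i.e. the
# suffix must lie entirely inside region R1 (or R2). The helpers below are
# hypothetical stand-ins written from the Snowball definition of R1/R2; the
# module's own _sb_r1 and _sb_r2 are defined elsewhere and may differ in detail.

```python
def r1_start(word, vowels):
    """Index after the first non-vowel that follows a vowel, else len(word)."""
    seen_vowel = False
    for i, ch in enumerate(word):
        if ch in vowels:
            seen_vowel = True
        elif seen_vowel:
            return i + 1
    return len(word)


def r2_start(word, vowels):
    """Index where R2 begins: the R1 rule applied again inside R1."""
    r1 = r1_start(word, vowels)
    return r1 + r1_start(word[r1:], vowels)


def strip_if_in_region(word, suffix, region_start):
    """Drop `suffix` only when it fits entirely within word[region_start:]."""
    if word.endswith(suffix) and len(word[region_start:]) >= len(suffix):
        return word[:len(word) - len(suffix)]
    return word


VOWELS = set('aeiouyè')  # same letters as the Dutch _vowels set

# The 3-letter minimum mirrors r1_start = max(3, ...) in the stemmers.
for w in ('lezen', 'wen'):
    start = max(3, r1_start(w, VOWELS))
    print(w, '->', strip_if_in_region(w, 'en', start))
```

# For 'lezen' the suffix sits inside R1 and is removed; for 'wen' R1 is empty,
# so the word is left alone. The real Dutch step 1 adds further conditions
# (a preceding non-vowel, no 'gem' before the suffix) that this sketch omits.

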
def sb_dutch(word):
    """Return Snowball Dutch stem.

    The Snowball Dutch stemmer is defined at:
    http://snowball.tartarus.org/algorithms/dutch/stemmer.html

    :param word: the word to calculate the stem of
    :returns: word stem
    :rtype: str

    >>> sb_dutch('lezen')
    'lez'
    >>> sb_dutch('opschorting')
    'opschort'
    >>> sb_dutch('ongrijpbaarheid')
    'ongrijp'
    """
    # pylint: disable=too-many-branches

    _vowels = {'a', 'e', 'i', 'o', 'u', 'y', 'è'}
    _not_s_endings = {'a', 'e', 'i', 'j', 'o', 'u', 'y', 'è'}

    def _undouble(word):
        """Undouble endings -kk, -dd, and -tt."""
        if ((len(word) > 1 and word[-1] == word[-2] and
             word[-1] in {'d', 'k', 't'})):
            return word[:-1]
        return word

    # lowercase, NFC-normalize (compose), and strip diaereses & acute accents
    word = unicodedata.normalize('NFC', text_type(word.lower()))
    _accented = dict(zip((ord(_) for _ in 'äëïöüáéíóú'), 'aeiouaeiou'))
    word = word.translate(_accented)

    # mark initial y, y after a vowel, and i between vowels as consonants
    for i in range(len(word)):
        if i == 0 and word[0] == 'y':
            word = 'Y' + word[1:]
        elif word[i] == 'y' and word[i-1] in _vowels:
            word = word[:i] + 'Y' + word[i+1:]
        elif (word[i] == 'i' and 0 < i < len(word)-1 and
              word[i-1] in _vowels and word[i+1] in _vowels):
            word = word[:i] + 'I' + word[i+1:]

                | 1198 |  |  |     r1_start = max(3, _sb_r1(word, _vowels)) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1199 |  |  |     r2_start = _sb_r2(word, _vowels) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1200 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1201 |  |  |     # Step 1 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1202 |  |  |     if word[-5:] == 'heden': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1203 |  |  |         if len(word[r1_start:]) >= 5: | 
            
                                                                                                            
                            
            
                                    
            
            
            word = word[:-3] + 'id'
    elif word[-3:] == 'ene':
        if ((len(word[r1_start:]) >= 3 and
             (word[-4] not in _vowels and word[-6:-3] != 'gem'))):
            word = _undouble(word[:-3])
    elif word[-2:] == 'en':
        if ((len(word[r1_start:]) >= 2 and
             (word[-3] not in _vowels and word[-5:-2] != 'gem'))):
            word = _undouble(word[:-2])
    elif word[-2:] == 'se':
        if len(word[r1_start:]) >= 2 and word[-3] not in _not_s_endings:
            word = word[:-2]
    elif word[-1:] == 's':
        if len(word[r1_start:]) >= 1 and word[-2] not in _not_s_endings:
            word = word[:-1]

    # Step 2: delete a final 'e' if it lies in R1 and follows a non-vowel,
    # then undouble the ending; e_removed is consulted by the 'bar' rule in
    # step 3b
    e_removed = False
    if word[-1:] == 'e':
        if len(word[r1_start:]) >= 1 and word[-2] not in _vowels:
            word = _undouble(word[:-1])
            e_removed = True

    # Step 3a: delete 'heid' if it lies in R2 and is not preceded by 'c',
    # then treat a preceding 'en' as in step 1
    if word[-4:] == 'heid':
        if len(word[r2_start:]) >= 4 and word[-5] != 'c':
            word = word[:-4]
            if word[-2:] == 'en':
                if ((len(word[r1_start:]) >= 2 and
                     (word[-3] not in _vowels and word[-5:-2] != 'gem'))):
                    word = _undouble(word[:-2])

    # Step 3b
    if word[-4:] == 'lijk':
        if len(word[r2_start:]) >= 4:
            word = word[:-4]
            # Repeat step 2
            if word[-1:] == 'e':
                if len(word[r1_start:]) >= 1 and word[-2] not in _vowels:
                    word = _undouble(word[:-1])
    elif word[-4:] == 'baar':
        if len(word[r2_start:]) >= 4:
            word = word[:-4]
    elif word[-3:] in ('end', 'ing'):
        if len(word[r2_start:]) >= 3:
            word = word[:-3]
            if ((word[-2:] == 'ig' and len(word[r2_start:]) >= 2 and
                 word[-3] != 'e')):
                word = word[:-2]
            else:
                word = _undouble(word)
    elif word[-3:] == 'bar':
        if len(word[r2_start:]) >= 3 and e_removed:
            word = word[:-3]
    elif word[-2:] == 'ig':
        if len(word[r2_start:]) >= 2 and word[-3] != 'e':
            word = word[:-2]

    # Step 4: if the word ends in non-vowel + doubled 'a', 'e', 'o', or 'u'
    # + non-vowel other than 'I', drop one of the doubled vowels
    if ((len(word) >= 4 and
         word[-3] == word[-2] and word[-2] in {'a', 'e', 'o', 'u'} and
         word[-4] not in _vowels and
         word[-1] not in _vowels and word[-1] != 'I')):
        word = word[:-2] + word[-1]

    # Change 'Y' and 'I' back to lowercase if they survived stemming
    word = word.replace('Y', 'y').replace('I', 'i')

    return word


def sb_norwegian(word):
    """Return Snowball Norwegian stem.

    The Snowball Norwegian stemmer is defined at:
    http://snowball.tartarus.org/algorithms/norwegian/stemmer.html

    :param word: the word to calculate the stem of
    :returns: word stem
    :rtype: str

    >>> sb_norwegian('lese')
    'les'
    >>> sb_norwegian('suspensjon')
    'suspensjon'
    >>> sb_norwegian('sikkerhet')
    'sikker'
    """
    _vowels = {'a', 'e', 'i', 'o', 'u', 'y', 'å', 'æ', 'ø'}
    _s_endings = {'b', 'c', 'd', 'f', 'g', 'h', 'j', 'l', 'm', 'n', 'o', 'p',
                  'r', 't', 'v', 'y', 'z'}
    # lowercase, normalize, and compose
    word = unicodedata.normalize('NFC', text_type(word.lower()))

    # R1 starts no earlier than index 3 and no later than the end of the word
    r1_start = min(max(3, _sb_r1(word, _vowels)), len(word))

    # Step 1
    _r1 = word[r1_start:]
    if _r1[-7:] == 'hetenes':
        word = word[:-7]
    elif _r1[-6:] in {'hetene', 'hetens'}:
        word = word[:-6]
    elif _r1[-5:] in {'heten', 'heter', 'endes'}:
        word = word[:-5]
    elif _r1[-4:] in {'ande', 'ende', 'edes', 'enes', 'erte'}:
        if word[-4:] == 'erte':
            word = word[:-2]
        else:
            word = word[:-4]
    elif _r1[-3:] in {'ede', 'ane', 'ene', 'ens', 'ers', 'ets', 'het', 'ast',
                      'ert'}:
        if word[-3:] == 'ert':
            word = word[:-1]
        else:
            word = word[:-3]
    elif _r1[-2:] in {'en', 'ar', 'er', 'as', 'es', 'et'}:
        word = word[:-2]
    elif _r1[-1:] in {'a', 'e'}:
            
            
                | 1326 |  |  |         word = word[:-1] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1327 |  |  |     elif _r1[-1:] == 's': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1328 |  |  |         if (((len(word) > 1 and word[-2] in _s_endings) or | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1329 |  |  |              (len(word) > 2 and word[-2] == 'k' and word[-3] not in _vowels))): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1330 |  |  |             word = word[:-1] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1331 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1332 |  |  |     # Step 2 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1333 |  |  |     if word[r1_start:][-2:] in {'dt', 'vt'}: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1334 |  |  |         word = word[:-1] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1335 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1336 |  |  |     # Step 3 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1337 |  |  |     _r1 = word[r1_start:] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1338 |  |  |     if _r1[-7:] == 'hetslov': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1339 |  |  |         word = word[:-7] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1340 |  |  |     elif _r1[-4:] in {'eleg', 'elig', 'elov', 'slov'}: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1341 |  |  |         word = word[:-4] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1342 |  |  |     elif _r1[-3:] in {'leg', 'eig', 'lig', 'els', 'lov'}: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1343 |  |  |         word = word[:-3] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1344 |  |  |     elif _r1[-2:] == 'ig': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1345 |  |  |         word = word[:-2] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1346 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1347 |  |  |     return word | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1348 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1349 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1350 |  |  | def sb_swedish(word): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1351 |  |  |     """Return Snowball Swedish stem. | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1352 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1353 |  |  |     The Snowball Swedish stemmer is defined at: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1354 |  |  |     http://snowball.tartarus.org/algorithms/swedish/stemmer.html | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1355 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1356 |  |  |     :param word: the word to calculate the stem of | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1357 |  |  |     :returns: word stem | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1358 |  |  |     :rtype: str | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1359 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1360 |  |  |     >>> sb_swedish('undervisa') | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1361 |  |  |     'undervis' | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1362 |  |  |     >>> sb_swedish('suspension') | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1363 |  |  |     'suspension' | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1364 |  |  |     >>> sb_swedish('visshet') | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1365 |  |  |     'viss' | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1366 |  |  |     """ | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1367 |  |  |     _vowels = {'a', 'e', 'i', 'o', 'u', 'y', 'ä', 'å', 'ö'} | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1368 |  |  |     _s_endings =  {'b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1369 |  |  |                    'o', 'p', 'r', 't', 'v', 'y'} | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1370 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1371 |  |  |     # lowercase, normalize, and compose | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1372 |  |  |     word = unicodedata.normalize('NFC', text_type(word.lower())) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1373 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1374 |  |  |     r1_start = min(max(3, _sb_r1(word, _vowels)), len(word)) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1375 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1376 |  |  |     # Step 1 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1377 |  |  |     _r1 = word[r1_start:] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1378 |  |  |     if _r1[-7:] == 'heterna': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1379 |  |  |         word = word[:-7] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1380 |  |  |     elif _r1[-6:] == 'hetens': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1381 |  |  |         word = word[:-6] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1382 |  |  |     elif _r1[-5:] in {'anden', 'heten', 'heter', 'arnas', 'ernas', 'ornas', | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1383 |  |  |                       'andes', 'arens', 'andet'}: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1384 |  |  |         word = word[:-5] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1385 |  |  |     elif _r1[-4:] in {'arna', 'erna', 'orna', 'ande', 'arne', 'aste', 'aren', | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1386 |  |  |                       'ades', 'erns']): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1387 |  |  |         word = word[:-4] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1388 |  |  |     elif _r1[-3:] in {'ade', 'are', 'ern', 'ens', 'het', 'ast'}: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1389 |  |  |         word = word[:-3] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1390 |  |  |     elif _r1[-2:] in {'ad', 'en', 'ar', 'er', 'or', 'as', 'es', 'at'}: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1391 |  |  |         word = word[:-2] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1392 |  |  |     elif _r1[-1:] in {'a', 'e'}: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1393 |  |  |         word = word[:-1] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1394 |  |  |     elif _r1[-1:] == 's': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1395 |  |  |         if len(word) > 1 and word[-2] in _s_endings: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1396 |  |  |             word = word[:-1] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1397 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1398 |  |  |     # Step 2 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1399 |  |  |     if word[r1_start:][-2:] in {'dd', 'gd', 'nn', 'dt', 'gt', 'kt', 'tt'}: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1400 |  |  |         word = word[:-1] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1401 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1402 |  |  |     # Step 3 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1403 |  |  |     _r1 = word[r1_start:] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1404 |  |  |     if _r1[-5:] == 'fullt': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1405 |  |  |         word = word[:-1] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1406 |  |  |     elif _r1[-4:] == 'löst': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1407 |  |  |         word = word[:-1] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1408 |  |  |     elif _r1[-3:] in {'lig', 'els'}: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1409 |  |  |         word = word[:-3] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1410 |  |  |     elif _r1[-2:] == 'ig': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1411 |  |  |         word = word[:-2] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1412 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1413 |  |  |     return word | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1414 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1415 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1416 |  |  | def sb_danish(word): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1417 |  |  |     """Return Snowball Danish stem. | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1418 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1419 |  |  |     The Snowball Danish stemmer is defined at: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1420 |  |  |     http://snowball.tartarus.org/algorithms/danish/stemmer.html | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1421 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1422 |  |  |     :param word: the word to calculate the stem of | 
            
                                                                                                            
                            
            
                                    
            
            
    :returns: word stem
    :rtype: str

    >>> sb_danish('underviser')
    'undervis'
    >>> sb_danish('suspension')
    'suspension'
    >>> sb_danish('sikkerhed')
    'sikker'
    """
    _vowels = {'a', 'e', 'i', 'o', 'u', 'y', 'å', 'æ', 'ø'}
    _s_endings = {'a', 'b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n',
                  'o', 'p', 'r', 't', 'v', 'y', 'z', 'å'}

    # lowercase, normalize, and compose
    word = unicodedata.normalize('NFC', text_type(word.lower()))

    # R1 must be preceded by at least 3 letters
    r1_start = min(max(3, _sb_r1(word, _vowels)), len(word))

    # Step 1: remove the longest matching standard suffix found in R1
    _r1 = word[r1_start:]
    if _r1[-7:] == 'erendes':
        word = word[:-7]
    elif _r1[-6:] in {'erende', 'hedens'}:
        word = word[:-6]
    elif _r1[-5:] in {'ethed', 'erede', 'heden', 'heder', 'endes', 'ernes',
                      'erens', 'erets'}:
        word = word[:-5]
    elif _r1[-4:] in {'ered', 'ende', 'erne', 'eren', 'erer', 'heds', 'enes',
                      'eres', 'eret'}:
        word = word[:-4]
    elif _r1[-3:] in {'hed', 'ene', 'ere', 'ens', 'ers', 'ets'}:
        word = word[:-3]
    elif _r1[-2:] in {'en', 'er', 'es', 'et'}:
        word = word[:-2]
    elif _r1[-1:] == 'e':
        word = word[:-1]
    elif _r1[-1:] == 's':
        # a final s is removed only when preceded by a valid s-ending
        if len(word) > 1 and word[-2] in _s_endings:
            word = word[:-1]

    # Step 2: if R1 ends in gd, dt, gt, or kt, remove the last letter
    if word[r1_start:][-2:] in {'gd', 'dt', 'gt', 'kt'}:
        word = word[:-1]

    # Step 3: reduce igst to ig, then remove other suffixes found in R1
    if word[-4:] == 'igst':
        word = word[:-2]

    _r1 = word[r1_start:]
    repeat_step2 = False
    if _r1[-4:] == 'elig':
        word = word[:-4]
        repeat_step2 = True
    elif _r1[-4:] == 'løst':
        word = word[:-1]
    elif _r1[-3:] in {'lig', 'els'}:
        word = word[:-3]
        repeat_step2 = True
    elif _r1[-2:] == 'ig':
        word = word[:-2]
        repeat_step2 = True

    if repeat_step2:
        if word[r1_start:][-2:] in {'gd', 'dt', 'gt', 'kt'}:
            word = word[:-1]

    # Step 4: undouble a final double consonant that ends in R1
    if ((len(word[r1_start:]) >= 1 and len(word) >= 2 and
         word[-1] == word[-2] and word[-1] not in _vowels)):
        word = word[:-1]

    return word
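The `r1_start` bound computed in `sb_danish` depends on `_sb_r1`, which is defined elsewhere in this module. As a rough, self-contained illustration of the Snowball notion of R1 (the region after the first non-vowel that follows a vowel), the sketch below uses a hypothetical helper named `r1_start`. It is an assumption about the behaviour, not the module's actual `_sb_r1` implementation, and `DANISH_VOWELS` and `min_prefix` are likewise illustrative names.

```python
# Illustrative sketch only: r1_start, DANISH_VOWELS and min_prefix are
# hypothetical names, not part of this module.
DANISH_VOWELS = set('aeiouyåæø')


def r1_start(word, vowels=DANISH_VOWELS, min_prefix=3):
    """Return the index at which the R1 region of word begins."""
    start = len(word)  # default: R1 is empty
    seen_vowel = False
    for i, char in enumerate(word):
        if char in vowels:
            seen_vowel = True
        elif seen_vowel:
            # R1 begins after the first non-vowel that follows a vowel
            start = i + 1
            break
    # Scandinavian adjustment: keep at least min_prefix letters before R1,
    # but never point past the end of the word
    return min(max(min_prefix, start), len(word))


print('underviser'[r1_start('underviser'):])
print('sikkerhed'[r1_start('sikkerhed'):])
```

With `min_prefix=0` and an English vowel set, the same helper yields the unadjusted R1 used by other Snowball stemmers.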
            
                                                                                                            
                            
            
                                    
            
            
def clef_german(word):
    """Return CLEF German stem.

    The CLEF German stemmer is defined at:
    http://members.unine.ch/jacques.savoy/clef/germanStemmer.txt

    :param word: the word to calculate the stem of
    :returns: word stem
    :rtype: str

    >>> clef_german('lesen')
    'lese'
    >>> clef_german('graues')
    'grau'
    >>> clef_german('buchstabieren')
    'buchstabier'
    """
    # lowercase, normalize, and compose
    word = unicodedata.normalize('NFC', text_type(word.lower()))

    # remove umlauts
    _umlauts = dict(zip((ord(_) for _ in 'äöü'), 'aou'))
    word = word.translate(_umlauts)

    # remove plurals
    wlen = len(word) - 1

    if wlen > 3:
        if wlen > 5:
            if word[-3:] == 'nen':
                return word[:-3]
        if wlen > 4:
            if word[-2:] in {'en', 'se', 'es', 'er'}:
                return word[:-2]
        if word[-1] in {'e', 'n', 'r', 's'}:
            return word[:-1]
    return word
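The umlaut removal in `clef_german` relies on the word being NFC-composed first: `str.translate` maps single code points, so a decomposed `u` followed by a combining diaeresis would otherwise pass through unchanged. The sketch below isolates that step. `fold_umlauts` and `UMLAUT_TABLE` are hypothetical names used for illustration, and the code assumes Python 3, where `str` is already Unicode and no `text_type` wrapper is needed.

```python
import unicodedata

# Illustrative sketch only: fold_umlauts and UMLAUT_TABLE are hypothetical
# names, not part of this module.
# Map the code points of the umlauted vowels to their plain counterparts.
UMLAUT_TABLE = {ord(src): dst for src, dst in zip('äöü', 'aou')}


def fold_umlauts(word):
    """Lowercase and NFC-compose word, then replace ä, ö, ü with a, o, u."""
    word = unicodedata.normalize('NFC', word.lower())
    return word.translate(UMLAUT_TABLE)


print(fold_umlauts('Bücher'))
# The same word written with a combining diaeresis (U+0308)
print(fold_umlauts('Bu\u0308cher'))
```

Both calls print the same folded form, because normalization composes the decomposed sequence before the translation table is applied.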
            
                                                                                                            
                            
            
                                    
            
            
def clef_german_plus(word):
    """Return 'CLEF German stemmer plus' stem.

    The CLEF German stemmer plus is defined at:
    http://members.unine.ch/jacques.savoy/clef/germanStemmerPlus.txt

    :param word: the word to calculate the stem of
    :returns: word stem
    :rtype: str

    >>> clef_german_plus('lesen')
            
                                                                                                            
                            
            
                                    
            
            
                | 1548 |  |  |     'les' | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1549 |  |  |     >>> clef_german_plus('graues') | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1550 |  |  |     'grau' | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1551 |  |  |     >>> clef_german_plus('buchstabieren') | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1552 |  |  |     'buchstabi' | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1553 |  |  |     """ | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1554 |  |  |     _st_ending = {'b', 'd', 'f', 'g', 'h', 'k', 'l', 'm', 'n', 't'} | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1555 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1556 |  |  |     # lowercase, normalize, and compose | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1557 |  |  |     word = unicodedata.normalize('NFC', text_type(word.lower())) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1558 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1559 |  |  |     # remove umlauts | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1560 |  |  |     _accents = dict(zip((ord(_) for _ in 'äàáâöòóôïìíîüùúû'), | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1561 |  |  |                         'aaaaooooiiiiuuuu')) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1562 |  |  |     word = word.translate(_accents) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1563 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1564 |  |  |     # Step 1 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1565 |  |  |     wlen = len(word)-1 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1566 |  |  |     if wlen > 4 and word[-3:] == 'ern': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1567 |  |  |         word = word[:-3] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1568 |  |  |     elif wlen > 3 and word[-2:] in {'em', 'en', 'er', 'es'}: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1569 |  |  |         word = word[:-2] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1570 |  |  |     elif wlen > 2 and (word[-1] == 'e' or | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1571 |  |  |                        (word[-1] == 's' and word[-2] in _st_ending)): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1572 |  |  |         word = word[:-1] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1573 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1574 |  |  |     # Step 2 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1575 |  |  |     wlen = len(word)-1 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1576 |  |  |     if wlen > 4 and word[-3:] == 'est': | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1577 |  |  |         word = word[:-3] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1578 |  |  |     elif wlen > 3 and (word[-2:] in {'er', 'en'} or | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1579 |  |  |                        (word[-2:] == 'st' and word[-3] in _st_ending)): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1580 |  |  |         word = word[:-2] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1581 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1582 |  |  |     return word | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1583 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1584 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
def clef_swedish(word):
    """Return CLEF Swedish stem.

    The CLEF Swedish stemmer is defined at:
    http://members.unine.ch/jacques.savoy/clef/swedishStemmer.txt

    :param word: the word to calculate the stem of
    :returns: word stem
    :rtype: str

    >>> clef_swedish('undervisa')
    'undervis'
    >>> clef_swedish('suspension')
    'suspensio'
    >>> clef_swedish('visshet')
    'viss'
    """
    wlen = len(word)-1

    if wlen > 3 and word[-1] == 's':
        word = word[:-1]
        wlen -= 1

    if wlen > 6:
        if word[-5:] in {'elser', 'heten'}:
            return word[:-5]
    if wlen > 5:
        if word[-4:] in {'arne', 'erna', 'ande', 'else', 'aste', 'orna',
                         'aren'}:
            return word[:-4]
    if wlen > 4:
        if word[-3:] in {'are', 'ast', 'het'}:
            return word[:-3]
    if wlen > 3:
        if word[-2:] in {'ar', 'er', 'or', 'en', 'at', 'te', 'et'}:
            return word[:-2]
    if wlen > 2:
        if word[-1] in {'a', 'e', 'n', 't'}:
            return word[:-1]
    return word


                | 1627 |  |  | def caumanns(word): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1628 |  |  |     """Return Caumanns German stem. | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1629 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1630 |  |  |     Jörg Caumanns' stemmer is described in his article at: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1631 |  |  |     http://edocs.fu-berlin.de/docs/servlets/MCRFileNodeServlet/FUDOCS_derivate_000000000350/tr-b-99-16.pdf | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1632 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1633 |  |  |     This implementation is based on the GermanStemFilter described at: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1634 |  |  |     http://www.evelix.ch/unternehmen/Blog/evelix/2013/11/11/inner-workings-of-the-german-analyzer-in-lucene | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1635 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1636 |  |  |     :param word: the word to calculate the stem of | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1637 |  |  |     :returns: word stem | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1638 |  |  |     :rtype: str | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1639 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1640 |  |  |     >>> caumanns('lesen') | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1641 |  |  |     'les' | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1642 |  |  |     >>> caumanns('graues') | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1643 |  |  |     'grau' | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1644 |  |  |     >>> caumanns('buchstabieren') | 
            
                                                                                                            
                            
            
                                    
            
            
    'buchstabier'
    """
    if not word:
        return ''

    upper_initial = word[0].isupper()
    word = unicodedata.normalize('NFC', text_type(word.lower()))

    # # Part 2: Substitution
    # 1. Change umlauts to corresponding vowels & ß to ss
    _umlauts = dict(zip((ord(_) for _ in 'äöü'), 'aou'))
    word = word.translate(_umlauts)
    word = word.replace('ß', 'ss')

    # 2. Change second of doubled characters to *
    newword = word[0]
    for i in range(1, len(word)):
        if newword[i-1] == word[i]:
            newword += '*'
        else:
            newword += word[i]
    word = newword

    # 3. Replace sch, ch, ei, ie, ig, st with $, §, %, &, #, !
    word = word.replace('sch', '$')
    word = word.replace('ch', '§')
    word = word.replace('ei', '%')
    word = word.replace('ie', '&')
    word = word.replace('ig', '#')
    word = word.replace('st', '!')

    # # Part 1: Recursive Context-Free Stripping
    # 1. Remove the following 7 suffixes recursively: em, er, nd, e, s, n, t
    #    (t and the masked st, '!', are stripped only from lowercase words)
    while len(word) > 3:
        if (((len(word) > 4 and word[-2:] in {'em', 'er'}) or
             (len(word) > 5 and word[-2:] == 'nd'))):
            word = word[:-2]
        elif ((word[-1] in {'e', 's', 'n'}) or
              (not upper_initial and word[-1] in {'t', '!'})):
            word = word[:-1]
        else:
            break

    # Additional optimizations:
    # Drop the masked doubled n left by -erinnen (lehrerinnen -> lehrerin)
    if len(word) > 5 and word[-5:] == 'erin*':
        word = word[:-1]
    if word[-1] == 'z':
        word = word[:-1] + 'x'

    # Reverse substitutions:
    word = word.replace('$', 'sch')
    word = word.replace('§', 'ch')
    word = word.replace('%', 'ei')
    word = word.replace('&', 'ie')
    word = word.replace('#', 'ig')
    word = word.replace('!', 'st')

    # Expand doubled characters (restore each * to the preceding character)
    word = ''.join([word[0]] + [word[i-1] if word[i] == '*' else word[i] for
                                i in range(1, len(word))])

    # Finally, convert gege to ge
    if len(word) > 4:
        word = word.replace('gege', 'ge', 1)

    return word
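

# Illustrative sketch only, not part of the original module: the two helper
# names below are hypothetical. They isolate the doubled-character masking
# that caumanns() performs inline, assuming '*' never occurs in the input.
# Masking the second letter of a pair keeps the stripping loop from treating
# it as a one-letter suffix; unmasking restores it afterwards.
def _mask_doubled(word):
    """Replace the second of each pair of doubled characters with '*'."""
    if not word:
        return word
    masked = word[0]
    for i in range(1, len(word)):
        # Compare against the masked string so that a run such as 'aaa'
        # becomes 'a*a' rather than 'a**'.
        masked += '*' if masked[i - 1] == word[i] else word[i]
    return masked


def _unmask_doubled(word):
    """Restore each '*' to the character that precedes it."""
    if not word:
        return word
    return word[0] + ''.join(word[i - 1] if word[i] == '*' else word[i]
                             for i in range(1, len(word)))


# For example, _mask_doubled('spinnen') gives 'spin*en', and
# _unmask_doubled('spin*en') gives back 'spinnen'.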
            
                                                                                                            
                            
            
                                    
            
            


# def uealite(word):
#     """Return UEA-Lite stem.
#
#     The UEA-Lite stemmer is defined in Marie-Claire Jenkins and Dan Smith's
#     article at:
# http://wayback.archive.org/web/20121012154211/http://www.uea.ac.uk/polopoly_fs/1.85493!stemmer25feb.pdf
#
#     :param word: the word to calculate the stem of
#     :returns: word stem
#     :rtype: str
#     """
#     return word


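# Illustrative sketch only, not part of the original module: the parser name
# and the dict keys are hypothetical. It decodes one rule string of the kind
# listed in lancaster() below, assuming the usual Paice/Husk layout: the
# ending spelled backwards, an optional '*' (apply only to intact words), a
# digit giving the number of characters to remove, an optional string to
# append, and a final '>' (continue) or '.' (stop).
import re

_RULE_SKETCH_RE = re.compile(r'^([a-z]+)(\*?)(\d)([a-z]*)([>.])$')


def _parse_lancaster_rule(rule):
    """Split a rule string such as 'city3s.' into its components."""
    match = _RULE_SKETCH_RE.match(rule)
    if match is None:
        raise ValueError('not a recognized rule string: %r' % (rule,))
    ending, intact, remove, append, cont = match.groups()
    return {
        'ending': ending[::-1],        # stored reversed, so flip it back
        'intact_only': intact == '*',  # rule applies only to unstemmed words
        'remove': int(remove),         # number of trailing characters to drop
        'append': append,              # replacement text, possibly empty
        'continue': cont == '>',       # '>' keeps stemming, '.' stops
    }


# Under these assumptions, 'city3s.' reads: for words ending in 'ytic',
# remove 3 characters, append 's', then stop.

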
            
                                                                                                            
                            
            
                                    
            
            
def lancaster(word):
    """Return Lancaster stem.

    Implementation of the Lancaster Stemming Algorithm, developed by
    Chris Paice, with the assistance of Gareth Husk.

    Arguments:
    word -- the word to calculate the stem of

    Description:
    The Lancaster Stemming Algorithm, described at:
    http://wayback.archive.org/web/20140826000545/http://www.comp.lancs.ac.uk/computing/research/stemming/Links/paice.htm

    Based on Paice & Husk's original Pascal reference implementation:
    http://wayback.archive.org/web/20150104225538/http://www.comp.lancs.ac.uk/computing/research/stemming/Files/Pascal.zip
    """
    _lancaster_rules = ('ai*2.', 'a*1.', 'bb1.', 'city3s.', 'ci2>', 'cn1t>',
                        'dd1.', 'dei3y>', 'deec2ss.', 'dee1.', 'de2>',
                        'dooh4>', 'e1>', 'feil1v.', 'fi2>', 'gni3>', 'gai3y.',
                        'ga2>', 'gg1.', 'ht*2.', 'hsiug5ct.', 'hsi3>', 'i*1.',
                        'i1y>', 'ji1d.', 'juf1s.', 'ju1d.', 'jo1d.', 'jeh1r.',
                        'jrev1t.', 'jsim2t.', 'jn1d.', 'j1s.', 'lbaifi6.',
                        'lbai4y.', 'lba3>', 'lbi3.', 'lib2l>', 'lc1.',
                        'lufi4y.', 'luf3>', 'lu2.', 'lai3>', 'lau3>', 'la2>',
                        'll1.', 'mui3.', 'mu*2.', 'msi3>', 'mm1.', 'nois4j>',
                        'noix4ct.', 'noi3>', 'nai3>', 'na2>', 'nee0.', 'ne2>',
                        'nn1.', 'pihs4>', 'pp1.', 're2>', 'rae0.', 'ra2.',
                        'ro2>', 'ru2>', 'rr1.', 'rt1>', 'rei3y>', 'sei3y>',
                        'sis2.', 'si2>', 'ssen4>', 'ss0.', 'suo3>', 'su*2.',
                        's*1>', 's0.', 'tacilp4y.', 'ta2>', 'tnem4>', 'tne3>',
                        'tna3>', 'tpir2b.', 'tpro2b.', 'tcud1.', 'tpmus2.',
                        'tpec2iv.', 'tulo2v.', 'tsis0.', 'tsi3>', 'tt1.',
                        'uqi3.', 'ugo1.', 'vis3j>', 'vie0.', 'vi2>', 'ylb1>',
                        'yli3y>', 'ylp0.', 'yl2>', 'ygo1.', 'yhp1.', 'ymo1.',
                        'ypo1.', 'yti3>', 'yte3>', 'ytl2.', 'yrtsi5.',
                        'yra3>', 'yro3>', 'yfi3.', 'ycn2t>', 'yca3>', 'zi2>',
                        'zy1s.')

                | 1765 |  |  |     _rule_table = [] | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1766 |  |  |     _rule_index = {'a': -1, 'b': -1, 'c': -1, 'd': -1, 'e': -1, 'f': -1, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1767 |  |  |                    'g': -1, 'h': -1, 'i': -1, 'j': -1, 'k': -1, 'l': -1, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1768 |  |  |                    'm': -1, 'n': -1, 'o': -1, 'p': -1, 'q': -1, 'r': -1, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1769 |  |  |                    's': -1, 't': -1, 'u': -1, 'v': -1, 'w': -1, 'x': -1, | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1770 |  |  |                    'y': -1, 'z': -1} | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1771 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1772 |  |  |     def read_rules(stem_rules=_lancaster_rules): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1773 |  |  |         """Read the rules table. | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1774 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1775 |  |  |         read_rules reads stemming rules from stem_rules and enters them | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1776 |  |  |         into _rule_table. _rule_index is set up to provide faster access to | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1777 |  |  |         relevant rules. | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1778 |  |  |         """ | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1779 |  |  |         for rule in stem_rules: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1780 |  |  |             _rule_table.append(rule) | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1781 |  |  |             if _rule_index[rule[0]] == -1: | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1782 |  |  |                 _rule_index[rule[0]] = len(_rule_table) - 1 | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1783 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1784 |  |  |     def stemmers(word): | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1785 |  |  |         """Reduce a word. | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1786 |  |  |  | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1787 |  |  |         stemmers takes the specified word and reduces it to a stem by | 
            
                | 1788 |  |  |         referring to _rule_table. | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1789 |  |  |         """ | 
            
                                                                                                            
                            
            
                                    
            
            
                | 1790 |  |  |         # TODO: This looks very incomplete. | 
            
                                                                                                            
                                                                
            
                                    
            
            
                | 1791 |  |  |         return word | 
            
                                                        
            
                                    
            
            
                | 1792 |  |  |  |
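The rule strings above appear to follow the conventional Paice/Husk notation: the ending spelled in reverse, an optional `*` meaning "only if the word is still intact", the number of characters to remove, an optional string to append, and a terminator (`>` to continue stemming, `.` to stop). The following is a minimal sketch under that assumption, not the Abydos implementation; `Rule`, `parse_rule`, and `apply_rule` are hypothetical names introduced for illustration, and the sketch omits the acceptability checks a full Lancaster stemmer performs.

```python
import re
from typing import NamedTuple

# Assumed layout: reversed ending, optional '*', one digit (characters to
# remove), optional letters to append, then '>' (continue) or '.' (stop).
_RULE_RE = re.compile(r'^([a-z]+)(\*?)(\d)([a-z]*)([>.])$')


class Rule(NamedTuple):
    """Hypothetical parsed form of one rule string."""

    ending: str        # ending in normal reading order, e.g. 'ness'
    intact_only: bool  # True if the rule applies only to unmodified words
    remove: int        # number of trailing characters to strip
    append: str        # replacement string to add after stripping
    repeat: bool       # True for '>', False for '.'


def parse_rule(rule):
    """Split a rule string such as 'ssen4>' into its components."""
    match = _RULE_RE.match(rule)
    if match is None:
        raise ValueError('unrecognised rule: {!r}'.format(rule))
    rev_ending, intact, remove, append, flag = match.groups()
    return Rule(rev_ending[::-1], intact == '*', int(remove), append,
                flag == '>')


def apply_rule(word, rule, intact=True):
    """Return the rewritten word, or None if the rule does not apply."""
    if not word.endswith(rule.ending):
        return None
    if rule.intact_only and not intact:
        return None
    # Slice by explicit length so that remove == 0 keeps the whole word.
    return word[:len(word) - rule.remove] + rule.append
```

For instance, `parse_rule('rei3y>')` describes the ending `ier`, three characters removed, `y` appended, and stemming allowed to continue; applying it to `happier` gives `happy`.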