| Metric | Value |
|---|---|
| Conditions | 14 |
| Total Lines | 55 |
| Lines | 0 |
| Ratio | 0 % |
| Changes | 0 |

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.
For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
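As a minimal sketch of that idea (the function names here are hypothetical and chosen only for illustration), a commented step inside a larger function can become a small function whose name is derived from the comment:

```python
import hashlib


def normalized_md5_before(raw: bytes) -> str:
    """Before: the comment explains what the middle step does."""
    text = raw.decode("utf-8", errors="ignore")
    # remove whitespace so formatting changes do not affect the hash
    text = text.replace(" ", "").replace("\n", "").replace("\t", "")
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def remove_whitespace(text: str) -> str:
    """After: the commented step is a function named after the comment."""
    return text.replace(" ", "").replace("\n", "").replace("\t", "")


def normalized_md5_after(raw: bytes) -> str:
    """After: the caller reads as a sequence of named steps."""
    text = raw.decode("utf-8", errors="ignore")
    return hashlib.md5(remove_whitespace(text).encode("utf-8")).hexdigest()
```

Both versions are intended to return the same digest for the same input; only the structure changes.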
Commonly applied refactorings include Extract Method and, if many parameters or temporary variables are present, Replace Method with Method Object.
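One refactoring often suggested when a method juggles many temporaries is Replace Method with Method Object: the local variables become fields of a small class, and the nested helpers become its methods. The sketch below is only an illustration under that assumption; `ChecksumCollector` and its method names are hypothetical and do not come from the reviewed project.

```python
import hashlib


class ChecksumCollector:
    """Hypothetical method object: former local state lives in fields."""

    def __init__(self):
        # Replaces a local list that nested helper functions appended to.
        self.digests = []

    def add_text(self, raw: bytes) -> None:
        """Hash text content with all spaces, newlines and tabs removed."""
        text = raw.decode("utf-8", errors="ignore")
        text = text.replace(" ", "").replace("\n", "").replace("\t", "")
        self.digests.append(hashlib.md5(text.encode("utf-8")).hexdigest())

    def add_chunks(self, chunks) -> None:
        """Hash binary content supplied as an iterable of byte chunks."""
        md5 = hashlib.md5()
        for chunk in chunks:
            md5.update(chunk)
        self.digests.append(md5.hexdigest())

    def result(self) -> str:
        """Combine the collected digests independently of insertion order."""
        joined = "".join(sorted(self.digests))
        return hashlib.md5(joined.encode("utf-8")).hexdigest()
```

Because the state is explicit, each method can be tested on its own, and the original method shrinks to creating the collector, feeding it, and returning `result()`.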
Complex classes like SubmissionFile, which defines attachment_md5(), often do a lot of different things. To break such a class down, identify a cohesive component within it. A common way to find such a component is to look for fields and methods that share the same prefix or suffix.
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often quicker to apply.
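The prefix heuristic can be approximated mechanically. The helper below is a rough sketch, not part of any tool mentioned here: `group_by_prefix` is a hypothetical name, and apart from `attachment_md5` the member names in the usage note are invented for illustration.

```python
from collections import defaultdict


def group_by_prefix(names, sep="_"):
    """Group member names by the part before the first separator.

    Only prefixes shared by at least two names are returned, since a
    lone name does not suggest a cohesive component.
    """
    groups = defaultdict(list)
    for name in names:
        # Ignore leading underscores so private members group with public ones.
        prefix = name.lstrip(sep).split(sep, 1)[0]
        groups[prefix].append(name)
    return {p: sorted(ms) for p, ms in groups.items() if len(ms) > 1}
```

Calling `group_by_prefix(["attachment_md5", "attachment_path", "created"])` groups the two `attachment_` members together, which hints that they could move into an extracted class. The result is only a starting point; whether the group is really cohesive still needs a human judgment.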
from django.db import models

# Standard-library modules used by the excerpt below; `logger` is assumed
# to be a module-level logging.Logger defined in the omitted lines.
import hashlib
import tarfile
import zipfile

# ... (rest of the module omitted; the method below belongs to SubmissionFile) ...

    def attachment_md5(self):
        '''
        Calculate the checksum of the file upload.
        For binary files (e.g. PDFs), the MD5 of the file itself is used.

        Archives are unpacked and the MD5 is generated from the sanitized textfiles
        in the archive. This is done with some smartness:
        - Whitespace and tabs are removed before comparison.
        - For MD5, ordering is important, so we compute it on the sorted list of
          file hashes.
        '''
        MAX_MD5_FILE_SIZE = 10000
        md5_set = []

        def md5_add_text(text):
            try:
                text = str(text, errors='ignore')
                text = text.replace(' ', '').replace(
                    '\n', '').replace('\t', '')
                hexvalues = hashlib.md5(text.encode('utf-8')).hexdigest()
                md5_set.append(hexvalues)
            except Exception as e:
                # not unicode decodable
                pass

        def md5_add_file(f):
            try:
                md5 = hashlib.md5()
                for chunk in f.chunks():
                    md5.update(chunk)
                md5_set.append(md5.hexdigest())
            except Exception:
                pass

        try:
            if zipfile.is_zipfile(self.attachment.path):
                zf = zipfile.ZipFile(self.attachment.path, 'r')
                for zipinfo in zf.infolist():
                    if zipinfo.file_size < MAX_MD5_FILE_SIZE:
                        md5_add_text(zf.read(zipinfo))
            elif tarfile.is_tarfile(self.attachment.path):
                tf = tarfile.open(self.attachment.path, 'r')
                for tarinfo in tf.getmembers():
                    if tarinfo.isfile():
                        if tarinfo.size < MAX_MD5_FILE_SIZE:
                            md5_add_text(tf.extractfile(tarinfo).read())
            else:
                md5_add_file(self.attachment)
        except Exception as e:
            logger.warning(
                "Exception on archive MD5 computation, using file checksum: " + str(e))

        result = hashlib.md5(
            ''.join(sorted(md5_set)).encode('utf-8')).hexdigest()
        return result
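One possible way to lower the condition count of the listing above is to pull the archive traversal out into its own function. The following is a sketch under assumptions, not the project's code: `iter_small_archive_members` and `combined_md5` are hypothetical names, the helper works on a plain filesystem path rather than a Django file field, and the fallback for non-archive files is left to the caller.

```python
import hashlib
import tarfile
import zipfile

MAX_MD5_FILE_SIZE = 10000  # same threshold as in the listing


def iter_small_archive_members(path, max_size=MAX_MD5_FILE_SIZE):
    """Yield the raw bytes of each small member of a zip or tar archive.

    Yields nothing when the path is neither a zip nor a tar file.
    """
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path, "r") as zf:
            for info in zf.infolist():
                if info.file_size < max_size:
                    yield zf.read(info)
    elif tarfile.is_tarfile(path):
        with tarfile.open(path, "r") as tf:
            for info in tf.getmembers():
                if info.isfile() and info.size < max_size:
                    yield tf.extractfile(info).read()


def combined_md5(digests):
    """Hash the sorted digests so the result ignores member order."""
    return hashlib.md5("".join(sorted(digests)).encode("utf-8")).hexdigest()
```

With these two pieces extracted, the original method would mostly decide between the archive path and the whole-file checksum, and the branching over archive formats is isolated where it can be tested with small fixture files. The `with` blocks also close the archive handles, which the listing leaves open.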