| Metric | Value |
|---|---|
| Conditions | 14 |
| Total Lines | 55 |
| Lines | 0 |
| Ratio | 0 % |
| Changes | 0 |
Small methods make your code easier to understand, especially when combined with a good name. Besides, a small method is usually much easier to name well.
For example, if you find yourself adding comments inside a method's body, that is usually a sign the commented part should be extracted into a new method. The comment then serves as a starting point for naming the new method.
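In the listing in this report, the sanitizing step inside md5_add_text is a natural Extract Method candidate. Its behavior is exactly what the docstring bullet describes. The following is a minimal sketch, assuming the two helpers are pulled out to module level. The names `sanitized_text_md5` and `combine_hashes` are hypothetical and are not part of the original code.

```python
import hashlib
from typing import Iterable


def sanitized_text_md5(data: bytes) -> str:
    """MD5 of `data` decoded as UTF-8 with spaces, newlines and tabs removed."""
    # Same normalization as md5_add_text: undecodable bytes are silently dropped.
    text = str(data, encoding='utf-8', errors='ignore')
    for ch in (' ', '\n', '\t'):
        text = text.replace(ch, '')
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def combine_hashes(hashes: Iterable[str]) -> str:
    """Order-independent MD5 over a collection of hex digests."""
    # Sorting first makes the result independent of archive member order.
    return hashlib.md5(''.join(sorted(hashes)).encode('utf-8')).hexdigest()


a = sanitized_text_md5(b'def f():\n\treturn 1\n')
b = sanitized_text_md5(b'def  f():  return 1')
print(a == b)  # True
```

Each extracted function now has a name that states what the old comment only hinted at, and each can be tested on its own.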
Commonly applied refactorings include:
- Extract Method

If many parameters/temporary variables are present:
- Replace Temp with Query
- Replace Method with Method Object
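Replace Method with Method Object suits attachment_md5 because its state lives in locals (md5_set, MAX_MD5_FILE_SIZE) that are shared by nested helpers. The following is a sketch, assuming the logic only needs the attachment's filesystem path. The class `AttachmentChecksum` and its method names are hypothetical. The former locals become fields, so each archive branch can become its own small method. The original's broad exception handling is deliberately left out to keep the sketch short.

```python
import hashlib
import tarfile
import zipfile


class AttachmentChecksum:
    """Method object for the checksum logic; former locals are now fields."""

    MAX_MD5_FILE_SIZE = 10000

    def __init__(self, path: str):
        self.path = path
        self.md5_set = []

    def compute(self) -> str:
        self.md5_set = []  # reset so compute() can be called more than once
        if zipfile.is_zipfile(self.path):
            self._add_zip_members()
        elif tarfile.is_tarfile(self.path):
            self._add_tar_members()
        else:
            self._add_whole_file()
        # Sorted so that member order inside the archive does not matter.
        return hashlib.md5(''.join(sorted(self.md5_set)).encode('utf-8')).hexdigest()

    def _add_text(self, data: bytes) -> None:
        # Drop undecodable bytes, spaces, newlines and tabs before hashing.
        text = str(data, encoding='utf-8', errors='ignore')
        text = text.replace(' ', '').replace('\n', '').replace('\t', '')
        self.md5_set.append(hashlib.md5(text.encode('utf-8')).hexdigest())

    def _add_zip_members(self) -> None:
        with zipfile.ZipFile(self.path) as zf:
            for info in zf.infolist():
                if info.file_size < self.MAX_MD5_FILE_SIZE:
                    self._add_text(zf.read(info))

    def _add_tar_members(self) -> None:
        with tarfile.open(self.path) as tf:
            for info in tf.getmembers():
                if info.isfile() and info.size < self.MAX_MD5_FILE_SIZE:
                    self._add_text(tf.extractfile(info).read())

    def _add_whole_file(self) -> None:
        # Binary (non-archive) files: hash the raw bytes in chunks.
        md5 = hashlib.md5()
        with open(self.path, 'rb') as f:
            for chunk in iter(lambda: f.read(65536), b''):
                md5.update(chunk)
        self.md5_set.append(md5.hexdigest())
```

Under this assumption, the model method could shrink to a single call such as `AttachmentChecksum(self.attachment.path).compute()`.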
Complex methods like SubmissionFile.attachment_md5() often do a lot of different things. To break such a method down, we need to identify a cohesive component within it or within its class. A common approach is to look for fields, methods, or helpers that share the same prefix or suffix. Here, md5_set, md5_add_text and md5_add_file all share the md5_ prefix.
Once you have determined the members that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate and is often faster.
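Applied to the md5_ group, Extract Class yields a small accumulator that the model can delegate to. The following is a minimal sketch, assuming the component is moved into its own class. `Md5Collector` and its method names are hypothetical. Because `add_chunks` accepts any iterable of bytes, a Django file's `chunks()` could be passed to it directly.

```python
import hashlib
from typing import Iterable


class Md5Collector:
    """Collects per-file MD5 digests and combines them order-independently."""

    def __init__(self) -> None:
        self._digests = []

    def add_text(self, data: bytes) -> None:
        """Hash `data` after dropping undecodable bytes, spaces, newlines and tabs."""
        text = str(data, encoding='utf-8', errors='ignore')
        text = text.replace(' ', '').replace('\n', '').replace('\t', '')
        self._digests.append(hashlib.md5(text.encode('utf-8')).hexdigest())

    def add_chunks(self, chunks: Iterable[bytes]) -> None:
        """Hash raw binary content supplied in chunks, without sanitizing it."""
        md5 = hashlib.md5()
        for chunk in chunks:
            md5.update(chunk)
        self._digests.append(md5.hexdigest())

    def hexdigest(self) -> str:
        """Combined digest; sorting makes member order irrelevant."""
        return hashlib.md5(''.join(sorted(self._digests)).encode('utf-8')).hexdigest()


c1 = Md5Collector()
c1.add_text(b'a = 1\n')
c1.add_text(b'b = 2\n')
c2 = Md5Collector()
c2.add_text(b'b=2')
c2.add_text(b'a=1')
print(c1.hexdigest() == c2.hexdigest())  # True
```

With the accumulator extracted, attachment_md5 is left with only the archive-type dispatch.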
```python
import hashlib
import logging
import tarfile
import zipfile

from django.db import models

logger = logging.getLogger(__name__)


class SubmissionFile(models.Model):
    # ... fields and other methods not shown in this excerpt,
    # including the `attachment` file field used below ...

    def attachment_md5(self):
        '''
        Calculate the checksum of the file upload.
        For binary files (e.g. PDFs), the MD5 of the file itself is used.

        Archives are unpacked and the MD5 is generated from the sanitized textfiles
        in the archive. This is done with some smartness:
        - Spaces, newlines and tabs are removed before comparison.
        - For MD5, ordering is important, so we compute it on the sorted list of
          file hashes.
        '''
        MAX_MD5_FILE_SIZE = 10000
        md5_set = []

        def md5_add_text(text):
            try:
                # Decode as UTF-8, dropping undecodable bytes, then strip whitespace.
                text = str(text, errors='ignore')
                text = text.replace(' ', '').replace(
                    '\n', '').replace('\t', '')
                hexvalues = hashlib.md5(text.encode('utf-8')).hexdigest()
                md5_set.append(hexvalues)
            except Exception:
                # not unicode decodable
                pass

        def md5_add_file(f):
            try:
                md5 = hashlib.md5()
                for chunk in f.chunks():
                    md5.update(chunk)
                md5_set.append(md5.hexdigest())
            except Exception:
                pass

        try:
            if zipfile.is_zipfile(self.attachment.path):
                with zipfile.ZipFile(self.attachment.path, 'r') as zf:
                    for zipinfo in zf.infolist():
                        if zipinfo.file_size < MAX_MD5_FILE_SIZE:
                            md5_add_text(zf.read(zipinfo))
            elif tarfile.is_tarfile(self.attachment.path):
                with tarfile.open(self.attachment.path, 'r') as tf:
                    for tarinfo in tf.getmembers():
                        if tarinfo.isfile() and tarinfo.size < MAX_MD5_FILE_SIZE:
                            md5_add_text(tf.extractfile(tarinfo).read())
            else:
                md5_add_file(self.attachment)
        except Exception as e:
            logger.warning(
                "Exception on archive MD5 computation, using file checksum: %s", e)
            # Discard partial archive hashes and fall back to the whole-file checksum.
            md5_set.clear()
            md5_add_file(self.attachment)

        result = hashlib.md5(
            ''.join(sorted(md5_set)).encode('utf-8')).hexdigest()
        return result
```