| Conditions | 17 |
| Total Lines | 64 |
| Lines | 0 |
| Ratio | 0 % |
| Changes | 2 |
| Bugs | 0 |
| Features | 2 |
Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.
For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
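As a minimal sketch of that idea, the snippet below turns two comments that could have sat inside a long method into two small, named functions. The function names and the directory layout are assumptions made for illustration; they are not taken from the analyzed project.

```python
import os


def list_text_files(directory):
    """Was a comment: 'collect the .txt files of a directory, sorted'."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".txt")
    )


def read_all_text(paths):
    """Was a comment: 'build the resulting text from every file'."""
    parts = []
    for path in paths:
        with open(path, "r") as handle:
            parts.append(handle.read())
    return "".join(parts)


def merge_directory(directory):
    # The caller now reads like a summary of the two extracted steps.
    return read_all_text(list_text_files(directory))
```

Each extracted function is short enough that its name can say everything the old comment said, which is usually a sign that the extraction was worthwhile.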
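Commonly applied refactorings include Extract Method.
If many parameters or temporary variables are present, Replace Temp with Query, Introduce Parameter Object, Preserve Whole Object, and Replace Method with Method Object are further candidates.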
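For the case of many parameters or temporaries, one possible shape is a small parameter object whose derived values are exposed as queries rather than stored in local variables. The sketch below assumes Python 3.7+ for `dataclasses`; `OcrSettings` and `binarize_command` are hypothetical names, loosely modelled on the values used in the listing on this page.

```python
from dataclasses import dataclass
from os.path import join


@dataclass(frozen=True)
class OcrSettings:
    """Hypothetical parameter object: groups values that always travel together."""
    ocropus_dir: str
    proc_count: int
    png_dir: str

    @property
    def procs(self):
        # Replace Temp with Query: computed on demand instead of kept in a local.
        return str(self.proc_count)


def binarize_command(settings):
    """Build one command line from a single argument instead of three."""
    return [
        join(settings.ocropus_dir, "ocropus-nlbin"),
        "-Q",
        settings.procs,
        join(settings.png_dir, "*.png"),
    ]
```

Passing one object keeps signatures short, and adding a new setting later does not change every call site.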
Complex classes, such as the one containing PNGReader.execute(), often do many different things. To break such a class down, we need to identify a cohesive component within it. A common way to find such a component is to look for fields and methods that share the same prefix or suffix.
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
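A rough sketch of Extract Class, under the assumption that the OCRopus-related fields (`ocropus_dir`, `rpred_model`, `proc_count`) form one cohesive component. `OcropusToolchain` is a hypothetical name, and the `PNGReader` shown here is a reduced stand-in, not the original class.

```python
from os.path import join


class OcropusToolchain:
    """Hypothetical extracted class: owns everything that is OCRopus-specific."""

    def __init__(self, ocropus_dir, rpred_model, proc_count):
        self.ocropus_dir = ocropus_dir
        self.rpred_model = rpred_model
        self.proc_count = proc_count

    def _tool(self, name, *args):
        # Every tool shares the same executable prefix and "-Q <procs>" flag.
        return [join(self.ocropus_dir, name), "-Q", str(self.proc_count)] + list(args)

    def commands(self, png_dir):
        return [
            self._tool("ocropus-nlbin", join(png_dir, "*.png")),
            self._tool("ocropus-gpageseg", join(png_dir, "*.bin.png")),
            self._tool("ocropus-rpred", "-m", self.rpred_model,
                       join(png_dir, "*/*.bin.png")),
        ]


class PNGReader:
    """Reduced stand-in: delegates command construction to the toolchain."""

    def __init__(self, toolchain, unzipped):
        self.toolchain = toolchain
        self.unzipped = unzipped

    def command_list(self):
        return self.toolchain.commands(join(self.unzipped, "png"))
```

After the extraction, the reader no longer needs to know how the individual command lines are assembled, and the toolchain can be tested without touching the file system.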
"""Package to convert PNG to TXT

    def execute(self):
        """Execute the command
        """
        self.logger.debug("::: PNG reading :::")
        # super(PNGReader, self).get_file()

        procs = str(self.proc_count)

        png_dir = join(self.unzipped, "png")
        txt_dir = join(self.unzipped, "txt")

        command_list = [
            [join(self.ocropus_dir, 'ocropus-nlbin'), "-Q", procs, join(png_dir, '*.png')],
            [join(self.ocropus_dir, 'ocropus-gpageseg'), "-Q", procs, join(png_dir, '*.bin.png')],
            [join(self.ocropus_dir, 'ocropus-rpred'), "-Q", procs, "-m", self.rpred_model,
             join(png_dir, '*/*.bin.png')],
        ]

        # Execute the list of commands
        for command in command_list:
            try:
                self.logger.debug("> "+str(command))

                cmdout = check_output(self.python+command, stderr=STDOUT)
                self.logger.info(cmdout)
            except Exception as e:
                print(e)
                self.logger.fatal("An exception has been caught: "+str(e))
                self.finalize()
                return 1

        # Build the resulting text file from every line file
        txt_files = [join(png_dir, subdir, f) for subdir in listdir(png_dir) if isdir(join(png_dir, subdir))
                     for f in listdir(join(png_dir, subdir)) if f.endswith(".txt")]
        self.logger.debug(str(len(txt_files)) + " text file(s) found")

        for f in txt_files:
            dirs = split(dirname(f))

            filename = basename(f)
            pagenum = dirs[-1].split("-")[-1]

            move(f, join(txt_dir, "segments", pagenum+"-"+filename))

        txt_files = sorted([join(txt_dir, "segments", f) for f in listdir(join(txt_dir, "segments"))
                            if f.endswith(".txt")])

        text = ""
        for f in txt_files:
            with open(f, "r") as txt:
                lines = txt.readlines()

                for l in lines:
                    text += l

        pdf_files = [join(self.unzipped, f) for f in listdir(self.unzipped) if isfile(join(self.unzipped, f)) and
                     f.endswith(".pdf")]
        txt_filename = splitext(basename(pdf_files[0]))[0]+".txt"

        with open(join(txt_dir, txt_filename), "w") as output:
            output.write(text)

        self.finalize()
        return 0
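One way the command loop of this method could be extracted is sketched below. It is only an illustration: `run_commands` is a hypothetical helper, and narrowing the caught exceptions to `CalledProcessError` and `OSError` is a design assumption rather than something the original code does.

```python
import logging
import sys
from subprocess import STDOUT, CalledProcessError, check_output


def run_commands(commands, logger):
    """Run each command in order.

    Returns True if every command succeeds and False at the first failure.
    """
    for command in commands:
        logger.debug("> %s", command)
        try:
            output = check_output(command, stderr=STDOUT)
        except (CalledProcessError, OSError) as error:
            logger.error("Command failed: %s", error)
            return False
        logger.info(output)
    return True


if __name__ == "__main__":
    # Usage example with a harmless command; the real caller would pass
    # the OCRopus command lines instead.
    ok = run_commands([[sys.executable, "-c", "print('hello')"]],
                      logging.getLogger(__name__))
    print(ok)
```

With such a helper, the body of `execute()` would shrink to a check of the return value followed by `self.finalize()` and the matching exit code, and the file-merging half could be extracted in the same way.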