Conditions | 17
Total Lines | 64
Lines | 0
Ratio | 0 %
Changes | 2
Bugs | 0
Features | 2
Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.
For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
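As a minimal sketch of this idea, the snippet below takes a commented block and extracts it into its own function, with the comment supplying the name. The function names (`merge_before`, `build_text_from_lines`, `merge_after`) and the list-of-lists input are hypothetical and chosen only for illustration.

```python
def merge_before(pages):
    """Original shape: a comment marks a block that does one separate job."""
    # Build the resulting text from every line
    text = ""
    for lines in pages:
        for line in lines:
            text += line
    return text


def build_text_from_lines(pages):
    """Extracted method: its name is derived from the old comment."""
    return "".join(line for lines in pages for line in lines)


def merge_after(pages):
    """Refactored caller: the comment is gone, the name explains the step."""
    return build_text_from_lines(pages)
```

Both versions return the same string for the same input; only the readability of the caller changes.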
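Commonly applied refactorings include Extract Method. If many parameters/temporary variables are present, Replace Temp with Query, Introduce Parameter Object, and Preserve Whole Object are also worth considering.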
Classes that contain complex methods like PNGReader.execute() often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes or suffixes.
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
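One possible application of Extract Class to this code is sketched below, assuming the OCRopus-related settings (`ocropus_dir`, `rpred_model`, `proc_count`) form a cohesive group. The class name `OcropusCommands` and its methods are hypothetical; they are not part of the reviewed project.

```python
from os.path import join


class OcropusCommands(object):
    """Hypothetical extracted class: owns the OCRopus settings and
    knows how to turn them into the list of commands to run."""

    def __init__(self, ocropus_dir, rpred_model, proc_count):
        self.ocropus_dir = ocropus_dir
        self.rpred_model = rpred_model
        self.proc_count = proc_count

    def _tool(self, name):
        # Full path of one OCRopus script inside the install directory.
        return join(self.ocropus_dir, name)

    def build(self, png_dir):
        # Same three steps as in execute(): binarize, segment, recognize.
        procs = str(self.proc_count)
        return [
            [self._tool("ocropus-nlbin"), "-Q", procs, join(png_dir, "*.png")],
            [self._tool("ocropus-gpageseg"), "-Q", procs, join(png_dir, "*.bin.png")],
            [self._tool("ocropus-rpred"), "-Q", procs, "-m", self.rpred_model,
             join(png_dir, "*/*.bin.png")],
        ]
```

With such a class, execute() would only need to ask for `build(png_dir)` and would no longer carry the command-line details itself.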
"""Package to convert PNG to TXT

    def execute(self):
        """Execute the command
        """
        self.logger.debug("::: PNG reading :::")
        # super(PNGReader, self).get_file()

        procs = str(self.proc_count)

        png_dir = join(self.unzipped, "png")
        txt_dir = join(self.unzipped, "txt")

        command_list = [
            [join(self.ocropus_dir, 'ocropus-nlbin'), "-Q", procs, join(png_dir, '*.png')],
            [join(self.ocropus_dir, 'ocropus-gpageseg'), "-Q", procs, join(png_dir, '*.bin.png')],
            [join(self.ocropus_dir, 'ocropus-rpred'), "-Q", procs, "-m", self.rpred_model,
             join(png_dir, '*/*.bin.png')],
        ]

        # Execute the list of commands
        for command in command_list:
            try:
                self.logger.debug("> "+str(command))

                cmdout = check_output(self.python+command, stderr=STDOUT)
                self.logger.info(cmdout)
            except Exception as e:
                print(e)
                self.logger.fatal("An exception has been caught: "+str(e))
                self.finalize()
                return 1

        # Build the resulting text file from every line file
        txt_files = [join(png_dir, subdir, f) for subdir in listdir(png_dir) if isdir(join(png_dir, subdir))
                     for f in listdir(join(png_dir, subdir)) if f.endswith(".txt")]
        self.logger.debug(str(len(txt_files)) + " text file(s) found")

        # Move every line file into the segments directory, prefixed with its page number
        for f in txt_files:
            dirs = split(dirname(f))

            filename = basename(f)
            pagenum = dirs[-1].split("-")[-1]

            move(f, join(txt_dir, "segments", pagenum+"-"+filename))

        txt_files = sorted([join(txt_dir, "segments", f) for f in listdir(join(txt_dir, "segments"))
                            if f.endswith(".txt")])

        # Concatenate the sorted line files into a single text
        text = ""
        for f in txt_files:
            with open(f, "r") as txt:
                lines = txt.readlines()

                for l in lines:
                    text += l

        # Name the output after the first PDF found in the unzipped directory
        pdf_files = [join(self.unzipped, f) for f in listdir(self.unzipped) if isfile(join(self.unzipped, f)) and
                     f.endswith(".pdf")]
        txt_filename = splitext(basename(pdf_files[0]))[0]+".txt"

        with open(join(txt_dir, txt_filename), "w") as output:
            output.write(text)

        self.finalize()
        return 0
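The command loop in the listing above is a natural candidate for Extract Method. The sketch below is one way to do it, under two assumptions that differ from the original: it catches only `CalledProcessError` and `OSError` instead of every `Exception`, and it takes a `log` callable instead of `self.logger`. The name `run_commands` is hypothetical.

```python
import sys
from subprocess import CalledProcessError, STDOUT, check_output


def run_commands(interpreter, command_list, log=print):
    """Run each command in order.

    `interpreter` is a list such as [sys.executable] that is prepended
    to every command. Returns 0 on success, 1 on the first failure.
    """
    for command in command_list:
        try:
            log(check_output(interpreter + command, stderr=STDOUT))
        except (CalledProcessError, OSError) as e:
            # Assumption: a failed or missing command aborts the whole run.
            log("An exception has been caught: " + str(e))
            return 1
    return 0
```

A caller such as execute() could then write `if run_commands(self.python, command_list) != 0:` followed by its own cleanup, which keeps the error policy in one small, separately testable function.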