PNGReader.execute()   F

Complexity

Conditions 17

Size

Total Lines 64

Duplication

Lines 0
Ratio 0 %

Importance

Changes 2
Bugs 0 Features 2
Metric                   Value
cc (conditions)          17
c (changes)              2
b (bugs)                 0
f (features)             2
dl (duplicated lines)    0
loc (total lines)        64
rs                       2.8365

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include Extract Method.
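Extract Method is the usual starting point for a method like execute(). The sketch below is one possible way to pull two commented blocks of execute() ("Execute the list of command" and "Build the resulting text file") into small named functions. The names build_commands and concatenate_files are hypothetical and do not appear in the original class; the command-line arguments are copied from the listing.

```python
from os.path import join


def build_commands(ocropus_dir, png_dir, procs, model):
    """Return the three OCRopus command lines that execute() builds inline."""
    return [
        [join(ocropus_dir, "ocropus-nlbin"), "-Q", procs, join(png_dir, "*.png")],
        [join(ocropus_dir, "ocropus-gpageseg"), "-Q", procs, join(png_dir, "*.bin.png")],
        [join(ocropus_dir, "ocropus-rpred"), "-Q", procs, "-m", model,
         join(png_dir, "*/*.bin.png")],
    ]


def concatenate_files(paths):
    """Read every file in sorted path order and return the joined text."""
    chunks = []
    for path in sorted(paths):
        with open(path, "r") as handle:
            chunks.append(handle.read())
    return "".join(chunks)
```

With helpers like these, execute() would shrink to a sequence of named steps, and each helper can be tested without running OCRopus.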

Complexity

Complex methods like PNGReader.execute() often do a lot of different things, and so do the classes that contain them. To break such a class down, we need to identify a cohesive component within it. A common approach to finding such a component is to look for fields and methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
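As a rough illustration of Extract Class, the interpreter prefix and the subprocess loop could move into a helper of their own. CommandRunner is a hypothetical class, not part of the analyzed project, and the sketch assumes only that the logger behaves like a standard logging.Logger.

```python
from subprocess import check_output, STDOUT, CalledProcessError


class CommandRunner(object):
    """Hypothetical helper owning the interpreter prefix and subprocess handling."""

    def __init__(self, interpreter, logger):
        self.interpreter = list(interpreter)  # e.g. ["python"], as in PNGReader
        self.logger = logger

    def run_all(self, commands):
        """Run each command in order; return 0 on success, 1 on the first failure."""
        for command in commands:
            self.logger.debug("> %s", command)
            try:
                output = check_output(self.interpreter + command, stderr=STDOUT)
            except (CalledProcessError, OSError) as error:
                self.logger.critical("An exception has been caught: %s", error)
                return 1
            self.logger.info(output)
        return 0
```

PNGReader.execute() could then delegate to run_all() and keep only the file-handling steps, which removes the try/except branches from its condition count.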

"""Package to convert PNG to TXT

.. Authors:
    Philippe Dessauw
    [email protected]

.. Sponsor:
    Alden Dima
    [email protected]
    Information Systems Group
    Software and Systems Division
    Information Technology Laboratory
    National Institute of Standards and Technology
    http://www.nist.gov/itl/ssd/is
"""
from os.path import join, isdir, split, dirname, basename, splitext, isfile
from subprocess import check_output, STDOUT
from os import listdir
from shutil import move
from pipeline.command import Command


class PNGReader(Command):
    """Command to convert PNG to TXT
    """

    def __init__(self, filename, logger, config):
        super(PNGReader, self).__init__(filename, logger, config)

        self.proc_count = 1
        self.ocropus_dir = self.config["command"]["ocropy"]["location"]
        self.rpred_model = self.config["command"]["ocropy"]["model"]
        self.python = ["python"]

        self.logger.info("PNG reader initialized")

    def execute(self):
        """Execute the command
        """
        self.logger.debug("::: PNG reading :::")
        # super(PNGReader, self).get_file()

        procs = str(self.proc_count)

        png_dir = join(self.unzipped, "png")
        txt_dir = join(self.unzipped, "txt")

        command_list = [
            [join(self.ocropus_dir, 'ocropus-nlbin'), "-Q", procs, join(png_dir, '*.png')],
            [join(self.ocropus_dir, 'ocropus-gpageseg'), "-Q", procs, join(png_dir, '*.bin.png')],
            [join(self.ocropus_dir, 'ocropus-rpred'), "-Q", procs, "-m", self.rpred_model,
             join(png_dir, '*/*.bin.png')],
        ]

        # Execute the list of commands, stopping at the first failure
        for command in command_list:
            try:
                self.logger.debug("> "+str(command))

                cmdout = check_output(self.python+command, stderr=STDOUT)
                self.logger.info(cmdout)
            except Exception as e:
                print(e)
                self.logger.fatal("An exception has been caught: "+str(e))
                self.finalize()
                return 1

        # Build the resulting text file from every line file
        txt_files = [join(png_dir, subdir, f) for subdir in listdir(png_dir) if isdir(join(png_dir, subdir))
                     for f in listdir(join(png_dir, subdir)) if f.endswith(".txt")]
        self.logger.debug(str(len(txt_files)) + " text file(s) found")

        # Move every line file into txt/segments, prefixed with its page number
        for f in txt_files:
            dirs = split(dirname(f))

            filename = basename(f)
            pagenum = dirs[-1].split("-")[-1]

            move(f, join(txt_dir, "segments", pagenum+"-"+filename))

        txt_files = sorted([join(txt_dir, "segments", f) for f in listdir(join(txt_dir, "segments"))
                            if f.endswith(".txt")])

        # Concatenate the segments in sorted order
        text = ""
        for f in txt_files:
            with open(f, "r") as txt:
                lines = txt.readlines()

                for l in lines:
                    text += l

        # Name the output after the first PDF found in the unzipped directory
        pdf_files = [join(self.unzipped, f) for f in listdir(self.unzipped) if isfile(join(self.unzipped, f)) and
                     f.endswith(".pdf")]
        txt_filename = splitext(basename(pdf_files[0]))[0]+".txt"

        with open(join(txt_dir, txt_filename), "w") as output:
            output.write(text)

        self.finalize()
        return 0

    def finalize(self):
        """Finalize the job
        """
        # super(PNGReader, self).store_file()
        self.logger.debug("::: PNG reading (END) :::")
107