build.rna_tools.Seq.RNASequence.predict_ss()   F
last analyzed

Complexity

Conditions 56

Size

Total Lines 368
Code Lines 163

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
cc 56
eloc 163
nop 8
dl 0
loc 368
rs 0
c 0
b 0
f 0

How to fix   Long Method    Complexity    Many Parameters   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:

Complexity

Complex classes like build.rna_tools.Seq.RNASequence.predict_ss() often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

Many Parameters

Methods with many parameters are not only hard to understand, but their parameters also often become inconsistent when you need more, or different data.

There are several approaches to avoid long parameter lists:

1
#!/usr/bin/env python3
2
#-*- coding: utf-8 -*-
3
"""RNA Sequence with secondary structure prediction methods.
4
5
This tool takes a given sequence and returns the secondary structure prediction provided by 5 different tools: RNAfold, RNAsubopt, ipknot, contextfold and centroid_fold. You must have these tools installed. You don't have to install all tools if you want to use only one of the methods.
6
7
It's easy to add more methods of your choince to this class.
8
9
Installation
10
~~~~~~~~~~~~~
11
Depends on what tools you want to use, follow the instructions below.
12
13
ContextFold
14
^^^^^^^^^^^^^^^^^^^^^
15
https://www.cs.bgu.ac.il/~negevcb/contextfold/
16
17
needs Java. Try this on Ubuntu 14-04 https://askubuntu.com/questions/521145/how-to-install-oracle-java-on-ubuntu-14-04 Single chain only!
18
19
ViennaRNA
20
^^^^^^^^^^^^^^
21
https://www.tbi.univie.ac.at/RNA/
22
23
For OSX install from the binary Installer from the page.
24
25
ipknot OSX
26
^^^^^^^^^^^^^
27
https://github.com/satoken/homebrew-rnatools
28
29
If one encounters a problem::
30
31
    [mm] Desktop$ /usr/local/opt/bin/ipknot
32
    dyld: Library not loaded: /usr/local/opt/glpk/lib/libglpk.40.dylib
33
      Referenced from: /usr/local/opt/bin/ipknot
34
      Reason: image not found
35
    [1]    51654 abort      /usr/local/opt/bin/ipknot
36
37
the solution is::
38
39
     brew install glpk # on OSX
40
41
RNA Structure
42
^^^^^^^^^^^^^
43
http://rna.urmc.rochester.edu/
44
45
Works with 5.8.1; Jun 16, 2016.
46
47
Download http://rna.urmc.rochester.edu/RNAstructureDownload.html and untar it in ``<RNA_PDB_TOOLS>/opt/RNAstructure/``; run make, the tools will be compiled in a folder ``exe``. Set up ``DATPATH`` in your bashrc to ``<RNA_PDB_TOOLS>/opt/RNAstructure/data_tables`` ``DATAPATH=/home/magnus/work/src/rna-pdb-tools/opt/RNAstructure/data_tables/`` (read more http://rna.urmc.rochester.edu/Text/Thermodynamics.html). RNAstructure can be run with SHAPE restraints, read more http://rna.urmc.rochester.edu/Text/File_Formats.html#Constraint about the format. The file format for SHAPE reactivity comprises two columns. The first column is the nucleotide number, and the second is the reactivity. Nucleotides for which there is no SHAPE data can either be left out of the file, or the reactivity can be entered as less than -500. Columns are separated by any white space.
48
49
MC-Sym
50
^^^^^^^^^^^^^
51
52
FAQ
53
~~~~~~~~~~~~~
54
55
- Does it work for more than one chain??? Hmm.. I think it's not. You have to check on your own. --magnus
56
57
TIPS
58
~~~~~~~~~~~~~
59
60
Should you need to run it on a list of sequences, use the following script::
61
62
    from rna_tools import Seq
63
    f = open("listOfSequences.fasta")
64
    for line in f:
65
     if line.startswith('>'):
66
       print line,
67
     else:
68
       print line,
69
       s = Seq.Seq(line.strip()) # module first Seq and class second Seq #without strip this has two lines
70
       print s.predict_ss(method="contextfold"),
71
       #print s.predict_ss(method="centroid_fold")
72
73
TODO
74
~~~~~~~~~~~~~
75
76
- This calss should be renamed to RNASeq and merged with RNASeq class from RNAalignment
77
78
"""  # noqa
79
import os
80
import subprocess
81
import tempfile
82
import sys
83
84
from rna_tools.SecondaryStructure import parse_vienna_to_pairs
85
from rna_tools.rna_tools_config import CONTEXTFOLD_PATH, RNASTRUCTURE_PATH, ENTRNA_PATH, IPKNOT_PATH
86
87
88
class MethodNotChosen(Exception):
89
    pass
90
91
92
class RNASequence(object):
93
    """RNASequence.
94
95
    Usage::
96
97
        >>> seq = RNASequence("CCCCUUUUGGGG")
98
        >>> seq.name = 'RNA03'
99
        >>> print(seq.predict_ss("RNAfold", constraints="((((....))))"))
100
        >RNA03
101
        CCCCUUUUGGGG
102
        ((((....)))) ( -6.40)
103
104
    """
105
106
    def __init__(self, seq, ss='', name='rna_seq'):
107
        self.seq = seq
108
        self.ss = ss
109
        self.ss_log = ''
110
        self.name = name
111
112
    def __repr__(self):
113
        return self.name + '\n' + self.seq + '\n' + self.ss
114
115
    def eval(self, ss='', no_dangling_end_energies=False, verbose=False):
116
        """Evaluate energy of RNA sequence.
117
118
        Args:
119
            ss (optional), if not set, then self.ss is taken for calc
120
            no_dangling_end_energies (Boolean)
121
            verbose (Boolean)
122
123
        Returns:
124
            Energy (float)
125
126
        The RNAeval web server calculates the energy of a RNA sequence on a given secondary structure.
127
        You can use it to get a detailed thermodynamic description (loop free-energy decomposition) of your RNA structures.
128
129
        Simply paste or upload your sequence below and click Proceed. To get more information on the meaning of the options click the help symbols. You can test the server using this sample sequence/structure pair.
130
131
        An equivalent RNAeval command line call would have been::
132
133
            RNAeval -v -d0 < input.txt
134
135
        Read more: <http://rna.tbi.univie.ac.at//cgi-bin/RNAWebSuite/RNAeval.cgi>
136
137
        """
138
        tf = tempfile.NamedTemporaryFile(delete=False)
139
        if not ss:
140
            ss = self.ss
141
142
        tf.name += '.fa'
143
        with open(tf.name, 'w') as f:
144
            f.write('>' + self.name + '\n')
145
            f.write(self.seq + '\n')
146
            f.write(ss + '\n')
147
148
        dopt = ' -d2 '
149
        if no_dangling_end_energies:
150
            dopt = ' -d0 '
151
152
        cmd = 'RNAeval ' + dopt + ' < ' + tf.name
153
        if verbose:
154
            print(cmd)
155
        self.ss_log = subprocess.check_output(cmd, shell=True).decode()
156
        # [u'>rna_seq\nGGCAGGGGCGCUUCGGCCCCCUAUGCC\n((((((((.((....)).)))).))))', u'(-13.50)']
157
        return float(self.ss_log.strip().split(' ')[-1].replace('(','').replace(')', ''))
158
159
160
    def get_foldability(self, ss='', verbose=False):
161
        """Calculate foldability based on EntRNA.
162
163
        Steps:
164
165
        - parse SS into basepairs,
166
        - calculate foldabilty
167
168
        Configuration:
169
170
        - Set ENTRNA_PATH to the folder where ENTRNA_predict.py is.
171
172
        Cmd wrapper in here::
173
174
            python ENTRNA_predict.py --seq_file pseudoknotted_seq.txt --str_file pseudoknotted_str.txt
175
176
        Su, C., Weir, J. D., Zhang, F., Yan, H., & Wu, T. (2019).
177
        ENTRNA: a framework to predict RNA foldability. BMC Bioinformatics, 20(1), 1–11.
178
        http://doi.org/10.1186/s12859-019-2948-5
179
        """
180
        if ss:
181
            self.ss = ss
182
183
        # parse SS into base-pairs
184
        def dp_to_bp(dp):
185
            import numpy as np
186
            a_list = []
187
            bp_array = np.zeros(len(dp),dtype = int)
188
            for i in range(len(dp)):
189
                if dp[i] == "(":
190
                    a_list.append(i)
191
                if dp[i] == ")":
192
                    bp_array[i] = a_list[-1] + 1
193
                    bp_array[a_list[-1]] = i + 1
194
                    a_list.pop()
195
            return list(bp_array)
196
197
        bp = dp_to_bp(self.ss)
198
        if verbose: print(bp)
199
200
        tempstr = tempfile.NamedTemporaryFile(delete=False)
201
        with open(tempstr.name, 'w') as f:
202
            f.write(str(bp))
203
204
        tempseq = tempfile.NamedTemporaryFile(delete=False)
205
        with open(tempseq.name, 'w') as f:
206
            f.write(self.seq)
207
208
        # -W to silent warnings See [1]
209
        cmd = "cd " + ENTRNA_PATH + " && python -W ignore ENTRNA_predict.py --seq_file " + tempseq.name + " --str_file " + tempstr.name
210
        log = subprocess.check_output(cmd, shell=True).decode()
211
        if verbose:
212
            print(cmd)
213
            print(log)
214
        for l in log.split('\n'):
215
            if l.startswith('Foldability: '):
216
                return round(float(l.replace('Foldability: ', '')), 2)
217
        return -1
218
        ## [1]:
219
        ##     /Users/magnus/work/evoClustRNA/rna-foldability/ENTRNA/util/pseudoknot_free.py:22: SettingWithCopyWarning:
220
        ## A value is trying to be set on a copy of a slice from a DataFrame.
221
        ## Try using .loc[row_indexer,col_indexer] = value instead
222
223
        ## See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
224
        ##   df_v1['length'] = df_v1['seq'].apply(lambda x:len(x))
225
        ## /home/magnus/miniconda2/lib/python2.7/site-packages/sklearn/linear_model/logistic.py:433: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
226
        ##   FutureWarning)
227
        ## cd /Users/magnus/work/evoClustRNA/rna-foldability/ENTRNA/ && python ENTRNA_predict.py --seq_file /var/folders/yc/ssr9692s5fzf7k165grnhpk80000gp/T/tmpUORegp --str_file /var/folders/yc/ssr9692s5fzf7k165grnhpk80000gp/T/tmp1ERCcD
228
229
    def predict_ss(self, method="RNAfold", constraints='', enforce_constraint=False, shapefn='', explore='', verbose=0, path=''):
230
        """Predict secondary structure of the seq.
231
232
        Args:
233
          method: {mcfold, RNAfold}
234
          onstraints:
235
          shapefn (str): path to a file with shape reactivites
236
          verbose (boolean)
237
238
        It creates a seq fasta file and runs various methods for secondary structure
239
        prediction. You can provide also a constraints file for RNAfold and RNAsubopt.
240
241
        Methods that can be used with contraints: RNAsubopt, RNAfold, mcfold.
242
243
        Methods that can be used with SHAPE contraints: RNAfold.
244
245
        **ContextFold**
246
247
        Example::
248
249
            $ java -cp bin contextFold.app.Predict in:CCCCUUUGGGGG
250
            CCCCUUUGGGGG
251
            ((((....))))
252
253
        It seems that a seq has to be longer than 9. Otherwise::
254
255
            $ java -cp bin contextFold.app.Predict in:UUUUUUGGG
256
            Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 10
257
258
            # this is OK
259
            $ java -cp bin contextFold.app.Predict in:CCCCUUUGGG
260
            CCCCUUUGGG
261
            .(((...)))
262
263
264
        **RNAstructure**
265
266
        Example::
267
268
            >>> seq = RNASequence("GGGGUUUUCCC")
269
            >>> print(seq.predict_ss("rnastructure"))
270
            >  ENERGY = -4.4  rna_seq
271
            GGGGUUUUCCC
272
            ((((...))))
273
274
        and with the shape data::
275
276
            >>> print(seq.predict_ss("rnastructure", shapefn="data/shape.txt"))
277
            >  ENERGY = -0.2  rna_seq
278
            GGGGUUUUCCC
279
            .(((....)))
280
281
        the shape data::
282
283
            1 10
284
            2 1
285
            3 1
286
287
        You can easily see that the first G is unpaired right now! The reactivity of this G was
288
        set to 10. Worked!
289
290
291
        **MC-Fold**
292
293
        MC-Fold uses the online version of the tool, this is very powerful with constraints::
294
295
            rna_seq
296
            acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg
297
            ((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))
298
            curl -Y 0 -y 300 -F "pass=lucy" -F mask="((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))" -F sequence="acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi
299
            mcfold::energy best dynamics programming: -53.91
300
            (-53.91, '((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))')
301
302
            curl -Y 0 -y 300 -F "pass=lucy" -F mask="((((........)))).......((((..............((((((((((..............))))))))))..))))" -F sequence="acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi
303
            mcfold::energy best dynamics programming: -34.77
304
            (-34.77, '((((........)))).......((((..............((((((((((..............))))))))))..))))')
305
306
            acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg
307
            ((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))
308
            curl -Y 0 -y 300 -F "pass=lucy" -F mask="((((xxxxxxxx))))xxxxxxx((((xxxxxxxxxxxxxx((((((((((xxxxxxxxxxxxxx))))))))))xx))))" -F sequence="acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi
309
            mcfold::energy best dynamics programming: -34.77
310
            (-34.77, '((((........)))).......((((..............((((((((((..............))))))))))..))))')
311
312
313
            acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg
314
            ((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))
315
            curl -Y 0 -y 300 -F "pass=lucy" -F mask="((((********))))*******((((**************((((((((((**************))))))))))**))))" -F sequence="acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi
316
            mcfold::energy best dynamics programming: -77.30
317
            (-71.12, '(((((((..))))))).......((((((.(((...)))..(((((((((((((((....)))))))))))))))))))))')
318
319
            acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg
320
            ((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))
321
            curl -Y 0 -y 300 -F "pass=lucy" -F mask="((((**[[[[[**))))*******((((****]]]]]****(((((((((((((((****)))))))))))))))**))))" -F sequence="acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi
322
            mcfold::energy best dynamics programming: -77.30
323
            ('-77.30', '((((**[[[[[**))))*******((((****]]]]]****(((((((((((((((****)))))))))))))))**))))')
324
325
        **explore**
326
327
         The sub-optimal search space can be constrained within a percentage of the minimum free energy structure, as MC-fold makes use of the Waterman-Byers algorithm [18, 19]. Because the exploration has an exponential time complexity, increasing this value can have a dramatic effect on MC-Fold’s run time.
328
329
        Parisien, M., & Major, F. (2009). RNA Modeling Using the MC-Fold and MC-Sym Pipeline [Manual] (pp. 1–84).
330
331
332
        """
333
334
        tf = tempfile.NamedTemporaryFile(delete=False)
335
        tf.name += '.fa'
336
        with open(tf.name, 'w') as f:
337
            f.write('>' + self.name + '\n')
338
            f.write(self.seq + '\n')
339
            if constraints:
340
                f.write(constraints)
341
342
        # check for seq and constraints
343
        if constraints:
344
            if len(self.seq) != len(constraints):
345
                raise Exception('The seq and constraints should be of the same length: %i %s %i %s' % (len(self.seq), self.seq, len(constraints), constraints))
346
347
        # run prediction
348
        # rnafold without contraints
349
        if method == "RNAfold" and constraints:
350
            cmd = 'RNAfold -C < ' + tf.name
351
            if verbose:
352
                print(cmd)
353
            self.ss_log = subprocess.check_output(cmd, shell=True).decode()
354
            return '\n'.join(self.ss_log.strip().split('\n')[:])
355
356
        if method == "RNAfoldX" and constraints:
357
            if enforce_constraint:
358
                cmd = 'RNAfold -p -d2 --noLP -C --enforceConstraint < ' + tf.name
359
            else:
360
                cmd = 'RNAfold -p -d2 --noLP -C < ' + tf.name
361
            if verbose:
362
                print(cmd)
363
364
            try:
365
                self.ss_log = subprocess.check_output(cmd, shell=True).decode()
366
            except subprocess.CalledProcessError:
367
                 print('Error')
368
                 return 0, 'error', 0, '', 0, '', 0, 0
369
370
            if verbose:
371
                print(self.ss_log)
372
            # parse the results
373
            lines = self.ss_log.split('\n')
374
            if 'Supplied structure constraints create empty solution set for sequence' in self.ss_log:
375
                return 0, 'Supplied structure constraints create empty solution set for sequence', 0, '', 0, '', 0, 0
376
            #if not 'frequency of mfe structure' in self.ss_log:
377
378
            # RNAfold -p -d2 --noLP -C < /var/folders/yc/ssr9692s5fzf7k165grnhpk80000gp/T/tmpGiUoo7.fa
379
            # >rna_seq
380
            # AAAUUAAGGGGAAGCGUUGAGCCGCUACCCAUAUGUGGUUCACUCGGAUAGCGGGGAGCUAAUAGUGAAACCGGCCCUUUAGGGG
381
            # ...((((((((.(((......((((((.((....(((...)))..)).))))))...)))..............))))))))... (-19.80)
382
            # ...{(((((((.(((......((((((.((....(((...)))..)).))))))...)))..............)))))))}... [-21.05]
383
            #...((((((((.(((......((((((.((....(((...)))..)).))))))...)))..............))))))))... {-19.80 d=2.34}
384
            # frequency of mfe structure in ensemble 0.131644; ensemble diversity 3.68
385
386
            mfess = lines[2].split()[0]
387
            mfe = ' '.join(lines[2].split()[-1:])
388
            mfe = float(mfe.replace('(', '').replace(')', '')) # (-19.80) ->-19.80
389
390
            efess = lines[3].split()[0]  # ensamble free energy
391
            efe = ' '.join(lines[3].split()[-1:])
392
            efe = float(efe.replace('[', '').replace(']', '')) # (-19.80) ->-19.80
393
394
            cfess = lines[4].split()[0]  # ensamble free energy
395
            cfe, d = ' '.join(lines[4].split()[1:]).split('d')
396
            cfe = float(cfe.replace('{', '').replace('}', '')) # (-19.80) ->-19.80
397
398
            words = lines[5].split()  # ensamble free energy
399
            freq = round(float(words[6].replace(';', '')), 2) # frequency of mfe structure in ensemble
400
            diversity = float(words[9])     # ensemble diversity
401
402
            if verbose:
403
                print(mfe, mfess, efe, efess, cfe, cfess, freq, diversity)
404
            return mfe, mfess, efe, efess, cfe, cfess, freq, diversity
405
406
        elif method == "RNAfold":
407
            cmd = 'RNAfold < ' + tf.name
408
            if verbose:
409
                print(cmd)
410
            self.ss_log = subprocess.check_output(cmd, shell=True).decode()
411
            return '\n'.join(self.ss_log.strip().split('\n')[:])
412
413
        elif method == "RNAsubopt" and constraints:
414
            cmd = 'RNAsubopt -C < ' + tf.name
415
            if verbose:
416
                print(cmd)
417
            self.ss_log = subprocess.check_output(cmd, shell=True).decode()
418
            return '\n'.join(self.ss_log.split('\n')[:])
419
420
        elif method == "RNAsubopt":
421
            cmd = 'RNAsubopt < ' + tf.name
422
            if verbose:
423
                print(cmd)
424
            self.ss_log = subprocess.check_output(cmd, shell=True).decode()
425
            return '\n'.join(self.ss_log.split('\n')[:])
426
427
        elif method == "mcfold":
428
            #  -F tope=1
429
            if explore:
430
                explore_str = " -F explore=" + str(explore)
431
            else:
432
                explore_str = ''
433
434
            #if constraints:
435
            #cmd = "curl -Y 0 -y 300 -F \"pass=lucy\" -F mask=\"" + constraints + "\" " + explore_str + \
436
            #" -F sequence=\"" + self.seq + "\" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi"
437
            cmd = "curl https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi\?pass\=lucy\&sequence\=" + self.seq + "\&top\=20\&explore\=15\&name\=\&mask\='" + constraints + "'\&singlehigh\=\&singlemed\=\&singlelow\="
438
            # cmd = "curl -Y 0 -y 300 -F \"pass=lucy\" -F sequence=\"" + self.seq + "\" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi"
439
            if verbose:
440
                print(cmd)
441
                
442
            o = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
443
            out = o.stdout.read().decode(errors='ignore').strip()
444
            err = o.stderr.read().decode(errors='ignore').strip()
445
446
            if verbose:
447
                print(out)
448
449
            # If the structure can't be find, detect this statement and finish this routine.
450
            if 'Explored 0 structures' in out:
451
                return 0.00, '', 'Explored 0 structures'
452
453
            comment = ''
454
            energy = ''
455
            out = out.split('\n')
456
            for l in out :
457
                # first you will find the best dynamic energy, and in the next loop
458
                # it will be used to search for lines with this energy and secondary
459
                # structure
460
461
                # (((..)))  -5.43
462
                if energy:  # if energy is set
463
                    if energy in l:
464
                        if verbose: print(l)
465
                        ss = l.split()[0]
466
467
                # Performing Dynamic Programming...
468
                # Best Dynamic Programming Solution has Energy:  -5.43
469
                if l.startswith('Best Dynamic Programming Solution has Energy:'):
470
                    energy_bdp = l.split(':')[1].strip()
471
                    if verbose:
472
                        print ('mcfold::energy best dynamics programming: ' + energy_bdp)
473
                        comment = 'energy best dynamics programming'
474
                    ss = constraints
475
                    # return float(energy), constraints # I'm not sure if this is good
476
477
            # Ok, for whatever reason Best DP energy might not be exactly the same as and
478
            # the energy listed later for secondary structure. So this code finds this secondary
479
            # structure and gets again the energy for this secondary structure,
480
            # and overwrites the previous energy.
481
            # In this case:
482
            # Best Dynamic Programming Solution has Energy:  -5.46
483
            # ...
484
            # CUCUCGAAAGAUG
485
            # (((.((..)))))  -5.44 ( +0.00)
486
            # (((.((..)))))  BP >=  50%
487
488
            # if evenn this will not find ss, then set ss to null not to crash
489
            # and it's possible, like in here
490
            # curl -Y 0 -y 300 -F "pass=lucy" -F mask="((******)))" -F sequence="CCUgcgcaAGG" \
491
            #            http://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi
492
            ss = ''
493
            for l in out:
494
                if 'target="_blank">MARNA</a>-formatted:<P><P><P></H2><pre>' in l:
495
                    index = out.index(l)
496
                    ss_line = out[index + 2]
497
                    ss, energy = ss_line.split()[0:2]  # '(((.((..)))))  -5.44 ( +0.00)'
498
499
                    # if there is
500
                    # UUGCCGUAAGACA
501
                    # .............  BP >=  50%
502
                    # then finish with energy 0.00, and empty ss
503
                    if energy == 'BP':
504
                        energy = energy_bdp
0 ignored issues
show
introduced by
The variable energy_bdp does not seem to be defined for all execution paths.
Loading history...
505
                        comment = 'BP energy'
506
                        return energy_bdp, constraints, comment
507
                    # break
508
509
            # prepare outputs, return and self-s
510
            self.log = out
511
            self.ss = ss
512
            return float(energy), ss, comment
513
514
        # if method == "RNAsubopt":
515
        #     from cogent.app.vienna_package import RNAfold, RNAsubopt
516
        #     r = RNAsubopt(WorkingDir="/tmp")
517
        #     res = r([self.seq])
518
        #     return str(res['StdOut'].read()).strip()
519
520
        # if method == 'RNAfold':
521
        #     from cogent.app.vienna_package import RNAfold, RNAsubopt
522
        #     r = RNAfold(WorkingDir="/tmp")
523
        #     res = r([self.seq])
524
        #     self.ss_log = res['StdOut'].read()
525
        #     return self.ss_log.strip().split('\n')[-1].split()[0]
526
527
        elif method == "ipknot":
528
            self.ss_log = subprocess.check_output(IPKNOT_PATH + ' ' + tf.name, shell=True)
529
            return '\n'.join(self.ss_log.decode().split('\n')[2:])
530
531
        elif method == "contextfold":
532
            if path:
533
                CONTEXTFOLD_PATH = path
534
            if not CONTEXTFOLD_PATH:
0 ignored issues
show
introduced by
The variable CONTEXTFOLD_PATH does not seem to be defined in case path on line 532 is False. Are you sure this can never be the case?
Loading history...
535
                print('Set up CONTEXTFOLD_PATH in configuration.')
536
                sys.exit(0)
537
            cmd = "cd " + CONTEXTFOLD_PATH + \
538
                  " && java -cp bin contextFold.app.Predict in:" + self.seq
539
            if verbose:
540
                print(cmd)
541
            self.ss_log = subprocess.check_output(cmd, shell=True).decode()
542
            return '\n'.join(self.ss_log.split('\n')[1:])
543
544
        elif method == "centroid_fold":
545
            self.ss_log = subprocess.check_output('centroid_fold ' + tf.name, shell=True)
546
            return '\n'.join(self.ss_log.split('\n')[2:])
547
548
        elif method == "rnastructure":
549
            cmd = RNASTRUCTURE_PATH + '/exe/Fold ' + tf.name + ' ' + tf.name + '.out '
550
            if shapefn:
551
                cmd += ' -sh ' + shapefn
552
            if verbose:
553
                print(cmd)
554
            o = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
555
            stderr = o.stderr.read().strip()
556
            if stderr:
557
                print(stderr)
558
559
            cmd = RNASTRUCTURE_PATH + '/exe/ct2dot ' + tf.name + '.out 1 ' + \
560
                             tf.name + '.dot'
561
            if verbose:
562
                print(cmd)
563
            o = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
564
            stderr = o.stderr.read().strip()
565
            if not stderr:
566
                with open(tf.name + '.dot') as f:
567
                    return f.read().strip()
568
569
        # (-51.15, '.(.(((((((((((((((..))))))))))))))))(..((((((((....)))).))))).')
570
        elif method == "rnastructure_CycleFold":
571
            cmd = RNASTRUCTURE_PATH + '/exe/CycleFold ' + tf.name + ' > ' + tf.name + '.ct '
572
            if verbose:
573
                print(cmd)
574
            o = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
575
            stderr = o.stderr.read().strip()
576
            if stderr:
577
                print(stderr)
578
579
            # get energy
580
            energy = float(open(tf.name + '.ct').readline().split("energy:")[1].strip())  # >rna_seq	energy: -51.1500
581
582
            # get ss in dot-bracket notation
583
            cmd = RNASTRUCTURE_PATH + '/exe/ct2dot ' + tf.name + '.ct 1 ' + \
584
                             tf.name + '.dot'
585
            if verbose:
586
                print(cmd)
587
            o = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
588
            stderr = o.stderr.read().strip()
589
            if not stderr:
590
                with open(tf.name + '.dot') as f:
591
                    # (-51.15, '.(.(((((((((((((((..))))))))))))))))(..((((((((....)))).))))).')
592
                    return energy, f.read().strip().split('\n')[2]
593
594
595
        else:
596
            raise MethodNotChosen('You have to define a correct method to use.')
597
598
599
# main
600
601
def load_fasta_ss_into_RNAseqs(fn, debug=True):
602
    seqs = []
603
    with open(fn) as f:
604
        for line in f:
605
            if debug: print(line)
606
            name = line.replace('>', '').strip()
607
            seq = next(f).strip()
608
            ss = next(f).strip()
609
            rs = RNASequence(seq, ss, name)
610
            seqs.append(rs)
611
    return seqs
612
613
if __name__ == '__main__':
614
    #import doctest
615
    #doctest.testmod()
616
617
    seq = RNASequence('GAUCguaaGAUC')
618
    seq.name = 'RNA01'
619
    print(seq.predict_ss("mcfold"))
620
    sys.exit(1)
621
622
    seq = RNASequence("CGCUUCAUAUAAUCCUAAUGAUAUGGUUUGGGAGUUUCUACCAAGAGCCUUAAACUCUUGAUUAUGAAGUG")
623
    seq.name = 'RNA01'
624
    print(seq.predict_ss("RNAfold",
625
                          constraints="((((...............................................................))))"))  # noqa
626
627
    seq = RNASequence("CGCUUCAUAUAAUCCUAAUGAUAUGGUUUGGGAGUUUCUACCAAGAGCCUUAAACUCUUGAUUAUGAAGUG")
628
    seq.name = 'RNA02'
629
    print(seq.predict_ss("RNAsubopt",
630
                         constraints="((((...............................................................))))"))  # noqa
631
632
    print(seq.predict_ss("contextfold"))
633
    print(seq.predict_ss(method="ipknot"))
634
635
    verbose = False
636
    seq = RNASequence("GGGGUUUUCCC")
637
    print(seq.predict_ss("rnastructure", verbose=verbose))
638
    print(seq.predict_ss("rnastructure", shapefn="data/shape.txt", verbose=verbose))
639
640
    seq = RNASequence("CGUGGUUAGGGCCACGUUAAAUAGUUGCUUAAGCCCUAAGCGUUGAUAAAUAUCAGgUGCAA")
641
    print(seq.predict_ss("rnastructure", shapefn="data/shape.txt", verbose=verbose))
642
643
    #
644
    # test of MethodNotChose
645
    # print(seq.predict_ss("test"))
646