Passed
Push — master ( c026c2...632976 )
by Daniel
06:58
created

amd.io._parse_sitesym_pymatgen()   F

Complexity

Conditions 19

Size

Total Lines 75
Code Lines 56

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
eloc 56
dl 0
loc 75
rs 0.5999
c 0
b 0
f 0
cc 19
nop 1

How to fix   Long Method    Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:

Complexity

Complex classes like amd.io._parse_sitesym_pymatgen() often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

1
"""Tools for reading crystals from files, or from the CSD with 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
coding-style introduced by
Too many lines in module (1432/1000)
Loading history...
2
``csd-python-api``. The readers return :class:`.periodicset.PeriodicSet`
3
objects representing the crystal which can be passed to 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
4
:func:`.calculate.AMD` and :func:`.calculate.PDD` to get their invariants.
5
"""
6
7
import os
8
import re
9
import functools
10
import warnings
11
from typing import Callable, Iterable, Tuple
12
13
import numpy as np
14
import numba
15
16
import ase.io.cif
17
import ase.data
18
import ase.spacegroup
19
import ase.spacegroup.spacegroup
20
21
from .utils import cellpar_to_cell
22
from .periodicset import PeriodicSet
23
from .data import CIF_TAGS
24
25
26
def _custom_warning(message, category, filename, lineno, *args, **kwargs):
0 ignored issues
show
Unused Code introduced by
The argument filename seems to be unused.
Loading history...
Unused Code introduced by
The argument lineno seems to be unused.
Loading history...
Unused Code introduced by
The argument args seems to be unused.
Loading history...
Unused Code introduced by
The argument kwargs seems to be unused.
Loading history...
27
    return f'{category.__name__}: {message}\n'
28
29
warnings.formatwarning = _custom_warning
30
31
_EQUIV_SITE_TOL = 1e-3
32
_DISORDER_OPTIONS = {'skip', 'ordered_sites', 'all_sites'}
33
34
35
class _Reader:
36
37
    def __init__(
0 ignored issues
show
best-practice introduced by
Too many arguments (6/5)
Loading history...
38
            self,
39
            remove_hydrogens=False,
40
            disorder='skip',
41
            heaviest_component=False,
42
            molecular_centres=False,
43
            show_warnings=True
44
    ):
45
46
        if disorder not in _DISORDER_OPTIONS:
47
            msg = 'disorder parameter must be one of ' \
48
                  f'{_DISORDER_OPTIONS} (passed {disorder})'
49
            raise ValueError(msg)
50
51
        self.remove_hydrogens = remove_hydrogens
52
        self.disorder = disorder
53
        self.heaviest_component = heaviest_component
54
        self.molecular_centres = molecular_centres
55
        self.show_warnings = show_warnings
56
        self._generator = None
57
58
    def __iter__(self):
59
        yield from self._generator
0 ignored issues
show
introduced by
Non-iterable value self._generator is used in an iterating context
Loading history...
60
61
    def read_one(self):
62
        """Read the first item."""
63
        try:
64
            return next(iter(self._generator))
65
        except StopIteration:
66
            return None
67
68
69
class CifReader(_Reader):
70
    """Read all structures in a .cif file or all files in a folder
71
    with ase or csd-python-api (if installed), yielding
72
    :class:`.periodicset.PeriodicSet` s.
73
74
    Parameters
75
    ----------
76
    path : str
77
        Path to a .cif file or directory. (Other files are accepted when using
78
        ``reader='ccdc'``, if csd-python-api is installed.)
79
    reader : str, optional
80
        The backend package used for parsing. Default is :code:`ase`,
81
        to use csd-python-api change to :code:`ccdc`. The ccdc reader should 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
82
        be able to read any format accepted by :class:`ccdc.io.EntryReader`,
83
        though only cifs have been tested.
84
    remove_hydrogens : bool, optional
85
        Remove Hydrogens from the crystal.
86
    disorder : str, optional
87
        Controls how disordered structures are handled. Default is ``skip``
88
        which skips any crystal with disorder, since disorder conflicts with
89
        the periodic set model. To read disordered structures anyway, choose
90
        either :code:`ordered_sites` to remove atoms with disorder or
91
        :code:`all_sites` include all atoms regardless of disorder.
92
    heaviest_component : bool, optional
93
        csd-python-api only. Removes all but the heaviest molecule in the
94
        asymmeric unit, intended for removing solvents.
95
    molecular_centres : bool, default False
96
        csd-python-api only. Extract the centres of molecules in the unit cell
97
        and store in the attribute molecular_centres.
98
    show_warnings : bool, optional
99
        Controls whether warnings that arise during reading are printed.
100
101
    Yields
102
    ------
103
    :class:`.periodicset.PeriodicSet`
104
        Represents the crystal as a periodic set, consisting of a finite set
105
        of points (motif) and lattice (unit cell). Contains other data, e.g.
106
        the crystal's name and information about the asymmetric unit.
107
108
    Examples
109
    --------
110
    
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
111
        ::
112
113
            # Put all crystals in a .CIF in a list
114
            structures = list(amd.CifReader('mycif.cif'))
115
116
            # Can also accept path to a directory, reading all files inside
117
            structures = list(amd.CifReader('path/to/folder'))
118
119
            # Reads just one if the .CIF has just one crystal
120
            periodic_set = amd.CifReader('mycif.cif').read_one()
121
122
            # List of AMDs (k=100) of crystals in a .CIF
123
            amds = [amd.AMD(periodic_set, 100) for periodic_set in amd.CifReader('mycif.cif')]
124
    """
125
126
    def __init__(
0 ignored issues
show
best-practice introduced by
Too many arguments (8/5)
Loading history...
127
            self,
128
            path,
129
            reader='ase',
130
            remove_hydrogens=False,
131
            disorder='skip',
132
            heaviest_component=False,
133
            molecular_centres=False,
134
            show_warnings=True,
135
    ):
136
137
        super().__init__(
138
            remove_hydrogens=remove_hydrogens,
139
            disorder=disorder,
140
            heaviest_component=heaviest_component,
141
            molecular_centres=molecular_centres,
142
            show_warnings=show_warnings
143
        )
144
145
        if reader != 'ccdc':
146
147
            if heaviest_component:
148
                msg = 'Parameter heaviest_component ' \
149
                      'only implemented for reader="ccdc".'
150
                raise NotImplementedError(msg)
151
152
            if molecular_centres:
153
                msg = 'Parameter molecular_centres ' \
154
                      'only implemented for reader="ccdc".'
155
                raise NotImplementedError(msg)
156
157
        if reader in ('ase', 'pycodcif'):
158
159
            extensions = {'cif'}
160
            file_parser = functools.partial(ase.io.cif.parse_cif, 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
161
                                            reader=reader)
162
            converter = functools.partial(
163
                periodicset_from_ase_cifblock,
164
                remove_hydrogens=self.remove_hydrogens,
165
                disorder=self.disorder
166
            )
167
168
        elif reader == 'pymatgen':
169
170
            extensions = {'cif'}
171
            file_parser = self._pymatgen_cifblock_generator
172
            converter = functools.partial(
173
                periodicset_from_pymatgen_cifblock,
174
                remove_hydrogens=self.remove_hydrogens,
175
                disorder=self.disorder
176
            )
177
178
        elif reader == 'gemmi':
179
180
            try:
181
                import gemmi.cif
0 ignored issues
show
introduced by
Import outside toplevel (gemmi.cif)
Loading history...
182
            except ImportError as _:
183
                raise ImportError('Failed to import gemmi.')
184
185
            extensions = {'cif'}
186
            file_parser = gemmi.cif.read_file
187
            converter = functools.partial(
188
                periodicset_from_gemmi_block,
189
                remove_hydrogens=self.remove_hydrogens,
190
                disorder=self.disorder
191
            )
192
193
        elif reader == 'ccdc':
194
195
            try:
196
                import ccdc.io
0 ignored issues
show
introduced by
Import outside toplevel (ccdc.io)
Loading history...
197
            except (ImportError, RuntimeError) as _:
198
                msg = 'Failed to import csd-python-api, please'\
199
                      'check it is installed and licensed.'
200
                raise ImportError(msg)
201
202
            extensions = ccdc.io.EntryReader.known_suffixes
203
            file_parser = ccdc.io.EntryReader
204
            converter = functools.partial(
205
                periodicset_from_ccdc_entry,
206
                remove_hydrogens=self.remove_hydrogens,
207
                disorder=self.disorder,
208
                molecular_centres=self.molecular_centres,
209
                heaviest_component=self.heaviest_component
210
            )
211
212
        else:
213
            raise ValueError(f'Unknown reader {reader}.')
214
215
        if os.path.isfile(path):
216
            generator = file_parser(path)
217
        elif os.path.isdir(path):
218
            generator = self._generate_from_dir(path, file_parser, extensions)
219
        else:
220
            raise FileNotFoundError(f'No such file or directory: {path}')
221
222
        self._generator = _map(converter, generator, self.show_warnings)
223
224
    def _generate_from_dir(self, path, file_parser, extensions):
0 ignored issues
show
Coding Style introduced by
This method could be written as a function/class method.

If a method does not access any attributes of the class, it could also be implemented as a function or static method. This can help improve readability. For example

class Foo:
    def some_method(self, x, y):
        return x + y;

could be written as

class Foo:
    @classmethod
    def some_method(cls, x, y):
        return x + y;
Loading history...
225
        for file in os.listdir(path):
226
            suff = os.path.splitext(file)[1][1:]
227
            if suff.lower() in extensions:
228
                yield from file_parser(os.path.join(path, file))
229
230
    def _pymatgen_cifblock_generator(path):
0 ignored issues
show
Coding Style Best Practice introduced by
Methods should have self as first argument.

It is a widespread convention and generally a good practice to name the first argument of methods self.

class SomeClass:
    def some_method(self):
        # ... do something
Loading history...
231
        """Path to .cif --> generator of pymatgen CifBlocks."""
232
        from pymatgen.io.cif import CifFile
0 ignored issues
show
introduced by
Unable to import 'pymatgen.io.cif'
Loading history...
introduced by
Import outside toplevel (pymatgen.io.cif.CifFile)
Loading history...
233
        yield from CifFile.from_file(path).data.values()
234
235
236
class CSDReader(_Reader):
237
    """Read structures from the CSD with csd-python-api, yielding 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
238
    :class:`.periodicset.PeriodicSet` s.
239
240
    Parameters
241
    ----------
242
    refcodes : str or List[str], optional
243
        Single or list of CSD refcodes to read. If None or 'CSD', iterates
244
        over the whole CSD.
245
    families : bool, optional
246
        Read all entries whose refcode starts with the given strings, or
247
        'families' (e.g. giving 'DEBXIT' reads all entries starting with
248
        DEBXIT).
249
    remove_hydrogens : bool, optional
250
        Remove hydrogens from the crystal.
251
    disorder : str, optional
252
        Controls how disordered structures are handled. Default is ``skip``
253
        which skips any crystal with disorder, since disorder conflicts with
254
        the periodic set model. To read disordered structures anyway, choose
255
        either :code:`ordered_sites` to remove atoms with disorder or
256
        :code:`all_sites` include all atoms regardless of disorder.
257
    heaviest_component : bool, optional
258
        Removes all but the heaviest molecule in the asymmeric unit, intended
259
        for removing solvents.
260
    molecular_centres : bool, default False
261
        Extract the centres of molecules in the unit cell and store in
262
        attribute molecular_centres.
263
    show_warnings : bool, optional
264
        Controls whether warnings that arise during reading are printed.
265
266
    Yields
267
    ------
268
    :class:`.periodicset.PeriodicSet`
269
        Represents the crystal as a periodic set, consisting of a finite set
270
        of points (motif) and lattice (unit cell). Contains other useful data,
271
        e.g. the crystal's name and information about the asymmetric unit for
272
        calculation.
273
274
    Examples
275
    --------
276
    
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
277
        ::
278
279
            # Put these entries in a list
280
            refcodes = ['DEBXIT01', 'DEBXIT05', 'HXACAN01']
281
            structures = list(amd.CSDReader(refcodes))
282
283
            # Read refcode families (any whose refcode starts with strings in the list)
284
            refcode_families = ['ACSALA', 'HXACAN']
285
            structures = list(amd.CSDReader(refcode_families, families=True))
286
287
            # Get AMDs (k=100) for crystals in these families
288
            refcodes = ['ACSALA', 'HXACAN']
289
            amds = []
290
            for periodic_set in amd.CSDReader(refcodes, families=True):
291
                amds.append(amd.AMD(periodic_set, 100))
292
293
            # Giving the reader nothing reads from the whole CSD.
294
            reader = amd.CSDReader()
295
            
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
296
            # looping over this generic reader will yield all CSD entries
297
            for periodic_set in reader:
298
                ...
299
            
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
300
            # or, read structures by refcode on demand
301
            debxit01 = reader.entry('DEBXIT01')
302
    """
303
304
    def __init__(
0 ignored issues
show
best-practice introduced by
Too many arguments (8/5)
Loading history...
305
            self,
306
            refcodes=None,
307
            families=False,
308
            remove_hydrogens=False,
309
            disorder='skip',
310
            heaviest_component=False,
311
            molecular_centres=False,
312
            show_warnings=True,
313
    ):
314
315
        super().__init__(
316
            remove_hydrogens=remove_hydrogens,
317
            disorder=disorder,
318
            heaviest_component=heaviest_component,
319
            molecular_centres=molecular_centres,
320
            show_warnings=show_warnings,
321
        )
322
323
        try:
324
            import ccdc.io
0 ignored issues
show
introduced by
Import outside toplevel (ccdc.io)
Loading history...
325
        except (ImportError, RuntimeError) as _:
326
            msg = 'Failed to import csd-python-api, please'\
327
                  'check it is installed and licensed.'
328
            raise ImportError(msg)
329
330
        if isinstance(refcodes, str) and refcodes.lower() == 'csd':
331
            refcodes = None
332
333
        if refcodes is None:
334
            families = False
335
        else:
336
            if isinstance(refcodes, str):
337
                refcodes = [refcodes]
338
            else:
339
                refcodes = list(refcodes)
340
341
        if families:
342
            refcodes = _refcodes_from_families(refcodes)
343
344
        self._entry_reader = ccdc.io.EntryReader('CSD')
345
        converter = functools.partial(
346
            periodicset_from_ccdc_entry,
347
            remove_hydrogens=self.remove_hydrogens,
348
            disorder=self.disorder,
349
            molecular_centres=self.molecular_centres,
350
            heaviest_component=self.heaviest_component
351
        )
352
        generator = self._ccdc_generator(refcodes)
353
        self._generator = _map(converter, generator, self.show_warnings)
354
355
    def entry(self, refcode: str, **kwargs) -> PeriodicSet:
356
        """Read a crystal given a CSD refcode, returning a
357
        :class:`.periodicset.PeriodicSet`. If given kwargs, overrides the
358
        kwargs given to the Reader."""
359
360
        try:
361
            entry = self._entry_reader.entry(refcode)
362
        except RuntimeError:
363
            warnings.warn(f'{refcode} not found in database')
364
365
        kwargs_ = {
366
            'remove_hydrogens': self.remove_hydrogens,
367
            'disorder': self.disorder,
368
            'heaviest_component': self.heaviest_component,
369
            'molecular_centres': self.molecular_centres,
370
        }
371
        kwargs_.update(kwargs)
372
        converter = functools.partial(periodicset_from_ccdc_entry, **kwargs_)
373
374
        if 'show_warnings' in kwargs:
0 ignored issues
show
Unused Code introduced by
Consider using dict.get for getting values from a dict if a key is present or a default if not
Loading history...
375
            show_warnings = kwargs['show_warnings']
376
        else:
377
            show_warnings = self.show_warnings
378
379
        try:
380
            periodic_set = next(iter(_map(converter, [entry], show_warnings)))
381
        except StopIteration:
382
            periodic_set = None
383
384
        return periodic_set
385
386
    def family(self, refcode_family: str, **kwargs):
0 ignored issues
show
introduced by
Missing function or method docstring
Loading history...
387
388
        kwargs_ = {
389
            'remove_hydrogens': self.remove_hydrogens,
390
            'disorder': self.disorder,
391
            'heaviest_component': self.heaviest_component,
392
            'molecular_centres': self.molecular_centres
393
        }
394
395
        kwargs_.update(kwargs)
396
        converter = functools.partial(periodicset_from_ccdc_entry, **kwargs_)
397
        refcodes = _refcodes_from_families([refcode_family])
398
        generator = self._ccdc_generator(refcodes)
399
400
        if 'show_warnings' in kwargs:
0 ignored issues
show
Unused Code introduced by
Consider using dict.get for getting values from a dict if a key is present or a default if not
Loading history...
401
            show_warnings = kwargs['show_warnings']
402
        else:
403
            show_warnings = self.show_warnings
404
405
        yield from _map(converter, generator, show_warnings)
406
407
    def _ccdc_generator(self, refcodes):
408
        """Generates ccdc Entries from CSD refcodes."""
409
410
        if refcodes is None:
411
            for entry in self._entry_reader:
412
                yield entry
413
        else:
414
            for refcode in refcodes:
415
                try:
416
                    entry = self._entry_reader.entry(refcode)
417
                    yield entry
418
                except RuntimeError:
419
                    warnings.warn(f'{refcode} not found in database')
420
421
422
class ParseError(ValueError):
423
    """Raised when an item cannot be parsed into a periodic set."""
424
    pass
0 ignored issues
show
Unused Code introduced by
Unnecessary pass statement
Loading history...
425
426
427
def periodicset_from_ase_cifblock(
428
        block,
429
        remove_hydrogens=False,
430
        disorder='skip'
431
) -> PeriodicSet:
432
    """:class:`ase.io.cif.CIFBlock` --> :class:`amd.periodicset.PeriodicSet`. 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
433
    CIFBlock is the type returned by :class:`ase.io.cif.parse_cif`.
434
435
    Parameters
436
    ----------
437
    block : :class:`ase.io.cif.CIFBlock`
438
        An ase CIFBlock object representing a crystal.
439
    remove_hydrogens : bool, optional
440
        Remove Hydrogens from the crystal.
441
    disorder : str, optional
442
        Controls how disordered structures are handled. Default is ``skip``
443
        which raises a ParseError if the input has any disorder, since
444
        disorder conflicts with the periodic set model. To read disordered
445
        structures anyway, choose either :code:`ordered_sites` to remove sites
446
        with disorder or :code:`all_sites` include all sites regardless.
447
448
    Returns
449
    -------
450
    :class:`.periodicset.PeriodicSet`
451
        Represents the crystal as a periodic set, consisting of a finite set
452
        of points (motif) and lattice (unit cell). Contains other useful data,
453
        e.g. the crystal's name and information about the asymmetric unit for
454
        calculation.
455
456
    Raises
457
    ------
458
    ParseError :
459
        Raised if the structure fails to be parsed for any of the following:
460
        1. Required data is missing (e.g. cell parameters),
461
        2. disorder == 'skip' and any of:
462
            (a) any atom has fractional occupancy,
463
            (b) any atom's label ends with '?',
464
        3. The motif is empty after removing H or disordered sites.
465
    """
466
467
    # get cell
468
    try:
469
        cellpar = [block[tag] for tag in CIF_TAGS['cellpar']]
470
    except KeyError:
471
        raise ParseError(f'{block.name} has missing cell data')
472
    cell = cellpar_to_cell(*cellpar)
473
474
    # asymmetric unit fractional coords
475
    asym_unit = [block.get(name) for name in CIF_TAGS['atom_site_fract']]
476
    if None in asym_unit: # missing fractional coords, try cartesian
477
        asym_motif = [block.get(name) for name in CIF_TAGS['atom_site_cartn']]
478
        if None in asym_motif:
479
            raise ParseError(f'{block.name} has missing coordinate data')
480
        asym_unit = np.array(asym_motif) @ np.linalg.inv(cell)
481
482
    elif any(None in coords for coords in asym_unit):
483
        raise ParseError(f'{block.name} has missing coordinate data')
484
485
    # if there's a ?, . or other string, remove site
486
    if any(isinstance(c, str) for coords in asym_unit for c in coords):
487
        warnings.warn(f'atoms without sites or unparsable data will be removed')
0 ignored issues
show
introduced by
Using an f-string that does not have any interpolated variables
Loading history...
488
        asym_unit_ = [[], [], []]
489
        for xyz in zip(*asym_unit):
490
            if not any(isinstance(coord, str) for coord in xyz):
491
                for i in range(3):
492
                    asym_unit_[i].append(xyz[i])
493
        asym_unit = asym_unit_
494
495
    asym_unit = np.mod(np.array(asym_unit).T, 1)
496
    # asym_unit = _snap_small_prec_coords(asym_unit)
497
498
    # atomic types
499
    for tag in CIF_TAGS['atom_symbol']:
500
        if tag in block:
501
            asym_types = []
502
            for label in block[tag]:
503
                if label in ('.', '?'):
504
                    warnings.warn(f'missing atomic type data')
0 ignored issues
show
introduced by
Using an f-string that does not have any interpolated variables
Loading history...
505
                    asym_types.append(0)
506
                # remove additional labels
507
                sym = re.search(r'([A-Z][a-z]?)', label).group(0)
508
                if sym == 'D': sym = 'H'
0 ignored issues
show
Coding Style introduced by
More than one statement on a single line
Loading history...
509
                asym_types.append(ase.data.atomic_numbers[sym])
510
            break
511
    else:
512
        warnings.warn(f'missing atomic types')
0 ignored issues
show
introduced by
Using an f-string that does not have any interpolated variables
Loading history...
513
        asym_types = [0 for _ in range(len(asym_unit))]
514
515
    # symmetry operations
516
    sitesym = []
517
    for tag in CIF_TAGS['symop']: # try symop tags first
518
        if tag in block:
519
            sitesym = block[tag]
520
            break
521
    if isinstance(sitesym, str):
522
        sitesym = [sitesym]
523
    if not sitesym: # if no symops, get ops from spacegroup
524
        try:
525
            spg = block.get_spacegroup(True)
526
            rot, trans = spg.rotations, spg.translations
527
        except:
0 ignored issues
show
Coding Style Best Practice introduced by
General except handlers without types should be used sparingly.

Typically, you would use general except handlers when you intend to specifically handle all types of errors, f.e. when logging. Otherwise, such general error handlers can mask errors in your application that you want to know of.

Loading history...
528
            rot, trans = ase.spacegroup.spacegroup.parse_sitesym(['x,y,z'])
529
    else:
530
        rot, trans = ase.spacegroup.spacegroup.parse_sitesym(sitesym)
531
532
    # (if needed) remove Hydrogens, fract occupancy atoms
533
    remove_sites = [] # indices of asymmetric unit sites to remove
534
    occupancies = block.get('_atom_site_occupancy')
535
    labels = block.get('_atom_site_label')
536
    if occupancies is not None:
537
        if disorder == 'skip':
538
            if any(_atom_has_disorder(lab, occ) for lab, occ in zip(labels, occupancies)):
539
                raise ParseError(f'{block.name} has disorder')
540
        elif disorder == 'ordered_sites':
541
            remove_sites.extend(
542
                (i for i, (lab, occ) in enumerate(zip(labels, occupancies))
0 ignored issues
show
introduced by
The variable i does not seem to be defined for all execution paths.
Loading history...
543
                 if _atom_has_disorder(lab, occ)))
544
545
    if remove_hydrogens:
546
        remove_sites.extend((i for i, num in enumerate(asym_types) if num == 1))
547
548
    asym_unit = np.delete(asym_unit, remove_sites, axis=0)
549
    asym_types = [s for i, s in enumerate(asym_types) if i not in remove_sites]
550
551
    # if disorder == 'all_sites', don't remove overlapping sites
552
    if disorder != 'all_sites':
553
        keep_sites = _unique_sites(asym_unit)
554
        if not np.all(keep_sites):
555
            warnings.warn(f'may have overlapping sites; duplicates will be removed')
0 ignored issues
show
introduced by
Using an f-string that does not have any interpolated variables
Loading history...
556
        asym_unit = asym_unit[keep_sites]
557
        asym_types = [sym for sym, keep in zip(asym_types, keep_sites) if keep]
558
559
    if asym_unit.shape[0] == 0:
560
        raise ParseError(f'{block.name} has no valid sites')
561
562
    frac_motif, asym_inds, wyc_muls, inverses = _expand_asym_unit(asym_unit, rot, trans)
563
    full_types = np.array([asym_types[i] for i in inverses])
564
    motif = frac_motif @ cell
565
566
    kwargs = {
567
        'name': block.name,
568
        'asymmetric_unit': asym_inds,
569
        'wyckoff_multiplicities': wyc_muls,
570
        'types': full_types
571
    }
572
573
    return PeriodicSet(motif, cell, **kwargs)
574
575
576
# TODO: entrie function needs to be finished.
0 ignored issues
show
Coding Style introduced by
TODO and FIXME comments should generally be avoided.
Loading history...
577
def periodicset_from_ase_atoms(
578
        atoms,
579
        remove_hydrogens=False,
580
        disorder='skip'
581
) -> PeriodicSet:
582
    """:class:`ase.atoms.Atoms` --> :class:`amd.periodicset.PeriodicSet`.
583
584
    Parameters
585
    ----------
586
    atoms : :class:`ase.atoms.Atoms`
587
        An ase Atoms object representing a crystal.
588
    remove_hydrogens : bool, optional
589
        Remove Hydrogens from the crystal.
590
    disorder : str, optional
591
        Controls how disordered structures are handled. Default is ``skip``
592
        which raises a ParseError if the input has any disorder, since
593
        disorder conflicts with the periodic set model. To read disordered
594
        structures anyway, choose either :code:`ordered_sites` to remove sites
595
        with disorder or :code:`all_sites` include all sites regardless.
596
597
    Returns
598
    -------
599
    :class:`.periodicset.PeriodicSet`
600
        Represents the crystal as a periodic set, consisting of a finite set
601
        of points (motif) and lattice (unit cell). Contains other useful data,
602
        e.g. the crystal's name and information about the asymmetric unit for
603
        calculation.
604
605
    Raises
606
    ------
607
    ParseError      ## rewrite!!
608
        Raised if the structure can/should not be parsed for the following reasons:
609
        1. no sites found or motif is empty after removing H or disordered sites,
610
        2. a site has missing coordinates,
611
        3. disorder == 'skip' and any of:
612
            (a) any atom has fractional occupancy,
613
            (b) any atom's label ends with '?'.
614
    """
615
616
    if remove_hydrogens:
617
        for i in np.where(atoms.get_atomic_numbers() == 1)[0][::-1]:
618
            atoms.pop(i)
619
620
    if disorder == 'skip':
621
        occ_info = atoms.info['occupancy']
622
        if occ_info is not None:
623
            for d in occ_info.values():
624
                for occ in d.values():
625
                    if float(occ) < 1:
626
                        raise ParseError(f'Ase atoms object has disorder')
0 ignored issues
show
introduced by
Using an f-string that does not have any interpolated variables
Loading history...
627
628
    elif disorder == 'ordered_sites':
629
        occ_info = atoms.info['occupancy']
630
        if occ_info is not None:
631
            remove_inds = []
632
            for i, d in occ_info.items():
633
                for occ in d.values():
634
                    if float(occ) < 1:
635
                        remove_inds.append(int(i))
636
637
    # .pop removes just one atom. 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
638
639
    # if disorder == 'skip':
640
    #     if not structure.is_ordered:
641
    #         raise ParseError(f'{name} has disorder')
642
    # elif disorder == 'ordered_sites':
643
    #     remove_inds = []
644
    #     for i, comp in enumerate(structure.species_and_occu):
645
    #         if comp.num_atoms < 1:
646
    #             remove_inds.append(i)
647
    #     structure.remove_sites(remove_inds)
648
649
    cell = atoms.get_cell().array
650
    sg = atoms.info['spacegroup']
651
    asym_unit = ase.spacegroup.get_basis(atoms)
652
    frac_motif, asym_inds, multiplicities, inverses = _expand_asym_unit(asym_unit, sg.rotations, sg.translations)
0 ignored issues
show
Unused Code introduced by
The variable inverses seems to be unused.
Loading history...
653
    motif = frac_motif @ cell
654
    
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
655
    kwargs = {
656
        'name': None,
657
        'asymmetric_unit': asym_inds,
658
        'wyckoff_multiplicities': multiplicities,
659
        'types': atoms.get_atomic_numbers()
660
    }
661
662
    return PeriodicSet(motif, cell, **kwargs)
663
664
665
def periodicset_from_pymatgen_cifblock(
666
        block,
667
        remove_hydrogens=False,
668
        disorder='skip',
669
) -> PeriodicSet:
670
    """:class:`pymatgen.io.cif.CifBlock` --> 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
671
    :class:`amd.periodicset.PeriodicSet`.
672
673
    Parameters
674
    ----------
675
    block : :class:`pymatgen.core.cif.CifBlock`
676
        A pymatgen CifBlock object representing a crystal.
677
    remove_hydrogens : bool, optional
678
        Remove Hydrogens from the crystal.
679
    disorder : str, optional
680
        Controls how disordered structures are handled. Default is ``skip``
681
        which raises a ParseError if the input has any disorder, since
682
        disorder conflicts with the periodic set model. To read disordered
683
        structures anyway, choose either :code:`ordered_sites` to remove sites
684
        with disorder or :code:`all_sites` include all sites regardless.
685
686
    Returns
687
    -------
688
    :class:`.periodicset.PeriodicSet`
689
        Represents the crystal as a periodic set, consisting of a finite set
690
        of points (motif) and lattice (unit cell). Contains other useful data,
691
        e.g. the crystal's name and information about the asymmetric unit for
692
        calculation.
693
694
    Raises
695
    ------
696
    ParseError
697
        Raised if the structure can/should not be parsed for the following reasons:
698
        1. no sites found or motif is empty after removing H or disordered sites,
699
        2. a site has missing coordinates,
700
        3. disorder == 'skip' and any of:
701
            (a) any atom has fractional occupancy,
702
            (b) any atom's label ends with '?'.
703
    """
704
705
    from pymatgen.io.cif import str2float
0 ignored issues
show
introduced by
Unable to import 'pymatgen.io.cif'
Loading history...
introduced by
Import outside toplevel (pymatgen.io.cif.str2float)
Loading history...
706
    from pymatgen.core.periodic_table import _pt_data
0 ignored issues
show
introduced by
Unable to import 'pymatgen.core.periodic_table'
Loading history...
introduced by
Import outside toplevel (pymatgen.core.periodic_table._pt_data)
Loading history...
707
708
    odict = block.data
709
710
    try:
711
        cellpar = [str2float(odict[tag]) for tag in CIF_TAGS['cellpar']]
712
        cell = cellpar_to_cell(*cellpar)
713
    except KeyError:
714
        raise ParseError(f'{block.header} has missing cell information')
715
716
    # check for missing data ?
717
    asym_unit = [[str2float(s) for s in odict.get(name)] 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
718
                 for name in CIF_TAGS['atom_site_fract']]
719
    if None in asym_unit:
720
        asym_motif = [[str2float(s) for s in odict.get(name)] 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
721
                      for name in CIF_TAGS['atom_site_cartn']]
722
        if None in asym_motif:
723
            raise ParseError(f'{block.header} has no sites')
724
        asym_unit = np.array(asym_motif) @ np.linalg.inv(cell)
725
726
    asym_unit = np.mod(np.array(asym_unit).T, 1)
727
    # asym_unit = _snap_small_prec_coords(asym_unit)
728
729
    # check _atom_site_label as well?
730
    symbols = odict.get('_atom_site_type_symbol')
731
    if symbols is None:
732
        warnings.warn(f'missing atomic types')
0 ignored issues
show
introduced by
Using an f-string that does not have any interpolated variables
Loading history...
733
        asym_types = [0 for _ in range(len(asym_unit))]
734
    else:
735
        asym_types = [_pt_data[s]['Atomic no'] for s in symbols]
736
737
    remove_sites = []
738
739
    occupancies = odict.get('_atom_site_occupancy')
740
    labels = odict.get('_atom_site_label')
741
    if occupancies is not None:
742
        if disorder == 'skip':
743
            if any(_atom_has_disorder(lab, occ) for lab, occ in zip(labels, occupancies)):
744
                raise ParseError(f'{block.header} has disorder')
745
        elif disorder == 'ordered_sites':
746
            remove_sites.extend(
747
                (i for i, (lab, occ) in enumerate(zip(labels, occupancies))
0 ignored issues
show
Comprehensibility Best Practice introduced by
The variable i does not seem to be defined.
Loading history...
748
                 if _atom_has_disorder(lab, occ)))
749
750
    if remove_hydrogens:
751
        remove_sites.extend((i for i, num in enumerate(asym_types) if num == 1))
752
753
    asym_unit = np.delete(asym_unit, remove_sites, axis=0)
754
    asym_types = [s for i, s in enumerate(asym_types) if i not in remove_sites]
755
756
    if disorder != 'all_sites':
757
        keep_sites = _unique_sites(asym_unit)
758
        if not np.all(keep_sites):
759
            warnings.warn(f'may have overlapping sites; duplicates will be removed')
0 ignored issues
show
introduced by
Using an f-string that does not have any interpolated variables
Loading history...
760
        asym_unit = asym_unit[keep_sites]
761
        asym_types = [sym for sym, keep in zip(asym_types, keep_sites) if keep]
762
763
    if asym_unit.shape[0] == 0:
764
        raise ParseError(f'{block.header} has no valid sites')
765
766
    rot, trans = _parse_sitesym_pymatgen(odict)
767
    frac_motif, asym_inds, wyc_muls, inverses = _expand_asym_unit(asym_unit, rot, trans)
768
    full_types = np.array([asym_types[i] for i in inverses])
769
    motif = frac_motif @ cell
770
771
    kwargs = {
772
        'name': block.header,
773
        'asymmetric_unit': asym_inds,
774
        'wyckoff_multiplicities': wyc_muls,
775
        'types': full_types
776
    }
777
778
    return PeriodicSet(motif, cell, **kwargs)
779
780
781
# TODO: add Raises to docstr
0 ignored issues
show
Coding Style introduced by
TODO and FIXME comments should generally be avoided.
Loading history...
782
def periodicset_from_pymatgen_structure(
783
        structure,
784
        remove_hydrogens=False,
785
        disorder='skip'
786
) -> PeriodicSet:
787
    """:class:`pymatgen.core.structure.Structure` --> :class:`amd.periodicset.PeriodicSet`.
788
    Does not set the name of the periodic set, as there seems to be no such attribute in 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
789
    the pymatgen Structure object.
790
791
    Parameters
792
    ----------
793
    structure : :class:`pymatgen.core.structure.Structure`
794
        A pymatgen Structure object representing a crystal.
795
    remove_hydrogens : bool, optional
796
        Remove Hydrogens from the crystal.
797
    disorder : str, optional
798
        Controls how disordered structures are handled. Default is ``skip``
799
        which raises a ParseError if the input has any disorder, since
800
        disorder conflicts with the periodic set model. To read disordered
801
        structures anyway, choose either :code:`ordered_sites` to remove sites
802
        with disorder or :code:`all_sites` include all sites regardless.
803
804
    Returns
805
    -------
806
    :class:`.periodicset.PeriodicSet`
807
        Represents the crystal as a periodic set, consisting of a finite set
808
        of points (motif) and lattice (unit cell). Contains other useful data,
809
        e.g. the crystal's name and information about the asymmetric unit for
810
        calculation.
811
    """
812
813
    from pymatgen.symmetry.analyzer import SpacegroupAnalyzer
0 ignored issues
show
introduced by
Unable to import 'pymatgen.symmetry.analyzer'
Loading history...
introduced by
Import outside toplevel (pymatgen.symmetry.analyzer.SpacegroupAnalyzer)
Loading history...
814
815
    if remove_hydrogens:
816
        structure.remove_species(['H', 'D'])
817
818
    if disorder == 'skip':
819
        if not structure.is_ordered:
820
            raise ParseError(f'Pymatgen structure object has disorder')
0 ignored issues
show
introduced by
Using an f-string that does not have any interpolated variables
Loading history...
821
    elif disorder == 'ordered_sites':
822
        remove_inds = []
823
        for i, comp in enumerate(structure.species_and_occu):
824
            if comp.num_atoms < 1:
825
                remove_inds.append(i)
826
        structure.remove_sites(remove_inds)
827
828
    motif = structure.cart_coords
829
    cell = structure.lattice.matrix
830
    sym_structure = SpacegroupAnalyzer(structure).get_symmetrized_structure()
831
    equiv_inds = sym_structure.equivalent_indices
832
833
    kwargs = {
834
        'asymmetric_unit': np.array([l[0] for l in equiv_inds]),
835
        'wyckoff_multiplicities': np.array([len(l) for l in equiv_inds]),
836
        'types': np.array(sym_structure.atomic_numbers)
837
    }
838
839
    return PeriodicSet(motif, cell, **kwargs)
840
841
842
def periodicset_from_ccdc_entry(
843
        entry,
844
        remove_hydrogens=False,
845
        disorder='skip',
846
        heaviest_component=False,
847
        molecular_centres=False
848
) -> PeriodicSet:
849
    """:class:`ccdc.entry.Entry` --> :class:`amd.periodicset.PeriodicSet`.
850
    Entry is the type returned by :class:`ccdc.io.EntryReader`.
851
852
    Parameters
853
    ----------
854
    entry : :class:`ccdc.entry.Entry`
855
        A ccdc Entry object representing a database entry.
856
    remove_hydrogens : bool, optional
857
        Remove Hydrogens from the crystal.
858
    disorder : str, optional
859
        Controls how disordered structures are handled. Default is ``skip``
860
        which raises a ParseError if the input has any disorder, since
861
        disorder conflicts with the periodic set model. To read disordered
862
        structures anyway, choose either :code:`ordered_sites` to remove sites
863
        with disorder or :code:`all_sites` include all sites regardless.
864
    heaviest_component : bool, optional
865
        Removes all but the heaviest molecule in the asymmeric unit, intended
866
        for removing solvents.
867
    molecular_centres : bool, default False
868
        Extract the centres of molecules in the unit cell and store in 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
869
        the attribute molecular_centres of the returned PeriodicSet.
870
871
    Returns
872
    -------
873
    :class:`.periodicset.PeriodicSet`
874
        Represents the crystal as a periodic set, consisting of a finite set
875
        of points (motif) and lattice (unit cell). Contains other useful data,
876
        e.g. the crystal's name and information about the asymmetric unit for
877
        calculation.
878
879
    Raises
880
    ------
881
    ParseError :
882
        Raised if the structure fails parsing for any of the following:
883
        1. entry.has_3d_structure is False,
884
        2. disorder == 'skip' and any of:
885
            (a) any disorder flag is True,
886
            (b) any atom has fractional occupancy,
887
            (c) any atom's label ends with '?',
888
        3. entry.crystal.molecule.all_atoms_have_sites is False,
889
        4. a.fractional_coordinates is None for any a in entry.crystal.disordered_molecule,
890
        5. The motif is empty after removing H, disordered sites or solvents.
891
    """
892
893
    # Entry specific flags
894
    if not entry.has_3d_structure:
895
        raise ParseError(f'{entry.identifier} has no 3D structure')
896
897
    if disorder == 'skip' and entry.has_disorder:
898
        raise ParseError(f'{entry.identifier} has disorder')
899
900
    return periodicset_from_ccdc_crystal(
901
        entry.crystal,
902
        remove_hydrogens=remove_hydrogens,
903
        disorder=disorder,
904
        heaviest_component=heaviest_component,
905
        molecular_centres=molecular_centres
906
    )
907
908
909
def periodicset_from_ccdc_crystal(
910
        crystal,
911
        remove_hydrogens=False,
912
        disorder='skip',
913
        heaviest_component=False,
914
        molecular_centres=False
915
) -> PeriodicSet:
916
    """:class:`ccdc.crystal.Crystal` --> :class:`amd.periodicset.PeriodicSet`.
917
    Crystal is the type returned by :class:`ccdc.io.CrystalReader`.
918
919
    Parameters
920
    ----------
921
    crystal : :class:`ccdc.crystal.Crystal`
922
        A ccdc Crystal object representing a crystal structure.
923
    remove_hydrogens : bool, optional
924
        Remove Hydrogens from the crystal.
925
    disorder : str, optional
926
        Controls how disordered structures are handled. Default is ``skip``
927
        which raises a ParseError if the input has any disorder, since
928
        disorder conflicts with the periodic set model. To read disordered
929
        structures anyway, choose either :code:`ordered_sites` to remove sites
930
        with disorder or :code:`all_sites` include all sites regardless.
931
    heaviest_component : bool, optional
932
        Removes all but the heaviest molecule in the asymmeric unit,
933
        intended for removing solvents.
934
    molecular_centres : bool, default False
935
        Extract the centres of molecules in the unit cell and store in 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
936
        the attribute molecular_centres of the returned PeriodicSet.
937
938
    Returns
939
    -------
940
    :class:`.periodicset.PeriodicSet`
941
        Represents the crystal as a periodic set, consisting of a finite set
942
        of points (motif) and lattice (unit cell). Contains other useful data,
943
        e.g. the crystal's name and information about the asymmetric unit for
944
        calculation.
945
946
    Raises
947
    ------
948
    ParseError :
949
        Raised if the structure fails parsing for any of the following:
950
        1. disorder == 'skip' and any of:
951
            (a) any disorder flag is True,
952
            (b) any atom has fractional occupancy,
953
            (c) any atom's label ends with '?',
954
        2. crystal.molecule.all_atoms_have_sites is False,
955
        3. a.fractional_coordinates is None for any a in crystal.disordered_molecule,
956
        4. The motif is empty after removing H, disordered sites or solvents.
957
    """
958
959
    molecule = crystal.disordered_molecule
960
961
    # if skipping disorder, check disorder flags and then all atoms
962
    if disorder == 'skip':
963
        if crystal.has_disorder or \
964
         any(_atom_has_disorder(a.label, a.occupancy) for a in molecule.atoms):
965
            raise ParseError(f'{crystal.identifier} has disorder')
966
967
    elif disorder == 'ordered_sites':
968
        molecule.remove_atoms(a for a in molecule.atoms
969
                              if _atom_has_disorder(a.label, a.occupancy))
970
971
    if remove_hydrogens:
972
        molecule.remove_atoms(a for a in molecule.atoms if a.atomic_symbol in 'HD')
973
974
    if heaviest_component and len(molecule.components) > 1:
975
        molecule = _heaviest_component(molecule)
976
977
    # remove atoms with missing coord data and warn
978
    if any(a.fractional_coordinates is None for a in molecule.atoms):
979
        warnings.warn(f'trying to remove atoms without sites')
0 ignored issues
show
introduced by
Using an f-string that does not have any interpolated variables
Loading history...
980
        molecule.remove_atoms(a for a in molecule.atoms 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
981
                              if a.fractional_coordinates is None)
982
983
    if not molecule.all_atoms_have_sites:
984
        raise ParseError(f'{crystal.identifier} has atoms without sites')
985
986
    crystal.molecule = molecule
987
    cell = cellpar_to_cell(*crystal.cell_lengths, *crystal.cell_angles)
988
989
    if molecular_centres:
990
        frac_centres = _frac_molecular_centres_ccdc(crystal)
991
        mol_centres = frac_centres @ cell
992
        return PeriodicSet(mol_centres, cell, name=crystal.identifier)
993
    
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
994
    asym_atoms = crystal.asymmetric_unit_molecule.atoms
995
    # check for None?
996
    asym_unit = np.array([tuple(a.fractional_coordinates) for a in asym_atoms])
997
    asym_unit = np.mod(asym_unit, 1)
998
    # asym_unit = _snap_small_prec_coords(asym_unit)
999
    asym_types = [a.atomic_number for a in asym_atoms]
1000
1001
    if disorder != 'all_sites':
1002
        keep_sites = _unique_sites(asym_unit)
1003
        if not np.all(keep_sites):
1004
            warnings.warn(f'may have overlapping sites; duplicates will be removed')
0 ignored issues
show
introduced by
Using an f-string that does not have any interpolated variables
Loading history...
1005
        asym_unit = asym_unit[keep_sites]
1006
        asym_types = [sym for sym, keep in zip(asym_types, keep_sites) if keep]
1007
1008
    if asym_unit.shape[0] == 0:
1009
        raise ParseError(f'{crystal.identifier} has no valid sites')
1010
1011
    sitesym = crystal.symmetry_operators
1012
    # try spacegroup numbers?
1013
    if not sitesym:
1014
        sitesym = ['x,y,z']
1015
1016
    rot = np.array([np.array(crystal.symmetry_rotation(op)).reshape((3,3)) 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
Coding Style introduced by
Exactly one space required after comma
Loading history...
1017
                    for op in sitesym])
1018
    trans = np.array([crystal.symmetry_translation(op) for op in sitesym])
1019
    frac_motif, asym_inds, wyc_muls, inverses = _expand_asym_unit(asym_unit, rot, trans)
1020
    motif = frac_motif @ cell
1021
1022
    kwargs = {
1023
        'name': crystal.identifier,
1024
        'asymmetric_unit': asym_inds,
1025
        'wyckoff_multiplicities': wyc_muls,
1026
        'types': np.array([asym_types[i] for i in inverses])
1027
    }
1028
1029
    periodic_set = PeriodicSet(motif, cell, **kwargs)
1030
1031
    # if molecular_centres:
1032
    #     frac_centres = _frac_molecular_centres_ccdc(crystal)
1033
    #     periodic_set.molecular_centres = frac_centres @ cell
1034
1035
    return periodic_set
1036
1037
1038
def periodicset_from_gemmi_block(
1039
        block,
1040
        remove_hydrogens=False,
1041
        disorder='skip'
1042
) -> PeriodicSet:
1043
    """:class:`gemmi.cif.Block` --> :class:`amd.periodicset.PeriodicSet`. 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
1044
    Block is the type returned by :class:`gemmi.cif.read_file`.
1045
1046
    Parameters
1047
    ----------
1048
    block : :class:`ase.io.cif.CIFBlock`
1049
        An ase CIFBlock object representing a crystal.
1050
    remove_hydrogens : bool, optional
1051
        Remove Hydrogens from the crystal.
1052
    disorder : str, optional
1053
        Controls how disordered structures are handled. Default is ``skip``
1054
        which raises a ParseError if the input has any disorder, since
1055
        disorder conflicts with the periodic set model. To read disordered
1056
        structures anyway, choose either :code:`ordered_sites` to remove sites
1057
        with disorder or :code:`all_sites` include all sites regardless.
1058
1059
    Returns
1060
    -------
1061
    :class:`.periodicset.PeriodicSet`
1062
        Represents the crystal as a periodic set, consisting of a finite set
1063
        of points (motif) and lattice (unit cell). Contains other useful data,
1064
        e.g. the crystal's name and information about the asymmetric unit for
1065
        calculation.
1066
1067
    Raises
1068
    ------
1069
    ParseError :
1070
        Raised if the structure fails to be parsed for any of the following:
1071
        1. Required data is missing (e.g. cell parameters),
1072
        2. disorder == 'skip' and any of:
1073
            (a) any atom has fractional occupancy,
1074
            (b) any atom's label ends with '?',
1075
        3. The motif is empty after removing H or disordered sites.
1076
    """
1077
1078
    cellpar = [ase.io.cif.convert_value(block.find_value(tag)) 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
1079
               for tag in CIF_TAGS['cellpar']]
1080
    if None in cellpar:
1081
        raise ParseError(f'{block.name} has missing cell information')
1082
1083
    cell = cellpar_to_cell(*cellpar)
1084
1085
    xyz_loop = block.find(CIF_TAGS['atom_site_fract']).loop
1086
    if xyz_loop is None:
1087
        # check for crtsn!
1088
        raise ParseError(f'{block.name} has missing coordinate data')
1089
1090
    # loop will contain all xyz cols, maybe labels, maybe types, maybe occs
1091
1092
    loop_dict = _gemmi_loop_to_dict(xyz_loop)
1093
    xyz_str = [loop_dict[t] for t in CIF_TAGS['atom_site_fract']]
1094
    asym_unit = [[ase.io.cif.convert_value(c) for c in coords] for coords in xyz_str]
1095
1096
    asym_unit = np.mod(np.array(asym_unit).T, 1)
1097
    # asym_unit = _snap_small_prec_coords(asym_unit)
1098
1099
    if '_atom_site_type_symbol' in loop_dict:
1100
        asym_syms = loop_dict['_atom_site_type_symbol']
1101
        asym_types = [ase.data.atomic_numbers[s] for s in asym_syms]
1102
    else:
1103
        warnings.warn(f'missing atomic types')
0 ignored issues
show
introduced by
Using an f-string that does not have any interpolated variables
Loading history...
1104
        asym_types = [0 for _ in range(len(asym_unit))]
1105
1106
    remove_sites = []
1107
    
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
1108
    # if labels exist, check them for disorder
1109
    if '_atom_site_label' in loop_dict:
1110
        labels = loop_dict['_atom_site_label']
1111
    else:
1112
        labels = ['' for _ in range(xyz_loop.length())]
1113
1114
    # if occupancies exist, check them for disorder
1115
    if '_atom_site_occupancy' in loop_dict:
1116
        occupancies = [ase.io.cif.convert_value(occ) 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
1117
                  for occ in loop_dict['_atom_site_occupancy']]
0 ignored issues
show
Coding Style introduced by
Wrong continued indentation (add 5 spaces).
Loading history...
1118
    else:
1119
        occupancies = [None for _ in range(xyz_loop.length())]
1120
1121
    if disorder == 'skip':
1122
        if any(_atom_has_disorder(lab, occ) for lab, occ in zip(labels, occupancies)):
1123
            raise ParseError(f'{block.name} has disorder')
1124
    elif disorder == 'ordered_sites':
1125
        remove_sites.extend(
1126
            (i for i, (lab, occ) in enumerate(zip(labels, occupancies))
0 ignored issues
show
Comprehensibility Best Practice introduced by
The variable i does not seem to be defined.
Loading history...
1127
                if _atom_has_disorder(lab, occ)))
0 ignored issues
show
Coding Style introduced by
Wrong continued indentation (remove 3 spaces).
Loading history...
1128
1129
    if remove_hydrogens:
1130
        remove_sites.extend((i for i, num in enumerate(asym_types) if num == 1))
1131
1132
    asym_unit = np.delete(asym_unit, remove_sites, axis=0)
1133
    asym_types = [s for i, s in enumerate(asym_types) if i not in remove_sites]
1134
1135
    if disorder != 'all_sites':
1136
        keep_sites = _unique_sites(asym_unit)
1137
        if not np.all(keep_sites):
1138
            warnings.warn(f'may have overlapping sites; duplicates will be removed')
0 ignored issues
show
introduced by
Using an f-string that does not have any interpolated variables
Loading history...
1139
        asym_unit = asym_unit[keep_sites]
1140
        asym_types = [sym for sym, keep in zip(asym_types, keep_sites) if keep]
1141
1142
    if asym_unit.shape[0] == 0:
1143
        raise ParseError(f'{block.name} has no valid sites')
1144
1145
    # get symops
1146
    sitesym = []
1147
    for tag in CIF_TAGS['symop']:
1148
        symop_loop = block.find([tag]).loop
1149
        if symop_loop is not None:
1150
            symop_loop_dict = _gemmi_loop_to_dict(symop_loop)
1151
            sitesym = symop_loop_dict[tag]
1152
            break
1153
    # try spacegroup names/nums?
1154
    if not sitesym:
1155
        sitesym = ['x,y,z',]
1156
1157
    rot, trans = ase.spacegroup.spacegroup.parse_sitesym(sitesym)
1158
    frac_motif, asym_inds, wyc_muls, inverses = _expand_asym_unit(asym_unit, rot, trans)
1159
    full_types = np.array([asym_types[i] for i in inverses])
1160
    motif = frac_motif @ cell
1161
1162
    kwargs = {
1163
        'name': block.name,
1164
        'asymmetric_unit': asym_inds,
1165
        'wyckoff_multiplicities': wyc_muls,
1166
        'types': full_types
1167
    }
1168
1169
    return PeriodicSet(motif, cell, **kwargs)
1170
1171
1172
def _frac_molecular_centres_ccdc(crystal):
1173
    """Returns any geometric centres of molecules in the unit cell.
1174
    Expects a ccdc Crystal object and returns fractional coordiantes."""
1175
1176
    frac_centres = []
1177
    for comp in crystal.packing(inclusion='CentroidIncluded').components:
1178
        coords = [a.fractional_coordinates for a in comp.atoms]
1179
        x, y, z = zip(*coords)
1180
        m = len(coords)
1181
        frac_centres.append((sum(x) / m, sum(y) / m, sum(z) / m))
1182
    frac_centres = np.mod(np.array(frac_centres), 1)
1183
    return frac_centres[_unique_sites(frac_centres)]
1184
1185
1186
def _expand_asym_unit(
1187
        asym_unit: np.ndarray,
1188
        rotations: np.ndarray,
1189
        translations: np.ndarray
1190
) -> Tuple[np.ndarray, ...]:
1191
    """
1192
    Asymmetric unit frac coords, list of rotations, list of translations
1193
    -->
1194
    full fractional motif, asymmetric unit indices, multiplicities, inverses.
1195
    """
1196
1197
    all_sites = []
1198
    asym_inds = [0]
1199
    multiplicities = []
1200
    inverses = []
1201
    m, dims = asym_unit.shape
1202
1203
    expanded_sites = np.zeros((m, len(rotations), dims))
1204
    for i in range(m):
1205
        expanded_sites[i] = np.dot(rotations, asym_unit[i]) + translations
1206
    expanded_sites = np.mod(expanded_sites, 1)
1207
1208
    for inv, sites in enumerate(expanded_sites):
1209
1210
        multiplicity = 0
1211
1212
        for site_ in sites:
1213
1214
            if not all_sites:
1215
                all_sites.append(site_)
1216
                inverses.append(inv)
1217
                multiplicity += 1
1218
                continue
1219
1220
            # check if site_ overlaps with existing sites
1221
            diffs1 = np.abs(site_ - all_sites)
1222
            diffs2 = np.abs(diffs1 - 1)
1223
            mask = np.all((diffs1 <= _EQUIV_SITE_TOL) | (diffs2 <= _EQUIV_SITE_TOL), axis=-1)
1224
1225
            if np.any(mask):
1226
                where_equal = np.argwhere(mask).flatten()
1227
                for ind in where_equal:
1228
                    if inverses[ind] == inv: # invarnant 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
1229
                        pass
1230
                    else: # equivalent to a different site
1231
                        warnings.warn(f'has equivalent sites at positions {inverses[ind]}, {inv}')
1232
            else:
1233
                all_sites.append(site_)
1234
                inverses.append(inv)
1235
                multiplicity += 1
1236
1237
        if multiplicity > 0:
1238
            multiplicities.append(multiplicity)
1239
            asym_inds.append(len(all_sites))
1240
1241
    frac_motif = np.array(all_sites)
1242
    asym_inds = np.array(asym_inds[:-1])
1243
    multiplicities = np.array(multiplicities)
1244
1245
    return frac_motif, asym_inds, multiplicities, inverses
1246
1247
1248
@numba.njit()
1249
def _unique_sites(asym_unit):
1250
    """Uniquify (within io._EQUIV_SITE_TOL) a list of fractional coordinates,
1251
    considering all points modulo 1. Returns an array of bools such that
1252
    asym_unit[_unique_sites(asym_unit)] is a uniquified list."""
1253
    site_diffs1 = np.abs(np.expand_dims(asym_unit, 1) - asym_unit)
1254
    site_diffs2 = np.abs(site_diffs1 - 1)
1255
    sites_neq_mask = np.logical_and((site_diffs1 > _EQUIV_SITE_TOL), 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
1256
                                    (site_diffs2 > _EQUIV_SITE_TOL))
1257
    overlapping = np.triu(sites_neq_mask.sum(axis=-1) == 0, 1)
1258
    return overlapping.sum(axis=0) == 0
1259
1260
1261
def _parse_sitesym_pymatgen(data):
1262
    """In order to generate symmetry equivalent positions, the symmetry
1263
    operations are parsed. If the symops are not present, the space
1264
    group symbol is parsed, and symops are generated."""
1265
1266
    from pymatgen.symmetry.groups import SpaceGroup
0 ignored issues
show
introduced by
Unable to import 'pymatgen.symmetry.groups'
Loading history...
introduced by
Import outside toplevel (pymatgen.symmetry.groups.SpaceGroup)
Loading history...
1267
    from pymatgen.core.operations import SymmOp
0 ignored issues
show
introduced by
Unable to import 'pymatgen.core.operations'
Loading history...
introduced by
Import outside toplevel (pymatgen.core.operations.SymmOp)
Loading history...
1268
    import pymatgen.io.cif
0 ignored issues
show
introduced by
Import outside toplevel (pymatgen.io.cif)
Loading history...
introduced by
Unable to import 'pymatgen.io.cif'
Loading history...
1269
1270
    symops = []
1271
1272
    # try to parse xyz symops
1273
    for symmetry_label in CIF_TAGS['symop']:
1274
        xyz = data.get(symmetry_label)
1275
        if not xyz:
1276
            continue
1277
        if isinstance(xyz, str):
1278
            xyz = [xyz]
1279
        try:
1280
            symops = [SymmOp.from_xyz_string(s) for s in xyz]
1281
            break
1282
        except ValueError:
1283
            continue
1284
1285
    # try to parse symbol
1286
    if not symops:
1287
        for symmetry_label in CIF_TAGS['spacegroup_name']:
1288
1289
            sg = data.get(symmetry_label)
1290
            if not sg:
1291
                continue
1292
1293
            sg = pymatgen.io.cif.sub_spgrp(sg)
1294
            try:
1295
                spg = pymatgen.io.cif.space_groups.get(sg)
1296
                if not spg:
1297
                    continue
1298
                symops = SpaceGroup(spg).symmetry_ops
1299
                break
1300
            except ValueError:
1301
                pass
1302
1303
            try:
1304
                for d in pymatgen.io.cif._get_cod_data():
0 ignored issues
show
Coding Style Best Practice introduced by
It seems like _get_cod_data was declared protected and should not be accessed from this context.

Prefixing a member variable _ is usually regarded as the equivalent of declaring it with protected visibility that exists in other languages. Consequentially, such a member should only be accessed from the same class or a child class:

class MyParent:
    def __init__(self):
        self._x = 1;
        self.y = 2;

class MyChild(MyParent):
    def some_method(self):
        return self._x    # Ok, since accessed from a child class

class AnotherClass:
    def some_method(self, instance_of_my_child):
        return instance_of_my_child._x   # Would be flagged as AnotherClass is not
                                         # a child class of MyParent
Loading history...
1305
                    if sg == re.sub(r'\s+', '', d['hermann_mauguin']):
1306
                        xyz = d['symops']
1307
                        symops = [SymmOp.from_xyz_string(s) for s in xyz]
1308
                        break
1309
            except Exception:
0 ignored issues
show
Best Practice introduced by
Catching very general exceptions such as Exception is usually not recommended.

Generally, you would want to handle very specific errors in the exception handler. This ensure that you do not hide other types of errors which should be fixed.

So, unless you specifically plan to handle any error, consider adding a more specific exception.

Loading history...
1310
                continue
1311
1312
            if symops:
1313
                break
1314
1315
    # try to parse international number
1316
    if not symops:
1317
        for symmetry_label in CIF_TAGS['spacegroup_number']:
1318
            num = data.get(symmetry_label)
1319
            if not num:
1320
                continue
1321
1322
            try:
1323
                i = int(pymatgen.io.cif.str2float(num))
1324
                symops = SpaceGroup.from_int_number(i).symmetry_ops
1325
                break
1326
            except ValueError:
1327
                continue
1328
1329
    if not symops:
1330
        symops = [SymmOp.from_xyz_string(s) for s in ['x', 'y', 'z']]
1331
1332
    rotations = [op.rotation_matrix for op in symops]
1333
    translations = [op.translation_vector for op in symops]
1334
1335
    return rotations, translations
1336
1337
1338
def _atom_has_disorder(label, occupancy):
1339
    """Return True if label ends with ? or occupancy is a number < 1."""
1340
    return label.endswith('?') or (np.isscalar(occupancy) and occupancy < 1)
1341
1342
1343
def _snap_small_prec_coords(frac_coords):
1344
    """Find where frac_coords is within 1e-4 of 1/3 or 2/3, change to 1/3 and 2/3.
1345
    Recommended by pymatgen's CIF parser."""
1346
    frac_coords[np.abs(1 - 3 * frac_coords) < 1e-4] = 1 / 3.
1347
    frac_coords[np.abs(1 - 3 * frac_coords / 2) < 1e-4] = 2 / 3.
1348
    return frac_coords
1349
1350
1351
def _heaviest_component(molecule):
1352
    """Heaviest component (removes all but the heaviest component of the asym unit).
1353
    Intended for removing solvents. Probably doesn't play well with disorder"""
1354
    component_weights = []
1355
    for component in molecule.components:
1356
        weight = 0
1357
        for a in component.atoms:
1358
            if isinstance(a.atomic_weight, (float, int)):
1359
                if isinstance(a.occupancy, (float, int)):
1360
                    weight += a.occupancy * a.atomic_weight
1361
                else:
1362
                    weight += a.atomic_weight
1363
        component_weights.append(weight)
1364
    largest_component_ind = np.argmax(np.array(component_weights))
1365
    molecule = molecule.components[largest_component_ind]
1366
    return molecule
1367
1368
1369
def _refcodes_from_families(refcode_families):
1370
    """List of strings --> all CSD refcodes starting with one of the strings.
1371
    Intended to be passed a list of families and return all refcodes in them."""
1372
1373
    try:
1374
        import ccdc.search
0 ignored issues
show
introduced by
Import outside toplevel (ccdc.search)
Loading history...
1375
    except (ImportError, RuntimeError) as _:
1376
        msg = 'Failed to import csd-python-api, please'\
1377
              'check it is installed and licensed.'
1378
        raise ImportError(msg)
1379
1380
    all_refcodes = []
1381
    for refcode in refcode_families:
1382
        query = ccdc.search.TextNumericSearch()
1383
        query.add_identifier(refcode)
1384
        hits = [hit.identifier for hit in query.search()]
1385
        all_refcodes.extend(hits)
1386
1387
    # filter to unique refcodes
1388
    seen = set()
1389
    seen_add = seen.add
1390
    refcodes = [
1391
        refcode for refcode in all_refcodes
1392
        if not (refcode in seen or seen_add(refcode))]
1393
1394
    return refcodes
1395
1396
1397
def _gemmi_loop_to_dict(loop):
1398
    """gemmi Loop object --> dict, tags: values"""
1399
    tablified_loop = [[] for _ in range(len(loop.tags))]
1400
    n_cols = loop.width()
1401
    for i, item in enumerate(loop.values):
1402
        tablified_loop[i % n_cols].append(item)
1403
    return {tag: l for tag, l in zip(loop.tags, tablified_loop)}
0 ignored issues
show
Unused Code introduced by
Unnecessary use of a comprehension
Loading history...
1404
1405
1406
def _map(func: Callable, iterable: Iterable, show_warnings: bool):
1407
    """Iterates over iterable, passing items through func and yielding the result. 
0 ignored issues
show
Coding Style introduced by
Trailing whitespace
Loading history...
1408
    Catches ParseError and warnings, optionally printing them.
1409
    """
1410
1411
    if not show_warnings:
1412
        warnings.simplefilter('ignore')
1413
1414
    for item in iterable:
1415
1416
        with warnings.catch_warnings(record=True) as warning_msgs:
1417
1418
            parse_failed = False
1419
            try:
1420
                periodic_set = func(item)
1421
            except ParseError as err:
1422
                parse_failed = str(err)
1423
1424
        if parse_failed:
1425
            warnings.warn(parse_failed)
1426
            continue
1427
1428
        for warning in warning_msgs:
1429
            msg = f'{periodic_set.name} {warning.message}'
1430
            warnings.warn(msg, category=warning.category)
1431
1432
        yield periodic_set
1433