amd._nearest_neighbors.generate_concentric_cloud() - Code Metrics - Inspection of "Update .scrutinizer.yml" - dwiddo/average-minimum-distance - Measure and Improve Code Quality continuously with Scrutinizer

Test Setup Failed

Push — master ( ad57f0...44572e )

by Daniel

created 2025-06-17 20:20 UTC

amd._nearest_neighbors.generate_concentric_cloud() A

↳ Parent: amd._nearest_neighbors

Complexity

Conditions

Size

Total Lines	26
Code Lines	5

Duplication

Lines	0
Ratio	0 %

Importance

Changes

Metric	Value
eloc	5
dl	0
loc	26
rs	10
c	0
b	0
f	0
cc	2
nop	2

"""Implements core function nearest_neighbors used for AMD and PDD
calculations.
"""

from typing import Tuple, Iterable, Iterator, Callable, List
from itertools import product, tee, accumulate
import functools

import numba
import numpy as np

from ._types import FloatArray, UIntArray

__all__ = ["nearest_neighbors", "nearest_neighbors_data", "nearest_neighbors_minval"]


def nearest_neighbors(
    motif: FloatArray, cell: FloatArray, x: FloatArray, k: int
) -> FloatArray:
    """Find distances to ``k`` nearest neighbors in a periodic set for
    each point in ``x``.

    Given a periodic set described by ``motif`` and ``cell``, a query
    set of points ``x`` and an integer ``k``, find distances to the
    ``k`` nearest neighbors in the periodic set for all points in
    ``x``. Returns an array with shape ``(x.shape[0], k)`` of distances
    to the neighbors. This function only returns distances, see also
    ``nearest_neighbors_data()`` which also returns indices of those
    points which are neighbors.

    Parameters
    ----------
    motif : :class:`numpy.ndarray`
        Cartesian coordinates of the motif, shape (no points, dims).
    cell : :class:`numpy.ndarray`
        The unit cell as a square array, shape (dims, dims).
    x : :class:`numpy.ndarray`
        Array of points to query for neighbors. For isometry invariants
        of crystals this is typically the asymmetric unit.
    k : int
        Number of nearest neighbors to find for each point in ``x``.

    Returns
    -------
    dists : numpy.ndarray
        Array shape ``(x.shape[0], k)`` of distances from points in
        ``x`` to their ``k`` nearest neighbors in the periodic set in
        order, e.g. ``dists[m][n]`` is the distance from ``x[m]`` to its
        n-th nearest neighbor in the periodic set.
    """

    m, dims = motif.shape

    # Get an initial collection of points and a generator for more
    int_lattice, int_lat_generator = _integer_lattice_batches(dims, m, k)
    cloud = _lattice_to_cloud(motif, int_lattice @ cell)

    # Squared distances to k nearest neighbors
    sqdists = _cdist_sqeulcidean(x, cloud)
    motif_diam = np.sqrt(sqdists[:, :m].max())
    sqdists.partition(k - 1)
    sqdists = sqdists[:, :k]
    sqdists.sort()

    # If a lattice point l has |l| >= max|p-p'| + max_d for p ∈ x, p' ∈ motif
    # then |p-(l+p')| >= max_d for all p, p' and l can be discarded.
    max_sqd = sqdists[:, -1].max()
    bound = (np.sqrt(max_sqd) + motif_diam) ** 2

    while True:
        # Get next layer of lattice and prune distant lattice points
        lattice = next(int_lat_generator) @ cell

        lattice = lattice[_where_sq_sum_lt_bound(lattice, bound)]
        if lattice.size == 0:  # None are close enough
            break

        # Squared distances to new points
        cloud = _lattice_to_cloud(motif, lattice)
        sqdists_ = _cdist_sqeulcidean(x, cloud)
        where_close = (sqdists_ < max_sqd).any(axis=0)
        if not np.any(where_close):  # None are close enough
            break

        # Squared distances to up to k nearest new points
        sqdists_ = sqdists_[:, where_close]
        if sqdists_.shape[-1] > k:
            sqdists_.partition(k - 1)
            sqdists_ = sqdists_[:, :k]
        sqdists_.sort()

        # Merge existing and new distances
        sqdists = _merge_sorted_arrays(sqdists, sqdists_)
        max_sqd = sqdists[:, -1].max()
        bound = (np.sqrt(max_sqd) + motif_diam) ** 2

    return np.sqrt(sqdists)


def nearest_neighbors_data(
    motif: FloatArray, cell: FloatArray, x: FloatArray, k: int
) -> Tuple[FloatArray, FloatArray, UIntArray]:
    """Find the ``k`` nearest neighbors in a periodic set for each
    point in ``x``, along with the point cloud generated during the
    search and indices of the nearest neighbors in the cloud.

    Note: the points in ``x`` are

    Given a periodic set described by ``motif`` and ``cell``, a query
    set of points ``x`` and an integer ``k``, find the ``k`` nearest
    neighbors in the periodic set for all points in ``x``. Return
    an array of distances to neighbors, the point cloud generated
    during the search and the indices of which points in the cloud are
    the neighbors of points in ``x``.

    Note: the point ``cloud[i]`` in the periodic set comes from (is
    translationally equivalent to) the motif point
    ``motif[i % len(motif)]``, because points are added to ``cloud`` in
    batches of whole unit cells and not rearranged.

    Parameters
    ----------
    motif : :class:`numpy.ndarray`
        Cartesian coordinates of the motif, shape (no points, dims).
    cell : :class:`numpy.ndarray`
        The unit cell as a square array, shape (dims, dims).
    x : :class:`numpy.ndarray`
        Array of points to query for neighbors. For AMD/PDD invariants
        this is the motif, or more commonly an asymmetric unit of it.
    k : int
        Number of nearest neighbors to find for each point in ``x``.

    Returns
    -------
    dists : numpy.ndarray
        Array shape ``(x.shape[0], k)`` of distances from points in
        ``x`` to their ``k`` nearest neighbors in the periodic set in
        order, e.g. ``dists[m][n]`` is the distance from ``x[m]`` to its
        n-th nearest neighbor in the periodic set.
    cloud : numpy.ndarray
        Collection of points in the periodic set that were generated
        during the search. Arranged such that cloud[i] comes from the
        motif point motif[i % len(motif)] by translation.
    inds : numpy.ndarray
        Array shape ``(x.shape[0], k)`` containing the indices of
        nearest neighbors in ``cloud``, e.g. the n-th nearest neighbor
        to ``x[m]`` is ``cloud[inds[m][n]]``.
    """

    full_cloud = []
    m, dims = motif.shape

    int_lattice, int_lat_generator = _integer_lattice_batches(dims, m, k)
    cloud = _lattice_to_cloud(motif, int_lattice @ cell)
    full_cloud.append(cloud)
    cloud_ind_offset = len(cloud)

    sqdists = _cdist_sqeulcidean(x, cloud)
    motif_diam = np.sqrt(sqdists[:, :m].max())
    part_inds = np.argpartition(sqdists, k - 1)[:, :k]
    part_sqdists = np.take_along_axis(sqdists, part_inds, axis=-1)[:, :k]
    part_sort_inds = np.argsort(part_sqdists)
    inds = np.take_along_axis(part_inds, part_sort_inds, axis=-1)
    sqdists = np.take_along_axis(part_sqdists, part_sort_inds, axis=-1)

    max_sqd = sqdists[:, -1].max()
    bound = (np.sqrt(max_sqd) + motif_diam) ** 2

    while True:
        lattice = next(int_lat_generator) @ cell
        lattice = lattice[np.einsum("ij,ij->i", lattice, lattice) < bound]
        if lattice.size == 0:
            break

        cloud = _lattice_to_cloud(motif, lattice)
        sqdists_ = _cdist_sqeulcidean(x, cloud)
        close = sqdists_ < max_sqd
        if not np.any(close):
            break

        argpart_upto = min(k - 1, sqdists_.shape[-1] - 1)
        part_inds = np.argpartition(sqdists_, argpart_upto)[:, :k]
        part_sqdists = np.take_along_axis(sqdists_, part_inds, axis=-1)[:, :k]
        part_sort_inds = np.argsort(part_sqdists)
        inds_ = np.take_along_axis(part_inds, part_sort_inds, axis=-1)
        sqdists_ = np.take_along_axis(part_sqdists, part_sort_inds, axis=-1)
        inds_ += cloud_ind_offset
        cloud_ind_offset += len(cloud)
        full_cloud.append(cloud)

        sqdists, inds = _merge_sorted_arrays_and_inds(sqdists, sqdists_, inds, inds_)
        max_sqd = sqdists[:, -1].max()
        bound = (np.sqrt(max_sqd) + motif_diam) ** 2

    return np.sqrt(sqdists), np.concatenate(full_cloud), inds


def _integer_lattice_batches_cache(func: Callable) -> Callable:
    """Specialised cache for ``_integer_lattice_batches``. Stores layers
    of integer lattice points from ``_integer_lattice_generator``.
    How many layers are needed depends on the ratio of k to m, so this
    is passed to the cache. One cache is kept for each dimension.
    """

    cache = {}  # (dims, n_layers): (layers, generator)
    npoints_cache = {}  # dims: [num points accumulated in layers 1,2,...]

    @functools.wraps(func)
    def wrapper(
        dims: int, m: int, k: int
    ) -> Tuple[List[FloatArray], Iterator[FloatArray]]:
        if dims not in npoints_cache:

            npoints_cache[dims] = []

        # Use npoints_cache to get how many layers of the lattice are needed
        # Delicate code; change in accordance with _integer_lattice_batches
        n_layers = 1
        for n_points in npoints_cache[dims]:
            if n_points * m > k:
                break
            n_layers += 1
        n_layers += 1

        # Update cache and possibly npoints_cache
        if (dims, n_layers) not in cache:

            layers, generator = func(dims, m, k)
            n_layers = len(layers)
            # If more layers were made than seen so far, update npoints_cache
            if len(npoints_cache[dims]) < n_layers:
                npoints_cache[dims] = list(accumulate(len(i) for i in layers))
            cache[(dims, n_layers)] = [np.concatenate(layers), generator]

        layers, generator = cache[(dims, n_layers)]
        cache[(dims, n_layers)][1], ret_generator = tee(generator)
        return layers, ret_generator

    return wrapper


@_integer_lattice_batches_cache
def _integer_lattice_batches(
    dims: int, m: int, k: int
) -> Tuple[List[FloatArray], Iterator[FloatArray]]:
    """Return an initial batch of integer lattice points (number
    according to k & m) and a generator for more distant points.

    Parameters
    ----------
    dims : int
        The dimension of Euclidean space the lattice is in.
    m : int
        Number of motif points. Used to determine how many layers of
        lattice points to put into the initial batch.
    k : int
        Parameter of ``nearest_neighbors``, the number of neighbors to
        find for each point. Used to determine how many layers of
        lattice points to put into the initial batch.

    Returns
    -------
    initial_integer_lattice : :class:`numpy.ndarray`
        A collection of integer lattice points. Consists of the first
        few layers generated by ``integer_lattice_generator`` (number of
        layers depends on m, k).
    integer_lattice_generator
        A generator for integer lattice points more distant than those
        in ``initial_integer_lattice``.
    """

    int_lattice_generator = iter(_integer_lattice_generator(dims))
    layers = [next(int_lattice_generator)]
    n_points = 1
    while n_points * m <= k:
        layer = next(int_lattice_generator)
        n_points += layer.shape[0]
        layers.append(layer)
    layers.append(next(int_lattice_generator))
    return layers, int_lattice_generator


def _memoized_generator(generator_function: Callable) -> Callable:
    """Caches results of a generator."""
    cache = {}

    @functools.wraps(generator_function)
    def wrapper(*args) -> Iterable:
        if args not in cache:

            cache[args] = generator_function(*args)
        cache[args], r = tee(cache[args])
        return r

    return wrapper


@_memoized_generator
def _integer_lattice_generator(dims: int) -> Iterable[FloatArray]:
    """Generate batches of integer lattice points. Each yield gives all
    points (that have not been yielded) in a sphere radius d centered at
    0; d starts at 0 and increments by 1 on each yield.

    Parameters
    ----------
    dims : int
        The dimension of Euclidean space of the lattice.

    Yields
    -------
    :class:`numpy.ndarray`
        Yields arrays of integer lattice points in `dims`-dimensional
        Euclidean space.
    """

    d = 0
    if dims == 1:
        yield np.zeros((1, 1), dtype=np.float64)
        while True:
            d += 1
            yield np.array([[-d], [d]], dtype=np.float64)

    ymax = {}
    while True:
        positive_int_lattice = []
        while True:
            batch = False
            for xy in product(range(d + 1), repeat=dims - 1):
                if xy not in ymax:
                    ymax[xy] = 0
                if sum(i**2 for i in xy) + ymax[xy] ** 2 <= d**2:
                    positive_int_lattice.append((*xy, ymax[xy]))
                    batch = True
                    ymax[xy] += 1
            if not batch:
                break
        pos_int_lat = np.array(positive_int_lattice, dtype=np.float64)
        yield _reflect_positive_integer_lattice(pos_int_lat)
        d += 1


@numba.njit(cache=True, fastmath=True)
def _reflect_positive_integer_lattice(positive_int_lattice: FloatArray) -> FloatArray:
    """Reflect points in the positive quadrant across all combinations
    of axes without duplicating points that are invariant under
    reflections.
    """

    dims = positive_int_lattice.shape[-1]
    batches = []
    batches.extend(positive_int_lattice)

    for n_reflections in range(1, dims + 1):
        axes = np.arange(n_reflections, dtype=np.int64)
        off_axes = (positive_int_lattice[:, axes] == 0).sum(axis=-1) == 0
        int_lattice = positive_int_lattice[off_axes]
        int_lattice[:, axes] *= -1
        batches.extend(int_lattice)

        while True:
            i = n_reflections - 1
            for _ in range(n_reflections):
                if axes[i] != i + dims - n_reflections:
                    break
                i -= 1
            else:
                break
            axes[i] += 1
            for j in range(i + 1, n_reflections):
                axes[j] = axes[j - 1] + 1

            off_axes = (positive_int_lattice[:, axes] == 0).sum(axis=-1) == 0
            int_lattice = positive_int_lattice[off_axes]
            int_lattice[:, axes] *= -1
            batches.extend(int_lattice)

    int_lattice = np.empty(shape=(len(batches), dims), dtype=np.float64)
    for i in range(len(batches)):
        int_lattice[i] = batches[i]
    return int_lattice


@numba.njit(cache=True, fastmath=True)
def _lattice_to_cloud(motif: FloatArray, lattice: FloatArray) -> FloatArray:
    """Create a cloud of points from a periodic set with ``motif``,
    and a collection of lattice points ``lattice``.
    """
    m, dims = motif.shape
    n_lat_points = lattice.shape[0]
    layer = np.empty((m * n_lat_points, dims), dtype=np.float64)
    i = 0
    for lat_i in range(n_lat_points):
        for mot_i in range(m):
            for dim in range(dims):
                layer[i, dim] = motif[mot_i, dim] + lattice[lat_i, dim]
            i += 1
    return layer


@numba.njit(cache=True, fastmath=True)
def _cdist_sqeulcidean(XA, XB):
    mA, dims = XA.shape
    mB = XB.shape[0]
    dm = np.empty((mA, mB), np.float64)
    for i in range(mA):
        for j in range(mB):
            v = 0.0
            for d in range(dims):
                v += (XA[i, d] - XB[j, d]) ** 2
            dm[i, j] = v
    return dm


@numba.njit(cache=True, fastmath=True)
def _where_sq_sum_lt_bound(XA, bound):
    """Returns an array of bools such that
    XA[_where_sq_sum_lt_bound(XA, bound)] contains only points whose
    squared sum is < bound, equivalent to
    XA[(lattice ** 2).sum(axis=-1) < bound]."""
    m, dims = XA.shape
    ret = np.full((m,), fill_value=True, dtype=np.bool_)
    for i in range(m):
        v = 0.0
        for dim in range(dims):
            v += XA[i, dim] ** 2
        if v >= bound:
            ret[i] = False
    return ret


@numba.njit(cache=True, fastmath=True)
def _merge_sorted_arrays(dists: FloatArray, dists_: FloatArray) -> FloatArray:
    """Merge two 2D arrays with sorted rows into one sorted array with
    same number of columns as ``dists``. Optimised assuming that only a
    few elements at the end of each row of ``dists`` needs to be
    changed.
    """

    m, n_new_points = dists_.shape
    ret = np.copy(dists)

    for i in range(m):
        # Traverse dists[i] backwards until value smaller than dists_[i, 0]
        j = -1
        dp_ = 0
        dist_ = dists_[i, dp_]
        while True:
            if dists[i, j] <= dist_:
                j += 1
                break
            j -= 1

        # If dists_[i, 0] >= dists[i, -1], no need to insert
        if j == 0:
            continue

        # dp points to dists[i], dp_ points to dists_[i], j points to ret[i]
        # Fill ret with the larger value, increment pointers and repeat
        dp = j
        dist = dists[i, dp]

        while j < 0:
            if dist <= dist_:
                ret[i, j] = dist
                dp += 1
                dist = dists[i, dp]
            else:
                ret[i, j] = dist_
                dp_ += 1
                if dp_ < n_new_points:
                    dist_ = dists_[i, dp_]
                else:  # ran out of points in dists_
                    dist_ = np.inf
            j += 1

    return ret


@numba.njit(cache=True, fastmath=True)
def _merge_sorted_arrays_and_inds(
    dists: FloatArray, dists_: FloatArray, inds: UIntArray, inds_: UIntArray
) -> Tuple[FloatArray, UIntArray]:
    """Similar to _merge_sorted_arrays, but also merges two arrays
    ``inds`` and ``inds_`` with the same pattern for
    ``nearest_neighbors_data``.
    """

    m, n_new_points = dists_.shape
    ret_dists = np.copy(dists)
    ret_inds = np.copy(inds)

    for i in range(m):
        j = -1
        dp_ = 0
        dist_ = dists_[i, dp_]
        p_ = inds_[i, dp_]

        while True:
            if dists[i, j] <= dist_:
                j += 1
                break
            j -= 1

        if j == 0:
            continue

        dp = j
        dist = dists[i, dp]
        p = inds[i, dp]

        while j < 0:
            if dist <= dist_:
                ret_dists[i, j] = dist
                ret_inds[i, j] = p
                dp += 1
                dist = dists[i, dp]
                p = inds[i, dp]
            else:
                ret_dists[i, j] = dist_
                ret_inds[i, j] = p_
                dp_ += 1
                if dp_ < n_new_points:
                    dist_ = dists_[i, dp_]
                    p_ = inds_[i, dp_]
                else:
                    dist_ = np.inf
            j += 1

    return ret_dists, ret_inds


def nearest_neighbors_minval(
    motif: FloatArray, cell: FloatArray, x: FloatArray, min_val: float
) -> Tuple[FloatArray, FloatArray, UIntArray]:
    """Return the same ``dists``/PDD matrix as ``nearest_neighbors``,
    but with enough columns such that all values in the last column are
    at least ``min_val``.
    """

    full_cloud, all_sqdists = [], []
    m, dims = motif.shape

    int_lattice, int_lat_generator = _integer_lattice_batches(dims, m, 1)
    cloud = _lattice_to_cloud(motif, int_lattice @ cell)
    full_cloud.append(cloud)
    sqdists = _cdist_sqeulcidean(x, cloud)
    motif_diam = np.sqrt(sqdists[:, :m].max())
    all_sqdists.append(sqdists)
    max_sqd = min_val**2
    bound = (np.sqrt(max_sqd) + motif_diam) ** 2

    while True:
        lattice = next(int_lat_generator) @ cell
        lattice = lattice[np.einsum("ij,ij->i", lattice, lattice) < bound]
        if lattice.size == 0:
            break

        cloud = _lattice_to_cloud(motif, lattice)
        sqdists = _cdist_sqeulcidean(x, cloud)
        close = sqdists <= max_sqd
        if not np.any(close):
            break

        all_sqdists.append(sqdists)
        full_cloud.append(cloud)

    sqdists = np.hstack(all_sqdists)
    inds = np.argsort(sqdists)
    sqdists = np.take_along_axis(sqdists, inds, axis=-1)

    i = np.argmax(np.all(sqdists >= max_sqd, axis=0))
    sqdists = sqdists[:, : i + 1]
    inds = inds[:, : i + 1]

    return np.sqrt(sqdists), np.concatenate(full_cloud), inds


def generate_concentric_cloud(
    motif: FloatArray, cell: FloatArray
) -> Iterable[FloatArray]:
    """Generates batches of points from a periodic set given by (motif,
    cell) which get successively further away from the origin.

    Each yield gives all points (that have not already been yielded)
    which lie in a unit cell whose corner lattice point was generated by
    ``generate_integer_lattice(motif.shape[1])``.

    Parameters
    ----------
    motif : :class:`numpy.ndarray`
        Cartesian representation of the motif, shape (no points, dims).
    cell : :class:`numpy.ndarray`
        Cartesian representation of the unit cell, shape (dims, dims).

    Yields
    -------
    :class:`numpy.ndarray`
        Yields arrays of points from the periodic set.
    """

    int_lat_generator = _integer_lattice_generator(cell.shape[0])
    for layer in int_lat_generator:
        yield _lattice_to_cloud(motif, layer @ cell)


1			"""Implements core function nearest_neighbors used for AMD and PDD
2			calculations.
3			"""
4
5			from typing import Tuple, Iterable, Iterator, Callable, List
6			from itertools import product, tee, accumulate
7			import functools
8
9			import numba
10			import numpy as np
11
12			from ._types import FloatArray, UIntArray
13
14			__all__ = ["nearest_neighbors", "nearest_neighbors_data", "nearest_neighbors_minval"]
15
16
17			def nearest_neighbors(
18			motif: FloatArray, cell: FloatArray, x: FloatArray, k: int
19			) -> FloatArray:
20			"""Find distances to ``k`` nearest neighbors in a periodic set for
21			each point in ``x``.
22
23			Given a periodic set described by ``motif`` and ``cell``, a query
24			set of points ``x`` and an integer ``k``, find distances to the
25			``k`` nearest neighbors in the periodic set for all points in
26			``x``. Returns an array with shape ``(x.shape[0], k)`` of distances
27			to the neighbors. This function only returns distances, see also
28			``nearest_neighbors_data()`` which also returns indices of those
29			points which are neighbors.
30
31			Parameters
32			----------
33			motif : :class:`numpy.ndarray`
34			Cartesian coordinates of the motif, shape (no points, dims).
35			cell : :class:`numpy.ndarray`
36			The unit cell as a square array, shape (dims, dims).
37			x : :class:`numpy.ndarray`
38			Array of points to query for neighbors. For isometry invariants
39			of crystals this is typically the asymmetric unit.
40			k : int
41			Number of nearest neighbors to find for each point in ``x``.
42
43			Returns
44			-------
45			dists : numpy.ndarray
46			Array shape ``(x.shape[0], k)`` of distances from points in
47			``x`` to their ``k`` nearest neighbors in the periodic set in
48			order, e.g. ``dists[m][n]`` is the distance from ``x[m]`` to its
49			n-th nearest neighbor in the periodic set.
50			"""
51
52			m, dims = motif.shape
53
54			# Get an initial collection of points and a generator for more
55			int_lattice, int_lat_generator = _integer_lattice_batches(dims, m, k)
56			cloud = _lattice_to_cloud(motif, int_lattice @ cell)
57
58			# Squared distances to k nearest neighbors
59			sqdists = _cdist_sqeulcidean(x, cloud)
60			motif_diam = np.sqrt(sqdists[:, :m].max())
61			sqdists.partition(k - 1)
62			sqdists = sqdists[:, :k]
63			sqdists.sort()
64
65			# If a lattice point l has \|l\| >= max\|p-p'\| + max_d for p ∈ x, p' ∈ motif
66			# then \|p-(l+p')\| >= max_d for all p, p' and l can be discarded.
67			max_sqd = sqdists[:, -1].max()
68			bound = (np.sqrt(max_sqd) + motif_diam) ** 2
69
70			while True:
71			# Get next layer of lattice and prune distant lattice points
72			lattice = next(int_lat_generator) @ cell
73
74			lattice = lattice[_where_sq_sum_lt_bound(lattice, bound)]
75			if lattice.size == 0: # None are close enough
76			break
77
78			# Squared distances to new points
79			cloud = _lattice_to_cloud(motif, lattice)
80			sqdists_ = _cdist_sqeulcidean(x, cloud)
81			where_close = (sqdists_ < max_sqd).any(axis=0)
82			if not np.any(where_close): # None are close enough
83			break
84
85			# Squared distances to up to k nearest new points
86			sqdists_ = sqdists_[:, where_close]
87			if sqdists_.shape[-1] > k:
88			sqdists_.partition(k - 1)
89			sqdists_ = sqdists_[:, :k]
90			sqdists_.sort()
91
92			# Merge existing and new distances
93			sqdists = _merge_sorted_arrays(sqdists, sqdists_)
94			max_sqd = sqdists[:, -1].max()
95			bound = (np.sqrt(max_sqd) + motif_diam) ** 2
96
97			return np.sqrt(sqdists)
98
99
100			def nearest_neighbors_data(
101			motif: FloatArray, cell: FloatArray, x: FloatArray, k: int
102			) -> Tuple[FloatArray, FloatArray, UIntArray]:
103			"""Find the ``k`` nearest neighbors in a periodic set for each
104			point in ``x``, along with the point cloud generated during the
105			search and indices of the nearest neighbors in the cloud.
106
107			Note: the points in ``x`` are
108
109			Given a periodic set described by ``motif`` and ``cell``, a query
110			set of points ``x`` and an integer ``k``, find the ``k`` nearest
111			neighbors in the periodic set for all points in ``x``. Return
112			an array of distances to neighbors, the point cloud generated
113			during the search and the indices of which points in the cloud are
114			the neighbors of points in ``x``.
115
116			Note: the point ``cloud[i]`` in the periodic set comes from (is
117			translationally equivalent to) the motif point
118			``motif[i % len(motif)]``, because points are added to ``cloud`` in
119			batches of whole unit cells and not rearranged.
120
121			Parameters
122			----------
123			motif : :class:`numpy.ndarray`
124			Cartesian coordinates of the motif, shape (no points, dims).
125			cell : :class:`numpy.ndarray`
126			The unit cell as a square array, shape (dims, dims).
127			x : :class:`numpy.ndarray`
128			Array of points to query for neighbors. For AMD/PDD invariants
129			this is the motif, or more commonly an asymmetric unit of it.
130			k : int
131			Number of nearest neighbors to find for each point in ``x``.
132
133			Returns
134			-------
135			dists : numpy.ndarray
136			Array shape ``(x.shape[0], k)`` of distances from points in
137			``x`` to their ``k`` nearest neighbors in the periodic set in
138			order, e.g. ``dists[m][n]`` is the distance from ``x[m]`` to its
139			n-th nearest neighbor in the periodic set.
140			cloud : numpy.ndarray
141			Collection of points in the periodic set that were generated
142			during the search. Arranged such that cloud[i] comes from the
143			motif point motif[i % len(motif)] by translation.
144			inds : numpy.ndarray
145			Array shape ``(x.shape[0], k)`` containing the indices of
146			nearest neighbors in ``cloud``, e.g. the n-th nearest neighbor
147			to ``x[m]`` is ``cloud[inds[m][n]]``.
148			"""
149
150			full_cloud = []
151			m, dims = motif.shape
152
153			int_lattice, int_lat_generator = _integer_lattice_batches(dims, m, k)
154			cloud = _lattice_to_cloud(motif, int_lattice @ cell)
155			full_cloud.append(cloud)
156			cloud_ind_offset = len(cloud)
157
158			sqdists = _cdist_sqeulcidean(x, cloud)
159			motif_diam = np.sqrt(sqdists[:, :m].max())
160			part_inds = np.argpartition(sqdists, k - 1)[:, :k]
161			part_sqdists = np.take_along_axis(sqdists, part_inds, axis=-1)[:, :k]
162			part_sort_inds = np.argsort(part_sqdists)
163			inds = np.take_along_axis(part_inds, part_sort_inds, axis=-1)
164			sqdists = np.take_along_axis(part_sqdists, part_sort_inds, axis=-1)
165
166			max_sqd = sqdists[:, -1].max()
167			bound = (np.sqrt(max_sqd) + motif_diam) ** 2
168
169			while True:
170			lattice = next(int_lat_generator) @ cell
171			lattice = lattice[np.einsum("ij,ij->i", lattice, lattice) < bound]
172			if lattice.size == 0:
173			break
174
175			cloud = _lattice_to_cloud(motif, lattice)
176			sqdists_ = _cdist_sqeulcidean(x, cloud)
177			close = sqdists_ < max_sqd
178			if not np.any(close):
179			break
180
181			argpart_upto = min(k - 1, sqdists_.shape[-1] - 1)
182			part_inds = np.argpartition(sqdists_, argpart_upto)[:, :k]
183			part_sqdists = np.take_along_axis(sqdists_, part_inds, axis=-1)[:, :k]
184			part_sort_inds = np.argsort(part_sqdists)
185			inds_ = np.take_along_axis(part_inds, part_sort_inds, axis=-1)
186			sqdists_ = np.take_along_axis(part_sqdists, part_sort_inds, axis=-1)
187			inds_ += cloud_ind_offset
188			cloud_ind_offset += len(cloud)
189			full_cloud.append(cloud)
190
191			sqdists, inds = _merge_sorted_arrays_and_inds(sqdists, sqdists_, inds, inds_)
192			max_sqd = sqdists[:, -1].max()
193			bound = (np.sqrt(max_sqd) + motif_diam) ** 2
194
195			return np.sqrt(sqdists), np.concatenate(full_cloud), inds
196
197
198			def _integer_lattice_batches_cache(func: Callable) -> Callable:
199			"""Specialised cache for ``_integer_lattice_batches``. Stores layers
200			of integer lattice points from ``_integer_lattice_generator``.
201			How many layers are needed depends on the ratio of k to m, so this
202			is passed to the cache. One cache is kept for each dimension.
203			"""
204
205			cache = {} # (dims, n_layers): (layers, generator)
206			npoints_cache = {} # dims: [num points accumulated in layers 1,2,...]
207
208			@functools.wraps(func)
209			def wrapper(
210			dims: int, m: int, k: int
211			) -> Tuple[List[FloatArray], Iterator[FloatArray]]:
212			if dims not in npoints_cache:
			0 ignored issues – show Comprehensibility Best Practice introduced 2024-03-16 21:42 UTC by Report Bug Copy Issue Report The variable `npoints_cache` does not seem to be defined. Loading history...
213			npoints_cache[dims] = []
214
215			# Use npoints_cache to get how many layers of the lattice are needed
216			# Delicate code; change in accordance with _integer_lattice_batches
217			n_layers = 1
218			for n_points in npoints_cache[dims]:
219			if n_points * m > k:
220			break
221			n_layers += 1
222			n_layers += 1
223
224			# Update cache and possibly npoints_cache
225			if (dims, n_layers) not in cache:
			0 ignored issues – show Comprehensibility Best Practice introduced 2024-03-16 21:42 UTC by Report Bug Copy Issue Report The variable `cache` does not seem to be defined. Loading history...
226			layers, generator = func(dims, m, k)
227			n_layers = len(layers)
228			# If more layers were made than seen so far, update npoints_cache
229			if len(npoints_cache[dims]) < n_layers:
230			npoints_cache[dims] = list(accumulate(len(i) for i in layers))
231			cache[(dims, n_layers)] = [np.concatenate(layers), generator]
232
233			layers, generator = cache[(dims, n_layers)]
234			cache[(dims, n_layers)][1], ret_generator = tee(generator)
235			return layers, ret_generator
236
237			return wrapper
238
239
240			@_integer_lattice_batches_cache
241			def _integer_lattice_batches(
242			dims: int, m: int, k: int
243			) -> Tuple[List[FloatArray], Iterator[FloatArray]]:
244			"""Return an initial batch of integer lattice points (number
245			according to k & m) and a generator for more distant points.
246
247			Parameters
248			----------
249			dims : int
250			The dimension of Euclidean space the lattice is in.
251			m : int
252			Number of motif points. Used to determine how many layers of
253			lattice points to put into the initial batch.
254			k : int
255			Parameter of ``nearest_neighbors``, the number of neighbors to
256			find for each point. Used to determine how many layers of
257			lattice points to put into the initial batch.
258
259			Returns
260			-------
261			initial_integer_lattice : :class:`numpy.ndarray`
262			A collection of integer lattice points. Consists of the first
263			few layers generated by ``integer_lattice_generator`` (number of
264			layers depends on m, k).
265			integer_lattice_generator
266			A generator for integer lattice points more distant than those
267			in ``initial_integer_lattice``.
268			"""
269
270			int_lattice_generator = iter(_integer_lattice_generator(dims))
271			layers = [next(int_lattice_generator)]
272			n_points = 1
273			while n_points * m <= k:
274			layer = next(int_lattice_generator)
275			n_points += layer.shape[0]
276			layers.append(layer)
277			layers.append(next(int_lattice_generator))
278			return layers, int_lattice_generator
279
280
281			def _memoized_generator(generator_function: Callable) -> Callable:
282			"""Caches results of a generator."""
283			cache = {}
284
285			@functools.wraps(generator_function)
286			def wrapper(*args) -> Iterable:
287			if args not in cache:
			0 ignored issues – show Comprehensibility Best Practice introduced 2023-05-18 10:35 UTC by Report Bug Copy Issue Report The variable `cache` does not seem to be defined. Loading history...
288			cache[args] = generator_function(*args)
289			cache[args], r = tee(cache[args])
290			return r
291
292			return wrapper
293
294
295			@_memoized_generator
296			def _integer_lattice_generator(dims: int) -> Iterable[FloatArray]:
297			"""Generate batches of integer lattice points. Each yield gives all
298			points (that have not been yielded) in a sphere radius d centered at
299			0; d starts at 0 and increments by 1 on each yield.
300
301			Parameters
302			----------
303			dims : int
304			The dimension of Euclidean space of the lattice.
305
306			Yields
307			-------
308			:class:`numpy.ndarray`
309			Yields arrays of integer lattice points in `dims`-dimensional
310			Euclidean space.
311			"""
312
313			d = 0
314			if dims == 1:
315			yield np.zeros((1, 1), dtype=np.float64)
316			while True:
317			d += 1
318			yield np.array([[-d], [d]], dtype=np.float64)
319
320			ymax = {}
321			while True:
322			positive_int_lattice = []
323			while True:
324			batch = False
325			for xy in product(range(d + 1), repeat=dims - 1):
326			if xy not in ymax:
327			ymax[xy] = 0
328			if sum(i2 for i in xy) + ymax[xy] 2 <= d**2:
329			positive_int_lattice.append((*xy, ymax[xy]))
330			batch = True
331			ymax[xy] += 1
332			if not batch:
333			break
334			pos_int_lat = np.array(positive_int_lattice, dtype=np.float64)
335			yield _reflect_positive_integer_lattice(pos_int_lat)
336			d += 1
337
338
339			@numba.njit(cache=True, fastmath=True)
340			def _reflect_positive_integer_lattice(positive_int_lattice: FloatArray) -> FloatArray:
341			"""Reflect points in the positive quadrant across all combinations
342			of axes without duplicating points that are invariant under
343			reflections.
344			"""
345
346			dims = positive_int_lattice.shape[-1]
347			batches = []
348			batches.extend(positive_int_lattice)
349
350			for n_reflections in range(1, dims + 1):
351			axes = np.arange(n_reflections, dtype=np.int64)
352			off_axes = (positive_int_lattice[:, axes] == 0).sum(axis=-1) == 0
353			int_lattice = positive_int_lattice[off_axes]
354			int_lattice[:, axes] *= -1
355			batches.extend(int_lattice)
356
357			while True:
358			i = n_reflections - 1
359			for _ in range(n_reflections):
360			if axes[i] != i + dims - n_reflections:
361			break
362			i -= 1
363			else:
364			break
365			axes[i] += 1
366			for j in range(i + 1, n_reflections):
367			axes[j] = axes[j - 1] + 1
368
369			off_axes = (positive_int_lattice[:, axes] == 0).sum(axis=-1) == 0
370			int_lattice = positive_int_lattice[off_axes]
371			int_lattice[:, axes] *= -1
372			batches.extend(int_lattice)
373
374			int_lattice = np.empty(shape=(len(batches), dims), dtype=np.float64)
375			for i in range(len(batches)):
376			int_lattice[i] = batches[i]
377			return int_lattice
378
379
380			@numba.njit(cache=True, fastmath=True)
381			def _lattice_to_cloud(motif: FloatArray, lattice: FloatArray) -> FloatArray:
382			"""Create a cloud of points from a periodic set with ``motif``,
383			and a collection of lattice points ``lattice``.
384			"""
385			m, dims = motif.shape
386			n_lat_points = lattice.shape[0]
387			layer = np.empty((m * n_lat_points, dims), dtype=np.float64)
388			i = 0
389			for lat_i in range(n_lat_points):
390			for mot_i in range(m):
391			for dim in range(dims):
392			layer[i, dim] = motif[mot_i, dim] + lattice[lat_i, dim]
393			i += 1
394			return layer
395
396
397			@numba.njit(cache=True, fastmath=True)
398			def _cdist_sqeulcidean(XA, XB):
399			mA, dims = XA.shape
400			mB = XB.shape[0]
401			dm = np.empty((mA, mB), np.float64)
402			for i in range(mA):
403			for j in range(mB):
404			v = 0.0
405			for d in range(dims):
406			v += (XA[i, d] - XB[j, d]) ** 2
407			dm[i, j] = v
408			return dm
409
410
411			@numba.njit(cache=True, fastmath=True)
412			def _where_sq_sum_lt_bound(XA, bound):
413			"""Returns an array of bools such that
414			XA[_where_sq_sum_lt_bound(XA, bound)] contains only points whose
415			squared sum is < bound, equivalent to
416			XA[(lattice ** 2).sum(axis=-1) < bound]."""
417			m, dims = XA.shape
418			ret = np.full((m,), fill_value=True, dtype=np.bool_)
419			for i in range(m):
420			v = 0.0
421			for dim in range(dims):
422			v += XA[i, dim] ** 2
423			if v >= bound:
424			ret[i] = False
425			return ret
426
427
428			@numba.njit(cache=True, fastmath=True)
429			def _merge_sorted_arrays(dists: FloatArray, dists_: FloatArray) -> FloatArray:
430			"""Merge two 2D arrays with sorted rows into one sorted array with
431			same number of columns as ``dists``. Optimised assuming that only a
432			few elements at the end of each row of ``dists`` needs to be
433			changed.
434			"""
435
436			m, n_new_points = dists_.shape
437			ret = np.copy(dists)
438
439			for i in range(m):
440			# Traverse dists[i] backwards until value smaller than dists_[i, 0]
441			j = -1
442			dp_ = 0
443			dist_ = dists_[i, dp_]
444			while True:
445			if dists[i, j] <= dist_:
446			j += 1
447			break
448			j -= 1
449
450			# If dists_[i, 0] >= dists[i, -1], no need to insert
451			if j == 0:
452			continue
453
454			# dp points to dists[i], dp_ points to dists_[i], j points to ret[i]
455			# Fill ret with the larger value, increment pointers and repeat
456			dp = j
457			dist = dists[i, dp]
458
459			while j < 0:
460			if dist <= dist_:
461			ret[i, j] = dist
462			dp += 1
463			dist = dists[i, dp]
464			else:
465			ret[i, j] = dist_
466			dp_ += 1
467			if dp_ < n_new_points:
468			dist_ = dists_[i, dp_]
469			else: # ran out of points in dists_
470			dist_ = np.inf
471			j += 1
472
473			return ret
474
475
476			@numba.njit(cache=True, fastmath=True)
477			def _merge_sorted_arrays_and_inds(
478			dists: FloatArray, dists_: FloatArray, inds: UIntArray, inds_: UIntArray
479			) -> Tuple[FloatArray, UIntArray]:
480			"""Similar to _merge_sorted_arrays, but also merges two arrays
481			``inds`` and ``inds_`` with the same pattern for
482			``nearest_neighbors_data``.
483			"""
484
485			m, n_new_points = dists_.shape
486			ret_dists = np.copy(dists)
487			ret_inds = np.copy(inds)
488
489			for i in range(m):
490			j = -1
491			dp_ = 0
492			dist_ = dists_[i, dp_]
493			p_ = inds_[i, dp_]
494
495			while True:
496			if dists[i, j] <= dist_:
497			j += 1
498			break
499			j -= 1
500
501			if j == 0:
502			continue
503
504			dp = j
505			dist = dists[i, dp]
506			p = inds[i, dp]
507
508			while j < 0:
509			if dist <= dist_:
510			ret_dists[i, j] = dist
511			ret_inds[i, j] = p
512			dp += 1
513			dist = dists[i, dp]
514			p = inds[i, dp]
515			else:
516			ret_dists[i, j] = dist_
517			ret_inds[i, j] = p_
518			dp_ += 1
519			if dp_ < n_new_points:
520			dist_ = dists_[i, dp_]
521			p_ = inds_[i, dp_]
522			else:
523			dist_ = np.inf
524			j += 1
525
526			return ret_dists, ret_inds
527
528
529			def nearest_neighbors_minval(
530			motif: FloatArray, cell: FloatArray, x: FloatArray, min_val: float
531			) -> Tuple[FloatArray, FloatArray, UIntArray]:
532			"""Return the same ``dists``/PDD matrix as ``nearest_neighbors``,
533			but with enough columns such that all values in the last column are
534			at least ``min_val``.
535			"""
536
537			full_cloud, all_sqdists = [], []
538			m, dims = motif.shape
539
540			int_lattice, int_lat_generator = _integer_lattice_batches(dims, m, 1)
541			cloud = _lattice_to_cloud(motif, int_lattice @ cell)
542			full_cloud.append(cloud)
543			sqdists = _cdist_sqeulcidean(x, cloud)
544			motif_diam = np.sqrt(sqdists[:, :m].max())
545			all_sqdists.append(sqdists)
546			max_sqd = min_val**2
547			bound = (np.sqrt(max_sqd) + motif_diam) ** 2
548
549			while True:
550			lattice = next(int_lat_generator) @ cell
551			lattice = lattice[np.einsum("ij,ij->i", lattice, lattice) < bound]
552			if lattice.size == 0:
553			break
554
555			cloud = _lattice_to_cloud(motif, lattice)
556			sqdists = _cdist_sqeulcidean(x, cloud)
557			close = sqdists <= max_sqd
558			if not np.any(close):
559			break
560
561			all_sqdists.append(sqdists)
562			full_cloud.append(cloud)
563
564			sqdists = np.hstack(all_sqdists)
565			inds = np.argsort(sqdists)
566			sqdists = np.take_along_axis(sqdists, inds, axis=-1)
567
568			i = np.argmax(np.all(sqdists >= max_sqd, axis=0))
569			sqdists = sqdists[:, : i + 1]
570			inds = inds[:, : i + 1]
571
572			return np.sqrt(sqdists), np.concatenate(full_cloud), inds
573
574
575			def generate_concentric_cloud(
576			motif: FloatArray, cell: FloatArray
577			) -> Iterable[FloatArray]:
578			"""Generates batches of points from a periodic set given by (motif,
579			cell) which get successively further away from the origin.
580
581			Each yield gives all points (that have not already been yielded)
582			which lie in a unit cell whose corner lattice point was generated by
583			``generate_integer_lattice(motif.shape[1])``.
584
585			Parameters
586			----------
587			motif : :class:`numpy.ndarray`
588			Cartesian representation of the motif, shape (no points, dims).
589			cell : :class:`numpy.ndarray`
590			Cartesian representation of the unit cell, shape (dims, dims).
591
592			Yields
593			-------
594			:class:`numpy.ndarray`
595			Yields arrays of points from the periodic set.
596			"""
597
598			int_lat_generator = _integer_lattice_generator(cell.shape[0])
599			for layer in int_lat_generator:
600			yield _lattice_to_cloud(motif, layer @ cell)
601

dwiddo / average-minimum-distance

Push — master ( ad57f0...44572e )

amd._nearest_neighbors.generate_concentric_cloud() A

Complexity

Size

Duplication

Importance

Duplication Side-by-Side

Filter issues like