
Issues (4082)

Orange/data/variable.py (18 issues)

from numbers import Real, Integral
from math import isnan, floor
import numpy as np
The import numpy could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We currently do not use virtualenv to run Pylint. When installing your modules, make sure to use the command for the correct Python version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.
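As a rough way to check for this, the sketch below walks a source tree and lists sub-folders that contain .py files but no __init__.py. The helper name find_dirs_missing_init is hypothetical and not part of Pylint or this project; treat it as an illustration under those assumptions.

```python
import os


def find_dirs_missing_init(root):
    """Return sub-folders of `root` that hold .py files but no __init__.py."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into hidden folders such as .git
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        has_python = any(name.endswith(".py") for name in filenames)
        if has_python and "__init__.py" not in filenames:
            missing.append(os.path.relpath(dirpath, root))
    return sorted(missing)
```

Calling find_dirs_missing_init on the repository root would return relative paths of folders worth inspecting.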

Unused numpy imported as np
from pickle import PickleError

from ..data.value import Value, Unknown
import collections

from . import _variable
The name _variable does not seem to exist in module Orange.data.

ValueUnknown = Unknown  # Shadowing within classes


def make_variable(cls, compute_value, *args):
    if compute_value is not None:
        return cls(*args, compute_value=compute_value)
    return cls.make(*args)


class VariableMeta(type):
    # noinspection PyMethodParameters
    def __new__(mcs, name, *args):
        cls = type.__new__(mcs, name, *args)
        if not hasattr(cls, '_all_vars') or cls._all_vars is Variable._all_vars:
Coding Style Best Practice
It seems like _all_vars was declared protected and should not be accessed from this context.

Prefixing a member variable with _ is usually regarded as the equivalent of declaring it with the protected visibility that exists in other languages. Consequently, such a member should only be accessed from the same class or a child class:

class MyParent:
    def __init__(self):
        self._x = 1
        self.y = 2

class MyChild(MyParent):
    def some_method(self):
        return self._x    # Ok, since accessed from a child class

class AnotherClass:
    def some_method(self, instance_of_my_child):
        return instance_of_my_child._x   # Would be flagged as AnotherClass is not
                                         # a child class of MyParent
            cls._all_vars = {}
Coding Style Best Practice
It seems like _all_vars was declared protected and should not be accessed from this context.

        if name != "Variable":
            Variable._variable_types.append(cls)
Coding Style Best Practice
It seems like _variable_types was declared protected and should not be accessed from this context.

        return cls


class Variable(metaclass=VariableMeta):
    """
    The base class for variable descriptors contains the variable's
    name and some basic properties.

    .. attribute:: name

        The name of the variable.

    .. attribute:: unknown_str

        A set of values that represent unknowns in conversion from textual
        formats. Default is `{"?", ".", "", "NA", "~", None}`.

    .. attribute:: compute_value

        A function for computing the variable's value when converting from
        another domain which does not contain this variable. The base class
        defines a static method `compute_value`, which returns `Unknown`.
        Non-primitive variables must redefine it to return `None`.

    .. attribute:: source_variable

        An optional descriptor of the source variable - if any - from which
        this variable is derived and computed via :obj:`compute_value`.

    .. attribute:: attributes

        A dictionary with user-defined attributes of the variable
    """

    _DefaultUnknownStr = {"?", ".", "", "NA", "~", None}

    _variable_types = []
    Unknown = ValueUnknown

    def __init__(self, name="", compute_value=None):
        """
        Construct a variable descriptor.
        """
        self.name = name
        self._compute_value = compute_value
        self.unknown_str = set(Variable._DefaultUnknownStr)
        self.source_variable = None
        self.attributes = {}
        if name and compute_value is None:
            if isinstance(self._all_vars, collections.defaultdict):
                self._all_vars[name].append(self)
            else:
                self._all_vars[name] = self

    @classmethod
    def make(cls, name):
        """
        Return an existing continuous variable with the given name, or
        construct and return a new one.
        """
        if not name:
            raise ValueError("Variables without names cannot be stored or made")
        return cls._all_vars.get(name) or cls(name)

    @classmethod
    def _clear_cache(cls):
        """
        Clear the list of variables for reuse by :obj:`make`.
        """
        cls._all_vars.clear()

    @classmethod
    def _clear_all_caches(cls):
        """
        Clears list of stored variables for all subclasses
        """
        for cls0 in cls._variable_types:
            cls0._clear_cache()
Coding Style Best Practice
It seems like _clear_cache was declared protected and should not be accessed from this context.


    @staticmethod
    def is_primitive():
        """
        `True` if the variable's values are stored as floats.
        Primitive variables are :obj:`~data.DiscreteVariable` and
        :obj:`~data.ContinuousVariable`. Non-primitive variables can appear
        in the data only as meta attributes.

        Derived classes must overload the function.
        """
        raise RuntimeError("variable descriptors must overload is_primitive()")

    @property
    def is_discrete(self):
        return isinstance(self, DiscreteVariable)

    @property
    def is_continuous(self):
        return isinstance(self, ContinuousVariable)

    @property
    def is_string(self):
        return isinstance(self, StringVariable)

    def repr_val(self, val):
The argument val seems to be unused.
This method could be written as a function/class method.

If a method does not access any attributes of the class, it could also be implemented as a function, class method, or static method. This can help improve readability. For example,

class Foo:
    def some_method(self, x, y):
        return x + y

could be written as

class Foo:
    @classmethod
    def some_method(cls, x, y):
        return x + y
        """
        Return a textual representation of variable's value `val`. Argument
        `val` must be a float (for primitive variables) or an arbitrary
        Python object (for non-primitives).

        Derived classes must overload the function.
        """
        raise RuntimeError("variable descriptors must overload repr_val()")

    str_val = repr_val

    def to_val(self, s):
        """
        Convert the given argument to a value of the variable. The
        argument can be a string, a number or `None`. For primitive variables,
        the base class provides a method that returns
        :obj:`~Orange.data.value.Unknown` if `s` is found in
        :obj:`~Orange.data.Variable.unknown_str`, and raises an exception
        otherwise. For non-primitive variables it returns the argument itself.

        Derived classes of primitive variables must overload the function.

        :param s: value, represented as a number, string or `None`
        :type s: str, float or None
        :rtype: float or object
        """
        if not self.is_primitive():
            return s
        if s in self.unknown_str:
            return Unknown
        raise RuntimeError(
            "primitive variable descriptors must overload to_val()")

    def val_from_str_add(self, s):
        """
        Convert the given string to a value of the variable. The method
        is similar to :obj:`to_val` except that it only accepts strings and
        that it adds new values to the variable's domain where applicable.

        The base class method calls `to_val`.

        :param s: symbolic representation of the value
        :type s: str
        :rtype: float or object
        """
        return self.to_val(s)

    def __str__(self):
        return self.name

    def __repr__(self):
        """
        Return a representation of the variable, like,
        `'DiscreteVariable("gender")'`. Derived classes may overload this
        method to provide a more informative representation.
        """
        return "{}('{}')".format(self.__class__.__name__, self.name)

    @property
    def compute_value(self):
        return self._compute_value

    def __reduce__(self):
        if not self.name:
            raise PickleError("Variables without names cannot be pickled")

        return make_variable, (self.__class__, self._compute_value, self.name), self.__dict__

    def copy(self, compute_value):
        return Variable(self.name, compute_value)

Trailing whitespace
    # Functionality for LazyTables.
    # __eq__, __ne__ and __hash__ are necessary to test for equality
    # when concatenating LazyTables with extend(). The Variables cannot be
    # checked for identity, because the Variables can be recreated.
    # E.g. when more data is coming in over SAMP.

    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return False
        # Unsure whether this is all necessary.
        bs = [
            self.name == other.name,
            #self._DefaultUnknownStr == other._DefaultUnknownStr,
            #self._variable_types == other._variable_types,
            #self.Unknown == other.Unknown,
            self._compute_value == other._compute_value,
Coding Style Best Practice
It seems like _compute_value was declared protected and should not be accessed from this context.

            #self.source_variable == other.source_variable,
            #self.attributes == other.attributes,
        ]
        b = all(bs)
        return b

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash(self.name)

Trailing whitespace
class ContinuousVariable(Variable):
    """
    Descriptor for continuous variables.

    .. attribute:: number_of_decimals

        The number of decimals when the value is printed out (default: 3).

    .. attribute:: adjust_decimals

        A flag regulating whether the `number_of_decimals` is being adjusted
        by :obj:`to_val`.

    The value of `number_of_decimals` is set to 3 and `adjust_decimals`
    is set to 2. When :obj:`val_from_str_add` is called for the first
    time with a string as an argument, `number_of_decimals` is set to the
    number of decimals in the string and `adjust_decimals` is set to 1.
    In the subsequent calls of `to_val`, the number of decimals is
    increased if the string argument has a larger number of decimals.

    If the `number_of_decimals` is set manually, `adjust_decimals` is
    set to 0 to prevent changes by `to_val`.
    """

    def __init__(self, name="", number_of_decimals=None, compute_value=None):
        """
        Construct a new continuous variable. The number of decimals is set to
        three, but adjusted at the first call of :obj:`to_val`.
        """
        super().__init__(name, compute_value)
        if number_of_decimals is None:
            self.number_of_decimals = 3
            self.adjust_decimals = 2
        else:
            self.number_of_decimals = number_of_decimals

    @property
    def number_of_decimals(self):
        return self._number_of_decimals

    # noinspection PyAttributeOutsideInit
    @number_of_decimals.setter
    def number_of_decimals(self, x):
        self._number_of_decimals = x
The attribute _number_of_decimals was defined outside __init__.

It is generally a good practice to initialize all attributes to default values in the __init__ method:

class Foo:
    def __init__(self, x=None):
        self.x = x
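When the attribute is assigned inside a property setter, as in the flagged line, one possible way to follow this advice is to declare the backing attributes in __init__ first and then let the setter fill them in. The sketch below uses a hypothetical Measurement class that is not part of Orange; it only illustrates the pattern.

```python
class Measurement:
    """Hypothetical class used only to illustrate the pattern."""

    def __init__(self, number_of_decimals=3):
        # Declare the backing attributes up front so that every attribute
        # is introduced in __init__ ...
        self._number_of_decimals = None
        self._out_format = None
        # ... then let the property setter assign the real values.
        self.number_of_decimals = number_of_decimals

    @property
    def number_of_decimals(self):
        return self._number_of_decimals

    @number_of_decimals.setter
    def number_of_decimals(self, x):
        self._number_of_decimals = x
        self._out_format = "%.{}f".format(x)

    def format(self, value):
        """Format `value` with the current number of decimals."""
        return self._out_format % value
```

The setter still does the real work; the extra assignments in __init__ only make the full set of attributes visible in one place.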
        self.adjust_decimals = 0
        self._out_format = "%.{}f".format(self.number_of_decimals)
The attribute _out_format was defined outside __init__.


    @staticmethod
    def is_primitive():
        """ Return `True`: continuous variables are stored as floats."""
        return True

    def to_val(self, s):
        """
        Convert a value, given as an instance of an arbitrary type, to a float.
        """
        if s in self.unknown_str:
            return Unknown
        return float(s)

    def val_from_str_add(self, s):
        """
        Convert a value from a string and adjust the number of decimals if
        `adjust_decimals` is non-zero.
        """
        return _variable.val_from_str_add_cont(self, s)

    def repr_val(self, val):
        """
        Return the value as a string with the prescribed number of decimals.
        """
        if isnan(val):
            return "?"
        return self._out_format % val

    str_val = repr_val

    def copy(self, compute_value=None):
        return ContinuousVariable(self.name, self.number_of_decimals, compute_value)


class DiscreteVariable(Variable):
    """
    Descriptor for symbolic, discrete variables. Values of discrete variables
    are stored as floats; the numbers correspond to indices in the list of
    values.

    .. attribute:: values

        A list of variable's values.

    .. attribute:: ordered

        Some algorithms (and, in particular, visualizations) may
        sometimes reorder the values of the variable, e.g. alphabetically.
        This flag hints that the given order of values is "natural"
        (e.g. "small", "middle", "large") and should not be changed.

    .. attribute:: base_value

        The index of the base value, or -1 if there is none. The base value is
        used in some methods like, for instance, when creating dummy variables
        for regression.
    """
    _all_vars = collections.defaultdict(list)
    presorted_values = []

    def __init__(self, name="", values=(), ordered=False, base_value=-1, compute_value=None):
        """ Construct a discrete variable descriptor with the given values. """
        super().__init__(name, compute_value)
        self.ordered = ordered
        self.values = list(values)
        self.base_value = base_value

    def __repr__(self):
        """
        Give a string representation of the variable, for instance,
        `"DiscreteVariable('Gender', values=['male', 'female'])"`.
        """
        args = "values=[{}]".format(
            ", ".join([repr(x) for x in self.values[:5]] +
                      ["..."] * (len(self.values) > 5)))
        if self.ordered:
            args += ", ordered=True"
        if self.base_value >= 0:
            args += ", base_value={}".format(self.base_value)
        return "{}('{}', {})".format(self.__class__.__name__, self.name, args)

    @staticmethod
    def is_primitive():
        """ Return `True`: discrete variables are stored as floats. """
        return True

    def to_val(self, s):
        """
        Convert the given argument to a value of the variable (`float`).
        If the argument is numeric, its value is returned without checking
        whether it is integer and within bounds. `Unknown` is returned if the
        argument is one of the representations for unknown values. Otherwise,
        the argument must be a string and the method returns its index in
        :obj:`values`.

        :param s: values, represented as a number, string or `None`
        :rtype: float
        """
        if s is None:
            return ValueUnknown

        if isinstance(s, Integral):
            return s
        if isinstance(s, Real):
            return s if isnan(s) else floor(s + 0.25)
        if s in self.unknown_str:
            return ValueUnknown
        if not isinstance(s, str):
            raise TypeError('Cannot convert {} to value of "{}"'.format(
                type(s).__name__, self.name))
        return self.values.index(s)

    def add_value(self, s):
        """ Add a value `s` to the list of values.
        """
        self.values.append(s)

    def val_from_str_add(self, s):
        """
        Similar to :obj:`to_val`, except that it accepts only strings and that
        it adds the value to the list if it does not exist yet.

        :param s: symbolic representation of the value
        :type s: str
        :rtype: float
        """
        s = str(s) if s is not None else s
        try:
            return ValueUnknown if s in self.unknown_str \
                else self.values.index(s)
        except ValueError:
            self.add_value(s)
            return len(self.values) - 1

    def repr_val(self, val):
        """
        Return a textual representation of the value (`self.values[int(val)]`)
        or "?" if the value is unknown.

        :param val: value
        :type val: float (should be whole number)
        :rtype: str
        """
        if isnan(val):
            return "?"
        return '{}'.format(self.values[int(val)])

    str_val = repr_val

    def __reduce__(self):
        if not self.name:
            raise PickleError("Variables without names cannot be pickled")
        return make_variable, (self.__class__, self._compute_value, self.name,
                               self.values, self.ordered, self.base_value), \
            self.__dict__

    @classmethod
    def make(cls, name, values=(), ordered=False, base_value=-1):
Arguments number differs from overridden 'make' method
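The warning concerns the signatures themselves, not necessarily a runtime failure. As a minimal sketch with hypothetical Base and Child classes (not the Orange classes), an override that only adds parameters with default values can still be called the way the base method is called:

```python
class Base:
    @classmethod
    def make(cls, name):
        return (cls.__name__, name)


class Child(Base):
    @classmethod
    def make(cls, name, values=(), ordered=False):
        # Extra parameters have defaults, so make(name) remains a valid call.
        return (cls.__name__, name, tuple(values), ordered)


def build(descriptor_cls, name):
    # Caller written against the base signature make(name).
    return descriptor_cls.make(name)
```

Here build works for both classes; whether the differing signature is acceptable in a given code base is a design decision.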
        """
        Return a variable with the given name and other properties. The method
        first looks for a compatible existing variable: the existing
        variable must have the same name and both variables must have either
        ordered or unordered values. If values are ordered, the order must be
        compatible: all common values must have the same order. If values are
        unordered, the existing variable must have at least one common value
        with the new one, except when any of the two lists of values is empty.

        If a compatible variable is found, it is returned, with missing values
        appended to the end of the list. If there is no explicit order, the
        values are ordered using :obj:`ordered_values`. Otherwise, it
        constructs and returns a new variable descriptor.

        :param name: the name of the variable
        :type name: str
        :param values: symbolic values for the variable
        :type values: list
        :param ordered: tells whether the order of values is fixed
        :type ordered: bool
        :param base_value: the index of the base value, or -1 if there is none
        :type base_value: int
        :returns: an existing compatible variable or a new variable descriptor
        """
        if not name:
            raise ValueError("Variables without names cannot be stored or made")
        var = cls._find_compatible(
            name, values, ordered, base_value)
        if var:
            return var
        if not ordered:
            base_value_rep = base_value != -1 and values[base_value]
            values = cls.ordered_values(values)
            if base_value != -1:
                base_value = values.index(base_value_rep)
        return cls(name, values, ordered, base_value)

    @classmethod
    def _find_compatible(cls, name, values=(), ordered=False, base_value=-1):
        """
        Return a compatible existing variable, or `None` if there is none.
        See :obj:`make` for details; this function differs by returning `None`
        instead of constructing a new descriptor. (Method :obj:`make` calls
        this function.)

        :param name: the name of the variable
        :type name: str
        :param values: symbolic values for the variable
        :type values: list
        :param ordered: tells whether the order of values is fixed
        :type ordered: bool
        :param base_value: the index of the base value, or -1 if there is none
        :type base_value: int
        :returns: an existing compatible variable or `None`
        """
        base_rep = base_value != -1 and values[base_value]
        existing = cls._all_vars.get(name)
        if existing is None:
            return None
        if not ordered:
            values = cls.ordered_values(values)
        for var in existing:
            if (var.ordered != ordered or
                    var.base_value != -1
                    and var.values[var.base_value] != base_rep):
                continue
            if not values:
                break  # we have the variable - any existing values are OK
            if not set(var.values) & set(values):
                continue  # empty intersection of values; not compatible
            if ordered:
                i = 0
                for val in var.values:
                    if values[i] == val:
                        i += 1
                        if i == len(values):
                            break  # we have all the values
                else:  # we have some remaining values: check them, add them
                    if set(values[i:]) & set(var.values):
                        continue  # next var in existing
                    for val in values[i:]:
                        var.add_value(val)
                break  # we have the variable
            else:  # not ordered
                vv = set(var.values)
                for val in values:
                    if val not in vv:
                        var.add_value(val)
                break  # we have the variable
        else:
            return None
        if base_value != -1 and var.base_value == -1:
The loop variable var might not be defined here.
            var.base_value = var.values.index(base_rep)
The loop variable var might not be defined here.
        return var
The loop variable var might not be defined here.
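In the listing, the for loop over existing has an else branch that returns None, so the statements after the loop appear to be reachable only after a break, when var is bound. The warning therefore looks conservative here. A minimal sketch of the same for/else shape, using a hypothetical first_even function:

```python
def first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            break
    else:
        # Runs when the loop finishes without `break`, including the case
        # where `numbers` is empty and `n` was never bound.
        return None
    # Only reachable via `break`, so `n` is bound here.
    return n
```

Without the else branch, an empty input would reach the final return with n undefined and raise an error, which is the situation the check guards against.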

    @staticmethod
    def ordered_values(values):
        """
        Return a sorted list of values. If there exists a prescribed order for
        such set of values, it is returned. Otherwise, values are sorted
        alphabetically.
        """
        for presorted in DiscreteVariable.presorted_values:
            if values == set(presorted):
                return presorted
        return sorted(values)

    def copy(self, compute_value=None):
        return DiscreteVariable(self.name, self.values, self.ordered,
                                self.base_value, compute_value)


class StringVariable(Variable):
    """
    Descriptor for string variables. String variables can only appear as
    meta attributes.
    """
    Unknown = None

    @staticmethod
    def is_primitive():
        """Return `False`: string variables are not stored as floats."""
        return False

    def to_val(self, s):
        """
        Return the value as a string. If it is already a string, the same
        object is returned.
        """
        if s is None:
            return ""
        if isinstance(s, str):
            return s
        return str(s)

    val_from_str_add = to_val

    @staticmethod
    def str_val(val):
        """Return a string representation of the value."""
        if isinstance(val, Real) and isnan(val):
            return "?"
        if isinstance(val, Value):
            if val.value is None:
                return "None"
            val = val.value
        return str(val)

    def repr_val(self, val):
        """Return a string representation of the value."""
        return '"{}"'.format(self.str_val(val))