Completed
Branch master (5dcfeb)
by Steffen
04:40 queued 02:12
created

titlesearch.language.detection.matches_language() (rating: B)

Complexity: conditions 6
Size: total lines 18, code lines 8
Duplication: lines 0, ratio 0 %
Importance: changes 0

Metric  Value
cc      6
eloc    8
nop     2
dl      0
loc     18
rs      8
c       0
b       0
f       0
#!/usr/local/bin/python

Issue (Coding Style): This module should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below is an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend reading PEP 257: Docstring Conventions.
# coding: utf-8

import binascii
import re
from typing import Generator, Type

import numpy as np

from titlesearch.language import LanguageTemplate


def extract_unicode_characters(string: str) -> Generator:
    """Escape all unicode characters and return a generator for the int values of the unicode characters

Issue (Coding Style): This line is too long as per the coding style (104/100).

This check looks for lines that are too long. You can specify the maximum line length.

    :type string: str
    :return:
    """
    unicode_characters = re.findall(b'\\\\u([a-f0-9]{4})', string.encode('unicode_escape'))
    for x in unicode_characters:
Issue (Coding Style Naming): The name x does not conform to the variable naming conventions ((([a-z][a-z0-9_]{2,30})|(_[a-z0-9_]*))$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.

        s = binascii.unhexlify(x)
Issue (Coding Style Naming): The name s does not conform to the variable naming conventions ((([a-z][a-z0-9_]{2,30})|(_[a-z0-9_]*))$).

        yield int.from_bytes(s, byteorder='big')
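A minimal sketch of how the three findings on `extract_unicode_characters` could be resolved without changing what the function returns. The naming pattern quoted in the findings requires names of at least three characters, so `x` and `s` are replaced by `hex_digits` (a name of our choosing). The summary line is shortened, and `int(hex_digits, 16)` replaces the `binascii.unhexlify` plus `int.from_bytes` round trip, which produces the same integer. The `Iterator[int]` annotation and the precompiled `_UNICODE_ESCAPE` pattern are likewise our assumptions, not the project's code.

```python
import re
from typing import Iterator

# Same pattern as the original: a literal backslash, "u", then four lowercase hex digits.
_UNICODE_ESCAPE = re.compile(rb'\\u([0-9a-f]{4})')


def extract_unicode_characters(string: str) -> Iterator[int]:
    r"""Yield the code points that ``unicode_escape`` renders as ``\uXXXX``.

    Latin-1 characters are escaped as ``\xNN`` and characters above U+FFFF
    as ``\UNNNNNNNN``, so only U+0100 to U+FFFF end up in the result.

    :type string: str
    :return: generator of int code points
    """
    escaped = string.encode('unicode_escape')
    for hex_digits in _UNICODE_ESCAPE.findall(escaped):
        # int() accepts the bytes b'3042' with base 16, same value as unhexlify + from_bytes.
        yield int(hex_digits, 16)
```

Because of how `unicode_escape` encodes text, `'café'` yields nothing (`é` becomes `\xe9`), while a Kanji title such as `'東京'` yields the code points of both characters.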


def matches_language(title: str, language: Type[LanguageTemplate]) -> bool:
    """Determine based on unicode elements, if the title matches the language pattern.

    :type title: str
    :type language: LanguageTemplate
    :return:
    """
    unicode_characters = list(extract_unicode_characters(title))
    if language.requires_unicode_characters and not unicode_characters:
        return False

    if language.forbids_unicode_characters and unicode_characters:
        return False

    # not sure but all titles I found so far have a clear character set, not shared
    # noinspection PyTypeChecker
    return all([np.any((language.unicode_character_lowers <= int(unichar)) &
                       (int(unichar) <= language.unicode_character_uppers)) for unichar in unicode_characters])
Issue (Coding Style): This line is too long as per the coding style (111/100).
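The 111-character `return` line can be wrapped inside the generator's own parentheses, so no backslash continuation is needed. This sketch assumes, as the original comparisons imply, that `unicode_character_lowers` and `unicode_character_uppers` are NumPy arrays of inclusive range bounds. To keep it self-contained, `DemoLanguage` is a hypothetical stand-in for a `LanguageTemplate` subclass (its Hiragana and Katakana ranges are example values), and `extract_unicode_characters` here is a stand-in that filters `ord()` to U+0100 to U+FFFF, the range the `unicode_escape` regex captures.

```python
from typing import Iterator, Type

import numpy as np


def extract_unicode_characters(string: str) -> Iterator[int]:
    """Stand-in: yield code points that unicode_escape would render as \\uXXXX."""
    return (ord(char) for char in string if 0x100 <= ord(char) <= 0xFFFF)


class DemoLanguage:
    """Hypothetical language definition with the attributes matches_language reads."""
    requires_unicode_characters = True
    forbids_unicode_characters = False
    # Inclusive [lower, upper] code point ranges: Hiragana and Katakana blocks.
    unicode_character_lowers = np.array([0x3040, 0x30A0])
    unicode_character_uppers = np.array([0x309F, 0x30FF])


def matches_language(title: str, language: Type[DemoLanguage]) -> bool:
    """Determine, based on its unicode characters, whether the title matches the language.

    :type title: str
    :type language: LanguageTemplate
    :return: True if every extracted code point lies in at least one of the language's ranges
    """
    unicode_characters = list(extract_unicode_characters(title))
    if language.requires_unicode_characters and not unicode_characters:
        return False
    if language.forbids_unicode_characters and unicode_characters:
        return False

    lowers = language.unicode_character_lowers
    uppers = language.unicode_character_uppers
    # A code point matches if any [lower, upper] range contains it.
    return all(bool(np.any((lowers <= code_point) & (code_point <= uppers)))
               for code_point in unicode_characters)
```

Passing a generator rather than a list to `all()` lets it stop at the first uncovered character, and the `int(unichar)` casts are dropped because the extracted values are already `int`.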