Pull Request: master (#1), created 09:06 by unknown. Status: Completed.

UriNormalizer::normalize() (rating: F)

Complexity

    Conditions   18
    Paths        256

Size

    Total Lines  40

Duplication

    Lines        0
    Ratio        0 %

Importance

    Changes      0

Metric  Value
cc      18      (cyclomatic complexity)
nc      256     (NPath complexity)
nop     2
dl      0       (duplicated lines)
loc     40      (lines of code)
rs      3.3333
c       0
b       0
f       0
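Both headline numbers can be roughly reproduced by hand from `normalize()` in the source listing. The report does not define its metrics, so this check rests on two assumptions. First, `cc` counts one decision point for each `if`, `&&` and `||`, plus one for the method itself. Second, `nc` treats each of the eight sequential, non-nested `if` blocks as two paths (taken or skipped):

```latex
\begin{aligned}
\text{cc} &= 1 + \underbrace{1}_{\text{percent}} + \underbrace{1}_{\text{decode}}
  + \underbrace{4}_{\text{empty path}} + \underbrace{3}_{\text{host}} + \underbrace{3}_{\text{port}}
  + \underbrace{2}_{\text{dot seg.}} + \underbrace{1}_{\text{slashes}} + \underbrace{2}_{\text{query}} = 18 \\
\text{nc} &= \prod_{i=1}^{8} 2 = 2^{8} = 256
\end{aligned}
```

For example, the empty-path block contributes 4: its `if`, two `&&`, and one `||`. Under these assumptions, the score comes from the number of independent `if` blocks and their compound conditions, not from any single complicated branch.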

How to fix: Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, that is usually a good sign that the commented part should be extracted into a new method. The comment can then serve as a starting point for naming that method.
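Applied to `normalize()`, one way to follow this advice is to give each normalization its own small, named function. The main method then loops over a flag-to-step table instead of running an `if` chain. This is a hypothetical, self-contained sketch with three assumptions. It models a URI as a plain array of components rather than a PSR-7 `UriInterface`. It covers only three of the flags, using the same bit values as the class. The names `convertEmptyPath()`, `removeDuplicateSlashes()`, `sortQueryParameters()` and `normalizeComponents()` are illustrative and not part of Guzzle.

```php
<?php
// Hypothetical sketch: a URI is modeled as an array with 'scheme', 'path' and
// 'query' keys instead of a PSR-7 UriInterface, so this runs without Guzzle.

// Same bit values as the corresponding UriNormalizer constants.
const CONVERT_EMPTY_PATH = 4;
const REMOVE_DUPLICATE_SLASHES = 64;
const SORT_QUERY_PARAMETERS = 128;

function convertEmptyPath(array $uri): array
{
    // Only http and https URIs get "/" for an empty path.
    if ($uri['path'] === '' && in_array($uri['scheme'], ['http', 'https'], true)) {
        $uri['path'] = '/';
    }
    return $uri;
}

function removeDuplicateSlashes(array $uri): array
{
    // Collapse runs of two or more "/" into one; "%2F" is left untouched.
    $uri['path'] = preg_replace('#//++#', '/', $uri['path']);
    return $uri;
}

function sortQueryParameters(array $uri): array
{
    // Same explode/sort/implode as normalize(); the pairs are not decoded first.
    if ($uri['query'] !== '') {
        $pairs = explode('&', $uri['query']);
        sort($pairs);
        $uri['query'] = implode('&', $pairs);
    }
    return $uri;
}

function normalizeComponents(array $uri, int $flags): array
{
    // Flag => step table; foreach keeps insertion order, so steps run in this order.
    $steps = [
        CONVERT_EMPTY_PATH       => 'convertEmptyPath',
        REMOVE_DUPLICATE_SLASHES => 'removeDuplicateSlashes',
        SORT_QUERY_PARAMETERS    => 'sortQueryParameters',
    ];

    foreach ($steps as $flag => $step) {
        if ($flags & $flag) {
            $uri = $step($uri);
        }
    }

    return $uri;
}

// Path becomes '/', query becomes 'article=fred&lang=en'.
$normalized = normalizeComponents(
    ['scheme' => 'http', 'path' => '', 'query' => 'lang=en&article=fred'],
    CONVERT_EMPTY_PATH | SORT_QUERY_PARAMETERS
);
```

The table-driven loop keeps the dispatcher's own complexity low, but it moves the conditions into the step functions rather than removing them. The order of the table also becomes part of the contract, just as the order of the `if` blocks is today.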

Commonly applied refactorings include:

<?php
namespace GuzzleHttp\Psr7;

use Psr\Http\Message\UriInterface;

/**
 * Provides methods to normalize and compare URIs.
 *
 * @author Tobias Schultze
 *
 * @link https://tools.ietf.org/html/rfc3986#section-6
 */
final class UriNormalizer
{
    /**
     * Default normalizations which only include the ones that preserve semantics.
     *
     * self::CAPITALIZE_PERCENT_ENCODING | self::DECODE_UNRESERVED_CHARACTERS | self::CONVERT_EMPTY_PATH |
     * self::REMOVE_DEFAULT_HOST | self::REMOVE_DEFAULT_PORT | self::REMOVE_DOT_SEGMENTS
     */
    const PRESERVING_NORMALIZATIONS = 63;

    /**
     * All letters within a percent-encoding triplet (e.g., "%3A") are case-insensitive, and should be capitalized.
     *
     * Example: http://example.org/a%c2%b1b → http://example.org/a%C2%B1b
     */
    const CAPITALIZE_PERCENT_ENCODING = 1;

    /**
     * Decodes percent-encoded octets of unreserved characters.
     *
     * For consistency, percent-encoded octets in the ranges of ALPHA (%41–%5A and %61–%7A), DIGIT (%30–%39),
     * hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and,
     * when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers.
     *
     * Example: http://example.org/%7Eusern%61me/ → http://example.org/~username/
     */
    const DECODE_UNRESERVED_CHARACTERS = 2;

    /**
     * Converts the empty path to "/" for http and https URIs.
     *
     * Example: http://example.org → http://example.org/
     */
    const CONVERT_EMPTY_PATH = 4;

    /**
     * Removes the default host of the given URI scheme from the URI.
     *
     * Only the "file" scheme defines the default host "localhost".
     * All of `file:/myfile`, `file:///myfile`, and `file://localhost/myfile`
     * are equivalent according to RFC 3986. The first format is not accepted
     * by PHP's stream functions and is thus already normalized implicitly to the
     * second format in the Uri class. See `GuzzleHttp\Psr7\Uri::composeComponents`.
     *
     * Example: file://localhost/myfile → file:///myfile
     */
    const REMOVE_DEFAULT_HOST = 8;

    /**
     * Removes the default port of the given URI scheme from the URI.
     *
     * Example: http://example.org:80/ → http://example.org/
     */
    const REMOVE_DEFAULT_PORT = 16;

    /**
     * Removes unnecessary dot-segments.
     *
     * Dot-segments in relative-path references are not removed, because removing
     * them would change the semantics of the URI reference.
     *
     * Example: http://example.org/../a/b/../c/./d.html → http://example.org/a/c/d.html
     */
    const REMOVE_DOT_SEGMENTS = 32;

    /**
     * Paths which include two or more adjacent slashes are converted to one.
     *
     * Web servers usually ignore duplicate slashes and treat those URIs as equivalent.
     * But in theory those URIs do not need to be equivalent. So this normalization
     * may change the semantics. Encoded slashes (%2F) are not removed.
     *
     * Example: http://example.org//foo///bar.html → http://example.org/foo/bar.html
     */
    const REMOVE_DUPLICATE_SLASHES = 64;

    /**
     * Sort query parameters with their values in alphabetical order.
     *
     * However, the order of parameters in a URI may be significant (this is not defined by the standard).
     * So this normalization is not safe and may change the semantics of the URI.
     *
     * Example: ?lang=en&article=fred → ?article=fred&lang=en
     *
     * Note: The sorting is neither locale nor Unicode aware (the URI query does not get decoded at all) as the
     * purpose is to be able to compare URIs in a reproducible way, not to have the params sorted perfectly.
     */
    const SORT_QUERY_PARAMETERS = 128;

    /**
     * Returns a normalized URI.
     *
     * The scheme and host components are already normalized to lowercase per PSR-7 UriInterface.
     * This method adds further normalizations that can be configured with the $flags parameter.
     *
     * PSR-7 UriInterface cannot distinguish between an empty component and a missing component as
     * getQuery(), getFragment() etc. always return a string. This means the URIs "/?#" and "/" are
     * treated as equivalent, which is not necessarily true according to RFC 3986. But that difference
     * is highly uncommon in reality. So this potential normalization is implied in PSR-7 as well.
     *
     * @param UriInterface $uri   The URI to normalize
     * @param int          $flags A bitmask of normalizations to apply, see constants
     *
     * @return UriInterface The normalized URI
     * @link https://tools.ietf.org/html/rfc3986#section-6.2
     */
    public static function normalize(UriInterface $uri, $flags = self::PRESERVING_NORMALIZATIONS)
    {
        if ($flags & self::CAPITALIZE_PERCENT_ENCODING) {
            $uri = self::capitalizePercentEncoding($uri);
        }

        if ($flags & self::DECODE_UNRESERVED_CHARACTERS) {
            $uri = self::decodeUnreservedCharacters($uri);
        }

        if ($flags & self::CONVERT_EMPTY_PATH && $uri->getPath() === '' &&
            ($uri->getScheme() === 'http' || $uri->getScheme() === 'https')
        ) {
            $uri = $uri->withPath('/');
        }

        if ($flags & self::REMOVE_DEFAULT_HOST && $uri->getScheme() === 'file' && $uri->getHost() === 'localhost') {
            $uri = $uri->withHost('');
        }

        if ($flags & self::REMOVE_DEFAULT_PORT && $uri->getPort() !== null && Uri::isDefaultPort($uri)) {
            $uri = $uri->withPort(null);
        }

        if ($flags & self::REMOVE_DOT_SEGMENTS && !Uri::isRelativePathReference($uri)) {
            $uri = $uri->withPath(UriResolver::removeDotSegments($uri->getPath()));
        }

        if ($flags & self::REMOVE_DUPLICATE_SLASHES) {
            $uri = $uri->withPath(preg_replace('#//++#', '/', $uri->getPath()));
        }

        if ($flags & self::SORT_QUERY_PARAMETERS && $uri->getQuery() !== '') {
            $queryKeyValues = explode('&', $uri->getQuery());
            sort($queryKeyValues);
            $uri = $uri->withQuery(implode('&', $queryKeyValues));
        }

        return $uri;
    }

    /**
     * Whether two URIs can be considered equivalent.
     *
     * Both URIs are normalized automatically before comparison with the given $normalizations bitmask. The method also
     * accepts relative URI references and returns true when they are equivalent. This of course assumes they will be
     * resolved against the same base URI. If this is not the case, determination of equivalence or difference of
     * relative references does not mean anything.
     *
     * @param UriInterface $uri1           A URI to compare
     * @param UriInterface $uri2           A URI to compare
     * @param int          $normalizations A bitmask of normalizations to apply, see constants
     *
     * @return bool
     * @link https://tools.ietf.org/html/rfc3986#section-6.1
     */
    public static function isEquivalent(UriInterface $uri1, UriInterface $uri2, $normalizations = self::PRESERVING_NORMALIZATIONS)
    {
        return (string) self::normalize($uri1, $normalizations) === (string) self::normalize($uri2, $normalizations);
    }

Duplication: This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

    private static function capitalizePercentEncoding(UriInterface $uri)
    {
        $regex = '/(?:%[A-Fa-f0-9]{2})++/';

        $callback = function (array $match) {
            return strtoupper($match[0]);
        };

        return
            $uri->withPath(
                preg_replace_callback($regex, $callback, $uri->getPath())
            )->withQuery(
                preg_replace_callback($regex, $callback, $uri->getQuery())
            );
    }

Duplication: This method seems to be duplicated in your project.

    private static function decodeUnreservedCharacters(UriInterface $uri)
    {
        $regex = '/%(?:2D|2E|5F|7E|3[0-9]|[46][1-9A-F]|[57][0-9A])/i';

        $callback = function (array $match) {
            return rawurldecode($match[0]);
        };

        return
            $uri->withPath(
                preg_replace_callback($regex, $callback, $uri->getPath())
            )->withQuery(
                preg_replace_callback($regex, $callback, $uri->getQuery())
            );
    }

    private function __construct()
    {
        // cannot be instantiated
    }
}
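The flag constants are powers of two. They combine with `|`, a single one can be dropped with `& ~`, and `normalize()` tests each one with `&`. The self-contained sketch below copies the bit values from the class as global constants, so it runs without Guzzle. It checks that `PRESERVING_NORMALIZATIONS = 63` is exactly the OR of the six semantics-preserving flags. The variable names and the `isEnabled()` helper are illustrative only.

```php
<?php
// Bit values copied from the UriNormalizer constants in the listing above.
const CAPITALIZE_PERCENT_ENCODING = 1;
const DECODE_UNRESERVED_CHARACTERS = 2;
const CONVERT_EMPTY_PATH = 4;
const REMOVE_DEFAULT_HOST = 8;
const REMOVE_DEFAULT_PORT = 16;
const REMOVE_DOT_SEGMENTS = 32;
const REMOVE_DUPLICATE_SLASHES = 64;
const SORT_QUERY_PARAMETERS = 128;

// OR of the six normalizations documented as semantics-preserving.
$preserving = CAPITALIZE_PERCENT_ENCODING | DECODE_UNRESERVED_CHARACTERS | CONVERT_EMPTY_PATH
    | REMOVE_DEFAULT_HOST | REMOVE_DEFAULT_PORT | REMOVE_DOT_SEGMENTS;

// Opting in to the two normalizations that may change semantics.
$aggressive = $preserving | REMOVE_DUPLICATE_SLASHES | SORT_QUERY_PARAMETERS;

// Dropping a single step from the default set.
$withoutDotSegments = $preserving & ~REMOVE_DOT_SEGMENTS;

// Hypothetical helper: a non-zero AND means "apply this step", as in normalize().
function isEnabled(int $flags, int $flag): bool
{
    return ($flags & $flag) !== 0;
}

// Prints "preserving=63 aggressive=255 withoutDotSegments=31".
printf("preserving=%d aggressive=%d withoutDotSegments=%d\n", $preserving, $aggressive, $withoutDotSegments);
```

With the real class, the same composition would be written against the class constants. For example: `UriNormalizer::normalize($uri, UriNormalizer::PRESERVING_NORMALIZATIONS | UriNormalizer::SORT_QUERY_PARAMETERS)`.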
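`REMOVE_DOT_SEGMENTS` delegates to `UriResolver::removeDotSegments()`, which is not part of this listing. To illustrate the behaviour described in the constant's doc comment, here is a hypothetical, stack-based sketch of the RFC 3986 section 5.2.4 "remove_dot_segments" step on a bare path string. It is not claimed to match Guzzle's implementation, and the name `removeDotSegmentsSketch()` is made up.

```php
<?php
// Hypothetical stack-based sketch of RFC 3986 section 5.2.4 (remove_dot_segments)
// on a path string. normalize() only applies the real step to URIs that are not
// relative-path references.
function removeDotSegmentsSketch(string $path): string
{
    if ($path === '' || $path === '/') {
        return $path;
    }

    $output = [];
    $segments = explode('/', $path);
    foreach ($segments as $segment) {
        if ($segment === '..') {
            array_pop($output);       // ".." drops the previous segment, if any
        } elseif ($segment !== '.') {
            $output[] = $segment;     // "." is dropped, everything else is kept
        }
    }

    $newPath = implode('/', $output);

    if ($path[0] === '/' && ($newPath === '' || $newPath[0] !== '/')) {
        // ".." may have popped the empty first segment; restore the leading slash.
        $newPath = '/' . $newPath;
    } elseif ($newPath !== '' && ($segment === '.' || $segment === '..')) {
        // A trailing "." or ".." denotes a directory, so keep a trailing slash.
        $newPath .= '/';
    }

    return $newPath;
}

echo removeDotSegmentsSketch('/../a/b/../c/./d.html'), "\n"; // prints /a/c/d.html
```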
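The two private methods flagged for duplication differ only in their regex and callback. The shared part, running one `preg_replace_callback()` over both the path and the query, could be extracted into a single helper, as the duplication note suggests. The sketch below is hypothetical. `SimpleUri` is a minimal immutable stand-in for a PSR-7 `UriInterface` that models only the path and query, and `applyToPathAndQuery()` is an invented helper name. The two regexes are copied from the listing.

```php
<?php
// Hypothetical minimal stand-in for PSR-7 UriInterface: only the path/query
// accessors and the immutable with*() methods used below are modeled.
final class SimpleUri
{
    private $path;
    private $query;

    public function __construct(string $path, string $query)
    {
        $this->path = $path;
        $this->query = $query;
    }

    public function getPath(): string { return $this->path; }
    public function getQuery(): string { return $this->query; }

    public function withPath(string $path): self
    {
        $clone = clone $this;
        $clone->path = $path;
        return $clone;
    }

    public function withQuery(string $query): self
    {
        $clone = clone $this;
        $clone->query = $query;
        return $clone;
    }
}

// The formerly duplicated part: apply one regex callback to path and query.
function applyToPathAndQuery(SimpleUri $uri, string $regex, callable $callback): SimpleUri
{
    return $uri
        ->withPath(preg_replace_callback($regex, $callback, $uri->getPath()))
        ->withQuery(preg_replace_callback($regex, $callback, $uri->getQuery()));
}

function capitalizePercentEncoding(SimpleUri $uri): SimpleUri
{
    // Uppercase every run of percent-encoded triplets, e.g. "%c2%b1" -> "%C2%B1".
    return applyToPathAndQuery($uri, '/(?:%[A-Fa-f0-9]{2})++/', function (array $match) {
        return strtoupper($match[0]);
    });
}

function decodeUnreservedCharacters(SimpleUri $uri): SimpleUri
{
    // Decode only ALPHA, DIGIT, "-", ".", "_" and "~" octets; "%2F" stays encoded.
    return applyToPathAndQuery(
        $uri,
        '/%(?:2D|2E|5F|7E|3[0-9]|[46][1-9A-F]|[57][0-9A])/i',
        function (array $match) {
            return rawurldecode($match[0]);
        }
    );
}
```

Whether this is worth doing is a judgment call. The helper removes the duplicated structure, while each normalization keeps its own descriptive name.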