Passed
Push — master ( 9028fe...093f3d )
by Jeroen
02:44
created

XmpMetadataExtractor::getXmpXmlString()   A

Complexity

Conditions 1
Paths 1

Size

Total Lines 7
Code Lines 4

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
dl 0
loc 7
c 0
b 0
f 0
rs 9.4285
cc 1
eloc 4
nc 1
nop 1
<?php

namespace JeroenDesloovere\XmpMetadataExtractor;

use DOMDocument;
use JeroenDesloovere\XmpMetadataExtractor\Exception\FileNotFoundException;
use SplFileInfo;

final class XmpMetadataExtractor
{
    protected const RDF_ALT = 'rdf:Alt';
    protected const RDF_BAG = 'rdf:Bag';
    protected const RDF_LI = 'rdf:li';
    protected const RDF_SEQ = 'rdf:Seq';
    protected const POSSIBLE_CONTAINERS = [
        self::RDF_ALT,
        self::RDF_BAG,
        self::RDF_SEQ,
    ];

    private function convertDomNode($node)
    {
        switch ($node->nodeType) {
            case XML_CDATA_SECTION_NODE:
            case XML_TEXT_NODE:
                return trim($node->textContent);

                break;
Unused Code:
break is not strictly necessary here and could be removed.

The break statement is not necessary if it is preceded for example by a return statement:

switch ($x) {
    case 1:
        return 'foo';
        break; // This break is not necessary and can be left off.
}

If you would like to keep this construct to be consistent with other case statements, you can safely mark this issue as a false-positive.

            case XML_ELEMENT_NODE:
                return $this->convertXmlNode($node);

                break;
        }
    }

    private function convertXmlNode(\DOMElement $node)
    {
        $output = [];

        for ($i = 0, $m = $node->childNodes->length; $i < $m; $i++) {
            $child = $node->childNodes->item($i);
            $v = $this->convertDomNode($child);

            if (isset($child->tagName)) {
                $t = $child->tagName;
                if (!isset($output[$t])) {
                    $output[$t] = array();
                }
                $output[$t][] = $v;
            } elseif ($v || $v === '0') {
                $output = (string)$v;
            }
        }

        if ($node->attributes->length && !is_array($output)) { //Has attributes but isn't an array
Bug:
The property length does not seem to exist on DOMNamedNodeMap.
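
This report may be a false positive of the checker: the PHP manual documents a read-only length property on DOMNamedNodeMap. A minimal sketch for verifying that locally, with DOMNode::hasAttributes() shown as an alternative that avoids the property altogether. The sample XML and the variable names are assumptions made for illustration and are not taken from the reviewed class.

```php
<?php
declare(strict_types=1);

// Hypothetical sample element with two attributes.
$doc = new DOMDocument();
$doc->loadXML('<item id="1" lang="en">text</item>');
$node = $doc->documentElement;

// Number of attributes, read through the property the checker flags.
$attributeCount = $node->attributes->length;

// Alternative check that does not touch DOMNamedNodeMap at all.
$hasAttributes = $node->hasAttributes();

// Collect the attributes by name, reading the name from each DOMAttr.
$attributes = [];
foreach ($node->attributes as $attr) {
    $attributes[$attr->name] = $attr->value;
}
```

If the warning should go away without an ignore annotation, replacing the length test with hasAttributes() keeps the condition equivalent for element nodes.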
            $output = array('@content' => $output); //Change output into an array.
        }

        if (is_array($output)) {
            if ($node->attributes->length) {
                $a = array();
                foreach ($node->attributes as $attrName => $attrNode) {
                    $a[$attrName] = (string)$attrNode->value;
                }
                $output['@attributes'] = $a;
            }

            foreach ($output as $t => $v) {
                // We are combining arrays for rdf:Bag, rdf:Alt, rdf:Seq
                if (in_array($t, self::POSSIBLE_CONTAINERS)) {
                    if (!array_key_exists(self::RDF_LI, $v[0])) {
                        break;
                    }

                    $output = $v[0][self::RDF_LI];
                } elseif (is_array($v) && count($v) == 1 && $t != '@attributes') {
                    $output[$t] = $v[0];
                }
            }
        }

        return $output;
    }

    public function extractFromContent(string $content): array
    {
        try {
            $doc = new DOMDocument();
Bug:
The call to DOMDocument::__construct() has too few arguments starting with version. (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-call annotation:

            $doc = /** @scrutinizer ignore-call */ new DOMDocument();

This check compares calls to functions or methods with their respective definitions. If the call has fewer arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the ignore-call annotation hint above.

            $doc->loadXML($this->getXmpXmlString($content));

            $root = $doc->documentElement;
            $output = $this->convertDomNode($root);
            $output['@root'] = $root->tagName;

            return $output;
Bug / Best Practice:
The expression return $output could return the type string which is incompatible with the type-hinted return array. Consider adding an additional type-check to rule them out.
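
One way to add the suggested type-check is to normalise the converted value before it is returned. This is a minimal sketch only: ensureArray() is a hypothetical helper that does not exist in the reviewed class, and wrapping a bare string under the '@content' key is an assumption borrowed from how convertXmlNode() treats text that carries attributes.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: coerce a converted node value into an array
 * so that a method declared as ": array" never returns a string or null.
 *
 * @param mixed $converted value as produced by a node conversion step
 */
function ensureArray($converted): array
{
    if (is_array($converted)) {
        return $converted;
    }

    // Assumption: keep non-empty text under the '@content' key.
    if (is_string($converted) && $converted !== '') {
        return ['@content' => $converted];
    }

    // null, empty strings and anything else collapse to an empty result.
    return [];
}
```

Called on the conversion result before the '@root' key is assigned, a guard like this would also cover the case where the conversion yields no value at all.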
        } catch (\Exception $e) {
            return [];
        }
    }

    public function extractFromFile(string $file): array
    {
        try {
            $file = new SplFileInfo($file);
            $contents = file_get_contents($file->getPathname());
        } catch (\Exception $e) {
            throw new FileNotFoundException('The given File could not be found.');
        }

        return $this->extractFromContent($contents);
    }

    private function getXmpXmlString(string $content): string
    {
        $xmpDataStart = strpos($content, '<x:xmpmeta');
        $xmpDataEnd = strpos($content, '</x:xmpmeta>');
        $xmpLength = $xmpDataEnd - $xmpDataStart;

        return substr($content, $xmpDataStart, $xmpLength + 12);
    }
}
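
In getXmpXmlString() above, strpos() returns false when either marker is missing, and that case is not checked before the offsets are subtracted. The following is a minimal sketch of a defensive variant, written as a standalone function so it can be run on its own. The name extractXmpPacket() and the choice to return an empty string when no complete packet is found are assumptions, not part of the reviewed code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical standalone variant of getXmpXmlString():
 * returns the <x:xmpmeta>...</x:xmpmeta> packet, or '' if it is incomplete.
 */
function extractXmpPacket(string $content): string
{
    $openTag = '<x:xmpmeta';
    $closeTag = '</x:xmpmeta>';

    $start = strpos($content, $openTag);
    if ($start === false) {
        return ''; // no opening marker
    }

    // Search for the closing tag only after the opening marker.
    $end = strpos($content, $closeTag, $start);
    if ($end === false) {
        return ''; // packet is not terminated
    }

    // strlen($closeTag) is 12, matching the literal used in the listing.
    return substr($content, $start, $end - $start + strlen($closeTag));
}
```

Deriving the length from the closing tag instead of hard-coding 12 keeps the two values from drifting apart if the marker ever changes.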