Passed
Pull Request — master (#6921), created 09:03 by unknown

Imscc13Import   F

Complexity

Total Complexity 103

Size/Duplication

Total Lines 597
Duplicated Lines 0 %

Importance

Changes 1
Bugs 0
Features 0

Metric  Value
eloc    289
c       1
b       0
f       0
dl      0
loc     597
rs      2
wmc     103

15 Methods

Rating  Name                                   Duplication  Size  Complexity
A       assertResourceFsWritable()             0            20    6
A       firstNonEmpty()                        0            9     3
A       normalizeSchemaLabels()                0            10    2
A       writeTempValidatedCopy()               0            13    4
B       parseWebLink()                         0            41    7
F       bestEffortImportLinksAndDiscussions()  0            150   23
A       log()                                  0            4     1
A       wrapAsLegacy()                         0            8     2
B       parseDiscussionTopic()                 0            34    7
B       unzip()                                0            33    10
A       rrmdir()                               0            13    4
B       execute()                              0            66    10
B       detectFormat()                         0            27    9
B       classifyCcResourceType()               0            34    9
A       makeManifestValidationCopy()           0            40    6

How to fix: Complexity

Complex Class

Complex classes like Imscc13Import often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields and methods that share the same prefix or suffix.
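The prefix heuristic can be tried mechanically. The following is a minimal sketch, not part of the reviewed code: the helper groupByPrefix() is hypothetical, and it only looks at the leading lowercase word of each method name from the methods table above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not from the codebase): group method names by their
 * leading lowercase word, e.g. "parse" for "parseWebLink". Groups with more
 * than one member hint at a cohesive component.
 *
 * @param list<string> $methods
 *
 * @return array<string, list<string>>
 */
function groupByPrefix(array $methods): array
{
    $groups = [];
    foreach ($methods as $name) {
        // Leading run of lowercase letters, up to the first uppercase letter or digit.
        preg_match('/^[a-z]+/', $name, $m);
        $prefix = $m[0] ?? $name;
        $groups[$prefix][] = $name;
    }

    // Keep only prefixes shared by at least two methods.
    return array_filter($groups, static fn (array $g): bool => count($g) > 1);
}

// Method names taken from the methods table of Imscc13Import.
$groups = groupByPrefix([
    'assertResourceFsWritable', 'firstNonEmpty', 'normalizeSchemaLabels',
    'writeTempValidatedCopy', 'parseWebLink', 'bestEffortImportLinksAndDiscussions',
    'log', 'wrapAsLegacy', 'parseDiscussionTopic', 'unzip', 'rrmdir', 'execute',
    'detectFormat', 'classifyCcResourceType', 'makeManifestValidationCopy',
]);

print_r($groups);
```

A shared prefix is only a hint. Methods such as makeManifestValidationCopy(), normalizeSchemaLabels() and writeTempValidatedCopy() share no prefix but still serve one purpose, so the result should be read alongside the call graph.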

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
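As a rough sketch of Extract Class, assume the helpers that prepare the validation-only manifest copy form one cohesive component. The class name ManifestValidationCopier and the method name writeTempCopy() are hypothetical; the regular expression and the temp-file handling mirror what Imscc13Import does today.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical extracted class (the name is an assumption, not from the codebase).
 * It groups the helpers that prepare a validation-only manifest copy.
 */
final class ManifestValidationCopier
{
    /** Replace the first <schema> label inside <metadata> with the IMS label. */
    public function normalizeSchemaLabels(string $xml): string
    {
        $re = '/(<metadata\b[^>]*>\s*<schema>)(.*?)(<\/schema>)/is';
        $patched = preg_replace($re, '$1IMS Common Cartridge$3', $xml, 1);

        // preg_replace returns null on PCRE error; fall back to the input.
        return $patched ?? $xml;
    }

    /** Write the content into a fresh temp directory and return the file path. */
    public function writeTempCopy(string $content): string
    {
        $dir = rtrim(sys_get_temp_dir(), DIRECTORY_SEPARATOR)
            .DIRECTORY_SEPARATOR.'cc13_val_'.bin2hex(random_bytes(3));
        if (!@mkdir($dir, 0777, true) && !is_dir($dir)) {
            throw new RuntimeException('Cannot create temp directory: '.$dir);
        }
        $dest = $dir.DIRECTORY_SEPARATOR.'imsmanifest.xml';
        if (false === @file_put_contents($dest, $content)) {
            throw new RuntimeException('Cannot write validation copy: '.$dest);
        }

        return $dest;
    }
}

// Usage: normalize a manifest string, then persist the validation copy.
$copier = new ManifestValidationCopier();
$xml = '<manifest><metadata><schema>1EdTech Common Cartridge</schema></metadata></manifest>';
$normalized = $copier->normalizeSchemaLabels($xml);
$path = $copier->writeTempCopy($normalized);
```

With the helpers moved out, execute() would only need to call the new class, which removes three private static methods from Imscc13Import without changing behavior.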

While breaking up the class, it is a good idea to analyze how other classes use Imscc13Import, and based on these observations, apply Extract Interface, too.
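A possible shape for Extract Interface is sketched below. It assumes callers mainly need "detect" and "import" operations; the names CartridgeImporterInterface, FakeImporter and importIfSupported() are hypothetical, and the real detectFormat() is static, so an actual extraction would have to decide how to handle that.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical interface (the name is an assumption): the narrow surface a
 * caller needs from an importer such as Imscc13Import.
 */
interface CartridgeImporterInterface
{
    /** Return a format identifier, or null when the directory is not recognized. */
    public function detect(string $extractedDir): ?string;

    /** Run the import for an already extracted package. */
    public function import(string $extractedDir): void;
}

/** Stand-in implementation, used only to show how callers depend on the interface. */
final class FakeImporter implements CartridgeImporterInterface
{
    /** @var list<string> */
    public array $imported = [];

    public function detect(string $extractedDir): ?string
    {
        return is_file($extractedDir.DIRECTORY_SEPARATOR.'imsmanifest.xml') ? 'imscc13' : null;
    }

    public function import(string $extractedDir): void
    {
        $this->imported[] = $extractedDir;
    }
}

/** Caller code type-hints the interface, not the concrete class. */
function importIfSupported(CartridgeImporterInterface $importer, string $dir): bool
{
    if (null === $importer->detect($dir)) {
        return false;
    }
    $importer->import($dir);

    return true;
}

$dir = sys_get_temp_dir().DIRECTORY_SEPARATOR.'cc13_demo_'.bin2hex(random_bytes(3));
mkdir($dir, 0777, true);
$fake = new FakeImporter();
$before = importIfSupported($fake, $dir); // no manifest yet
file_put_contents($dir.DIRECTORY_SEPARATOR.'imsmanifest.xml', '<manifest/>');
$after = importIfSupported($fake, $dir);  // manifest present
```

Callers that depend only on the interface can be tested with a stand-in like FakeImporter, which is one practical benefit of the refactoring.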

<?php

declare(strict_types=1);

namespace Chamilo\CourseBundle\Component\CourseCopy\CommonCartridge\Import;

use Chamilo\CoreBundle\Framework\Container;
use Chamilo\CourseBundle\Component\CourseCopy\CommonCartridge\Import\Base\Validator\ManifestValidator;
use Chamilo\CourseBundle\Component\CourseCopy\CourseRestorer;
use DOMDocument;
use DOMElement;
use DOMXPath;
use FilesystemIterator;
use PclZip;
use RecursiveDirectoryIterator;
use RecursiveIteratorIterator;
use RuntimeException;
use Symfony\Component\Filesystem\Filesystem;
use Throwable;
use ZipArchive;

use const DIRECTORY_SEPARATOR;
use const PCLZIP_OPT_PATH;
Issue (Bug): The constant PCLZIP_OPT_PATH was not found. Maybe you did not declare it correctly or list all dependencies?

class Imscc13Import
{
    public const FORMAT_IMSCC13 = 'imscc13';

    public function log(string $message, string|int $level = 'info', $a = null, $depth = null, bool $display = false): void
    {
        // Minimal, central logger for importer
        error_log("(imscc13) $message , level: $level , extra: ".json_encode($a));
    }

    /**
     * Quick check to verify an extracted folder looks like CC 1.3.
     * Less strict: accepts manifests whose default NS is plain imscp_v1p1
     * as long as we detect CC 1.3 traits (schemaversion 1.3.x or v1p3 tokens).
     */
    public static function detectFormat(string $extractedDir): ?string
    {
        $manifest = Cc1p3Convert::getManifest($extractedDir);
        if (!$manifest || !is_file($manifest)) {
            return null;
        }

        // Read a small chunk (up to 64 KiB) to detect tokens fast.
        $buf = (string) @file_get_contents($manifest, false, null, 0, 65536);
        if ('' === $buf) {
            return null;
        }
        $lc = strtolower($buf);

        // Heuristics that signal CC 1.3 packages:
        //  - schemaversion 1.3.x
        //  - resource/@type or xmlns entries containing "v1p3"
        //  - lomimscc CC 1.3 LOM namespace
        $has13 = (str_contains($lc, '<schemaversion>1.3') || str_contains($lc, 'schemaversion">1.3'));
        $hasV13Tokens = (str_contains($lc, 'v1p3') || str_contains($lc, 'imsccv1p3'));
        $hasLomImscc = str_contains($lc, 'http://ltsc.ieee.org/xsd/imsccv1p3/lom/manifest');

        if ($has13 || $hasV13Tokens || $hasLomImscc) {
            return self::FORMAT_IMSCC13;
        }

        return null;
    }

    /**
     * Validates the manifest and triggers the converter pipeline that creates
     * Chamilo resources (documents, links, forums, quizzes).
     *
     * After the standard converter runs, a "best-effort" importer kicks in for
     * types not (yet) handled by the converter:
     *   - imswl_xmlv1p1 (Web Links)   -> Links tool
     *   - imsdt_xmlv1p1 (Discussions) -> Forum + Thread + Post
     */
    public function execute(string $extractedDir): void
    {
        $manifest = Cc1p3Convert::getManifest($extractedDir);
        if (!$manifest || !is_file($manifest)) {
            throw new RuntimeException('No imsmanifest.xml detected.');
        }

        // Resolve schema dir inside the component
        $schemaDir = __DIR__.'/Base/Validator/schemas13';
        if (!is_file($schemaDir.'/cc13libxml2validator.xsd')) {
            $alt = __DIR__.'/schemas13';
            if (is_file($alt.'/cc13libxml2validator.xsd')) {
                $schemaDir = $alt;
            } else {
                throw new RuntimeException('Manifest validation error(s): XSD file not found at '.$schemaDir.' nor '.$alt);
            }
        }

        $this->log('imscc13: using schemaDir='.$schemaDir.' skip=0');

        // 1st pass: raw validation
        $validator = new ManifestValidator($schemaDir);
        if (!$validator->validate($manifest)) {
            $this->log('imscc13: first validation failed; retry with normalized schema labels', 'warn');

            // Build a patched copy for validation-only, using DOM (no substr/strpos).
            $manifestForValidation = self::makeManifestValidationCopy($manifest);

            $this->log('imscc13: validating using patched manifest copy', 'info', [
                'original' => $manifest,
                'patched' => $manifestForValidation,
            ]);

            $validator2 = new ManifestValidator($schemaDir);
            if (!$validator2->validate($manifestForValidation)) {
                // Do not block the import anymore; continue best-effort as agreed
                $this->log('imscc13: validation still failing; proceeding in best-effort mode (schema check skipped).', 'warn');
            }
        }

        self::assertResourceFsWritable();

        // Standard converter pipeline (keep existing behavior)
        try {
            $cc = new Cc1p3Convert($manifest);
            if ($cc->isAuth()) {
                // CC with basiclti/authorization not supported in this importer
                throw new RuntimeException('Protected Common Cartridge is not supported.');
            }
            $cc->generateImportData();
            $this->log('imscc13: converter pipeline executed', 'info');
        } catch (Throwable $e) {
            // We don't fail; we will try our best-effort importer below.
            $this->log('imscc13: converter pipeline failed; falling back to built-in importer', 'warn', [
                'error' => $e->getMessage(),
            ]);
        }

        // --- Best-effort importer for WebLinks + Discussions (non-destructive) ---
        try {
            $added = $this->bestEffortImportLinksAndDiscussions($manifest, $extractedDir);
            $this->log('imscc13: best-effort import finished', 'info', $added);
            // If nothing was added, that's fine (converter may have handled it).
        } catch (Throwable $e) {
            $this->log('imscc13: best-effort importer failed', 'error', [
                'error' => $e->getMessage(),
            ]);
            // Do not rethrow; execute() must remain resilient.
        }
    }

    /**
     * Create a validation-only patched copy of the manifest, normalizing
     * schema label and removing constructs that the CC 1.3 XSD rejects.
     * All changes are applied via DOM to avoid substring pitfalls.
     */
    private static function makeManifestValidationCopy(string $manifestPath): string
    {
        $xml = @file_get_contents($manifestPath);
        if (false === $xml) {
            throw new RuntimeException('Could not read manifest for validation patching.');
        }

        // Normalize <schema> label (1EdTech -> IMS) for v1.3 validator
        $xml = self::normalizeSchemaLabels($xml);

        // Parse with DOM to make safe structural tweaks for validation only
        $dom = new DOMDocument();
        $dom->preserveWhiteSpace = false;
        $dom->formatOutput = false;

        // Load as XML (NOT HTML); suppress warnings but we control edits
        if (!@$dom->loadXML($xml)) {
            // If DOM fails, just write normalized string to a temp file
            return self::writeTempValidatedCopy($xml);
        }

        $xp = new DOMXPath($dom);
        $xp->registerNamespace('ims', 'http://www.imsglobal.org/xsd/imsccv1p3/imscp_v1p1');

        // CC 1.3 XSD: top-level <organizations>/<organization>/<item> does not allow identifierref
        foreach ($xp->query('/ims:manifest/ims:organizations/ims:organization/ims:item[@identifierref]') as $item) {
            $item->removeAttribute('identifierref');
        }

        // CC 1.3 XSD: within that same level, <title> is not allowed directly; expected item|metadata
        foreach ($xp->query('/ims:manifest/ims:organizations/ims:organization/ims:item/ims:title') as $titleNode) {
            $titleNode->parentNode?->removeChild($titleNode);
        }

        $patched = $dom->saveXML();
        if (false === $patched) {
            $patched = $xml;
        }

        return self::writeTempValidatedCopy($patched);
    }

    /**
     * Normalize the <schema> value once (validation copy only).
     * Converts "1EdTech Common Cartridge" or variants to "IMS Common Cartridge".
     */
    private static function normalizeSchemaLabels(string $xml): string
    {
        // Use a plain replacement with backreferences (no closures here).
        $re = '/(<metadata\b[^>]*>\s*<schema>)(.*?)(<\/schema>)/is';

        // Replace only the first occurrence inside <metadata>...</metadata>.
        $patched = preg_replace($re, '$1IMS Common Cartridge$3', $xml, 1);

        // preg_replace can return null on PCRE error; fall back to original.
        return null !== $patched ? $patched : $xml;
    }

    /**
     * Write a patched manifest into a temp folder and return its path.
     */
    private static function writeTempValidatedCopy(string $content): string
    {
        $tmp = rtrim(sys_get_temp_dir(), DIRECTORY_SEPARATOR)
            .DIRECTORY_SEPARATOR.'cc13_val_'.bin2hex(random_bytes(3));
        if (!@mkdir($tmp, 0777, true) && !is_dir($tmp)) {
            throw new RuntimeException('Cannot create temp directory for validation copy: '.$tmp);
        }
        $dest = $tmp.DIRECTORY_SEPARATOR.'imsmanifest.xml';
        if (false === @file_put_contents($dest, $content)) {
            throw new RuntimeException('Cannot write validation copy: '.$dest);
        }

        return $dest;
    }

    /**
     * Unzip a file into the specified directory. Throws a RuntimeException if extraction fails.
     * Returns the extraction directory.
     */
    public static function unzip(string $file, ?string $to = null): string
    {
        @ini_set('memory_limit', '512M');
Issue (Security Best Practice): It seems like you do not handle an error condition for ini_set(). This can introduce security issues, and is generally not recommended. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-unhandled annotation:

        /** @scrutinizer ignore-unhandled */ @ini_set('memory_limit', '512M');

If you suppress an error, we recommend checking for the error condition explicitly:

// For example instead of
@mkdir($dir);

// Better use
if (@mkdir($dir) === false) {
    throw new \RuntimeException('The directory '.$dir.' could not be created.');
}

        $to = $to ?: (rtrim(sys_get_temp_dir(), DIRECTORY_SEPARATOR).DIRECTORY_SEPARATOR.'cc13_'.date('Ymd_His').'_'.bin2hex(random_bytes(3)));
        if (!is_dir($to) && !@mkdir($to, 0777, true) && !is_dir($to)) {
            throw new RuntimeException("Cannot create temp directory: $to");
        }

        if (class_exists(ZipArchive::class)) {
            $zip = new ZipArchive();
            $res = $zip->open($file);
            if (true === $res) {
                if (!$zip->extractTo($to)) {
                    $zip->close();

                    throw new RuntimeException('Could not extract zip file using ZipArchive.');
                }
                $zip->close();
            } else {
                throw new RuntimeException('Could not open zip file using ZipArchive.');
            }
        } else {
            if (!class_exists('PclZip')) {
                throw new RuntimeException('Zip support not available (ZipArchive nor PclZip).');
            }
            $zip = new PclZip($file);
            if (0 === $zip->extract(PCLZIP_OPT_PATH, $to)) {
Issue (Bug): The constant PCLZIP_OPT_PATH was not found. Maybe you did not declare it correctly or list all dependencies?
                throw new RuntimeException('Could not extract zip file using PclZip.');
            }
        }

        return $to;
    }

    /**
     * Best-effort recursive delete (used to cleanup temp dirs).
     */
    public static function rrmdir(string $path): void
    {
        if (!is_dir($path)) {
            return;
        }
        $it = new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS),
            RecursiveIteratorIterator::CHILD_FIRST
        );
        foreach ($it as $fs) {
            $fs->isDir() ? @rmdir($fs->getPathname()) : @unlink($fs->getPathname());
        }
        @rmdir($path);
Issue (Security Best Practice): It seems like you do not handle an error condition for rmdir(). This can introduce security issues, and is generally not recommended. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-unhandled annotation:

        /** @scrutinizer ignore-unhandled */ @rmdir($path);
    }

    private static function assertResourceFsWritable(): void
    {
        // ResourceFile base path used by Chamilo 2
        $base = rtrim((string) Container::getParameter('kernel.project_dir'), DIRECTORY_SEPARATOR).'/var/upload/resource';

        $fs = new Filesystem();
        // Ensure base directory exists
        if (!is_dir($base)) {
            try {
                $fs->mkdir($base, 0775);
            } catch (Throwable $e) {
                throw new RuntimeException("Resource FS not available: failed to create {$base}: ".$e->getMessage());
            }
        }

        // Check writability
        if (!is_writable($base)) {
            $who = \function_exists('posix_geteuid') ? ('uid='.posix_geteuid()) : (get_current_user() ?: 'unknown-user');

            throw new RuntimeException("Resource FS not writable: {$base} (php user: {$who}). ".'Fix permissions: chown -R www-data:www-data var && chmod -R 775 var');
        }
    }

    /**
     * Parse the manifest and import imswl_xmlv1p1 + imsdt_xmlv1p1 resources.
     * Non-destructive: if converter already did it, this can end up importing zero.
     *
     * @return array<string,int> counts per bucket created
     */
    private function bestEffortImportLinksAndDiscussions(string $manifestPath, string $extractDir): array
    {
        $doc = new DOMDocument();
        $doc->preserveWhiteSpace = false;
        $doc->formatOutput = false;
        if (!@$doc->load($manifestPath)) {
            throw new RuntimeException('Invalid imsmanifest.xml (XML load failed)');
        }

        $xp = new DOMXPath($doc);
        $xp->registerNamespace('imscp', 'http://www.imsglobal.org/xsd/imsccv1p3/imscp_v1p1');

        $resNodes = $xp->query('/imscp:manifest/imscp:resources/imscp:resource');
        if (!$resNodes || 0 === $resNodes->length) {
Issue: $resNodes is of type DOMNodeList, thus it always evaluated to true.
            return ['links' => 0, 'forums' => 0, 'threads' => 0, 'posts' => 0];
        }

        $links = [];
        $linkCategories = [];
        $forumCats = [];
        $forums = [];
        $threads = [];
        $posts = [];

        $linkCatId = 1;
        $linkCategories[$linkCatId] = (object) [
            'id' => $linkCatId,
            'title' => 'Imported CC Links',
            'description' => '',
        ];

        $forumCatId = 1001;
        $forumId = 1002;
        $forumCats[$forumCatId] = (object) [
            'id' => $forumCatId,
            'cat_title' => 'Imported CC Discussions',
            'cat_comment' => '',
        ];
        $forums[$forumId] = (object) [
            'id' => $forumId,
            'forum_title' => 'Imported discussions',
            'forum_comment' => '',
            'forum_category' => $forumCatId,
        ];

        $nextId = 1;
        $added = ['links' => 0, 'forums' => 0, 'threads' => 0, 'posts' => 0];

        /** @var DOMElement $res */
        foreach ($resNodes as $res) {
            $typeRaw = (string) $res->getAttribute('type');
            $hrefRaw = (string) $res->getAttribute('href');

            // Some exporters put href only in <file href="...">
            if ('' === trim($hrefRaw)) {
                // read first <file> regardless of prefix
                $fileHref = '';
                // try with ns
                $files = $res->getElementsByTagNameNS('http://www.imsglobal.org/xsd/imsccv1p3/imscp_v1p1', 'file');
                if ($files->length > 0) {
                    $fileHref = (string) $files->item(0)->getAttribute('href');
                }
                // fallback without ns
                if ('' === $fileHref) {
                    $files = $res->getElementsByTagName('file');
                    if ($files->length > 0) {
                        $fileHref = (string) $files->item(0)->getAttribute('href');
                    }
                }
                $hrefRaw = $this->firstNonEmpty($hrefRaw, $fileHref);
            }

            $kind = $this->classifyCcResourceType($typeRaw, $hrefRaw);
            if ('other' === $kind || '' === $hrefRaw) {
                continue;
            }

            $abs = rtrim($extractDir, '/').'/'.$hrefRaw;

            if ('weblink' === $kind) {
                $wl = $this->parseWebLink($abs);
                if (!$wl) {
                    continue;
                }
                $id = $nextId++;
                $links[$id] = (object) [
                    'id' => $id,
                    'title' => (string) $wl['title'],
                    'url' => (string) $wl['url'],
                    'description' => (string) ($wl['description'] ?? ''),
                    'category_id' => $linkCatId,
                    'target' => '_blank',
                ];
                $added['links']++;

                continue;
            }

            if ('discussion' === $kind) {
                $dt = $this->parseDiscussionTopic($abs);
                if (!$dt) {
                    continue;
                }
                $tid = $nextId++;
                $threads[$tid] = (object) [
                    'id' => $tid,
                    'forum_id' => $forumId,
                    'thread_title' => '' !== $dt['title'] ? (string) $dt['title'] : 'Discussion',
                    'thread_date' => date('Y-m-d H:i:s'),
                    'poster_name' => 'importer',
                ];
                $pid = $nextId++;
                $posts[$pid] = (object) [
                    'id' => $pid,
                    'thread_id' => $tid,
                    'post_text' => (string) $dt['body'],
                    'post_date' => date('Y-m-d H:i:s'),
                ];
                $added['threads']++;
                $added['posts']++;
            }
        }

        // Nothing to import? done
        if (0 === $added['links'] && 0 === $added['threads'] && 0 === $added['posts']) {
            return $added;
        }

        // Build minimal legacy Course object
        $legacy = (object) ['resources' => []];
        if (!empty($links)) {
            $legacy->resources['link'] = $this->wrapAsLegacy($links);
            $legacy->resources['link_category'] = $this->wrapAsLegacy($linkCategories);
        }
        if ($added['threads'] > 0 || $added['posts'] > 0) {
            $legacy->resources['Forum_Category'] = $this->wrapAsLegacy($forumCats);
            $legacy->resources['forum'] = $this->wrapAsLegacy($forums);
            $legacy->resources['thread'] = $this->wrapAsLegacy($threads);
            $legacy->resources['post'] = $this->wrapAsLegacy($posts);
            $added['forums'] = \count($forums);
        }

        // Restore into the current course using the standard restorer
        $restorer = new CourseRestorer($legacy);
        if (method_exists($restorer, 'setDebug')) {
            $restorer->setDebug(true); // keep verbose while stabilizing the importer
        }
        $restorer->restore();

        return $added;
    }

    /**
     * Convert a plain [id => entity] array into the legacy wrapper form
     * [id => (object)['obj' => entity]] used by CourseRestorer/CourseBuilder.
     *
     * @param array<int|string,object> $bucket
     *
     * @return array<int|string,object>
     */
    private function wrapAsLegacy(array $bucket): array
    {
        $out = [];
        foreach ($bucket as $id => $entity) {
            $out[$id] = (object) ['obj' => $entity];
        }

        return $out;
    }

    /**
     * Parse IMS Web Link (v1p1).
     * Returns ['title' => string, 'url' => string, 'description' => string].
     */
    private function parseWebLink(string $file): ?array
    {
        if (!is_file($file)) {
            $this->log('weblink xml not found', 'warn', ['file' => $file]);

            return null;
        }
        $d = new DOMDocument();
        if (!@$d->load($file)) {
            $this->log('weblink xml invalid', 'warn', ['file' => $file]);

            return null;
        }
        $xp = new DOMXPath($d);
        // Support both v1p1 and v1p3, or even no namespace
        $xp->registerNamespace('wl11', 'http://www.imsglobal.org/xsd/imswl_v1p1');
        $xp->registerNamespace('wl13', 'http://www.imsglobal.org/xsd/imswl_v1p3');

        // Query by local-name() to ignore the namespace version
        $title = trim((string) $xp->evaluate('string(/*[local-name()="webLink"]/*[local-name()="title"])'));
        $url = trim((string) $xp->evaluate('string(/*[local-name()="webLink"]/*[local-name()="url"]/@href)'));
        if ('' === $url) {
            // Some exports put the URL as text node inside <url>
            $url = trim((string) $xp->evaluate('string(/*[local-name()="webLink"]/*[local-name()="url"])'));
        }
        $desc = trim((string) $xp->evaluate('string(/*[local-name()="webLink"]/*[local-name()="description"])'));

        if ('' === $title) {
            // Try LOM-like nested <string>
            $title = trim((string) $xp->evaluate('string(/*[local-name()="webLink"]/*[local-name()="title"]/*[local-name()="string"])'));
        }
        if ('' === $title) {
            $title = $url;
        }
        if ('' === $url) {
            $this->log('weblink missing href', 'warn', ['file' => $file]);

            return null;
        }

        return ['title' => $title, 'url' => $url, 'description' => $desc];
    }

    /**
     * Parse IMS Discussion Topic (v1p1).
     * Returns ['title' => string, 'body' => html].
     * Keeps inner HTML of <dt:text> exactly as-is (CDATA-safe).
     */
    private function parseDiscussionTopic(string $file): ?array
    {
        if (!is_file($file)) {
            $this->log('discussion xml not found', 'warn', ['file' => $file]);

            return null;
        }
        $d = new DOMDocument();
        if (!@$d->load($file)) {
            $this->log('discussion xml invalid', 'warn', ['file' => $file]);

            return null;
        }
        $xp = new DOMXPath($d);
        $xp->registerNamespace('dt11', 'http://www.imsglobal.org/xsd/imsdt_v1p1');
        $xp->registerNamespace('dt13', 'http://www.imsglobal.org/xsd/imsdt_v1p3');

        $title = trim((string) $xp->evaluate('string(/*[local-name()="topic"]/*[local-name()="title"])'));

        // Body can be <text> or <message> depending on exporter
        $node = $xp->query('/*[local-name()="topic"]/*[local-name()="text"]')->item(0)
            ?: $xp->query('/*[local-name()="topic"]/*[local-name()="message"]')->item(0);

        $body = '';
        if ($node) {
            foreach ($node->childNodes as $child) {
                $chunk = $d->saveXML($child);
                if (false !== $chunk) {
                    $body .= $chunk;
                }
            }
        }

        return ['title' => $title, 'body' => $body];
    }

    /**
     * Helper to classify resource types more loosely.
     */
    private function classifyCcResourceType(string $type, string $href): string
    {
        $t = strtolower(trim($type));
        if ('' === $t) {
            return 'other';
        }

        // WebLink: imswl_* (v1p1, v1p3, vendor prefixes)
        if (str_contains($t, 'imswl')) {
            return 'weblink';
        }

        // Discussion: imsdt_* (v1p1, v1p3)
        if (str_contains($t, 'imsdt')) {
            return 'discussion';
        }

        // Many exports mark files as webcontent; keep for future doc import
        if (str_contains($t, 'webcontent')) {
            return 'webcontent';
        }

        // Fallback by extension (rare but harmless)
        if ('' !== $href && preg_match('~\.(xml)$~i', $href)) {
            // Heuristic: WL-*.xml under /weblinks/
            if (preg_match('~weblinks/[^/]+\.xml$~i', $href)) {
                return 'weblink';
            }
            if (preg_match('~discussions/[^/]+\.xml$~i', $href)) {
                return 'discussion';
            }
        }

        return 'other';
    }

    private function firstNonEmpty(string ...$vals): string
    {
        foreach ($vals as $v) {
            if ('' !== trim($v)) {
                return $v;
            }
        }

        return '';
    }
}