Add support to import a markdown directory #2196

Merged
mejo- merged 28 commits into main from feat/import_markdown
Feb 26, 2026

mejo- (Member) commented Jan 12, 2026

📝 Summary

This PR adds a new occ command occ collectives:import:markdown to import markdown files and referenced attachments from a given directory path.

The import function first imports all markdown files and then tries to fix relative links and references to local attachment files.
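
The two-pass approach described above can be sketched roughly as follows. This is a minimal Python illustration, not the PR's PHP implementation: `import_markdown`, the `/page/<n>` placeholder URLs and the link regex are all assumptions made for the example. The point is only that links can be rewritten reliably once every target page exists.

```python
import posixpath
import re

# Matches inline markdown links such as [text](target).
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def import_markdown(files):
    """Import `files` (relative source path -> markdown text) in two passes."""
    # Pass 1: "create" one page per file so every source path has a target.
    # The /page/<n> URLs are placeholders for whatever a real import assigns.
    page_urls = {
        path: f"/page/{n}" for n, path in enumerate(sorted(files), start=1)
    }

    # Pass 2: rewrite relative links now that all targets are known.
    imported = {}
    for path, text in files.items():
        base_dir = posixpath.dirname(path)

        def fix_link(match, base_dir=base_dir):
            label, target = match.group(1), match.group(2)
            resolved = posixpath.normpath(posixpath.join(base_dir, target))
            url = page_urls.get(resolved)
            # Leave links alone when they do not resolve to an imported page.
            return f"[{label}]({url})" if url else match.group(0)

        imported[path] = LINK_RE.sub(fix_link, text)
    return imported


pages = import_markdown({
    "a.md": "See [B](sub/b.md) and [gone](missing.md).",
    "sub/b.md": "Back to [A](../a.md).",
})
```

In this sketch, links that do not resolve to an imported file are left untouched, which matches the preference voiced later in the thread to keep unresolvable links as is.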

So far it has only been tested with markdown created by Dokuwiki2Markdown, as the aim is to add support for migrating a Dokuwiki instance to Collectives.

The PR contains documentation on how to use the occ command.

🏁 Checklist

  • Code is properly formatted (npm run lint / npm run stylelint / composer run cs:check)
  • Sign-off message is added to all commits
  • Tests (unit, integration and/or end-to-end) passing and the changes are covered with tests
  • Documentation (README or documentation) has been updated or is not required

mejo- self-assigned this Jan 12, 2026
mejo- force-pushed the feat/import_markdown branch 2 times, most recently from c35b436 to 9ddb854 (January 16, 2026 13:00)
mejo- force-pushed the feat/import_markdown branch 3 times, most recently from 9ad77b4 to 52c7b68 (February 9, 2026 19:36)
mejo- added the "3. to review" and "enhancement" labels and removed the "2. developing" label Feb 9, 2026

janbaum (Contributor) commented Feb 11, 2026

Nice feature! I am still testing it and will report what I find :)

While migrating one DW (DokuWiki) I noticed that dealing with ACLs from DW in Collectives isn't that straightforward, so I have to rethink the logical organisation/division of the content into several collectives, rather than just importing the whole DW.

While doing that I realized that so far it's only possible to resolve links to attachments when the attachments are in the imported markdown directory too, which they only are if I use the whole DW, as this comes with

DW/
    '- media/
    '- pages/

so I would just point to DW/ as the import directory. My current situation requires me to, e.g., extract the IT docs from the big DW into a collective of their own, so for the import directory I use DW/pages/it/it.intern, which works fine for the markdown files but fails to resolve the attachments, which are all in DW/media and thus not included in the import directory.

Proposal: It would be nice to have a flag to set the attachments directory.

Edit: As a workaround I copied media/ into the desired it.intern/ directory and used occ collectives:import:markdown -d DW/pages/it/it.intern. This worked fine, except that empty markdown files were also created for all directories in media/.

janbaum (Contributor) commented Feb 11, 2026

Additionally, I realized that somehow the links are again not resolving. In my head you, @mejo-, had already fixed that issue, so I am not really sure what happens here 🤷

The issue is that DW links like [:it:it.intern:machines:servers:nextcloud-host](:it:it.intern:machines:servers:nextcloud-host) don't get resolved, probably because the link is looked up as an absolute path, but since it.intern is the root in this case, it would have to be resolved relatively?

janbaum (Contributor) commented Feb 11, 2026

I withdraw my guess that the problem is absolute vs. relative links. Even in the general import with DW/ as the import directory, no internal link seems to be resolved correctly (I went through ~30 pages and found none, but haven't checked all pages yet...).

Currently e.g. [:it:it.intern:machines:servers:nextcloud-host](:it:it.intern:machines:servers:nextcloud-host) is not resolved and left as is.

I actually thought that we already had a working version that took care of this problem, as we were talking about the issue in the converter, @mejo-?

I have the feeling that this could be a more complicated issue, since if I import a subdirectory of the actual DW, the links will always contain roots that are not present in the new collective, @mejo-?

max-nextcloud (Collaborator) commented

> I have the feeling that this could be a more complicated issue, since if I import a subdirectory of the actual DW, the links will always contain roots that are not present in the new collective

@janbaum Just a random thought... would it be feasible to import the entire folder structure instead and just remove the non-IT folders? That way you'd probably end up with everything in a deeply nested page structure, but the links could be resolved even if they are absolute (provided the import is fixed). You could move the content out of the nested page once it's in Collectives.

mejo- (Member, Author) commented Feb 11, 2026

@janbaum if you want to import everything, you have to give the path to pages/ as directory. Did you do this? The code to resolve paths to attachments will look for media files in the relative path, in `./media/<relative_path>` and in `../media/<relative_path>`. Giving the root Dokuwiki folder will probably import everything into subpages of a "pages" page, which you probably don't want.
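
The attachment lookup order described above could look roughly like this. It is a hedged Python sketch, not the PR's PHP code: `attachment_candidates` and `resolve_attachment` are hypothetical names, and treating `<relative_path>` as the link target relative to the import directory is an assumption.

```python
import posixpath


def attachment_candidates(relative_path):
    """List the locations to probe, relative to the import directory."""
    return [
        # 1. The relative path itself.
        posixpath.normpath(relative_path),
        # 2. ./media/<relative_path>
        posixpath.normpath(posixpath.join("media", relative_path)),
        # 3. ../media/<relative_path>
        posixpath.normpath(posixpath.join("..", "media", relative_path)),
    ]


def resolve_attachment(relative_path, existing_files):
    """Return the first candidate that exists, or None if none does."""
    for candidate in attachment_candidates(relative_path):
        if candidate in existing_files:
            return candidate
    return None


# Importing DW/pages: the media folder is a sibling of the import directory.
found = resolve_attachment("it/diagram.png", {"../media/it/diagram.png"})
```

Under these assumptions, importing DW/pages finds attachments via the third candidate, while copying media/ into the import directory (the workaround mentioned earlier) makes the second candidate match.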

When I tested it, links in the :folder:folder:page syntax were resolved when importing the whole dokuwiki/pages folder. But maybe there's a bug, I'll have a look.
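
To illustrate why links in the :folder:folder:page syntax resolve when the whole pages folder is imported but not when only a subdirectory is, here is a small Python sketch. The mapping from a DokuWiki page id to a `.md` path and the function names are assumptions for illustration, not the logic used in the PR.

```python
def dokuwiki_id_to_path(page_id):
    """Map a DokuWiki page id such as ':ns:sub:page' to 'ns/sub/page.md'."""
    # Tolerate surrounding whitespace and the leading ':' of absolute ids.
    parts = [part for part in page_id.strip().split(":") if part]
    return "/".join(parts) + ".md" if parts else None


def resolve_link(page_id, imported_paths):
    """Return the imported path for `page_id`, or None to keep the link as is."""
    path = dokuwiki_id_to_path(page_id)
    return path if path in imported_paths else None


link = ":it:it.intern:machines:servers:nextcloud-host"

# Import rooted at pages/: the full namespace path exists.
full_import = {"it/it.intern/machines/servers/nextcloud-host.md"}
# Import rooted at pages/it/it.intern: the leading namespaces are gone.
sub_import = {"machines/servers/nextcloud-host.md"}

resolved_full = resolve_link(link, full_import)
resolved_sub = resolve_link(link, sub_import)
```

In the subdirectory case the id still carries the `it:it.intern` prefix, so the lookup misses unless the importer knows which namespace prefix to strip.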

To be honest, I don't see an easy path to make link and attachment resolution work when importing just a subdirectory of the Dokuwiki. Copying the media folder is a good hack 😉

mejo- force-pushed the feat/import_markdown branch 5 times, most recently from 182d145 to e3c2300 (February 20, 2026 09:57)

mejo- (Member, Author) commented Feb 20, 2026

I added a Playwright test for the import feature. The PR is ready for review from my side.

janbaum (Contributor) commented Feb 23, 2026

Just a potential papercut/confusion:
Links in the DW source to pages that don't exist anymore (in DW itself) won't get parsed/translated and remain in the :namespace:page format. The target pages don't exist anyway, so there is no problem, but I thought maybe one could do something about that? Just for the record :)

janbaum (Contributor) commented Feb 23, 2026

Another thing that would indeed be interesting:

Importing sub-directories as actual sub-directories. E.g. I have my overall DW/ and want to import content dedicated to a specific user group. So I need

  • dir1/
  • dir2/

So first I import dir1/. That leads to all files from there being imported directly into the root of the collective. When I now import dir2/, the same thing happens, and all contents of dir1/ and dir2/ are mixed up in the root of the collective and I have to sort them out again by hand.

My current workaround is to import the first directory as usual, but the second one into another temporary collective to get the contents right, and then hope that moving the contents from the temporary collective to the actual one via the Files app works fine. (It doesn't always, and I don't know exactly why.)

Should I open a separate issue for that?

mejo- (Member, Author) commented Feb 23, 2026

> Importing sub-directories as actual sub-directories. E.g. I have my overall DW/ and want to import content dedicated to a specific user group. So I need

This should already be possible. You can give --parent-id as a parameter to the import command. It will then import the given directory as subpages of the given parent page.

mejo- (Member, Author) commented Feb 23, 2026

> Just a potential papercut/confusion:
> Links in the DW source to pages that don't exist anymore (in DW itself) won't get parsed/translated and remain in the :namespace:page format. The target pages don't exist anyway, so there is no problem, but I thought maybe one could do something about that? Just for the record :)

Do you have an idea what could be done about the links? I'm hesitant to touch links that cannot be resolved to an existing page as we don't know what to do with them. I'd prefer to keep them as is to be honest.

mejo- force-pushed the feat/import_markdown branch from e3c2300 to 05c19b5 (February 23, 2026 10:18)

janbaum (Contributor) commented Feb 23, 2026

Links

Just tried to import this file.txt

===== Themen =====

[[ zwanziggrad:workshop_grundlagen | Grundlagen ]] 

[[ zwanziggrad:workshop_gleichstellungs_workshop | Workshop für Frauen inter und n.b. ]]

[[ zwanziggrad:workshop_laufradbau | Laufradbau - Einspeichen]]

[[ zwanziggrad:workshop_laufradbau_lagerspiel | Laufradbau - Lagerspiel & Zentrieren]]

[[ zwanziggrad:workshop_bremse_schaltung | Bremse & Schaltung ]]

[[ zwanziggrad:workshop_getriebenaben | Getriebenaben ]]

[[ zwanziggrad:workshop_fruehjahrsputz | Frühjahrsputz ]]

This was converted to:

## Themen

[ Grundlagen ]( zwanziggrad:workshop_grundlagen )

[ Workshop für Frauen inter und n.b. ]( zwanziggrad:workshop_gleichstellungs_workshop )

[ Laufradbau - Einspeichen]( zwanziggrad:workshop_laufradbau )

[ Laufradbau - Lagerspiel & Zentrieren]( zwanziggrad:workshop_laufradbau_lagerspiel )

[ Bremse & Schaltung ]( zwanziggrad:workshop_bremse_schaltung )

[ Getriebenaben ]( zwanziggrad:workshop_getriebenaben )

[ Frühjahrsputz ]( zwanziggrad:workshop_fruehjahrsputz )

### Termin für AStA Seite erstellen

[ Anleitung zum Termine erstellen ]( zwanziggrad:asta_seite_termin_erstellen )

and then imported identically as:

## Themen

[ Grundlagen ]( zwanziggrad:workshop_grundlagen )

[ Workshop für Frauen inter und n.b. ]( zwanziggrad:workshop_gleichstellungs_workshop )

[ Laufradbau - Einspeichen]( zwanziggrad:workshop_laufradbau )

[ Laufradbau - Lagerspiel & Zentrieren]( zwanziggrad:workshop_laufradbau_lagerspiel )

[ Bremse & Schaltung ]( zwanziggrad:workshop_bremse_schaltung )

[ Getriebenaben ]( zwanziggrad:workshop_getriebenaben )

[ Frühjahrsputz ]( zwanziggrad:workshop_fruehjahrsputz )

### Termin für AStA Seite erstellen

[ Anleitung zum Termine erstellen ]( zwanziggrad:asta_seite_termin_erstellen )

The links are not working because they point to the wrong target. E.g. the link "Grundlagen" points to https://cloud.asta.tu-darmstadt.de/apps/collectives/Wiki-Fahrradwerkstatte-14/workshops-590807# whereas a link created with the Smart Picker points to https://cloud.asta.tu-darmstadt.de/apps/collectives/Wiki-Fahrradwerkstatte-14/workshop-grundlagen-590804 which works.

mejo- added 21 commits February 25, 2026 14:07
Doesn't add the pages to subpage order of parent page and doesn't call
notifyPush. Therefore it's a bit less memory greedy.

Signed-off-by: Jonas <jonas@freesources.org>
Signed-off-by: Jonas <jonas@freesources.org>
Signed-off-by: Jonas <jonas@freesources.org>
Signed-off-by: Jonas <jonas@freesources.org>
…wFile

Further memory saving improvements when processing many files.

Signed-off-by: Jonas <jonas@freesources.org>
Further memory saving improvements when processing many files.

Signed-off-by: Jonas <jonas@freesources.org>
Signed-off-by: Jonas <jonas@freesources.org>
Further memory saving improvements when processing many files.

Signed-off-by: Jonas <jonas@freesources.org>
Signed-off-by: Jonas <jonas@freesources.org>
Signed-off-by: Jonas <jonas@freesources.org>
E.g. Dokuwiki2Markdown creates `name.md` and folder `name` with
subpages. In these cases, use `name.md` as index page for the folder
`name`.

Signed-off-by: Jonas <jonas@freesources.org>
Signed-off-by: Jonas <jonas@freesources.org>
Signed-off-by: Jonas <jonas@freesources.org>
Signed-off-by: Jonas <jonas@freesources.org>
Signed-off-by: Jonas <jonas@freesources.org>
…class

Signed-off-by: Jonas <jonas@freesources.org>
Signed-off-by: Jonas <jonas@freesources.org>
Signed-off-by: Jonas <jonas@freesources.org>
mejo- force-pushed the feat/import_markdown branch 2 times, most recently from d414b89 to d9de203 (February 25, 2026 13:57)

mejo- (Member, Author) commented Feb 25, 2026

@max-nextcloud I did all the changes we discussed.

$directory = $input->getArgument('directory');
$userId = $input->getOption('user-id');
$parentId = (int)$input->getOption('parent-id');
$verbose = (bool)$input->getOption('verbose');

Review comment (Contributor):

Nitpick: $verbose seems redundant here, as the Symfony OutputInterface already tracks verbosity.
In the ProgressReporter, verbosity could either be read from $output->isVerbose() or be passed to writeln() directly as an option, like $this->output->writeln('<info>' . $message . '</info>', OutputInterface::VERBOSITY_VERBOSE);

Reply (Member, Author):

Ah nice, didn't know that. I'll keep $verbose in IProgressReporter for now, as the idea is to maybe add support for importing via API later on and then we will need a second implementation of ProgressReporter that also has to pass in verbosity.

Signed-off-by: Jonas <jonas@freesources.org>
Signed-off-by: Jonas <jonas@freesources.org>
Signed-off-by: Jonas <jonas@freesources.org>
mejo- merged commit 4bb305c into main Feb 26, 2026
57 of 62 checks passed
mejo- deleted the feat/import_markdown branch February 26, 2026 13:56

4 participants