Feature #10257

Feature #10034: Translation web platform

Merge strategy from Weblate

Added by u almost 3 years ago. Updated 8 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Infrastructure
Target version:
-
Start date:
10/03/2015
Due date:
% Done:

100%

QA Check:
Pass
Feature Branch:
Type of work:
Research

Description

There will be commits from the upcoming translation sprint in our weblate repo.

But also some other translators have started using the web platform for translations.

How do we call for merge from this repo? How do we verify?

How do we prevent adding too many PO files for non-activated languages to Git? See 08ae428.

weblate.svg View (40.1 KB) sajolida, 03/16/2016 03:59 PM

weblate.svg View (43.4 KB) u, 06/26/2016 03:17 AM


Subtasks

Feature #10331: Investigate the review processes available inside weblate (Resolved)

Feature #11265: Configure Weblate roles on new VM (Resolved, assignee: u)


Related issues

Related to Tails - Feature #10802: Investigate states of Weblate translations Resolved 12/29/2015
Blocked by Tails - Feature #10299: Consider disabling the htmlscrubber ikiwiki plugin Resolved 09/28/2015

Associated revisions

Revision b6528ab6
Added by sajolida almost 3 years ago

Merge branch 'doc/7787-bluetooth' (Closes: #10257)

History

#1 Updated by muri almost 3 years ago

the weblate documentation says (https://docs.weblate.org/en/latest/admin/translating.html):

You might however want to have more eyes on the translation and require more people to accept them. This can be achieved by suggestion voting. You can enable this on Component configuration configuration by Suggestion voting and Autoaccept suggestions.

maybe this feature can help to implement some kind of 'four-eyes principle'

#2 Updated by u almost 3 years ago

hi muri!

yes, we've activated this feature and for now, only trusted users have access to the platform.

right now, we're having a farsi translation sprint and eventually we can merge only farsi translations when it's done.

but we need to work out a general strategy. for example what happens when somebody translates into german using the platform?

#3 Updated by muri almost 3 years ago

ah, oke. you mean the merge into master.
i thought the question "How do we verify?" referred to how one weblate user can verify the translation of another weblate user and how this translation gets marked as done, like the [pull] on tails-l10n ...

#4 Updated by sajolida almost 3 years ago

During the October meeting we realized that this was highly dependent on the way we could implement a review process inside Weblate, which was something in the original list of requirements ("MUST: provide user roles (admin, reviewer, translator)"). We created a ticket to investigate this: #10331.

#5 Updated by intrigeri almost 3 years ago

  • Blocks Feature #10299: Consider disabling the htmlscrubber ikiwiki plugin added

#6 Updated by intrigeri almost 3 years ago

  • Blocks deleted (Feature #10299: Consider disabling the htmlscrubber ikiwiki plugin)

#7 Updated by intrigeri almost 3 years ago

  • Blocked by Feature #10299: Consider disabling the htmlscrubber ikiwiki plugin added

#8 Updated by intrigeri almost 3 years ago

  • Status changed from New to Confirmed

#9 Updated by intrigeri almost 3 years ago

Looks like this may need some preliminary research / proposal work before it's ready to be discussed, so I would set Type of work = Research. Just my 2 cts :)

#10 Updated by muri almost 3 years ago

so, i set up my own weblate instance and tried to figure out some kind of workflow (both in weblate and merging the results):

so, first of all, in weblate every file to be translated is a component and every component consists of strings. as mentioned above, there is a feature called suggestion voting, which means that one doesn't simply translate a string, but one suggests a translation. as soon as N users voted for that suggestion, that translation is used. 'translation is used' means that it is committed, together with all the other translations that can be used (enough users voted on). this commit can be done by hand (if one user has the permissions to do so) or by cron job. then the changes are in the local repository of this specific component.
then the changes can be pushed to a remote repository (by hand, if one has the permission, or automatically when committed).
this remote repository can be set specific to the component (which in weblate means 'file').
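
The voting rule described above can be sketched as follows. This is only an illustration of "accept once N users voted", not Weblate's code; `Suggestion`, `vote` and `accepted` are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    """A suggested translation for one string (hypothetical model)."""
    text: str
    voters: set = field(default_factory=set)


def vote(suggestion, user, threshold):
    """Record a vote; return True once the suggestion has enough votes."""
    # voters is a set, so the same user voting twice only counts once
    suggestion.voters.add(user)
    return len(suggestion.voters) >= threshold


def accepted(suggestions, threshold):
    """Texts of all suggestions that would go into the next commit."""
    return [s.text for s in suggestions if len(s.voters) >= threshold]
```

With a threshold of 2, a suggestion only becomes part of a commit after two different users voted for it, which is the 'four-eyes principle' mentioned earlier.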

i haven't found anything about branches in weblate. to me it seems that if we use one repository for translations, all the commits of translated content end up in that repository. so the person merging these commits into the tails repo would have to look over all of them at once or look at the commits that affect one specific file (like the branches we use at the moment).
there is a pre-commit-script and a post-commit-script option available in weblate, but i don't know yet if that could be used to group commits together into a branch (if that is even needed?). i think i would have to look into git to do so...

i think the notification (if needed) could be done once a day by a commit hook to tails-l10n?

#11 Updated by sajolida almost 3 years ago

  • Subject changed from Discuss & adopt a strategy to merge commits from Weblate to Merge strategy from Weblate
  • Type of work changed from Discuss to Research
  • Blueprint set to https://tails.boum.org/blueprint/translation_platform/

Thanks for looking into this! I can foresee how it would be problematic to choose N in such a system. If N=2, then any team of two malicious translators nobody knows could commit stuff on our website, if N>2 then it would add unnecessary work on the other translators to vote for their strings and thus delay everything.

In the requirements we came up with (see blueprint) there was the notion of "role":

MUST: provide user roles (admin, reviewer, translator)

and that was our initial idea for the review process: some people are "translators", and some others are "reviewers" that we trust to validate these translations. Did you see any notion of "role" in weblate?

By the way, we have a subtask which seems more suitable for the investigation part (#10331). But we started here, so maybe this ticket is superfluous by now.

#12 Updated by sajolida almost 3 years ago

  • Status changed from Confirmed to Resolved
  • % Done changed from 0 to 100

#13 Updated by sajolida over 2 years ago

  • Description updated (diff)
  • Status changed from Resolved to In Progress

This task was marked as resolved though it had an open subtask. This feels wrong. intrigeri also raised an issue about adding PO files for languages that are not activated yet. So I'm reopening this issue.

#14 Updated by sajolida over 2 years ago

  • Assignee set to u
  • QA Check set to Info Needed

So, how could we prevent committing PO files for non-activated languages to Git? Assigning tentatively to u...

#15 Updated by intrigeri over 2 years ago

So, how could we prevent committing PO files for non-activated languages to Git? Assigning tentatively to u...

It seems to me that Weblate has to store, somehow, PO files and their history for languages that are work-in-progress. I guess it totally makes sense that they are stored in Git.

So perhaps the question is: how do we avoid that our main/official Git repo continuously grows substantially due to translations that may, or may not, be completed some day? I think this calls for some indirection layer being set up between our official Git repo and the one Weblate uses. I guess that what we merge from Weblate should perhaps be a filtered version of the branch managed by Weblate, with history rewritten to exclude PO files that are not enabled on our website. But then merging back from the official master branch into Weblate's one will introduce duplicate commits. Can we do better than this?
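
A minimal sketch of the selection step such a filter would need, assuming PO files are named `page.LL.po` and that the set of enabled languages is known. It only decides which paths would be kept; it does not rewrite any Git history. `po_language`, `mergeable_paths` and the example paths are hypothetical.

```python
import re

# Matches the language code in names like "index.de.po" (assumed naming scheme)
PO_LANG = re.compile(r"\.([A-Za-z_]+)\.po$")


def po_language(path):
    """Return the language code of a PO file, or None for any other file."""
    match = PO_LANG.search(path)
    return match.group(1) if match else None


def mergeable_paths(paths, enabled):
    """Keep non-PO files and PO files whose language is enabled on the website."""
    kept = []
    for path in paths:
        lang = po_language(path)
        if lang is None or lang in enabled:
            kept.append(path)
    return kept
```

For example, with `enabled = {"de", "fr"}`, a changed `index.fa.po` would be left out of what gets merged, while `index.de.po` and `index.mdwn` would be kept.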

(I have a kinda off-topic comment to add here, that is I think very much related to this problem: PO files for languages that we don't build won't be updated automatically by ikiwiki when the corresponding master page is modified (new/modified strings). Similarly, when we introduce new translatable web pages, our ikiwiki won't create the corresponding PO files for languages it does not know about. Is this something that is addressed by our current Weblate setup, or that is tracked somewhere? (Sorry for the naive question, I didn't follow the whole thing closely.) If it's not addressed yet: I think this problem also calls for some indirection layer being set up between our official Git repo and the one Weblate uses; in other words, someone/something may have to do the ikiwiki refresh with a slightly different ikiwiki.setup (with all languages known to Weblate enabled), so that all those PO files are created/updated as needed. The result of this refresh will probably be stored in Git and Weblate will be fed with it, or something, right?)

#16 Updated by sajolida over 2 years ago

Here is my personal interpretation of the discussion we had at 32C3:

We want to have:

  • Controlled versions of both the website changed through ikiwiki
    and the translations changed through the translation platform
    because they might be changed in parallel.
  • Both languages that are activated on the website and languages
    that are not.

Current limitations to solve:

  • Ikiwiki should update the PO files of non-activated languages
    when the website changes but ignore them when building the
    website.
  • Weblate should only commit reviewed changes.
  • Weblate should automatically merge the PO files updated by
    ikiwiki when the website changes.

Then anybody with commit permissions on the main repo, can merge
the weblate repo into the main repo at any time.

#17 Updated by intrigeri over 2 years ago

  • Parent task set to #10034

#18 Updated by intrigeri over 2 years ago

  • Category changed from Internationalization to Infrastructure

(This is about l10n, not i18n.)

#19 Updated by intrigeri over 2 years ago

We want to have:

ACK.

Current limitations to solve:

Wow, I see it was Christmas time :) But it's not anymore, and anyway Christmas is a lie, so I'm afraid I'll have to take the role of turning a very hopeful wishlist into something less nice, that we can perhaps make happen this year. It might imply a bit more work on our side, but only stuff many of us know how to do, and so much less waiting and hoping for years that IMO it's worth it.

  • Ikiwiki should update the PO files of non-activated languages when the website changes but ignore them when building the website.

Adding these two requirements to the set of current ikiwiki semantics would make it self-contradictory, so no, this is not a realistic option. And IMO it would be a very bad design decision to change these semantics to accommodate for "real source pages that we build" vs. "half-read source pages that we don't build but still we modify them and commit them to Git". There are simply too many places in ikiwiki that would need to learn about this subtle difference, which IMO is too much complexity just to accommodate a rare use case that can be solved differently. Upstream might think differently (especially if provided patches), but don't count on me to help sell them this idea, because if I were them, I would not buy it :)

Looks like, as often, splitting a hard problem gives us two easy ones.

Regarding the "update the PO files" aspect: it seems quite easy to solve in an ad-hoc way on the weblate (or whatever web translation platform) side: simply ikiwiki --refresh (with a larger set of po_slave_languages — sorry for the "slave" word, I feel stupid having chosen that 7 years ago) after merging from the official master branch. So there's no need to hardcode this custom workflow into ikiwiki itself, which I think is good news. And it also means that anyone moderately skilled at platform and scripting stuff can do it, instead of requiring someone who feels like patching ikiwiki, which is probably very good news. The only downside I see is that our official master branch will carry outdated PO files for languages that are not enabled on our website yet. I think we can live with it, for most practical purposes. And if we can't, well, we'll need a bot that merges from origin/master, does the aforementioned special refresh, and pushes back there (a bit more infra, but no rocket science) — and maybe a script run at freeze time, before we send the call for translations, could be good enough.
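
The merge-then-refresh step could be scripted roughly like this (untested sketch). It assumes a separate setup file on the translation platform side, here called `weblate.setup` (a hypothetical name), that lists all languages known to Weblate in po_slave_languages, and a Git remote named `origin`.

```python
import subprocess


def refresh_commands(setup_file, upstream="origin/master"):
    """Commands to merge the official branch, then refresh with the wider setup."""
    return [
        ["git", "fetch", "origin"],
        ["git", "merge", upstream],
        # --refresh only updates what changed instead of rebuilding everything
        ["ikiwiki", "--setup", setup_file, "--refresh"],
    ]


def run_all(commands, repo_dir):
    """Run each command in the repository, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=repo_dir, check=True)
```

`refresh_commands("weblate.setup")` only returns the command list, so it can be inspected before `run_all` is pointed at a real checkout. Merge conflicts are not handled here.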

Regarding the "ignore them" aspect: looks like a simple matter of adding, to the list of excludes, any \.LL\.po that we want ikiwiki to ignore, e.g. something like (untested):

--- a/ikiwiki.setup
+++ b/ikiwiki.setup
@@ -105,7 +105,7 @@ ENV: {}
 # regexp of normally excluded files to include
 include: '^\.htaccess$'
 # regexp of files that should be skipped
-exclude: '(^blueprint\/.*|^contribute\/how\/promote\/material\/.*|\/discussion\..*|\/Discussion\..*)'
+exclude: '(^blueprint\/.*|^contribute\/how\/promote\/material\/.*|\/discussion\..*|\/Discussion\..*|\.it\.po$)'
 # specifies the characters that are allowed in source filenames
 wiki_file_chars: '-[:alnum:]+/._~'
 # allow symlinks in the path leading to the srcdir (potentially insecure)

... would totally ignore all Italian PO files.
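
The pattern itself can be checked outside ikiwiki with a few lines of Python. This only exercises the regular expression from the patch above against made-up paths; ikiwiki evaluates it in Perl, and how ikiwiki applies it to source paths is assumed here to be a plain unanchored match.

```python
import re

# The proposed exclude value, copied from the patch
EXCLUDE = re.compile(
    r"(^blueprint\/.*|^contribute\/how\/promote\/material\/.*"
    r"|\/discussion\..*|\/Discussion\..*|\.it\.po$)"
)


def is_excluded(path):
    """True if the path matches the exclude pattern anywhere."""
    return EXCLUDE.search(path) is not None
```

With this, `is_excluded("about.it.po")` is true while `is_excluded("about.de.po")` is false, so only the Italian PO files are affected by the added alternative.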

  • Weblate should only commit reviewed changes.

This feels like an awkward, or unrealistic, expectation to me. But this is perhaps because I know close to nothing about Weblate. Where is it supposed to store changesets that have not been reviewed yet, then? Does it have any storage backend aside of Git, where it would store not-yet-reviewed changesets, and be able to merge them with newly pulled stuff later on? Or perhaps you mean it should maintain two branches, one for reviewed changes, and one for not-yet-reviewed changes? Really, I probably don't get it.

If Weblate doesn't provide this exact feature yet (that seems hard to implement), I suggest we look for other ways to get what we need. Otherwise I'm concerned we'll fall into our usual self-inflicted trap of requiring something perfect, and instead having to live with the current crap we have, while some middle-ground could have been just fine.

  • Weblate should automatically merge the PO files updated by ikiwiki when the website changes.

This is already encoded in the "automatic pull from main Git repo" MUST on our translation platform spec, so it should not be an issue since Weblate is supposed to implement that spec. And if it doesn't, well, this should be easy to script I guess.

Then anybody with commit permissions on the main repo, can merge the weblate repo into the main repo at any time.

Indeed, this would be the ideal outcome! :)

#20 Updated by muri over 2 years ago

intrigeri wrote:

This feels like an awkward, or unrealistic, expectation to me. But this is perhaps because I know close to nothing about Weblate. Where is it supposed to store
changesets that have not been reviewed yet, then? Does it have any storage backend aside of Git, where it would store not-yet-reviewed changesets, and be able to
merge them with newly pulled stuff later on? Or perhaps you mean it should maintain two branches, one for reviewed changes, and one for not-yet-reviewed changes?
Really, I probably don't get it.

part of the answer to this (the multiple stages a repository resp. a file in a repository goes through) should be more clear as soon as i get to do #10802

#21 Updated by sajolida over 2 years ago

Wow, I see it was Christmas time :)

Please avoid sarcasm. This is how far we went on this day with 5-6
people around the table.

But I'm glad you're building up on our analysis :)

Regarding the "update the PO files" aspect: it seems quite easy to
solve in an ad-hoc way on the weblate (or whatever web translation
platform) side: simply ikiwiki --refresh (with a larger set of
po_slave_languages — sorry for the "slave" word

So this would be having an ikiwiki instance somewhere dedicated to
updating the PO files of non-activated languages, instead of doing this
on the production website. Smart solution!

Note that this is in no way contradictory with my initial phrasing in
#10257#note-16 that was abstract enough not to jump to conclusions about
patching ikiwiki (though that was the only thing I could think of at
that point and I might have mentioned it elsewhere).

The only downside I see is that our
official master branch will carry outdated PO files for languages that
are not enabled on our website yet.

I think it's no big deal and we can try doing this manually for some
time and then see if we need automation as you are suggesting.

Regarding the "ignore them" aspect: looks like a simple matter of
adding, to the list of excludes, any \.LL\.po that we want ikiwiki to
ignore

I'm not sure how frequently we would have to add languages to the list
of non-activated languages. Probably not that often, unless the web
platform allows random people to start new language teams every other
day (something that is undefined in the current requirements). Could we
find a regex to exclude all PO files except the activated languages?
Then we would update both po_slave_languages and exclude only when
activating new languages.
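
One possible shape for such a regex is a negative lookahead that lets the activated languages through and matches every other `.LL.po` file. The sketch below builds it from a list of language codes and tries it in Python; the codes used are example values, `exclude_pattern` is a hypothetical helper, and whether ikiwiki's Perl regexps treat the result identically is untested.

```python
import re


def exclude_pattern(enabled):
    """Regex matching every .LL.po file whose LL is not an enabled language."""
    langs = "|".join(re.escape(lang) for lang in sorted(enabled))
    # (?!...) rejects the match when the language code is one of the enabled ones
    return rf"\.(?!(?:{langs})\.po$)[^./]+\.po$"


def is_ignored(path, enabled):
    """True if the path is a PO file for a language that is not enabled."""
    return re.search(exclude_pattern(enabled), path) is not None
```

With `enabled = ["de", "fr", "pt"]`, `index.fa.po` is ignored while `index.de.po` and `index.mdwn` are not. Activating a language would then mean adding its code to this list and to po_slave_languages.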

  • Weblate should only commit reviewed changes.

So let's recall instead the problems we're trying to solve here:

- Weblate currently creates billions of commits when translating (more
or less one commit per translated line) and pollutes the Git repo.
- We need some way to merge only reviewed changes on the production
website. So maybe this could be rephrased into the need for commits,
branches, or whatever Git concepts that allow us to differentiate
reviewed from unreviewed changes.

Where is it supposed to store changesets that have not been reviewed yet,
then?

Weblate has a SQL backend; but indeed, I have no clue about what it does
or doesn't do with it.

I hope people with more insight than us on Weblate's internals can bring
in constructive input on solving these problems now that they are better
specified.

#22 Updated by sajolida over 2 years ago

  • Related to Feature #10802: Investigate states of Weblate translations added

#23 Updated by sajolida over 2 years ago

In #10802#note-5, muri explains that we could have translations pending for review not written to disk (maybe there only in the database until then).

#24 Updated by emmapeel over 2 years ago

intrigeri wrote:

Is this something that is addressed by our current Weblate setup, or that is tracked somewhere? (Sorry for the naive question, I didn't follow the whole thing closely.) If it's not addressed yet: I think this problem also calls for some indirection layer being set up between our official Git repo and the one Weblate uses; in other words, someone/something may have to do the ikiwiki refresh with a slightly different ikiwiki.setup (with all languages known to Weblate enabled), so that all those PO files are created/updated as needed. The result of this refresh will probably be stored in Git and Weblate will be fed with it, or something, right?)

Yes, the same as when a new language is added, with ikiwiki.setup

#25 Updated by sajolida over 2 years ago

  • File weblate.svg View added
  • Assignee changed from u to intrigeri
  • QA Check changed from Info Needed to Ready for QA

Building on #10802 and #10331 we came up with an architecture to combine Git, ikiwiki, and Weblate. See the drawing in attachment. Once we agree on this we should better detail the next steps.

#26 Updated by sajolida over 2 years ago

  • Target version set to Tails_2.3

#27 Updated by sajolida over 2 years ago

  • Related to Feature #9183: Have a test website on staging.tails.boum.org added

#28 Updated by sajolida over 2 years ago

  • Target version deleted (Tails_2.3)

The proposed design would also solve #9183.

#29 Updated by sajolida over 2 years ago

Actually, that's unrelated to #9183 since the content built on Weblate's ikiwiki will be the same as origin/master.

#30 Updated by sajolida over 2 years ago

  • Related to deleted (Feature #9183: Have a test website on staging.tails.boum.org)

#31 Updated by sajolida over 2 years ago

  • Assignee changed from intrigeri to u

Reassign to u who's assigned to the "Translation web platform" in our roadmap.

#32 Updated by intrigeri over 2 years ago

Reassign to u who's assigned to the "Translation web platform" in our roadmap.

Thank you, I'm glad to not be responsible for the next step of this work. Still: u, we can discuss this next time we meet if you want my input, as a fun way of side-tracking :)

#33 Updated by u about 2 years ago

Here's a small update of the SVG, which should reflect the current and planned situation a bit better.

#34 Updated by u about 2 years ago

  • Status changed from In Progress to Resolved
  • QA Check changed from Ready for QA to Pass

I think we now know the big picture, however we still expect some merge conflicts here and there. But as far as the research goes, I'll close this ticket.
