Bug #16774

Transifex translations: we should not update from the _completed branches

Added by emmapeel 2019-06-04 11:04:55. Updated 2020-05-06 04:28:55.

Status: In Progress
Priority: Normal
Assignee: emmapeel
Category: Internationalization
Target version:
Start date:
Due date:
% Done: 100%
Feature Branch:
Type of work: Code
Blueprint:
Starter: 0
Affected tool:
Deliverable for:

Description

ey there!

I think we should change the branches from where we pick up our translations coming from transifex.

The Tails resources being translated on Transifex usually have two branches each on https://gitweb.torproject.org/translation.git/
For example, the resource translating WhisperBack has:

These translations are imported with the ./import_translations.sh script.

## THE PROBLEM

As reported in https://trac.torproject.org/projects/tor/ticket/26878 , the _completed branches are only updated when a resource is completed, so a problem arises when:

  • A translation is completed - the _completed branch is updated
  • There are new strings on the file - the _completed branch is NOT updated
  • Some of the translations are done, but not all - the _completed branch is NOT updated

While I welcome help solving the Tor ticket above, my suggestion for Tails is to pull from the normal branch, and use the _completed branch as a way of seeing the state of the translation (how many times was this file completed? etc.).

By not using the _completed branch, we will have fewer outdated translations, especially in the ‘long tail’ of languages.
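To make the failure mode concrete, here is a toy Python model of one language of one resource. It assumes, following the Tor ticket above, that the normal branch receives every change while the _completed branch is only refreshed when every source string is translated. `ResourceBranches` and `sync` are hypothetical names, not part of any Tor or Tails tooling, and the strings are invented.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceBranches:
    """Toy model of one language of one resource in translation.git (hypothetical)."""

    normal: dict = field(default_factory=dict)
    completed: dict = field(default_factory=dict)

    def sync(self, translations: dict, total_strings: int) -> None:
        """Mirror a Transifex snapshot, assuming the update rule described in Tor #26878."""
        snapshot = dict(translations)
        # The normal branch always gets the latest state.
        self.normal = snapshot
        # The _completed branch is only refreshed when nothing is left untranslated.
        if len(snapshot) == total_strings:
            self.completed = dict(snapshot)


branches = ResourceBranches()
# 1. The translation is completed: both branches are updated.
branches.sync({"Send": "Envoyer", "Cancel": "Annuler"}, total_strings=2)
# 2. A new source string appears: the file is now incomplete.
branches.sync({"Send": "Envoyer", "Cancel": "Annuler"}, total_strings=3)
# 3. A correction lands while the file is still incomplete.
branches.sync({"Send": "Expédier", "Cancel": "Annuler"}, total_strings=3)

print(branches.normal["Send"])     # Expédier
print(branches.completed["Send"])  # Envoyer: the stale version Tails would import
```

Under this model, pulling from the normal branch picks up the correction from step 3, while the _completed branch keeps serving the older string until someone translates the new one.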


Subtasks

Bug #17106: Don't import PO files with no translated string from Transifex Resolved



Related issues

Related to Tails - Feature #16095: Curate the list of languages in Tails Greeter Resolved 2018-11-04
Related to Tails - Bug #17279: Work around the lack of usable branches in Tor's translation.git Resolved

History

#1 Updated by emmapeel 2019-06-04 11:12:27

  • Description updated

#2 Updated by intrigeri 2019-06-07 10:12:11

@emmapeel, is there any difference between the branches you recommend we use and the _completed ones, wrt. inclusion of non-reviewed translations?

Rationale for my question: I’d rather not regress on the (already rather poor) quality of translations we pull from Transifex.

#3 Updated by emmapeel 2019-06-07 10:33:59

Regarding the reviews, as well as the updates when a resource gets small changes, I recommend following the non-_completed branches, because I think they are better at keeping the files up to date:

If someone makes a correction on an incomplete file, it will be updated in the resource branch, but it will not be updated on the _completed branch until the whole file is translated.

But unfortunately I cannot tell the Transifex client to download according to review percentage. What I can tell you is that reviewing a resource makes really small changes, and it usually happens after a resource is completed, so this problem arises for languages with fewer updates.

#4 Updated by intrigeri 2019-06-07 11:05:50

OK, I understand. I’m still interested in the answer to my question, so we can consciously factor it into the pros/cons balancing act.

#5 Updated by intrigeri 2019-08-14 07:37:43

intrigeri wrote:
> @emmapeel, is there any difference between the branches you recommend we use and the _completed ones, wrt. inclusion of non-reviewed translations?

Ping? In other words, do the branches without _completed in their name include translations that have not been reviewed yet?

#6 Updated by intrigeri 2019-08-14 07:38:33

  • Subject changed from transifex translations: we should not update from the _completed branches to Transifex translations: we should not update from the _completed branches
  • Priority changed from Low to Normal

(This seems to be an important problem.)

#7 Updated by emmapeel 2019-08-14 08:00:05

As said previously, the difference is:

If someone makes a correction on a resource in which the translation is incomplete, it will be updated in the resource branch, but not on the _completed branch until the whole file is translated.

#8 Updated by intrigeri 2019-08-14 08:09:44

> As said previously […]

I understood this part. It is a clear advantage of your proposal.

But AFAICT you still did not answer my question so I still can’t balance this advantage vs. potential drawbacks :/

#9 Updated by emmapeel 2019-08-14 08:20:49

Ey, good news!
I went looking for the docs to show you that it was not possible, and it seems they have finally added support for this!

We can test pulling the Tails translations from Transifex in this mode:

https://docs.transifex.com/client/pull/#getting-different-file-variants

tx pull -a --mode reviewed --minimum-perc 100

#10 Updated by emmapeel 2019-08-14 08:28:46

I tested this command, and only the French translation was ready. But

tx pull -a --mode reviewed

produced

https://gitweb.torproject.org/translation.git/commit/?h=tails-misc&id=5e8bf218a91af9dd4319b0dc1a3e637a01ef405c

That looks alright, I think. Shall I leave it like that in the update script?

And in all Tails branches?

#11 Updated by intrigeri 2019-08-14 12:13:34

  • Status changed from Confirmed to In Progress
  • Assignee set to emmapeel
  • Target version set to Tails_3.16

> I tested such command, and only the French translation was ready. But

> tx pull -a --mode reviewed

Ooh yeah, --mode reviewed sounds very nice to my ear :))) Thanks a lot for investigating!

> shall i leave it like that on the update script? and in all Tails branches?

I think I’d like to only include translations above some minimal level; I agree that 100% is too high a bar but 0 is too low IMO. When in doubt, I would use 25% to start with, in order to be consistent with https://tails.boum.org/contribute/how/translate/team/new/ (if that’s good enough for the core pages of our website, that should be good enough for our custom programs as well). I assume this translates into --minimum-perc 25. Makes sense?
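As a rough illustration of what a minimum-percentage threshold selects, here is a Python sketch that applies one to per-language statistics. It assumes the percentage is reviewed strings over total source strings, compared without rounding; the Transifex client's exact formula may differ, and the counts below are invented for the example, not real Tails statistics.

```python
def languages_at_or_above(stats: dict[str, int], total: int, min_perc: int) -> list[str]:
    """Return the languages whose reviewed strings reach min_perc percent of total.

    Integer arithmetic avoids floating-point surprises at the boundary:
    reviewed / total >= min_perc / 100  <=>  reviewed * 100 >= min_perc * total.
    """
    return sorted(
        lang for lang, count in stats.items()
        if count * 100 >= min_perc * total
    )


# Hypothetical resource with 40 source strings; reviewed counts are made up.
TOTAL = 40
reviewed = {"fr": 40, "de": 31, "ar": 6, "vi": 0}

for threshold in (0, 25, 100):
    print(threshold, languages_at_or_above(reviewed, TOTAL, threshold))
# 0 ['ar', 'de', 'fr', 'vi']
# 25 ['de', 'fr']
# 100 ['fr']
```

Note that with this comparison a literal 0% threshold also keeps languages that have no reviewed string at all, which is not the same thing as requiring at least one.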

Once we agree on some number we can use for now, yeah, please do this on all Tails non-completed branches, then reassign to me: I’ll take a last look and if happy (which I assume I will be), I will update our own scripts to fetch from non-completed branches as you suggested, and we can call this ticket done!

I suspect we’ll want to fine-tune this later: @sajolida is working on stuff that will probably transitively depend on these settings (by reusing the list of PO files in tails.git:po/ for other stuff). But this can happen on another ticket.

#12 Updated by sajolida 2019-08-16 17:49:22

  • blocks Feature #16095: Curate the list of languages in Tails Greeter added

#13 Updated by intrigeri 2019-08-19 10:49:57

emma & I agreed on having _release branches, that contain only reviewed strings and only languages that have 25% of the strings translated+reviewed. Once this is implemented on Tor’s side, we’ll adjust the Tails code to fetch from these new branches (and while we’re at it, we should think about what we’ll do wrt. PO files being removed from these branches: AFAICT our current code will simply leave the old version in place).

#14 Updated by intrigeri 2019-08-19 10:59:16

In current tails-misc (that only has reviewed strings now):

  • 20 languages at 25% or more: ca, cs, de, el, es_AR, es, fi, fr, ga, he, hu, it, km, lt, pt_BR, pt_PT, ro, sv, tr, zh_CN
  • 1 language between 0% and 25%: ar
  • Quite a few languages got dropped because the strings were never reviewed; there’s a general lack of reviewers on Transifex. For example, vi.po has 58 strings translated in current tails.git, but has not been updated in Transifex since 2017 and was never reviewed, so who knows if these translations are good.

#15 Updated by emmapeel 2019-08-19 11:02:38

  • Description updated

#16 Updated by sajolida 2019-08-22 17:45:39

Why not use a 0% threshold?

The 25% threshold on our website makes more sense because having languages enabled on our website has some cost: at least in build time and in work when unfuzzing stuff manually. But translated software wouldn’t have such a cost.

Seeing that only one language (ar) is between 0% and 25%, it also sounds like not much extra work in case RMs have to fiddle with these.

#17 Updated by intrigeri 2019-08-23 08:00:36

Hi!

> Why not use a 0% threshold?

I think I’ve been somewhat confused.

On Feature #16095 you initially wrote:

"Having such a long list makes it harder to know which languages are actually well translated and for the user to know what her best option is, without trial and error.

I think we should filter this list to only display the languages that are reasonably well translated.

If we base ourselves on the PO files for our internal tools, we might be able to automatically generate a list of languages during the build. Making sure that our internal tools are well translated in a given language before listing it sounds like a good criterion too."

But later on that ticket, you switched from “our internal tools are well translated” to “have at least 1 string of custom software translated”, and I’m afraid I failed to adjust my thinking accordingly here.

I’m totally fine with letting you decide, from a UX perspective, which threshold we should use: I don’t think it makes much of a difference from an implementation perspective.

emmapeel and I have plans to discuss such matters today on XMPP, it would be nice if you could join us :)

#18 Updated by sajolida 2019-08-27 15:42:20

I agree that my position is not super clear, and even confusing. For me the main goal of Feature #16095 is to bring down this very long list of 284 languages to something easier to parse. To avoid having to curate the list manually and run into political debates on whether we should keep or remove Luxembourgish or Ligurian from the list, I’m happy that we found an automated criterion that brings it down to around 50.

And, from the analysis that we did on actual translation files on Feature #16095, being stricter than 0% when applying the criterion of “our internal tools are well translated” is not really helpful.

#19 Updated by intrigeri 2019-08-27 20:21:22

> we found an automated criterion that brings it down to around 50.

> And, from the analysis that we did on actual translation files on Feature #16095, being stricter than 0% when applying the criterion of “our internal tools are well translated” is not really helpful.

Note: said analysis did not include the reviewed criterion, which is part of the current proposal here. So the total number may be closer to 20 than to 50. Below I’ll assume that you’re fine with that.

So, as said earlier, I’m fine with letting sajolida pick the threshold, so here’s an updated proposal:

  • Have _release branches, that contain only reviewed strings and only languages that have at least 1 string translated+reviewed.
    • Wrt. implementation details, if this is not something we can easily ask tx pull to do, I guess that using a 1% threshold would be acceptable.
  • Once this is implemented on Tor’s side, we’ll adjust the Tails code to fetch PO files from these new branches.
  • We can deal with the PO files being removed from these branches either here pro-actively, or, worst case, on Feature #16095.
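The 1% fallback in the first bullet is not exactly equivalent to "at least 1 string". Writing r for the number of translated+reviewed strings, N for the number of source strings and p for the threshold in percent, and assuming the percentage is compared without rounding (the Transifex client's exact formula may differ):

```latex
\text{pass}(r, N, p) \iff \frac{100\, r}{N} \ge p
\qquad\Longrightarrow\qquad
\text{pass}(1, N, 1) \iff \frac{100}{N} \ge 1 \iff N \le 100
```

For example, with N = 250 a single reviewed string is only 0.4%, so that language would be dropped by a 1% threshold but kept by the "at least 1 string" rule. For resources with at most 100 strings, and for languages with no reviewed string at all, the two rules agree.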

@emmapeel, would this work for you? If not, please have a chat about it with sajolida in place/time of your liking. Feel free to invite me if you think I can add something to the discussion :)

#20 Updated by emmapeel 2019-08-28 08:45:08

intrigeri wrote:

> So, as said earlier, I’m fine with letting sajolida pick the threshold, so here’s an updated proposal:
>
> * Have _release branches, that contain only reviewed strings and only languages that have at least 1 string translated+reviewed.

What about doing this on the already existing _completed branches? Green computing! I think I want to apply said threshold to all _completed branches so I can fix https://trac.torproject.org/projects/tor/ticket/26878 as well.

> Wrt. implementation details, if this is not something we can easily ask tx pull to do, I guess that using a 1% threshold would be acceptable.
> * Once this is implemented on Tor’s side, we’ll adjust the Tails code to fetch PO files from these new branches.
> * We can deal with the PO files being removed from these branches either here pro-actively, or, worst case, on Feature #16095.

We can always find the files in the git history, and their strings are still part of the Transifex translation memory.

#21 Updated by intrigeri 2019-08-28 09:12:22

> What about doing this on the already existing _completed branches? Green computing! I think I want to apply said threshold to all _completed branches so I can fix https://trac.torproject.org/projects/tor/ticket/26878 as well.

On the one hand, I would find this definition of “completed” slightly confusing: having 1 translated+reviewed string does not match what I understand by “completed”. I’m slightly worried that in N months or years, someone will see this as a bug (“completed branches have incomplete translations, where can I find really complete translations?”) and if this gets “fixed”, we may lose the branches we need.

On the other hand, how the branches we use are called makes little difference for Tails in practice (very few people get exposed to these names): we’ll be fine as long as 1. there are branches with the content we want; 2. these branches are here to stay; 3. we’re in the loop if someone wants to change the criteria for inclusion in these branches.

So yeah, if applying this criterion to the _completed branches has advantages for you compared to creating new ones, it’s fine by me :)

#22 Updated by intrigeri 2019-09-02 16:41:05

> Have _release branches, that contain only reviewed strings and only languages that have at least 1 string translated+reviewed.

For 4.0~beta2 I’ve switched to importing from tails-misc_release, as discussed earlier on XMPP. I trust your automation to have imported only reviewed strings in there, which is good. But 60 of the PO files in there have no translated string at all, which violates the aforementioned criterion (and FWIW, 30 have at least one string translated).

This is not a big problem on this ticket: having PO files with no translation whatsoever won’t cause trouble. Still, I’ve removed them as sajolida is looking at the content of tails.git:po/ and drawing conclusions from the number of files in there.

But Feature #16095 is expecting us to ensure here that all PO files in tails.git:po/ respect the aforementioned criterion.

So, shall I filter out, at import time, PO files that have no string translated? Or will you do this on your side?
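If the filtering is done on the Tails side at import time, here is a minimal Python sketch of such a check. It counts translated entries with a small hand-written PO parser (no third-party library), treating an entry as translated only if it is not the header, not flagged fuzzy, and has every msgstr form non-empty; that definition is this sketch's choice and may not match exactly how gettext tools count. `count_translated` and `po_files_to_import` are hypothetical names, not existing Tails code.

```python
import re
from pathlib import Path

# A keyword line, e.g. msgid "Hello" or msgstr[1] "fichiers"
_KEYWORD = re.compile(r'^(msgctxt|msgid_plural|msgid|msgstr)(?:\[(\d+)\])?\s+"(.*)"$')
# A continuation line of a multi-line string, e.g. "more text"
_CONTINUATION = re.compile(r'^"(.*)"$')


def parse_entries(po_text: str) -> list[dict]:
    """Split PO text into entries (flags, msgid, msgstr forms); no unescaping or validation."""
    entries = []
    entry = {"flags": set(), "msgid": None, "msgstr": {}}
    field = None  # (keyword, plural index) that continuation lines extend

    def flush():
        nonlocal entry, field
        if entry["msgid"] is not None:
            entries.append(entry)
        entry = {"flags": set(), "msgid": None, "msgstr": {}}
        field = None

    for raw in po_text.splitlines():
        line = raw.strip()
        if not line:
            flush()
        elif line.startswith("#~"):
            continue  # obsolete entry: never counts as translated
        elif line.startswith("#"):
            if entry["msgstr"]:
                flush()  # a comment after msgstr starts the next entry
            if line.startswith("#,"):
                entry["flags"].update(f.strip() for f in line[2:].split(","))
        elif m := _KEYWORD.match(line):
            keyword, index, text = m.group(1), int(m.group(2) or 0), m.group(3)
            if keyword in ("msgctxt", "msgid") and entry["msgstr"]:
                flush()  # next entry started without a blank line
            if keyword == "msgid":
                entry["msgid"] = text
            elif keyword == "msgstr":
                entry["msgstr"][index] = text
            field = (keyword, index)
        elif (m := _CONTINUATION.match(line)) and field is not None:
            keyword, index = field
            if keyword == "msgid":
                entry["msgid"] += m.group(1)
            elif keyword == "msgstr":
                entry["msgstr"][index] += m.group(1)
    flush()
    return entries


def count_translated(po_text: str) -> int:
    """Count non-header, non-fuzzy entries whose msgstr forms are all non-empty."""
    return sum(
        1
        for e in parse_entries(po_text)
        if e["msgid"] != ""
        and "fuzzy" not in e["flags"]
        and e["msgstr"]
        and all(e["msgstr"].values())
    )


def po_files_to_import(po_dir: Path) -> list[Path]:
    """Hypothetical import-time filter: keep PO files with at least 1 translated string."""
    return [
        path
        for path in sorted(po_dir.glob("*.po"))
        if count_translated(path.read_text(encoding="utf-8")) > 0
    ]


SAMPLE = '''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

#, fuzzy
msgid "Send"
msgstr "Envoyer"

msgid "Cancel"
msgstr "Annuler"

msgid "Quit"
msgstr ""
'''
print(count_translated(SAMPLE))  # 1: only "Cancel"; header, fuzzy and empty entries are skipped
```

In practice, `msgfmt --statistics` from GNU gettext or a library such as polib would do the counting more robustly; the hand-written parser only keeps the sketch dependency-free.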

#23 Updated by CyrilBrulebois 2019-09-05 00:05:40

  • Target version changed from Tails_3.16 to Tails_3.17

#24 Updated by intrigeri 2019-09-12 14:25:34

  • Target version changed from Tails_3.17 to Tails_4.0

#25 Updated by intrigeri 2019-09-30 08:41:44

intrigeri wrote:
> So, shall I filter out, at import time, PO files that have no string translated? Or will you do this on your side?

emmapeel told me it’s not easy to do on her side, so I’ll do it: Bug #17106.

Remaining steps here:

  1. emmapeel creates _release branches for every other Tails resource (tails-iuk, tails-onioncircuits, tails-perl5lib, liveusb-creator, tails-persistence-setup, whisperback) so we can stop importing from the _completed branches; once this is done, please reassign to me for the next step
  2. I adjust import-translations so it pulls from the _release branches

#26 Updated by intrigeri 2019-09-30 08:42:02

  • blocked by deleted (Feature #16095: Curate the list of languages in Tails Greeter)

#27 Updated by intrigeri 2019-09-30 08:42:09

  • related to Feature #16095: Curate the list of languages in Tails Greeter added

#28 Updated by intrigeri 2019-09-30 08:43:02

> Blocks deleted (Feature #16095: Curate the list of languages in Tails Greeter)

Rationale: the part of this ticket that still blocks Feature #16095 is now tracked on Bug #17106.

#29 Updated by intrigeri 2019-10-10 05:34:10

> Remaining steps here:

> 1. emmapeel creates _release branches for every other Tails resource (tails-iuk, tails-onioncircuits, tails-perl5lib, liveusb-creator, tails-persistence-setup, whisperback) so we can stop importing from the _completed branches; once this is done, please reassign to me for the next step

Note that until this is done, I don’t know where to pull translations from: if I pull from the _completed branches (status quo), lots of languages disappear; I guess that’s because the _completed branches now have only languages that are fully translated and reviewed. So for 4.0~rc1, while updating our custom packages, I’ll work around this problem by importing updated translations for languages present in those branches, and keeping the old PO files for every other language.
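A minimal Python sketch of that workaround, assuming a local checkout of the _completed branch with one `<lang>.po` file per language at its top level and a local copy of tails.git:po/. `import_present_languages` is a hypothetical helper, not the actual Tails import code.

```python
import shutil
import tempfile
from pathlib import Path


def import_present_languages(branch_checkout: Path, po_dir: Path) -> list[str]:
    """Copy the PO files found in a checkout of a translation branch into po_dir.

    PO files for languages absent from the branch are left untouched, so
    languages missing from _completed keep their previous translations.
    """
    updated = []
    for src in sorted(branch_checkout.glob("*.po")):
        shutil.copyfile(src, po_dir / src.name)
        updated.append(src.name)
    return updated


# Demonstration with throwaway directories standing in for the real trees.
with tempfile.TemporaryDirectory() as tmp:
    completed = Path(tmp, "completed")
    po_dir = Path(tmp, "po")
    completed.mkdir()
    po_dir.mkdir()
    (po_dir / "fr.po").write_text("old fr", encoding="utf-8")
    (po_dir / "vi.po").write_text("old vi", encoding="utf-8")
    (completed / "fr.po").write_text("new fr", encoding="utf-8")

    updated = import_present_languages(completed, po_dir)
    result = {p.name: p.read_text(encoding="utf-8") for p in sorted(po_dir.glob("*.po"))}

print(updated)  # ['fr.po']
print(result)   # {'fr.po': 'new fr', 'vi.po': 'old vi'}
```

The design choice here is to never delete anything: it avoids losing languages, at the cost of shipping PO files that may no longer be updated upstream.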

#30 Updated by intrigeri 2019-10-19 12:33:20

Hi emma,

>> Remaining steps here:

>> 1. emmapeel creates _release branches for every other Tails resource (tails-iuk, tails-onioncircuits, tails-perl5lib, liveusb-creator, tails-persistence-setup, whisperback) so we can stop importing from the _completed branches; once this is done, please reassign to me for the next step

> Note that until this is done, I don’t know where to pull translations from: if I pull from the _completed branches (status quo), lots of languages disappear; I guess that’s because the _completed branches now have only languages that are fully translated and reviewed. So for 4.0~rc1, while updating our custom packages, I’ll work around this problem by importing updated translations for languages present in those branches, and keeping the old PO files for every other language.

Any chance we get the missing _release branches in time for the 4.0 release? Ideally, I’ll need them on Monday morning, latest.

Otherwise, I’ll use the same workaround as last time, which is OK for me as I’m pretty up-to-date on this front.
But for 4.1, the RM will be kibi, and it would really be nice if he did not have to second-guess our existing code and instructions, which conflict with how the _completed branches now work.

Either way, a timeline would be immensely helpful to me: if I can’t expect this to be done soonish, I’ll put temporary workarounds in place in our code & doc.

Cheers!

#31 Updated by intrigeri 2019-10-21 11:46:16

  • Target version changed from Tails_4.0 to Tails_4.1

#32 Updated by intrigeri 2019-12-01 07:39:22

  • related to Bug #17279: Work around the lack of usable branches in Tor's translation.git added

#33 Updated by intrigeri 2019-12-01 07:41:02

intrigeri wrote:
> Any chance we get the missing _release branches in time for the 4.0 release? Ideally, I’ll need them on Monday morning, latest.
>
> Otherwise, I’ll use the same workaround as last time, which is OK for me as I’m pretty up-to-date on this front.
> But for 4.1, the RM will be kibi, and it would really be nice if he did not have to second-guess our existing code and instructions, which conflict with how the _completed branches now work.
>
> Either way, a timeline would be immensely helpful to me: if I can’t expect this to be done soonish, I’ll put temporary workarounds in place in our code & doc.

FTR, unless you tell me that you’ll implement what we need soon (Bug #16774#note-25), I’ll put temporary workarounds in place (Bug #17279) at some point in December or January.

#34 Updated by CyrilBrulebois 2019-12-04 11:31:19

  • Target version changed from Tails_4.1 to Tails_4.2

#35 Updated by intrigeri 2019-12-28 11:31:29

Update: forget about tails-iuk and tails-perl5lib, they’re being merged into tails.git via Feature #15281.

So we only need _release branches for: tails-onioncircuits, liveusb-creator, tails-persistence-setup, and whisperback.

#36 Updated by CyrilBrulebois 2020-01-07 18:00:40

  • Target version changed from Tails_4.2 to Tails_4.3

#37 Updated by anonym 2020-02-11 15:25:58

  • Target version changed from Tails_4.3 to Tails_4.4

#38 Updated by CyrilBrulebois 2020-03-12 09:55:56

  • Target version changed from Tails_4.4 to Tails_4.5

#39 Updated by CyrilBrulebois 2020-04-07 17:05:14

  • Target version changed from Tails_4.5 to Tails_4.6

#40 Updated by CyrilBrulebois 2020-05-06 04:28:55

  • Target version changed from Tails_4.6 to Tails_4.7