Feature #17178

Re-include metadata analysis tools

Added by huertanix 2019-10-22 22:21:55. Updated 2019-12-20 11:43:21.

Status: Resolved
Priority: Normal
Assignee:
Category:
Target version:
Start date:
Due date:
% Done: 100%
Feature Branch: feature/17178-metadata-tools
Type of work: Code
Blueprint:
Starter:
Affected tool:
Deliverable for:
Description

It appears that pdf-redact-tools was removed in Feature #15291. Although this and other metadata analysis tools can be installed on Tails with an internet-connected computer, some particularities of SecureDrop’s airgap requirement make adding new software much more challenging and largely impractical. It would be great to have the tools mentioned in Freedom of the Press Foundation’s guide on metadata (https://freedom.press/training/everything-you-wanted-know-about-media-metadata-were-afraid-ask/) installed by default. This would include:

  • pdf-redact-tools
  • tesseract-ocr
  • ffmpeg

History

#1 Updated by sajolida 2019-10-29 20:50:57

  • Assignee set to huertanix

Good to see you again @huertanix!

While working on the Additional Software feature last year, having them work offline was part of our goals. See https://tails.boum.org/contribute/design/additional_software_packages/: “Ensure packages are installed even offline”.

I think that we were considering that it would be fine for a Tails USB stick used offline to still be connected to the Internet while being configured, and then restarted and used only offline.

To study the impact of your request in a bit more detail, I created a branch that adds pdf-redact-tools, tesseract-ocr, and ffmpeg to see how many extra megabytes they would add to the ISO image. The ISO image will be available on https://nightly.tails.boum.org/build_Tails_ISO_feature-17178-metadata-tools/ after some time.

I’d also like to understand better what makes the current implementation of Additional Software impractical in your case (because maybe we can fix that instead!):

  • What are the airgap requirements of SecureDrop that make it impractical?
  • How are you dealing with this right now?

#3 Updated by huertanix 2019-10-31 17:53:31

sajolida wrote:
> Good to see you again @huertanix!

Good to see you too!

> I’d also like to understand better what makes the current implementation of Additional Software impractical in your case (because maybe we can fix that instead!):
>
> * What are the airgap requirements of SecureDrop that make it impractical?
> * How are you dealing with this right now?

In this case, the airgap architecture we use requires a Tails drive that is only ever connected to a machine that has never connected, and never will connect, to the internet, so unfortunately a temporary connection to the internet for the download is not an option.

At the moment, we have a process that predates the Additional Software tool: we copy apt cache files via a clean USB drive to the airgap machine and use dpkg to install packages one at a time. However, it would be amazing if the Additional Software tool could make this process easier by exporting the needed packages, dependencies, etc. into one big file that can then be transferred by sneakernet to the airgap and imported into the airgap’s Additional Software tool. Or something along those lines.
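
As an illustration only, the export/import idea could start with a checksum manifest that travels with the packages on the USB drive. This is a minimal sketch, not an existing Additional Software feature: the file name `MANIFEST.json` and the functions `write_manifest` and `verify_manifest` are hypothetical names made up for this example.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bundle_dir: Path) -> Path:
    """On the online machine: record a checksum for every .deb in bundle_dir."""
    entries = {deb.name: sha256_of(deb) for deb in sorted(bundle_dir.glob("*.deb"))}
    manifest = bundle_dir / "MANIFEST.json"  # hypothetical file name
    manifest.write_text(json.dumps(entries, indent=2, sort_keys=True))
    return manifest


def verify_manifest(bundle_dir: Path) -> List[str]:
    """On the airgapped machine: return names of missing or altered packages."""
    entries = json.loads((bundle_dir / "MANIFEST.json").read_text())
    problems = []
    for name, expected in entries.items():
        deb = bundle_dir / name
        if not deb.is_file() or sha256_of(deb) != expected:
            problems.append(name)
    return problems
```

A manifest like this only catches packages that were lost or corrupted in transit. It does not help if the machine that wrote the manifest was itself compromised, since an attacker there could rewrite both the packages and the checksums.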

#4 Updated by sajolida 2019-11-06 14:04:24

What I understand from your scheme (and my relative knowledge of Debian) is that you end up running packages (and their installation scripts) that were downloaded from elsewhere as root on your airgapped Tails.

I understand that you’re using this airgap scheme to protect from potentially very powerful adversaries that could corrupt the airgapped Tails. But how do you prevent such a powerful adversary from infecting the packages that you copy to the airgapped Tails? I’m assuming here that the OpenPGP verification of Debian happens at the time APT downloads the packages but not when packages are installed using dpkg, but maybe I got this wrong.

I’m asking this because, in the end, we might realize that it’s not worth building a tool to install Additional Software over sneakernet. If so, then maybe the only possible answer to your problem would be to never run as root anything that’s not included in Vanilla Tails.

What are the packages that you install this way? pdf-redact-tools, tesseract-ocr, and ffmpeg? Anything else?

#5 Updated by huertanix 2019-11-06 15:17:10

sajolida wrote:
>
> I understand that you’re using this airgap scheme to protect from potentially very powerful adversaries that could corrupt the airgapped Tails. But how do you prevent such a powerful adversary from infecting the packages that you copy to the airgapped Tails? I’m assuming here that the OpenPGP verification of Debian happens at the time APT downloads the packages but not when packages are installed using dpkg, but maybe I got this wrong.

I also think the PGP signature verification happens upon download rather than install. I may be wrong but if I’m thinking this through correctly, an adversary would have to seize the package maintainer’s PGP private key to create a valid signature, and if that’s a possibility, then it could happen to any Debian package, including one shipped with Tails.
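
For reference, a rough sketch of what a check at install time could look like. As far as I know, APT's trust chain runs from the signed Release file to the Packages index to a per-package SHA256 field, while plain dpkg does none of this. The sketch below covers only the last link: it compares a .deb against the SHA256 listed in a Packages index. The function names are hypothetical, and the index is only trustworthy if it was itself checked against the signed Release file, which this example does not do.

```python
import hashlib
from typing import Dict


def parse_packages_index(text: str) -> Dict[str, str]:
    """Map each Filename's basename to the SHA256 listed in a Packages index."""
    hashes = {}
    # Stanzas in a Packages index are separated by blank lines.
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Skip continuation lines (they start with whitespace).
            if line[:1] in (" ", "\t") or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if "Filename" in fields and "SHA256" in fields:
            hashes[fields["Filename"].rsplit("/", 1)[-1]] = fields["SHA256"]
    return hashes


def deb_matches_index(name: str, data: bytes, index: Dict[str, str]) -> bool:
    """True only if the package is listed and its SHA-256 digest matches."""
    expected = index.get(name)
    return expected is not None and hashlib.sha256(data).hexdigest() == expected
```

A package that is absent from the index is treated as a failure here, on the assumption that anything unlisted should not be installed.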

> What are the packages that you install this way? pdf-redact-tools, tesseract-ocr, and ffmpeg? Anything else?

Although there’s no mainline Debian package for it, it would be awesome to see peepdf (https://github.com/jesparza/peepdf) ported; it’s already in Kali Linux’s repos, and Kali is Debian-based, though I’m not sure if that makes it easier.

#6 Updated by sajolida 2019-11-06 16:26:31

> I may be wrong but if I’m thinking this through correctly, an adversary would have to seize the package maintainer’s PGP private key to create a valid signature, and if that’s a possibility, then it could happen to any Debian package, including one shipped with Tails.

I was seeing it differently. If you assume that you have a strong enough
adversary to possibly corrupt your airgapped Tails if it gets online
(i.e. an adversary that has a way of rooting your Tails), then what
prevents this same adversary from rooting the computer from which the
packages are downloaded and putting a malicious .deb on the USB drive that
you use to install additional packages using dpkg on the airgapped Tails?

> Although there’s no mainline Debian package for it, it would be awesome to see peepdf (https://github.com/jesparza/peepdf) ported; it’s already in Kali Linux’s repos, and Kali is Debian-based, though I’m not sure if that makes it easier.

So, in order to spare you from installing additional software by hand on
your airgapped Tails, you would need pdf-redact-tools, tesseract-ocr,
ffmpeg, and peepdf. Until all of them are included, you would not gain
much security, or at least not for all users. If my security analysis
above makes sense, then it’s the same security risk whether you install
1 or 10 additional packages.

I could see Tails making an exception here and including the packages that
SecureDrop users need, because of these airgap requirements and because
SecureDrop users are very important to us. But I would need more
opinions from the team. @intrigeri: ^

Regarding peepdf, if there’s already a .deb in Kali Linux, then it might
be a question of maintaining it in Debian. The best would be to have you
sponsor someone to maintain the package in Debian and maybe we can help
you find the right person. How would this sound?

#7 Updated by huertanix 2019-11-06 18:11:43

sajolida wrote:
> I was seeing it differently. If you assume that you have a strong enough
> adversary to possibly corrupt your airgapped Tails if it gets online
> (i.e. an adversary that has a way of rooting your Tails), then what
> prevents this same adversary from rooting the computer from which the
> packages are downloaded and putting a malicious .deb on the USB drive that
> you use to install additional packages using dpkg on the airgapped Tails?

My understanding is that the use of the airgap is mainly there to keep any potential malicious code sent in via SecureDrop from escaping into the internet or newsroom network to exfiltrate data (such as a PGP private key) on the airgap. General attacks on Tails itself/an adversary getting root via the internet are a concern, but one that’s mitigated by the containment of malicious files.

> I could see Tails making an exception here and including the packages that
> SecureDrop users need, because of these airgap requirements and because
> SecureDrop users are very important to us. But I would need more
> opinions from the team. @intrigeri: ^

That would be awesome! Would love to hear the rest of the team’s thoughts.

> Regarding peepdf, if there’s already a .deb in Kali Linux, then it might
> be a question of maintaining it in Debian. The best would be to have you
> sponsor someone to maintain the package in Debian and maybe we can help
> you find the right person. How would this sound?

Perhaps; I’m also wondering if it would be easy enough for the Kali package maintainer(s) to package it for mainline Debian to prevent having two different versions in Kali. I’ll try to reach out to them.

#8 Updated by intrigeri 2019-11-09 09:36:20

Hi,

huertanix wrote:
> sajolida wrote:
>> I could see Tails making an exception here and including the packages that SecureDrop users need, because of these airgap requirements and because SecureDrop users are very important to us. But I would need more opinions from the team. intrigeri: ^

> That would be awesome! Would love to hear the rest of the team’s thoughts.

I’m open to this as well, as long as the list of such packages remains small enough and the security/UX impact for other Tails users is OK.

Now, my understanding is that adding these packages right now to Tails will not solve anything, given SecureDrop admins will still need to go through painful steps to install peepdf anyway. Correct?

>> Regarding peepdf, if there’s already a .deb in Kali Linux, then it might be a question of maintaining it in Debian. The best would be to have you sponsor someone to maintain the package in Debian and maybe we can help you find the right person. How would this sound?

> Perhaps; I’m also wondering if it would be easy enough for the Kali package maintainer(s) to package it for mainline Debian to prevent having two different versions in Kali. I’ll try to reach out to them.

This sounds good to me. The Kali folks are generally happy to maintain their stuff directly in Debian so there’s some hope :)

#9 Updated by sajolida 2019-11-14 17:36:13

> I’m open to this as well, as long as the list of such packages remains small enough and the security/UX impact for other Tails users is OK.

Ok, then let’s do this. The packages that we removed in 3.14, 3.16, and
4.0 didn’t cause a lot of noise, so I’m confident that it’ll be easy to
remove these as well if we ever have to in the future.

> Now, my understanding is that adding these packages right now to Tails will not solve anything, given SecureDrop admins will still need to go through painful steps to install peepdf anyway. Correct?

@huertanix: Indeed, that’s an important point. It makes sense for us to
include everything that you need in Tails to avoid the painful (and
possibly dangerous) dpkg scripts, but as long as even 1 of your packages
is missing from Debian and Tails, it won’t change anything for you to
already have a few included in Tails.

So please clarify which list of packages that are already included in
Debian would make a significant difference for you and prevent the
painful dpkg scripts.

#10 Updated by sajolida 2019-11-14 17:36:34

  • Status changed from New to Confirmed

#11 Updated by sajolida 2019-11-15 18:23:04

> My understanding is that the use of the airgap is mainly there to keep any potential malicious code sent in via SecureDrop from escaping into the internet or newsroom network to exfiltrate data (such as a PGP private key) on the airgap. General attacks on Tails itself/an adversary getting root via the internet are a concern, but one that’s mitigated by the containment of malicious files.

I think that I finally wrapped my head around this. Let’s see if I got it right. So the threat model is that you don’t trust the airgapped machine because it opens all kinds of unsafe documents. You don’t want to ever connect it to the Internet because you’re afraid of some unsafe document being able to root it and exfiltrate data, for example. But you’re fine with trusting some other machine, for example, to upgrade the airgapped Tails or copy .deb files to it.

Then I think that it makes sense to provide some extension of the Additional Software mechanism. For example, Tails could automatically install any .deb file in a given directory. I don’t know if it makes sense from an APT/DPKG point of view but we could have /live/persistence/TailsData_unlocked/deb and Additional Software could forcefully install any Debian package in there. We wouldn’t need an interface for that and a mention in the doc would suffice.

@intrigeri: Does this make sense? Then we could forget about installing more packages and pushing the missing ones into Debian.
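
To make the directory idea concrete, here is a minimal sketch of how such a hook might collect the packages. It is not how Additional Software works today: `plan_install` is a hypothetical name, and the sketch only builds the dpkg command without running it.

```python
from pathlib import Path
from typing import List

# Directory proposed in this thread; nothing in Tails reads it today.
DEB_DIR = Path("/live/persistence/TailsData_unlocked/deb")


def plan_install(deb_dir: Path = DEB_DIR) -> List[str]:
    """Return the dpkg command that would install every .deb in deb_dir.

    Returns an empty list when there is nothing to install.
    """
    debs = sorted(str(p) for p in deb_dir.glob("*.deb") if p.is_file())
    if not debs:
        return []
    # One dpkg invocation for all files, rather than one call per package.
    return ["dpkg", "--install", *debs]
```

Running the result would need root, for example with `subprocess.run(plan_install(), check=True)`. Note that dpkg does not fetch missing dependencies, so every dependency would have to be present in the directory too.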

#12 Updated by intrigeri 2019-11-16 08:06:42

Hi,

sajolida wrote:
> Then I think that it makes sense to provide some extension of the Additional Software mechanism. For example, Tails could automatically install any .deb file in a given directory. I don’t know if it makes sense from an APT/DPKG point of view but we could have /live/persistence/TailsData_unlocked/deb and Additional Software could forcefully install any Debian package in there.

Sounds doable.

> We wouldn’t need an interface for that and a mention in the doc would suffice.

I’m assuming this would be good enough in environments where there is technical support staff who maintain the Tails stick and ensure that the content of /live/persistence/TailsData_unlocked/deb will install nicely after the Tails system is upgraded. In the short term, with this scope, this idea sounds just fine to me.

But as we know, our users can be really creative and use whatever technical facility we provide in vastly different situations, for vastly different reasons, than what we would have imagined initially. So I bet that other folks, who don’t have such support staff around, will start using this new facility, and I’m concerned that in 2 years, our engineers will have to polish this feature and properly support it for the general case, even though our initial decision was to support the new feature only for one specific use case. I’ve seen this happen a few times and I’m not super happy with this process: this sort of “put the finger into this gear and a few years later your entire arm is stuck in it too” process makes me concerned wrt. decision making and longer-term strategy wrt. allocation of our resources. Writing “this is unsupported, use at your own risk” on the box might lower the risk a bit but I’m pretty sure I can find examples where that was not sufficient to avoid us going through this IMO problematic process.

For example, I can imagine users being confused when later down the road, after upgrading to Tails 5.0, Additional Software fails to install one of these extra packages they’ve put into /live/persistence/TailsData_unlocked/deb due to a dependency issue. The user may think “oh well, I don’t need this package anymore” and go to the GUI we provide to manage the list of Additional Software. And oh well, the package is not listed there ⇒ confusion ⇒ the FT ends up having to spend lots of time making this feature good enough for the general case.

I don’t mean to block this idea with my concerns; but I’d like to ensure we take these concerns into account when making a decision about this new facility.

Brainstorming possibilities that could alleviate my concerns to some degree:

  • Not document this at all… but no doubt folks would quickly find out, and share the tip on Reddit or blog post tutorials about “how to install your preferred 3rd-party-app that’s not in Debian” (cryptocurrency wallet, messaging client, you name it).
  • Document this in a specific place, outside of end-user doc, targeted specifically to technical staff that supports the kind of use cases we have in mind.
  • Negotiate now under which conditions this semi-hidden facility should be turned into a fully supported feature in the future.
  • My main hope is that if we end up having to solve this problem for the general case, we do so by adding Flatpak support (probably needed for Feature #14567 and friends), instead of integrating the kludgy /live/persistence/TailsData_unlocked/deb facility into the GUI.
  • Other ideas?

Cheers!

#13 Updated by huertanix 2019-11-18 22:51:56

sajolida wrote:
> I think that I finally wrapped my head around this. Let’s see if I got it right. So the threat model is that you don’t trust the airgapped machine because it opens all kinds of unsafe documents. You don’t want to ever connect it to the Internet because you’re afraid of some unsafe document being able to root it and exfiltrate data, for example. But you’re fine with trusting some other machine, for example, to upgrade the airgapped Tails or copy .deb files to it.
>

Yes, this is the way I understand it too. Generally the airgap Tails is upgraded by another fresh Tails drive, so any new packages installed in the latest Tails would become available on the newly-upgraded airgap Tails.

#14 Updated by sajolida 2019-11-20 20:47:39

I didn’t mean to trigger a meta discussion on supported/unsupported here, which would be better suited for Feature #16531 but I understand your concern.

As you might have noticed already, I’m not a super fan of this terminology and, as time passed by, I think that my concerns have matured a bit in my head.

What we should be concerned about above all is to serve our users, both current users and target users, in a strategic balance. Whether something has been flagged as “supported” or “unsupported” at some point in the past becomes pointless if we have the tools and the culture to always focus our resources on what matters the most to our users. If something that we might have flagged as “supported” becomes of little relevance to our users, we shouldn’t put more energy into it, maybe let it rot, maybe remove it, etc. If something that we might have flagged as “unsupported” becomes important to our users, we should put more energy into it anyway.

I think that this is the essence of Agile, not as a coding practice, but as a business practice (sorry for the dirty word). Don’t draw hypotheses too early, try things out as cheaply as possible, evaluate the impact, and work incrementally to always maximize value.

This implies having good data about what matters the most to our users (and to us). That’s partly what user research is about, partly what better metrics could allow us to do, etc. This goes hand-in-hand with learning a bit more to say “no” or “sorry but fixing this is not in our priorities right now” to the vocal minority on Redmine or WhisperBack, who are more likely to be the script kiddies from Reddit than Cris or Kim.

Of course, removing things is hard and so adding things should be handled with real care. But here again, starting small and knowing better what really matters to our users would help us be more confident when deciding to improve or remove things.

I agree that what I’m saying here is pretty theoretical and that we are, as a project, pretty far away from making this a reality.

Regarding the matter at hand and seeing your concerns, it seems a better bet to go back to the idea of adding the packages that they need on demand as soon as they are available in Debian.

#15 Updated by intrigeri 2019-11-21 08:12:39

> What we should be concerned about above all is to serve our users, both current users and target users, in a strategic balance. Whether something has been flagged as “supported” or “unsupported” at some point in the past becomes pointless if we have the tools and the culture to always focus our resources on what matters the most to our users. If something that we might have flagged as “supported” becomes of little relevance to our users, we shouldn’t put more energy into it, maybe let it rot, maybe remove it, etc. If something that we might have flagged as “unsupported” becomes important to our users, we should put more energy into it anyway.

> I think that this is the essence of Agile, not as a coding practice, but as a business practice (sorry for the dirty word). Don’t draw hypotheses too early, try things out as cheaply as possible, evaluate the impact, and work incrementally to always maximize value.

> This implies having good data about what matters the most to our users (and to us). That’s partly what user research is about, partly what better metrics could allow us to do, etc. This goes hand-in-hand with learning a bit more to say “no” or “sorry but fixing this is not in our priorities right now” to the vocal minority on Redmine or WhisperBack, who are more likely to be the script kiddies from Reddit than Cris or Kim.

> Of course, removing things is hard and so adding things should be handled with real care. But here again, starting small and knowing better what really matters to our users would help us be more confident when deciding to improve or remove things.

> I agree that what I’m saying here is pretty theoretical and that we are, as a project, pretty far away from making this a reality.

I feel my concerns were correctly understood and taken into account.
I really like how you’ve summed this up, both wrt. long-term vision and wrt. where we are on this path at the moment. Your thinking on this is clearly more mature than my somewhat defensive initial reaction.

> Regarding the matter at hand and seeing your concerns, it seems a better bet to go back to the idea of adding the packages that they need on demand as soon as they are available in Debian.

This works for me. This being said, in principle, I’m also fine with trying your more ambitious idea out (I did not evaluate engineering costs yet though) and using it as a playground for starting with something small that provides immediate value, learning what really matters to our users, and being ready to consider removing this new feature if we realize its cost/benefit is not worth it anymore.

#16 Updated by sajolida 2019-11-28 17:16:35

Sorry @huertanix for the big meta discussion :)

Let’s try to summarize it in one short question for you:

  • Is it useful already for SecureDrop if we install pdf-redact-tools, tesseract-ocr, and ffmpeg by default in Tails?

#17 Updated by huertanix 2019-12-06 00:01:31

sajolida wrote:
> * Is it useful already for SecureDrop if we install pdf-redact-tools, tesseract-ocr, and ffmpeg by default in Tails?

It very much is! :)

#18 Updated by sajolida 2019-12-17 12:27:27

  • Status changed from Confirmed to Needs Validation
  • Assignee deleted (huertanix)
  • Feature Branch set to feature/17178-metadata-tools

Ok, so here is a branch.

I tested that it builds fine: https://nightly.tails.boum.org/build_Tails_ISO_feature-17178-metadata-tools/lastSuccessful/archive/build-artifacts/

It’s 7.8 MB bigger than 4.1. Not too bad.

#19 Updated by intrigeri 2019-12-20 11:41:16

  • Status changed from Needs Validation to Resolved
  • % Done changed from 0 to 100

Applied in changeset commit:tails|ac6034b91c0acaa188d32e19d5ff0aaf24bbd078.

#20 Updated by intrigeri 2019-12-20 11:43:21

  • Target version set to Tails_4.2

Note that I’m slightly dubious about the status of maintenance of pdf-redact-tools in Debian. We’ll see how it goes :)