Bug #17230

"input/output error" warning messages in Tails 4.0 logs

Added by goupille 2019-11-14 14:01:45. Updated 2019-12-15 10:02:15.

Status:
Confirmed
Priority:
Normal
Assignee:
goupille
Category:
Target version:
Start date:
Due date:
% Done:

0%

Feature Branch:
Type of work:
Research
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

I saw these error messages in several WhisperBack reports; I don’t know what triggers this warning:

amnesia /usr/lib/gdm3/gdm-x-session[4666]: (WW) xf86CloseConsole: KDSETMODE failed: Input/output error
amnesia /usr/lib/gdm3/gdm-x-session[4666]: (WW) xf86CloseConsole: VT_GETMODE failed: Input/output error
amnesia /usr/lib/gdm3/gdm-x-session[4666]: (WW) xf86CloseConsole: VT_ACTIVATE failed: Input/output error

Searching for input/output errors is a good way for us to find out whether a USB stick's memory is failing (I almost misdiagnosed a report because of these messages).


Subtasks


Related issues

Related to Tails - Feature #5856: Detect SquashFS errors (bad medium or optical drives) while Tails is running Confirmed
Related to Tails - Bug #16030: SquashFS errors during boot lead to false-positives on graphics card error reports Confirmed 2018-10-03
Related to Tails - Bug #17351: udisksd reports tons of errors about failure to determine whether loop devices seem to be encrypted Confirmed

History

#1 Updated by intrigeri 2019-11-16 10:26:16

  • related to Feature #5856: Detect SquashFS errors (bad medium or optical drives) while Tails is running added

#2 Updated by intrigeri 2019-11-16 10:32:23

Hi @goupille,

> I saw these error messages in several WhisperBack reports; I don’t know what triggers this warning:

It’s caused by us killing GDM’s X session (with loginctl --signal SIGKILL kill-user Debian-gdm, a couple lines above these lines in the Journal) i.e. Bug #12092.
FTR, this code and the ensuing problem should go away once we switch to Wayland in Tails 5.0.

> Searching for input/output errors is a good way for us to find out whether a USB stick's memory is failing (I almost misdiagnosed a report because of these messages).

I understand this. As you’ve realized, seeing the “Input/output error” string in itself is not sufficient to draw conclusions: depending on context, this message can mean lots of different things, and hardware I/O errors are only one of those.

I took a look and can think of no simple way to get rid of these errors without making the surrounding code/behavior more racy and fragile (which could result in problems like “sometimes, on some computers, Tails won’t display the GNOME desktop” i.e. “Tails does not fully start”). While saving some analysis time for our help desk would be sweet, IMO it is not worth taking the risk of making Tails less reliable for our users. So I’d like us to work together and explore ways in which we could make help desk’s life easier without taking such a risk.

One way to do this would be to clarify what kind of I/O error messages can indicate faulty hardware, i.e. what kind of messages help desk should be searching for. I believe that “SQUASHFS error” is a good search string for this case. What string do you search for usually?

Another way would be to prioritize Feature #5856 higher.

#3 Updated by goupille 2019-11-18 15:06:10

>One way to do this would be to clarify what kind of I/O error messages can indicate faulty hardware, i.e. what kind of messages help desk should be searching for. I believe that “SQUASHFS error” is a good search string for this case. What string do you search for usually?

I usually browse the ‘I/O’ and ‘Input/Output’ occurrences in the logs and try to understand why they are here (in case I have a doubt I usually compare them to my current logs)… at one point I remember searching specifically for ‘SQUASHFS’ but for some reason, I thought it would not cover all the cases of a broken memory (would there be a SQUASHFS error message if there was something broken on the persistent volume, for instance ?)…

#4 Updated by intrigeri 2019-11-22 11:25:02

Hi goupille,

>> One way to do this would be to clarify what kind of I/O error messages can indicate faulty hardware, i.e. what kind of messages help desk should be searching for. I believe that “SQUASHFS error” is a good search string for this case. What string do you search for usually?

> I usually browse the ‘I/O’ and ‘Input/Output’ occurrences in the logs and try to understand why they are here (in case I have a doubt I usually compare them to my current logs)…

Thank you. I see three ways to proceed from there:

  • (A) Rely on “in case I have a doubt I usually compare them to my current logs”: the specific false positive error messages we are talking about should be in your own logs too. But that boils down to rejecting this ticket merely because you already have a crappy way to handle that, which I’m not too keen on doing.
  • (B) Document in the help desk Git repo a list of known false positives that help desk should ignore. Would it save you any time vs. comparing with your own logs? If not, then it’s not better than the status quo.
  • (C) Filter out these false positives from the logs attached by WhisperBack. Do I understand correctly that it would fully solve the problem you’ve raised? It should be a rather simple matter of programming.
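
To make options (B) and (C) more concrete, here is a minimal Python sketch of how a list of known false positives could be applied to a log. It is only an illustration under stated assumptions: the names `KNOWN_FALSE_POSITIVES`, `IO_ERROR` and `triage` are hypothetical, the single pattern is derived from the xf86CloseConsole lines quoted in the description, and nothing here reflects how WhisperBack actually works.

```python
import re

# Hypothetical list of known false positives, one regex per entry
# (assumption: the help desk repo could store them in some similar form).
KNOWN_FALSE_POSITIVES = [
    # Warnings logged when GDM's X session is killed (see comment #2).
    re.compile(r"xf86CloseConsole: \w+ failed: Input/output error"),
]

# The strings help desk reported browsing for in comment #3.
IO_ERROR = re.compile(r"I/O|Input/output", re.IGNORECASE)


def triage(lines):
    """Split I/O error lines into (to_investigate, known_false_positives)."""
    investigate, ignored = [], []
    for line in lines:
        if not IO_ERROR.search(line):
            continue  # not an I/O error line at all
        if any(p.search(line) for p in KNOWN_FALSE_POSITIVES):
            ignored.append(line)
        else:
            investigate.append(line)
    return investigate, ignored
```

With such a helper, only the first returned list would need a human look; the second list documents what was set aside and why.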

> at one point I remember searching specifically for ‘SQUASHFS’ but for some reason, I thought it would not cover all the cases of a broken memory (would there be a SQUASHFS error message if there was something broken on the persistent volume, for instance ?)…

Good point! I confirm that looking only for SquashFS errors will not spot problems with the persistent volume. It follows that prioritizing Feature #5856 higher would not fully address the problem at hand.

#5 Updated by goupille 2019-11-25 14:17:12

intrigeri wrote:
> Hi goupille,
>
> >> One way to do this would be to clarify what kind of I/O error messages can indicate faulty hardware, i.e. what kind of messages help desk should be searching for. I believe that “SQUASHFS error” is a good search string for this case. What string do you search for usually?
>
> > I usually browse the ‘I/O’ and ‘Input/Output’ occurrences in the logs and try to understand why they are here (in case I have a doubt I usually compare them to my current logs)…
>
> Thank you. I see three ways to proceed from there:
>
> * (A) Rely on “in case I have a doubt I usually compare them to my current logs”: the specific false positive error messages we are talking about should be in your own logs too. But that boils down to rejecting this ticket merely because you already have a crappy way to handle that, which I’m not too keen on doing.

that’s because those error messages were in my own logs that I realized I had almost misdiagnosed a report, and opened this ticket… I may not have been clear enough, but I wanted to point out that there are “warning” messages disguised as “errors”, and that we (help desk) do not have a rigorous way to find out whether a USB stick is failing.

> * (B) Document in the help desk Git repo a list of known false positives that help desk should ignore. Would it save you any time vs. comparing with your own logs? If not, then it’s not better than the status quo.
that would be great (but I wonder how hard it would be to maintain such a list)

> * (C) Filter out these false positives from the logs attached by WhisperBack. Do I understand correctly that it would fully solve the problem you’ve raised? It should be a rather simple matter of programming.
>
once there is a list of false positives for humans to ignore, indeed, why not ask WhisperBack to ignore them as well…

a few years ago, just searching “error” in the logs would bring up the useful lines of the logs (alongside a few irrelevant lines), now there are more than 300 occurrences of “error” in any WhisperBack report from Tails 4.0
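
To see which programs account for most of those occurrences, a small Python sketch like the following could help. It is a rough illustration, not an existing tool: the function name `count_errors_by_source` is hypothetical, and it assumes the emitting program appears as `name[pid]:` in each line, as in the log lines quoted in this ticket.

```python
import re
from collections import Counter

# Assumption: the emitting program appears as "name[pid]:" in each log line.
SOURCE = re.compile(r"(\S+)\[\d+\]:")


def count_errors_by_source(lines, needle="error"):
    """Count lines containing `needle` (case-insensitive), grouped by program."""
    counts = Counter()
    for line in lines:
        if needle.lower() not in line.lower():
            continue
        match = SOURCE.search(line)
        # Lines without a recognizable "name[pid]:" prefix are grouped together.
        counts[match.group(1) if match else "(unknown)"] += 1
    return counts
```

Calling `count_errors_by_source(lines).most_common(5)` on the lines of a report would list the five noisiest sources first.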

I’m aware that asking WhisperBack to ignore some error messages might raise new issues; I really don’t know what would be best to do

#6 Updated by intrigeri 2019-12-15 09:48:52

  • related to Bug #16030: SquashFS errors during boot lead to false-positives on graphics card error reports added

#7 Updated by intrigeri 2019-12-15 10:02:15

  • Assignee changed from intrigeri to goupille

Hi goupille,

I don’t think we’ll have a perfect solution any time soon so I propose we start with the cheapest bits and iterate:

goupille wrote:
> intrigeri wrote:
>> * (B) Document in the help desk Git repo a list of known false positives that help desk should ignore. Would it save you any time vs. comparing with your own logs? If not, then it’s not better than the status quo.

> that would be great (but I wonder how hard it would be to maintain such a list)

OK, please go ahead. I’m not too concerned about maintenance: the kind of problem this ticket is about gives you folks an incentive to add new false positives there, and you’re the ones who primarily benefit from it, so the way I see it, it boils down to “the better you take care of yourselves, the nicer your work is” :)

On top of the false positive that started this conversation, you can add this one:

udisksd[3029]: Error determining whether device '/dev/loop1' seems to be encrypted: Failed to read device (g-bd-crypto-error-quark, 0)

>> * (C) Filter out these false positives from the logs attached by WhisperBack. Do I understand correctly that it would fully solve the problem you’ve raised? It should be a rather simple matter of programming.
>>
> once there is a list of false positives for humans to ignore, indeed, why not ask WhisperBack to ignore them as well…

Here’s one reason against filtering these out from the logs: it removes some context and timing info that can be useful to understand the remaining log lines and debug problems. I’m not 100% sure what’s a good trade-off here so I propose we put this idea on the back burner and first do (B) plus what I’m suggesting below:

> a few years ago, just searching “error” in the logs would bring up the useful lines of the logs (alongside a few irrelevant lines), now there are more than 300 occurrences of “error” in any WhisperBack report from Tails 4.0

It turns out that the vast majority of these errors are about udisksd failing to determine whether a loop device is encrypted. kibi, wearing his Tails 4.1 RM hat, also raised this issue on Bug #17294. So I’ll file a ticket about this and will ask segfault if we can improve this somehow.
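
As a possible middle ground for the trade-off mentioned above (dropping lines loses context and timing), known false positives could be tagged rather than removed. The Python sketch below is purely illustrative: `MARKER` and `annotate` are hypothetical names, the two patterns are derived from the log lines quoted in this ticket, and this is not how WhisperBack currently behaves.

```python
import re

# Hypothetical patterns built from the two false positives quoted in this ticket.
KNOWN_FALSE_POSITIVES = [
    re.compile(r"xf86CloseConsole: \w+ failed: Input/output error"),
    re.compile(
        r"udisksd\[\d+\]: Error determining whether device "
        r"'/dev/loop\d+' seems to be encrypted"
    ),
]

# Hypothetical marker; any string that is easy to search for or skip would do.
MARKER = "[known-false-positive] "


def annotate(lines):
    """Return every line in its original order, tagging known false positives."""
    return [
        MARKER + line
        if any(p.search(line) for p in KNOWN_FALSE_POSITIVES)
        else line
        for line in lines
    ]
```

Because every line is kept in place, the surrounding context stays available, while a search that excludes `MARKER` would skip the noise.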

#8 Updated by intrigeri 2019-12-15 10:14:38

  • related to Bug #17351: udisksd reports tons of errors about failure to determine whether loop devices seem to be encrypted added