Bug #9900

Improve Website search

Added by BitingBird 2015-08-05 08:33:24. Updated 2020-04-23 16:33:43.

Status: Confirmed
Priority: Normal
Assignee:
Category:
Target version:
Start date: 2014-11-09
Due date:
% Done: 29%
Feature Branch:
Type of work: Research
Blueprint:
Starter:
Affected tool:
Deliverable for:

Description

Right now, it’s barely usable.

jvoisin said he would work on that.


Subtasks

Bug #9899: Website search should prioritize titles (Confirmed, 0% done)
Bug #8247: Website search doesn't support quotes properly (Confirmed, 0% done)
Bug #9898: Website search should allow to sort by language (Confirmed, 0% done)
Feature #9904: Have an option to include mailing-lists to website search (Resolved, 100% done)
Bug #12473: Search bar on the website points to 404 (Confirmed, 0% done)
Bug #13575: Website search should identify the different translations of one document, and only provide one result. (Duplicate, 0% done)
Feature #17652: Consider using DuckDuckGo for search queries on our website (Confirmed, assigned to sajolida, 0% done)


Related issues

Related to Tails - Bug #11650: Analyze third-party search engine requests Confirmed 2016-08-16
Related to Tails - Bug #11649: Analyze internal search engine requests Confirmed 2016-08-16
Related to Tails - Feature #6569: Make the Tails documentation searchable offline Rejected 2014-01-05
Blocks Tails - Feature #16209: Core work: Foundations Team Confirmed

History

#1 Updated by jvoisin 2015-11-03 09:49:28

It seems that this is a known bug from ikiwiki, and there is little that I can do about this :/

#2 Updated by intrigeri 2015-11-05 03:09:06

> It seems that this is a known bug from ikiwiki, and there is little that I can do about this :/

Perhaps that explains Bug #9899, but probably none of the other subtasks.

#3 Updated by jvoisin 2016-08-04 10:46:32

Ikiwiki has some plugins to improve its searching capabilities, but they are either cumbersome or inefficient.

I took a look at what other privacy-minded websites are doing, and it seems that they are mostly using external search engines. For example, Qubes is using DuckDuckGo (at the bottom of the page).

This is an easy, zero-maintenance and effective way: I trust a generic web search engine more than an ikiwiki plugin to do effective, secure, meaningful and semantic searches.

The question being, are we ok with externalising the search feature?

#4 Updated by intrigeri 2016-08-05 02:52:32

  • Type of work changed from Website to Discuss

#5 Updated by elouann 2016-08-16 16:16:17

We have talked again about xapian today. See also https://ikiwiki.info/todo/different_search_engine/

#6 Updated by sajolida 2017-04-27 08:31:10

  • related to Bug #12473: Search bar on the website points to 404 added

#7 Updated by Anonymous 2017-04-27 10:06:16

I personally hate websites which send me to some external search because it’s often quite unusable.

#8 Updated by intrigeri 2017-04-29 11:24:04

elouann wrote:
> We have talked again about xapian today. See also https://ikiwiki.info/todo/different_search_engine/

JFTR ikiwiki already uses xapian, as written on top of the page you’re linking to.

#9 Updated by intrigeri 2017-04-29 11:25:27

  • related to deleted (Bug #12473: Search bar on the website points to 404)

#10 Updated by intrigeri 2017-04-29 11:36:43

> I personally hate websites which send me to some external search because it’s often quite unusable.

I can relate to this feeling, and I agree that ideally we should fix all the major problems of the ikiwiki internal search engine and not rely on an external one. Now, as the many subtasks of this ticket show, the current search UX on our website is already pretty bad, and the DDG results for the search you used as an example on Bug #12473 are much better than what we provide currently. So switching to DDG would be a great incremental improvement, and a great first step compared to what we currently have; I believe it’s easy to implement, and then we can discuss, next time we update our roadmap, how many resources we want to put into improving ikiwiki’s internal search engine.

What do you think? If you disagree, I would appreciate if you could elaborate a bit on “quite unusable”, e.g. with a few examples applied to how it would work for our website :)

#11 Updated by Anonymous 2017-05-03 20:47:57

Report of our discussion at the monthly meeting of May:

We agree that there are heavy problems currently.
- improving the search feature might also help decrease the backlog of frontdesk.

Are we okay to externalise the search feature?

- one of us feels uncomfortable to give up autonomy over the website

- two of us think it’s fine

- one of us is asking whether giving a user’s search string, or several of them, to a third-party website would have security implications. What’s the worst that could happen? The user’s IP could be linked to these strings and the person could be identified as a Tails user, if not using Tor Browser.
- most of us think that the security implications for users should be reconsidered with more core people present at the meeting before taking any decision.

So let’s brainstorm first some more on the implications of this.

As a sidenote, we could research a better solution than the ikiwiki search.
Something like Apache Solr or another local search like phinde.

#12 Updated by Anonymous 2017-05-03 20:48:18

  • Type of work changed from Discuss to Research

#13 Updated by sajolida 2017-05-04 20:52:32

> I would appreciate if you could elaborate a bit on “quite unusable”, e.g. with a few examples applied to how it would work for our website.

I’m not a big user of search bars and maybe this comes from being too
frequently frustrated by the search results. But my gut feeling is that
I’m not often frustrated by bad in-house implementations (like ours)
than externalized search.

But yes, I’m very interested in learning about Ulrike’s experience.
A first thing that comes to my mind is whether the search results
provided by DDG would be integrated in our website design and navigation
or whether they would be a redirection to DDG’s website. Because being
dragged outside of the website would be a clear usability downside.

What else?

#14 Updated by Anonymous 2017-05-05 09:11:18

sajolida wrote:

> I’m not a big user of search bars and maybe this comes from being too
> frequently frustrated by the search results. But my gut feeling is that
> I’m not often frustrated by bad in-house implementations (like ours)
> than externalized search.

Same as me then.

> A first thing that comes to my mind is whether the search results
> provided by DDG would be integrated in our website design and navigation
> or whether they would be a redirection to DDG’s website. Because being
> dragged outside of the website would be a clear usability downside.

In general, when using externalized search, one is redirected to the external search engine page, see how it’s done at qubes-os.org.

> What else?

Well, for me there are indeed some privacy considerations to look into before externalizing this feature.

#15 Updated by Anonymous 2017-05-05 09:16:37

intrigeri wrote:
> > I personally hate websites which send me to some external search because it’s often quite unusable.
>
> I can relate to this feeling, and I agree that ideally we should fix all the major problems of the ikiwiki internal search engine and not rely on an external one. Now, as the many subtasks of this ticket show, the current search UX on our website is already pretty bad, and the DDG results for the search you used as an example on Bug #12473 are much better than what we provide currently. So switching to DDG would be a great incremental improvement, and a great first step compared to what we currently have; I believe it’s easy to implement, and then we can discuss, next time we update our roadmap, how many resources we want to put into improving ikiwiki’s internal search engine.
>
> What do you think? If you disagree, I would appreciate if you could elaborate a bit on “quite unusable”, e.g. with a few examples applied to how it would work for our website :)

By unusable I mean that with an externalized search I’m sent to another page with a very different design, so I might get lost. Once there, I need to click again to get back to where I want to be, and get the answer I was actually searching for.

After thinking about this issue for some days now, I feel that we should not do it because of the aforementioned UX reasons, as well as privacy considerations.
Furthermore, while discussing this at the last contributor meeting, some people said they never even use the internal search but they always use the browser built-in search themselves.

Can we have some stats about the usage of search pages on the websites?
Is it possible to update the Xapian DB more frequently using a cron job?

#16 Updated by Anonymous 2017-05-05 09:32:37

Oh, and let me add one more thing: while DDG is very good with search results in English, I find it sometimes hard to find information in other languages.

#17 Updated by sajolida 2017-05-16 10:30:24

Sidenote, I’m not going to do stats manually about that on the Apache logs. I’d rather spend that time trying to set up Piwik or something like this.

Still, I’m happy to help people on tails@boum.org to learn how to download the logs.

#18 Updated by Anonymous 2017-05-16 11:12:25

sajolida wrote:
> Sidenote, I’m not going to do stats manually about that on the Apache logs. I’d rather spend that time trying to set up Piwik or something like this.

Awstats maybe? :)

#19 Updated by intrigeri 2017-05-31 15:17:21

Hi!

u wrote:
> By unusable I mean that with an externalized search I’m sent to another page with a very different design, so I might get lost. Once there, I need to click again to get back to where I want to be, and get the answer I was actually searching for.

OK, thanks for clarifying! I personally find this UX better than not finding at all the page I was searching for, or finding N copies (Bug #9898), or finding a 404 (Bug #12473) but whatever, I’m not going to argue further on this point since I seem to be the only one who would find almost anything better than the current situation :]

> Furthermore, while discussing this at the last contributor meeting, some people said they never even use the internal search but they always use the browser built-in search themselves.

Indeed, I happen to do this myself as well (when I don’t simply use find or git grep locally), but that’s because our website’s search engine is so crappy I’ve entirely given up using it. I would be curious why these other people don’t use it. And then we’ll see what conclusion we can draw from this info.

> Can we have some stats about the usage of search pages on the websites?

Sure, in theory all tails@ members have access to the raw data. I did it myself this time, because I felt somewhat responsible for it after having restarted this discussion.

So, there have been 48k searches over the last 2 months, i.e. one every 1.8 minutes. I had a quick look at the last 200 of them, and the vast majority seems legit (user agent pretends it’s a real browser, and the query string is plausibly the kind of things I would expect humans interested in Tails would search on our website). But I have no clue how to understand this figure, other than “lots of people try using our internal search engine”: we don’t know if they’ll find what they were looking for, and if they fail then we don’t know if they’ll try again next time or will simply give up using this search engine. On this topic I concur with sajolida wrt. the need for more powerful web analytics tools.
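
For anyone wanting to reproduce this kind of count from the raw logs, here is a minimal sketch, not the script that was actually used. It assumes Apache "combined" style log lines, and it assumes that searches are requests to `/ikiwiki.cgi` carrying the query in a `P` parameter; both the path and the parameter name are assumptions to check against the production setup. The function names are made up for illustration.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumptions (not confirmed by this ticket): searches are requests to
# /ikiwiki.cgi and the search terms are passed in the "P" parameter.
SEARCH_PATH = "/ikiwiki.cgi"
QUERY_PARAM = "P"

# Matches the request part of an Apache access log line,
# e.g. "GET /ikiwiki.cgi?P=foo HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')


def extract_search_queries(log_lines):
    """Return the search strings found in an iterable of access log lines."""
    queries = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # not a request line we understand
        url = urlsplit(match.group(1))
        if url.path != SEARCH_PATH:
            continue  # some other page, not a search
        terms = parse_qs(url.query).get(QUERY_PARAM)
        if terms:
            queries.append(terms[0])
    return queries


def minutes_between_searches(search_count, days):
    """Average number of minutes between two searches over a period."""
    return days * 24 * 60 / search_count
```

Taking 2 months as roughly 61 days, `minutes_between_searches(48000, 61)` gives about 1.8, which matches the figure quoted above. Counting requests says nothing about whether people found what they were looking for.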

> Is it possible to update the Xapian DB more frequently using a cron job?

I see no reason why it wouldn’t be possible, but that’s not a trivial coding task (nothing is trivial once concurrent access to data is involved). Rough guesstimates for an experienced Perl software developer, assuming the expected outcome is a bit more clearly specified first: probably 3-4 hours for someone who’s already at ease with the ikiwiki code base and the way our production website runs, and rather 10-12 hours otherwise. As usual, doubling or tripling these estimates would be sound. And then add a couple hours to deploy this thing on our production website. From there, two comments:

  • Actually fixing the root cause of Bug #12473 would probably take less time than implementing this workaround.
  • This workaround, or a more proper solution, would address Bug #12473 only, and in the end we would still be left with the other issues this search engine suffers from. So I don’t think it’s worth doing it in isolation; but it could surely be made part of a Great Plan™ to fix all the biggest issues at once without relying on an external provider, since that’s apparently the option preferred by most of us.

> As a sidenote, we could research a better solution than the ikiwiki search.
> Something like Apache Solr or another local search like phinde.

I don’t know any of these tools so I’ll shut up. If someone has time to evaluate them, I suggest first looking at the subtasks to better understand the problems we’re trying to solve here. And to keep in mind: the hardest part with such tools might be to integrate it into the (rather restrictive, for security’s sake) setup our production website runs on.

At this point I’m giving up on the external search engine idea. I don’t feel like trying to actively lead this discussion to a conclusion myself. I hope someone else will catch the ball and we won’t be stuck for too long in the (crappy) status quo: there’s no way not to make any decision, as not deciding anything or postponing is de facto equivalent to deciding that we’re fine with keeping what we currently have until something happens.

Regardless, if required by the project, I could be the one working on the ikiwiki search engine, or any other solution that requires Perl skills. As I have probably made more than clear enough already, I still am rather unconvinced it would be a good use of our precious software development time, but if the project wants to prioritize this over some other things I could do, in the same time, with my Foundations Team hat, then I’ll comply without complaining too much: it’ll actually be fun hacking time for me ;)

Thanks everyone for your input!

#20 Updated by sajolida 2017-06-28 15:38:31

  • related to Bug #11650: Analyze third-party search engine requests added

#21 Updated by sajolida 2017-06-28 15:38:55

  • related to Bug #11649: Analyze internal search engine requests added

#22 Updated by sajolida 2017-06-28 16:31:09

From the DuckDuckGo website: https://duckduckgo.com/search_box

« Because of the way we generate our search results, we do not have the syndication rights to allow you to host our results on your site (e.g. in a frame). When your users click on the results they will be instead taken to our site. »
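
To make the redirect concrete: an external search box of this kind boils down to sending the user to a DuckDuckGo URL whose query is restricted to one site with the `site:` operator. The sketch below only illustrates that idea. The `tails.boum.org` host name and the function name are assumptions for the example, not something decided in this ticket.

```python
from urllib.parse import urlencode

DDG_ENDPOINT = "https://duckduckgo.com/"
# Assumption for the example: the host name of the website to search.
SITE = "tails.boum.org"


def external_search_url(user_query, site=SITE):
    """Build the DuckDuckGo URL a search box would redirect the user to.

    The query is limited to one website with DuckDuckGo's "site:" operator.
    """
    return DDG_ENDPOINT + "?" + urlencode({"q": f"site:{site} {user_query}"})
```

For example, `external_search_url("contraseña")` returns `https://duckduckgo.com/?q=site%3Atails.boum.org+contrase%C3%B1a`. The results page is then served by DuckDuckGo, so the search string leaves our website, which is the privacy and UX trade-off discussed above.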

#23 Updated by sajolida 2017-08-03 18:57:39

#24 Updated by sajolida 2017-08-03 19:07:06

  • Assignee changed from jvoisin to sajolida

jvoisin: I’m shamelessly stealing this one from you. I hope you don’t mind.

I read a bit about search and UX lately and I want to understand better the technologies behind them and see if we can keep this under our control.

#25 Updated by sajolida 2017-10-03 07:20:24

  • blocked by deleted (Feature #13424: Core work 2017Q3: User experience)

#26 Updated by sajolida 2018-06-25 16:29:54

  • Assignee changed from sajolida to jvoisin

I tried to read the documentation of Xapian and Lucene and it’s definitely not written for human beings like me. I’m dropping the ball…

#27 Updated by Anonymous 2018-08-19 08:09:45

  • related to Feature #6569: Make the Tails documentation searchable offline added

#28 Updated by intrigeri 2019-04-14 07:03:12

  • Assignee deleted (jvoisin)

I’m de-assigning from jvoisin because AFAIK there’s no WIP by him on this front and I’d rather make it clear that we need someone to lead this conversation to a conclusion.

Then let’s first check whether DDG’s search results are substantially better than what we currently have (if they are not, it’s pointless to discuss privacy concerns about DDG). My claim that they were much better has not been challenged so far except perhaps:

u wrote:
> while DDG is very good with search results in English, I find it sometimes hard to find information in other languages.

u, was this feedback about using DDG for searching the web in general, or about searching our website specifically?

FWIW I’ve just given it a try with the “contraseña” Spanish word:

I see that our current internal search engine returns 3 results, that are all relevant. DuckDuckGo returns many more results, most of them being relevant and missing in ikiwiki’s own search results. Both have pros & cons wrt. ordering and presentation of results.

#29 Updated by intrigeri 2020-04-19 17:50:19