Feature #14601

Know which resources we would need to run Matomo on our infrastructure

Added by sajolida 2017-09-04 21:51:25. Updated 2019-03-02 13:11:50.

Status: Rejected
Priority: Normal
Assignee:
Category: Infrastructure
Target version:
Start date: 2017-09-04
Due date:
% Done: 0%
Feature Branch:
Type of work: Research
Blueprint:
Starter:
Affected tool:
Deliverable for:

Description

When importing logs on the prototype:

  • Were all CPU cores used during this process?
  • Was I/O a blocker, i.e. were processes blocked waiting for I/O?
  • Was all available memory used by this process?
  • Did you configure MariaDB in any way to optimize for large DBs?

To start with, we need:

  • The list of package dependencies
  • What access you need besides a shell (e.g. write access to file X, ability to run command Y as root)
  • The list of DBs and directories to back up
  • Resource requirements (ideally: current needs and what you’ll need in 2 years).

Files

import logs.webm (1501473 B) sajolida, 2017-10-26 15:33:39

Subtasks


Related issues

Related to Tails - Bug #11680: Upgrade server hardware (2017-2019 edition) Resolved 2016-09-19
Related to Tails - Feature #14846: Understand the user agent issue in the logs of our website Resolved 2017-10-13
Related to Tails - Feature #14872: Use Matomo to analyze the 2017 donation campaign Resolved 2017-10-20

History

#1 Updated by intrigeri 2017-09-05 08:17:16

  • related to Bug #11680: Upgrade server hardware (2017-2019 edition) added

#2 Updated by sajolida 2017-10-16 14:13:55

  • blocks Feature #14761: Core work 2017Q4 → 2018Q1: User experience added

#3 Updated by sajolida 2017-10-20 17:06:57

  • Target version set to Tails_3.3

#4 Updated by sajolida 2017-10-26 15:41:33

Regarding resource usage when importing logs, I’m attaching a screencast of my prototype machine importing some logs while running iotop and top. I hope this helps!

> * Were all CPU cores used during this process?

The import_logs.py script simulates web requests and sends them to Apache. MySQL was the biggest CPU consumer, on 1 thread, followed by Apache with 4 threads.

I’m not very familiar with top: how can I see which cores are used, and how much?

> * Was I/O a blocker, i.e. were processes blocked waiting for I/O?

It doesn’t seem so, but again, I’m not sure how to check that. I tried running the OS from my hard disk plugged inside the computer instead of from USB as before, and the import speed was roughly the same: 1.5 h per day of logs.
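At that rate, and assuming it stays constant with a single recorder, importing a full year of logs would take roughly:

```latex
365 \text{ days of logs} \times 1.5\,\frac{\text{h}}{\text{day of logs}} = 547.5\,\text{h} \approx 22.8 \text{ days}
```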

> * Was all available memory used by this process?

No. I have 4 GB and only 320 MB were used.

> * Did you configure MariaDB in any way to optimize for large DBs?

No.

#5 Updated by sajolida 2017-10-26 15:45:19

  • related to Feature #14846: Understand the user agent issue in the logs of our website added

#6 Updated by sajolida 2017-10-26 15:45:35

  • related to Feature #14872: Use Matomo to analyze the 2017 donation campaign added

#7 Updated by sajolida 2017-10-26 15:49:07

  • Assignee changed from sajolida to intrigeri
  • QA Check set to Info Needed

Now I’m wondering whether it’s crazy to ask you for a disposable VM to test Piwik on our infra as part of the analysis of the 2017 donation campaign. See Feature #14872.

I know that I can get Piwik running in 1-2 hours, and the donation campaign analysis might be a good opportunity to experiment with it toward a clear objective (and to avoid the headaches and doubts I had this year parsing the logs with custom code). See Feature #14846.

It might help us get a better idea of the resources we’ll need. Then we can destroy that VM or discuss improving it.

But I can also continue to run Piwik on my prototype machine like I’ve been doing until now if you think that the extra work is not worth it.

#8 Updated by intrigeri 2017-10-26 17:55:46

  • Assignee changed from intrigeri to sajolida

> Now I’m wondering whether it’s crazy to ask you for a disposable VM to test Piwik on our infra as part of the analysis of the 2017 donation campaign.

When would you need it? (I’m asking because last year’s analysis was done a looong time after the end of the campaign, so perhaps a relaxed timeframe would work even though I’m sure you want to be faster this time.)

#9 Updated by sajolida 2017-11-14 13:22:54

  • Target version changed from Tails_3.3 to Tails_3.5

#10 Updated by intrigeri 2017-12-17 12:01:09

sajolida wrote:
> Regarding resource usage when importing logs, I’m attaching a screencast of my prototype machine importing some logs while running iotop and top. I hope this helps!

It does!

> > * Were all CPU cores used during this process?
>
> The import_logs.py script simulates web requests and sends them to Apache. MySQL was the biggest CPU consumer, on 1 thread, followed by Apache with 4 threads.
>
> I’m not very familiar with top: how can I see which cores are used, and how much?

In top, type “1” and you’ll get per-core (really: per-hyperthread) usage. Or use htop instead, which is nicer in many ways :)
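For a non-interactive view that can be logged during a long import, `mpstat -P ALL 5` from the sysstat package gives per-core figures. Alternatively, here is a minimal sketch that computes per-core busy time from two snapshots of /proc/stat, assuming Linux. The `percore_busy` name is hypothetical, and "busy" here means everything except the idle and iowait columns.

```shell
#!/bin/sh
# Per-core CPU usage between two snapshots of /proc/stat (Linux).
# Example:
#   cat /proc/stat > /tmp/stat.1; sleep 10; cat /proc/stat > /tmp/stat.2
#   percore_busy /tmp/stat.1 /tmp/stat.2
percore_busy() {
    awk '
        # Sum user..steal (fields 2-9); guest and guest_nice are already
        # counted in user and nice, so they are left out.
        function jiffies(   i, t) {
            t = 0
            for (i = 2; i <= 9 && i <= NF; i++) t += $i
            return t
        }
        # First file: remember total and idle (idle + iowait) jiffies per core.
        FNR == NR && $1 ~ /^cpu[0-9]+$/ {
            tot[$1] = jiffies()
            idle[$1] = $5 + $6
            next
        }
        # Second file: busy share of the jiffies elapsed since the first file.
        $1 ~ /^cpu[0-9]+$/ {
            dt = jiffies() - tot[$1]
            di = ($5 + $6) - idle[$1]
            busy = 0
            if (dt > 0) busy = 100 * (dt - di) / dt
            printf "%s %.0f%%\n", $1, busy
        }
    ' "$1" "$2"
}
```

Cores that stay near 0% while others are busy during an import would point to too little parallelism rather than to a hardware limit.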

> > * Was I/O a blocker, i.e. were processes blocked waiting for I/O?
>
> It doesn’t seem so, but again, I’m not sure how to check that.

The “wa” number is “time waiting for I/O completion”; that’s around 10% in your case.
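The same figure appears in the `wa` column of `vmstat 5`, or it can be computed from /proc/stat. A minimal sketch of the latter, assuming Linux and the standard column order of the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal); `iowait_pct` is a hypothetical helper name:

```shell
#!/bin/sh
# Share of CPU time spent waiting for I/O completion between two snapshots
# of /proc/stat, from the aggregate "cpu" line (Linux).
iowait_pct() {
    awk '
        $1 == "cpu" {
            total = 0
            # user, nice, system, idle, iowait, irq, softirq, steal
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            if (FNR == NR) { t0 = total; w0 = $6 }   # first snapshot
            else           { t1 = total; w1 = $6 }   # second snapshot
        }
        END {
            dt = t1 - t0
            pct = 0
            if (dt > 0) pct = 100 * (w1 - w0) / dt
            printf "%.1f%%\n", pct
        }
    ' "$1" "$2"
}
```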

> I tried running the OS from my hard disk plugged inside the computer instead of from USB as before, and the import speed was roughly the same: 1.5 h per day of logs.

Thanks, this is interesting, see below.

> > * Was all available memory used by this process?
>
> No. I have 4 GB and only 320 MB were used.

Plus 1.1 GB (increasing) in buffer/cache.
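Buffer/cache is memory the kernel uses to cache file data (plausibly MariaDB’s data files and the logs being read, in this case); most of it can be reclaimed when processes need it, which is why MemAvailable in /proc/meminfo is usually a better measure of headroom than MemFree. A minimal sketch that summarises those fields in MB, assuming Linux; `mem_summary` is a hypothetical name, and it simply adds the Buffers and Cached lines, so it may differ slightly from the buff/cache column of free(1):

```shell
#!/bin/sh
# Summarise /proc/meminfo (values in kB) as MB: total, available, and the
# buffers + page cache that the kernel can usually reclaim (Linux).
mem_summary() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "Buffers:"      { buf = $2 }
        $1 == "Cached:"       { cached = $2 }   # exact match: skips SwapCached
        END {
            printf "total=%dMB available=%dMB buff/cache=%dMB\n",
                int(total / 1024), int(avail / 1024), int((buf + cached) / 1024)
        }
    ' "${1:-/proc/meminfo}"
}
```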

So looking at your video, there’s no obvious bottleneck: little I/O wait, few I/O operations, plenty of free memory, and I see that your CPUs are idle ~40% of the time. So I suspect that either the import script does not send as many HTTP requests in parallel as it could, or Apache/PHP is configured to handle too few parallel requests at a time.

What value did you pass to the --recorders option? The doc says: “It should be set to the number of CPU cores in your server. You can also experiment with higher values which may increase performance until a certain point”. If you did not use this option, then I think this explains your results, and I’d like to see a new benchmark that actually tries to use the available hardware resources :)

In passing: you did not pass --enable-reverse-dns or --disable-bulk-tracking, did you?

#11 Updated by sajolida 2017-12-17 16:01:03

It seems you’ve read the doc much more carefully than I have. The command I ran was:

/var/www/misc/log-analytics/import_logs.py --url=http://localhost/ \
                                           --enable-http-errors --enable-http-redirects --enable-static --enable-bots \
                                           --idsite=1 \
                                           --log-format-regex='.* ((?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] "\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+) "(?P<referrer>.*?)" "(?P<user_agent>.*?)").*' \
                                           access.log-YYYY-MM-DD.gz

So yeah, it’s with only 1 recorder :)

I want to play more with Piwik in the coming weeks and keep the prototype machine up and running for days. I should even be able to give you access to the machine if you want.

I’ll do some more benchmarking and report again here.
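A hypothetical benchmark along the lines of the --recorders advice in comment #10: it reuses the import command above, varies only --recorders, and times each run. It assumes a disposable Matomo site and database, since every run re-imports the same hits into --idsite=1; the IMPORT_LOGS, LOG_FILE and DRY_RUN variables are illustrative. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Hypothetical benchmark: import one day of logs with an increasing number of
# recorders and report the wall-clock time of each run.
# Assumes a disposable Matomo site/database: every run re-imports the same hits.
IMPORT_LOGS=${IMPORT_LOGS:-/var/www/misc/log-analytics/import_logs.py}
LOG_FILE=${LOG_FILE:-access.log-YYYY-MM-DD.gz}
DRY_RUN=${DRY_RUN:-1}   # 1: only print the commands; 0: run them
CORES=$(nproc)

run_import() {
    n=$1    # number of recorders for this run
    set -- "$IMPORT_LOGS" --url=http://localhost/ \
        --enable-http-errors --enable-http-redirects --enable-static --enable-bots \
        --idsite=1 --recorders="$n" \
        --log-format-regex='.* ((?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] "\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+) "(?P<referrer>.*?)" "(?P<user_agent>.*?)").*' \
        "$LOG_FILE"
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
        return
    fi
    start=$(date +%s)
    "$@"
    echo "recorders=$n: $(( $(date +%s) - start )) s"
}

# One recorder (as in the original run), one per core, then twice the core count.
for n in 1 "$CORES" $((CORES * 2)); do
    run_import "$n"
done
```

Comparing the timings against the per-core and iowait figures from the same runs should show where adding recorders stops helping.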

#12 Updated by sajolida 2017-12-17 16:13:51

As you mentioned on XMPP, I would be super happy to try this on a VM in the cloud first. Maybe we should have a closer look at what kind of data we would be comfortable putting on a VM in the cloud (I don’t know if you’re thinking about Amazon or Greenhost or whatever).

#13 Updated by sajolida 2018-01-11 01:32:48

  • Subject changed from Know which resources we would need to run Piwik on our infrastructure to Know which resources we would need to run Matomo on our infrastructure

Piwik is now Matomo: https://matomo.org/blog/2018/01/piwik-is-now-matomo/

#14 Updated by anonym 2018-01-23 19:52:44

  • Target version changed from Tails_3.5 to Tails_3.6

#15 Updated by sajolida 2018-03-13 13:07:51

  • Target version changed from Tails_3.6 to Tails_3.7

#16 Updated by sajolida 2018-03-27 20:24:05

  • blocked by deleted (Feature #14761: Core work 2017Q4 → 2018Q1: User experience)

#17 Updated by sajolida 2018-03-27 20:24:12

  • blocks Feature #15392: Core work 2018Q2 → 2018Q3: User experience added

#18 Updated by sajolida 2018-05-07 16:11:37

  • Target version deleted (Tails_3.7)

#19 Updated by intrigeri 2018-08-09 14:18:09

sajolida wrote:
> I want to play more with Piwik in the coming weeks and keep the prototype machine up and running for days. I should even be able to give you access to the machine if you want.
>
> I’ll do some more benchmarking and report again here.

Reminder: next time you play with Matomo, please use as many recorders as your machine supports and report back here :)

#20 Updated by sajolida 2018-10-29 14:20:59

  • blocked by deleted (Feature #15392: Core work 2018Q2 → 2018Q3: User experience)

#21 Updated by sajolida 2019-03-02 13:11:50

  • Status changed from Confirmed to Rejected
  • Assignee deleted (sajolida)

I’m rejecting this because I don’t think we’ll work on it any time soon.