Online course 6: LAN scalability, adding pollers

Gabès Jean

Hi,

This is our 6th online course. After the 5th one from Seb, which was about creating checks, you probably got numerous ideas about what to check to make your IT even stronger. But more checks means more load, and maybe one poller is not enough anymore :)

In this course we will look at one #shinken core feature that will help you there: adding pollers is just tooooo easy :)

Let's start with two standard 2.0 installations on two Linux boxes. We will disable all the daemons on the second box except the poller.

You will be able to follow this tutorial on video soon.

Our setup with 2 linux boxes

We will have 2 Linux boxes, and we want the second one to be just a poller available to the Shinken running on the first box. The box courses-a will be our primary one, and courses-b the additional poller. We just installed Shinken on both boxes. Each box can resolve both names (here through /etc/hosts):

root@courses-a$ cat /etc/hosts
[...]
192.168.56.104  courses-a
192.168.56.105  courses-b
root@courses-b$ cat /etc/hosts
[...]
192.168.56.104  courses-a
192.168.56.105  courses-b
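
If you want to double-check these entries on each box, a few lines of Python can do it. This is only a sketch: parse_hosts and missing_names are hypothetical helper names, and it assumes the plain "IP name [aliases]" layout shown above.

```python
def parse_hosts(text):
    """Map each host name (and alias) to its IP from /etc/hosts-style text."""
    names = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        if len(fields) < 2:
            continue  # not an "IP name" line
        ip, hostnames = fields[0], fields[1:]
        for name in hostnames:
            names.setdefault(name, ip)  # keep the first entry for a name
    return names


def missing_names(text, wanted=("courses-a", "courses-b")):
    """Return the wanted names that have no entry in the hosts text."""
    known = parse_hosts(text)
    return [name for name in wanted if name not in known]
```

Feed it the content of /etc/hosts on each box: an empty list means both names have an entry.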

We want to end up with something like this: the poller on courses-b gets its checks from the scheduler running on courses-a.

Preparing the first box (master)

Let's first focus on the courses-a box. This will be quite quick. The poller we want to enable on the courses-b box will have to connect to the scheduler on courses-a to get its jobs. It will get the scheduler address and port from a configuration sent by the arbiter.

By default, the scheduler-master is declared with the right port (7768) but with the localhost address. This is not a problem for the arbiter, which runs on the same box. But if the poller on courses-b tries this address, localhost will point back to courses-b itself, and it won't have a chance to reach the courses-a scheduler :p

So all we need to do is set the LAN address in the scheduler definition object:

root@courses-a$ vi /etc/shinken/schedulers/scheduler-master.cfg
define scheduler {
    scheduler_name      scheduler-master ; Just the name
    address             courses-a   ; IP or DNS address of the daemon
    port                7768        ; TCP port of the daemon
    spare               0   ; 1 = is a spare, 0 = is not a spare

    weight              1   ; Some schedulers can manage more hosts than others
    timeout             3   ; Ping timeout
    data_timeout        120 ; Data send timeout
    max_check_attempts  3   ; If ping fails N or more, then the node is dead
    check_interval      60  ; Ping node every N seconds
    modules
    realm   All
    skip_initial_broks  0
    use_ssl          0
}
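
To catch a forgotten localhost before restarting anything, you could scan the definition with a small script. The sketch below is not part of Shinken: parse_define_block and reachable_from_lan are hypothetical helpers, and it only understands the simple "key value ; comment" layout shown above.

```python
LOOPBACK = {"localhost", "127.0.0.1", "::1"}


def parse_define_block(text):
    """Return the parameters of the first 'define ... {' block as a dict."""
    params = {}
    inside = False
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # ';' starts an inline comment
        if line.startswith("define ") and line.endswith("{"):
            inside = True
        elif line == "}":
            break
        elif inside and line:
            parts = line.split(None, 1)
            # Bare keys such as 'modules' get an empty value.
            params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def reachable_from_lan(params):
    """False when the address is missing or is a loopback address."""
    address = params.get("address")
    return address is not None and address.lower() not in LOOPBACK
```

With the definition above, reachable_from_lan returns True because the address is courses-a. With the default localhost it returns False.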

Now the second box :)

All we need now is to look at the courses-b box, and let's disable a LOT of daemons ^^

You must set daemon_enabled=0 in the following files, except pollerd.ini, which stays at 1:

root@courses-b$ grep daemon_enabled /etc/shinken/daemons/*ini
/etc/shinken/daemons/brokerd.ini:daemon_enabled=0
/etc/shinken/daemons/pollerd.ini:daemon_enabled=1
/etc/shinken/daemons/reactionnerd.ini:daemon_enabled=0
/etc/shinken/daemons/receiverd.ini:daemon_enabled=0
/etc/shinken/daemons/schedulerd.ini:daemon_enabled=0

And also for the arbiter, which is installed by default on courses-b:

root@courses-b$ grep daemon_enabled /etc/shinken/shinken.cfg
daemon_enabled=0
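
The same check as the two grep commands can be sketched in Python if you want to script it. The helper names daemon_enabled and unexpected_daemons are made up for this example, and the only thing assumed about the files is that they contain a daemon_enabled=0 or daemon_enabled=1 line as shown above.

```python
def daemon_enabled(text):
    """Return True/False from the last 'daemon_enabled=' line, None if absent."""
    state = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(("#", ";")):
            continue  # skip commented-out lines
        key, sep, value = line.partition("=")
        if sep and key.strip() == "daemon_enabled":
            state = value.strip() == "1"
    return state


def unexpected_daemons(configs, allowed=("pollerd.ini",)):
    """Names of config files whose daemon is enabled but should not be.

    'configs' maps a file name to the text content of that file.
    """
    return sorted(
        name for name, text in configs.items()
        if daemon_enabled(text) and name not in allowed
    )
```

Read each file under /etc/shinken/daemons/ plus shinken.cfg into the configs dict. On courses-b the result should be an empty list.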

Now you can restart your Shinken daemons on courses-b and check that only the poller is really started:

root@courses-b$ /etc/init.d/shinken -d restart
root@courses-b$ /etc/init.d/shinken status
Checking status of scheduler
scheduler NOT RUNNING
     failed!
Checking status of poller
poller RUNNING (pid 25504)
. ok
Checking status of reactionner
reactionner NOT RUNNING
     failed!
Checking status of broker
broker NOT RUNNING
     failed!
Checking status of receiver
receiver NOT RUNNING
     failed!
Checking status of arbiter
arbiter NOT RUNNING
     failed!
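
If you would rather not read this output by eye, here is a rough sketch that turns it into a dict. It assumes the "name RUNNING (pid N)" and "name NOT RUNNING" lines keep the shape shown above. parse_status and only_poller_running are hypothetical names, not Shinken tools.

```python
import re

# Matches "poller RUNNING (pid 25504)" and "scheduler NOT RUNNING".
STATUS_LINE = re.compile(r"^\s*(\w+)\s+(NOT RUNNING|RUNNING)\b")


def parse_status(output):
    """Map each daemon name to True (running) or False (not running)."""
    states = {}
    for line in output.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            states[match.group(1)] = match.group(2) == "RUNNING"
    return states


def only_poller_running(output):
    """True when the poller is the one and only running daemon."""
    running = [name for name, up in parse_status(output).items() if up]
    return running == ["poller"]
```

Capture the output of the status command and pass it to only_poller_running: on courses-b it should return True.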

We temporarily put the poller in debug mode (with the -d option above) so that all its logs are available and we can see the connections and internal data.

So now that our courses-b poller is ok and no other daemon will annoy it, we can declare it on the arbiter side, on the courses-a server:

root@courses-a$ cp /etc/shinken/pollers/poller-master.cfg /etc/shinken/pollers/poller-courses-b.cfg 
root@courses-a$ vi /etc/shinken/pollers/poller-courses-b.cfg
define poller {
    poller_name     poller-courses-b
    address         courses-b
    port            7771
    spare           0

    manage_sub_realms   0   ; Does it take jobs from schedulers of sub-Realms?
    min_workers         0   ; Starts with N processes (0 = 1 per CPU)
    max_workers         0   ; No more than N processes (0 = 1 per CPU)
    processes_by_worker 256 ; Each worker manages N checks
    polling_interval    1   ; Get jobs from schedulers each N seconds
    timeout             3   ; Ping timeout
    data_timeout        120 ; Data send timeout
    max_check_attempts  3   ; If ping fails N or more, then the node is dead
    check_interval      60  ; Ping node every N seconds
    modules
    use_ssl          0
    realm   All
}

So the only modifications you need are the poller_name and address parameters. You can keep the defaults for the others.
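
If you have many pollers to declare, the copy-and-edit step can be scripted. Here is a minimal sketch, assuming the template keeps one "key value" pair per line as in the file above. set_param and derive_poller are hypothetical helpers, and the script only rewrites text, it does not talk to Shinken.

```python
import re


def set_param(cfg_text, key, value):
    """Replace the value of one 'key value' line, keeping its indentation."""
    pattern = re.compile(
        r"^([ \t]*%s[ \t]+)\S+" % re.escape(key), re.MULTILINE
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + value, cfg_text)
    if count != 1:
        raise ValueError("expected exactly one %r line, found %d" % (key, count))
    return new_text


def derive_poller(template_text, name, address):
    """Return a copy of a poller definition with a new name and address."""
    text = set_param(template_text, "poller_name", name)
    return set_param(text, "address", address)
```

Calling derive_poller on the content of poller-master.cfg with "poller-courses-b" and "courses-b" gives the text to write into poller-courses-b.cfg. All other parameters are left untouched.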

Restart and look at your success :)

Then restart the Shinken arbiter on the courses-a server:

root@courses-a$ /etc/init.d/shinken-arbiter restart

The arbiter-master will load the new poller and give it some new jobs, like taking checks from the courses-a scheduler :)

If we look at the courses-b logs, we can see the connection:

root@courses-b$ grep 'scheduler-master' /var/log/shinken/pollerd.log
[1410170010] INFO: [Shinken] [poller-courses-b] Init connection with scheduler-master at http://courses-a:7768/ (3s,120s)
[1410170010] WARNING: [Shinken] [poller-courses-b] Scheduler scheduler-master is not initialized or has network problem: Connexion error to http://courses-a:7768/ : couldn't connect to host
[1410170012] INFO: [Shinken] [poller-courses-b] Connection OK with scheduler scheduler-master
[1410170012] INFO: [Shinken] [poller-courses-b] Init connection with scheduler-master at http://courses-a:7768/ (3s,120s)
[1410170012] INFO: [Shinken] [poller-courses-b] Connection OK with scheduler scheduler-master

Don't worry about the WARNING here. It just means the poller was too fast and the scheduler was not yet fully running with a configuration. If you get a Connection OK a few seconds later, you are ok :)
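
The "last line wins" reading of this log can be sketched in a few lines of Python. This only relies on the message texts visible in the excerpt above. scheduler_link_state is a hypothetical helper, and other Shinken versions may word their logs differently.

```python
import re

# "[timestamp] LEVEL: [Shinken] [daemon-name] message"
LOG_LINE = re.compile(r"^\[(\d+)\] (\w+): \[Shinken\] \[([^\]]+)\] (.*)$")


def scheduler_link_state(log_text, scheduler="scheduler-master"):
    """Return 'ok', 'failed' or 'unknown' from the most recent matching line."""
    state = "unknown"
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if not match:
            continue
        message = match.group(4)
        if message == "Connection OK with scheduler %s" % scheduler:
            state = "ok"
        elif message.startswith("Scheduler %s is not initialized" % scheduler):
            state = "failed"
    return state
```

On the excerpt above it returns "ok", because the WARNING is followed by a Connection OK line.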

So now you can watch this poller and check that it really gets new checks from the scheduler:

root@courses-b$ tail -f /var/log/shinken/pollerd.log
...
[1410170033] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:1 (Queued:1 TotalReturnWait:1)
...

Here the poller asked the scheduler and got one check (Queued:1), and it has one result waiting to be returned to this scheduler (TotalReturnWait:1). You will have one such line per worker and per scheduler. So this poller is working great :)
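
To follow these numbers over time, the Stats line can be parsed with a regular expression. This is a sketch based only on the single line shown above. The group names index and tag are my own labels for the first and third bracketed fields, whose exact meaning is not stated here.

```python
import re

STATS = re.compile(
    r"\[(?P<index>\d+)\]\[(?P<scheduler>[^\]]+)\]\[(?P<tag>[^\]]+)\] "
    r"Stats: Workers:(?P<workers>\d+) "
    r"\(Queued:(?P<queued>\d+) TotalReturnWait:(?P<return_wait>\d+)\)"
)


def parse_stats(line):
    """Return the Stats fields of a poller debug line, or None if absent."""
    match = STATS.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("index", "workers", "queued", "return_wait"):
        fields[key] = int(fields[key])  # numeric fields as integers
    return fields
```

A queued value that stays at 0 for a long time on a busy setup would be a hint that the poller is not getting checks from that scheduler.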

Did we forget something?

Yes, we did: pollers launch check plugins, located in the /var/lib/shinken/libexec/ directory, or in the /usr/lib/nagios/plugins/ directory for the standard Nagios plugins. And here your poller has none of them :)

So every poller needs the same plugins as the first box. Here, for example, you can just run:

root@courses-b$ apt-get install nagios-plugins

for the nagios plugins, and:

root@courses-a$ scp -rp /var/lib/shinken/libexec/* root@courses-b:/var/lib/shinken/libexec/

And you will have all you need on the courses-b poller.
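
To verify that nothing was left behind, you can compare the plugin listings of the two boxes. A minimal sketch, with hypothetical helper names: run list_plugins on each box (for example on /var/lib/shinken/libexec), then compare the two results with missing_on_poller.

```python
import os


def list_plugins(directory):
    """Relative paths of all files found under a plugin directory."""
    found = set()
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full_path = os.path.join(root, name)
            found.add(os.path.relpath(full_path, directory))
    return found


def missing_on_poller(master_plugins, poller_plugins):
    """Plugins present on the master box but absent from the poller box."""
    return sorted(set(master_plugins) - set(poller_plugins))
```

An empty list from missing_on_poller means the poller has at least every plugin file the first box has. It does not compare file contents or permissions.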

The next course

The next course will be an extension of this one. Here all checks are distributed across all our pollers, in a round-robin way. But maybe we want to dedicate one box to specific checks or hosts? We will see this in the next course, about poller tags, and one classic use for them: pollers in a DMZ :)

jean gabès

Posted in: Talk

Tagged as: shinken / online course / scalability
