
Negate attribute inside Group element in services.xml issue #78

Open
igorwidlinski opened this issue May 11, 2013 · 13 comments

Comments

@igorwidlinski
Contributor

Hi there,

We are running Bcfg2 server 1.3.1 with ServiceCompat and POSIXCompat enabled, because we have not yet upgraded our clients from 1.2.x.

After upgrading the server from 1.2.1 to 1.3.1, we noticed that some services started to flap (stop and start) at random times.

We have quite a few services in our services.xml, but only three are flapping, and all three use the negate attribute in services.xml.

Here is a basic outline of our services, including one of the flapping services, postfix:

<Rules priority="0">
    <Group name="osx">
        <!-- ... services here ... -->
    </Group>
    <Group name="scientific-linux">
        <!-- ... services here ... -->
        <Group name="mail-server" negate="true">
            <Service type="chkconfig" name="postfix" status="on"/>
        </Group>
        <Group name="mail-server">
            <Service type="chkconfig" name="postfix" status="off"/>
        </Group>
        <!-- ... more services ... -->
    </Group>
    <!-- ... more groups ... -->
</Rules>

If I understand negate correctly, this should turn postfix on for every server that is not in the mail-server group and turn it off for servers that are in mail-server (we do not use postfix on our mail servers).
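To make the expected behaviour concrete, here is a minimal Python sketch of the selection described above. It assumes a simplified model in which a Group with negate="true" applies to exactly the clients that are not in the named group. The GroupBlock class and the helper functions are hypothetical and are not Bcfg2's actual Rules code; the enclosing scientific-linux group is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class GroupBlock:
    """Hypothetical model of one <Group name=... [negate="true"]> wrapping a single Service."""
    name: str
    negate: bool
    service_status: str


def group_matches(block: GroupBlock, client_groups: Set[str]) -> bool:
    # A plain Group applies to its members; a negated Group applies to everyone else.
    is_member = block.name in client_groups
    return not is_member if block.negate else is_member


def matching_statuses(blocks: List[GroupBlock], client_groups: Set[str]) -> List[str]:
    # Every status whose enclosing Group matches this client's group set.
    return [b.service_status for b in blocks if group_matches(b, client_groups)]


# The two postfix blocks from the services.xml outline in this issue.
postfix_rules = [
    GroupBlock("mail-server", negate=True, service_status="on"),
    GroupBlock("mail-server", negate=False, service_status="off"),
]

print(matching_statuses(postfix_rules, {"scientific-linux", "mail-server"}))  # ['off']
print(matching_statuses(postfix_rules, {"scientific-linux"}))                 # ['on']
```

Because the two conditions are complementary, exactly one postfix entry should match for any fixed group membership, so a client that occasionally receives the other entry points to a problem outside the matching rule as written.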

Here are some entries from the reporting GUI showing the service being turned on and off across bcfg2 runs on a client that belongs to the mail-server group:

Turn it ON:
Problem Type    Expected    Found
Status          on          off
Occurrences on 2013-05-11:
x.com   May 11, 2013, 12:45 p.m.
x.com   May 11, 2013, 10:45 a.m.

Turn it OFF:
Problem Type    Expected    Found
Status          off         on
Occurrences on 2013-05-11:
x.com   May 11, 2013, 11:45 a.m.

Thanks!

@solj
Member

solj commented May 13, 2013

Can you cache a copy of your client's configuration with bcfg2 -qnc /tmp/cached.xml and look through there for any service entries with name='postfix'. My best guess at this point would be that there are conflicting entries being sent to the client.
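Looking through the cached file by hand works. As an alternative, here is a small Python sketch using the standard library's xml.etree.ElementTree that lists every Service entry with a given name together with the element that contains it. It assumes the file written by bcfg2 -qnc is plain XML with entries nested inside Bundle (or similar) container elements; the script name find_services.py is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple, Union


def find_services(xml_text: Union[str, bytes], name: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (container name, attributes) for every Service entry called `name`."""
    root = ET.fromstring(xml_text)
    hits = []
    for parent in root.iter():
        for child in parent:
            if child.tag == "Service" and child.get("name") == name:
                # Record the enclosing Bundle (or other container) for context.
                hits.append((parent.get("name", parent.tag), dict(child.attrib)))
    return hits


def main(argv: List[str]) -> None:
    if len(argv) != 1:
        print("usage: find_services.py CACHED_CONFIG_XML", file=sys.stderr)
        return
    # Read bytes so an XML encoding declaration in the file is handled by the parser.
    with open(argv[0], "rb") as fh:
        entries = find_services(fh.read(), "postfix")
    for container, attrs in entries:
        print(container, attrs)
    if len(entries) > 1:
        print("more than one postfix Service entry was sent", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, python3 find_services.py /tmp/cached.xml. More than one hit would indicate the conflicting entries suspected here; a single hit with the wrong status would not.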

@igorwidlinski
Contributor Author

OK, I had to run this command about 50 times before it returned a different .xml file.

Here is the diff between the 49 runs in which the correct entry was sent and the one run in which postfix was set to on:

< </Path><Action name="newaliases" timing="post" status="check" when="modified" command="/usr/bin/newaliases"/><Package name="postfix" version="auto" type="yum"/><Service name="postfix" status="off" type="chkconfig" mode="default"/></Bundle><Bundle name="md"/><Bundle name="monit"><Path name="/etc/monit.conf" group="root" paranoid="true" sensitive="false" important="False" secontext="__default__" mode="0500" owner="root" type="file" perms="0500"> set daemon  120
---
> </Path><Action name="newaliases" timing="post" status="check" when="modified" command="/usr/bin/newaliases"/><Package name="postfix" version="auto" type="yum"/><Service name="postfix" status="on" type="chkconfig" mode="default"/></Bundle><Bundle name="md"/><Bundle name="monit"><Path name="/etc/monit.conf" group="root" paranoid="true" sensitive="false" important="False" secontext="__default__" mode="0500" owner="root" type="file" perms="0500"> set daemon  120

It is very perplexing....
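Since the bad configuration only showed up about once in 50 runs, repeating the cache step in a loop and tallying the status values may be easier than diffing files. This is a sketch only: it assumes it runs on the client with permission to run bcfg2, reuses the bcfg2 -qnc invocation from earlier in this thread, and deliberately does not rely on bcfg2's exit status. The script name count_statuses.py is hypothetical.

```python
import os
import subprocess
import sys
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterable, Iterator, List, Union


def service_statuses(docs: Iterable[Union[str, bytes]], name: str) -> Counter:
    """Count the status values the Service entry `name` received across cached configs."""
    counts: Counter = Counter()
    for doc in docs:
        for svc in ET.fromstring(doc).iter("Service"):
            if svc.get("name") == name:
                counts[svc.get("status")] += 1
    return counts


def cached_runs(runs: int, path: str = "/tmp/cached.xml") -> Iterator[bytes]:
    """Run `bcfg2 -qnc PATH` repeatedly, yielding each freshly cached configuration."""
    for _ in range(runs):
        if os.path.exists(path):
            os.remove(path)  # never re-read a stale file left by an earlier run
        subprocess.run(["bcfg2", "-qnc", path], check=False)
        if os.path.exists(path):
            with open(path, "rb") as fh:
                yield fh.read()


def main(argv: List[str]) -> None:
    if len(argv) != 1:
        print("usage: count_statuses.py RUNS", file=sys.stderr)
        return
    print(dict(service_statuses(cached_runs(int(argv[0])), "postfix")))


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, python3 count_statuses.py 50. If the total count ever exceeds the number of configurations actually collected, some run received duplicate postfix entries; if it matches, every run received a single entry and only its status varied.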

@solj
Member

solj commented May 13, 2013

You should be able to debug this using bcfg2-info buildbundle on the server. Maybe double-check that the service entry is not present anywhere else in the repository?

@igorwidlinski
Contributor Author

Yep, I'm pretty sure it is not defined anywhere else. The entry appears twice, once inside a Group with the negate attribute and once inside one without it. Only services set up this way cause problems.

@stpierre
Member

It sounds to me like the host's group membership might be flapping. Is
that possible?

On Mon, May 13, 2013 at 4:37 PM, igorwidlinski <notifications@github.com> wrote:

Yep, I m pretty sure it is not available anywhere else. They appear twice,
but one has negate attribute, and one doesn't. Only services with this kind
of setup cause problems..



Chris St. Pierre

@igorwidlinski
Contributor Author

I do not think so. We do not use probes for group assignment, and a change in membership should have shown up in the diff I posted earlier.

By the way, I can't run buildbundle; it says "no such bundle". I've tried it for every bundle we have set up (installing those bundles on the bcfg2 clients works fine).

@stpierre
Member

Make sure you're using the bundle filename, not the 'name' attribute. E.g.:

bcfg2-info buildbundle postfix.xml foo.example.com

A change in group membership wouldn't necessarily show up in that diff,
unless you had a template that dumped group membership to a file or
similar. The cached config doesn't contain a list of all groups itself.

Are you using group categories at all?

On Mon, May 13, 2013 at 5:14 PM, igorwidlinski <notifications@github.com> wrote:

I do not think so. We do not use probes for the groups. And it should show
up in my diff I've posted previously.

By the way I can't run buildbundle. Says "no such bundle". I've tried it
for all bundles we have set up (installing bundles on bcfg2-clients works
fine).



Chris St. Pierre

@igorwidlinski
Contributor Author

Ah yes, we actually generate an /etc/bcfg2.info file that contains a bunch of information, including the groups:

...bunch of stuff here...
= Groups =
{% for group in sorted(metadata.groups) %}\
${group}
{% end %}\

This file wasn't changed.

Here is the buildbundle output for the bundle containing postfix, built for our mail server:


> buildbundle mail.xml x.com
<Bundle name="mail">
  <Package name="postfix"/>
  <Service name="postfix"/>
  <Path name="/etc/aliases"/>
  <Path name="/etc/postfix/main.cf"/>
  <Action name="newaliases"/>
</Bundle>

Nobody here knows what categories are, so we are probably not using them.

@solj
Member

solj commented May 14, 2013

Categories provide a way to restrict group assignments (http://docs.bcfg2.org/server/plugins/grouping/metadata.html#element:Group). I agree with @stpierre. Either the group is changing or there are multiple postfix Service entries showing up in the client's configuration.

@igorwidlinski
Contributor Author

I checked the reporting website for the flapping events, and the group membership did not change:

Before:
[screenshot: bcfg2_0]

A few hours later:
[screenshot: bcfg2_1]

@lukecyca
Contributor

Hey @solj & @stpierre,
I created a small test repository to try and reproduce this problem given the conditions we think trigger this issue, and I could not reproduce it. That seems to suggest it is indeed specific to our (rather large and complicated) repository. I'm going to work with Igor to try and isolate this further.

In the meantime, if you think of any other hints, or of anything that changed between 1.2.1 and 1.3.1 that might be relevant, please let us know.

@igorwidlinski
Contributor Author

Our mail server checks in at 45 minutes past each hour. Only machines that check in at exactly the same time have postfix flapping...

Normally we have 4-6 machines checking in at the same time.
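One way to check this correlation against more data is to bucket check-in times by minute and list the minutes in which several clients ran together. The sketch below assumes the hostnames and timestamps have already been exported from the reporting system into Python datetime objects; the input format and the hostnames are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple


def concurrent_checkins(events: List[Tuple[str, datetime]]) -> Dict[datetime, List[str]]:
    """Bucket check-ins by calendar minute; keep only minutes in which 2+ clients ran."""
    buckets: Dict[datetime, List[str]] = defaultdict(list)
    for host, when in events:
        buckets[when.replace(second=0, microsecond=0)].append(host)
    return {minute: sorted(hosts) for minute, hosts in buckets.items() if len(hosts) > 1}


# Hypothetical hostnames and times, for illustration only.
events = [
    ("mail.example.com", datetime(2013, 5, 11, 12, 45, 3)),
    ("web1.example.com", datetime(2013, 5, 11, 12, 45, 41)),
    ("db1.example.com", datetime(2013, 5, 11, 12, 17, 9)),
]
for minute, hosts in concurrent_checkins(events).items():
    print(minute, hosts)
```

Bucketing by calendar minute misses two runs that straddle a minute boundary (for example 12:44:59 and 12:45:01); a sliding window over sorted timestamps would catch those at the cost of a little more code.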

@igorwidlinski
Contributor Author

A similar issue is happening with our autofs services. We set autofs to "off" on SUSE and to "on" on our Linux servers. When a SUSE machine and a Linux machine check in at exactly the same time, the Linux machine sometimes gets autofs set to off, and later it reverts to on.

I still can't reproduce it in a test environment, though.

Labels: None yet
Projects: None yet
Development: No branches or pull requests
4 participants