Strange UDP firewall behaviour #187
All gossip for these nodes runs over port 8301. State-table check: `ipfstat -t -P udp -G 9b11635d-453e-6ad4-f525-cf504ae5a541`
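For anyone following along, a minimal sketch of that check, run from the global zone on the CN; the plain rule dump with `-io` is my assumption, added only so the loaded rules can be compared against the state table:

```
# State table for UDP, scoped to the zone's GZ-controlled firewall (-G).
ipfstat -t -P udp -G 9b11635d-453e-6ad4-f525-cf504ae5a541

# Assumption: dump the in/out rules loaded for the same zone, for comparison.
ipfstat -io -G 9b11635d-453e-6ad4-f525-cf504ae5a541
```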
At this point I had to redeploy the same configuration as before since I implemented the fix by disabling a rule. Here is an updated list of addresses / instances:
Working Client Rules:
Non-Working Client Rules:
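A sketch of how the two rule sets can be pulled for comparison, assuming the `triton instance fwrules` subcommand is available; the instance names are placeholders:

```
# List the Cloud Firewall rules that apply to one working and one
# non-working instance, then diff the output.
triton instance fwrules <working-instance>
triton instance fwrules <non-working-instance>
```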
Snoop from the working and non-working zones:
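For reference, a sketch of the capture; the interface name is a placeholder, and inside the LX zones `tcpdump` would be the equivalent:

```
# Watch the Serf gossip traffic (UDP 8301) while a join is attempted.
snoop -d <interface> udp and port 8301
```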
dig for the CNAME used to join:
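Roughly the lookup being done; the name itself is hypothetical:

```
# Resolve the CNAME the agents are told to join.
dig +short <consul-join-cname>
```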
Restart of a non-joined node:
Just hit this issue again while using Triton, this time with TCP and SSH.
Platform:
Example:
Enabling and disabling the firewall through Triton / the CN clears up the non-working fwrule.
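For completeness, roughly what that toggle looks like; the instance name and VM UUID are placeholders, and the `vmadm` form on the CN is my assumption of the equivalent:

```
# From the Triton CLI:
triton instance disable-firewall <instance>
triton instance enable-firewall <instance>

# Assumed equivalent directly on the CN:
vmadm update <vm-uuid> firewall_enabled=false
vmadm update <vm-uuid> firewall_enabled=true
```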
Dumb question: are these zones native or LX? If LX, there are some additional tests we MIGHT be able to run.
@danmcd I was using LX with the following images: 7b5981c4-1889-11e7-b4c5-3f3bdfc9b88b
To add some context, I am deploying all of these async, so the requests are coming into CloudAPI all at the same time. Not sure if there is a race or something.
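The deploy is shaped roughly like the sketch below; this is not the actual tooling, just the pattern in which every CreateMachine request hits CloudAPI at once, and the image, package, and tag values are placeholders:

```
# Fire all provisions concurrently so CloudAPI sees them at the same time.
for i in 0 1 2 3 4; do
  triton instance create --firewall -t "k8s_rethinkdb=true" <image> <package> &
done
wait
```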
First off, thanks for the information that it's LX; that's useful. I've done a bit of diving (after knowing it's LX); however, it's not clear to me whether or not there's a way to find the race easily.
@danmcd I was able to reproduce this 3 times in a row today; let me know if you want me to grab some state or more information.
Lots of details up top make it hard for me to understand the exact problem. I saw the snoops above, and the non-working one is sending packets that appear never to reach the peer (assuming your snoops are correct). Make sure you do single pings in both directions, and a single TCP connection in both directions, while snooping. That'll help narrow things down a lot. One thing I noticed was a CFW rule containing "(PORT 22 AND PORT 8080)". AIUI, this means the TCP traffic must contain both port 22 AND port 8080. Am I wrong? (I don't know CFW that well; it's higher-level than where I normally hang out.)
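A sketch of the suggested test; the interface name and addresses are placeholders, and `nc` is assumed to be available on the sending side:

```
# On the receiving side, capture traffic from the peer while the tests run.
snoop -d <interface> host <peer-ip>

# On the sending side: one ping, then one TCP connection attempt.
ping <peer-ip>          # single probe (use `ping -c 1` from the LX side)
nc -vz <peer-ip> 22     # single TCP connect to the ssh port
```

Repeat the same pair in the reverse direction to see where the packets stop.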
I'll try to clear up the scenario a bit. Currently, in this environment where the behavior is occurring, I only have 2 CNs. Most of the time the provisioned instances land on the same CN from what I've gathered, but not always. I am provisioning all the instances at the same time with CloudAPI, using instance tags for the firewall rules. When the instances came up, some of them honored the FW rule and some didn't. What is strange is that by just disabling the firewall on 1 of the instances, all of the other instances started to honor the rule and started passing traffic. If I disable and re-enable the fw rule they all start working fine as well. So I don't think it's the way the rule is written. I'll attempt doing this in the morning, 1 provision at a time, and see if it happens.
"What is strange, is that by just disabling the firewall on 1 of the instances, all of the other instances started to honor the rule and started passing traffic." I'm guessing fwadm may do a brute-force reset of some kind. I'm very curious if there's a way to follow the bouncing packet in a zone whose fw rules appear to be in place, but aren't, per your description earlier. (I'm happy to help with this, but it'll require global-zone dtrace access on the CN with the faulty VM.) |
Note: `itt` is aliased to `triton -i`.
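i.e. something along the lines of:

```
alias itt='triton -i'
```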
While deploying Consul with the firewall enabled, I noticed very strange behavior.
Deploying Consul Masters with firewall group "k8s_rethinkdb":
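The exact rule text isn't reproduced here, but it is a Cloud Firewall rule of roughly this shape; the tag name comes from the group above and the port from the gossip traffic, the rest is my assumption:

```
# Allow Serf gossip between all instances carrying the tag.
itt fwrule create 'FROM tag "k8s_rethinkdb" TO tag "k8s_rethinkdb" ALLOW udp PORT 8301'
```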
Masters deployed and formed a healthy cluster:
Deploying the 5 RethinkDB nodes, which run the Consul agent and attempt to gossip on boot:
Firewall tags applied on creation:
`fwadm list` on the CN that all of these instances are on:
Only 2 of the RethinkDB clients joined:
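Membership here is presumably checked from one of the joined agents, e.g.:

```
# Lists the agents known to the gossip pool; the non-joined nodes are missing.
consul members
```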
Disable the firewall rule for 1 of the clients that successfully joined the cluster:
All the other members are then able to join the cluster.