Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release branch v1.3.1 #93

Merged
merged 26 commits into from
Oct 11, 2023
Merged

Release branch v1.3.1 #93

merged 26 commits into from
Oct 11, 2023

Conversation

cuonglm
Copy link
Collaborator

@cuonglm cuonglm commented Sep 22, 2023

Minor Release

This is a minor release with new features, some performance improvements, and various bug fixes.

Added

  • Support the hosts file as a source for resolving hostnames.
  • Including the OS version in the DoH header.
  • Including the client's IP address when ctrld is an upstream of dnsmasq.

Improved

  • Making PTR lookup failures less noisy for users.
  • Upgrading quic-go to v0.38.0
  • ctrld now generates a working default configuration in both cd and local mode.
  • Using /etc/version to detect a UniFi Gateway.
  • The general performance and stability of ctrld:
    • All upstreams are monitored to prevent high resource consumptions in case of outage.
    • Guarding against DNS loops.
  • Reporting error when --cd/--cd-org is set to empty string.

Fixed

  • Service restart loop in EdgeOS.
  • Using Control D bootstrap DNS for the OS resolver, fixed UDM Pro and iOS / apple issue #59.
  • Fix the default route IP address with the public interface.
  • Fix the setting of DNS so that it takes effect in some cloud providers with systemd-resolved.
  • Fix ctrld home directory created randomly with windows RMM.

cuonglm and others added 19 commits September 22, 2023 18:27
Using the same approach as in cd mode, but do it only once when running
ctrld the first time, then the config will be re-used then.

While at it, also adding Dockerfile.debug for better troubleshooting
with alpine base image.
The only reason that forces ctrld to depend on vyatta-dhcpd service on
EdgeOS is allowing ctrld to watch lease files properly, because those
files may not be created at the time client info table initialized.

However, on some EdgeOS version, vyatta-dhcpd could not start with an
empty config file, causing restart loop itself, flooding systemd log,
making the router run out of memory.

To fix this, instead of depending on vyatta-dhcpd, we should just watch
for lease files creation, then adding them to watch list.

While at it, also making ctrld starts after nss-lookup, ensuring we have
a working DNS before starting ctrld.
Since mca-cli-op may not be available during boot time.
So in case no nameservers can be found, default OS resolver could still
resolve queries.
For reporting router queries, ctrld uses private IP of the default route
interface. However, when the default route is conntected directly to
ISP, the interface will have a public IP, and another interface with the
same MAC address will be created for LAN ip. So when no private IP found
for default route interface, ctrld must look at the other interface to
find the corret LAN ip.
txn2/txeh lower the hostname, which is not suitable for ctrld use case.
The current approach to get default route IP is finding the LAN
interface with the same MAC address. However, there could be multiple
interfaces like that, making ctrld confused.

This commit fixes this issue, by listing all possible private IPs, then
sorting them and use the smallest one for router self queries.
So ctrld can record the raw/original client IP instead of looking up
from MAC to IP, which may not the right choice in some network setup
like using wireguard/vpn on Merlin router.
In case the resolver could not reach nameserver, ptr discover should
only print error message once, then stop doing the query until the
nameserver is reachable. This would prevent ptr discover from flooding
ctrld log with a lot of duplicated messages.
So the current selected DNS server will be reset, and the new one will
be used by systemd-resolved after first query made.
Currently, ctrld assumes that NetworkManager is not available if writing
to /etc/NetworkManager/conf.d return directory not exist error. That
would work on most Linux distros. However, cloud provider may do some
hacks, causing ctrld confusion and think that NetworkManager is
available.

Fixing this by checking whether NetworkManager binary presents first.

While at it, also fixing a bug when restarting NetworkManager failed
causing ctrld hangs. The go-systemd library is not clear about this, but
the waitCh channel won't never be closed if error occurred, so we must
return immediately instead of receiving from it blindly.
So it's easier, more clear, more isolation between code on non-mobile
and mobile platforms.
Some users mentioned that when there is an Internet outage, ctrld fails
to recover, crashing or locks up the router. When requests start
failing, this results in the clients emitting more queries, creating a
resource spiral of death that can brick the device entirely.

To guard against this case, this commit implement an upstream monitor
approach:

 - Marking upstream as down after 100 consecutive failed queries.
 - Start a goroutine to check when the upstream is back again.
 - When upstream is down, answer all queries with SERVFAIL.
 - The checking process uses backoff retry to reduce high requests rate.
 - As long as the query succeeded, marking the upstream as alive then
   start operate normally.
@mbower
Copy link

mbower commented Sep 27, 2023

Been waiting on this one

Connie Lukawski and others added 4 commits October 4, 2023 16:34
RMM uses non-user account which results in config + socket file being
written to a random directory, which is not a real directory that can be
accessed.

Fix this by using directory of ctrld binary as user home dir.
Otherwise, the old code will leave un-used connections open-ed, causing
ports leaking and prevent others from creating UDP conn.
VPN clients often have empty MAC address, because they come from virtual
network interface. However, there's other setup/devices also create
virtual interface, but is not VPN.

Changing source of those clients to empty to prevent confustion in
clients list command output.
Since these ones are either ctrld itself or direct listener that ctrld
is being upstream for, which makes health check query always succeed.
Otherwise, actual hostname will be overriden with "localhost", which is
rather confusing/bad for UX.
@yegors yegors merged commit f1b8d1c into main Oct 11, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UDM Pro and iOS / apple issue
4 participants