5.6.0 npt x64 binary hanging #1287

Closed
cpswan opened this issue Aug 23, 2024 · 7 comments · Fixed by #1290
@cpswan
Member

cpswan commented Aug 23, 2024

Describe the bug

Whilst testing for https://github.com/atsign-foundation/operations/issues/179 I'm finding that the npt binary from the 5.6.0 release is causing a hung connection.

Steps to reproduce

  1. First I get sshnpd installed on a test VM
  2. Then I run up a test web server (in a GNU screen session) with: python -m http.server
  3. And then I run a daemon that's permissioned to connect to that web server (port 8000): sshnpd -a @bareindoornetball -m @cpswan -d ocinpd1 --po localhost:8000
  4. Then I run an npt client to connect: npt -f @cpswan -t @bareindoornetball -d ocinpd1 -p 8000 -l 8000 -r @rv_eu -K
  5. Then (in another shell) I try to get something from the web server with: curl localhost:8000

That curl just hangs, whilst if I use npt from the 5.5.0 release it's just fine.

Expected behavior

I get the HTML-wrapped directory listing from the test VM.

@cpswan cpswan added the bug Something isn't working label Aug 23, 2024
@cconstab
Member

cconstab commented Aug 23, 2024

In my testing, these are my results against a 5.5.0 sshnpd on a remote machine

using the 5.6.0 npt

╰$ ./npt -f @cconstab -t @ssh_1 -r @rv_am -d orac -l 9000 -p 22 -T 0 -K
Connecting ... Connected
2024-08-23 15:40:20.971849 : Requested "never" timeout: set to 365 days
2024-08-23 15:40:20.984932 : Sending daemon feature check request
2024-08-23 15:40:20.991174 : Fetching host and port from srvd
2024-08-23 15:40:23.409389 : Received host and port from srvd
2024-08-23 15:40:23.409498 : Waiting for daemon feature check response
2024-08-23 15:40:23.786456 : Received daemon feature check response
2024-08-23 15:40:23.788417 : Required daemon features are supported
2024-08-23 15:40:24.312620 : Sending session request to the device daemon
2024-08-23 15:40:24.616581 : Waiting for response from the device daemon
2024-08-23 15:40:25.678414 : Received response from the device daemon
2024-08-23 15:40:25.697107 : Will use local port 9000
2024-08-23 15:40:25.709360 : Creating connection to socket rendezvous
2024-08-23 15:40:25.713331 : Sending session request to the device daemon
2024-08-23 15:40:25.953350 : Waiting for response from the device daemon
2024-08-23 15:40:26.104417 : Received response from the device daemon
2024-08-23 15:40:26.130513 : Will use local port 9000
2024-08-23 15:40:26.133414 : Creating connection to socket rendezvous
2024-08-23 15:40:26.162776 : npt is listening on localhost:9000

Then connecting to the port 9000 with ncat I see

╭─cconstab@cally in ~
╰$ ncat localhost 9000
��,��2���٫7��}eGЏ�ICCS��VZ0���BU%�_^C
╭─cconstab@cally in ~

but using 5.5.0 npt

╭─cconstab@cally in ~/test/sshnp
╰$ npt -f @cconstab -t @ssh_1 -r @rv_am -d orac -l 9000 -p 22 -T 0 -K
Connecting ... Connected
2024-08-23 15:39:55.398137 : Requested "never" timeout: set to 365 days
2024-08-23 15:39:55.409452 : Sending daemon feature check request
2024-08-23 15:39:55.413924 : Fetching host and port from srvd
2024-08-23 15:39:58.084404 : Received host and port from srvd
2024-08-23 15:39:58.084515 : Waiting for daemon feature check response
2024-08-23 15:39:58.084550 : Received daemon feature check response
2024-08-23 15:39:58.086382 : Required daemon features are supported
2024-08-23 15:40:01.132847 : Sending session request to the device daemon
2024-08-23 15:40:01.382362 : Waiting for response from the device daemon
2024-08-23 15:40:02.295409 : Received response from the device daemon
2024-08-23 15:40:02.328552 : Will use local port 9000
2024-08-23 15:40:02.329504 : Creating connection to socket rendezvous
2024-08-23 15:40:02.380042 : npt is listening on localhost:9000

the ncat looks just fine

╰$ ncat localhost 9000
SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.10

Using the 5.6.0 sshnp with ssh port forwarding works just fine

╭─cconstab@cally in ~/test/sshnp
╰$ ./sshnp -f @cconstab -t @ssh_1 -r @rv_am -d orac -o "-L 9000:127.0.0.1:22"
2024-08-23 15:56:36.984674 : Sending daemon feature check request
2024-08-23 15:56:36.988323 : Resolving remote username for user session
2024-08-23 15:56:37.949124 : Resolving remote username for tunnel session
2024-08-23 15:56:37.961247 : Fetching host and port from srvd
2024-08-23 15:56:40.252292 : Received host and port from srvd
2024-08-23 15:56:40.252393 : Waiting for daemon feature check response
2024-08-23 15:56:40.252420 : Received daemon feature check response
2024-08-23 15:56:40.254384 : Required daemon features are supported
2024-08-23 15:56:44.127565 : Sending session request to the device daemon
2024-08-23 15:56:44.512379 : Waiting for response from the device daemon
2024-08-23 15:56:45.572417 : Received response from the device daemon
2024-08-23 15:56:45.600560 : Creating connection to socket rendezvous
2024-08-23 15:56:45.773560 : Starting user session
Warning: Permanently added '[localhost]:35649' (ED25519) to the list of known hosts.
Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-92-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/pro

 System information as of Fri Aug 23 08:56:46 AM PDT 2024

  System load:  0.0                 Processes:            621
  Usage of /:   65.0% of 203.11GB   Users logged in:      1
  Memory usage: 16%                 IPv4 address for br0: 192.168.1.22
  Swap usage:   4%

Expanded Security Maintenance for Applications is not enabled.

29 updates can be applied immediately.
To see these additional updates run: apt list --upgradable

17 additional security updates can be applied with ESM Apps.
Learn more about enabling ESM Apps service at https://ubuntu.com/esm

*** System restart required ***
Last login: Fri Aug 23 08:44:22 2024 from 192.168.1.15
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdf[5] sdb[0] sde[7] sdd[6] sdc[1](F)
      23441685504 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/4] [U_UUU]
      bitmap: 46/59 pages [184KB], 65536KB chunk

unused devices: <none>
Filesystem       1K-blocks        Used  Available Use% Mounted on
tmpfs              6577568        5376    6572192   1% /run
/dev/sda1        212978716   138423012   63710828  69% /
tmpfs             32887824          16   32887808   1% /dev/shm
tmpfs                 5120           4       5116   1% /run/lock
tmpfs             32887824         648   32887176   1% /run/qemu
tmpfs              6577564          88    6577476   1% /run/user/120
tmpfs              6577564         156    6577408   1% /run/user/1000
/dev/md0       23348263728 13818292980 8357870092  63% /store
tmpfs              6577564         120    6577444   1% /run/user/1001
cconstab@orac:~$

then ncat

╭─cconstab@cally in ~
╰$ ncat localhost 9000
SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.10

And yes, I have a disk out in my raid array :-(

@gkc
Contributor

gkc commented Aug 23, 2024

Looks like the npt_to_port_22 test has not been running in recent action runs which likely explains how this slipped through

@gkc
Contributor

gkc commented Aug 23, 2024

> Looks like the npt_to_port_22 test has not been running in recent action runs which likely explains how this slipped through

That wasn't the case; I just wasn't reading the test output properly

@gkc
Contributor

gkc commented Aug 23, 2024

Progress:

If I run npt with the -x flag (exit once connected) then everything works fine. This is the scenario which the e2e test covers.

However if I run npt without the -x flag, so npt keeps running in the foreground, then the program hangs. This scenario is not covered by the e2e tests.
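A check for the foreground scenario could be sketched along these lines. This is only an illustration, not part of the project's e2e suite: it assumes npt is already listening on a local port that forwards to an SSH server (as in the port 22 runs above), and the function names `classify_banner` and `probe` are made up for this example.

```python
import socket


def classify_banner(data: bytes) -> str:
    """Classify the first bytes read from the forwarded port."""
    if not data:
        return "no data"
    if data.startswith(b"SSH-2.0-"):
        return "ssh banner"
    return "unexpected bytes"


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Connect, wait up to `timeout` seconds for the first bytes, then classify them."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = sock.recv(256)
    except socket.timeout:
        # Nothing arrived (or the connect stalled) within the timeout.
        return "hung"
    except OSError as exc:
        return f"connect failed: {exc}"
    return classify_banner(data)
```

With a healthy tunnel, `probe("localhost", 9000)` would be expected to return "ssh banner"; garbage bytes or a timeout would point at the failure described in this issue.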

@gkc
Contributor

gkc commented Aug 23, 2024

Identified the bug, introduced in this PR

In brief: when running npt from the command line without the -x flag, the new _preRun function is being executed twice, thus sending two requests to the daemon with the same parameters

Have made a fix and tested; will make a PR. Have created another ticket to add the e2e test
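As a generic illustration of this class of bug (not the actual fix in #1290, which may look quite different), a run-once guard keeps a setup step from sending its request twice when two code paths both call it. The `Client` class, its method names, and the `send_request` callback below are hypothetical stand-ins, and which two paths called `_preRun` in npt is an assumption here.

```python
class Client:
    """Toy stand-in for an npt-like client; not the real NoPorts code."""

    def __init__(self, send_request):
        # Hypothetical callback that "sends" a session request to the daemon.
        self._send_request = send_request
        self._pre_run_done = False

    def _pre_run(self):
        # Guard: the session request must go out at most once per client.
        if self._pre_run_done:
            return
        self._pre_run_done = True
        self._send_request("session-request")

    def run(self):
        self._pre_run()
        return "listening"

    def run_in_foreground(self):
        # This path also calls _pre_run before delegating to run(), which is
        # the kind of overlap that yields a duplicate request when unguarded.
        self._pre_run()
        return self.run()
```

Without the `_pre_run_done` check, `run_in_foreground()` would send two identical requests; with it, only one goes out.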

@cconstab
Member

How does that explain the oddness with the encryption?

@gkc
Contributor

gkc commented Aug 23, 2024

Two srv's on the daemon side; both open control sockets; srvd now has two connections from daemon.

Client creates control socket - gets joined at the srvd to the 1st control socket on the daemon side

Client creates "real" socket - gets joined at the srvd to the 2nd control socket on the daemon side. Encryption keys are therefore different, therefore boom
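The mismatch can be reproduced in a toy model. Everything here is a simplification for illustration: the `Rendezvous` class assumes the srvd pairs sockets in arrival order, the cipher is a throwaway XOR keystream (not what NoPorts uses), and the assumption that the client holds the keys of the first session is mine.

```python
import hashlib
from collections import deque


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256-derived keystream. Not secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class Rendezvous:
    """Pairs each client socket with the oldest waiting daemon socket (assumed FIFO)."""

    def __init__(self):
        self.waiting = deque()

    def daemon_connect(self, session_key: bytes) -> None:
        self.waiting.append(session_key)

    def client_connect(self) -> bytes:
        # Returns the key used by the daemon-side socket we were joined to.
        return self.waiting.popleft()


banner = b"SSH-2.0-OpenSSH_8.9p1"

rv = Rendezvous()
rv.daemon_connect(b"key-for-request-1")  # srv started for the 1st request
rv.daemon_connect(b"key-for-request-2")  # srv started for the duplicate request

control_peer_key = rv.client_connect()  # client control socket -> 1st daemon socket
data_peer_key = rv.client_connect()     # client "real" socket -> 2nd daemon socket

client_key = control_peer_key  # assumption: client uses the first session's keys
wire = keystream_xor(data_peer_key, banner)  # daemon side encrypts with its own key
received = keystream_xor(client_key, wire)   # client decrypts with the other key
```

Here `received` comes out as unreadable bytes rather than the banner, which matches the garbage seen from ncat against the 5.6.0 npt.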

@gkc gkc closed this as completed in #1290 Aug 23, 2024