Running iterative alg stuck in Ubuntu system #552

Open
stefenmax opened this issue May 29, 2024 · 18 comments

Comments

@stefenmax

Hi, your code runs smoothly for me on Windows. After moving to Linux and compiling, forward projection and backprojection work on my data. But every time I run OS-SART-TV as below, it hangs with no response. On Windows it returns within a few seconds.
algs.ossart_tv(proj, self.geo, angles, niter=1, init=init)
Thanks for your help.

Specifications

  • Python version: 3.10
  • OS: Linux
  • CUDA version: 12.2
  • conda list: [screenshot]
@AnderBiguri
Member

There are a couple of rare issues that may be causing this, but it's been hard to debug because I can't reproduce it.

One thing to try: in the following function, a new geometry is created from the input one.

geox.sVoxel[1:] = geox.sVoxel[1:] * 1.1 # a bit larger to avoid zeros in projections

Can you try changing the code locally so that it doesn't modify the geometry this way? Keep just the copy.
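
A minimal sketch of that experiment, assuming the geometry behaves like a plain attribute container. `geometry_for_weights` and its `enlarge` flag are hypothetical names used for illustration, not TIGRE functions, and Python lists stand in for the NumPy arrays so the snippet runs without TIGRE:

```python
import copy
from types import SimpleNamespace


def geometry_for_weights(geo, enlarge=True):
    """Return a copy of geo, optionally with the enlarged volume."""
    geox = copy.deepcopy(geo)  # the caller's geometry is never modified
    if enlarge:
        # List version of: geox.sVoxel[1:] = geox.sVoxel[1:] * 1.1
        geox.sVoxel[1:] = [s * 1.1 for s in geox.sVoxel[1:]]
    return geox


# Stand-in for a tigre.geometry object (assumption: only sVoxel matters here).
geo = SimpleNamespace(sVoxel=[1.0, 256.0, 256.0])

plain = geometry_for_weights(geo, enlarge=False)  # "just the copy"
bigger = geometry_for_weights(geo, enlarge=True)  # copy plus the 1.1 scaling

print(plain.sVoxel)
print(bigger.sVoxel)
```

If the hang disappears with the plain copy, the enlarged volume is the likely trigger; if it stays, the geometry modification can be ruled out.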

@stefenmax
Author

Do you mean commenting out this line? I tried that and it still fails. Some Krylov subspace algorithms like CGLS and LSQR do work, though, which is weird. But OS-SART-TV gives the best results...

@AnderBiguri
Member

@stefenmax not just that line, but the few after it too.
Apologies, I am on a trip so I can't help much, but the idea is to pass an unmodified geo to Atb.

@stefenmax
Author

Thanks for your help, but it still doesn't work. Maybe I should just run it on Windows. I also found that it runs faster there than on Linux, lol.

@AnderBiguri
Member

AnderBiguri commented May 29, 2024

Hmm... then I don't really know why.
As I cannot reproduce it, I would need to know which function hangs. Is there any way you can try to figure that out?
I have used TIGRE extensively on Linux, so it's certainly a specific combination of geometry, CUDA, number of GPUs, OS, Python version or something like that which causes this strange error, but it's hard for me to figure out simply because I don't see it.

I'll keep the issue open. If you do happen to pinpoint what exactly hangs (it has to be some Ax() or Atb() call somewhere), do let me know. I suspect it's set_w or set_v that hangs...
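
One way to find the hanging call from the Python side is the standard-library `faulthandler` watchdog, sketched below. `call_with_watchdog` is a hypothetical helper name, and the timeout and log path are arbitrary choices. The watchdog only reports the Python stack; it does not stop the stuck call:

```python
import faulthandler


def call_with_watchdog(fn, timeout_s, log_path):
    """Run fn(); if it is still running after timeout_s seconds, dump the
    Python stack of every thread to log_path (the call itself keeps running)."""
    with open(log_path, "w") as log:
        faulthandler.dump_traceback_later(timeout_s, file=log)
        try:
            return fn()
        finally:
            # Disarm the watchdog once fn() returns or raises.
            faulthandler.cancel_dump_traceback_later()


# Hypothetical usage with the call from this issue:
# call_with_watchdog(
#     lambda: algs.ossart_tv(proj, geo, angles, niter=1),
#     timeout_s=60,
#     log_path="hang_trace.log",
# )
```

If the reconstruction hangs, the log should show the Python frames that were active at the timeout, which narrows the search down to a specific Ax() or Atb() call.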

@stefenmax
Author

I found that I can run the ossart algorithm from example.py on my Linux system. So I replaced the geometry there with mine, still using the head phantom, and found that it hangs in tigre.Ax. That is weird, because previously I could run Ax and FDK on my own data. Here is the example code; I don't know if you can reproduce this with it.

from __future__ import division
from __future__ import print_function

import numpy as np
import tigre
import tigre.algorithms as algs
from tigre.utilities import sample_loader
from tigre.utilities.Measure_Quality import Measure_Quality
import tigre.utilities.gpu as gpu
import matplotlib.pyplot as plt
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
# This is just a basic example of a small part of TIGRE's functionality.
# We highly recommend checking the Demos folder, where most if not all features of TIGRE are demoed.

listGpuNames = gpu.getGpuNames()
if len(listGpuNames) == 0:
    print("Error: No gpu found")
else:
    for id in range(len(listGpuNames)):
        print("{}: {}".format(id, listGpuNames[id]))

gpuids = gpu.getGpuIds(listGpuNames[0])
print(gpuids)

# Geometry
# geo1 = tigre.geometry(mode='cone', high_resolution=False, default=True)
img_size = 256
geo = tigre.geometry(mode="cone")
geo.DSD = 950
geo.DSO = 540
geo.nDetector = np.array([1, 835]) 
geo.dDetector = np.array([1, 0.9643345*950 / 835])
geo.sDetector = geo.dDetector * geo.nDetector
geo.nVoxel = np.array([1, img_size, img_size])
geo.sVoxel = geo.nVoxel
geo.dVoxel = geo.sVoxel / geo.nVoxel 
geo.accuracy=0.5  
angles = np.linspace(0, np.pi/2, 180, dtype=np.float32)
# Prepare projection data
head = sample_loader.load_head_phantom(geo.nVoxel)
breakpoint()
proj = tigre.Ax(head, geo, angles, gpuids=gpuids)
test = tigre.Atb(proj,geo,angles,backprojection_type="matched",gpuids=gpuids)
# Reconstruct
niter = 20
fdkout = algs.fdk(proj, geo, angles, gpuids=gpuids)
breakpoint()
ossart = algs.ossart(proj, geo, angles, niter, blocksize=20, gpuids=gpuids)

# Measure Quality
# 'RMSE', 'MSSIM', 'SSD', 'UQI'
print("RMSE fdk:")
print(Measure_Quality(fdkout, head, ["nRMSE"]))
print("RMSE ossart")
print(Measure_Quality(ossart, head, ["nRMSE"]))

# Plot
fig, axes = plt.subplots(3, 2)
axes[0, 0].set_title("FDK")
axes[0, 0].imshow(fdkout[geo.nVoxel[0] // 2])
axes[1, 0].imshow(fdkout[:, geo.nVoxel[1] // 2, :])
axes[2, 0].imshow(fdkout[:, :, geo.nVoxel[2] // 2])
axes[0, 1].set_title("OS-SART")
axes[0, 1].imshow(ossart[geo.nVoxel[0] // 2])
axes[1, 1].imshow(ossart[:, geo.nVoxel[1] // 2, :])
axes[2, 1].imshow(ossart[:, :, geo.nVoxel[2] // 2])
plt.show()
# tigre.plotProj(proj)
# tigre.plotImg(fdkout)


@AnderBiguri
Member

So it hangs in the Ax call in this code?
What if you make a different number of GPUs visible? Are they all the same GPU model?
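
A small sketch of how GPU visibility could be varied between runs. `restrict_visible_gpus` is a hypothetical helper, not part of TIGRE. CUDA reads these environment variables when it initializes, so they need to be set before the first CUDA call in the process:

```python
import os


def restrict_visible_gpus(indices):
    """Expose only the given GPU indices to CUDA in this process."""
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # number devices by PCI bus order
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


restrict_visible_gpus([0])  # try a single GPU first, then e.g. [0, 1]
# import tigre  # import and use TIGRE only after the variables are set
```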

@stefenmax
Author

Yeah, it hangs in Ax.
No, they were not the same GPU. But on another server of mine there are two identical GPUs, and it hangs at the same place.

@AnderBiguri
Member

Certainly with different GPUs the behaviour is undefined, so that would be an issue.

I'll try your specific geometry. But out of curiosity, if you change nVoxel/nDetector a bit, does it still hang?

@stefenmax
Author

Do you have any recommendation on how to change the nvoxel/ndetector?

@AnderBiguri
Member

Just give them different values, to see whether it's the specific values that cause the issue.
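
As a sketch, a handful of nearby sizes could be generated like this. `size_variants` and the chosen deltas are assumptions for illustration; the starting values are the ones from the script above:

```python
def size_variants(n_voxel, n_detector, deltas=(-16, -1, 1, 16)):
    """Yield (nVoxel, nDetector) pairs with the in-plane sizes shifted by each delta."""
    for d in deltas:
        voxel = (n_voxel[0], n_voxel[1] + d, n_voxel[2] + d)
        detector = (n_detector[0], n_detector[1] + d)
        yield voxel, detector


for voxel, detector in size_variants((1, 256, 256), (1, 835)):
    print(voxel, detector)
```

For each pair, sVoxel, dVoxel and sDetector would need to be recomputed the same way the script does, so that the geometry stays consistent.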

@stefenmax
Author

Yes, after changing them a bit it still hangs.

@AnderBiguri
Member

Apologies, I don't seem to be able to reproduce this in any way. If you can pinpoint where the error is, do let me know.

@timcogan
Contributor

I have the same issue on Ubuntu. The code hangs here on my machine (I haven't stepped through the CUDA yet):

cuda_raise_errors(siddon_ray_projection(c_img, c_geometry[0], c_projections, c_angles, total_projections, c_gpuids[0]))

If interpolation_projection is used instead of siddon_ray_projection, the rest of the code seems to run OK:

W = Ax(
    # np.ones(geox.nVoxel, dtype=np.float32), geox, self.angles, "Siddon", gpuids=self.gpuids
    np.ones(geox.nVoxel, dtype=np.float32), geox, self.angles, "interpolated", gpuids=self.gpuids
)
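
To compare the two projectors without freezing the main session, each variant could be run in a child process with a timeout. This is only a sketch: `run_with_timeout` is a hypothetical helper, and the snippets below are placeholders rather than real TIGRE calls:

```python
import subprocess
import sys


def run_with_timeout(code, timeout_s):
    """Run a Python snippet in a child process; report 'ok', 'error' or 'hang'."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "hang"  # subprocess.run kills the child when the timeout expires
    return "ok" if result.returncode == 0 else "error"


# Placeholder snippets; in practice each would build the geometry and call
# tigre.Ax with "Siddon" or "interpolated" as in the comment above.
snippets = {
    "finishes": "print('done')",
    "never returns": "import time; time.sleep(60)",
}
for name, code in snippets.items():
    print(name, "->", run_with_timeout(code, timeout_s=2))
```

Running every geometry and projector combination this way gives a small table of which ones hang, which may help narrow down a reproducible case.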

@timcogan
Contributor

This is where the code hangs inside Siddon_projection.cu:

cudaStreamSynchronize(stream[dev*2]);

@AnderBiguri
Member

Thanks @timcogan! It's strange: that means some of the previous work gets stuck in an infinite loop. It's hard to debug because it's parallel code that I can't stop, but this information actually helps a lot.

@AnderBiguri
Member

What if you set the code to use only 1 GPU? Does it still hang?

@timcogan
Contributor

Yes, it hangs when using only 1 GPU.
