Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Allow Complex valued FFT #275

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kunzaatko
Copy link

When using the FFT algorithm, previously there was only a method that
used the rfft and irfft methods which rely on getting a Real array
input. Added a method and specialization that uses the the old method
for Real valued arrays and a new method that calls fft and ifft for
other arrays.

When using the FFT algorithm, previously there was only a method that
used the `rfft` and `irfft` methods which rely on getting a Real array
input. Added a method and specialization that uses the the old method
for Real valued arrays and a new method that calls `fft` and `ifft` for
other arrays.
@kunzaatko
Copy link
Author

kunzaatko commented Oct 19, 2024

@johnnychen94 Hi there! Can you, please, take a look at this PR? It should be a non-breaking change. Basically just allowing a function to take a Complex where a Real was previously required. I am relying on this functionality in my package and it would be nice to have it here, instead of adding a method from my side. Otherwise you can give me pointers on what to change and I will make those changes. BTW sorry for pinging if it is not the appropriate way to do this.

src/imfilter.jl Outdated
@@ -841,6 +841,11 @@ function _imfilter_fft!(r::AbstractCPU{FFT},
end

function filtfft(A, krn)
Copy link
Member

@johnnychen94 johnnychen94 Oct 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the absence @kunzaatko.

Roughly speaking, JuliaImages treats two families of types as numeric types: Colorant{<:Number} and T<:Number.
Thus it's a good idea for the dispatch design to explicitly express this.

I'd like to see three methods:

# simple channelview/colorview wrapper over the plain number methods
filtfft(A::AbstractArray{CT}, krn) where CT<:Colorant 

filtfft(A::AbstractArray{T}, krn) where T<:Complex # your newly added method
filtfft(A::AbstractArray{T}, krn) where T<:Real

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. Thank you for your work on this package!
I would argue that we are missing the methods for complex kernels in the two bottom methods. I implemented these in the new version.

Copy link
Member

@johnnychen94 johnnychen94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a small suggestion for clarity.

Adds an optimization for filtering a complex with a real array so that
it allows the real array to use `rfft` instead of `fft` to save half the
computation. The benchmarks do not show a speed-up of a 1/4 which would
theoretically be expected but a lower speed-up. For some reason do not
show lower memory allocation.

```
In [2]: A = rand(3000, 3000)
3000×3000 Matrix{Float64}:
 0.285528   0.708856  0.620611   0.045052   0.330913  0.213895  0.873503   0.656959   0.0861678  0.938603  …  0.463651  0.491763   0.523761   0.987486   0.707374  0.118724  0.523314   0.278283    0.040879
 0.546583   0.797982  0.419506   0.16004    0.134307  0.341135  0.767713   0.694436   0.465751   0.081541     0.636672  0.493196   0.212694   0.726784   0.137611  0.153863  0.102901   0.623923    0.285838
 0.667936   0.985847  0.316288   0.496242   0.330982  0.802932  0.759763   0.393804   0.27839    0.613204     0.917279  0.488461   0.208559   0.139706   0.013461  0.29185   0.893805   0.198656    0.260889
 0.541975   0.79197   0.593314   0.563877   0.668819  0.835136  0.814535   0.780967   0.402207   0.309717     0.535379  0.800072   0.261613   0.288121   0.743066  0.474665  0.104051   0.910556    0.660157
 0.188497   0.434635  0.464992   0.156539   0.712517  0.229329  0.0900374  0.157132   0.0464563  0.94832      0.594556  0.239523   0.0353333  0.628295   0.974684  0.293648  0.767198   0.00595716  0.917384
 ⋮                                                    ⋮                                                    ⋱                                             ⋮
 0.342483   0.849301  0.522493   0.147907   0.657853  0.316614  0.330971   0.960895   0.790472   0.570171  …  0.805825  0.575838   0.92688    0.545242   0.849474  0.553651  0.347863   0.284356    0.0710359
 0.470919   0.370001  0.17453    0.757803   0.47875   0.514264  0.965828   0.161526   0.490146   0.586126     0.886021  0.426073   0.992352   0.686143   0.277794  0.93995   0.311383   0.394133    0.925497
 0.159736   0.706443  0.0387112  0.219725   0.604491  0.827922  0.0177041  0.993723   0.722314   0.794828     0.661236  0.690974   0.24774    0.0865962  0.955458  0.542212  0.690087   0.779484    0.721536
 0.0613569  0.584916  0.22701    0.0871801  0.285467  0.214378  0.016846   0.0159929  0.0941595  0.814224     0.194526  0.0966802  0.885505   0.315272   0.896789  0.214747  0.0933588  0.803658    0.951828
 0.68916    0.891683  0.223701   0.573007   0.359542  0.151604  0.410088   0.298216   0.371781   0.609701     0.358088  0.0729099  0.142997   0.439933   0.161952  0.350227  0.326087   0.477912    0.386247

In [3]: B = rand(ComplexF32, 3000, 3000)
3000×3000 Matrix{ComplexF32}:
 0.314435+0.350523im   0.31276+0.000674367im   0.403332+0.0267981im   0.836204+0.5429im     …   0.352876+0.946266im    0.938128+0.477059im   0.958154+0.541663im    0.658549+0.493835im
 0.208547+0.811982im  0.506163+0.888442im      0.720267+0.775364im   0.0987085+0.34869im        0.114646+0.363576im   0.0707946+0.951316im   0.897746+0.781405im    0.446996+0.679693im
 0.166268+0.781975im   0.51661+0.284825im      0.488436+0.997122im     0.39242+0.633143im       0.852163+0.575789im     0.95094+0.867025im   0.963788+0.0771892im   0.810548+0.851137im
 0.596596+0.24115im   0.229569+0.74279im       0.716469+0.953966im    0.471434+0.197209im      0.0622576+0.935325im    0.745592+0.505512im   0.424859+0.444605im    0.727532+0.0781438im
 0.225914+0.979836im  0.988875+0.0307086im     0.686103+0.611004im   0.0149557+0.135689im      0.0653017+0.159109im    0.404576+0.134711im   0.220633+0.814599im   0.0810269+0.77369im
         ⋮                                                                                  ⋱
 0.999065+0.361406im  0.986685+0.90664im       0.512323+0.586943im    0.790515+0.239983im   …    0.58594+0.0124457im   0.608344+0.886342im   0.213773+0.25945im     0.339364+0.416441im
 0.458132+0.558931im   0.36812+0.744078im     0.0747138+0.843561im    0.244537+0.998928im       0.307395+0.996972im    0.220357+0.665658im   0.794098+0.536488im    0.945331+0.81305im
 0.632005+0.272713im  0.521595+0.943606im      0.684861+0.998334im   0.0401222+0.158511im        0.47845+0.813752im    0.378089+0.538594im   0.345376+0.468067im    0.867263+0.361034im
 0.785968+0.501483im  0.828497+0.626334im      0.603006+0.704717im    0.413903+0.0815281im      0.728322+0.529951im    0.227673+0.328137im   0.445953+0.994137im    0.953913+0.130831im
 0.938755+0.593728im  0.404837+0.707017im      0.135038+0.498083im    0.874043+0.694357im       0.687566+0.768597im    0.455904+0.358583im  0.0127361+0.115687im    0.415703+0.949933im

In [24]: function full_fft(A,B)
             A_fft = fft(A)
             B_fft = fft(B)
             A_fft .*= conj!(B)
             A_fft
         end
full_fft (generic function with 1 method)

In [30]: function real_fft(A,B)
             A_fft = rfft(A)
             B_fft = fft(B)
             _stretch_mul(Float64, A_fft, ComplexF32, conj!(B), length(axes(A,1)))
         end
real_fft (generic function with 1 method)

In [32]: @benchmark full_fft($A,$B)
BenchmarkTools.Trial: 15 samples with 1 evaluation.
 Range (min … max):  330.713 ms … 346.502 ms  ┊ GC (min … max): 0.72% … 0.98%
 Time  (median):     337.403 ms               ┊ GC (median):    0.94%
 Time  (mean ± σ):   338.132 ms ±   4.592 ms  ┊ GC (mean ± σ):  0.81% ± 0.34%

  ▁      ▁ ▁   █       ▁ ▁ ▁          ▁ ▁  █   ▁   ▁          ▁
  █▁▁▁▁▁▁█▁█▁▁▁█▁▁▁▁▁▁▁█▁█▁█▁▁▁▁▁▁▁▁▁▁█▁█▁▁█▁▁▁█▁▁▁█▁▁▁▁▁▁▁▁▁▁█ ▁
  331 ms           Histogram: frequency by time          347 ms <

 Memory estimate: 343.32 MiB, allocs estimate: 19.

In [33]: @benchmark real_fft($A,$B)
BenchmarkTools.Trial: 18 samples with 1 evaluation.
 Range (min … max):  261.506 ms … 467.978 ms  ┊ GC (min … max): 0.00% … 43.72%
 Time  (median):     267.245 ms               ┊ GC (median):    1.90%
 Time  (mean ± σ):   291.212 ms ±  57.510 ms  ┊ GC (mean ± σ):  9.90% ± 12.93%

   █
  ▅█▃▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  262 ms           Histogram: frequency by time          468 ms <

 Memory estimate: 480.61 MiB, allocs estimate: 62.
```

[fixes: JuliaImages#275]
@kunzaatko
Copy link
Author

@johnnychen94 Now all these methods should be implemented. The implementation is quite cryptic, since I wanted to use the optimization for FFT in real arrays even if we are running imfilter for a complex image and a real kernel array or the other way around. There are benchmarks in the commit message. It does bring a little speed-up. Should I put more comments in the code itself so that anybody who wants to modify it in the future will have an easier time, or do you think that it is self explanatory and the notes that I have put in the code are enough?
I added a few basic tests but probably it could be tested more. Do you propose some tests for this using the general interface?
Do you have any suggestions for the implementation, or is this OK? (I also considered defining a new View type which would make the rfft output act similarly as fft output, but it would be even more convoluted in my opinion.
I am putting the benchmarks here for reference also. They are made for image and kernel sizes of 3000×3000 which are reasonable sizes for padded images. A full padding of a 1024×1024 image would have a similar size.

    ```
    In [2]: A = rand(3000, 3000)
    3000×3000 Matrix{Float64}:
     0.285528   0.708856  0.620611   0.045052   0.330913  0.213895  0.873503   0.656959   0.0861678  0.938603  …  0.463651  0.491763   0.523761   0.987486   0.707374  0.118724  0.523314   0.278283    0.040879
     0.546583   0.797982  0.419506   0.16004    0.134307  0.341135  0.767713   0.694436   0.465751   0.081541     0.636672  0.493196   0.212694   0.726784   0.137611  0.153863  0.102901   0.623923    0.285838
     0.667936   0.985847  0.316288   0.496242   0.330982  0.802932  0.759763   0.393804   0.27839    0.613204     0.917279  0.488461   0.208559   0.139706   0.013461  0.29185   0.893805   0.198656    0.260889
     0.541975   0.79197   0.593314   0.563877   0.668819  0.835136  0.814535   0.780967   0.402207   0.309717     0.535379  0.800072   0.261613   0.288121   0.743066  0.474665  0.104051   0.910556    0.660157
     0.188497   0.434635  0.464992   0.156539   0.712517  0.229329  0.0900374  0.157132   0.0464563  0.94832      0.594556  0.239523   0.0353333  0.628295   0.974684  0.293648  0.767198   0.00595716  0.917384
     ⋮                                                    ⋮                                                    ⋱                                             ⋮
     0.342483   0.849301  0.522493   0.147907   0.657853  0.316614  0.330971   0.960895   0.790472   0.570171  …  0.805825  0.575838   0.92688    0.545242   0.849474  0.553651  0.347863   0.284356    0.0710359
     0.470919   0.370001  0.17453    0.757803   0.47875   0.514264  0.965828   0.161526   0.490146   0.586126     0.886021  0.426073   0.992352   0.686143   0.277794  0.93995   0.311383   0.394133    0.925497
     0.159736   0.706443  0.0387112  0.219725   0.604491  0.827922  0.0177041  0.993723   0.722314   0.794828     0.661236  0.690974   0.24774    0.0865962  0.955458  0.542212  0.690087   0.779484    0.721536
     0.0613569  0.584916  0.22701    0.0871801  0.285467  0.214378  0.016846   0.0159929  0.0941595  0.814224     0.194526  0.0966802  0.885505   0.315272   0.896789  0.214747  0.0933588  0.803658    0.951828
     0.68916    0.891683  0.223701   0.573007   0.359542  0.151604  0.410088   0.298216   0.371781   0.609701     0.358088  0.0729099  0.142997   0.439933   0.161952  0.350227  0.326087   0.477912    0.386247
    
    In [3]: B = rand(ComplexF32, 3000, 3000)
    3000×3000 Matrix{ComplexF32}:
     0.314435+0.350523im   0.31276+0.000674367im   0.403332+0.0267981im   0.836204+0.5429im     …   0.352876+0.946266im    0.938128+0.477059im   0.958154+0.541663im    0.658549+0.493835im
     0.208547+0.811982im  0.506163+0.888442im      0.720267+0.775364im   0.0987085+0.34869im        0.114646+0.363576im   0.0707946+0.951316im   0.897746+0.781405im    0.446996+0.679693im
     0.166268+0.781975im   0.51661+0.284825im      0.488436+0.997122im     0.39242+0.633143im       0.852163+0.575789im     0.95094+0.867025im   0.963788+0.0771892im   0.810548+0.851137im
     0.596596+0.24115im   0.229569+0.74279im       0.716469+0.953966im    0.471434+0.197209im      0.0622576+0.935325im    0.745592+0.505512im   0.424859+0.444605im    0.727532+0.0781438im
     0.225914+0.979836im  0.988875+0.0307086im     0.686103+0.611004im   0.0149557+0.135689im      0.0653017+0.159109im    0.404576+0.134711im   0.220633+0.814599im   0.0810269+0.77369im
             ⋮                                                                                  ⋱
     0.999065+0.361406im  0.986685+0.90664im       0.512323+0.586943im    0.790515+0.239983im   …    0.58594+0.0124457im   0.608344+0.886342im   0.213773+0.25945im     0.339364+0.416441im
     0.458132+0.558931im   0.36812+0.744078im     0.0747138+0.843561im    0.244537+0.998928im       0.307395+0.996972im    0.220357+0.665658im   0.794098+0.536488im    0.945331+0.81305im
     0.632005+0.272713im  0.521595+0.943606im      0.684861+0.998334im   0.0401222+0.158511im        0.47845+0.813752im    0.378089+0.538594im   0.345376+0.468067im    0.867263+0.361034im
     0.785968+0.501483im  0.828497+0.626334im      0.603006+0.704717im    0.413903+0.0815281im      0.728322+0.529951im    0.227673+0.328137im   0.445953+0.994137im    0.953913+0.130831im
     0.938755+0.593728im  0.404837+0.707017im      0.135038+0.498083im    0.874043+0.694357im       0.687566+0.768597im    0.455904+0.358583im  0.0127361+0.115687im    0.415703+0.949933im
    
    In [24]: function full_fft(A,B)
                 A_fft = fft(A)
                 B_fft = fft(B)
                 A_fft .*= conj!(B)
                 A_fft
             end
    full_fft (generic function with 1 method)
    
    In [30]: function real_fft(A,B)
                 A_fft = rfft(A)
                 B_fft = fft(B)
                 _stretch_mul(Float64, A_fft, ComplexF32, conj!(B), length(axes(A,1)))
             end
    real_fft (generic function with 1 method)
    
    In [32]: @benchmark full_fft($A,$B)
    BenchmarkTools.Trial: 15 samples with 1 evaluation.
     Range (min … max):  330.713 ms … 346.502 ms  ┊ GC (min … max): 0.72% … 0.98%
     Time  (median):     337.403 ms               ┊ GC (median):    0.94%
     Time  (mean ± σ):   338.132 ms ±   4.592 ms  ┊ GC (mean ± σ):  0.81% ± 0.34%
    
      ▁      ▁ ▁   █       ▁ ▁ ▁          ▁ ▁  █   ▁   ▁          ▁
      █▁▁▁▁▁▁█▁█▁▁▁█▁▁▁▁▁▁▁█▁█▁█▁▁▁▁▁▁▁▁▁▁█▁█▁▁█▁▁▁█▁▁▁█▁▁▁▁▁▁▁▁▁▁█ ▁
      331 ms           Histogram: frequency by time          347 ms <
    
     Memory estimate: 343.32 MiB, allocs estimate: 19.
    
    In [33]: @benchmark real_fft($A,$B)
    BenchmarkTools.Trial: 18 samples with 1 evaluation.
     Range (min … max):  261.506 ms … 467.978 ms  ┊ GC (min … max): 0.00% … 43.72%
     Time  (median):     267.245 ms               ┊ GC (median):    1.90%
     Time  (mean ± σ):   291.212 ms ±  57.510 ms  ┊ GC (mean ± σ):  9.90% ± 12.93%
    
       █
      ▅█▃▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
      262 ms           Histogram: frequency by time          468 ms <
    
     Memory estimate: 480.61 MiB, allocs estimate: 62.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants