
Conversation

@KeKsBoTer (Contributor)

Hello,

I often want to read all of the data from an array (using .into_vec()).
I have noticed that this is considerably slower for large arrays than it is with NumPy.

NumPy:
[screenshot: timing of the NumPy read]

This crate takes 6 times as long just for the read:

use std::fs::File;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("big.npy")?;
    let npy = npyz::NpyFile::new(file)?;

    let mut sum: f32 = 0.;
    let start = std::time::Instant::now();
    let data = npy.into_vec::<f32>()?;
    println!("reading took: {:?}", start.elapsed());

    // do something with the data
    println!("length {:?}", data.len());
    for x in data {
        sum += x;
    }
    println!("{:.4}", sum);
    Ok(())
}

Output:
[screenshot: timing output of the Rust program above]

This boils down to the reader parsing every primitive value one by one.
In many cases we can instead copy the raw bytes into memory and reinterpret them (this is also what NumPy does), as sketched below.
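
For illustration, here is a minimal sketch of that idea (not the actual code from this PR): allocate the f32 buffer up front, view it as bytes with bytemuck, and fill it with a single bulk read. The helper name read_f32_bulk and the endianness handling are assumptions of the sketch:

use std::io::Read;

// Hypothetical helper: read `len` f32 values with one bulk read,
// instead of parsing each value individually.
fn read_f32_bulk(mut reader: impl Read, len: usize) -> std::io::Result<Vec<f32>> {
    let mut data = vec![0.0f32; len];
    // Reinterpret the f32 buffer as &mut [u8] and fill it directly.
    reader.read_exact(bytemuck::cast_slice_mut(data.as_mut_slice()))?;
    // Assumes the file's byte order matches the host; NPY headers record
    // endianness, so a mismatch would require a byteswap pass.
    Ok(data)
}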

I added fast-read functionality for primitive types using the bytemuck crate.
This makes reading about 10 times faster:
[screenshot: timing output with the fast-read path]

What are your thoughts on this?
My solution adds minimal code and only speeds up reads for the primitive types where reinterpretation is safe.

Sorry for the convoluted git history...please squash it on merge in GitHub.

Best
Simon

@ExpHP (Owner) commented Jul 7, 2024

I considered adding block-reading functionality to the crate but concluded I was just reinventing NpyFile<BufReader<File>>. How does this compare to that in benchmarks?
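
For reference, a minimal sketch of that baseline (reusing the same big.npy file as above):

use std::fs::File;
use std::io::BufReader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Buffered I/O amortizes syscall overhead, but the deserializer
    // still parses each primitive value one at a time.
    let file = BufReader::new(File::open("big.npy")?);
    let npy = npyz::NpyFile::new(file)?;
    let data = npy.into_vec::<f32>()?;
    println!("length {}", data.len());
    Ok(())
}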
