tar.xz produces "InvalidData" error #1

Open
arturh85 opened this issue Aug 5, 2020 · 8 comments

@arturh85

arturh85 commented Aug 5, 2020

I'm trying to extract a tar.xz file and I get an InvalidData io error.

The relevant pattern for .tar.xz seems to be missing in lib.rs:

match buffer {
        #[cfg(all(feature = "bzip", feature = "tar"))]
        [0x42, 0x5A] => Ok(Box::new(Tar::new(Bzip2::new(file)?)?)), // .tar.bz2
        #[cfg(all(feature = "gzip", feature = "tar"))]
        [0x1F, 0x8B] => Ok(Box::new(Tar::new(Gzip::new(file)?)?)), // .tar.gz
        #[cfg(feature = "zip")]
        [0x50, 0x4B] => Ok(Box::new(Zip::new(file)?)), // .zip
        _ => Err(Error::from(ErrorKind::InvalidData))?,
    }

When I run the following code:

    let path = Path::new("my.tar.xz");
    let mut file = File::open(&path)?;
    let mut buffer = [0u8; 2];
    file.read_exact(&mut buffer)?; // read_exact errors on short input instead of silently reading fewer bytes
    file.seek(SeekFrom::Start(0))?;
    for b in buffer.iter() {
        println!("{:x}", b);
    }

I get 0xfd and 0x37 as the first two bytes, so I would suggest the following addition:

        #[cfg(all(feature = "xz", feature = "tar"))]
        [0xFD, 0x37] => Ok(Box::new(Tar::new(Xz::new(file)?)?)), // .tar.xz

Thank you for this great crate!

@kinosang
Contributor

kinosang commented Aug 6, 2020

io::Read is not implemented for archiver_rs::xz::Xz at this time.

lzma-rs does not export the decode::xz module, so we cannot call xz::decode_stream directly.

Would you suggest another LZMA library so we can move forward for .tar.xz support?

@kinosang
Contributor

kinosang commented Aug 6, 2020

We have moved to xz2 for LZMA support.

The changes were not tested.

You can try it at https://github.com/JoyMoe/archiver-rs/tree/issue-1

@arturh85
Author

arturh85 commented Aug 6, 2020

Opening the archive now works, but extracting produces:

Custom { kind: Other, error: "cannot call entries unless archive is at position 0" }'

I removed my earlier call to archive.files().unwrap().len(), but the error still occurs.

let mut archive = archiver_rs::open(archive_path)?;
// let cnt = archive.files().unwrap().len()
println!("opened");
for file in files {
  // ...
  archive.extract_single(&output_path, file).unwrap();
  println!("extracted {}", &file);
}

The "opened" output is printed, then the error. This worked well for zip files.

@kinosang
Contributor

kinosang commented Aug 6, 2020

gotcha, I'll come back asap.

@arturh85
Author

I'm starting to think the problem lies with the tar crate rather than Xz. I'm getting the same error with the following code:

        let mut xz = archiver_rs::Xz::open(archive_path).expect("failed to open");
        let tar_path = workspace_path.join(PathBuf::from("tmp.tar"));
        xz.decompress(&tar_path).unwrap();
        drop(xz);
        let mut archive = archiver_rs::Tar::open(&tar_path).unwrap();
        let files = archive.files().unwrap();

The tmp.tar file is created and can be extracted with other tools like 7zip without issues.
But calling archive.files() causes the same error as above so it has nothing to do with Xz.
As my tar.xz file is quite large I created a small test.tar file with few test files and I get the same error.

@kinosang
Contributor

I suggest not running multiple operations on the same archive. Doing so may require seeking the stream back to the beginning, and I won't call Seek internally in this library because it may lead to performance issues.
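To illustrate the point about rewinding in general terms, here is a minimal std-only sketch. `Cursor` stands in for the underlying stream and the `drain` helper is hypothetical. It does not use archiver_rs or the tar crate, so it only shows how a consumed reader behaves, not whether rewinding the underlying file is sufficient for a tar archive.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Read from the current position to the end and return the number of bytes read.
fn drain<R: Read>(reader: &mut R) -> io::Result<usize> {
    let mut sink = Vec::new();
    reader.read_to_end(&mut sink)
}

fn main() -> io::Result<()> {
    // In-memory stand-in for an archive's byte stream.
    let mut stream = Cursor::new(b"files/A files/B".to_vec());

    // First pass consumes the whole stream.
    let first = drain(&mut stream)?;
    // Second pass starts at the end, so nothing is left to read.
    let second = drain(&mut stream)?;
    // Rewinding makes the data available again.
    stream.seek(SeekFrom::Start(0))?;
    let third = drain(&mut stream)?;

    println!("first={} second={} third={}", first, second, third);
    Ok(())
}
```

Decoders that only implement `Read` cannot be rewound this way, so a second pass over a compressed archive generally means reopening the file and building a fresh decoder.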

@arturh85
Author

This prevents using extract_single with any .tar files:

#find files
files
files/A
files/B
#tar -cvf foo.tar files

use std::path::Path;
use archiver_rs::Archive;

fn main() {
    let mut archive = archiver_rs::Tar::open(Path::new("foo.tar")).unwrap();
    archive.extract_single(Path::new("outA"), String::from("files/A")).unwrap();
    archive.extract_single(Path::new("outB"), String::from("files/B")).unwrap();
}

The first call to extract_single succeeds but the second produces the same "cannot call entries unless archive is at position 0" error even if you swap the order of the extract_single calls.

It does work if I extract everything at once, but then I cannot display progress, and the output path is wrong, so I have to move the files afterwards.

@kinosang
Contributor

kinosang commented Sep 21, 2020

@arturh85 Sorry for the delay. I tried to add features to handle your issue, but the result would be dirty and ugly,

because tar::Archive, bzip2::read::BzDecoder, flate2::read::GzDecoder, and xz2::read::XzDecoder do not support Seek.

I suggest opening your tar file with let mut file = File::open("foo.tar") and then passing it to archiver_rs::Tar::new(file).

After your first extract_single operation, call file.seek(SeekFrom::Start(0))?;

That may solve your problem.
