
CelebA-HQ 256x256 Data Pre-processing #41

Open
KomputerMaster64 opened this issue Sep 22, 2022 · 1 comment
@KomputerMaster64
Thank you, team, for sharing the project resources. I am trying to process the CelebA-HQ 256x256 dataset for the DDGAN model. The DDGAN repository recommends following the dataset preparation methods in the NVAE repository (this repository).


The following commands download the tfrecord files released with GLOW and convert them into an LMDB dataset.

Use the link from openai/glow to download the CelebA-HQ 256x256 dataset (about 4 GB).
Converting the dataset to LMDB requires a module called "tfrecord".
The resulting missing-module error can be fixed by simply running pip install tfrecord.



!mkdir -p $DATA_DIR/celeba
%cd $DATA_DIR/celeba
!wget https://openaipublic.azureedge.net/glow-demo/data/celeba-tfr.tar
!tar -xvf celeba-tfr.tar
%cd $CODE_DIR/scripts
!pip install tfrecord
!python convert_tfrecord_to_lmdb.py --dataset=celeba --tfr_path=$DATA_DIR/celeba/celeba-tfr --lmdb_path=$DATA_DIR/celeba/celeba-lmdb --split=train



The final command, !python convert_tfrecord_to_lmdb.py --dataset=celeba --tfr_path=$DATA_DIR/celeba/celeba-tfr --lmdb_path=$DATA_DIR/celeba/celeba-lmdb --split=train, gives the following output:

.
.
.
26300
26400
26500
26600
26700
26800
26900
27000
added 27000 items to the LMDB dataset.
Traceback (most recent call last):
  File "convert_tfrecord_to_lmdb.py", line 73, in <module>
    main(args.dataset, args.split, args.tfr_path, args.lmdb_path)
  File "convert_tfrecord_to_lmdb.py", line 58, in main
    print('added %d items to the LMDB dataset.' % count)
lmdb.Error: mdb_txn_commit: Disk quota exceeded


I am not sure whether the LMDB dataset was created properly; I would appreciate your guidance.
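Since the conversion aborted with `lmdb.Error: mdb_txn_commit: Disk quota exceeded`, a simple precaution is to verify free space on the target filesystem before starting. This is only a sketch, not part of the repository's scripts; the 40 GB threshold is a guess at a safe margin for the decompressed LMDB, not an official figure.

```python
import shutil

def check_free_space(path, required_gb=40.0):
    """Raise if the filesystem holding `path` has less than `required_gb` free.

    The CelebA-HQ 256x256 LMDB grows well beyond the 4 GB tarball,
    so `required_gb` here is a conservative assumption, not a measured size.
    """
    free_gb = shutil.disk_usage(path).free / 1e9  # bytes -> gigabytes
    if free_gb < required_gb:
        raise RuntimeError(
            f"Only {free_gb:.1f} GB free at {path}; LMDB conversion may "
            f"fail with 'Disk quota exceeded'."
        )
    return free_gb
```

Running this against `$DATA_DIR/celeba` before `convert_tfrecord_to_lmdb.py` would catch the quota problem up front instead of 27000 records in.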

@LinWeiJeff


@KomputerMaster64 Hello, I also want to convert the CelebA-HQ 256x256 dataset to an LMDB dataset. However, I get the error "AttributeError: 'bytes' object has no attribute 'cpu'" at the line im = data['data'][0].cpu().numpy(). If I remove .cpu() so the line reads im = data['data'][0].numpy(), it throws "AttributeError: 'bytes' object has no attribute 'numpy'" instead. May I ask how you resolved this problem? Thanks a lot!
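The errors above suggest the tfrecord reader is yielding raw `bytes` rather than a tensor, so `.cpu()`/`.numpy()` cannot apply. One hypothetical workaround (not from the repository; the `(256, 256, 3)` raw-RGB layout is an assumption about how the records are stored) is to reinterpret the buffer directly:

```python
import numpy as np

def bytes_to_image(raw, shape=(256, 256, 3)):
    """Reinterpret a raw byte buffer as a uint8 image of the given shape.

    Assumes the record stores raw RGB pixels. If the buffer size does not
    match, the record probably holds an encoded JPEG/PNG instead, which
    would need decoding (e.g. with PIL) rather than reshaping.
    """
    arr = np.frombuffer(raw, dtype=np.uint8)
    expected = int(np.prod(shape))
    if arr.size != expected:
        raise ValueError(
            f"buffer has {arr.size} bytes, expected {expected}; "
            "the record may be an encoded image, not raw pixels"
        )
    return arr.reshape(shape)
```

With something like this, the failing line could become `im = bytes_to_image(data['data'][0])`, but the exact shape and encoding should be checked against the actual tfrecord contents first.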
