Description
@hanzhanggit
@taoxugit
Please help me: what is the main problem behind this?
(base) H:\StackGAN\StackGAN-Pytorch-master\code>python main.py --cfg cfg/coco_eval.yml --gpu 0
Using config:
{'CONFIG_NAME': 'stageI',
'CUDA': True,
'DATASET_NAME': 'coco',
'DATA_DIR': '../data/coco',
'EMBEDDING_TYPE': 'cnn-rnn',
'GAN': {'CONDITION_DIM': 128, 'DF_DIM': 96, 'GF_DIM': 192, 'R_NUM': 4},
'GPU_ID': '0',
'IMSIZE': 64,
'NET_D': '',
'NET_G': '',
'STAGE': 1,
'STAGE1_G': '',
'TEXT': {'DIMENSION': 1024},
'TRAIN': {'BATCH_SIZE': 128,
'COEFF': {'KL': 2.0},
'DISCRIMINATOR_LR': 0.0002,
'FLAG': True,
'GENERATOR_LR': 0.0002,
'LR_DECAY_EPOCH': 20,
'MAX_EPOCH': 120,
'PRETRAINED_EPOCH': 600,
'PRETRAINED_MODEL': '',
'SNAPSHOT_INTERVAL': 10},
'VIS_COUNT': 64,
'WORKERS': 4,
'Z_DIM': 100}
Load filenames from: ../data/coco\train\filenames.pickle (82783)
embeddings: (82783, 5, 1024)
Everything up to this point runs successfully...
STAGE1_G(
(ca_net): CA_NET(
(fc): Linear(in_features=1024, out_features=256, bias=True)
(relu): ReLU()
)
(fc): Sequential(
(0): Linear(in_features=228, out_features=24576, bias=False)
(1): BatchNorm1d(24576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
)
(upsample1): Sequential(
(0): Upsample(scale_factor=2, mode=nearest)
(1): Conv2d(1536, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(2): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(upsample2): Sequential(
(0): Upsample(scale_factor=2, mode=nearest)
(1): Conv2d(768, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(2): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(upsample3): Sequential(
(0): Upsample(scale_factor=2, mode=nearest)
(1): Conv2d(384, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(2): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(upsample4): Sequential(
(0): Upsample(scale_factor=2, mode=nearest)
(1): Conv2d(192, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(2): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(img): Sequential(
(0): Conv2d(96, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): Tanh()
)
)
STAGE1_D(
(encode_img): Sequential(
(0): Conv2d(3, 96, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(1): LeakyReLU(negative_slope=0.2, inplace)
(2): Conv2d(96, 192, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(3): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(4): LeakyReLU(negative_slope=0.2, inplace)
(5): Conv2d(192, 384, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(6): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): LeakyReLU(negative_slope=0.2, inplace)
(8): Conv2d(384, 768, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(9): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(10): LeakyReLU(negative_slope=0.2, inplace)
)
(get_cond_logits): D_GET_LOGITS(
(outlogits): Sequential(
(0): Conv2d(896, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): LeakyReLU(negative_slope=0.2, inplace)
(3): Conv2d(768, 1, kernel_size=(4, 4), stride=(4, 4))
(4): Sigmoid()
)
)
)
Preparing training data...
Traceback (most recent call last):
File "main.py", line 77, in
algo.train(dataloader, cfg.STAGE)
File "H:\StackGAN\StackGAN-Pytorch-master\code\trainer.py", line 158, in train
for i, data in enumerate(data_loader, 0):
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 501, in iter
return _DataLoaderIter(self)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 289, in init
w.start()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in init
reduction.dump(process_obj, to_child)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
OSError: [Errno 22] Invalid argument
matteobsu commented on Dec 4, 2018
Same issue. Did you solve it somehow?
AnwarUllahKhan commented on Dec 4, 2018
@matteobsu No, it still errors; I think it's a memory-space problem. For now I am training the TensorFlow version of this instead.
matteobsu commented on Dec 5, 2018
So I think I actually solved it. Try opening:
C:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py
and replace
ForkingPickler(file, protocol).dump(obj)
with
ForkingPickler(file, protocol).dumps(obj)
at line 60
ARMkernal commented on Oct 17, 2019
I changed it to dumps but I got this error:
Traceback (most recent call last):
File "", line 1, in
File "D:\Anaconda\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "D:\Anaconda\lib\multiprocessing\spawn.py", line 113, in _main
preparation_data = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Traceback (most recent call last):
File "", line 1, in
File "D:\Anaconda\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "D:\Anaconda\lib\multiprocessing\spawn.py", line 113, in _main
preparation_data = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Traceback (most recent call last):
File "D:\Anaconda\lib\site-packages\torch\utils\data\dataloader.py", line 724, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "D:\Anaconda\lib\multiprocessing\queues.py", line 105, in get
raise Empty
_queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:/Python文件/机器学习/lab06NN/MyNN.py", line 57, in
for i, data in enumerate(trainloader, 0):
File "D:\Anaconda\lib\site-packages\torch\utils\data\dataloader.py", line 804, in next
idx, data = self._get_data()
File "D:\Anaconda\lib\site-packages\torch\utils\data\dataloader.py", line 771, in _get_data
success, data = self._try_get_data()
File "D:\Anaconda\lib\site-packages\torch\utils\data\dataloader.py", line 737, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 7668, 8988) exited unexpectedly
How do I solve this runtime error?
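For what it's worth, both the EOFError: Ran out of input in spawn_main and the "DataLoader worker exited unexpectedly" crash are common symptoms on Windows of iterating a DataLoader with num_workers > 0 from code that runs at module top level: Windows starts workers with the "spawn" method, which re-imports the main script in each child process. Below is a minimal sketch of the usual guard; the TensorDataset and train() wrapper are illustrative placeholders, not taken from the scripts in this thread, and setting num_workers=0 is an alternative that avoids worker processes altogether.

import torch
from torch.utils.data import DataLoader, TensorDataset

def train():
    # Placeholder dataset; in the real scripts this would be the COCO/text-embedding dataset.
    dataset = TensorDataset(torch.randn(256, 3, 64, 64))
    trainloader = DataLoader(
        dataset,
        batch_size=32,
        shuffle=True,
        num_workers=2,  # set to 0 to sidestep worker processes entirely
    )
    for i, data in enumerate(trainloader, 0):
        pass  # training step would go here

if __name__ == '__main__':  # required on Windows whenever num_workers > 0
    train()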
zianzheng0806 commented on Aug 10, 2021
Have you solved this new error? I have run into exactly the same problem.
matzl commented on Jan 25, 2024
I had a similar issue using multiprocessing; in my case I saw RAM usage go to 100.0% in Task Manager just before the error appeared. I never had this issue on another computer with more RAM, so I guess it is related to RAM overflow (or to a process that can't wait long enough for something useful to be returned from memory when RAM is full).
Workaround: decrease RAM usage (either in your Python script or by closing other programs).
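As one concrete, purely illustrative way to lower a script's own RAM usage on the data-loading side, the loader's batch size and worker count can be reduced; the values and the make_small_loader name below are assumptions for the sketch, not settings from any script in this thread.

from torch.utils.data import DataLoader

def make_small_loader(dataset):
    # Fewer workers means fewer child processes each holding a pickled copy of
    # the dataset; a smaller batch keeps the per-step tensors small as well.
    return DataLoader(
        dataset,
        batch_size=32,     # e.g. smaller than the 128 used in the config above
        shuffle=True,
        num_workers=0,     # load in the main process, no extra copies
        pin_memory=False,  # skip pinned (page-locked) host memory
    )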
MeshkatShB commented on May 14, 2024
I was able to make it work after a week of trial and error, going through the internet and testing every possible fix. I did two things. In my case I was using LdaMulticore from gensim.models:
from gensim.models import LdaMulticore
My code was:
lda = LdaMulticore(input_text, num_topics=topic_num, id2word=dictionary, passes=1, workers=8)
I solved it by removing passes=1 and workers=8 entirely, which left my code as:
lda = LdaMulticore(input_text, num_topics=topic_num, id2word=dictionary)
as suggested in: https://stackoverflow.com/questions/70218051/oserror-errno-22-invalid-argument-pickle-unpicklingerror-pickle-data-was.
At first it didn't work. Then I decided to change my virtual environment: I created a fresh new environment and added all the libraries I used (without requirements.txt, since I suspected it might be a dependency issue). Then, voilà, it is now running like a charm!
In conclusion: first, remove workers and passes from your code. Second, create a fresh venv (virtual environment), install all your packages, and try again. I hope this works for everyone. @matzl @zianzheng0806 @ARMkernal