int64 support for some operations not supported #10

Open
ryx2 opened this issue Nov 13, 2019 · 3 comments

@ryx2

ryx2 commented Nov 13, 2019

I have installed all the pip packages in a venv, and when I pip list, everything matches up. I also installed pytorch from source. When I attempt to run

```
python3 train.py -c /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/config/persons/mobilenetv2_test.yaml --log /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/log -p /dev/null
```

I get the following output:

```
INTERFACE:
config yaml: /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/config/persons/mobilenetv2_test.yaml
log dir /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/log
model path /dev/null
eval only False
No batchnorm False

Commit hash (training version): b'5368eed'

Opening config file /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/config/persons/mobilenetv2_test.yaml
model folder doesnt exist! Start with random weights...
Copying files to /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/log for further reference.
Images from: /tank/home/xury1/segmentation_data/persons/roads_annotated/ds1/train/img
Labels from: /tank/home/xury1/segmentation_data/persons/roads_annotated/ds1/train/lbl
Inference batch size: 3
Images from: /tank/home/xury1/segmentation_data/persons/roads_annotated/ds1/valid/img
Labels from: /tank/home/xury1/segmentation_data/persons/roads_annotated/ds1/valid/lbl
Original OS: 32
New OS: 16.0
[Decoder] os: 8 in: 32 skip: 32 out: 32
[Decoder] os: 4 in: 32 skip: 24 out: 24
[Decoder] os: 2 in: 24 skip: 16 out: 16
[Decoder] os: 1 in: 16 skip: 3 out: 16
Using normalized weights as bias for head.

Couldn't load backbone, using random weights. Error: [Errno 20] Not a directory: '/dev/null/backbone'
Couldn't load decoder, using random weights. Error: [Errno 20] Not a directory: '/dev/null/segmentation_decoder'
Couldn't load head, using random weights. Error: [Errno 20] Not a directory: '/dev/null/segmentation_head'
Total number of parameters: 2154794
Total number of parameters requires_grad: 2154794
Param encoder 1812800
Param decoder 341960
Param head 34
Training in device: cuda
/tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/bonnetal/lib/python3.5/site-packages/torch/optim/lr_scheduler.py:100: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
[IOU EVAL] IGNORE: tensor([], dtype=torch.int64)
[IOU EVAL] INCLUDE: tensor([0, 1])
Traceback (most recent call last):
  File "train.py", line 118, in <module>
    trainer.train()
  File "../../tasks/segmentation/modules/trainer.py", line 302, in train
    scheduler=self.scheduler)
  File "../../tasks/segmentation/modules/trainer.py", line 494, in train_epoch
    evaluator.addBatch(output.argmax(dim=1), target)
  File "../../tasks/segmentation/modules/ioueval.py", line 42, in addBatch
    tuple(idxs), self.ones, accumulate=True)
RuntimeError: "embedding_backward" not implemented for 'Long'
```
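For reference, the failing call in `ioueval.py` is a confusion-matrix accumulation via `index_put_(..., accumulate=True)` on an int64 ("Long") tensor, and the `embedding_backward` message appears to be how the missing int64 kernel surfaced on some PyTorch builds of that era. A minimal sketch of the operation, plus a possible workaround that accumulates in float32 and casts back (the tensor names and 2-class data here are mine, not from the repo):

```python
import torch

# hypothetical 2-class predictions/targets, mirroring the
# index_put_(..., accumulate=True) call in ioueval.py
target = torch.tensor([0, 1, 0, 0])
pred = torch.tensor([0, 1, 1, 0])

# accumulating directly into an int64 matrix is what raised
# RuntimeError: "embedding_backward" not implemented for 'Long'
# on affected builds; accumulating in float32 and casting back avoids it
conf = torch.zeros(2, 2, dtype=torch.float32)
ones = torch.ones(target.numel(), dtype=torch.float32)
conf.index_put_((target, pred), ones, accumulate=True)
conf = conf.long()
print(conf)  # rows = target class, cols = predicted class
```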


ryx2 commented Nov 13, 2019

I should also include my yaml file:

```yaml
# training parameters
train:
  loss: "xentropy"       # must be either xentropy or iou
  max_epochs: 300
  max_lr: 0.01           # sgd learning rate max
  min_lr: 0.001          # warmup initial learning rate
  up_epochs: 0.5         # warmup during first XX epochs (can be float)
  down_epochs:  30       # warmdown during second XX epochs  (can be float)
  max_momentum: 0.9      # sgd momentum max when lr is min
  min_momentum: 0.85     # sgd momentum min when lr is max
  final_decay: 0.995     # learning rate decay per epoch after initial cycle (from min lr)
  w_decay: 0.0005        # weight decay
  batch_size: 5          # batch size
  report_batch: 1        # every x batches, report loss
  report_epoch: 1        # every x epochs, report validation set
  save_summary: False    # Summary of weight histograms for tensorboard
  save_imgs: True        # False doesn't save anything, True saves some
                         # sample images (one per batch of the last calculated batch)
                         # in log folder
  avg_N: 3               # average the N best models
  crop_prop:
    height: 480
    width: 480

# backbone parameters
backbone:
  name: "mobilenetv2"
  dropout: 0.02
  bn_d: 0.05
  OS: 16 # output stride
  train: True # train backbone?
  extra:
    width_mult: 1.0
    shallow_feats: True # get features before the last layer (mn2)

decoder:
  name: "aspp_progressive"
  dropout: 0.02
  bn_d: 0.05
  train: True # train decoder?
  extra:
    aspp_channels: 32
    last_channels: 16

# classification head parameters
head:
  name: "segmentation"
  dropout: 0.1

# dataset (to find parser)
dataset:
  name: "persons"
  location: "/tank/home/xury1/segmentation_data/persons/roads_annotated/ds1/"
  workers: 3 # number of threads to get data
  img_means: #rgb
    - 0.46992042
    - 0.45250652
    - 0.42510188
  img_stds: #rgb
    - 0.29184756
    - 0.28221624
    - 0.29719201
  img_prop:
    width: 640
    height: 480
    depth: 3
  labels:
    0: 'background'
    1: 'person'
  labels_w:
    0: 1.0
    1: 1.0
  color_map: # bgr
    0: [0,0,0]
    1: [0,255,0]
```
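As a side note, the `labels` / `labels_w` maps in such a config can be turned into the per-class weight tensor that a cross-entropy loss consumes. A minimal sketch (the variable names are mine, and the snippet inlines the relevant fragment instead of reading the real file):

```python
import yaml
import torch

# inline copy of the labels_w fragment from the config above
fragment = """
dataset:
  labels_w:
    0: 1.0
    1: 1.0
"""
cfg = yaml.safe_load(fragment)
labels_w = cfg["dataset"]["labels_w"]

# order by class index so weights[i] is the weight of label i
weights = torch.tensor([labels_w[k] for k in sorted(labels_w)])
print(weights)  # tensor([1., 1.])
```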

The imgs and lbls in that dataset folder are float32 and uint8, respectively.
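Since cross-entropy and the IoU bookkeeping both expect int64 class indices as targets, a uint8 label mask has to be widened when it is loaded. A minimal sketch of that conversion (the mask contents here are made up):

```python
import numpy as np
import torch

# a made-up uint8 label mask like the ones under lbl/
lbl = np.zeros((4, 4), dtype=np.uint8)
lbl[1:3, 1:3] = 1  # a small "person" blob

# widen to int64 ("Long"), the target dtype PyTorch losses expect
target = torch.from_numpy(lbl.astype(np.int64))
print(target.dtype)  # torch.int64
```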

@duda1202

Hi,

Were you able to resolve this issue? I am having the exact same issue when using my own Docker image, but it worked in the bonnetal Docker image. For both I use the exact same dataset and config files.


ryx2 commented Apr 22, 2020

@duda1202 I was able to get this to work; it's a versioning problem. I forget which version changes made it work since this was months ago, but I just pasted my pip freeze here.

```
Package                Version
---------------------- --------
absl-py                0.8.1
appdirs                1.4.3
astor                  0.8.0
backcall               0.1.0
cycler                 0.10.0
decorator              4.4.1
gast                   0.3.2
genpy                  2016.1.3
grpcio                 1.25.0
h5py                   2.10.0
imageio                2.6.1
imgaug                 0.3.0
ipdb                   0.12.3
ipython                7.9.0
ipython-genutils       0.2.0
jedi                   0.15.1
Keras-Applications     1.0.8
Keras-Preprocessing    1.1.0
kiwisolver             1.1.0
Mako                   1.1.0
Markdown               3.1.1
MarkupSafe             1.1.1
matplotlib             3.0.3
mock                   3.0.5
networkx               2.4
numpy                  1.17.4
onnx                   1.5.0
opencv-python          3.4.0.12
opencv-python-headless 4.1.2.30
parso                  0.5.1
pexpect                4.7.0
pickleshare            0.7.5
Pillow                 6.0.0
pip                    19.3.1
pkg-resources          0.0.0
prompt-toolkit         2.0.10
protobuf               3.10.0
ptyprocess             0.6.0
pycuda                 2019.1.2
Pygments               2.5.2
pyparsing              2.4.5
python-dateutil        2.8.1
pytools                2019.1.1
PyWavelets             1.1.1
PyYAML                 5.1
scikit-image           0.15.0
scikit-learn           0.20.3
scipy                  0.19.1
setuptools             20.7.0
Shapely                1.6.4.post2
six                    1.13.0
tensorboard            1.13.1
tensorflow             1.13.1
tensorflow-estimator   1.13.0
termcolor              1.1.0
torch                  1.3.1
torchvision            0.4.2
traitlets              4.3.3
typing                 3.7.4.1
typing-extensions      3.7.4.1
wcwidth                0.1.7
Werkzeug               0.16.0
wheel                  0.33.6
```
