For iterable-style datasets, since each worker process gets a replica of the dataset object, naive multi-process loading will often result in duplicated data. Using torch.utils.data.get_worker_info() and/or worker_init_fn, users may configure each replica independently.
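Here is a minimal sketch of the worker_init_fn route, modeled on the range-splitting pattern in the PyTorch IterableDataset docs. RangeDataset and split_range are illustrative names, not library APIs, and the sketch assumes the dataset knows its bounds (start, end) up front. The demo under the __main__ guard first shows the duplication a naive loader produces, then the fix.

```python
import math

from torch.utils.data import DataLoader, IterableDataset, get_worker_info


class RangeDataset(IterableDataset):
    """Hypothetical example dataset: yields integers in [start, end)."""

    def __init__(self, start: int, end: int):
        super().__init__()
        self.start = start
        self.end = end

    def __iter__(self):
        # Each worker iterates over whatever bounds its own replica holds.
        return iter(range(self.start, self.end))


def split_range(start: int, end: int, worker_id: int, num_workers: int) -> tuple[int, int]:
    """Return the contiguous [lo, hi) slice of [start, end) owned by one worker."""
    per_worker = math.ceil((end - start) / num_workers)
    lo = min(start + worker_id * per_worker, end)  # clamp so trailing workers get empty slices
    hi = min(lo + per_worker, end)
    return lo, hi


def worker_init_fn(worker_id: int) -> None:
    """Runs once inside each worker process and shrinks that worker's replica."""
    info = get_worker_info()
    dataset = info.dataset  # the replica living in this worker process
    dataset.start, dataset.end = split_range(
        dataset.start, dataset.end, info.id, info.num_workers
    )


if __name__ == "__main__":
    ds = RangeDataset(0, 10)
    # Without sharding, both workers iterate the full range.
    naive = [int(x) for x in DataLoader(ds, num_workers=2)]
    # With worker_init_fn, worker 0 gets [0, 5) and worker 1 gets [5, 10).
    fixed = [int(x) for x in DataLoader(ds, num_workers=2, worker_init_fn=worker_init_fn)]
    print(sorted(naive))  # every value appears twice
    print(sorted(fixed))  # every value appears once
```

Because worker_init_fn edits only the replica inside each worker, the ds object in the main process keeps its original bounds, so the same dataset can be passed to the second loader unchanged.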
https://blog.philip-huang.tech/?page=iterable-style-dataset-worker-setting
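An iterable-style dataset can iterate over very large training sets, but when multiple workers are used, each worker holds an identical replica of the dataset, and PyTorch leaves it to the developer to implement the logic that keeps workers from receiving duplicate data.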
Following the official PyTorch recommendation, we can use torch.utils.data.get_worker_info() to configure each worker and achieve this.
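A minimal sketch of calling get_worker_info() directly inside __iter__, assuming a re-iterable source such as a list or range (a one-shot generator would be exhausted after the first epoch). ShardedStream and shard are hypothetical names. Instead of contiguous ranges, this swaps in a strided split with itertools.islice, so worker k keeps items k, k + n, k + 2n, and so on; that works even when the total length is unknown up front.

```python
import itertools
from typing import Iterable, Iterator

from torch.utils.data import DataLoader, IterableDataset, get_worker_info


def shard(source: Iterable[int], worker_id: int, num_workers: int) -> Iterator[int]:
    """Keep every num_workers-th item, starting at offset worker_id."""
    return itertools.islice(source, worker_id, None, num_workers)


class ShardedStream(IterableDataset):
    """Hypothetical wrapper: each worker yields a disjoint, strided subset of source."""

    def __init__(self, source: Iterable[int]):
        super().__init__()
        self.source = source

    def __iter__(self) -> Iterator[int]:
        info = get_worker_info()
        if info is None:
            # Single-process loading (num_workers=0): yield everything.
            return iter(self.source)
        # Inside a worker: info.id is this worker's index, info.num_workers the total.
        return shard(self.source, info.id, info.num_workers)


if __name__ == "__main__":
    # batch_size=None disables automatic batching, so items come through one by one.
    loader = DataLoader(ShardedStream(range(10)), batch_size=None, num_workers=2)
    print(sorted(int(x) for x in loader))  # each of 0..9 exactly once
```

The trade-off of the strided split is that every worker still reads, and discards, the items it skips, so it suits cheap sources (such as lines of a local file) better than sources where each read is expensive.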