I ran this using run.sh and trained a classification model using Spark ML. After training, I wanted to save the model.
I tried model.write().overwrite().save('spark-model'). This creates a spark-model directory on the machine where the notebook runs, but the directory contains only "_SUCCESS" files; no actual model files are saved there.
Then I checked whether the model files had been written on the workers instead, and they had: they were under /home/jovyan/work in each worker's file system.
When I collected the files into one place and tried to load the model using PipelineModel.load, I got this error:
----> 3 pipeline_model = PipelineModel.load('spark-model')
File /usr/local/spark/python/pyspark/ml/util.py:332, in MLReadable.load(cls, path)
329 @classmethod
330 def load(cls, path):
331 """Reads an ML instance from the input path, a shortcut of `read().load(path)`."""
--> 332 return cls.read().load(path)
File /usr/local/spark/python/pyspark/ml/pipeline.py:256, in PipelineModelReader.load(self, path)
255 def load(self, path):
--> 256 metadata = DefaultParamsReader.loadMetadata(path, self.sc)
257 if 'language' not in metadata['paramMap'] or metadata['paramMap']['language'] != 'Python':
258 return JavaMLReader(self.cls).load(path)
File /usr/local/spark/python/pyspark/ml/util.py:525, in DefaultParamsReader.loadMetadata(path, sc, expectedClassName)
514 """
515 Load metadata saved using :py:meth:`DefaultParamsWriter.saveMetadata`
516
(...)
522 If non empty, this is checked against the loaded metadata.
523 """
524 metadataPath = os.path.join(path, "metadata")
--> 525 metadataStr = sc.textFile(metadataPath, 1).first()
526 loadedVals = DefaultParamsReader._parseMetaData(metadataStr, expectedClassName)
527 return loadedVals
File /usr/local/spark/python/pyspark/rdd.py:1591, in RDD.first(self)
1589 if rs:
1590 return rs[0]
-> 1591 raise ValueError("RDD is empty")
ValueError: RDD is empty
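Going by the traceback, the load fails while reading the text files under spark-model/metadata, so the error seems to mean that Spark found no metadata lines at that path. A quick local check along these lines might help confirm whether the collected directory actually contains the metadata part files. This is only a sketch: `has_metadata_part_files` is a hypothetical helper name, not part of PySpark, and it assumes the model directory is on the local file system and uses Spark's usual `part-*` file naming.

```python
import os
import tempfile


def has_metadata_part_files(model_dir):
    """Return True if model_dir/metadata holds at least one non-empty part-* file.

    Hypothetical helper (not a PySpark API). It only looks at the local
    file system, so it says nothing about files that live on other nodes.
    """
    metadata_dir = os.path.join(model_dir, "metadata")
    if not os.path.isdir(metadata_dir):
        return False
    for name in os.listdir(metadata_dir):
        full_path = os.path.join(metadata_dir, name)
        # Marker files such as _SUCCESS do not start with "part-".
        if name.startswith("part-") and os.path.getsize(full_path) > 0:
            return True
    return False


# Demo on throwaway directories that imitate the two situations.
with tempfile.TemporaryDirectory() as tmp:
    # Case 1: only the _SUCCESS marker exists (what I see after save()).
    only_success = os.path.join(tmp, "only-success")
    os.makedirs(os.path.join(only_success, "metadata"))
    open(os.path.join(only_success, "metadata", "_SUCCESS"), "w").close()

    # Case 2: a metadata part file with one line of placeholder JSON.
    complete = os.path.join(tmp, "complete")
    os.makedirs(os.path.join(complete, "metadata"))
    with open(os.path.join(complete, "metadata", "part-00000"), "w") as f:
        f.write('{"class": "placeholder"}\n')

    only_success_ok = has_metadata_part_files(only_success)
    complete_ok = has_metadata_part_files(complete)

print(only_success_ok, complete_ok)
```

Running the same check against the real spark-model directory (and its stages subdirectories, which I assume follow the same layout) should show whether the files I collected from the workers ended up where the loader looks for them.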
How can I save and load the models without issues? Thanks.