Data generator with multiple file output #105
Comments
I think this should work, and at runtime the generator will run only once. I suggest trying the latest SNAPSHOT from master, as it fixes some issues related to the setup / teardown logic of systems and dataset materialization.
<!--************************************************************************
* Data Generators
*************************************************************************-->
<bean id="datagen.kmeans" class="org.peelframework.flink.beans.job.FlinkJob">
<constructor-arg name="runner" ref="flink-1.0.3"/>
<constructor-arg name="command">
<value><![CDATA[
-v -c org.apache.flink.examples.java.clustering.util.KMeansDataGenerator \
${app.path.datagens}/KMeans.jar \
--points ${datagen.points} \
--k ${datagen.k} \
--output ${system.hadoop-2.path.input}/kmeans
]]>
</value>
</constructor-arg>
</bean>
<!--************************************************************************
* Data Sets
*************************************************************************-->
<bean id="dataset.kmeans.points.generated" class="org.peelframework.core.beans.data.GeneratedDataSet">
<constructor-arg name="src" ref="datagen.kmeans"/>
<constructor-arg name="dst" value="${system.hadoop-2.path.input}/kmeans/points.csv"/>
<constructor-arg name="fs" ref="hdfs-2.7.1"/>
</bean>
<bean id="dataset.kmeans.means.generated" class="org.peelframework.core.beans.data.GeneratedDataSet">
<constructor-arg name="src" ref="datagen.kmeans"/>
<constructor-arg name="dst" value="${system.hadoop-2.path.input}/kmeans/means.csv"/>
<constructor-arg name="fs" ref="hdfs-2.7.1"/>
</bean>
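Both `GeneratedDataSet` beans above share the same `src` generator. As a rough sketch of how an experiment might then consume the two files, the fragment below lists both datasets as inputs. The bean id `experiment.flink.kmeans` is hypothetical, the remaining constructor arguments are deliberately left out, and the `inputs` layout is assumed to follow the pattern of the Peel wordcount example, so it should be checked against the Peel version in use.

```xml
<!-- Hypothetical experiment bean: only the inputs argument is shown. -->
<bean id="experiment.flink.kmeans" class="org.peelframework.flink.beans.experiment.FlinkExperiment">
    <!-- name, command, config, runs, runner and outputs are omitted in this sketch -->
    <constructor-arg name="inputs">
        <set value-type="org.peelframework.core.beans.data.DataSet">
            <ref bean="dataset.kmeans.points.generated"/>
            <ref bean="dataset.kmeans.means.generated"/>
        </set>
    </constructor-arg>
</bean>
```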
Thanks for your fast reply!
It seems that the bean cannot be used twice.
Then duplicate the bean definition as well (using the same
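One possible reading of this suggestion, sketched under the assumption that "duplicate" means two generator beans with distinct ids and an identical command. The ids `datagen.kmeans.points` and `datagen.kmeans.centers` are hypothetical; everything else is copied from the configuration earlier in the thread.

```xml
<!-- Hypothetical id; the command is unchanged from the original datagen.kmeans bean. -->
<bean id="datagen.kmeans.points" class="org.peelframework.flink.beans.job.FlinkJob">
    <constructor-arg name="runner" ref="flink-1.0.3"/>
    <constructor-arg name="command">
        <value><![CDATA[
          -v -c org.apache.flink.examples.java.clustering.util.KMeansDataGenerator \
          ${app.path.datagens}/KMeans.jar \
          --points ${datagen.points} \
          --k ${datagen.k} \
          --output ${system.hadoop-2.path.input}/kmeans
        ]]></value>
    </constructor-arg>
</bean>

<!-- Second copy: same class, runner and command, different (hypothetical) id. -->
<bean id="datagen.kmeans.centers" class="org.peelframework.flink.beans.job.FlinkJob">
    <constructor-arg name="runner" ref="flink-1.0.3"/>
    <constructor-arg name="command">
        <value><![CDATA[
          -v -c org.apache.flink.examples.java.clustering.util.KMeansDataGenerator \
          ${app.path.datagens}/KMeans.jar \
          --points ${datagen.points} \
          --k ${datagen.k} \
          --output ${system.hadoop-2.path.input}/kmeans
        ]]></value>
    </constructor-arg>
</bean>
```

With this layout, each `GeneratedDataSet` would reference its own generator bean in its `src` argument instead of sharing one.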
Sorry, my fault! I forgot to change the bean id for each
<!--************************************************************************
* Data Generators
*************************************************************************-->
<bean id="datagen.kmeans" class="org.peelframework.flink.beans.job.FlinkJob">
<constructor-arg name="runner" ref="flink-1.0.3"/>
<constructor-arg name="command">
<value><![CDATA[
-v -c org.apache.flink.examples.java.clustering.util.KMeansDataGenerator \
${app.path.apps}/KMeans.jar \
--points ${datagen.points} \
--k ${datagen.k} \
--output ${system.hadoop-2.path.input}/kmeans
]]>
</value>
</constructor-arg>
</bean>
<!--************************************************************************
* Data Sets
*************************************************************************-->
<bean id="dataset.kmeans.points.generated" class="org.peelframework.core.beans.data.GeneratedDataSet">
<constructor-arg name="src" ref="datagen.kmeans"/>
<constructor-arg name="dst" value="${system.hadoop-2.path.input}/kmeans/points:"/>
<constructor-arg name="fs" ref="hdfs-2.7.1"/>
</bean>
<bean id="dataset.kmeans.centers.generated" class="org.peelframework.core.beans.data.GeneratedDataSet">
<constructor-arg name="src" ref="datagen.kmeans"/>
<constructor-arg name="dst" value="${system.hadoop-2.path.input}/kmeans/centers"/>
<constructor-arg name="fs" ref="hdfs-2.7.1"/>
</bean>
This is the error message from stdout:
And this is from the log
Do you have an idea what could be wrong? Thanks a lot!
Try
(with an extra
Actually, can you show me the Java / Scala code that parses the
Sorry for the delay! We use the KMeans benchmark and the KMeans data generator from the "official" Flink examples on GitHub:
We have a data generator for a KMeans benchmark and want to use it with the PEEL framework.
The generator produces 2 files, points and centers, and runs as a Flink job. We want to save these files in <hdfs-root-directory>/kmeans using the GeneratedDataSet class and then pick up these files with the KMeans Flink job.
My question is: How can we configure PEEL to create the directory kmeans in HDFS and then copy the files to that directory? With our current configuration shown below, that does not work.
The usage of our data generator is similar to the WordGenerator, except that it produces 2 files instead of just one.
Do you have an idea how we could solve this problem with PEEL, or do we have to adjust our data generator?
Thanks!