Implement Linear Regression for hybrid_flink mode #4

Closed
akunft opened this issue Mar 3, 2016 · 6 comments
@akunft

akunft commented Mar 3, 2016

To prepare the first PR, we should identify the implementation work required for an end-to-end execution of LinregCG and discuss here how to split it.

It would be best if both of you @fschueler & @carabolic have a look at this (I can only assign one :).

akunft assigned fschueler and carabolic and unassigned fschueler on Mar 3, 2016
@fschueler

I am looking into the flink and hybrid_flink execution modes, @carabolic is working on the necessary instructions.

fschueler changed the title from "Initial End-to-End workflow" to "Implement Linear Regression for hybrid_flink mode" on Mar 9, 2016
@fschueler

I added a test that allows us to run the Linear Regression DML script (direct solver). For hybrid_flink mode, the LinearRegDS.dml script already works (only uses Flink reblock instructions).

I think we should also get the GLM-predict.dml script to run for our end-to-end example.
This script uses (in hybrid_spark mode) a couple more instructions that are not yet implemented in Flink:

  • MatrixMatrixArithmeticSPInstruction with operator "-"
  • AggregateUnarySPInstruction with operator "uack+"
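
For reference, here is a minimal sketch of what these two operators compute, written in plain Python on nested lists. It is not SystemML or Flink code: the function names `matrix_minus` and `col_sums_kahan` are hypothetical, and reading `uack+` as a column-wise sum with Kahan compensation is an assumption based on the opcode naming.

```python
from typing import List

Matrix = List[List[float]]


def matrix_minus(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise a - b for two matrices of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("shape mismatch")
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def col_sums_kahan(m: Matrix) -> List[float]:
    """Column sums using Kahan compensated summation.

    Assumption: this mirrors the intent of the "uack+" operator;
    it is not taken from the SystemML sources.
    """
    if not m:
        return []
    n_cols = len(m[0])
    sums = [0.0] * n_cols  # running sum per column
    corr = [0.0] * n_cols  # running compensation per column
    for row in m:
        for j, value in enumerate(row):
            y = value - corr[j]          # apply the stored correction
            t = sums[j] + y              # low-order bits of y may be lost here
            corr[j] = (t - sums[j]) - y  # recover what was lost
            sums[j] = t
    return sums


if __name__ == "__main__":
    a = [[3.0, 5.0], [7.0, 9.0]]
    b = [[1.0, 2.0], [3.0, 4.0]]
    print(matrix_minus(a, b))
    print(col_sums_kahan(a))
```

A distributed version would apply the same logic per matrix block and then merge the partial sums together with their correction terms, which this sketch leaves out.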

The other instructions are MatrixScalarArithmeticInstructions ("*" and "/") that we should already have. We might need to add the ArithmeticInstruction abstraction similar to Spark.

I think getting this to work in hybrid_flink mode should be the first step. We can then add the instruction for pure flink mode.

@fschueler

It turns out that the number of Flink instructions needed for the GLM-predict.dml script increases significantly during recompilation (matrix-indexing, relationalbinary, ...).

Should we add these or make the PR only for the LinearReg*.dml scripts?

@fschueler

It could actually be a bug that I introduced... I am investigating! 👓

@fschueler

Unfortunately, I think it's not a bug; the same happens for Spark. So we can either implement all missing instructions for GLM-predict.dml or make the PR cover only the other scripts...

@fschueler

I think this is done and we should focus on testing and cleanup now. One thing that we should resolve for the PR is #15 - when running in hybrid_flink mode on a cluster this will probably be needed.
