Implement Linear Regression for hybrid_flink mode #4

akunft · 2016-03-03T10:55:50Z

To create the first PR, we should identify the implementation work required for an end-to-end execution of LinregCG / LinregDS and discuss here how to split it up.

It would be best if both of you @fschueler & @carabolic have a look at this (I can only assign one :).

fschueler · 2016-03-03T17:12:59Z

I am looking into the flink and hybrid_flink execution modes, @carabolic is working on the necessary instructions.

fschueler · 2016-03-09T16:21:41Z

I added a test that allows us to run the Linear Regression DML script (direct solver). For hybrid_flink mode, the LinearRegDS.dml script already works (only uses Flink reblock instructions).

I think we should also get the GLM-predict.dml script to run for our end-to-end example.
This script uses (in hybrid_spark mode) a couple more instructions that are not yet implemented in Flink:

MatrixMatrixArithmeticSPInstruction with operator "-"
AggregateUnarySPInstruction with operator "uack+"

The other instructions are MatrixScalarArithmeticInstructions ("*" and "/") that we should already have. We might need to add the ArithmeticInstruction abstraction similar to Spark.
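
For reference, here is a minimal sketch of what the two missing operations compute. It assumes that "uack+" denotes a column-wise sum using Kahan-compensated addition; the thread does not spell the opcode out, so treat that reading as an assumption. The function names `matrix_minus` and `col_sums_kahan` are hypothetical and plain Python lists stand in for matrix blocks. This is not the Spark or Flink instruction code.

```python
from typing import List

Matrix = List[List[float]]


def matrix_minus(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise a - b (the matrix-matrix arithmetic "-" case)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("shape mismatch")
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def col_sums_kahan(m: Matrix) -> List[float]:
    """Column sums with Kahan compensation (assumed meaning of "uack+").

    Assumes all rows have the same length.
    """
    if not m:
        return []
    ncols = len(m[0])
    sums = [0.0] * ncols   # running sum per column
    comp = [0.0] * ncols   # running compensation (lost low-order bits) per column
    for row in m:
        for j, v in enumerate(row):
            y = v - comp[j]             # apply the correction from the previous step
            t = sums[j] + y             # new (rounded) running sum
            comp[j] = (t - sums[j]) - y  # recover what was lost in the rounding
            sums[j] = t
    return sums
```

A distributed version would apply the same logic per block and then merge the partial sums together with their correction terms, which is the part the Flink instructions would have to provide.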

I think getting this to work in hybrid_flink mode should be the first step. We can then add the instruction for pure flink mode.

fschueler · 2016-03-15T15:18:13Z

It turns out that the number of Flink instructions increases significantly during recompilation for the GLM-predict.dml script (matrix-indexing, relationalbinary, ...).

Should we add these or make the PR only for the LinearReg*.dml scripts?

fschueler · 2016-03-15T16:18:59Z

It could actually be a bug that I introduced... I am investigating! 👓

fschueler · 2016-03-15T16:41:41Z

Unfortunately, I think it's not a bug; the same happens for Spark. So we can either implement all missing instructions for the GLM-predict.dml script or have a PR for only the other scripts...

fschueler · 2016-04-12T14:44:00Z

I think this is done and we should focus on testing and cleanup now. One thing that we should resolve for the PR is #15: it will probably be needed when running in hybrid_flink mode on a cluster.

akunft assigned fschueler and carabolic and unassigned fschueler Mar 3, 2016

fschueler changed the title ~~Initial End-to-End workflow~~ Implement Linear Regression for hybrid_flink mode Mar 9, 2016

fschueler added this to the First PR with simple LR workflow milestone Apr 7, 2016

fschueler closed this as completed Apr 12, 2016
