Skip to content
This repository was archived by the owner on Dec 1, 2021. It is now read-only.

Time measurement support on build. (#319) #330

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ed728
Copy link
Contributor

@ed728 ed728 commented Jun 19, 2019

Motivation and Context

Debug options should not be default, so I made the time measurement feature a togglable option for the blueoil.sh script. For running make inside of the project, you would have to manually append the CXXFLAGS, so make lm_fpga CXXFLAG="-DFUNC_TIME_MEASUREMENT".

This will be followed by proper user documentation for #319.

Description

-Extra optional option added to blueoil.sh.
-Removed timing as a default for lm_fpga target.
-CXXFLAGS now specified via make argument, rather than environment.
-Timing is now only done when specified, not always by default. Debug options should not be default...

How has this been tested?

Jenkins.

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature / Optimization (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@ed728
Copy link
Contributor Author

ed728 commented Jun 19, 2019

I am not too confident about what the automated Jenkins tests cover, so someone please poke me in case it is not sufficient. I did run make test-dlk locally, and no errors popped in the beginning, but I am not sure whether it uses these scripts or not...

Also, as far as I am aware we have no tests in CI which rely on the time measurement feature, so nothing should break.

@ed728
Copy link
Contributor Author

ed728 commented Jun 19, 2019

@iizukak Any suggestions for reviewers?

@ed728
Copy link
Contributor Author

ed728 commented Jun 19, 2019

Also @n-nez, since I expect you guys would be ever so slightly affected by the Makefile change.

@primenumber
Copy link
Contributor

@ed728 make test-dlk uses CMake instead of raw Makefile...

@iizukak
Copy link
Member

iizukak commented Jun 20, 2019

run blueoil test

Copy link
Member

@iizukak iizukak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ed728 looks good except I commented issue.

"--timing",
help="enable operation time measurements",
type=int,
default=0,
Copy link
Member

@iizukak iizukak Jun 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value shoud be False or None not integer.

@iizukak
Copy link
Member

iizukak commented Jun 20, 2019

@ed728 tca_support is already merged. Please change base branch to master .

@ed728
Copy link
Contributor Author

ed728 commented Jun 20, 2019

@primenumber Yeah, Jenkins does cover the normal use case though.

@ed728 ed728 changed the base branch from tca_support to master June 20, 2019 00:37
@n-nez
Copy link
Contributor

n-nez commented Jun 20, 2019

-Timing is now only done when specified, not always by default. Debug options should not be the default...

Actually, the only uses of lm_fpga are

  • testing
  • time measurement

So keeping time measurement enabled by default for the target sounds sensible for me actually. Benefits of switching it off, on the other hand, is not clear. 🤔

@ed728
Copy link
Contributor Author

ed728 commented Jun 20, 2019

Hmmm, I mean this stems from #319 which wants documentation for time measurements. And when looking through our documentation, for people wanting to run on FPGA it suggests running blueoil.sh which eventually invokes make lm_fpga... So any customer or external person who would like to use blueoil with an FPGA would go through that build as I see it.

Also, even for testing and time measuring, being able to disable internal debugs for "external"/"real" time measurements from OS is useful I think (i.e. timing how ./lm_fpga.elf ... takes as a whole with not other internal debug options enabled).

Oooor, I am missing something important about how customers are supposed to use this.

Ref.: https://docs.blue-oil.org/usage/convert.html

@n-nez
Copy link
Contributor

n-nez commented Jun 20, 2019

If "extenal" time measurements imply measurement of the execution time of whole binary itself, they do not make much sense to me, as there are lots of "preparation steps" like loading npy files and Network::init(), so measurements will be rather misleading.

Customers aren't supposed to use lm_fpga.elf at all, it's purely diagnostic/testing tool, to check that dlk conversion works as expected.

Customers supposed to use lib_fpga.so in their applications, for which TIME_MEASUREMENT is disabled, so they could do whatever kind of measurements which suits them if required.

@n-nez
Copy link
Contributor

n-nez commented Jun 20, 2019

As for #319 documentation should say how to

  1. obtain npys from training checkpoint
  2. convert pb file
  3. do make lm_fpga
  4. run lm_fpga.elf with the npys and read the report

any mechanism to enable disable measurements in lm_fpga.elf in this case just increases the number of unnecessary work to get the numbers.

@ed728
Copy link
Contributor Author

ed728 commented Jun 20, 2019

It does make sense to measure all loadings and inits for a program that is meant to be used by customers, I think, but that is a different issue.

And I see....but in that case one should be able to enable it for the lm_fpga.so file I think.

Also, the lm_*.elf targets should either all include the measurement flag in the Makefile, or none of them should, I would say.

@n-nez
Copy link
Contributor

n-nez commented Jun 20, 2019

It does make sense to measure all loadings and inits for a program that is meant to be used by customers

Ok, this is a test program, for developers, customers could run it too if they want, but this is all.

Also, the lm_*.elf targets should either all include the measurement flag in the Makefile, or none of them should, I would say.

All of them should include it, I believe, but so far lm_fpga.elf was the only one target performance of which were relevant to us. I would be very appreciated if you add time measurement flag to all other elf targets we have in this PR, instead of removing it from lm_fpga.elf.

For lm_fpga.so I guess it is doable to enable time measurement for them as well, but the current way we measure time is not suitable for measurements across long runs.

That is it adds a new time interval on each function call into an internal data structure.

  • If you run inference a few times it will provide you with useful statistics, so you can see how much time the function takes to do work on each layer.
  • If you run inference million times which supposed to happen when you use lm_fpga.so the structure will grow unlimited and you will run out of memory at the end.
    • thus using a normal profiler seems to be a more sensible approach in this case.

@ruimashita
Copy link
Contributor

@ed728 Thanks 👍
lm_*.elf are only for test and debug. It is no need flag of enable or disable time measurement of these lm_*.elf, It always should measure time 🤔

@iizukak
Copy link
Member

iizukak commented Jun 21, 2019

#330 (comment)
This @n-nez comment is what I want to do in #319 .
Sorry for confusion.

@ed728
Copy link
Contributor Author

ed728 commented Jun 21, 2019

I will change this to enable it on all lm_*.elf.

On a side note, do we measure the time of things like Network::init() separately?

I still believe it should then be available for lm_*.so as a flag. I mean if it is disabled by default, no one will get an overflow just because of ignorance. And people who do know how to use it, may be able to get some nice data out of it (if we add some reasonable documentation to it).
Also, for this overflow issue, should we make an issue and fix it (eventually)?

@n-nez
Copy link
Contributor

n-nez commented Jun 22, 2019

On a side note, do we measure the time of things like Network::init() separately?

Yes we do. You will see both init time and run time in the report. 😏

Also, for this overflow issue, should we make an issue and fix it (eventually)?

I believe priority of fixing it is quite low at the moment, but adding an issue definitely doesn't hurt.

@ed728 ed728 force-pushed the make_edit branch 3 times, most recently from 73127a9 to b3fba51 Compare July 4, 2019 02:01
@ed728 ed728 mentioned this pull request Jul 9, 2019
5 tasks
@iizukak
Copy link
Member

iizukak commented Jul 10, 2019

@ed728 Can you solve conflict?

@hadusam hadusam added the CI: auto-run Run CI automatically label Jul 12, 2019
@ed728
Copy link
Contributor Author

ed728 commented Jul 18, 2019

@iizukak Yes! Will do that as soon as possible.

@yd8534976 yd8534976 mentioned this pull request Jul 30, 2019
5 tasks
Copy link
Member

@iizukak iizukak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ed728
Sorry for review delaying.
Recent, we added TARGETS_X86_AVX target.
Please modify with new target.

Then I'll merge this PR.

@ed728 ed728 force-pushed the make_edit branch 2 times, most recently from 1f6c2c3 to 388ff34 Compare August 15, 2019 01:59
@ed728
Copy link
Contributor Author

ed728 commented Aug 15, 2019

@iizukak Fixed, I think. Waiting for tests.

@ed728 ed728 force-pushed the make_edit branch 2 times, most recently from ae893a6 to 428bf0f Compare August 15, 2019 07:40
-Extra optional option added to blueoil.sh.
-Added timing as a default for lm_* targets .
-CXXFLAGS now specified via make argument, rather than environment.
-Timing can now optionally be built-in for lib_* targets as well.
@ed728
Copy link
Contributor Author

ed728 commented Sep 12, 2019

@iizukak Any chance this can get approved?

@iizukak
Copy link
Member

iizukak commented Sep 13, 2019

@ed728
Can you answer @ruimashita 's comment?

@iizukak
Copy link
Member

iizukak commented Sep 13, 2019

@ed728
Please check my understand.

You added --timing option for lib_* . And always add TIME_MEASUREMENT option for lm_*. Because lm_* is for debug only and lib_* is not debug only, correct?

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

1 similar comment
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CI: auto-run Run CI automatically
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants