One read multiple report steps #133
Conversation
Thanks for the pull request! Good to see that you've found your way through the Lab source code. Regarding the three proposed features:
OK, the additional argument options were only small helpers for users, and shell expansions seem good enough; they are in a separate commit, so leaving those changes out is easy. Regarding the failing tests, I surely missed something, since I only focused on fetchers and reports, but with a bit more refactoring they should be fixable. My use case is simply running several reports over the same data (absolute report, coverage tables, scatter plots, etc.). I think this use case is not rare, and other people could benefit from this performance improvement if the feature is merged. Regarding the timing differences, they are so large that cProfile is not needed; a quick look at the log is enough. Below are the trimmed logs (with uninformative intermediate lines removed) of executing the following reports script with a few reports (scatter plots are sometimes useful for several attributes and for comparing several algorithms, for example) on a dataset that is not really big (30 + 10 algorithms over the Autoscale optimal benchmark with only basic attributes), so the timing differences can be much larger in other setups. Reports script:
import platform
from downward.experiment import FastDownwardExperiment
from downward.reports.absolute import AbsoluteReport
from per_domain_comparison import PerDomainComparison
from dominance_comparison import DominanceComparison
NODE = platform.node()
ATTRIBUTES = [
"coverage",
"error",
"expansions",
"expansions_until_last_jump",
"total_time",
"planner_time",
"search_time",
"cegar_abstractions_init_time",
"initial_h_value",
"cost",
"evaluations",
"memory",
"run_dir",
"found_concrete_solution",
"found_concrete_solution_sum",
]
exp_name = "single_bysplit_comparison"
exp = FastDownwardExperiment(exp_name)
exp.add_fetcher("data/single_bysplit_experiments_only_parse")
exp.add_fetcher("data/single_position_experiments_only_parse")
# Add report step (AbsoluteReport is the standard report).
exp.add_report(AbsoluteReport(attributes=ATTRIBUTES), outfile="report.html")
# The variant names follow a common pattern, so build the lists with
# comprehensions instead of writing them out by hand.
SPLITS = [
    "min_cg",
    "max_cg",
    "max_refined",
    "min_refined",
    "min_hadd",
    "max_hadd",
    "min_cost",
    "max_cost",
    "max_ref_goal_dist",
]
fw_variants = [f"sequence_10M_fw({split})" for split in SPLITS]
bw_variants = [f"sequence_10M_bw({split})" for split in SPLITS]
bd_variants = [f"sequence_10M_bd({split})" for split in SPLITS]
all_variants = fw_variants + bw_variants + bd_variants
# Per-domain coverage and solution-count comparisons for each variant group.
variant_groups = {
    "all": all_variants,
    "fw": fw_variants,
    "bw": bw_variants,
    "bd": bd_variants,
}
for group_name, variants in variant_groups.items():
    exp.add_report(
        PerDomainComparison(
            output_file=f"{exp_name}-eval/perdomain_comparison_{group_name}.txt",
            attributes=["coverage"],
            filter_algorithm=variants,
        ),
        name=f"perdomain_comparison_{group_name}",
    )
for group_name, variants in variant_groups.items():
    exp.add_report(
        PerDomainComparison(
            output_file=(
                f"{exp_name}-eval/"
                f"perdomain_comparison_found_concrete_solution_{group_name}.txt"
            ),
            attributes=["found_concrete_solution_sum"],
            filter_algorithm=variants,
        ),
        name=f"perdomain_comparison_found_concrete_solution_{group_name}",
    )
# The "bysplit" groups share a common ordering of splits as well.
BYSPLIT_SPLITS = [
    "max_refined",
    "min_refined",
    "max_cost",
    "min_cost",
    "max_hadd",
    "min_hadd",
]
fw_bysplit_variants = ["CEGAR_10M_fw"] + [f"sequence_10M_fw({s})" for s in BYSPLIT_SPLITS]
bw_bysplit_variants = ["CEGAR_10M_bw"] + [f"sequence_10M_bw({s})" for s in BYSPLIT_SPLITS]
bd_bysplit_variants = ["CEGAR_10M_bd"] + [f"sequence_10M_bd({s})" for s in BYSPLIT_SPLITS]
for group_name, variants in [
    ("fw", fw_bysplit_variants),
    ("bw", bw_bysplit_variants),
    ("bd", bd_bysplit_variants),
]:
    exp.add_report(
        PerDomainComparison(
            sorted_attributes=["found_concrete_solution_sum", "coverage"],
            output_file=f"{exp_name}-eval/perdomain_comparison_table_{group_name}.txt",
            filter_algorithm=variants,
        ),
        name=f"perdomain_comparison_table_{group_name}",
    )
bysplit_variants = fw_bysplit_variants + bw_bysplit_variants + bd_bysplit_variants
grouped_algorithms = (fw_bysplit_variants, bw_bysplit_variants, bd_bysplit_variants)
namecolumn = [
r"\strategydefault",
r"\strategyrefined",
r"\strategyminrefined",
r"\strategycost",
r"\strategymincost",
r"\strategyhadd",
r"\strategyminhadd",
]
exp.add_report(
DominanceComparison(
grouped_algorithms=grouped_algorithms,
namecolumn=namecolumn,
sorted_attributes=["found_concrete_solution_sum", "coverage"],
output_file=f"{exp_name}-eval/dominance_comparison.txt",
filter_algorithm=bysplit_variants,
)
)
# Parse the commandline and show or run experiment steps.
exp.run_steps()
Execution log using Downward Lab 8.2 (total time: 23 minutes and 34 seconds):
Execution log using this pull request (total time: 11 minutes and 6 seconds):
Note also that these executions include the fetcher steps; the timing differences would be even larger without them (1 minute versus 13 minutes). Basically, the savings in this example are about 1 minute per report, and other use cases could have dozens of reports, with savings of up to several hours. Thanks!
Thanks for the logs! I agree that it's obvious that loading the JSON files is the bottleneck on your machine: it always takes 1 minute to load ~46K runs. I just measured the same thing for a similar experiment on my machine, and loading ~17K runs (90 MB) takes 0.5 seconds. So either your machine is much slower, the simplejson package is missing in your venv, or your runs have many more attributes or some attributes that hold lots of data. My guess is the latter: probably you have parsed an output that is written for every evaluated state or similar. Could that be the case? How big is the resulting properties file? If the file simply contains lots of data, I recommend preprocessing it so that it only contains the data you are interested in.
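For illustration, a minimal sketch of such preprocessing, assuming the usual Downward Lab properties layout (a JSON object mapping run IDs to attribute dicts); the file paths and the KEEP set are made-up examples:
import json

# Hypothetical selection: keep only the attributes the reports need.
KEEP = {"domain", "problem", "algorithm", "coverage", "error", "total_time"}

with open("data/exp-eval/properties") as f:
    properties = json.load(f)

# Drop every attribute that is not in KEEP, keeping all runs.
slimmed = {
    run_id: {attr: value for attr, value in run.items() if attr in KEEP}
    for run_id, run in properties.items()
}

with open("data/exp-eval/properties-slim", "w") as f:
    json.dump(slimmed, f, indent=2)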
Hi. Anyway, I think reading the properties file for every report is not very efficient, and following the approach of loading and updating the data in the Experiment object still seems worthwhile. Another complementary improvement would be using https://github.com/ijl/orjson instead of simplejson. So this pull request can be closed, but I hope the maintainers give the idea a chance.
I had a look at orjson as well, and it would be nice to see some time measurements for reading and writing Downward Lab JSON files with orjson vs. simplejson. If it's really that much faster for our use case, we should switch :-)
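A rough harness along these lines could produce such numbers (assuming a properties file in the current directory; both libraries accept the raw bytes):
import time

import orjson
import simplejson

with open("properties", "rb") as f:
    raw = f.read()

# Time a full parse of the same file with each library.
for name, loads in [("simplejson", simplejson.loads), ("orjson", orjson.loads)]:
    start = time.perf_counter()
    data = loads(raw)
    print(f"{name}: loaded {len(data)} runs in {time.perf_counter() - start:.2f}s")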
OK, I have fixed the issue. The problem was just that I considered only one properties file for all experiments, but there may be several evaluation directories, each with a different properties file. So it is solved by using a dictionary with eval dirs as keys for the cached properties.
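A minimal sketch of that idea (hypothetical names, not the actual pull request code): one cached properties dict per evaluation directory, each file read at most once, with fetchers updating the cache they touch.
import json
import os

class PropertiesCache:
    def __init__(self):
        self._per_eval_dir = {}  # eval_dir -> properties dict

    def get(self, eval_dir):
        # Read the properties file only on first access for this eval dir.
        if eval_dir not in self._per_eval_dir:
            with open(os.path.join(eval_dir, "properties")) as f:
                self._per_eval_dir[eval_dir] = json.load(f)
        return self._per_eval_dir[eval_dir]

    def update(self, eval_dir, new_props):
        # Fetchers call this so the cache stays in sync with the file.
        self.get(eval_dir).update(new_props)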
Fixes #121.
This pull request stores properties in the Experiment object, updating them each time the properties file is loaded or a fetcher is executed. This way, the properties file is read only as often as needed and is always up to date. To avoid modifying the properties data when applying filters, a new dictionary with references to the original data is created instead of modifying the original object. Note that this also saves a lot of time and memory compared to using copy.deepcopy, as the sketch below illustrates.
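A minimal sketch of this filtering idea (hypothetical names, not the actual pull request code):
def filtered_view(properties, keep_run):
    # New top-level dict, but each run dict is shared with the original,
    # so no run data is duplicated (unlike copy.deepcopy).
    return {run_id: run for run_id, run in properties.items() if keep_run(run)}

# e.g. keep only one algorithm's runs; the run dicts themselves are not copied:
# lama_runs = filtered_view(properties, lambda run: run.get("algorithm") == "lama")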
This pull request saves many minutes when running multiple reports over the same properties file, but note:
- The Experiment object reference must now be set in reports. This is not a problem in the highest-level API, because it is done by the add_report function, but when using add_step it must be done manually.
- The Fetcher constructor now receives an Experiment object as an argument. Again, add_fetcher does this for the user.
In addition, this pull request adds a new option to run only reports, and more flexibility for the steps parameter, which now accepts comma-separated intervals like 2-5,6,8-10 (one possible parsing is sketched below).
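A minimal sketch of how such a spec could be parsed (the pull request's actual implementation may differ):
def parse_steps(spec):
    """Parse a spec like "2-5,6,8-10" into a sorted list of step numbers."""
    steps = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            steps.update(range(int(first), int(last) + 1))
        else:
            steps.add(int(part))
    return sorted(steps)

assert parse_steps("2-5,6,8-10") == [2, 3, 4, 5, 6, 8, 9, 10]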
I hope you find this pull request useful even though #121 was closed. I think this pull request is simple enough and adds value to Downward Lab 😄.