Reimplement managed raster class in C++ #1597

Merged

Changes from 83 commits

93 commits
5e58f60
compare different implementations of upslope neighbors
emlys May 7, 2024
7c51c08
improve efficiency of managedraster helper functions
emlys Jun 15, 2024
a55d57c
work in progress: rewrite managed raster module in c++
emlys Jun 25, 2024
068d886
work in progress: managed raster in C++
emlys Jul 16, 2024
8576088
C++ managed raster implementation now faster on SDR sample data
emlys Jul 17, 2024
7f2ec80
Merge branch 'feature/routing-refactor' into feature/routing-refactor…
emlys Jul 17, 2024
11ef40f
NDR tests passing with new C++ implementation of managedraster
emlys Jul 18, 2024
c7486f2
all tests using managedraster passing locally
emlys Jul 23, 2024
048e769
debugging: find gdal.h
emlys Jul 23, 2024
cccf0e6
debugging: add include path to compile args for windows
emlys Jul 23, 2024
8d104a4
debugging
emlys Jul 23, 2024
6ca0f78
debugging
emlys Jul 23, 2024
61d1f65
debugging
emlys Jul 23, 2024
d3aebac
debugging
emlys Jul 23, 2024
eb15fa1
debugging
emlys Jul 23, 2024
93ec4b9
debugging
emlys Jul 23, 2024
4506414
debugging
emlys Jul 23, 2024
1ff72b6
debugging
emlys Jul 23, 2024
6acaea6
debugging
emlys Jul 23, 2024
9316d20
use C++20; no need to compile managed_raster as a separate extension
emlys Jul 23, 2024
0d4805a
debugging
emlys Jul 23, 2024
53b15d0
debugging
emlys Jul 23, 2024
06c8648
debugging
emlys Jul 23, 2024
1749eb9
debugging
emlys Jul 23, 2024
e4c234f
debugging
emlys Jul 23, 2024
bbd7c5a
debugging
emlys Jul 23, 2024
7bc4b58
debugging
emlys Jul 23, 2024
efc5527
debugging
emlys Jul 23, 2024
474a89a
debugging
emlys Jul 23, 2024
a63532f
debugging
emlys Jul 23, 2024
05aa968
debugging
emlys Jul 23, 2024
81af8a3
debugging
emlys Jul 23, 2024
a0e5810
debugging
emlys Jul 23, 2024
d5bd998
use setuptools library_dirs parameter
emlys Jul 24, 2024
09a6f8e
debugging
emlys Jul 24, 2024
c669180
organizing options in setup.py
emlys Jul 24, 2024
61883f5
swy core: use libcpp.vector rather than allocating an array
emlys Jul 25, 2024
d236af1
swy core: use libcpp.vector rather than allocating an array
emlys Jul 25, 2024
7e12302
use std::string type to store raster path in ManagedRaster
emlys Jul 30, 2024
c664e43
Merge branch 'main' into feature/routing-refactor-compare
emlys Aug 9, 2024
baac039
Merge branch 'main' into feature/routing-refactor-compare
emlys Aug 22, 2024
8cf1e6b
Merge branch 'feature/routing-refactor' into feature/routing-refactor…
emlys Aug 22, 2024
5932fdb
remove extra changed files
emlys Aug 22, 2024
8fe87ec
clean up
emlys Aug 22, 2024
d7d3b09
now slightly faster than implementation on main
emlys Aug 26, 2024
4e5e717
add library dir for gdal to setup.py
emlys Aug 26, 2024
5f6a647
typo
emlys Aug 26, 2024
8defb6f
another mistake
emlys Aug 26, 2024
6245b6a
add C++ errors for invalid arguments to managedraster
emlys Sep 3, 2024
364d7ff
Update src/natcap/invest/managed_raster/ManagedRaster.h
emlys Oct 15, 2024
386beb2
remove unneeded file
emlys Oct 15, 2024
7b994e2
remove unused enumeration
emlys Oct 15, 2024
f62a696
inline managedraster set and get functions
emlys Oct 15, 2024
8109a60
write C++ error messages out to error stream
emlys Oct 15, 2024
da23f3e
clean up lru_cache and actualBlockWidths in managedraster.close
emlys Oct 23, 2024
31565e8
inline is_close function
emlys Oct 24, 2024
8c31a12
use gdal-config to get library path on macos
emlys Oct 24, 2024
b88567c
separate C++ classes for each variant of neighbor pixel iteration
emlys Oct 24, 2024
36c26b2
drafting a true C++ iterator
emlys Oct 25, 2024
fc8dd38
implement true C++ iterator for downslope neighbors and use in SDR
emlys Oct 29, 2024
e6a370e
using downslope iterators everywhere
emlys Oct 29, 2024
2cf8f5d
use true iterators for upslope neighbors
emlys Oct 29, 2024
ba37a0a
Merge branch 'feature/routing-refactor' into feature/routing-refactor…
emlys Oct 29, 2024
b957881
debugging
emlys Oct 29, 2024
989a7b6
debugging
emlys Oct 29, 2024
b8e710f
cdef flow_dir_sum
emlys Oct 29, 2024
b532001
debugging
emlys Oct 29, 2024
9421e2c
debugging
emlys Oct 29, 2024
9981569
debugging
emlys Oct 29, 2024
6d556e2
debugging
emlys Oct 31, 2024
c7742e8
debugging
emlys Oct 31, 2024
7b9db1f
debugging
emlys Oct 31, 2024
9071362
debugging
emlys Oct 31, 2024
ba9cebe
debugging
emlys Oct 31, 2024
13cc0dc
debugging
emlys Oct 31, 2024
7a767f5
use const reference parameters to equality fns
emlys Oct 31, 2024
5b8c04e
inherit from Neighbors class
emlys Oct 31, 2024
947900c
inherit operator funcs
emlys Nov 1, 2024
f737225
inherit equality functions from parent class
emlys Nov 1, 2024
3a623eb
run all tests
emlys Nov 1, 2024
420145f
inherit increment operators from parent class
emlys Nov 1, 2024
dde78df
call base class constructors where possible
emlys Nov 1, 2024
11cc0ef
setup.py: use same options for linux as for mac
emlys Nov 5, 2024
25c954a
remove extra comments
emlys Nov 22, 2024
d50a2c9
downgrade compiler version to c++17
emlys Dec 9, 2024
3adae2d
read gdal lib path from env variable on windows
emlys Dec 10, 2024
24d1028
set NATCAP_INVEST_GDAL_LIB_PATH in github actions
emlys Dec 10, 2024
b3d2a55
bump c++ version back to 20
emlys Dec 10, 2024
45ec188
set NATCAP_INVEST_GDAL_LIB_PATH in one more place
emlys Dec 10, 2024
9b95796
temporarily pin pytest-subtests version
emlys Dec 10, 2024
3c9b795
Merge branch 'feature/routing-refactor' into feature/routing-refactor…
emlys Dec 10, 2024
07059b0
remove stdlib flag from compile command; default to libstdc++
emlys Dec 10, 2024
041cd1a
remove managed_raster pyx file
emlys Dec 10, 2024
24 changes: 17 additions & 7 deletions setup.py
@@ -20,9 +20,18 @@
 # access to a pre-Mavericks mac, so hopefully this won't break on someone's
 # older system. Tested and it works on Mac OSX Catalina.
 compiler_and_linker_args = []
-if platform.system() == 'Darwin':
-    compiler_and_linker_args = ['-stdlib=libc++']
 
+include_dirs = [numpy.get_include(), 'src/natcap/invest/managed_raster']
+if platform.system() == 'Windows':
+    compiler_args = ['/std:c++20']
+    library_dirs = [f'{os.environ["CONDA_PREFIX"]}/Library/lib']
+    include_dirs.append(f'{os.environ["CONDA_PREFIX"]}/Library/include')
+else:
+    library_dirs = []
+    compiler_args = []
+    compiler_and_linker_args = ['-stdlib=libc++', '-std=c++20']

Member:

Is C++20 truly the minimum language version required? I don't think this should be a huge deal since compilers seem to pretty rapidly adopt the new standards, so I think this is mostly a question of curiosity.

Member Author (@emlys, Dec 9, 2024):

Good question! I think I used some C++20 features during development, but the code now only needs C++17. (It still compiles with C++11, but I get warnings that inline variables are a C++17 feature.) Updated in d50a2c9. C++20 is still needed for the Windows build only, for some reason; otherwise there are a lot of syntax errors, e.g. https://github.com/natcap/invest/actions/runs/12245867222/job/34160661279
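For reference, here is a minimal sketch of what a C++17 inline variable in a shared header looks like. The header contents, the `COL_OFFSETS`/`ROW_OFFSETS` arrays, the `neighbor_of` helper, and the D8 direction convention are hypothetical illustrations, not code from ManagedRaster.h.

```cpp
#include <array>
#include <utility>

// Hypothetical shared-header contents (not the real ManagedRaster.h).
// Assumed D8 convention for this sketch: direction 0 is east, directions
// proceed counterclockwise, and rows increase downward (north is row - 1).
//
// `inline` on a namespace-scope variable is a C++17 feature: every
// translation unit that includes this header refers to one shared object.
// Building with -std=c++11 or -std=c++14 typically still compiles it, but
// with a warning that inline variables are a C++17 extension.
inline constexpr std::array<int, 8> COL_OFFSETS = {1, 1, 0, -1, -1, -1, 0, 1};
inline constexpr std::array<int, 8> ROW_OFFSETS = {0, -1, -1, -1, 0, 1, 1, 1};

// Return the (col, row) coordinates of the neighbor in direction `dir`.
inline std::pair<int, int> neighbor_of(int col, int row, int dir) {
    return {col + COL_OFFSETS[dir], row + ROW_OFFSETS[dir]};
}
```

With this convention, `neighbor_of(5, 5, 2)` gives `(5, 4)`, the pixel directly north. Dropping `inline` from the arrays would also compile, since non-inline `constexpr` variables at namespace scope have internal linkage, but each translation unit would then get its own copy.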

Member:

Sounds good! Thanks for looking into it.

If I had to guess, maybe the lack of an -ffp-contract=off flag (needed to work around a compilation bug in viewshed) is causing the rest of the errors? If (big if) that's the only thing standing in the way, then the viewshed tests would fail, which isn't worth it if all we need is a recent enough version of the C++ standard.
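To illustrate what floating-point contraction changes, and why a flag like -ffp-contract=off can matter for numerically sensitive code such as viewshed, here is a small self-contained sketch. It does not reproduce the viewshed failure; it only shows that fusing `a * b + c` into one fused multiply-add (FMA) can give a different result than rounding the product and the sum separately. The function names are hypothetical.

```cpp
#include <cmath>

// Computes a * b + c with two roundings: the product is rounded to double
// when stored in the volatile temporary, then the sum is rounded again.
// This matches a plain `a * b + c` when contraction is disabled (e.g. with
// -ffp-contract=off), assuming no excess-precision evaluation.
double unfused_multiply_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Computes a * b + c with a single rounding, which is what a contracted
// expression yields when the compiler emits an FMA instruction.
double fused_multiply_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With `a = 1 + 2^-27`, `b = 1 - 2^-27`, `c = -1`, the exact product `1 - 2^-54` lies exactly halfway between two doubles and rounds to `1.0`, so the unfused version returns `0.0` while `std::fma` returns `-2^-54`.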

+    library_dirs = [subprocess.run(
+        ['gdal-config', '--libs'], capture_output=True, text=True
+    ).stdout.split()[0][2:]]  # get the first argument which is the library path

Member:

Since we are now building directly against gdal, should we add the new build dependency to pyproject.toml?

Member Author:

Hmmm... the libraries are a dependency of the build, but the gdal python bindings themselves aren't necessary. I'm not sure there's any standardized way to document that kind of dependency?


 class build_py(_build_py):
     """Command to compile translation message catalogs before building."""
@@ -50,13 +59,14 @@ def run(self):
     Extension(
         name=f'natcap.invest.{package}.{module}',
         sources=[f'src/natcap/invest/{package}/{module}.pyx'],
-        include_dirs=[numpy.get_include()] + ['src/natcap/invest/managed_raster'],
-        extra_compile_args=compiler_args + compiler_and_linker_args,
+        include_dirs=include_dirs,
+        extra_compile_args=compiler_args + package_compiler_args + compiler_and_linker_args,
         extra_link_args=compiler_and_linker_args,
         language='c++',
+        libraries=['gdal'],
+        library_dirs=library_dirs,

Member:

I believe GDAL also has a self-configuration utility that might be able to help us find the location of its include directory: https://gdal.org/en/latest/programs/gdal-config.html

I don't know that this will help us out on Windows, but it might at least be nicer on linux and mac.

Member Author:

Oh neat! Thanks for pointing that out. I'm not sure what the solution would be for Windows that avoids relying on conda; possibly we'd need to make the GDAL lib location configurable as an env variable.

         define_macros=[("NPY_NO_DEPRECATED_API", "NPY_1_7_API_VERSION")]
-    ) for package, module, compiler_args in [
-        ('managed_raster', 'managed_raster', []),
+    ) for package, module, package_compiler_args in [
         ('delineateit', 'delineateit_core', []),
         ('recreation', 'out_of_core_quadtree', []),
         # clang-14 defaults to -ffp-contract=on, which causes the

118 changes: 62 additions & 56 deletions src/natcap/invest/managed_raster/LRUCache.h
@@ -8,66 +8,72 @@
 using namespace std;
 
 template <class KEY_T, class VAL_T,
-    typename ListIter = typename list< pair<KEY_T,VAL_T> >::iterator,
-    typename MapIter = typename map<KEY_T, ListIter>::iterator > class LRUCache{
-  private:
-    // item_list keeps track of the order of which elements have been accessed
-    // element at begin is most recent, element at end is least recent.
-    // first element in the pair is its key while the second is the element
-    list< pair<KEY_T,VAL_T> > item_list;
-    // item_map maps an element's key to its location in the `item_list`
-    // used to make lookups O(log n) time
-    map<KEY_T, ListIter> item_map;
-    size_t cache_size;
-  private:
-    void clean(list< pair<KEY_T, VAL_T> > &removed_value_list){
-        while(item_map.size()>cache_size){
-            ListIter last_it = item_list.end(); last_it --;
-            removed_value_list.push_back(
-                make_pair(last_it->first, last_it->second));
-            item_map.erase(last_it->first);
-            item_list.pop_back();
-        }
-    };
-  public:
-    LRUCache(int cache_size_):cache_size(cache_size_){
-        ;
-    };
+    typename ListIter = typename list<pair<KEY_T,VAL_T>>::iterator,
+    typename MapIter = typename map<KEY_T, ListIter>::iterator> class LRUCache {
+  private:
+    // item_list keeps track of the order of which elements have been accessed
+    // element at begin is most recent, element at end is least recent.
+    // first element in the pair is its key while the second is the element
+    list<pair<KEY_T,VAL_T>> item_list;
+    // item_map maps an element's key to its location in the `item_list`
+    // used to make lookups O(log n) time
+    map<KEY_T, ListIter> item_map;
+    size_t cache_size;
 
-    ListIter begin() {
-        return item_list.begin();
+    void clean(list<pair<KEY_T, VAL_T>> &removed_value_list) {
+        while(item_map.size() > cache_size) {
+            ListIter last_it = item_list.end();
+            last_it--;
+            removed_value_list.push_back(
+                make_pair(last_it->first, last_it->second));
+            item_map.erase(last_it->first);
+            item_list.pop_back();
         }
+    };
+  public:
+    LRUCache(int cache_size_):cache_size(cache_size_) {
+        ;
+    };
 
-    ListIter end() {
-        return item_list.end();
+    ListIter begin() {
+        return item_list.begin();
     }
 
+    ListIter end() {
+        return item_list.end();
+    }
+
+    // Insert a new key-value pair into the cache.
+    void put(
+            const KEY_T &key, const VAL_T &val,
+            list<pair<KEY_T, VAL_T>> &removed_value_list) {
+        MapIter it = item_map.find(key);
+        if(it != item_map.end()){
+            // it's already in the cache, delete the location in the item
+            // list and in the lookup map
+            item_list.erase(it->second);
+            item_map.erase(it);
+        }
+        // insert a new item in the front since it's most recently used
+        item_list.push_front(make_pair(key, val));
+        // record its iterator in the map
+        item_map.insert(make_pair(key, item_list.begin()));
+        // possibly remove any elements that have exceeded the cache size
+        return clean(removed_value_list);
+    };
+
+    // Return whether a key exists in the cache.
+    bool exist(const KEY_T &key) {
+        return (item_map.count(key) > 0);
+    };
+
-    void put(
-            const KEY_T &key, const VAL_T &val,
-            list< pair<KEY_T, VAL_T> > &removed_value_list) {
-        MapIter it = item_map.find(key);
-        if(it != item_map.end()){
-            // it's already in the cache, delete the location in the item
-            // list and in the lookup map
-            item_list.erase(it->second);
-            item_map.erase(it);
-        }
-        // insert a new item in the front since it's most recently used
-        item_list.push_front(make_pair(key,val));
-        // record its iterator in the map
-        item_map.insert(make_pair(key, item_list.begin()));
-        // possibly remove any elements that have exceeded the cache size
-        return clean(removed_value_list);
-    };
-    bool exist(const KEY_T &key){
-        return (item_map.count(key)>0);
-    };
-    VAL_T& get(const KEY_T &key){
-        MapIter it = item_map.find(key);
-        assert(it!=item_map.end());
-        // move the element to the front of the list
-        item_list.splice(item_list.begin(), item_list, it->second);
-        return it->second->second;
-    };
+    // Return the cached value associated with a key.
+    VAL_T& get(const KEY_T &key) {
+        MapIter it = item_map.find(key);
+        assert(it != item_map.end());
+        // move the element to the front of the list
+        item_list.splice(item_list.begin(), item_list, it->second);
+        return it->second->second;
+    };
 };
 #endif
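For reference, a condensed, self-contained restatement of the list-plus-map design in LRUCache.h, with a short usage check. The `MiniLRU` name and its trimmed interface are hypothetical; the eviction-through-an-output-list behavior mirrors `put(key, val, removed_value_list)` in the diff, which presumably lets a caller such as ManagedRaster act on evicted blocks (for example, write them back) before they are dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <map>
#include <utility>

template <class K, class V>
class MiniLRU {
  public:
    explicit MiniLRU(std::size_t capacity) : capacity_(capacity) {}

    // Insert or refresh `key` as most recently used; any entries pushed past
    // capacity are appended to `evicted`, least recently used first.
    void put(const K &key, const V &val, std::list<std::pair<K, V>> &evicted) {
        auto found = index_.find(key);
        if (found != index_.end()) {
            items_.erase(found->second);
            index_.erase(found);
        }
        items_.push_front(std::make_pair(key, val));
        index_[key] = items_.begin();
        while (index_.size() > capacity_) {
            auto last = std::prev(items_.end());
            evicted.push_back(*last);
            index_.erase(last->first);
            items_.pop_back();
        }
    }

    bool exist(const K &key) const { return index_.count(key) > 0; }

    // Return the value for `key` and mark it as most recently used.
    V &get(const K &key) {
        auto found = index_.find(key);
        assert(found != index_.end());
        // splice relinks the node without invalidating the stored iterator
        items_.splice(items_.begin(), items_, found->second);
        return found->second->second;
    }

  private:
    using Item = std::pair<K, V>;
    std::list<Item> items_;  // front = most recently used
    std::map<K, typename std::list<Item>::iterator> index_;
    std::size_t capacity_;
};
```

Because `std::list::splice` moves a node without invalidating iterators to it, `get()` can reorder the list without touching the map. In the check below, reading key 1 before inserting key 3 makes key 2 the least recently used entry, so key 2 is the one evicted.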