Import of germany.osm (~11G) from http://download.geofabrik.de/osm/.
As the log below shows, the following steps are the relevant, most
time-consuming ones:
* Calculation of the objects-in-area index. The "object in areas" calculation
must be sped up by improving the "point in area" algorithm (see the first
sketch after this list).
* Generation of the ways.dat file. This is time-consuming because the nodes
of each way have to be resolved. Since the data is too big to keep everything
in memory, an on-disk algorithm must be used that repeatedly scans the data
files (see the second sketch after this list). Instead of this the node index
file could be used, but we need some performance tests first.
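For illustration, a minimal sketch of the even-odd ray-casting test that such
"point in area" checks typically reduce to (GeoCoord and PointInArea are
illustrative names, not the library's actual API). Speeding it up usually
means a bounding-box pre-check or a spatial index, so the exact test only
runs on far fewer candidates:

#include <cstddef>
#include <iostream>
#include <vector>

struct GeoCoord {
  double lon;
  double lat;
};

// Returns true if the point lies inside the polygon (even-odd rule).
// Points exactly on an edge may be classified either way.
bool PointInArea(const GeoCoord& p, const std::vector<GeoCoord>& polygon)
{
  bool        inside = false;
  std::size_t n      = polygon.size();

  for (std::size_t i = 0, j = n - 1; i < n; j = i++) {
    const GeoCoord& a = polygon[i];
    const GeoCoord& b = polygon[j];

    // Count the edges crossed by a horizontal ray running right from p.
    if ((a.lat > p.lat) != (b.lat > p.lat)) {
      double xCross = (b.lon - a.lon) * (p.lat - a.lat) / (b.lat - a.lat) + a.lon;
      if (p.lon < xCross) {
        inside = !inside;
      }
    }
  }

  return inside;
}

int main()
{
  std::vector<GeoCoord> square = {{0, 0}, {10, 0}, {10, 10}, {0, 10}};

  std::cout << PointInArea({5, 5}, square) << '\n';   // 1: inside
  std::cout << PointInArea({15, 5}, square) << '\n';  // 0: outside
}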
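And a simplified model of the repeated-scan strategy from the second bullet
(all names are illustrative, and plain vectors stand in for sequential scans
of rawnodes.dat and rawways.dat). Memory stays bounded by the batch size, at
the cost of one full scan of the node data per batch; a node index would
replace those scans with random lookups:

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

struct RawNode { std::uint64_t id; double lon, lat; };
struct RawWay  { std::uint64_t id; std::vector<std::uint64_t> nodeIds; };

int main()
{
  // In the real importer these would be sequential scans of the raw data
  // files; vectors keep the sketch self-contained.
  std::vector<RawNode> nodeFile = {{1, 7.1, 50.7}, {2, 7.2, 50.8}, {3, 7.3, 50.9}};
  std::vector<RawWay>  wayFile  = {{10, {1, 2}}, {11, {2, 3}}, {12, {3, 1}}};

  const std::size_t batchSize = 2;  // bounds the id->node map held in memory

  for (std::size_t begin = 0; begin < wayFile.size(); begin += batchSize) {
    std::size_t end = std::min(begin + batchSize, wayFile.size());

    // Collect the node ids needed by the current batch of ways.
    std::unordered_map<std::uint64_t, RawNode> needed;
    for (std::size_t w = begin; w < end; ++w) {
      for (std::uint64_t id : wayFile[w].nodeIds) {
        needed.emplace(id, RawNode{});
      }
    }

    // One sequential scan of the node data fills in the coordinates.
    for (const RawNode& n : nodeFile) {
      auto it = needed.find(n.id);
      if (it != needed.end()) {
        it->second = n;
      }
    }

    // Emit the resolved ways (a real importer would write ways.dat here).
    for (std::size_t w = begin; w < end; ++w) {
      std::cout << "way " << wayFile[w].id << ":";
      for (std::uint64_t id : wayFile[w].nodeIds) {
        std::cout << " (" << needed[id].lon << ',' << needed[id].lat << ')';
      }
      std::cout << '\n';
    }
  }
}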
egrep -h "(^\+ Step|^ =>)" <logfile>
+ Step #1 - Preprocess...
=> 128.146 second(s)
+ Step #2 - Generating 'rawnode.idx'...
=> 32.002 second(s)
+ Step #3 - Generating 'rawway.idx'...
=> 8.849 second(s)
+ Step #4 - Generate 'relations.dat'...
=> 67.382 second(s)
+ Step #5 - Generating 'relation.idx'...
=> 1.530 second(s)
+ Step #6 - Generate 'nodes.dat'...
=> 25.975 second(s)
+ Step #7 - Generating 'node.idx'...
=> 1.052 second(s)
+ Step #8 - Generate 'ways.dat'...
=> 752.376 second(s)
+ Step #9 - Generating 'way.idx'...
=> 17.741 second(s)
+ Step #10 - Generate 'area.idx'...
=> 252.107 second(s)
+ Step #11 - Generate 'areanode.idx'...
=> 5.262 second(s)
+ Step #12 - Generate 'region.dat' and 'nameregion.idx'...
=> 774.346 second(s)
+ Step #13 - Generate 'nodeuse.idx'...
=> 229.044 second(s)
+ Step #14 - Generate 'water.idx'...
=> 29.784 second(s)
=> 2325.608 second(s)
The following files have been generated (du --total -h *.idx *.dat):
67M area.idx
8,0M areanode.idx
1,5M nameregion.idx
3,2M node.idx
88M nodeuse.idx
98M rawnode.idx
15M rawway.idx
204K relation.idx
20K water.idx
15M way.idx
4,0K bounding.dat
31M nodes.dat
730M rawnodes.dat
22M rawrels.dat
331M rawways.dat
21M region.dat
110M relations.dat
1,2M wayblack.dat
635M ways.dat
2,2G total
Of these, the following are necessary at application runtime:
67M area.idx
8,0M areanode.idx
1,5M nameregion.idx
3,2M node.idx
88M nodeuse.idx
204K relation.idx
20K water.idx
15M way.idx
4,0K bounding.dat
31M nodes.dat
21M region.dat
110M relations.dat
635M ways.dat
The nodeuse.idx is currently only of interest for the internal routing
solution; however, some code changes are required so that the database class
does not require it (a possible approach is sketched below). The water.idx is
not finished and will likely grow in the future (though not in a way that
changes the overall calculation drastically).
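One hypothetical shape for that code change: make loading of the routing
index an open-time option, so that the database class only touches
nodeuse.idx when routing is actually requested. Database, OpenOptions and
NodeUseIndex are illustrative names, not the actual API:

#include <iostream>
#include <memory>
#include <string>

struct NodeUseIndex {
  bool Open(const std::string& path)
  {
    std::cout << "opening " << path << "/nodeuse.idx\n";
    return true;  // real code would map and parse the file here
  }
};

struct OpenOptions {
  bool withRouting = false;  // default: nodeuse.idx is not required
};

class Database {
public:
  bool Open(const std::string& path, const OpenOptions& options)
  {
    // ...open the always-required indexes and data files here...

    if (options.withRouting) {
      nodeUseIndex = std::make_unique<NodeUseIndex>();
      if (!nodeUseIndex->Open(path)) {
        return false;
      }
    }

    return true;
  }

  bool HasRouting() const { return nodeUseIndex != nullptr; }

private:
  std::unique_ptr<NodeUseIndex> nodeUseIndex;  // absent unless requested
};

int main()
{
  OpenOptions viewer;         // a map viewer: no routing index needed
  OpenOptions router;
  router.withRouting = true;  // a router: loads nodeuse.idx

  Database db;
  db.Open("germany", viewer);
  std::cout << db.HasRouting() << '\n';  // 0

  db.Open("germany", router);
  std::cout << db.HasRouting() << '\n';  // 1
}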
So the overall size required on disk for a Germany map without routing
would be:
du --total -h area.idx areanode.idx bounding.dat nameregion.idx node.idx \
nodes.dat region.dat relation.idx relations.dat water.idx way.idx ways.dat
68M area.idx
8,5M areanode.idx
4,0K bounding.dat
1,5M nameregion.idx
3,2M node.idx
31M nodes.dat
21M region.dat
216K relation.idx
114M relations.dat
20K water.idx
16M way.idx
646M ways.dat
907M total
Memory usage:
Open the Germany map, search for Bonn, search for Dortmund
(top -b | grep <processname>):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11244 tim 20 0 1031m 81m 34m S 0 2.7 0:05.38 lt-TravelJinni
11415 tim 20 0 975m 79m 36m S 0 2.6 0:09.19 lt-OSMScout
Most of the memory usage, however, comes from opening the data files as
memory-mapped files (a sketch of why mapped files inflate VIRT but hardly
RES follows below).
The same run, if no memory-mapped files are used:
9276 tim 20 0 219m 127m 13m S 0 4.2 0:02.97 lt-TravelJinni
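A short POSIX sketch of the effect: mmap only reserves address space, so
VIRT (and SHR) grow by the full file size while RES grows page by page as
data is touched, and those pages stay evictable; read() would instead copy
the data into private heap memory that counts fully against RES. The file
name is only an example:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main()
{
  int fd = open("ways.dat", O_RDONLY);
  if (fd < 0) { perror("open"); return 1; }

  struct stat st;
  if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

  // Map the whole file: VIRT grows by st.st_size immediately, RES does not.
  void* data = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
  if (data == MAP_FAILED) { perror("mmap"); return 1; }

  // Touch one byte: exactly one page becomes resident (and shareable).
  volatile char first = static_cast<const char*>(data)[0];
  (void)first;

  munmap(data, st.st_size);
  close(fd);
  return 0;
}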
This is the result of the internal statistics dump. Memory usage is rather
high (but was much higher in the past). There are no real measurements yet
regarding optimal/minimal cache sizes, so some memory could possibly be
saved simply by reducing the cache sizes.
The problem is not the in-memory caches for nodes, ways/areas and relations
but the various indexes. The currently biggest index is the area node index,
which should be the next target for optimization. After that, most memory
savings can be gained by improving the NumericIndex that is used for
accessing all data files. Caching only the index pages that are actually in
use might be possible here, too (see the sketch after the dump below).
nodes.dat entries: 330, memory 28000
Index node.idx: 113 entries, memory 4469
ways.dat entries: 1000, memory 121600
Index way.idx: 192 entries, memory 1800
relations.dat entries: 884, memory 79392
Index relation.idx: 268 entries, memory 1868
area.idx entries: 1000, memory 316888
areanode.idx entries: 385, memory 27560
AdminRegion size 77942, locations size 0, memory 2182376
WaterIndex size 18795, memory 18795
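A hypothetical sketch of such page caching for the NumericIndex: a small LRU
cache keeps only recently used index pages resident and re-reads evicted
pages from disk on demand. PageId, Page and LoadPage are illustrative
stand-ins, not the actual implementation:

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

using PageId = std::uint32_t;
using Page   = std::vector<std::uint8_t>;

class PageCache {
public:
  explicit PageCache(std::size_t maxPages) : maxPages(maxPages) {}

  // Returns the page, loading it (and evicting the least recently used
  // page) as needed.
  const Page& Get(PageId id)
  {
    auto it = index.find(id);
    if (it != index.end()) {
      // Hit: move the page to the front of the LRU list.
      lru.splice(lru.begin(), lru, it->second);
      return it->second->second;
    }

    if (lru.size() == maxPages) {
      // Evict the least recently used page; it can be re-read later.
      index.erase(lru.back().first);
      lru.pop_back();
    }

    lru.emplace_front(id, LoadPage(id));
    index[id] = lru.begin();
    return lru.front().second;
  }

private:
  // Stand-in for reading one index page from disk.
  static Page LoadPage(PageId id)
  {
    std::cout << "load page " << id << " from disk\n";
    return Page(4096, 0);
  }

  std::size_t maxPages;
  std::list<std::pair<PageId, Page>> lru;  // most recently used first
  std::unordered_map<PageId, std::list<std::pair<PageId, Page>>::iterator> index;
};

int main()
{
  PageCache cache(2);

  cache.Get(1);  // load
  cache.Get(2);  // load
  cache.Get(1);  // hit
  cache.Get(3);  // load, evicts page 2
  cache.Get(2);  // load again
}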