Skip to content

A program that reduces the number of Manhattan roads in New Jork City while optimizing various characteristics.

License

Notifications You must be signed in to change notification settings

wadiim/manhattan

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Manhattan

A program that reduces the number of Manhattan roads in New Jork City while optimizing various characteristics, such as the total length of routes. It uses heuristic algorythm, so the solutions are not necessarily globally optimal.

Prerequisites

Running

There are two ways to run the program. The simpler way (in terms of the number of commands that need to be entered) is to execute the make command in the project's root directory:

$ make [mode]

The mode can be either cli or gui. In case no mode is given, the program will run in cli mode. This command (when run the first time) downloads the datasets, preprocess them, and runs the program, including building the cache. It may take significant amount of time, but the subsequent runs will be much faster. The downside of this way of running the program is that there is no way of specifying the algorithm's parameters, such as iteration count, weights, etc.

Another way is to execute the following commands:

$ ./getdata.sh
$ python -m venv .venv
$ . .venv/bin/activate # platform dependent
$ pip install -r requirements.txt
$ python -m manhattan [options]

The third command is dependent on the operating system and the system shell. For more information, visit https://docs.python.org/3/library/venv.html#how-venvs-work. A list of available options with their descriptions is available by executing the python -m manhattan --help command.

Program details

Datasets

The following datasets are used by the program:

  • CSCL: A dataset containing information about New York City's road infrastrukture, such as location and shape of each street, traffic directions, and road types. Although this dataset covers the road infrastrukture of all of New York City, only roads located in the Manhattan district are considered (the rest is filtered out in the preprocessing step).
  • YTTD: A dataset generated from taxi trips, including information on e.g. the pick-up and drop-off locations and the length of each route. The pick-up and drop-off locations are given as taxi zone identifiers. The records do not contain information about the exact routes of the trips, i.e. what roads were traversed during a given trip.
  • TZ and TZL: Datasets containing information on taxi zones. These zones are disjoint and allow to unambiguously determine the belonging of any geographic point to each of them. Not every geographic point lying within New York City boundries belongs to any of the zones, e.g. bridges connecting Manhattan with other districts of New York are not covered by them.

Data extraction

The data extraction process involves extracting from the datasets only the information that is needed to generate solutions. All the resulting data is saved in the CSV format.

In case of the CSCL dataset, data on the shape of each road, its width, and the traffic direction are extracted. In addition, records for roads that are not within Manhattan, have not yet been build, are impassable, or are of a type other than street, highway, bridge, tunnel, driveway, or U-turn are removed.

From the YTTD dataset, routes whose start or end location is outside the Manhattan borough are filtered out, as well as those where the two locations (specified by taxi zone IDs) are the same. Data such as tip amount or payment type are also filtered out.

Taxi zone identifiers, names, and geometries are extracted from the TZ and TZL datasets. Data on zones outside the Manhattan borough are ignored.

Road infrastructure model generation

Based on the CSCL, TZ, and TZL datasets, a directed graph is generated, representing the road infrastructure of Manhattan. The vertices in this graph represent points of the roads and contain information about the road identifier, geographic coordinates, and the taxi zone to which the point belongs (if any). When two or more points from the CSCL dataset have the same coordinates, they are mapped to the same vertex, in effect creating a representation of an intersection. Such vertices are assigned to multiple roads simultaneously. Edges represent road segments. An edge belongs to the road to which both of its vertices belong.

Visualization of the graph that is a model of the Manhattan road infrastructure.

Road infrastructure complexity reduction

The graph generated in the previous step contains a redundant number of vertices, which is due, among other things, to the fact that the CSCL dataset maps the shape of the roads and not just the connections between them, which is unnecessary for this program. In order to reduce the time and memory complexity without significantly affecting the results, the number of vertices is reduced in a way that meets the following requirements:

  1. The length of any length-optimal taxi route does not change by more than half the sum of the length of the start road (i.e., the road containing the edge connecting the start vertex to the next vertex of the route) and the end road (i.e., the road containing the edge connecting the second-to-last route's vertex to the end vertex).
  2. The sequence of taxi zones, which include consecutive segments (i.e., sequences of consecutive vertices from a specific route) of any length-optimal route, does not change. This means that a taxi following the length-optimal route between vertices A and B, after the reduction process (assuming that the vertices were not removed during the process), passes through the same zones in the same order as if it had followed the length-optimal route between the same vertices before the reduction process. If either the starting or ending vertex has been removed, the new starting or ending vertex becomes the vertex closest to the removed vertex among those belonging to the same zone.
  3. The sequence of roads containing successive segments of any of the length-optimal routes does not change. This means that a taxi following the optimal route between vertices A and B after the reduction process passes through the same roads in the same order as if it had followed the optimal route between the same vertices before the reduction process.
  4. Removed vertices and their nearest neighbours always belong to the same road (and only to it) and the same zone, which guarantees that if the start or end vertex of a certain route has been removed, the updated route still starts and ends on the same roads. This prevents the removal of all vertices of a certain road and the removal of vertices that constitute intersections. This applies only to those vertices that can be part of the optimal route.

As a result of this process, the number of vertices decreases from 20617 to 10516 (a decrease of 99.05%). In the above description of the requirements, by length-optimal route is meant all optimal routes that can exist, not just the routes from the data set used.

Visualization of a fragment of the infrastructure model graph showing roads having a shape that is not a straight line.

Vertice collapsion

The collapsing process involves replacing groups of vertices with single vertices and updating edges accordingly, without violating complexity reduction requirements. As a result of this process, the total length of all edges does not change (not including inaccuracies due to the limited precision of floating point numbers). The vertex groups that can be collapsed are called collapsable groups. For a vertex group to be collapsable, the following criteria must be met:

  1. There must exist a valid route that contains all the vertices belonging to a given group, while not additionally containing any vertex that does not belong to that group. This ensures that, logically, a given group represents a continuous road section.
  2. All vertices in the group must have exactly two neighbours. This condition guarantees that none of the vertices in the group is an intersection representation.
  3. All vertices in the group must belong to the same road and taxi zone.
  4. All vertices in the group must have the same number of edges directed to neighbouring vertices (which do not have to belong to a given group), and this number can be either one or two.
  5. If each vertex in the group has one edge directed to any of its neighbouring vertices, then the neighbouring vertex cannot have an edge directed to that vertex. If condition 1 is additionally satisfied, this ensures that the group, along with its outer neighbours, represents a one-way road segment.
  6. If each vertex in the group has two edges facing neighbouring vertices, then each of the neighbours of any of these vertices must be connected by exactly one edge facing that vertex. If condition 1 is additionally satisfied, this ensures that the group, along with its outer neighbours, represents a two-way road segment.

Criteria 1, 2, and 3 ensure that vertice collapsion does not violate complexity reduction conditions, while criteria 4, 5, and 6 ensure that the collapsed vertices and their outer neighbours form a valid road. A valid road is a sequence of vertices in which the count and the directionality of the edges between each pair of directly consecutive vertices is the same, which means that a road can be either bidirectional or unidirectional, with the direction remaining the same throughout the whole segment in the case of a unidirectional road.

The group is collapsed in such a way that all vertices in the group along with the edges to which they are connected (regardless of the directionality of these edges) are removed, after which a new vertex and edges of the same total length are inserted, connected to the new vertex and those vertices outside the group that were adjacent to any of the removed vertices, without changing the number or directionality of the edges of the vertices outside the group.

Visualization of the effect of the vertex collapsion process on a fragment of the infrastructure model.

Removal of isolated vertices

In this step, isolated groups of vertices are removed. Isolated group is a group of vertices in which no vertex is connected to any vertex outside the group and the cardinality of the group does not exceed a certain fixed value. In the program, 128 was taken as the maximum count of an isolated group. Edges whose vertices belong to a given group are also removed in this step.

Removal of dendrites

Redundancy in the number of vertices is also caused by dendrites, i.e., unisolated groups of connected vertices in which no vertex can be part of any length-optimal taxi route.

A group cannot be part of an optimal route if none of the vertices belong to any taxi zone (which ensures that none of the vertices in the group is a start or end location) and one of the following criteria is met:

  • A single inner vertex (i.e., a vertex belonging to a given group) is connected to exactly one outer vertex (i.e., a vertex not belonging to a given group). All other inner vertices are connected only to other inner vertices. The outer vertex does not have to belong to the taxi zone. This criterion guarantees that if there is a certain taxi route to which a certain subset of vertices from a given group belong, the route has a start or end location that is one of the vertices of the group, or the route is not length-optimal (because the outer vertex would occur twice in the route). A group that meets the given criterion is an alpha group. The figure below shows an alpha dendrite. The inner vertices and edges are shown in black, while the rest are shown in grey.

  • Many inner vertices are connected to outer vertices, with all edges being incoming edges, i.e. edges directed to the inner vertex. This guarantees that if there is a certain taxi route that includes a certain subset of vertices from a given group, the end location of the route must be one of the vertices from the group. The group that meets the criterion is the beta group. The figure below shows a beta dendrite. The inner vertices and edges are shown in black, while the rest are shown in grey.

  • Many inner vertices are connected to outer vertices, with all edges being outgoing edges, i.e. edges directed to the outer vertex. This guarantees that if there is a certain taxi route to which a subset of the vertices in a given group belong, the starting location of that route must be one of the vertices in the group. The group that satisfies the given criterion is the gamma group. The figure below shows a gamma dendrite. The inner vertices and edges are shown in black, while the rest are shown in grey.

Dendrites are identified and removed in a greedy manner. Adding any external vertex to a group classified as a dendrite would make that group no longer a dendrite.

Traffic modeling

Based on the created infrastructure model and data from the YTTD dataset, interpolation of taxi routes is carried out, which involves determining a specific route. The starting and ending vertices are selected so that they are in the starting and ending taxi zones, respectively, and so that the length of the shortest route between them is as close as possible to that in the dataset. The result of the interpolation is the shortest route between the start and end vertices determined by the Dijkstra algorithm.

Quality measure

The quality of a particular solution is the degree to which its cost is close to zero. The cost of a solution is calculated as follows:

$$\text{cost} = w_{L} \cdot L_N + w_{A} \cdot A_N + w_{D} \cdot D_N + w_{R} \cdot R_N\text{,}$$

where

  • $L$ - total road length in miles;
  • $A$ - total road area in square miles;
  • $D$ - average route length in miles;
  • $R$ - average road load;
  • $L_N$, $A_N$, $D_N$, $R_N$ - normalized values of the parameters $L$, $A$, $D$, and $R$ respectively;
  • $w_{L}$, $w_{A}$, $w_{D}$, $w_{R}$ - parameter weights, respectively: $L_N$, $A_N$, $D_N$ and $R_N$.

Road load is the ratio of the number of routes containing any vertex of a given road to the number of all roads.

The values of the weights are user-defined, and their sum must be $1.0$ ($\pm 10^{-8}$). The values of the $L_N$ and $A_N$ parameters are normalized by dividing them by their initial values, i.e., values calculated for the state before the start of optimization. In the case of the $D_N$ and $R_N$ parameters, the resulting quotient is additionally substracted from 1 and the result is clipped to a range from 0 to 1. Clipping is neccessary because if the road that is being removed contains start or end location of some route, that location must be updated, which can reduce the length of that route and the number of vertices it consists of, which in turn reduces the values of the $D_N$ and $R_N$ parameters by a negligible amount, contrary to the assumed trend.

The described normalization in mathematical notation has the following form:

$$\begin{align} L_N &= \frac{\sum_i l_i}{L_0} \\\ A_N &= \frac{\sum_i a_i}{A_0} \\\ D_N &= \mathrm{clamp}\left(1 - \frac{\sum_i d_i}{D_0}\right) \\\ R_N &= \mathrm{clamp}\left(1 - \frac{\sum_i r_i}{R_0}\right)\text{,} \end{align}$$

where:

  • $l_i$ - length of the $i$th road;
  • $a_i$ - area of the $i$th road;
  • $d_i$ - length of the $i$th route;
  • $r_i$ - load of the $i$th road;
  • $L_0$, $A_0$, $D_0$, $R_0$ - initial values of the parameters $L$, $A$, $D$, and $R$ respectively; and
$$\mathrm{clamp}(x) = \begin{cases} \\\ 0 & \text{if } x < 0 \\\ 1 & \text{if } x > 1 \\\ x & \text{if } x \in [0, 1] \end{cases}$$

Road infrastructure optimization

Road network optimization involves iteratively reducing the number of roads at the highest possible drop in cost. In addition, the optimization proceeds in a way that reduces the likelihood of eliminating all accesses to any of the buildings. This is accomplished by not including in the selection of roads for removal those roads that contained a vertex in common with any of the roads removed in previous iterations.

Each iteration of the algorithm consists of the following steps:

  1. A random selection of a certain user-determined number of roads. The selection is carried out in a way that excludes the appearance of duplicates. Roads containing a vertex that is part of any of the roads removed in previous iterations are not taken into account.
  2. Cost calculation of removing each of the selected roads separately. This requires re-interpolating taxi routes that contain a section of the road which removal cost is being calculated. In the case that either the starting or ending vertex is part of that road, a new starting or ending vertex is determined, respectively, that belongs to the same taxi zone and is closest to the vertex being replaced. Once the end vertices are determined, the shortest (in terms of length) route between them is determined, replacing the previous route. If it becomes impossible to update any of the routes after removing a particular road, the calculation of the cost of its removal is skipped and the road is removed from the pool of considered roads (it will not be considered in subsequent iterations). The operations performed in this step are temporary, i.e. they do not modify the original structures such as the graph or taxi trips.
  3. Removal of the road that gives the least-cost solution. All vertices that belong to a given road and only that road are removed. The vertices that represent the intersection of a given road with another are not removed. In addition, those edges whose at least one vertex is removed are eliminated.
  4. Taxi trips update that replaces the original routes with those determined in step 2 for the road removed in the previous step.
  5. Removal of roads containing a vertex that belonged to the removed road from the pool of considered roads.

License

MIT

About

A program that reduces the number of Manhattan roads in New Jork City while optimizing various characteristics.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published