
Develop a machine learning model capable of predicting the outcome of flood events based on their initial conditions. Course Machine Learning attended at Politecnico di Milano


IreneCaridi/flooding_urban_area

Introduction

This project addresses flooding in an urban area of Houston, TX. The aim is to develop a machine learning model that predicts the outcome of a flood event from its initial conditions. The training set is a collection of 3,000 simulated flood incidents, each serving as a historical data point. The assignment's goal is to build a data-driven model that captures the logic of the simulator that generated these incidents and replicates its predictions.

Each simulated incident starts from different initial conditions and therefore leads to a different outcome. The geographical layout, including the street network and the coordinates of each street segment, is the same in all simulations. This fixed geographic data is available in the ‘edge_info.csv’ file. The parameters that influence how the flood unfolds are set at the beginning of each simulation and differ from one incident to another. They are listed in the ‘training_parameters.csv’ and ‘test_parameters.csv’ files for the training and test sets, respectively.

Some basic definitions:

  • Nodes: Intersections or endpoints of streets. Each node has a unique 9-digit identifier, for example 152356047.
  • Edges: Street segments linking two nodes, defined by 'head_id' and 'tail_id'.

The selected urban area is composed of 191 edges.
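As an illustration of the node and edge definitions above, the following minimal sketch builds an adjacency map from a list of (head_id, tail_id) pairs. The node IDs are made up, and treating street segments as undirected is an assumption, not something the assignment states.

```python
from collections import defaultdict

# Hypothetical edge list: (head_id, tail_id) pairs with made-up 9-digit node IDs.
edges = [
    (152356047, 152356048),
    (152356048, 152356049),
    (152356047, 152356049),
]


def build_adjacency(edge_list):
    """Map each node to the set of nodes it shares a street segment with."""
    adjacency = defaultdict(set)
    for head_id, tail_id in edge_list:
        adjacency[head_id].add(tail_id)
        # Assumption: streets are treated as undirected, so add the reverse link.
        adjacency[tail_id].add(head_id)
    return dict(adjacency)


adjacency = build_adjacency(edges)
```

A structure like this could be used to derive neighborhood features, such as how many adjacent segments are initially flooded.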

DATASET OVERVIEW

You are given 3,000 observations, each detailing the initial and final states of a separate simulation. These observations represent flood progression in a hypothetical urban layout, determined by street connectivity, elevation, and infrastructure. The simulation parameters for each observation are stored in ./training_parameters.csv and ./test_parameters.csv. These parameters include:

  • ObservationIndex: an identifier for the observation.
  • SurfaceType: Type of urban surface.
  • Waterflow: Intensity and duration of water flow.
  • InitialWaterLevels: Pre-simulation underground water level.
  • DrainageSystemCapacity: Indicator of drainage efficiency.
  • GreenSpaceRatio: Proportion of greenery in the urban area.
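The parameter files could be read along the following lines. This is a sketch under assumptions: the sample rows are invented, ObservationIndex is assumed to be an integer, SurfaceType is assumed to be categorical, and the remaining columns are assumed to be numeric. The actual files may encode these fields differently.

```python
import csv
import io

# Made-up rows in the column layout described above; the real values and
# categories in training_parameters.csv may differ.
SAMPLE = """ObservationIndex,SurfaceType,Waterflow,InitialWaterLevels,DrainageSystemCapacity,GreenSpaceRatio
0,A,12.5,0.8,0.6,0.25
1,B,30.0,1.1,0.4,0.10
"""

# Assumption: these four columns hold plain numbers.
NUMERIC = ["Waterflow", "InitialWaterLevels", "DrainageSystemCapacity", "GreenSpaceRatio"]


def load_parameters(handle):
    """Return {ObservationIndex: feature dict} with SurfaceType one-hot encoded."""
    rows = list(csv.DictReader(handle))
    surface_types = sorted({row["SurfaceType"] for row in rows})
    params = {}
    for row in rows:
        features = {name: float(row[name]) for name in NUMERIC}
        for surface in surface_types:
            is_match = row["SurfaceType"] == surface
            features[f"SurfaceType_{surface}"] = 1.0 if is_match else 0.0
        params[int(row["ObservationIndex"])] = features
    return params


params = load_parameters(io.StringIO(SAMPLE))
```

With the real data, `io.StringIO(SAMPLE)` would be replaced by an open file handle for ./training_parameters.csv.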

The ./training and ./test directories contain one CSV file per observation. Each file is named after its observation index and lists the edges with the following attributes:

  • head_id: ID of the head node.
  • tail_id: ID of the tail node.
  • flooded_init: Whether the edge was initially flooded.
  • flooded_final: Whether the edge was flooded after the simulation (this is the target variable!).
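A single observation file with the attributes above could be parsed as in the sketch below. The sample content is invented, and two things are assumptions: that the flooded flags are written as 0/1 or True/False, and that test files omit the flooded_final column.

```python
import csv
import io

# Made-up content for one observation file; real files hold one row per edge.
SAMPLE = """head_id,tail_id,flooded_init,flooded_final
152356047,152356048,1,1
152356048,152356049,0,1
152356047,152356049,0,0
"""


def parse_flag(value):
    """Accept 0/1 or True/False spellings; the real encoding is an assumption."""
    return value.strip().lower() in {"1", "true"}


def load_observation(handle):
    """Return a list of edge records with boolean flooded flags."""
    records = []
    for row in csv.DictReader(handle):
        target = row.get("flooded_final")
        records.append({
            "edge": (int(row["head_id"]), int(row["tail_id"])),
            "flooded_init": parse_flag(row["flooded_init"]),
            # Assumption: test files lack the target column, so it may be None.
            "flooded_final": parse_flag(target) if target is not None else None,
        })
    return records


observation = load_observation(io.StringIO(SAMPLE))
```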

Extra information about the edges can be found inside edge_info.csv:

  • longitude: Longitude of the edge's center.
  • latitude: Latitude of the edge's center.
  • altitude: Elevation of the edge’s center.
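The geographic attributes can be indexed by edge so they can be attached to every observation. The sketch below assumes that edge_info.csv identifies each edge by head_id and tail_id columns, which is not stated above. The coordinates and altitudes are invented.

```python
import csv
import io

# Made-up rows; the head_id/tail_id key columns are an assumption about
# how edge_info.csv identifies each edge.
EDGE_INFO = """head_id,tail_id,longitude,latitude,altitude
152356047,152356048,-95.3698,29.7604,12.0
152356048,152356049,-95.3701,29.7610,10.5
"""

GEO_COLUMNS = ("longitude", "latitude", "altitude")


def load_edge_info(handle):
    """Return {(head_id, tail_id): {longitude, latitude, altitude}}."""
    info = {}
    for row in csv.DictReader(handle):
        key = (int(row["head_id"]), int(row["tail_id"]))
        info[key] = {name: float(row[name]) for name in GEO_COLUMNS}
    return info


edge_info = load_edge_info(io.StringIO(EDGE_INFO))
features = edge_info[(152356047, 152356048)]
```

Because the layout is identical across simulations, this lookup only needs to be built once and can then be reused for all training and test observations.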

OBJECTIVE

Your task is to devise a predictive model that uses this data to predict the flooded_final column for each sample in the test set (./test).
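Before training a model, it can help to have a reference point. The sketch below is a naive persistence baseline that predicts flooded_final as equal to flooded_init and scores it with accuracy. It is not the method required by the assignment, the choice of accuracy as the metric is an assumption, and the flag values are invented.

```python
def persistence_baseline(flooded_init):
    """Predict that every edge keeps its initial flooded state."""
    return list(flooded_init)


def accuracy(predicted, actual):
    """Fraction of edges whose predicted state matches the actual state."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Made-up flags for four edges of one observation.
flooded_init = [True, False, False, True]
flooded_final = [True, True, False, True]

predictions = persistence_baseline(flooded_init)
score = accuracy(predictions, flooded_final)
```

Any trained model should at least outperform this kind of baseline on held-out training observations.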
