diff --git a/code/python_foundation/13_creating_spatial_data.ipynb b/code/python_foundation/13_creating_spatial_data.ipynb index 7f892e56..2abc74a8 100644 --- a/code/python_foundation/13_creating_spatial_data.ipynb +++ b/code/python_foundation/13_creating_spatial_data.ipynb @@ -1,251 +1,251 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "C-1GwDd2PuuM" - }, - "source": [ - "# Creating Spatial Data\n", - "\n", - "A common operation in spatial analysis is to take non-spatial data, such as CSV files, and creating a spatial dataset from it using coordinate information contained in the file. GeoPandas provides a convenient way to take data from a delimited-text file, create geometry and write the results as a spatial dataset.\n", - "\n", - "We will read a tab-delimited file of places, filter it to a feature class, create a GeoDataFrame and export it as a GeoPackage file.\n", - "\n", - "![](https://github.com/spatialthoughts/python-foundation-web/blob/master/images/python_foundation/geonames_mountains.png?raw=1)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "e9RmLSUPPuuN" - }, - "outputs": [], - "source": [ - "import os\n", - "import pandas as pd\n", - "import geopandas as gpd" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "jU6wKHEQPuuO" - }, - "outputs": [], - "source": [ - "data_pkg_path = 'data/geonames/'\n", - "filename = 'US.txt'\n", - "path = os.path.join(data_pkg_path, filename)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "_aBjuXsNPuuO" - }, - "source": [ - "## Reading Tab-Delimited Files\n", - "\n", - "The source data comes from [GeoNames](https://en.wikipedia.org/wiki/GeoNames) - a free and open database of geographic names of the world. It is a huge database containing millions of records per country. The data is distributed as country-level text files in a tab-delimited format. 
The files do not contain a header row with column names, so we need to specify them when reading the data. The data format is described in detail on the [Data Export](https://www.geonames.org/export/) page.\n", - "\n", - "We specify the separator as **\\\\t** (tab) as an argument to the `read_csv()` method. Note that the file for USA has more than 2M records." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "4bzIQHC0PuuP" - }, - "outputs": [], - "source": [ - "column_names = [\n", - " 'geonameid', 'name', 'asciiname', 'alternatenames',\n", - " 'latitude', 'longitude', 'feature class', 'feature code',\n", - " 'country code', 'cc2', 'admin1 code', 'admin2 code',\n", - " 'admin3 code', 'admin4 code', 'population', 'elevation',\n", - " 'dem', 'timezone', 'modification date'\n", - "]\n", - "\n", - "df = pd.read_csv(path, sep='\\t', names=column_names)\n", - "print(df.info())" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "VshmRaM9PuuP" - }, - "source": [ - "## Filtering Data\n", - "\n", - "The input data as a column `feature_class` categorizing the place into [9 feature classes](https://www.geonames.org/export/codes.html). We can select all rows with the value `T` with the category *mountain,hill,rock...*" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "la68cvnEPuuP" - }, - "outputs": [], - "source": [ - "mountains = df[df['feature class']=='T']\n", - "print(mountains.head()[['name', 'latitude', 'longitude', 'dem','feature class']])" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "xT_e9-RoPuuP" - }, - "source": [ - "## Creating Geometries\n", - "\n", - "GeoPandas has a conveinent function `points_from_xy()` that creates a Geometry column from X and Y coordinates. We can then take a Pandas dataframe and create a GeoDataFrame by specifying a CRS and the geometry column." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "M1CvEn3iPuuP" - }, - "outputs": [], - "source": [ - "geometry = gpd.points_from_xy(mountains.longitude, mountains.latitude)\n", - "gdf = gpd.GeoDataFrame(mountains, crs='EPSG:4326', geometry=geometry)\n", - "print(gdf.info())" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "f3JryST0PuuP" - }, - "source": [ - "## Writing Files\n", - "\n", - "We can write the resulting GeoDataFrame to any of the supported vector data format. Here we are writing it as a new GeoPackage file." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "GIlU7-bXPuuQ" - }, - "source": [ - "You can open the resulting geopackage in a GIS and view the data." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "iIHrl0MUPuuQ" - }, - "outputs": [], - "source": [ - "output_dir = 'output'\n", - "output_filename = 'mountains.gpkg'\n", - "output_path = os.path.join(output_dir, output_filename)\n", - "\n", - "gdf.to_file(driver='GPKG', filename=output_path, encoding='utf-8')\n", - "print('Successfully written output file at {}'.format(output_path))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "GNEo8XYlPuuQ" - }, - "source": [ - "## Exercise\n", - "\n", - "The data package contains multiple geonames text files from different countries in the `geonames/` folder. We have the code below that reads all the files, extract the mountain features and merges them in a single DataFrame using the `pd.concat()` function.\n", - "\n", - "The exercise is to convert the merged DataFrame to a GeoDataFrame as save it as a shapefile." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "HA4-XWeRPuuQ" - }, - "outputs": [], - "source": [ - "import os\n", - "import pandas as pd\n", - "import geopandas as gpd\n", - "\n", - "data_pkg_path = 'data/geonames/'\n", - "files = os.listdir(data_pkg_path)\n", - "\n", - "column_names = [\n", - " 'geonameid', 'name', 'asciiname', 'alternatenames',\n", - " 'latitude', 'longitude', 'feature class', 'feature code',\n", - " 'country code', 'cc2', 'admin1 code', 'admin2 code',\n", - " 'admin3 code', 'admin4 code', 'population', 'elevation',\n", - " 'dem', 'timezone', 'modification date'\n", - "]\n", - "\n", - "dataframes = []\n", - "for file in files:\n", - " path = os.path.join(data_pkg_path, file)\n", - " df = pd.read_csv(path, sep='\\t', names=column_names)\n", - " mountains = df[df['feature class']=='T']\n", - " dataframes.append(mountains)\n", - "\n", - "merged = pd.concat(dataframes)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "od1oRTKEQYrX" - }, - "outputs": [], - "source": [ - "merged" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Ym_GrrbFPuuQ" - }, - "source": [ - "----" - ] - } - ], - "metadata": { - "colab": { - "provenance": [] - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.0" - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "C-1GwDd2PuuM" + }, + "source": [ + "# Creating Spatial Data\n", + "\n", + "A common operation in spatial analysis is to take non-spatial data, such as CSV files, and creating a spatial dataset from it using coordinate information contained in the file. 
GeoPandas provides a convenient way to take data from a delimited-text file, create geometry and write the results as a spatial dataset.\n", + "\n", + "We will read a tab-delimited file of places, filter it to a feature class, create a GeoDataFrame and export it as a GeoPackage file.\n", + "\n", + "![](https://github.com/spatialthoughts/python-foundation-web/blob/master/images/python_foundation/geonames_mountains.png?raw=1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e9RmLSUPPuuN" + }, + "outputs": [], + "source": [ + "import os\n", + "import pandas as pd\n", + "import geopandas as gpd" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "jU6wKHEQPuuO" + }, + "outputs": [], + "source": [ + "data_pkg_path = 'data/geonames/'\n", + "filename = 'US.txt'\n", + "path = os.path.join(data_pkg_path, filename)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_aBjuXsNPuuO" + }, + "source": [ + "## Reading Tab-Delimited Files\n", + "\n", + "The source data comes from [GeoNames](https://en.wikipedia.org/wiki/GeoNames) - a free and open database of geographic names of the world. It is a huge database containing millions of records per country. The data is distributed as country-level text files in a tab-delimited format. The files do not contain a header row with column names, so we need to specify them when reading the data. The data format is described in detail on the [Data Export](https://www.geonames.org/export/) page.\n", + "\n", + "We specify the separator as **\\\\t** (tab) as an argument to the `read_csv()` method. Note that the file for USA has more than 2M records." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "4bzIQHC0PuuP" + }, + "outputs": [], + "source": [ + "column_names = [\n", + " 'geonameid', 'name', 'asciiname', 'alternatenames',\n", + " 'latitude', 'longitude', 'feature class', 'feature code',\n", + " 'country code', 'cc2', 'admin1 code', 'admin2 code',\n", + " 'admin3 code', 'admin4 code', 'population', 'elevation',\n", + " 'dem', 'timezone', 'modification date'\n", + "]\n", + "\n", + "df = pd.read_csv(path, sep='\\t', names=column_names)\n", + "print(df.info())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VshmRaM9PuuP" + }, + "source": [ + "## Filtering Data\n", + "\n", + "The input data has a column `feature class` categorizing each place into one of [9 feature classes](https://www.geonames.org/export/codes.html). We can select all rows with the value `T`, which is the category *mountain,hill,rock...*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "la68cvnEPuuP" + }, + "outputs": [], + "source": [ + "mountains = df[df['feature class']=='T']\n", + "print(mountains.head()[['name', 'latitude', 'longitude', 'dem','feature class']])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xT_e9-RoPuuP" + }, + "source": [ + "## Creating Geometries\n", + "\n", + "GeoPandas has a convenient function `points_from_xy()` that creates a geometry column from X and Y coordinates. We can then take a Pandas DataFrame and create a GeoDataFrame by specifying a CRS and the geometry column." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "M1CvEn3iPuuP" + }, + "outputs": [], + "source": [ + "geometry = gpd.points_from_xy(mountains.longitude, mountains.latitude)\n", + "gdf = gpd.GeoDataFrame(mountains, crs='EPSG:4326', geometry=geometry)\n", + "print(gdf.info())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f3JryST0PuuP" + }, + "source": [ + "## Writing Files\n", + "\n", + "We can write the resulting GeoDataFrame to any of the supported vector data formats. The format is inferred from the file extension. Use `.shp` if you want to save the results as a shapefile. Here we are writing it as a new GeoPackage file, so we use the `.gpkg` extension." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GIlU7-bXPuuQ" + }, + "source": [ + "You can open the resulting GeoPackage in a GIS and view the data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "iIHrl0MUPuuQ" + }, + "outputs": [], + "source": [ + "output_dir = 'output'\n", + "output_filename = 'mountains.gpkg'\n", + "output_path = os.path.join(output_dir, output_filename)\n", + "\n", + "gdf.to_file(filename=output_path, encoding='utf-8')\n", + "print('Successfully written output file at {}'.format(output_path))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GNEo8XYlPuuQ" + }, + "source": [ + "## Exercise\n", + "\n", + "The data package contains multiple geonames text files from different countries in the `geonames/` folder. The code below reads all the files, extracts the mountain features and merges them into a single DataFrame using the `pd.concat()` function.\n", + "\n", + "The exercise is to convert the merged DataFrame to a GeoDataFrame and save it as a shapefile." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "HA4-XWeRPuuQ" + }, + "outputs": [], + "source": [ + "import os\n", + "import pandas as pd\n", + "import geopandas as gpd\n", + "\n", + "data_pkg_path = 'data/geonames/'\n", + "files = os.listdir(data_pkg_path)\n", + "\n", + "column_names = [\n", + " 'geonameid', 'name', 'asciiname', 'alternatenames',\n", + " 'latitude', 'longitude', 'feature class', 'feature code',\n", + " 'country code', 'cc2', 'admin1 code', 'admin2 code',\n", + " 'admin3 code', 'admin4 code', 'population', 'elevation',\n", + " 'dem', 'timezone', 'modification date'\n", + "]\n", + "\n", + "dataframes = []\n", + "for file in files:\n", + " path = os.path.join(data_pkg_path, file)\n", + " df = pd.read_csv(path, sep='\\t', names=column_names)\n", + " mountains = df[df['feature class']=='T']\n", + " dataframes.append(mountains)\n", + "\n", + "merged = pd.concat(dataframes)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "od1oRTKEQYrX" + }, + "outputs": [], + "source": [ + "merged" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ym_GrrbFPuuQ" + }, + "source": [ + "----" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.0" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file