Wordcloud: Presidential conferences analysis (México)

With so many presidential conferences held in México over roughly the last two years, have you ever wondered which words are spoken most often? Who has taken part in them, and what did they say? Let's analyze them...

This project is based on an exercise by Prof. Jorge Luis Novelo and his post on LinkedIn.

Project stages:

Extract: Finished, on the main branch (some improvements on the way)
Transform: WIP, on the develop and feature_* branches
Load: Pending

Setup

Clone this repository

git clone git@github.com:FernandoTorresL/scraping-conferencias.git

Prepare the environment

# Create and activate the environment
python3 -m venv venv
source venv/bin/activate

python3 -m pip install --upgrade pip

pip3 install wheel
pip3 install jupyter
pip3 install bs4 requests numpy
pip3 install pandas
pip3 install nltk wordcloud stop-words
pip3 install scrapy autopep8

Or use the requirements.txt file instead:

# Create and activate the environment
python3 -m venv venv
source venv/bin/activate

python3 -m pip install --upgrade pip

pip install -r requirements.txt
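Before opening the notebook or running the spider, a quick check can confirm that the active interpreter actually sees the installed packages. This is a minimal sketch: the module list is an assumption based on the packages this README installs (plus Scrapy, which the spider needs). Note that some import names differ from the pip package names; for example, the stop-words package is imported as stop_words.

```python
# Sanity check for the environment: report which of the project's modules
# cannot be found by the active Python interpreter.
# ASSUMPTION: this module list is inferred from the install commands in
# this README; it is not taken from the project's own code.
import importlib.util

REQUIRED_MODULES = [
    "bs4", "requests", "numpy", "pandas",
    "nltk", "wordcloud", "stop_words", "scrapy",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All modules found.")
```

Run it with the environment's python3 after the setup above; any module it lists still needs to be installed.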

Use the Jupyter Notebook

Change to the notebook directory

cd notebook

Run Jupyter Notebook (with the environment active)

jupyter notebook

Open scraping-conferencias.ipynb

Use the Scrapy spider

cd project_live/extract/wordcloud_conferences
scrapy crawl get_transcriptions

It creates the files data_{timestamp*}.json and data_{timestamp*}.csv in project_live/extract/data_extracted/

*UTC date and time, in ISO format
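The {timestamp*} naming resembles what Scrapy's feed exports produce with the `%(time)s` URI placeholder, which is filled with a UTC timestamp in ISO format. Assuming the spider is configured through Scrapy's `FEEDS` setting (this README does not show its actual configuration), a sketch like the following would write both files side by side. The helper `latest_extraction` is hypothetical; it picks the newest run by file name, because timestamps in one fixed ISO format sort chronologically as plain strings.

```python
# Hypothetical sketch: how the data_{timestamp}.json/.csv files could be
# produced with Scrapy's FEEDS setting, and how to find the newest one.
# ASSUMPTION: the spider's real settings are not shown in this README.
from pathlib import Path

# Would live in the Scrapy project's settings.py. Scrapy replaces %(time)s
# with a UTC ISO timestamp (colons replaced by "-"). The relative paths
# assume the crawl runs from project_live/extract/wordcloud_conferences.
FEEDS = {
    "../data_extracted/data_%(time)s.json": {"format": "json", "encoding": "utf8"},
    "../data_extracted/data_%(time)s.csv": {"format": "csv", "encoding": "utf8"},
}


def latest_extraction(folder, suffix=".json"):
    """Return the newest data_*<suffix> file in folder, or None if none exist.

    Timestamps of one fixed ISO format sort lexicographically in
    chronological order, so the last file name is the most recent run.
    """
    files = sorted(Path(folder).glob(f"data_*{suffix}"))
    return files[-1] if files else None
```

For example, `latest_extraction("project_live/extract/data_extracted")` run from the repository root would return the path of the most recent JSON file, ready to load with `pandas.read_json`.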

Transform Stage:

WIP (Work in Progress)

Future updates and automation in progress (May 09, 2021)
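Since this stage is still a work in progress, the following is only a minimal sketch of what a Transform step could look like for the question this project asks (which words are spoken most). It assumes each transcript is available as a plain string; `word_frequencies` and the short `STOP_WORDS` set are hypothetical placeholders, not the project's code.

```python
# Minimal sketch of a Transform step: count the most frequent words in
# conference transcripts, ignoring Spanish stop words.
# ASSUMPTIONS: transcripts are plain strings; STOP_WORDS is a tiny
# illustrative list, not the list this project uses.
import re
from collections import Counter

STOP_WORDS = {
    "de", "la", "que", "el", "en", "y", "a", "los", "las", "se",
    "del", "por", "un", "una", "con", "no", "es", "lo", "para",
}


def word_frequencies(texts, stop_words=STOP_WORDS, min_length=3):
    """Count lowercase words across texts, skipping stop words,
    pure numbers and tokens shorter than min_length."""
    counts = Counter()
    for text in texts:
        # \w+ matches Unicode letters, so words like "días" or "señor" stay whole.
        for word in re.findall(r"\w+", text.lower()):
            if word in stop_words or word.isdigit() or len(word) < min_length:
                continue
            counts[word] += 1
    return counts
```

For example, `word_frequencies(texts).most_common(20)` returns the 20 most frequent words. The result is a Counter (a dict subclass), so it can be passed to `WordCloud().generate_from_frequencies()` from the wordcloud package. In practice, a fuller Spanish list such as NLTK's `stopwords.words('spanish')` (after `nltk.download('stopwords')`) or `get_stop_words('spanish')` from the stop-words package would replace the illustrative set.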

Follow me

fertorresmx.dev

🌐 Twitter, Instagram: @fertorresmx
