Hello there 👋🏾!
I'm an economist and data scientist with over 7 years of professional experience and more than 10 years working with data. I specialize in using quantitative methods to process, analyze, and visualize data for decision-making purposes.
Throughout my career, I have worked on various projects involving the processing and visualization of open data to generate economic and social indicators used for diagnosing and evaluating public policies. I have also participated in projects that required a range of skills, including developing statistical and econometric models, processing geospatial information, evaluating experimental treatments, monitoring media using web scraping, and creating web platforms and APIs, among other work involving technology and analysis. More details are in the next sections (below the CV PDF).
I have a passion for teaching programming, and for over 3 years, I have had the privilege of delivering courses on introductory data management, information visualization, and machine learning. Currently, I'm a data science teacher at Le Wagon.
My main expertise is data analysis, visualization, and statistical modeling, and over the years I have gained additional specialized skills, such as:
These are some of the projects that I have developed:
While working at LNPP-CIDE:
I was in charge of media monitoring, so I created a platform to automate custom web searches for mentions of the institution, its professors, and its research centers in digital newspapers. I developed a script able to scrape more than one hundred Mexican newspapers and gather each article's text and metadata (date, author, title). The platform allowed end users (the PR department) to retrieve the information and export it as Excel files for their reports.
I created a platform to scrape the Mexican federal government's public procurement website every day. With this information, I used regular expressions to find contracts for which we, as an institution, could submit a proposal.
I created the backend for a platform that scraped around 50 specialized newspapers to find news about innovation and new technologies in economic industries relevant to the State of Guanajuato government.
As a freelancer:
I have scraped thousands of news articles and created datasets about violence against journalists, homicides, lynchings, and mentions of public institutions.
I created an executable Windows program (.exe) to scrape baseball statistics.
As a personal project, I scraped the Diario Oficial de la Federación (DOF) and created a public dataset. The DOF is the official government gazette where official documents, laws, decrees, regulations, and notices from the Mexican federal government are published.
A script to scrape search results for medicines approved by the Mexican medicines regulator (COFEPRIS): code
I created this repo for scraping product categories from the Walmart Mexico website, and this one for scraping product information (this was part of a technical interview).
More web scraping projects are available on my gist account.
For these projects, I use Python with libraries such as requests and BeautifulSoup, sometimes together with the Django framework to create web platforms; a minimal sketch of the pattern is below.
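A minimal sketch of that requests + BeautifulSoup pattern, assuming a hypothetical news site and CSS selectors (not any real newspaper's markup):

```python
# Minimal scraping sketch; the URL and selectors are hypothetical,
# not any real newspaper's markup.
import requests
from bs4 import BeautifulSoup

URL = "https://example-news.mx/politica"  # hypothetical section page

resp = requests.get(URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
articles = []
for node in soup.select("article"):  # hypothetical article container
    title = node.select_one("h2")
    date = node.select_one("time")
    articles.append({
        "title": title.get_text(strip=True) if title else None,
        "date": date.get("datetime") if date else None,
        "text": node.get_text(" ", strip=True),
    })
print(len(articles), "articles scraped")
```

In production, each newspaper gets its own selectors and the results are stored so a web platform (e.g. one built with Django) can serve and export them.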
I completed the Deep Learning Diploma (in person) offered by the CIMAT-INAOE consortium for Artificial Intelligence, as well as the Deep Learning Specialization (online) on Coursera. I have also taught machine learning models at CIDE and at Le Wagon.
Some of the projects where I applied statistical models and machine learning techniques are:
I worked with Kavak employees to train a regression model to estimate the expected amount to invest when buying a used car.
I used panel data models to identify anomalies in the number of beneficiaries of social programs prior to the 2018 national and state elections in Mexico.
My teammates and I won second place in the Anticorruption Datathon, using machine learning models to detect anomalies in the asset declarations of public employees.
My teammate and I won first place in the Brewing Data Cup, using a combination of linear programming algorithms and unsupervised learning models to optimize product distribution for one of the most important beer companies in Mexico. repo
My teammate and I won a gold medal in the Ecobici data visualization challenge by training a machine learning model that predicts the status of bike stations for the next 20 minutes.
My teammate and I won first place in the Mexico City mobility challenge by creating a model to predict expected trip times (from origin to destination) on the BRT system, Metrobús.
My teammates and I won first place in the Prison Datathon by creating an index for measuring living conditions in prisons.
My teammates and I won first place in the Anticorruption Datathon by creating an index for measuring irregularities in public procurement using PCA (a minimal sketch of the PCA-index idea follows this list).
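A minimal sketch of the PCA-based index idea, using simulated data with hypothetical indicator names (not the actual datathon dataset):

```python
# Build an index from the first principal component of standardized indicators.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "share_direct_awards": rng.uniform(0, 1, 500),  # hypothetical indicator
    "share_single_bidder": rng.uniform(0, 1, 500),  # hypothetical indicator
    "avg_days_to_award": rng.integers(1, 90, 500),  # hypothetical indicator
})

# Standardize so every indicator contributes on the same scale
X = StandardScaler().fit_transform(df)

# The first principal component captures the most shared variance;
# its scores can be rescaled into a 0-1 index
pca = PCA(n_components=1)
scores = pca.fit_transform(X).ravel()
df["irregularity_index"] = (scores - scores.min()) / (scores.max() - scores.min())
print(pca.explained_variance_ratio_)
```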
I have worked with microdata from complex survey designs to calculate customized estimates and econometric models. When using these datasets, it's important to account for elements of the survey design such as the strata, the finite population correction (FPC), and the sampling weights. Some of the surveys I have worked with are:
Population Census, Mexico. Look at this example.
Encuesta Nacional de Ingresos y Gastos de los Hogares, ENIGH, Mexico. I used this survey to participate in the Data Mexico Challenge.
Encuesta Nacional de Población Privada de la Libertad, ENPOL, Mexico. I used this survey to participate in the Prison Datathon.
Encuesta Nacional de Ocupación y Empleo, ENOE, Mexico.
Encuesta Nacional de Inclusión Financiera, ENIF, Mexico.
Encuesta de Movilidad Social, ESRU-EMOVI, Mexico. I used this survey for my Master's thesis.
Demographic and Health Survey, DHS, Colombia. I used this survey for my undergraduate thesis.
When working with complex surveys, I usually use Stata or R, although I have also used Python; below is a minimal sketch of how strata, weights, and the FPC enter an estimate.
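A minimal sketch with simulated data and hypothetical stratum sizes; the variance uses a simplified stratified formula rather than full Taylor linearization (real work would use Stata's svy or R's survey package):

```python
# Design-based mean for a stratified sample (simulated, simplified).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "stratum": rng.choice(["urban", "rural"], 1_000),
    "weight": rng.uniform(50, 500, 1_000),  # expansion factors
    "income": rng.lognormal(8, 1, 1_000),
})
N_h = {"urban": 120_000, "rural": 80_000}   # hypothetical stratum sizes
N = sum(N_h.values())

mean, var = 0.0, 0.0
for h, g in df.groupby("stratum"):
    W_h = N_h[h] / N                         # stratum share of the population
    ybar_h = np.average(g["income"], weights=g["weight"])
    mean += W_h * ybar_h
    # The FPC shrinks the variance when a stratum is sampled heavily
    fpc = 1 - len(g) / N_h[h]
    var += W_h**2 * fpc * g["income"].var(ddof=1) / len(g)

print(f"mean = {mean:,.0f}, se = {var ** 0.5:,.2f}")
```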
I have experience using GIS tools such as QGIS, ArcGIS, and GeoDa to analyze and visualize geospatial data. I have also worked on projects where I used Python to automate GIS processes, saving time and increasing efficiency.
As a data scientist, I have had the opportunity to work on projects making extensive use of geospatial calculations, vector operations, and map visualizations. Specifically, I have experience in tasks such as calculating distances, areas, spatial autocorrelation, and clustering, and performing vector operations such as centroids, buffers, unions, intersections, and differences, among others. I have also developed skills in map visualization techniques such as choropleths, dot density, heatmaps, animations, and interactive visualizations. Additionally, I have experience with raster operations, including georeferencing images, computing zonal statistics, and analyzing differences over time. A minimal geopandas sketch of these vector operations follows.
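A minimal sketch of common vector operations with geopandas, using hypothetical points around Mexico City:

```python
# Buffers, union, centroid, distances, and areas with geopandas.
import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[Point(-99.13, 19.43), Point(-99.18, 19.40), Point(-99.10, 19.35)],
    crs="EPSG:4326",
)

# Reproject to a metric CRS so distances and areas come out in meters
gdf_m = gdf.to_crs(epsg=6372)  # Mexico ITRF2008 / LCC

buffers = gdf_m.buffer(1_000)   # 1 km buffers around each point
union = buffers.unary_union     # dissolve the buffers into one geometry
print("centroid:", union.centroid)
print("area:", union.area / 1e6, "km²")
print(gdf_m.distance(gdf_m.geometry.iloc[0]))  # distances to point "a", in meters
```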
These are some of my most relevant works related to GIS that are publicly available:
My blog entries about geospatial analysis:
All other entries are related to data analysis and visualization.
All trips in the Mexico City public bike sharing system during one day: code and video.
Comparing flight patterns after redesigning Mexico City's airspace: video
Poverty estimation for census areas within Mexican metropolitan areas
Benito Juarez Municipality average land value: code and video
I made the maps of these working papers:
As an economist, I took specialized courses on causal inference using experimental and quasi-experimental techniques (difference-in-differences, propensity score matching, synthetic controls, panel data, and instrumental variables); a minimal difference-in-differences sketch appears after the examples below.
While working as a consultant for CLEAR-LAC, I evaluated the technical quality and statistical robustness of more than 20 impact evaluation reports delivered by external consultants to the Inter-American Development Bank (IDB). These reports included experimental and quasi-experimental methodologies.
While working at LNPP-CIDE, I was in charge of the statistical analysis to evaluate an experimental policy that aimed to increase COVID-19 testing among the Mexico City population by sending different SMS messages.
As a freelancer, I calculated the sample size and performed the subsequent statistical impact analysis for a political experiment run through Facebook campaigns.
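A minimal difference-in-differences sketch with statsmodels, using simulated data and hypothetical variable names:

```python
# The coefficient on treated:post is the DiD estimate of the effect.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2_000
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),  # treatment group indicator
    "post": rng.integers(0, 2, n),     # post-intervention period indicator
})
# Simulate a true effect of 2.0 on the treated group after the intervention
df["y"] = (1.0 + 0.5 * df["treated"] + 0.3 * df["post"]
           + 2.0 * df["treated"] * df["post"] + rng.normal(0, 1, n))

# "treated * post" expands to main effects plus the interaction
model = smf.ols("y ~ treated * post", data=df).fit(cov_type="HC1")
print(model.summary().tables[1])
```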
I have worked extensively with texts, analyzing thousands of documents in tasks like:
Preprocessing and cleaning text corpora.
Visualizing the most frequent terms.
Searching for patterns like proper names, addresses, mentions, events, and specific grammatical structures using regular expressions.
Finding the main topics in a corpus using machine learning techniques like LDA and clustering (a minimal sketch follows this list).
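A minimal topic-modeling sketch with scikit-learn's LDA, using a toy corpus (real projects involve thousands of cleaned documents):

```python
# Fit LDA on bag-of-words counts and list the top words per topic.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the court approved the contract for the new highway",
    "the judge reviewed the homicide case evidence",
    "the ministry published the procurement contract results",
    "police investigated the homicide near the highway",
]

# Bag-of-words counts are the usual input for LDA
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```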
You can check my public repository (in Spanish) for the workshop that I presented to employees of the prosecutor's office of the State of Hidalgo:
I have used Flask, Django, and Ruby on Rails for several web development projects; a minimal Flask sketch follows this list:
As mentioned before, I created a platform for media monitoring at CIDE and another for finding public procurement opportunities.
When working at LNPP-CIDE, I created a platform for updating and querying social and economic indicators and their metadata.
When I did my web development bootcamp, I contributed to the backend and frontend of our final project, which was developed with Ruby on Rails and ReactJS.
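A minimal Flask sketch of the kind of endpoint such platforms expose; the route and fields are hypothetical, not the actual CIDE platform API:

```python
# Tiny JSON API serving (here, hard-coded) scraped articles.
from flask import Flask, jsonify, request

app = Flask(__name__)

# In a real platform this would query a database of scraped articles
ARTICLES = [
    {"title": "Example article", "source": "example-news", "date": "2020-01-01"},
]

@app.route("/articles")
def articles():
    source = request.args.get("source")  # optional ?source= filter
    rows = [a for a in ARTICLES if source is None or a["source"] == source]
    return jsonify(rows)

if __name__ == "__main__":
    app.run(debug=True)
```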
When working at LNPP-CIDE:
I was the only developer(-ish), using AWS services such as EC2, EBS, RDS, S3, Batch jobs, Lambda functions, and more.
I used Azure services to automate the transcription of audio interviews into text, using speech recognition, storage buckets, and step functions.
As a teacher at Le Wagon, I teach cloud services on GCP, such as VM instances, GCS, Cloud Run, BigQuery, and more.
When working at LNPP-CIDE, I taught Python programming for data management, visualization, and machine learning for 3 years.
During this period, I taught these topics to employees of renowned institutions such as Banco de México (Banxico), the Comisión Federal de Competencia Económica (COFECE), and the prosecutor's office of the State of Hidalgo.
I currently teach the full sequence of the Le Wagon data science bootcamp, including complex topics such as deep learning and cloud services.