Herman Castillo Rendón
Data Analyst

Master of Business Administration, Universitat de València
Electronic Engineer, Universidad de Nariño
Math for Programming, PLATZI
Data Science, PLATZI

Services

DATA SCRAPING

When the available information is not enough, it may be necessary to acquire data from other sources using web crawlers and store it in databases or spreadsheets. This step is part of the data mining process; common Python tools for it are Selenium, Scrapy, Requests, and BS4 (Beautiful Soup), along with query languages such as XPath.
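As a minimal sketch of this step, the snippet below parses a small, hypothetical HTML table with BS4 and turns it into rows ready for a spreadsheet; in a real crawl the HTML would come from a request or a Selenium-driven browser.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a scraped page;
# a real crawl would fetch it, e.g. with requests.get(url).text
html = """
<table id="prices">
  <tr><td class="item">Laptop</td><td class="price">950</td></tr>
  <tr><td class="item">Monitor</td><td class="price">230</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for tr in soup.select("#prices tr"):
    item = tr.select_one(".item").get_text(strip=True)
    price = float(tr.select_one(".price").get_text(strip=True))
    rows.append({"item": item, "price": price})

print(rows)
```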

DATA ANALYSIS

Through this process, the data is inspected, cleaned, transformed, and modeled to extract useful information that supports decision-making. "Data is... analyzed to answer questions, test hypotheses, or disprove theories". Statistical knowledge, Python tools to explore and transform data such as pandas, and libraries like SciPy or scikit-learn help to draw conclusions and communicate the findings inside the data.
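A tiny example of the inspect-clean-transform loop with pandas, on made-up sales records (the column names and values are illustrative):

```python
import pandas as pd

# Toy sales records with a missing value, standing in for raw data
df = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "sales": [100.0, None, 80.0, 120.0],
})

# Clean: fill the missing sale with the column median
df["sales"] = df["sales"].fillna(df["sales"].median())

# Transform and summarize: average sales per region
summary = df.groupby("region")["sales"].mean()
print(summary)
```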

DATA VIZ

This process aims to show the findings obtained through the data analysis, using a presentation format suited to the target audience.
Common tools for technical audiences are matplotlib and seaborn; for more polished, interactive visualizations, Plotly, Tableau, Data Studio, or Power BI can be used.
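For instance, a basic matplotlib chart of hypothetical monthly sales (the figures are invented for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 128, 150]

fig, ax = plt.subplots()
ax.bar(months, sales)
ax.set_title("Monthly sales")
ax.set_ylabel("Units sold")
fig.savefig("monthly_sales.png")
```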

About me

Download CV

Hi! I'm Herman, an electronic engineer with a Master of Business Administration.

My experience managing projects taught me the importance of transforming data into information to improve the performance of company processes, operations, and projects.

This inspired me to fully dedicate myself to data analysis, giving me the opportunity to help companies succeed and to guide them toward a data-driven philosophy.

Portfolio

Logistic Regression Model for Imbalanced Distributed Data

This project builds a classification model to identify whether a user will stop paying a loan (a defaulter is labeled "1"). Metrics such as recall and F1 show how trustworthy the model is; furthermore, the logistic regression model lets us see which variables matter most for identifying a defaulter.
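A minimal sketch of this approach on synthetic, imbalanced data (not the project's real dataset): `class_weight="balanced"` compensates for the skew, and the fitted coefficients hint at variable importance.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, f1_score

# Synthetic imbalanced data: minority class 1 stands in for defaulters
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))
# Default probability driven mostly by the first feature
p = 1 / (1 + np.exp(-(2.5 * X[:, 0] - 2.5)))
y = (rng.random(n) < p).astype(int)

# class_weight="balanced" reweights classes to counter the imbalance
model = LogisticRegression(class_weight="balanced").fit(X, y)
pred = model.predict(X)

rec = recall_score(y, pred)
f1 = f1_score(y, pred)
print("recall:", round(rec, 2), "F1:", round(f1, 2))
# Coefficient magnitudes hint at which variables matter most
print("coefficients:", model.coef_[0])
```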

Data Treatment and Supply Chain Forecasting

This project shows, step by step, the process from EDA to building a model that forecasts the sales of a product over a period of time.

It covers vectorization of data, clustering, and forecasting models such as decision trees, random forests, and XGBoost.
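A compact sketch of the forecasting idea on an invented sales series (swapping in a random forest, one of the models named above): lagged values become features, and the model predicts the next period.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic monthly sales with a trend plus noise (stand-in for real data)
rng = np.random.default_rng(42)
sales = 100 + 2 * np.arange(36) + rng.normal(0, 3, 36)

# Build lag features: predict this month from the previous 3 months
lags = 3
X = np.column_stack([sales[i:len(sales) - lags + i] for i in range(lags)])
y = sales[lags:]

# Train on all but the last point, forecast the held-out month
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:-1], y[:-1])
forecast = model.predict(X[-1:])[0]
print("forecast:", round(forecast, 1), "actual:", round(y[-1], 1))
```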

Scraping and Clustering Properties from a Website

This project shows how to get data from the internet and transform it into useful information that supports decision-making. The final result is a Tableau visualization that lets the user interact with the findings in the data.
The project uses Python, Selenium, descriptive statistics, and clustering algorithms.
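The clustering step might look like the sketch below, using k-means on hypothetical (price, area) pairs; the numbers are invented, and the real project's features and algorithm may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (price, area) pairs for scraped properties, two clear groups
rng = np.random.default_rng(1)
cheap = rng.normal([100_000, 60], [10_000, 5], size=(20, 2))
expensive = rng.normal([400_000, 180], [30_000, 15], size=(20, 2))
X = np.vstack([cheap, expensive])

# Scale features so price does not dominate the distance metric
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
labels = kmeans.labels_
print(labels)
```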

Bayes Theorem applied to Data Analysis

This is a brief notebook showing how probability basics can be applied to analyze a problem, using basic programming tools in Python.
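The core computation is Bayes' theorem, P(A|B) = P(B|A)·P(A) / P(B). With invented numbers for illustration (1% of users default; a screening model flags 95% of defaulters and 10% of non-defaulters), the posterior can be computed in a few lines:

```python
# Hypothetical prior and likelihoods (illustrative numbers only)
p_default = 0.01            # P(default)
p_flag_given_default = 0.95  # sensitivity
p_flag_given_ok = 0.10       # false-positive rate

# Total probability of being flagged (law of total probability)
p_flag = (p_flag_given_default * p_default
          + p_flag_given_ok * (1 - p_default))

# Bayes' theorem: probability of default given a flag
p_default_given_flag = p_flag_given_default * p_default / p_flag
print(round(p_default_given_flag, 4))
```

Even with a sensitive model, the low prior keeps the posterior under 9%, which is exactly the kind of insight the notebook explores.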

Python Script to Scrape and Analyze Jobs from LinkedIn

This project uses Selenium to scrape up to 500 vacancies from LinkedIn and then creates a spreadsheet with the description of each job.
This spreadsheet is then processed by another Python script that creates a data visualization showing which skills are most in demand, based on the data in the file.
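The skill-counting step of the second script could be sketched like this; the job descriptions and skill list below are hypothetical stand-ins for the scraped spreadsheet.

```python
from collections import Counter
import re

# Hypothetical scraped job descriptions (real ones would come from the file)
descriptions = [
    "Looking for a data analyst with Python and SQL experience.",
    "Must know SQL, Excel and Tableau.",
    "Python developer with Pandas and SQL skills.",
]

skills = ["python", "sql", "excel", "tableau", "pandas"]
counts = Counter()
for text in descriptions:
    # Count each skill at most once per description
    words = set(re.findall(r"[a-z]+", text.lower()))
    counts.update(skill for skill in skills if skill in words)

print(counts.most_common())
```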

Artificial Neural Networks to Infer the Behavior of a Population

This project tries to estimate whether a target variable is affected by other predictor variables, in order to draw a conclusion.
This project is under construction.
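Since the project is still in progress, the sketch below only illustrates the general idea on synthetic data: a small feed-forward network learns a target that depends on one predictor. The data, network size, and library choice are assumptions, not the project's final design.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic population: the target depends only on the first predictor
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0).astype(int)

# Small feed-forward network as a stand-in for the project's ANN
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X, y)
accuracy = net.score(X, y)
print("training accuracy:", round(accuracy, 2))
```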