
LEE_Portfolio

Project 2: Cyclistic Bike-Share

Project Overview: https://www.kaggle.com/mollayo/bikeshare-riding-pattern

An optional case study at the end of the Google Data Analytics Professional Program.
It is an opportunity to analyze historical bicycle trip data to identify trends, with a focus on how casual riders behave differently from annual members.

This analysis will help executives make decisions about digital marketing programs and strategies to convert casual riders into members. As a junior data analyst at Cyclistic (a fictional entity), I completed this project by going through the six phases of the data analysis process: ASK, PREPARE, PROCESS, ANALYZE, SHARE, and ACT.

Phase 1. ASK

Clearly identify business task:

Identify stakeholders:

Phase 2. PREPARE

Collect or download data:

Store data:

Identify data format:

Phase 3. PROCESS

Choose cleaning tool:

Ensure data’s integrity:

Document the cleaning log:

Phase 4. ANALYZE

Code and Resources used:

Debug the query: make sure the data it returns is valid, complete, and clean before beginning any analysis
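As a rough illustration of what "valid, complete, and clean" can mean in practice, the sketch below runs a few sanity checks on a trip table with pandas. It is not the query used in this project: the column names (`ride_id`, `started_at`, `ended_at`, `member_casual`), the helper `validate_trips`, and the sample rows are all assumptions made up for this example.

```python
import pandas as pd


def validate_trips(trips: pd.DataFrame) -> dict:
    """Return simple data-quality counts for a trip table (hypothetical schema)."""
    duration = trips["ended_at"] - trips["started_at"]
    return {
        "rows": len(trips),
        # Completeness: total number of missing cells across all columns.
        "missing_values": int(trips.isna().sum().sum()),
        # Cleanliness: ride IDs that appear more than once.
        "duplicate_ride_ids": int(trips["ride_id"].duplicated().sum()),
        # Validity: rides that end at or before the time they start.
        "non_positive_durations": int((duration <= pd.Timedelta(0)).sum()),
    }


# Tiny made-up sample, for illustration only.
sample = pd.DataFrame({
    "ride_id": ["a1", "a2", "a2", "a4"],
    "started_at": pd.to_datetime([
        "2021-01-01 08:00", "2021-01-01 09:00",
        "2021-01-01 09:00", "2021-01-02 10:00",
    ]),
    "ended_at": pd.to_datetime([
        "2021-01-01 08:20", "2021-01-01 09:30",
        "2021-01-01 09:30", "2021-01-02 09:50",
    ]),
    "member_casual": ["member", "casual", "casual", None],
})

report = validate_trips(sample)
print(report)
```

If any of the counts is non-zero, the affected rows would normally be inspected and either corrected or excluded, with the decision recorded in the cleaning log.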

Exploratory Data Analysis:

Phase 5. SHARE

Create effective data visualization to present the findings above:

Phase 6. ACT

Draw conclusion:

Provide marketing recommendations:

Additional recommendations:

Project 1: Healthcare Market Research

Project Overview: https://github.com/ahdfks/project_HC_Asia

Code and Resources used

Python version: 3.8.5
Packages: pandas, numpy, seaborn, matplotlib.pyplot
Tableau: 2020.4

LinkedIn company profile scraping

Scraped 1500+ companies in the pharmaceuticals, biotechnology, and medical device industries in the Greater China Region.
From each company profile, we collected the following:

Data Cleaning

After scraping the data, it is necessary to clean it up.
I made the following changes and created the following variables:

Exploratory Data Analysis

Plotted the distributions of the cleaned data and computed value counts for the categorical variables. Below are a few highlights from the pivot tables:

thera type
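For readers unfamiliar with this step, here is a minimal sketch of how value counts and a pivot table can be produced with pandas. The column names (`industry`, `region`, `company_size`) and the rows are invented for illustration and are not the scraped data or the variables used in this project.

```python
import pandas as pd

# Made-up sample rows; the real dataset and its columns may differ.
companies = pd.DataFrame({
    "industry": [
        "pharmaceuticals", "biotechnology", "medical devices",
        "biotechnology", "pharmaceuticals",
    ],
    "region": [
        "Mainland China", "Taiwan", "Hong Kong",
        "Mainland China", "Mainland China",
    ],
    "company_size": ["51-200", "11-50", "51-200", "11-50", "201-500"],
})

# Value counts for one categorical variable.
industry_counts = companies["industry"].value_counts()
print(industry_counts)

# Pivot table: number of companies per region and industry.
pivot = pd.pivot_table(
    companies,
    index="region",
    columns="industry",
    values="company_size",
    aggfunc="count",
    fill_value=0,
)
print(pivot)
```

The same pattern extends to any pair of categorical variables by changing `index` and `columns`.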

Data Visualisation

Because Jupyter Notebook has limitations for data visualisation, I used Tableau to present the research results by comparing data at the country, city, and other combined levels. The dashboard is made up of the following sections:

dashboard