Portfolio

Titanic Model

About

The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, the Titanic sank after colliding with an iceberg. The aim of this model is to predict which passengers survived. The data comes as training and test sets in CSV format, along with a data dictionary.

Approach

Random Forest Classifier

Data Analysis

Importing The Essential Libraries: NumPy, Pandas, Matplotlib.pyplot, Seaborn

Target Column Survived

Identifying Null Values

Dropping The Cabin Column Because More Than 75% Of Its Values Are Missing

Filling The Missing Values
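The null-handling steps above can be sketched as follows. This is a minimal illustration on a hypothetical toy frame: only the Cabin column is named in the text, so the other column names and the fill strategy (median for numeric values, most frequent value for categorical ones) are assumptions rather than the project's confirmed choices.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the Cabin column is named in the text above.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Embarked": ["S", "C", None, "S", "S"],
    "Cabin": [None, "C85", None, None, None],
})

# Share of missing values per column (0.0 to 1.0).
missing_share = df.isnull().mean()
print(missing_share)

# Drop every column with more than 75% missing values (here: Cabin).
df = df.drop(columns=missing_share[missing_share > 0.75].index)

# Assumed fill strategy: median for numeric, most frequent value for categorical.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
```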

Performing Feature Engineering

Converting The Categorical Data Into Dummy Variables
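One common way to build dummy variables is `pandas.get_dummies`. The sketch below assumes hypothetical column names and uses `drop_first=True`, which the text does not specify, to avoid one redundant column per category.

```python
import pandas as pd

# Hypothetical toy frame; the column names are assumptions for illustration.
df = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
    "Fare": [7.25, 71.28, 7.92],
})

# One-hot encode the categorical columns; numeric columns pass through unchanged.
# drop_first=True drops the first category of each column (an assumed choice).
encoded = pd.get_dummies(df, columns=["Sex", "Embarked"], drop_first=True)
print(encoded.columns.tolist())
```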

Splitting The Data Into Training And Test Sets

Random Forest Gives An Accuracy Of 89%

On The Test Data, It Gives An Accuracy Of 85%
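The split, fit, and evaluation steps might look like the sketch below. It runs on synthetic stand-in data, so the feature names, the 80/20 split, and the hyperparameters are all assumptions, and the printed accuracies will not match the figures reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): the real features come from the prepared Titanic frame.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=400),
    "Sex_male": rng.integers(0, 2, size=400),
    "Age": rng.uniform(1, 80, size=400),
})
# Synthetic target derived from the features so the model has a pattern to learn.
y = ((X["Sex_male"] == 0) | (X["Pclass"] == 1)).astype(int)

# Hold out 20% of the rows for testing, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Accuracy on the data the model was fitted on versus on unseen data.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

Reporting both numbers side by side makes it easier to see how much of the training score carries over to unseen passengers.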

House Price Prediction

About

The aim is to predict the sale price of homes. The data comes as training and test sets in CSV format, along with a data dictionary.

Approach

Using A Support Vector Machine.

Data Analysis

Importing The Essential Libraries: NumPy, Pandas, Matplotlib.pyplot, Seaborn

Target Column Sale Price

Identifying Null Values

Dropping The Columns In Which More Than 20% Of The Values Are Null
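A minimal sketch of this threshold-based drop, assuming the step removes whole columns rather than rows. The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names and values are illustrative only.
df = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260, 14115],
    "Alley": [np.nan, np.nan, "Pave", np.nan, np.nan, np.nan],  # 5 of 6 missing
    "LotFrontage": [65.0, 80.0, np.nan, 60.0, 84.0, 85.0],      # 1 of 6 missing
})

# Fraction of null values in each column.
null_share = df.isnull().mean()

# Keep only the columns whose null share does not exceed 20%.
kept = df.loc[:, null_share <= 0.20]
print(kept.columns.tolist())
```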

Filling The Missing Values

Selecting The Features Whose Correlation With The Sale Price Exceeds 50%.
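The correlation filter could be written roughly as below. The data is synthetic, the column names are hypothetical, and taking the absolute value (so that strongly negative correlations also count) is an assumption the text does not confirm.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data (assumption); column names are illustrative.
area = rng.uniform(50, 300, size=n)
df = pd.DataFrame({
    "LivingArea": area,
    "RandomNoise": rng.normal(size=n),
    "SalePrice": 1000 * area + rng.normal(scale=20000, size=n),
})

# Absolute Pearson correlation of each numeric feature with the target.
corr_with_target = (
    df.corr(numeric_only=True)["SalePrice"].drop("SalePrice").abs()
)

# Keep the features whose correlation with the target exceeds 0.5.
selected = corr_with_target[corr_with_target > 0.5].index.tolist()
print(selected)
```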

Converting Object-Type Data Into Numerical Form
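The text does not say which encoding was used for this conversion. One simple option, shown here as an assumption, is to replace each non-numeric column with integer category codes; the column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame; column names are illustrative only.
df = pd.DataFrame({
    "Neighborhood": ["NAmes", "CollgCr", "NAmes", "OldTown"],
    "CentralAir": ["Y", "N", "Y", "Y"],
    "GrLivArea": [1710, 1262, 1786, 1717],
})

# Replace every non-numeric column with integer codes.
# Codes follow the sorted order of the distinct values in each column.
for col in df.select_dtypes(exclude="number").columns:
    df[col] = df[col].astype("category").cat.codes

print(df)
```

Integer codes impose an arbitrary order on the categories, so one-hot encoding may be a better fit for models that are sensitive to feature scale, such as a support vector machine.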

After Checking The Skewness Of The Data, Reducing It With A Log Transform
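To illustrate the effect of a log transform on skewness, the sketch below uses synthetic right-skewed values in place of the real prices. The choice of `np.log1p` over a plain `np.log` is an assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic right-skewed prices (assumption): log-normal draws stand in for the real target.
prices = pd.Series(rng.lognormal(mean=12, sigma=0.5, size=1000), name="SalePrice")

skew_before = prices.skew()

# log1p computes log(1 + x), which stays defined when a value is zero.
log_prices = np.log1p(prices)
skew_after = log_prices.skew()

print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
```

If the model is trained on the transformed target, predictions can be mapped back to the original scale with `np.expm1`.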

Cross Validation

Support Vector Machine Gives An Accuracy Of 88.97%
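A cross-validated support vector regressor could be set up along these lines. The data is synthetic, and the kernel, the `C` value, the feature scaling, the five folds, and the R² metric are all assumptions; the text does not state which metric the reported figure refers to.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic regression data (assumption) standing in for the prepared house features.
X = rng.uniform(0, 10, size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Scaling inside the pipeline is refitted on each training fold, avoiding leakage.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))

# Five-fold cross-validation scored with R^2 (assumed metric).
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"mean R^2: {scores.mean():.3f} (+/- {scores.std():.3f})")
```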

Breast Cancer Detection

About

The aim is to predict whether a cancer diagnosis is malignant (cancerous) or benign (non-cancerous). The data comes as training and test sets in CSV format, along with a data dictionary.

Approach

Using Different Models: Random Forest, Logistic Regression, And Decision Tree.

Data Analysis

Importing The Essential Libraries: NumPy, Pandas, Matplotlib.pyplot, Seaborn

Target Column Diagnoses

Identifying Null Values

Dropping The Unnamed Column, Which Contains Only Null Values
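A column that holds nothing but nulls can be removed without naming it, as in this small sketch. The frame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one entirely empty column.
df = pd.DataFrame({
    "diagnosis": ["M", "B", "B"],
    "radius_mean": [17.99, 13.54, 12.45],
    "Unnamed": [np.nan, np.nan, np.nan],
})

# Drop every column in which all values are null.
df = df.dropna(axis=1, how="all")
print(df.columns.tolist())
```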

Getting The Count Of Malignant And Benign Cases Using A Countplot

Label Encoding (Converting The Values M And B Into 1 And 0)
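The mapping described here can be done with a plain dictionary, as sketched below. The column name `diagnosis` is an assumption.

```python
import pandas as pd

# Hypothetical toy column; the name "diagnosis" is assumed.
df = pd.DataFrame({"diagnosis": ["M", "B", "B", "M"]})

# Encode malignant (M) as 1 and benign (B) as 0.
df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})
print(df["diagnosis"].tolist())
```

An explicit dictionary keeps the direction of the encoding visible, which matters when reading precision and recall for the malignant class later on.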

Checking The Correlation Among The Columns Using A Heatmap.
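A heatmap of this kind is drawn from a pairwise correlation matrix. The sketch below builds only that matrix, on synthetic data with hypothetical column names, and leaves the plotting itself out.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in measurements (assumption); column names are illustrative.
radius = rng.uniform(6, 28, size=300)
df = pd.DataFrame({
    "radius_mean": radius,
    "perimeter_mean": 2 * np.pi * radius + rng.normal(scale=1.0, size=300),
    "smoothness_mean": rng.normal(0.1, 0.01, size=300),
})

# Pairwise Pearson correlations between all columns.
corr = df.corr()
print(corr.round(2))
```

Passing `corr` to `seaborn.heatmap(corr, annot=True)` would render the matrix as a colored grid. Pairs with a correlation close to 1, such as radius and perimeter here, carry largely the same information.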

Splitting The Data Into Training And Test Sets

Random Forest (Accuracy 99.7%), Logistic Regression (Accuracy 99.12%), Decision Tree (Accuracy 100%)

After Testing, Random Forest Gives The Highest Accuracy (97.3%)
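A comparison of the three models might be sketched as follows. As an assumption, scikit-learn's bundled Wisconsin breast cancer data stands in for the CSV files, and the split ratio and hyperparameters are illustrative, so the printed accuracies will differ from the figures above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled dataset stands in for the project's CSV files.
# Note that it encodes malignant as 0 and benign as 1, the reverse of the text above.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    # Scaling helps logistic regression converge on these features.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# Store (training accuracy, test accuracy) for each model.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train {results[name][0]:.3f}, test {results[name][1]:.3f}")
```

A large gap between training and test accuracy points to overfitting. An unpruned decision tree can often reach 100% on the data it was fitted on, which is why the test figure is the more useful one for choosing a model.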

Project