Titanic Model
About
The Sinking Of The Titanic Is One Of The Most Infamous Shipwrecks In History. On April 15, 1912, The Titanic Sank After Colliding With An Iceberg. The Main Aim Of This Model Is To Predict The Survival Of Passengers. You Are Given A Training And Testing Data Set In CSV Format As Well As A Data Dictionary.
Approach
Random Forest Classifier
Data Analysis
Import Essential Libraries: NumPy, Pandas, Matplotlib.pyplot, Seaborn
Target Survival
Identifying Null Values
Dropping The Cabin Column Because It Has More Than 75% Of Missing Values
Filling The Missing Values
Performing Feature Engineering
Converting The Categorical Type Of Data Into Dummy Variables
Splitting The Data Into Train And Test Sets
Random Forest Gives An Accuracy Of 89%
After Testing It Gives An Accuracy Of 85%
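
The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: it builds a small synthetic table instead of reading the CSV, and the column names (Survived, Sex, Age, SibSp, Parch, Embarked, Cabin), the FamilySize feature, and the 80/20 split are all assumptions made for the example. The printed accuracies come from the synthetic data and are not expected to match the figures reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the training CSV (assumption: illustrative column names).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.choice(["male", "female"], n),
    "Age": rng.normal(30, 12, n).clip(1, 80),
    "SibSp": rng.integers(0, 3, n),
    "Parch": rng.integers(0, 3, n),
    "Embarked": rng.choice(["S", "C", "Q"], n),
    "Cabin": [None] * n,
})
# Survival loosely follows sex so the model has a signal to learn.
df["Survived"] = ((df["Sex"] == "female") ^ (rng.random(n) < 0.2)).astype(int)
# Knock out values to imitate missing data; Cabin stays 85% empty.
df.loc[rng.choice(n, 40, replace=False), "Age"] = np.nan
df.loc[rng.choice(n, 5, replace=False), "Embarked"] = np.nan
df.loc[rng.choice(n, 30, replace=False), "Cabin"] = "C85"

# Identify null values: share of missing entries per column.
null_share = df.isnull().mean()

# Drop every column with more than 75% missing values (Cabin here).
df = df.drop(columns=null_share[null_share > 0.75].index)

# Fill the remaining gaps: median for numeric, most frequent value for categorical.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Feature engineering (hypothetical example feature).
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# Convert categorical columns into dummy variables.
X = pd.get_dummies(df.drop(columns="Survived"),
                   columns=["Sex", "Embarked"], drop_first=True)
y = df["Survived"]

# Split into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Fit the random forest and score it on both splits.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

Scoring the model on both splits makes any gap between training and testing accuracy visible, which is the usual first check for overfitting.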
House Price Prediction
About
The Main Aim Is To Predict The Sales Price Of Homes. You Are Given A Training And Testing Data Set In CSV Format As Well As A Data Dictionary.
Approach
Using Support Vector Machine.
Data Analysis
Import Essential Libraries: NumPy, Pandas, Matplotlib.pyplot, Seaborn
Target Salesprice
Identifying Null Values
Dropping The Columns In Which More Than 20% Of The Values Are Null
Filling The Missing Values
Selecting The Features Whose Correlation With Salesprice Is More Than 50%.
Converting Object Type Of Data Into Numerical Form
After Seeing The Skewness Of Data, We Will Reduce It By Using Log Function
Cross Validation
Support Vector Machine Gives An Accuracy Of 88.97%
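
A minimal sketch of the pipeline above, under several stated assumptions: the data is synthetic rather than the real CSV, the column names (SalePrice, GrLivArea, OverallQual, YrSold, KitchenQual, PoolQC) are illustrative, the skewness cut-off of 0.75 is an arbitrary choice, and the cross-validated score is R², since the text does not say which metric the reported figure refers to. Object columns are encoded before the correlation filter here so that they can take part in it, which differs slightly from the order listed above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the housing CSV (assumption: illustrative column names).
rng = np.random.default_rng(0)
n = 300
area = rng.lognormal(7.3, 0.3, n)
quality = rng.integers(1, 11, n).astype(float)
df = pd.DataFrame({
    "GrLivArea": area,
    "OverallQual": quality,
    "YrSold": rng.integers(2006, 2011, n).astype(float),  # unrelated to price
    "KitchenQual": rng.choice(["Fa", "TA", "Gd", "Ex"], n),
    "PoolQC": [None] * n,                                  # entirely missing
})
df["SalePrice"] = 50 * area * (1 + 0.3 * quality) * rng.lognormal(0, 0.1, n)
df.loc[rng.choice(n, 20, replace=False), "GrLivArea"] = np.nan

# Drop columns in which more than 20% of the values are null.
null_share = df.isnull().mean()
df = df.drop(columns=null_share[null_share > 0.20].index)

# Fill the remaining missing values: median for numeric, mode for object columns.
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
obj_cols = df.select_dtypes(exclude="number").columns
for col in obj_cols:
    df[col] = df[col].fillna(df[col].mode()[0])
    # Convert object data into numerical form via integer category codes.
    df[col] = df[col].astype("category").cat.codes

# Keep the features whose correlation with the target exceeds 0.5.
corr = df.corr()["SalePrice"].drop("SalePrice")
selected = corr[corr > 0.5].index.tolist()

# Reduce skewness with a log transform: always for the target,
# and for any selected feature above the (assumed) 0.75 threshold.
skew_before = df["SalePrice"].skew()
df["SalePrice"] = np.log1p(df["SalePrice"])
skew_after = df["SalePrice"].skew()
for col in selected:
    if df[col].skew() > 0.75:
        df[col] = np.log1p(df[col])

# 5-fold cross validation of a scaled support vector regressor.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
scores = cross_val_score(model, df[selected], df["SalePrice"], cv=5, scoring="r2")
print(f"target skew: {skew_before:.2f} -> {skew_after:.2f}")
print(f"selected features: {selected}")
print(f"mean CV R^2: {scores.mean():.3f}")
```

Scaling inside the pipeline matters for support vector machines, and placing the scaler inside `make_pipeline` means it is refit on each training fold rather than on the full data.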
Breast Cancer Detection
About
The Main Aim Is To Predict Whether A Cancer Diagnosis Is Malignant (Cancerous) Or Benign (Non-Cancerous). You Are Given A Training And Testing Data Set In CSV Format As Well As A Data Dictionary.
Approach
Using Different Models Like Random Forest, Logistic Regression, Decision Tree.
Data Analysis
Import Essential Libraries: NumPy, Pandas, Matplotlib.pyplot, Seaborn
Target Column Diagnoses
Identifying Null Values
Dropping The Unnamed Column, Which Contains Only Null Values
Getting The Count Of Malignant And Benign Cells Using Countplot
Label Encoding (Converting The Values M And B Into 1 And 0)
Checking The Correlation Among The Columns Using Heatmap.
Splitting The Test And Train Data
Random Forest (Accuracy 99.7%), Logistic Regression (Accuracy 99.12%), Decision Tree (Accuracy 100%)
After Testing, Random Forest Gives The Maximum Accuracy (97.3%)
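
The comparison above can be sketched as follows. This is an illustration under assumptions rather than the project's own code: it uses the breast cancer dataset bundled with scikit-learn as a stand-in for the CSV, reshaped to carry a hypothetical `diagnosis` column with M/B labels and an all-null `Unnamed: 32` column, and it uses default hyperparameters and a 80/20 split. The plotting calls are left as comments so the example runs without a display. The printed accuracies are not expected to match the figures reported above.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the CSV (assumption): scikit-learn's bundled copy of the data,
# reshaped to have M/B labels and an empty trailing column.
data = load_breast_cancer(as_frame=True)
df = data.data.copy()
df["diagnosis"] = np.where(data.target == 0, "M", "B")  # 0 = malignant in scikit-learn
df["Unnamed: 32"] = np.nan

# Drop the column that contains only null values.
df = df.dropna(axis=1, how="all")

# Class counts; seaborn.countplot(x="diagnosis", data=df) would draw these.
counts = df["diagnosis"].value_counts()

# Label encoding: M -> 1, B -> 0.
df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})

# Correlation matrix; seaborn.heatmap(corr) would visualise it.
corr = df.corr()

# Split into train and test sets.
X = df.drop(columns="diagnosis")
y = df["diagnosis"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Fit each model and record (training accuracy, testing accuracy).
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (
        accuracy_score(y_train, model.predict(X_train)),
        accuracy_score(y_test, model.predict(X_test)),
    )
    print(f"{name}: train={results[name][0]:.3f}, test={results[name][1]:.3f}")
```

Reporting training and testing accuracy side by side helps separate a model that memorises the training data, as an unpruned decision tree tends to, from one that generalises.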