
Stock Data Analysis

The project aimed to extract and visualize stock data for Tesla and GameStop. This involved:

Data Extraction: Using yfinance to obtain stock information and web scraping to extract revenue data for both companies.

Data Visualization: Creating visual representations of the stock data to analyze trends and patterns.

Key Libraries and Techniques:

Stock Data Extraction: yfinance was used to retrieve stock data for Tesla and GameStop.

Web Scraping: A web-scraping library (e.g., Beautiful Soup, Selenium) was used to extract revenue data from relevant websites (see the sketch after this list).

Data Visualization: Matplotlib or Seaborn can be used to create plots and charts of the stock price and revenue data.
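
The extraction step could look roughly like the sketch below, assuming yfinance for the price history and BeautifulSoup for the revenue table; the revenue URL and table layout are hypothetical placeholders, not the sources used in the project.

```python
# Sketch of the data-extraction step. The revenue URL and table structure are
# hypothetical placeholders; only the TSLA ticker comes from the project brief.
import pandas as pd
import requests
import yfinance as yf
from bs4 import BeautifulSoup

# Stock data: yfinance returns the full price history as a DataFrame.
tesla_data = yf.Ticker("TSLA").history(period="max")
tesla_data.reset_index(inplace=True)

# Revenue data: scrape an HTML table of quarterly revenue from a web page.
html = requests.get("https://example.com/tesla-revenue").text   # placeholder URL
soup = BeautifulSoup(html, "html.parser")
rows = soup.find("tbody").find_all("tr")
tesla_revenue = pd.DataFrame(
    [(r.find_all("td")[0].text, r.find_all("td")[1].text) for r in rows],
    columns=["Date", "Revenue"],
)
```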

Analyzing Wildfire Activity in Australia

The project aimed to analyze historical Australian wildfire data to understand trends, patterns, and relationships between various wildfire characteristics.

Key Libraries and Techniques:

Data Analysis: Pandas was used for data manipulation and analysis (a minimal sketch follows this list).

Data Visualization: Matplotlib and Seaborn were employed to create plots and charts.

Geographical Mapping: Folium was used to create maps with geographic markers.
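
As a minimal illustration of the Pandas and Matplotlib workflow, the sketch below averages the estimated fire area per year; the file name and column names ('Date', 'Estimated_fire_area') are assumptions about the dataset's layout.

```python
# Temporal-trend sketch; the file name and column names ('Date',
# 'Estimated_fire_area') are assumptions about the dataset's layout.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("australian_wildfires.csv")     # hypothetical file name
df["Year"] = pd.to_datetime(df["Date"]).dt.year

# Average estimated fire area per year, e.g. to spot the 2010-2013 peak.
df.groupby("Year")["Estimated_fire_area"].mean().plot(kind="line")
plt.ylabel("Mean estimated fire area")
plt.show()
```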

Key Findings:

Temporal Trends: The average estimated fire area exhibited a significant peak between 2010 and 2013.

Seasonal Patterns: Analyzing the estimated fire area by year and month revealed seasonal trends.

Regional Variations: The mean estimated fire brightness varied across different regions, with some regions experiencing higher levels of fire intensity than others.

Pixel Count Distribution: The distribution of the count of pixels for presumed vegetation fires was uneven across regions, with some regions having a higher proportion of fire activity.

Fire Brightness Distribution: The distribution of mean estimated fire brightness was examined with histograms to characterize overall fire intensity.

Regional Fire Brightness: The relationship between estimated fire brightness and region was explored with Seaborn, providing a visual representation of regional variations.

Correlation Analysis: The correlation between mean estimated fire radiative power and mean confidence level was investigated to identify potential relationships.

Geographical Mapping: The seven regions were marked on a map of Australia using Folium to visualize their locations (sketched below).
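
A minimal Folium sketch of the mapping step; the region names and coordinates are illustrative stand-ins rather than the values used in the project.

```python
# Folium sketch: mark region centres on a map of Australia. The names and
# coordinates below are illustrative stand-ins, not the project's values.
import folium

region_centres = {
    "New South Wales": (-31.8, 147.0),
    "Queensland": (-22.6, 144.6),
    "Western Australia": (-25.3, 122.3),
}  # subset of the seven regions, with rough centroids

aus_map = folium.Map(location=[-25.0, 134.0], zoom_start=4)
for name, (lat, lon) in region_centres.items():
    folium.Marker(location=[lat, lon], popup=name).add_to(aus_map)

aus_map.save("australia_regions.html")
```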



Multiple Machine Learning Model Applications

Telecom Customer Churn Prediction:

Goal: Predict which customers of a telecommunications company are likely to churn (cancel their service) and switch to a competitor.

Data: Historical customer data including demographics, service subscriptions, account details, and churn information.

Model: Logistic Regression with different solver and regularization settings (sketched after this list).

Evaluation: Jaccard Index, Confusion Matrix, Log Loss.
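
A hedged sketch of this pipeline, using synthetic data in place of the telecom customer dataset; the class balance and the C/solver values shown are assumptions.

```python
# Churn-model sketch on synthetic stand-in data (not the project's dataset).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, confusion_matrix, log_loss

# Stand-in for the customer data: 5 numeric features, ~20% churners.
X, y = make_classification(n_samples=500, n_features=5, weights=[0.8], random_state=4)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

# The solver and the regularization strength C are the settings varied in the project.
lr = LogisticRegression(C=0.01, solver="liblinear").fit(X_train, y_train)
yhat = lr.predict(X_test)
yhat_prob = lr.predict_proba(X_test)

print("Jaccard index:", jaccard_score(y_test, yhat))
print("Confusion matrix:\n", confusion_matrix(y_test, yhat))
print("Log loss:", log_loss(y_test, yhat_prob))
```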


Cancer Cell Classification:

Goal: Classify human cell samples as benign or malignant based on their characteristics.

Data: Public dataset containing cell features and their classifications.

Model: Support Vector Machines (SVM) with a Radial Basis Function (RBF) kernel (sketched below).

Evaluation: Classification Report, Confusion Matrix.
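
A minimal sketch of the classification step, using scikit-learn's built-in breast-cancer data as a stand-in for the project's cell-sample dataset.

```python
# SVM sketch using scikit-learn's built-in breast-cancer data as a stand-in
# for the project's cell-sample dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

clf = SVC(kernel="rbf")                 # Radial Basis Function kernel
clf.fit(X_train, y_train)
yhat = clf.predict(X_test)

print(classification_report(y_test, yhat))
print(confusion_matrix(y_test, yhat))
```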


Customer Segmentation:

Goal: Segment customers into groups based on their similarities to create targeted marketing campaigns.

Data: Historical customer data containing demographics and potentially other relevant information.

Model: K-means clustering algorithm (sketched below).

Evaluation: Visualization of customer distribution based on chosen features.
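
A minimal K-means sketch on synthetic stand-in data; the two features and k=3 are illustrative assumptions, not the project's configuration.

```python
# K-means sketch on synthetic stand-in data; the two features and k=3 are
# illustrative assumptions, not the project's actual configuration.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
customers = rng.normal(size=(300, 2))            # stand-in for e.g. (age, income)

X = StandardScaler().fit_transform(customers)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Visualize the segmentation by colouring each customer with its cluster label.
plt.scatter(X[:, 0], X[:, 1], c=km.labels_, alpha=0.6)
plt.xlabel("feature 1 (scaled)")
plt.ylabel("feature 2 (scaled)")
plt.show()
```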


Fuel Consumption Prediction:

Goal: Predict a car's CO2 emissions based on engine size, cylinders, and fuel consumption in city and highway driving.

Data: Dataset containing car models, fuel consumption data, and CO2 emissions.

Model: Multiple Linear Regression with different combinations of features (sketched below).

Evaluation: Mean Squared Error (MSE), Explained Variance Score.
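
A sketch of the regression and its evaluation on made-up data; the four features mirror the ones listed in the goal (engine size, cylinders, city and highway fuel consumption), but the numbers are synthetic.

```python
# Multiple-linear-regression sketch on made-up data; the four features mirror
# the goal above (engine size, cylinders, city and highway fuel consumption).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, explained_variance_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform([1.0, 3, 4.0, 4.0], [8.0, 12, 25.0, 20.0], size=(200, 4))
y = 20 * X[:, 0] + 7 * X[:, 1] + 9 * X[:, 2] + 6 * X[:, 3] + rng.normal(0, 5, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
model = LinearRegression().fit(X_train, y_train)
yhat = model.predict(X_test)

print("MSE:", mean_squared_error(y_test, yhat))
print("Explained variance:", explained_variance_score(y_test, yhat))
```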


Credit Card Fraud Detection Model

This project aimed to develop machine learning models for detecting fraudulent credit card transactions. Key aspects of the project included:

Data Preparation:

Dataset Acquisition: A dataset containing anonymized credit card transaction data was obtained.

Data Preprocessing: The data was analyzed and preprocessed to ensure it was suitable for modeling.

Class Imbalance Handling: Techniques were implemented to address the imbalance in the target variable (fraudulent vs. legitimate transactions).
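
One common way to express this, consistent with the class weights mentioned under Model Training below, is scikit-learn's compute_sample_weight; the label array here is a stand-in, not the project's data.

```python
# Sketch of one imbalance-handling option, consistent with the class weights
# mentioned under Model Training; y_train here is a stand-in label array.
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

y_train = np.array([0] * 950 + [1] * 50)         # stand-in: ~5% fraudulent
w_train = compute_sample_weight("balanced", y_train)

# Minority-class (fraud) samples receive a much larger weight, so the model's
# training loss is not dominated by the legitimate majority class.
print(np.unique(w_train))
```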

Model Development:

Decision Tree: Decision Tree models were built using both Scikit-Learn and Snap ML libraries (see the training sketch after this list).

Support Vector Machine: SVM models were also developed using Scikit-Learn and Snap ML.

Model Training: The models were trained on the preprocessed dataset, incorporating class weights to address the imbalance.
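
A sketch of the side-by-side Decision Tree training on synthetic stand-in data; Snap ML's scikit-learn-like interface is assumed, and the exact constructor arguments (max_depth, n_jobs) are illustrative.

```python
# Training sketch on synthetic stand-in data (not the project's dataset).
# Snap ML's scikit-learn-like API is assumed; timings illustrate the comparison.
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_sample_weight
from sklearn.tree import DecisionTreeClassifier as SklearnDT
from snapml import DecisionTreeClassifier as SnapDT

X, y = make_classification(n_samples=100_000, n_features=20, weights=[0.99], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
w_train = compute_sample_weight("balanced", y_train)

t0 = time.time()
sklearn_dt = SklearnDT(max_depth=4, random_state=35)
sklearn_dt.fit(X_train, y_train, sample_weight=w_train)
print("scikit-learn training time: %.2f s" % (time.time() - t0))

t0 = time.time()
snapml_dt = SnapDT(max_depth=4, random_state=35, n_jobs=4)   # multi-threaded CPU training
snapml_dt.fit(X_train, y_train, sample_weight=w_train)
print("Snap ML training time: %.2f s" % (time.time() - t0))
```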

Model Evaluation:

Performance Metrics: ROC-AUC score was used to evaluate the models' performance on the test set.

Comparison: The performance of Scikit-Learn and Snap ML models was compared.
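
Continuing from the training sketch above, the comparison could score both models on the held-out set with ROC-AUC, which needs probability scores rather than hard labels.

```python
# Evaluation sketch (continues from the training sketch above).
from sklearn.metrics import roc_auc_score

sklearn_scores = sklearn_dt.predict_proba(X_test)[:, 1]
snapml_scores = snapml_dt.predict_proba(X_test)[:, 1]

print("scikit-learn ROC-AUC:", roc_auc_score(y_test, sklearn_scores))
print("Snap ML ROC-AUC:", roc_auc_score(y_test, snapml_scores))
```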

Key Findings:

Snap ML Performance: Snap ML demonstrated significant speedup in training both Decision Tree and SVM models compared to Scikit-Learn.

Model Accuracy: Both Scikit-Learn and Snap ML models achieved similar performance on the test set, as measured by ROC-AUC score.

Hinge Loss: The hinge loss metric was also calculated for the Scikit-Learn and Snap ML SVM models to further assess their performance.

Conclusion:

This project successfully developed and evaluated machine learning models for credit card fraud detection. The results highlight the benefits of using Snap ML for accelerating model training while maintaining compatibility with Scikit-Learn tools. Future work could explore additional models or techniques to further improve fraud detection accuracy.