Portfolio

Projects & Work

13+ projects spanning RAG systems, machine learning, NLP, and web apps — all with source code.

AI / RAG
RAG Precision Enhancement

FAISS · Cross-Encoder · Reranking

Comparative study showing why retrieval alone isn't enough. Two-stage retrieval + reranking pipeline with noise injection to stress-test semantic search performance.

  • Noise injection with 20+ hard-negative documents
  • Cross-encoder reranking (BAAI/bge-reranker-base)
  • Evaluated via Hit Rate@k and MRR metrics
GitHub
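
The two metrics named above can be sketched in a few lines. This is a minimal illustration, not the project's evaluation code: it assumes one relevant document per query, and the document ids and rankings are made up.

```python
from typing import Sequence


def hit_rate_at_k(ranked_ids: Sequence[Sequence[str]],
                  relevant_ids: Sequence[str], k: int) -> float:
    """Fraction of queries whose relevant document appears in the top-k results."""
    hits = sum(1 for ranked, rel in zip(ranked_ids, relevant_ids) if rel in ranked[:k])
    return hits / len(relevant_ids)


def mean_reciprocal_rank(ranked_ids: Sequence[Sequence[str]],
                         relevant_ids: Sequence[str]) -> float:
    """Average of 1/rank of the relevant document; a miss contributes 0."""
    total = 0.0
    for ranked, rel in zip(ranked_ids, relevant_ids):
        if rel in ranked:
            total += 1.0 / (ranked.index(rel) + 1)
    return total / len(relevant_ids)


# Hypothetical rankings for three queries (ids are illustrative only).
retriever_only = [["d7", "d1", "d3"], ["d2", "d9", "d4"], ["d8", "d6", "d5"]]
after_rerank = [["d1", "d7", "d3"], ["d2", "d9", "d4"], ["d5", "d8", "d6"]]
gold = ["d1", "d2", "d5"]

print("Hit Rate@1:", hit_rate_at_k(retriever_only, gold, 1), "->",
      hit_rate_at_k(after_rerank, gold, 1))
print("MRR:", mean_reciprocal_rank(retriever_only, gold), "->",
      mean_reciprocal_rank(after_rerank, gold))
```

In this toy case the first-stage retriever already finds every relevant document within the top 3, so Hit Rate@3 cannot separate the two rankings; Hit Rate@1 and MRR can, because they reward moving the relevant document to the top.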
Web App
ChurnShield

Flask · Scikit-learn · SHAP · SQLite

Flask web app for real-time churn prediction with user authentication, admin dashboard, CSV export, and SHAP-powered explainability. Automated retention strategy generator included.

  • Real-time prediction with Random Forest pipeline
  • SHAP-based feature importance per prediction
  • Automated retention strategy generator
GitHub
Machine Learning
IEEE Fraud Detection

XGBoost · SHAP · Imbalanced Learning · Optuna

Transaction classification on 1M+ Kaggle IEEE-CIS records. Achieved ROC-AUC 0.95 and F1 0.66 on the imbalanced fraud class with Optuna hyperparameter tuning.

  • Processed 1M+ transaction records
  • Feature reduction via Spearman correlation
  • Hyperparameter tuning with Optuna
GitHub
Machine Learning
Time Series Forecasting

LightGBM · Prophet · Feature Engineering

SKU-level supermarket price forecasting on 1.7M+ rows using LightGBM and Prophet with lag, rolling window, and calendar features.

  • Processed 1.7M+ retail records
  • TimeSeriesSplit cross-validation
  • Evaluated via RMSE, MAPE, SMAPE
GitHub
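
For reference, the three error metrics listed above can be written out as follows. This is a sketch under stated assumptions rather than the project's code: SMAPE has several definitions in the literature, and this one uses the (|actual| + |predicted|) / 2 denominator; the sample values are invented.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric MAPE in percent; pairs where both values are zero count as zero error."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2.0
        if denom > 0:
            total += abs(a - p) / denom
    return 100.0 * total / len(actual)


# Illustrative prices only.
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]

print(round(rmse(actual, predicted), 4))
print(round(mape(actual, predicted), 4))
print(round(smape(actual, predicted), 4))
```

MAPE is undefined when an actual value is zero, which is one common reason to report SMAPE alongside it.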
Machine Learning
Retail Price Optimization

Random Forest · Optuna · SHAP

Competition-aware ML workflow integrating historical sales and competitor pricing. Achieved R² ≈ 0.92 with Random Forest and SHAP explainability for business decisions.

  • Full ML pipeline with EDA
  • Hyperparameter tuning with Optuna
  • Profit and revenue optimization analysis
GitHub
Machine Learning
Anomaly Detection

Isolation Forest · LOF · KMeans · DBSCAN

Multi-method anomaly detection pipeline combining model-based, cluster-based, and statistical approaches on fraud and retail datasets with PCA visualization.

  • Multiple detection algorithms benchmarked
  • Evaluation via ROC-AUC and precision/recall
  • PCA visualization of anomalies
GitHub
NLP
Text Embeddings Explorer

sentence-transformers · FAISS · Visualization

Scripts and experiments for exploring text embedding models, visualizing embedding spaces, and benchmarking retrieval quality across different encoder models.

GitHub
AI / RAG
Semantic Search with FAISS

FAISS · sentence-transformers · Python

End-to-end semantic search implementation using FAISS for fast approximate nearest neighbor search with sentence-transformers for dense embedding generation.

GitHub
AI / RAG
Qdrant Vector DB Experiments

Qdrant · Python · Vector Search

Experiments with Qdrant vector database — collection management, payload filtering, hybrid search, and performance benchmarks for RAG applications.

GitHub
NLP
Text Chunking Strategies

Python · LangChain · Benchmarking

Comparative study of text chunking strategies — fixed-size, sentence-based, recursive, and semantic chunking — evaluating their impact on RAG retrieval quality.

GitHub
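
Two of the strategies named above, fixed-size and sentence-based chunking, can be sketched with the standard library alone. The function names, the character-based sizing, and the naive sentence splitter are assumptions made for illustration; recursive and semantic chunking are not covered here.

```python
import re
from typing import List


def fixed_size_chunks(text: str, size: int, overlap: int = 0) -> List[str]:
    """Slice text into windows of `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += size - overlap
    return chunks


def sentence_chunks(text: str, max_chars: int) -> List[str]:
    """Greedily pack whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` becomes its own oversized chunk.
    """
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "FAISS indexes vectors. Rerankers score pairs. Chunking shapes both."
print(fixed_size_chunks("abcdefghij", size=4, overlap=2))
print(sentence_chunks(sample, max_chars=50))
```

Fixed-size windows can cut a sentence in half, while sentence packing keeps sentences whole at the cost of uneven chunk lengths; that trade-off is what a retrieval benchmark over chunking strategies would measure.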
Machine Learning
Regression Portfolio

Scikit-learn · XGBoost · LightGBM · Optuna

Collection of regression projects across different domains — housing prices, demand forecasting, and insurance costs — with thorough EDA, feature engineering, and evaluation.

GitHub
Machine Learning
Customer Segmentation

KMeans · DBSCAN · PCA · Silhouette Analysis

Unsupervised learning project exploring clustering algorithms for customer segmentation. Includes PCA for dimensionality reduction and silhouette analysis for optimal cluster selection.

GitHub
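
As a rough illustration of the silhouette analysis mentioned above, the score can be computed directly from its definition. This is a pure-Python sketch on made-up 2-D points, assuming Euclidean distance and at least two clusters; a real project would more likely call a library implementation such as scikit-learn's `silhouette_score`.

```python
import math
from typing import List, Sequence, Tuple


def silhouette_score(points: Sequence[Tuple[float, ...]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    n = len(points)
    clusters = set(labels)
    scores: List[float] = []
    for i in range(n):
        # Mean distance from point i to the other members of each cluster.
        mean_dist = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(math.dist(points[i], points[j]) for j in members) / len(members)
        own = labels[i]
        if own not in mean_dist:
            scores.append(0.0)  # singleton cluster: silhouette is defined as 0
            continue
        a = mean_dist[own]                                       # cohesion
        b = min(d for c, d in mean_dist.items() if c != own)     # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Two well-separated pairs of points (illustrative data).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]   # groups nearby points together
bad_labels = [0, 1, 0, 1]    # splits each natural pair

print(round(silhouette_score(points, good_labels), 3))
print(round(silhouette_score(points, bad_labels), 3))
```

Scores near 1 indicate compact, well-separated clusters and negative scores indicate points that sit closer to another cluster, so comparing the score across candidate cluster counts is one way to choose the number of clusters.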
