Data Scientist & AI Engineer
Building production-grade AI systems | RAG pipelines | LLM integrations | Data science at scale
ML Projects
Rows Processed
GitHub Repos
Self-taught Data Scientist with hands-on experience building end-to-end ML solutions across classification, regression, clustering, anomaly detection, and time series forecasting.
Currently working as a Data Scientist Intern at CR Equity AI, Inc., where I design and implement production-grade RAG pipelines, integrate LLM APIs, and build FastAPI services for document intelligence. My work spans classical ML and LLM-based AI systems, with a strong focus on security, scalability, and real-world deployment.
I've completed over 20 structured ML projects, handling large-scale datasets with 1M+ rows for fraud detection and 1.7M+ rows for forecasting. My approach combines rigorous evaluation, model explainability through SHAP, and reproducible research workflows.
CR Equity AI, Inc.
Remote (Tallahassee, Florida, United States)
Contributing to LLM-based RAG systems and backend AI services for document intelligence. Collaborating remotely with a US-based engineering team using Agile development practices.
A selection of my recent work in ML, AI, and data science
FastAPI • FAISS • Qdrant • Groq • LLMs
Modular RAG API built from scratch with FastAPI for document upload and natural language querying. Features switchable vector backends (FAISS/Qdrant), multiple chunking strategies, cross-encoder reranking, and Groq-powered generation with Llama-3.3-70B.
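To illustrate what "switchable backends" and "multiple chunking strategies" can look like in code, here is a minimal sketch. It is not the project's actual code: the names `chunk_fixed`, `chunk_sentences`, `VectorBackend`, `InMemoryBackend`, and `make_backend` are hypothetical, and a brute-force in-memory store stands in for the FAISS and Qdrant adapters.

```python
import math
from typing import Protocol


def chunk_fixed(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Fixed-size character chunks; neighbours share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks, start, step = [], 0, size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this chunk already reached the end
            break
        start += step
    return chunks


def chunk_sentences(text: str) -> list[str]:
    """One chunk per sentence, split naively on '. '."""
    return [s.strip() for s in text.split(". ") if s.strip()]


# Strategy registry: the API layer can pick a chunker by name.
CHUNKERS = {"fixed": chunk_fixed, "sentence": chunk_sentences}


class VectorBackend(Protocol):
    """Interface every vector store adapter is assumed to satisfy."""

    def add(self, vectors: list[list[float]], payloads: list[str]) -> None: ...

    def search(self, query: list[float], k: int) -> list[tuple[float, str]]: ...


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryBackend:
    """Brute-force cosine search; a stand-in for a real FAISS or Qdrant adapter."""

    def __init__(self) -> None:
        self._vectors: list[list[float]] = []
        self._payloads: list[str] = []

    def add(self, vectors: list[list[float]], payloads: list[str]) -> None:
        self._vectors.extend(vectors)
        self._payloads.extend(payloads)

    def search(self, query: list[float], k: int = 3) -> list[tuple[float, str]]:
        scored = [(_cosine(query, v), p) for v, p in zip(self._vectors, self._payloads)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Backend registry: other adapters would be registered here under their own keys.
BACKENDS = {"memory": InMemoryBackend}


def make_backend(name: str) -> VectorBackend:
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None
```

Because callers depend only on the `VectorBackend` interface, swapping the store becomes a configuration change rather than a code change.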
FAISS • Cross-Encoder • Reranking
Comparative study demonstrating why retrieval alone isn't enough. Built a two-stage retrieval and reranking pipeline, with noise injection to stress-test semantic search performance.
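The two-stage idea can be sketched with toy scorers. This is an illustration under stated assumptions, not the study's code: `token_overlap` stands in for the first-stage vector search, `bigram_overlap` stands in for the cross-encoder, and both function names and the example documents are hypothetical.

```python
from typing import Callable


def token_overlap(query: str, doc: str) -> float:
    """Cheap first-stage score: Jaccard overlap of word sets (ignores word order)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


def bigram_overlap(query: str, doc: str) -> float:
    """Order-aware second-stage score: share of query bigrams found in the doc."""
    def bigrams(text: str) -> set[tuple[str, str]]:
        words = text.lower().split()
        return set(zip(words, words[1:]))

    q = bigrams(query)
    return len(q & bigrams(doc)) / len(q) if q else 0.0


def two_stage_search(
    query: str,
    docs: list[str],
    rerank_score: Callable[[str, str], float],
    k_retrieve: int = 10,
    k_final: int = 3,
) -> list[str]:
    # Stage 1: score every document cheaply and keep a shortlist.
    shortlist = sorted(docs, key=lambda d: token_overlap(query, d), reverse=True)[:k_retrieve]
    # Stage 2: rescore only the shortlist with the more expensive scorer.
    return sorted(shortlist, key=lambda d: rerank_score(query, d), reverse=True)[:k_final]


docs = [
    "how to reset user password",
    "password user reset",      # injected noise: same words, scrambled order
    "weather forecast today",
]
top = two_stage_search("reset user password", docs, bigram_overlap, k_retrieve=2, k_final=1)
```

In this toy setup the scrambled noise document wins the first stage because it matches every query word, and the order-aware second stage moves the genuinely relevant document back to the top.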
Flask • Scikit-learn • SHAP • SQLite
Flask web app for real-time churn prediction with authentication, admin dashboard, and CSV export. Integrated Random Forest pipeline with SHAP explainability.
XGBoost • SHAP • Imbalanced Learning
Transaction classification on 1M+ records from the Kaggle IEEE-CIS Fraud Detection dataset using XGBoost. Achieved ROC-AUC 0.95 and F1 0.66 on the imbalanced fraud class.
LightGBM • Prophet • Feature Engineering
SKU-level supermarket price forecasting on 1.7M+ rows using LightGBM and Prophet with lag, rolling, and calendar features.
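As a rough sketch of what lag, rolling, and calendar features mean for a single SKU's price series, the snippet below builds them in plain Python. The function name `build_features`, the lag choices, and the window size are assumptions for illustration, not the project's configuration.

```python
from datetime import date, timedelta


def build_features(dates, prices, lags=(1, 7), window=3):
    """Turn one price series into supervised rows using only past values."""
    rows = []
    start = max(max(lags), window)  # first index with full history available
    for t in range(start, len(prices)):
        row = {"date": dates[t], "target": prices[t]}
        for lag in lags:
            row[f"lag_{lag}"] = prices[t - lag]          # price `lag` steps ago
        past = prices[t - window:t]                      # excludes prices[t]: no leakage
        row[f"roll_mean_{window}"] = sum(past) / window  # trailing average
        row["day_of_week"] = dates[t].weekday()          # 0 = Monday
        row["month"] = dates[t].month
        rows.append(row)
    return rows


# Toy daily series: ten days of steadily rising prices.
dates = [date(2024, 1, 1) + timedelta(days=i) for i in range(10)]
prices = [float(i) for i in range(10)]
rows = build_features(dates, prices)
```

Each row pairs the target price with information that would have been known before that day, which is what lets a tabular model such as a gradient-boosted tree be trained on a time series.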
Random Forest • Optuna • SHAP
Competition-aware ML workflow integrating historical sales and competitor pricing. Achieved R² ≈ 0.92 with a Random Forest model, using SHAP for explainability.
Isolation Forest • LOF • KMeans • DBSCAN
Multi-method anomaly detection pipeline combining model-based, cluster-based, and statistical approaches on fraud and retail datasets.
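One simple way to combine several detectors is majority voting over their per-point flags. The sketch below shows only two statistical detectors and a voting rule; the voting rule, the thresholds, and the function names are assumptions, and the model-based and cluster-based detectors (Isolation Forest, LOF, KMeans, DBSCAN) are left out to keep it dependency-free.

```python
import statistics


def zscore_flags(values, threshold=3.0):
    """Flag points more than `threshold` population std devs from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [False] * len(values)
    return [abs(v - mean) / sd > threshold for v in values]


def iqr_flags(values, k=1.5):
    """Flag points outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def vote(flag_sets, min_votes=2):
    """Mark a point anomalous when at least `min_votes` detectors flag it."""
    return [sum(flags) >= min_votes for flags in zip(*flag_sets)]


amounts = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]
combined = vote([zscore_flags(amounts, threshold=2.5), iqr_flags(amounts)])
```

The z-score threshold is lowered to 2.5 here because, with only ten points, a single outlier cannot exceed a population z-score of 3. Flags from other detectors could be appended to the list passed to `vote` in the same way.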
Preprint • 2024
Explores interpretability and optimization techniques for churn modeling on real-world telecom data, using SHAP values for transparent feature importance and iterative tuning to improve model performance.
I'm always interested in discussing new opportunities, collaborations, or innovative projects.
bijaybeezoe@gmail.com
+977-9767645335
Kathmandu, Nepal