Data Scientist & AI Engineer

Bijaya Kumar Pariyar

Building production-grade AI systems | RAG pipelines | LLM integrations | Data science at scale

20+ ML Projects

1M+ Rows Processed

50+ GitHub Repos

About Me

Self-taught Data Scientist with hands-on experience building end-to-end ML solutions across classification, regression, clustering, anomaly detection, and time series forecasting.

Currently working as a Data Scientist Intern at CR Equity AI, Inc., where I design and implement production-grade RAG pipelines, integrate LLM APIs, and build FastAPI services for document intelligence. My work ranges from classical ML to LLM-based AI systems, with a strong focus on security, scalability, and real-world deployment.

I've completed over 20 structured ML projects, handling large-scale datasets with 1M+ rows for fraud detection and 1.7M+ rows for forecasting. My approach combines rigorous evaluation, model explainability through SHAP, and reproducible research workflows.

Education: BCA, Tribhuvan University
Location: Kathmandu, Nepal
Role: Data Scientist Intern
Company: CR Equity AI, Inc.

Experience

Data Scientist Intern

CR Equity AI, Inc.

Remote (Tallahassee, Florida, United States)

Dec 2025 - Present

Contributing to LLM-based RAG systems and backend AI services for document intelligence. Collaborating remotely with a US-based engineering team using Agile development practices.

Key Responsibilities & Achievements:
  • Designed and implemented end-to-end RAG pipelines including document chunking, embedding generation, vector retrieval, reranking, and prompt construction
  • Integrated and managed LLM APIs with retry logic, timeout handling, prompt/token constraints, and response validation
  • Built production-grade FastAPI services for document processing, querying, and AI inference
  • Worked with vector databases (FAISS, Qdrant, PostgreSQL pgvector) for scalable semantic search
  • Implemented asynchronous processing pipelines using Redis-backed job tracking and webhook-based callbacks
  • Applied security best practices including JWT authentication, SSRF protection, rate limiting, CORS, and trusted host validation
  • Containerized backend services using Docker and implemented structured logging, metrics collection, and observability
  • Conducting research on SOC practices and security hardening for production AI and backend systems
Python FastAPI RAG LLMs FAISS Qdrant Docker Redis PostgreSQL JWT
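The retry, timeout, and validation handling mentioned above can be illustrated with a minimal sketch of one common pattern: capped exponential backoff with jitter, plus a response check. This is not the production code. `TransientError`, `call_with_retries`, and `flaky_llm_call` are hypothetical names. The sketch assumes the wrapped call enforces its own timeout (for example through its HTTP client) and raises `TransientError` on retryable failures.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (rate limits, timeouts, bad output)."""


def call_with_retries(
    fn: Callable[[], T],
    validate: Optional[Callable[[T], bool]] = None,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying TransientError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            result = fn()  # fn is assumed to enforce its own per-request timeout
            if validate is not None and not validate(result):
                # Treat a malformed response like any other transient failure.
                raise TransientError("response failed validation")
            return result
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))  # full jitter spreads out retries from many clients
    raise AssertionError("unreachable")


# Usage with a hypothetical fake call that is rate-limited twice before succeeding.
attempts = []


def flaky_llm_call() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("simulated rate limit")
    return '{"answer": "ok"}'


result = call_with_retries(
    flaky_llm_call,
    validate=lambda text: text.startswith("{"),
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
```

Injecting `sleep` keeps the backoff testable without real delays. In a real service the validation step would usually parse the response against an expected schema rather than check a prefix.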

Featured Projects

A selection of my recent work in ML, AI, and data science

RAG Precision Enhancement

FAISS • Cross-Encoder • Reranking

Comparative study showing why first-stage retrieval alone isn't enough. Built a two-stage pipeline (FAISS retrieval followed by cross-encoder reranking) and injected noise to stress-test semantic search performance.

  • Noise injection with 20+ hard-negative documents
  • Cross-encoder reranking (BAAI/bge-reranker-base)
  • Evaluated via Hit Rate@k and MRR metrics
GitHub
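The two evaluation metrics can be sketched under their standard definitions. This is an illustration, not the project's evaluation code. Hit Rate@k counts queries with any relevant document in the top k. MRR averages 1/rank of the first relevant document, and queries with no hit contribute 0. The document IDs below are made up.

```python
from typing import Sequence, Set


def hit_rate_at_k(ranked: Sequence[Sequence[str]], relevant: Sequence[Set[str]], k: int) -> float:
    """Share of queries with at least one relevant document in the top-k results."""
    hits = sum(
        1 for docs, rel in zip(ranked, relevant) if any(doc in rel for doc in docs[:k])
    )
    return hits / len(ranked)


def mean_reciprocal_rank(ranked: Sequence[Sequence[str]], relevant: Sequence[Set[str]]) -> float:
    """Average of 1/rank of the first relevant document; queries with no hit add 0."""
    total = 0.0
    for docs, rel in zip(ranked, relevant):
        for rank, doc in enumerate(docs, start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(ranked)


# Toy results for three queries (document IDs are made up).
ranked = [["d3", "d1", "d7"], ["d2", "d9", "d4"], ["d5", "d6", "d8"]]
relevant = [{"d1"}, {"d2"}, {"d0"}]

hr1 = hit_rate_at_k(ranked, relevant, k=1)  # only query 2 hits at rank 1
hr3 = hit_rate_at_k(ranked, relevant, k=3)  # queries 1 and 2 hit within the top 3
mrr = mean_reciprocal_rank(ranked, relevant)  # (1/2 + 1/1 + 0) / 3
```

Running both functions on the first-stage ranking and on the reranked list gives a direct before/after comparison. Reranking can raise MRR without changing Hit Rate@k when k covers the whole candidate list.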
ChurnShield

Flask • Scikit-learn • SHAP • SQLite

Flask web app for real-time churn prediction with authentication, an admin dashboard, and CSV export. Integrates a Random Forest pipeline with SHAP explainability.

  • Real-time prediction with Random Forest
  • SHAP-based feature importance
  • Automated retention strategy generator
GitHub
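A SHAP value is a Shapley value: a feature's marginal contribution to a prediction, averaged over all coalitions of the other features. For a handful of features, these values can be computed exactly by brute force. The sketch below assumes that features outside a coalition are set to a single baseline value, which simplifies SHAP's use of background data. It is not how the `shap` library handles tree ensembles such as Random Forest; the library has a faster tree-specific explainer for that. The linear churn score and its features are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def exact_shapley(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values for one prediction by enumerating every feature coalition.

    Features outside a coalition take their baseline value. Cost grows as O(2^n),
    so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear churn score over three made-up features.
weights = [0.5, 2.0, -1.0]


def churn_score(z: Sequence[float]) -> float:
    return sum(w * v for w, v in zip(weights, z))


x = [2.0, 3.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(churn_score, x, baseline)
# For a linear model each value equals weight * (x - baseline): about [1.0, 6.0, -1.0].
```

The values always sum to `predict(x) - predict(baseline)` (the efficiency property). This is why SHAP plots can show a prediction as a baseline plus per-feature pushes.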
IEEE Fraud Detection

XGBoost • SHAP • Imbalanced Learning

Fraud classification of 1M+ transaction records from the Kaggle IEEE-CIS dataset using XGBoost. Achieved ROC-AUC 0.95 and F1 0.66 on the imbalanced fraud class.

  • Processed 1M+ transaction records
  • Feature reduction via Spearman correlation
  • Hyperparameter tuning with Optuna
GitHub
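The exact feature-reduction rule used in the project isn't described. This sketch assumes a common variant: keep a feature only if its absolute Spearman correlation with every already-kept feature is below a threshold. The 0.9 threshold is an arbitrary illustration. Spearman's rho is the Pearson correlation of ranks, with tied values given average ranks. In practice, pandas `DataFrame.corr(method="spearman")` computes the same matrix. The column names and values are made up.

```python
from math import sqrt
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column has no defined correlation; treat as uncorrelated
    return cov / sqrt(var_a * var_b)


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def drop_correlated(columns: Dict[str, Sequence[float]], threshold: float = 0.9) -> List[str]:
    """Greedily keep a feature only if |rho| with every kept feature is below threshold."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(spearman(values, columns[other])) < threshold for other in kept):
            kept.append(name)
    return kept


# Made-up columns: amt_log is a monotone transform of amt, so rho = 1 and it is dropped.
columns = {
    "amt": [10, 20, 30, 40, 50],
    "amt_log": [1.0, 1.3, 1.48, 1.6, 1.7],
    "hour": [3, 1, 4, 1, 5],
}
kept = drop_correlated(columns, threshold=0.9)  # ["amt", "hour"]
```

Spearman is a natural choice here because it catches monotone but non-linear redundancy, such as a raw amount and its log, which Pearson on raw values would understate.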
Time Series Forecasting

LightGBM • Prophet • Feature Engineering

SKU-level supermarket price forecasting on 1.7M+ rows using LightGBM and Prophet with lag, rolling, and calendar features.

  • Processed 1.7M+ retail records
  • TimeSeriesSplit cross-validation
  • Evaluated via RMSE, MAPE, SMAPE
GitHub
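Lag and rolling features are built from each SKU's history. This sketch uses a single made-up price series, and the function names and default lags are hypothetical. Calendar features such as day of week are omitted. The key point is that every feature at time t uses only values before t, so nothing leaks from the target. SMAPE definitions vary between sources; the sketch assumes the variant with an (|A| + |F|) / 2 denominator.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def lag_rolling_features(
    series: Sequence[float], lags: Tuple[int, ...] = (1, 7), window: int = 7
) -> List[Dict[str, Optional[float]]]:
    """Features for each time step t built only from values before t (no target leakage)."""
    rows: List[Dict[str, Optional[float]]] = []
    for t in range(len(series)):
        row: Dict[str, Optional[float]] = {}
        for lag in lags:
            row[f"lag_{lag}"] = series[t - lag] if t >= lag else None
        past = series[max(0, t - window):t]  # the up-to-`window` values just before t
        row[f"roll_mean_{window}"] = sum(past) / window if len(past) == window else None
        rows.append(row)
    return rows


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE in percent, using the (|A| + |F|) / 2 denominator variant."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        terms.append(0.0 if denom == 0 else abs(f - a) / denom)
    return 100 * sum(terms) / len(terms)


prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # made-up daily prices for one SKU
features = lag_rolling_features(prices, lags=(1, 2), window=3)
# features[3] == {"lag_1": 12.0, "lag_2": 11.0, "roll_mean_3": 11.0}
score = smape([100.0, 200.0], [110.0, 180.0])
```

Early rows get `None` where there is not yet enough history. They are typically dropped or left as missing values, which LightGBM can handle natively. The same past-only rule is why folds must be ordered in time (as TimeSeriesSplit does) rather than shuffled.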
Retail Price Optimization

Random Forest • Optuna • SHAP

Competition-aware ML workflow integrating historical sales and competitor pricing. Achieved R² ≈ 0.92 with a Random Forest model, using SHAP for explainability.

  • Full ML pipeline with EDA
  • Hyperparameter tuning with Optuna
  • Profit and revenue optimization
GitHub
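Profit optimization on top of a demand model can be sketched as a grid search over candidate prices. The sketch assumes a hypothetical one-argument demand function standing in for the trained Random Forest. A real model would also take inputs such as competitor prices. The linear demand curve and unit cost are made up; for this curve the analytical optimum is a price of 30.

```python
from typing import Callable, Sequence, Tuple


def best_price(
    predict_demand: Callable[[float], float],
    candidate_prices: Sequence[float],
    unit_cost: float,
) -> Tuple[float, float]:
    """Return (price, expected_profit) maximizing (price - unit_cost) * predicted demand."""
    if not candidate_prices:
        raise ValueError("need at least one candidate price")
    best = None
    for p in candidate_prices:
        demand = max(0.0, predict_demand(p))  # clip negative predictions to zero demand
        profit = (p - unit_cost) * demand
        if best is None or profit > best[1]:
            best = (p, profit)
    return best


# Hypothetical linear demand curve standing in for a trained regressor:
# demand = 100 - 2 * price, with a unit cost of 10.
grid = [10 + 0.5 * i for i in range(81)]  # 10.0 to 50.0 in steps of 0.5
price, profit = best_price(lambda p: 100 - 2 * p, grid, unit_cost=10)
# price == 30.0, profit == 800.0
```

Setting `unit_cost=0` turns the same search into revenue maximization. A grid search is used instead of calculus because a tree-based demand model is piecewise constant and has no useful gradient.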
Anomaly Detection

Isolation Forest • LOF • KMeans • DBSCAN

Multi-method anomaly detection pipeline combining model-based, cluster-based, and statistical approaches on fraud and retail datasets.

  • Multiple detection algorithms compared
  • Evaluation via ROC-AUC and precision/recall
  • PCA visualization of anomalies
GitHub
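The statistical test used in the project isn't specified. As an assumed example of a statistical detector to run alongside Isolation Forest and LOF, this sketch uses the Iglewicz-Hoaglin modified z-score. It is based on the median and the median absolute deviation (MAD) and conventionally flags |z| > 3.5. The transaction amounts are made up.

```python
from statistics import median
from typing import List, Sequence


def modified_z_scores(values: Sequence[float]) -> List[float]:
    """Modified z-score: 0.6745 * (x - median) / MAD, robust to the outliers it looks for."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0] * len(values)  # no spread around the median; this score can't rank points
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Indices whose modified z-score exceeds the threshold in absolute value."""
    return [i for i, z in enumerate(modified_z_scores(values)) if abs(z) > threshold]


amounts = [20.0, 22.0, 19.0, 21.0, 23.0, 20.0, 500.0]  # made-up transaction amounts
anomalies = flag_anomalies(amounts)  # [6]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the standard deviation enough to hide itself. The median-based version keeps such a point far from the center.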

Technical Skills

Programming
Python SQL Java JavaScript PHP
Machine Learning
Classification Regression Clustering Anomaly Detection Time Series Feature Engineering Hyperparameter Tuning SHAP
AI & NLP
LLMs RAG Systems Vector Databases Embeddings Semantic Search Reranking
Frameworks & Libraries
Scikit-learn Pandas NumPy LightGBM XGBoost Prophet sentence-transformers
Databases & Vector Stores
FAISS Qdrant PostgreSQL (pgvector) Redis SQLite MySQL
Deployment & DevOps
FastAPI Flask Docker Git/GitHub JWT Auth Rate Limiting

Research

SHAP-Based Feature Selection and Iterative Hyperparameter Tuning for Customer Churn Prediction in Telecommunication Datasets

Preprint • 2024

Explores interpretability and optimization techniques for churn modeling. SHAP values provide transparent feature importance to guide feature selection, and iterative hyperparameter tuning improves model performance on real-world telecom data.

SHAP Hyperparameter Tuning Churn Prediction Telecom
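One common form of SHAP-based feature selection ranks features by their mean absolute SHAP value across samples and keeps the top k. The paper's exact procedure is not reproduced here. This sketch assumes the SHAP matrix (samples × features) has already been computed, for example with the `shap` library. The feature names and values are made up.

```python
from typing import List, Sequence


def select_by_mean_abs_shap(
    shap_values: Sequence[Sequence[float]], feature_names: Sequence[str], k: int
) -> List[str]:
    """Keep the k features with the largest mean |SHAP value| across samples."""
    n_samples = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n_samples
        for j, name in enumerate(feature_names)
    }
    ranked = sorted(feature_names, key=lambda name: importance[name], reverse=True)
    return ranked[:k]


# Made-up SHAP values for 3 customers x 4 hypothetical features.
feature_names = ["tenure", "monthly_charges", "contract_type", "gender"]
shap_matrix = [
    [-0.30, 0.10, 0.25, 0.01],
    [-0.20, -0.15, 0.30, -0.02],
    [0.40, 0.05, -0.20, 0.00],
]
selected = select_by_mean_abs_shap(shap_matrix, feature_names, k=2)  # ["tenure", "contract_type"]
```

The absolute value matters because a feature that pushes some predictions up and others down, like `tenure` here, would otherwise average out to near zero despite being influential. In an iterative setup, the model would be retrained and retuned on the selected features before recomputing SHAP values.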

Get In Touch

I'm always interested in discussing new opportunities, collaborations, or innovative projects

Email

bijaybeezoe@gmail.com

Phone

+977-9767645335

Location

Kathmandu, Nepal