Shubham Sharma

About Me

I am an AI researcher and engineer specializing in multimodal AI and large language model–based systems, with a focus on building models that jointly reason over text, vision, audio, video, and structured data. I graduated Rank 1 with a B.E. (Honors) in Computer Science & Engineering from Panjab University (CGPA: 9.23/10) and have conducted research across six labs in three countries, leading to publications at EMNLP, ECIR, and ICDAR and in Elsevier’s Computers & Security. My work spans educational AI, multilingual multimodal NLP, biomedical imaging, and self-supervised learning, including research on self-supervised 3D Cryo-ET classification at Carnegie Mellon University and on LLM-based educational video question answering at IIT Kharagpur. Currently, I work as a Machine Learning Engineer at Recursive Softpro, designing production-grade multimodal RAG systems and multi-agent LLM workflows that bridge research and deployment. My long-term goal is to contribute to next-generation multimodal foundation models, particularly for education and scientific reasoning, to make high-quality learning accessible at scale.

Basic Information

Age:

Pronouns:

He/Him

Location:

India

Languages Known:

English, Hindi

Email:

amnour.rajsubham@gmail.com

ORCID:

orcid.org/0009-0008-1313-4063

LinkedIn:

linkedin.com/in/shubh8434

GitHub:

github.com/Shubh8434

Google Scholar:

scholar.google.com

Work Experience

Jan 2025 – Present

Machine Learning Engineer I

Recursive Softpro (Bengaluru)

AI-Driven RAG and Multi-Agent Systems

Building production-grade AI assistants and retrieval systems that blend multimodal search, orchestration, and observability.

Designed a multimodal RAG pipeline using CLIP embeddings and ChromaDB to retrieve text, images, audio, and video frames, passing the retrieved context to GPT-4o for context-aware answers, and tuned latency/accuracy trade-offs.
Engineered a Swarm-driven multi-agent router with OpenAI function calling, a dynamic tool registry, Pinecone-backed knowledge bases, and Django/DRF persistence with chat-health analytics.
Built an AI workflow marketplace with a unified LLM layer (OpenAI/Anthropic/Llama), multi-tenant API management, and LangGraph agent templates with evaluation harnesses.

Domain: GenAI, Multi-Agent Systems, Retrieval-Augmented Generation, Backend Engineering

Jul 2024 – Dec 2024

Language Research Analyst

Indian Institute of Technology-Kharagpur

Advisor: Prof. Somak Aditya

LLM-Based Educational Video Question Answering System

Enhancing student learning with LLMs that understand long educational videos end-to-end.

Built a data pipeline combining manual annotation and Gemini 1.5 Flash synthesis to create 4,300+ high-quality Q&A pairs across lecture domains.
Benchmarked multimodal LLMs (Video-LLaMA, mPLUG-Owl, etc.) in zero-shot, few-shot, and fine-tuned settings, evaluating with BLEU, ROUGE, and METEOR alongside qualitative student feedback.
Co-authored the EMNLP 2025 paper on educational multimodal video question answering.

Domain: Multimodal AI, Natural Language Processing, Education Technology

Research Experience

Sep 2023 – Jul 2024

Undergraduate Research

Carnegie Mellon University (CMU)

Advisor: Prof. Min Xu

3D Cryo-ET Classification Using Self-Supervised Learning

Self-supervised pipelines for robust 3D Cryo-ET analysis in open-set environments.

Integrated Momentum Contrast (MoCo) with ViViT/ViT backbones (using inflated weights) for 3D Cryo-ET classification, improving representation quality.
Designed a 3D processing workflow that improved performance and training efficiency, achieving an F1 score of 70.28 in the open-set setting.
Contributed experiments and writing for an ICCV 2025 manuscript and a book chapter on feature semantic segmentation.

Domain: Computer Vision, Self-Supervised Learning, Computational Biology

May 2023 – Sep 2023

Undergraduate Research

Indian Institute of Technology-Patna (ACM IKDD Uplink Research Internship 2023)

Advisor: Prof. Sriparna Saha

IndicBART: Multimodal Summarization in Diverse Indian Languages

This project extends IndicBART with a visual element to build a multimodal summarization framework that integrates text and image data and produces regionally contextualized summaries in Indian languages. The process involves the following steps:

Collected a multimodal dataset (Hindi, Tamil, Bengali, Marathi) of paragraphs, summaries, and images from the Large Scale Multi-Lingual Multi-Modal Summarization Dataset (M3LS).
Extended IndicBART with a visual-aware encoder to generate regionally contextualized summaries from both text and image data.
Achieved a ROUGE-1 score of 0.266 on the Hindi dataset and 44.1% image precision using an image pointer for multi-output prediction.
The resulting paper was accepted at ICDAR 2024.

Domain: Natural Language Processing, Multimodal AI, Summarization, Machine Learning

Jan 2023 – May 2023

Undergraduate Research

Jadavpur University

Advisor: Prof. Ram Sarkar

Pneumonia Detection Using Meta-learner With Deep Feature Extractors

This project develops a pneumonia detection system for chest X-rays that combines deep features from ViT and ResNet50 with an XGBoost meta-learner for accurate classification. The workflow encompasses the following steps:

Compiled and preprocessed a three-class chest X-ray dataset from Mendeley (COVID-19, Pneumonia, Normal), applying data augmentation to reduce overfitting.
Extracted image features using Vision Transformer (ViT) and ResNet50 models.
Combined the ViT and ResNet50 features into an ensemble and trained an XGBoost meta-learner for classification, achieving 96.19% accuracy.
Contributed to writing the results section of the paper, which was accepted at CPAMCS 2023.

Domain: Medical Imaging, Computer Vision, Machine Learning, Image Classification

Aug 2022 – Jan 2023

Undergraduate Research

Indian Institute of Technology-Patna

Advisor: Prof. Sriparna Saha

Multimodal Aspect-Based Complaint Detection

This project develops a multimodal framework that combines text and image data to detect complaints in customer reviews. The process involves the following steps:

Collected a multimodal dataset of product images, customer reviews, and aspects from the Amazon website.
Used BERT for text embeddings and VGG16/ResNet models for visual embeddings.
Developed a multimodal interaction model to learn the relationships among text, images, and product aspects.
Identified complaints by analyzing these multimodal relationships; the resulting paper was published at ECIR 2023.

Domain: Natural Language Processing, Multimodal AI, Sentiment Analysis, Image Processing

Jan 2022 – Aug 2022

Undergraduate Research

National University of Singapore

Advisor: Prof. Luo Wei

Vessel Collision and Trajectory Prediction

Predicting vessel paths and collision risk from noisy AIS streams.

Cleaned AIS data (SOG, COG, MMSI, etc.) and clustered vessel patterns with DBSCAN/K-Means to remove outliers.
Built sequence-to-sequence baselines (RNN, LSTM, GRU variants) for trajectory forecasting and risk assessment.

Domain: Time Series Modeling, Machine Learning, Maritime Analytics

Oct 2021 – Jan 2022

Undergraduate Research

Indian Institute of Technology-BHU

Advisor: Prof. Rajesh Kumar

Concrete Mix Design with Machine Learning

Optimizing concrete strength and durability through data-driven mix modeling.

Experimented with sand fines, cement, and plasticizer ratios, comparing regression models (MLR, SVR, Decision Trees, ANN) for strength prediction.
Decision Tree Regression achieved the highest accuracy among the compared models for both 7-day and 28-day compressive strength.

Domain: Applied Machine Learning, Regression Modeling, Materials Engineering

Education

2020 – 2024

Bachelor of Engineering (Hons.)

Panjab University, Chandigarh

Department of Computer Science & Engineering

CGPA: 9.23 / 10

Ranked 1st of 65 students in the department (batch topper for the 2020–24 cohort).

Publications

1. S. Ray^1*, Shubham Sharma^1*, S. Aditya, P. Goyal, “EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos”, EMNLP 2025 (Core A*) [Paper] [Code]

2. A. Singh^1*, V. Gangwar^1*, Shubham Sharma^2, S. Saha, “Knowing What and How: A Multi-modal Aspect-Based Framework for Complaint Detection”, ECIR 2023 (Core A) [Paper] [Code]

3. Shubham Sharma^1*, S. Mukherjee^1*, D. Kaplun^2, R. Sarkar, “Pneumonia Detection in Chest X-Rays using XGBoost based Meta-learner with Deep Feature Extractors”, CPAMCS 2023, Springer, Cham [Paper] [Code]

4. D. Prakash^1*, Raghvendra K^1*, Shubham Sharma^2, S. Saha, “IndicBART alongside Visual Element: Multimodal Summarization in Diverse Indian Languages”, ICDAR 2024 (Core A) [Paper] [Code]

5. Shubham Sharma^1, G. K. Walia^2*, K. Singh^2*, V. Batra^3, A. K. Sekhon, A. Kumar, K. Rawal, D. Ghai, “Forecasting of Crop Yield Using Various Machine Learning Approaches: A Comparison”, Rural Sustainability Research, Vol. 52 (347), 2024 [Paper] [Code]

6. S. Bamber^1, A. Katkuri^2*, Shubham Sharma^2*, M. Angurala, “A Hybrid CNN-LSTM Approach for Intelligent Cyber Intrusion Detection System”, Computers & Security (Elsevier, IF: 5.2, Q1), 2024 [Paper] [Code]

Portfolio

Technical Skills:

Languages: Python, C/C++, SQL, HTML, CSS, JavaScript, MATLAB, LaTeX
Frameworks & Libraries: PyTorch, TensorFlow, Hugging Face, LangChain, LangGraph, FastAPI, Django, Keras, Streamlit, OpenCV, scikit-learn, NumPy, Pandas, Matplotlib, Seaborn, SQLAlchemy
Environment/Tools: Docker, Git/GitHub, Linux, CUDA, Anaconda, Google Colab, Jupyter, Tableau, MLflow, Weights & Biases
Databases & Vector Stores: PostgreSQL, ChromaDB, Pinecone, FAISS, Weaviate

Coursework:

Machine Learning
Deep Learning
Database Management
Data Structures
Analysis & Design of Algorithms
Operating Systems
Object-Oriented Programming
Probability & Statistics
Linear Algebra
Natural Language Processing
Calculus
Digital Image Processing
Artificial Intelligence
Computer Graphics
Principles of Programming Languages

Achievements:

Received medals and prizes for consistently securing the highest CGPA in the department from the first year onward.
Selected for the ACM IKDD Uplink Research Internship 2023 (acceptance rate: 2%).
Selected to attend IIIT Hyderabad’s CVIT Summer School and the Amazon Machine Learning Summer School in 2023.
Achieved State Rank 1143 out of 1 million participants in the Hindustan Olympiad.
Received a scholarship from the Macquarie EdX Group, selected from 35,000 applicants.
Advanced to the Pre-Elimination Round of CodeChef SnackDown 2021.

Jan 2022 – Dec 2023

Google Developer Student Club

Machine Learning Lead

Led a 20+ member team to run 10+ machine learning workshops and events, benefiting 500+ students.
Mentored 10+ technical projects, helping students apply ML models to real-world problems.

Feb 2023 – Jan 2024

National Service Scheme (NSS)

Volunteer

Coordinated community service projects with 100+ students to address local societal needs.
Organized health, education, and environmental awareness campaigns reaching 200+ residents.