AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
Paper • 2502.14669 • Published • 14
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 27
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 39
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
Paper • 2503.17352 • Published • 24
Abhranil Chandra
abhranil14
AI & ML interests
Reinforcement Learning, Deep Unsupervised Learning, NLP and Bayesian Deep Learning
Recent Activity
Updated a model 6 days ago:
abhranil14/Gemma2B_FF_on_qwen14B_wrong_2130_batch256_lr10e-6_warmup0.1_30_epoch_linear_lr
Updated a model 6 days ago:
abhranil14/Qwen1.5B_FF_on_human_gold_7500_batch256_lr10e-6_warmup0.1_linear_lr
Published a model 6 days ago:
abhranil14/Qwen1.5B_FF_on_human_gold_7500_batch256_lr10e-6_warmup0.1_linear_lr