Abhinav Shukla
I am a Researcher in Multimodal Machine Learning at Scaled Foundations.
My interests include self-supervised learning, embodied multimodal perception, large-scale multimodal (especially audiovisual) representation learning, and efficient multimodal inference.
I was a Research Engineer at Meta from 2022 to 2023. Before that, I was a PhD student in iBUG (the Intelligent Behaviour Understanding Group) at Imperial College London,
where I was supervised by Prof. Maja Pantic and worked on self-supervised audiovisual representation learning and affective computing.
I completed my Bachelor's (Honours) and Master's by Research in Computer Science at IIIT Hyderabad in 2017 and 2018, respectively.
CV  / 
Google Scholar  / 
Twitter  / 
LinkedIn  / 
Github
News
[May '24] Recognized as an Outstanding Reviewer at CVPR 2024.
[Sep '23] Started working as a Researcher in Multimodal Machine Learning at Scaled Foundations.
[May '23] Recognized as an Outstanding Reviewer at CVPR 2023.
[Mar '23] Paper accepted at CVPR 2023.
[Mar '22] Started working as a Research Engineer in the Reality Labs Research organization at Meta.
[Mar '21] Paper accepted in IEEE Transactions on Affective Computing.
[Sep '20] Started working with Anurag Kumar at an internship in the Audio team at Facebook Reality Labs (FRL) Research.
[Jul '20] Presented my work on audiovisual self-supervised learning of speech representations at ICML 2020.
Publications
I have worked on a variety of problems in multimodal machine learning (audio, video, text, EEG, eye tracking), with applications in representation learning
(for tasks such as egocentric and audiovisual scene understanding and speech recognition), computer vision, multimedia, and affective computing.
For an updated and complete list of publications, see my Google Scholar profile.
Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition?
Abhinav Shukla,
Stavros Petridis,
Maja Pantic
IEEE Transactions on Affective Computing, 2021
[pdf]
[bib]
Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision
Abhinav Shukla,
Stavros Petridis,
Maja Pantic
ICML Workshop - Self-Supervision in Audio and Speech, 2020
[pdf]
[bib]
Visual Self-Supervision by Facial Reconstruction for Speech Representation Learning
Abhinav Shukla,
Stavros Petridis,
Maja Pantic
CVPR Workshop - Sight and Sound, 2020
[pdf]
[bib]
Learning Self-Supervised Multimodal Representations of Human Behaviour
Abhinav Shukla
Doctoral Symposium at ACM International Conference on Multimedia (ACM MM), 2020
[pdf]
[bib]
Looking Beyond a Clever Narrative: Visual Context and Attention are Primary Drivers of Affect in Video Advertisements
Abhinav Shukla,
Harish Katti,
Mohan Kankanhalli,
Ramanathan Subramanian
ACM International Conference on Multimodal Interaction (ICMI), 2018 (Oral, 15.4% acceptance rate)
[pdf]
[bib]
Experience
Scaled Foundations
Researcher, Multimodal Machine Learning
Redmond, WA · 2023 - present
Working on large-scale multimodal representation learning.
Meta
Research Engineer
Redmond, WA · 2022 - 2023
Worked in the Audio team at Reality Labs Research.
Developed infrastructure for large-scale training, benchmarking, and optimization of multimodal/audiovisual models for AR glasses.
Conducted research on egocentric audiovisual machine learning and computer vision. Published at CVPR 2023.
Facebook Reality Labs (FRL) Research
Research Intern with Anurag Kumar
Redmond, WA (Remote) · Sep 2020 - May 2021
Worked on visually guided self-supervised learning of audio representations.
Imperial College London
Research Assistant with Prof. Maja Pantic
London, United Kingdom · Oct 2018 - Mar 2019
Worked as a research assistant in the iBUG group funded by the EU Horizon 2020 DE-ENIGMA project.
Assisted in collecting data of autistic children interacting with a social robot. Conducted research on learning speech representations for emotion recognition.
National University of Singapore
Research Intern with Prof. Mohan Kankanhalli
Singapore · Sep 2017 - May 2018
Worked on multimodal (audio, video, EEG, eye tracking) affect recognition from advertisement videos at the SeSaMe (Sensor Enhanced Social Media) Centre.
Published in IEEE Transactions on Affective Computing and at ICMI 2018.
Google Summer of Code
Student Developer
Remote · Summer 2016 and Summer 2017
2017: Worked with Prof. Francis Steen from UCLA and Prof. Mark Turner from CWRU for the Red Hen Lab organization.
2016: Developed a system to extract burned-in subtitles from videos into caption files for the CCExtractor organization.
Supervised by Carlos Fernandez (org admin and CEO of Subtix Inc).
Education
Imperial College London
PhD in Computer Science
London, United Kingdom · 2018 - 2022
Thesis title: "Learning Self-Supervised Representations of Audiovisual Human-Centric Data"
Awards
Samsung PhD Fellowship, 2019-2020
IIIT Hyderabad Fast-track Master's thesis (for high-quality papers in reputed venues), 2018
IIIT Hyderabad Research award (for publishing as an undergraduate), 2018
ACM SIGCHI Gary Marsden Student Development Fund (to attend ICMI 2018), 2018
Google India Travel Grant (to attend ACM MM 2017), 2017
ACM ICMI 2017 Travel Grant (to attend ICMI 2017), 2017
Dean’s Merit List Award for excellence in academics (6 consecutive semesters), 2014-2017
Service
Conference Reviewing: CVPR, ICCV, WACV, ICASSP, FG, ICMI, ACII
Journal Reviewing: IEEE Transactions on Affective Computing (TAFFC), IJCV, TPAMI
Volunteer: ACII 2019, ICMI 2018, ICMI 2017