Abhinav Shukla
I am a Researcher in Multimodal Machine Learning at Scaled Foundations.
My interests include self-supervised learning, embodied multimodal perception, large-scale multimodal (especially audiovisual) representation learning, and efficient multimodal inference.
I was a Research Engineer at Meta from 2022 to 2023. Before that, I was a PhD student in iBUG (the Intelligent Behaviour Understanding Group) at Imperial College London,
where I was supervised by Prof. Maja Pantic and worked on self-supervised audiovisual representation learning and affective computing.
I completed my Bachelor's (Honours) and Master's by Research in Computer Science at IIIT Hyderabad in 2017 and 2018, respectively.
CV  / 
Google Scholar  / 
Twitter  / 
LinkedIn  / 
Github
News
[May '24] Recognized as an Outstanding Reviewer at CVPR 2024.
[Sep '23] Started working as a Researcher in Multimodal Machine Learning at Scaled Foundations.
[May '23] Recognized as an Outstanding Reviewer at CVPR 2023.
[Mar '23] Paper accepted at CVPR 2023.
[Mar '22] Started working as a Research Engineer in the Reality Labs Research organization at Meta.
[Mar '21] Paper accepted in IEEE Transactions on Affective Computing.
[Sep '20] Started working with Anurag Kumar at an internship in the Audio team at Facebook Reality Labs (FRL) Research.
[Jul '20] Presented my work on audiovisual self-supervised learning of speech representations at ICML 2020.
Publications
I have worked on a variety of problems in multimodal machine learning (audio, video, text, EEG, eye tracking), with applications in representation learning
(for tasks such as egocentric and audiovisual scene understanding and speech recognition), computer vision, multimedia, and affective computing.
For an updated and complete list of publications, see my Google Scholar profile.
Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition?
Abhinav Shukla,
Stavros Petridis,
Maja Pantic
IEEE Transactions on Affective Computing, 2021
[pdf]
[bib]
Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision
Abhinav Shukla,
Stavros Petridis,
Maja Pantic
ICML Workshop - Self-Supervision in Audio and Speech, 2020
[pdf]
[bib]
Visual Self-Supervision by Facial Reconstruction for Speech Representation Learning
Abhinav Shukla,
Stavros Petridis,
Maja Pantic
CVPR Workshop - Sight and Sound, 2020
[pdf]
[bib]
Learning Self-Supervised Multimodal Representations of Human Behaviour
Abhinav Shukla
Doctoral Symposium at ACM International Conference on Multimedia (ACM MM), 2020
[pdf]
[bib]
Looking Beyond a Clever Narrative: Visual Context and Attention are Primary Drivers of Affect in Video Advertisements
Abhinav Shukla,
Harish Katti,
Mohan Kankanhalli,
Ramanathan Subramanian
ACM International Conference on Multimodal Interaction (ICMI), 2018 (Oral, 15.4% acceptance rate)
[pdf]
[bib]
Experience
Scaled Foundations
Researcher, Multimodal Machine Learning
Redmond, WA · 2023 - present
Working on large-scale multimodal representation learning.
Meta
Research Engineer
Redmond, WA · 2022 - 2023
Worked in the Audio team at Reality Labs Research.
Developed infrastructure for large-scale training, benchmarking, and optimization of multimodal/audiovisual models for AR glasses.
Conducted research on egocentric audiovisual machine learning and computer vision. Published at CVPR 2023.
Facebook Reality Labs (FRL) Research
Research Intern with Anurag Kumar
Redmond, WA (Remote) · Sep 2020 - May 2021
Worked on visually guided self-supervised learning of audio representations.
Imperial College London
Research Assistant with Prof. Maja Pantic
London, United Kingdom · Oct 2018 - Mar 2019
Worked as a research assistant in the iBUG group funded by the EU Horizon 2020 DE-ENIGMA project.
Assisted in collecting data of autistic children interacting with a social robot. Conducted research on learning speech representations for emotion recognition.
National University of Singapore
Research Intern with Prof. Mohan Kankanhalli
Singapore · Sep 2017 - May 2018
Worked on multimodal (audio, video, EEG, eye tracking) affect recognition from advertisement videos at the SeSaMe (Sensor Enhanced Social Media) Centre.
Published in IEEE Transactions on Affective Computing and at ICMI 2018.
Google Summer of Code
Student Developer
Remote · Summer 2016 and Summer 2017
2017: Worked with Prof. Francis Steen from UCLA and Prof. Mark Turner from CWRU for the Red Hen Lab organization.
2016: Developed a system to extract burned-in subtitles from videos into caption files for the CCExtractor organization.
Supervised by Carlos Fernandez (org admin and CEO of Subtix Inc).
Education
Imperial College London
PhD in Computer Science
London, United Kingdom · 2018 - 2022
Thesis title: "Learning Self-Supervised Representations of Audiovisual Human-Centric Data"
Awards
Samsung PhD Fellowship, 2019-2020
IIIT Hyderabad Fast-track Master's thesis (for high-quality papers in reputed venues), 2018
IIIT Hyderabad Research award (for publishing as an undergraduate), 2018
ACM SIGCHI Gary Marsden Student Development Fund (to attend ICMI 2018), 2018
Google India Travel Grant (to attend ACM MM 2017), 2017
ACM ICMI 2017 Travel Grant (to attend ICMI 2017), 2017
Dean’s Merit List Award for excellence in academics (6 consecutive semesters), 2014-2017
Service
Conference Reviewing: CVPR, ICCV, WACV, ICASSP, FG, ICMI, ACII
Journal Reviewing: IEEE Transactions on Affective Computing (TAFFC), IJCV, TPAMI
Volunteer: ACII 2019, ICMI 2018, ICMI 2017