Abhinav Shukla

I am a Researcher in Multimodal Machine Learning at Scaled Foundations. My interests include self-supervised learning, embodied multimodal perception, large-scale multimodal (especially audiovisual) representation learning, and efficient multimodal inference.

I was a Research Engineer at Meta from 2022 to 2023. Before that, I was a PhD student at iBUG (Intelligent Behaviour Understanding Group) at Imperial College London, where I was supervised by Prof. Maja Pantic and worked on self-supervised audiovisual representation learning and affective computing.

I completed my Bachelor's (Honours) and Master's by Research in Computer Science at IIIT Hyderabad in 2017 and 2018, respectively.

CV  /  Google Scholar  /  Twitter  /  LinkedIn  /  GitHub

News
  • [May '24] Recognized as an Outstanding Reviewer at CVPR 2024.
  • [Sep '23] Started working as a Researcher in Multimodal Machine Learning at Scaled Foundations.
  • [May '23] Recognized as an Outstanding Reviewer at CVPR 2023.
  • [Mar '23] Paper accepted at CVPR 2023.
  • [Mar '22] Started working as a Research Engineer in the Reality Labs Research organization at Meta.
  • [Mar '21] Paper accepted in IEEE Transactions on Affective Computing.
  • [Sep '20] Started an internship with Anurag Kumar in the Audio team at Facebook Reality Labs (FRL) Research.
  • [Jul '20] Presented my work on audiovisual self-supervised learning of speech representations at ICML 2020.
    Publications

    I have worked on a variety of problems in multimodal machine learning (audio, video, text, EEG, eye tracking), with applications in representation learning (for tasks such as egocentric and audiovisual scene understanding and speech recognition), computer vision, multimedia, and affective computing.

    For a complete, up-to-date list of publications, see Google Scholar.

    GRID: A Platform for General Robot Intelligence Development
    Sai Vemprala, Shuhang Chen, Abhinav Shukla, Dinesh Narayanan, Ashish Kapoor
    arXiv, 2023
    [website] [pdf] [bib]
    Egocentric Auditory Attention Localization in Conversations
    Fiona Ryan, Hao Jiang, Abhinav Shukla, James Matthew Rehg, Vamsi Krishna Ithapu
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
    [demo] [pdf] [bib]
    Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition?
    Abhinav Shukla, Stavros Petridis, Maja Pantic
    IEEE Transactions on Affective Computing, 2021
    [pdf] [bib]
    Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision
    Abhinav Shukla, Stavros Petridis, Maja Pantic
    ICML Workshop - Self-Supervision in Audio and Speech, 2020
    [pdf] [bib]
    Visual Self-Supervision by Facial Reconstruction for Speech Representation Learning
    Abhinav Shukla, Stavros Petridis, Maja Pantic
    CVPR Workshop - Sight and Sound, 2020
    [pdf] [bib]
    Visually Guided Self Supervised Learning of Speech Representations
    Abhinav Shukla, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic
    International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020, (Oral)
    [pdf] [bib]
    Learning Self-Supervised Multimodal Representations of Human Behaviour
    Abhinav Shukla
    Doctoral Symposium at ACM International Conference on Multimedia (ACM MM), 2020
    [pdf] [bib]
    Recognition of Advertisement Emotions with Application to Computational Advertising
    Abhinav Shukla, Shruti Shriya Gullapuram, Harish Katti, Mohan Kankanhalli, Ramanathan Subramanian
    IEEE Transactions on Affective Computing, 2020
    [pdf] [bib]
    Looking Beyond a Clever Narrative: Visual Context and Attention are Primary Drivers of Affect in Video Advertisements
    Abhinav Shukla, Harish Katti, Mohan Kankanhalli, Ramanathan Subramanian
    ACM International Conference on Multimodal Interaction (ICMI), 2018, (Oral, 15.4% acceptance rate)
    [pdf] [bib]
    Evaluating Content-Centric vs. User-Centric Ad Affect Recognition
    Abhinav Shukla, Shruti Shriya Gullapuram, Harish Katti, Karthik Yadati, Mohan Kankanhalli, Ramanathan Subramanian
    ACM International Conference on Multimodal Interaction (ICMI), 2017
    [pdf] [bib]
    Affect Recognition in Ads with Application to Computational Advertising
    Abhinav Shukla, Shruti Shriya Gullapuram, Harish Katti, Karthik Yadati, Mohan Kankanhalli, Ramanathan Subramanian
    ACM International Conference on Multimedia (ACM MM), 2017, (Oral, 7.5% acceptance rate)
    [pdf] [bib]
    Experience
    Scaled Foundations
    Researcher, Multimodal Machine Learning  
    Kirkland, WA   ·   2023 - present

    Working on large-scale multimodal representation learning.

    Meta
    Research Engineer  
    Redmond, WA   ·   2022 - 2023

    Worked in the Audio team at Reality Labs Research. Developed infrastructure for large-scale training, benchmarking, and optimization of multimodal/audiovisual models for AR glasses. Conducted research on egocentric audiovisual machine learning and computer vision. Published at CVPR 2023.

    Internships
    Facebook Reality Labs (FRL) Research
    Research Intern   with   Anurag Kumar
    Redmond, WA (Remote)   ·   Sep 2020 - May 2021

    Worked on visually guided self-supervised learning of audio representations.

    Imperial College London
    Research Assistant   with   Prof. Maja Pantic
    London, United Kingdom   ·   Oct 2018 - Mar 2019

    Worked as a research assistant in the iBUG group, funded by the EU Horizon 2020 DE-ENIGMA project. Assisted in collecting data of autistic children interacting with a social robot. Conducted research on learning speech representations for emotion recognition.

    National University of Singapore
    Research Intern   with   Prof. Mohan Kankanhalli
    Singapore   ·   Sep 2017 - May 2018

    Worked on multimodal (audio, video, EEG, eye tracking) affect recognition from advertisement videos at the SeSaMe (Sensor Enhanced Social Media) Centre. Published in IEEE Transactions on Affective Computing and at ICMI 2018.

    Google Summer of Code
    Student Developer  
    Remote   ·   Summer 2016 and Summer 2017

    2017: Worked with Prof. Francis Steen from UCLA and Prof. Mark Turner from CWRU for the Red Hen Lab organization.
    2016: Developed a system to extract burned-in subtitles from videos into caption files for the CCExtractor organization. Supervised by Carlos Fernandez (org admin and CEO of Subtix Inc).

    Education
    Imperial College London
    PhD in Computer Science  
    London, United Kingdom   ·   2018 - 2022
    Thesis title: "Learning Self-Supervised Representations of Audiovisual Human-Centric Data"
    IIIT Hyderabad
    BTech & MS by Research in Computer Science  ·   8.78/10.00
    Hyderabad, India   ·   2013 - 2018
    I completed my thesis, “Multimodal Emotion Recognition from Advertisements with Application to Computational Advertising”, advised by Prof. Ramanathan Subramanian at the Center for Visual Information Technology.
    Awards and Recognition
  • Samsung PhD Fellowship, 2019-2020
  • IIIT Hyderabad Fast-track Masters thesis (for high quality papers in reputed venues), 2018
  • IIIT Hyderabad Research award (for publishing as an undergraduate), 2018
  • ACM SIGCHI Gary Marsden Student Development Fund (to attend ICMI 2018), 2018
  • Google India Travel Grant (to attend ACM MM 2017), 2017
  • ACM ICMI 2017 Travel Grant (to attend ICMI 2017), 2017
  • Dean’s Merit List Award for excellence in academics (6 consecutive semesters), 2014-2017
    Academic Service
  • Conference Reviewing: CVPR, ICCV, WACV, ICASSP, FG, ICMI, ACII
  • Journal Reviewing: IEEE Transactions on Affective Computing (TAFFC), IJCV, TPAMI
  • Volunteer: ACII 2019, ICMI 2018, ICMI 2017
    Invited Talks
  • Self-Supervised Audiovisual Representation Learning, Apple Video Engineering, 2023
  • Learning Self-Supervised Multimodal Representations of Human Behavioural Data, Facebook Reality Labs Research, 2021
  • Learning Self-Supervised Multimodal Representations of Human Behavioural Data, Mitsubishi Electric Research Laboratories, 2021
  • Self-Supervised Representation Learning in Audiovisual Speech, University of Nottingham, 2019
  • Automatic Understanding of News Videos & CCExtractor, Universität Osnabrück, 2017
    Personal
  • I enjoy kicking balls! I play soccer and have recently been working on becoming a better American Football kicker. My 2024 target is to kick a 50-yard field goal (at the time of writing, I can make one from around 44 yards).
  • If you are a IIIT Hyderabad student/alum who is looking for any advice (e.g. on applying to PhD programs, life/work in the USA/Europe, interviewing for research roles), please feel free to reach out to me.

Last updated: May 2024. Template by Jon Barron, with some sections from Noveen Sachdeva's website.