Abhinav Shukla

I am a Research Engineer at Meta, working on efficient multimodal machine learning. My research interests include self-supervised learning, egocentric multimodal perception, on-device multimodal machine learning, and machine learning for augmented reality. I am currently particularly interested in exploring new ways to do multimodal (especially audiovisual) self-supervised learning that can leverage multimodal complementarity (which multimodal contrastive learning does in a very unsatisfying way).

Before starting at Meta, I was a PhD student at iBUG (Intelligent Behaviour Understanding Group) at Imperial College London where I was supervised by Prof. Maja Pantic and worked on self-supervised audiovisual representation learning and affective computing.

I completed my Bachelor's (Honours) and Master's by Research in Computer Science at IIIT Hyderabad in 2017 and 2018, respectively.

CV  /  Google Scholar  /  Twitter  /  LinkedIn  /  Github

News
  • [Mar '22] Started working as a Research Engineer in the Reality Labs Research organization at Meta.
  • [Mar '21] Paper accepted in IEEE Transactions on Affective Computing.
  • [Sep '20] Started an internship with Anurag Kumar in the Audio team at Facebook Reality Labs (FRL) Research.
  • [Jul '20] Presented my work on audiovisual self-supervised learning of speech representations at ICML 2020.

Publications

    I have worked on a variety of problems in multimodal machine learning (audio, video, text, EEG, eye tracking), with applications in representation learning (for tasks like audiovisual scene understanding and speech recognition), computer vision, and affect-sensitive HCI.

    For an updated and complete list of publications, see Google Scholar.

    Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition?
    Abhinav Shukla, Stavros Petridis, Maja Pantic
    IEEE Transactions on Affective Computing, 2021
    [pdf] [bib]
    Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision
    Abhinav Shukla, Stavros Petridis, Maja Pantic
    ICML Workshop - Self-Supervision in Audio and Speech, 2020
    [pdf] [bib]
    Visual Self-Supervision by Facial Reconstruction for Speech Representation Learning
    Abhinav Shukla, Stavros Petridis, Maja Pantic
    CVPR Workshop - Sight and Sound, 2020
    [pdf] [bib]
    Visually Guided Self Supervised Learning of Speech Representations
    Abhinav Shukla, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic
    International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020 (Oral)
    [pdf] [bib]
    Learning Self-Supervised Multimodal Representations of Human Behaviour
    Abhinav Shukla
    Doctoral Symposium at ACM International Conference on Multimedia (ACM MM), 2020
    [pdf] [bib]
    Recognition of Advertisement Emotions with Application to Computational Advertising
    Abhinav Shukla, Shruti Shriya Gullapuram, Harish Katti, Mohan Kankanhalli, Ramanathan Subramanian
    IEEE Transactions on Affective Computing, 2020
    [pdf] [bib]
    Looking Beyond a Clever Narrative: Visual Context and Attention are Primary Drivers of Affect in Video Advertisements
    Abhinav Shukla, Harish Katti, Mohan Kankanhalli, Ramanathan Subramanian
    ACM International Conference on Multimodal Interaction (ICMI), 2018 (Oral, 15.4% acceptance rate)
    [pdf] [bib]
    Evaluating Content-Centric vs. User-Centric Ad Affect Recognition
    Abhinav Shukla, Shruti Shriya Gullapuram, Harish Katti, Karthik Yadati, Mohan Kankanhalli, Ramanathan Subramanian
    ACM International Conference on Multimodal Interaction (ICMI), 2017
    [pdf] [bib]
    Affect Recognition in Ads with Application to Computational Advertising
    Abhinav Shukla, Shruti Shriya Gullapuram, Harish Katti, Karthik Yadati, Mohan Kankanhalli, Ramanathan Subramanian
    ACM International Conference on Multimedia (ACM MM), 2017 (Oral, 7.5% acceptance rate)
    [pdf] [bib]

Experience
    Meta
    Research Engineer  
    Redmond, WA   ·   Mar 2022 - present

    Working on efficient multimodal machine learning in the Audio team at Reality Labs Research.

    Internships
    Facebook Reality Labs (FRL) Research
    Research Intern   with   Anurag Kumar
    Redmond, WA (Remote)   ·   Sep 2020 - May 2021

    Worked on visually guided self-supervised learning of audio representations.

    Imperial College London
    Research Assistant   with   Prof. Maja Pantic
    London, United Kingdom   ·   Oct 2018 - Mar 2019

    Worked as a research assistant in the iBUG group funded by the EU Horizon 2020 DE-ENIGMA project. Assisted in collecting data of autistic children interacting with a social robot. Performed research for learning speech representations for emotion recognition.

    National University of Singapore
    Research Intern   with   Prof. Mohan Kankanhalli
    Singapore   ·   Sep 2017 - May 2018

    Worked on multimodal (audio, video, EEG, eye tracking) affect recognition from advertisement videos at the SeSaMe (Sensor Enhanced Social Media) Centre. Published in IEEE Transactions on Affective Computing and at ICMI 2018.

    Google Summer of Code
    Student Developer  
    Remote   ·   Summer 2016 and Summer 2017

    2017: Worked with Prof. Francis Steen from UCLA and Prof. Mark Turner from CWRU for the Red Hen Lab organization.
    2016: Developed a system to extract burned-in subtitles from videos into caption files for the CCExtractor organization. Supervised by Carlos Fernandez (org admin and CEO of Subtix Inc).

Education
    Imperial College London
    PhD in Computer Science  
    London, United Kingdom   ·   2018 - 2022
    Thesis title: "Learning Self-Supervised Representations of Audiovisual Human-Centric Data"
    IIIT Hyderabad
    BTech & MS by Research in Computer Science   ·   GPA: 8.78/10.00
    Hyderabad, India   ·   2013-2018
    Thesis title: "Multimodal Emotion Recognition from Advertisements with Application to Computational Advertising", advised by Prof. Ramanathan Subramanian at the Center for Visual Information Technology.

Awards and Recognition
  • Samsung PhD Fellowship, 2019-2020
  • IIIT Hyderabad Fast-track Masters thesis (for high quality papers in reputed venues), 2018
  • IIIT Hyderabad Research award (for publishing as an undergraduate), 2018
  • ACM SIGCHI Gary Marsden Student Development Fund (to attend ICMI 2018), 2018
  • Google India Travel Grant (to attend ACM MM 2017), 2017
  • ACM ICMI 2017 Travel Grant (to attend ICMI 2017), 2017
  • Dean’s Merit List Award for excellence in academics (6 consecutive semesters), 2014-2017

Academic Service
  • Journal Reviewing: IEEE Transactions on Affective Computing (TAFFC), IJCV, TPAMI
  • Conference Reviewing: CVPR, ICASSP, FG, ICMI, ACII
  • Volunteer: ACII 2019, ICMI 2018, ICMI 2017

Invited Talks
  • Learning Self-Supervised Multimodal Representations of Human Behavioural Data, Facebook Reality Labs Research, 2021
  • Learning Self-Supervised Multimodal Representations of Human Behavioural Data, Mitsubishi Electric Research Laboratories, 2021
  • Self-Supervised Representation Learning in Audiovisual Speech, University of Nottingham, 2019
  • Automatic Understanding of News Videos & CCExtractor, Universität Osnabrück, 2017

Personal
  • I enjoy kicking balls! I play soccer and have recently been working on becoming a better American Football kicker. My 2022 target is to kick a 50-yard field goal (at the time of writing, I can make one from around 42 yards).
  • I like keeping up with sports analytics (especially in soccer and the NFL). I have a lot of ideas but no time to work on them. If you would like to work on one of these (e.g. for a Big Data Bowl submission or a side-project on sports analytics), please get in touch!
  • If you are an IIIT Hyderabad student/alum looking for advice (e.g. on applying to PhD programs, life/work in the USA/Europe, or interviewing for research roles), please feel free to reach out to me on Messenger or Twitter.

Last updated: June 2022. Template by Jon Barron, with some sections from Noveen Sachdeva's website.