Research Engineer, Institute of High Performance Computing (IHPC)

Agency for Science, Technology and Research (A*STAR), Singapore


ZHANG Hao is a research engineer at the Institute of High Performance Computing (IHPC), A*STAR, Singapore. His research interests include natural language processing, visual grounding, reinforcement learning for robotics, and machine learning methods.
ZHANG Hao has also been pursuing a Ph.D. in computer science at Nanyang Technological University (NTU) since August 2019, supervised by Associate Prof. Aixin SUN (NTU) and co-supervised by Dr. Joey Tianyi ZHOU (A*STAR).

Email: hzhang26 AT outlook DOT com


  • Vision-and-Language
  • Natural Language Processing
  • Computational Linguistics
  • Artificial Intelligence
  • Machine & Deep Learning


  • Ph.D. in Computer Science, 2019-Present

    Nanyang Technological University (NTU)

  • M.Sc. in Communications Engineering, 2015-2016

    Nanyang Technological University (NTU)

  • B.Eng. in Communications Engineering, 2011-2015

    Dalian University of Technology (DUT)



Research Engineer

Institute of High Performance Computing (IHPC), A*STAR

Jul 2016 – Present, Singapore
  • Social & Cognitive Computing (SCC) Department (07/2016-05/2018)
  • Human-Centric AI Group at Artificial Intelligence Initiative (A*AI) (06/2018-07/2020)
  • Computational AI Group at Computing & Intelligence (CI) Department (08/2020-Present)



COSY: COunterfactual SYntax for Cross-Lingual Understanding

Pre-trained multilingual language models, e.g., multilingual-BERT, are widely used in cross-lingual tasks, yielding the …

Parallel Attention Network with Sequence Matching for Video Grounding

Given a video, video grounding aims to retrieve a temporal moment that semantically corresponds to a language query. In this work, we …

Video Corpus Moment Retrieval with Contrastive Learning

Given a collection of untrimmed and unsegmented videos, video corpus moment retrieval (VCMR) is to retrieve a temporal moment (i.e., a …

Interventional Video Grounding with Dual Contrastive Learning

Video grounding aims to localize a moment from an untrimmed video for a given textual query. Existing approaches focus more on the …

Natural Language Video Localization: A Revisit in Span-based Question Answering Framework

Natural Language Video Localization (NLVL) aims to locate a target moment from an untrimmed video that semantically corresponds to a …

GDPNet: Refining Latent Multi-View Graph for Relation Extraction

Relation Extraction (RE) is to predict the relation type of two entities that are mentioned in a piece of text, e.g., a sentence or a …

Deep N-ary Error Correcting Output Codes

Ensemble learning consistently improves the performance of multi-class classification through aggregating a series of base classifiers. …

Span-based Localizing Network for Natural Language Video Localization

Given an untrimmed video and a text query, natural language video localization (NLVL) is to locate a matching span from the video that …

Multi-source Meta Transfer for Low Resource Multiple-Choice Question Answering

Multiple-choice question answering (MCQA) is one of the most challenging tasks in machine reading comprehension since it requires more …

RoboCoDraw: Robotic Avatar Drawing with GAN-based Style Transfer and Time-efficient Path Optimization

Robotic drawing has become increasingly popular as an entertainment and interactive tool. In this paper we present RoboCoDraw, a …

Efficient Robotic Task Generalization Using Deep Model Fusion Reinforcement Learning

Learning-based methods have been used to program robotic tasks in recent years. However, extensive training is usually required not …

Dual Adversarial Transfer for Sequence Labeling

We propose a new architecture for addressing sequence labeling, termed Dual Adversarial Transfer Network (DATNet). Specifically, the …

Dual Adversarial Neural Transfer for Low-resource Named Entity Recognition

We propose a new neural transfer method termed Dual Adversarial Transfer Network (DATNet) for addressing low-resource Named Entity …

RoSeq: Robust Sequence Labeling

In this paper, we mainly investigate two issues for sequence labeling, namely label imbalance and noisy data which are commonly seen in …

Learning With Annotation of Various Degrees

In this paper, we study a new problem in the scenario of sequences labeling. To be exact, we consider that the training data are with …

Removing Backscatter to Enhance the Visibility of Underwater Object

Underwater vision enhancement via backscatter removing is widely used in ocean engineering. With increasing ocean exploration, …


Human-Robot Collaborative AI for Advanced Manufacturing and Engineering (AME)

Work Package 3 - Human-like Concept and Task Learning.

PrimeNet: Human-inspired Framework for Commonsense Knowledge Representation and Reasoning

A human-inspired framework for commonsense knowledge representation and reasoning.

MARACANA: Behavioural Understanding and Narrative Descriptions from Videos

An effective and flexible multimodal system to analyze and narrate real-world events captured in video.

Removing Backscatter to Enhance the Visibility of Underwater Object

A novel and effective method, based on an underwater optical model, for enhancing underwater object visibility.


Best Paper Award for the paper: Deep N-ary Error Correcting Output Codes

See certificate

1st Runner Up in Artificial Intelligence (AI) Hackathon

See certificate