This paper introduces a new video understanding dataset that can be used for the related problems of event recognition, localisation and description in video. Our dataset consists of dense, well-structured event annotations in untrimmed video of tennis matches. We also include highly detailed commentary-style descriptions, which depend heavily on both the occurrence and the sequence of particular events. We use general deep learning techniques to obtain initial baseline results on our dataset, without the need for explicit domain-specific assumptions.
Dataset Available Soon