Tennis

A tennis dataset and models for event detection and commentary generation, discussed in:

“TenniSet: A Dataset for Dense Fine-Grained Event Recognition, Localisation and Description”


The Dataset

The tennis dataset consists of 5 matches, with manually annotated temporal events and commentary captions.

Type | Attributes | # Events | # Frames | Avg. Frames per Event
--- | --- | --- | --- | ---
match | winner | 5 | 786,455 | 157,291
set | winner, score | 11 | 765,738 | 69,613
game | winner, score, server | 118 | 588,759 | 4,989
point | winner, score | 746 | 159,494 | 214
serve | near/far, in/fault/let | 1,017 | 68,385 | 67
hit | near/far, left/right | 2,551 | 73,564 | 29
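
The last column is consistent with the number of frames divided by the number of events, rounded to the nearest frame. A small sketch reproducing it from the other two columns, with the values transcribed by hand from the table (the helper name is illustrative only):

```python
# (event type, number of events, number of frames), transcribed from the table above.
EVENT_STATS = [
    ("match", 5, 786_455),
    ("set", 11, 765_738),
    ("game", 118, 588_759),
    ("point", 746, 159_494),
    ("serve", 1_017, 68_385),
    ("hit", 2_551, 73_564),
]

def avg_frames_per_event(stats):
    """Average event length in frames, rounded to the nearest whole frame."""
    return {name: round(frames / events) for name, events, frames in stats}

for name, avg in avg_frames_per_event(EVENT_STATS).items():
    print(f"{name:>5}: {avg:,}")
```

The same ratio shows how coarse the hierarchy is: a match spans roughly five thousand times as many frames as a single hit.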

Splits

Due to the limited size of the dataset, two different train/validation/test splits are provided. The first (01) uses the entirety of V010 for validation and testing, while the second (02) draws its training, validation and testing portions evenly from all videos.

[Figure: layout of splits 01 and 02 over videos V006, V007, V008, V009 and V010 (training, validation, testing)]
Class | S01 # Events (train / val / test) | S01 # Frames (train / val / test) | S02 # Events (train / val / test) | S02 # Frames (train / val / test)
--- | --- | --- | --- | ---
OTH | 2,507 / 133 / 198 | 573,394 / 28,538 / 49,648 | 2,079 / 160 / 608 | 470,963 / 36,932 / 143,685
SFI | 342 / 11 / 29 | 20,114 / 772 / 1,925 | 296 / 22 / 64 | 17,716 / 1,402 / 3,693
SFF | 117 / 2 / 5 | 7,962 / 153 / 333 | 95 / 7 / 22 | 6,430 / 577 / 1,441
SFL | 25 / 0 / 1 | 1,596 / 0 / 72 | 21 / 1 / 4 | 1,380 / 38 / 250
SNI | 293 / 24 / 29 | 17,186 / 1,762 / 1,994 | 242 / 18 / 86 | 14,876 / 992 / 5,074
SNF | 111 / 7 / 10 | 7,312 / 578 / 772 | 88 / 8 / 32 | 6,020 / 473 / 2,169
SNL | 10 / 2 / 0 | 656 / 126 / 0 | 9 / 1 / 2 | 543 / 65 / 174
HFL | 533 / 22 / 45 | 16,520 / 648 / 1,419 | 432 / 33 / 135 | 13,530 / 1,037 / 4,020
HFR | 576 / 39 / 41 | 16,858 / 1,096 / 1,150 | 474 / 37 / 145 | 13,878 / 1,037 / 4,189
HNL | 602 / 29 / 39 | 16,196 / 811 / 1,076 | 514 / 37 / 119 | 13,879 / 1,036 / 3,168
HNR | 546 / 31 / 48 | 15,605 / 882 / 1,303 | 448 / 33 / 144 | 12,686 / 920 / 4,184
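
Because S01 and S02 partition the same annotated videos, each class's total frame count (train + val + test) should be identical in both splits, and for every class in the table it is. A quick sketch of that consistency check, with counts transcribed by hand (the first far-serve row is labelled SFI, serve far in, matching the class list used in the results below). Only frames are compared, since the OTH event counts differ slightly between the two splits:

```python
# Per-class frame counts as (train, val, test), transcribed from the table above.
FRAMES_S01 = {
    "OTH": (573_394, 28_538, 49_648), "SFI": (20_114, 772, 1_925),
    "SFF": (7_962, 153, 333), "SFL": (1_596, 0, 72),
    "SNI": (17_186, 1_762, 1_994), "SNF": (7_312, 578, 772),
    "SNL": (656, 126, 0), "HFL": (16_520, 648, 1_419),
    "HFR": (16_858, 1_096, 1_150), "HNL": (16_196, 811, 1_076),
    "HNR": (15_605, 882, 1_303),
}
FRAMES_S02 = {
    "OTH": (470_963, 36_932, 143_685), "SFI": (17_716, 1_402, 3_693),
    "SFF": (6_430, 577, 1_441), "SFL": (1_380, 38, 250),
    "SNI": (14_876, 992, 5_074), "SNF": (6_020, 473, 2_169),
    "SNL": (543, 65, 174), "HFL": (13_530, 1_037, 4_020),
    "HFR": (13_878, 1_037, 4_189), "HNL": (13_879, 1_036, 3_168),
    "HNR": (12_686, 920, 4_184),
}

def mismatched_classes(split_a, split_b):
    """Classes whose train + val + test frame totals differ between two splits."""
    return [cls for cls in split_a if sum(split_a[cls]) != sum(split_b[cls])]

print(mismatched_classes(FRAMES_S01, FRAMES_S02))  # → []
```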

Captions

There is one commentary-style caption for each of the 746 points, plus a further 10,817 captions that are not aligned to any imagery. Some examples:

Point ID | Caption
--- | ---
P00000001 | high kick serve fp returns a ls return short rally fp cross-court rs lands out-side the court
P00000012 | quick serve is an ace
P00000036 | np serves down the t fp returns a ls return brief rally np fails to keep a cross-court ls in the play
P00000051 | np hits a good serve fp struggles with it returning it long
P00000155 | cannon serve down the t is an ace
P00000172 | sharp angled slice serve np returns a rs return fp whips a rs cross-court winner

Both groups of captions are used to learn a 100-dimensional word embedding for the 250 unique words in the vocabulary, using a skip-gram model. The embedding is visualised below after t-SNE projection.

[Figure: t-SNE visualisation of the 100-dimensional word embedding]
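
A skip-gram model learns embeddings by predicting the words around each centre word, so its training data is a set of (centre, context) pairs taken from a sliding window over each caption. The sketch below shows only that preparation step plus a vocabulary index. The window of 2 words is a hypothetical choice (the window actually used is not stated here), and training the 100-dimensional vectors themselves, for example with an existing word2vec implementation, is left out:

```python
from collections import Counter

def build_vocab(captions):
    """Map each unique word to an integer id, most frequent words first."""
    counts = Counter(word for caption in captions for word in caption.split())
    # Ties in frequency are broken alphabetically so the ids are deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: idx for idx, word in enumerate(ordered)}

def skipgram_pairs(caption, window=2):
    """Return (centre, context) pairs for every word within `window` positions."""
    words = caption.split()
    pairs = []
    for i, centre in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        pairs.extend((centre, words[j]) for j in range(lo, hi) if j != i)
    return pairs

captions = ["quick serve is an ace", "cannon serve down the t is an ace"]
vocab = build_vocab(captions)
print(len(vocab))  # → 9
print(skipgram_pairs(captions[0], window=1)[:3])
# → [('quick', 'serve'), ('serve', 'quick'), ('serve', 'is')]
```

Captions that are not aligned to any imagery still contribute pairs, which is why both groups can be pooled for the embedding.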

Download

The main data can be downloaded from my Google Drive. The directory structure should be:

Tennis/
└── data/
    ├── annotations (9.5 MB)
    ├── features (13.2 GB)
    ├── flow (217 GB)
    ├── frames (217 GB)
    ├── splits (36.3 MB)
    └── videos (11.1 GB)
  • annotations stores the .json files generated by the annotator for each video, as well as other annotation and commentary .txt files.
  • features stores per-frame .npy feature files, in a subdirectory for each video.
  • flow stores optical flow frames as .jpg images, in a subdirectory for each video.
  • frames stores RGB frames as .jpg images, in a subdirectory for each video.
  • splits stores .txt files for each split (train, val, test).
  • videos stores the original videos as .mp4 files.
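
A small sketch for checking a download against the layout above, assuming root points at the Tennis/ directory. The helper names are hypothetical, and per-video subdirectory names such as V006 are assumed from the video IDs used in the splits, not confirmed:

```python
from pathlib import Path

# Top-level directories listed in the tree above.
EXPECTED_DIRS = ("annotations", "features", "flow", "frames", "splits", "videos")

def missing_dirs(root):
    """Return the expected data/ subdirectories that do not exist under root."""
    data = Path(root) / "data"
    return [name for name in EXPECTED_DIRS if not (data / name).is_dir()]

def per_video_dirs(root, kind):
    """Sorted names of the per-video subdirectories in data/<kind>, e.g. kind='frames'."""
    base = Path(root) / "data" / kind
    return sorted(p.name for p in base.iterdir() if p.is_dir())
```

For example, missing_dirs("Tennis") returns an empty list once all six directories are in place.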

More information can be found on GitHub.


The Models

Models can be downloaded from my Google Drive. More information can be found on GitHub.

Event Detection

I experimented with a number of models to predict the event class of each frame:

  • Framewise CNN – A DenseNet-121 model run on individual frames (uses no temporal information)
  • Two-Stream Nets – Two DenseNet-121 CNNs, one for optical flow and one for RGB
  • R(2+1)D CNN – An R(2+1)D model with a temporal window of 8 frames
  • Temporal Pooling – Temporal max pooling applied to the framewise model over a window of 15 frames
  • CNN-RNN – A GRU RNN applied on top of the framewise model
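
The Temporal Pooling model is described only as max pooling over a 15-frame window. A minimal sketch of the idea, assuming the pooling is applied to per-frame class scores, with the window centred on each frame and clipped at the sequence ends (neither detail is stated above; the function names are illustrative):

```python
def temporal_max_pool(scores, window=15):
    """Max-pool per-frame class scores over a centred temporal window.

    scores: list of T rows, each a list of C class scores for one frame.
    Windows are clipped at the start and end of the sequence.
    """
    half = window // 2
    num_frames = len(scores)
    pooled = []
    for t in range(num_frames):
        lo, hi = max(0, t - half), min(num_frames, t + half + 1)
        # Element-wise max over the frames in the window, separately per class.
        pooled.append([max(col) for col in zip(*scores[lo:hi])])
    return pooled

def framewise_labels(scores):
    """Index of the highest-scoring class for each frame (first index on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]

# A one-frame spike in class 1 is spread to its neighbours by a 3-frame window.
scores = [[0.5, 0.0], [0.5, 0.0], [0.5, 1.0], [0.5, 0.0], [0.5, 0.0]]
print(framewise_labels(temporal_max_pool(scores, window=3)))  # → [0, 1, 1, 1, 0]
```

Pooling before the argmax lets a short, confident detection cover neighbouring frames, which suits brief events such as hits that last only around 29 frames on average.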

The table below shows per-class F1 scores on the test set for several of the models:

Model | ID | OTH | SFI | SFF | SFL | SNI | SNF | SNL | HFL | HFR | HNL | HNR | AVG
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Framewise CNN | 0006 | 97.0 | 57.9 | 17.7 | 13.0 | 62.9 | 21.6 | 0.0 | 74.8 | 76.3 | 77.5 | 78.0 | 52.4
Two-Stream Nets | 0010 | 97.2 | 67.4 | 14.6 | 13.4 | 67.0 | 19.4 | 0.0 | 81.8 | 83.5 | 79.0 | 86.2 | 55.4
R(2+1)D | 0031 | 90.8 | 24.4 | 6.4 | 1.7 | 37.4 | 3.9 | 0.0 | 39.6 | 44.9 | 43.7 | 41.8 | 30.4
Temporal Pooling | 0028 | 97.5 | 62.0 | 19.6 | 14.1 | 65.6 | 21.6 | 0.0 | 77.1 | 78.9 | 81.0 | 80.3 | 54.3
CNN-RNN | 0042 | 97.6 | 65.0 | 13.4 | 13.5 | 66.2 | 27.9 | 0.0 | 80.6 | 83.0 | 80.3 | 84.8 | 55.7
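
The AVG column is consistent with an unweighted (macro) mean of the eleven class scores: the Framewise CNN row sums to 576.7, and 576.7 / 11 ≈ 52.4. A sketch of framewise per-class F1 and its macro average, assuming ground-truth and predicted frame labels are given as equal-length lists of class codes (a hypothetical input format, not the repository's):

```python
CLASSES = ["OTH", "SFI", "SFF", "SFL", "SNI", "SNF", "SNL", "HFL", "HFR", "HNL", "HNR"]

def per_class_f1(y_true, y_pred, classes):
    """Framewise F1 = 2TP / (2TP + FP + FN) for each class, in percent."""
    scores = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class that is never predicted or present scores 0 by this convention.
        scores[c] = 100.0 * 2 * tp / denom if denom else 0.0
    return scores

def macro_average(scores):
    """Unweighted mean over classes, so rare classes count as much as OTH."""
    return sum(scores.values()) / len(scores)

# Framewise CNN row from the table above.
framewise_cnn = [97.0, 57.9, 17.7, 13.0, 62.9, 21.6, 0.0, 74.8, 76.3, 77.5, 78.0]
print(round(macro_average(dict(zip(CLASSES, framewise_cnn))), 1))  # → 52.4
```

Because the average is unweighted, the near-zero let classes (SFL, SNL) pull every model's AVG well below its OTH score.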

Captioning

The captioning model is that from Google.

BLEU@1 | BLEU@2 | BLEU@3 | BLEU@4 | METEOR | ROUGE-L | CIDEr
--- | --- | --- | --- | --- | --- | ---
46.7 | 30.7 | 22.1 | 16.4 | 22.6 | 43.9 | 96.4
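
BLEU@n combines clipped (modified) n-gram precisions for n = 1 to N through a geometric mean and multiplies by a brevity penalty for short candidates. To illustrate what the BLEU columns measure, here is a simplified sentence-level sketch against a single reference, without smoothing. It is not the evaluation code behind the numbers above; standard corpus-level BLEU aggregates the clipped counts over the whole test set before combining them:

```python
import math
from collections import Counter

def ngrams(words, n):
    """Counter of all contiguous n-grams in a list of words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU@max_n of one candidate caption against one reference."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by how often it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # Without smoothing, any zero precision zeroes the score.
        log_sum += math.log(matches / total) / max_n
    # Brevity penalty: only candidates shorter than the reference are penalised.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_sum)

# Two of the example captions (P00000012 and P00000155) compared as BLEU@2.
print(round(sentence_bleu("quick serve is an ace",
                          "cannon serve down the t is an ace", max_n=2), 3))  # → 0.347
```

Short captions such as "double fault" have no 3-grams, so this unsmoothed version gives them 0 at BLEU@3 and BLEU@4 even when they match exactly, one reason corpus-level aggregation is preferred.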

The table below shows some example captions generated on the test split; underlining marks errors.

Caption | | Ground Truths (G) and Predictions (P)
--- | --- | ---
01 | G | “high kick serve fp returns a ls return short rally fp cross-court rs lands out-side the court”
   | P | “fp serves a good one np delivers a rs return fp sends a ls out of the court”
02 | G | “good serve aimed at t np only reaches to it hitting the return long”
   | P | “fp arrows a good serve at t np is unable to return it”
03 | G | “good serve in the middle np returns a quick ls return short rally np cross-court fails to clear the net in the middle”
   | P | “good serve in the middle np crafts a ls return fp cross-court ls fails to land inside the court”
04 | G | “fp serves a high kick serve np returns a quick ls return brief rally np rs catches the net”
   | P | “fine serve placed out wide np returns a ls return short rally fp strokes a rs cross-court winner”
05 | G | “quick serve np crafts a rs return fp goes for a ls down the line but catches the net”
   | P | “fine serve np shoots a rs return winner”
06 | G | “double fault”
   | P | “double fault”
07 | G | “good serve np generates a rs return fp then returns one into the net”
   | P | “good serve in the middle np returns a ls return fp cross-court rs catches the net”
08 | G | “fp serves a high kick serve np delivers a high ls return fp produces a ls winner coming to net”
   | P | “fp serves a good one np returns a quick rs return fp struggles to keep a cross-court rs in a rally”
09 | G | “quick serve np returns a quick rs return fp ls is unable to clear the net”
   | P | “nice serve by fp np faces difficulty in returning it”
10 | G | “fp aims a high kick serve np returns a ls return fp hits a rs cross-court winner”
   | P | “fp serves a good one np returns a quick rs return fp struggles to keep a cross-court rs in a rally”
11 | G | “good serve in the middle np crafts a ls return short rally np hits a rs cross-court but it fails to clear the net”
   | P | “good serve in the middle np crafts a ls return fp cross-court ls fails to land inside the court”
12 | G | “fp hits a bodyline serve np has no answer to it”
   | P | “fp arrows a good serve at t np is unable to return it”
13 | G | “double fault”
   | P | “good serve aimed at t np only reaches to it”
14 | G | “double fault”
   | P | “double fault”
15 | G | “fp aims a high kick serve np crafts a ls return good rally fp sends a rs cross-court out of the court”
   | P | “good serve in the middle np crafts a ls return fp cross-court ls fails to land inside the court”
16 | G | “fp arrows a bodyline serve np struggles with it”
   | P | “fp arrows a good serve at t np is unable to return it”