Hang-Time HAR: A Benchmark Dataset for Basketball Activity Recognition using Wrist-worn Inertial Sensors

Alexander Hoelzemann¹ Julia Lee Romero² Marius Bock¹ Kristof Van Laerhoven¹ Qin Lv²
University of Siegen¹ and University of Colorado Boulder²

Abstract

We present a benchmark dataset for evaluating physical human activity recognition methods from wrist-worn sensors, for the specific setting of basketball training, drills, and games. Basketball activities lend themselves well for measurement by wrist-worn inertial sensors, and systems that are able to detect such sport-relevant activities could be used in applications toward game analysis, guided training, and personal physical activity tracking. The dataset was recorded for two teams from separate countries (USA and Germany) with a total of 24 players who wore an inertial sensor on their wrist, during both repetitive basketball training sessions and full games. Particular features of this dataset include an inherent variance through cultural differences in game rules and styles as the data was recorded in two countries, as well as different sport skill levels, since the participants were heterogeneous in terms of prior basketball experience. We illustrate the dataset's features in several time-series analyses and report on a baseline classification performance study with two state-of-the-art deep learning architectures.


Results at a glance

The Dataset

The study protocol is divided into two parts. The first part collects controlled data by having participants complete a sequence of predefined activities, each for a defined period of time. Although controlled, this part also simulates real-world basketball drills in practice sessions, where players repeatedly practice a certain activity (e.g., layups, shooting, dribbling, running). The second part is a basketball game between two teams, each with five players on the court and extra players rotated into the game. Video cameras were set up along the sidelines of the court to record each participant’s activities for the labeling process.

Study design

Our study included 24 subjects, 13 living in Germany and 11 living in the United States of America. At each location, the players performed the drills and the game simultaneously while the entire basketball court was monitored using two wide-angle cameras. After the study, the camera footage was used for detailed annotation of all activity-relevant data.


Hang-Time HAR study protocol as executed at both locations. The German recording is ~110 min and the American recording ~76 min long.

Participants Meta Information


Meta information as given by all participants through the study questionnaire: 13 from Germany, Europe (eu) and 11 from the USA, North America (na). A total of 3 participants were female and 21 were male. The players were between 18 and 39 years old. In a self-assessment of their basketball experience, 8 players responded with novice and 16 with expert. Two people were left-handed. Additional details about the anthropometry of our participants are excluded due to restrictions given by the Ethical Council of our university.

Dataset Characteristics

  • Number of Participants: 24
  • Sampling Rate and Measurement Range: 50 Hz, ±8 g
  • Sensors: Wrist-worn 3D Accelerometer (Bangle.js 1 Smart Watch)
  • Challenges: Imbalanced dataset that contains periodic, spontaneous and complex classes from a controlled and uncontrolled recording environment.

Preprocessing: We decided to keep the preprocessing of the raw smartwatch data to a minimum, as the samples were already timestamped and expressed in units of g. The timestamps of the smartwatch’s accelerometer samples contained slight (<2%) deviations, so we resampled the time series to ensure that all data have exactly equidistant timestamps at 50 Hz. Other common methods of preprocessing inertial data for activity recognition, such as rescaling or normalization, were not applied.
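The resampling step can be illustrated with a minimal sketch. It assumes linear interpolation onto a fixed 50 Hz grid; the interpolation method actually used for the dataset is not stated here, and the function name and the toy signal are purely illustrative.

```python
import numpy as np

def resample_equidistant(t, xyz, rate_hz=50.0):
    """Linearly interpolate irregularly timed samples onto an exact rate_hz grid.

    t   : 1-D array of timestamps in seconds (strictly increasing)
    xyz : array of shape (len(t), 3) with acceleration in g
    """
    t = np.asarray(t, dtype=float)
    xyz = np.asarray(xyz, dtype=float)
    # Number of grid points that fit into the recorded time span.
    n = int(np.floor((t[-1] - t[0]) * rate_hz)) + 1
    grid = t[0] + np.arange(n) / rate_hz
    # Interpolate each axis separately (assumption: linear interpolation).
    out = np.column_stack(
        [np.interp(grid, t, xyz[:, k]) for k in range(xyz.shape[1])]
    )
    return grid, out

# Toy input: nominal 50 Hz with up to 2% timing jitter (synthetic, not dataset values).
rng = np.random.default_rng(0)
dt = 0.02 * (1 + rng.uniform(-0.02, 0.02, size=499))
t_raw = np.concatenate([[0.0], np.cumsum(dt)])
xyz_raw = np.column_stack([np.sin(t_raw), np.cos(t_raw), np.ones_like(t_raw)])

t_new, xyz_new = resample_equidistant(t_raw, xyz_raw)
```

After this step, consecutive timestamps in t_new are exactly 20 ms apart, while the signal values remain in g and are neither rescaled nor normalized.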

Usage

The dataset is saved in CSV format, with one file per player. Each file can be loaded with the read_csv function from the Pandas library, which is commonly used for data manipulation and analysis in Python. The labels are stored in four columns: coarse, locomotion, basketball, and in/out. The coarse column separates the samples into different sessions, including warm-up, drills (sitting, standing, walking, running, dribbling, penalty shots, two-point shots, and three-point shots), game, and in/out. The "game" label indicates when a game was played. The German study comprises two game sessions, each lasting approximately 10 minutes, while the study conducted in the USA consists of one session lasting approximately 22 minutes. The basketball and locomotion columns contain labels corresponding to the classes listed in the table below, as well as the "not_labeled" label, which is assigned when the specific activity of a player could not be observed in the ground-truth video or when the sample falls between sessions. The in/out column is only relevant during the game session and indicates whether a player is on the court or not.
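As a rough illustration of the loading step, the sketch below reads a small synthetic stand-in for one player's file. The four label column names come from the description above; the sensor column names (timestamp, x, y, z) and the exact label strings are assumptions and may differ in the released files.

```python
import io
import pandas as pd

# Synthetic stand-in for one player's CSV file. Sensor column names and
# label strings are assumptions made for this example only.
csv_text = """timestamp,x,y,z,coarse,locomotion,basketball,in/out
0.00,0.01,-0.98,0.12,warm-up,walking,not_labeled,not_labeled
0.02,0.03,-0.97,0.10,warm-up,walking,dribbling,not_labeled
0.04,0.40,-0.70,0.35,game,running,dribbling,in
0.06,0.38,-0.72,0.31,game,running,not_labeled,in
"""

# For a real file, pass its path instead of the StringIO object.
df = pd.read_csv(io.StringIO(csv_text))

label_cols = ["coarse", "locomotion", "basketball", "in/out"]

# Select the game session via the coarse column.
game = df[df["coarse"] == "game"]

# Drop samples without a basketball label.
labeled = df[df["basketball"] != "not_labeled"]

# Number of distinct labels per label column.
print(df[label_cols].nunique())
```

Filtering on the coarse column first and on the activity columns second keeps session selection and class selection independent of each other.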

Combining Classes: The layers provided in our dataset make it possible to extend it with additional and more challenging classes. For example, shots can be split into penalty_shots, two_point_shots, and three_point_shots by taking the coarse layer into account. The locomotion layer indicates whether dribbling was performed while the player was standing, walking, or running. Therefore, the class definitions in the following table only contain the basic classes and can be extended individually, depending on the requirements of one’s project.
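One way to derive such extended classes is sketched below. The helper fine_grained_label is hypothetical, and the label strings (for example "shot" in the basketball layer and "penalty_shots" in the coarse layer) are assumptions based on the names used in this section.

```python
# Coarse-layer sessions that identify the type of shot (assumed label strings).
SHOT_SESSIONS = {"penalty_shots", "two_point_shots", "three_point_shots"}

# Locomotion labels that can qualify a dribbling sample (assumed label strings).
DRIBBLE_LOCOMOTION = {"standing", "walking", "running"}

def fine_grained_label(coarse, locomotion, basketball):
    """Combine the three layers into one extended class label (hypothetical helper)."""
    # A shot recorded during a dedicated shooting drill inherits the drill name.
    if basketball == "shot" and coarse in SHOT_SESSIONS:
        return coarse
    # Dribbling is qualified by the locomotion performed at the same time.
    if basketball == "dribbling" and locomotion in DRIBBLE_LOCOMOTION:
        return f"dribbling_{locomotion}"
    # Everything else keeps its basic basketball label.
    return basketball

examples = [
    ("penalty_shots", "standing", "shot"),
    ("game", "running", "dribbling"),
    ("game", "running", "shot"),
]
extended = [fine_grained_label(*row) for row in examples]
```

Shots taken during the game keep the basic label shot in this sketch, because the coarse layer alone does not reveal the shot type there.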

Classes


Detailed class description for every class included in the dataset. The dataset is multi-tier labeled with 4 different layers: (I) coarse, (II) locomotion, (III) basketball, and (IV) in/out. The coarse layer is not listed, since it only indicates to which session an activity belongs. The relevant classes are classes 2–13. However, the classes in and out were not used in our validation.

Class samples

Exemplar time-series data for the included activities. The examples shown for the periodic activities sitting, standing, walking, running, and dribbling contain 1200 samples (approx. 24 s). The complex activities shot and layup, as well as the micro-activities pass and rebound, are shown over shorter windows so that they are better represented. Jumps are marked in classes where the activity occurs; such short periods are summarized in the activity jumping.

Class distribution

Class distribution of the Hang-Time HAR dataset. The total numbers of samples per class are: sitting: 383,622 (~2.1 h), standing: 368,189 (~2.0 h), walking: 1,885,644 (~10.5 h), running: 1,100,942 (~6.1 h), jumping: 96,857 (~0.53 h), dribbling: 878,514 (~4.8 h), shot: 149,040 (~0.82 h), layup: 62,393 (~0.34 h), pass: 86,291 (~0.47 h), and rebound: 18,886 (~0.10 h). In total: 5,030,378 labeled samples or ~27.7 h of data.
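The durations follow from the sample counts and the 50 Hz sampling rate. The short sketch below reproduces this arithmetic and the class shares; the variable names are illustrative.

```python
SAMPLING_RATE_HZ = 50  # sampling rate stated in the dataset characteristics

# Labeled samples per class, as listed in the class distribution above.
samples = {
    "sitting": 383_622,
    "standing": 368_189,
    "walking": 1_885_644,
    "running": 1_100_942,
    "jumping": 96_857,
    "dribbling": 878_514,
    "shot": 149_040,
    "layup": 62_393,
    "pass": 86_291,
    "rebound": 18_886,
}

total_samples = sum(samples.values())

# samples / (samples per second) / (seconds per hour)
hours = {name: n / SAMPLING_RATE_HZ / 3600 for name, n in samples.items()}

# Share of each class in the labeled data, useful for judging the imbalance.
share = {name: n / total_samples for name, n in samples.items()}

for name in samples:
    print(f"{name:10s} {hours[name]:6.2f} h  {100 * share[name]:5.1f} %")
```

Because the per-class durations quoted above are rounded, their sum can differ slightly from a duration computed directly from the total sample count.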

Void Class: We originally included a void class for miscellaneous movements outside of the primary labeled ones, such as drinking from a water bottle or tying shoes. These were mostly performed during rest breaks. The samples annotated as void formed a negligibly small class that our classifier could not recognize, because these movements are most often performed in conjunction with one of the locomotion classes. We ultimately decided against including this void class, since players were very rarely not performing one of the 10 locomotion or basketball activities. However, data that are not annotated as one of the aforementioned classes are categorized as not_labeled. This class can be seen as a very noisy but realistic void class that can be used by researchers who focus on deeper insights into the NULL-class problem.

Results

In our experiments, we investigate how well our network generalizes in two regards:
  • Subject-independent generalization: As with almost any activity, basketball players tend to have their own specific traits in performing each basketball-related activity. Within these test cases, we investigate how well our network generalizes across subjects by performing a LOSO cross-validation on the drill and warm-up data of all subjects. During each validation step, the activities of a previously unseen subject are predicted, and thus the experiments will determine how well our network generalizes across subjects and whether subject-independent patterns can be learned by our architecture.
  • Session-independent generalization: Data recorded during an actual basketball game can differ heavily from "artificial" data recorded during the drill and warm-up sessions, as subjects did not have to adhere to any (experimental) protocol. Thus, the session-independent test cases investigate how well our network predicts the same activities performed by already-seen subjects during an actual game. Within these experiments, we train our network using data recorded by all subjects during the drill and warm-up sessions and try to predict the game data of the same subjects. This type of experiment gives a sense of how well our network generalizes to real-world data and simulates the transition from a controlled to an uncontrolled environment. The network learns player-specific patterns from the warm-up and drill sessions and tries to classify the more dynamic game subset.
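The two evaluation schemes above can be sketched as simple data splits. This is a minimal illustration, not the training pipeline used for the reported results; the helper names, the record layout, and the session strings are assumptions.

```python
def loso_folds(subject_ids):
    """Yield (train_subjects, held_out_subject) pairs, one fold per subject."""
    unique = sorted(set(subject_ids))
    for held_out in unique:
        yield [s for s in unique if s != held_out], held_out

def session_split(records, train_sessions=("warm-up", "drills"), test_session="game"):
    """Train on controlled sessions of all subjects, test on their game data."""
    train = [r for r in records if r["session"] in train_sessions]
    test = [r for r in records if r["session"] == test_session]
    return train, test

# Hypothetical metadata, one entry per recorded segment.
records = [
    {"subject": "p01", "session": "warm-up"},
    {"subject": "p01", "session": "drills"},
    {"subject": "p01", "session": "game"},
    {"subject": "p02", "session": "drills"},
    {"subject": "p02", "session": "game"},
    {"subject": "p03", "session": "warm-up"},
]

# Subject-independent: LOSO over the drill and warm-up data only.
controlled = [r for r in records if r["session"] != "game"]
folds = list(loso_folds(r["subject"] for r in controlled))

# Session-independent: same subjects in training and test, different sessions.
train, test = session_split(records)
```

In the LOSO folds the held-out subject never appears in the training set, whereas in the session split every test subject was already seen during training.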
Deep Learning

Overall results of the deep learning experiments using a shallow DeepConvLSTM (blue) and the Attend-and-Discriminate architecture (orange). Both models were trained with a one-layer recurrent part with 1024 hidden units and a sliding window of 1 second with 50% overlap. The left plot (a) shows the per-class LOSO results obtained from training on the drill and warm-up data. The right plot (b) shows the per-class results for predicting the game data when trained on the drill and warm-up data. All results are averages across 3 runs using a set of 3 random seeds. Both architectures suffer a significant loss in predictive performance when applied to in-game data, i.e., data recorded in an uncontrolled environment.
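At 50 Hz, a sliding window of 1 second with 50% overlap corresponds to 50 samples per window and a step of 25 samples. The following sketch computes the window boundaries; the function name and its defaults are illustrative and not taken from the original code base.

```python
def sliding_windows(n_samples, rate_hz=50, window_s=1.0, overlap=0.5):
    """Return (start, end) index pairs of fixed-length, overlapping windows."""
    window = int(round(window_s * rate_hz))              # 50 samples at 50 Hz
    step = max(1, int(round(window * (1.0 - overlap))))  # 25 samples at 50% overlap
    # Only full windows are kept; a trailing partial window is dropped.
    return [(start, start + window)
            for start in range(0, n_samples - window + 1, step)]

# Four seconds of data (200 samples) yield seven windows.
bounds = sliding_windows(200)
```

Each (start, end) pair can be used to slice the three accelerometer axes, so that one window has the shape (50, 3) before it is passed to a network.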

We refer to the paper for a comprehensive dataset description and more detailed results.

All important links can be found at the top of the page.

Acknowledgments

We would like to thank the basketball players of TuS Fellinghausen from Kreuztal, Germany, and the students of the University of Colorado Boulder for participating in our study.

BibTeX

@article{hoelzemannHangtimeHARBenchmark2023,
	title = {Hang-time HAR: A benchmark dataset for basketball activity recognition using wrist-worn inertial sensors},
	volume = {23},
	url = {https://doi.org/10.3390/s23135879},
	number = {13},
	journal = {Sensors},
	author = {Hoelzemann, Alexander and Romero, Julia L. and Bock, Marius and Van Laerhoven, Kristof and Lv, Qin},
	year = {2023},
}