XRF55: A Radio Frequency Dataset for Human Indoor Action Analysis

1 Xi'an Jiaotong University    2 Zhejiang University

ACM IMWUT/Ubicomp 2024


Abstract

Radio frequency (RF) devices such as Wi-Fi transceivers, radio frequency identification (RFID) tags, and millimeter-wave radars have become commonplace in daily life. The presence and movement of humans affect the propagation of RF signals, and this phenomenon can be exploited for human action recognition. Compared to camera-based solutions, RF approaches are more resilient to occlusion and lighting conditions, and they raise fewer privacy concerns in indoor scenarios. However, current work suffers from many limitations, including unavailable datasets, insufficient training samples, and simple or application-specific action categories, which seriously hinder the growth of RF solutions and present a significant obstacle to transitioning RF sensing research from the laboratory to a wide range of everyday applications. To facilitate this transition, in this paper we introduce and release a large-scale multi-radio-frequency dataset, named XRF55, for indoor human action analysis. XRF55 encompasses 42.9K RF samples across 55 action classes of human-object interaction, human-human interaction, fitness, body motion, and human-computer interaction, collected from 39 subjects within 100 days. These actions were meticulously selected from 19 RF sensing papers and 16 video action recognition datasets, and each was chosen to support applications with high practical value, such as elderly fall detection, fatigue monitoring, and domestic violence detection. Moreover, XRF55 is collected with 23 RFID tags at 922.38MHz, 9 Wi-Fi links at 5.64GHz, one mmWave radar at 60-64GHz, and one Azure Kinect with RGB+D+IR sensors, covering frequencies across decimeter, centimeter, and millimeter waves. In addition, we apply a mutual learning strategy over XRF55 for the task of action recognition. Unlike simple modality fusion, under mutual learning the three RF modalities are trained collaboratively and then work independently, and we find that the three RF modalities promote one another. It is worth mentioning that, with the synchronized Kinect, XRF55 also supports the exploration of action detection, action segmentation, pose estimation, human parsing, mesh reconstruction, etc., with RF-only or RF-vision approaches. All data will be made publicly available soon.


Hardware


  • Wi-Fi transceivers: We use one ThinkPad X201 laptop with an Intel 5300 wireless network card as the Wi-Fi transmitter, and three identical laptop-and-card sets as the Wi-Fi receivers. The laptops are positioned at the four corners of a rectangle, as shown in the first figure, inspired by the placement strategy outlined in Widar3.0; this arrangement creates a large rectangular sensing area. Their height is set to 1.2 meters, following observations from ARIL that this height is effective for recognizing full-body actions. The transmitter broadcasts packets at 200 packets per second through one antenna under High Throughput (IEEE 802.11n) bitrates on channel 128 (5.64GHz). Every receiver monitors this channel with three antennas, so there are 9 Wi-Fi links in total. We install a CSI extraction tool on the transceivers to perform channel estimation and obtain the channel state information (CSI) of 30 orthogonal frequency-division multiplexing (OFDM) subcarriers, leading to recorded CSI of size (200𝑡) × 1 × 3 × 3 × 30, where 𝑡 is the recording time in seconds.
  • RFID devices: We use an Impinj Speedway R420 RFID reader with an RFMax S9028PCLJ directional antenna to broadcast the QUERY command 30 times per second at 928.33MHz. We deploy 23 passive Hansense RFID tags that backscatter their Electronic Product Code (EPC) to the reader. The reader then obtains the phase of each tag from the backscattered EPC, leading to a phase series of size (30𝑡) × 23, where 𝑡 is the recording time in seconds.
  • Millimeter-wave radar: We use a TI IWR6843ISK radar to generate frequency-modulated continuous-wave (FMCW) signals at 60-64GHz. The radar has three transmitting antennas and four receiving antennas. It is configured to transmit 20 frames per second with 64 chirps per frame, and each chirp is sampled with 256 ADC samples. Meanwhile, we mount the radar on a TI DCA1000EVM board to record raw radar data in real time, yielding raw data of size (20𝑡) × 3 × 4 × 256 × 64. We then apply a Doppler Fast Fourier Transform (FFT) and an Angle FFT over the raw data to obtain Range-Doppler and Range-Angle heatmaps, each of size (20𝑡) × 256 × 64, and concatenate the two heatmaps along the last dimension into (20𝑡) × 256 × 128 (a minimal sketch of this pipeline follows the list below).
  • Azure Kinect: The Kinect records RGB, depth, and infrared images at 30 frames per second in 720P.
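
To make the heatmap generation concrete, here is a minimal numpy sketch of the Doppler-FFT and Angle-FFT pipeline described in the radar bullet above. It assumes the raw radar cube has already been loaded as a complex array of shape (frames, 3, 4, 256, 64); windowing, calibration, and the exact antenna ordering used for XRF55 are omitted, so treat it as illustrative rather than as the released preprocessing code.

import numpy as np

def radar_heatmaps(raw, angle_bins=64):
    # raw: complex cube of shape (frames, 3 TX, 4 RX, 256 ADC samples, 64 chirps)
    frames, n_tx, n_rx, n_samples, n_chirps = raw.shape

    # Range FFT over the ADC samples -> 256 range bins.
    range_fft = np.fft.fft(raw, axis=3)

    # Doppler FFT over the chirps, magnitude averaged over the 12 virtual
    # antennas -> Range-Doppler heatmap of shape (frames, 256, 64).
    doppler = np.fft.fftshift(np.fft.fft(range_fft, axis=4), axes=4)
    range_doppler = np.abs(doppler).mean(axis=(1, 2))

    # Angle FFT over the 3 x 4 = 12 virtual antennas, zero-padded to 64 angle
    # bins, magnitude averaged over chirps -> Range-Angle heatmap (frames, 256, 64).
    virtual = range_fft.reshape(frames, n_tx * n_rx, n_samples, n_chirps)
    angle = np.fft.fftshift(np.fft.fft(virtual, n=angle_bins, axis=1), axes=1)
    range_angle = np.abs(angle).mean(axis=3).transpose(0, 2, 1)

    # Concatenate along the last dimension -> (frames, 256, 128).
    return np.concatenate([range_doppler, range_angle], axis=2)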


Dataset statistics

We recruited 39 subjects and asked them to repeat each action 20 times in the sensing area. We collected XRF55 in four different scenes. In scene 1, we recorded RFID, Wi-Fi, mmWave radar, and Kinect clips from 30 subjects; in each of the other three scenes, we recorded clips from 3 subjects. We set the action window to 5 seconds for subjects to complete each repetition. These settings result in XRF55 containing 42.9K samples lasting 59h35min in total.


Each sample is a quadruple comprising Wi-Fi ∈ 1000 × 1 × 3 × 3 × 30, RFID ∈ 150 × 23, mmWave radar ∈ 100 × 256 × 128, and the corresponding synchronized videos from the Kinect. We further reshape or downsample the quadruple to reduce training overhead; the resulting dimensions are listed in the dimension table of the paper.


We designate the first 14 trials of each action performed by each subject as training samples and reserve the last 6 trials for testing. This yields 30.0K training samples and 12.9K test samples.
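
Below is a minimal Python sketch of this trial-based split. The flat directory and the subject_action_trial file naming are hypothetical, introduced only for illustration; adapt the pattern to the released dataset layout.

from pathlib import Path

def split_trials(root="XRF55", n_train=14):
    # First 14 trials of each subject/action pair go to training,
    # the remaining 6 to testing.
    train, test = [], []
    for f in sorted(Path(root).glob("*.npy")):
        subject, action, trial = f.stem.split("_")  # hypothetical "subject_action_trial" naming
        (train if int(trial) <= n_train else test).append(f)
    return train, test

train_files, test_files = split_trials()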


Multimodal visualization samples

Each example shows nine synchronized panels: WiFi-CSI-RX1, WiFi-CSI-RX2, WiFi-CSI-RX3, mmWave-RangeAngle, mmWave-RangeDoppler, RFID, RGB, Depth, and IR.

  • Human-Object Interaction Actions: carrying weight
  • Human-Human Interaction Actions: shaking hands
  • Fitness Actions: boxing
  • Body Motion Actions: turning
  • Human-Computer Interaction Actions: pushing



XRF55 human indoor action classes

In all, XRF55 includes 55 human indoor action classes, which we categorize into 5 types: Human-Object Interaction, Human-Human Interaction, Fitness, Body Motion, and Human-Computer Interaction.
  • 15 Human-Object Interaction Actions. Whole-home daily: carrying weight, mopping the floor, using a phone, throwing something, picking something, putting something on the table; Kitchen: cutting something; Dressing: wearing a hat, putting on clothing; Bathroom: blowing dry hair, combing hair, brushing teeth; Healthcare: drinking, eating, and smoking.
  • 7 Human-Human Interaction Actions. Social actions: shaking hands, hugging, handing something to someone; Violence actions for applications of domestic violence and invasion detection: kicking someone, hitting someone with something, choking someone’s neck, and pushing someone.
  • 8 Fitness Actions. With equipment: hula hooping, weightlifting, jumping rope; Without equipment: body weight squats, Tai Chi, boxing, jumping jack, and high leg lifting.
  • 14 Body Motion Actions. Whole-home daily: waving, clapping hands, jumping, walking, turning, running, sitting down, standing up; Healthcare: falling on the floor, stretching, patting on the shoulder; Musical instruments: playing Er-Hu, playing Ukulele, playing drum.
  • 11 Human-Computer Interaction Actions. Hand gestures: pushing, pulling, swiping left, swiping right, swiping up, swiping down, drawing a circle, drawing a cross; When hands are not free: foot stamping, shaking head, and nodding.


(Demonstration clips are shown for each of the 55 action classes.)





Download our dataset

Due to Kaggle upload size limitations, we split the dataset into two parts and release the raw WiFi & RFID recordings as a third package:

  • XRF55 dataset part1 includes action samples of 3 individuals in scenes 2, 3, and 4, and 11 individuals in scene 1.

  • XRF55 dataset part2 includes action samples of 19 individuals in scene 1.

  • XRF55 WiFi&RFID rawdataset includes raw action samples for all scenes.

Our Kinect data is currently being uploaded and is expected to be available on 2024/5/20.


Citation

@article{wang2024xrf55,
  title={XRF55: A Radio Frequency Dataset for Human Indoor Action Analysis},
  author={Wang, Fei and Lv, Yizhe and Zhu, Mengdie and Ding, Han and Han, Jinsong},
  journal={Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies},
  number={1},
  volume={8},
  year={2024},
  publisher={ACM New York, NY, USA}
}

Some of our previous work


Acknowledgements

This work was supported by the National Natural Science Foundation of China. We are grateful to the anonymous associate editors and reviewers for their invaluable comments. We are also grateful to Dr. Yunpeng Song and Dr. Ge Wang for fruitful discussions, and we thank all volunteers for their participation.
The website template is taken from Custom Diffusion (which was built on DreamFusion's project page). The text editor shown in the demo video is taken from Rich Text-to-Image.