Radio frequency (RF) devices such as Wi-Fi transceivers, radio frequency identification (RFID) tags, and millimeter-wave (mmWave) radars have become pervasive in daily life. The presence and movement of humans affect the propagation of RF signals, and this phenomenon can be exploited for human action recognition. Compared with camera-based solutions, RF approaches are more resilient to occlusion and lighting conditions, and they raise fewer privacy concerns in indoor scenarios. However, current work suffers from many limitations, including unavailable datasets, insufficient training samples, and simple or application-specific action categories. These issues seriously hinder the growth of RF solutions and present a significant obstacle to moving RF sensing research from the laboratory into everyday applications. To facilitate this transition, we introduce and release a large-scale multiple radio frequency dataset, named XRF55, for indoor human action analysis. XRF55 encompasses 42.9K RF samples covering 55 action classes of human-object interaction, human-human interaction, fitness, body motion, and human-computer interaction, collected from 39 subjects over 100 days. These actions were meticulously selected from 19 RF sensing papers and 16 video action recognition datasets, each chosen to support applications with high practical value, such as elderly fall detection, fatigue monitoring, and domestic violence detection. Moreover, XRF55 is collected with 23 RFID tags at 922.38MHz, 9 Wi-Fi links at 5.64GHz, one mmWave radar at 60-64GHz, and one Azure Kinect with RGB+D+IR sensors, covering frequencies across decimeter, centimeter, and millimeter waves. In addition, we apply a mutual learning strategy over XRF55 for the task of action recognition. Unlike simple modality fusion, under mutual learning the three RF modalities are trained collaboratively and then work independently at inference time.
We find that these three RF modalities promote each other. It is worth mentioning that, with the synchronized Kinect, XRF55 also supports the exploration of action detection, action segmentation, pose estimation, human parsing, mesh reconstruction, etc., with RF-only or RF-vision approaches. All data will be made publicly available soon.
Hardware modalities: Wi-Fi, RFID, mmWave radar, and Kinect.
• Wi-Fi transceivers: We use one ThinkPad X201 laptop with one Intel 5300 wireless network card as the Wi-Fi transmitter, and three identical laptop-and-card sets as the Wi-Fi receivers. The laptops are positioned at the four corners of a rectangle as shown in the first figure, inspired by the placement strategy outlined in Widar3.0. This arrangement creates a larger rectangular sensing area. Their height is set to 1.2 meters, based on observations from ARIL, which was found effective for recognizing full-body actions. The transmitter broadcasts packets at 200 packets per second through one antenna under High Throughput (IEEE 802.11n) bitrates at channel 128 (5.64GHz). Every receiver monitors this channel with three antennas, so there are 9 Wi-Fi links in total. We install a CSI extraction tool on the transceivers to conduct channel estimation and obtain the channel state information (CSI) of 30 orthogonal frequency-division multiplexing (OFDM) subcarriers, leading to recorded CSI of size (200𝑡) × 1 × 3 × 3 × 30, where 𝑡 is the recording time in seconds.
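As a minimal sketch (variable names are our own, not from the paper), a 5-second CSI recording fits exactly this tensor shape; a common preprocessing step is to take the amplitude and flatten the 3 receivers × 3 antennas into the 9 Wi-Fi links:

```python
import numpy as np

t = 5  # recording time in seconds

# (200*t) packets x 1 TX antenna x 3 receivers x 3 RX antennas x 30 subcarriers
csi = np.zeros((200 * t, 1, 3, 3, 30), dtype=np.complex64)

amp = np.abs(csi)                     # amplitude features, phase discarded
links = amp.reshape(200 * t, 9, 30)   # flatten to the 9 Wi-Fi links
```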
• RFID devices: We use an Impinj Speedway R420 RFID reader with an RFMax S9028PCLJ directional antenna to broadcast the QUERY command 30 times per second at the frequency of 922.38MHz. We deploy 23 passive Hansense-branded RFID tags to backscatter their Electronic Product Codes (EPC) to the reader. The reader can then obtain the phase of each tag from the backscattered EPC, leading to a phase series of size (30𝑡) × 23, where 𝑡 is the recording time in seconds.
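Because tag reads arrive asynchronously, assembling the (30𝑡) × 23 phase matrix requires binning the readings into the 30Hz grid. A simple sketch (the hold-previous-value fill is our illustrative assumption, not the paper's method):

```python
import numpy as np

def build_phase_series(readings, n_tags=23, fps=30, seconds=5):
    """Assemble (timestamp, tag_id, phase) reader reports into a
    (fps*seconds) x n_tags phase matrix; empty bins repeat the last
    observed phase per tag (a simple hold interpolation)."""
    n_rows = fps * seconds
    series = np.zeros((n_rows, n_tags), dtype=np.float32)
    last = np.zeros(n_tags, dtype=np.float32)
    for row in range(n_rows):
        t0, t1 = row / fps, (row + 1) / fps
        for ts, tag, phase in readings:
            if t0 <= ts < t1:
                last[tag] = phase
        series[row] = last
    return series
```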
• Millimeter-wave radar: We use a TI IWR6843ISK radar to generate frequency-modulated continuous wave (FMCW) signals at 60GHz-64GHz. The radar has three transmitting antennas and four receiving antennas. The transmitting parameters are set to 20 frames per second with 64 chirps per frame, and each chirp contains 256 ADC samples. Meanwhile, we mount the radar on a TI DCA1000EVM board to record raw radar data in real time, leading to raw radar data of size (20𝑡) × 3 × 4 × 256 × 64. We then apply Doppler Fast Fourier Transform (FFT) and Angle FFT over the raw data to obtain Range-Doppler heatmaps and Range-Angle heatmaps, each of size (20𝑡) × 256 × 64. After that, we concatenate these heatmaps along the range dimension into (20𝑡) × 256 × 128.
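The heatmap pipeline can be sketched with NumPy as follows. This is a simplified illustration of the shapes involved, not the paper's exact processing: windowing, calibration, and the TDM-MIMO virtual-array ordering for the angle FFT are glossed over.

```python
import numpy as np

frames, n_tx, n_rx, n_samples, n_chirps = 2, 3, 4, 256, 64
raw = np.random.randn(frames, n_tx, n_rx, n_samples, n_chirps).astype(np.complex64)

# Range FFT along the ADC-sample axis, then Doppler FFT along the chirp axis
range_fft = np.fft.fft(raw, axis=3)
rd = np.fft.fftshift(np.fft.fft(range_fft, axis=4), axes=4)
range_doppler = np.abs(rd).sum(axis=(1, 2))          # (frames, 256, 64)

# Angle FFT across the 12 virtual antennas (3 TX x 4 RX), zero-padded to 64 bins
va = range_fft.reshape(frames, n_tx * n_rx, n_samples, n_chirps)
ra = np.fft.fftshift(np.fft.fft(va, n=64, axis=1), axes=1)
range_angle = np.abs(ra).sum(axis=3).transpose(0, 2, 1)  # (frames, 256, 64)

# Concatenate along the last axis to get the (frames, 256, 128) input
heatmaps = np.concatenate([range_doppler, range_angle], axis=2)
```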
• Azure Kinect: The Kinect records RGB, depth, and infrared images at 30 frames per second in 720p.
We recruited 39 subjects and asked each of them to repeat each action 20 times in the sensing area. We collect XRF55 in four different scenes. In scene 1, we record RFID, Wi-Fi, mmWave radar, and Kinect clips from 30 subjects; in each of the other three scenes, we record action clips from 3 subjects. We set the action-conducting window to 5 seconds for subjects to finish each repeat. These settings result in XRF55 containing 42.9K samples lasting 59h35min in total.
Each sample is a quadruple comprised of Wi-Fi ∈ 1000 × 1 × 3 × 3 × 30, RFID ∈ 150 × 23, mmWave radar ∈ 100 × 256 × 128, and the corresponding synchronized videos from the Kinect. Further, we reshape or downsample the quadruple to the dimensions shown in the table on the right to reduce the training overhead.
We designate the first 14 trials of each action performed by each subject as training samples and reserve the last 6 trials for testing. This gives XRF55 30.0K training samples and 12.9K test samples.
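The protocol above is a per-(subject, action) split over the 20 repeats; a minimal sketch:

```python
def split_trials(trials):
    """Split the 20 repeats of one (subject, action) pair following the
    XRF55 protocol: first 14 trials -> train, last 6 trials -> test."""
    assert len(trials) == 20
    return trials[:14], trials[14:]
```

Applied to all 39 subjects × 55 actions × 20 repeats, this yields the 30.0K / 12.9K split (14/20 and 6/20 of 42.9K samples).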
Per-sample modality streams: WiFi-CSI-RX1, WiFi-CSI-RX2, WiFi-CSI-RX3, mmWave-RangeAngle, mmWave-RangeDoppler, RFID, RGB, Depth, IR.
In all, XRF55 includes 55 human indoor action classes, which we categorize into 5 types: Human-Object Interaction, Human-Human Interaction, Fitness, Body Motion, and Human-Computer Interaction.
• 15 Human-Object Interaction Actions. Whole-home daily: carrying weight, mopping the floor, using a phone, throwing something, picking something, putting something on the table; Kitchen: cutting something; Dressing: wearing a hat, putting on clothing; Bathroom: blowing dry hair, combing hair, brushing teeth; Healthcare: drinking, eating, and smoking.
• 7 Human-Human Interaction Actions. Social actions: shaking hands, hugging, handing something to someone; Violence actions for applications of domestic violence and invasion detection: kicking someone, hitting someone with something, choking someone’s neck, and pushing someone.
• 8 Fitness Actions. With equipment: hula hooping, weightlifting, jumping rope; Without equipment: body weight squats, Tai Chi, boxing, jumping jack, and high leg lifting.
• 14 Body Motion Actions. Whole-home daily: waving, clapping hands, jumping, walking, turning, running, sitting down, standing up; Healthcare: falling on the floor, stretching, patting on the shoulder; Musical instruments: playing Er-Hu, playing Ukulele, playing drum.
• 11 Human-Computer Interaction Actions. Hand gestures: pushing, pulling, swiping left, swiping right, swiping up, swiping down, drawing a circle, drawing a cross; When hands are not free: foot stamping, shaking head, and nodding.
Human-Object Interaction | Human-Human Interaction | Fitness | Body Motion | Human-Computer Interaction |
---|---|---|---|---|
carrying weight | shaking hands | hula hooping | waving | pushing |
mopping the floor | hugging | weightlifting | clapping hands | pulling |
using a phone | handing something to someone | jumping rope | jumping | swiping left |
throwing something | kicking someone | body weight squats | walking | swiping right |
picking something | hitting someone with something | Tai Chi | turning | swiping up |
putting something on the table | choking someone's neck | boxing | running | swiping down |
cutting something | pushing someone | jumping jack | sitting down | drawing a circle |
wearing a hat | | high leg lifting | standing up | drawing a cross |
putting on clothing | | | falling on the floor | foot stamping |
blowing dry hair | | | stretching | shaking head |
combing hair | | | patting on the shoulder | nodding |
brushing teeth | | | playing Er-Hu | |
drinking | | | playing Ukulele | |
eating | | | playing drum | |
smoking | | | | |
@article{wang2024xrf55,
title={XRF55: A Radio Frequency Dataset for Human Indoor Action Analysis},
author={Wang, Fei and Lv, Yizhe and Zhu, Mengdie and Ding, Han and Han, Jinsong},
journal={Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies},
number={1},
volume={8},
year={2024},
publisher={ACM New York, NY, USA}
}
This work was supported by the National Natural Science Foundation of China. We are grateful to the anonymous associate editors and reviewers for their invaluable comments. We are also grateful to Dr. Yunpeng Song and Dr. Ge Wang for fruitful discussions. We thank all volunteers for their participation.
The website template is taken from Custom Diffusion (which was built on DreamFusion's project page). The text editor used in the demo video is taken from Rich Text-to-Image.
The XRF55 dataset by Fei Wang, Yizhe Lv, Mengdie Zhu, Han Ding, and Jinsong Han is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.