Unmanned Aerial Vehicle (UAV) Intrusion Detection Datasets

for prediction-runtime-aware classification



The consumer unmanned aerial vehicle (UAV) market has grown significantly over the past few years. Although consumer UAVs have great potential to spur economic growth by supporting various applications, their proliferation poses risks to public security and personal privacy. To minimize these risks, invading UAVs must be detected and identified efficiently. Because consumer UAVs are usually flown in civilian environments, existing physical detection methods (such as radar, vision, and sound) may become ineffective in many scenarios. To avoid these issues, encrypted WiFi traffic records of the UAVs are a promising data source for detecting UAV intruders.

For UAV intruder detection, each input is an encrypted WiFi traffic record that we monitor, and the output is whether the current traffic comes from a UAV or not. In addition to accuracy, the prediction runtime of the detector is also highly important: UAV intruders can fly dozens of meters per second, so the prediction must be obtained as fast as possible. We therefore need to jointly optimize the accuracy and the runtime of the prediction. This makes pruning and simplifying the model important, for example by pruning redundant feature generation and merging shared computations.


1. UAVs we used in these datasets:

2. Features extracted from WiFi traffic:

For the WiFi traffic records of each type of UAV, we consider two modes: 1) Bidirectional-flow mode, in which i) the uplink flow, ii) the downlink flow, and iii) the total traffic flow are considered. 2) Unidirectional-flow mode, in which only the total traffic flow is considered. For each type of flow, the packet sizes and packet inter-arrival times are the raw data sources. For each source, the 9 statistical measures in Table 1 are extracted as features:


Therefore, the Bidirectional-flow mode involves 9 features × 2 sources × 3 flows = 54 features, while the Unidirectional-flow mode has 9 features × 2 sources = 18 features.
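The feature counts above can be checked by enumerating every (flow, source, statistic) combination. The sketch below is only an illustration: the names `stat_1` to `stat_9` are placeholders for the 9 statistical measures, and the ordering produced here is an assumption that is not guaranteed to match the column order in the dataset files.

```python
from itertools import product

# Placeholder names (assumption): stand-ins for the 9 statistical measures.
STATS = [f"stat_{i}" for i in range(1, 10)]
# The two raw data sources described above.
SOURCES = ["packet_size", "inter_arrival_time"]
# Flows considered in each traffic mode.
BIDIRECTIONAL_FLOWS = ["uplink", "downlink", "total"]
UNIDIRECTIONAL_FLOWS = ["total"]


def feature_names(flows):
    """Return one name per (flow, source, statistic) combination."""
    return [
        f"{flow}.{source}.{stat}"
        for flow, source, stat in product(flows, SOURCES, STATS)
    ]


bidirectional = feature_names(BIDIRECTIONAL_FLOWS)
unidirectional = feature_names(UNIDIRECTIONAL_FLOWS)
print(len(bidirectional), len(unidirectional))  # 54 18
```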

3. Datasets:

The datasets are combinations of different UAV types and traffic modes.

4. Feature computational dependencies:

Some features share computation. For example, the generation runtime of the feature "standard deviation" consists of the feature component "mean" and the remaining computation that uses the computed "mean". The feature dependencies for the 9 features listed above are shown as follows:

Here each node is a feature and each hyperedge is a feature component shared among different features.
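To illustrate why shared components matter for runtime, here is a minimal sketch using only the "mean" and "standard deviation" example above. The component names and their costs are hypothetical values chosen for illustration, not measurements from the datasets, and the cost functions are a simplified reading of the idea rather than the exact model used in the paper.

```python
# Each hyperedge is a feature component; it connects every feature that needs it.
HYPEREDGES = {
    "mean_component": {"mean", "std"},  # shared: std reuses the computed mean
    "std_remainder": {"std"},           # extra work specific to std
}

# Hypothetical runtime per component, in arbitrary units (not measured values).
COMPONENT_COST = {"mean_component": 1.0, "std_remainder": 0.5}


def naive_cost(selected):
    """Cost when every feature recomputes all of its components from scratch."""
    return sum(
        COMPONENT_COST[component]
        for feature in selected
        for component, members in HYPEREDGES.items()
        if feature in members
    )


def shared_cost(selected):
    """Cost when each component is computed once and reused by all features."""
    chosen = set(selected)
    return sum(
        cost
        for component, cost in COMPONENT_COST.items()
        if HYPEREDGES[component] & chosen
    )


print(naive_cost(["mean", "std"]))   # 2.5
print(shared_cost(["mean", "std"]))  # 1.5
```

In this toy setting, selecting both features costs 1.5 units rather than 2.5 once the "mean" component is computed a single time.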


To use these datasets, please cite the paper:

Liang Zhao, Amir Alipour-Fanid, Martin Slawski, and Kai Zeng. Prediction-time Efficient Classification Using Feature Computational Dependencies. In Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2018), research track (acceptance rate: 18.4%), London, United Kingdom, Aug 2018, pages 2787-2796.

author = {Zhao, Liang and Alipour-Fanid, Amir and Slawski, Martin and Zeng, Kai},
title = {Prediction-time Efficient Classification Using Feature Computational Dependencies},
booktitle = {Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
year = {2018},
location = {London, United Kingdom},
pages = {2787--2796},
doi = {10.1145/3219819.3220117},
publisher = {ACM},


Datasets for Dataset1, Dataset2, Dataset3, Dataset4, Dataset5, and Dataset6.

Format for each dataset:

Each dataset is stored as a .mat file.
Each dataset consists of the following components:
data_tr: n×(k+1). Training set. n is the number of training samples and k is the number of features. The last column is the label: 1 means UAV; 0 otherwise.
data_te: n'×(k+1). Test set. n' is the number of test samples and k is the number of features. The last column is the label: 1 means UAV; 0 otherwise.
D: k×1. The generation runtime of each feature.
H: k'×k. The incidence matrix of the feature computational hypergraph (see the above paper for details). k' is the number of feature computational components and k is the number of features.
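The components above could be read in Python roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the file name in the usage note is a guess, and the files are assumed to be in a .mat version that `scipy.io.loadmat` can read (MATLAB v7.3 files are HDF5-based and need a different reader).

```python
import numpy as np


def load_dataset(path):
    """Load one dataset file and return data_tr, data_te, D, and H."""
    from scipy.io import loadmat  # imported lazily; only needed to read a file

    mat = loadmat(path)
    return mat["data_tr"], mat["data_te"], mat["D"], mat["H"]


def split_features_labels(data):
    """Split an n x (k+1) array into an n x k feature matrix and n labels."""
    data = np.asarray(data)
    return data[:, :-1], data[:, -1].astype(int)


def features_sharing_component(H, component):
    """Return the indices of the features incident to one component (row of H)."""
    return np.flatnonzero(np.asarray(H)[component])


# Synthetic stand-in for data_tr: 2 samples, k = 2 features, last column = label.
toy = np.array([[0.1, 0.2, 1], [0.3, 0.4, 0]])
X, y = split_features_labels(toy)
print(X.shape, y.tolist())  # (2, 2) [1, 0]
```

With a downloaded file, usage might look like `data_tr, data_te, D, H = load_dataset("Dataset1.mat")`, where the file name is an assumption. If H turns out to be stored as a sparse matrix, convert it with `H.toarray()` before passing it to `features_sharing_component`.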


For any further questions, such as more detailed information on the raw data and features, please contact the author Liang Zhao: lzhao9@gmu.edu.