Civil unrest events are often organized on social media, especially Twitter and Facebook. Mining these data therefore makes it possible to detect and forecast future events. By identifying tweets that may signal upcoming civil unrest, the goal is to use Twitter data as social sensors to forecast the spatiotemporal patterns of protests across locations and dates.
Download link:
Dataset | Spanish Tweets (%) | English Tweets (%) | Portuguese Tweets (%) | #Events | 2013 Data | 2014 Data |
Argentina | 91.6 | 7.3 | 1.1 | 1427 | [AR_data] | [AR_data] |
Brazil | 10.1 | 16.0 | 73.9 | 3417 | [BR_data] | [BR_data] |
Chile | 82.8 | 16.4 | 0.8 | 776 | [CL_data] | [CL_data] |
Colombia | 89.8 | 9.4 | 0.8 | 1287 | [CO_data] | [CO_data] |
Ecuador | 91.1 | 8.1 | 0.8 | 511 | [EC_data] | [EC_data] |
El Salvador | 91.5 | 7.8 | 0.7 | 730 | [EL_data] | [EL_data] |
Mexico | 83.7 | 15.4 | 0.9 | 5907 | [MX_data] | [MX_data] |
Paraguay | 92.2 | 6.4 | 1.4 | 2114 | [PY_data] | [PY_data] |
Uruguay | 89.7 | 8.8 | 1.4 | 664 | [UY_data] | [UY_data] |
Venezuela | 92.3 | 6.9 | 0.8 | 3320 | [VE_data] | [VE_data] |
Data format: *.mat (can be opened with MATLAB)
Data description:
Variable Name | Type | Size | Description |
keywords | array of string | 1*3 | keyword lists corresponding to the three languages |
locations | array of string | 1*n | location names of the n cities in the current country |
langs | array of string | 1*3 | language names |
Xs | array of matrices | 1*3 | input data: three matrices corresponding to the three languages |
Y | array of matrices | 1*3 | output data: three vectors corresponding to the three languages |
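For readers without MATLAB, the variables described above can likely be read in Python as well. The following is a minimal sketch, assuming SciPy is installed and that the files are saved in a MATLAB format that `scipy.io.loadmat` supports (files saved as v7.3 are HDF5-based and would need a different reader). The helper names `load_country`, `pick_documented`, and `describe` are hypothetical and are not part of the dataset.

```python
# Variable names taken from the data description table.
EXPECTED_VARS = ("keywords", "locations", "langs", "Xs", "Y")


def pick_documented(mat):
    """Keep only the documented variables, dropping loader metadata keys."""
    missing = [name for name in EXPECTED_VARS if name not in mat]
    if missing:
        raise KeyError(f"missing variables: {missing}")
    return {name: mat[name] for name in EXPECTED_VARS}


def describe(data):
    """Return a shape (or length) per variable as a quick sanity check."""
    summary = {}
    for name, value in data.items():
        shape = getattr(value, "shape", None)
        if shape is not None:
            summary[name] = tuple(shape)
        elif hasattr(value, "__len__"):
            summary[name] = (len(value),)
        else:
            summary[name] = None
    return summary


def load_country(path):
    """Load one country's .mat file (the path is supplied by the caller)."""
    # Imported here so the helpers above work even without SciPy.
    from scipy.io import loadmat

    # squeeze_me=True drops singleton dimensions, so a 1*3 cell array
    # is returned as a length-3 object array.
    mat = loadmat(str(path), squeeze_me=True)
    return pick_documented(mat)
```

A call such as `describe(load_country("AR_2013.mat"))` (the file name here is only a placeholder) should then report length-3 entries for `keywords`, `langs`, `Xs`, and `Y`, and a length-n entry for `locations`, if the file matches the table above.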
All civil unrest tweet messages X, the label set Y, and the keywords were obtained from the IARPA OSI project. Please refer to the papers [KDD 2014] and [KDD 2016] for details. The raw label set can be downloaded here: [Output Raw Data].
To use these datasets, please cite the papers:
Liang Zhao, Junxiang Wang, and Xiaojie Guo. Distant-supervision of heterogeneous multitask learning for social event forecasting with multilingual indicators. Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), Oral presentation (acceptance rate: 11.0%), pp. 4498-4505, New Orleans, US, Feb 2018.
NSF 1755850 (sole-PI): "CRII: III: Interpretable Models for Spatio-Temporal Event Forecasting using Social Sensors", $174,990. 2018-2021, National Science Foundation.