Civil unrest events are often organized on social media, especially Twitter and Facebook. Mining these data therefore makes it possible to detect and forecast future events. By identifying tweets that may signal upcoming civil unrest, the goal is to use Twitter data as social sensors to forecast the spatiotemporal patterns of protests across different locations and dates.
Download link:
Dataset | #Events | Download Link |
Argentina | 1427 | [AR_data] |
Brazil | 3417 | [BR_data] |
Chile | 776 | [CL_data] |
Colombia | 1287 | [CO_data] |
Ecuador | 511 | [EC_data] |
El Salvador | 730 | [EL_data] |
Mexico | 5907 | [MX_data] |
Paraguay | 2114 | [PY_data] |
Uruguay | 664 | [UY_data] |
Venezuela | 3320 | [VE_data] |
Data format: *.mat (can be opened with MATLAB)
Data description:
Variable Name | Type | Size | Description |
keywords | array of strings | 1*923 | keyword list used to represent each tweet message as a document vector |
locations | array of strings | 1*n | location names of the n cities in the current country |
dates | array of strings | 1*729 | all the dates |
X | array of matrices | 1*n | input data: tweet data for n locations from 2013-01-01 to 2014-12-30 |
Y | array of matrices | 1*n | output data: event occurrence data for n locations from 2013-01-01 to 2014-12-30 |
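For readers working outside MATLAB, the variables described above can probably be read in Python with `scipy.io.loadmat`. The following is a minimal sketch, not an official loader. It assumes the files are saved in a pre-v7.3 MAT format (for v7.3/HDF5 files `loadmat` raises `NotImplementedError`) and that the "arrays" are MATLAB cell arrays. The helper names `cell_strings`, `load_country`, and `write_demo_file` are hypothetical. The matrix shapes in the synthetic demo file are placeholders, because the layout of each matrix in X and Y is not stated here.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def cell_strings(cell):
    """Convert a loaded MATLAB cell array of strings into a list of Python str."""
    return [str(c.item()) if c.size else "" for c in cell.ravel()]


def load_country(path):
    """Load one country file into a dict of lists (hypothetical helper)."""
    mat = loadmat(path)  # raises NotImplementedError for v7.3 (HDF5) files
    return {
        "keywords": cell_strings(mat["keywords"]),
        "locations": cell_strings(mat["locations"]),
        "dates": cell_strings(mat["dates"]),
        "X": list(mat["X"].ravel()),  # one matrix per location
        "Y": list(mat["Y"].ravel()),  # one matrix per location
    }


def write_demo_file(path, n_locations=2, n_dates=3, n_keywords=4):
    """Write a tiny synthetic file with the same variable names.

    The matrix shapes used here are assumptions for illustration only.
    """
    def cell(items):
        # A 1*len(items) object array is stored by savemat as a cell array.
        out = np.empty((1, len(items)), dtype=object)
        for i, item in enumerate(items):
            out[0, i] = item
        return out

    savemat(path, {
        "keywords": cell([f"kw{i}" for i in range(n_keywords)]),
        "locations": cell([f"city{i}" for i in range(n_locations)]),
        "dates": cell([f"2013-01-0{i + 1}" for i in range(n_dates)]),
        "X": cell([np.zeros((n_dates, n_keywords)) for _ in range(n_locations)]),
        "Y": cell([np.zeros((n_dates, 1)) for _ in range(n_locations)]),
    })


if __name__ == "__main__":
    # Run against a synthetic file so the sketch works without a download.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.mat")
        write_demo_file(demo_path)
        data = load_country(demo_path)
        for name, x, y in zip(data["locations"], data["X"], data["Y"]):
            print(name, x.shape, y.shape)
```

To try it on a real file, pass the path of a downloaded country file to `load_country` and inspect `data["X"][0].shape` before assuming which axis corresponds to dates or keywords.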
All the civil unrest tweet messages X, the label set Y, and the keywords were obtained from the IARPA OSI project. Please refer to the papers [KDD 14] and [KDD 16] for details. The raw label set can be downloaded here: [Output Raw Data].
To use these datasets, please cite the papers:
Liang Zhao, Qian Sun, Jieping Ye, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. "Multi-Task Learning for Spatio-Temporal Event Forecasting." in Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2015), research track, (acceptance rate: 19.4%), Sydney, Australia, pp. 1503-1512, Aug 2015.
Sathappan Muthiah, Patrick Butler, Rupinder Paul Khandpur, Parang Saraf, Nathan Self, Alla Rozovskaya, Liang Zhao, Jose Cadena et al. "EMBERS at 4 years: Experiences operating an Open Source Indicators Forecasting System." in Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2016), applied data science track, accepted (acceptance rate: 19.9%), pp. 205-214, San Francisco, California, Aug 2016.
NSF 1755850 (sole-PI): "CRII: III: Interpretable Models for Spatio-Temporal Event Forecasting using Social Sensors", $174,990. 2018-2021, National Science Foundation.