Tutorial at the IEEE International Conference on Big Data 2018 

Big Data Analytics on Societal Event Forecasting

Liang Zhao and Feng Chen
Department of Information Science and Technology
George Mason University
Email: lzhao9 AT gmu DOT edu

Slides [download]


Abstract

Spatio-temporal societal event forecasting, which has traditionally been prohibitively challenging, is now becoming possible and is growing rapidly thanks to big data from Open Source Indicators (OSI) such as social media, news sources, blogs, economic indicators, and other metadata sources. It can benefit society in many areas, including anticipating political crises, humanitarian crises, mass violence, riots, mass migrations, disease outbreaks, economic instability, resource shortages, responses to natural disasters, and others.

Unlike traditional event detection, which identifies ongoing events, event forecasting focuses on predicting future events that have yet to happen. And unlike traditional spatio-temporal prediction of numerical indices, spatio-temporal event forecasting must leverage heterogeneous information from OSI to discover predictive indicators and their mappings to future societal events. The resulting problems typically require predictive modeling techniques that jointly handle semantic, temporal, and spatial information, as well as efficient algorithms that scale to high-dimensional, large real-world datasets.
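To make the distinction between detection and forecasting concrete, the following minimal Python sketch shows how the same indicator time series yields different supervised training pairs for the two tasks at a single location. It assumes a hypothetical setup that is not taken from the tutorial: `x[t]` is a feature vector built from OSI on day `t` (for example, daily keyword counts), `y[t]` is 1 if an event occurs on day `t`, and the helper names `detection_pairs` and `forecasting_pairs` and the `lead` and `window` parameters are illustrative only.

```python
from typing import List, Sequence, Tuple

# Hypothetical data layout (illustrative, not from the tutorial):
#   x[t] -> list of indicator values observed on day t (e.g., keyword counts)
#   y[t] -> 1 if a societal event occurs on day t at this location, else 0
Pair = Tuple[List[float], int]


def detection_pairs(x: Sequence[List[float]], y: Sequence[int]) -> List[Pair]:
    """Event detection: indicators from day t describe an event already under way on day t."""
    return [(list(x[t]), y[t]) for t in range(len(y))]


def forecasting_pairs(
    x: Sequence[List[float]], y: Sequence[int], lead: int, window: int
) -> List[Pair]:
    """Event forecasting: indicators up to day t predict the event on day t + lead.

    lead   -- how many days ahead to forecast (hypothetical parameter, >= 1)
    window -- how many past days of indicators to use (hypothetical parameter, >= 1)
    """
    if lead < 1 or window < 1:
        raise ValueError("lead and window must both be at least 1")
    pairs: List[Pair] = []
    # t starts at window - 1 so a full history is available, and stops so that
    # t + lead is still a valid index into y.
    for t in range(window - 1, len(y) - lead):
        # Flatten the indicator vectors of days t - window + 1 .. t into one feature list.
        history = [v for day in x[t - window + 1 : t + 1] for v in day]
        pairs.append((history, y[t + lead]))
    return pairs


if __name__ == "__main__":
    x = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    y = [0, 0, 1, 0, 1]
    print(detection_pairs(x, y))
    # -> [([0.0], 0), ([1.0], 0), ([2.0], 1), ([3.0], 0), ([4.0], 1)]
    print(forecasting_pairs(x, y, lead=1, window=2))
    # -> [([0.0, 1.0], 1), ([1.0, 2.0], 0), ([2.0, 3.0], 1)]
```

With `lead=1` and `window=2`, each forecasting pair uses only indicators from days `t-1` and `t` to predict the event on day `t+1`, so no information from the target day leaks into the features. The methods covered in this tutorial additionally model spatial dependencies and heterogeneous, high-dimensional features, which this sketch deliberately omits.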


Taxonomy of Research Works

  • Introduction
      • Open source indicators to societal events
      • Main challenges
      • Comparisons with event detection
      • Comparisons with spatial prediction
  • Temporal event forecasting
      • Causal dependency mining
          • Predefined causality [12, 22, 3]
          • Optimized causality [17, 16, 2, 11]
      • Temporal dependency mining
          • Hidden Markov models [15, 20]
          • Deep neural networks [7, 14, 24, 8]
      • Anomaly mining
          • Scan-statistic based [9, 33, 34]
          • Distance based [35]
  • Spatio-temporal event forecasting
      • Discriminative Models
          • Multi-task models [39, 29, 30, 6]
          • Multi-level models [32, 27]
          • Multi-view models [31]
          • Multi-layer models [37, 38]
      • Generative and Mechanistic Models
          • Generative Models [19, 26, 40]
          • Mechanistic Models [41]
      • Ensemble Models
          • Data-driven Models [12, 18]
          • Data-driven + Mechanistic-driven Models [28, 42]
  • Conclusion and future directions

  • References

    [1] Somayyeh Aghababaei and Masoud Makrehchi. Mining social media content for crime prediction. In Web Intelligence (WI), 2016 IEEE/WIC/ACM International Conference on, pages 526–531. IEEE, 2016.

    [2] Marta Arias, Argimiro Arratia, and Ramon Xuriguera. Forecasting with Twitter data. ACM Transactions on Intelligent Systems and Technology (TIST), 5(1):8, 2013.

    [3] Johan Bollen, Huina Mao, and Xiaojun Zeng. Twitter mood predicts the stock market. Journal of Computational Science, 2(1):1–8, 2011.

    [4] Feng Chen and Daniel B Neill. Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1166–1175. ACM, 2014.

    [5] Mahtab Jahanbani Fard, Ping Wang, Sanjay Chawla, and Chandan K Reddy. A Bayesian perspective on early stage event prediction in longitudinal data. IEEE Transactions on Knowledge and Data Engineering, 28(12):3126–3139, 2016.

    [6] Yuyang Gao and Liang Zhao. Incomplete label multi-task ordinal regression for spatial event scale forecasting. In AAAI Conference on Artificial Intelligence, pages 2999–3006, 2018.

    [7] Mark Granroth-Wilding and Stephen Clark. What happens next? Event prediction using a compositional neural network model. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.

    [8] Linmei Hu, Juanzi Li, Liqiang Nie, Xiao-Li Li, and Chao Shao. What happens next? Future subevent prediction using contextual hierarchical LSTM. In AAAI Conference on Artificial Intelligence, 2017.

    [9] Hyeon-Woo Kang and Hang-Bong Kang. Prediction of crime occurrence from multi-modal data using deep learning. PLoS ONE, 12(4):e0176244, 2017.

    [10] Gizem Korkmaz, Jose Cadena, Chris J Kuhlman, Achla Marathe, Anil Vullikanti, and Naren Ramakrishnan. Combining heterogeneous data sources for civil unrest forecasting. In Advances in Social Networks Analysis and Mining (ASONAM), 2015 IEEE/ACM International Conference on, pages 258–265. IEEE, 2015.

    [11] Canasai Kruengkrai, Kentaro Torisawa, Chikara Hashimoto, Julien Kloetzer, Jong-Hoon Oh, and Masahiro Tanaka. Improving event causality recognition with multiple background knowledge sources using multi-column convolutional neural networks. In AAAI Conference on Artificial Intelligence, 2017.

    [12] Sathappan Muthiah, Patrick Butler, Rupinder Paul Khandpur, Parang Saraf, Nathan Self, Alla Rozovskaya, Liang Zhao, Jose Cadena, Chang-Tien Lu, Anil Vullikanti, Achla Marathe, Kristen Summers, Graham Katz, Andy Doyle, Jaime Arredondo, Dipak K. Gupta, David Mares, and Naren Ramakrishnan. EMBERS at 4 years: Experiences operating an open source indicators forecasting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 205–214, New York, NY, USA, 2016. ACM.

    [13] Yue Ning, Sathappan Muthiah, Huzefa Rangwala, and Naren Ramakrishnan. Modeling precursors for event forecasting via nested multi-instance learning. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1095–1104. ACM, 2016.

    [14] Karl Pichotta and Raymond J Mooney. Learning statistical scripts with LSTM recurrent neural networks. In AAAI, pages 2800–2806, 2016.

    [15] Fengcai Qiao, Pei Li, Xin Zhang, Zhaoyun Ding, Jiajun Cheng, and Hui Wang. Predicting social unrest events with hidden Markov models using GDELT. Discrete Dynamics in Nature and Society, 2017, 2017.

    [16] Kira Radinsky and Sagie Davidovich. Learning to predict from textual data. Journal of Artificial Intelligence Research, 45(1):641–684, 2012.

    [17] Kira Radinsky and Eric Horvitz. Mining the web to predict future events. In WSDM, pages 255–264, 2013.

    [18] Naren Ramakrishnan, Patrick Butler, Sathappan Muthiah, Nathan Self, Rupinder Khandpur, Parang Saraf, Wei Wang, Jose Cadena, Anil Vullikanti, Gizem Korkmaz, et al. ‘Beating the news’ with EMBERS: Forecasting civil unrest using open source indicators. In KDD 2014, pages 1799–1808. ACM, 2014.

    [19] Theodoros Rekatsinas, Saurav Ghosh, Sumiko R Mekaru, Elaine O Nsoesie, John S Brownstein, Lise Getoor, and Naren Ramakrishnan. SourceSeer: Forecasting rare disease outbreaks using multiple data sources. In Proceedings of the 2015 SIAM International Conference on Data Mining, pages 379–387. SIAM, 2015.

    [20] Philip A Schrodt. Forecasting conflict in the Balkans using hidden Markov models. In Programming for Peace, pages 161–184. Springer, 2006.

    [21] Minglai Shao, Jianxin Li, Feng Chen, Hongyi Huang, Shuai Zhang, and Xunxun Chen. An efficient approach to event detection and forecasting in dynamic multivariate social media networks. In Proceedings of the 26th International Conference on World Wide Web, pages 1631–1639. International World Wide Web Conferences Steering Committee, 2017.

    [22] Andranik Tumasjan, Timm Oliver Sprenger, Philipp G Sandner, and Isabell M Welpe. Predicting elections with Twitter: What 140 characters reveal about political sentiment. ICWSM, 10:178–185, 2010.

    [23] Xiaofeng Wang, Matthew S Gerber, and Donald E Brown. Automatic crime prediction using events extracted from Twitter posts. In Social Computing, Behavioral-Cultural Modeling and Prediction, pages 231–238. Springer, 2012.

    [24] Zhongqing Wang and Yue Zhang. DDoS event forecasting using Twitter data. In Proceedings of the 26th International Joint Conference on Artificial Intelligence. AAAI Press, 2017.

    [25] Qian Zhang, Nicola Perra, Daniela Perrotta, Michele Tizzoni, Daniela Paolotti, and Alessandro Vespignani. Forecasting seasonal influenza fusing digital indicators and a mechanistic disease model. In Proceedings of the 26th International Conference on World Wide Web, pages 311–319. International World Wide Web Conferences Steering Committee, 2017.

    [26] Liang Zhao, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. Spatiotemporal event forecasting in social media. In SDM ’15, pages 963–971. SIAM, 2015.

    [27] Liang Zhao, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. Multiresolution spatial event forecasting in social media. In Data Mining (ICDM), 2016 IEEE 16th International Conference on, pages 689–698. IEEE, 2016.

    [28] Liang Zhao, Jiangzhuo Chen, Feng Chen, Wei Wang, Chang-Tien Lu, and Naren Ramakrishnan. SimNest: Social media nested epidemic simulation via online semi-supervised deep learning. In Data Mining (ICDM), 2015 IEEE International Conference on, pages 639–648. IEEE, 2015.

    [29] Liang Zhao, Qian Sun, Jieping Ye, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. Multi-task learning for spatio-temporal event forecasting. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1503–1512. ACM, 2015.

    [30] Liang Zhao, Qian Sun, Jieping Ye, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. Feature constrained multi-task learning models for spatiotemporal event forecasting. IEEE Transactions on Knowledge and Data Engineering, 29(5):1059–1072, 2017.

    [31] Liang Zhao, Junxiang Wang, and Xiaojie Guo. Distant-supervision of heterogeneous multitask learning for social event forecasting with multilingual indicators. In AAAI Conference on Artificial Intelligence, pages 4498–4505, 2018.

    [32] Liang Zhao, Jieping Ye, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. Hierarchical incomplete multi-source feature learning for spatiotemporal event forecasting. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2085–2094. ACM, 2016.

    [33] Feng Chen and Daniel B Neill. Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1166–1175. ACM, 2014.

    [34] Feng Chen and Daniel B Neill. Human rights event detection from heterogeneous social media graphs. Big Data, 3(1):34–40, 2015.

    [35] Polina Rozenshtein, Aris Anagnostopoulos, Aristides Gionis, and Nikolaj Tatti. Event detection in activity networks. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1176–1185. ACM, 2014.

    [36] Feng Chen and Daniel B Neill. Human rights event detection from heterogeneous social media graphs. Big Data, 3(1):34–40, 2015.

    [37] Congyu Wu and Matthew S. Gerber. Forecasting civil unrest using social media and protest participation theory. IEEE Transactions on Computational Social Systems, 5(1):82–94, 2018.

    [38] Zhuoning Yuan, Xun Zhou, and Tianbao Yang. Hetero-ConvLSTM: A deep learning approach to traffic accident prediction on heterogeneous spatio-temporal data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2018 (accepted).

    [39] Yuyang Gao, Liang Zhao, Lingfei Wu, Yanfang Ye, Hui Xiong, and Chaowei Yang. Incomplete label multi-task deep learning for spatio-temporal event subtype forecasting. In Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019), Hawaii, USA, February 2019, to appear.

    [40] Liang Zhao, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. Online spatial event forecasting in microblogs. ACM Transactions on Spatial Algorithms and Systems (TSAS), 2(4): Article No. 15, pages 1–39, November 2016.

    [41] Fang Jin, Rupinder Khandpur, Nathan Self, Edward Dougherty, Sheng Guo, Feng Chen, B. Aditya Prakash, and Naren Ramakrishnan. Modeling mass protest adoption in social network communities using geometric Brownian motion. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14), August 2014.

    [42] Ting Hua, Chandan Reddy, Lijing Wang, Liang Zhao, Lei Zhang, Chang-Tien Lu, and Naren Ramakrishnan. Social media based simulation models for understanding disease dynamics. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI 2018) (acceptance rate: 20.6%), Stockholm, Sweden, July 2018, to appear.


    Presenters and Contact Information:


    Liang Zhao

    Dr. Liang Zhao is an assistant professor in the Department of Information Science and Technology at George Mason University. He received his PhD from the Department of Computer Science at Virginia Tech in the United States. His research interests include big data mining, artificial intelligence, and machine learning, with particular emphasis on sparse feature learning, social event forecasting, text mining, heterogeneous network modeling, and deep learning on graphs. He has published numerous papers in top data mining and artificial intelligence venues such as ACM KDD, IEEE TKDE, AAAI, IJCAI, IEEE ICDM, ACM CIKM, and WWW. He has served as publication chair of SSTD 2017, co-chair of the LENS workshop at SIGSPATIAL 2018, and a program committee member of ACM KDD 2018, AAAI 2019, SDM 2019, IEEE ICDM 2018, and IEEE ICDM 2017. He has also served as a reviewer for top conferences and journals such as ACM KDD, ACM TKDD, IEEE TKDE, and IEEE ICDM.

    Address: Room 5343, Engineering Building, George Mason University
    4400 Univ. Dr., Fairfax, VA 22030
    Telephone: +1 703-993-5910
    Email: lzhao9@gmu.edu