Online Petition Datasets for Petition Victory Prediction



Online Petition Platforms (OPPs), a form of web-based petition hosting, have risen over recent decades, spurred by the growth of the internet and social networking. For example,, which was founded in 2007, had over 190 million users by July 2017 and hosted hundreds of new petitions daily on topics ranging from health and economics to government policy. These petition websites have become an important means of detecting and tracking public concerns about societal issues in a timely manner. They are also a creative attempt to bridge the gap between growing public concerns and the attention of decision-makers.

Typically, an online petition is created on a web-based petition host to gather enough signatures to attract the attention of the responsible decision-makers. A petition is victorious when the decision-makers take action to address the concern raised by the petition launcher. Analyzing the factors that may indicate or lead to the victory (or failure) of petitions is of great interest in the social sciences, and practitioners also want to foresee a petition's victory or failure from historical data and experience. This dataset supports forecasting petition victory (or failure) and, more importantly, identifying the factors that influence a petition's success.

First, we queried the API to obtain information on 54,039 petitions created between January 1, 2009 and December 17, 2017 in six countries: the Philippines, India, Germany, Australia, Canada, and the United States. Second, all corresponding comments were retrieved through the API as well. The feature representation of each petition is described in Table II of the original paper. All fields except the basic petition properties were unstructured raw text and were represented by sets of keywords provided by domain experts. Basic petition properties, such as "calculated goal" and "weekly signature count", were obtained directly from the website. The labels for the six datasets were extracted from the "is victory" field.
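The keyword-based representation of the text fields can be illustrated with a small sketch. It assumes a simple scheme of case-insensitive, whole-word keyword counts; the actual expert-provided keyword lists and the exact counting rule used for the datasets are not given here, so `KEYWORDS`, `keyword_counts`, and `to_vector` are hypothetical names chosen for illustration only.

```python
import re
from typing import Dict, List

# Hypothetical keyword list: the real lists were provided by domain experts
# and are not reproduced here. Keywords are assumed to be lowercase single words.
KEYWORDS: List[str] = ["health", "tax", "school"]


def keyword_counts(text: str, keywords: List[str]) -> Dict[str, int]:
    """Count case-insensitive, whole-word occurrences of each keyword."""
    # Split lowercased text into alphanumeric tokens, dropping punctuation.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = {kw: 0 for kw in keywords}
    for tok in tokens:
        if tok in counts:
            counts[tok] += 1
    return counts


def to_vector(text: str, keywords: List[str]) -> List[int]:
    """Turn raw petition text into a fixed-order count vector over the keywords."""
    counts = keyword_counts(text, keywords)
    return [counts[kw] for kw in keywords]


if __name__ == "__main__":
    petition = "Lower the school tax! Schools need funding; the tax hurts health."
    print(to_vector(petition, KEYWORDS))  # prints [1, 2, 1]
```

Note that "Schools" is not counted as "school" because matching is whole-word; stemming or multi-word phrases would require a different (and equally assumed) tokenization.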


To use these datasets, please cite the paper:

Junxiang Wang, Yuyang Gao, Andreas Zufle, Jingyuan Yang, and Liang Zhao. Incomplete Label Uncertainty Estimation for Petition Victory Prediction with Dynamic Features. In Proceedings of the IEEE International Conference on Data Mining (ICDM 2018), Singapore, Nov 2018, to appear.

@inproceedings{wang2018incomplete,
  author = "Junxiang Wang and Yuyang Gao and Andreas Zufle and Jingyuan Yang and Liang Zhao",
  title = "Incomplete Label Uncertainty Estimation for Petition Victory Prediction with Dynamic Features",
  booktitle = "Proceedings of the IEEE International Conference on Data Mining (ICDM)",
  year = "2018",
  month = "nov"
}


Datasets are provided for the Philippines, India, Germany, Australia, Canada, and the United States.

Format for each dataset:

% Each dataset consists of the following components:
% data: petition data. -100 means this value is missing.
% _ID: the IDs of petitions.
% _index: the mapping from petition data to petition sets.
% _label: labels of petitions. 1 means the petition is victorious, -1 means the petition failed, and 0 means the petition is unlabeled.
% _taskno: the task number assigned for multi-task learning according to missing patterns (see paper for details).
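The format above can be handled with a few NumPy helpers. This is a minimal sketch that assumes the `data` and `_label` components have already been loaded as arrays (for example with `scipy.io.loadmat`, if the files are MATLAB `.mat` files as the `%` comment style suggests; the exact variable names in the files are not specified here). The helper names are hypothetical. `group_by_missing_pattern` only illustrates the idea of assigning one task per distinct missing-value pattern; the datasets already ship the official `_taskno`, and the paper's actual assignment may differ.

```python
import numpy as np

MISSING = -100  # sentinel for a missing value, per the format description


def mask_missing(data):
    """Return a float copy of the data with the -100 sentinel replaced by NaN."""
    arr = np.asarray(data, dtype=float).copy()
    arr[arr == MISSING] = np.nan
    return arr


def split_by_label(label):
    """Return index arrays of victorious (1), failed (-1), and unlabeled (0) petitions."""
    lab = np.asarray(label).ravel()
    return {
        "victory": np.flatnonzero(lab == 1),
        "failure": np.flatnonzero(lab == -1),
        "unlabeled": np.flatnonzero(lab == 0),
    }


def group_by_missing_pattern(data):
    """Hypothetical grouping: one task id (starting at 1) per distinct missing pattern.

    Returns the per-petition task ids and the unique boolean missing patterns.
    """
    missing = np.asarray(data) == MISSING
    patterns, inverse = np.unique(missing, axis=0, return_inverse=True)
    # ravel() keeps the inverse 1-D across NumPy versions.
    return inverse.ravel() + 1, patterns


if __name__ == "__main__":
    data = np.array([[1.0, -100, 3.0],
                     [4.0, 5.0, 6.0],
                     [-100, -100, 2.0],
                     [7.0, -100, 8.0]])
    print(mask_missing(data))
    print(split_by_label([1, -1, 0, 1]))
    print(group_by_missing_pattern(data)[0])
```

Semi-supervised or multi-task methods would typically train on the "victory" and "failure" indices while still using the "unlabeled" rows; petitions 0 and 3 above share the same missing pattern and therefore the same illustrative task id.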


For any further questions, such as more detailed information on the raw data and features, please contact the author Junxiang Wang: