Meaningful patterns in time series are generally patterns in a smoothed version of the time series, rather than in the raw data.
There are several ways to smooth a time series, and each type of smoothing may yield different patterns.
Smoothing of a time series {x_t} yields a time series {s_t} that either has less variability than the unsmoothed time series or else corresponds more closely to some model.
A common smoothing method is to use a running average:
s_t = ave(x_r),
where ave represents some kind of average and r ranges over some set of indices near t.
There are three
variables in this method: the type of average, the length of the averaging
window, and the type of weighting within the window. In practice,
the most common average is a mean and the most common weighting is equal
weighting, which is called a "moving average", or sometimes "simple moving average"
for emphasis. In a moving average, the only smoothing parameter is the window width.
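As a minimal sketch of the simple moving average, the R function below averages the current point and the w-1 points before it. It assumes a trailing window; the text only requires indices "near t", so a centered window would be equally valid. The name sma and the choice to return NA where the window is incomplete are illustrative assumptions, not part of the functions documented here.

```r
# Simple moving average with a trailing window of width w.
# Assumption: the window covers x[t-w+1], ..., x[t]; positions with
# fewer than w available points are returned as NA.
sma <- function(x, w) {
  n <- length(x)
  s <- rep(NA_real_, n)
  if (w > n) return(s)          # window longer than the series
  for (t in w:n) {
    s[t] <- mean(x[(t - w + 1):t])
  }
  s
}

x <- c(2, 4, 6, 8, 10)
sma(x, 3)   # NA NA 4 6 8
```

A wider window gives a smoother series but leaves more undefined values at the start.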
Another type of running average has a window width that at each point goes back
to the beginning of the series. Running averages of this type are called
"exponential smoothers". The simple form of exponential smoothing is
s_0 = x_0
s_t = a x_t + (1-a) s_{t-1}, for 0 < a < 1 and t > 0.
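The recursion can be sketched directly in R. Because R vectors are indexed from 1, s[1] plays the role of the initial value in the formula; the function name exp_smooth is an illustrative choice.

```r
# Simple exponential smoothing:
#   s[1] = x[1],  s[t] = a * x[t] + (1 - a) * s[t - 1]  for t > 1.
# The weight a must lie strictly between 0 and 1.
exp_smooth <- function(x, a) {
  stopifnot(a > 0, a < 1, length(x) >= 1)
  s <- numeric(length(x))
  s[1] <- x[1]
  for (t in seq_along(x)[-1]) {
    s[t] <- a * x[t] + (1 - a) * s[t - 1]
  }
  s
}

exp_smooth(c(10, 20, 30), a = 0.5)   # 10.0 15.0 22.5
```

Values of a near 1 follow the raw series closely, while values near 0 give heavier smoothing.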
Implicit in alternating trends smoothing (ATS) is the existence of "changepoints", and the smoothed time series consists of the points lying on the broken line that connects the raw time series values at successive changepoints. Different values of the smoothing parameter will identify different changepoints. The algorithm for ATS is given in "Finding Patterns in Time Series".
The smoothing parameter in ATS, called "step size", determines how far ahead points are examined when deciding whether the sign of the trend (up or down) has changed. Different step sizes yield different sets of changepoints.
The changepoints determined by ATS can be used to identify patterns such as "head-and-shoulders".
Bounding lines can be required to be completely above or completely below the raw data. A two-component smoothing parameter for bounding lines can allow a certain number of the points to be a certain distance outside the bounding lines.
The bounding lines can be used to identify patterns such as converging or diverging trends.
A "trend", which is either up or down, may be composed of multiple sub-trends. The values of the time series at the changepoints between sub-trends determine the trend. In an up trend, for example, the values of the time series at the changepoints constitute a sequence of "increasing highs and increasing lows".
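One way to read "increasing highs and increasing lows" in code: because the changepoints alternate between local highs and local lows, comparing each changepoint value with the value two changepoints earlier compares highs with highs and lows with lows. The helper is_up_trend below is a hypothetical illustration and is not the Trends function described in this document.

```r
# v: values of the time series at consecutive changepoints, assumed to
# alternate between local lows and local highs.
# An up trend requires every value to exceed the value two changepoints
# earlier (higher highs and higher lows).
is_up_trend <- function(v) {
  if (length(v) < 3) return(FALSE)   # too few changepoints to compare
  all(diff(v, lag = 2) > 0)
}

is_up_trend(c(1, 5, 3, 7, 4, 9))   # TRUE: lows 1, 3, 4 and highs 5, 7, 9 rise
is_up_trend(c(1, 5, 3, 4, 2, 6))   # FALSE: the high drops from 5 to 4
```

Reversing the inequality gives the corresponding check for a down trend.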
We will identify trends and patterns based on changepoints as determined by ATS or on bounding lines smoothing.
Two basic functions for pattern identification given a sequence of changepoints are Trends and Patterns. These functions are designed to accept the output from ATS.
Patterns are identified by integers. They are described in the documentation for Patterns. For example, 1 represents a type of "head-and-shoulders" pattern, which is described explicitly in the documentation for Patterns.
The four functions ATS, BoundingLines, Trends, and Patterns will optionally add lines to a graph of the raw time series.
In addition, the function FindTrendsPatterns loops through a sequence of step sizes for ATS smoothing and, for each specified step size, identifies any trends or patterns.
ATS(x,step=0,segments=FALSE,offset=0,ltype=1,color="red",char="x")
BoundingLines(x,env=0,segments=FALSE,offset=0,ltype=1,color="blue")
Trends(brks,minlen=6,segments=FALSE,offset=0,ltype=1,color="blue")
Patterns(brks,pattern=1,segments=FALSE,offset=0,ltype=1,color="red")
FindTrendsPatterns(x,steps=0,minlen=NULL,whichpats=NULL)
The "sequenced univariate data", x, is just a numeric vector. It is called the "raw time series". It can be of class time series.
The "sequence of changepoints", brks, is a matrix whose first column contains the indices of the changepoints in the raw time series and whose second column contains the values of the raw time series at the changepoints. This is the form of the output of ATS.
##   brks      Matrix with two columns containing changepoint information.
##             It is assumed that the rows represent changepoints of alternating
##             trends (ATS).
##             The first column is the index of the changepoint (with no offset).
##             The second column is the value of x at the changepoint.
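Since the smoothed series consists of the points on the line segments joining successive changepoints, it can be recovered from a changepoint matrix by linear interpolation. The sketch below uses the base R function approx; the brks matrix is made up for illustration and is not actual ATS output.

```r
# Hypothetical changepoint matrix in the two-column layout described above:
# column 1 = index in the raw series, column 2 = value at that index.
brks <- cbind(c(1, 4, 6), c(10, 16, 12))

# Linear interpolation between consecutive changepoints gives the
# smoothed value at every index from the first to the last changepoint.
idx <- brks[1, 1]:brks[nrow(brks), 1]
smoothed <- approx(x = brks[, 1], y = brks[, 2], xout = idx)$y
smoothed   # 10 12 14 16 14 12
```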
The functions that add lines to plots have a common set of arguments:
## Optional printing arguments; these arguments affect only the printing.
##   segments  Logical variable indicating whether to add a bounding line
##             segment to an existing plot of a univariate data vector
##             against its index.
##             ** If segments=TRUE, there must be an existing plot over the
##                appropriate range.
##             ** If segments=FALSE, the additional arguments are not used.
##   offset    If segments=TRUE, the index of the original data plotted at
##             which to begin plotting of trend lines for the current series x.
##             The index of x is treated as starting at offset+1 with respect
##             to the index of time used in the original plot.
##   ltype     If segments=TRUE, the line type, using the standard values in R.
##   color     If segments=TRUE, the line color, using the standard values in R.
Trends, which are determined from a
sequence of changepoints, are specified in a matrix with
four columns in which each row corresponds to a trend.
The absolute value of the entry in the first column
is the index of the first changepoint in the trend (that is, the
value in the first column of brks), and the entry in the second column
is the length of the trend, measured by the number of changepoints.
The third column contains the index of the raw time series where the
trend begins, and the fourth column contains the index of the raw time
series where the trend ends. If the value in the first column is positive,
the trend is up; if it is negative, the trend is down.
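To make the column layout concrete, the sketch below decodes a trends matrix into its direction, starting changepoint, and extent. The matrix is filled with made-up numbers (assuming zero offset), and the variable names are illustrative; it is not output from the Trends function.

```r
# Hypothetical trends matrix with the four columns described above:
# (signed index of first changepoint, length in changepoints,
#  start index in raw series, end index in raw series).
trends <- rbind(c( 15, 6, 15,  80),
                c(-95, 7, 95, 170))

direction <- ifelse(trends[, 1] > 0, "up", "down")  # sign encodes direction
first_brk <- abs(trends[, 1])                       # changepoint index, sign removed
span      <- trends[, 4] - trends[, 3] + 1          # raw observations covered

data.frame(direction, first_brk, n_changepoints = trends[, 2], span)
```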
Patterns, which are determined from a
sequence of changepoints,
are specified in a matrix with four columns in which each row corresponds to a
pattern.
The entry in the first column is the indicator of
the pattern, and the entry in the second column is the index
of the changepoint at which the pattern begins.
The third column contains the index of the raw time series where the
pattern begins, and the fourth column contains the index of the raw time
series where the pattern ends.