Programs for Smoothing Time Series and Detecting Patterns

James Gentle

Meaningful patterns in time series are generally patterns in a smoothed version of the time series, rather than in the raw data.

There are several ways to smooth a time series, and each type of smoothing may yield different patterns.

Methods of Smoothing Time Series

A simple method of smoothing a time series is just to subsample the series; for example, replace daily data with weekly data. Although this method is subject to sampling error, it may reduce the variance of the time series. Under a geometric Brownian motion model, for example, it reduces the volatility proportional to the ratio of the square roots of the intervals.

Smoothing of a time series $\{x_t\}$ yields a time series $\{s_t\}$ that either has less variability than the unsmoothed time series or else corresponds more closely to some model.

A common smoothing method is to use a running average:
$$s_t = \mathrm{ave}(x_r),$$
where ave represents some kind of average and $r$ ranges over some set of indices near $t$.
There are three choices to make in this method: the type of average, the length of the averaging window, and the type of weighting within the window. In practice, the most common average is the mean and the most common weighting is equal weighting; this combination is called a "moving average", or sometimes a "simple moving average" for emphasis. In a moving average, the only smoothing parameter is the window width.

Another type of running average has a window that at each point goes back to the beginning of the series, with weights that decrease geometrically with the age of the observation. Running averages of this type are called "exponential smoothers". The simple form of exponential smoothing is
$$s_0 = x_0,$$
$$s_t = a x_t + (1-a) s_{t-1}, \quad \text{for } 0<a<1 \text{ and } t>0.$$
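
The two running averages above can be sketched in a few lines of base R. This is only an illustration: the function names `moving_average` and `exp_smooth` are hypothetical, and the moving average assumes a trailing window (the current point and the k - 1 points before it), which is one of several possible choices of "indices near t".

```r
# Simple (equally weighted) moving average over a trailing window of width k.
# Assumption: the window covers x[t-k+1], ..., x[t]; the first k - 1 values
# are NA because the window is incomplete there.
moving_average <- function(x, k) {
  stopifnot(k >= 1, k <= length(x))
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Simple exponential smoothing: s_0 = x_0, s_t = a * x_t + (1 - a) * s_{t-1}.
# R vectors are 1-based, so s[1] plays the role of s_0.
exp_smooth <- function(x, a) {
  stopifnot(a > 0, a < 1)
  s <- numeric(length(x))
  s[1] <- x[1]
  for (t in seq_along(x)[-1]) {
    s[t] <- a * x[t] + (1 - a) * s[t - 1]
  }
  s
}

x <- c(2, 4, 6, 8, 10)
ma <- moving_average(x, 3)   # NA NA 4 6 8
es <- exp_smooth(x, 0.5)     # 2 3 4.5 6.25 8.125
```

A larger window width k, or a smaller weight a, gives a smoother result that responds more slowly to changes in the raw series.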

Alternating Trends Smoothing

Another smoothing method that also depends on a type of window width is called "alternating trends smoothing", or ATS. The smoothing model in ATS is an alternating sequence of up and down linear trends.

Implicit in ATS is the existence of "changepoints"; the smoothed time series consists of the points on the broken line whose segments connect the values of the raw time series at the changepoints. Different window widths will identify different changepoints. The algorithm for ATS is given in "Finding Patterns in Time Series".

The smoothing parameter in ATS, called "step size", determines how far ahead the algorithm looks when deciding whether the sign of the trend (up or down) has changed. Different step sizes yield different sets of changepoints.
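
To give a feel for how a look-ahead step size leads to alternating changepoints, here is a deliberately simplified sketch. It is not the ATS algorithm from the reference cited above, and the function name `sketch_changepoints` is hypothetical. It walks along the series, extends the current trend whenever some point within the next `step` observations goes further in the trend direction, and otherwise records a changepoint and reverses direction.

```r
# Simplified look-ahead changepoint sketch (NOT the published ATS algorithm).
# Returns a two-column matrix: index of each changepoint and the value there.
sketch_changepoints <- function(x, step = 2) {
  n <- length(x)
  stopifnot(step >= 1, n > step)
  up <- x[1 + step] >= x[1]   # initial direction, judged one step ahead
  cps <- 1                    # the first point is taken as a changepoint
  cur <- 1                    # current extreme of the running trend
  while (cur < n) {
    ahead <- (cur + 1):min(cur + step, n)
    best <- if (up) ahead[which.max(x[ahead])] else ahead[which.min(x[ahead])]
    extends <- if (up) x[best] > x[cur] else x[best] < x[cur]
    if (!extends) {
      # No point within the look-ahead extends the trend: record a
      # changepoint at cur and continue in the opposite direction.
      cps <- c(cps, cur)
      up <- !up
      best <- if (up) ahead[which.max(x[ahead])] else ahead[which.min(x[ahead])]
    }
    cur <- best               # cur always advances, so the loop terminates
  }
  cps <- unique(c(cps, n))    # the last point closes the final trend
  cbind(index = cps, value = x[cps])
}

x <- c(1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4)
cp <- sketch_changepoints(x, step = 2)
# changepoints at indices 1, 5, 9, 12 with values 1, 5, 1, 4
```

With a larger step, short reversals that recover within the look-ahead are absorbed into the current trend, so fewer changepoints are reported.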

The changepoints determined by ATS can be used to identify patterns such as "head-and-shoulders".

Bounding Lines

A different approach to smoothing is to determine line segments that bound the time series. An upper bounding line is one that is above the raw data, and a lower bounding line is one that is below the raw data. The algorithm for bounding lines is given in Finding Patterns in Time Series. Bounding lines are generally determined for a given segment of a time series, possibly the subseries between two given changepoints.

Bounding lines can be required to be completely above or completely below the raw data. A two-component smoothing parameter for bounding lines can allow a certain number of the points to be a certain distance outside the bounding lines.
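
The two-component tolerance can be illustrated with a small check. The helper `is_upper_bound` and its arguments `max_outside` and `max_dist` are hypothetical names, and the reading of the tolerance (at most a given number of points above the line, each by at most a given distance) is an assumption about what the parameter means, not a description of how BoundingLines is implemented.

```r
# Check whether the line intercept + slope * index is an acceptable upper
# bounding line for x, allowing at most max_outside points to lie above it,
# each by no more than max_dist. (Hypothetical helper, for illustration.)
is_upper_bound <- function(x, intercept, slope, max_outside = 0, max_dist = 0) {
  line <- intercept + slope * seq_along(x)
  excess <- x - line          # positive where the data lie above the line
  outside <- excess > 0
  sum(outside) <= max_outside && all(excess[outside] <= max_dist)
}

x <- c(1, 2, 3, 2.5, 6)
strict  <- is_upper_bound(x, intercept = 0, slope = 1)                  # FALSE
relaxed <- is_upper_bound(x, 0, 1, max_outside = 1, max_dist = 1)       # TRUE
tight   <- is_upper_bound(x, 0, 1, max_outside = 1, max_dist = 0.5)     # FALSE
```

A lower bounding line can be checked the same way by reversing the sign of `excess`.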

The bounding lines can be used to identify patterns such as converging or diverging trends.

Trends and Patterns in Time Series

Trends and patterns in time series vary depending on the type of smoothing done prior to identifying the trends or patterns.

A "trend", which is either up or down, may be composed of multiple sub-trends. The values of the time series at the changepoints between sub-trends determine the trend. In an up trend, for example, the values of the time series at the changepoints constitute a sequence of "increasing highs and increasing lows".
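
The "increasing highs and increasing lows" condition can be checked directly from the changepoint values. The sketch below assumes the changepoints are stored in a two-column matrix (index, value) whose rows alternate between lows and highs; the function name `is_up_trend` and the requirement of at least four changepoints are assumptions made for the illustration.

```r
# TRUE if alternating changepoints show increasing highs and increasing lows.
# Rows are assumed to alternate, so odd-numbered rows form one sequence
# (say, lows) and even-numbered rows the other (highs).
is_up_trend <- function(brks) {
  v <- brks[, 2]
  k <- length(v)
  if (k < 4) return(FALSE)    # need at least two lows and two highs
  odd  <- v[seq(1, k, by = 2)]
  even <- v[seq(2, k, by = 2)]
  all(diff(odd) > 0) && all(diff(even) > 0)
}

# Invented changepoints, for illustration only.
up_brks   <- cbind(c(1, 5, 8, 12, 15, 20), c(10, 14, 12, 17, 15, 20))
down_brks <- cbind(c(1, 5, 8, 12, 15, 20), c(20, 15, 17, 12, 14, 10))
is_up_trend(up_brks)     # TRUE
is_up_trend(down_brks)   # FALSE
```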

We will identify trends and patterns based either on changepoints determined by ATS or on bounding lines.

R Functions for Trends and Patterns in Time Series

Two R functions for smoothing sequenced univariate data are ATS and BoundingLines.

Two basic functions for pattern identification given a sequence of changepoints are Trends and Patterns. These functions are designed to accept the output from ATS.

Patterns are identified by integers, which are described in the documentation for Patterns. For example, 1 represents a type of "head-and-shoulders" pattern.

These four functions will optionally add lines to a graph of the raw time series.

In addition, there is a function designed to loop through a sequence of step sizes for ATS smoothing, followed by identification of any trends or patterns for each specified step size.

The functions are
• ATS - Determines changepoints in linear trends of sequenced univariate data.
```
ATS(x,step=0,segments=FALSE,offset=0,ltype=1,color="red",char="x")
```
• BoundingLines - Computes an upper or lower bounding line for sequenced univariate data.
```
BoundingLines(x,env=0,segments=FALSE,offset=0,ltype=1,color="blue")
```
• Trends - Determines beginning and ending points of trends, given a sequence of changepoints.
```
Trends(brks,minlen=6,segments=FALSE,offset=0,ltype=1,color="blue")
```
• Patterns - Determines beginning and ending points of patterns, given a sequence of changepoints.
```
Patterns(brks,pattern=1,segments=FALSE,offset=0,ltype=1,color="red")
```
• FindTrendsPatterns - Determines trends and patterns in sequenced univariate data, for given step sizes.
```
FindTrendsPatterns(x,steps=0,minlen=NULL,whichpats=NULL)
```

The arguments and the output of these functions have a consistency that facilitates their use together.

The "sequenced univariate data", x, is just a numeric vector. It is called the "raw time series". It can be of class time series.

The "sequence of changepoints", brks, is a matrix whose first column contains the indices of the changepoints in the raw time series and whose second column contains the values of the raw time series at the changepoints. This is the form of the output of ATS.

```
##    brks      Matrix with two columns containing changepoint information.
##              It is assumed that the rows represent changepoints of alternating
##              trends (ATS).
##              The first column is the index of the changepoint (with no offset).
##              The second column is the value of x at the changepoint.
```
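
Given a matrix in this two-column form, the smoothed series (the points on the broken line through the changepoints) can be recovered by linear interpolation. The sketch below builds a small `brks` matrix by hand from invented changepoint indices, since it does not call ATS itself, and uses the base R function `approx`.

```r
# Raw series and hand-picked changepoint indices (invented for illustration).
x   <- c(1, 3.5, 5, 4.2, 2.9, 2, 4.5, 6)
idx <- c(1, 3, 6, 8)

# Two-column changepoint matrix: index in x, and value of x at that index.
brks <- cbind(idx, x[idx])

# Smoothed series: linear interpolation between successive changepoints,
# evaluated at every index of the raw series.
smoothed <- approx(x = brks[, 1], y = brks[, 2], xout = seq_along(x))$y
# smoothed is 1 3 5 4 3 2 4 6
```

The smoothed series agrees with the raw series at the changepoints and replaces the values in between with points on the connecting line segments.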

The functions that add lines to plots have a common set of arguments:

```
## Optional printing arguments; these arguments affect only the printing.
##    segments  Logical variable indicating whether to add a bounding line
##              segment to an existing plot of a univariate data vector
##              against its index.
##              ** If segments=TRUE, there must be an existing plot over the
##              appropriate range.
##              ** If segments=FALSE, the additional arguments are not used.
##    offset    If segments=TRUE, the index of the original data plotted at
##              which to begin plotting of trend lines for the current series x.
##              The index of x is treated as starting at offset+1 with respect
##              to the index of time used in the original plot.
##    ltype     If segments=TRUE, the line type, using the standard values in R.
##    color     If segments=TRUE, the line color, using the standard values in R.
```

Trends, which are determined from a sequence of changepoints, are specified in a matrix with four columns in which each row corresponds to a trend.
The absolute value of the entry in the first column is the index of the first changepoint in the trend (that is, the value in the first column of brks), and the entry in the second column is the length of the trend, measured by the number of changepoints. The third column contains the index of the raw time series where the trend begins, and the fourth column contains the index of the raw time series where the trend ends. If the value in the first column is positive, the trend is an up trend; if it is negative, the trend is a down trend.
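
As a reading aid, the four-column layout can be unpacked into named fields. The matrix below is invented solely to show the layout and is not output from Trends; the helper name `decode_trends` is hypothetical.

```r
# Invented trend matrix in the four-column layout described above.
trends <- rbind(c(  1, 6,  1, 40),
                c(-41, 6, 41, 75))

# Unpack the columns: the sign of column 1 gives the direction, column 2 the
# number of changepoints, and columns 3 and 4 the span in the raw series.
decode_trends <- function(tr) {
  data.frame(direction      = ifelse(tr[, 1] > 0, "up", "down"),
             n_changepoints = tr[, 2],
             start          = tr[, 3],
             end            = tr[, 4],
             stringsAsFactors = FALSE)
}

decoded <- decode_trends(trends)
```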

Patterns, which are determined from a sequence of changepoints, are specified in a matrix with four columns in which each row corresponds to a pattern.
The entry in the first column is the indicator of the pattern, and the entry in the second column is the index of the changepoint at which the pattern begins. The third column contains the index of the raw time series where the pattern begins, and the fourth column contains the index of the raw time series where the pattern ends.
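
A matrix in this layout can be filtered by pattern indicator, and the corresponding stretch of the raw series extracted using the third and fourth columns. The values below are invented to show the layout and are not output from Patterns; the indicator 2 in the second row is a placeholder, not a documented pattern code.

```r
# Invented raw series and pattern matrix, for illustration only.
x <- sin(seq_len(120) / 10)
patterns <- rbind(c(1, 4, 30,  62),
                  c(2, 9, 80, 101))

# Keep only rows whose indicator is 1 (a type of head-and-shoulders pattern).
hs <- patterns[patterns[, 1] == 1, , drop = FALSE]

# Extract the part of the raw series covered by the first such pattern.
segment <- x[hs[1, 3]:hs[1, 4]]
```

Using `drop = FALSE` keeps `hs` a matrix even when only one row matches, so the column indexing works unchanged.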