<TITLE>
Programs for Smoothing Time Series and Detecting Patterns
</TITLE>
<style>

 pre {
     font-size: 15px;
 }

 .boxed {
  background-color: lightgray;
}

</style>

<H2><font color=red>
Programs for Smoothing Time Series and Detecting Patterns
</font></H2>
<a href="http://mason.gmu.edu/~jgentle/">
James Gentle </a>
<p>

Meaningful patterns in time series are generally patterns in a smoothed version
of the time series, rather than in the raw data.
<p>
There are several ways to smooth a time series, and each type of smoothing
may yield different patterns.

<hr>

<H3><font color=red>
Methods of Smoothing Time Series
</font></H3>
A simple method of smoothing a time series is just to subsample the series; for
example, replace daily data with weekly data.  Although this method is subject
to sampling error, it may reduce the variance of the time series.
Under a geometric Brownian motion model, for example, it reduces the volatility
proportional to the ratio of the square roots of the intervals.
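<p>
A minimal sketch of subsampling in R, assuming a made-up daily series with five
observations per week (the data and variable names are illustrative only):

```r
set.seed(42)
daily <- 100 + cumsum(rnorm(60))   # made-up daily series, 12 weeks of 5 days

# Keep every fifth observation, starting with the first.
weekly <- daily[seq(1, length(daily), by = 5)]

length(daily)
length(weekly)
```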
<p>
Smoothing of a time series {x<sub>t</sub>} yields a time series {s<sub>t</sub>}
which either has less variability than the unsmoothed time series 
or else corresponds more closely to some model.
<p>
A common smoothing method is to use a running average: 
<br>
s<sub>t</sub> = <tt>ave</tt>(x<sub>r</sub>), 
<br>
where <tt>ave</tt> represents some kind of average and r ranges over some
set of indices near t.  
<br>
There are three
choices in this method: the type of average, the length of the averaging
window, and the type of weighting within the window.  In practice,
the most common average is a mean and the most common weighting is equal
weighting; this combination is called a "moving average", or sometimes a
"simple moving average" for emphasis.
In a moving average, the only smoothing parameter is the window width.
<p>
Another type of running average has a window that at each point goes back
to the beginning of the series, with weights that decay geometrically with the
age of the observation.  Running averages of this type are called
"exponential smoothers". The simple form of exponential smoothing is
<br>
s<sub>0</sub> = x<sub>0</sub>
<br>
s<sub>t</sub> = ax<sub>t</sub>+(1-a)s<sub>t-1</sub>, for 0&lt;a&lt;1 and t&gt;0.
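<p>
As a minimal sketch of these two smoothers in base R, a trailing moving average
of width <tt>w</tt> and the simple exponential smoother above might be written
as follows.  The function names <tt>sma</tt> and <tt>ses</tt> and the sample
data are illustrative assumptions; they are not part of the software described
on this page.

```r
# Trailing simple moving average of width w.
# The first w-1 values are NA because the window is incomplete there.
sma <- function(x, w) {
  as.numeric(stats::filter(x, rep(1 / w, w), sides = 1))
}

# Simple exponential smoothing: s[1] = x[1], s[t] = a*x[t] + (1-a)*s[t-1].
# (R indexes from 1, so s[1] plays the role of s_0 in the text.)
ses <- function(x, a) {
  stopifnot(a > 0, a < 1)
  s <- numeric(length(x))
  s[1] <- x[1]
  for (t in seq_along(x)[-1]) {
    s[t] <- a * x[t] + (1 - a) * s[t - 1]
  }
  s
}

x <- c(10, 12, 11, 15, 14, 18, 17)  # made-up data
sma(x, 3)
ses(x, 0.5)
```

A smaller <tt>a</tt> gives more weight to the past and hence a smoother series;
a larger <tt>w</tt> does the same for the moving average.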

<H4><font color=red>
Alternating Trends Smoothing
</font></H4>

Another smoothing method that also depends on a type of window width is
called "alternating trends smoothing", or ATS.
The smoothing model in ATS is an alternating sequence of up and down linear trends.
<p>
Implicit in ATS is the existence of "changepoints"; the smoothed time series
consists of the points lying on the broken line that connects the values of the
raw time series at the changepoints.
Different window widths will identify different changepoints.
The algorithm for ATS is given in
<a href="http://mason.gmu.edu/~jgentle/papers/FindingPatternsTimeSeriesDraft.pdf"> 
Finding Patterns in Time Series.</a>
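<p>
To illustrate how a smoothed series is obtained once changepoints are known,
the following sketch interpolates linearly between them with base R's
<tt>approx</tt>.  The changepoint indices are chosen by hand for illustration;
they are not output from the ATS algorithm, and the helper name
<tt>smooth_from_brks</tt> is hypothetical.

```r
# Made-up raw series and hand-picked changepoint indices (assumptions).
x   <- c(1, 3, 6, 4, 2, 5, 9, 7)
idx <- c(1, 3, 5, 7, 8)

# Two-column matrix: changepoint index, value of x at the changepoint.
brks <- cbind(idx, x[idx])

# Smoothed series: points on the broken line joining the changepoints.
smooth_from_brks <- function(brks, n) {
  approx(brks[, 1], brks[, 2], xout = seq_len(n))$y
}

s <- smooth_from_brks(brks, length(x))
s
```

The smoothed series agrees with the raw series at the changepoints and lies on
straight lines in between.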
<p>
The smoothing parameter in ATS, called "step size", determines how far ahead
the algorithm looks when deciding whether the sign of the trend (up or down)
has changed.  Different step sizes yield different sets of changepoints.
<p>
The changepoints determined by ATS can be used to identify patterns such as
"head-and-shoulders".

<H4><font color=red>
Bounding Lines
</font></H4>

A different approach to smoothing is to determine line segments that bound the
time series.  An upper bounding line is one that is above the raw data,
and a lower bounding line is one that is below the raw data. 
The algorithm for bounding lines is given in
<a href="http://mason.gmu.edu/~jgentle/papers/FindingPatternsTimeSeriesDraft.pdf"> 
Finding Patterns in Time Series.</a>  Bounding lines are generally determined
for a given segment of a time series, possibly the subseries between two given
changepoints.
<p>
Bounding lines can be required to be completely above or completely below the 
raw data.  A two-component smoothing parameter for bounding lines can allow a 
certain number of the points to be a certain distance outside the bounding
lines. 
<p>
The bounding lines can be used to identify patterns such as
converging or diverging trends.
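<p>
One simple way to obtain lines that lie entirely above and entirely below a
segment of data is to shift a least-squares line up or down by its largest
residual.  This is only a sketch of the idea of a bounding line; it is not the
algorithm given in the paper or used in <tt>BoundingLines</tt>, and the function
name <tt>shifted_bounds</tt> is hypothetical.

```r
# Shift a least-squares line so that it bounds the data from above and below.
shifted_bounds <- function(x) {
  time_idx <- seq_along(x)
  fit <- lm(x ~ time_idx)
  r   <- residuals(fit)
  list(
    upper = fitted(fit) + max(r),  # touches the data at the largest residual
    lower = fitted(fit) + min(r)   # touches the data at the smallest residual
  )
}

x <- c(5, 7, 6, 9, 8, 11, 10, 13)  # made-up data
b <- shifted_bounds(x)
all(b$upper >= x - 1e-9)  # upper line is on or above every point
all(b$lower <= x + 1e-9)  # lower line is on or below every point
```

Because both lines share one slope, this sketch cannot show converging or
diverging bounds; that requires determining the upper and lower lines separately.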

<hr>

<H3><font color=red>
Trends and Patterns in Time Series
</font></H3>

Trends and patterns in time series vary depending on the type of smoothing
done prior to identifying the trends or patterns.
<p>
A "trend", which is either up or down, may be composed of multiple sub-trends.
The values of the time series at the changepoints between sub-trends determine
the trend.  In an up trend, for example, the values of the time series at the
changepoints constitute a sequence of "increasing highs and increasing lows".
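<p>
The "increasing highs and increasing lows" condition can be checked directly
from the values at the changepoints.  The sketch below assumes that the values
alternate between local lows and local highs, as changepoints of alternating
trends do; the function name <tt>is_uptrend</tt> and the data are hypothetical.

```r
# v: values of the series at consecutive changepoints, alternating low/high.
is_uptrend <- function(v) {
  stopifnot(length(v) >= 4)
  odd  <- v[seq(1, length(v), by = 2)]
  even <- v[seq(2, length(v), by = 2)]
  # Each alternating subsequence (highs, lows) must be strictly increasing.
  all(diff(odd) > 0) && all(diff(even) > 0)
}

is_uptrend(c(1, 4, 2, 6, 3, 8))  # lows 1,2,3 and highs 4,6,8
is_uptrend(c(1, 4, 2, 3, 3, 8))  # highs 4,3,8 are not increasing
```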
<p>
We will identify trends and patterns based on changepoints as determined by
ATS or on bounding lines.

<hr>

<H3><font color=red>
R Functions for Trends and Patterns in Time Series
</font></H3>

Two R functions for smoothing sequenced univariate data are <tt>ATS</tt> and 
<tt>BoundingLines</tt>.  
<p>
Two basic functions for pattern identification given a sequence of changepoints
are <tt>Trends</tt> and <tt>Patterns</tt>.  
These functions are designed to accept the output from <tt>ATS</tt>.
<p>
Patterns are identified by integers, which are described in the documentation
for <tt>Patterns</tt>.  For example, 1 represents a type of "head-and-shoulders" pattern.
<p>
These four functions will
optionally add lines to a graph of the raw time series.
<p>
In addition, there is a
function designed to loop through a sequence of step sizes for ATS smoothing,
followed by identification of any trends or patterns for each specified step
size.
<p>
<hr>
The functions are
<ul>
<li><tt>ATS</tt>  - Determines changepoints in linear trends of <font color=red>
sequenced univariate data.</font>
<div class="boxed">
<pre>
  ATS(x,step=0,segments=FALSE,offset=0,ltype=1,color="red",char="x")
</pre>
</div>
<li><tt>BoundingLines</tt> - Computes an upper or lower bounding line for <font color=red>
sequenced univariate data.</font>
<div class="boxed">
<pre>
  BoundingLines(x,env=0,segments=FALSE,offset=0,ltype=1,color="blue")
</pre>
</div>
<li><tt>Trends</tt> - Determines beginning and ending points of trends, given a <font color=blue>
sequence of changepoints.</font>
<div class="boxed">
<pre>
  Trends(brks,minlen=6,segments=FALSE,offset=0,ltype=1,color="blue")
</pre>
</div>
<li><tt>Patterns</tt> - Determines beginning and ending points of patterns, given a <font color=blue>
sequence of changepoints.</font>
<div class="boxed">
<pre>
  Patterns(brks,pattern=1,segments=FALSE,offset=0,ltype=1,color="red")
</pre>
</div>
<li><tt>FindTrendsPatterns</tt>  - Determines trends and patterns in  <font color=red>
sequenced univariate data,</font> for given step sizes.
<div class="boxed">
<pre>
  FindTrendsPatterns(x,steps=0,minlen=NULL,whichpats=NULL)
</pre>
</div>
</ul>
<hr>
The arguments and the output of these functions have a consistency that
facilitates their use together.
<p>
The <font color=red>"sequenced univariate data"</font>, <tt>x</tt>,
is just a numeric vector.  It is called the "raw time series".
It can be an object of the R time series class, <tt>ts</tt>.
<p>
The <font color=blue>"sequence of changepoints"</font>, <tt>brks</tt>,
is a matrix whose first column contains the indices of the changepoints in
the raw time series and whose second column contains the values of
the raw time series at the changepoints.  This is the form of the output of
<tt>ATS</tt>.
<div class="boxed">
<pre>
##    brks      Matrix with two columns containing changepoint information.
##              It is assumed that the rows represent changepoints of alternating
##              trends (ATS).
##              The first column is the index of the changepoint (with no offset).
##              The second column is the value of x at the changepoint.
</pre>
</div>
<p>
The functions that add lines to plots have a common set of arguments:
<p>
<div class="boxed">
<pre>
## Optional printing arguments; these arguments affect only the printing.
##    segments  Logical variable indicating whether to add a bounding line 
##              segment to an existing plot of a univariate data vector 
##              against its index.
##              ** If segments=TRUE, there must be an existing plot over the
##              appropriate range.
##              ** If segments=FALSE, the additional arguments are not used.
##    offset    If segments=TRUE, the index of the original data plotted at
##              which to begin plotting of trend lines for the current series x.
##              The index of x is treated as starting at offset+1 with respect
##              to the index of time used in the original plot. 
##    ltype     If segments=TRUE, the line type, using the standard values in R.
##    color     If segments=TRUE, the line color, using the standard values in R.
</pre>
</div>
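<p>
As a sketch of how <tt>offset</tt> relates a subseries to an existing plot,
index <tt>i</tt> of the subseries is drawn at position <tt>offset+i</tt> of the
original series.  The example uses only base R graphics and hand-picked
changepoints, not the functions on this page; the data are made up.

```r
set.seed(1)
y <- cumsum(rnorm(100))       # made-up full series
plot(y, type = "l")           # existing plot over the full index range

offset <- 50
x <- y[(offset + 1):100]      # subseries; x[1] corresponds to y[offset + 1]

# Hand-picked changepoint indices within x (illustrative only).
idx  <- c(1, 20, 35, 50)
brks <- cbind(idx, x[idx])

# Shift the subseries indices by offset to place them on the original axis.
lines(brks[, 1] + offset, brks[, 2], lty = 1, col = "red")
```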
<p>
<b>Trends</b>, which are determined from a 
<font color=blue>sequence of changepoints</font>, are specified in a matrix with 
four columns in which each row corresponds to a trend.  
<br>
The absolute value of the entry in the first column
is the index of the first changepoint in the trend (that is, the
value in the first column of brks), and the entry in the second column
is the length of the trend, measured by the number of changepoints.
The third column contains the index of the raw time series where the
trend begins, and the fourth column contains the index of the raw time
series where the trend ends.  If the value in the first column is positive,
the trend is up, and if it is negative, the trend is down.
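<p>
To make this layout concrete, the following sketch decodes a matrix of this
form into a data frame.  The matrix entries are invented for illustration and
are not output from <tt>Trends</tt>; the function name <tt>decode_trends</tt>
is hypothetical.

```r
# Hypothetical trends matrix: one row per trend, four columns as described.
trends <- rbind(c( 2, 5, 14,  61),
                c(-7, 4, 75, 118))

decode_trends <- function(tr) {
  data.frame(
    direction   = ifelse(tr[, 1] > 0, "up", "down"),  # sign of column 1
    first_cp    = abs(tr[, 1]),   # first changepoint of the trend
    n_cp        = tr[, 2],        # length of the trend in changepoints
    start_index = tr[, 3],        # raw-series index where the trend begins
    end_index   = tr[, 4]         # raw-series index where the trend ends
  )
}

decode_trends(trends)
```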
<p>
<b>Patterns</b>, which are determined from a 
<font color=blue>sequence of changepoints</font>, 
are specified in a matrix with four columns in which each row corresponds to a
pattern.  
<br>
The entry in the first column is the indicator of
the pattern, and the entry in the second column is the index 
of the changepoint at which the pattern begins.
The third column contains the index of the raw time series where the
pattern begins, and the fourth column contains the index of the raw time 
series where the pattern ends.

<hr>

<H3><font color=red>
Source Code
</font></H3>

<H4><font color=red>
R Function Sources
</font></H4>

<ul>
<li><a href="http://mason.gmu.edu/~jgentle/papers/software/ATS.R">    
<tt>ATS.R</tt> </a>                                                                    
<li><a href="http://mason.gmu.edu/~jgentle/papers/software/BoundingLines.R">          
<tt>BoundingLines.R</tt> </a>                                                          
<li><a href="http://mason.gmu.edu/~jgentle/papers/software/Trends.R">                 
<tt>Trends.R</tt> </a>                                                             
<li><a href="http://mason.gmu.edu/~jgentle/papers/software/Patterns.R">               
<tt>Patterns.R</tt> </a>                                                                                                                           
<li><a href="http://mason.gmu.edu/~jgentle/papers/software/FindTrendsPatterns.R">     
<tt>FindTrendsPatterns.R</tt> </a> 
</ul>

<H4><font color=red>
Sample Scripts and Test Programs
</font></H4>

<ul>                                                                                                                          
<li><a href="http://mason.gmu.edu/~jgentle/papers/software/testATS.R">                
<tt>testATS.R</tt> </a>                                                                
<li><a href="http://mason.gmu.edu/~jgentle/papers/software/testBoundingLines.R">      
<tt>testBoundingLines.R</tt> </a>                                                                                                                  
<li><a href="http://mason.gmu.edu/~jgentle/papers/software/testTrends.R">             
<tt>testTrends.R</tt> </a>                                                         
<li><a href="http://mason.gmu.edu/~jgentle/papers/software/testPatterns.R">           
<tt>testPatterns.R</tt> </a>                                                           
<li><a href="http://mason.gmu.edu/~jgentle/papers/software/testFindTrendsPatterns.R"> 
<tt>testFindTrendsPatterns.R</tt> </a>                                                  
</ul>