## Data Smoothing : RC Filtering And Exponential Averaging

[Updated 12th March 2013]

What are RC Filtering and Exponential Averaging and how do they differ? The answer to the second part of the question is that they are the same process! If one comes from an electronics background then RC Filtering (or RC Smoothing) is the usual expression. On the other hand an approach based on time series statistics has the name Exponential Averaging, or to use the full name Exponential Weighted Moving Average. This is also variously known as EWMA or EMA.

A key advantage of the method is the simplicity of the formula for computing the next output. It takes a fraction of the previous output and one minus this fraction times the current input. Algebraically at time k the smoothed output yk is given by

$y_k = {\alpha}y_{k-1} + (1 - \alpha)x_k\,\,where\,0<\alpha <1$ (1)

As shown later this simple formula emphasises recent events, smooths out high frequency variations and reveals long term trends. Note there are two forms of the exponential averaging equation, the one above and a variant

$y_k = {\alpha}x_k + (1 - \alpha)y_{k-1}$ (2)

Both are correct. See the notes at end of the article for more details. In this discussion we will only use equation (1).

The above formula is sometimes written in the more limited fashion.

$y_k = \left(\frac{N-1}{N}\right)y_{k-1} + \left(\frac{1}{N}\right)x_k$

How is this formula derived and what is its interpretation? A key point is how do we select $\alpha$. To look into this one simple way is to consider an RC low pass filter.

Now an RC low pass filter is simply a series resistor R and a parallel capacitor C as illustrated below.

The time series equation for this circuit is

$RC\left(\frac{dy}{dt}\right) + y = x$

The product RC has units of time and is known as the time constant ,T , for the circuit. Suppose we represent the above equation in its digital form for a time series which has data taken every h seconds. We have

$T\left(\frac{y_k - y_{k-1}}{h}\right) + y_k = x_k$

Rearranging gives

$y_k = \left(\frac{T}{T+h}\right)y_{k-1} + \left(\frac{h}{T+h}\right)x_k$

or

$y_k = {\alpha}y_{k-1} + (1-\alpha)x_k$

This is exactly the same form as the previous equation. Comparing the two relationships for a we have

$\frac{T}{T+h} = \frac{N-1}{N}$

which reduces to the very simple relationship

$T = (N - 1)h$

Hence the choice of N is guided by what time constant we chose. Now equation (1) may be recognised as a low pass filter and the time constant typifies the behaviour of the filter. To see the significance of the Time Constant we need to look at the frequency characteristic of this low pass RC filter. In its general form this is

$H(\omega) = \frac{1}{1+i{\omega}RC} = \frac{1}{1+i{\omega}T}$

Expressing in modulus and phase form we have

$H{\omega} = {\frac{1}{\sqrt{1+\omega^{2T^2}}}e^{i\phi}}$

where the phase angle $\phi$ is $tan^{-1}(-{\omega}T )$.

The frequency $\omega = 1/T$ is called the nominal cut off frequency $\omega_c$. Physically it may be shown that at this frequency the power in the signal has been reduced by one half and the amplitude is reduced by the factor $(1/\sqrt{2}) = 0.7071$. In dB terms this frequency is where the amplitude has been reduced by 3dB.

Clearly as the time constant T increases so then the cut off frequency ${\omega}_c$ reduces and we apply more smoothing to the data, that is we eliminate the higher frequencies.

It is important to note that the frequency response is expressed in radians/second. That is there is a factor of $2\pi$ involved. For example choosing a time constant of 5 seconds gives an effective cut off frequency of $1/(2\pi * 5) = 0.032Hz$. One popular use of RC smoothing is to simulate the action of a meter such as used in a Sound Level Meter. These are generally typified by their time constant such as 1 second for S types and 0.125 seconds for F types. For these 2 cases the effective cut off frequencies are 0.16Hz and 1.27Hz respectively.

Actually it is not the time constant we usually wish to select but those periods we wish to include. Suppose we have a signal where we wish to include features with a P second period. Now a period P is a frequency $f_p = 1/P$. We could then choose a time constant T given by $T=P/(2*\pi)$ . However we know that we have lost about 30% of the output (-3dB) at $f_p$. Thus choosing a time constant which exactly corresponds to the periodicities we wish to keep is not the best scheme. It is usually better to choose a slightly higher cut off frequency, say $1.5f_p$ . The time constant is then $T=P/(3*\pi)$ which in practical terms is similar to $T = P/10$. This reduces the loss to around 15% at this periodicity. Hence in practical terms to retain events with a periodicity of $P$ or greater then choose a time constant of $(P/10)$. This will include the effects of periodicities of down to about $0.6P$. For example if we wish to include the effects of events happening with say an 8 second period (= 0.125Hz) then choose a time constant of 0.8 seconds. This gives a cut off frequency of approximately 0.2Hz so that our 8 second period is well in the main pass band of the filter. If we were sampling the data at 20 times/second (h = 0.05) then the value of N is (0.8/0.05) = 16 and $\alpha = (16/17) = 0.9412$.

This gives some insight into how to set $\alpha$. Basically for a known sample rate it typifies the averaging period and selects which high frequency fluctuations will be ignored.

By looking at the expansion of the algorithm we can see that it favours the most recent values, and also why it is referred to as exponential weighting. We have

$y_k = (1-\alpha)x_k + {\alpha}y_{k-1}$

Substituting for yk-1 gives

$y_k = (1 - \alpha)x_k + \alpha\left[(1-\alpha)x_{k-1} + {\alpha}y_{k-2}\right]
=(1 - \alpha)(x_k + {\alpha}x_{k-1}) + \alpha_2y_{k-2}$

Repeating this process several times leads to

$y_k = (1-\alpha)(x_k + {\alpha}x_{k-1} + {\alpha}^2x_{k-2}+ \dots + {\alpha}^Nx_{k-N})+{\alpha}^{N+1}y_{k-N-1}$

Because $\alpha$ is in the range $0<\alpha<1$ then clearly the terms to the right become smaller and behave like a decaying exponential. That is the current output is biased towards the more recent events but the larger we choose T then the less bias.

In summary we see that the simple formula $y_k= {\alpha}y_{k-1} + (1-\alpha)x_k$

• emphasises recent events
• smoothes out high frequency (short period) events
• reveals long term trends

## Appendix 1 – Alternate forms of the equation

Caution There are two forms of the exponential averaging equation that appear in the literature. Both are correct and equivalent.

The first form as shown above is $y_k = {\alpha}_1y_{k-1} + (1 - {\alpha}_1)x_k$…(A1)

The alternate form is $y_k = {\alpha}_{2}x_k + (1 - {\alpha}_2)y_{k-1}$ …(A2)

Note the use of ${\alpha}_1$ in the first equation and ${\alpha}_2$ in the second equation. In both equations ${\alpha}_1$ and ${\alpha}_2$ are values between zero and unity.

Earlier ${\alpha}_1$ was defined as

${\alpha}_1 = \big(\frac{T}{T+h}\big)$

which gives

$(1-{\alpha}_1)=\big(\frac{h}{T+h}\big)$.

Now choosing to define

${\alpha}_2 = \big(\frac{h}{T+h}\big)$

gives

$(1 - {\alpha}_2)=\big(\frac{T}{T+h}\big)$.

Hence the alternate form of the exponential averaging equation is

$y_k={\alpha}_2x_k+(1-{\alpha}_2)y_{k-1}$

where

${\alpha}_2 = (1-{\alpha}_1)= \big(\frac{h}{h+T}\big)$.

In physical terms it means that the choice of form one uses depends on how one wants to think of either taking ${\alpha}$ as the feed back fraction [equation (A1)] or as the fraction of the input [equation (A2)].

The first form is slightly less cumbersome in showing the RC filter relationship, and leads to a simpler understanding in filter terms.

The following two tabs change content below.

#### Dr Colin Mercer

Chief Signal Processing Analyst at Prosig
Dr Colin Mercer was formerly at the Institute of Sound and Vibration Research (ISVR), University of Southampton where he founded the Data Analysis Centre. He then went on to found Prosig in 1977. Colin retired as Chief Signal Processing Analyst at Prosig in December 2016. He is a Chartered Engineer and a Fellow of the British Computer Society.

## 19 thoughts on “Data Smoothing : RC Filtering And Exponential Averaging”

1. Chris Mason

Marco, thank you for pointing that out. I think this is one of our older articles that has been transferred from an old word processing document. Obviously, the editor (me!) failed to spot that the pi had not been transcribed correctly. It will be corrected shortly.

2. Chris Mason

Mike, thanks for spotting that. I have just checked back to Dr Mercer’s original technical note in our archive and it seems that there was error made when transferring the equations to the blog. We will correct the post. Thank you for letting us know

3. Julia Truchsess

Thank you thank you thank you. You could read 100 DSP texts without finding anything saying that an exponential averaging filter is the equivalent of an R-C filter.

4. Alan

hmm, do you have the equation for an EMA filter correct? is it not Yk = aXk + (1-a)Yk-1; rather than Yk = aYk-1 + (1-a)Xk

1. Dr Colin Mercer Post author

Alan,
Both forms of the equation appear in the literature, and both forms are correct as I will show below. The point you make is important one because using the alternate form means that the physical relationship with an RC filter is less apparent, moreover the interpretation of the meaning of a shown in the article is not appropriate for the alternate form.

First let us show both forms are correct. The form of the equation that I have used is

$y_k = {\alpha}_1y_{k-1} + (1 - {\alpha}_1)x_k$ …(1)

and the alternate form which does appear in many texts is

$y_k = {{\alpha}_2}x_k + (1 - {\alpha}_2)y_{k-1}$ …(2)

Note in the above I have used ${\alpha}_1$ in the first equation and ${\alpha}_2$ in the second equation. The equality of both forms of the equation is shown mathematically below taking simple steps at a time. What is not the same is the value used for ${\alpha}$ in each equation.

In both forms ${\alpha}$ is a value between zero and unity. First rewrite equation (1) replacing ${\alpha}_1$ by ${\beta}$. This gives

$y_k = {\beta}y_{k-1} + (1 - \beta)x_k$ …(1A)

Now define $\beta = (1 - {\alpha}_2)$ and so we also have ${\alpha}_2 = (1 - \beta)$. Substituting these into equation (1A) gives

$y_k = (1 - {\alpha}_2)y_{k-1} + {\alpha}_2x_k$ …(1B)

And finally re-arranging gives

$y_k = {\alpha}_{2}x_k + (1 - {\alpha}_2)y_{k-1}$ …(1C)

This equation is identical to the alternate form given in equation (2).

Put more simply ${\alpha}_2 = (1 - {\alpha}_1)$.

In physical terms it means that the choice of form one uses depends on how one wants to think of either taking $\alpha$ as the feed back fraction [equation (1)] or as the fraction of the input [equation (2)].

As mentioned above I have used the first form as it is slightly less cumbersome in showing the RC filter relationship, and leads to simpler understanding in filter terms.

However omitting the above is, in my view, a deficiency in the article as other people could make an incorrect inference so a revised version will appear soon.

Colin

5. Michael

I think another reason the first formulation is nice is alpha maps to ‘smoothness’: a higher choice of alpha means a ‘more smooth’ output.

1. Dr Colin Mercer Post author

Michael
Thanks for observation – I will add to the article something on those lines as it is always better in my view to relate to physical aspects.

6. Ian Lewis

Dr Mercer,
Excellent article, thank you. I have a question regarding the time constant when used with an rms detector as in a sound level meter that you refer to in the article.
If I use your equations to model an exponential filter with Time Constant 125ms and use an input step signal, I do indeed get an output that, after 125ms, is 63.2% of the final value.

However, if I square the input signal and put this through the filter, then I see that I need to double the time constant in order for the signal to reach 63.2% of its final value in 125ms.

Can you let me know if this is expected.
Many thanks.
Ian

7. Dr Colin Mercer Post author

Ian,
If you square a signal like a sine wave then basically you are doubling the frequency of its fundamental as well as introducing lots of other frequencies. Because the frequency has in effect been doubled then it is being ‘reduced’ by a greater amount by the low pass filter. In consequence it takes longer to reach the same amplitude.

The squaring operation is a non linear operation so I do not think it will always double precisely in all cases but it will tend to double if we have a dominant low frequency. Also note that the differential of a squared signal is twice the differential of the “un-squared” signal.

I suspect you might be trying to get a form of mean square smoothing, which is perfectly fine and valid. It might be better to apply the filter and then square as you know the effective cutoff. But if all you have is the squared signal then using a factor of 2 to modify your filter alpha value will approximately get you back to the original cut off frequency, or putting it a bit simpler define your cutoff frequency at twice the original.

1. Ian Lewis

Thanks for your response Dr Mercer. My question was really trying to get at what is actually done in an rms detector of a sound level meter. If the time constant is set for ‘fast’ (125ms) I would have thought that intuitively you would expect a sinusoidal input signal to produce an output of 63.2% of its final value after 125ms, but since the signal is being squared before it gets to the ‘mean’ detection, it will actually take twice as long as you explained.

8. Colin

The principle objective of the article is to show the equivalence of RC filtering and exponential averaging. If we are discussing the integration time equivalent to a true rectangular integrator then you are correct that there is a factor of two involved. Basically if we have a true rectangular integrator that integrates for Ti seconds the the equivalent RC integator time to achieve the same result is 2RC seconds. Ti is different from the RC ‘time constant’ T which is RC. Thus if we have a ‘Fast’ time constant of 125 msec, that is RC = 125 msec then that is equivalent to a true integration time of 250 msec

9. andrea

Dr Mercer,

Thank you for the article, it was very helpful. There are some recent papers in neuroscience that use a combination of EMA filters (short-windowed EMA – long-windowed EMA) as a band-pass filter for real time signal analysis. I would like to apply them, but I am struggling with the window sizes different research groups have used and its correspondence with the cutoff frequency.

Let’s say I want to keep all the frequencies below 0.5Hz (aprox) and that I acquire 10 samples/ second. This means that fp =0.5Hz; P= 2s; T = P/10=0.2;
h= 1/fs=0.1;

if:

T/(T+h)=((N-1)/N),
(T/h) + 1 = N
0.2/0.1 + 1 = N

N=3

Thefore, the window size I should be using should be N=3. Is this reasoning correct?

1. Dr Colin Mercer Post author

Before answering your question I must comment on the use of two high pass filters to form a band pass filter. Presumably they operate as two separate streams, so one result is the content from say $f_{low}$ to half sample rate and the other is the content from say $f_{high}$ to half sample rate. If all that is being done is the difference in mean square levels as indicating the power in the band from $f_{low}$ to $f_{high}$ then it may be reasonable if the two cut off frequencies are sufficiently far apart but I expect that the people using this technique are trying to simulate a narrower band filter. In my view that would be unreliable for serious work, and would be a source of concern.

Just for reference a band pass filter is a combination of a low frequency High Pass filter to remove the low frequencies and a high frequency Low pass filter to remove the high frequencies.

There is of course a low pass form of an RC filter, and hence a corresponding EMA. Perhaps though my judgement is being overcritical without knowing all the facts! So could you please send me some references to the studies you mentioned so I may critique as appropriate. Maybe they are using a low pass as well as a high pass filter.

Now turning to your actual question about how to determine N for a given target cut-off frequency I think it is best to use the basic equation T=(N-1)h. The discussion about periods was aimed at giving people a feel of what was going on. So please see the derivation below.

We have the relationships $T=(N-1)h$ and $T=1/2{\pi}{f_c}$ where $f_c$ is the notional cut-off frequency and h is the time between samples, Clearly $h = 1/{f_s}$ where $f_s$ is the sample rate in samples/sec.

Rearranging T=(N-1)h into a suitable form to include the cut-off frequency, $f_c$ and the sample rate, $f_s$, is shown below.

$N=1+\frac{T}{h}=1+\frac{1}{2{\pi}{f_c}h}=1+\frac{1}{2{\pi}(\frac{f_c}{f_s})}$

So using $f_c = 0.5Hz$ and $f_s = 10$ samples/sec so that $(fc/fs) = 0.05$ gives

$N=1+\frac{1}{2{\pi}*0.05}=4.1831$

So the closest integer value is 4. Re-arranging the above we have

$f_c = \frac{f_s}{2{\pi}(N-1)}$

So with N=4 we have $f_c = 0.5307 Hz$. Using N=3 gives an $f_c$ of 0.318 Hz. Note with N=1 we have a complete copy with no filtering.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

1. We welcome any feedback, questions or comments
CLOSE
Optimization WordPress Plugins & Solutions by W3 EDGE