Data Smoothing : RC Filtering And Exponential Averaging

[Updated 12th March 2013]

What are RC Filtering and Exponential Averaging, and how do they differ? The answer to the second part of the question is that they are the same process! If one comes from an electronics background then RC Filtering (or RC Smoothing) is the usual expression. On the other hand, an approach based on time series statistics has the name Exponential Averaging or, to use the full name, Exponentially Weighted Moving Average. This is also variously known as EWMA or EMA.

A key advantage of the method is the simplicity of the formula for computing the next output. It takes a fraction of the previous output and adds one minus this fraction times the current input. Algebraically, at time k the smoothed output y_k is given by

y_k = {\alpha}y_{k-1} + (1 - \alpha)x_k\,\,where\,0<\alpha <1 (1)

As shown later this simple formula emphasises recent events, smooths out high frequency variations and reveals long term trends. Note there are two forms of the exponential averaging equation, the one above and a variant

y_k = {\alpha}x_k + (1 - \alpha)y_{k-1} (2)

Both are correct. See the notes at end of the article for more details. In this discussion we will only use equation (1).
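As a minimal sketch of equation (1), the recursion can be written in a few lines of Python. The function name `exponential_average` and the choice to seed the output with the first input sample are assumptions made for illustration; the article does not say how the filter should be started.

```python
def exponential_average(samples, alpha, initial=None):
    """Smooth a sequence with y_k = alpha*y_{k-1} + (1 - alpha)*x_k."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    smoothed = []
    previous = initial
    for x in samples:
        if previous is None:
            # Assumption: with no starting value, seed the output with the first sample.
            previous = float(x)
        else:
            previous = alpha * previous + (1.0 - alpha) * x
        smoothed.append(previous)
    return smoothed


# Arbitrary, rapidly alternating input used only to show the smoothing effect.
noisy = [0.0, 10.0, 0.0, 10.0, 0.0, 10.0]
print(exponential_average(noisy, alpha=0.9))
```

With \alpha close to 1 the output moves only a little towards each new input, so rapid alternations are heavily damped; a smaller \alpha follows the input more closely.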

The above formula is sometimes written in the more limited form

y_k = \left(\frac{N-1}{N}\right)y_{k-1} + \left(\frac{1}{N}\right)x_k

How is this formula derived and what is its interpretation? A key question is how to select \alpha. One simple way to look into this is to consider an RC low pass filter.

Now an RC low pass filter is simply a series resistor R and a parallel capacitor C as illustrated below.

The differential equation for this circuit, with input x and output y, is

RC\left(\frac{dy}{dt}\right) + y = x

The product RC has units of time and is known as the time constant, T, for the circuit. Suppose we represent the above equation in its digital form for a time series which has data taken every h seconds, replacing the derivative by the difference between successive outputs divided by h. We have

T\left(\frac{y_k - y_{k-1}}{h}\right) + y_k = x_k

Rearranging gives

y_k = \left(\frac{T}{T+h}\right)y_{k-1} + \left(\frac{h}{T+h}\right)x_k

which, writing \alpha = T/(T+h), is

y_k = {\alpha}y_{k-1} + (1-\alpha)x_k

This is exactly the same form as the previous equation. Comparing the two relationships for \alpha we have

\frac{T}{T+h} = \frac{N-1}{N}

which reduces to the very simple relationship

T = (N - 1)h
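The relationships \alpha = T/(T+h) and T = (N - 1)h can be checked numerically. The sketch below is illustrative only: the helper names and the values T = 0.125 s and h = 0.001 s are assumptions, not taken from the derivation. It drives the filter with a unit step and compares the output after T seconds with the familiar continuous RC value 1 - 1/e (about 63%).

```python
import math


def alpha_from_time_constant(time_constant, sample_interval):
    """alpha = T / (T + h), from the discretised RC equation."""
    return time_constant / (time_constant + sample_interval)


def equivalent_n(time_constant, sample_interval):
    """N from T = (N - 1) h; not necessarily an integer."""
    return time_constant / sample_interval + 1.0


T = 0.125   # seconds (illustrative value)
h = 0.001   # seconds between samples (illustrative value)
alpha = alpha_from_time_constant(T, h)

# Drive the filter with a unit step for one time constant's worth of samples.
y = 0.0
for _ in range(round(T / h)):
    y = alpha * y + (1.0 - alpha) * 1.0

print(f"alpha = {alpha:.6f}, N = {equivalent_n(T, h):.1f}")
print(f"step response after T seconds = {y:.4f}")
print(f"continuous RC value 1 - 1/e   = {1.0 - math.exp(-1.0):.4f}")
```

When h is much smaller than T the two values agree closely; the agreement degrades as h approaches T because the difference approximation to the derivative becomes coarser.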

Hence the choice of N is guided by what time constant we choose. Now equation (1) may be recognised as a low pass filter and the time constant typifies the behaviour of the filter. To see the significance of the time constant we need to look at the frequency characteristic of this low pass RC filter. In its general form this is

H(\omega) = \frac{1}{1+i{\omega}RC} = \frac{1}{1+i{\omega}T}

Expressing in modulus and phase form we have

H(\omega) = \frac{1}{\sqrt{1+\omega^2T^2}}e^{i\phi}

where the phase angle \phi is tan^{-1}(-{\omega}T ).

The frequency \omega = 1/T is called the nominal cut off frequency \omega_c. Physically it may be shown that at this frequency the power in the signal has been reduced by one half and the amplitude is reduced by the factor (1/\sqrt{2}) = 0.7071. In dB terms this frequency is where the amplitude has been reduced by 3dB.

Clearly, as the time constant T increases the cut off frequency {\omega}_c reduces and we apply more smoothing to the data; that is, we remove more of the higher frequencies.

It is important to note that the frequency response is expressed in radians/second. That is there is a factor of 2\pi involved. For example choosing a time constant of 5 seconds gives an effective cut off frequency of 1/(2\pi * 5) = 0.032Hz. One popular use of RC smoothing is to simulate the action of a meter such as used in a Sound Level Meter. These are generally typified by their time constant such as 1 second for S types and 0.125 seconds for F types. For these 2 cases the effective cut off frequencies are 0.16Hz and 1.27Hz respectively.
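The cut off frequencies quoted above follow directly from f_c = 1/(2\pi T). A short sketch, with function names chosen here purely for illustration, evaluates the cut off frequency and the gain of the continuous RC response at that frequency for the three time constants mentioned. Note that it uses the analogue formula for |H|, not the exact response of the sampled recursion.

```python
import math


def cutoff_frequency_hz(time_constant):
    """Nominal -3dB frequency in Hz: f_c = 1 / (2*pi*T)."""
    return 1.0 / (2.0 * math.pi * time_constant)


def rc_gain(frequency_hz, time_constant):
    """|H| = 1 / sqrt(1 + (omega*T)^2) with omega = 2*pi*f."""
    omega_t = 2.0 * math.pi * frequency_hz * time_constant
    return 1.0 / math.sqrt(1.0 + omega_t ** 2)


# Time constants from the text: 5 s example, S type (1 s) and F type (0.125 s).
for T in (5.0, 1.0, 0.125):
    fc = cutoff_frequency_hz(T)
    gain_db = 20.0 * math.log10(rc_gain(fc, T))
    print(f"T = {T:5.3f} s  f_c = {fc:.3f} Hz  gain at f_c = {gain_db:.2f} dB")
```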

Actually it is usually not the time constant we wish to select but the periods we wish to include. Suppose we have a signal where we wish to include features with a P second period. A period P corresponds to a frequency f_p = 1/P. We could then choose a time constant T given by T = P/(2\pi). However, we know that we have lost about 30% of the amplitude (-3dB) at f_p. Thus choosing a time constant which exactly corresponds to the periodicities we wish to keep is not the best scheme. It is usually better to choose a slightly higher cut off frequency, say 1.5f_p. The time constant is then T = P/(3\pi), which in practical terms is similar to T = P/10. This reduces the loss to around 15% at this periodicity. Hence, in practical terms, to retain events with a periodicity of P or greater choose a time constant of P/10. This will include the effects of periodicities down to about 0.6P.

For example, if we wish to include the effects of events happening with, say, an 8 second period (= 0.125Hz) then choose a time constant of 0.8 seconds. This gives a cut off frequency of approximately 0.2Hz so that our 8 second period is well in the main pass band of the filter. If we were sampling the data at 20 times/second (h = 0.05) then T/h = (0.8/0.05) = 16, so from T = (N - 1)h the value of N is 17 and \alpha = (16/17) = 0.9412.
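The rule of thumb T = P/10 can be wrapped in a small helper to reproduce the 8 second example. This is only a sketch: the function `design_for_period` and its dictionary keys are hypothetical names, N is obtained from T = (N - 1)h, and the gain is taken from the continuous RC formula.

```python
import math


def design_for_period(period, sample_interval):
    """Apply the T = P/10 rule of thumb and return the resulting settings."""
    T = period / 10.0
    alpha = T / (T + sample_interval)
    n = T / sample_interval + 1.0                 # from T = (N - 1) h
    omega_t = 2.0 * math.pi * T / period          # omega*T at the period of interest
    gain = 1.0 / math.sqrt(1.0 + omega_t ** 2)    # continuous RC gain at f_p
    shortest_period = 2.0 * math.pi * T           # period at the -3dB point
    return {
        "T": T,
        "alpha": alpha,
        "N": n,
        "gain_at_P": gain,
        "shortest_period": shortest_period,
    }


# 8 second period sampled at 20 samples/second, as in the example above.
settings = design_for_period(period=8.0, sample_interval=0.05)
for name, value in settings.items():
    print(f"{name:16s} {value:.4f}")
```

The gain at the period of interest comes out at roughly 0.85, that is a loss of about 15%, and the -3dB point falls at a period of roughly 0.63P, in line with the figures quoted in the text.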

This gives some insight into how to set \alpha. Basically for a known sample rate it typifies the averaging period and selects which high frequency fluctuations will be ignored.

By looking at the expansion of the algorithm we can see that it favours the most recent values, and also why it is referred to as exponential weighting. We have

y_k = (1-\alpha)x_k + {\alpha}y_{k-1}

Substituting for y_{k-1} gives

y_k = (1 - \alpha)x_k + \alpha\left[(1-\alpha)x_{k-1} + {\alpha}y_{k-2}\right]

= (1 - \alpha)(x_k + {\alpha}x_{k-1}) + {\alpha}^2y_{k-2}

Repeating this process several times leads to

y_k = (1-\alpha)(x_k + {\alpha}x_{k-1} + {\alpha}^2x_{k-2}+ \dots + {\alpha}^Nx_{k-N})+{\alpha}^{N+1}y_{k-N-1}

Because \alpha is in the range 0<\alpha<1, the terms to the right clearly become smaller and behave like a decaying exponential. That is, the current output is biased towards the more recent events, but the larger we choose T the less the bias.
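The expansion can be verified numerically by comparing the recursion with the explicit weighted sum. In this sketch the input data are arbitrary, \alpha = 0.9412 is just an illustrative value, and the filter is assumed to start from zero so that the leftover term involving the oldest output vanishes.

```python
def expansion_weights(alpha, count):
    """Weights (1 - alpha) * alpha**j for j = 0 (newest) to count - 1 (oldest)."""
    return [(1.0 - alpha) * alpha ** j for j in range(count)]


alpha = 0.9412                                        # illustrative value
inputs = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]     # arbitrary data, oldest first

# Recursive form, starting from y = 0 so the leftover alpha**(N+1) * y term vanishes.
y = 0.0
for x in inputs:
    y = alpha * y + (1.0 - alpha) * x

# Direct weighted sum: the newest sample gets the largest weight.
weights = expansion_weights(alpha, len(inputs))
direct = sum(w * x for w, x in zip(weights, reversed(inputs)))

print(f"recursive = {y:.6f}, weighted sum = {direct:.6f}")
print("weights, newest first:", [round(w, 4) for w in weights])
```

Each weight is \alpha times the one before it, which is the geometric (exponential) decay that gives the method its name.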

In summary we see that the simple formula y_k= {\alpha}y_{k-1} + (1-\alpha)x_k

  • emphasises recent events
  • smooths out high frequency (short period) events
  • reveals long term trends

Appendix 1 – Alternate forms of the equation

Caution: there are two forms of the exponential averaging equation that appear in the literature. Both are correct and equivalent.

The first form as shown above is y_k = {\alpha}_1y_{k-1} + (1 - {\alpha}_1)x_k…(A1)

The alternate form is y_k = {\alpha}_{2}x_k + (1 - {\alpha}_2)y_{k-1} …(A2)

Note the use of {\alpha}_1 in the first equation and {\alpha}_2 in the second equation. In both equations {\alpha}_1 and {\alpha}_2 are values between zero and unity.

Earlier {\alpha}_1 was defined as

{\alpha}_1 = \big(\frac{T}{T+h}\big)

which gives

(1 - {\alpha}_1) = \big(\frac{h}{T+h}\big).

Now choosing to define

{\alpha}_2 = \big(\frac{h}{T+h}\big)

gives

(1 - {\alpha}_2)=\big(\frac{T}{T+h}\big).

Hence the alternate form of the exponential averaging equation is

y_k = \big(\frac{h}{T+h}\big)x_k + \big(\frac{T}{T+h}\big)y_{k-1}

with

{\alpha}_2 = (1-{\alpha}_1)= \big(\frac{h}{h+T}\big).

In physical terms, the choice of form depends on whether one wants to think of {\alpha} as the feedback fraction [equation (A1)] or as the fraction of the input [equation (A2)].

The first form is slightly less cumbersome in showing the RC filter relationship, and leads to a simpler understanding in filter terms.
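That the two forms give identical results when {\alpha}_2 = 1 - {\alpha}_1 can be confirmed with a short sketch. The function names, the sample data, the values of T and h, and the zero starting value are all assumptions made for illustration.

```python
def smooth_form_a1(samples, alpha_1, start=0.0):
    """Equation (A1): alpha_1 multiplies the previous output."""
    y, out = start, []
    for x in samples:
        y = alpha_1 * y + (1.0 - alpha_1) * x
        out.append(y)
    return out


def smooth_form_a2(samples, alpha_2, start=0.0):
    """Equation (A2): alpha_2 multiplies the current input."""
    y, out = start, []
    for x in samples:
        y = alpha_2 * x + (1.0 - alpha_2) * y
        out.append(y)
    return out


T, h = 0.8, 0.05                    # illustrative time constant and sample interval
alpha_1 = T / (T + h)               # feedback fraction
alpha_2 = h / (T + h)               # input fraction, equal to 1 - alpha_1
data = [1.0, 2.0, 0.5, 3.0, 2.5]    # arbitrary sample data

a1 = smooth_form_a1(data, alpha_1)
a2 = smooth_form_a2(data, alpha_2)
# Largest difference between the two outputs; any difference is floating point rounding only.
print(max(abs(p - q) for p, q in zip(a1, a2)))
```

When reading other sources it is worth checking which form is in use, since a value such as 0.1 means heavy smoothing in form (A2) but very light smoothing in form (A1).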


Dr Colin Mercer

Chief Signal Processing Analyst (Retired) at Prosig
Dr Colin Mercer was formerly at the Institute of Sound and Vibration Research (ISVR), University of Southampton where he founded the Data Analysis Centre. He then went on to found Prosig in 1977. Colin retired as Chief Signal Processing Analyst at Prosig in December 2016. He is a Chartered Engineer and a Fellow of the British Computer Society.

13 years ago

Excellent! Exactly what I was looking for.

marco clark
10 years ago

I think you want to change the ‘p’ to the symbol for pi.

Chris Mason
10 years ago

Marco, thank you for pointing that out. I think this is one of our older articles that has been transferred from an old word processing document. Obviously, the editor (me!) failed to spot that the pi had not been transcribed correctly. It will be corrected shortly.

9 years ago

it’s a very good article explanation about the exponential averaging!

Mike Z
9 years ago

I believe there is an error in the formula for T. It should be T = h*(N-1), not T = (N-1)/h.

Chris Mason
9 years ago

Mike, thanks for spotting that. I have just checked back to Dr Mercer’s original technical note in our archive and it seems that there was an error made when transferring the equations to the blog. We will correct the post. Thank you for letting us know.

Julia Truchsess
9 years ago

Thank you thank you thank you. You could read 100 DSP texts without finding anything saying that an exponential averaging filter is the equivalent of an R-C filter.



7 years ago

hmm, do you have the equation for an EMA filter correct? is it not Yk = aXk + (1-a)Yk-1; rather than Yk = aYk-1 + (1-a)Xk

Colin King
7 years ago

I’ve always wondered about this, thanks for describing it so clearly.

7 years ago

I think another reason the first formulation is nice is alpha maps to ‘smoothness’: a higher choice of alpha means a ‘more smooth’ output.

Ian Lewis
5 years ago

Dr Mercer,
Excellent article, thank you. I have a question regarding the time constant when used with an rms detector as in a sound level meter that you refer to in the article.
If I use your equations to model an exponential filter with Time Constant 125ms and use an input step signal, I do indeed get an output that, after 125ms, is 63.2% of the final value.

However, if I square the input signal and put this through the filter, then I see that I need to double the time constant in order for the signal to reach 63.2% of its final value in 125ms.

Can you let me know if this is expected.
Many thanks.

Ian Lewis
5 years ago

Thanks for your response Dr Mercer. My question was really trying to get at what is actually done in an rms detector of a sound level meter. If the time constant is set for ‘fast’ (125ms) I would have thought that intuitively you would expect a sinusoidal input signal to produce an output of 63.2% of its final value after 125ms, but since the signal is being squared before it gets to the ‘mean’ detection, it will actually take twice as long as you explained.

5 years ago

The principal objective of the article is to show the equivalence of RC filtering and exponential averaging. If we are discussing the integration time equivalent to a true rectangular integrator then you are correct that there is a factor of two involved. Basically, if we have a true rectangular integrator that integrates for Ti seconds then the equivalent RC integrator time to achieve the same result is 2RC seconds. Ti is different from the RC ‘time constant’ T, which is RC. Thus if we have a ‘Fast’ time constant of 125 msec, that is RC = 125 msec, then that is equivalent to a true integration time of 250 msec.

5 years ago

Dr Mercer,

Thank you for the article, it was very helpful. There are some recent papers in neuroscience that use a combination of EMA filters (short-windowed EMA – long-windowed EMA) as a band-pass filter for real time signal analysis. I would like to apply them, but I am struggling with the window sizes different research groups have used and its correspondence with the cutoff frequency.

Let’s say I want to keep all the frequencies below 0.5Hz (approx) and that I acquire 10 samples/second. This means that fp = 0.5Hz; P = 2s; T = P/10 = 0.2;
h= 1/fs=0.1;


(T/h) + 1 = N
0.2/0.1 + 1 = N


Therefore, the window size I should be using should be N=3. Is this reasoning correct?
