Cleaning Up Data

When we have a very noisy signal with a large number of spikes and signal bursts, and all else fails, it is worth trying median filtering. This is a technique often used in cleaning up pictures. The operation is almost childishly simple in concept, but we will save the details until we have examined an example.

The data below is a mess! Ideally the best thing would be to redo the data capture, but regrettably the test was destructive and this is all we have, so we have to see what we can recover.

Trying to edit out the ‘spikes’ one at a time would be very time consuming and unlikely to give a better result; there are about a million data values in the signal above. Let us look in more detail at a couple of spikes.

What appeared to be just a single spike is actually an energy burst. Looking at one of the apparent groups of spikes reveals the situation as shown below.

Basically there is quite a lot going on with a mixture of spikes and some quite significant sustained deviations.

The median filter will eliminate all of these but, like any filtering operation, there is a reduction in bandwidth. The median filter has a length associated with it; for the present purposes we specify the length of the filter as a number of samples. The set of time histories below shows how increasing the length of the median filter removes more of the “spikes”.

The values following the signal names, such as the 191 in CambioX191, are the length of the median filter in points. As the sample rate was 33000 points per second the data values were approximately 30 microseconds apart, so the length of the longest filter (351 points) was just over 10 milliseconds.
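The conversion from filter length in points to filter length in time follows directly from the 33000 points per second sample rate quoted above; a quick check:

```python
fs = 33000                      # sample rate in points per second (from the text)
dt_us = 1e6 / fs                # sample spacing: about 30.3 microseconds
for k in (21, 111, 191, 351):   # the median filter lengths used above
    print(f"{k:3d} points = {k / fs * 1e3:.2f} ms")
# The 351 point filter works out at just over 10 ms, as stated.
```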

The Auto Spectral Density analysis of all of the signals is shown below. The effect of the successively longer filters is quite clear: there has been a significant reduction in noise level at the expense of bandwidth.
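The original signals are not available here, but the spectral effect can be sketched on a synthetic stand-in: a tone plus mild noise, contaminated with large random spikes, then median filtered and compared via a Welch spectral density estimate. All signal parameters below are illustrative assumptions, not the article's data.

```python
import numpy as np
from scipy.signal import medfilt, welch

# Synthetic stand-in for the article's data (assumed parameters throughout)
rng = np.random.default_rng(1)
fs = 33000                              # sample rate from the article
t = np.arange(2 * fs) / fs              # two seconds of data
x = np.sin(2 * np.pi * 120 * t) + 0.02 * rng.standard_normal(t.size)
spiky = x.copy()
idx = rng.choice(t.size, size=300, replace=False)
spiky[idx] += rng.uniform(5, 15, size=300) * rng.choice([-1, 1], size=300)

# Spectral density estimates before and after a 21 point median filter
f, psd_spiky = welch(spiky, fs=fs, nperseg=4096)
f, psd_clean = welch(medfilt(spiky, kernel_size=21), fs=fs, nperseg=4096)
# The broadband floor caused by the spikes drops markedly, while the
# low-frequency tone passes through the filter largely intact.
```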

Key for figure above
Blue – Original
Green – 21 point filter
Light cyan – 111 point filter
Red – 191 point filter
Purple – 351 point filter

It is interesting to look further at the low frequency end.

All curves show the two dominant resonances or forced responses. Note that the ASD with the 21 point filter actually shows an increased noise level initially. This is no doubt due to the increased randomness caused by the median process without it really getting rid of most of the spikes. Once all the spikes in the time history have been removed, the resultant spectrum looks very reasonable.

Overlays of the original and final signals are shown below.

The “recovered” signal appears quite reasonable and looks as if it had been through a conventional filter but without the “ringing” that a conventional filter would generate at each “spike”.

Technically finding the median of a set of data values is quite simple. First the data is sorted into order. The median value is then that value which is at the centre of the list. For example suppose we have the following seven values.

757 518 635 81 612 875 1129

After ordering they are

1129 875 757 635 612 518 81

Thus the median value is 635. This is simple when we have an odd number of values. If we have an even number of values in the set then the median value is the numerical average of the two values spanning the centre. If in the above example we had taken the first six values of the unordered set (i.e. the 1129 was omitted) then the median value would be (635 + 612)/2 = 623.5.
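Python's standard library handles both the odd and even cases; `statistics.median` averages the middle pair when the count is even, so the worked example above can be checked directly:

```python
from statistics import median

values = [757, 518, 635, 81, 612, 875, 1129]
print(median(values))      # 635: the centre of the seven ordered values
print(median(values[:6]))  # 623.5: the average of 635 and 612
```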

A median filter uses a short buffer of K points, sorts these into numerical order and emits the value at the centre of the ordered set. The buffer is then moved along the original data one point at a time. Using a three point filter on the above example data would give the following outputs: 635 518 612 612 875.
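The sliding-buffer scheme above can be sketched in a few lines. The `median_filter` helper here is a hypothetical illustration that simply drops the ends of the record; library implementations such as `scipy.signal.medfilt` pad the ends instead, so they return a full-length output.

```python
def median_filter(data, k):
    """Sliding median with an odd buffer length k; emits len(data) - k + 1 points."""
    half = k // 2
    # Sort each k-point buffer and take the centre value
    return [sorted(data[i:i + k])[half] for i in range(len(data) - k + 1)]

values = [757, 518, 635, 81, 612, 875, 1129]
print(median_filter(values, 3))  # [635, 518, 612, 612, 875], as in the text
```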

The median filter always rejects the maximum and minimum values within the length being considered. This makes it good at spike elimination.

What has the median filter done? We know the median filter will remove pure single point spikes. A single point spike is like a delta function, which has a broadband noise spectrum, so one effect of removing identifiable spikes is to improve the overall signal to noise ratio.

Consider now a continuously rising, or falling, signal such as a straight line. In this case, apart from the ends, the median filter will obviously have no effect as the data is already sorted. If instead we have a low frequency sinewave then locally the data is again monotonic and already sorted, but the filter will reduce the amplitude by effectively clipping the peaks. Thus the median filter acts as a low pass filter: the longer the filter length, the lower the effective cut off frequency. However, if we use a median filter which is too short for the average “energy burst” or “spike” then we may actually introduce more randomness at the low frequency end by “emphasising” the low frequency content of the energy bursts. This is illustrated below.
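Both behaviours described above (monotonic data passing through unchanged, sinewave peaks being clipped) can be demonstrated with `scipy.signal.medfilt`; the signal parameters are illustrative assumptions:

```python
import numpy as np
from scipy.signal import medfilt

# Monotonic data: inside each window the samples are already in order, so the
# median is just the centre sample and the interior of the ramp is unchanged.
# (medfilt zero-pads, so only the ends differ.)
ramp = np.arange(100, dtype=float)
filtered_ramp = medfilt(ramp, kernel_size=11)

# A low-frequency sinewave: locally monotonic, but the peaks are clipped,
# reducing the amplitude - the low pass effect described above.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
sine = np.sin(2 * np.pi * 5 * t)          # 5 Hz, 200 samples per cycle
clipped = medfilt(sine, kernel_size=51)   # window spans about a quarter cycle
print(clipped.max())                      # noticeably below 1.0
```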

Reordering the data will undoubtedly increase some noise levels, but the overall effect is to reduce high frequency noise due to spikes and energy bursts. The median filter has definite uses.


Dr Colin Mercer

Chief Signal Processing Analyst at Prosig
Dr Colin Mercer was formerly at the Institute of Sound and Vibration Research (ISVR), University of Southampton where he founded the Data Analysis Centre. He then went on to found Prosig in 1977. Colin retired as Chief Signal Processing Analyst at Prosig in December 2016. He is a Chartered Engineer and a Fellow of the British Computer Society.
