March 29, 2012

simple algorithm for 2D-Peakfinding

When it comes to analyzing data, a common task is to identify peaks in an x-y dataset. There are several ways to do that. Maybe the first idea that would come into one's mind is to look for zeros in the derivatives. This works fine for smooth theoretical curves but fails when noise is present (as it is certainly all kinds of experimental data).
To get rid of noise you can apply a low-pass filter on your data, but in some cases this is not desired. A simple approach is to use an algorithm that can does

  1. find peak candidates that are above some user-defined threshold level
  2. find out for each candidate whether it is the local maximum of a user defined environment
  3. return the valid peaks
the algorithm is able to find peaks even in noisy data

I've implemented such piece of code for octave that does the job. The code below created the picture shown. The function expects three input parameters: the environment for maxima, the data list and a threshold value.

function peaklist = SimplePeakFind ( environment, data,  thresh)
% ----------------------------------------------------------
% function peaklist = SimplePeakFind (  environment, data, thresh)
% finds the positions of peaks in a
% given data list. valid peaks
% are searched to be greater than 
% the threshold value.
% peaks are searched to be the maximum in a certain environment
% of values in the list.
%  ----------------------------------------------------------

  listlength = length( data );  
  peaklist = zeros(listlength,1); % create blank output
  SearchEnvHalf = max(1,floor( environment/2));
  % we only have to consider date above the threshold 
  dataAboveThreshIndx = find (data >= thresh);
  for CandidateIndx = 1 : length(dataAboveThreshIndx)
    Indx = dataAboveThreshIndx(CandidateIndx);  
    % consider list boundaries
    minindx = Indx - SearchEnvHalf;
    maxindx = Indx + SearchEnvHalf;
    if (minindx < 1)
      minindx = 1;
    if ( maxindx>listlength)
      maxindx = listlength;
    % data( CandidateIndx ) == maximum in environment?
    if (data(Indx) == max( data( minindx : maxindx)))
      peaklist(Indx) = Indx;      
  % finally shrink list to non-zero-values
  peaklist = peaklist( find( peaklist));

In many cases, this works fine for me. Let's look at a simple test case: we numerically create a number of gaussian pulses with random widths, positions and heights. Then we add noise. 

clear all;
more off;
N = 1500;
T = linspace(-20,20,N);  %x-vector
% create some random gaussian peaks
Npeaks = 9
ppos = T(round(rand(1,Npeaks)*N))
pheight = rand(1,15)*1;
pdur    = rand(1,15)*1;
data = zeros(1,N);
for i = 1:Npeaks
  data = data + pheight(i) * exp(- ((T-ppos(i))/pdur(i)).^2);  
% apply noisenoise = rand(1,N)*0.1;
datanoise = data+noise;
% Find peaks that are maxima in an
% environment of 25 data points and that
% bigger than 0.15 (noise level)
peaklist = SimplePeakFind(25, datanoise, 0.2);
%Plot result
     T(peaklist), datanoise(peaklist),
legend("noisy data","found peaks")
xlabel "x"
ylabel "y"

The result is shown in the picture above: all 9 peaks are identified correctly. Depending on your data, you'll have to play around with the parameters environment and thresh. Having the the peak positions, further data analysis (like curve fitting etc) is much easier.

Links: of course people already thought of this problem:

No comments:

Post a Comment