How do you process ras data from auryn simulations?

Discussions about how to analyze Auryn output data
asinha
Posts: 37
Joined: Wed Oct 15, 2014 5:12 pm
Location: Hatfield, Hertfordshire, UK
Contact:

How do you process ras data from auryn simulations?

Post by asinha » Fri Jan 30, 2015 12:11 pm

As my simulations grow more complex and run longer, the amount of ras data I get from Auryn is steadily increasing. At the moment the combined ras file is 4.9 GiB, but I expect it to at least double by next week. I was wondering what methods people use to process this data quickly and efficiently? Maybe some of us could get together and come up with auryn-utils, or at least examples that help Auryn users process their data. Auryn itself is blazingly quick, but after the simulations we also need methods to analyse the data they produce.

Code: Select all

$ du -hsc 201501271755.e.ras 
4.9G    201501271755.e.ras
Some things that I've been looking at:
  • memory mapped files
  • parallel processing of the data (if you need to process different time snapshots, like I do)
One suggestion I've received is that it may be more efficient to process the data in a binary format rather than the current plain-text format. I haven't researched this yet, so I can't say whether it's true. If it is, I don't think it would be much work to override the monitors to output binary rather than the current text streams; I can probably even submit patches for this.

I tried R too, but after spending about a week on it I wasn't impressed (probably because I didn't use it right). Since R uses C++ underneath anyway, I gave up and moved back to C++ itself.

Comments?

Lexn
Posts: 1
Joined: Tue Oct 14, 2014 1:25 pm

Re: How do you process ras data from auryn simulations?

Post by Lexn » Fri Jan 30, 2015 1:28 pm

Hi Asinha,

Efficient storage of time-series data (which spike times effectively are) seems to be a non-trivial problem.

After a little research, I have the impression that it is quite efficient and fairly common to store such data in a database. People seem divided between NoSQL databases like http://www.mongodb.org/ and relational databases (http://blogs.enterprisedb.com/2014/09/2 ... r-reality/, https://news.ycombinator.com/item?id=6712703). This makes especially good sense if you plan to handle several runs and do analysis across them. I had good experience with PostgreSQL doing exactly this, although I use Python + http://www.sqlalchemy.org/ for an easy-to-handle ORM.

On the Auryn side, a standardized output in a more popular file format like HDF5 (http://www.hdfgroup.org/HDF5/) would definitely be worth implementing (it supports table-like structures, but has no built-in row indexing, which could be a problem).
I just stumbled upon https://github.com/discretelogics/TeaFiles.Net, which also looks pretty promising. These formats could be used both for spike times and for the voltage/current/other time series the simulator can spit out.
For the file-based approaches I am not sure about the indexing/selection properties; to calculate single-unit firing rates over time ranges, some sort of index would probably be more efficient.

asinha

Re: How do you process ras data from auryn simulations?

Post by asinha » Thu Feb 05, 2015 3:56 pm

Hi,

I don't know if databases will really help. The processing itself isn't the issue: my C++ post-processing program takes one second to give me the data I need after I've loaded the combined raster file. The bottleneck, therefore, is the large file that needs to be loaded into RAM. The simulation I just ran gave me the following output:

Code: Select all

[asinha@cs-1e114-2  201502041728(multiple-patterns %=)]$ ls -lash *_e.ras
9.1G -rw-rw-r--. 1 asinha asinha 9.1G Feb  4 20:32 201502041728.0_e.ras
9.2G -rw-rw-r--. 1 asinha asinha 9.2G Feb  4 20:32 201502041728.10_e.ras
9.1G -rw-rw-r--. 1 asinha asinha 9.1G Feb  4 20:32 201502041728.11_e.ras
9.3G -rw-rw-r--. 1 asinha asinha 9.3G Feb  4 20:32 201502041728.12_e.ras
9.3G -rw-rw-r--. 1 asinha asinha 9.3G Feb  4 20:32 201502041728.13_e.ras
9.1G -rw-rw-r--. 1 asinha asinha 9.1G Feb  4 20:32 201502041728.14_e.ras
9.2G -rw-rw-r--. 1 asinha asinha 9.2G Feb  4 20:32 201502041728.15_e.ras
9.6G -rw-rw-r--. 1 asinha asinha 9.6G Feb  4 20:32 201502041728.1_e.ras
9.6G -rw-rw-r--. 1 asinha asinha 9.6G Feb  4 20:32 201502041728.2_e.ras
8.9G -rw-rw-r--. 1 asinha asinha 8.9G Feb  4 20:32 201502041728.3_e.ras
9.2G -rw-rw-r--. 1 asinha asinha 9.2G Feb  4 20:32 201502041728.4_e.ras
9.4G -rw-rw-r--. 1 asinha asinha 9.4G Feb  4 20:32 201502041728.5_e.ras
9.1G -rw-rw-r--. 1 asinha asinha 9.1G Feb  4 20:32 201502041728.6_e.ras
9.2G -rw-rw-r--. 1 asinha asinha 9.2G Feb  4 20:32 201502041728.7_e.ras
8.8G -rw-rw-r--. 1 asinha asinha 8.8G Feb  4 20:32 201502041728.8_e.ras
9.2G -rw-rw-r--. 1 asinha asinha 9.2G Feb  4 20:32 201502041728.9_e.ras
I expect that any database would take ages to import this much data, since it also runs indexing and other operations on it. So I'm back to looking at memory-mapped files. Since the main advantage of memory-mapped files comes when you can seek directly to records, this will probably need a modification to SpikeMonitor to ensure that each <time, neuronID> pair (each line of the ras file) has the same size.

I was quite surprised to see that a toolbox for this isn't already available, given the amount of post processing we do in our modelling :)

zenke
Site Admin
Posts: 156
Joined: Tue Oct 14, 2014 11:34 am
Location: Basel, CH
Contact:

Re: How do you process ras data from auryn simulations?

Post by zenke » Thu Feb 05, 2015 5:28 pm

Dear asinha,

I think you have a valid point here: crunching these large text files becomes increasingly challenging, and I am glad we've started discussing the issue. The reason I implemented everything using human-readable file formats is that it lets us use a vast variety of readily available command-line tools, many of which can easily be extended to typical Auryn use cases. Over the years I have implemented my own tool suite for analysing most Auryn output formats, which is not published because the code is quite messy...

I do a lot of my analysis on the command line, and it's very simple with AWK scripts. For instance, extracting the ISI CVs from a ras file boils down to the following:

Code: Select all

#! /usr/bin/awk -f
BEGIN {
    N = 20000 # maximum number of neurons
}

{
    # ras format: $1 = spike time, $2 = neuron id
    curt = $1
    if (firingtime[$2] > 0) {
        isi = curt - firingtime[$2]
        nspikes[$2] += 1
        sum[$2]  += isi
        sum2[$2] += isi^2
    }
    firingtime[$2] = curt
}
END {
    for (i = 1; i <= N; i++) {
        if (nspikes[i] > 1) {
            mean = sum[i]/nspikes[i]
            # unbiased sample variance of the ISIs
            var  = (sum2[i] - nspikes[i]*mean^2)/(nspikes[i]-1)
            print sqrt(var)/mean
        }
    }
}
To address your specific problem of working on a specific chunk of the data, I implemented a simple command-line script which reads out a given range of spikes or time series from any Auryn text-based file. Since ras files have no index the first access is quite slow, but the program works without loading everything into memory. Under the hood the script caches the response, so the next access to the same data range is much faster. The shell script 'range.sh' can be readily used in pipes. It expects three arguments (start_time, stop_time and filename); the code is the following:

Code: Select all

#!/bin/bash

if [ "$1" != "" ]; then
    START=$1
else
    START=0
fi

if [ "$2" != "" ]; then
    STOP=$2
else
    STOP=$((START+100))
fi

if [ "$3" != "" ]; then
    FILE=$3
    CACHEFILE=$FILE.rangecache
    CACHERANGEFILE=$FILE.range
    if [ "$CACHEFILE" -nt "$FILE" ]; then
        CSTART=`head -n 1 $CACHERANGEFILE`
        CSTOP=`tail -n 1 $CACHERANGEFILE`
        if [ "$START" == "$CSTART" ] && [ "$STOP" == "$CSTOP" ]; then
            cat $CACHEFILE
            exit
        fi
    fi
    awk 'BEGIN { start = '$START'; stop = '$STOP' } { if ( $1 > start && $1 < stop && NF == fields ) print; fields = NF; if ( $1 > stop ) exit }' $FILE | tee $CACHEFILE \
        && echo $START > $CACHERANGEFILE \
        && echo $STOP >> $CACHERANGEFILE \
        && exit
    rm -f $CACHEFILE
else
    echo "usage: range.sh start_time stop_time filename" >&2
    exit 1
fi


I understand that this is still far from what you would wish for, but that's how I have been working with the files so far. Text files have simply proven the most versatile for processing in other programs or with existing libraries. By making Auryn output binary one would lose this advantage, but for long simulations it might indeed be desirable. Why don't you go ahead and implement a BinarySpikeMonitor class with the same interface as SpikeMonitor?

You could use

Code: Select all

struct spikeEvent_type
{
  AurynTime time;
  NeuronID neuronID;
};
Then write data like this

Code: Select all

void writeSpikeEvent(std::ostream &dst, const spikeEvent_type &src)
{
    dst.write(reinterpret_cast<const char*>(&src), sizeof(src));
}
In your analysis code you could get data from somewhere with

Code: Select all

void readSpikeEvent(std::istream &src, spikeEvent_type &dst)
{
    src.read(reinterpret_cast<char*>(&dst), sizeof(dst));
}

void gotoEvent(std::istream &src, size_t n)
{
    src.seekg(n * sizeof(spikeEvent_type), std::ios::beg);
}
Now these types of files would still not have an index, but it would be easy to implement one. Even without an index, since spikes are written in time order, a simple binary search (bisection) should readily pinpoint the relevant time section in the file, which is usually what takes most of the time. If you are looking for all firing times of a single neuron, however, you would be back to the database.

PS: The above binary format could be extended to multicolumn time-series data for other Auryn formats like pact and so on. All this could be accompanied by a neat C program to extract ranges from binary ras files and dump them to human-readable ras files, so one can continue plotting directly in Gnuplot or using existing analysis software. Or of course you could do your analysis directly on the binary files.

asinha

Re: How do you process ras data from auryn simulations?

Post by asinha » Tue Feb 10, 2015 5:17 pm

A minor hiccup here:

Code: Select all

struct spikeEvent_type
{
  AurynTime time;
  NeuronID neuronID;
};
This addition is wrong, isn't it? In auryn_definitions.h, there are these lines:

Code: Select all

typedef unsigned int NeuronID;
typedef NeuronID AurynTime;
This implies that AurynTime is just an unsigned integer, not a float/double, but the value SpikeMonitor (and also BinarySpikeMonitor) is supposed to print is actually a double (since on multiplication with dt the result is promoted to double):

Code: Select all

 outfile << dt*(sys->get_clock()) << "  " << *it+offset << "\n";
So, the data structure should actually be:

Code: Select all

struct spikeEvent_type
{
  AurynDouble time;
  NeuronID neuronID;
};
I'll quickly open a pull request once you confirm this. I think it's correct, but I've been looking at code too long today to be 100% sure. :)

zenke

Re: How do you process ras data from auryn simulations?

Post by zenke » Tue Feb 10, 2015 5:21 pm

You are right. I did this on purpose, to avoid rounding errors in the floating-point number for very large times; I'm not sure they would ever occur. The idea was to store the actual discrete clock value (sys->get_clock()), since the file is not directly plottable anyway, and then require a multiplication with dt in the downstream processing. Either way should work, but I expect processing to be slightly faster on the integers...

asinha

Re: How do you process ras data from auryn simulations?

Post by asinha » Tue Feb 10, 2015 5:26 pm

Makes sense. I've opened a pull request here for the fix, so that BinarySpikeMonitor users don't get rounded off data in their logged ras files.

https://github.com/fzenke/auryn/pull/8

zenke

Re: How do you process ras data from auryn simulations?

Post by zenke » Tue Feb 10, 2015 6:53 pm

Hmmm... not sure that is what I meant. I was thinking of rounding errors from limited floating-point precision: say you add 0.1 ms to a single-precision float storing 110254123 s; you might just end up with 110254123 s instead of 110254123.0001 s if your precision isn't high enough. So I guess double should be fine: it gives around 15 digits of precision, whereas the current AurynTime, which is of the same type as NeuronID, overflows at 429497 s. That's 6 digits, plus the four from a default timestep of 0.1 ms, gives 10 digits total, which is well within double's range. I'd say it's fine, but one has to be careful when moving to a larger AurynTime datatype such as long long... let's make a mental note of that ;-)

asinha

Re: How do you process ras data from auryn simulations?

Post by asinha » Fri Feb 20, 2015 6:30 pm

Ah, I understand.

Anyway, I'm now using memory-mapped files and I've documented it here: http://ankursinha.in/blog/research/2015 ... ped-files/

Please post if you have any queries :)

zenke

Re: How do you process ras data from auryn simulations?

Post by zenke » Fri Feb 20, 2015 7:43 pm

Cool beans! What did you finally use for the time index: AurynTime or AurynDouble? Any idea whether one is faster? In your blog entry you still show the struct with AurynTime. Thanks for sharing this in any case.
