Running simulations in parallel
Let's come back to our example of the Vogels-Abbott network. This network is run by invoking the following command in the build/home directory:
./sim_coba_benchmark --dir /tmp --simtime 5
Suppose you would like to run the same code in parallel. Assuming you have a working MPI implementation such as OpenMPI, all you need to do is use the following syntax:
mpirun -n 4 ./sim_coba_benchmark --dir /tmp --simtime 5
where -n 4 sets the number of processes run in parallel to four. If everything worked correctly, you should already notice an increase in performance, i.e. the program should finish its computation more quickly.
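If you want to quantify the speed-up rather than just notice it, you can time a single-process run and the run with mpirun -n 4 (for example with the shell's time command) and compare the two wall-clock times. The two helpers below are a minimal sketch of the standard definitions of parallel speed-up and efficiency; the function names are hypothetical and not part of Auryn.

```cpp
// Parallel speed-up: how many times faster the run on several processes is
// than the single-process run. Both arguments are wall-clock seconds.
double speedup(double t_serial, double t_parallel) {
    return t_serial / t_parallel;
}

// Parallel efficiency: speed-up divided by the number of processes.
// A value of 1.0 would correspond to perfect linear scaling.
double efficiency(double t_serial, double t_parallel, int nprocs) {
    return speedup(t_serial, t_parallel) / nprocs;
}
```

An efficiency noticeably below 1.0 indicates that part of the run time goes into overhead rather than useful computation on each process.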
To use multiple machines as a cluster, the specifics depend on the MPI implementation used. For OpenMPI, provided the same home directory is mounted on all machines, it suffices to add a hostfile containing the names of all machines in the cluster to the command line (e.g. mpirun -n 32 --hostfile hostfile.mpi ./sim_background). Since the workflow is slightly different for MPICH2 and other implementations, I recommend consulting the tutorials for your local MPI flavor.
Output
Let's now look at the output side of this call. I will further assume that your MPI implementation ran all processes on a single node. If that is not the case, you probably already know that the output might be scattered over different /tmp directories across your cluster (on up to four of its nodes). You will then also know that you can fix this by passing a directory that is shared among all hosts to the --dir option. In our case, all processes ran on a single node.
Output files
We now find the following directory listing under /tmp:
-bash-4.1$ ls -sl /tmp/coba.*
880 -rw-r--r-- 1 zenke lcn1 900018 Jan 13 11:17 /tmp/coba.0.e.ampa
916 -rw-r--r-- 1 zenke lcn1 935976 Jan 13 11:17 /tmp/coba.0.e.gaba
928 -rw-r--r-- 1 zenke lcn1 950019 Jan 13 11:17 /tmp/coba.0.e.mem
728 -rw-r--r-- 1 zenke lcn1 745435 Jan 13 12:23 /tmp/coba.0.e.ras
176 -rw-r--r-- 1 zenke lcn1 177948 Jan 13 12:23 /tmp/coba.0.i.ras
8 -rw-r--r-- 1 zenke lcn1 5444 Jan 13 12:23 /tmp/coba.0.log
704 -rw-r--r-- 1 zenke lcn1 719375 Jan 13 12:23 /tmp/coba.1.e.ras
172 -rw-r--r-- 1 zenke lcn1 174157 Jan 13 12:23 /tmp/coba.1.i.ras
8 -rw-r--r-- 1 zenke lcn1 5365 Jan 13 12:23 /tmp/coba.1.log
740 -rw-r--r-- 1 zenke lcn1 755861 Jan 13 12:23 /tmp/coba.2.e.ras
180 -rw-r--r-- 1 zenke lcn1 181694 Jan 13 12:23 /tmp/coba.2.i.ras
8 -rw-r--r-- 1 zenke lcn1 5365 Jan 13 12:23 /tmp/coba.2.log
880 -rw-r--r-- 1 zenke lcn1 900018 Jan 13 12:23 /tmp/coba.3.e.ampa
896 -rw-r--r-- 1 zenke lcn1 917174 Jan 13 12:23 /tmp/coba.3.e.gaba
928 -rw-r--r-- 1 zenke lcn1 950019 Jan 13 12:23 /tmp/coba.3.e.mem
720 -rw-r--r-- 1 zenke lcn1 737181 Jan 13 12:23 /tmp/coba.3.e.ras
212 -rw-r--r-- 1 zenke lcn1 216615 Jan 13 12:23 /tmp/coba.3.i.ras
8 -rw-r--r-- 1 zenke lcn1 5364 Jan 13 12:23 /tmp/coba.3.log
This means that each process has written its output to separate files. In general, you have to specify the naming convention manually in your simulation executable, with a call sequence similar to the following:
// dir holds the --dir argument, world is the MPI communicator
stringstream oss;
oss << dir << "/coba." << world.rank() << ".";
string outputfile = oss.str();
stringstream logfile;
logfile << outputfile << "log";
where logfile ends up containing a value such as /tmp/coba.0.log if the process runs on rank 0. This is admittedly a little inconvenient and may change in a future version of Auryn, but for now we have to live with it.
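When a simulation writes several output files, the naming convention from the snippet above can be wrapped in a small helper so that it lives in one place. This is a minimal sketch; rank_prefix and rank_file are hypothetical names that Auryn does not provide, and in a real simulation the rank argument would be world.rank().

```cpp
#include <sstream>
#include <string>

// Builds the per-rank prefix, e.g. rank_prefix("/tmp", "coba", 0) -> "/tmp/coba.0."
// Hypothetical helper for illustration, not part of Auryn.
std::string rank_prefix(const std::string& dir, const std::string& basename, int rank) {
    std::ostringstream oss;
    oss << dir << "/" << basename << "." << rank << ".";
    return oss.str();
}

// Appends a suffix such as "log" or "e.ras" to the per-rank prefix.
std::string rank_file(const std::string& dir, const std::string& basename,
                      int rank, const std::string& suffix) {
    return rank_prefix(dir, basename, rank) + suffix;
}
```

Calling rank_file(dir, "coba", world.rank(), "log") then yields the same name as logfile.str() in the snippet above.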
Log output
Now let us have a look at coba.0.log. The beginning of the file will look similar to this:
01/13/14 12:34:01:: Logger started on Rank 0
01/13/14 12:34:01:: Auryn version 0.3 ( compiled Jan 12 2014 21:24:37 )
01/13/14 12:34:01:: Current AurynTime good for simulations up to 429497s ( 119.305h )
01/13/14 12:34:01:: Current NeuronID and sync are good for simulations up to 536870911 cells.
01/13/14 12:34:01:: MPI run rank 0 of 4.
01/13/14 12:34:01:: Setting up neuron groups ...
01/13/14 12:34:01:: SpikingGroup:: Size 800 (ROUNDROBIN)
01/13/14 12:34:01:: SpikingGroup:: Registering delay (MINDELAY=8)
01/13/14 12:34:01:: SpikingGroup:: Size 200 (ROUNDROBIN)
01/13/14 12:34:01:: SpikingGroup:: Registering delay (MINDELAY=8)
01/13/14 12:34:01:: Setting up E connections ...
<!--- snip --->
After some general information, the log tells you how the neurons in the two TIFGroups are distributed across the ranks. The keyword ROUNDROBIN means that the neurons are dealt out to the ranks in turn, so that each rank receives an equal share. Since we have 3200 excitatory cells, each of the four ranks takes 800 of them. Similarly, the 800 inhibitory neurons are distributed as 200 per rank.
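The arithmetic behind a round-robin distribution can be sketched as follows. This is an illustration of the general idea only, assuming that global neuron index i is owned by rank i modulo the number of ranks; it is not Auryn's internal code, and owner_rank and neurons_per_rank are hypothetical names.

```cpp
#include <vector>

// Rank that owns global neuron index `neuron` when neurons are dealt out
// round-robin: neuron 0 to rank 0, neuron 1 to rank 1, and so on, wrapping
// around after the last rank. Illustrative assumption, not Auryn's code.
int owner_rank(unsigned int neuron, unsigned int nranks) {
    return static_cast<int>(neuron % nranks);
}

// Number of neurons each of `nranks` ranks receives when `n` neurons are
// dealt out round-robin. If n is not divisible by nranks, the first
// n % nranks ranks receive one extra neuron.
std::vector<unsigned int> neurons_per_rank(unsigned int n, unsigned int nranks) {
    std::vector<unsigned int> counts;
    counts.resize(nranks, n / nranks);
    for (unsigned int r = 0; r < n % nranks; ++r) {
        counts[r] += 1;
    }
    return counts;
}
```

For the network above, neurons_per_rank(3200, 4) gives 800 per rank and neurons_per_rank(800, 4) gives 200 per rank, matching the sizes reported in the log.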
This is followed by a lot more information about the setup of the connections, which we skip for now. The end of the file will look similar to this:
<!--- snip --->
01/13/14 12:34:02:: Simulating ...
01/13/14 12:34:02:: Simulation triggered ( runtime=5s )
01/13/14 12:34:02:: On this rank: neurons_total=1000, effective_load=1000, synapses_total=80367
01/13/14 12:34:02:: On all ranks: neurons_total=4000, synapses_total=320158
01/13/14 12:34:02:: Mark set (0s). Ran for 0s with SpeedFactor=0
01/13/14 12:34:05:: Simulation finished. Ran for 3s with SpeedFactor=0.6
01/13/14 12:34:05:: Freeing ...
This is essentially the log output produced by the call sys->run(5,true), which triggers a simulation of 5 seconds. Before running the simulation, Auryn collects some statistics about how many neurons and synapses are used on the local rank (and if you are looking at the log file on rank 0, it also reports the totals over all ranks). It is always reassuring to see that the total of 320158 synapses is in good agreement with our expectation for a random sparse network (N*N*sparseness = 4000*4000*0.02 = 320000). At the end of the run, Auryn reports the run time in full seconds and the SpeedFactor, which is the ratio between the wall-clock time it took to run the simulation and the simulated time. In the present example, 3 s of wall-clock time for 5 s of simulated time gives a value of 0.6, meaning that the simulation ran about 1.7 times faster than real time.
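The two sanity checks in the paragraph above can be written down explicitly. This is a minimal sketch with hypothetical function names. The standard deviation assumes that each of the N*N possible connections is present independently with probability p (a binomial count), which may not match exactly how Auryn draws its sparse connections, and the SpeedFactor definition is inferred from the log (3 s of wall-clock time for a 5 s simulation reported as 0.6).

```cpp
#include <cmath>

// Expected number of synapses when each of the n*n possible connections
// exists independently with probability p (the "sparseness").
double expected_synapses(double n, double p) {
    return n * n * p;
}

// Standard deviation of that count under the same independence assumption
// (binomial distribution with n*n trials and success probability p).
double synapse_count_stddev(double n, double p) {
    return std::sqrt(n * n * p * (1.0 - p));
}

// SpeedFactor as inferred from the log: wall-clock time divided by simulated
// time. Values below 1 mean the simulation ran faster than real time.
double speed_factor(double wallclock_s, double simulated_s) {
    return wallclock_s / simulated_s;
}
```

With N=4000 and p=0.02 this gives an expected 320000 synapses with a standard deviation of 560, so the observed 320158 lies well within one standard deviation, and speed_factor(3, 5) returns 0.6.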
Merging ras output files
The same rules described in the Output files section hold for other output files, for instance those passed as arguments to Monitor objects. That is why there are a total of four files coba.*.e.ras. Since the excitatory neurons are distributed across ranks, each rank writes the spikes of its share of the excitatory TIFGroup to its own file. For the user, this means that you have to merge the outputs if, for instance, you would like to plot all spikes in a raster plot.
Multiple ras files can be merged efficiently for combined analysis using only standard Linux command-line tools. The following call does the job:
sort -g -m coba.*.e.ras > coba.e.ras
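The -m option makes sort merge inputs that are already sorted instead of sorting them again, which is why this is efficient: each per-rank file is assumed to list its spikes in chronological order. If you would rather merge inside your own analysis code, the same k-way merge can be sketched in C++ with a priority queue. This sketch assumes that each ras line holds a spike time followed by a neuron ID; Spike and merge_ras are hypothetical names, and unlike sort it does not define an order among spikes with identical times.

```cpp
#include <cstddef>
#include <istream>
#include <queue>
#include <sstream>  // std::istringstream, convenient for feeding test data
#include <utility>
#include <vector>

// One spike as assumed to be stored per line of a ras file:
// a spike time followed by a neuron ID.
struct Spike {
    double time;
    unsigned int neuron;
};

// Merges several spike streams, each already sorted by time, into one
// time-sorted vector (a k-way merge, the same idea as `sort -m`).
std::vector<Spike> merge_ras(const std::vector<std::istream*>& inputs) {
    // Heap entries pair a spike with the index of the stream it came from.
    using Entry = std::pair<Spike, std::size_t>;
    // Comparator that turns std::priority_queue into a min-heap on time.
    auto later = [](const Entry& a, const Entry& b) {
        return a.first.time > b.first.time;
    };
    std::priority_queue<Entry, std::vector<Entry>, decltype(later)> heap(later);

    // Prime the heap with the first spike of every stream.
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        Spike s;
        if (*inputs[i] >> s.time >> s.neuron) {
            heap.push(Entry(s, i));
        }
    }

    std::vector<Spike> merged;
    while (!heap.empty()) {
        Entry top = heap.top();
        heap.pop();
        merged.push_back(top.first);
        // Refill from the stream that supplied the earliest remaining spike.
        Spike next;
        if (*inputs[top.second] >> next.time >> next.neuron) {
            heap.push(Entry(next, top.second));
        }
    }
    return merged;
}
```

Opening the four coba.*.e.ras files as std::ifstream objects and passing their addresses to merge_ras would yield all excitatory spikes in time order, ready for a raster plot.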