====== Parallel execution ======
Let's come back to our [[examples:coba]] example. Under normal circumstances you would run the compiled simulation binary (here assumed to be called ''sim_coba_benchmark'') directly:
<code shell>
./sim_coba_benchmark
</code>
Suppose you would like to run the same code in parallel on multiple cores to speed things up. All it takes is to prefix the very same call with ''mpirun'':
<code shell>
mpirun -n 4 ./sim_coba_benchmark
</code>
where ''-n 4'' instructs MPI to launch four parallel processes.
+ | |||
+ | To use multiple machines as a cluster the specifics might be different depending on the MPI implementation used. For OpenMPI it suffices, if the same home directory is mounted on all machines, to add a hostfile containing the names of all machines in the luster to the command line (e.g. '' | ||
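
As a minimal sketch, assuming two machines named ''node1'' and ''node2'' with four cores each (hostnames and slot counts are placeholders), such a hostfile could be created and used like this:
<code shell>
# write a hostfile; "slots" caps the number of processes per machine
cat > hostfile <<EOF
node1 slots=4
node2 slots=4
EOF

# distribute eight processes over the two machines
mpirun -n 8 --hostfile hostfile ./sim_coba_benchmark
</code>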
===== Output =====

Let's now look at the output side of this call. I will further assume that your MPI implementation ran all processes on a single node. If that's not the case you will probably know yourself that the output might be scattered over the ''/tmp'' directories of the different machines involved.


==== Output files ====

We now find the following directory listing under ''/tmp'':
<code shell>
-bash-4.1$ ls -sl /tmp/coba.*
...
8 -rw-r--r-- 1 zenke lcn1 5364 Jan 13 12:23 /tmp/coba...
</code>
+ | |||
+ | |||
That means that each process has written its output to independent files. In generaly you have to specify the convention you want to use manually in your simulation executable with a call sequence similar to following | That means that each process has written its output to independent files. In generaly you have to specify the convention you want to use manually in your simulation executable with a call sequence similar to following | ||
<code c++>
// initialize MPI and make the communicator known to Auryn
mpi::environment env(ac, av);
mpi::communicator world;
communicator = &world;

// build a rank-specific file prefix such as "/tmp/coba.0."
stringstream oss;
oss << dir << "/" << simname << "." << world.rank() << ".";
string outputfile = oss.str();

// every rank writes its own log file, e.g. "/tmp/coba.0.log"
logger = new Logger(outputfile+"log", world.rank(), PROGRESS, EVERYTHING);

// the System object takes care of the parallel simulation
sys = new System(&world);
</code>
where ''world.rank()'' returns the rank of the calling MPI process, which then becomes part of all output file names.


==== Log output ====
Now let us have a look into ''coba.0.log'', the log file written by the process with rank 0:
<code>
...
</code>
This is in essence the log output generated by rank 0 during the above call; the remaining ranks write analogous logs to ''coba.1.log'' through ''coba.3.log''.
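
To skim the logs of all four ranks at once, standard shell tools are sufficient; for instance:
<code shell>
# print the last few lines of each rank's log file
tail -n 5 /tmp/coba.*.log
</code>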
+ | |||
+ | |||
+ | ==== Merging ras output files ==== | ||
+ | |||
+ | For other output files that are passed as arguments to [[Monitor]] objects for instance the same rules hold as described in the [[#output files]] section. That is why there are a total of four files '' | ||
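
Assuming the run above and that the files sit in the current working directory, the split of the excitatory spike raster would look like this:
<code shell>
-bash-4.1$ ls coba.*.e.ras
coba.0.e.ras  coba.1.e.ras  coba.2.e.ras  coba.3.e.ras
</code>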
+ | |||
+ | Multiple [[ras]] can be merged efficiently using linux command line syntax only and the following call will do the job | ||
<code shell>
sort -g -m coba.*.e.ras > coba.e.ras
</code>
Here ''-m'' merge-sorts the already sorted input files and ''-g'' compares by general numeric value, so the result is ordered by the spike times in the first column.
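
As a quick sanity check, the number of lines in the merged file should equal the sum of the per-rank spike counts:
<code shell>
# spike counts per rank, followed by their total
wc -l coba.*.e.ras
# this count should match the total from above
wc -l coba.e.ras
</code>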