01 December 2014
My BSc (honours) dissertation in physics was about angular distributions of J/Psi mesons produced at the CMS. The following is a reflection on the process I used in writing my dissertation, with some notes on what I did wrong and how I would do it differently.
Given that the project dealt with data from CERN, my main analysis tool was ROOT. For those unfamiliar with ROOT, it is a data analysis framework optimised for dealing with large datasets such as those produced at the Large Hadron Collider. Python and gnuplot were used to process the results produced by ROOT into a LaTeX-ready table or graph.
There are alternatives to some of these programs (e.g. Word for LaTeX, MATLAB for Python/gnuplot), but my choices ultimately came down to preference: I like the documents LaTeX produces more than those from other programs, I chose Python because I felt it was a good opportunity to learn a new language, and I chose gnuplot for its LaTeX-friendly plots.
For the dissertation itself, almost everything I wrote (including layout changes) went into the single main .tex file. The only bits that weren't put into the main file were my code, images, and the tables processed by the Python script.
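A minimal sketch of what that last step might look like, assuming the ROOT results have already been dumped into rows of numbers. The function name `to_latex_table`, the column headers, and the values are placeholders for illustration, not the script I actually used.

```python
def to_latex_table(headers, rows, fmt="{:.3f}"):
    """Return a LaTeX tabular environment as a string.

    headers: list of column titles (hypothetical names)
    rows:    list of numeric rows, one list per table line
    fmt:     format string applied to every number
    """
    # First column left-aligned, the rest right-aligned.
    col_spec = "l" + "r" * (len(headers) - 1)
    lines = [r"\begin{tabular}{" + col_spec + "}", r"\hline"]
    lines.append(" & ".join(headers) + r" \\")
    lines.append(r"\hline")
    for row in rows:
        cells = [fmt.format(value) for value in row]
        lines.append(" & ".join(cells) + r" \\")
    lines.append(r"\hline")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder numbers, not real results.
    print(to_latex_table(["bin", "value", "error"],
                         [[1, 0.5, 0.01], [2, 0.25, 0.02]]))
```

Writing the returned string to its own small file keeps generated tables out of the main .tex file, which only needs to pull them in.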
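In essence, the workflow is relatively simple: ROOT produces the results, Python and gnuplot turn them into LaTeX-ready tables and graphs, and those are pulled into the main .tex file.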
Whilst the workflow is fairly straightforward, a few of my working decisions hampered the ability to
The LaTeX document also had problems due to its length, which made compiling a pain whenever there were errors. The enjoyment of working on my project was greatly overshadowed by these organisational problems, which led to hours being wasted trying to fix problems or decipher what I had done.
The biggest issue on the whole was not compartmentalising everything. This includes splitting the main .tex file into separate smaller files that are easier to maintain and edit; separating data files into families, either by creating new folders to put the files in, by creating some kind of sane naming convention, or even by using ROOT's internal data structures; and splitting up the code I used for ROOT (consisting of 1000+ lines with a lot of histogram objects, repeated code, and minimal comments) into more manageable bites.
Why didn't I split everything up? Before I started writing up my results, I had already read a few guides on writing a thesis in LaTeX. One of the first things they recommend is to split your .tex file into chapters or sections. Knowing this, I decided it was too much effort and put it all into one file. Needless to say, I regretted that decision when the .tex file became a bit large.
I have since learned from my experience, and from the perspective of a reformed ex-non-compartmentaliser, the grass is much, much greener on this side. Splitting things up would have solved most of my problems: I would have been able to view the structure of the code and run only what was necessary with minimal effort, to isolate problems and solve them later, and to simply function without fighting against the code.
I wish I had written more comments, not just in my programming code but in my LaTeX as well. In first year compsci there is a heavy emphasis on writing comments, even for the relatively simple tasks we were given. Clearly, that first year was a waste for me, since I wrote minimal amounts of documentation throughout the entire project. This made going through my code slow, especially if I hadn't looked at that particular section for a few days.
Since finishing university, I have gained a great deal of respect for documentation, not only in code, but also in general notes about errands. It doesn't matter if you're the only person who will read it. There will come a point where you put aside some bit of code for an extended period of time, after which you'll come back to it and find you have no idea what you wrote. You may have some vague recollections, but you'll waste time relearning this piece of code because of the lack of documentation, incoherent variable names, and sometimes just weird bits that look right.
Of course, not everything can be documented, nor should it be. Names should be relatively self-documenting. Names such as "bin1.dat", "v2pt6.dat", or "4bins.plt" are not self-documenting. My data files were full of names like this, and it was not an experience I would like to repeat.
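As a rough sketch of what a saner convention could look like, here is a small Python helper that builds a descriptive file name and puts it in a folder for its family. Everything in it is hypothetical: the function names, the fields (`quantity`, `pt_min`, `pt_max`, `n_bins`), and the example values are assumptions for illustration, not the names I actually used.

```python
from pathlib import Path


def data_filename(quantity, pt_min, pt_max, n_bins, ext="dat"):
    """Build a file name that states what the file contains.

    All fields are hypothetical examples of what a name could encode.
    """
    # ":g" drops trailing zeros, so 6.0 becomes "6" and 6.5 stays "6.5".
    return f"{quantity}_pt{pt_min:g}-{pt_max:g}_{n_bins}bins.{ext}"


def data_path(base_dir, family, **parts):
    """Place the file in a sub-folder named after its family of results."""
    return Path(base_dir) / family / data_filename(**parts)


if __name__ == "__main__":
    path = data_path("data", "angular",
                     quantity="costheta", pt_min=6, pt_max=10, n_bins=4)
    print(path.as_posix())
```

The point is less the exact format than having one function decide it, so that every script writes and reads names the same way.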
Whilst I may sound a bit angsty about my work, disregarding the content I am fairly happy with the final result. There were other things I could have done to improve my dissertation, including asking for more help, not leaving work to the last minute, and not losing my cool because the plots produced from ROOT looked different from those produced in gnuplot or pgfplots. But the thing that would have helped the most would have been to actually be enthusiastic about my topic, or even physics itself. In my final year of university, I realised I had stopped caring for physics in my second year. I am now quite happy that I did, as I've enjoyed myself far more doing programming and computery things than I did solving meaningless problems that thousands before me had already done.