Monolith V2 Deployment notes

Notes on the steps performed, and the issues encountered and resolved, during deployment.

  1. Log in to the SPS string processor as the testdaq user.

    sps-pub -> sps-access -> sps-stringproc01

  2. As testdaq, pause data taking using the command "pausetestdaq".

    This causes testdaq to wait for the current run to end and then start no new runs. The command does a "killall automate".

    At this point Monolith is on run #14575 and testdaq is on run #14576 (started 10 minutes ago), as seen via "ps -ef".

  3. Disable email checker.

    This is so the winter-overs are not notified of failed runs. A cron job wakes up every 10 minutes and looks for newly started runs; check-latest-run sends email to the winter-overs list "wo".
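
    One way to disable the checker without deleting its cron entry is to comment the entry out. The helper below is a minimal sketch, assuming the checker lives in the crontab of the user running it and that its entry mentions check-latest-run; the function name comment_out_cron_entry is made up for illustration and is not part of testdaq.

```shell
# Hypothetical helper (not part of testdaq): read crontab text on stdin and
# comment out every active line that contains the given literal string.
comment_out_cron_entry() {
    awk -v pat="$1" '
        # Skip lines that are already comments; index() does a literal match.
        $0 !~ /^[ \t]*#/ && index($0, pat) > 0 { print "#" $0; next }
        { print }
    '
}

# Assumed usage, run as the user that owns the cron entry:
#   crontab -l | comment_out_cron_entry check-latest-run | crontab -
```

    Saving the output of "crontab -l" to a file first is a sensible precaution, since "crontab -" replaces the whole crontab with whatever it reads.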

  4. As root, install the Monolith rpm.

    # rpm -ivh Monolith-2.0.5-1.noarch.rpm

  5. Wait for all post-processing of data to finish (i.e. wait for Monolith to finish processing all runs). Then issue the command "stoptestdaq".

    Normally this would be an abort, but since all runs have finished it is equivalent to a clean shutdown; in this case all it really does is kill the RMI registry. It must be done before running the "go" command, because "go" only succeeds after confirming that no trace of testdaq or Monolith is still running.
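
    A quick manual check that nothing is left running before "go" could look like the sketch below. The process names in the pattern are assumptions (automate comes from step 2; monolith and rmiregistry are guesses at how the other processes appear in "ps -ef" output), and find_leftover_processes is a made-up name.

```shell
# Hypothetical helper: filter `ps -ef`-style text on stdin down to lines that
# look like leftover DAQ processes. Exit status is 0 if any line matched.
find_leftover_processes() {
    # The [x] bracket trick stops the grep process itself from matching
    # when this is fed from a live `ps -ef`.
    grep -Ei '[a]utomate|[m]onolith|[r]miregistry'
}

# Assumed usage:
#   ps -ef | find_leftover_processes && echo "something is still running"
```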

  6. As testdaq, install the modified/new files from the testdaq-launch project which are needed in order to run Monolith V2.

    Done manually by Mark, who backed up the old files in case a rollback is needed.
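
    The manual install with backups can be sketched as a small shell function. This is not the procedure Mark actually followed, just one way to do it; the function name install_with_backup and the .pre-v2 suffix are invented for the example.

```shell
# Hypothetical helper: copy SRC over DEST, keeping the previous DEST as
# DEST.pre-v2 so the old file can be restored on rollback.
install_with_backup() {
    src="$1"
    dest="$2"
    if [ -e "$dest" ]; then
        # -p preserves mode and timestamps on the backup copy.
        cp -p "$dest" "$dest.pre-v2" || return 1
    fi
    cp -p "$src" "$dest"
}

# Rollback for one file would then be:
#   mv "$dest.pre-v2" "$dest"
```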

  7. As testdaq, source the new .bashrc file.

    Done by logging out and logging back in.

  8. As testdaq, start up data-taking with the new monolith using the command "go".

    This is a script which starts up the RMI registry and automate.

    The new run is #14577, discovered from the newly created directory. This first run is a dark noise run, as always, but Monolith is currently disabled for dark noise runs.

  9. Re-enable the email checker.

    This is so the winter-overs are notified of problems again.

  10. As testdaq, watch the most recently made ~/outputXXXXXX directory. Watch data-taking, watch the data-quality log, and then watch the monolith log file. Do a "tail -f" on the monolith log file, and verify that it successfully processes a run.

    As expected, Monolith did not run on the dark noise run, and testdaq then started a local coincidence (LC) run, the first of ~48 LC runs.

    Monolith didn't run for run #14578, due to a misplaced file from the manual install in step 6.

    Monolith ran for run #14579, found 5171 events, and took 239 seconds. However, due to a mistake in background_it.pl, it ran twice on that run. Mark fixed the script, checked it in, and moved the tag forward so that the existing delivery tag is still correct.

    Run #14580 was a dark noise run, because Mark paused testdaq when Monolith ran twice and the first run after a restart is always a dark noise run.
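
    For step 10, the most recently made output directory can be picked out by modification time rather than by eye. This is a minimal sketch, assuming the directories sit directly under the testdaq home directory and are named output followed by digits; latest_output_dir is a made-up name, and the log file name in the usage line is a placeholder because these notes do not give the real one.

```shell
# Hypothetical helper: print the most recently modified output* directory
# under the given base directory (default: $HOME).
latest_output_dir() {
    base="${1:-$HOME}"
    # -d lists the directories themselves, -t sorts newest first.
    ls -dt "$base"/output*/ 2>/dev/null | head -n 1
}

# Assumed usage; "monolith.log" is a placeholder, the real log file name
# is not given in these notes:
#   tail -f "$(latest_output_dir)monolith.log"
```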