A Beginner's Introduction to using anvil in IceCube

Introduction

The anvil command is a command line interface to the IceCube detector. It is based on the original rundaq, roadrunner and redaq commands and is designed to supersede them, providing a single point of access for managing the IceCube detector for operators who are not using the IceAxe portal.

Before explaining how anvil can be used, you should understand that it is not designed for scripting the detector's operations, which can be done by writing jyboss scripts, and so it does not support any "flow of control" operations. If scripting was the reason you started reading this document, it is recommended that you first read the "Developer's Introduction to using jyboss in IceCube" and then read this document, paying close attention to "Using anvil commands in jyboss scripts".

Setting up to use anvil

The anvil command is normally installed as part of the standard deployment of IceCube software. If you follow the instructions laid out here and cannot get anvil working, you should consult the installation procedure, which is documented elsewhere.

To use anvil on an IceCube cluster you need to log on to the *-expcont node of that cluster as the "jboss" user. While you can run the command from anywhere, it is recommended that you change into the ~/jyboss-scripts directory first. Having changed into the jyboss-scripts directory you can start anvil by simply typing anvil, at which point you should see the following welcome message.

jboss@spts-expcont[jyboss-scripts] 20:00:28 (0)$ anvil
Welcome to Anvil, the command line interface to IceCube.
For general help type 'help', for specific help type 'help <topic>'.
For help getting started type 'help getting_started'.
Anvil@spts-expcont** 

You are now ready to use anvil to manage the detector.

Basic Detector Management - The Watchdog

The most basic level of detector management is making sure that ACME's watchdog task is running. This task is responsible for trying to keep the detector taking, processing and managing data. The current implementation of this task is fairly basic, but as we come to understand the foibles of the detector we can improve it. To check whether the watchdog task is running you can issue the following command at the anvil prompt.

Anvil@spts-expcont** check acme
The watchdog and data-taking tasks are running

The watchdog task always tries to restart data-taking if that task is not running. This means you will sometimes see a response to the check acme command like the following.

Anvil@spts-expcont** check acme
Only the watchdog task is running

This normally means that you have caught the system after one data-taking task has stopped but before the next has started, and is considered normal.

During normal running the watchdog task should always be running, but there are some circumstances where it can interfere with operations, especially during debugging or a major failure. Under these circumstances the task can be stopped using the following command.

Anvil@spts-expcont** stop

This command requests that the watchdog task stop once the current data-taking task has stopped, but leaves the data-taking task to finish its current run. If you want the current data-taking task to stop immediately, and thus get the watchdog task to also stop immediately, then you should use the halt command.

Anvil@spts-expcont** halt

As far as anvil is concerned, the acme subsystem is considered to be running successfully only when the watchdog task is running. Thus, if the data-taking task has been started by hand (see "Running the Data-Taking Task"), the check command will give you a result like the following.

Anvil@spts-expcont** check acme
Only the data-taking task is running
Elements failed Check: acme
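
The three check acme responses shown in this section can be told apart mechanically. The following Python sketch is only an illustration: the function name acme_state and the returned dictionary are hypothetical, it relies on the exact message wording in the transcripts above, and it returns None for any response not shown in this document (for example, whatever anvil prints when neither task is running).

```python
# Responses copied from the transcripts above, mapped to
# (watchdog running, data-taking running).
ACME_RESPONSES = {
    "The watchdog and data-taking tasks are running": (True, True),
    "Only the watchdog task is running": (True, False),
    "Only the data-taking task is running": (False, True),
}


def acme_state(output):
    """Classify the output of 'check acme'; None if no known line is found."""
    for line in output.splitlines():
        tasks = ACME_RESPONSES.get(line.strip())
        if tasks is not None:
            watchdog, data_taking = tasks
            # acme counts as healthy only when the watchdog task is running.
            return {"watchdog": watchdog, "data_taking": data_taking, "ok": watchdog}
    return None


state = acme_state("Only the data-taking task is running\nElements failed Check: acme")
print(state)
```

The "ok" flag mirrors the rule stated above: a data-taking task that was started by hand does not make the acme subsystem pass the check.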

Watching System Behaviour

While the watchdog task normally keeps the detector running, there may be times when you want to check how various parts of the system are behaving. You can do this by using different options of the check command. For example, the default option for the check command is -s, which checks the state of all of the detector's subsystems. Thus using this command with no arguments generates something like the following.

Anvil@spts-expcont** check
DAQ status is "Running"
Dispatch-Delivery status is "Started"
PnF status is "running"
SPADE is RUNNING.
The watchdog and data-taking tasks are running
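
If you capture this output in a file or a log, the subsystem states can be pulled out with a few lines of Python. This is a minimal sketch, assuming the line formats are exactly those in the sample above; the names STATUS_LINE and parse_check_output are hypothetical and are not part of anvil. Note that the subsystems do not agree on capitalisation ("Running", "running", RUNNING), so the sketch lower-cases every state.

```python
import re

# Matches lines such as:  DAQ status is "Running"   or   SPADE is RUNNING.
STATUS_LINE = re.compile(r'^(?P<name>\S+?)(?: status)? is "?(?P<state>\w+)"?\.?$')


def parse_check_output(text):
    """Return {subsystem: lower-cased state} for each recognised line."""
    statuses = {}
    for line in text.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            statuses[match.group("name")] = match.group("state").lower()
    return statuses


SAMPLE = '''DAQ status is "Running"
Dispatch-Delivery status is "Started"
PnF status is "running"
SPADE is RUNNING.
The watchdog and data-taking tasks are running'''

print(parse_check_output(SAMPLE))
```

The watchdog line is deliberately not matched, since it reports on tasks rather than on a named subsystem.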

For complete details on the check command use the help facility.

Anvil@spts-expcont** help check

Managing the JBoss Servers by Hand

As noted above, there may be times when it helps to manage the detector directly rather than delegate that responsibility to the watchdog task. In those cases you will need to manage the subsystems of the detector by hand using anvil. Of course, before you start any of those actions you should make sure that the watchdog task is not running; otherwise it will be executing commands at the same time as you are, which will lead to confusion at best!

Each subsystem that makes up the IceCube detector runs in one or more JBoss servers. These servers provide the standard execution environment for all of our Java software. Each JBoss server is identified by a cluster-independent label called its "location". All operations affecting a JBoss server are specified in terms of its location. In the current organization of our clusters the location of a JBoss server can be derived simply by removing the cluster's prefix from its node name. For example, the spts-expcont node hosts the JBoss server whose location is expcont.
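
The node-name rule can be written down as a small Python helper. This is a sketch only: location_from_node is a hypothetical name, and it assumes that the cluster prefix is joined to the location by a single hyphen, as in spts-expcont.

```python
def location_from_node(node_name, cluster_prefix):
    """Strip '<cluster_prefix>-' from a node name to get the JBoss location."""
    prefix = cluster_prefix + "-"
    if not node_name.startswith(prefix):
        raise ValueError("%r does not belong to cluster %r" % (node_name, cluster_prefix))
    return node_name[len(prefix):]


print(location_from_node("spts-expcont", "spts"))
```

Passing the cluster prefix explicitly, rather than splitting on the first hyphen, avoids guessing wrongly should a prefix or a location ever contain a hyphen itself.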

Under normal running the watchdog task makes sure the necessary JBoss servers are running. There are, however, circumstances where you may want to manage the servers yourself; in these circumstances you will want to make sure that the watchdog is not running (which you can do using the check acme command). To check the state of all of the main JBoss servers on a cluster you can simply use the following command.

Anvil@spts-expcont** check -j
JBoss is running on expcont
JBoss is running on evbuilder
JBoss is running on fpmaster
JBoss is running on stringproc01
JBoss is running on ichub21
JBoss is running on icetop01
JBoss is running on ithub01

To check the state of a particular JBoss server you simply need to specify its location as an argument to the command.

Anvil@spts-expcont** check -j evbuilder
JBoss is running on evbuilder
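
When there are many servers it can help to compare the check -j output against the list of locations you expect. The sketch below is an assumption-laden illustration: the function names are hypothetical, and it recognises only the "JBoss is running on ..." wording shown above, because the message printed for a server that is not running does not appear in this document.

```python
import re

RUNNING_LINE = re.compile(r"^JBoss is running on (\S+)$")


def running_locations(text):
    """List the locations reported as running, in the order printed."""
    found = []
    for line in text.splitlines():
        match = RUNNING_LINE.match(line.strip())
        if match:
            found.append(match.group(1))
    return found


def missing_locations(expected, text):
    """Return the expected locations that were not reported as running."""
    running = set(running_locations(text))
    return [location for location in expected if location not in running]


SAMPLE = """JBoss is running on expcont
JBoss is running on evbuilder
JBoss is running on fpmaster"""

print(missing_locations(["expcont", "evbuilder", "fpmaster", "stringproc01"], SAMPLE))
```

Here stringproc01 is reported as missing simply because no "running" line mentions it; the sketch says nothing about why.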

There are situations where the JBoss servers get into strange states (this is usually caused by a breakdown in the management communications layer), in which case they need to be stopped and restarted so that they are in a known state. The nojboss and jboss commands, respectively, do these tasks. Thus, to cycle the main JBoss servers you can use the following commands. (You should note that these commands do take some time to complete, as they wait for the shutdowns and startups to finish before returning the prompt.)

Anvil@spts-expcont** nojboss
Shutting down JBoss servers at the following locations:
    expcont
    evbuilder
    fpmaster
    stringproc01
    ichub21
    icetop01
    ithub01
JBoss stopping on expcont
JBoss stopping on evbuilder
JBoss stopping on fpmaster
JBoss stopping on stringproc01
JBoss stopping on ichub21
JBoss stopping on icetop01
JBoss stopping on ithub01
Anvil@spts-expcont** jboss
Starting JBoss servers at the following locations
    expcont
    evbuilder
    fpmaster
    stringproc01
    ichub21
    icetop01
    ithub01
JBoss starting on expcont
JBoss starting on evbuilder
JBoss starting on fpmaster
JBoss starting on stringproc01
JBoss starting on ichub21
JBoss starting on icetop01
JBoss starting on ithub01

Running the Data-Taking Task

If you have worked your way through the previous section, the detector currently has all of its JBoss servers running and in a known state, but it will not be taking data. This section reviews how you can start and stop data-taking by hand.

To be completed.

Managing Data-Taking by Hand

As with the watchdog task, there may be occasions where you do not want to delegate the task of data-taking to the standard task, but wish to manage the whole process by hand. This section reviews the basic steps you have to go through to do that.

Starting Data-Taking

To begin this review, it helps to have the JBoss servers in a known state; therefore, if you have not already done so, you should cycle the main JBoss servers as discussed in "Managing the JBoss Servers by Hand". This leaves the dispatch-delivery subsystem shut down, so before you start up DAQ you should make sure it has somewhere to send its data. The following commands start up the dispatch-delivery subsystem and check that it has started properly.

Anvil@spts-expcont** startup dd
Anvil@spts-expcont** check dd
Dispatch-Delivery status is "Started"

You are now ready to start walking the DAQ through its paces. This starts with discovering which components are available for the next run, using the following commands.

Anvil@spts-expcont** signal DiscoverSig
Anvil@spts-expcont** check daq
DAQ status is "Idle"
Elements failed Check: daq

(The "Elements failed Check" line means that DAQ is not running, which is the correct response for the current state.)

You can list the discovered components and their states using the daq command.

Anvil@spts-expcont** daq
 iceTopDataHandler_1                 Idle
  stringProcessor_21                 Idle
         snBuilder_0                 Idle
       tcalBuilder_0                 Idle
     globalTrigger_0                 Idle
      inIceTrigger_0                 Idle
      eventBuilder_0                 Idle
    monitorBuilder_0                 Idle
           domHub_81                 Idle
           domHub_21                 Idle
     iceTopTrigger_0                 Idle

With the DAQ in the Idle state you can now connect its output to the dispatch-delivery input with the following command.

Anvil@spts-expcont** connect daq dd

The next step in the process is configuring the DAQ by specifying the configuration Id and trigger files to use and moving the components into a configured state.

Anvil@spts-expcont** configure -i 204
Configuration 204:"daqSystem: DAQ system configuration to be used for LC with SN data taking on SPTS"
Anvil@spts-expcont** trigger -f /usr/local/icecube/scripts/configs/spts-triggers.xml
Anvil@spts-expcont** signal -w ConfigSig
Anvil@spts-expcont** daq
 iceTopDataHandler_1                Ready
  stringProcessor_21                Ready
         snBuilder_0                Ready
       tcalBuilder_0                Ready
     globalTrigger_0                Ready
      inIceTrigger_0                Ready
      eventBuilder_0                Ready
    monitorBuilder_0                Ready
           domHub_81   ChannelsConfigured
           domHub_21   ChannelsConfigured
     iceTopTrigger_0                Ready
Anvil@spts-expcont** signal -w ConfigDOMsSig
Anvil@spts-expcont** daq
 iceTopDataHandler_1                Ready
  stringProcessor_21                Ready
         snBuilder_0                Ready
       tcalBuilder_0                Ready
     globalTrigger_0                Ready
      inIceTrigger_0                Ready
      eventBuilder_0                Ready
    monitorBuilder_0                Ready
           domHub_81                Ready
           domHub_21                Ready
     iceTopTrigger_0                Ready

The -w option of the signal command waits for the transition to complete before returning the prompt.
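
The component listings printed by the daq command are easy to check by eye on a test system, but harder on a full detector. The following Python sketch shows one way to find components that have not yet reached a given state. The function names are hypothetical, and the sketch assumes the two-column layout shown above, with component names of the form name_number.

```python
import re

# A component line looks like:  "           domHub_81   ChannelsConfigured"
COMPONENT_LINE = re.compile(r"^\s*(\w+_\d+)\s+(\w+)\s*$")


def parse_daq_listing(text):
    """Return {component: state}; prompt lines and other text are ignored."""
    states = {}
    for line in text.splitlines():
        match = COMPONENT_LINE.match(line)
        if match:
            states[match.group(1)] = match.group(2)
    return states


def not_in_state(states, wanted):
    """Return the sorted names of components whose state is not `wanted`."""
    return sorted(name for name, state in states.items() if state != wanted)


# Part of the listing printed after "signal -w ConfigSig" above.
SAMPLE = """Anvil@spts-expcont** daq
 iceTopDataHandler_1                Ready
  stringProcessor_21                Ready
           domHub_81   ChannelsConfigured
           domHub_21   ChannelsConfigured
     iceTopTrigger_0                Ready"""

print(not_in_state(parse_daq_listing(SAMPLE), "Ready"))
```

Applied to the listing taken after ConfigSig, this picks out the two domHub components, which only reach Ready after the ConfigDOMsSig signal.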

The final step before starting data-taking is to get ACME to tell DAQ what the run number of the new run will be.

Anvil@spts-expcont** number -u
Current run number is 1665

You are now ready to start data-taking.

Anvil@spts-expcont** signal -w StartSig
Anvil@spts-expcont** daq
 iceTopDataHandler_1              Running
  stringProcessor_21              Running
         snBuilder_0              Running
       tcalBuilder_0              Running
     globalTrigger_0              Running
      inIceTrigger_0              Running
      eventBuilder_0              Running
    monitorBuilder_0              Running
           domHub_81              Running
           domHub_21              Running
     iceTopTrigger_0              Running
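
As a recap, the manual start-up walked through in this section can be written out as an ordered checklist. The Python sketch below only builds and prints the anvil commands in the order used above; it does not run anvil, and the function name start_sequence is hypothetical. The configuration Id and trigger file in the example call are the SPTS values from the transcripts and will differ on other clusters.

```python
def start_sequence(config_id, trigger_file):
    """Return the anvil commands for a manual start, in the order used above."""
    return [
        "startup dd",                       # give DAQ somewhere to send its data
        "check dd",
        "signal DiscoverSig",               # discover the available components
        "check daq",                        # expect "Idle" at this point
        "connect daq dd",
        "configure -i %s" % config_id,
        "trigger -f %s" % trigger_file,
        "signal -w ConfigSig",
        "signal -w ConfigDOMsSig",
        "number -u",                        # set the run number of the new run
        "signal -w StartSig",
    ]


for command in start_sequence(204, "/usr/local/icecube/scripts/configs/spts-triggers.xml"):
    print(command)
```

Running the daq command between the signal steps, as in the transcripts, remains the way to confirm that each transition has actually happened.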

Stopping Data-Taking

Stopping data-taking is much more straightforward than starting it, as no configuration is needed. The following command can be used to stop data-taking.

Anvil@spts-expcont** signal -w StopSig
Anvil@spts-expcont** daq
 iceTopDataHandler_1                Ready
  stringProcessor_21                Ready
         snBuilder_0                Ready
       tcalBuilder_0                Ready
     globalTrigger_0                Ready
      inIceTrigger_0                Ready
      eventBuilder_0                Ready
    monitorBuilder_0                Ready
           domHub_81                Ready
           domHub_21                Ready
     iceTopTrigger_0                Ready

If you are going to restart data-taking using the same configuration information then you are done. If, however, you want the DAQ to pick up new configuration information, you need to move it back into the Idle state with the following command.

Anvil@spts-expcont** signal -w IdleSig
Anvil@spts-expcont** daq
 iceTopDataHandler_1                 Idle
  stringProcessor_21                 Idle
         snBuilder_0                 Idle
       tcalBuilder_0                 Idle
     globalTrigger_0                 Idle
      inIceTrigger_0                 Idle
      eventBuilder_0                 Idle
    monitorBuilder_0                 Idle
           domHub_81                 Idle
           domHub_21                 Idle
     iceTopTrigger_0                 Idle
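
The choice described in this section, stopping only or stopping and returning to Idle, can be summarised in a short Python sketch. The function name stop_sequence is hypothetical; the commands and the state each one should leave the components in are taken from the transcripts above.

```python
def stop_sequence(reconfigure):
    """Return (anvil command, expected component state) pairs for stopping.

    Set `reconfigure` to True when the next run should pick up new
    configuration information, which requires going back to Idle.
    """
    steps = [("signal -w StopSig", "Ready")]
    if reconfigure:
        steps.append(("signal -w IdleSig", "Idle"))
    return steps


for command, expected in stop_sequence(reconfigure=True):
    print("%-20s -> all components %s" % (command, expected))
```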