OSS, the ARM program's Operations Status System, is used to track the operational status at all ARM Climate Research Facility Sites through events that affect Systems and Components. Events are entered by each Site's operations team or by instrument mentors.
Help is mostly contextual and feature-oriented, though there is a general overview page.
What follows is FAQ-style documentation of how the NSA team uses OSS.
PLEASE NOTE: DRAFT, work in progress. Comments are welcome - please either send them to Chris Waigl or leave them right on this wiki, via the "discussion" page. You will have to create an account first.
- Add instruction for multiple move
- Add instructions for use of new "spare for system" field
- Add instructions for keywords
- Add FAQ regarding spares and spares locations
- Add "lifecycle" of a component
- Add ARRA tracking information
What "state" code do I choose for my instrument/component after entering an event?
Q: The instrument/state codes can be confusing. For a component that is down and needs to be repaired, should I choose "OUT-REPAIR" or "IN-NOT OK"?
A: As a general principle, no component that is currently installed in an instrument should be in an "OUT-" state without a very good reason (for example, when the state is only temporary). The state codes reflect a compromise that should work well enough for the operational teams of the various ARM facilities, though there certainly are debatable corner cases. In practical use, consistency is of higher value than precision in the description. The guideline is the [https://oss.arm.gov/state_legend.php OSS state legend], reproduced below.
COMMENTS REQUESTED For a broken component, the recommendation is to ask which of the following situations comes closer to the one at hand:
- Is the component still installed in the whole system? Are parts of the system still powered up and/or producing data? ==> Use "IN-NOT OK".
- Has the component been removed from the system? Has the whole system been brought down and parts of it been moved away from its place of operation? ==> Use "OUT-REPAIR".
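The rule of thumb above can be sketched as a small decision helper. This is an illustration only, not an OSS feature: the function name `suggest_state` and its parameter are hypothetical, and only the two state codes come from the guidance above.

```python
def suggest_state(still_installed: bool) -> str:
    """Suggest a state code for a broken component.

    Hypothetical helper for illustration; OSS offers no such function.
    It encodes the principle that an installed component should not
    normally be put into an "OUT-" state.
    """
    if still_installed:
        # Component is still part of the system, which may be partly
        # powered up and/or producing data.
        return "IN-NOT OK"
    # Component has been removed from the system for repair.
    return "OUT-REPAIR"


print(suggest_state(still_installed=True))
print(suggest_state(still_installed=False))
```

Corner cases (for example, a temporary "OUT-" state for an installed component) still need a human decision; the sketch only captures the default.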
How should I use Locations?
Q: When a component is sent away for repair or calibration, should I change the component's Location? Clearly, it is physically being relocated, but in many cases I see it is still listed under NSA C1 Barrow (for example)?
A: COMMENTS REQUESTED CW: There are various ways of dealing with this case and a consensus decision would be needed. The relevant aspects are the following:
- The Location choices belonging to the NSA are NSA C1 Barrow, NSA C1 Spares, NSA C2 Atqasuk, NSA C2 Spares. It is entirely unclear if NSA C2 Spares should be used any longer at all (probably not - everything in Atqasuk should be listed in the same Location to make it easier to find in OSS).
- NSA C1 Spares should ideally be filled with components and instruments that are ready to use as spares.
- "Location" should represent physical location. However, historically OSS hasn't been used this way: In many cases, only simple events that don't change Location (only Status) have been entered.
- A component that is physically removed should at a minimum be uninstalled from its system, and the Event description should note that it is being sent away. After that, there are three possible ways of going about the relocation:
- Leave it in NSA C1 Barrow. This could be appropriate for decommissioned components and instruments in the future (it hasn't often been used as such), to keep NSA C1 Spares free of decommissioned units. However, for the calibration and repair case, we often see
- Move it to NSA C1 Spares. This could be appropriate for instruments that are regularly sent to calibration and expected to enter Spares ultimately.
- Move it to the facility where it is physically sent. The risk of this is two-fold: a) it hasn't happened very often in the past and is therefore not consistent with past practice; b) it may be hard to track down the component again as it won't show up any longer in a search that covers the NSA Locations. An example of b) is when I want a list of all T/RH sensors belonging to NSA -- if two are at SGP I would have to know which serial numbers correspond to the NSA units.
- Another recommendation is to use "SHIPPED-CALIBRATION" and "SHIPPED-REPAIR" for these units (while keeping them attached to an NSA Location in OSS), to distinguish this from "OUT-REPAIR" and "OUT-CALIBRATION".
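The search pitfall described in b) can be illustrated with a toy inventory. Everything here is an assumption made for illustration: the `Component` record, the serial numbers, and the Location name "SGP" are invented, and this is not how OSS stores or queries its data. Only the four NSA Location names and the state codes are taken from the text above.

```python
from dataclasses import dataclass


@dataclass
class Component:
    serial: str      # invented serial numbers, for illustration only
    kind: str
    location: str
    state: str


# The Location choices belonging to the NSA, as listed above.
NSA_LOCATIONS = {"NSA C1 Barrow", "NSA C1 Spares", "NSA C2 Atqasuk", "NSA C2 Spares"}


def nsa_components(components, kind):
    """Return components of one kind whose Location is an NSA Location."""
    return [c for c in components if c.kind == kind and c.location in NSA_LOCATIONS]


inventory = [
    Component("TRH-001", "T/RH sensor", "NSA C1 Barrow", "IN-OK"),
    # Shipped for calibration but kept attached to an NSA Location:
    Component("TRH-002", "T/RH sensor", "NSA C1 Spares", "SHIPPED-CALIBRATION"),
    # Moved to the facility it was physically sent to ("SGP" is a made-up Location name):
    Component("TRH-003", "T/RH sensor", "SGP", "OUT-CALIBRATION"),
]

found = [c.serial for c in nsa_components(inventory, "T/RH sensor")]
print(found)  # TRH-003 is missing from the NSA search
```

The unit whose Location was changed to the receiving facility drops out of the NSA search, while the unit kept under an NSA Location with a "SHIPPED-" state stays visible.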
What are the different types of events?
Q: There seem to be different types of events. Does a "move" event mean that a component was shipped?
A: There are three types of events:
- Basic events change the state of a system (for example, an instrument) and, optionally, of some or all of its subsystems (for example, components). They do not change the location of the instrument or component and cannot be used to install a component into an instrument or remove it from one.
- "Move" events only apply to Components (not Systems) and are useful for a large number of actions on the Components. They are used to relocate a component (for example from the "live system" NSA C1 Barrow to the "spares store" NSA C1 Spares), install or uninstall it from a system, replace one component with another component (such as in an annual radiometer or T/RH sensor swap) or receive it from a different location after repair or calibration.
- "Calibration" events, available for both Instruments and Components, exist to track full calibration and calibration checks on-site.
How do I enter a basic event?
Q: What is the basic event entry form used for, where do I find it and how do I use it?
A: The form is at the bottom of a Component detail page. You can reach this page either via the Search tab on the Components tab or, for example, by narrowing down Sites > NSA > C1 (Barrow) > your Instrument > your Component.
- Reaching this form at the bottom of a System or Component detail page is the easiest way to enter a basic event.
- Such an event is only appropriate if the System or Component remains in place (is not moved, exchanged, uninstalled or installed). In practice, this happens only in simple outages, such as a power outage or intermittent outages that may lead up to the removal of the unit for repair at a later date.
- If the event is a calibration or calibration check, use the "calibrate" link. For anything else that moves, removes, swaps or un/installs the unit, use "move".
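The choice between the three forms can be summarized in a short sketch. The function `choose_form` and its parameters are hypothetical and exist only to restate the rules above; the returned strings correspond to the basic event form and the "calibrate" and "move" links.

```python
def choose_form(is_calibration: bool, unit_stays_in_place: bool) -> str:
    """Pick the OSS event form for a situation (illustrative helper only)."""
    if is_calibration:
        # Full calibrations and calibration checks use the "calibrate" link.
        return "calibrate"
    if unit_stays_in_place:
        # Simple outages: the unit is not moved, exchanged, installed or uninstalled.
        return "basic"
    # Anything that moves, removes, swaps or un/installs the unit.
    return "move"


print(choose_form(is_calibration=False, unit_stays_in_place=True))
print(choose_form(is_calibration=False, unit_stays_in_place=False))
print(choose_form(is_calibration=True, unit_stays_in_place=True))
```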
How do I relocate, install, uninstall or replace a Component?
A: Via the "move" event form.
The drop-down under "Step 1" offers context-appropriate choices. Each of them allows you to enter the subsequent component state, system state, and component location as well as to rename the component. It is important to give thought to which case applies. Not all of the following choices will be offered in every case; this depends on the initial state of the component:
- "Replace" is very powerful for a swap-out. It presents you with a choice of possible replacement units and, in the next step, form fields for both the "old" and the "new" unit.
- "Relocate" is the generic option. It may fail if the component has a parent system that would logically no longer apply after relocation. In that case, use "Uninstall" instead.
- "Uninstall" is similar to "Relocate", but removes the component from its parent system.
- "Install", similarly, attaches the component to a parent system.
- "Receive" is useful when a calibrated component arrives to be used as a spare unit.
Two possible pitfalls are:
- When replacing a unit with a spare unit, make sure the spare unit is correctly entered in OSS before using it.
- Make sure the System State you enter reflects reality. For example, for a routine swap-out carried out while the system is suffering a longer outage (IN-NOT OK) that is already entered in OSS, make sure your event doesn't accidentally leave the system in IN-OK when the outage is in reality still unresolved at the end of the sensor swap.
Where do I find the "move" event form?
Q: What is the quickest way to access the "move" event form?
A: Either via the link above the basic event form on a Component detail page (see above), or via the link next to the component on an Instrument detail page.
What kind of event for a data flow outage?
Q: Say my data flow for an instrument was out (irrecoverable) for a day. What kind of event should I create?
A: A straightforward outage would be represented by two basic events:
- First an event that changes the state of the affected units to, for example "IN-NOT OK".
- Then a second event that changes the state of the affected units back to, for example, "IN-OK".
If the instrument is represented by an OSS System (likely), then this would be two system-wide events. Make sure you check the affected components carefully. Especially when entering the second event, it is easy to make the mistake of setting faulty sub-components, which were already in a non-functional state before the event, to "IN-OK".
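The mistake described above, resetting already faulty sub-components to "IN-OK", can be made concrete with a small sketch. It models states as a plain dictionary; the unit names and both helper functions are hypothetical and do not correspond to anything in OSS itself.

```python
def begin_outage(states):
    """First event: set every unit that was IN-OK to IN-NOT OK."""
    return {unit: ("IN-NOT OK" if state == "IN-OK" else state)
            for unit, state in states.items()}


def end_outage(before, during):
    """Second event: restore only the units that were IN-OK before the outage."""
    return {unit: ("IN-OK" if before[unit] == "IN-OK" else state)
            for unit, state in during.items()}


# Hypothetical system: sensor B was already faulty before the outage.
before = {"instrument": "IN-OK", "sensor A": "IN-OK", "sensor B": "IN-NOT OK"}
during = begin_outage(before)
after = end_outage(before, during)

print(during)
print(after)
```

After the second event, sensor B is still "IN-NOT OK", which is the outcome to aim for when filling in the form by hand.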
How to represent a sensor swap?
Q: How to represent a sensor swap that involved powering down the instrument, swapping several sensors one by one, powering it back up?
A: This takes multiple events of different types:
- A first event to change the state of the instrument and of all affected components from "IN-OK" to "IN-NOT OK" (or "OUT-REPAIR"??)
- For each component, a Move>Replace event, whereby the component is moved to NSA C1 Spares (for example) and a component from Spares is brought in.
- Last event to change the instrument state back to "IN-OK"
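The sequence above can be written out as a checklist generator. This is only a planning aid under stated assumptions: `sensor_swap_plan`, the instrument name, and the serial numbers are invented for illustration, and the sketch does not talk to OSS.

```python
def sensor_swap_plan(instrument, swaps, spares_location="NSA C1 Spares"):
    """Return the ordered list of OSS events for a multi-sensor swap.

    swaps is a list of (old_component, new_component) pairs.
    """
    plan = [f"Basic event: set {instrument} and affected components to IN-NOT OK"]
    for old, new in swaps:
        # One Move > Replace event per swapped component.
        plan.append(f"Move > Replace: {old} goes to {spares_location}, {new} is installed")
    plan.append(f"Basic event: set {instrument} back to IN-OK")
    return plan


plan = sensor_swap_plan("instrument X", [("old-1", "new-1"), ("old-2", "new-2")])
for step in plan:
    print(step)
```

For two swapped sensors the plan has four entries: one basic event at each end and one Move > Replace event per sensor in between.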
When replacing components, the component I wanted to swap in was not available. What should I do in this case?
Q: I tried to follow the instructions for replacing components, but the component I wanted to swap in was not in NSA C1 Spares and had not been updated for a while. What should I do in this case?
A: We are still cleaning up mistakes and omissions from the past. Therefore, before even starting to enter an event in OSS, you need to check the initial state of all affected components. Historically, inventory management has not been complete and it is sometimes necessary to bring a spare component up to the correct initial state and event level before starting on the events you are interested in.