Copyright 1989-2016 by Kevin G. Barkes
All rights reserved.
This article may be duplicated or redistributed provided no alterations of any kind are made to this file.

This edition of DCL Dialogue is sponsored by Networking Dynamics, developers and marketers of productivity software for OpenVMS systems. Contact our website www.networkingdynamics.com to download free demos of our software and see how you will save time, money and raise productivity! Be sure to mention DCL Dialogue!

DCL DIALOGUE
Originally published February, 1989

What To Do Before The Lights Go Out
By Kevin G. Barkes

I was at a customer site, shooting the breeze with the system manager, when the lights flickered for a moment. The MicroVAX went brain-dead, then slowly began its inexorable climb to silicon consciousness.

The blood drained from the system manager's face, and she quickly took her phone off the hook. It was a rather ineffective action, since the first semi-hysterical user burst through the door less than a half-minute after the glitch hit.

"The accounts payable jobs were running," the user moaned. "We'll have to purge and resubmit them. And we have to get them done before 5, or they won't be ready for mailing tomorrow."

Having recovered her composure, the system manager assumed that peculiar unconcerned air that calms users and somewhat unsettles upper management. "No problem," she said. "As soon as the system's back up, I'll restart the queues and make certain the jobs are resubmitted in sequence."

She sighed, put the phone back on the hook, and calmly handled the angry calls as they came in. After about 10 minutes of issuing reassurances, she took her notes, plopped down in front of the console, and heaved a big sigh.

"I hate this. Now I have to manually go through all the queues, resubmit jobs, restart them, and make sure everything picks up from the point the system died."

I was a tad puzzled. "You mean your queues don't restart automatically when the system boots?" I asked.

"No," she replied, her voice barely audible over the console's noisy response to her SHOW/QUEUE/BATCH/ALL command. "We run lots of batch jobs that get their input from batch jobs which execute immediately before them. It's like dominoes. If the system dies, every batch job after the one killed by the crash will be useless. So we manually restart the queues after a reboot."

She grabbed the recovery procedures manual from a nearby shelf and dug in. I stood by silently, answering the phone and keeping the more abusive users at bay while she methodically resubmitted jobs and restarted queues. It took about 20 minutes.

When she was finished, the system manager jotted a note to herself to develop some DCL procedures to automate the recovery. "One of these days I'll get around to writing a COM file to handle this. Or better yet, talk the boss into buying a UPS system."

"That won't completely solve your problem," I pointed out. "Not all crashes are due to power outages. Why not just use the /RESTART qualifier to the SUBMIT command?"

"RESTART qualifier?"

"Yes. Since version 4 of VMS, DCL's had a nifty little feature that permits a batch job to pick up at some pre-designated point following a system failure. It's not a true restart; that is, it doesn't begin execution immediately at the point of failure. That's why a lot of people pooh-poohed it when the capability was originally announced, and it's not as widely used as it could be. Still, it's a big help and should eliminate all this manual stuff you have to go through."

The system manager was hooked. "How's it work?" she asked.

"All you have to do is add a few lines of code to your procedures and submit them with the /RESTART qualifier," I explained. "For example, take a look at this COM file," I said, pointing to Figure 1.

"Figure 1?" she asked.

"Slight temporal displacement," I replied.

FIGURE 1

$! RUNBILLS_DAILY.COM
$! This procedure executes the daily billing programs.
$ SET NOON
$ SET DEFAULT USER4:[BILLINGS.CURRENT]
$ RUN BILLING$PROGRAMS:PROCESS1
$ RUN BILLING$PROGRAMS:PROCESS2
$ RUN BILLING$PROGRAMS:PROCESS3
$ RUN BILLING$PROGRAMS:PROCESS4
$ EXIT

"Let's modify the procedure so it will work properly with RESTART," I said. Logging into a VT220 on top of the disk stack, I made a few quick hacks:

FIGURE 2

$! RUNBILLS_DAILY.COM
$! This procedure executes the daily billing programs.
$! Note the restart checkpoints.
$ SET NOON
$ SET DEFAULT USER4:[BILLINGS.CURRENT]
$ IF $RESTART THEN GOTO 'BATCH$RESTART'
$ SET RESTART_VALUE = PROCESS1
$ PROCESS1:
$ RUN BILLING$PROGRAMS:PROCESS1
$ SET RESTART_VALUE = PROCESS2
$ PROCESS2:
$ RUN BILLING$PROGRAMS:PROCESS2
$ SET RESTART_VALUE = PROCESS3
$ PROCESS3:
$ RUN BILLING$PROGRAMS:PROCESS3
$ SET RESTART_VALUE = PROCESS4
$ PROCESS4:
$ RUN BILLING$PROGRAMS:PROCESS4
$ EXIT

"What's the FIGURE 2 do?" she asked.

"Line noise," I replied, and continued, "Okay, we've submitted this job with the /RESTART qualifier on the SUBMIT command, and it starts executing. When it hits the line that tests $RESTART, it checks to see what that system-maintained symbol contains. If it's false, it means the job is executing for the first time. So it ignores the GOTO statement.

"The SET RESTART_VALUE = PROCESS1 command sets the value of the special system-maintained global symbol BATCH$RESTART to the string PROCESS1. The program PROCESS1 runs and completes. We set the value of BATCH$RESTART to PROCESS2 and begin executing the program PROCESS2. Now, let's say the system crashes.

"When it comes back up and the queue restarts, instead of aborting this batch job, the system will begin re-executing it. The value in the symbol $RESTART will be true, so it will branch to the label contained in the symbol BATCH$RESTART... specifically, PROCESS2:.
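
For readers following along at a terminal, the submission step described above might look something like the sketch below. It is only an illustration: the queue name SYS$BATCH is an assumption (it is the conventional default batch queue, but your site may use another), and the procedure is assumed to sit in the current default directory.

```dcl
$! Submit the billing procedure as a restartable batch job.
$! SYS$BATCH is an assumed queue name; substitute your own queue.
$ SUBMIT/RESTART/QUEUE=SYS$BATCH RUNBILLS_DAILY.COM
$!
$! List all jobs in that queue to confirm the entry is pending or executing.
$ SHOW QUEUE/ALL/FULL SYS$BATCH
```

Without /RESTART on the SUBMIT command, the checkpoint logic in the procedure has nothing to act on, because the job is not rerun when the queue comes back up.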
"Of course, in order to make this effective, your programs should be reasonably-sized and you should make certain you don't delete any files until you've passed the restart point where they may be needed. "This doesn't solve all the problems involved with recovering batch jobs from a system failure, but it is a major help. It's also handy when you need to stop a job and restart it on another queue. STOP/QUEUE/REQUEUE/ENTRY=nnn batch_queue does that." "What if I submit a job with the /RESTART qualifier without checking for $RESTART?" the system manager asked. "Then the command file starts at the first line in the command file and proceeds normally. If you submit your `permanently-resident' batch jobs this way, you never have to worry about resubmitting them, unless you ever totally cold-start your system," I explained. "Can you override the restart values?" she asked. "I think so," I replied, grabbing the DCL Dictionary from the shelf and turning to the definition for SET RESTART_VALUE. "Aha," I exclaimed. "There's a /NOCHECKPOINT qualifier to SET QUEUE/ENTRY that clears the BATCH$RESTART value." The system manager was effusive with thanks. "Anytime I can do anything for you, just let me know," she said. "Could you possibly arrange to get my retainer check out on time?" I asked, politely. "See Figure 3," she replied. "There is no Figure 3," I said. "Oh, that's the chart showing how late the checks will be because of this system crash. Too bad you missed our appointment yesterday." The lights flickered again. It was going to be a long month. ---------- Kevin G. Barkes is an independent consultant. He publishes the KGB Report newsletter, operates the www.kgbreport.com website, lurks on comp.os.vms, and can be reached at kgbarkes@gmail.com.