Discussion:
The Insane Operator has struck again
Ron Hudson
2010-06-02 18:22:50 UTC
Hey All

My "Insane Operator"* has struck again. You know he is the Boss's son, so he
is un-fireable, but he somehow tripped all the UPSes to the OFF state, all at
once...

Do I have to do anything to recover from this strange situation? Or does MVS
handle it all for me?




*actually the windows host machine crash-rebooted...
Gerhard Postpischil
2010-06-02 19:13:36 UTC
Post by Ron Hudson
My "Insane Operator"* has struck again.. you know he is the Boss's son
so he is un-fireable but he somehow
tripped all the UPS's to the OFF state - all at once...
Do I have to do anything to recover from this strange situation? Or does
MVS handle it all for me?
MVS is fairly resilient, so generally it will recover. JES2 will
rerun any batch jobs that were running (which may not be what
you want). However, in really bad cases a write in progress can
cause data loss, or sometimes (ask me how I know) cross-linked
files that normal Windows cleanup cannot recover. The ground
rule is the same as on real iron - if you need it, make sure you
back it up. In Windows you can just copy the whole DASD
directory to another disk (I have a 150GB USB external for
that), and I have needed to recover any number of times.
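
For anyone doing that copy from a shell rather than a file manager, here is a minimal sketch of the whole-directory backup. The function name backup_dasd and both paths are made up for illustration, and it assumes Hercules has been shut down first so no write is in flight.

```bash
#!/bin/bash
# Sketch only: copy a Hercules DASD directory to a dated folder on a backup disk.
# backup_dasd and its arguments are hypothetical names, not part of Hercules.
backup_dasd() {
    local src=$1 dest_root=$2
    local dest
    dest="$dest_root/dasd-$(date +%Y%m%d-%H%M%S)"
    # refuse to run if the source directory is missing
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # -a keeps timestamps and permissions; "$src/." copies the directory contents
    cp -a "$src/." "$dest/" || return 1
    # print where the copy went so the caller can log it
    echo "$dest"
}
```

Something like `backup_dasd ~/hercules/dasd /mnt/usb-backup` would then leave a dated copy on the external disk; both paths are placeholders.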

Gerhard Postpischil
Bradford, VT
Ron Hudson
2010-06-02 19:29:04 UTC
On Wed, Jun 2, 2010 at 12:13 PM, Gerhard Postpischil wrote:
Post by Ron Hudson
My "Insane Operator"* has struck again.. you know he is the Boss's son
so he is un-fireable but he somehow
tripped all the UPS's to the OFF state - all at once...
Do I have to do anything to recover from this strange situation? Or does
MVS handle it all for me?
Is there an actual failure mode in a real installation that is like this?
One assumes that an important Asset such as a mainframe would have
redundant UPS's backed by Generators.
...In Windows you can just copy the whole DASD
directory to another disks ...
I tar the whole directory (so I get all the current states of the .conf
files)
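
A rough sketch of that tar approach, in case it helps. The function name archive_herc and the directory layout are assumptions for illustration; the only point is that archiving the parent directory picks up the .conf files along with the DASD images.

```bash
#!/bin/bash
# Sketch only: tar up a whole Hercules directory (DASD images plus .conf files).
# archive_herc and its arguments are hypothetical names.
archive_herc() {
    local src=$1 out_dir=$2
    local archive
    archive="$out_dir/herc-$(date +%Y%m%d-%H%M%S).tar.gz"
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$out_dir" || return 1
    # -C changes into the parent first so the archive holds relative paths
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # print the archive path so the caller can log or verify it
    echo "$archive"
}
```

Running `tar -tzf` on the resulting archive is a cheap way to confirm the .conf files made it in.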
Kevin Monceaux
2010-06-02 21:04:29 UTC
Is there an actual failure mode in a real installation that is like this?
One assumes that an important Asset such as a mainframe would have
redundant UPS's backed by Generators.
Did you have to bring back such memories? The shop I work in only has a
single UPS and non-redundant generators. The UPS has only failed once,
maybe twice, since I've been there. I've been in mainframe operations about
a dozen years or so. The time I definitely remember it failing I happened
to be the only one there when it happened. I was still relatively new. It
was around 20:00. I was back at the burster bursting checks and everything
in the computer room went dark for about five seconds. After recovering
from shock I started calling system programmers, etc., via the power failure
phone. Our phone controller hadn't recovered from the power failure yet.
All in all it made for an interesting night. I think when all was said and
done it turned out that a tech had accidentally left the UPS in bypass mode
after performing some routine maintenance on it.
--
Kevin
http://www.RawFedDogs.net
http://www.WacoAgilityGroup.org
Bruceville, TX

What's the definition of a legacy system? One that works!
Errare humanum est, ignoscere caninum.
Ron Hudson
2010-06-02 21:22:18 UTC
Post by Kevin Monceaux
Did you have to bring back such memories?
Sorry. In payment I offer a like story.

I was a VAX system operator, and I would go in on weekends because then I
could work on my own projects, as nobody was generally around. I was sitting
at my desk happily working away on a command history for VMS 3.0 (which
didn't have one) in DCL, when my terminal started getting messages from the
kernel: "ummm - I can't find the system volume...". I thought "That's not
good" and made my way to the computer room.

As I opened the door and walked up the ramp I noticed a brown haze hanging
just below the ceiling. The system volume was on one of those old
dishwasher-sized hard drives, and this one happened to have a clear lid. I
looked inside and was somewhat surprised to see a bright, shiny ring of bare
metal about halfway out between the drive hub and the edge of the platters.
The thing was still spinning, but the attention light was blinking and the
console DECwriter was printing over and over again
"unable to read the system volume".

Called my Boss. He was able to get a shiny new 400 MB drive (twice the size
of the original system volume) installed and working by Monday morning. Got
attaboys for being there to raise the alarm in time. No production downtime
resulted. I had just done Friday night's full backups too!

After that we had a system volume and a shadow volume that the system volume
was copied to each night by an automatic batch job. That made "oh help I just
deleted my group's most important file" fixes much easier.
Kevin Monceaux
2010-06-02 22:47:36 UTC
Post by Ron Hudson
I was a VAX system operator and I would go in on weekends because then I
could work on my own projects as nobody was generally around.
The first "real" computer, aka something larger than a PC, I got my hands on
was also a VAX. During my brief stay in college I worked as a volunteer
operator on a VAX 11/750. I think it was running VMS 5.3. It had a system
pack and a "student" pack. The system drive was an RM80, the student drive
was an RA60.
--
Kevin
http://www.RawFedDogs.net
http://www.WacoAgilityGroup.org
Bruceville, TX

What's the definition of a legacy system? One that works!
Errare humanum est, ignoscere caninum.
Ron Hudson
2010-06-03 00:24:59 UTC
As an operator I was responsible for care and feeding (and backing up) of 1
11/780, 3 11/750 and a batch of Data General machines. (at Calma, just
before they were dismantled by GE)
Gerhard Postpischil
2010-06-03 02:56:01 UTC
Is there an actual failure mode in a real installation that is like this?
One assumes that an important Asset such as a mainframe would have
redundant UPS's backed by Generators.
It depends on the installation. I worked at a number of service
bureaus and ISVs, and none had a UPS or a generator.

One of our competitors (with government contracts) had both a
UPS and a generator. Their system crashed after a technician had
serviced the UPS and hit a reset switch. Another time some
batteries sprang a leak, on Christmas day, and they were unable
to get service. The president and some VPs came in to shovel
sand on the sulfuric acid - they were down for three days.

OTOH, my PCs have a UPS each, but now I live in an area with
really flaky power.

Gerhard Postpischil
Bradford, VT
PeterH
2010-06-03 05:25:08 UTC
Post by Gerhard Postpischil
Is there an actual failure mode in a real installation that is like this?
One assumes that an important Asset such as a mainframe would have
redundant UPS's backed by Generators.
It depends on the installation. I worked at a number of service
bureaus and ISVs, and none had a UPS or a generator.
I worked at this Nation's largest municipal utility.

It was considered a "preferred" installation, much as police, fire
and hospitals, and we had no reason to employ a generator.

Our headquarters building, where all our mainframes were located, had
two independent 34.5 kV feeders from two different distribution
stations.

Even during the three major earthquakes which devastated portions of
L.A. during my employment there, we never had as much as a glitch at
my residence in the Brentwood District, nor at my place of employment
at 111 N. Hope St.

Nearly every mainframe installation had the same opportunity to
obtain "preferred" status, all it cost was $$$, but most did not
elect to do so.
k***@public.gmane.org
2010-06-03 08:57:07 UTC
even with 'preferred' status, and dual independent feeds, hospitals are
required to have enough internal generating capacity (and auto start/transfer
capacity within 30 seconds) to handle 'critical life safety' needs (for at
least 8 hours, IIRC). this would include power to run ventilators,
monitors, minimal lighting, and often at least one elevator.

And as several hurricanes (and flooding along the Miss/MO/OH system) have
demonstrated, it is NOT a good idea to have all of your generators in the
basement.

ck


k***@public.gmane.org
2010-06-03 09:03:52 UTC
oh, and even with that, where I had input, I spec'd at least a 250 VA class
APC UPS (with line conditioning and surge protection) for the computer
systems in my department.

ck


yvette hirth
2010-06-03 14:09:35 UTC
Post by k***@public.gmane.org
oh, and even with that, where I had input, I spec'd at least a 250 VA
class APC UPS (with line conditioning and surge protection) for the
computer systems in my department.
yes, APC SmartUPS 2200's (4) are my poweroff plan.

and speaking of backup:

before you restart after a power failure, or a "three-fingah salute
whoops forgot to shut down herc first", etc., run the cckdcdsk utility
on all of your packs:

cckdcdsk -3 -f <pack> <shadow(s), if it(they) exist>

that'll ensure you don't have any dasd i/o irregularities before you
restart and see a gazillion prompts that make for a real ugly ipl.

on my 3.8j setup, i run a script that tests for shadow file existence
and properly processes the dasd. looks like this:

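#!/bin/bash
# check one pack, plus its shadow file if that file exists
check_disk() {
    # quoting "$2" matters: an unquoted empty $2 makes [ -f ] succeed
    if [ -f "$2" ]
    then
        cckdcdsk -3 -f "$1" "$2"
    else
        cckdcdsk -3 -f "$1"
    fi
}
check_disk volum0_cckd.cuu volum0_01.sf
check_disk volum1_cckd.cuu volum1_01.sf
check_disk volum2_cckd.cuu volum2_01.sf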
...

every time i run it after a reboot where herc was still running, i see
"free space error, fixed". yes, herc should do it as well, but i think
it's better to find the errors before you restart, rather than during
the restart process.

of course this is a linux script (that may work for osx as well, i can't
say one way or the other). y'all could create a dos batch file that
does the same thing.
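
if you'd rather not list every pack by hand, a loop along these lines might do it. this is only a sketch: it assumes the naming used above (volum0_cckd.cuu paired with volum0_01.sf), the check_all_packs name and the CCKDCDSK override are made up for illustration, and it assumes cckdcdsk reports trouble through its exit status.

```bash
#!/bin/bash
# Sketch only: run cckdcdsk over every pack found in one directory.
# CCKDCDSK can be overridden (e.g. CCKDCDSK=echo) for a dry run.
CCKDCDSK=${CCKDCDSK:-cckdcdsk}

check_all_packs() {
    local dir=$1 pack shadow status=0
    for pack in "$dir"/*_cckd.*; do
        # skip the literal pattern when the glob matched nothing
        [ -f "$pack" ] || continue
        # assumed naming rule: volum0_cckd.cuu -> volum0_01.sf
        shadow="${pack%_cckd.*}_01.sf"
        if [ -f "$shadow" ]; then
            "$CCKDCDSK" -3 -f "$pack" "$shadow" || status=1
        else
            "$CCKDCDSK" -3 -f "$pack" || status=1
        fi
    done
    # nonzero if any check failed (assumes cckdcdsk sets its exit status)
    return "$status"
}
```

calling `check_all_packs /path/to/dasd` before starting herc would then cover whatever packs are in that directory; the path is a placeholder.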

hth
yvette hirth
