Updated: 24-SEP-2003
ALPSHAD06_071 Volume Shadowing ECO Summary
Copyright (c) Compaq Computer Corporation 2000, 2001. All rights reserved.
New Kit Date : 27-JAN-2000
Modification Date: 12-JUN-2001
Modification Type: This kit was made available again after the
ALPSHAD07_071 Kit was permanently put on hold.
********************** NOTES ***************************
* *
* The fixes in this ECO kit are already included in *
* OpenVMS Alpha V7.1-2. This ECO *WILL ONLY* install *
* on OpenVMS Alpha V7.1, V7.1-1H1 and V7.1-1H2. *
* *
* If a system is running volume shadowing, this kit *
* must be installed after the installation of the *
* ALPBASEx_071 kit in order to prevent potential *
* problems with writing crashdumps to shadowed disks *
* and Boot driver initialization failure errors after *
* running shutdown. *
* *
**********************************************************
PRODUCT: Volume Shadowing for OpenVMS (Phase II)
OP/SYS: Compaq OpenVMS Alpha
COMPONENT: Shadow Driver
Shadow Server
SOURCE: Compaq Computer Corporation
ECO INFORMATION:
ECO Kit Name: ALPSHAD06_071
ECO Kits Superseded by This ECO Kit: ALPSHAD05_071
ALPSHAD04_071
ALPSHAD03_071
ALPSHAD02_071
ALPSHAD01_071
ALPSYS01_071
ECO Kit Approximate Size: 2502 Blocks
Kit Applies To: OpenVMS Alpha V7.1, V7.1-1H1, V7.1-1H2
System/Cluster Reboot Necessary: Yes
Rolling Reboot Supported: Yes
Installation Rating: INSTALL_2
2 - To be installed on all systems running
the listed version(s) of OpenVMS and
using the following feature(s):
Volume Shadowing
Kit Dependencies:
The following remedial kit(s) must be installed BEFORE
installation of this kit:
ALPBASE02_071
In order to receive all the corrections listed in this
kit, the following remedial kits should also be installed:
None
********************* SUPERSESSION NOTE ********************
* *
* Customers who have installed the ALPSHAD04_071 remedial *
* kit may wish to override the requirement that the *
* ALPBASE02_071 kit be installed prior to installing the *
* ALPSHAD05_071 kit. This can be done by performing the *
* following steps in place of a normal kit installation: *
* *
* 1. Verify that you have previously installed the *
* ALPSHAD04_071 remedial kit. *
* 2. Copy the ALPSHAD05_071 remedial kit to a directory *
* on your system. *
* 3. Restore the images in the ALPSHAD05_071 kit to the *
* system with the following two commands: *
* *
* $ BACKUP/SELECT=SHADOW_SERVER.EXE ALPSHAD05_071.A/SA - *
* SYS$COMMON:[SYSEXE] *
* $ BACKUP/SELECT=SYS$SHDRIVER.EXE ALPSHAD05_071.A/SA - *
* SYS$COMMON:[SYS$LDR] *
* *
* Make sure that the images just restored have the latest *
* version numbers. *
* *
* The new images will not take effect until the system is *
* rebooted. If you have other nodes in your VMS cluster, *
* they must also be rebooted in order to make use of the *
* new image(s). If it is not possible or convenient to *
* reboot the entire cluster, a rolling re-boot may be *
* performed. *
* *
* If you have not previously installed the ALPSHAD04_071 *
* remedial kit then in order to install the ALPSHAD05_071 *
* remedial kit you must first install the ALPBASE02_071 *
* remedial kit. *
* *
**************************************************************
ECO KIT SUMMARY:
An ECO kit exists for Volume Shadowing on OpenVMS Alpha V7.1 through
V7.1-1H2.
Problems Addressed in the ALPSHAD06_071 Kit:
o Not all status data for all members of the shadowset is
displayed.
Images Affected:
- [SYS$LDR]SYS$SHDRIVER.EXE
- [SYSEXE]SHADOW_SERVER.EXE
o The system can crash with a SHADDETINCON bugcheck.
Images Affected:
- [SYS$LDR]SYS$SHDRIVER.EXE
- [SYSEXE]SHADOW_SERVER.EXE
o The system crashes when an I/O operation incurs a SS$_DATACHECK
error during a shadowset copy operation.
Images Affected:
- [SYS$LDR]SYS$SHDRIVER.EXE
- [SYSEXE]SHADOW_SERVER.EXE
o When a copy operation that interrupts a merge operation is
terminating, it finds that there are no members marked for the
merge and the thread crashes the system with a SHADDETINCON bug
check.
Images Affected:
- [SYS$LDR]SYS$SHDRIVER.EXE
- [SYSEXE]SHADOW_SERVER.EXE
o SHOW DEVICES shows zero percent merged status although the
shadow set status does not indicate that a merge is required.
Images Affected:
- [SYS$LDR]SYS$SHDRIVER.EXE
- [SYSEXE]SHADOW_SERVER.EXE
o Bit 16 in SHADOW_SYS_DISK can be set by the user to eliminate
using remote members of the shadowset for reads. Occasionally,
use of bit 16 fails to eliminate remote members from being
used.
Images Affected:
- [SYS$LDR]SYS$SHDRIVER.EXE
- [SYSEXE]SHADOW_SERVER.EXE
o A CPUSPINWAIT bug check can occur if the read of the SCB of a
shadow set member fails the checksum test.
Images Affected:
- [SYS$LDR]SYS$SHDRIVER.EXE
- [SYSEXE]SHADOW_SERVER.EXE
o DCD (Disk Copy Data) is not always initiated properly.
During an assisted copy operation, if the source member is
dismounted or otherwise removed from the shadow set, the
connection to the controller is not cleaned up correctly.
Images Affected:
- [SYS$LDR]SYS$SHDRIVER.EXE
- [SYSEXE]SHADOW_SERVER.EXE
o A full copy operation that is interrupted for a mini-merge will
not complete the full copy operation correctly.
Images Affected:
- [SYS$LDR]SYS$SHDRIVER.EXE
- [SYSEXE]SHADOW_SERVER.EXE
o Typing incorrect commands results in a system crash.
Images Affected:
- [SYS$LDR]SYS$SHDRIVER.EXE
- [SYSEXE]SHADOW_SERVER.EXE
o When two disks are added to a shadowset in the same mount
command, the copies are done sequentially instead of in
parallel. This causes the copies to take twice as long as they
should.
Images Affected:
- [SYS$LDR]SYS$SHDRIVER.EXE
- [SYSEXE]SHADOW_SERVER.EXE
Problems Addressed in the ALPSHAD05_071 Kit:
o Functionality was added to enable customers to shadow devices
that report an identical number of "Total Blocks".
o Faster I/O subsystems, for example the HSZ50 and the HSZ70,
were taking longer to perform full merges than some older and
slower subsystems.
Changes were made to allow the System Manager to adjust
thresholds. Two new logical names can be defined to vary the
merge multiplication factor used for a virtual unit, on a per
node basis.
The logicals used must be defined in the system table and
therefore should be defined on each node in the cluster. The
valid range for a threshold is 100 to 1000. Any value outside
of this range causes the factor to default to 200. This value
of 200 is displayed at the start of a shadow set merge, in the
'%SHADOW_SERVER-I-SSRVINIMRG' message, following the word
'Factor'.
CAUTION:
Increasing the values excessively may cause application
performance problems when merges are occurring. When setting
values, System Managers must balance the site specific
application needs with their merge requirements.
Because the two logical names are evaluated every one thousand
I/Os, the factor can be adjusted while a merge is in progress.
The first logical name is:
SHAD$MERGE_DELAY_FACTOR_DSAnnnn
This logical name is virtual unit specific, with 'nnnn'
representing the virtual unit number. This delay factor is
applied to that virtual unit only. If any important disks
need to be merged with minimal disruption, values as high
as 1000 (1,000%; threshold = 10 times the best time) may be
defined. By the same token, if a particular disk's merge
operation is interfering with application I/O, the merge can
be made to delay more frequently by reducing the value to as
low as 100 (threshold = 1 times the best time).
If the above logical is not defined, then the following
logical is evaluated:
SHAD$MERGE_DELAY_FACTOR
Like the virtual unit specific logical, this value will adjust
the threshold, but only for all shadow sets that do not have a
virtual unit specific logical defined.
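The lookup order and range check described above can be sketched
in Python. This is an illustration only, not the driver's code.
The function names, the dictionary standing in for the system
logical name table, the unpadded unit number in the logical name,
and the fallback to 200 when neither logical is defined are all
assumptions made for the example.

```python
from typing import Mapping

MIN_FACTOR = 100       # lowest valid value given in the text
MAX_FACTOR = 1000      # highest valid value given in the text
DEFAULT_FACTOR = 200   # value used when a setting is out of range


def merge_delay_factor(logicals: Mapping[str, str], unit: int) -> int:
    """Pick the merge delay factor for virtual unit DSA<unit>.

    `logicals` is a hypothetical stand-in for the system logical
    name table. The unit number is assumed to appear unpadded.
    """
    raw = logicals.get(f"SHAD$MERGE_DELAY_FACTOR_DSA{unit}")
    if raw is None:
        # Unit-specific logical not defined: try the general one.
        raw = logicals.get("SHAD$MERGE_DELAY_FACTOR")
    if raw is None:
        # ASSUMPTION: the text does not state what applies when
        # neither logical is defined; 200 is used here.
        return DEFAULT_FACTOR
    try:
        value = int(raw)
    except ValueError:
        # ASSUMPTION: a non-numeric value is treated as out of range.
        return DEFAULT_FACTOR
    if MIN_FACTOR <= value <= MAX_FACTOR:
        return value
    return DEFAULT_FACTOR


def threshold_multiple(factor: int) -> float:
    """Express a factor as a multiple of the best time (100 is 1x)."""
    return factor / 100


table = {
    "SHAD$MERGE_DELAY_FACTOR_DSA42": "1000",
    "SHAD$MERGE_DELAY_FACTOR": "400",
}
print(merge_delay_factor(table, 42), merge_delay_factor(table, 7))
```

In this sketch DSA42 uses its own value and every other virtual
unit falls back to the general logical name.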
o Additional tracing code was added to help diagnose why mini
merge operations were converted to full merge.
o If full merge operations are interrupted by a copy operation,
then write logging is enabled, which wastes cluster write
logging resources.
o If a VMScluster that has more than 96 nodes crashes, then write
logging is never used to recover the virtual unit. The result
is unnecessary full merge operations.
o If a shadow set exists on multiple nodes in a cluster and one
cluster member adds a device, which cannot be accessed by other
nodes in the cluster, then those nodes will crash with an
INVEXCEPTN in the SHDriver within SHSB$MATCH_MASTER_SCB.
o A Virtual Unit can hang and then no further use of the virtual
unit is possible. If the System Dump Analyzer (SDA) is used to
examine the virtual unit, then a negative value will be found
in UCB$W_RWAITCNT.
o Repeating mini merges or full merges can occur immediately
after the successful completion of a previous mini merge or
full merge on a virtual unit.
o During a system shutdown, two possible scenarios could occur:
1. Other nodes that have the system disk virtual unit MOUNTed
may suspend use of that virtual unit, until the node
running shutdown is stopped.
2. When a system disk that is disabled for write logging is
mounted on several nodes in a cluster, a non-system disk
volume access to that virtual unit in the cluster may
suspend, until the node running shutdown is stopped.
o During a system reboot, the rebooting node may intermittently
hang if write logging is concurrently enabled on the system
disk and on other nodes in the cluster.
o Since a virtual unit can be aborted for several reasons,
additional tracing is needed to differentiate why the virtual
units abort.
Problems Addressed in the ALPSHAD04_071 Kit:
o When shutting down a node in a VMScluster, the system that is
being used to perform the shutdown will crash.
o Shadowsets intermittently hang.
o A new informational message has been added that will result in
a Mount Verify message if the IO$_DIAGNOSE function is
executed by the SHDRIVER.
o Additional code changes to improve the error log reporting for
Volume Shadowing.
o The Volume Shadowing code in OpenVMS V7.1 (and V6.2 with the
CLUSIO kit installed) included a new algorithm that did not
always guarantee that read requests would be serviced by a
locally connected disk in preference to a disk that was MSCP
served by another OpenVMS system. Prior to V7.1 (and V6.2
with the CLUSIO kit installed), if there were local and MSCP
served disks to choose from, all read requests were always
queued to a local disk, unless the queue depth on the local
member exceeded twenty.
Some customers, especially those who shadow over FDDI, reported
that this new algorithm was not preferable, and therefore
requested the ability to choose the previous behavior.
The ability to prefer that read requests be performed by local
shadow set members, over those served by an OpenVMS system, has
been added to this version of the driver. To select that mode
of operation, another bit (bit 16) in SHADOW_SYS_DISK is used.
For example:
$ MC SYSGEN
SYSGEN> SHOW SHADOW_SYS_DISK
Parameter Name       Current    Default     Min.       Max.
--------------       -------    -------    -------    -------
SHADOW_SYS_DISK            1          0          0         -1
SYSGEN> SET SHADOW_SYS_DISK %X10001
SYSGEN> WRITE CURRENT
SYSGEN> WRITE ACTIVE
SYSGEN> EXIT
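The value %X10001 used in the SYSGEN example is the existing
setting (1) with bit 16 also set. The short Python sketch below
only illustrates that arithmetic. The constant and function names
are made up for the example and are not part of SYSGEN or the
shadowing driver.

```python
PREFER_LOCAL_READS_BIT = 16   # bit number taken from the text above


def with_bit_set(value: int, bit: int) -> int:
    """Return value with the given bit number set."""
    return value | (1 << bit)


current = 0x1   # "Current" column in the SYSGEN display above
new_value = with_bit_set(current, PREFER_LOCAL_READS_BIT)
print(f"%X{new_value:X}")   # prints %X10001
```

A site whose current SHADOW_SYS_DISK value differs from 1 would
combine bit 16 with that value in the same way, so that bits
already in use are preserved.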
Problems Addressed in the ALPSHAD03_071 Kit:
o A potential system crash with SHADDETINCON bugcheck at
SHDRIVER+12124 during boot from a multi-member shadow set.
This occurs if the booting member is not the first in the
member array, and the other member is not yet visible.
o SHADDETINCON bugchecks occur on multiple nodes in a VMScluster
during a merge operation.
System crash information
------------------------
Time of system crash: 13-APR-1997 13:21:05.59
Version of system: OpenVMS (TM) VAX Version V6.2
System Version Major ID/Minor ID: 1/0
VAXcluster node: CYV7KE, a VAX 7000-760
Crash CPU ID/Primary CPU ID: 00/00
Bitmask of CPUs active/available: 0000003F/0000003F
CPU 00 reason for Bugcheck: SHADDETINCON, SHADOWING detects inconsistent state
Process currently executing on this CPU: None
Current IPL: 8 (decimal)
CPU database address: C9212000
MPB address: B29B09C0
CPU 00 Processor stack
General registers:
R0 = 00000000 R1 = B67D258C R2 = B67D2180 R3 = B6544600
R4 = B35992C0 R5 = B624A340 R6 = B65447C8 R7 = 00000000
R8 = B67D2180 R9 = B6544730 R10 = 00000000 R11 = B6544600
AP = B65446B8 FP = 7FE2534C SP = C9213DAC PC = B82E42B3
PSL = 04080000
Processor registers:
P0BR = C9946800 SBR = 1EF80400 ASTLVL = 00000004
P0LR = 0000018B SLR = 003FFF00 SISR = 00000010
P1BR = C9216400 PCBB = 7F7B0020 ICCS = 00000000
P1LR = 001FF116 SCBB = 1EF5F000 SID = 17000201
LDEV = 00018002 LBER = 00000000 LCNR = 00000001
LCON0 = DF0007ED LCON1 = 00000000 TODR = 44D09B64
LBECR0 = 0040003A LBECR1 = 00008060 LMODE = 000332A4
LMERR = 00000000 BIU_STAT = F00E1070 BIU_ADDR = 00000298
MMESTS = 10004005 TBSTS = 800001D0 PCSTS = FFFFF800
ISP = C9213DAC
KSP = 7FFE7800
ESP = 7FFE9800
SSP = 7FFED800
USP = 7FE2534C
o System crashes occur in SHADDETINCON SYS$SHDRIVER+3D3C0.
Bugcheck Type: SHADDETINCON, SHA
Node: RBADC2 (Clustered)
CPU Type: AlphaServer 2100 4/233
VMS Version: V6.2-1H2
Current Process: NULL
Current Image: <not available>
Failing PC: FFFFFFFF 8025B3C0
Failing PS: 08000000 00000804
Module: SYS$SHDRIVER
Offset: 0003D3C0
Boot Time: 15-APR-1997 08:39:31.00
System Uptime: 5 22:23
Crash/Primary CPU: 00/00
Saved Processes: 22
Pagesize: 8 KByte (8192 bytes)
Physical Memory: 256 MByte (32768 PFNs)
Dumpfile Pagelets: 184518 blocks
Dump Flags: olddump,writecomp,errlogcomp,dump_style
EXE$GL_FLAGS: poolpging,init,bugdump
Stack Pointers:
KSP = FFFFFFFF 8A731D88 ESP = FFFFFFFF 8A733000 SSP = FFFFFFFF 8A72D000
USP = FFFFFFFF 8A72D000
General Registers:
R0 = 00000000 00000001 R1 = FFFFFFFF 8162F7E0 R2 = FFFFFFFF 8162F7C0
R3 = FFFFFFFF 8186EBC0 R4 = 00000000 00000003 R5 = FFFFFFFF 8162F890
R6 = FFFFFFFF 8186EE80 R7 = 00000000 00000000 R8 = FFFFFFFF 8162F7C0
R9 = FFFFFFFF 8186EDE8 R10 = 00000000 00000000 R11 = FFFFFFFF 8186EBC0
R12 = FFFFFFFF 8186ED38 R13 = FFFFFFFF 8710A270 R14 = FFFFFFFF 87084200
R15 = 00000000 003C60E0 R16 = 00000000 000008B4 R17 = 00000000 00000501
R18 = 00000000 00000000 R19 = FFFFFFFF 87084200 R20 = 00000000 00000000
R21 = FFFFFFFF 8162F808 R22 = FFFFFFFF 8710FB20 R23 = 00000000 00000000
R24 = 00000000 00000001 AI = 00000000 00000001 RA = FFFFFFFF 80288928
PV = FFFFFFFF 8710A698 R28 = 00000000 00000000 FP = FFFFFFFF 8A731DE0
PC = FFFFFFFF 8025B3C4 PS = 08000000 00000804
System Registers:
Page Table Base Register (PTBR) 00000000 00007FF8
Processor Base Register (PRBR) FFFFFFFF 8110A000
Privileged Context Block Base (PCBB) 00000000 0110A080
System Control Block Base (SCBB) 00000000 000001B3
Software Interrupt Summary Register (SISR) 00000000 00000000
Address Space Number (ASN) 00000000 00000000
AST Summary / AST Enable (ASTSR_ASTEN) 00000000 00000000
Floating-Point Enable (FEN) 00000000 00000000
Interrupt Priority Level (IPL) 00000000 00000008
Machine Check Error Summary (MCES) 00000000 00000000
Virtual Page Table Base Register (VPTB) 00000002 00000000
Failing Instruction:
SYS$SHDRIVER_NPRO+393C0: BUGCHK
Instruction Stream (last 20 instructions):
SYS$SHDRIVER_NPRO+39370: RET R31,(R28)
SYS$SHDRIVER_NPRO+39374: LDQ_U R31,(SP)
SYS$SHDRIVER_NPRO+39378: SUBQ SP,#X10,SP
SYS$SHDRIVER_NPRO+3937C: STQ R16,#X0008(SP)
SYS$SHDRIVER_NPRO+39380: STQ R17,(SP)
SYS$SHDRIVER_NPRO+39384: LDQ R17,#XF8E0(R13)
SYS$SHDRIVER_NPRO+39388: BIS R17,#X04,R17
SYS$SHDRIVER_NPRO+3938C: BIS R31,R17,R16
SYS$SHDRIVER_NPRO+39390: LDQ R17,(SP)
SYS$SHDRIVER_NPRO+39394: ADDQ SP,#X08,SP
SYS$SHDRIVER_NPRO+39398: BUGCHK
SYS$SHDRIVER_NPRO+3939C: HALT
SYS$SHDRIVER_NPRO+393A0: SUBQ SP,#X10,SP
SYS$SHDRIVER_NPRO+393A4: STQ R16,#X0008(SP)
SYS$SHDRIVER_NPRO+393A8: STQ R17,(SP)
SYS$SHDRIVER_NPRO+393AC: LDQ R17,#XF8E0(R13)
SYS$SHDRIVER_NPRO+393B0: BIS R17,#X04,R17
SYS$SHDRIVER_NPRO+393B4: BIS R31,R17,R16
SYS$SHDRIVER_NPRO+393B8: LDQ R17,(SP)
SYS$SHDRIVER_NPRO+393BC: ADDQ SP,#X08,SP
SYS$SHDRIVER_NPRO+393C0: BUGCHK
SYS$SHDRIVER_NPRO+393C4: HALT
SYS$SHDRIVER_NPRO+393C8: BIS R31,R31,R31
SYS$SHDRIVER_NPRO+393CC: BIS R31,R31,R31
SYS$SHDRIVER_NPRO+393D0: SUBQ SP,#X50,SP
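The dump above reports the same failing PC in two forms: as an
offset within the SYS$SHDRIVER module (0003D3C0) and as an offset
from SYS$SHDRIVER_NPRO (393C0). The Python sketch below only
works through that arithmetic with the values shown in the dump.
Treating each offset as a distance from a base address is an
assumption made for the illustration, and the names are
hypothetical.

```python
# Values copied from the dump above (low longword of the failing PC).
FAILING_PC = 0x8025B3C0
MODULE_OFFSET = 0x0003D3C0   # "Offset:" line for module SYS$SHDRIVER
NPRO_OFFSET = 0x000393C0     # from "SYS$SHDRIVER_NPRO+393C0"


def base_address(pc: int, offset: int) -> int:
    """Return the base address implied by a PC and its offset."""
    return pc - offset


module_base = base_address(FAILING_PC, MODULE_OFFSET)
npro_base = base_address(FAILING_PC, NPRO_OFFSET)

# Print the two implied bases and the distance between them in hex.
print(f"{module_base:08X} {npro_base:08X} {npro_base - module_base:X}")
```

Under that assumption, the two offsets describe the same
instruction measured from two different starting points.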
o The Volume Shadowing software that shipped in OpenVMS Alpha
and VAX V7.1 and in the CLUSIO remedial kits requires
additional non-paged pool to improve synchronization.
Customers should take this into account when they are tuning
their systems and be aware that Volume Shadowing is now
more sensitive to resource problems with the possibility
that systems may crash if non-paged pool is exhausted.
Shadowing uses approximately 800 bytes of additional non-paged
pool per concurrent IO to the virtual unit. This remedial kit
includes code which avoids system crashes if a system
exhausts non-paged pool.
Please be aware that there are still cases under which
non-paged pool exhaustion will result in a SHADDETINCON
bugcheck. This modification reduces the probability of such
crashes but does not completely eliminate them.
o During internal testing, a system crash occurred which
indicated that IOs were left outstanding in DUDRIVER
after a virtual unit had been removed.
o There was a missing index on a check for member valid
in the BBR_READ_RECOVERY routine.
o There was an "infinite" loop condition at SHCP$START_QUED,
and the code has been modified so that the persistent thread
will be "killed" if the VU it spawns fails.
o This remedial kit includes additional error logging
capabilities to collect additional information when
a virtual unit is made available.
The new LOG_IT macro code has the following input parameters:
o R0 - value of P4
o R1 - value of P5
o R2 - address of LW in SHAD containing P6
o R3 - VU UCB
o R5 - SHAD IRP address with:
- CDRP$L_BCNT = P1
- CDRP$L_MEDIA = P2
- CDRP$L_PID = P3
The implementation makes use of the following cells in the
errorlog record.
o EMB$W_SP_BOFF - set to %xBADE as TAG
o EMB$W_SP_FUNC - reason code
o EMB$L_SP_BCNT - LW for information
o EMB$L_SP_MEDIA - LW for information
o EMB$L_SP_RQPID - LW for information
o EMB$Q_SP_IOSB - 2 LW for information
o EMB$L_SP_CMDREF - LW for information
o A process may intermittently hang during dismount of a
shadow-set while waiting for completion of the QIOW in
DO_IO routine.
o A KRNLSTAKNV halt occurs during MOUNT/CLUSTER DSAx:
Bugcheck Type: CPUSANITY, CPU sanity timer expired
Node: AI84 (Clustered)
CPU Type: AlphaServer 8400 Model EV56/440
VMS Version: V6.2-1H3
Current Process: PM2SKZ
Current Image: DSA40:[ZENT410.][EXE]BUS.EXE
Failing PC: FFFFFFFF 8001F8D0
Failing PS: 18000000 00001604
Module: SYSTEM_PRIMITIVES_MIN
Offset: 0000B8D0
Boot Time: 26-JUN-1997 08:34:37.00
System Uptime: 1 00:46:34.07
Crash/Primary CPU: 01/00
Saved Processes: 26
Pagesize: 8 KByte (8192 bytes)
Physical Memory: 2048 MByte (262144 PFNs)
Dumpfile Pagelets: 999974 blocks
Dump Flags: writecomp,errlogcomp,dump_style
EXE$GL_FLAGS: poolpging,init,bugdump,pgflfrag
Stack Pointers:
KSP = 00000000 7FF91C98 ESP = 00000000 7FF96000 SSP = 00000000 7FF9C100
USP = 00000000 7EDE4030
General Registers:
R0 = 00000000 00000000 R1 = FFFFFFFF 814EA180 R2 = FFFFFFFF 81410000
R3 = FFFFFFFF 9DE268F8 R4 = 00000000 0000012C R5 = 00000000 7FF91D40
R6 = 00000000 7FF445A0 R7 = 08000000 00000200 R8 = FFFFFFFF F7710250
R9 = 00000000 00000030 R10 = 00000000 00000031 R11 = 00000000 00000001
R12 = 00000000 00008001 R13 = FFFFFFFF 9DE268F8 R14 = FFFFFFFF 9DE25640
R15 = FFFFFFFF 9DE04200 R16 = 00000000 00000774 R17 = 00000000 7FF91C38
R18 = FFFFFFFF 9DE32CE0 R19 = FFFFFFFF 9DE04200 R20 = 00000000 00000000
R21 = 00000000 272007F0 R22 = FFFFFFFF 9DE04200 R23 = 00000000 00000000
R24 = FFFFFFFF 9DE04AC0 AI = 00000000 00000000 RA = FFFFFFFF 00000000
PV = FFFFFFFF FFFFFFFF R28 = FFFFFFFF 8001F83C FP = 00000000 7FF91E10
PC = FFFFFFFF 8001F8D4 PS = 18000000 00001604
Failing Instruction:
EXE$HWCLKINT_C+00510: BUGCHK
o The system crashes when a second node attempts to boot a system
disk shadow set with two members. The following SHADDETINCON
bugcheck at SHDRIVER+12124 or SYS$SHDRIVER_NPRO+449B4 occurs:
SHADDETINCON, SHADOWING detects inconsistent state
o The mount of a shadow set fails. The failure report says that
the set is already mounted or that there is a duplicate unit
number.
o This kit provides a new SYS$BASE_IMAGE.EXE. The V7.1-1H1
limited hardware release also provides this image. Both
images contain support for all of the features in both
releases. Therefore, there are no dependencies on the order
of installations. ALPSHAD03_071 may be installed prior to or
following the installation of V7.1-1H1.
However, if ALPSHAD03_071 is installed after V7.1-1H1, you
will see a warning message regarding SYS$BASE_IMAGE.EXE.
You can safely ignore this message.
o SDA does not handle relocatable global (non-universal) symbols
correctly if they are in resident images.
o ALPSYS06_062 and ALPSHAD03_071 remedial kit
*** Notice ***
The SDA.EXE image included in this remedial kit will fix the
problems listed below if both the ALPSYS06_062 and ALPSHAD03_071
remedial kits are installed on the customer's system. Therefore,
in order to get the complete list of fixes, customers should
install both kits. However, either of these kits will run safely
without the other kit installed.
o SDA> SHOW POOL can take an excessive period of time.
o SHOW POOL gives NOSUCHPOOL errors unnecessarily.
o SHOW POOL/SUMMARY counts and space totals do not match.
o SHOW POOL <range> cannot always find the range.
o When minimum SYSTEM_PRIMITIVES is in use, SDA fails instead
of signaling the correct message.
o The symbol file is opened by SDA even when /OVERRIDE is
specified and the symbol file is not used.
o SDA can get into a loop printing blank lines.
o Some of BUGCHECK's messages are confusing.
o The Base SVA of buffer objects is only displayed as 32 bits.
o An incomplete dump is inaccessible by SDA. The changes in
this remedial kit will now treat DUMPINCOMPL as a warning if
this is a selective dump and the dump has progressed far
enough to dump the first process.
o SDA SHOW EXEC does not always display all execlets. READ/EXEC
does not read all the symbols.
o MODIFY DUMP does not work on the dump header and /CONFIRM
fails when the field being updated is a byte or a word and the
original value is negative.
o BUGCHECK's two public routines (EXE$BUGCHK_REMOVE_VA and
EXE$BUGCHK_CANCEL_REMOVE_VA) do not synchronize their
manipulations with spinlocks.
o BUGCHECK fails if the only process is the swapper.
o Handling of Halt/Restart crashes when the Halt HWPCB used
is faulty.
o SHOW DEV MC only allows /HOME, but it is documented as
/HOMEPAGE.
Problems Addressed in the ALPSHAD02_071 Kit:
o Systems with a shadowed system disk running OpenVMS Alpha V7.1
with ALPSHAD01_071 installed may not shut down properly and
crash dumps may be lost. The error message will be:
**** Boot driver initialization routine returned failure
**** Memory dump canceled. IOVector = 00000000, Flags = 02016874
This error occurs because there is a dependency between the
ALPSHAD01_071 SYS$SHDRIVER and EXCEPTION.EXE; however,
EXCEPTION.EXE was not distributed with the ALPSHAD01_071 kit.
This kit simply provides the correct EXCEPTION.EXE. The other
images are the same as were shipped in ALPSHAD01_071.
Problems Addressed in the ALPSYS01_071 Kit:
o A specific $UNWIND call does not transfer control to the correct
PC on OpenVMS Alpha V7.1.
Existing Problems Not Addressed in the ALPSHAD01_071 Kit:
o The following three MOUNT problems were discovered at a late
stage in the release of this kit. OpenVMS Engineering is
working on solutions to these problems which will be available
in a future MOUNT ECO kit.
If a user, either manually or by a command procedure, makes
one of the following errors, MOUNT may incorrectly add members
to existing shadow sets.
- A MOUNT/SHAD with an incorrect volume label will succeed
in adding the member to the shadow set, for example:
$ MOUNT/SYSTEM DSA1/SHAD=$4$DUA1 TST1
$! The shadow set DSA1 is now available with DUA1 as
$! the only member
$ MOUNT/SYSTEM DSA1/SHAD=$4$DUA5 TST5
$! The device $4$DUA5 is wrongly added as a full copy
$! target.
- Similarly, a MOUNT/SHAD with an incorrect volume label of
a shadow set that is MOUNTed elsewhere in the VMScluster
will succeed in adding the member to the set on the other
nodes in the VMScluster, but the MOUNT will fail on the
local node, for example:
NODE_1> $ MOUNT/SYSTEM DSA1/SHAD=$4$DUA1 TST1
NODE_1> $ ! The shadow set DSA1 is now available on NODE_1
NODE_2> $ MOUNT/SYSTEM DSA1/SHAD=$4$DUA5 TST5
NODE_2> $ ! The MOUNT correctly fails on NODE_2 with an
NODE_2> $ ! INCVOLLABEL error
NODE_1> $ ! However, the member $4$DUA5 is incorrectly added
NODE_1> $ ! to the set DSA1 as a full copy target.
- MOUNT will incorrectly allow a non-shareable MOUNT/SHADOW of a
disk that is already mounted on another node as "shareable" to
succeed. As a result, corruption of the disk(s) will occur,
for example:
NODE_1> $ MOUNT/SYSTEM DSA1/SHAD=$4$DUA1 TST1
NODE_1> $ ! The shadowset DSA1 is now available on NODE_1
NODE_2> $ MOUNT /NOSHARE DSA5/SHAD=$4$DUA1 TST1
NODE_2> $ ! The shadowset DSA5 is now incorrectly available
NODE_2> $ ! on NODE_2
NODE_1> $ ! The shadowset DSA1 is also available on NODE_1
Corruption of the disk will occur when write operations are
performed by either node.
Problems Addressed in the ALPSHAD01_071 ECO Kit:
o A SHADDETINCON BUGCHECK may occur in SHD_THREADS when an
attempt is made to terminate a thread that is still a
Significant Event.
o The Volume Shadowing driver delivered in OpenVMS V7.1 and
the V6.2 Cluster Compatibility kits (xxxCOMPAT_062) does not
contain the full solution for the 'Bad Block Repair' (BBR)
problem. As a result, a disk may not be expelled from a shadow
set when necessary.
o An incompatibility exists between the StorageWorks Host Based
RAID Software and the enhanced volume shadowing provided in
both OpenVMS V7.1 and in the Cluster Compatibility Kits
(xxxCOMPAT_062). Because of this incompatibility, RAID
software can no longer detect that a shadow set state change
has occurred.
o Write protecting a shadow set member which is being added to
an existing shadowset causes the virtual unit to hang.
o A system may crash with an INVEXCEPTN bugcheck in SHSB$SEND_MESSAGE
because the UCB address in R5 is zero. It may also crash in
IOC_STD$CVT_DEVNAM in IO_ROUTINES when an attempt is made to get
a DDB out of a UCB that is corrupt.
The problem occurs when the IRP$L_ARB field is not correctly set
up with the clone error index. In the SH$VP_DEV_DRVERR routine,
this byte is used as an index to fetch the longword UCB address
of the erring device; the byte is set to FF, which is incorrect.
The bad value occurs when volume processing begins to initiate
mount verification after a device error occurs.
o A shadowset may hang in mountverify for an extended period of
time after it encounters a DRAB_INT controller failure on an
HSJ50 which is followed by many 'forced error flagged in last
sector read' error messages on multiple shadowset member disks.
RELATED ARTICLES:
Detailed articles describing the problems listed above may exist in
the OPENVMS database. To view these articles, open the appropriate
product database and perform a query using either of the following
search strings: 'ALPSHAD' or 'ALPSHAD06_071'.
ECO KIT ORDERING INSTRUCTIONS:
If after an evaluation you wish to obtain this kit, request it
electronically using the appropriate Advanced Electronic Services
(AES) Service Tool. If you are not familiar with how to request
kits electronically, open the DIA, WIS or DSNLINK database and
review the article entitled:
[AES] How To Electronically Request ECO Kits Using Service Tools
INSTALLATION NOTES:
The images in this kit will not take effect until the system is
rebooted. If there are other nodes in the VMScluster, they must
also be rebooted in order to make use of the new image(s).
If it is not possible or convenient to reboot the entire cluster at
this time, a rolling re-boot may be performed.
==========================================================================
|                     Table of Kit Image Information                     |
+----------------------------+----------+-----------------+--------------+
|                            | Overall  | Image File      | Image Link   |
| Image Name                 | Checksum | Identification  | Date/Time    |
+----------------------------+----------+-----------------+--------------+
| SHADOW_SERVER.EXE          | FE880C3C | X-7             | 24-NOV-1999  |
|                            |          |                 | 15:39:07.81  |
+----------------------------+----------+-----------------+--------------+
| SYS$SHDRIVER.EXE           | F620ABFD | X-3             | 24-NOV-1999  |
|                            |          |                 | 15:39:17.43  |
+----------------------------+----------+-----------------+--------------+