How do I repair a failed source vSnap in an IBM Spectrum Protect Plus environment?

Created by Jim Smith on Fri, 11/15/2019 - 12:07
Published URL:
https://www.ibm.com/support/pages/node/1107525

How To


Summary

The vSnap servers in an IBM Spectrum Protect Plus environment provide disk storage for protecting data through backup and replication processes. You can repair and replace a failed vSnap server that is configured in your IBM Spectrum Protect Plus environment to act as the source for backup and replication services. The source vSnap server must be repaired so that backup and replication services can resume.

Objective

Repair a source vSnap server that has failed. 

Steps

Before you begin

Important: It is assumed that all vSnap servers in the environment are protected by replication. If a vSnap server is not replicated and it fails, it cannot be recovered to a state that would allow it to continue as a disk storage source or target. In the absence of replication processes, you must create a new vSnap server and set up service level agreement (SLA) policies. When you run the policies, a new full backup process runs to the new vSnap server.

To determine which type of repair process is applicable to your vSnap server, see technote 1103847.

About this task

Important: Do not unregister or delete the failed vSnap server from IBM Spectrum Protect Plus. The failed vSnap server must remain registered for the replacement procedure to work correctly.
This procedure establishes a new source vSnap server in your IBM Spectrum Protect Plus environment to replace the failed source vSnap server. The new source vSnap server will contain only the most recent recovery points.
Note: The version of the new vSnap server must match the version of the deployed IBM Spectrum Protect Plus appliance.

Procedure

  1. Log in to the target vSnap server console with the ID serveradmin by using Secure Shell (SSH) protocol.
    Enter the following command: $ ssh serveradmin@MGMT_ADDRESS

    For example, $ ssh serveradmin@10.10.10.2

  2. Obtain the ID of the failed source vSnap server by opening a command prompt and entering the following command:

    $ vsnap partner show

    The output is similar to the following example:

    ID:  12345678901234567890123456789012 
    PARTNER TYPE: vsnap 
    MGMT ADDRESS: 10.10.10.1 
    API PORT: 8900 
    SSH PORT: 22
  3. Verify that the MGMT ADDRESS is the address of the failed source vSnap server. Take note of the failed source vSnap server's ID number.
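
    Optionally, you can capture the partner ID in a shell variable for use in later steps. The following is a minimal sketch, assuming the ID: field format shown in the step 2 output:

    $ PARTNER_ID=$(vsnap partner show | awk '$1 == "ID:" {print $2}')
    $ echo $PARTNER_ID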
  4. In the environment with the source vSnap server, install a new vSnap server of the same type and version, and with the same storage allocation, as the failed source vSnap server.
    For instructions about installing a vSnap server, see Installing a physical vSnap server.
    Important: Do not register the new vSnap server with IBM Spectrum Protect Plus. Do not use the Add Disk Storage wizard.
    1. Initialize the new vSnap server by entering the following command:

      $ vsnap system init --skip_pool --id partner_id

      For example, using the failed source vSnap server's partner ID: $ vsnap system init --skip_pool --id 12345678901234567890123456789012. A message indicates when the initialization is completed.

      Note: This command is different from the vSnap initialization command that is listed in IBM Knowledge Center and in the Blueprints.
  5. Complete the vSnap server and pool creation process as outlined in Chapter 5: vSnap Server Installation and Setup in the Blueprints.
  6. Place the new source vSnap server into maintenance mode by entering the following command:

    $ vsnap system maintenance begin

    Placing the vSnap server into maintenance mode suspends operations such as snapshot creation, data restore jobs, and replication operations.
  7. Initialize the new source vSnap server with the failed source vSnap server’s partner ID. Enter the following command:

    $ vsnap system init --id partner_id

    The following command is an example: $ vsnap system init --id 12345678901234567890123456789012

  8. On the new source vSnap server, add the partner vSnap servers. Each partner must be added separately. To add a partner, enter the following command:

    $ vsnap partner add --remote_addr remote_ip_address --local_addr local_ip_address

    where remote_ip_address specifies the IP address of the partner (target) vSnap server, and local_ip_address specifies the IP address of the new source vSnap server.

    The following command is an example:

    $ vsnap partner add --remote_addr 10.10.10.2 --local_addr 10.10.10.1

  9. When prompted, enter the user ID and password for the target vSnap server.
    Informational messages indicate when the partners are created and updated successfully.
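    To confirm that the partnership was created, you can run the partner listing command again, this time on the new source vSnap server; the target vSnap server's MGMT ADDRESS should appear in the output:

    $ vsnap partner show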
  10. Create a repair task on the new source vSnap server by entering the following command:

    $ vsnap repair create --async

    The output of this command is similar to the following example:

    ID: 12345678901234567890123456789012 
    PARTNER TYPE: vsnap 
    PARTNER ID: abcdef7890abcdef7890abcdef7890ab 
    TOTAL VOLUMES: N/A 
    SNAPSHOTS RESTORED: N/A 
    RETRY: No 
    CREATED: 2019-11-01 15:49:31 UTC 
    UPDATED: 2019-11-01 15:49:31 UTC 
    ENDED: N/A 
    STATUS: PENDING 
    MESSAGE: The repair has been scheduled
  11. Monitor the number of volumes that are involved in the repair operation by entering the following command:

    $ vsnap repair show

    The output of this command is similar to the following example:
    ID: 12345678901234567890123456789012 
    PARTNER TYPE: vsnap 
    PARTNER ID: abcdef7890abcdef7890abcdef7890ab 
    TOTAL VOLUMES: 3 
    SNAPSHOTS RESTORED: N/A 
    RETRY: No 
    CREATED: 2019-11-01 15:49:31 UTC 
    UPDATED: 2019-11-01 15:49:31 UTC 
    ENDED: N/A 
    STATUS: ACTIVE 
    MESSAGE: Created 0 volumes. There are 3 primary volumes that have recoverable snapshots, the latest snapshot of each will be restored. Restoring 3 snapshots: 3 active, 0 pending, 0 completed, and 0 failed 

    The number of volumes that are involved in the repair operation is indicated in the TOTAL VOLUMES field.

  12. Monitor the status of the repair task by viewing the log file /opt/vsnap/log/repair.log on the new source vSnap server. Alternatively, you can enter the following command:

    $ vsnap repair show

    The output of this command is similar to the previous example. The following status messages can be displayed during the repair process:
    • STATUS: PENDING indicates that the repair job is about to run.
    • STATUS: ACTIVE indicates that the repair job is active.
    • STATUS: COMPLETED indicates that the repair job is completed.
    • STATUS: FAILED indicates that the repair job failed and must be resubmitted.
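
    If you prefer hands-off monitoring, you can follow the log file, or poll the status in a loop from a second session. The following is a minimal sketch that assumes the STATUS field format shown in the example output above and polls every five minutes:

    $ tail -f /opt/vsnap/log/repair.log
    $ while vsnap repair show | grep -Eq 'STATUS: (PENDING|ACTIVE)'; do sleep 300; done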
  13. During the repair operation, monitor the individual repair sessions by entering the following command:
    $ vsnap repair session show
    The output of this command is similar to the following example:
    ID: 1 RELATIONSHIP: 72b19f6a9116a46aae6c642566906b31 
    PARTNER TYPE: vsnap 
    LOCAL SNAP: 1313 
    REMOTE SNAP: 311 
    STATUS: ACTIVE 
    SENT: 102.15GB 
    STARTED: 2019-11-01 15:51:18 UTC 
    ENDED: N/A 
    Created 0 volumes. 
    There are 3 replica volumes whose snapshots will be restored on next replication.
    A session for each volume involved in the repair operation is displayed.
    Periodically issue the $ vsnap repair session show command and confirm that the amount of data sent for each volume is increasing. As each session finishes, its status changes to COMPLETED. When all of the sessions have finished, issue the $ vsnap repair show command to verify that the overall status is COMPLETED. A final message that indicates the number of volumes for which snapshots were restored is displayed. The message output is similar to the following example:
    Created 0 volumes. 
    There are 3 primary volumes that have recoverable snapshots, the latest snapshot of each will be restored. 
    Restored 3 snapshots.
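    To watch the per-volume transfer progress without re-entering the command, you can use the standard watch utility, which is not part of the vSnap command set:

    $ watch -n 60 vsnap repair session show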
  14. For any snapshots that are not restored and that indicate a FAILED status, resubmit the repair process by entering the following command:
    $ vsnap repair create --async --retry
  15. When the repair process reports a COMPLETED status, you can resume normal operations for the vSnap server by moving it out of maintenance mode. To resume normal processing, enter the following command:
    $ vsnap system maintenance complete
  16. Remove saved SSH host keys from the repaired source vSnap server and the target vSnap servers.

    Run the following commands on both the source and target vSnap servers:

    $ sudo rm -f /home/vsnap/.ssh/known_hosts
    $ sudo rm -f /root/.ssh/known_hosts

    Removing the SSH keys ensures that subsequent replication transfers do not produce errors that result from the changed host key of the repaired vSnap server.

  17. Restart the vSnap service on the replaced server by entering the following command:
    $ sudo systemctl restart vsnap
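    To confirm that the service restarted cleanly, you can check its state with the standard systemd status command:

    $ sudo systemctl status vsnap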
  18. Click System Configuration > Backup Storage > Disk to verify that the new vSnap server is correctly registered, as follows:
    • If the new vSnap server is using the same host name or IP address for registration, no change is required.
    • If the new vSnap server is using a different host name or IP address for registration, you must update the registration by selecting the pencil icon.
  19. To remove recovery points that are no longer available on the source vSnap server, start a maintenance job from the IBM Spectrum Protect Plus user interface.
    For instructions, see Creating jobs and job schedules.
    Tip: You might see informational messages that are similar to the following example:
    CTGGA1843 storage snapshot spp_1004_2102_2_16de41fcbc3 not found on live Storage2101 Snapshot Type vsnap
  20. To resume jobs that failed after the vSnap server became unavailable, run a storage server inventory job. For instructions, see Creating jobs and job schedules.

Results

The source vSnap server has been repaired with only the most recent recovery points. The next backup job that runs as part of an SLA will back up data incrementally. If you create a restore job, only the most recent recovery point will be available in the backup repository. All other recovery points will be available in the replication repositories, and in the object storage and archive storage repositories if applicable to your environment.

Document Location

Worldwide

[{"Type":"SW","Line of Business":{"code":"LOB26","label":"Storage"},"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSNQFQ","label":"IBM Spectrum Protect Plus"},"ARM Category":[{"code":"a8m3p000000h9Z9AAI","label":"Product Documentation"}],"ARM Case Number":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"10.1.5;10.1.6;10.1.7"}]

Document Information

Modified date:
13 April 2021

UID

ibm11107525