How do I repair a failed target vSnap in an IBM Spectrum Protect Plus environment?

Created by Jim Smith on Fri, 11/15/2019 - 12:32
Published URL:
https://www.ibm.com/support/pages/node/1107543

How To


Summary

The vSnap servers in an IBM Spectrum Protect™ Plus environment provide disk storage for protecting data through backup and replication processes. You can repair and replace a failed vSnap server that is configured in your IBM Spectrum Protect Plus environment to act as the target for backup and replication services. The target vSnap server must be repaired so that backup and replication services can resume.

Objective

Repair a target vSnap server that has failed. 

Steps

Before you begin

Important: It is assumed that all vSnap servers in the environment are protected by replication. If a vSnap server is not replicated and it fails, it cannot be recovered to a state that would allow it to continue as a disk storage source or target. In the absence of replication processes, you must create a new vSnap server and set up service level agreement (SLA) policies. When you run the policies, a new full backup process runs to the new vSnap server.

About this task

Important: Do not unregister or delete the failed vSnap server from IBM Spectrum Protect Plus. The failed vSnap server must remain registered for the replacement procedure to work correctly.
This procedure establishes a new target vSnap server in your IBM Spectrum Protect Plus environment to replace the failed target vSnap server. The new target vSnap server will not contain any data but will be populated with the most recent recovery points during the next scheduled replication operation.
Note: The version of the new vSnap server must match the version of the deployed IBM Spectrum Protect Plus appliance.

To determine which type of repair process is applicable to your vSnap server, see technote 1103847.

Procedure

  1. Log in to the functioning vSnap server console with the ID serveradmin by using Secure Shell (SSH) protocol.
    Enter the following command: $ ssh serveradmin@MGMT_ADDRESS

    For example, $ ssh serveradmin@10.10.10.1

  2. Obtain the ID of the failed vSnap server by opening a command prompt and entering the following command:

    $ vsnap partner show

    The output is similar to the following example:

    ID: 12345678901234567890123456789012
    PARTNER TYPE: vsnap 
    MGMT ADDRESS: 10.10.10.2 
    API PORT: 8900 
    SSH PORT: 22
  3. Verify that the MGMT ADDRESS is the address of the failed vSnap server. Take note of the failed vSnap server's ID number.
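Optionally, the partner ID can be captured into a shell variable for reuse in the later initialization and repair steps. The following sketch is not part of the documented procedure; it assumes the "ID: <value>" line format shown in the example output above and parses a copy of that sample text. Against a live server, you would pipe the output of vsnap partner show instead:

```shell
# Optional sketch: capture the failed vSnap server's partner ID.
# The parsing assumes the "ID: <value>" format shown in the sample output;
# against a live server, replace the here-document with: vsnap partner show
PARTNER_ID=$(awk '/^ID:/ {print $2; exit}' <<'EOF'
ID: 12345678901234567890123456789012
PARTNER TYPE: vsnap
MGMT ADDRESS: 10.10.10.2
API PORT: 8900
SSH PORT: 22
EOF
)
echo "$PARTNER_ID"
```

Verify the captured value against the console output before substituting $PARTNER_ID into the init and repair commands below.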
  4. In the environment with the target vSnap server, install a new vSnap server of the same type and version, and with the same storage allocation, as the failed target vSnap server.
    For instructions about installing a vSnap server, see Installing a physical vSnap server.
    Important: Do not register the new vSnap server with IBM Spectrum Protect Plus. Do not use the Add Disk Storage wizard.
    1. Initialize the new vSnap server by entering the following command:

      $ vsnap system init --skip_pool --id partner_id

      For example: $ vsnap system init --skip_pool --id 12345678901234567890123456789012, where the ID is the failed target vSnap server's partner ID that you noted in step 3. A message indicates when the initialization is completed.

      Note: This command is different from the vSnap initialization command listed in IBM Knowledge Center and in the Blueprints.
  5. Complete the vSnap server and pool creation process as outlined in Chapter 5: vSnap Server Installation and Setup in the Blueprints.
  6. Place the new vSnap server into maintenance mode by entering the following command:

    $ vsnap system maintenance begin

    Placing the vSnap server into maintenance mode suspends operations such as snapshot creation, data restore jobs, and replication operations.
  7. Initialize the new target vSnap server with the failed target vSnap server’s partner ID. Enter the following command:

    $ vsnap system init --id partner_id

    The following command is an example:

    $ vsnap system init --id 12345678901234567890123456789012

  8. On the new target vSnap server, add the partner vSnap servers. Each partner must be added separately. To add a partner, enter the following command:

    $ vsnap partner add --remote_addr remote_ip_address --local_addr local_ip_address

    where remote_ip_address specifies the IP address of the source vSnap server and local_ip_address specifies the IP address of the new target vSnap server.

    The following command is an example:

    $ vsnap partner add --remote_addr 10.10.10.1 --local_addr 10.10.10.2

  9. When prompted, enter the user ID and password for the source vSnap server.
    Informational messages indicate when the partners are created and updated successfully.
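When more than one source vSnap server replicates to the failed target, steps 8 and 9 must be repeated for each source. The loop below is a sketch with illustrative addresses, not values from a real environment; as written it only prints the commands, and removing the echo would run them, each prompting for that source server's credentials:

```shell
# Sketch: build one partner add command per source vSnap server.
# All addresses are illustrative, not values from a real environment.
LOCAL_ADDR=10.10.10.2                  # the new target vSnap server
SOURCE_ADDRS="10.10.10.1 10.10.10.3"   # the source vSnap servers
for remote in $SOURCE_ADDRS; do
  # Remove `echo` to run the command for real; each run prompts for the
  # source server's user ID and password.
  echo vsnap partner add --remote_addr "$remote" --local_addr "$LOCAL_ADDR"
done
```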
  10. Create a repair task on the new target vSnap server by entering the following command:

    $ vsnap repair create --async

    The output of this command is similar to the following example:

    ID: 12345678901234567890123456789012 
    PARTNER TYPE: vsnap 
    PARTNER ID: abcdef7890abcdef7890abcdef7890ab 
    TOTAL VOLUMES: N/A 
    SNAPSHOTS RESTORED: N/A 
    RETRY: No 
    CREATED: 2019-11-01 15:49:31 UTC 
    UPDATED: 2019-11-01 15:49:31 UTC 
    ENDED: N/A 
    STATUS: PENDING 
    MESSAGE: The repair has been scheduled
  11. Monitor the number of volumes that are involved in the repair operation by entering the following command:

    $ vsnap repair show

    The output of this command is similar to the following example:
    ID: 12345678901234567890123456789012 
    PARTNER TYPE: vsnap 
    PARTNER ID: abcdef7890abcdef7890abcdef7890ab 
    TOTAL VOLUMES: 3 
    SNAPSHOTS RESTORED: N/A 
    RETRY: No 
    CREATED: 2019-11-01 15:49:31 UTC 
    UPDATED: 2019-11-01 15:49:31 UTC 
    ENDED: N/A 
    STATUS: ACTIVE 
    MESSAGE: Creating 3 volumes for partner abcdef7890abcdef7890abcdef7890ab

    The number of volumes that are involved in the repair operation is indicated in the TOTAL VOLUMES field.

  12. Monitor the status of the repair task by viewing the repair.log file on the new target vSnap server, at the following path: /opt/vsnap/log/repair.log. Alternatively, you can enter the following command:

    $ vsnap repair show

    The output of this command is similar to the previous example. The following status messages can be displayed during the repair process:
    • STATUS: PENDING indicates that the repair job is about to run.
    • STATUS: ACTIVE indicates that the repair job is active.
    • STATUS: COMPLETED indicates that the repair job is completed.
    • STATUS: FAILED indicates that the repair job failed and must be resubmitted.
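Rather than re-running the command by hand, the STATUS field can be polled until the repair reaches a terminal state. This is an optional sketch, not part of the documented procedure; the parsing assumes the "STATUS: <value>" line format shown in the example output, and the demonstration runs the parser on sample text rather than against a live server:

```shell
# Sketch: extract the STATUS field from `vsnap repair show`-style output.
get_repair_status() {
  awk '/^STATUS:/ {print $2; exit}'
}

# Against a live server, a poll loop such as the following waits for a
# terminal state (COMPLETED or FAILED):
#   while :; do
#     s=$(vsnap repair show | get_repair_status)
#     case "$s" in COMPLETED|FAILED) break ;; esac
#     sleep 60
#   done

# Demonstration on sample text in the format shown above:
printf 'STATUS: ACTIVE\nMESSAGE: Creating 3 volumes\n' | get_repair_status
```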
  13. During the repair operation, periodically run the vsnap repair show command to verify whether the status has reached COMPLETED:
    $ vsnap repair show
    The final message indicates the number of volumes whose snapshots will be restored on the next replication, as follows:
    Created 0 volumes. 
    There are 3 replica volumes whose snapshots will be restored on next replication.
  14. For any snapshots that are not restored and indicate a FAILED status, resubmit the repair process by entering the following command:
    $ vsnap repair create --async --retry
  15. When the repair process reports a COMPLETED status, you can resume normal operations for the vSnap server by moving it out of maintenance mode. To resume normal processing, enter the following command:
    $ vsnap system maintenance complete
  16. Remove saved SSH host keys from the source vSnap servers and the repaired target vSnap server.

    Run the following commands on both the source and target vSnap servers:

    $ sudo rm -f /home/vsnap/.ssh/known_hosts
    $ sudo rm -f /root/.ssh/known_hosts

    Removing the SSH keys ensures that subsequent replication transfers do not produce errors that result from the changed host key of the repaired vSnap server.
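Deleting the whole known_hosts file also discards saved keys for unrelated hosts. As an alternative to the documented commands (an option we are suggesting, not part of the procedure above), ssh-keygen -R removes only the entry for the repaired server. The sketch below demonstrates it on a throwaway file, with 10.10.10.2 standing in for the repaired server's address:

```shell
# Alternative sketch: prune only the repaired server's host key with
# ssh-keygen -R, keeping other saved entries. Demonstrated on a temporary
# file so nothing real is modified; 10.10.10.2 is an illustrative address.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
10.10.10.2 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIEXAMPLEKEYDATAENTRYAAA
10.10.10.9 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOTHERKEYDATAENTRYAAAA
EOF
ssh-keygen -R 10.10.10.2 -f "$tmp"
cat "$tmp"
# In practice, run against the files named in this step, for example:
#   ssh-keygen -R <repaired_server_address> -f /home/vsnap/.ssh/known_hosts
#   sudo ssh-keygen -R <repaired_server_address> -f /root/.ssh/known_hosts
```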

  17. Restart the vSnap service on the replaced server by entering the following command.
    $ sudo systemctl restart vsnap
  18. Click System Configuration > Backup Storage > Disk to verify that the new vSnap is correctly registered, as follows:
    • If the new vSnap server is using the same hostname or IP address for registration, no change is required.
    • If the new vSnap server is using a different hostname or IP address for registration, you must update the registration by selecting the pencil icon.
  19. To remove recovery points that are no longer available on the source vSnap server, start a maintenance job from the IBM Spectrum Protect Plus user interface.
    For instructions, see Creating jobs and job schedules.
    Tip: You might see informational messages that are similar to the following example:
    CTGGA1843 storage snapshot spp_1004_2102_2_16de41fcbc3 not found on live Storage2101 Snapshot Type vsnap
  20. To resume jobs that failed after the vSnap server became unavailable, run a storage server inventory job. For instructions, see Creating jobs and job schedules.

Results

The target vSnap server has been repaired. A new backup job must be run on the source vSnap server before any additional action is taken on the new target vSnap server.

If a replication job is attempted on the new target vSnap server, a message is displayed as follows:

CTGGA0289 - Skipping volume <volume_id> because there are no new snapshots since last backup 

After a new backup job is run on the source vSnap server, the next scheduled replication job replicates the recovery points that are created by the backup job. At this point, if you create a restore job, only the most recent recovery point will be available in the replication repository. If the target vSnap server was also acting as a copy source to object or archive storage, the replication job must first run on the target vSnap server before any additional copy operations can complete successfully. The first copy of data to object storage will be a full copy.

Document Location

Worldwide


Document Information

Modified date:
15 July 2022

UID

ibm11107543