How do I repair a failed dual-role vSnap in an IBM Spectrum Protect Plus environment?

Created by Jim Smith on Fri, 11/15/2019 - 12:37
Published URL:
https://www.ibm.com/support/pages/node/1107549

How To


Summary

You can repair and replace a failed vSnap server that is configured in your IBM Spectrum Protect Plus environment to act as both the source and target for backup and replication services.

Objective

Repair a failed dual-role vSnap server that acts as both the source and the target.

Steps

About this task

Important: Do not unregister or delete the failed vSnap server from IBM Spectrum Protect Plus. The failed vSnap server must remain registered for the replacement procedure to work correctly.

This procedure establishes a new vSnap server in your IBM Spectrum Protect Plus environment to replace the failed vSnap server. After the repair process is completed, the new vSnap server is recovered to a point where backup jobs can continue to back up incremental changes (no full backup required) and replication jobs can continue.

To determine which type of repair process is applicable to your vSnap server, see technote 1103847.

Note: The version of the new vSnap server must match the version of the deployed IBM Spectrum Protect™ Plus appliance.

Procedure

  1. Log in to the console of the functioning vSnap server in your environment with the user ID serveradmin by using the Secure Shell (SSH) protocol.
    Enter the following command: $ ssh serveradmin@MGMT_ADDRESS

    For example, $ ssh serveradmin@10.10.10.2

  2. Obtain the ID of the failed vSnap server by opening a command prompt and entering the following command:

    $ vsnap partner show

    The output is similar to the following example:

    ID: 12345678901234567890123456789012
    PARTNER TYPE: vsnap 
    MGMT ADDRESS: 10.10.10.1 
    API PORT: 8900 
    SSH PORT: 22
  3. Verify that the MGMT ADDRESS is the address of the failed vSnap server. Take note of the failed vSnap server's ID number.
  4. On the target vSnap server, install a new vSnap server of the same type and version, and with the same storage allocation, as the failed source vSnap server.
    For instructions about installing a vSnap server, see Installing a physical vSnap server.
    Important: Do not register the new vSnap server with IBM Spectrum Protect Plus. Do not use the Add Disk Storage wizard.
    1. Initialize the new vSnap server with the partner ID of the failed source vSnap server by entering the following command:

      $ vsnap system init --skip_pool --id partner_id

      For example: $ vsnap system init --skip_pool --id 12345678901234567890123456789012

      A message indicates when the initialization is completed.

      Note: This command is different from the vSnap initialization command that is listed in the IBM Knowledge Center and in the Blueprints.
  5. Complete the vSnap server and pool creation process as outlined in Chapter 5: vSnap Server Installation and Setup in the Blueprints.
  6. Place the new vSnap server into maintenance mode by entering the following command:

    $ vsnap system maintenance begin

    Placing the vSnap server into maintenance mode suspends operations such as snapshot creation, data restore jobs, and replication operations.
  7. Initialize the new target vSnap server with the failed target vSnap server’s partner ID. Enter the following command to initialize the vSnap:

    $ vsnap system init --id partner_id

    The following command is an example: $ vsnap system init --id 12345678901234567890123456789012

  8. On the new target vSnap server, add the partner vSnap servers. If there is more than one partner server, each partner must be added separately. To add a partner, enter the following command:

    $ vsnap partner add --remote_addr remote_ip_address --local_addr local_ip_address

    where remote_ip_address specifies the IP address of the source vSnap server, and local_ip_address specifies the IP address of the new target vSnap server.

    The following command is an example:

    $ vsnap partner add --remote_addr 10.10.10.1 --local_addr 10.10.10.2

  9. When prompted, enter the user ID and password for the source vSnap server.
    Informational messages indicate when the partners are created and updated successfully.
  10. Create a repair task on the new source vSnap server by entering the following command:

    $ vsnap repair create --async

    The output of this command is similar to the following example:

    ID: 12345678901234567890123456789012 
    PARTNER TYPE: vsnap 
    PARTNER ID: abcdef7890abcdef7890abcdef7890ab 
    TOTAL VOLUMES: N/A 
    SNAPSHOTS RESTORED: N/A 
    RETRY: No 
    CREATED: 2019-11-01 15:49:31 UTC 
    UPDATED: 2019-11-01 15:49:31 UTC 
    ENDED: N/A 
    STATUS: PENDING 
    MESSAGE: The repair has been scheduled
  11. Monitor the number of volumes that are involved in the repair operation by entering the following command:

    $ vsnap repair show

    The output of this command is similar to the following example:
    ID: 12345678901234567890123456789012 
    PARTNER TYPE: vsnap 
    PARTNER ID: abcdef7890abcdef7890abcdef7890ab 
    TOTAL VOLUMES: 6 
    SNAPSHOTS RESTORED: N/A 
    RETRY: No 
    CREATED: 2019-11-01 15:49:31 UTC 
    UPDATED: 2019-11-01 15:49:31 UTC 
    ENDED: N/A 
    STATUS: ACTIVE 
    MESSAGE: Created 0 volumes
    There are 3 replica volumes whose snapshots will be restored on next replication. 
    There are 3 primary volumes that have recoverable snapshots, the latest snapshot of each will be restored. 

    The number of volumes that are involved in the repair operation is indicated in the TOTAL VOLUMES field.
  12. Monitor the status of the repair task by viewing the /opt/vsnap/log/repair.log file on the new source vSnap server. Alternatively, you can enter the following command:

    $ vsnap repair show

  13. When the status of the repair operation is in the ACTIVE state, you can view the status of individual repair sessions by entering the following command:

    $ vsnap repair session show

    The output is similar to this example:

    
    ID: 1
    RELATIONSHIP: 72b19f6a9116a46aae6c642566906b31
    PARTNER TYPE: vsnap
    LOCAL SNAP: 1313
    REMOTE SNAP: 311
    STATUS: ACTIVE
    SENT: 102.15GB
    STARTED: 2019-11-01 15:51:18 UTC
    ENDED: N/A
    View a session for each of the source volumes in the repair operation. The amount of data that is sent for each volume shows increasing incremental values until the process completes. The final message indicates the number of volumes whose snapshots will be restored by the next replication operation, as shown in this example:
    Created 0 volumes. There are 3 replica volumes whose snapshots will be restored on next replication. 
  14. For any snapshots that are not restored and indicate a FAILED status, resubmit the repair process by entering the following command:
    $ vsnap repair create --async --retry
  15. When the repair process reports a COMPLETED status, you can resume normal operations for the vSnap server by moving it out of maintenance mode. To resume normal processing, enter the following command:
    $ vsnap system maintenance complete
  16. Optional: To view the total volumes and number of snapshots that were restored during the repair operation, run the vsnap repair show command on the vSnap server.
    The output includes the following information:
    • TOTAL VOLUMES lists the total number of volumes that were inspected during the repair operation. This number includes the source volumes (primary volumes) where the latest recovery point backup was restored, and target volumes (replica volumes) that are repopulated during upcoming replication operations as scheduled in SLAs.
    • SNAPSHOTS RESTORED lists the number of source volumes that were restored.
  17. Remove saved SSH host keys from the repaired source vSnap server and the target vSnap servers.

    Run the following commands on both the source and target vSnap servers:

    $ sudo rm -f /home/vsnap/.ssh/known_hosts
    $ sudo rm -f /root/.ssh/known_hosts

    Removing the SSH keys ensures that subsequent replication transfers do not produce errors that result from the changed host key of the repaired vSnap server.

  18. Restart the vSnap service on the replaced server by entering the following command:
    $ sudo systemctl restart vsnap
  19. In the IBM Spectrum Protect Plus user interface, click System Configuration > Backup Storage > Disk to verify that the new vSnap server is correctly registered, as follows:
    • If the new vSnap server is using the same hostname or IP address for registration, no change is required.
    • If the new vSnap server is using a different hostname or IP address for registration, you must update the registration by selecting the pencil icon.
  20. To remove recovery points that are no longer available on the source vSnap server, start a maintenance job from the IBM Spectrum Protect Plus user interface.
    For instructions, see Creating jobs and job schedules.
    Tip: You might see informational messages that are similar to the following example:
    CTGGA1843 storage snapshot spp_1005_2102_2_16de41fcbc3 not found on live Storage2101 Snapshot Type vsnap
  21. To resume jobs that failed after the vSnap server became unavailable, run a storage server inventory job. For instructions, see Creating jobs and job schedules.
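
The lookups in steps 2 and 3 and the status checks in steps 11 through 15 rely on reading KEY: value fields from vsnap command output. The following shell sketch shows one way those reads might be scripted. The helper names field_value, partner_id_for_addr, and wait_for_repair are hypothetical and are not part of the vSnap CLI. The sketch assumes that the output format matches the examples in this document, that each partner record starts with its ID line, and that vsnap repair show reports a single repair task whose STATUS eventually becomes COMPLETED or FAILED.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical and not part of the vSnap CLI.
# They assume the "KEY: value" output layout shown in this document.

# field_value KEY
# Reads "KEY: value" lines on stdin and prints the trimmed value of the
# first line whose key equals KEY.
field_value() {
    awk -v key="$1" '
        {
            pos = index($0, ":")            # split at the first colon only
            if (pos == 0) next
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)  # trim surrounding whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }'
}

# partner_id_for_addr ADDRESS
# Reads `vsnap partner show` output on stdin and prints the ID of the
# partner whose MGMT ADDRESS equals ADDRESS. Prints nothing if no match.
# Assumption: every partner record begins with its ID line.
partner_id_for_addr() {
    awk -v addr="$1" '
        {
            pos = index($0, ":")
            if (pos == 0) next
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == "ID") id = v           # remember the current record ID
            if (k == "MGMT ADDRESS" && v == addr) { print id; exit }
        }'
}

# wait_for_repair [INTERVAL_SECONDS]
# Polls `vsnap repair show` and returns 0 when STATUS is COMPLETED or 1
# when STATUS is FAILED. Any other status (for example PENDING or ACTIVE)
# keeps the loop waiting, so interrupt it manually if the task stalls.
wait_for_repair() {
    interval="${1:-60}"
    while :; do
        status=$(vsnap repair show | field_value STATUS)
        case "$status" in
            COMPLETED) return 0 ;;
            FAILED)    return 1 ;;
        esac
        sleep "$interval"
    done
}
```

For example, vsnap partner show | partner_id_for_addr 10.10.10.1 prints the ID of the partner whose MGMT ADDRESS is 10.10.10.1, and wait_for_repair 30 checks the repair status every 30 seconds. A nonzero return from wait_for_repair is a cue to review repair.log and consider the retry command in step 14 before leaving maintenance mode.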

Results

For primary backup data that is stored on the repaired vSnap server, the latest recovery point is now available. Subsequent backups to the repaired vSnap server continue to send only the incremental changes since the last backup.

For replicated data that is stored on the repaired vSnap server, no replicated data is available immediately after the repair. Subsequent replication jobs from the partner vSnap server repopulate any backups that are created on the partner vSnap server after the repair process is completed. If a replication job is attempted on the partner vSnap server before a backup is completed on the partner vSnap server, a warning message is displayed indicating that there are no new snapshots since the last backup:
CTGGA0289 - Skipping volume <volume_id> because there are no new snapshots since last backup

If the repaired vSnap server was acting as a copy source to object or archive storage, a backup job must first be run on the repaired vSnap server before any additional copy operations will be successful. The first copy of data to object storage will be a full copy.

Document Location

Worldwide

Product: IBM Spectrum Protect Plus
Versions: 10.1.5, 10.1.6, 10.1.7
Platform: Platform Independent

Document Information

Modified date:
13 April 2021

UID

ibm11107549