Persistent volume on Data Virtualization head pod becomes full

Created by Tom Lee on Thu, 03/10/2022 - 15:57
Published URL:
https://www.ibm.com/support/pages/node/6562489

Troubleshooting


Problem

In Data Virtualization on IBM Cloud Pak for Data, the persistent volume (PV) on the head pod becomes full.
Note: Starting in Cloud Pak for Data 4.6.0, Data Virtualization is renamed Watson Query.

Cause

Archived transaction logs (in Cloud Pak for Data 4.0.x) or other files in the embedded Db2 database, such as Db2 panic and dump files in $DIAGPATH/NODE0000, build up over time and eventually consume a significant amount of space on the PV.

Diagnosing The Problem

To confirm that the PV is almost full and to identify what is consuming the space, complete the following steps (a supplementary check follows the list):
  1. Connect to the Data Virtualization head pod as the db2inst1 user by running the following commands:
    oc rsh c-db2u-dv-db2u-0 bash
    su - db2inst1
  2. Check that the PV is almost full:
    df -h /mnt
  3. To check that a significant amount of this space is consumed by Db2 transaction logs, run the following command:
    du -sh /mnt/bludata0/db2/archive_log
  4. To check that a significant amount of this space is consumed by Db2 diagnostic logs and dumps, run the following command:
    du -sh $DIAGPATH/NODE0000
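If the archive and diagnostic paths do not account for most of the usage, the largest top-level directories on the PV can be listed with standard GNU du and sort. This supplementary check is a sketch, not part of the original procedure:
    du -sh /mnt/* 2>/dev/null | sort -rh | head -10    ## largest directories first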

Resolving The Problem

To resolve the problem, complete the following series of procedures.
Disable the Big SQL daemon and the Data Virtualization head pod liveness probe script.
  1. Connect to the Data Virtualization head pod as the db2inst1 user:
    oc rsh c-db2u-dv-db2u-0 bash
    su - db2inst1
  2. Disable the Big SQL daemon.
    db2uctl markers create BIGSQL_DAEMON_PAUSE
  3. Disable the Data Virtualization head pod liveness probe script by creating a marker file.
    touch ~db2inst1/ibm/bigsql/skipliveness.txt
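Before continuing, you can confirm that the liveness probe marker file was created (the daemon pause marker is managed internally by db2uctl and is not checked here):
    ls -l ~db2inst1/ibm/bigsql/skipliveness.txt    ## the file should exist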
 
Update the db2nodes.cfg file with the correct list of head nodes and worker nodes.
  1. List the information for one node per line in the db2nodes.cfg file.
For example, if you have one head node and one worker, ensure that db2nodes.cfg contains the following information. Replace the <dv pod namespace> placeholder with the correct namespace.
0 c-db2u-dv-db2u-0.c-db2u-dv-db2u-internal.<dv pod namespace>.svc.cluster.local 0 c-db2u-dv-db2u-0.c-db2u-dv-db2u-internal.<dv pod namespace>.svc.cluster.local
1 c-db2u-dv-db2u-1.c-db2u-dv-db2u-internal 0 c-db2u-dv-db2u-1.c-db2u-dv-db2u-internal
If you have one head node and two workers, your db2nodes.cfg file is similar to the following example. Notice that the index number at the beginning of each line increments as entries are added to the file. (A small script that generates these entries is sketched after the examples.)
0 c-db2u-dv-db2u-0.c-db2u-dv-db2u-internal.<dv pod namespace>.svc.cluster.local 0 c-db2u-dv-db2u-0.c-db2u-dv-db2u-internal.<dv pod namespace>.svc.cluster.local
1 c-db2u-dv-db2u-1.c-db2u-dv-db2u-internal 0 c-db2u-dv-db2u-1.c-db2u-dv-db2u-internal
2 c-db2u-dv-db2u-2.c-db2u-dv-db2u-internal 0 c-db2u-dv-db2u-2.c-db2u-dv-db2u-internal
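Because the entries follow a fixed pattern, they can be generated with a small shell sketch. The NAMESPACE and WORKERS variables are illustrative placeholders; verify the output before writing it to db2nodes.cfg:
    NAMESPACE="<dv pod namespace>"    ## replace with the actual namespace
    WORKERS=2                         ## number of worker pods
    head_node="c-db2u-dv-db2u-0.c-db2u-dv-db2u-internal.${NAMESPACE}.svc.cluster.local"
    echo "0 ${head_node} 0 ${head_node}"
    for i in $(seq 1 "${WORKERS}"); do
        node="c-db2u-dv-db2u-${i}.c-db2u-dv-db2u-internal"
        echo "${i} ${node} 0 ${node}"
    done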
Check whether the users.json file is empty (0 bytes in size); a quick check is sketched after this step. If needed, recover the file by doing the following step.
  1. Replace the corrupted users JSON with a new copy from the template:
    db2_all "rm /mnt/blumeta0/db2_config/users.json; cp /sec_plugin/users.json /mnt/blumeta0/db2_config/users.json"
 
If Db2 is not running and cannot be started due to the PV being full, do the following steps to free up some space on the PV to allow Db2 to start:
  1. List the archived transaction logs:
    ls -ltd /mnt/bludata0/db2/archive_log/db2inst1/BIGSQL/NODE0000/LOGSTREAM0000/C0000000/*.LOG
  2. Delete the oldest archived transaction log:
    rm $(ls -td /mnt/bludata0/db2/archive_log/db2inst1/BIGSQL/NODE0000/LOGSTREAM0000/C0000000/*.LOG | tail -1)
  3. Repeat the previous step to remove enough archived transaction logs so that approximately 5% of free space is reported by the df -h /mnt command (a loop that automates steps 2 and 3 is sketched below).
    If you aren't able to clear enough space by deleting archive logs, or if there are no archive logs, proceed with the following steps to delete old backups and log files.
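    ## Optional sketch (not part of the original article): automate steps 2 and 3
    ## by deleting the oldest archived log until PV usage drops to about 95%.
    ## Assumes the same LOGSTREAM path as above and GNU df; review before running.
    log_dir=/mnt/bludata0/db2/archive_log/db2inst1/BIGSQL/NODE0000/LOGSTREAM0000/C0000000
    while [ "$(df --output=pcent /mnt | tail -1 | tr -dc '0-9')" -gt 95 ]; do
        oldest=$(ls -td "${log_dir}"/*.LOG 2>/dev/null | tail -1)
        [ -n "${oldest}" ] || break    ## stop if no archived logs remain
        rm "${oldest}"
    done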
  4. Delete old backup directories:
    rm -rf /mnt/bludata0/river
    rm -rf /mnt/blumeta0/bigsql/backup
  5. Delete logs and files that are older than 30 days:
    log_retain_days=+30
    
    find /mnt/PV/versioned/logs/ -mtime ${log_retain_days} -name "*.log" -type f -delete
    find /mnt/PV/versioned/logs/ -mtime ${log_retain_days} -name "*.err" -type f -delete
    find /mnt/PV/versioned/logs/ -mtime ${log_retain_days} -name "*.zip" -type f -delete
    
    find /var/log/bigsql/cli/ -mtime ${log_retain_days} -name "*.log" -type f -delete
    
    find /var/ibm/bigsql/diag/ -mtime ${log_retain_days} -name "*.log" ! -name "db2diag*.log" -type f -delete
    find /var/ibm/bigsql/diag/ -mtime ${log_retain_days} -name "core*" -type f -delete
    find /var/ibm/bigsql/diag/ -mtime ${log_retain_days} -name "FODC*" -type d -delete
  6. Truncate and roll large log files (greater than 256MB):
    mb_to_keep=256
    log_backup_retain_days=+30
    
    bytes_to_keep=$(( mb_to_keep * 1024 * 1024 ))
    log_files=("/mnt/PV/versioned/logs/liveness.log" \
        "/mnt/PV/versioned/logs/liveness.err" \
        "/mnt/PV/versioned/logs/$(hostname)*.log" \
        "/mnt/PV/versioned/logs/dv_bigsql*.log" \
        "/var/log/bigsql/cli/bigsql-db2u-container-daemon.log" \
        "/var/log/bigsql/cli/bigsql-db2ubar-hook.log" \
    )
    for logfile in ${log_files[@]}; do    ## intentionally unquoted so the glob patterns expand
        if [ -s "$logfile" ]; then
            logFileSize=$(stat --format=%s "$logfile")
            if [[ $logFileSize -gt $bytes_to_keep ]]; then
                now="$(date -u +"%Y-%m-%d_%H.%M.%S.%3N_%Z")"
                tail -c "$bytes_to_keep" "$logfile" > "$logfile.$now.bak"
                logfile_dir="$(dirname "$logfile")"
                cd "$logfile_dir"
                find . -mindepth 1 -iname '*.bak' -mtime ${log_backup_retain_days} -delete    ## the variable already contains the leading +
                echo "" > "$logfile"
            fi
        fi
    done
  7. Exit the Data Virtualization head pod.
  8. Scale down the db2u-dv db2u stateful set to 0 replicas and then back to the original number:
    oc get sts c-db2u-dv-db2u    ##note the original number of pods/replicas in the statefulset
     
    oc scale sts c-db2u-dv-db2u --replicas=0
     
    oc get po | grep c-db2u-dv-db2u   ##wait until the db2u-dv db2u pods have terminated
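    ## Optional sketch: instead of re-running the command above manually, poll
    ## every 10 seconds until all db2u pods are gone (assumes the pod-name prefix shown above).
    while oc get po 2>/dev/null | grep -q '^c-db2u-dv-db2u-'; do
        sleep 10
    done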
     
    oc scale sts c-db2u-dv-db2u --replicas=<original number of pods/replicas in the statefulset>
  9. Connect to the Data Virtualization head pod as the db2inst1 user:
    oc rsh c-db2u-dv-db2u-0 bash
    su - db2inst1
  10. Disable the Data Virtualization head pod liveness probe script by creating a marker file.
    touch ~db2inst1/ibm/bigsql/skipliveness.txt
  11. Stop agent processes:
    /opt/dv/current/qp/helper_scripts/qp_agents stop
  12. Clear temporary directories:
    rm -rf /mnt/PV/versioned/dv_data/qpendpoints/data1/gaiandb6415 /mnt/PV/versioned/dv_data/qpendpoints/data2/gaiandb6416 /mnt/PV/versioned/dv_data/qpendpoints/data3/gaiandb6417 /mnt/PV/versioned/dv_data/qpendpoints/data4/gaiandb6418 /mnt/PV/versioned/dv_data/qpendpoints/data5/gaiandb6419
  13. Start agent processes:
    /opt/dv/current/qp/helper_scripts/qp_agents start
  14. Wait for the agents to start. It might take up to 30 seconds until all agents are available. Check by using the Db2 command line:
    db2 connect to bigsql;
    db2 "select 'Agents are available' from DVSYS.LISTNODES WHERE AGENT_TIER='H' HAVING COUNT(*) >= 5";
    Rerun the second statement until the text 'Agents are available' is displayed.
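    The wait can also be scripted; a sketch that polls every 10 seconds, using the db2 CLP -x option to suppress column headings:
    until db2 -x "select 'Agents are available' from DVSYS.LISTNODES WHERE AGENT_TIER='H' HAVING COUNT(*) >= 5" | grep -q 'Agents are available'; do
        sleep 10
    done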
  15. Restore the current configuration to all agents:
    db2 "call dvsys.restoreconfig('',?,?);
    This command returns the following output:
      Value of output parameters
      --------------------------
      Parameter Name  : NUM_RESTORED
      Parameter Value : 6

      Parameter Name  : DIAGS
      Parameter Value :

      Return Status = 0
  16. Close the service connection:
    db2 connect reset;
If needed, revert the embedded Db2 database to default circular transaction logging by doing the following steps.
Note: This procedure assumes that no online Db2 backups of the bigsql database are being performed for the service. If online Db2 backups are being taken, do not implement these steps.
  1. Connect to Db2:
    db2 connect to bigsql
  2. Confirm the current LOGARCHMETH1 setting and that the disk location is DISK:/mnt/bludata0/db2/archive_log/:
    db2 get db cfg | grep LOGARCHMETH1
  3. Reconfigure the database to use circular logging:
    db2 update db cfg using LOGARCHMETH1 OFF
  4. Restart the Big SQL server:
    bigsql stop; bigsql start
  5. Connect to Db2 and confirm that circular logging is applied (LOGARCHMETH1 is OFF):
    db2 connect to bigsql
     
    db2 get db cfg | grep LOGARCHMETH1
  6. Free up space on the PV by removing the old archived transaction log files (a guarded sketch follows this list):
    rm -rf /mnt/bludata0/db2/archive_log/db2inst1/BIGSQL
  7. Close the service connection:
    db2 connect reset;
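A guarded variant of steps 5 and 6 removes the archived logs only after LOGARCHMETH1 reports OFF. This combination is an illustrative sketch, not part of the original procedure:
    if db2 get db cfg for bigsql | grep LOGARCHMETH1 | grep -q OFF; then
        rm -rf /mnt/bludata0/db2/archive_log/db2inst1/BIGSQL
    else
        echo "LOGARCHMETH1 is not OFF; leaving the archived logs in place."
    fi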
Re-enable the Data Virtualization head pod liveness probe script and the Big SQL daemon.
  1. Re-enable the Data Virtualization head pod liveness probe script by deleting the marker file.
    rm -f ~db2inst1/ibm/bigsql/skipliveness.txt
  2. Re-enable the Big SQL daemon.
    oc rsh c-db2u-dv-db2u-0 db2uctl markers delete BIGSQL_DAEMON_PAUSE

Document Location

Worldwide

[{"Type":"MASTER","Line of Business":{"code":"LOB10","label":"Data and AI"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSHGYS","label":"IBM Cloud Pak for Data"},"ARM Category":[{"code":"a8m3p000000UoTZAA0","label":"Storage-\u003EStorage Volume"}],"ARM Case Number":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"4.0.0;4.0.1;4.0.2;4.0.3;4.0.4;4.0.5;4.0.6;4.0.7;4.0.8;4.5.0;4.5.1;4.5.3;4.6.0;4.6.2;4.6.4;4.7.0;4.7.3;4.8.0"}]

Document Information

Modified date:
30 November 2023

UID

ibm16562489