Persistent volume on Data Virtualization head pod becomes full

Created by Tom Lee on Thu, 03/10/2022 - 15:57
Published URL:
https://www.ibm.com/support/pages/node/6562489

Troubleshooting


Problem

In Data Virtualization on IBM Cloud Pak for Data, the persistent volume (PV) on the head pod becomes full.
Note: Starting in Cloud Pak for Data 4.6.0, Data Virtualization is renamed Watson Query.

Cause

Archived transaction logs (in Cloud Pak for Data 4.0.x) or other files in the embedded Db2 database, such as Db2 panic and dump files in $DIAGPATH/NODE0000, build up over time and eventually consume a significant amount of space on the PV.

Diagnosing The Problem

To confirm that the PV is almost full and to identify what is consuming the space, complete the following steps (a supplementary check follows the list):
  1. Connect to the Data Virtualization head pod as the db2inst1 user by running the following commands:
    oc rsh c-db2u-dv-db2u-0 bash
    su - db2inst1
  2. Check that the PV is almost full:
    df -h /mnt
  3. To check that a significant amount of this space is consumed by Db2 transaction logs, run the following command:
    du -sh /mnt/bludata0/db2/archive_log
  4. To check that a significant amount of this space is consumed by Db2 diagnostic logs and dumps, run the following command:
    du -sh $DIAGPATH/NODE0000
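If the archive and diagnostic paths do not account for most of the usage, the largest top-level directories on the PV can be listed with standard GNU du and sort. This supplementary check is a sketch, not part of the original procedure:
    du -sh /mnt/* 2>/dev/null | sort -rh | head -10    ## largest directories first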

Resolving The Problem

To resolve the problem, complete the following series of procedures.
Disable the Big SQL daemon and the Data Virtualization head pod liveness probe script.
  1. Connect to the Data Virtualization head pod as the db2inst1 user:
    oc rsh c-db2u-dv-db2u-0 bash
    su - db2inst1
  2. Disable the Big SQL daemon.
    db2uctl markers create BIGSQL_DAEMON_PAUSE
  3. Disable the Data Virtualization head pod liveness probe script by creating a marker file.
    touch ~db2inst1/ibm/bigsql/skipliveness.txt
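Before continuing, you can confirm that the liveness probe marker file was created (the daemon pause marker is managed internally by db2uctl and is not checked here):
    ls -l ~db2inst1/ibm/bigsql/skipliveness.txt    ## the file should exist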
 
Update the db2nodes.cfg file with the correct list of head nodes and worker nodes.
  1. List the information for one node per line in the db2nodes.cfg file.
For example, if you have one head node and one worker, ensure that db2nodes.cfg contains the following information. Replace the <dv pod namespace> placeholder with the correct namespace.
0 c-db2u-dv-db2u-0.c-db2u-dv-db2u-internal.<dv pod namespace>.svc.cluster.local 0 c-db2u-dv-db2u-0.c-db2u-dv-db2u-internal.<dv pod namespace>.svc.cluster.local
1 c-db2u-dv-db2u-1.c-db2u-dv-db2u-internal 0 c-db2u-dv-db2u-1.c-db2u-dv-db2u-internal
If you have one head node and two workers, your db2nodes.cfg file is similar to the following example. Notice that the index number at the beginning of each line increments as entries are added to the file. (A small script that generates these entries is sketched after the examples.)
0 c-db2u-dv-db2u-0.c-db2u-dv-db2u-internal.<dv pod namespace>.svc.cluster.local 0 c-db2u-dv-db2u-0.c-db2u-dv-db2u-internal.<dv pod namespace>.svc.cluster.local
1 c-db2u-dv-db2u-1.c-db2u-dv-db2u-internal 0 c-db2u-dv-db2u-1.c-db2u-dv-db2u-internal
2 c-db2u-dv-db2u-2.c-db2u-dv-db2u-internal 0 c-db2u-dv-db2u-2.c-db2u-dv-db2u-internal
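Because the entries follow a fixed pattern, they can be generated with a small shell sketch. The NAMESPACE and WORKERS variables are illustrative placeholders; verify the output before writing it to db2nodes.cfg:
    NAMESPACE="<dv pod namespace>"    ## replace with the actual namespace
    WORKERS=2                         ## number of worker pods
    head_node="c-db2u-dv-db2u-0.c-db2u-dv-db2u-internal.${NAMESPACE}.svc.cluster.local"
    echo "0 ${head_node} 0 ${head_node}"
    for i in $(seq 1 "${WORKERS}"); do
        node="c-db2u-dv-db2u-${i}.c-db2u-dv-db2u-internal"
        echo "${i} ${node} 0 ${node}"
    done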
Check whether the users.json file is empty (0 bytes in size); a quick check is sketched after this step. If needed, recover the file by doing the following step.
  1. Replace the corrupted users JSON with a new copy from the template:
    db2_all "rm /mnt/blumeta0/db2_config/users.json; cp /sec_plugin/users.json /mnt/blumeta0/db2_config/users.json"
 
If Db2 is not running and cannot be started due to the PV being full, do the following steps to free up some space on the PV to allow Db2 to start:
  1. List the archived transaction logs:
    ls -ltd /mnt/bludata0/db2/archive_log/db2inst1/BIGSQL/NODE0000/LOGSTREAM0000/C0000000/*.LOG
  2. Delete the oldest archived transaction log:
    rm $(ls -td /mnt/bludata0/db2/archive_log/db2inst1/BIGSQL/NODE0000/LOGSTREAM0000/C0000000/*.LOG | tail -1)
  3. Repeat the previous step to remove enough archived transaction logs so that approximately 5% of free space is reported by the df -h /mnt command (a loop that automates steps 2 and 3 is sketched below).
    If you aren't able to clear enough space by deleting archive logs, or if there are no archive logs, proceed with the following steps to delete old backups and log files.
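    ## Optional sketch (not part of the original article): automate steps 2 and 3
    ## by deleting the oldest archived log until PV usage drops to about 95%.
    ## Assumes the same LOGSTREAM path as above and GNU df; review before running.
    log_dir=/mnt/bludata0/db2/archive_log/db2inst1/BIGSQL/NODE0000/LOGSTREAM0000/C0000000
    while [ "$(df --output=pcent /mnt | tail -1 | tr -dc '0-9')" -gt 95 ]; do
        oldest=$(ls -td "${log_dir}"/*.LOG 2>/dev/null | tail -1)
        [ -n "${oldest}" ] || break    ## stop if no archived logs remain
        rm "${oldest}"
    done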
  4. Delete old backup directories:
    rm -rf /mnt/bludata0/river
    rm -rf /mnt/blumeta0/bigsql/backup
  5. Delete logs and files that are older than 30 days:
    log_retain_days=+30
    
    find /mnt/PV/versioned/logs/ -mtime ${log_retain_days} -name "*.log" -type f -delete
    find /mnt/PV/versioned/logs/ -mtime ${log_retain_days} -name "*.err" -type f -delete
    find /mnt/PV/versioned/logs/ -mtime ${log_retain_days} -name "*.zip" -type f -delete
    
    find /var/log/bigsql/cli/ -mtime ${log_retain_days} -name "*.log" -type f -delete
    
    find /var/ibm/bigsql/diag/ -mtime ${log_retain_days} -name "*.log" ! -name "db2diag*.log" -type f -delete
    find /var/ibm/bigsql/diag/ -mtime ${log_retain_days} -name "core*" -type f -delete
    find /var/ibm/bigsql/diag/ -mtime ${log_retain_days} -name "FODC*" -type d -delete
  6. Truncate and roll large log files (greater than 256MB):
    mb_to_keep=256
    log_backup_retain_days=+30
    
    bytes_to_keep=$(( mb_to_keep * 1024 * 1024 ))
    log_files=("/mnt/PV/versioned/logs/liveness.log" \
        "/mnt/PV/versioned/logs/liveness.err" \
        "/mnt/PV/versioned/logs/$(hostname)*.log" \
        "/mnt/PV/versioned/logs/dv_bigsql*.log" \
        "/var/log/bigsql/cli/bigsql-db2u-container-daemon.log" \
        "/var/log/bigsql/cli/bigsql-db2ubar-hook.log" \
    )
    for logfile in ${log_files[@]}; do    ## intentionally unquoted so the glob patterns expand
        if [ -s "$logfile" ]; then
            logFileSize=$(stat --format=%s "$logfile")
            if [[ $logFileSize -gt $bytes_to_keep ]]; then
                now="$(date -u +"%Y-%m-%d_%H.%M.%S.%3N_%Z")"
                tail -c "$bytes_to_keep" "$logfile" > "$logfile.$now.bak"
                logfile_dir="$(dirname "$logfile")"
                cd "$logfile_dir"
                find . -mindepth 1 -iname '*.bak' -mtime ${log_backup_retain_days} -delete    ## the variable already contains the leading +
                echo "" > "$logfile"
            fi
        fi
    done
  7. Exit the Data Virtualization head pod.
  8. Scale down the db2u-dv db2u stateful set to 0 replicas and then back to the original number:
    oc get sts c-db2u-dv-db2u    ##note the original number of pods/replicas in the statefulset
     
    oc scale sts c-db2u-dv-db2u --replicas=0
     
    oc get po | grep c-db2u-dv-db2u   ##wait until the db2u-dv db2u pods have terminated
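    ## Optional sketch: instead of re-running the command above manually, poll
    ## every 10 seconds until all db2u pods are gone (assumes the pod-name prefix shown above).
    while oc get po 2>/dev/null | grep -q '^c-db2u-dv-db2u-'; do
        sleep 10
    done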
     
    oc scale sts c-db2u-dv-db2u --replicas=<original number of pods/replicas in the statefulset>
  9. Connect to the Data Virtualization head pod as the db2inst1 user:
    oc rsh c-db2u-dv-db2u-0 bash
    su - db2inst1
  10. Disable the Data Virtualization head pod liveness probe script by creating a marker file.
    touch ~db2inst1/ibm/bigsql/skipliveness.txt
  11. Stop agent processes:
    /opt/dv/current/qp/helper_scripts/qp_agents stop
  12. Clear temporary directories:
    rm -rf /mnt/PV/versioned/dv_data/qpendpoints/data1/gaiandb6415 /mnt/PV/versioned/dv_data/qpendpoints/data2/gaiandb6416 /mnt/PV/versioned/dv_data/qpendpoints/data3/gaiandb6417 /mnt/PV/versioned/dv_data/qpendpoints/data4/gaiandb6418 /mnt/PV/versioned/dv_data/qpendpoints/data5/gaiandb6419
  13. Start agent processes:
    /opt/dv/current/qp/helper_scripts/qp_agents start
  14. Wait for the agents to start. It might take up to 30 seconds until all agents are available. Check by using the Db2 command line:
    db2 connect to bigsql;
    db2 "select 'Agents are available' from DVSYS.LISTNODES WHERE AGENT_TIER='H' HAVING COUNT(*) >= 5";
    Rerun the second statement until the text 'Agents are available' is displayed.
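    The wait can also be scripted; a sketch that polls every 10 seconds, using the db2 CLP -x option to suppress column headings:
    until db2 -x "select 'Agents are available' from DVSYS.LISTNODES WHERE AGENT_TIER='H' HAVING COUNT(*) >= 5" | grep -q 'Agents are available'; do
        sleep 10
    done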
  15. Restore the current configuration to all agents:
    db2 "call dvsys.restoreconfig('',?,?);
    This command returns the following output:
      Value of output parameters
      --------------------------
      Parameter Name  : NUM_RESTORED
      Parameter Value : 6

      Parameter Name  : DIAGS
      Parameter Value :

      Return Status = 0
  16. Close the service connection:
    db2 connect reset;
If needed, revert the embedded Db2 database to default circular transaction logging by doing the following steps.
Note: This procedure assumes that no online Db2 backups of the bigsql database are being performed for the service. If online Db2 backups are being taken, do not implement these steps.
  1. Connect to Db2:
    db2 connect to bigsql
  2. Confirm the current LOGARCHMETH1 setting and that the disk location is DISK:/mnt/bludata0/db2/archive_log/:
    db2 get db cfg | grep LOGARCHMETH1
  3. Reconfigure the database to use circular logging:
    db2 update db cfg using LOGARCHMETH1 OFF
  4. Restart the Big SQL server:
    bigsql stop; bigsql start
  5. Connect to Db2 and confirm that circular logging is applied (LOGARCHMETH1 is OFF):
    db2 connect to bigsql
     
    db2 get db cfg | grep LOGARCHMETH1
  6. Free up space on the PV by removing the old archived transaction log files (a guarded sketch follows this list):
    rm -rf /mnt/bludata0/db2/archive_log/db2inst1/BIGSQL
  7. Close the service connection:
    db2 connect reset;
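A guarded variant of steps 5 and 6 removes the archived logs only after LOGARCHMETH1 reports OFF. This combination is an illustrative sketch, not part of the original procedure:
    if db2 get db cfg for bigsql | grep LOGARCHMETH1 | grep -q OFF; then
        rm -rf /mnt/bludata0/db2/archive_log/db2inst1/BIGSQL
    else
        echo "LOGARCHMETH1 is not OFF; leaving the archived logs in place."
    fi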
Re-enable the Data Virtualization head pod liveness probe script and the Big SQL daemon.
  1. Re-enable the Data Virtualization head pod liveness probe script by deleting the marker file.
    rm -f ~db2inst1/ibm/bigsql/skipliveness.txt
  2. Re-enable the Big SQL daemon.
    oc rsh c-db2u-dv-db2u-0 db2uctl markers delete BIGSQL_DAEMON_PAUSE

Document Location

Worldwide

[{"Type":"MASTER","Line of Business":{"code":"LOB10","label":"Data and AI"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSHGYS","label":"IBM Cloud Pak for Data"},"ARM Category":[{"code":"a8m3p000000UoTZAA0","label":"Storage-\u003EStorage Volume"}],"ARM Case Number":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"4.0.0;4.0.1;4.0.2;4.0.3;4.0.4;4.0.5;4.0.6;4.0.7;4.0.8;4.5.0;4.5.1;4.5.3;4.6.0;4.6.2;4.6.4;4.7.0;4.7.3;4.8.0"}]

Document Information

Modified date:
30 November 2023

UID

ibm16562489