VirtualBox VDI takes up much more room than actual size

Solution 1

The answer to this is actually quite simple.

Any time your guest writes to a place on the disk that has not been written to before, the VDI grows to accommodate the "new" data. VirtualBox has no knowledge of what is stored on the disk, so it doesn't matter whether the guest now considers that space unused. The only thing that matters is that the space was used at some point.

Now, if the guest wrote 20 GB of contiguous data (one byte following the next), then the VDI file would require 20 GB of physical hard drive space.

But that's not how it works in reality. Instead, the guest VM is continually reading, writing, and moving data around. Even if the same 20 GB of data is merely moved to new areas of the partition, the VDI grows to accommodate the "new" data.

The VDI file never shrinks on its own, and with this kind of write pattern it eventually reaches your pre-defined maximum size.
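
The grow-only behavior described above can be illustrated with a toy model. This is only a sketch, not how VirtualBox is implemented: the 1 MiB block size, the 10 GiB disk, and the rule that each new file lands right after the previous one are assumptions chosen for illustration, and `simulate` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Toy model of a dynamically allocated image: a block is allocated in the
# image file the first time the guest writes to it and is never released.
# Sizes are counted in blocks (assumed to be 1 MiB each for illustration).

simulate() {
    local disk_blocks=$1 file_blocks=$2 runs=$3
    local -a allocated=()   # blocks the image file has ever had to store
    local start=0 prev_start=-1 live=0 i b

    for ((i = 1; i <= runs; i++)); do
        # Assumed placement: the new file goes right after the previous one,
        # wrapping around at the end of the disk.
        for ((b = 0; b < file_blocks; b++)); do
            allocated[$(( (start + b) % disk_blocks ))]=1
        done
        live=$(( live + file_blocks ))

        # Delete the previous file: the guest frees its blocks,
        # but the image keeps them allocated.
        if (( prev_start >= 0 )); then
            live=$(( live - file_blocks ))
        fi
        prev_start=$start
        start=$(( (start + file_blocks) % disk_blocks ))
    done

    echo "guest_used=${live} image_allocated=${#allocated[@]}"
}

# 10 GiB disk, 100 MiB files, 1000 create/delete rounds.
simulate 10240 100 1000
# prints: guest_used=100 image_allocated=10240
```

With these assumed numbers the guest only ever holds a file or two at a time, yet every block of the image ends up allocated. A real guest filesystem places data less predictably, but the effect on the image file is the same in kind.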

Solution 2

Thin (also called sparse, or dynamic in VirtualBox terminology) disk images grow over time but never shrink by themselves. Here is an example: you create a 10 GB thin VDI disk, mount it, and then create and delete a series of relatively small files in a loop. Even though none of these files is bigger than 100 MB, your VDI disk quickly grows to the maximum size.

Let's start with the 10 GB disk, which was just created and mounted. This is what it looks like in the VM:

[root@localhost ~]# df -h | egrep "^Filesystem|test"
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/test-lvol    9.9G   23M  9.4G   1% /mnt

And this is what it looks like outside the VM (on the host):

ls -lh|grep New
-rw-------  1 dmitryzayats  staff    94M Oct 28 23:33 NewVirtualDisk1.vdi

So it takes slightly less than 100M on the host.

Now we run this one-liner, which creates and deletes 1000 relatively small 100M files in sequence. At any given point in time, the VM shows that we consume only about 200M on the file system.

for i in {1..1000}; do echo "Run=$i"; dd if=/dev/urandom of=/mnt/testfile${i} bs=1M count=100; df -h; rm -f /mnt/testfile$(($i-1)); done

Run=319
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.944908 s, 111 MB/s
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 992M     0  992M   0% /dev
tmpfs                   1001M     0 1001M   0% /dev/shm
tmpfs                   1001M  584K 1000M   1% /run
tmpfs                   1001M     0 1001M   0% /sys/fs/cgroup
/dev/mapper/fedora-root   13G  4.3G  7.4G  37% /
/dev/sda1                976M   82M  828M   9% /boot
tmpfs                   1001M  4.0K 1001M   1% /tmp
tmpfs                    201M     0  201M   0% /run/user/0
/dev/mapper/test-lvol    9.9G  223M  9.2G   3% /mnt

But things look very different from the host's perspective.

ls -lh|grep New
-rw-------  1 dmitryzayats  staff   9.8G Oct 29 00:13 NewVirtualDisk1.vdi

The reason is that every time a new file is created, the OS can write its data to a different location on the block device. When the file is deleted, the space is free again from the VM's point of view, but in the underlying VDI that space was already allocated: the VDI file grew, and it does not shrink by itself. It is possible to decrease it, but that requires shutting down the VM and punching holes in the VDI file. You can google for "punching holes in sparse files".

Some workloads behave particularly badly with thin disks. For example, if an Oracle database is in ARCHIVELOG mode, it creates many archive logs, and even if those are deleted regularly by running rman, it quickly fills a thin disk to its maximum size.
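
One way to reclaim the space, sketched here under assumptions, is VirtualBox's own compact operation rather than the manual hole punching mentioned above: the guest first overwrites its free space with zeros, and `VBoxManage modifymedium disk ... --compact` then drops the all-zero blocks from the image while the VM is powered off. The function names, the `/mnt` mount point, and the `zerofill` file name are hypothetical; the image name is the one from the example. Older VirtualBox releases spell the subcommand `modifyhd`.

```shell
#!/usr/bin/env bash
# Sketch: shrink a dynamically allocated VDI with VirtualBox's compact operation.
# Assumptions: VBoxManage is on the host's PATH, and the guest filesystem
# to clean up is mounted at the path passed to zero_free_space.

# Step 1 (run inside the guest): fill the free space with zeros, then delete
# the filler file. dd is expected to stop with "No space left on device".
# The filesystem is completely full for a moment while this runs.
zero_free_space() {
    local mountpoint=$1
    dd if=/dev/zero of="${mountpoint}/zerofill" bs=1M || true
    sync
    rm -f "${mountpoint}/zerofill"
}

# Step 2 (run on the host, with the VM powered off): remove blocks that
# contain only zeros from the image file.
compact_image() {
    local image=$1
    VBoxManage modifymedium disk "$image" --compact
}

# Example usage (commented out so that sourcing this file changes nothing):
#   guest$ zero_free_space /mnt
#   host$  compact_image NewVirtualDisk1.vdi
```

The zeroing step matters because, as explained above, VirtualBox cannot tell freed blocks from live ones; it can only recognize blocks that contain nothing but zeros.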

Author: Nikhil Komalan

Updated on September 18, 2022

Comments

  • Nikhil Komalan
    Nikhil Komalan over 1 year

    I have a dynamic disk ubuntu instance that df reports is using 30gig (this is about what df reports too) and yet the vdi file is 130gig on disk:

    I don't recall it taking up this much room before (but I'm not 100% sure on this). It's almost as if the guest was using more disk space and then one day the host suddenly resized it to a much larger size than required.

    I don't really want to shrink the dynamically sized disk (which isn't possible), but I'm wondering why the actual VDI needs to take up 5x the space of the underlying guest. It doesn't seem very 'dynamic'.

  • Nikhil Komalan
    Nikhil Komalan over 6 years
    makes sense given the context of what i was doing too which was downloading the 30 gig eth blockchain