Running a simulation on pure Ubuntu vs on Ubuntu in Windows (WSL)


Solution 1

Your simulation software is most likely either CPU bound or memory bound. For such workloads, one would not expect to see any significant difference between running the code on "bare metal" or inside WSL (or any other compatibility layer or VM that uses native execution), since in either case the OS is mostly just standing by while the simulation code runs directly on the CPU.
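To make "CPU bound" concrete, here is a minimal sketch of a workload that performs no file or screen I/O inside its timed loop, loosely modeled on a prime-checking CPU test. The function names and the default limit of 20000 are illustrative assumptions, not part of any particular benchmark tool.

```python
import time


def count_primes(limit):
    """Count primes <= limit by trial division (pure computation, no I/O)."""
    count = 0
    for n in range(2, limit + 1):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


def time_cpu_work(limit=20000):
    """Return (number of primes found, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    primes = count_primes(limit)
    return primes, time.perf_counter() - start


if __name__ == "__main__":
    primes, seconds = time_cpu_work()
    print(f"{primes} primes found in {seconds:.3f} s")
```

Because the timed loop makes no system calls, a run like this should mostly reflect the CPU and the interpreter rather than the surrounding OS layer.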

However, it's also possible that your simulation is at least partially I/O bound, and that's where differences may emerge. Apparently, WSL (currently) has a rather slow filesystem interface layer that can slow down disk I/O significantly.* That said, while disk I/O can be the major bottleneck for many kinds of bulk data processing tasks, a "simulation" usually should not be spending the majority of its time reading and writing files. If yours is, you may want to consider running it from a RAM disk (e.g. tmpfs on native** Linux) to avoid needless physical disk access.

In any case, the only way to be sure is to test your simulation in both environments and time how long it takes to run. Before doing that, however, you may want to take a look at existing benchmarks, like this WSL vs. Docker vs. VirtualBox vs. native Linux performance benchmark by Phoronix from February 2018, and examine the results for any tests that stress the same components of the system as your simulation does.
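A minimal sketch of such a timing comparison, assuming the simulation can be started from a command line: run the same script once under native Ubuntu and once under WSL, then compare the summaries. The default command below is only a placeholder workload, and `time_command` and `summarize` are hypothetical helper names.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to min / median / max."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


if __name__ == "__main__":
    # Placeholder workload; pass the real simulation command line instead.
    command = sys.argv[1:] or [sys.executable, "-c", "sum(range(10**6))"]
    print(summarize(time_command(command)))
```

Repeating the run and looking at the median rather than a single measurement helps filter out one-off disturbances such as background updates.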

(FWIW, the Phoronix results seem to mostly match the general principles I outlined above, although there are a few notable oddities like VirtualBox outperforming native Linux in a few I/O bound benchmarks, apparently due to its virtual disk not always immediately syncing data to the physical disk. One potentially relevant issue that I failed to note above is that the benchmarks show significant differences in multi-threaded OpenMP performance, both between the different host environments and between different Linux distros even when running on bare hardware. In hindsight, that's not too surprising, since threading and IPC are handled by the kernel. I'd guess that much of the difference between the distros there may come down to different runtime and/or compile time kernel tuning parameters.)


*) According to this MSDN blog post from 2016, there are actually two filesystem interface components in WSL: VolFs, which closely emulates native Linux filesystem semantics over NTFS and is used to mount e.g. / and /home, and DrvFs, which provides mostly Windows-like semantics and is used for accessing the host Windows drives via /mnt/c etc. If your software doesn't specifically require native Linux filesystem features like multiple hard links to the same file, configuring it to store its data files in a DrvFs folder may improve file access performance on WSL.

**) According to this Reddit thread from May 2017, "tmpfs is currently emulated using disk" on WSL. Unless something has changed over the last year, this presumably means that using tmpfs on WSL gives no performance benefit over using a normal on-disk filesystem.
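Which of these locations is actually fastest for a given setup can be measured directly. The sketch below times creating, syncing, reading back, and deleting many small files in each directory passed on the command line. The default directory list (home directory, `/mnt/c/Temp`, `/dev/shm`) is only a guess at typical VolFs, DrvFs, and tmpfs locations, and the file count and size are arbitrary assumptions.

```python
import os
import sys
import tempfile
import time


def time_small_file_io(directory, files=200, size=4096):
    """Create, fsync, read back, and delete `files` small files in `directory`."""
    payload = os.urandom(size)
    start = time.perf_counter()
    with tempfile.TemporaryDirectory(dir=directory) as workdir:
        for i in range(files):
            path = os.path.join(workdir, f"chunk_{i}.bin")
            with open(path, "wb") as fh:
                fh.write(payload)
                fh.flush()
                os.fsync(fh.fileno())  # force the write through to the backing store
            with open(path, "rb") as fh:
                if fh.read() != payload:
                    raise OSError(f"read-back mismatch in {path}")
            os.remove(path)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Assumed example locations; pass the directories you care about instead.
    candidates = sys.argv[1:] or [os.path.expanduser("~"), "/mnt/c/Temp", "/dev/shm"]
    for directory in candidates:
        if not (os.path.isdir(directory) and os.access(directory, os.W_OK)):
            print(f"{directory}: not present or not writable, skipped")
            continue
        print(f"{directory}: {time_small_file_io(directory):.3f} s")
```

Many small synced files stress the filesystem layer far more than one large sequential write, so this is closer to a worst case than to a typical simulation run.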

Solution 2

Ubuntu in Windows (WSL - 2017 Fall Creators Update) is definitely slower than "Pure" Ubuntu in Linux environment.

For example, screen painting takes many times longer in Windows 10 than in Ubuntu 16.04, i.e. you can actually see the cursor move in Windows 10:

WSL bash startup.gif

It takes about 5 seconds for the WSL Bash splash screen to paint. By comparison it is about 1 1/2 seconds for the same splash screen in Ubuntu 16.04:

Ubuntu terminal splash.gif


CPU Benchmarking

The first section shows how slow screen I/O is but what about CPU benchmarking?

From this Ask Ubuntu Q&A: CPU benchmarking utility for Linux, I ran tests on Ubuntu 16.04 on Linux and on Windows. The run took about 23.5 seconds on Linux and about 30.5 seconds on Windows 10 version 1709, so Linux was about 7 seconds faster (the WSL run took roughly 30% longer). However, I just upgraded Windows 10 to version 1803 (Redstone 4 aka Spring Creators April 2018 update) and the run took about 23.7 seconds, which is essentially the same as Linux.

Ubuntu 16.04 on Linux

$ sysbench --test=cpu --cpu-max-prime=20000 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          23.5065s
    total number of events:              10000
    total time taken by event execution: 23.5049
    per-request statistics:
         min:                                  2.13ms
         avg:                                  2.35ms
         max:                                  8.52ms
         approx.  95 percentile:               2.76ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   23.5049/0.00

Ubuntu 16.04 on Windows 10 build 1709

$ sysbench --test=cpu --cpu-max-prime=20000 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          30.5350s
    total number of events:              10000
    total time taken by event execution: 30.5231
    per-request statistics:
         min:                                  2.37ms
         avg:                                  3.05ms
         max:                                  6.21ms
         approx.  95 percentile:               4.01ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   30.5231/0.00

Ubuntu 16.04 on Windows 10 build 1803

$ sysbench --test=cpu --cpu-max-prime=20000 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          23.7223s
    total number of events:              10000
    total time taken by event execution: 23.7155
    per-request statistics:
         min:                                  2.21ms
         avg:                                  2.37ms
         max:                                  4.53ms
         approx.  95 percentile:               2.73ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   23.7155/0.00
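The three reports above can be compared by extracting their "total time" lines and expressing each run relative to the Linux baseline. This is a small sketch; the helper names are hypothetical, and the regular expression assumes the sysbench 0.4.x report layout shown above.

```python
import re

# Matches "total time:   23.5065s" but not "total time taken by event execution".
TOTAL_TIME = re.compile(r"total time:\s+([0-9.]+)s")


def total_seconds(report):
    """Extract the 'total time' value from a sysbench text report."""
    match = TOTAL_TIME.search(report)
    if match is None:
        raise ValueError("no 'total time' line found")
    return float(match.group(1))


def relative_slowdown(baseline, other):
    """Fraction by which `other` takes longer than `baseline`."""
    return other / baseline - 1.0


if __name__ == "__main__":
    # The 'total time' lines copied from the three reports above.
    runs = {
        "Linux": "    total time:                          23.5065s",
        "WSL on 1709": "    total time:                          30.5350s",
        "WSL on 1803": "    total time:                          23.7223s",
    }
    baseline = total_seconds(runs["Linux"])
    for name, report in runs.items():
        slowdown = relative_slowdown(baseline, total_seconds(report))
        print(f"{name}: {slowdown:+.1%} vs. Linux")
```

With these numbers the 1709 run comes out roughly 30% slower than Linux and the 1803 run roughly 1% slower, which is within what run-to-run noise could plausibly explain.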

NOTE: Windows 10 spring update for 2018 (dubbed Redstone 4) came out on May 9th (4 days ago) and I will be installing it soon to check out the improvements. No doubt there are many. One I know of that interests me is the ability to run cron jobs on startup. I need that for automatic daily backups to gmail.com.

NOTE 2: I've just installed Windows 10 Build 1803 (April 2018 Spring Creators Update AKA Redstone 4) and the screen painting is much much faster. It's now only 3 seconds instead of 5 seconds to display the Bash splash screen. The CPU benchmark is on par with Linux now.

Solution 3

Think about it - in WSL your computer is running the full graphical Windows system (which is a horrific resource hog in the first place) plus the Ubuntu subsystem. In native Ubuntu it's only running Ubuntu.

Solution 4

I don't know whether this will affect your simulation in particular, but it might:

WSL does NOT use RAM for shared memory! It uses the disk!

This means, if your simulation uses shared memory (think /dev/shm), it may be slow and/or wear out your storage device! And the performance penalty comes from several layers:

  • The file system driver

  • The storage driver

  • The storage medium

But if it doesn't do this, then the performance should be similar to that on bare-metal Ubuntu (assuming no other I/O, as others have mentioned).
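One rough way to check whether shared memory is unusually slow on a given system is to time filling a shared-memory block against filling an ordinary in-process buffer of the same size. The sketch below assumes Python 3.8 or later, whose `multiprocessing.shared_memory` module uses POSIX shared memory on Linux (typically backed by `/dev/shm`). The 16 MB size is an arbitrary choice and the helper names are hypothetical.

```python
import time
from multiprocessing import shared_memory


def time_fill(buf, chunk):
    """Overwrite `buf` chunk by chunk and return the elapsed seconds."""
    step = len(chunk)
    start = time.perf_counter()
    for offset in range(0, len(buf) - step + 1, step):
        buf[offset:offset + step] = chunk
    return time.perf_counter() - start


def compare_shared_vs_private(size_mb=16):
    """Time filling a shared-memory block and an ordinary in-process buffer."""
    size = size_mb * 1024 * 1024
    chunk = b"\xab" * (1024 * 1024)
    shm = shared_memory.SharedMemory(create=True, size=size)
    view = shm.buf[:size]  # the mapping may be rounded up to a page multiple
    try:
        shared_seconds = time_fill(view, chunk)
    finally:
        view.release()  # must be released before the mapping can be closed
        shm.close()
        shm.unlink()
    private_seconds = time_fill(memoryview(bytearray(size)), chunk)
    return shared_seconds, private_seconds


if __name__ == "__main__":
    try:
        shared_seconds, private_seconds = compare_shared_vs_private()
    except OSError as exc:
        print(f"shared memory unavailable: {exc}")
    else:
        print(f"shared: {shared_seconds:.4f} s, private: {private_seconds:.4f} s")
```

If the two timings are of the same order, shared memory is probably RAM-backed. A shared figure that is many times larger would be consistent with a disk-backed implementation, though a single run like this is only a hint.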


ABCDEMMM
Author by

ABCDEMMM

Updated on September 18, 2022

Comments

  • ABCDEMMM
    ABCDEMMM almost 2 years

    I would like to ask a question about testing a large CAE simulation on the same computer in the following two situations.

    1. Pure Ubuntu system
    2. Ubuntu system in Windows 10 (WSL)

    Are the calculation speeds in both cases almost the same or are they different?

    • Admin
      Admin about 6 years
      Without knowing the nature of the simulation, this is impossible to answer.
    • Admin
      Admin about 6 years
      @muru: It's not that vague. A "simulation" is presumably a computationally intensive background job, which makes it either CPU or memory bound. (Disk or network I/O might also be a bottleneck, but that's something people writing such programs tend to avoid, and some modern simulation code may even use the GPU for parallel computation.) One could pretty easily write (or download) a benchmark that tests all these 2 to 5 possible bottlenecks, and check if there's any significant difference between WSL and native Ubuntu for any of them. I'd do it, but I don't have WSL (or Windows 10) available.
    • Admin
      Admin about 6 years
      @IlmariKaronen "presumably". Depending on the data being crunched, it could just as well be IO intensive even if CPU bound. And the rest of your comment is a pretty good reason for closing this - we have no idea of what possible combination of bottlenecks matters here.
    • Admin
      Admin about 6 years
      Well, I did post an answer, since it turns out that suitable benchmarks are already online. Obviously, I cannot say for sure whether the OP's specific simulation code will run slower on WSL or not; but in any case, an answer to that question is of no use to anyone but the OP anyway. What I can answer, based on the benchmarks, is what types of simulation code could reasonably be expected to have performance differences between WSL and native Linux.
    • Admin
      Admin about 6 years
      @muru, it is a CAE Simulation (Abaqus CAE).
  • muru
    muru about 6 years
    Note that this is misleading - this doesn't distinguish I/O performance and other computational performance. WSL is known to be slow for I/O (see e.g., Phoronix benchmarks). That does not say anything about whether OP's calculations can be done just as fast in WSL.
  • Ilmari Karonen
    Ilmari Karonen about 6 years
    I'm honestly surprised that drawing the splash screen isn't effectively instant in both cases. Your computer is (presumably) happy to do far more complex screen updates in a few milliseconds e.g. when playing video. And the last time I saw a terminal as slow as in your first recording was in the early 90's, when dialing up a BBS on my 2400 bps modem.
  • Rinzwind
    Rinzwind about 6 years
    and you can kill the desktop if you want
  • ABCDEMMM
    ABCDEMMM about 6 years
    how to kill the desktop ?
  • Jon Bentley
    Jon Bentley about 6 years
    What do you mean by "Ubuntu in Linux"?
  • WinEunuuchs2Unix
    WinEunuuchs2Unix about 6 years
    Actually, rather than killing the desktop, skip the step of installing it. Ubuntu in the Windows 10 Store doesn't include the desktop. You have to do extra steps to get it working.
  • WinEunuuchs2Unix
    WinEunuuchs2Unix about 6 years
    @JonBentley Ubuntu running on Linux Kernel rather than Ubuntu running on Windows 10.
  • Matteo Italia
    Matteo Italia about 6 years
    Honestly, this kind of benchmark is completely useless for any kind of realistic program, as is any benchmark that essentially measures the console painting speed. Either your program's bottleneck is console I/O (which is notoriously slow even on Linux with most terminal emulators), or this isn't a reliable measure of anything useful.
  • WinEunuuchs2Unix
    WinEunuuchs2Unix about 6 years
    @muru The slow calendar drawing shown above does involve a computation loop, but slow screen I/O is probably the biggest factor. However, the OP's program is also screen I/O bound: Abaqus/CAE. Complete solution for Abaqus finite element modeling, visualization, and process automation. With Abaqus/CAE you can quickly and efficiently create, edit, monitor, diagnose, and visualize advanced Abaqus analyses.
  • WinEunuuchs2Unix
    WinEunuuchs2Unix about 6 years
    @IlmariKaronen See my last comment about computation loop.
  • muru
    muru about 6 years
    @WinEunuuchs2Unix From what I can see, there's little computation, but lots of I/O: fetching the weather from somewhere, reading the date and time, and printing it in a format, reading system information, etc. Anyway, have you ever used Abaqus? Simulation software like it or Ansys or Simulink is not screen I/O bound when running the actual simulation unless you force the simulation to be so. It's perfectly possible for these to show just the end results, depending on the simulation done.
  • WinEunuuchs2Unix
    WinEunuuchs2Unix about 6 years
    @MatteoItalia Linux is not slow by comparison as you can see with the exact same program running in the second GIF. The same slowness in Windows and Linux does occur between the first half and second half of the screen when the DISK I/O occurs to get system information.
  • WinEunuuchs2Unix
    WinEunuuchs2Unix about 6 years
    @muru I added CPU benchmarking. And no I haven't used Abaqus.
  • WinEunuuchs2Unix
    WinEunuuchs2Unix about 6 years
    @IlmariKaronen see my comment to Muru.
  • WinEunuuchs2Unix
    WinEunuuchs2Unix about 6 years
    @MatteoItalia see my comment to Muru. Thanks to all three of you for taking time critiquing original answer.
  • Matteo Italia
    Matteo Italia about 6 years
    That's more like it. Now, what would be interesting is how this slowness is accounted for - the CPU itself isn't going to run slower changing OS, maybe it's preempted more often due to other stuff running in background (Windows systems are always unreasonably busy), bad scheduling, or slower thread synchronization primitives?
  • WinEunuuchs2Unix
    WinEunuuchs2Unix about 6 years
    @MatteoItalia From my limited viewing of MS videos everything a Linux program does goes through a translation layer into Windows NT Kernel "language". System Call is the technique I think they use but it's been six months since I watched the videos. I'm running Bash In Windows right now and typing this in Chrome on Windows 10. It's kind of fun :) Also I ran the test whilst Windows had just booted and it was in the process of downloading months of updates. I'll rerun the test later after reboot and "settling in".
  • Matteo Italia
    Matteo Italia about 6 years
    @WinEunuuchs2Unix: yes, Linux syscalls are implemented in terms of native NT functions (with some extra bookkeeping that is done IIRC in a driver); that's mostly how Win32 calls themselves are implemented, BTW - the NT kernel was designed from the beginning with this in mind (the initial idea was to have Win32, OS/2 and POSIX "personalities" built over the NT kernel). Still, for pure calculation this is completely irrelevant - as long as the program doesn't perform syscalls (and other stuff that traps, such as page faults), the OS is not involved in any way.
  • JimDeadlock
    JimDeadlock about 6 years
    CTRL-ALT-F2 will kill the desktop and drop you to TTY (no GUI). Type your username and password and you are there at the command prompt with no GUI. To get back to GUI do CTRL-ALT-F1
  • JimDeadlock
    JimDeadlock about 6 years
    ... or as @WinEunuuchs2Unix says, install Ubuntu Server (instead of Ubuntu Desktop) in the first place.
  • Eric Duminil
    Eric Duminil about 6 years
    @JimDeadlock I really don't think it kills the desktop, it just doesn't display it. Every gui app is still running in the background, aren't they?
  • vidarlo
    vidarlo about 6 years
    The windows GUI consumes some memory, but not very much CPU use when not doing anything. I don't see why that would have any significant impact?
  • WinEunuuchs2Unix
    WinEunuuchs2Unix about 6 years
    @MatteoItalia I just upgraded to Windows 10 Build 1803 (Redstone 4 aka April 2018 Spring Creators update) and the Ubuntu Windows CPU benchmark test is now on par with the Ubuntu Linux CPU tests.
  • Peter Cordes
    Peter Cordes about 6 years
    Perhaps not just tuning parameters, but compiler options (e.g. -O3 -march=haswell or something). I don't know what Clear Linux actually uses to build their kernels, but perhaps BMI2 / popcnt / whatever could make a measurable difference in glibc and the kernel. (The kernel won't benefit from AVX, though, because the kernel avoids touching FPU registers except in specific code like the software-RAID5/6 error-correction data.)
  • Peter Cordes
    Peter Cordes about 6 years
    Switching the console over to a different VT doesn't kill any processes; @EricDuminil is correct. It may pause things that were using CPU time to do graphics updates, because the X server knows it's no longer being displayed (and thus may not waste any time on OpenGL processing or whatever). But if you run pstree or ps auxw, it's obvious that all processes are still alive. (Or top and hit M to sort by memory consumption).
  • Peter Cordes
    Peter Cordes about 6 years
    @MichaelEricOberlin: Changing to another VT doesn't affect the runlevel! It's just that text consoles are still available in a runlevel that starts GDM. (And BTW, runlevels are basically a thing of the past; systemd doesn't work like SysV init. The earlier part of this comment is pretending that you were running a 5 or 10 year old Linux distro with an old-school init setup.) But yes, logging out of your X session and stopping X11 / GDM will free up resources, especially if you have no swap space, or your desktop has crap that wakes up frequently even when "idle".
  • ABCDEMMM
    ABCDEMMM about 6 years
    really good to know it!
  • Eric Duminil
    Eric Duminil about 6 years
    @MichaelEricOberlin: Your comment is quite simply wrong. Would you please consider deleting it?
  • ABCDEMMM
    ABCDEMMM about 6 years
    "CTRL+ALT+F2" not working ....
  • William
    William almost 6 years
    It appears to be worse than a Virtual Machine or Virtual Box.
  • William
    William almost 6 years
    My god this makes it appear that a VirtualMachine is faster.
  • WinEunuuchs2Unix
    WinEunuuchs2Unix almost 6 years
    @William That is simply Microsoft's implementation of terminal screen updating. When running Ubuntu GUI in WSL using VcXsrv performance is not too bad. I'm sure things will improve as the product (WSL) matures.