How do I mount a private /proc inside a namespace inside a docker container?


This command works:

sudo docker run --cap-add=sys_admin --security-opt label:disable -it fedora:rawhide /bin/sh -c 'for dir in $(awk '"'"'/\/proc\// { print $5; }'"'"' /proc/1/mountinfo ); do umount "$dir"; done; /usr/bin/unshare -Ufmp -r /bin/sh -c '"'"'mount --make-private / ; mount -t proc proc /proc ; ls /proc'"'"

I didn't split it over multiple lines because the quoting is really important. It first unmounts everything Docker has mounted over files and directories inside /proc, then runs unshare and mounts a fresh /proc in the child user namespace.
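The `'"'"'` sequences in that command are the usual POSIX trick for putting a single quote inside a single-quoted string: close the quote, emit a double-quoted `'`, then reopen the quote. A small helper that applies the same transformation to an arbitrary string might look like this. It is a sketch assuming a POSIX `sh` with `sed`, and `shell_quote` is a hypothetical name, not part of Docker or util-linux:

```shell
#!/bin/sh
# shell_quote (hypothetical helper): print $1 wrapped in single quotes so a
# POSIX shell reads it back verbatim. Every embedded ' is rewritten as
# '"'"' : end the quoted string, add a double-quoted ', start quoting again.
# Caveat: command substitution strips trailing newlines from the input.
shell_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\"'\"'/g")"
}

# Quoting a string that already contains a quoted awk program reproduces
# the nested pattern used in the docker command above.
shell_quote "awk $(shell_quote '{ print $5; }')"
echo
# prints: 'awk '"'"'{ print $5; }'"'"''
```

Building each layer with a helper like this, innermost first, is one way to avoid hand-counting quotes when a command is passed through `docker run` and then `sh -c` twice.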

Docker mounts over a number of directories and files in /proc, covering the directories with empty tmpfs mounts and the files with bind-mounted /dev/null. Many files in /proc expose values that apply to the whole system, not just the container. In fact, if you were root inside the container, /proc/kcore would let you read kernel memory, which would surprise the many people who want to believe that containers are some kind of lightweight VM.
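To see what has actually been mounted over /proc in a given container, you can read /proc/self/mountinfo. In that format (documented in proc(5)), field 5 is the mount point, a variable number of optional fields follows field 6, and the filesystem type comes right after a lone `-` separator. A minimal sketch, assuming POSIX `awk`; `proc_overmounts` is a hypothetical name, and the exact filesystem types you see will depend on the host and Docker version:

```shell
#!/bin/sh
# proc_overmounts (hypothetical helper): read mountinfo-format text on stdin
# and print "mount-point fstype" for every mount strictly below /proc.
proc_overmounts() {
  awk '
    # Field 5 is the mount point; skip anything not under /proc/.
    $5 !~ /^\/proc\// { next }
    {
      # Optional fields (shared:N, master:N, ...) end at a lone "-";
      # the filesystem type is the field right after it.
      for (i = 7; i <= NF; i++) if ($i == "-") break
      print $5, $(i + 1)
    }
  '
}

# Inside a container:
# proc_overmounts < /proc/self/mountinfo
```

The /proc mount itself is excluded because its mount point is exactly `/proc`, not a path below it.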

In the kernel (as of version 4.14, anyway), mnt_already_visible in fs/namespace.c checks whether you're mounting a filesystem that is already mounted. If that existing mount has child mounts carrying the MNT_LOCKED flag, the new mount fails. The MNT_LOCKED flag seems to be applied (I didn't hunt down where this happens in the kernel) to the mounts a new mount namespace inherits when it is created inside a new user namespace, as unshare -Um does. This prevents you from using the privileges you have 'within' the user namespace to unmount things and make hidden content visible again.

The command I posted runs an awk script over the contents of /proc/1/mountinfo to find every file and subdirectory of /proc that Docker has mounted over, and unmounts each one. With those mounts gone, /proc can be mounted again in nested user namespaces.
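A slightly more defensive variant of that loop might look like the sketch below. It has not been tested inside Docker, and `unmount_proc_submounts`, `proc_submounts_deepest_first` and `DRY_RUN` are hypothetical names. It differs from the one-liner in two ways: it matches only on the mount-point field (the one-liner's `/\/proc\//` pattern matches "/proc/" anywhere on the line), and it unmounts the deepest paths first, so a mount nested under another /proc mount is removed before its parent.

```shell
#!/bin/sh
# Print mount points strictly below /proc (field 5 of mountinfo), deepest
# first: in byte order a path sorts before its descendants, so reversing
# that order puts every nested mount ahead of its parent. Duplicates are
# kept on purpose: a path mounted twice needs two umounts.
proc_submounts_deepest_first() {
  awk '$5 ~ /^\/proc\// { print $5 }' | LC_ALL=C sort -r
}

# unmount_proc_submounts [MOUNTINFO]  (hypothetical helper)
# Unmounts each submount; with DRY_RUN set, only prints the umount commands.
unmount_proc_submounts() {
  proc_submounts_deepest_first < "${1:-/proc/1/mountinfo}" |
    while IFS= read -r dir; do
      if [ -n "${DRY_RUN:-}" ]; then
        printf 'umount %s\n' "$dir"
      else
        umount "$dir"
      fi
    done
}
```

Running it first as `DRY_RUN=1 unmount_proc_submounts` inside the container shows what would be unmounted before anything is actually touched.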


Author: Omnifarious

I've been programming since I was 8. I started with Apple Basic, then Timex Sinclair (ZX-81 at the time) Basic. After that I discovered the list of op-codes next to the ASCII chart in my ZX-81 manual. I began to compile my own hand-lettered sheets that put all the addressing modes of a given instruction in the same place so I could more easily hand-assemble small machine language programs. It's gone on from there. The learning never ends. Currently I do most of my programming in Python and C++ on Linux. I have a strong preference for Open Source software. I've noticed that this site tends to have a slight bias for existing members. Members with a higher reputation tend to be voted up more even if their answer is very similar to someone else's with a lower reputation.

Updated on September 18, 2022

Comments

  • Omnifarious (almost 2 years ago)

    I have a need to create namespaces inside a Docker container. As part of this, I will need to mount a /proc private to the inner namespace. I realize that I will have to run the container with certain privileges to make this happen, but I would prefer to grant the minimal set.

    This works:

    $ sudo docker run --privileged --security-opt=seccomp=unconfined \
     -it fedora:rawhide /usr/bin/unshare -Ufmp -r \
     /bin/sh -c 'mount -t proc proc /proc'
    

    This doesn't:

    $ sudo docker run --cap-add=sys_admin --security-opt=seccomp=unconfined \
      -it fedora:rawhide /usr/bin/unshare -Ufmp -r \
       /bin/sh -c 'mount -t proc proc /proc'
    mount: /proc: cannot mount proc read-only.
    

    So, just turning off seccomp filters and adding CAP_SYS_ADMIN isn't enough. What is enough?

    Update: SELinux is part of the problem. If you turn off SELinux enforcement globally, it works. But you can also turn it off for a particular container with --security-opt label:disable, which is documented in the security configuration section of the online Docker manual:

    sudo docker run --cap-add=sys_admin --security-opt label:disable \
     -it fedora:rawhide /usr/bin/unshare -fmp /bin/sh -c \
     'mount --make-private / ; mount -t proc proc /proc'
    

    But that fails if the -U and -r flags are added back to unshare. And, of course, adding --privileged to the docker run command works just fine even with the -U and -r flags.

    I'm currently trying to use the kernel's tracing facilities to figure out exactly what is returning EPERM. It's an unhelpfully unspecific error.

    • Omnifarious (over 6 years ago)
      SamYaple on the #docker channel on Freenode has been pretty helpful here, and this may be a cgroups issue. There appears to be a 'devices' cgroup.
    • c4f4t0r (over 6 years ago)
      Have you tried using -v /proc:/proc?
    • Omnifarious (over 6 years ago)
      @c4f4t0r - Well, that wouldn't do what I want. I don't want the /proc from the namespace docker is running in (presumably the root level namespace).
    • Omnifarious (over 6 years ago)
      @c4f4t0r - Using ftrace, the kernel sources and some creative thinking, I figured out the problem. serverfault.com/a/897476/71430
    • James Stevens (over 3 years ago)
      This won't help you, but to be honest, I'm really surprised this isn't a standard mount option, as it seems like a pretty common requirement to me. I need it :) ... I'm running a single binary and including all its libraries, so there is no need for a base OS in the container, and I don't have one, but the binary does seem to want /proc for some features. The same would happen with static binaries, like Go ones, which could be installed in a container without a base OS. I tried -v /proc:/proc and it doesn't help, for the reason you give.
  • LtWorf (over 4 years ago)
    It doesn't work. Please don't waste people's time with replies like this.
  • Omnifarious (over 4 years ago)
    @LtWorf - That isn't a helpful comment. This command worked for me. It's possible Docker has changed how it does things since I wrote this and so this doesn't work anymore. Or it's possible I missed some crucial piece of information necessary to its correct functioning. But I didn't post a 'useless reply' because I posted what I did to solve my problem. If you want to be helpful, please say why it didn't work for you.
  • Kai (almost 4 years ago)
    Speaking of outdated information, Worf is a Lieutenant Commander, last I heard. :)