How to create multiple network namespaces from a single process instance

c linux networking network-programming linux-namespaces

18,655

Solution 1

You only have to bind mount /proc/*/ns/* if you need to access these namespaces from another process, or need a handle so that you can switch back and forth between them. It is not needed just to use multiple namespaces from a single process.

unshare does create a new namespace.
clone and fork do not create any new namespaces by default.
There is one "current" namespace of each kind assigned to a process. It can be changed by unshare or setns. By default, child processes inherit the set of namespaces.

Whenever you do open(/proc/N/ns/net), it creates an inode for this file, and all subsequent open()s will return a file that is bound to the same namespace. The details are lost in the depths of the kernel dentry cache.

Also, each process has only one /proc/self/ns/net file entry, and a bind mount does not create new instances of this proc file. Opening those mounted files is exactly the same as opening the /proc/self/ns/net file directly (which will keep pointing to the namespace it pointed to when you first opened it).

It seems that "/proc/*/ns" is half-baked like this.

So, if you only need 2 namespaces, you can:

open /proc/1/ns/net
unshare
open /proc/self/ns/net

and switch between the two.
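The three steps above could be sketched in C roughly as follows. This is a minimal sketch, not a definitive implementation: `struct netns_pair`, `netns_pair_create()` and `netns_switch()` are hypothetical names introduced here for illustration, and it assumes the process runs with CAP_SYS_ADMIN and starts in the same network namespace as PID 1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper type: one handle per namespace. */
struct netns_pair {
    int orig_fd; /* namespace of PID 1 (assumed to be where we start) */
    int new_fd;  /* namespace created by unshare() */
};

/* Returns 0 on success, -1 on failure with errno set by the failing call. */
static int netns_pair_create(struct netns_pair *p)
{
    int saved;

    assert(p != NULL);
    p->orig_fd = p->new_fd = -1;

    /* Step 1: handle to the original namespace. */
    p->orig_fd = open("/proc/1/ns/net", O_RDONLY | O_CLOEXEC);
    if (p->orig_fd < 0)
        return -1;

    /* Step 2: move this process into a brand new network namespace. */
    if (unshare(CLONE_NEWNET) < 0)
        goto fail;

    /* Step 3: handle to the namespace we are in now. */
    p->new_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
    if (p->new_fd < 0)
        goto fail;
    return 0;

fail:
    saved = errno;
    close(p->orig_fd);
    p->orig_fd = -1;
    errno = saved;
    return -1;
}

/* Switch the calling thread to the namespace behind ns_fd. */
static int netns_switch(int ns_fd)
{
    return setns(ns_fd, CLONE_NEWNET);
}
```

After a successful `netns_pair_create(&p)`, calling `netns_switch(p.orig_fd)` and `netns_switch(p.new_fd)` moves the process back and forth between the two namespaces.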

For more than 2 you might have to clone(). There seems to be no way to create more than one /proc/N/ns/net file per process.

However, if you do not need to switch between namespaces at runtime, or to share them with other processes, you can use many namespaces like this:

open sockets and run processes for main namespace.
unshare
open sockets and run processes for 2nd namespace (netlink, tcp, etc)
unshare
...
unshare
open sockets and run processes for Nth namespace (netlink, tcp, etc)

Open sockets keep a reference to their network namespace, so the namespace will not be collected until the sockets are closed.
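A rough sketch of that unshare-and-open sequence, assuming CAP_SYS_ADMIN and using a plain UDP socket as the stand-in for "sockets and processes" (the helper name `open_socket_per_netns()` is made up for this example):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/socket.h>

/*
 * Fills socks[0..n-1]. socks[0] is opened in the namespace the process
 * starts in; every later socket is opened right after an unshare(), so it
 * lives in (and pins) its own fresh network namespace.
 * Returns the number of sockets actually opened.
 */
static int open_socket_per_netns(int *socks, int n)
{
    int i;

    assert(socks != NULL && n > 0);
    for (i = 0; i < n; i++) {
        /* Leave the current namespace behind, except for the first socket. */
        if (i > 0 && unshare(CLONE_NEWNET) < 0)
            break;
        socks[i] = socket(AF_INET, SOCK_DGRAM, 0);
        if (socks[i] < 0)
            break;
    }
    return i;
}
```

Note that the process itself ends up in the last namespace it created; the earlier namespaces are reachable only through the sockets opened in them.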

You can also use netlink to move interfaces between namespaces, by sending a netlink command in the source namespace and specifying the destination namespace either by PID or by namespace FD (the latter you don't have).
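For the PID variant, the rtnetlink request could look roughly like this. It is a sketch under assumptions: `struct move_req`, `build_move_request()` and `send_move_request()` are hypothetical names, the interface is identified by its index, and reading the kernel's reply is left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: header, link info, room for one attribute. */
struct move_req {
    struct nlmsghdr  nh;
    struct ifinfomsg ifi;
    char             attrs[32];
};

/* Builds "move interface ifindex into the netns of target_pid"; returns the length. */
static size_t build_move_request(struct move_req *req, int ifindex, pid_t target_pid)
{
    struct rtattr *rta;
    uint32_t pid = (uint32_t)target_pid;

    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req->nh.nlmsg_type  = RTM_SETLINK;
    req->nh.nlmsg_flags = NLM_F_REQUEST;
    req->ifi.ifi_family = AF_UNSPEC;
    req->ifi.ifi_index  = ifindex;

    /* Append IFLA_NET_NS_PID: the destination namespace, given by PID. */
    rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->nh.nlmsg_len));
    rta->rta_type = IFLA_NET_NS_PID;
    rta->rta_len  = RTA_LENGTH(sizeof(pid));
    memcpy(RTA_DATA(rta), &pid, sizeof(pid));

    req->nh.nlmsg_len = NLMSG_ALIGN(req->nh.nlmsg_len) + RTA_ALIGN(rta->rta_len);
    assert(req->nh.nlmsg_len <= sizeof(*req));
    return req->nh.nlmsg_len;
}

/* Must be called while the process is in the source namespace. */
static int send_move_request(const struct move_req *req)
{
    int rc;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -1;
    rc = send(fd, req, req->nh.nlmsg_len, 0) < 0 ? -1 : 0;
    close(fd);
    return rc;
}
```

The FD variant would use the IFLA_NET_NS_FD attribute with a namespace file descriptor in place of the PID.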

You need to switch the process namespace before accessing /proc entries that depend on that namespace. Once a "proc" file is open, it keeps a reference to the namespace.

Solution 2

Network namespaces are, by design, created with a call to clone, and can be modified afterwards with unshare. Take note that even if you do create a new network namespace with unshare, you in fact only modify the network stack of your running process. unshare is unable to modify the network stack of other processes, so you won't be able to create another one with unshare alone.

In order to work, a new network namespace needs a new network stack, and so it needs a new process. That's all.

Good news is that it can be made very lightweight with clone, see:

Clone() differs from the traditional fork() system call in UNIX, in that it allows the parent and child processes to selectively share or duplicate resources.

You are able to diverge only on the network stack (and keep sharing the memory space, the table of file descriptors and the table of signal handlers). Your new network process can be made more like a thread than a real fork.
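As an illustration of such a lightweight clone, here is a minimal sketch, assuming CAP_SYS_ADMIN and a mounted /proc. The names `struct child_result`, `child_fn()` and `new_netns_via_clone()` are invented for the example. The child shares memory and the file descriptor table with the parent, so the namespace handle it opens is directly usable in the parent.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Hypothetical result slot, written by the child through shared memory. */
struct child_result {
    int ns_fd;
    int err;
};

/* Runs in the child, which already lives in the new network namespace. */
static int child_fn(void *arg)
{
    struct child_result *res = arg;

    res->ns_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
    res->err = res->ns_fd < 0 ? errno : 0;
    return 0;
}

/* Returns an fd referring to a fresh network namespace, or -1 with errno set. */
static int new_netns_via_clone(void)
{
    struct child_result res = { -1, 0 };
    char *stack = malloc(CHILD_STACK_SIZE);
    pid_t pid;
    int saved;

    if (stack == NULL)
        return -1;

    /* New network stack only; memory and fd table stay shared with the parent. */
    pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                CLONE_NEWNET | CLONE_VM | CLONE_FILES | SIGCHLD, &res);
    if (pid < 0) {
        saved = errno;
        free(stack);
        errno = saved;
        return -1;
    }

    while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
        ;
    free(stack);

    if (res.ns_fd < 0)
        errno = res.err;
    return res.ns_fd;
}
```

Because CLONE_VM is used without a separate TLS, the child should be kept as small as it is here; anything more elaborate is safer in a plain fork().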

You can manipulate them with C code or with Linux Kernel and/or LXC tools.

For instance, to add a device to new network namespace, it's as simple as:

echo $PID > /sys/class/net/ethX/new_ns_pid

See this page for more info about CLI available.

On the C-side, one can take a look at lxc-unshare implementation. Despite its name it uses clone, as you can see (lxc_clone is here). One can also look at LTP implementation, where the author has chosen to use fork directly.

EDIT: There is a trick that you can use to make them persistent, but you will still need to fork, even temporarily.

Take a look at this code from iproute2 (I have removed error checking for clarity):

snprintf(netns_path, sizeof(netns_path), "%s/%s", NETNS_RUN_DIR, name);

/* Create the base netns directory if it doesn't exist */
mkdir(NETNS_RUN_DIR, S_IRWXU|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH);

/* Create the filesystem state */
fd = open(netns_path, O_RDONLY|O_CREAT|O_EXCL, 0);
[...]
close(fd);
unshare(CLONE_NEWNET);
/* Bind the netns last so I can watch for it */
mount("/proc/self/ns/net", netns_path, "none", MS_BIND, NULL);

If you execute this code in a forked process, you'll be able to create new network namespaces at will. In order to delete them, you can simply umount and delete this bind mount:

umount2(netns_path, MNT_DETACH);
if (unlink(netns_path) < 0) [...]
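Put together, running that snippet in a short-lived forked child could look like the sketch below. It assumes root privileges and the /var/run/netns directory used by `ip netns`. The function names `build_netns_path()`, `bind_new_netns()`, `create_named_netns()` and `delete_named_netns()` are hypothetical and not taken from iproute2.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

#define NETNS_RUN_DIR "/var/run/netns"

static int build_netns_path(char *buf, size_t len, const char *name)
{
    int n = snprintf(buf, len, "%s/%s", NETNS_RUN_DIR, name);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Runs in the child: create the mount point, unshare, bind the new netns. */
static int bind_new_netns(const char *path)
{
    int fd;

    mkdir(NETNS_RUN_DIR, 0755); /* may already exist */
    fd = open(path, O_RDONLY | O_CREAT | O_EXCL, 0);
    if (fd < 0)
        return -1;
    close(fd);

    if (unshare(CLONE_NEWNET) < 0 ||
        mount("/proc/self/ns/net", path, "none", MS_BIND, NULL) < 0) {
        unlink(path);
        return -1;
    }
    return 0;
}

/* The parent never changes its own namespace; each call forks a fresh child. */
static int create_named_netns(const char *name)
{
    char path[256];
    int status;
    pid_t pid;

    assert(name != NULL);
    if (build_netns_path(path, sizeof(path), name) < 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(bind_new_netns(path) == 0 ? 0 : 1);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

static int delete_named_netns(const char *name)
{
    char path[256];

    if (build_netns_path(path, sizeof(path), name) < 0)
        return -1;
    umount2(path, MNT_DETACH);
    return unlink(path);
}
```

Calling `create_named_netns("a")` and then `create_named_netns("b")` from the same parent gives each name its own child, and therefore its own /proc/self/ns/net entry to bind.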

EDIT2: Another (dirty) trick would be simply to execute "ip netns add .." cli with system.

Author by

user389238

Updated on June 07, 2022

Comments

user389238 almost 2 years
I am using following C function to create multiple network namespaces from a single process instance:
```
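#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

void create_namespace(const char *ns_name)
{
    char ns_path[100];

    snprintf(ns_path, sizeof(ns_path), "%s/%s", "/var/run/netns", ns_name);
    close(open(ns_path, O_RDONLY|O_CREAT|O_EXCL, 0));
    unshare(CLONE_NEWNET);
    mount("/proc/self/ns/net", ns_path, "none", MS_BIND, NULL);
}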
```
After my process creates all the namespaces and I add a tap interface to one of the network namespaces (with the ip link set tap1 netns ns1 command), I actually see this interface in all of the namespaces (presumably, this is actually a single namespace that goes under different names).

But, if I create multiple namespaces by using multiple processes, then everything is working just fine.

What could be wrong here? Do I have to pass any additional flags to the unshare() to get this working from a single process instance? Is there a limitation that a single process instance can't create multiple network namespaces? Or is there a problem with mount() call, because /proc/self/ns/net is actually mounted multiple times?

Update: It seems that the unshare() function creates multiple network namespaces correctly, but all the mount points in /var/run/netns/ actually refer to the first network namespace that was mounted in that directory.

Update2: It seems that the best approach is to fork() another process and execute create_namespace() function from there. Anyway, I would be glad to hear a better solution that does not involve fork() call or at least get a confirmation that would prove that it is impossible to create and manage multiple network namespaces from a single process.

Update3: I am able to create multiple namespaces with unshare() by using the following code:
```
int main() {
    create_namespace("a");
    system("ip tuntap add mode tap tapa");
    system("ifconfig -a"); // shows lo and tapa
    create_namespace("b");
    system("ip tuntap add mode tap tapb");
    system("ifconfig -a"); // shows lo and tapb, but not tapa, so this is a second namespace
}
```
But after the process terminates and I execute ip netns exec a ifconfig -a and ip netns exec b ifconfig -a, it seems that both commands are executed in namespace a. So the actual problem is storing the references to the namespaces (or calling mount() the right way, but I am not sure whether this is possible).

user389238 almost 12 years

+1, but could you explain what you mean with "Take note that you do not create a new network namespace with unshare"? See update #3, because my understanding is that unshare() still can create network namespaces. The clone(CLONE_NEWNET) is something like "I am going to create a new child process with a new network namespace", while unshare(CLONE_NEWNET) is like "I do not want to share network namespace with my parent process anymore. So create a new one.". lxc uses clone(), while iproute2 uses unshare().
Coren almost 12 years

I'll try to explain. You create a network namespace for your current process with unshare, but since network namespaces need a PID to live, you won't be able to create a new one only with unshare for the same process.
user389238 almost 12 years

I see your point, but unshare() still can create a new network namespace (this needs an update to your answer). Also, I guess, the namespace does not necessarily need an actual PID to live in (e.g. after executing the "ip netns add nsX" command the ip process terminates, but the namespace nsX still remains). I guess this limitation, "why it is impossible to create multiple network namespaces from a single process", has something to do with how mount() works.
Coren almost 12 years

If you take a look at iproute2 source code, you'll see that they keep the current network stack even after the process died with a mount trick: /* Bind the netns last so I can watch for it */ if (mount("/proc/self/ns/net", netns_path, "none", MS_BIND, NULL) < 0)
Coren almost 12 years

They make their network namespace persistent using this wonderful feature of the Linux kernel: removed files stay usable as long as they are still held open by a running process.