How to passively capture from Unix domain sockets (AF_UNIX socket monitoring)?

Solution 1

As of Linux kernel v4.2-rc5 it is not possible to capture directly using the interfaces that libpcap uses. libpcap relies on the Linux-specific AF_PACKET (alias PF_PACKET) domain, which only lets you capture data going through a "netdevice" (such as an Ethernet interface).

There is no kernel interface for capturing from AF_UNIX sockets. Standard Ethernet captures have an Ethernet header with source/destination, etc. Unix sockets have no comparable header, not even a fake one, and the link-layer header types registry does not list anything for them.

The basic entry points for data are unix_stream_recvmsg and unix_stream_sendmsg for SOCK_STREAM (SOCK_DGRAM and SOCK_SEQPACKET have similarly named functions). Data is buffered in sk->sk_receive_queue, and nothing in unix_stream_sendmsg ultimately leads to a call to tpacket_rcv, the function that feeds packet captures. See this analysis by osgx on SO for more details on the internals of packet capture in general.

Back to the original question on AF_UNIX socket monitoring, if you are mainly interested in application data, you have some options:

  • Passive (also works for already running processes):
    • Use strace and trace the system calls that perform I/O. There are many of them: read, pread64, readv, preadv, recvmsg and more. See @Stéphane Chazelas's example for xterm. The disadvantage of this approach is that you first have to find your file descriptor and might still miss some system calls. With strace, -e trace=desc covers the file-descriptor-based calls (read, write, readv, pread64, ...) and -e trace=network covers the socket calls (recvmsg, sendmsg, ...); -e trace=file only matches calls that take a file name, so it does not help here. pread is probably not used for Unix sockets by most programs anyway.
    • Break on or modify unix_stream_recvmsg and unix_stream_sendmsg (or unix_dgram_* or unix_seqpacket_*) in the kernel and output the data somewhere. You can use SystemTap to set such trace points; here is an example that monitors outgoing messages. This requires kernel support and the availability of debugging symbols.
  • Active (only works for new processes):

    • Use a proxy that also writes files. You could write a quick multiplexer yourself or hack together something like this that also outputs a pcap (beware of the limitations: for example, AF_UNIX can pass file descriptors, AF_INET cannot):

      # fake TCP server connects to real Unix socket
      socat TCP-LISTEN:6000,reuseaddr,fork UNIX-CONNECT:some.sock
      # start packet capture on said port and write it to a pcap file
      tcpdump -i lo -w capture.pcap 'tcp port 6000'
      # clients connect to this Unix socket
      socat UNIX-LISTEN:fake.sock,fork TCP-CONNECT:127.0.0.1:6000
      
    • Use a dedicated application proxy. For X11, there is xscope (git, manual).
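
The "quick multiplexer" idea can also be sketched without the TCP detour. The following is a minimal, illustrative Python proxy rather than a finished tool: it listens on one Unix socket, connects to the real one, and hands every chunk to a callback before forwarding it. The names make_listener, serve, handle_client, pump and on_data are made up for this sketch, and like any plain recv()-based relay it does not forward ancillary data such as passed file descriptors or credentials.

```python
import socket
import threading


def pump(src, dst, direction, on_data):
    """Copy bytes from src to dst, reporting each chunk to on_data first."""
    try:
        while True:
            chunk = src.recv(65536)
            if not chunk:  # peer closed its sending side
                break
            on_data(direction, chunk)
            dst.sendall(chunk)
    except OSError:
        pass  # connection torn down; nothing more to relay
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # propagate end-of-stream
        except OSError:
            pass


def handle_client(client, real_path, on_data):
    """Relay one client connection to the real socket, in both directions."""
    with client, socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as upstream:
        upstream.connect(real_path)
        back = threading.Thread(
            target=pump, args=(upstream, client, "<", on_data), daemon=True
        )
        back.start()
        pump(client, upstream, ">", on_data)
        back.join()


def make_listener(fake_path):
    """Create the listening socket that clients connect to instead of the real one."""
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(fake_path)
    listener.listen()
    return listener


def serve(listener, real_path, on_data):
    """Accept clients forever, relaying each one in its own thread."""
    while True:
        client, _ = listener.accept()
        threading.Thread(
            target=handle_client, args=(client, real_path, on_data), daemon=True
        ).start()
```

For example, serve(make_listener("fake.sock"), "some.sock", print) prints the direction (">" for client to server, "<" for the reverse) and the raw bytes of every chunk. As with the socat variant, this only works for clients that can be pointed at the fake socket.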

The suggested CONFIG_UNIX_DIAG option is unfortunately not helpful here either: it can only be used to collect statistics, not to acquire realtime data as it flows by (see linux/unix_diag.h).

Unfortunately, to the best of my knowledge there are currently no perfect tracers for Unix domain sockets that produce pcaps. Ideally there would be a libpcap format with a header containing the source/dest PID (when available), followed by optional additional data (credentials, file descriptors) and finally the data. Lacking that, the best that can be done is syscall tracing.
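
To make the wished-for format concrete, here is a rough sketch of what such a capture file could look like today. It writes a classic pcap file using LINKTYPE_USER0 (147), one of the values the link-layer header types registry reserves for private use, and prefixes each payload with a pseudo-header. The pseudo-header layout (two 32-bit PIDs in network byte order, with 0 meaning unknown) and the function names are assumptions invented for this example; no such format is registered, and Wireshark would need a custom dissector to interpret it.

```python
import struct
import time

LINKTYPE_USER0 = 147  # private-use link-layer type, meaning defined by the user

# Classic pcap file header: magic, version major/minor, thiszone, sigfigs,
# snaplen, link-layer type (little-endian, microsecond timestamps).
PCAP_GLOBAL_HEADER = struct.Struct("<IHHiIII")
# Per-record header: ts_sec, ts_usec, captured length, original length.
PCAP_RECORD_HEADER = struct.Struct("<IIII")
# Hypothetical pseudo-header for this sketch: source PID, destination PID.
PSEUDO_HEADER = struct.Struct("!II")


def write_pcap_header(out, snaplen=262144):
    """Write the pcap file header to a binary file-like object."""
    out.write(
        PCAP_GLOBAL_HEADER.pack(0xA1B2C3D4, 2, 4, 0, 0, snaplen, LINKTYPE_USER0)
    )


def write_record(out, src_pid, dst_pid, payload, timestamp=None):
    """Append one record: pcap record header, pseudo-header, then the data."""
    if timestamp is None:
        timestamp = time.time()
    sec = int(timestamp)
    usec = int((timestamp - sec) * 1_000_000)
    packet = PSEUDO_HEADER.pack(src_pid, dst_pid) + payload
    out.write(PCAP_RECORD_HEADER.pack(sec, usec, len(packet), len(packet)))
    out.write(packet)
```

The records could come from any of the sources discussed above, for instance data recovered from a syscall trace. Credentials and passed file descriptors are left out of this sketch; they would need additional, equally hypothetical fields.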


Additional information for the interested reader: here are some backtraces, acquired with GDB (breaking on unix_stream_* and using rbreak packet.c:.) while running socat on mainline Linux 4.2-rc5 in QEMU:

# echo foo | socat - UNIX-LISTEN:/foo &
# echo bar | socat - UNIX-CONNECT:/foo
unix_stream_sendmsg at net/unix/af_unix.c:1638
sock_sendmsg_nosec at net/socket.c:610
sock_sendmsg at net/socket.c:620
sock_write_iter at net/socket.c:819
new_sync_write at fs/read_write.c:478
__vfs_write at fs/read_write.c:491
vfs_write at fs/read_write.c:538
SYSC_write at fs/read_write.c:585
SyS_write at fs/read_write.c:577
entry_SYSCALL_64_fastpath at arch/x86/entry/entry_64.S:186

unix_stream_recvmsg at net/unix/af_unix.c:2210
sock_recvmsg_nosec at net/socket.c:712
sock_recvmsg at net/socket.c:720
sock_read_iter at net/socket.c:797
new_sync_read at fs/read_write.c:422
__vfs_read at fs/read_write.c:434
vfs_read at fs/read_write.c:454
SYSC_read at fs/read_write.c:569
SyS_read at fs/read_write.c:562

# tcpdump -i lo &
# echo foo | socat - TCP-LISTEN:1337 &
# echo bar | socat - TCP-CONNECT:127.0.0.1:1337
tpacket_rcv at net/packet/af_packet.c:1962
dev_queue_xmit_nit at net/core/dev.c:1862
xmit_one at net/core/dev.c:2679
dev_hard_start_xmit at net/core/dev.c:2699
__dev_queue_xmit at net/core/dev.c:3104
dev_queue_xmit_sk at net/core/dev.c:3138
dev_queue_xmit at netdevice.h:2190
neigh_hh_output at include/net/neighbour.h:467
dst_neigh_output at include/net/dst.h:401
ip_finish_output2 at net/ipv4/ip_output.c:210
ip_finish_output at net/ipv4/ip_output.c:284
ip_output at net/ipv4/ip_output.c:356
dst_output_sk at include/net/dst.h:440
ip_local_out_sk at net/ipv4/ip_output.c:119
ip_local_out at include/net/ip.h:119
ip_queue_xmit at net/ipv4/ip_output.c:454
tcp_transmit_skb at net/ipv4/tcp_output.c:1039
tcp_write_xmit at net/ipv4/tcp_output.c:2128
__tcp_push_pending_frames at net/ipv4/tcp_output.c:2303
tcp_push at net/ipv4/tcp.c:689
tcp_sendmsg at net/ipv4/tcp.c:1276
inet_sendmsg at net/ipv4/af_inet.c:733
sock_sendmsg_nosec at net/socket.c:610
sock_sendmsg at net/socket.c:620
sock_write_iter at net/socket.c:819
new_sync_write at fs/read_write.c:478
__vfs_write at fs/read_write.c:491
vfs_write at fs/read_write.c:538
SYSC_write at fs/read_write.c:585
SyS_write at fs/read_write.c:577
entry_SYSCALL_64_fastpath at arch/x86/entry/entry_64.S:186

tpacket_rcv at net/packet/af_packet.c:1962
dev_queue_xmit_nit at net/core/dev.c:1862
xmit_one at net/core/dev.c:2679
dev_hard_start_xmit at net/core/dev.c:2699
__dev_queue_xmit at net/core/dev.c:3104
dev_queue_xmit_sk at net/core/dev.c:3138
dev_queue_xmit at netdevice.h:2190
neigh_hh_output at include/net/neighbour.h:467
dst_neigh_output at include/net/dst.h:401
ip_finish_output2 at net/ipv4/ip_output.c:210
ip_finish_output at net/ipv4/ip_output.c:284
ip_output at net/ipv4/ip_output.c:356
dst_output_sk at include/net/dst.h:440
ip_local_out_sk at net/ipv4/ip_output.c:119
ip_local_out at include/net/ip.h:119
ip_queue_xmit at net/ipv4/ip_output.c:454
tcp_transmit_skb at net/ipv4/tcp_output.c:1039
tcp_send_ack at net/ipv4/tcp_output.c:3375
__tcp_ack_snd_check at net/ipv4/tcp_input.c:4901
tcp_ack_snd_check at net/ipv4/tcp_input.c:4914
tcp_rcv_state_process at net/ipv4/tcp_input.c:5937
tcp_v4_do_rcv at net/ipv4/tcp_ipv4.c:1423
tcp_v4_rcv at net/ipv4/tcp_ipv4.c:1633
ip_local_deliver_finish at net/ipv4/ip_input.c:216
ip_local_deliver at net/ipv4/ip_input.c:256
dst_input at include/net/dst.h:450
ip_rcv_finish at net/ipv4/ip_input.c:367
ip_rcv at net/ipv4/ip_input.c:455
__netif_receive_skb_core at net/core/dev.c:3892
__netif_receive_skb at net/core/dev.c:3927
process_backlog at net/core/dev.c:4504
napi_poll at net/core/dev.c:4743
net_rx_action at net/core/dev.c:4808
__do_softirq at kernel/softirq.c:273
do_softirq_own_stack at arch/x86/entry/entry_64.S:970

Solution 2

I wrote a tool to capture and dump Unix domain socket traffic. It uses a BPF kprobe on the kernel function unix_stream_sendmsg and dumps the traffic to user space.

The tool depends on bcc, so you need to install bcc first.

An example run:

$ sudo ./sockdump.py /var/run/docker.sock # run "docker ps" in another terminal
>>> docker[3412] len 83
GET /_ping HTTP/1.1
Host: docker
User-Agent: Docker-Client/18.06.1-ce (linux)

>>> dockerd[370] len 215
HTTP/1.1 200 OK
Api-Version: 1.38
Docker-Experimental: false
Ostype: linux
Server: Docker/18.06.1-ce (linux)
Date: Tue, 25 Sep 2018 07:05:03 GMT
Content-Length: 2
Content-Type: text/plain; charset=utf-8

OK
...
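
For readers who want to see the kprobe mechanism in isolation, here is a minimal sketch using bcc's Python bindings. It is not the code of the tool above: it only logs which process calls unix_stream_sendmsg and does not copy any payload. The names BPF_PROGRAM, trace_unix_stream_sendmsg and main are arbitrary, and running it requires root, an installed bcc, and a kernel where unix_stream_sendmsg is visible to kprobes.

```python
# BPF program (restricted C) that bcc compiles and loads at run time.
BPF_PROGRAM = r"""
#include <uapi/linux/ptrace.h>

int trace_unix_stream_sendmsg(struct pt_regs *ctx)
{
    /* The upper 32 bits of this value hold the process ID (tgid). */
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    bpf_trace_printk("unix_stream_sendmsg pid=%d\n", pid);
    return 0;
}
"""


def main():
    """Attach the probe and print trace lines until interrupted."""
    # Imported here so that the module can be loaded without bcc installed.
    from bcc import BPF

    bpf = BPF(text=BPF_PROGRAM)
    bpf.attach_kprobe(
        event="unix_stream_sendmsg", fn_name="trace_unix_stream_sendmsg"
    )
    bpf.trace_print()
```

Calling main() as root should print one line per call until interrupted. Extracting the message data itself means reading the msghdr argument of the probed function, which is a large part of what makes a complete tool longer than this sketch.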

Author: Lekensteyn

Arch Linux user, open-source enthusiast, programmer, Wireshark developer, TRU/e Security master student at TU/e. Interests: network protocols, Linux kernel, server administration, Android, breaking & fixing stuff.

Updated on September 18, 2022

Comments

  • Lekensteyn
    Lekensteyn over 1 year

    TCP/IP and UDP captures can be made using tcpdump/dumpcap and produces a pcap/pcapng file which can be fed to Wireshark for further analysis. Does a similar tool exist for named Unix domain sockets? (A general solution that works for abstract sockets would be nice too though.)

    strace as-is is not sufficient, it is not straightforward to filter for Unix domain sockets I/O. A proxy using socat or alike is also not suitable as the goal is passive analysis for existing open programs.

    How can I obtain a packet capture that I can use in Wireshark for analysis? Example protocol applications are X11 (Xorg, my current application) and cURL/PHP (HTTP). I have seen a CONFIG_UNIX_DIAG option in the Linux kernel, is this of some use?

    • Stéphane Chazelas
      Stéphane Chazelas over 8 years
    • Lekensteyn
      Lekensteyn over 8 years
      @StéphaneChazelas Thanks, but since Xorg was started with -nolisten tcp, there is no TCP socket. If all fails, then I will probably revert to using xscope or your neat strace+text2pcap trick. I would still be interested in a generic Unix socket capture though (for just data, not side-channel data).
    • Stéphane Chazelas
      Stéphane Chazelas over 8 years
      Besides strace, you can also look at auditd and systemtap.
    • Lekensteyn
      Lekensteyn over 8 years
      systemtap almost looks like a GDB hack, but at kernel level. I don't know about audit; I only found an LSM hook that checks whether you are allowed to read/write. (I am currently digging through the Linux kernel source code.)
  • Lekensteyn
    Lekensteyn over 8 years
    By the way, if you have read kristrev.github.io/2013/07/26/… and saw instructions to watch for link up notifications via netlink and wondered whether the diagnostics can provide packet sniffing, the answer is still no. These diagnostics provide statistics via polling, not realtime.
  • Velkan
    Velkan about 2 years
    Out of date as of the 5.15.0-18 kernel. There are build errors: bpf_timer and BPF_F_BROADCAST are undeclared. It builds when the declarations are added directly to the code, but then shows no output.