When and how to use chain priorities in nftables


UPDATE: iptables-nft (as opposed to iptables-legacy) uses the nftables kernel API, plus a compatibility layer to reuse xtables kernel modules (those described in iptables-extensions) when no native nftables translation is available. It should be treated as nftables in most regards, except that, for the purposes of this question, it has fixed priorities like the legacy version, so nftables' priorities still matter here.


iptables (legacy) and nftables both rely on the same netfilter infrastructure and use hooks at various places. This is explained in Netfilter hooks, and there is also this systemtap manpage, which documents a bit of the hook handling:

PRIORITY is an integer priority giving the order in which the probe point should be triggered relative to any other netfilter hook functions which trigger on the same packet. Hook functions execute on each packet in order from smallest priority number to largest priority number. [...]

There is also this blog post about netfilter: How to Filter Network Packets using Netfilter–Part 1 Netfilter Hooks (the blog has disappeared, so this is a Wayback Machine link).

Taken together, these sources say that various modules and functionalities can register at each of the five possible hooks (in the IPv4 case), and that at each hook they are called in order of the priority they registered for that hook.
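As a rough illustration of that ordering, the following Python sketch models each hook as a list of (priority, name) registrations and walks them from the smallest priority number to the largest. It is a toy model, not kernel code: the `HookTable` class and the chain name `my_nft_chain` are hypothetical, while the numeric priorities are the IPv4 values quoted in the question.

```python
from collections import defaultdict


class HookTable:
    """Toy model of netfilter hooks: each hook keeps its own list of users."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def register(self, hook, priority, name):
        # A priority is only ever compared with registrations on the same hook.
        self._hooks[hook].append((priority, name))

    def traversal(self, hook):
        # Smallest priority number runs first, largest runs last.
        ordered = sorted(self._hooks[hook], key=lambda entry: entry[0])
        return [name for _, name in ordered]


table = HookTable()
table.register("prerouting", -200, "conntrack")
table.register("prerouting", -400, "defrag")
table.register("prerouting", -300, "iptables raw")
table.register("prerouting", -150, "iptables mangle")
table.register("prerouting", -100, "dnat")
table.register("input", 0, "iptables filter")
table.register("prerouting", -250, "my_nft_chain")  # hypothetical nftables chain

print(table.traversal("prerouting"))
print(table.traversal("input"))
```

The registration order does not matter here; only the priority within each hook does, and the `input` registration never interleaves with the `prerouting` ones.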

Those hooks are not only for iptables or nftables. There are various other users, like systemtap above, or even netfilter's own submodules. For example, with IPv4 when using NAT with either iptables or nftables, nf_conntrack_ipv4 will register in 4 hooks at various priorities, for a total of 6 registrations. This module will in turn pull in nf_defrag_ipv4, which registers at NF_INET_PRE_ROUTING/NF_IP_PRI_CONNTRACK_DEFRAG and NF_INET_LOCAL_OUT/NF_IP_PRI_CONNTRACK_DEFRAG.

So yes, the priority is relevant only within the same hook. But within that hook there are several users, and they already have their predefined priorities (often, but not always, with the same value reused across different hooks), so to interact correctly with them, a compatible priority has to be used.

For example, if some rules must run early, on packets that have not yet been defragmented, and others later (as usual) on defragmented packets, just register two nftables chains in prerouting: one at a priority <= -401 (e.g. -450), the other between -399 and -201 (e.g. -300). The best iptables could do until recently was -300, i.e. it couldn't see fragmented packets whenever conntrack, and thus early defragmentation, was in use. Since kernel 4.15, with the option raw_before_defrag, it registers at -450 instead, but it can't do both, and iptables-nft doesn't appear to offer such a choice.
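The ranges given above can be checked with a small Python sketch. The helper `prerouting_stage` is hypothetical and only encodes the two IPv4 prerouting thresholds mentioned here (defragmentation at -400, conntrack at -200); it treats an exact tie with one of them as a value to avoid, which is how the ranges above are chosen.

```python
# IPv4 prerouting thresholds, taken from the priority list quoted in the question.
NF_IP_PRI_CONNTRACK_DEFRAG = -400
NF_IP_PRI_CONNTRACK = -200


def prerouting_stage(priority):
    """Describe where a prerouting chain at `priority` sits (hypothetical helper)."""
    if priority < NF_IP_PRI_CONNTRACK_DEFRAG:
        return "before defragmentation"
    if priority in (NF_IP_PRI_CONNTRACK_DEFRAG, NF_IP_PRI_CONNTRACK):
        return "tie with a built-in operation (avoid)"
    if priority < NF_IP_PRI_CONNTRACK:
        return "after defragmentation, before conntrack"
    return "after conntrack"


for prio in (-450, -300, 0):
    print(prio, prerouting_stage(prio))
```

With this model, -450 lands before defragmentation and -300 lands between defragmentation and conntrack, matching the two chains suggested above.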


So now about the interactions between nftables and iptables: both can be used together, with the exception of NAT in older kernels, where they both compete over netfilter's nat resource: only one of them should register nat, unless using a kernel >= 4.18 as explained in the wiki. The example nftables settings simply ship with the same priorities as iptables, with minor differences.

If both iptables and nftables are used together and one has to run before the other because they interact and the order of effects matters, just slightly lower or raise nftables' priority accordingly, since iptables' priority can't be changed.

For example, in a mostly iptables setup, one can use nftables with a specific match feature not available in iptables to mark a packet, and then handle this mark in iptables, because iptables has support for a specific target (e.g. the fancy iptables LED target, to blink an LED) not available in nftables. Just register a slightly lower priority value for the nftables hook to be sure it runs first. For a usual input filter rule, that would be for example -5 instead of 0. Then again, this value shouldn't be lower than -149, or the chain will execute before iptables' INPUT mangle chain, which is perhaps not what is intended. That's the only other low value that matters in the input case. For example, there's no NF_IP_PRI_CONNTRACK threshold to consider, because conntrack doesn't register anything at this priority in NF_INET_LOCAL_IN, nor does SELinux register anything in this hook, should anything related to it matter, so -225 has no special meaning here.
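The usable window in this input example can be worked out in a few lines of Python. This is only an arithmetic sketch: the helper `priorities_strictly_between` is hypothetical, and the two constants are the iptables mangle and filter priorities from the list quoted in the question.

```python
NF_IP_PRI_MANGLE = -150  # iptables mangle table, fixed
NF_IP_PRI_FILTER = 0     # iptables filter table, fixed


def priorities_strictly_between(lower, upper):
    """Return the (min, max) priorities that run after `lower` and before `upper`."""
    if upper - lower < 2:
        raise ValueError("no integer priority fits strictly between the two")
    return lower + 1, upper - 1


lowest, highest = priorities_strictly_between(NF_IP_PRI_MANGLE, NF_IP_PRI_FILTER)
candidate = -5  # the value suggested above for the nftables input chain

print(lowest, highest)
print(lowest <= candidate <= highest)
```

Any value in that window runs after iptables' INPUT mangle chain and before its INPUT filter chain, so -5 is just one convenient choice.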


Author by

iosjdjoisdfijodjoi893

Hi, I'm Felix Dreissig. I consider myself a Computer Scientist at the intersection of software development, IT operations, and information security. I currently work at noris network as IT Security Engineer in an ops/SRE team delivering core services to the company. Before that, I studied Computer Science at FAU Erlangen-Nürnberg with a focus on IT security, distributed and operating systems, and compiler technology. In a side job during my studies, I used to do Python programming and work on streaming data processing using Apache Flink. I also have some experience in full-stack web development, Linux infrastructure automation, and computer networking. Besides that, I (used to?) play (IT security) CTFs with our local team and am involved in the infrastructure development and hosting for our own FAUST CTF.

Updated on September 18, 2022

Comments

  • iosjdjoisdfijodjoi893
    iosjdjoisdfijodjoi893 over 1 year

    When configuring a chain in nftables, one has to provide a priority value. Almost all online examples set a priority of 0; sometimes, a value of 100 gets used with certain hooks (output, postrouting).

    The nftables wiki has to say:

    The priority can be used to order the chains or to put them before or after some Netfilter internal operations. For example, a chain on the prerouting hook with the priority -300 will be placed before connection tracking operations.

    For reference, here's the list of different priority used in iptables:

    • NF_IP_PRI_CONNTRACK_DEFRAG (-400): priority of defragmentation
    • NF_IP_PRI_RAW (-300): traditional priority of the raw table placed before connection tracking operation
    • NF_IP_PRI_SELINUX_FIRST (-225): SELinux operations
    • NF_IP_PRI_CONNTRACK (-200): Connection tracking operations
    • NF_IP_PRI_MANGLE (-150): mangle operation
    • NF_IP_PRI_NAT_DST (-100): destination NAT
    • NF_IP_PRI_FILTER (0): filtering operation, the filter table
    • NF_IP_PRI_SECURITY (50): Place of security table where secmark can be set for example
    • NF_IP_PRI_NAT_SRC (100): source NAT
    • NF_IP_PRI_SELINUX_LAST (225): SELinux at packet exit
    • NF_IP_PRI_CONNTRACK_HELPER (300): connection tracking at exit

    This states that the priority controls interaction with internal Netfilter operations, but only mentions the values used by iptables as examples.

    In which cases is the priority relevant (i.e. has to be set to a value ≠ 0)? Only for multiple chains with same hook? What about combining nftables and iptables? Which internal Netfilter operations are relevant for determining the correct priority value?

  • Marcos Oliveira
    Marcos Oliveira over 3 years
    How could one use multiple chains with policy drop for the same hook? Packets traverse all chains, even if accepted by a rule before, so the only way I thought of was marking accepted packets and dropping packets not marked as accepted in later chains. Is there a more practical way? I tried using nftables for my workstation while keeping the iptables rules created by libvirt and lxd, but it didn't work, even when only dropping packets in the last chain. So it seems impractical to use iptables and nftables together, or even multiple nftables chains...
  • Marcos Oliveira
    Marcos Oliveira over 3 years
    So, while it is possible to create multiple chains for the same hook with different priorities, it is not practical (even more so when trying to use both iptables and nftables). It would be good to have something like PF's quick rule for nftables. Or at least a cleaner way of delaying the decision, so that a packet accepted by one chain is not dropped by another chain's default policy. Like a policy that only drops packets not previously accepted by the current chain or another chain.
  • Marcos Oliveira
    Marcos Oliveira over 3 years
    I created a bug suggesting quick accept verdict and delayed drop policy: bugzilla.netfilter.org/show_bug.cgi?id=1471
  • A.B
    A.B over 3 years
    Sorry to tell, but I can't see how your bug will go anywhere. You are free to use the existing infrastructure to implement it using marks, but asking that it becomes built-in becomes a complete change of paradigm, and would add a whole layer of complexity on the handling.
  • Marcos Oliveira
    Marcos Oliveira over 3 years
    Nftables is not so widely adopted and it already brings change of paradigms. Even with little usage, current behavior is bringing confusion regarding accept and drop between chains of different priorities. Implementing one option (quick accept) would not be a breaking change. Regarding delaying drop by default, then yes, this would be a breaking change and I doubt it would be implemented... I just expressed my opinion. PF's implementation of delaying both accept and drop is much saner IMO.
  • Alexis
    Alexis over 2 years
    Agree... a non-default accept must be accepted for good. Still don't understand why this isn't the case. Creating multiple chains doesn't make sense with this behavior.
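The behaviour discussed in these comments can be sketched in Python as a toy model, not as the actual nftables implementation: an accept verdict only ends the current base chain, so a later chain on the same hook can still drop the packet, whereas a drop is final. The chain functions and the packet dictionary are hypothetical; `mark_aware_chain` shows the mark-based workaround mentioned in the comments.

```python
ACCEPT, DROP = "accept", "drop"


def run_hook(chains, packet):
    """Evaluate the base chains of one hook in priority order (toy model).

    Each chain is (priority, function); the function returns ACCEPT or DROP.
    ACCEPT only ends the current chain, DROP ends processing for good.
    """
    for _, chain in sorted(chains, key=lambda c: c[0]):
        if chain(packet) == DROP:
            return DROP
    return ACCEPT


def early_chain(packet):
    # Hypothetical chain at priority -5: accept SSH and remember it with a mark.
    if packet.get("dport") == 22:
        packet["mark"] = 1
    return ACCEPT


def strict_chain(packet):
    # Hypothetical later chain whose rules match nothing, so policy drop applies.
    return DROP


def mark_aware_chain(packet):
    # Same chain with the workaround: honour the mark before the drop policy.
    return ACCEPT if packet.get("mark") == 1 else DROP


print(run_hook([(-5, early_chain), (0, strict_chain)], {"dport": 22}))
print(run_hook([(-5, early_chain), (0, mark_aware_chain)], {"dport": 22}))
print(run_hook([(-5, early_chain), (0, mark_aware_chain)], {"dport": 80}))
```

In this model the first call ends in a drop even though the earlier chain accepted the packet, which is the surprise described above; the second call is accepted only because the later chain checks the mark.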