MikroTik IPsec client Fortigate 'Received ESP packet with unknown SPI.'

This may not be the cause of your problem, but it may be useful information for other users. We had a somewhat similar problem with a VPN between a Mikrotik and a Sonicwall. Traffic would randomly stop, and the SAs had to be flushed to get it going again.

In the end we realised that the Sonicwall was creating a separate SA for each network policy (judging by your screenshot, you have 2 policies/subnets going over the VPN). I don't know whether this 'SA-per-policy' behaviour is hard-coded or configurable, as I didn't have access to the Sonicwall.

Our Mikrotik was using the 'require' level for the policies (the default, and what your screenshot shows). This causes the router to create a single SA with the remote peer. When sending traffic for any of the policies for that peer, it uses this same SA, regardless of the source/destination subnet.

This meant that it worked as long as we only used one subnet. As soon as our Mikrotik tried to send traffic for the second subnet, it sent it over the existing SA, which as far as the Sonicwall was concerned belonged to a specific subnet pair. The Sonicwall would complain, the SA sequence numbers would go out of whack, and the whole lot stopped. (In our case the customer got 'replay' errors on their end.)
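
For readers unfamiliar with 'replay' errors: an ESP receiver keeps a sliding window of recently seen sequence numbers and rejects any packet whose number is a duplicate or has fallen behind the window. The Python sketch below illustrates that generic check, loosely following RFC 4303 (section 3.4.3) with the commonly used 64-packet window. `ReplayWindow` is a made-up name, and this is not a model of the Sonicwall's internals.

```python
class ReplayWindow:
    """Receiver-side anti-replay check, loosely following RFC 4303 section 3.4.3."""

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.highest = 0  # highest sequence number accepted so far
        self.bitmap = 0   # bit i set means (highest - i) has already been received

    def check_and_update(self, seq: int) -> bool:
        """Return True if seq is accepted, False if it is rejected as a replay."""
        if seq == 0:
            return False  # ESP sequence numbers start at 1
        if seq > self.highest:
            # New highest number: slide the window forward and mark seq as seen.
            shift = seq - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False  # too far behind the window
        if self.bitmap & (1 << offset):
            return False  # duplicate
        self.bitmap |= 1 << offset
        return True


w = ReplayWindow()
print(w.check_and_update(1))    # True
print(w.check_and_update(200))  # True: the window slides forward to 200
print(w.check_and_update(100))  # False: 100 is more than 64 packets behind
print(w.check_and_update(200))  # False: duplicate
```

Once the sequence numbers arriving on an SA no longer line up with what the receiver has already seen, packets start being dropped as replays, which is one way errors like the customer's can arise.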

The fix was as simple as changing the policy level to 'unique', so that both ends used a unique SA for each subnet pair. The tunnels were perfectly happy after that.
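
A toy Python model of the difference between the two levels, assuming (as described above) that 'require' reuses one SA for every policy towards a peer while 'unique' gets one SA per policy, and that the far end, like this Sonicwall, only accepts packets whose subnet pair matches the selectors its SA was negotiated for. `Policy`, `sa_key`, `simulate` and the addresses are hypothetical; this is not RouterOS code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Policy:
    src: str   # local subnet
    dst: str   # remote subnet
    peer: str  # remote IPsec peer address


def sa_key(policy: Policy, level: str) -> Tuple[str, ...]:
    """Which SA a policy's traffic is sent over (simplified model of the two levels)."""
    if level == "require":
        return (policy.peer,)                         # one SA shared by every policy to this peer
    if level == "unique":
        return (policy.peer, policy.src, policy.dst)  # one SA per subnet pair
    raise ValueError(f"unsupported level: {level}")


def simulate(level: str, traffic: List[Policy]) -> List[bool]:
    """For each packet, return True if a per-policy peer would accept it."""
    sas: Dict[Tuple[str, ...], Tuple[str, str]] = {}  # SA key -> negotiated (src, dst) selectors
    accepted = []
    for pol in traffic:
        key = sa_key(pol, level)
        # The first policy that triggers negotiation fixes the SA's selectors.
        sas.setdefault(key, (pol.src, pol.dst))
        # The far end checks the packet's subnets against the SA's selectors.
        accepted.append(sas[key] == (pol.src, pol.dst))
    return accepted


subnet_1 = Policy("10.1.0.0/24", "192.168.10.0/24", "203.0.113.1")
subnet_2 = Policy("10.1.0.0/24", "192.168.20.0/24", "203.0.113.1")
print(simulate("require", [subnet_1, subnet_2]))  # [True, False]
print(simulate("unique", [subnet_1, subnet_2]))   # [True, True]
```

With 'require', the second subnet's traffic rides on an SA whose selectors belong to the first subnet, which the far end rejects; with 'unique', each subnet pair gets its own SA.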


Author: Eugene van der Merwe

I fell in love with computers at an early age when I discovered the joy of programming and the ability to be creative using a machine. My first computer was the ZX-81, which had 1 kilobyte of memory. I progressed to a Commodore 64 and taught myself BASIC, Assembly Language (6502), Pascal, and Logo. I did some freelance programming after school, which was followed by studying computer engineering in Cupertino, California. My first real IT job was in networking during the heyday of Novell NetWare 3.x. I thoroughly enjoyed becoming a network specialist but also branched out to implement email, database, and internet solutions. At the peak of my networking career I spent 2 years in the United Kingdom doing system migrations and project management. The bulk of my career is the 17 years that I had my own business called Snowball, which is a hosting and internet service provider. I built the business from scratch, using no capital, to eventually service 1000s of customers. Working at an ISP is fast-paced and high-pressure, and I learnt a lot about business. I was also the technical specialist in charge and put together all the systems, including doing the programming for systems automation. In January 2015 I sold Snowball to Hero Telecoms, where I am currently employed as a senior product manager. My job entails product research, pricing, digital strategy, and group communications. Hero Telecoms aims to list in the next two years, and I find it fascinating to be part of a much larger picture compared to when I had my own firm.

Updated on September 18, 2022

Comments

  • Eugene van der Merwe, over 1 year

    We have a client with 6 sites using IPsec. Every now and again, possibly once a week, sometimes once a month, data just stops flowing from the remote Fortigate VPN server to the local MikroTik IPsec VPN client.

    To demonstrate the symptoms of the problem, I have attached a diagram. On the diagram's Installed SAs tab you will notice an SA from source address x.x.186.50 to x.x.7.3 with 0 current bytes. x.x.186.50 is the client's remote Fortigate IPsec server, and x.x.7.73 is a MikroTik-based IPsec endpoint. It appears that data from the remote side does not always reach us.

    Phase 1 and phase 2 are always established, but when the problem occurs, traffic refuses to flow from the remote side to us.
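
    Because the failure only shows up every week or month, it may help to watch for this symptom automatically: the outbound SA's byte counter keeps rising while the inbound SA from the Fortigate stays flat. A hypothetical Python sketch, assuming you can export (source, destination, current bytes) for each installed SA at two points in time; how you collect those samples is left open, and `SaSample` and `one_way_peers` are not a RouterOS API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class SaSample:
    src: str            # SA source address
    dst: str            # SA destination address
    current_bytes: int  # byte counter at sampling time


def _index(samples: Iterable[SaSample]) -> Dict[Tuple[str, str], int]:
    # Simplification: assumes at most one SA per direction per peer.
    return {(s.src, s.dst): s.current_bytes for s in samples}


def one_way_peers(before: Iterable[SaSample], after: Iterable[SaSample], local_ip: str) -> List[str]:
    """Peers whose outbound SA counter grew while the inbound SA counter did not."""
    b, a = _index(before), _index(after)
    stuck = []
    for (src, dst), out_after in a.items():
        if src != local_ip:
            continue  # only look at outbound SAs
        sending = out_after > b.get((src, dst), 0)
        receiving = a.get((dst, src), 0) > b.get((dst, src), 0)
        if sending and not receiving:
            stuck.append(dst)
    return sorted(stuck)


local, remote = "198.51.100.73", "203.0.113.50"
t0 = [SaSample(local, remote, 1_000), SaSample(remote, local, 0)]
t1 = [SaSample(local, remote, 9_000), SaSample(remote, local, 0)]
print(one_way_peers(t0, t1, local))  # ['203.0.113.50']
```

    Counters restart when SAs are rekeyed, so a real check would need to tolerate new SAs appearing between samples.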

    We have tried various things over time, such as rebooting, setting clocks, tweaking the configuration and checking it over and over, but the problem appears to be entirely random. And sometimes random things fix it. At one stage I had a theory that it works if the tunnel is initiated from their side, but fiddling with "Send Initial Contact" has not made any difference.

    We've had many chats with the client about this, but they have many more international IPsec VPNs and only our MikroTik configuration is failing.

    Fortigate log:

    [Screenshot] See also: http://kb.fortinet.com/kb/microsites/microsite.do?cmd=displayKC&externalId=11654

    Looking at Fortinet's knowledge base, it appears the SPIs don't agree and that DPD would make a difference. But I have tried every single DPD combination on this side, to no avail. I would like to enable DPD on the other side, but I cannot, due to change control and because the client says it's working at all the other sites with exactly the same configuration. EDIT: DPD was enabled.
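
    The Fortigate message in the title comes from the inbound lookup: an ESP receiver finds the SA for each packet by its SPI and drops the packet if no installed SA has that SPI, for example because one side has rekeyed or deleted the SA while the other keeps sending with the old SPI. A minimal Python sketch of that lookup; real implementations also match on destination address and protocol, and the class name and SPI values are made up.

```python
from typing import Dict


class InboundSad:
    """Toy inbound SA database keyed only by SPI (real stacks also match destination and protocol)."""

    def __init__(self) -> None:
        self.by_spi: Dict[int, str] = {}

    def install(self, spi: int, name: str) -> None:
        self.by_spi[spi] = name

    def delete(self, spi: int) -> None:
        self.by_spi.pop(spi, None)

    def receive(self, spi: int) -> str:
        sa = self.by_spi.get(spi)
        if sa is None:
            return f"drop: ESP packet with unknown SPI 0x{spi:08x}"
        return f"accept via {sa}"


sad = InboundSad()
sad.install(0x1000A, "sa-old")
# Rekey: the receiver installs a new inbound SA and deletes the old one...
sad.install(0x2000B, "sa-new")
sad.delete(0x1000A)
# ...but a sender that missed the change keeps using the old SPI.
print(sad.receive(0x1000A))  # drop: ESP packet with unknown SPI 0x0001000a
print(sad.receive(0x2000B))  # accept via sa-new
```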

    Local VPN client diagram showing no traffic flow:

    [Screenshot]

    I have included an excerpt from the MikroTik log showing continuous loops of "received a valid R-U-THERE, ACK sent":

    echo: ipsec,debug,packet 84 bytes from x.x.7.183[500] to x.x.186.50[500]
    echo: ipsec,debug,packet sockname x.x.7.183[500]
    echo: ipsec,debug,packet send packet from x.x.7.183[500]
    echo: ipsec,debug,packet send packet to x.x.186.50[500]
    echo: ipsec,debug,packet src4 x.x.7.183[500]
    echo: ipsec,debug,packet dst4 x.x.186.50[500]
    echo: ipsec,debug,packet 1 times of 84 bytes message will be sent to x.x.186.50[500]
    echo: ipsec,debug,packet 62dcfc38 78ca950b 119e7a34 83711b25 08100501 bc29fe11 00000054 fa115faf
    echo: ipsec,debug,packet cd5023fe f8e261f5 ef8c0231 038144a1 b859c80b 456c8e1a 075f6be3 53ec3979
    echo: ipsec,debug,packet 6526e5a0 7bdb1c58 e5714988 471da760 2e644cf8
    echo: ipsec,debug,packet sendto Information notify.
    echo: ipsec,debug,packet received a valid R-U-THERE, ACK sent
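
    A DPD exchange like the one in this log does not by itself prove the tunnel works. In DPD (RFC 3706) the R-U-THERE and R-U-THERE-ACK notifications travel over the IKE (phase 1) SA and are answered by the IKE daemon, echoing the probe's sequence number, so a peer whose ESP (phase 2) SAs are out of sync still looks alive. A simplified Python sketch; the `Peer` class and the probe count are assumptions, and real DPD also uses retry timers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peer:
    ike_alive: bool = True    # IKE (phase 1) daemon is running and has a valid IKE SA
    esp_in_sync: bool = True  # ESP (phase 2) SAs agree on both sides

    def on_r_u_there(self, seq: int) -> Optional[int]:
        """Answer a DPD probe by echoing its sequence number (RFC 3706)."""
        # Only the IKE state matters here; the ESP SAs are never consulted.
        return seq if self.ike_alive else None


def dpd_declares_dead(peer: Peer, max_probes: int = 3, first_seq: int = 1) -> bool:
    """Send up to max_probes R-U-THERE messages; dead only if none are acknowledged."""
    for seq in range(first_seq, first_seq + max_probes):
        if peer.on_r_u_there(seq) == seq:
            return False  # matching R-U-THERE-ACK received: peer considered alive
    return True


broken_tunnel = Peer(ike_alive=True, esp_in_sync=False)
print(dpd_declares_dead(broken_tunnel))          # False: DPD keeps the stale SAs
print(dpd_declares_dead(Peer(ike_alive=False)))  # True
```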

    I've received various suggestions from IPsec experts and from MikroTik themselves implying that the problem is on the remote side. However, the situation is greatly complicated by the fact that 5 other sites are working and that the client's firewall is under change control. The setup also worked for many years, so they claim it cannot be a configuration error on their side. This suggestion seems plausible, but I cannot implement it due to change control. I may only change the client side:

    Make sure the IPSec responder has both passive=yes and send-initial-contact=no set.

    This did not work.
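
    For context on the "Send Initial Contact" setting: the INITIAL-CONTACT notification (RFC 2407, section 4.6.3.3) tells the receiver that this is the first SA the sender is establishing with it, and the receiver may then delete any older SAs it still holds for that peer, which is the kind of stale-SPI cleanup discussed here. Whether the Fortigate acts on it is not something this thread establishes. A minimal Python sketch of the receiving side, with a hypothetical function and SA table:

```python
from typing import Dict, Set


def on_new_ike_sa(sa_table: Dict[str, Set[int]], peer: str,
                  new_spis: Set[int], initial_contact: bool) -> Set[int]:
    """Update the SPIs held for `peer` after a new negotiation with it."""
    if initial_contact:
        # RFC 2407 4.6.3.3: the receiver may delete existing SAs for the sender,
        # assuming it rebooted and lost the old keying material.
        sa_table[peer] = set()
    sa_table.setdefault(peer, set()).update(new_spis)
    return sa_table[peer]


stale = {"mikrotik": {0x1000A}}
print([hex(s) for s in sorted(on_new_ike_sa(stale, "mikrotik", {0x2000B}, False))])
# ['0x1000a', '0x2000b']: the old SPI lingers
fresh = {"mikrotik": {0x1000A}}
print([hex(s) for s in sorted(on_new_ike_sa(fresh, "mikrotik", {0x2000B}, True))])
# ['0x2000b']
```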

    EDIT 9 Dec 2013

    I am pasting additional screenshots with the Fortigate configuration and what we believe are the Quick Mode selectors on the Mikrotik side.

    Phase 1 Fortigate screenshot

    Phase 2 Fortigate screenshot

    Quick Mode Selectors?

    Let me reiterate that I don't think it's a configuration problem. I speculate it's a timing problem, whereby side A or side B sends information too aggressively, so that the negotiated values (e.g. the SPI) get out of sync.

    EDIT 11 Dec 2013

    Sadly I have to give up on this issue. Happily everything is working. Why it's working is still a mystery, but to further illustrate what we did I post another image inline.

    We fixed it by:

    1. Turning off PPPoE at the client.
    2. Installing a completely new router (Router B) and testing it at the border. It worked at the border.
    3. Switching off the new Router B at the border. AND THEN, WITHOUT MAKING A SINGLE CHANGE, the client's endpoint Router A started working. So just adding a duplicate router at the border and taking it offline again made the original router work.

    So add this fix to the list of things we've done:

    1. Reboot. That worked once.
    2. Create a new tunnel with a new IP. That worked, but only once. After we changed the IP back, the client endpoint came live again.
    3. Change time servers.
    4. Fiddle with every possible setting.
    5. Wait. Once, after a day, it just came right. This time, even after days, nothing came right.

    So I postulate that there is an incompatibility on either the Fortigate or the MikroTik side which only shows up in very random situations. The only thing we haven't been able to try is upgrading the firmware on the Fortigate. Maybe there is a hidden corrupt configuration value, or a timing issue invisible to whoever configures it.

    I further speculate that timing issues cause an SPI mismatch, and my guess is that the Fortigate doesn't want to "forget" the old SPI, as if DPD is not working. It just happens randomly and, from what I can tell, only when endpoint A is a Fortigate and endpoint B is a MikroTik. The constant aggressive attempts to re-establish the connection "hold" on to old SPI values.

    I'll add to this post when it happens again.

    [Diagram]

    EDIT 12 Dec 2013

    As expected, it happened again. As you may recall, we have 6 MikroTik client IPsec endpoint routers, configured exactly the same, connecting to one Fortigate server. The latest incident was again on a random router, not the one I originally posted about. Considering the last fix, where we installed a duplicate router, I took this shortcut:

    1. Disable Router A, the router that does not want to receive packets from Fortigate any more.
    2. Copy Router A's IPsec configuration to a temporary router closer to the border of our network.
    3. Immediately disable the newly created configuration.
    4. Re-enable Router A.
    5. Automagically it just starts working.

    Looking at @mbrownnyc's comment, I believe we are having an issue with the Fortigate not forgetting stale SPIs even though DPD is on. I will find out our client's firmware version and post it.

    Here is a new diagram, much like the last, but just showing my "fix":

    [Diagram]

    • mbrownnyc, over 10 years
      Here's some quick advice, though it isn't an answer: make sure everything matches. Everything (DPD, PFS). Enable autokey keep alive. Also, set up a ping from the remote site to a host at the destination site. What about your quick mode selectors (and whatever MikroTik calls them)? What about your Fortigate debug logs (diag debug app ike -1)?
    • Eugene van der Merwe, over 10 years
      Thanks for the reply. As it turns out, we have no access to the Fortigate, and the client's argument is that it works at all 5 other sites. In addition, their Fortigate is under change control, so they don't want to change anything on their side. But they said they'll try to help us again on Monday. I will mention all these settings to them.
    • mbrownnyc, over 10 years
      I would make sure that everything matches. Since R-U-THERE is a function of DPD (which operates on phase 1), it seems like phase 1 is establishing (so aggressive versus main mode is okay), but phase 2 might be failing. I'd ask about PFS, but I already said to verify that each setting is exactly the same, particularly what Fortinet calls Quick Mode Selectors. It doesn't seem you have confirmed that you have verified every single setting. Can you post what they gave you (minus IPs, shared key, etc.), appended to your original post?
    • mbrownnyc, over 10 years
      the Fortigate doesn't want to "forget" about the old SPI, YES YES YES! I have had this happen to me. I'm on v4 MR3 patch 11.
    • Eugene van der Merwe, over 10 years
      My client is on 620B v4 MR3 Patch 8. I'm going to try Fortigate official channels next as I am so sure this is going to happen again.
    • mbrownnyc, over 10 years
      I was on my way out when I responded. The cause: adding an additional Firewall\Group to the Quick Mode Selectors that I had configured to allow a group name. I restarted the tunnel, and there were two tunnels up. There are bugs. Those guys should upgrade to at least the latest v4, if not v5. If I remember correctly, I had to restart the IPsec daemon with debug app test in global scope.
  • Eugene van der Merwe, almost 10 years
    Thanks for the answer. This problem has mostly died down, so we asked ourselves what changed. 1. We had a 24-hour RADIUS session-timeout. We removed it and got more stability. 2. The client had a primary and a backup firewall, and they switched to and from the primary a couple of times. They can't / won't give me the details, but I suspect (or at least hope) they also updated some firmware. 3. The last time it happened, we disabled and re-enabled everything IPsec-related. This solved it. I really like your answer and will remember it when this happens again. I do need an ARP debug tool, though.