Strongswan VPN tunnel between two AWS instances won't connect


Solution 1

In a VPC, an instance's public (Elastic) IP address is never bound to the instance's own network stack; AWS maps it to the private address outside the instance. You therefore have to configure both addresses: the private address as the local endpoint and the public address as the IKE identity. The "Invalid argument" error is presumably caused by charon trying to send traffic from the public IP address, which your instance does not know about.

left=10.10.10.10         # instance private IP of local system
leftsourceip=10.10.10.10 # instance private IP of local system
leftid=203.x.x.x         # elastic IP of local system
leftsubnet=10.x.x.x/xx

rightsubnet=10.x.x.x/xx
right=198.x.x.x          # elastic IP of remote system
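The rule in this answer (`left` is the private address, `leftid` is the local Elastic IP, `right` is the peer's Elastic IP) can be sketched as a small Python helper that renders a matching conn section for each side, with roles swapped. The `Endpoint` class, the `conn_stanza` function, the connection names and the 203.0.113.x / 198.51.100.x Elastic IPs are hypothetical; the private addresses and subnets come from the question. Treat the output as something to compare against your own ipsec.conf, not as a complete configuration.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    private_ip: str  # address actually bound to the instance's network interface
    elastic_ip: str  # public EIP; AWS maps it to private_ip, the instance never sees it
    subnet: str      # VPC CIDR reachable behind this endpoint


def conn_stanza(name: str, local: Endpoint, remote: Endpoint) -> str:
    """Render an ipsec.conf conn section as seen from `local` (hypothetical helper)."""
    lines = [
        f"conn {name}",
        f"    left={local.private_ip}",         # send IKE traffic from a local address
        f"    leftsourceip={local.private_ip}",
        f"    leftid={local.elastic_ip}",       # identity the peer expects (its right=)
        f"    leftsubnet={local.subnet}",
        f"    right={remote.elastic_ip}",       # the peer is reached via its public EIP
        f"    rightsubnet={remote.subnet}",
    ]
    return "\n".join(lines)


# Hypothetical EIPs from the documentation ranges; private IPs and subnets from the question.
virginia = Endpoint("10.198.0.164", "203.0.113.10", "10.198.0.0/16")
oregon = Endpoint("10.194.0.176", "198.51.100.20", "10.194.0.0/16")

# Each side uses the same helper with the roles swapped.
print(conn_stanza("nv-to-oregon", virginia, oregon))
print(conn_stanza("oregon-to-nv", oregon, virginia))
```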

Solution 2

Problem fixed.

1) I did not properly follow Michael's configuration directions. I had also configured rightsourceip and leftsourceip together, which caused both instances to behave as if they were the initiator. Once I made sure that one side was the initiator and the other the responder, the IKE problem was fixed.

2) I found that I also had to set the esp parameter explicitly. Even though it already has a default (aes128-sha1,3des-sha1), the esp parameter still had to be set for the instance to know to use ESP or AH (but not both). I ended up using aes128-sha1-modp2048.
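strongSwan's esp= value is a comma-separated list of proposals, each a dash-separated list of algorithm keywords; in aes128-sha1-modp2048 the keywords name the encryption algorithm, the integrity algorithm and a Diffie-Hellman group (used for PFS when the CHILD_SA is rekeyed or created). A rough Python sketch that splits such a string, assuming for simplicity that the first keyword is the cipher and that DH groups are the keywords starting with `modp`, `ecp` or `curve`. Real strongSwan parsing recognises many more keywords (AEAD ciphers, `esn`, and so on), and `parse_esp` is a hypothetical helper, so this is illustrative only.

```python
def parse_esp(value: str) -> list[dict]:
    """Split an esp= string into proposals (simplified, hypothetical helper)."""
    dh_prefixes = ("modp", "ecp", "curve")
    proposals = []
    for proposal in value.split(","):
        tokens = [t for t in proposal.strip().split("-") if t]
        if not tokens:
            continue
        proposals.append({
            "encryption": tokens[0],  # assumption: the cipher is always listed first
            "integrity": [t for t in tokens[1:] if not t.startswith(dh_prefixes)],
            "dh_groups": [t for t in tokens[1:] if t.startswith(dh_prefixes)],
        })
    return proposals


print(parse_esp("aes128-sha1-modp2048"))
# [{'encryption': 'aes128', 'integrity': ['sha1'], 'dh_groups': ['modp2048']}]
print(parse_esp("aes128-sha1,3des-sha1"))  # the default quoted above: two proposals, no DH group
```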


Author: lobi

Updated on September 18, 2022

Comments

  • lobi

    I am trying to set up a VPN tunnel using strongSwan 5.1.2 between two Amazon AWS EC2 instances running Ubuntu 14.04.2 LTS. Before strongSwan, I used Openswan (Libreswan) on an Amazon Red Hat AMI, which worked fine. For some reason I can't even get IKE to come up with strongSwan. I triple-checked my AWS configuration and it all looks good, so it must be a problem with the strongSwan configuration.

    As you will see below, the error I am getting is "Error writing to socket: Invalid argument". I have searched online but can't find a solution. I am convinced my strongSwan ipsec.conf is misconfigured.

    Here is what I am working with:

    Instance #1: N.Virginia - 10.198.0.164 with public EIP 54.X.X.X
    Instance #2: Oregon - 10.194.0.176 with public EIP 52.Y.Y.Y
    

    The (simple) topology is as follows:

    [ Instance #1 within N.Virginia VPC <-> Public internet <-> Instance #2 within Oregon VPC ]
    

    I verified that the following AWS configs are correct:

    Security groups permit all traffic
    IP information is correct
    Source/destination check is disabled on both instances
    Network ACLs permit all traffic
    Routes are present and correct (the route to the remote 10.x network points to the local VPN instance, so that traffic is sent into the VPN tunnel)
    

    Below is /etc/ipsec.conf (this is from Oregon; the N. Virginia instance is identical except that the left and right values are swapped):

    config setup
            charondebug="dmn 2, mgr 2, ike 2, chd 2, job 2, cfg 2, knl 2, net 2, enc 2, lib 2"
    conn aws1oexternal-aws1nvexternal
            left=52.Y.Y.Y        # EIP
            leftsubnet=10.194.0.0/16
            right=54.X.X.X       # EIP
            rightsubnet=10.198.0.0/16
            auto=start
            authby=secret
            type=tunnel
            mobike=no
            dpdaction=restart
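The "Invalid argument" error fits a configuration whose left= names an address that is not assigned to any local interface, which is exactly the case for an EIP on an EC2 instance in a VPC. A Python sketch of that sanity check, assuming a deliberately simplified key=value parser (not strongSwan's real one) and a hand-supplied set of local addresses; in practice you would take those from `ip -4 addr show`. The `parse_conn` and `check_left` names and the 203.0.113.x / 198.51.100.x addresses are hypothetical.

```python
import ipaddress


def parse_conn(text: str) -> dict[str, str]:
    """Very simplified ipsec.conf parser: key=value lines, '#' starts a comment."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def check_left(settings: dict[str, str], local_addrs: set[str]) -> list[str]:
    """Report a left= value that is not one of this host's own addresses."""
    left = settings.get("left")
    if left is None:
        return ["left= is not set"]
    try:
        addr = ipaddress.ip_address(left)
    except ValueError:
        return []  # hostnames, %defaultroute, %any ... are not checked by this sketch
    local = {ipaddress.ip_address(a) for a in local_addrs}
    if addr not in local:
        return [f"left={left} is not assigned to this host; sending IKE packets from it is expected to fail"]
    return []


conn = parse_conn("""
conn aws1oexternal-aws1nvexternal
        left=198.51.100.20        # EIP, as in the failing config
        leftsubnet=10.194.0.0/16
        right=203.0.113.10        # peer EIP
""")
print(check_left(conn, {"10.194.0.176", "127.0.0.1"}))
```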
    

    Below is /etc/ipsec.secrets (with the two addresses swapped on the other instance):

    54.X.X.X 52.Y.Y.Y : PSK "Key_inserted_here"
    

    Below is the /etc/strongswan.conf:

    charon {
            load_modular = yes
            plugins {
                    include strongswan.d/charon/*.conf
            }
    }
    

    Below is the /etc/sysctl.conf:

    net.ipv4.ip_forward=1
    net.ipv4.conf.all.accept_redirects = 0
    net.ipv4.conf.all.send_redirects = 0
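For a gateway that forwards traffic between its VPC and the tunnel, net.ipv4.ip_forward has to be 1; the two redirect settings are the values used in the question. A small Python sketch that parses sysctl.conf-style text and reports keys that are missing or differ from those values. The `EXPECTED` table and `check_sysctl` function are hypothetical, and the file only says what will be applied at boot; `sysctl -n <key>` shows the value actually in effect.

```python
from typing import Optional

# Values taken from the question's /etc/sysctl.conf.
EXPECTED = {
    "net.ipv4.ip_forward": "1",
    "net.ipv4.conf.all.accept_redirects": "0",
    "net.ipv4.conf.all.send_redirects": "0",
}


def check_sysctl(text: str, expected: dict[str, str] = EXPECTED) -> dict[str, Optional[str]]:
    """Return {key: actual value or None} for every expected key that does not match."""
    found = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):  # sysctl.conf comment lines
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            found[key.strip()] = value.strip()
    return {key: found.get(key) for key, want in expected.items() if found.get(key) != want}


sample = """net.ipv4.ip_forward=1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
"""
print(check_sysctl(sample))  # {}
```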
    

    Here is the debug output from /var/log/syslog. The problem seems to be "error writing to socket: Invalid argument"; whatever I try, I keep getting this same error:

    Jun 17 17:34:48 ip-10-198-0-164 charon: 13[IKE] retransmit 5 of request with message ID 0
    Jun 17 17:34:48 ip-10-198-0-164 charon: 13[NET] sending packet: from 54.X.X.X[500] to 52.Y.Y.Y[500] (1212 bytes)
    Jun 17 17:34:48 ip-10-198-0-164 charon: 03[JOB] next event in 75s 581ms, waiting]
    Jun 17 17:34:48 ip-10-198-0-164 charon: 16[NET] sending packet: from 54.X.X.X[500] to 52.Y.Y.Y[500]
    Jun 17 17:34:48 ip-10-198-0-164 charon: 13[MGR] checkin IKE_SA aws1vexternal-aws1oexternal[1]
    Jun 17 17:34:48 ip-10-198-0-164 charon: 13[MGR] check-in of IKE_SA successful.
    Jun 17 17:34:48 ip-10-198-0-164 charon: 16[NET] error writing to socket: Invalid argument
    Jun 17 17:36:04 ip-10-198-0-164 charon: 03[JOB] got event, queuing job for execution
    Jun 17 17:36:04 ip-10-198-0-164 charon: 03[JOB] no events, waiting
    Jun 17 17:36:04 ip-10-198-0-164 charon: 08[MGR] checkout IKE_SA
    Jun 17 17:36:04 ip-10-198-0-164 charon: 08[MGR] IKE_SA aws1vexternal-aws1oexternal[1] successfully checked out
    Jun 17 17:36:04 ip-10-198-0-164 charon: 08[IKE] giving up after 5 retransmits
    Jun 17 17:36:04 ip-10-198-0-164 charon: 08[IKE] establishing IKE_SA failed, peer not responding
    Jun 17 17:36:04 ip-10-198-0-164 charon: 08[MGR] checkin and destroy IKE_SA aws1vexternal-aws1oexternal[1]
    Jun 17 17:36:04 ip-10-198-0-164 charon: 08[IKE] IKE_SA aws1vexternal-aws1oexternal[1] state change: CONNECTING => DESTROYING
    Jun 17 17:36:04 ip-10-198-0-164 charon: 08[MGR] check-in and destroy of IKE_SA successful
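Reading the log by subsystem makes the sequence clearer: the NET thread reports "sending packet" and then "error writing to socket", and the IKE thread later gives up with "peer not responding", which suggests the IKE_SA_INIT request never left the instance rather than being ignored by the peer. A Python sketch that groups charon syslog lines by their bracketed subsystem tag and collects the failure messages; `summarize` is a hypothetical helper, and the regex only assumes the `charon: <thread>[<TAG>] <message>` layout visible above.

```python
import re
from collections import Counter

# charon syslog lines look like: "... charon: 13[IKE] retransmit 5 of request ..."
CHARON = re.compile(r"charon: (?P<thread>\d+)\[(?P<subsys>[A-Z]+)\] (?P<msg>.*)$")


def summarize(lines):
    """Count messages per subsystem and collect error/failure messages (sketch)."""
    per_subsys = Counter()
    problems = []
    for line in lines:
        m = CHARON.search(line)
        if not m:
            continue
        per_subsys[m["subsys"]] += 1
        msg = m["msg"]
        if "error" in msg or "failed" in msg or "giving up" in msg:
            problems.append(f"{m['subsys']}: {msg}")
    return per_subsys, problems


log = [
    "Jun 17 17:34:48 ip-10-198-0-164 charon: 13[NET] sending packet: from 54.X.X.X[500] to 52.Y.Y.Y[500] (1212 bytes)",
    "Jun 17 17:34:48 ip-10-198-0-164 charon: 16[NET] error writing to socket: Invalid argument",
    "Jun 17 17:36:04 ip-10-198-0-164 charon: 08[IKE] giving up after 5 retransmits",
    "Jun 17 17:36:04 ip-10-198-0-164 charon: 08[IKE] establishing IKE_SA failed, peer not responding",
]
counts, problems = summarize(log)
print(counts)
for p in problems:
    print(p)
```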
    

    Below is what I have tried so far:

    1) Verified layer 3

    2) rebooted machines

    3) Tried adding leftid=

    4) Tried running ipsec update and then ipsec restart

    5) Tried adding nat_traversal=yes under config setup (this shouldn't matter, since ipsec statusall confirmed that IKEv2 is in use, which according to the documentation handles NAT traversal automatically)

    6) Tried omitting virtual_private (the AWS Openswan documentation uses it, so I had included it in the strongSwan configuration)

    7) Tried removing the net.ipv4.conf.all.send_redirects = 0 and net.ipv4.conf.all.accept_redirects = 0 settings from /etc/sysctl.conf

    8) Tried using the private IPs instead of the EIPs. I no longer get the socket error, but of course the two instances can't reach each other's private IPs to peer...

    9) Tried adding this to strongswan.conf: load = aes des sha1 sha2 md5 gmp random nonce hmac stroke kernel-netlink socket-default updown

    10) Tried using leftfirewall=yes; it didn't work.

    Please help! Thanks!

    EDIT #1:

    Michael's answer cleared the original problem; however, I now have a new problem related to routing. The two VPN instances are unable to ping each other. Furthermore, when I ping from a random instance in either subnet to another random instance or to the far-end VPN instance, I get the following response:

    root@ip-10-194-0-80:~# ping 10.198.0.164
    PING 10.198.0.164 (10.198.0.164) 56(84) bytes of data.
    From 10.194.0.176: icmp_seq=1 Redirect Host(New nexthop: 10.194.0.176)
    From 10.194.0.176: icmp_seq=2 Redirect Host(New nexthop: 10.194.0.176)
    From 10.194.0.176: icmp_seq=3 Redirect Host(New nexthop: 10.194.0.176)
    From 10.194.0.176: icmp_seq=4 Redirect Host(New nexthop: 10.194.0.176)
    

    This must be a routing issue between the two VPN instances (most likely caused by the strongSwan configuration or the instance routing table), since the 10.194.0.80 host in the Oregon subnet does get a response from the Oregon VPN instance. Here are the routing table and a traceroute from that host:

    root@ip-10-194-0-80:~# netstat -rn
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
    0.0.0.0         10.194.0.1      0.0.0.0         UG        0 0          0 eth0
    10.194.0.0      0.0.0.0         255.255.255.0   U         0 0          0 eth0
    
    root@ip-10-194-0-80:~# traceroute 10.198.0.164
    traceroute to 10.198.0.164 (10.198.0.164), 30 hops max, 60 byte packets
     1  10.194.0.176 (10.194.0.176)  0.441 ms  0.425 ms  0.409 ms^C
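Nothing in this host's table mentions 10.198.0.0/16, so the kernel's longest-prefix match picks the default route and hands the packet to the VPC router at 10.194.0.1; the VPC route table (which, per the question, points the remote 10.x range at the VPN instance) then delivers it to 10.194.0.176, consistent with that being the first traceroute hop. In other words, the host's own table behaves as expected, and whether the packet enters the tunnel is decided on the VPN instance. A Python sketch of the lookup using the standard `ipaddress` module, with routes copied from the netstat output above; `next_hop` is a hypothetical helper, not a kernel API.

```python
import ipaddress

# Routing table of host 10.194.0.80, copied from the netstat -rn output above.
ROUTES = [
    ("0.0.0.0/0", "10.194.0.1"),  # default route via the VPC router
    ("10.194.0.0/24", None),      # directly connected (gateway 0.0.0.0)
]


def next_hop(dest: str, routes=ROUTES):
    """Longest-prefix match: return (matched network, gateway or 'direct')."""
    addr = ipaddress.ip_address(dest)
    best = None
    for cidr, gateway in routes:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway)
    if best is None:
        raise LookupError(f"no route to {dest}")
    net, gateway = best
    return str(net), gateway or "direct"


print(next_hop("10.198.0.164"))  # ('0.0.0.0/0', '10.194.0.1')
print(next_hop("10.194.0.176"))  # ('10.194.0.0/24', 'direct')
```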
    

    When I was using openswan, it did not require me to make any manual modifications to each instance's routing table.

    Here is the Oregon VPN instance's routing table:

    root@ip-10-194-0-176:~# netstat -rn
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
    0.0.0.0         10.194.0.1      0.0.0.0         UG        0 0          0 eth0
    10.194.0.0      0.0.0.0         255.255.255.0   U         0 0          0 eth0
    

    I'm a bit stumped.

    EDIT #2:

    It looks like routing between the VPN instances might not be the problem after all: /var/log/syslog shows packets from one VPN instance's public IP arriving at the other VPN instance:

    Jun 23 19:57:49 ip-10-194-0-176 charon: 10[NET] received packet: from 54.X.X.X[4500] to 10.194.0.176[4500] (76 bytes)
    

    It looks like the issue is related to the child security association (CHILD_SA):

    aws1oexternal-aws1nvexternal:   child:  10.194.0.0/16 === 10.198.0.0/16 TUNNEL, dpdaction=restart
    Security Associations (1 up, 0 connecting):
    

    /var/log/syslog:

    Jun 23 19:52:19 ip-10-194-0-176 charon: 02[IKE] failed to establish CHILD_SA, keeping IKE_SA
    Jun 23 19:52:48 ip-10-194-0-176 charon: 11[IKE] queueing CHILD_CREATE task
    Jun 23 19:52:48 ip-10-194-0-176 charon: 11[IKE]   activating CHILD_CREATE task
    Jun 23 19:52:48 ip-10-194-0-176 charon: 06[IKE] establishing CHILD_SA aws1oexternal-aws1nvexternal
    Jun 23 19:52:48 ip-10-194-0-176 charon: 10[IKE] received FAILED_CP_REQUIRED notify, no CHILD_SA built
    Jun 23 19:52:48 ip-10-194-0-176 charon: 10[IKE] failed to establish CHILD_SA, keeping IKE_SA
    Jun 23 19:52:49 ip-10-194-0-176 charon: 14[CFG] looking for a child config for 10.194.0.0/16 === 10.198.0.0/16 
    Jun 23 19:52:49 ip-10-194-0-176 charon: 14[CFG] found matching child config "aws1oexternal-aws1nvexternal" with prio 10
    Jun 23 19:52:49 ip-10-194-0-176 charon: 14[IKE] configuration payload negotiation failed, no CHILD_SA built
    Jun 23 19:52:49 ip-10-194-0-176 charon: 14[IKE] failed to establish CHILD_SA, keeping IKE_SA
    

    EDIT #3: Problem solved (uhh, actually, see EDIT #4 below...)

    Problem fixed.

    1) I did not properly follow Michael's configuration directions. I had also configured rightsourceip and leftsourceip together, which caused both instances to behave as if they were the initiator. Once I made sure that one side was the initiator and the other the responder, the IKE problem was fixed.

    2) I found that I also had to set the esp parameter explicitly. Even though it already has a default (aes128-sha1,3des-sha1), the esp parameter still had to be set for the instance to know to use ESP or AH (but not both). I ended up using aes128-sha1-modp2048.

    Hope this post helps the next Linux newbie set this up!!

    Cheers!

    EDIT #4: Problem (not really) solved

    While troubleshooting a separate strongSwan issue, I changed the leftfirewall parameter and tested (it didn't fix that issue), then reverted to the original configuration (with leftfirewall commented out). I then noticed that I could no longer ping across the tunnel. After going crazy for hours trying to figure out what had happened, I commented out the esp parameter to see what would happen: I CAN NOW PING ACROSS THE TUNNEL AGAIN! So there may be some IPsec ghosts running around playing tricks on me, and the esp parameter may not really be the fix for the TS_UNACCEPTABLE errors (although other resources online say that it is...).
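TS_UNACCEPTABLE is the IKEv2 notify a responder returns when none of the traffic selectors proposed for a CHILD_SA are acceptable under its policy, so with explicit subnets the first thing worth confirming is that each side's leftsubnet is exactly the other side's rightsubnet. A Python sketch of that comparison using the standard `ipaddress` module; `selectors_mirror` is a hypothetical helper, it handles a single subnet per side only (ipsec.conf also accepts comma-separated lists), and it ignores the selector narrowing strongSwan can perform.

```python
import ipaddress


def selectors_mirror(side_a: dict[str, str], side_b: dict[str, str]) -> list[str]:
    """Check that A's leftsubnet equals B's rightsubnet and vice versa."""
    problems = []
    for a_key, b_key in (("leftsubnet", "rightsubnet"), ("rightsubnet", "leftsubnet")):
        a_net = ipaddress.ip_network(side_a[a_key])
        b_net = ipaddress.ip_network(side_b[b_key])
        if a_net != b_net:
            problems.append(f"A {a_key}={a_net} but B {b_key}={b_net}")
    return problems


# Subnets from the question's two VPCs.
oregon = {"leftsubnet": "10.194.0.0/16", "rightsubnet": "10.198.0.0/16"}
virginia = {"leftsubnet": "10.198.0.0/16", "rightsubnet": "10.194.0.0/16"}
print(selectors_mirror(oregon, virginia))  # []
```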

    EDIT #5: Problem fully solved

    I ended up moving everything into a test environment and starting from scratch. I installed the latest version (5.3.2) from source rather than the older version in the Ubuntu repository (5.1.2). This cleared the problem described above, and I verified layer 7 connectivity between multiple subnets over the VPN tunnel using netcat (great tool!!).
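The netcat check amounts to opening a TCP connection from a host in one subnet to a listener in the other and confirming that data comes back. The same idea sketched with Python's standard `socket` module, for hosts where netcat is not installed; `probe` and `serve_once` are hypothetical helpers and the port is arbitrary. The demo below runs both ends on localhost only to show the mechanics; across the tunnel you would run `serve_once` on a host in one VPC and `probe` from a host in the other (and open the port in the security groups).

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept a single connection and echo back what it sends (one recv)."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(conn.recv(1024))


def probe(host: str, port: int, payload: bytes = b"ping\n", timeout: float = 3.0) -> bool:
    """Return True if a TCP connection succeeds and the payload is echoed back."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(payload)
            received = b""
            while len(received) < len(payload):
                chunk = sock.recv(len(payload) - len(received))
                if not chunk:  # peer closed the connection early
                    break
                received += chunk
            return received == payload
    except OSError:  # refused, unreachable, timed out, ...
        return False


if __name__ == "__main__":
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick
    port = listener.getsockname()[1]
    threading.Thread(target=serve_once, args=(listener,), daemon=True).start()
    print(probe("127.0.0.1", port))  # True
    listener.close()
```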

    Also: it is NOT necessary to enable DNS hostnames for the VPC (as I was incorrectly led to believe by Amazon), FYI.

    Hope this all helps!!!!!!

    Additional edit 2/11/2017:

    As per JustEngland's request, here is the working configuration (with certain details left out to prevent identification):

    Side A:

    # ipsec.conf - strongSwan IPsec configuration file
    
    # basic configuration
    config setup
    # Add connections here.
    conn %default
     ikelifetime= You choose; must match other side
     keylife= You choose; must match other side
     rekeymargin= You choose; must match other side
     keyingtries=1
     keyexchange= You choose; must match other side
     authby=secret
     mobike=no
    
    conn side-a
     left=10.198.0.124
     leftsubnet=10.198.0.0/16
     leftid=54.y.y.y
     leftsourceip=10.198.0.124
     right=52.x.x.x
     rightsubnet=10.194.0.0/16
     auto=start
     type=tunnel
    # Add connections here.
    
    
    root@x:~# cat /etc/ipsec.secrets 
    A.A.A.A B.B.B.B : PSK "Your Password"
    

    Side B:

    # ipsec.conf - strongSwan IPsec configuration file
    
    # basic configuration
    config setup
    
    conn %default
     ikelifetime= You choose; must match other side
     keylife= You choose; must match other side
     rekeymargin= You choose; must match other side
     keyingtries=1
     keyexchange= You choose; must match other side
     authby=secret
     mobike=no
    
    conn side-b
     left=10.194.0.129
     leftsubnet=10.194.0.0/16
     leftid=52.x.x.x
     right=54.y.y.y
     rightsubnet=10.198.0.0/16
     rightsourceip=10.198.0.124
     auto=start
     type=tunnel
    
    root@x:~# cat /etc/ipsec.secrets 
    B.B.B.B A.A.A.A : PSK "Your Password"
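In the working pair, each side's leftid is the address the other side lists as right= (which ipsec.conf also uses as the expected rightid when none is given), and each ipsec.secrets line names both identities. A Python sketch that checks those two properties for a pair of configurations. The 203.0.113.10 / 198.51.100.20 documentation addresses stand in for the anonymised EIPs, and `ids`, `psk_selectors` and `check_pair` are hypothetical helpers with deliberately simplified parsing (IPv4 only, one PSK line per side).

```python
def ids(conn: dict[str, str]) -> tuple[str, str]:
    """(local identity, expected remote identity), applying the ipsec.conf defaults."""
    local_id = conn.get("leftid", conn["left"])
    remote_id = conn.get("rightid", conn["right"])
    return local_id, remote_id


def psk_selectors(secrets_line: str) -> set[str]:
    """IDs listed before ': PSK' on one ipsec.secrets line (simplified parsing)."""
    selectors, _, rest = secrets_line.partition(":")
    if not rest.strip().startswith("PSK"):
        raise ValueError("not a PSK line")
    return set(selectors.split())


def check_pair(a: dict[str, str], a_secret: str, b: dict[str, str], b_secret: str) -> list[str]:
    problems = []
    a_local, a_remote = ids(a)
    b_local, b_remote = ids(b)
    if a_local != b_remote:
        problems.append(f"A identifies as {a_local}, but B expects {b_remote}")
    if b_local != a_remote:
        problems.append(f"B identifies as {b_local}, but A expects {a_remote}")
    for side, secret, needed in (("A", a_secret, {a_local, a_remote}),
                                 ("B", b_secret, {b_local, b_remote})):
        missing = needed - psk_selectors(secret)
        if missing:
            problems.append(f"{side} ipsec.secrets line lacks {sorted(missing)}")
    return problems


side_a = {"left": "10.198.0.124", "leftid": "203.0.113.10", "right": "198.51.100.20"}
side_b = {"left": "10.194.0.129", "leftid": "198.51.100.20", "right": "203.0.113.10"}
secret_a = '203.0.113.10 198.51.100.20 : PSK "Your Password"'
secret_b = '198.51.100.20 203.0.113.10 : PSK "Your Password"'
print(check_pair(side_a, secret_a, side_b, secret_b))  # []
```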
    
    • JustEngland
      Could you post the working configuration?
    • lobi
      Sure, I will add the configuration as an edit to my original question. Please note that I no longer have access to the setup, so I can't verify 100% that the configurations are correct; however, they should be :)
  • lobi
    Hi Michael, this fixed the original problem; however, there now seems to be a routing problem caused by the strongSwan configuration. I am unable to ping from one VPN instance to the other (timeouts), and if I ping from a different instance within the subnet, I get the following: From 10.194.0.176: icmp_seq=4 Redirect Host(New nexthop: 10.194.0.176)
  • lobi
    I edited my original post.
  • lobi
    Figured it out. I didn't implement Michael's config correctly (I also included rightsourceip, which confused which side was the initiator and which was the responder). I ALSO needed to set the esp parameter explicitly.
  • lobi
    Not sure if this is 100% fixed. See EDIT #4 in the original post.