GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) while connecting PolyBase with Kerberos


The problem took care of itself after we restarted the cluster. I think the cause was that the krb5.conf file could not be distributed to all nodes of our Hadoop cluster because some services were still running. There was also a warning in Cloudera Manager about a stale configuration regarding Kerberos. Many thanks to everyone!
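
For anyone who runs into the same symptom: a quick way to spot a stale copy on a single node is to print the realm and enctype lines from the krb5.conf that the JVM on that node actually reads. This is only a rough sketch and not part of the fix itself; it honours -Djava.security.krb5.conf when set and otherwise assumes the usual Linux default path, and the class name is invented for the example.

    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class Krb5ConfCheck {
        public static void main(String[] args) throws Exception {
            // The JVM prefers -Djava.security.krb5.conf when it is set; the
            // fallback to /etc/krb5.conf is a simplification for this sketch.
            String confPath = System.getProperty("java.security.krb5.conf", "/etc/krb5.conf");
            System.out.println("Checking " + confPath);

            // Print the realm and enctype lines so the copies on different nodes
            // can be compared quickly; a node with a stale file stands out.
            for (String line : Files.readAllLines(Paths.get(confPath))) {
                String trimmed = line.trim();
                if (trimmed.startsWith("default_realm") || trimmed.contains("enctypes")) {
                    System.out.println(trimmed);
                }
            }
        }
    }

Comparing that output across the nodes (or simply diffing /etc/krb5.conf) shows right away whether the Kerberos client configuration has been deployed everywhere.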


Comments

  • Gigi about 2 years

    We want to connect our SQL Server 2016 Enterprise via PolyBase to our Kerberized on-premises Hadoop cluster running Cloudera 5.14.

    I followed the Microsoft PolyBase Guide to configure PolyBase. After working on this topic for a few days I'm unable to continue because of an exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

    Microsoft ships a built-in diagnostic tool for troubleshooting PolyBase connectivity with Kerberos. The Microsoft troubleshooting guide describes 4 checkpoints, and I'm stuck on checkpoint 4. A short summary of the checkpoints (and where I'm successful):

    • Checkpoint 1: Successful! Authenticated against the KDC and received a TGT.
    • Checkpoint 2: Successful! According to the troubleshooting guide, PolyBase makes an attempt to access HDFS and fails because the request does not yet contain the necessary service ticket.
    • Checkpoint 3: Successful! A second hex dump indicates that SQL Server successfully used the TGT and acquired the applicable service ticket for the name node's SPN from the KDC.
    • Checkpoint 4: Not successful. At this point SQL Server should be authenticated by Hadoop using the ST (service ticket) and a session should be granted to access the secured resource (see the standalone test sketched after this list).
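
    To narrow checkpoint 4 down, it can help to take PolyBase out of the picture and attempt the same Kerberized HDFS access from a standalone Hadoop client. The following is only a rough sketch, not something from the Microsoft guide: the keytab path, the user principal, and the class name are placeholders, the realm and name node port come from the snippets below, and the name node principal assumes the standard hdfs/_HOST pattern used by Cloudera.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class HdfsKerberosSmokeTest {
        public static void main(String[] args) throws Exception {
            // Mirror the Kerberos-related settings from the PolyBase
            // core-site.xml and hdfs-site.xml shown below.
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            // Assumed hdfs/_HOST pattern; must match the cluster's actual principal.
            conf.set("dfs.namenode.kerberos.principal", "hdfs/_HOST@COMPANY.REALM.COM");

            UserGroupInformation.setConfiguration(conf);
            // Placeholder principal and keytab; replace with the real values.
            UserGroupInformation.loginUserFromKeytab(
                    "polybase_user@COMPANY.REALM.COM", "/path/to/polybase_user.keytab");

            // If the TGT and the service ticket for the name node are usable, this
            // listing succeeds; otherwise it fails with the same "GSS initiate
            // failed" SaslException seen at checkpoint 4.
            try (FileSystem fs = FileSystem.get(
                    new URI("hdfs://IPADRESS_OF_NAMENODE:8020"), conf)) {
                for (FileStatus status : fs.listStatus(new Path("/"))) {
                    System.out.println(status.getPath());
                }
            }
        }
    }

    If this standalone listing fails in the same way, the problem lies between the client JVM's Kerberos setup and the name node rather than in PolyBase itself.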

    krb5.conf file

    [libdefaults]
    default_realm = COMPANY.REALM.COM
    dns_lookup_kdc = false
    dns_lookup_realm = false
    ticket_lifetime = 86400
    renew_lifetime = 604800
    forwardable = true
    default_tgs_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
    default_tkt_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
    permitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
    udp_preference_limit = 1
    kdc_timeout = 3000
    [realms]
    COMPANY.REALM.COM = {
    kdc = ipadress.kdc.host
    admin_server = ipadress.kdc.host
    }
    [logging]
    default = FILE:/var/log/krb5/kdc.log
    kdc = FILE:/var/log/krb5/kdc.log
    admin_server = FILE:/var/log/krb5/kadmind.log
    

    core-site.xml for PolyBase on SQL Server

    <?xml version="1.0" encoding="utf-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
      <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
      </property>
      <property>
        <name>ipc.client.connect.max.retries</name>
        <value>2</value>
      </property>
      <property>
        <name>ipc.client.connect.max.retries.on.timeouts</name>
        <value>2</value>
      </property>
    
    <!-- kerberos security information, PLEASE FILL THESE IN ACCORDING TO HADOOP CLUSTER CONFIG -->
    <property>
        <name>polybase.kerberos.realm</name>
        <value>COMPANY.REALM.COM</value>
      </property>
      <property>
        <name>polybase.kerberos.kdchost</name>
        <value>ipadress.kdc.host</value>
      </property>
      <property>
        <name>hadoop.security.authentication</name>
        <value>KERBEROS</value>
      </property>
    </configuration>
    

    hdfs-site.xml for PolyBase on SQL Server

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
      <property>
        <name>dfs.block.size</name>
        <value>268435456</value> 
      </property>
      <!-- Client-side file system caching is disabled below to allow credential
           refresh; leaving the cache-disable options below at false might result in
           stale credentials when an alter credential or alter datasource is performed
      -->
      <property>
        <name>fs.wasb.impl.disable.cache</name>
        <value>true</value>
      </property>
      <property>
        <name>fs.wasbs.impl.disable.cache</name>
        <value>true</value>
      </property>
      <property>
        <name>fs.asv.impl.disable.cache</name>
        <value>true</value>
      </property>
      <property>
        <name>fs.asvs.impl.disable.cache</name>
        <value>true</value>
      </property>
      <property>
        <name>fs.hdfs.impl.disable.cache</name>
        <value>true</value>
      </property>
    <!-- kerberos security information, PLEASE FILL THESE IN ACCORDING TO HADOOP CLUSTER CONFIG -->
      <property>
        <name>dfs.namenode.kerberos.principal</name>
        <value>hdfs/[email protected]</value> 
      </property>
    </configuration>
    

    PolyBase Exception

    [2018-06-22 12:51:50,349] WARN  2872[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
    [2018-06-22 12:51:53,568] WARN  6091[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
    [2018-06-22 12:51:56,127] WARN  8650[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
    [2018-06-22 12:51:58,998] WARN 11521[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
    [2018-06-22 12:51:59,139] WARN 11662[main] - org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:676) - Couldn't setup connection for [email protected] to IPADRESS_OF_NAMENODE:8020
    javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    

    Log Entry on NameNode

    Socket Reader #1 for port 8020: readAndProcess from client IP-ADRESS_SQL-SERVER threw exception [javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: AES128 CTS mode with HMAC SHA1-96 encryption type not in permitted_enctypes list)]]
    
    Auth failed for IP-ADRESS_SQL-SERVER:60484:null (GSS initiate failed) with true cause: (GSS initiate failed)
    

    The confusing part for me is the log entry from our NameNode, because AES128 CTS mode with HMAC SHA1-96 is already in the list of permitted enctypes, as shown in krb5.conf and in the Cloudera Manager UI.

    (Screenshot: Cloudera Manager UI showing krb_enc_types)
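
    Since the name node complains specifically about the enctype, one thing that seems worth ruling out on the SQL Server side is the JRE's crypto policy: a JRE without the unlimited-strength JCE policy caps AES at 128 bits, so aes256 tickets requested in krb5.conf cannot be decrypted. This is only a guess, not something confirmed for this setup; the class name below is invented, and the check has to be run with the same JRE that PolyBase uses.

    import javax.crypto.Cipher;

    public class AesKeyLengthCheck {
        public static void main(String[] args) throws Exception {
            // Typically prints 128 under the restricted JCE policy and
            // 2147483647 (unlimited) when strong crypto is enabled.
            System.out.println("Max AES key length: "
                    + Cipher.getMaxAllowedKeyLength("AES"));
        }
    }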

    We appreciate your help!

    • wBob about 6 years
      You might be best off opening a support call directly with Microsoft.