AWS Athena Import CSV file

11,881

To reproduce your situation, I did the following:

  • Created a text file using your sample data (gps.txt)
  • Uploaded it to an Amazon S3 bucket in its own folder (with no other files in that folder)
  • Created a table in Amazon Athena
    • Specified the location as the folder name (s3://my-bucket/gps/)
    • Specified 7 columns (since there are 7 string values in your sample file)

However, since the data has commas within each pair of numbers, I changed the SerDe to OpenCSVSerDe for Processing CSV - Amazon Athena:

CREATE EXTERNAL TABLE IF NOT EXISTS default.gps (
  `c1` string,
  `c2` string,
  `c3` string,
  `c4` string,
  `c5` string,
  `c6` string,
  `c7` string 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ("separatorChar" = ",", "escapeChar" = "\\") 

LOCATION 's3://my-bucket/gps/'
TBLPROPERTIES ('has_encrypted_data'='false');

I was then able to successfully query the table. A sample column value is: 41.9100687,-87.8805614

Share:
11,881
harunB10
Author by

harunB10

Updated on June 20, 2022

Comments

  • harunB10
    harunB10 almost 2 years

    I have this data stored in S3 as .csv (but it can be any other file format which is the best suitable for my requirement):

    "41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
    "41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
    "41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
    "41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
    

    and I would like to have one coordinate per column:

    Like this:

    Coordinates:

    1.  41.9100687,-87.8805614
    
    2.  41.9802511,-87.8803253
    
    3.  41.9806802,-87.8792417
    

    After importing S3 I choose CSV as data type... And then I add string column.

    enter image description here

    But instead I get some weird table output. Beside this I tried to import this as plain txt file with comma delimiter.. I get same weird output.

    enter image description here

    What am I doing wrong here?

    EDIT

    This test column screenshot is query from another but identical example. There should be gps_coordinates