BigQuery error when loading csv file from Google Cloud Storage
Solution 1
For me, the issue was stray newline and carriage-return characters inside field values; replacing them fixed the load. I replaced the characters with the code below:
df = df.applymap(lambda x: x.replace("\r", " ") if isinstance(x, str) else x)
df = df.applymap(lambda x: x.replace("\n", " ") if isinstance(x, str) else x)
I used applymap with a lambda because I didn't know which columns were strings in my case. If you are sure which columns contain strings, replace the characters column-wise instead.
Try replacing the characters and it should work for you as well.
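If you do know which columns hold strings, a vectorized, column-wise replacement avoids touching non-string columns. A minimal sketch (the column names here are hypothetical, not from the original file):

```python
import pandas as pd

# Hypothetical data: "description" is the only string column.
df = pd.DataFrame({
    "sku": [119470, 119471],
    "description": ["Formal\r\nShirts", "Long Sleeve\nShirts"],
})

# Vectorized replacement on the known string column only.
df["description"] = df["description"].str.replace("\r", " ", regex=False)
df["description"] = df["description"].str.replace("\n", " ", regex=False)
```

This is usually faster than applymap on large frames, since it operates on one column at a time.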
Solution 2
You cannot have empty rows (rows without any delimiters) in your file; otherwise BigQuery (and pretty much every other ingest engine) will think the row has just one column.
For example, this will fail on row 2 with the error you describe (note the truly empty line between the two rows):
119470,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts,Long Sleeve Shirts

119471,Fashion,Fashion Own,Womenswear,Womenswear Brands Other,Formal Shirts,Long Sleeve Shirts
This will succeed:
119470,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts,Long Sleeve Shirts
,,,,,,,
119471,Fashion,Fashion Own,Womenswear,Womenswear Brands Other,Formal Shirts,Long Sleeve Shirts
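One way to spot such rows before loading is to count the fields on every line. A minimal sketch, assuming the 7-column schema from the question (the sample data is inlined for illustration):

```python
import csv
import io

# Sample data with a truly empty line, which BigQuery rejects
# when the schema expects 7 columns.
data = (
    "119470,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts,Long Sleeve Shirts\n"
    "\n"
    "119471,Fashion,Fashion Own,Womenswear,Womenswear Brands Other,Formal Shirts,Long Sleeve Shirts\n"
)

EXPECTED_COLUMNS = 7
bad_rows = []
for line_no, row in enumerate(csv.reader(io.StringIO(data)), start=1):
    if len(row) != EXPECTED_COLUMNS:
        bad_rows.append((line_no, len(row)))

# bad_rows now lists (line number, field count) for every malformed line.
```

Running this over the real file before the load would point straight at the offending lines.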
Solution 3
You either have an empty line:
119470,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts

119472,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts
Or a line wrapped in quotes:
119470,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts
"119471,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts"
119472,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts
I think there is a bug in the BigQuery response. The line number in the error is in fact the number of characters before the error.
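If that is right, the position in the error message can be translated back to a line number by counting the newlines before the reported offset. A minimal sketch (the helper name is hypothetical):

```python
def offset_to_line(text: str, offset: int) -> int:
    """Return the 1-based line number that contains the given character offset."""
    return text.count("\n", 0, offset) + 1

# "first,row\n" occupies indices 0-9, "second,row\n" indices 10-20,
# so offset 21 is the first character of the third line.
sample = "first,row\nsecond,row\nthird,row\n"
```

Feeding the position from the error (1750384 in the question) into such a helper, with the file contents as `text`, would reveal which line BigQuery actually choked on.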
gvkleef
Updated on September 23, 2022

Comments
-
gvkleef over 1 year
I'm trying to load the data of a csv file that is saved in GCS into BigQuery. The csv file is in the UTF-8 format and it contains 7 columns. I've specified these columns in the data schema (all strings and nullable) and I've checked the contents of the csv file, which seem fine. When I try to load the data I get the following error:
Too many errors encountered. (error code: invalid) gs://gvk_test_bucket/sku_category.csv: CSV table references column position 1, but line starting at position:1750384 contains only 1 columns. (error code: invalid)
The weird thing is that the file only contains 680228 rows.
When I check the "allow jagged rows" option the table is generated, but only the first column is filled, containing the entire comma-separated string. Can someone help me?
Example row
119470,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts,Long Sleeve Shirts
-
gvkleef about 7 years: Thanks for your comment. The row numbers with errors were bigger than the maximum row number in my Excel file, so there were no empty rows within the dataset. It seems BQ does not stop reading the data after the last row in my csv.
-
Graham Polley about 7 years: Sorry, I don't follow you. What do you mean?
-
gvkleef about 7 years: That, for example, my csv contains 80000 rows and I get errors on rows 81000, 82500, etc.
-
Graham Polley about 7 years: So, fix those lines where you are missing the delimiters.
-
Marl about 6 years: @gvkleef I think there is a bug in the BigQuery response. The line number in the error is in fact the number of characters before the error.