Azure data Factory escape character and quote issue - copy activity

10,916

I faced a similar problem.

What I think happens for you is this.

  • The data is 9 characters, like so "Gasunie\
  • The output is written "quoted" and uses \ as the escape character.
  • So the output will be "your_text", but any quotes in your_text are replaced with \"
  • So the output is "\"Gasunie\" - the outside quotes enclose your text and the inside one has been escaped with \

Now we come to read this back in: it seems to be parsed like this.

  • The first quote is the start of your quoted field value, so from here on I'm reading your text field value.
  • Then I see \" which is a quote character (that has been escaped).
  • Then I see Gasunie
  • Then I see \" which is a quote character (that has been escaped).
  • Then I see field delimiters but as I'm still thinking I'm inside a quoted field then they are just text, so are included in my output "Gasunie"|1|||||||||||...
  • I keep reading characters into this field until I reach the next double quote at which point I'm expecting a new delimiter to start the next field.

So the problem is that ADF is putting quotes round whatever string it has in hand and writing this to the output; while on input it's parsing left to right so any string ending in the escape character is a problem. I'm not sure if you'd call that a bug.

What can you do?

In your case, just change the escape character to something that's never seen in your input (maybe @ or { or something). Then the \" at the end of your output text is no longer an escaped quote.

My similar case - when is the escape character not an escape character?

I have a field that contains a comma, but comma is also the field separator. My data comes from a third party and they have handily escaped this comma for me, using a backslash, thus:

Field One, Field\,Two, Field Three

I have \ as my escape character so you'd think this will give me three output fields:

| Field One | Field,Two | Field Three |

Wrong. The escape character only works when it's inside the quoted field. My input is not quoted, so the backslash is just treated as text and the comma is a field separator meaning my output has four fields

| Field One | Field\ | Two | Field Three |

Solution: tell my ADF dataset there are no quote characters around my input - then it treats anything after a comma as the text field and applies the escape character as expected.

enter image description here

You might also be interested in this https://feedback.azure.com/forums/270578-data-factory/suggestions/35482144-text-format-escape-char-only-if-needed-or-per-fiel.

So if you're looking for "escaped comma in csv is creating extra field", I hope this saves you a bit of time!

Share:
10,916
marival
Author by

marival

Updated on June 05, 2022

Comments

  • marival
    marival almost 2 years

    I have ADF pipelines exporting (via copy activity) data from Azure SQL db to Data Lake (ADLS2) then from there to another Azure SQL db. It was working fine until some characters appeared.

    This is how the culprit record looks in the first Azure SQL db: "Gasunie\

    enter image description here

    This is how the dataset is set up in ADF to export it into ADLS: Column delimiter - pipe Row delimiter - autodetect Encoding - Default(UTF-8) Escape character - Backslash() Quote character - Double quote (")

    enter image description here

    This is how the exported file looks like in notepad++ (it's pipe-delimited file): "\"Gasunie\" enter image description here

    These are the settings for the adls dataset in ADF when loading it from adls to azure sql db: Column delimiter - comma Row delimiter - autodetect Encoding - Default(UTF-8) Escape character - Backslash() Quote character - Double quote (")

    Note it's comma-delimited now but that is not causing any problems.

    enter image description here

    But this is how it looks once loaded: "Gasunie"|1|||||||||||... The backslash that was originally there has somehow caused it to stop the delimiting for the next few columns. enter image description here

    I have tried many, many various different settings for the quotes and escape character but they create more problems for other data in the dataset.

    Does anyone know how I can correct it without having to ask for the source to be corrected?

    Note: there is a reason why it's stored in adls so it can't be a copy from Azure sql db to another Azure Sql db.

    It's built from a template by consultants, highly parameterized so inserting dataflows to process the files in adls would be a very lengthy process.

    Any help appreciated. Thank you.

  • marival
    marival almost 4 years
    Thanks charith87. What query do you mean? Do you mean the expression language in ADF?
  • charith87
    charith87 almost 4 years
    I am referring to the source tab on the copy activity. look at the new image I pasted