convert txt file to csv seperated with tabs

25,962

Solution 1

A web search for "replace space with comma" was very fruitful, didn't that work out for you first? Would've found lots of answers like this:

tr ' ' ',' < input > output

or for tabs:

tr '\t' ',' < input > output

and

sed 's/\s\+/,/g' input > output

\s is the space class (like [:space:]) and should replace any runs (+ (escaped) = one or more of the preceding character) of spaces or tabs or newlines too. This next one would only replace each single space or tab with a single comma (like running both above tr's) :

sed 's/[ \t]/,/g' input > output

And -i edits the file in-place (directly edits the file) in sed

Here's a sed that will match a space-number or a number-space, and replace them with a comma, using the OR command/symbol | escaped as \| below:

sed 's/ [0-9]\|[0-9] /,/g'

Solution 2

Ok, so you need to replace the first two and the last space in every line with a comma. You can't just replace every space, because the 3rd field may contain spaces itself. You can do this with regular expression replacement. Here's a sed script/command, that works:

sed -re 's/^(\S*) (\S*) (.*) (\S+)\s*$/\1,\2,\3,\4/' in.txt > out.csv

With the above example this returns:

Account,Units,Description,Delta
2281,19,Toshiba PX-1982GRSUB,0
9618,200,HP MX19942-228b,-25
19246,4,CompuCom HD300g Hard Drive,4

This is still quite fragile with handling empty fields and breaks entirely, if columns other than the 3rd contain spaces. It's very easy to introduce such malformed data if it is formatted manually as done by your boss. You should suggest to him to switch to a more robust table format (e. g. proper CSV & Co.) and editor (common spread sheet tools can manipulate CSV quite well and flexibly, e. g. LibreOffice/OpenOffice Calc, Microsoft Excel and Google Docs).

Share:
25,962

Related videos on Youtube

Bobby H.
Author by

Bobby H.

Self taught Computer Geek, before it was cool to be a geek. Working with Access/Excel now.

Updated on September 18, 2022

Comments

  • Bobby H.
    Bobby H. over 1 year

    Sorry people I'm new to Linux, and while I looked through the list of answered questions, I don't know enough to recognize if my question was answered. Or if I can adapt one of the answer to my particular little problem.

    I get a text file of data from my boss, who learned to use computers one way; and he won't change. The data is almost a csv file, except the fields are all separated by a space characters rather then a comma or tab character. And the text fields of data include embedded spaces also.

    Each field is either a number or is numbers and text, all fields are of varying lengths, and none are off-set with single or double quotes. The number fields predominate, and no text field is adjacent to any other text field. Rarely is an embedded number in a text field preceded or followed by a [space] character.

    Unfortunately not every [space] character can just be replaced. Instead because, generally, field breaks come in the form of either [space][0-9] or [0-9][space], this is how I determine if a [space] character should be converted to a [tab] character or not. If the [space] character is beside a digit its to be converted to a [tab] character.

    So using the Find/Replace function in Notepad for Windows, I search for a digit-space or a space-digit combination, converting that [space] character to a [tab] character. I have to do this ten times [0-9][space] and then ten more times [space][0-9]. I'm looking for a script to do this automatically.

    Here is an example of the file I get. It contains four fields separated by [space] characters (first line). Each following line is one record, so the second line is the first record. Account is 2281, Units are 19, Description is Toshiba PX-1982GRSUB{, and finally the Delta field contains the 0:

    Account Units Description Delta  
    2281 19 Toshiba PX-1982GRSUB 0  
    9618 200 HP MX19942-228b -25  
    19246 4 CompuCom HD300g Hard Drive 4
    

    So what I'm looking for is a script that will read the original file, convert the [space] characters that are field separators into characters and write it all to a new file. And I want the explanation -- so I don't keep asking the same questions over and over again.

    • kos
      kos over 8 years
      Can you post a sample input and a sample output? I mean I think it's clear enough, but just to be sure. Also please clarify whether fields containing text could also contain digits (although I'd assume they couldn't).
    • Wilf
      Wilf over 8 years
      Are any text fields with spaces formatted like "this with quotes or similar?"
    • Jacob Vlijm
      Jacob Vlijm over 8 years
      And the "fake tabs" are they always the same number of spaces? In short; we need an example, like @kos requested.
    • Bobby H.
      Bobby H. over 8 years
      Kos: Yep see what you mean; yes 'embedded numbers' are present in otherwise nominal text fields. Rarely the number is led or followed by a <space> rather then some other character. When that happens I either add an unused character like @$%^&*, so it gets skipped when I use "Replace All" or I manuelly edit that section prior to importing into the database or excel sheet. Wilf: No, the text fields are not seperated by single or double quotes. And the text fields can begin and end in a wide variety of ways. The number of different text fields is finite but very very large.
    • Bobby H.
      Bobby H. over 8 years
      Jacob Vlijm: No, again unfortunate. But all the fields have varing lenths. Pretty much the only standard is that a field either contains only numbers or numbers and text with the numbers rarely proceded or followed by a <Space> Character.
    • Bobby H.
      Bobby H. over 8 years
      Example: Account> Units> Description> Delta> 2281> 19> Toshiba PX-1982GRSUB> 0> 9618> 200> HP MX19942-228b> -25> 19246> 4> CompuCom HD300g Hard Drive> 4>
    • Bobby H.
      Bobby H. over 8 years
      well that example is pretty confusing. substitute <tab> characters where the ">" character is; and carriage return line feed every fourth field. Sorry in a hurry here ATT, don't have time to figure out the mini-Markdown formatting... promise to take sometime and do that also.....
    • Bobby H.
      Bobby H. over 8 years
      Example : Account Units Description Delta 2281 19 Toshiba PX-1982GRSUB 0 9618 200 HP MX19942-228b -25 19246 4 CompuCom HD300g Hard Drive 4 Well that didn't work either.
    • kos
      kos over 8 years
      Could you please add those informations and the various examples to your question by editing it so that users will be able to read them directly when reading the question? Thanks.
    • Rinzwind
      Rinzwind over 8 years
      And there is a reason for not using "calc"?
    • Bobby H.
      Bobby H. over 8 years
      I didn't know about "calc" -- Let me look at it and see. Thanks --
    • Bobby H.
      Bobby H. over 8 years
      Ok not sure how that helps? calc - arbitrary precision calculator (from man pages). I'm trying to convert a pure text file into csv, to import into a database or spread sheet format. It appears "calc" is calculator program; which I don't presently have loaded on the Linux machine I am using. Rinzwind, would you show me how it applies in this situation?
    • Jacob Vlijm
      Jacob Vlijm over 8 years
      @BobbyH. I am afraid we need a more detailed description; a structure that always matches the situation, including the wanted output. No script on earth is capable of deciding when a space should be a space or be converted into a tab, unless you tell it how to decide. The last one is essential, as you will understand. The question is simply not clear as it is.
    • Rinzwind
      Rinzwind over 8 years
      @BobbyH. libreoffice calc... the spreadsheet... those are made for converting from csv to txt.
    • Bobby H.
      Bobby H. over 8 years
      Thank you, I'll try that. But will it go from txt to csv? or even import directly?
    • Bobby H.
      Bobby H. over 8 years
      Jacob Vlijm -- I edited the original question, I hope that is more clear.
    • Bobby H.
      Bobby H. over 8 years
      Perhaps I should ask instead: Can I have a script read in a line of text from a file, look through the line (several times) for all instances of any two specific characters; for each instance change one of the two characters to a different character; and then write it out to a file?
  • Bobby H.
    Bobby H. over 8 years
    Xen2050 -- thank you. unfortunately 'tr' didn't work. I want to search for a token 2 characters in length. as far as I can tell 'tr' only parses 1 character. Still looking at 'sed'. and yes I did look (and am still looking) but I haven't yet found anything that does what I want.
  • Bobby H.
    Bobby H. over 8 years
    Xen2050 -- Thank you, again. 'sed' worked. The way I did it, its clunky, but it will save me about an hour (even clunky) every time I have to handle the text files the boss gives me. And I have a point from which to advance now. Again thank you. And thank you all for the help.
  • Xen2050
    Xen2050 over 8 years
    No problem. That is a shortcoming of tr, no multiple characters. sed works well with just about any string, something like sed 's/xy/,/g' is probably what you're using I'm guessing. Maybe sed 's/ [0-9]/,/g' and sed 's/[0-9] /,/g' Even does regex's that can get pretty complicated & useful too, I'm guessing one of them would match space-num or num-space... Good luck!
  • Xen2050
    Xen2050 over 8 years
    Actually, thinking about matching a number-space or a space-number, and replace them with a comma, this might work in one go too: sed 's/[ 0-9][ 0-9]/,/g' -> no, that one matches two space and also two numbers... close though. Found one that DOES match, with the OR command/symbol |, escaped as \|
  • Bobby H.
    Bobby H. over 8 years
    Yep Basically. But I am searching for a number-space/spce-number combination and then replacing it with number-tab/tab-number: where the number is one of 0-9 and the replaced number matches the searched for number. Now I'm searching through the 'man' pages to see how I can do it more elegantly.
  • Bobby H.
    Bobby H. over 8 years
    David -- sorry took me a while to figure out what your command did. It works beautifully for a four field text file. Unfortunately I get text files with a variable number of fields, so i would have too look it over to determine the number of fields and either modify the script or send it a switch to set the field count. The solution I chose would work regardless of t he number of fields. But your command is much prettier then my code; and compact also. Still thinking how I could empamnet your code in my solution though.. Thanks again to you and all the rest -- B
  • David Foerster
    David Foerster over 8 years
    How are the additional fields structured? Can you extend your question with a description of the new requirements? Or better yet, ask a follow-up question referring to this one. It might also be better to ask this on Stack Overflow, where you would get a much broader audience for your platform-independent text processing problem, since you don't care about the programming language (shell script, sed, Awk, Perl, Python…).