Notepad++ - Removing the first column in a comma separated file

Solution 1

Notepad++'s search and replace supports regular expressions (regex) which can be easily used for this.

Use the following regex to search for:

^[^,]+,(.+)

This matches the start of the line, followed by as many non-comma characters as possible, followed by a comma, followed by the rest of the line. The rest of the line is captured as the first submatch.

Globally replace with this:

\1

This denotes the first submatch (the rest of the line). As a result, each line is replaced by everything after the first column and its comma.
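
As a rough cross-check outside Notepad++, the same search and replace can be sketched with Python's re module. The names PATTERN, strip_first_column and sample are made up for this illustration, and re.MULTILINE is assumed so that ^ matches at the start of every line rather than only at the start of the text.

```python
import re

# Same pattern as above: start of line, one or more non-comma characters,
# a comma, then the rest of the line captured as group 1.
# Caveat: [^,] also matches a newline, so a line without any comma would be
# swallowed together with the following line. The sample has a comma in
# every line, so this does not occur here.
PATTERN = re.compile(r"^[^,]+,(.+)", re.MULTILINE)


def strip_first_column(text: str) -> str:
    """Replace every matching line with its first submatch (\\1)."""
    return PATTERN.sub(r"\1", text)


sample = "1,Value1,value2\n3445,Value1,value2\n"
print(strip_first_column(sample))
```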

After I found the above way to do it in a single global replace (and updated my reply accordingly), I noticed that this reply is basically identical but also gives a comprehensive explanation of the regex used.


Note: The shorter regex ^[^,]+, can't be used for a global replace with an empty string, since Notepad++ will then replace all columns except the last. After the first column is replaced, the second column (which is now the first and matches the regex exactly) is replaced, then the third, and so on. However, the shorter regex works perfectly in other editors (e.g. PSPad or vim).
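
For comparison, here is a minimal sketch of how the shorter regex behaves in an engine that anchors ^ against the original text, using Python's re module as a stand-in. It does not reproduce the Notepad++ behavior described in the note; it only illustrates the "works in other tools" case. The function name strip_first_column_short is hypothetical.

```python
import re


def strip_first_column_short(text: str) -> str:
    # re.sub scans the original text once, so ^ only matches at real line
    # starts and each line loses just its first column.
    return re.sub(r"^[^,]+,", "", text, flags=re.MULTILINE)


print(strip_first_column_short("1,a,b\n22,c,d\n"))
```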

Solution 2

Press Ctrl + H and perform the following replace:

Find what:          .*?,(.*)
Replace with:       \1
Wrap around:        checked
Regular expression: selected
. matches newline:  unchecked

Now press Alt + A to replace all occurrences.

How it works

  • The regular expression .*?,(.*) matches an entire line:

    • .*?, matches everything before the first comma, including the comma itself.

      .* means any number of occurrences of any character, and the question mark makes the quantifier lazy, i.e., it matches as few characters as possible.

    • (.*) matches everything after the first comma.

      Enclosing .* in parentheses converts it into a subpattern, so the match can be accessed in the replace field.

  • \1 represents the first submatch (match for (.*)).

    As a result, Notepad++ replaces the line by everything that follows the first comma.
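
The effect of the lazy quantifier can be seen in a small sketch using Python's re module (an assumption of this example; the variable names are illustrative). The greedy variant is shown only for contrast and is not part of the solution.

```python
import re

line = "3445,Value1,value2,value3"

# Lazy: .*? stops at the first comma, so group 1 is everything after it.
lazy = re.sub(r".*?,(.*)", r"\1", line)

# Greedy (for contrast): .* runs up to the last comma, so only the final
# column is left in group 1.
greedy = re.sub(r".*,(.*)", r"\1", line)

print(lazy)    # Value1,value2,value3
print(greedy)  # value3
```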

Solution 3

In Windows, you can do it as follows.

for /F "tokens=2,3,4,5,6 delims=," %i in (Input.csv) do @echo %i,%j,%k,%l,%m  >> output.csv

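I assumed that you have only 6 columns. If you have more, try using * in the tokens field to capture the rest of the line. Note that inside a batch file the loop variable must be written as %%i instead of %i. The idea is taken from the Windows for command.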
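
The "first token, then the rest of the line" idea behind the * in the tokens field can be sketched in Python as follows. This is only an illustration of the splitting logic, not a reimplementation of the for command; the function name drop_first_token is hypothetical.

```python
def drop_first_token(line: str) -> str:
    # partition splits at the first comma only; the remainder is kept verbatim.
    first, sep, rest = line.partition(",")
    # Assumption of this sketch: a line without a comma is left unchanged.
    return rest if sep else line


print(drop_first_token("1,Value1,value2,value3,value4,value5"))
```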

Solution 4

Assuming you have a Linux system or some Unix-style environment (I like gow, or you can snarf the utilities off unixutils), I believe running the file through cut -d , -f2-6 should do the trick, if I recall correctly: -d sets the delimiter, and -f2-6 prints the second through sixth fields. If the number of columns is unknown, -f2- prints everything from the second field to the end of the line.

cat input.csv | cut -d , -f2-6 > output.csv would do the trick, taking an input file and kicking out an output file. It's not using Notepad++, but it's fast and really simple.
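
For readers without cut at hand, the field selection it performs can be approximated in a few lines of Python. This is a sketch of the -d and -f behavior only; the function cut_fields and its parameters are invented for the example.

```python
def cut_fields(line: str, first: int = 2, last: int = 6, delim: str = ",") -> str:
    """Keep fields first..last (1-based, inclusive), similar to cut -d , -f2-6."""
    # Like cut without -s, pass lines that contain no delimiter through unchanged.
    if delim not in line:
        return line
    fields = line.split(delim)
    return delim.join(fields[first - 1:last])


print(cut_fields("1,Value1,value2,value3,value4,value5"))
```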

Solution 5

You should be able to load the CSV into excel and have it treat numbers as text (preventing it from converting to scientific numbers).

  1. Open Excel
  2. Data Tab
  3. From Text
  4. Choose Delimited
  5. Choose Other: ","
  6. For all columns select them in the Data Preview window, and choose Text
  7. Remove your column
  8. Save as CSV
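
If the conversion of values is the main worry, the same column removal can also be sketched without a spreadsheet. The example below swaps in Python's csv module, which reads every field as a plain string, so long digit strings and leading zeros pass through untouched. The function name and the sample rows are made up for illustration.

```python
import csv
import io


def drop_first_column_csv(src, dst) -> None:
    """Copy CSV rows from src to dst without their first field."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        # Fields are strings, so no numeric conversion takes place.
        writer.writerow(row[1:])


# Made-up sample: a long number, a quoted field with a comma, leading zeros.
src = io.StringIO('1,1234567890123,"Smith, J"\n2,0001,x\n')
dst = io.StringIO()
drop_first_column_csv(src, dst)
print(dst.getvalue())
```

For real files, src and dst would be file objects opened with newline="" as the csv documentation recommends.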

MikeD
Author by

MikeD

Updated on September 18, 2022

Comments

  • MikeD
    MikeD over 1 year

    I have a large CSV file from which I need to remove the first column of data. I cannot open it in Excel because Excel converts some of the values in the columns to scientific notation.

    I am using Notepad++, and I am trying to strip the first column from the file, e.g.,

    1,Value1,value2,value3,value4,value5
    3445,Value1,value2,value3,value4,value5
    12345,Value1,value2,value3,value4,value5
    1234,Value1,value2,value3,value4,value5
    11,Value1,value2,value3,value4,value5
    

    to look like

    Value1,value2,value3,value4,value5
    Value1,value2,value3,value4,value5
    Value1,value2,value3,value4,value5
    Value1,value2,value3,value4,value5
    Value1,value2,value3,value4,value5
    
  • MikeD
    MikeD almost 12 years
    Thanks, I just clicked on the link and I got a 403 error?
  • Thalys
    Thalys almost 12 years
    both links work for me - which is weird. I usually find gow by googling for it - it's on a github repo belonging to bmatzelle. Cygwin might also be an option, but it's overkill for this sorta thing
  • speakr
    speakr almost 12 years
    Why not just do a single global replace with :%s/^[^,]\+,//g?
  • kenorb
    kenorb almost 12 years
    You could as well; this one is easier to use and understand than a regex :) I'm usually confused about which characters I have to escape, so I end up typing the same regex many times.
  • simbabque
    simbabque almost 12 years
    Editing and saving CSV files in Excel often breaks numbers like EAN codes and US-style floats in European Excel. Even if you set up everything when importing, it tends to eat up some things. I cannot recommend it, though it would probably work. In a production environment, I'd advise against it.
  • simbabque
    simbabque almost 12 years
    This is the way to go here. If the OP already has N++ this is the quickest way. I do this a lot with PSPad (which could do this in one go, btw). Also check out how the regex works: rubular.com/r/OiehkBT0vA
  • Dennis
    Dennis almost 12 years
    Notepad++ doesn't process the input line by line, but character by character. That has some neat advantages (like multi-line patterns).
  • speakr
    speakr almost 12 years
    Just got the same idea after noticing that ^[^,]+, globally replaced with an empty string won't work in Notepad++. (+1)
  • MikeD
    MikeD almost 12 years
    This worked great! Thank you
  • Dennis
    Dennis almost 12 years
    +1 for the edit. Sadly, your answer is community wiki now.
  • speakr
    speakr almost 12 years
    @Dennis Yes, I edited too often since I wasn't aware of the 10-edits limit.
  • SeanC
    SeanC almost 12 years
    for arbitrary number of columns, use this: for /F "tokens=1* delims=," %i in (Input.csv) do @echo %j >> output.csv
  • nerkn
    nerkn almost 12 years
    Why not ^[^,]+, and replace with empty?
  • nerkn
    nerkn almost 12 years
    @speakr: oh yeah, right. Edited my comment to reflect that. And now I see that this is already covered by your answer. Sorry for the noise!
  • James Wood
    James Wood almost 12 years
    @simbabque I would say that's slightly unfair, I have used it successfully in production environments for large datasets which required manipulation - admittedly at times it was a nightmare. Excel does have a habit of altering data in unexpected ways, but I wouldn't say this risk was especially greater than with other approaches.
  • simbabque
    simbabque almost 12 years
    I use it on occasion as well, but most of these times I don't like to do it. It's often a lot faster to use a text editor that supports regex search & replace if one knows how to handle it. No offense, though, as your answer was clear and concise.
  • James Wood
    James Wood almost 12 years
    o i wasnt taking offence :D
  • Nick
    Nick almost 12 years
    A tweak to the suggested regex from knittl would work: ^.+?, - the ? makes any quantifier "non-greedy," so that it matches as few characters as possible. It works in the Notepad++ parser. This is not a better way, just a different way that's in line with the other suggestion.
  • Nick
    Nick almost 12 years
    oops, somehow that didn't process originally - my first test case was incorrectly putting spaces at the beginning of the line, which seems to make Npp evaluate ^.+?, differently in a replace all. Sorry about that.