Notepad++ - Removing the first column in a comma separated file
Solution 1
Notepad++'s search and replace supports regular expressions (regex) which can be easily used for this.
Use the following regex to search for:
^[^,]+,(.+)
This matches the start of the line, followed by as many non-comma characters as possible, followed by a comma, followed by the rest of the line. The rest of the line is captured as the first submatch.
Globally replace with this:
\1
This denotes the first submatch (the rest of the line). As a result, each line is replaced by everything after the first column and its comma.
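The same search and replace can be tried outside Notepad++. The following is a minimal sketch using Python's re module as a stand-in for the editor's regex engine; the sample rows come from the question, the variable names are illustrative, and it assumes every line contains at least one comma.

```python
import re

# Sample rows from the question (shortened); names here are illustrative only.
text = "1,Value1,value2\n3445,Value1,value2\n"

# re.MULTILINE lets ^ match at the start of every line,
# not only at the start of the whole string.
pattern = re.compile(r"^[^,]+,(.+)", re.MULTILINE)

# \1 writes back only the captured rest of the line.
result = pattern.sub(r"\1", text)
print(result)
```

Because the whole line is consumed by each match, the next match can only begin on the following line.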
After I found the above way to do it in a single global replace (and updated my reply accordingly), I noticed that this reply is basically identical but also gives a comprehensive explanation of the regex used.
Note: The shorter regex ^[^,]+, can't be used for a global replace with an empty string, since Notepad++ will then replace all columns except the last: after replacing the first column, the second column (which is now the first and exactly matches the regex) will be replaced, then the third, and so on. However, the shorter regex works perfectly with other editors (e.g. with PSPad or vim).
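For comparison, here is a small sketch of how the shorter regex behaves in an engine that checks ^ against the original text. Python's re module is used purely as an example of such an engine; it says nothing about Notepad++ itself, and the sample data is illustrative.

```python
import re

# Two sample rows; illustrative data modelled on the question.
text = "1,Value1,value2\n3445,Value1,value2\n"

# The shorter regex, replaced with an empty string on every line.
# After "1," is removed, the search resumes in the middle of the original
# line, where ^ no longer matches, so later columns are left alone.
shortened = re.sub(r"^[^,]+,", "", text, flags=re.MULTILINE)
print(shortened)
```

Only the first column of each line is removed here, which matches the behaviour described for PSPad and vim.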
Solution 2
Press Ctrl + H and perform the following replace:
Find what: .*?,(.*)
Replace with: \1
Wrap around: checked
Regular expression: selected
. matches newline: unchecked
Now press Alt + A to replace all occurrences.
How it works
The regular expression .*?,(.*) matches an entire line:
- .*?, matches everything before the first comma, including the comma itself. .* means any number of occurrences of any character, and the question mark makes the quantifier lazy, i.e., it matches as few characters as possible.
- (.*) matches everything after the first comma. Enclosing .* in parentheses converts it into a subpattern, so the match can be accessed in the replace field.
- \1 represents the first submatch (the match for (.*)).
As a result, Notepad++ replaces the line by everything that follows the first comma.
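The difference between the lazy and the greedy quantifier can be checked with a short sketch. It uses Python's re module rather than Notepad++, with "." not matching newlines (the Python default, corresponding to the unchecked ". matches newline" option); the sample line is illustrative.

```python
import re

line = "12345,Value1,value2,value3"

# Lazy: stops at the first comma.
lazy_prefix = re.match(r".*?,", line).group(0)

# Greedy: runs to the last comma on the line.
greedy_prefix = re.match(r".*,", line).group(0)

# The full pattern from this solution, applied to two lines at once.
text = "1,Value1,value2\n" + line
result = re.sub(r".*?,(.*)", r"\1", text)

print(lazy_prefix)
print(greedy_prefix)
print(result)
```

With the greedy form, only the last column would survive the replace, which is why the question mark matters.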
Solution 3
In Windows, you can do it as follows.
for /F "tokens=2,3,4,5,6 delims=," %i in (Input.csv) do @echo %i,%j,%k,%l,%m >> output.csv
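I assumed that you have only 6 columns. If you have many more columns, try experimenting with * in the tokens field. When running this from a batch file instead of the command prompt, double the percent signs (%%i instead of %i). The idea is taken from the Windows for command.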
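The idea of splitting off only the first token and keeping the remainder untouched can be sketched in Python as follows. The helper name drop_first_field is hypothetical, and returning lines without a comma unchanged is an assumption made for this sketch, not something the for command does.

```python
def drop_first_field(line: str, delimiter: str = ",") -> str:
    """Return the line without its first delimited field."""
    # partition splits at the first delimiter only, so the number
    # of remaining columns does not matter.
    _first, sep, rest = line.partition(delimiter)
    # Assumption for this sketch: a line without any delimiter is kept as-is.
    return rest if sep else line


print(drop_first_field("3445,Value1,value2,value3,value4,value5"))
```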
Solution 4
Assuming you have a Linux system or some Unix-style environment (I like gow, or you can snarf the utilities off unixutils), I believe running the file through cut -d , -f2-6 should do the trick: -d sets the delimiter, and -f2-6 prints out the second through sixth fields.
cat input.csv | cut -d , -f2-6 > output.csv
would do the trick, taking an input file and kicking out an output file. It's not using Notepad++, but it's fast and really simple.
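If a scripting language is available, the same field-based approach can be sketched with Python's csv module. This is only an illustration: the function name drop_first_column and the in-memory sample data are made up for the example. Unlike a plain split on commas, the csv module keeps quoted fields that contain commas intact, and every value stays a string, so nothing is converted to scientific notation.

```python
import csv
import io


def drop_first_column(src, dst):
    """Copy CSV rows from src to dst, omitting the first column."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        # row is a list of strings; slicing drops the first field.
        writer.writerow(row[1:])


# In-memory stand-ins for input.csv and output.csv.
source = io.StringIO('1,Value1,"a,b"\n3445,Value1,value2\n')
target = io.StringIO()
drop_first_column(source, target)
print(target.getvalue())
```

For real files, open them with newline="" as the csv module documentation recommends, and pass the file objects in place of the StringIO buffers.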
Solution 5
You should be able to load the CSV into Excel and have it treat numbers as text (preventing it from converting them to scientific notation).
- Open Excel
- Data Tab
- From Text
- Choose Delimited
- Choose Other: ","
- Select all columns in the Data Preview window and choose Text
- Remove your column
- Save as CSV
MikeD
Updated on September 18, 2022
Comments
-
MikeD over 1 year
I have a large CSV file from which I need to remove the first column of data. I cannot open it in Excel because Excel converts some of the values in the columns to scientific numbers.
I am using Notepad++, and I am trying to strip the first column from the file, e.g.
1,Value1,value2,value3,value4,value5
3445,Value1,value2,value3,value4,value5
12345,Value1,value2,value3,value4,value5
1234,Value1,value2,value3,value4,value5
11,Value1,value2,value3,value4,value5
to look like
Value1,value2,value3,value4,value5
Value1,value2,value3,value4,value5
Value1,value2,value3,value4,value5
Value1,value2,value3,value4,value5
Value1,value2,value3,value4,value5
-
MikeD almost 12 years
Thanks, I just clicked on the link and I got a 403 error?
-
Thalys almost 12 years
Both links work for me, which is weird. I usually find gow by googling for it; it's on a GitHub repo belonging to bmatzelle. Cygwin might also be an option, but it's overkill for this sort of thing.
-
speakr almost 12 years
Why not just do a single global replace with :%s/^[^,]\+,//g ?
-
kenorb almost 12 years
You could as well; this one is easier to use and to understand than a regex :) I'm usually confused about which characters I have to escape, so I end up typing the same regex many times.
-
simbabque almost 12 years
Editing and saving CSV files in Excel often breaks numbers like EAN codes and US-style floats in European Excel. Even if you set up everything when importing, it tends to eat up some things. I cannot recommend it, though it would probably work. In a production environment, I'd advise against it.
-
simbabque almost 12 years
This is the way to go here. If the OP already has N++, this is the quickest way. I do this a lot with PSPad (which could do this in one go, btw). Also check out how the regex works: rubular.com/r/OiehkBT0vA
-
Dennis almost 12 years
Notepad++ doesn't process the input line by line, but character by character. That has some neat advantages (like multi-line patterns).
-
speakr almost 12 years
Just got the same idea after noticing that ^[^,]+, globally replaced with an empty string won't work in Notepad++. (+1)
-
MikeD almost 12 years
This worked great! Thank you
-
Dennis almost 12 years
+1 for the edit. Sadly, your answer is community wiki now.
-
speakr almost 12 years
@Dennis Yes, I edited too often since I wasn't aware of the 10-edits limit.
-
SeanC almost 12 years
For an arbitrary number of columns, use this:
for /F "tokens=1* delims=," %i in (Input.csv) do @echo %j >> output.csv
-
nerkn almost 12 years
Why not ^[^,]+, and replace with empty?
-
nerkn almost 12 years
@speakr: oh yeah, right. Edited my comment to reflect that. And now I see that this is already covered by your answer. Sorry for the noise!
-
James Wood almost 12 years
@simbabque I would say that's slightly unfair. I have used it successfully in production environments for large datasets which required manipulation; admittedly, at times it was a nightmare. Excel does have a habit of altering data in unexpected ways, but I wouldn't say this risk was especially greater than with other approaches.
-
simbabque almost 12 years
I use it on occasion as well, but most of these times I don't like to do it. It's often a lot faster to use a text editor that supports regex search & replace if one knows how to handle it. No offense, though, as your answer was clear and concise.
-
James Wood almost 12 years
Oh, I wasn't taking offence :D
-
Nick almost 12 years
A tweak to the suggested regex from knittl would work: ^.+?, where the ? makes the quantifier "non-greedy," so that it matches as few characters as possible. It works in the Notepad++ parser. This is not a better way, just a different way that's in line with the other suggestion.
-
Nick almost 12 years
Oops, somehow that didn't process originally: my first test case was incorrectly putting spaces at the beginning of the line, which seems to make Npp evaluate ^.+?, differently in a replace all. Sorry about that.