Notepad++ - Removing the first column in a comma separated file

notepad++ regex csv text-editing

22,003

Solution 1

Notepad++'s search and replace supports regular expressions (regex) which can be easily used for this.

Use the following regex to search for:

^[^,]+,(.+)

This matches the start of the line, followed by as many non-comma characters as possible, followed by a comma, followed by the rest of the line. The rest of the line is captured as the first submatch.

Globally replace with this:

\1

This denotes the first submatch (the rest of the line). As a result, each line is replaced by everything after the first column and its comma.
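Here is a small sketch that applies the same search-and-replace to the question's sample data. It uses Python's `re` module instead of Notepad++, on the assumption that both engines treat this simple pattern alike. `re.MULTILINE` is needed so that `^` matches at the start of every line. The helper name `drop_first_column` is hypothetical.

```python
import re

# Pattern from Solution 1: a run of non-commas at the start of a line,
# the first comma, then the rest of the line captured as group 1.
PATTERN = re.compile(r"^[^,]+,(.+)", re.MULTILINE)


def drop_first_column(text: str) -> str:
    """Hypothetical helper: replace each matching line with its first submatch."""
    return PATTERN.sub(r"\1", text)


sample = (
    "1,Value1,value2,value3,value4,value5\n"
    "3445,Value1,value2,value3,value4,value5\n"
    "12345,Value1,value2,value3,value4,value5\n"
)

print(drop_first_column(sample), end="")
```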

After I found the above way to do it in a single global replace (and updated my reply accordingly), I noticed that another reply is basically identical but also gives a comprehensive explanation of the regex used.

Note: The shorter regex ^[^,]+, can't be used for global replace with an empty string since Notepad++ will then replace all columns except the last: After replacing the first column, the second column (which now is the first and matches exactly the regex) will be replaced, then the third, and so on. However, the shorter regex works perfectly with other editors (e.g. with PSPad or vim).
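Python's `re.sub` shows single-pass replacement clearly. It keeps searching from the end of each match in the original text and never re-scans its own output, so the shorter pattern with an empty replacement removes only the first column. This only illustrates a single-pass engine. It is not a model of Notepad++'s Replace All. The sample lines are taken from the question.

```python
import re

sample = (
    "1,Value1,value2,value3,value4,value5\n"
    "11,Value1,value2,value3,value4,value5\n"
)

# Short form: the first field plus its comma at the start of each line.
# re.sub resumes searching right after each match in the ORIGINAL string,
# so once "1," is removed, "Value1," is not at a line start and is kept.
result = re.sub(r"^[^,]+,", "", sample, flags=re.MULTILINE)
print(result, end="")
```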

Solution 2

Press Ctrl + H and perform the following replace:

Find what:          .*?,(.*)
Replace with:       \1
Wrap around:        checked
Regular expression: selected
. matches newline:  unchecked

Now press Alt + A to replace all occurrences.

How it works

The regular expression .*?,(.*) matches an entire line:
- .*?, matches everything before the first comma, including the comma itself.
  
  .* means any number of occurrences of any character, and the question mark makes the quantifier lazy, i.e., it matches as few characters as possible.
- (.*) matches everything after the first comma.
  
  Enclosing .* in parentheses converts it into a subpattern, so the match can be accessed in the replace field.
\1 represents the first submatch (match for (.*)).

As a result, Notepad++ replaces the line by everything that follows the first comma.
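You can see why the lazy quantifier matters by trying the same replacement outside the editor. The sketch below uses Python's `re` module as a stand-in for Notepad++'s engine, assuming both treat these simple patterns alike. It runs the lazy pattern and a greedy variant over two sample lines. Python's default behaviour (no `re.DOTALL`) corresponds to leaving ". matches newline" unchecked.

```python
import re

text = (
    "3445,Value1,value2,value3,value4,value5\n"
    "11,Value1,value2,value3,value4,value5\n"
)

# Lazy ".*?," stops at the FIRST comma; (.*) captures the rest of the line.
# Without re.DOTALL, "." does not match "\n", so each line is rewritten
# on its own and the newlines are left in place.
lazy = re.sub(r".*?,(.*)", r"\1", text)

# Greedy version for contrast: ".*," runs on to the LAST comma,
# so only the final column survives.
greedy = re.sub(r".*,(.*)", r"\1", text)

print(lazy, end="")
print(greedy, end="")
```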

Solution 3

In Windows, you can do it as follows.

for /F "tokens=2,3,4,5,6 delims=," %i in (Input.csv) do @echo %i,%j,%k,%l,%m  >> output.csv

I assumed that you have only 6 columns. If you have many more columns, try experimenting with * in the tokens field. The idea is taken from the documentation of the Windows for command.
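The rough Python model below shows which fields `tokens=2,3,4,5,6 delims=,` picks from a line. It does not call cmd. It assumes that for /F treats a run of consecutive delimiters as one, so empty CSV fields are skipped and do not become empty tokens. The function name `for_f_tokens` is hypothetical.

```python
from typing import List


def for_f_tokens(line: str, tokens: List[int], delims: str = ",") -> List[str]:
    """Hypothetical model of the fields `for /F "tokens=... delims=..."` selects.

    Assumption: like for /F, consecutive delimiters count as one, so empty
    CSV fields are skipped instead of producing empty tokens.
    """
    # Normalise every delimiter character to a comma, then split.
    for d in delims:
        line = line.replace(d, ",")
    # Dropping empty pieces collapses ",," and ignores leading delimiters.
    parts = [p for p in line.split(",") if p]
    # Token numbers are 1-based; tokens past the end of the line are omitted.
    return [parts[t - 1] for t in tokens if t <= len(parts)]


line = "3445,Value1,value2,value3,value4,value5"
print(",".join(for_f_tokens(line, [2, 3, 4, 5, 6])))
print(for_f_tokens("1,,b,c", [2, 3]))
```

The second call shows the caveat. The empty second field is skipped, so `b` ends up as token 2 and the remaining columns shift left.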

Solution 4

Assuming you have a Linux system or some Unix-style environment (I like gow, or you can snarf the utilities off unixutils), running the file through cut -d , -f2-6 should do the trick, if I recall correctly: -d sets the delimiter, and -f2-6 prints the second to sixth fields.

cat input.csv | cut -d , -f2-6 > output.csv would do the trick, taking the input file and writing out an output file (use -f2- instead to keep every column after the first, however many there are). It's not using Notepad++, but it's fast and really simple.
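To show the difference between -f2-6 and -f2- concretely, here is a Python approximation of cut's field selection for a single line. `cut_fields` is a hypothetical helper and not part of any library. It covers only the -d and -f options, including cut's habit of passing lines that contain no delimiter through unchanged (when -s is not given).

```python
from typing import Optional


def cut_fields(line: str, start: int, end: Optional[int] = None, delim: str = ",") -> str:
    """Hypothetical sketch of `cut -d DELIM -fSTART-END` (`-fSTART-` if end is None)."""
    if delim not in line:
        # cut prints lines without the delimiter unchanged (unless -s is used).
        return line
    fields = line.split(delim)
    # cut's field numbers are 1-based and inclusive; empty fields are kept.
    return delim.join(fields[start - 1:end])


row = "3445,Value1,value2,value3,value4,value5,extra"
print(cut_fields(row, 2, 6))  # like -f2-6: drops the 7th column
print(cut_fields(row, 2))     # like -f2-: keeps every column after the first
```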

Solution 5

You should be able to load the CSV into Excel and have it treat numbers as text (preventing it from converting them to scientific notation).

Open Excel
Data Tab
From Text
Choose Delimited
Choose Other: ","
For all columns select them in the Data Preview window, and choose Text
Remove your column
Save as CSV

MikeD

Updated on September 18, 2022

Comments

MikeD over 1 year

I have a large CSV file that I need to remove the first column of data. I cannot open it in Excel because Excel converts some of the values in the columns to scientific numbers.

I am using Notepad++ and am trying to strip the first column from the file, e.g. turn

1,Value1,value2,value3,value4,value5
3445,Value1,value2,value3,value4,value5
12345,Value1,value2,value3,value4,value5
1234,Value1,value2,value3,value4,value5
11,Value1,value2,value3,value4,value5

to look like

Value1,value2,value3,value4,value5
Value1,value2,value3,value4,value5
Value1,value2,value3,value4,value5
Value1,value2,value3,value4,value5
Value1,value2,value3,value4,value5

MikeD almost 12 years

Thanks, I just clicked on the link and I got a 403 error?
Thalys almost 12 years

Both links work for me, which is weird. I usually find gow by googling for it; it's on a GitHub repo belonging to bmatzelle. Cygwin might also be an option, but it's overkill for this sort of thing.
speakr almost 12 years

Why not just do a single global replace with :%s/^[^,]\+,//g?
kenorb almost 12 years

You could as well; this one is easier to use and understand than a regex :) I'm always confused about which characters I have to escape, so I end up typing the same regex many times.
simbabque almost 12 years

Editing and saving CSV files in Excel often breaks numbers like EAN codes and US-style floats in European Excel. Even if you set everything up correctly when importing, it still tends to mangle some values. I cannot recommend it, though it would probably work. In a production environment, I'd advise against it.
simbabque almost 12 years

This is the way to go here. If the OP already has N++ this is the quickest way. I do this a lot with PSPad (which could do this in one go, btw). Also check out how the regex works: rubular.com/r/OiehkBT0vA
Dennis almost 12 years

Notepad++ doesn't process the input line by line, but character by character. That has some neat advantages (like multi-line patterns).
speakr almost 12 years

Just got the same idea after noticing that ^[^,]+, globally replaced with an empty string won't work in Notepad++. (+1)
MikeD almost 12 years

This worked great! Thank you
Dennis almost 12 years

+1 for the edit. Sadly, your answer is community wiki now.
speakr almost 12 years

@Dennis Yes, I edited too often since I wasn't aware of the 10-edits limit.
SeanC almost 12 years

For an arbitrary number of columns, use this: for /F "tokens=1* delims=," %i in (Input.csv) do @echo %j >> output.csv
nerkn almost 12 years

Why not ^[^,]+, and replace with empty?
nerkn almost 12 years

@speakr: oh yeah, right. Edited my comment to reflect that. And now I see that this is already covered by your answer. Sorry for the noise!
James Wood almost 12 years

@simbabque I would say that's slightly unfair. I have used it successfully in production environments for large datasets which required manipulation, though admittedly at times it was a nightmare. Excel does have a habit of altering data in unexpected ways, but I wouldn't say this risk was especially greater than with other approaches.
simbabque almost 12 years

I use it on occasion as well, but most of these times I don't like to do it. It's often a lot faster to use a text editor that supports regex search & replace if one knows how to handle it. No offense, though, as your answer was clear and concise.
James Wood almost 12 years

Oh, I wasn't taking offence :D
Nick almost 12 years

A tweak to the regex suggested by knittl would also work: ^.+?, where the ? makes the preceding quantifier "non-greedy", so that it matches as few characters as possible. It works in the Notepad++ parser. This is not a better way, just a different way that's in line with the other suggestion.
Nick almost 12 years

Oops, somehow that didn't process originally: my first test case was incorrectly putting spaces at the beginning of the line, which seems to make Npp evaluate ^.+?, differently in a Replace All. Sorry about that.
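You can compare Nick's lazy `^.+?,` with the negated-class `^[^,]+,` in Python's `re` module. Python is used here as a stand-in for Notepad++'s engine, which is an assumption. On the question's data both remove just the first column. They differ on a line that contains no comma, because `[^,]` can match a newline while `.` cannot (unless DOTALL is set). The helper `strip_with` and the `tricky` input are hypothetical.

```python
import re

sample = "1,Value1,value2\n11,Value1,value2\n"
tricky = "header\n1,Value1,value2\n"  # first line has no comma


def strip_with(pattern: str, text: str) -> str:
    # Delete every match; with ^ and MULTILINE that is at most one per line.
    return re.sub(pattern, "", text, flags=re.MULTILINE)


# Same result on well-formed rows.
print(strip_with(r"^[^,]+,", sample) == strip_with(r"^.+?,", sample))

# [^,]+ runs across the newline and eats "header\n1,"; .+? stops at the line end.
print(repr(strip_with(r"^[^,]+,", tricky)))
print(repr(strip_with(r"^.+?,", tricky)))
```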