Regex replace text in XML file within node from the command line
A simple solution for simple cases - see my comment:
echo "<g:gtin>31806831001</g:gtin>" | sed 's|<g:gtin>.*</g:gtin>|<g:gtin></g:gtin>|'
Result:
<g:gtin></g:gtin>
It depends on the assumption that the start and end tags are on the same line, and that no more than one such element appears on that line.
Since XML files are often generated the same way, over and over again, the assumption might hold.
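To see what happens when the second assumption fails, here is a small sketch. The two-element input line is made up for illustration, and the [^<]* variant is an alternative to the command above, not part of the original answer.

```shell
# Hypothetical input: two g:gtin elements on the same line.
line='<g:gtin>111</g:gtin><g:gtin>222</g:gtin>'

# The greedy .* runs from the first opening tag to the last closing tag,
# so both elements collapse into a single empty one.
greedy=$(printf '%s\n' "$line" | sed 's|<g:gtin>.*</g:gtin>|<g:gtin></g:gtin>|')

# [^<]* cannot cross a tag boundary, and the g flag repeats the
# substitution for every match on the line.
bounded=$(printf '%s\n' "$line" | sed 's|<g:gtin>[^<]*</g:gtin>|<g:gtin></g:gtin>|g')

printf '%s\n' "$greedy"
printf '%s\n' "$bounded"
```

The bounded form still assumes that the element content contains no nested markup and that each element sits on one line.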
Author by
crmpicco
Updated on September 18, 2022
Comments
-
crmpicco over 1 year
I have an XML file, and I would like to replace everything between the opening and closing tags of every g:gtin node with nothing.
Is this possible from the command line, using sed or something similar?
<g:gtin>31806831001</g:gtin>
-
user unknown about 12 years
Is the whole tag always on a single line? Is there always only one such tag per line? For XML, which often spans multiple lines, xmlstarlet is often a better alternative.
-
crmpicco about 12 years
Don't I have to pass the filename to sed, though? Say I'm in a directory called /tests and I have a file called feed.xml. Surely I need to say: replace all instances of
<g:gtin>(.*?)</g:gtin>
with
<g:gtin></g:gtin>
or something to that effect?
-
user unknown about 12 years
Yes. You either pipe the output of a command through sed, or specify a file to work on.
sed 'sedcommands' feed.xml
would be your option. If you have multiple sed commands, you can put them into a file too:
sed -f commands.sed feed.xml
There are many options. An -i flag:
sed -i 'commands' feed.xml
would change your file in place.
-
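The file-based invocations described in this comment could look like the sketch below. The scratch directory, the workdir variable and the second GTIN value are made up for illustration. The -i flag is not part of POSIX sed; the -i.bak spelling used here is accepted by both GNU and BSD sed, as far as I know, and keeps a backup of the original.

```shell
# Work in a scratch directory so no real feed is overwritten.
workdir=$(mktemp -d)
cat > "$workdir/feed.xml" <<'EOF'
<item>
<g:gtin>31806831001</g:gtin>
<g:gtin>31806831002</g:gtin>
</item>
EOF

# Pass the file as an argument: the result goes to stdout and
# feed.xml itself is left unchanged.
preview=$(sed 's|<g:gtin>[^<]*</g:gtin>|<g:gtin></g:gtin>|g' "$workdir/feed.xml")

# Edit in place, keeping the original as feed.xml.bak.
sed -i.bak 's|<g:gtin>[^<]*</g:gtin>|<g:gtin></g:gtin>|g' "$workdir/feed.xml"

cat "$workdir/feed.xml"
```

Running the command without -i first, as in the preview step, is a cheap way to check the result before the file is modified.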
user unknown about 12 years
@crmpicco: The question mark after .* is superfluous. .* already matches zero or more characters; making that optional is meaningless.
-
jw013 about 12 years
It could be the case that @crmpicco is attempting to use lazy quantifiers, which are not available in the portable POSIX sed specification. sed and regex are not the correct or most robust way to handle XML anyway.
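What .*? actually does in sed's default (basic) regular expressions can be checked with a short sketch. There, ? is an ordinary character rather than a quantifier, so the pattern only matches when a literal question mark precedes the closing tag. The second input line is made up for illustration.

```shell
plain='<g:gtin>31806831001</g:gtin>'
with_qmark='<g:gtin>318?</g:gtin>'   # hypothetical content ending in a literal ?

# No literal ? before the closing tag, so nothing matches and the
# line passes through unchanged.
unchanged=$(printf '%s\n' "$plain" | sed 's|<g:gtin>.*?</g:gtin>|<g:gtin></g:gtin>|')

# Here .* matches "318" and ? matches the literal question mark,
# so the substitution applies.
emptied=$(printf '%s\n' "$with_qmark" | sed 's|<g:gtin>.*?</g:gtin>|<g:gtin></g:gtin>|')

printf '%s\n' "$unchanged"
printf '%s\n' "$emptied"
```

So a pattern copied from a Perl-style regex would silently do nothing on the feed from the question.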