How to decode base64 text in xml file in Linux?
Solution 1
I'll say what I always do: please NEVER use regular expressions to parse XML. It's bad news. XML allows a lot of variation in formatting, so semantically identical XML may or may not match a given regular expression. Simple things like line wrapping, unary (self-closing) tags, and attribute quoting are enough to change the result.
This means you create brittle code, which one day might mysteriously break because of an upstream and perfectly valid change to your data flow.
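As a small illustration of that brittleness, here is a sketch in shell. The pattern and the helper name count_matches are made up for this example; the two input strings are the same element as far as an XML parser is concerned, differing only in the quote style around the attribute value.

```shell
#!/bin/sh
# Two spellings of the same element: XML accepts either quote style.
double_quoted='<value encoding="base64">aGVsbG8gd29ybGQ=</value>'
single_quoted="<value encoding='base64'>aGVsbG8gd29ybGQ=</value>"

# Hypothetical pattern of the kind a quick text-matching script might use.
pattern='encoding="base64">'

# Print how many input lines match the pattern.
# "|| true" keeps the exit status clean when grep finds nothing.
count_matches() {
    printf '%s\n' "$1" | grep -c -- "$pattern" || true
}

count_matches "$double_quoted"   # prints 1
count_matches "$single_quoted"   # prints 0
```

An XML parser treats both inputs identically, which is the reason for preferring one.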
For parsing your XML I would suggest using Perl and the quite excellent XML::Twig module.
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
use MIME::Base64;

# Take a "value" element and check it for encoding="base64". If it has one,
# rewrite the content with the decoded text and delete that attribute.
sub decode_value {
    my ( $twig, $value ) = @_;
    if (    $value->att('encoding')
        and $value->att('encoding') eq "base64" )
    {
        my $decoded_text = decode_base64( $value->text );

        # Anything other than whitespace, word characters and = - , .
        # causes the whole value to be replaced with the text "decoded".
        if ( $decoded_text =~ m/[^\s\d\w\=\-\,\.]/ ) {
            $decoded_text = "decoded";
        }
        $value->set_text($decoded_text);
        $value->del_att('encoding');
    }
}

# twig_handlers fires a piece of code each time the parser hits a 'value'
# element and passes it that chunk of XML to handle, which means you can
# do things like dynamic XML rewrites.
# pretty_print controls output XML rendering; there's a variety of options,
# check the manpage.
my $twig = XML::Twig->new(
    pretty_print  => "indented",
    twig_handlers => { 'value' => \&decode_value, }
);
$twig->parsefile('your_xml_file');
$twig->print;
This will give:
<directory-entries>
  <entry dn="ads">
    <attr name="memberof">
      <value>CN=VPN-employee</value>
      <value>hello world</value>
      <value>decoded</value>
      <value>decoded</value>
    </attr>
  </entry>
</directory-entries>
You could alternatively transform $decoded_text like this:
$decoded_text =~ s/[^\s\d\w=,-. ]+/_/g;
(The URI::Escape module is worth a look here too, as it 'percent encodes' text URL style.)
Which would give instead:
<value>CN=Floppy - _ _,OU=Device Control,OU=Groups,OU=_,DC=hq,DC=bc</value>
<value>CN=USB-_ - _ _,OU=Device Control,OU=Groups,OU=_,DC=hq,DC=bc</value>
But you might also find that using Net::LDAP directly does what you need.
#!/usr/bin/perl
use strict;
use warnings;
use Net::LDAP;

# new() returns undef on failure, so check before using the object.
my $ldap = Net::LDAP->new('host') or die "Cannot connect to LDAP server: $@";

# bind() takes the DN followed by named options, including password.
my $result = $ldap->bind(
    'CN=informatica,OU=Accounts for System Purposes,OU=System Accounts,DC=hq,DC=bc',
    password => 'password'
);
if ( $result->code ) { die "LDAP bind failed: " . $result->error; }

my $ldap_search = $ldap->search(
    base   => 'DC=hq,DC=bc',
    scope  => 'subtree',
    filter => '(&(objectClass=organizationalPerson)(CN=*))',
    attrs  => [ 'employeeID', 'memberOf' ],
);

foreach my $entry ( $ldap_search->entries ) {
    print "dn:\t", $entry->dn(), "\n";
    foreach my $attr ( $entry->attributes ) {
        print "$attr:";
        foreach my $value ( $entry->get_value($attr) ) {
            next unless defined $value;

            # Replace values containing unexpected characters with a placeholder.
            if ( $value =~ m/[^\s\d\w,-=+@\'.()]/ ) { $value = "binary_data" }
            chomp($value);
            print "\t$value\n";
        }
    }
}
Solution 2
Compact Script
Assuming the XML is in file.xml, just do:
sed -r 's/("base64">)([[:graph:]]+)/\1'"`grep -oP '"base64">\K[[:graph:]]+' file.xml | base64 -d`"'/g' file.xml
This is a compact regex, which will do the task. Let me break it down and explain.
Break Down
First I select the base64 string using grep and decode it:
grep -oP '"base64">\K[[:graph:]]+' file.xml | base64 -d
I could save this in a variable:
baseString=`grep -oP '"base64">\K[[:graph:]]+' file.xml | base64 -d`
Then use sed to replace the base64 with the decoded string saved in the variable:
sed -r 's/("base64">)([[:graph:]]+)/\1'"$baseString"'/g' file.xml
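Note that the sed command above inserts the same $baseString into every match, and a decoded string containing characters such as / or & is interpreted by sed rather than inserted literally. A minimal sketch of decoding each value on its own is shown below. It assumes every base64 value sits on a single line together with its closing tag, and the function name decode_values is made up for this example. It is still text matching rather than XML parsing, so the caveats from Solution 1 apply.

```shell
#!/bin/sh
# Sketch only: decode base64 values one at a time, for values that sit on ONE line.
# decode_values is a hypothetical helper; it reads XML on stdin, writes to stdout.
decode_values() {
    marker='encoding="base64">'
    while IFS= read -r line; do
        case $line in
            *"$marker"*'</value>'*)
                prefix=${line%%"$marker"*}      # text before the attribute
                rest=${line#*"$marker"}         # payload plus closing tag
                payload=${rest%%'</value>'*}    # the base64 text itself
                suffix=${rest#*'</value>'}      # anything after the closing tag
                # Strip stray whitespace, then decode this one value only.
                decoded=$(printf '%s' "$payload" | tr -d '[:space:]' | base64 -d)
                # Drop the encoding attribute and emit the decoded text.
                # Caveat: the decoded text is not XML-escaped here.
                printf '%s>%s</value>%s\n' "${prefix% }" "$decoded" "$suffix"
                ;;
            *)
                # Lines without a base64 value pass through unchanged.
                printf '%s\n' "$line"
                ;;
        esac
    done
}

# Small demonstration with two sample lines from the question.
printf '%s\n' \
    '<value>CN=VPN-employee</value>' \
    '<value encoding="base64">aGVsbG8gd29ybGQ=</value>' |
    decode_values
```

For a real file this would be run as decode_values < file.xml. Values wrapped over several lines are not handled, which is where a parser-based approach is the safer choice.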
Meruyert
Updated on September 18, 2022
Comments
-
Meruyert over 1 year
I'm new to Linux (shell). I need to decode base64 text in an XML file using a Linux shell script. Could you please help me write a shell script that decodes the values of those tags whose attribute is encoding="base64"? The structure of my file is:
<directory-entries>
<entry dn="ads">
<attr name="memberof">
<value>CN=VPN-employee</value>
<value encoding="base64">aGVsbG8gd29ybGQ=
</value>
<value encoding="base64">
Q049RmxvcHB5IC0g0LTQvtGB0YLRg9C/INC30LDQutGA0YvRgixPVT1EZXZpY2UgQ29udHJv
bCxPVT1Hcm91cHMsT1U90JHQkNCd0JosREM9aHEsREM9YmM=
</value>
<value encoding="base64">
Q049VVNCLdC00LjRgdC60LggLSDRgtC+0LvRjNC60L4g0YfRgtC10L3QuNC1LE9VPURldmlj
ZSBDb250cm9sLE9VPUdyb3VwcyxPVT3QkdCQ0J3QmixEQz1ocSxEQz1iYw==
</value>
</attr>
</entry>
</directory-entries>
The wanted output is
<directory-entries>
<entry dn="ads">
<attr name="memberof">
<value>CN=VPN-employee</value>
<value encoding="base64">Hello world
</value>
<value encoding="base64">
decoded
</value>
<value encoding="base64">
decoded
</value>
</attr>
</entry>
</directory-entries>
I'm generating XML from Active Directory using ldapsearch. The script that I used to obtain this file is:
ldapsearch -h host -p 389 -D "CN=informatica,OU=Accounts for System Purposes,OU=System Accounts,DC=hq,DC=bc" -w password -s sub -B -E UTF-8 -X "(&(objectClass=organizationalPerson)(CN=*))" employeeID memberof > ldap_logins.xml
I don't know if it is possible to decode the text while generating the xml file. Thank you in advance!
-
Stephen Kitt about 9 years
I don't have a complete answer, but a couple of hints. On the ldapsearch side, you can use the -t option to output "non-printable" text to temporary files rather than Base64-encoded values. If you want to parse XML, check out XMLStarlet. Also, does the output need to be valid XML? Shouldn't the "encoding" attribute be dropped from the output?
-
Meruyert about 9 years
Thank you for the feedback. Yes, the output should be valid XML. I need the decoded value; the attribute itself can be dropped from the output.
-
shivams about 9 years
@Meruyert I've provided a proper answer using an XML parser called xmlstarlet. Just check it, if it helps.
-
Meruyert about 9 years
Thank you for your answer! The script works for cases where values do not have line breaks. I have line breaks in values. I've updated the structure of the file in the question, added more examples. Do you have any ideas how to deal with those line breaks?
-
shivams about 9 years
Oh! Multi-line regex is very tricky using bash. For such cases, it is better advised to go for some proper xml parser. However, I will provide some solution using regex. Wait.
-
shivams about 9 years
Yes. Using an xml parser is always the only sane option. @Meruyert please use this solution (if it works fine), rather than going for my regex based solution.
-
shivams about 9 years
It is unclear which language you are using. @Sobrique.
-
Sobrique about 9 years
Wow, that's impressive on my part. Amended answer to indicate that I do mean perl here ;)
-
shivams about 9 years
Sorry for my ignorance. But I am really a new kid. Born in the era of Python, rather than Perl. Done a lot of bash but never touched Perl :/ Perhaps, I should be ashamed :|
-
Sobrique about 9 years
Hardly. Perl and Python have very similar use cases. I'm crusty enough to pre-date python, and learned perl back when it was really the only option for extending shell scripting. Still like it though, not least because it remains pretty similar to shell, and very widely supported.
-
elysch almost 6 years
I know this is old. I want to use the sed command, but it says "test" is not defined. Do you remember how it was defined?
-
shivams almost 6 years
@elysch: test is not a command here. I used it to denote the file name. I should have used file.xml instead. I am correcting it.
-
elysch almost 6 years
I tried that but I get an error: sed: -e expression #1, char 297: unknown option to `s'. Don't know how to find which value is causing problems.
-
elysch almost 6 years
Another question: How would it know how to "select" each base64 string in the right place? Testing the grep command on its own, it shows all the base64 strings, not just one.