In bash, how to convert 8 bytes to an unsigned int (64bit LE)?

12,465

Solution 1

Bash is the wrong tool altogether. Shells are good at gluing bits and pieces together; text processing and arithmetic are provided on the side, and data processing isn't in their purview at all.

I'd go for Python over Perl, because Python has bignums right off the bat. Use struct.unpack to unpack the data.

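#!/usr/bin/env python3
import os, struct, sys
# 8192 unsigned 64-bit little-endian integers = 65536 bytes
fmt = "<" + "Q" * 8192
# Read raw bytes; stdin must be a seekable file (e.g. "< file"), not a pipe
stdin = sys.stdin.buffer
header_bytes = stdin.read(65536)
header_ints = list(struct.unpack(fmt, header_bytes))
stdin.seek(-65536, os.SEEK_END)
footer_bytes = stdin.read(65536)
footer_ints = list(struct.unpack(fmt, footer_bytes))
# your calculations here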

Here's my answer to the original question. The revised question doesn't have much to do with the original, which was about converting one 8-byte sequence into the 64-bit integer it represents in little-endian order.

I don't think bash has any built-in feature for this. The following snippet sets a to a string that is the hexadecimal representation of the number that corresponds to the bytes in the specified string in big endian order.

a=0x$(printf "%s" "$string" |
      od -t x1 -An |
      tr -dc '[:alnum:]')

For little-endian order, reverse the order of the bytes in the original string. In bash, and for a string of known length, you can do

a=0x$(printf "%s" "${string:7:1}${string:6:1}${string:5:1}${string:4:1}${string:3:1}${string:2:1}${string:1:1}${string:0:1}" |
      od -t x1 -An |
      tr -dc '[:alnum:]')
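
As a cross-check of the byte reversal above, here is a minimal Python sketch. The sample byte string `data` and the helper name `le_bytes_to_hex` are made up for illustration. It shows that reversing the bytes and reading them as big-endian hex gives the same number as decoding the original bytes in little-endian order.

```python
def le_bytes_to_hex(data: bytes) -> str:
    """Reverse the bytes and format them as hex, mirroring the od pipeline."""
    return "0x" + data[::-1].hex()

# Hypothetical sample input: the 8 ASCII bytes "12345678".
data = b"12345678"

as_hex = le_bytes_to_hex(data)
value = int.from_bytes(data, byteorder="little")

print(as_hex)
print(value == int(as_hex, 16))
```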

You can also get your platform's preferred endianness if your od supports 8-byte types.

a=0x$(printf "%s" "$string" |
      od -t x8 -An |
      tr -dc '[:alnum:]')

Whether you can do arithmetic on $a will depend on whether your bash supports 8-byte arithmetic. Even if it does, it'll treat it as a signed value.
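
To illustrate the signed/unsigned distinction, here is a small Python sketch. The sample bytes are an assumption chosen to make the difference visible. The same 8 bytes decode to different numbers depending on whether they are read as signed or unsigned, and a mask recovers the unsigned value from the signed one.

```python
import struct

# Hypothetical sample: eight 0xFF bytes, the largest unsigned 64-bit value.
data = b"\xff" * 8

(unsigned,) = struct.unpack("<Q", data)  # unsigned 64-bit, little-endian
(signed,) = struct.unpack("<q", data)    # signed 64-bit, little-endian

# Masking with 2**64 - 1 maps a signed result back to its unsigned value.
recovered = signed & 0xFFFFFFFFFFFFFFFF

print(unsigned, signed, recovered)
```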

Alternatively, use Perl:

a=0x$(perl -e 'printf "%x\n", unpack "Q<", $ARGV[0]' "$string")

If your perl is compiled without 64-bit integer support, you'll need to break the bytes up.

a=0x$(perl -e 'printf "%x%08x\n", reverse unpack "L<L<", $ARGV[0]' "$string")

(Replace < by > for big-endian or remove it to get the platform endianness.)
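
The idea of splitting the 8 bytes into two 32-bit words can be sketched in Python as well. The function name `u64_from_two_u32` and the sample input are made up for this example. In little-endian order the low-order word comes first, so the second word has to be shifted up by 32 bits.

```python
import struct

def u64_from_two_u32(data: bytes) -> int:
    """Decode 8 little-endian bytes as two 32-bit words and combine them."""
    low, high = struct.unpack("<LL", data)  # low-order word comes first
    return (high << 32) | low

data = b"12345678"  # hypothetical sample input
print("%x" % u64_from_two_u32(data))
```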

Solution 2

Gilles' Python method is definitely faster, but I thought I'd throw in this approach, using bash plus standard single-purpose tools, as general grist to the mill. It's probably as much about bc as anything else. It has a lot of initialization code, to cater for input files smaller than 64 KiB. The hash is initialized to the file's length, and then each of the 64-bit integers is added to it in turn, which causes the (expected) integer overflow; bc handles this by reducing the sum modulo 2^64.

# This script reads up to 8192 8-byte blocks (64 KiB) from the head and tail of a file.
# Each 8-byte block is interpreted as an unsigned 64-bit Little-Endian integer.
# The head and tail integers are added to a running hash (mod 2^64), printed to stdout in hex.
#
# INIT: If the file is smaller than 64k, calculate the number of unsigned ints to read 
# ====
  file="$1"
  flen=($(du -b "$file"))           # file length
  qlen=8                            # ui64 length in bytes
    ((flen<qlen)) && exit 1         # file is too short -- exit 
  bmax=$((64*1024))                 # byte end of read (== byte max to read)
    ((flen<bmax)) && ((bmax=flen))  # reduce byte max to file length
  qmax=$((bmax/qlen))               # ui64 end of read (== ui64 max to read)
    (((qmax*qlen)<bmax)) && ((bmax=(qmax*qlen))) # round down byte max (/8)
  hash=$(printf '%X' "$flen")       # initial hash: the file length, in hex (bc reads it with ibase=16)
# 
# MAIN
# ====
  for skip in 0  $((flen-bmax)) ;do
    hash=$(dd if="$file" bs=1 count=$bmax skip=$skip 2>/dev/null |
             xxd -p -u -c 8 |
             { echo -e " ibase=16 \n obase=10 \n scale=0 \n hash=$hash \n ouint=10000000000000000 "; \
               sed -re "s/(..)(..)(..)(..)(..)(..)(..)(..)/hash=(hash+\8\7\6\5\4\3\2\1)%ouint/"; \
               echo "hash"; } |bc)
  done
  echo $hash
#

# Output:
16A6528E803325FF
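
The computation described in this answer (start from the file length, then add every 64-bit little-endian word of the first and last 64 KiB, wrapping at 2^64) can also be sketched in Python. This is only an illustration of that description, not smplayer's actual code; the function name `head_tail_hash` and the constants `CHUNK` and `MASK` are invented here.

```python
import os
import struct

CHUNK = 64 * 1024   # bytes read from the head and from the tail
MASK = 2**64 - 1    # keeps the running sum within 64 bits

def head_tail_hash(path: str) -> int:
    """Add the file length and the uint64 LE words of the first and last 64 KiB."""
    size = os.path.getsize(path)
    nbytes = min(CHUNK, size) // 8 * 8   # round down to whole 8-byte words
    if nbytes == 0:
        raise ValueError("file is shorter than 8 bytes")
    fmt = "<%dQ" % (nbytes // 8)
    total = size
    with open(path, "rb") as f:
        for offset in (0, size - nbytes):
            f.seek(offset)
            for word in struct.unpack(fmt, f.read(nbytes)):
                total = (total + word) & MASK   # wrap around at 2**64
    return total
```

Something like `print("%X" % head_tail_hash(path))` would print the result in uppercase hex, the same form the bc pipeline emits.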
Peter.O
Author by

Peter.O

Updated on September 18, 2022

Comments

  • Peter.O
    Peter.O almost 2 years

    How can I 'read/interpret' 8 bytes as an unsigned int (Little Endian)?
    Perhaps there is a Bash-fu magic conversion for this?

    UPDATE:
    It seems that something got cross-wired in the interpretation of my question. Here is a broader example of what I am trying to do.

    I want to read the first (and last) 64k of a file. Each 8-byte word is to be interpreted as a 64-bit Little-Endian unsigned integer. These integers are to be used in a hashing computation which uniquely identifies the file. So there are a lot of calculations to make, ∴ speed is preferred, but not critical. (Why am I doing it? Because smplayer hashes the names of its played-media .ini files, and I want to access, and modify these files, so I am mimicking the smplayer's C++ code in Bash.)

    A solution which caters to accepting a piped input would be optimal, and is probably essential because of the way Bash variables can't handle \x00..

    I realize that something like this is probably better suited to the likes of Python, Perl, and C/C++, but I don't know Python and Perl, and although I could do it in C++, it's been years since I've used it and I'm trying to focus on Bash.

    Short Perl and Python snippets are good. Bash is preferred (but not at the sacrifice of speed).

    • Admin
      Admin about 13 years
      What do these bytes look like? 4 125 -19 0 can be seen as 4 bytes, "\t.-X" could be seen as 4 bytes (I chose 4 because it is shorter), but how about 319 and "ö§«¢"? Or is it the first 8 bytes in an arbitrary file?
    • Admin
      Admin about 13 years
      '4 125 -19 0' is not 4 bytes. It may be the decimal integer representation of 4 bytes, each interpreted by its binary integer value, but it utilizes 8 bytes (excluding spaces). '\t.-X' is a representation of 4 bytes in the ASCII/'C' style; it utilizes 5 bytes. ö§«¢ are not bytes; they are nominally Unicode characters, which utilize 8 bytes when encoded in UTF-8 or UTF-16, and 16 bytes when encoded in UTF-32. As often encountered in man pages, I am referring to 8-bit octets; just a plain ordinary 8-bit byte. I want to interpret 8 of them as a 64-bit unsigned Little-Endian int.
    • Admin
      Admin about 13 years
      No, '4 125 -19 0' is a valid, possible representation of 4 bytes. The pixels on the screen utilize much more than 8 bytes. "ö§«¢" are of course bytes, since all digital information can be expressed in bytes, just as every distance can be expressed in meters or in inches. My question is, how do you get the bytes, and how are they represented? An 8-bit octet would be 00101101, for example.
    • Admin
      Admin about 13 years
      My input bytes aren't "represented" .. I am speaking of a byte being a byte... nothing more, nothing less... ie. raw data ... I then wish to interpret 8 of those raw bytes as an unsigned long long, ie. a 64bit unsigned int... I won't be visually representing these integers at all.. I will use them in a hash calculation. ... I am interested in the binary values of each byte... Come to think of it, that is the "representation"... I want to work with the intrinsic base 2 value of 8 bits, ie. I want to not treat the byte in any special way.
    • Admin
      Admin about 13 years
      A byte is a byte and a rose is a rose, but bash has no type system, has it? Well - I know strings and integer numbers and booleans in bash, but there is no byte type, so I have to take something else - strings, numbers, booleans, arrays, files ... Can you just show how you get these bytes in your script? Or are they directly encoded (raw?) in the script?
  • Peter.O
    Peter.O about 13 years
    @Gilles: I'm still looking at your options (speed-wise; I'm processing 64k), but on a side note, I'm pretty sure that Little-Endian only has the lowest-order byte moved up into the high-order position... It isn't a full reversal for larger word sizes... it was a full reversal when words were 16 bits :)
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' about 13 years
    @fred: Ah, you didn't say you wanted speed. Perl is going to win hands down unless you massage that od output better. Which of the little-endian versions doesn't work? I get the output I expect.
  • Peter.O
    Peter.O about 13 years
    @Gilles: oops! sorry... You are right about the byte order; it is a full reversal... (I've "known" for a long time that it was only the low-order byte, and it's a good thing I've never needed to use my knowledge before today :) ... re perl, I get an error while using "12345678" as the string.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' about 13 years
    @fred: What error? What command did you run, what was the output, what did you expect it to be?
  • Peter.O
    Peter.O about 13 years
    @Gilles: The error message: Invalid type 'Q' in unpack at -e line 1. ... The command (using a string of 8 random bytes): perl -e 'print unpack "Q<", $ARGV[0]' "12345678" ... The output: only the error message ... perl version: v5.10.1
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' about 13 years
    @fred: Your perl doesn't support 64-bit integers. See my edit.
  • Peter.O
    Peter.O about 13 years
    @Gilles: that 32-bit version works; thanks... It gives me the hex representation of the integer (which is very useful in its own right). How can perl output the actual integer? ... but I think bash by default only handles signed ints ... maybe there is a shopt for int handling.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' about 13 years
    @fred: I don't understand. The output of a program is a string, not an integer, so all you'll get is some string representation of that integer.
  • Peter.O
    Peter.O about 13 years
    @Gilles: it's somewhat academic now that you have posted the python example, but here is a test using your original examples: paste.ubuntu.com/607289 ... Re the python example, it dumps the list of numbers about 5 times faster than the bash pipe dd|xxd|tr|sed|while..eval $((16#0001020304050607)), which as you mentioned is hamstrung by signed integer arithmetic... Thanks for the python... btw. I've noticed that <"Glider.flv" ztest.py works fine, but cat "Glider.flv" |ztest.py errors on the seek with this message, IOError: [Errno 29] Illegal seek .. Any idea why?
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' about 13 years
    @fred: My code calls seek to get the last N bytes, and you can't do that on a pipe: you have to keep a sliding window of at least N bytes and read from beginning to end. If you have trouble with that, ask on SO.
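
The sliding-window approach mentioned in the last comment could look roughly like the following Python sketch. It is one possible way to do it, not the code from the answer; the names `read_head_and_tail` and `unpack_u64_le` are invented for this example. It reads the stream once from beginning to end, keeps the first 64 KiB, and retains only the most recent 64 KiB as it goes.

```python
import struct

CHUNK = 64 * 1024  # bytes wanted from each end of the stream

def read_head_and_tail(stream, chunk=CHUNK):
    """Return (head, tail) byte strings from a non-seekable binary stream."""
    head = b""
    while len(head) < chunk:            # a pipe may return short reads
        block = stream.read(chunk - len(head))
        if not block:
            break
        head += block
    tail = head                         # short input: the tail overlaps the head
    while True:
        block = stream.read(chunk)
        if not block:
            break
        tail = (tail + block)[-chunk:]  # keep only the newest `chunk` bytes
    return head, tail

def unpack_u64_le(data):
    """Decode whole 8-byte groups as unsigned 64-bit little-endian integers."""
    count = len(data) // 8
    return list(struct.unpack("<%dQ" % count, data[:count * 8]))
```

With piped input, the stream to pass in would be `sys.stdin.buffer`, for example `head, tail = read_head_and_tail(sys.stdin.buffer)`.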