When I encode/decode SMS PDU (GSM 7 Bit) user data, do I need to prepend the UDH first?


Solution 1

No, you don't include the UDH when encoding the text. However, the GSM phase 2 specification (page 57) states: "If 7 bit data is used and the TP-UD-Header does not finish on a septet boundary then fill bits are inserted after the last Information Element Data octet so that there is an integral number of septets for the entire TP-UD header". When a UDH is present it will often not end on a septet boundary, so all you need to do is calculate the offset (= number of fill bits).

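Calculating the offset (this code assumes that UDHPart is an AnsiString holding the UDH as hex digits, two characters per octet, which is why its length is halved):

Len := Length(UDHPart) shr 1;  // number of UDH octets
Offset := (7 - ((Len * 8) mod 7)) mod 7;  // fill bits; 0 when the UDH already ends on a septet boundary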
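
As a quick check of the formula, here is a minimal sketch that prints the fill bits for a few UDH lengths. FillBits is a hypothetical helper name, and the outer mod 7 is an addition that maps the already aligned case (for example a 7-octet UDH, 56 bits) to 0 fill bits instead of 7:

```pascal
program FillBitsDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Number of fill bits needed after a UDH of UdhLen octets (UDHL octet
// included) so that the text starts on a septet boundary.
function FillBits(UdhLen: Integer): Integer;
begin
  Result := (7 - ((UdhLen * 8) mod 7)) mod 7;
end;

var
  Len: Integer;
begin
  for Len := 5 to 8 do
    Writeln('UDH octets: ', Len, '  fill bits: ', FillBits(Len));
end.
```

For the 6-octet concatenation UDH 050003000302 this gives 1 fill bit: 48 header bits plus 1 fill bit is 49 bits, exactly 7 septets.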

Now, when encoding the 7-bit data, you proceed as normal, but at the end you shift the packed data Offset bits to the left. This code has the encoded data in the variable Result (an AnsiString):

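 // fill bits
 if Offset > 0 then
  begin
   v := Result;
   Len := Length(v);
   // SeptetCount (not in the original snippet) is the number of septets that
   // were packed into v; Len counts octets, so Len * 7 would undercount the bits
   BytesRemain := ceil(((SeptetCount * 7)+Offset) / 8);
   Result := StringOfChar(#0, BytesRemain);
   for InPos := 1 to BytesRemain do
    begin
     b := 0;  // b: Integer, the shifted octet being assembled
     // v can be one octet shorter than Result, so guard the read
     if InPos <= Len then
      b := Byte(v[InPos]) shl Offset;
     if InPos > 1 then
      b := b or (Byte(v[InPos-1]) shr (8 - Offset));
     Byte(Result[InPos]) := Byte(b);
    end;
  end;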
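
To see the effect of the fill bits end to end, here is a minimal sketch that packs "hello world" with 1 fill bit. It places bits one at a time rather than packing first and shifting afterwards, which is a different technique from the snippet above but produces the same layout. PackSeptets and ToHex are hypothetical helper names, and the input is assumed to consist only of characters whose ASCII and GSM 7-bit codes are identical (such as a-z and space):

```pascal
program PackWithFillBits;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Packs one septet per input character into octets, leaving FillBits zero
// bits in front of the first septet. Assumes each character of Septets is
// already a GSM 7-bit code.
function PackSeptets(const Septets: AnsiString; FillBits: Integer): AnsiString;
var
  i, b, BitPos, ByteIdx: Integer;
begin
  // all-zero buffer large enough for the fill bits plus 7 bits per septet
  Result := StringOfChar(AnsiChar(#0), (FillBits + Length(Septets) * 7 + 7) div 8);
  for i := 1 to Length(Septets) do
    for b := 0 to 6 do
      if ((Ord(Septets[i]) shr b) and 1) = 1 then
      begin
        BitPos := FillBits + (i - 1) * 7 + b;  // absolute bit position, LSB first
        ByteIdx := BitPos div 8 + 1;           // strings are 1-based
        Result[ByteIdx] := AnsiChar(Ord(Result[ByteIdx]) or (1 shl (BitPos mod 8)));
      end;
end;

// Hex dump of the packed octets
function ToHex(const S: AnsiString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
    Result := Result + IntToHex(Ord(S[i]), 2);
end;

begin
  // the UDH '050003000302' is 6 octets, which needs 1 fill bit
  Writeln(ToHex(PackSeptets('hello world', 1)));  // prints D06536FB0DBABFE56C32
end.
```

The result matches the user data from the Wikipedia example quoted in the question.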

Decoding is the same thing in reverse: you first shift the 7-bit data Offset bits to the right before decoding.
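
A minimal decoding sketch under the same assumptions (plain characters only, hypothetical helper names UnpackSeptets and FromHex). Instead of shifting the whole buffer right, it skips the fill bits while reading, which has the same effect. The septet count is passed in explicitly; in a real PDU it would come from TP-UDL minus the septets occupied by the UDH and its fill bits:

```pascal
program UnpackWithFillBits;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Reads SeptetCount septets from Data, skipping FillBits bits at the start.
function UnpackSeptets(const Data: AnsiString;
  FillBits, SeptetCount: Integer): AnsiString;
var
  i, b, BitPos, Value: Integer;
begin
  SetLength(Result, SeptetCount);
  for i := 1 to SeptetCount do
  begin
    Value := 0;
    for b := 0 to 6 do
    begin
      BitPos := FillBits + (i - 1) * 7 + b;  // absolute bit position, LSB first
      if ((Ord(Data[BitPos div 8 + 1]) shr (BitPos mod 8)) and 1) = 1 then
        Value := Value or (1 shl b);
    end;
    Result[i] := AnsiChar(Value);
  end;
end;

// Converts a hex string (two digits per octet) to raw octets
function FromHex(const Hex: string): AnsiString;
var
  i: Integer;
begin
  SetLength(Result, Length(Hex) div 2);
  for i := 1 to Length(Result) do
    Result[i] := AnsiChar(StrToInt('$' + Copy(Hex, 2 * i - 1, 2)));
end;

begin
  // 1 fill bit (6-octet UDH), 11 septets of text
  Writeln(UnpackSeptets(FromHex('D06536FB0DBABFE56C32'), 1, 11));  // prints hello world
end.
```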

I hope this will set you onto the right track...

Solution 2

In your case the data is D06536FB0DBABFE56C32.

The UDH is 6 octets (48 bits), so there is 1 fill bit. The first octet, D0 (11010000), therefore holds the fill bit in its lowest bit and the first character in its upper 7 bits: 1101000 => h.

The rest is 6536FB0DBABFE56C32.

In binary:

(01100101)0011011011111011000011011011101010111111111001010110110000110010

Reverse the order of the octets, so that the first octet ends up on the right. Each group of 7 bits, read from the right, is then one character:

001100100110110011100101101111111011101000001101111 1101100 110110(0 1100101)

Reading the 7-bit groups from right to left and writing them out left to right for readability gives:

(1100101)(1101100)(1101100)(1101111)(0100000)(1110111)(1101111)(1110010)(1101100)(1100100)00

And the string is "ello world".

Combined with the first character you get "hello world".

Author by

Doug

Updated on July 10, 2022

Comments

  • Doug
    Doug almost 2 years

    While I can successfully encode and decode the user data part of an SMS message when a UDH is not present, I'm having trouble doing so when a UDH is present (in this case, for concatenated SMS).

    When I decode or encode the user data, do I need to prepend the UDH to the text before doing so?

    This article provides an encoding routine sample that compensates for the UDH with padding bits (which I still don't completely understand) but it doesn't give an example of data being passed to the routine so I don't have a clear use case (and I could not find a decoding sample on the site): http://mobiletidings.com/2009/07/06/how-to-pack-gsm7-into-septets/.

    So far, I have been able to get some results if I prepend the UDH to the user data before decoding it, but I suspect this is just a coincidence.

    As an example (using values from https://en.wikipedia.org/wiki/Concatenated_SMS):

    UDH := '050003000302';
    ENCODED_USER_DATA_PART := 'D06536FB0DBABFE56C32'; // with padding, evidently
    DecodedUserData := Decode7Bit(UDH + ENCODED_USER_DATA_PART);
    Writeln(DecodedUserData);
    

    Output: "ß@ø¿Æ @hello world"

    EncodedUserData := Encode7Bit(DecodedUserData);
    DecodedUserData := Decode7Bit(EncodedUserData);
    Writeln(DecodedUserData);
    

    Same Output: "ß@ø¿Æ @hello world"

    Without prepending the UDH I get garbage:

    DecodedUserData := Decode7Bit(ENCODED_USER_DATA_PART);
    Writeln(DecodedUserData);
    

    Output: "PKYY§An§eYI"

    What is the correct way of handling this?

    Am I supposed to include the UDH with the text when encoding the user data?

    Am I supposed to strip off the garbage characters after decoding, or am I (as I suspect) completely off base with this assumption?

    While the decoding algorithm here seems to work without a UDH it doesn't seem to take any UDH information into account: Looking for GSM 7bit encode/decode algorithm.

    I would be eternally grateful if someone could set me straight on the correct way to proceed. Any clear examples/code samples would be very much appreciated. ;-)

    I will also provide a small sample application that includes the algorithms if anyone feels it will help solve the riddle.

    EDIT 1:

    I'm using Delphi XE2 Update 4 Hotfix 1

    EDIT 2:

    Thanks to help from @whosrdaddy, I was able to successfully get my encoding/decoding routines to work.

    As a side note, I was curious as to why the user data needed to be on a 7-bit boundary when the UDH wasn't encoded with it, but the last sentence in the paragraph from the ETSI specification quoted by @whosrdaddy answered that:

    If 7 bit data is used and the TP-UD-Header does not finish on a septet boundary then fill bits are inserted after the last Information Element Data octet so that there is an integral number of septets for the entire TP-UD header. This is to ensure that the SM itself starts on an octet boundary so that an earlier phase mobile will be capable of displaying the SM itself although the TP-UD Header in the TP-UD field may not be understood

    My code is based in part on examples from the following resources:

    Looking for GSM 7bit encode/decode algorithm

    https://en.wikipedia.org/wiki/Concatenated_SMS

    http://mobiletidings.com/2009/02/18/combining-sms-messages/

    http://mobiletidings.com/2009/07/06/how-to-pack-gsm7-into-septets/

    http://mobileforensics.files.wordpress.com/2007/06/understanding_sms.pdf

    http://www.dreamfabric.com/sms/

    http://www.mediaburst.co.uk/blog/concatenated-sms/

    Here's the code for anyone else who's had trouble with SMS encoding/decoding. I'm sure it can be simplified/optimized (and comments are welcome), but I've tested it with several different permutations and UDH header lengths with success. I hope it helps.

    unit SmsUtils;
    
    interface
    
    uses Windows, Classes, Math;
    
    function Encode7Bit(const AText: string; AUdhLen: Byte;
      out ATextLen: Byte): string;
    
    function Decode7Bit(const APduData: string; AUdhLen: Integer): string;
    
    implementation
    
    var
      g7BitToAsciiTable: array [0 .. 127] of Byte;
      gAsciiTo7BitTable: array [0 .. 255] of Byte;
    
    procedure InitializeTables;
    var
      AsciiValue: Integer;
      i: Integer;
    begin
      // create 7-bit to ascii table
      g7BitToAsciiTable[0] := 64; // @
      g7BitToAsciiTable[1] := 163;
      g7BitToAsciiTable[2] := 36;
      g7BitToAsciiTable[3] := 165;
      g7BitToAsciiTable[4] := 232;
      g7BitToAsciiTable[5] := 233; // é (223, ß, belongs to index 30)
      g7BitToAsciiTable[6] := 249;
      g7BitToAsciiTable[7] := 236;
      g7BitToAsciiTable[8] := 242;
      g7BitToAsciiTable[9] := 199;
      g7BitToAsciiTable[10] := 10;
      g7BitToAsciiTable[11] := 216;
      g7BitToAsciiTable[12] := 248;
      g7BitToAsciiTable[13] := 13;
      g7BitToAsciiTable[14] := 197;
      g7BitToAsciiTable[15] := 229;
      g7BitToAsciiTable[16] := 0;
      g7BitToAsciiTable[17] := 95;
      g7BitToAsciiTable[18] := 0;
      g7BitToAsciiTable[19] := 0;
      g7BitToAsciiTable[20] := 0;
      g7BitToAsciiTable[21] := 0;
      g7BitToAsciiTable[22] := 0;
      g7BitToAsciiTable[23] := 0;
      g7BitToAsciiTable[24] := 0;
      g7BitToAsciiTable[25] := 0;
      g7BitToAsciiTable[26] := 0;
      g7BitToAsciiTable[27] := 0;
      g7BitToAsciiTable[28] := 198;
      g7BitToAsciiTable[29] := 230;
      g7BitToAsciiTable[30] := 223;
      g7BitToAsciiTable[31] := 201;
      g7BitToAsciiTable[32] := 32;
      g7BitToAsciiTable[33] := 33;
      g7BitToAsciiTable[34] := 34;
      g7BitToAsciiTable[35] := 35;
      g7BitToAsciiTable[36] := 164;
      g7BitToAsciiTable[37] := 37;
      g7BitToAsciiTable[38] := 38;
      g7BitToAsciiTable[39] := 39;
      g7BitToAsciiTable[40] := 40;
      g7BitToAsciiTable[41] := 41;
      g7BitToAsciiTable[42] := 42;
      g7BitToAsciiTable[43] := 43;
      g7BitToAsciiTable[44] := 44;
      g7BitToAsciiTable[45] := 45;
      g7BitToAsciiTable[46] := 46;
      g7BitToAsciiTable[47] := 47;
      g7BitToAsciiTable[48] := 48;
      g7BitToAsciiTable[49] := 49;
      g7BitToAsciiTable[50] := 50;
      g7BitToAsciiTable[51] := 51;
      g7BitToAsciiTable[52] := 52;
      g7BitToAsciiTable[53] := 53;
      g7BitToAsciiTable[54] := 54;
      g7BitToAsciiTable[55] := 55;
      g7BitToAsciiTable[56] := 56;
      g7BitToAsciiTable[57] := 57;
      g7BitToAsciiTable[58] := 58;
      g7BitToAsciiTable[59] := 59;
      g7BitToAsciiTable[60] := 60;
      g7BitToAsciiTable[61] := 61;
      g7BitToAsciiTable[62] := 62;
      g7BitToAsciiTable[63] := 63;
      g7BitToAsciiTable[64] := 161;
      g7BitToAsciiTable[65] := 65;
      g7BitToAsciiTable[66] := 66;
      g7BitToAsciiTable[67] := 67;
      g7BitToAsciiTable[68] := 68;
      g7BitToAsciiTable[69] := 69;
      g7BitToAsciiTable[70] := 70;
      g7BitToAsciiTable[71] := 71;
      g7BitToAsciiTable[72] := 72;
      g7BitToAsciiTable[73] := 73;
      g7BitToAsciiTable[74] := 74;
      g7BitToAsciiTable[75] := 75;
      g7BitToAsciiTable[76] := 76;
      g7BitToAsciiTable[77] := 77;
      g7BitToAsciiTable[78] := 78;
      g7BitToAsciiTable[79] := 79;
      g7BitToAsciiTable[80] := 80;
      g7BitToAsciiTable[81] := 81;
      g7BitToAsciiTable[82] := 82;
      g7BitToAsciiTable[83] := 83;
      g7BitToAsciiTable[84] := 84;
      g7BitToAsciiTable[85] := 85;
      g7BitToAsciiTable[86] := 86;
      g7BitToAsciiTable[87] := 87;
      g7BitToAsciiTable[88] := 88;
      g7BitToAsciiTable[89] := 89;
      g7BitToAsciiTable[90] := 90;
      g7BitToAsciiTable[91] := 196;
      g7BitToAsciiTable[92] := 214; // Ö
      g7BitToAsciiTable[93] := 209;
      g7BitToAsciiTable[94] := 220;
      g7BitToAsciiTable[95] := 167;
      g7BitToAsciiTable[96] := 191;
      g7BitToAsciiTable[97] := 97;
      g7BitToAsciiTable[98] := 98;
      g7BitToAsciiTable[99] := 99;
      g7BitToAsciiTable[100] := 100;
      g7BitToAsciiTable[101] := 101;
      g7BitToAsciiTable[102] := 102;
      g7BitToAsciiTable[103] := 103;
      g7BitToAsciiTable[104] := 104;
      g7BitToAsciiTable[105] := 105;
      g7BitToAsciiTable[106] := 106;
      g7BitToAsciiTable[107] := 107;
      g7BitToAsciiTable[108] := 108;
      g7BitToAsciiTable[109] := 109;
      g7BitToAsciiTable[110] := 110;
      g7BitToAsciiTable[111] := 111;
      g7BitToAsciiTable[112] := 112;
      g7BitToAsciiTable[113] := 113;
      g7BitToAsciiTable[114] := 114;
      g7BitToAsciiTable[115] := 115;
      g7BitToAsciiTable[116] := 116;
      g7BitToAsciiTable[117] := 117;
      g7BitToAsciiTable[118] := 118;
      g7BitToAsciiTable[119] := 119;
      g7BitToAsciiTable[120] := 120;
      g7BitToAsciiTable[121] := 121;
      g7BitToAsciiTable[122] := 122;
      g7BitToAsciiTable[123] := 228;
      g7BitToAsciiTable[124] := 246;
      g7BitToAsciiTable[125] := 241;
      g7BitToAsciiTable[126] := 252;
      g7BitToAsciiTable[127] := 224;
    
      // create ascii to 7-bit table
      ZeroMemory(@gAsciiTo7BitTable, SizeOf(gAsciiTo7BitTable));
      for i := 0 to High(g7BitToAsciiTable) do
      begin
        AsciiValue := g7BitToAsciiTable[i];
        gAsciiTo7BitTable[AsciiValue] := i;
      end;
    end;
    
    function ConvertAsciiTo7Bit(const AText: string; AUdhLen: Byte): AnsiString;
    const
      ESC = #27;
      // #12 (form feed) is included so that the matching case branch below is reachable
      ESCAPED_ASCII_CODES = [#12, #94, #123, #125, #92, #91, #126, #93, #124, #164];
    var
      Septet: Byte;
      Ch: AnsiChar;
      i: Integer;
    begin
      Result := ''; // a string function result is not guaranteed to start out empty
      for i := 1 to Length(AText) do
      begin
        Ch := AnsiChar(AText[i]);
        if not(Ch in ESCAPED_ASCII_CODES) then
          Septet := gAsciiTo7BitTable[Byte(Ch)]
        else
        begin
          Result := Result + ESC;
          case (Ch) of
            #12: Septet := 10;
            #94: Septet := 20;
            #123: Septet := 40;
            #125: Septet := 41;
            #92: Septet := 47;
            #91: Septet := 60;
            #126: Septet := 61;
            #93: Septet := 62;
            #124: Septet := 64;
            #164: Septet := 101;
          else Septet := 0;
          end;
        end;
        Result := Result + AnsiChar(Septet);
      end;
    end;
    
    function Convert7BitToAscii(const AText: AnsiString): string;
    const
      ESC = #27;
    var
      TextLen: Integer;
      Ch: Char;
      i: Integer;
    begin
      Result := '';
      TextLen := Length(AText);
      i := 1;
      while (i <= TextLen) do
      begin
        Ch := Char(AText[i]);
        if (Ch <> ESC) then
          Result := Result + Char(g7BitToAsciiTable[Ord(Ch)])
        else
        begin
          Inc(i); // skip ESC
          if (i <= TextLen) then
          begin
            Ch := Char(AText[i]);
            case (Ch) of
              #10: Ch := #12;
              #20: Ch := #94;
              #40: Ch := #123;
              #41: Ch := #125;
              #47: Ch := #92;
              #60: Ch := #91;
              #61: Ch := #126;
              #62: Ch := #93;
              #64: Ch := #124;
              #101: Ch := #164;
            end;
            Result := Result + Ch;
          end;
        end;
        Inc(i);
      end;
    end;
    
    function StrToHex(const AText: AnsiString): AnsiString; overload;
    var
      TextLen: Integer;
    begin
      // set the text buffer size
      TextLen := Length(AText);
      // set the length of the result to double the string length
      SetLength(Result, TextLen * 2);
      // convert the string to hex
      BinToHex(PAnsiChar(AText), PAnsiChar(Result), TextLen);
    end;
    
    function StrToHex(const AText: string): string; overload;
    begin
      Result := string(StrToHex(AnsiString(AText)));
    end;
    
    function HexToStr(const AText: AnsiString): AnsiString; overload;
    var
      ResultLen: Integer;
    begin
      // set the length of the result to half the Text length
      ResultLen := Length(AText) div 2;
      SetLength(Result, ResultLen);
      // convert the hex back into a string
      if (HexToBin(PAnsiChar(AText), PAnsiChar(Result), ResultLen) <> ResultLen) then
        Result := 'Error Converting Hex To String: ' + AText;
    end;
    
    function HexToStr(const AText: string): string; overload;
    begin
      Result := string(HexToStr(AnsiString(AText)));
    end;
    
    function Encode7Bit(const AText: string; AUdhLen: Byte;
      out ATextLen: Byte): string;
    // AText: Ascii text
    // AUdhLen: Length of UDH including UDH Len byte (e.g. '050003CC0101' = 6 bytes)
    // ATextLen: returns length of text that was encoded.  This can be different
    // than Length(AText) due to escape characters
    // Returns text as encoded PDU hex string
    var
      Text7Bit: AnsiString;
      Pdu: AnsiString;
      PduIdx: Integer;
      PduLen: Byte;
      PaddingBits: Byte;
      BitsToMove: Byte;
      Septet: Byte;
      Octet: Byte;
      PrevOctet: Byte;
      ShiftedOctet: Byte;
      i: Integer;
    begin
      Result := '';
      Text7Bit := ConvertAsciiTo7Bit(AText, AUdhLen);
      ATextLen := Length(Text7Bit);
      BitsToMove := 0;
      // determine how many padding bits needed based on the UDH
      if (AUdhLen > 0) then
        PaddingBits := (7 - ((AUdhLen * 8) mod 7)) mod 7 // 0 if the UDH is already septet aligned
      else
        PaddingBits := 0;
      // calculate the number of bytes needed to store the 7-bit text
      // along with any padding bits that are required
      PduLen := Ceil(((ATextLen * 7) + PaddingBits) / 8);
      // reserve space for the PDU bytes
      Pdu := AnsiString(StringOfChar(#0, PduLen));
      PduIdx := 1;
      for i := 1 to ATextLen do
      begin
        if (BitsToMove = 7) then
          BitsToMove := 0
        else
        begin
          // convert the current character to a septet (7-bits) and make room for
          // the bits from the next one
          Septet := (Byte(Text7Bit[i]) shr BitsToMove);
          if (i = ATextLen) then
            Octet := Septet
          else
          begin
            // convert the next character to a septet and copy the bits from it
            // to the octet (PDU byte)
            Octet := Septet or
              Byte((Byte(Text7Bit[i + 1]) shl Byte(7 - BitsToMove)));
          end;
          Byte(Pdu[PduIdx]) := Octet;
          Inc(PduIdx);
          Inc(BitsToMove);
        end;
      end;
      // The following code pads the pdu on the *right* by shifting it to the *left*
      // by <PaddingBits>. It does this by using the same bit storage convention as
      // the 7-bit compression routine above, by taking the most significant
      // <PaddingBits> from each PDU byte and moving them to the least significant
      // bits of the next PDU byte. If there is no room in the last PDU byte for the
      // high bits of the previous byte that were removed, then those bits are
      // placed into an additional byte reserved for this purpose.
      // Note: <PduLen> has already been set to account for the reserved byte if
      // it is required.
      if (PaddingBits > 0) then
      begin
        SetLength(Result, (PduLen * 2));
        PrevOctet := 0;
        for PduIdx := 1 to PduLen do
        begin
          Octet := Byte(Pdu[PduIdx]);
          if (PduIdx = 1) then
            ShiftedOctet := Byte(Octet shl PaddingBits)
          else
            ShiftedOctet := Byte(Octet shl PaddingBits) or
              Byte(PrevOctet shr (8 - PaddingBits));
          Byte(Pdu[PduIdx]) := ShiftedOctet;
          PrevOctet := Octet;
        end;
      end;
      Result := string(StrToHex(Pdu));
    end;
    
    function Decode7Bit(const APduData: string; AUdhLen: Integer): string;
    // APduData: Hex string representation of PDU data
    // AUdhLen: Length of UDH including UDH Len (e.g. '050003CC0101' = 6 bytes)
    // Returns decoded Ascii text
    var
      Pdu: AnsiString;
      NumSeptets: Byte;
      Septets: AnsiString;
      PduIdx: Integer;
      PduLen: Integer;
      by: Byte;
      currBy: Byte;
      left: Byte;
      mask: Byte;
      nextBy: Byte;
      Octet: Byte;
      NextOctet: Byte;
      PaddingBits: Byte;
      ShiftedOctet: Byte;
      i: Integer;
    begin
      Result := '';
      PaddingBits := 0;
      // convert hex string to bytes
      Pdu := AnsiString(HexToStr(APduData));
      PduLen := Length(Pdu);
      // The following code removes the padding bits, which sit in the least
      // significant bits of the first PDU byte, by shifting the whole PDU
      // *right* by <PaddingBits>. It does this by taking the least significant
      // <PaddingBits> from the following PDU byte and moving them to the most
      // significant bits of the current PDU byte.
      if (AUdhLen > 0) then
      begin
        PaddingBits := (7 - ((AUdhLen * 8) mod 7)) mod 7;
        for PduIdx := 1 to PduLen do
        begin
          Octet := Byte(Pdu[PduIdx]);
          if (PduIdx = PduLen) then
            ShiftedOctet := Byte(Octet shr PaddingBits)
          else
          begin
            NextOctet := Byte(Pdu[PduIdx + 1]);
            ShiftedOctet := Byte(Octet shr PaddingBits) or
              Byte(NextOctet shl (8 - PaddingBits));
          end;
          Byte(Pdu[PduIdx]) := ShiftedOctet;
        end;
      end;
      // decode
      // number of septets in PDU after excluding the padding bits
      NumSeptets := ((PduLen * 8) - PaddingBits) div 7;
      Septets := AnsiString(StringOfChar(#0, NumSeptets));
      left := 7;
      mask := $7F;
      nextBy := 0;
      PduIdx := 1;
      for i := 1 to NumSeptets do
      begin
        if mask = 0 then
        begin
          Septets[i] := AnsiChar(nextBy);
          left := 7;
          mask := $7F;
          nextBy := 0;
        end
        else
        begin
          if (PduIdx > PduLen) then
            Break;
          by := Byte(Pdu[PduIdx]);
          Inc(PduIdx);
          currBy := ((by AND mask) SHL (7 - left)) OR nextBy;
          nextBy := (by AND (NOT mask)) SHR left;
          Septets[i] := AnsiChar(currBy);
          mask := mask SHR 1;
          left := left - 1;
        end;
      end; // for
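      // remove last character if unused
      // this is a workaround: the exact septet count is carried in the TP-UDL
      // field of the PDU (which in 7-bit mode counts septets, including those
      // taken up by the UDH and fill bits), and that value is not passed in here.
      if (NumSeptets > 0) and (Septets[NumSeptets] = #0) then
        SetLength(Septets, NumSeptets - 1);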
      // convert 7-bit alphabet to ascii
      Result := Convert7BitToAscii(Septets);
    end;
    
    initialization
      InitializeTables;
    end.
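
A usage sketch for the unit above, assuming it is saved as SmsUtils.pas next to this program. It reuses the 6-octet UDH 050003000302 from the question; only the UDH length is passed to the routines, not the UDH itself. The program name and variable names are hypothetical:

```pascal
program SmsUtilsDemo;

{$APPTYPE CONSOLE}

uses
  SmsUtils; // the unit listed above (assumed file name: SmsUtils.pas)

var
  TextLen: Byte;
  Encoded: string;
begin
  // 6 = number of octets in the UDH '050003000302', UDHL octet included
  Encoded := Encode7Bit('hello world', 6, TextLen);
  Writeln(Encoded);                 // expected: D06536FB0DBABFE56C32
  Writeln(TextLen);                 // expected: 11 septets of text
  Writeln(Decode7Bit(Encoded, 6));  // expected: hello world
end.
```

When building the full PDU, the caller still has to place the UDH octets in front of the encoded text and set TP-UDL to the total septet count: here 7 septets for the UDH and its fill bit plus the 11 returned in TextLen, so 18.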