When I encode/decode SMS PDU (GSM 7 Bit) user data, do I need to prepend the UDH first?
Solution 1
No, you don't include the UDH part when encoding. However, if you read the GSM Phase 2 specification (page 57), it mentions this: "If 7 bit data is used and the TP-UD-Header does not finish on a septet boundary then fill bits are inserted after the last Information Element Data octet so that there is an integral number of septets for the entire TP-UD header". When you include a UDH, it may not end on a septet boundary, so all you need to do is calculate the offset (= the number of fill bits).
To calculate the offset (this code assumes that UDHPart is an AnsiString holding the UDH as a hex string, including the UDH length byte, e.g. '050003000302'):
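Len := Length(UDHPart) shr 1; // UDH length in octets (two hex digits per octet)
Offset := (7 - ((Len * 8) mod 7)) mod 7; // fill bits (0 when the UDH already ends on a septet boundary)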
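The same arithmetic as a small console sketch, so the values can be checked (FillBitsForUdh and HeaderSeptets are illustrative names, not part of the answer). It assumes the UDH length includes the length byte, as in '050003000302' (6 octets). The second case is a 7-octet UDH, such as the concatenation header with a 16-bit reference number (IEI 08), which ends exactly on a septet boundary and therefore needs no fill bits.

```pascal
program FillBitsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Number of fill bits needed after a UDH of AUdhLen octets.
// Assumption: AUdhLen includes the UDH length byte ('050003000302' = 6).
function FillBitsForUdh(AUdhLen: Integer): Integer;
begin
  // The outer "mod 7" turns 7 into 0 when the header already ends
  // on a septet boundary (e.g. 7 octets = 56 bits = 8 septets).
  Result := (7 - (AUdhLen * 8) mod 7) mod 7;
end;

// Septets occupied by the UDH plus its fill bits; the text starts after them.
function HeaderSeptets(AUdhLen: Integer): Integer;
begin
  Result := (AUdhLen * 8 + FillBitsForUdh(AUdhLen)) div 7;
end;

var
  Len: Integer;
begin
  for Len := 6 to 7 do
    Writeln('UDH of ', Len, ' octets: ', FillBitsForUdh(Len),
      ' fill bit(s), header occupies ', HeaderSeptets(Len), ' septets');
end.
```

With a 6-octet UDH this gives 1 fill bit, and header plus fill bit occupy 7 septets (48 + 1 = 49 bits); with a 7-octet UDH it gives 0 fill bits and 8 septets.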
Now, when encoding the 7-bit data, you proceed as normal, but at the end you shift the data Offset bits to the left. In this code the packed data is in the variable Result (an AnsiString):
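// fill bits: shift the packed data Offset bits to the left
// (NumSeptets = number of septets that were packed into Result)
if Offset > 0 then
begin
  v := Result;
  Len := Length(v);
  BytesRemain := Ceil(((NumSeptets * 7) + Offset) / 8);
  Result := StringOfChar(#0, BytesRemain);
  for InPos := 1 to BytesRemain do
  begin
    // the shift can need one byte more than v holds; treat it as 0
    if InPos <= Len then
      Cur := Byte(v[InPos])
    else
      Cur := 0;
    if InPos = 1 then
      Byte(Result[InPos]) := Byte(Cur shl Offset)
    else
      Byte(Result[InPos]) := Byte(Cur shl Offset) or (Byte(v[InPos - 1]) shr (8 - Offset));
  end;
end;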
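To check the left shift against the example from the question, here is a self-contained sketch that packs "hello world" the normal way and then shifts it left by one fill bit (the 6-octet UDH case). PackSeptets, ShiftLeftBits and ToHex are hypothetical helpers written for this illustration, and the sketch assumes the text only contains characters whose GSM 7-bit code equals their ASCII code (letters, digits, space), so no alphabet conversion is done.

```pascal
program PackWithFillBits;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  TOctets = array of Byte;

// Standard GSM 7-bit packing, least significant bit first.
// Assumption: every character already is a GSM default-alphabet code.
function PackSeptets(const AText: string): TOctets;
var
  i, BitPos: Integer;
  Septet: Byte;
begin
  SetLength(Result, (Length(AText) * 7 + 7) div 8);
  for i := 0 to High(Result) do
    Result[i] := 0;
  BitPos := 0;
  for i := 1 to Length(AText) do
  begin
    Septet := Ord(AText[i]) and $7F;
    // low bits of the septet go into the current octet...
    Result[BitPos div 8] := Result[BitPos div 8] or Byte(Septet shl (BitPos mod 8));
    // ...and the bits that don't fit spill into the next octet
    if (BitPos mod 8) > 1 then
      Result[BitPos div 8 + 1] := Result[BitPos div 8 + 1] or (Septet shr (8 - BitPos mod 8));
    Inc(BitPos, 7);
  end;
end;

// Shift the packed data AFillBits (0..6) bits to the left, carrying the top
// bits of each octet into the next one; the result may be one octet longer.
function ShiftLeftBits(const AData: TOctets; ANumSeptets, AFillBits: Integer): TOctets;
var
  i: Integer;
  Cur, Prev: Byte;
begin
  SetLength(Result, (ANumSeptets * 7 + AFillBits + 7) div 8);
  Prev := 0;
  for i := 0 to High(Result) do
  begin
    if i <= High(AData) then
      Cur := AData[i]
    else
      Cur := 0;
    Result[i] := Byte(Cur shl AFillBits) or Byte(Prev shr (8 - AFillBits));
    Prev := Cur;
  end;
end;

function ToHex(const AData: TOctets): string;
var
  i: Integer;
begin
  Result := '';
  for i := 0 to High(AData) do
    Result := Result + IntToHex(AData[i], 2);
end;

const
  Msg = 'hello world';
begin
  Writeln(ToHex(PackSeptets(Msg)));                             // E8329BFD06DDDF723619
  Writeln(ToHex(ShiftLeftBits(PackSeptets(Msg), Length(Msg), 1))); // D06536FB0DBABFE56C32
end.
```

Without the shift the text packs to E8329BFD06DDDF723619; with one fill bit it becomes D06536FB0DBABFE56C32, the user data from the question.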
Decoding is really the same thing: you first shift the 7-bit data Offset bits to the right, and then decode it as usual.
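The reverse direction as a sketch, under the same plain-ASCII assumption: shift the user data right by the fill bits, then unpack the septets normally. FromHex, ShiftRightBits and UnpackSeptets are illustrative names. The septet count (11) is passed in explicitly; in a real PDU it has to come from the message itself (for example the TP-UDL field), because counting bits alone can yield one extra, all-zero septet.

```pascal
program UnpackWithFillBits;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  TOctets = array of Byte;

// Hex string (two digits per octet) -> octets
function FromHex(const AHex: string): TOctets;
var
  i: Integer;
begin
  SetLength(Result, Length(AHex) div 2);
  for i := 0 to High(Result) do
    Result[i] := StrToInt('$' + Copy(AHex, i * 2 + 1, 2));
end;

// Shift the octets AFillBits (0..6) bits to the right: the low bits of the
// next octet move into the top bits of the current one.
function ShiftRightBits(const AData: TOctets; AFillBits: Integer): TOctets;
var
  i: Integer;
  NextOctet: Byte;
begin
  SetLength(Result, Length(AData));
  for i := 0 to High(AData) do
  begin
    if i < High(AData) then
      NextOctet := AData[i + 1]
    else
      NextOctet := 0;
    Result[i] := (AData[i] shr AFillBits) or Byte(NextOctet shl (8 - AFillBits));
  end;
end;

// Standard GSM 7-bit unpacking of ANumSeptets septets (bit 0 of octet 0 first).
// No alphabet conversion: each septet is returned as Chr(code).
function UnpackSeptets(const AData: TOctets; ANumSeptets: Integer): string;
var
  i, BitPos, Value: Integer;
begin
  Result := '';
  BitPos := 0;
  for i := 1 to ANumSeptets do
  begin
    // low part of the septet from the current octet
    Value := AData[BitPos div 8] shr (BitPos mod 8);
    // high part from the next octet when the septet straddles two octets
    if (BitPos mod 8) > 1 then
      Value := Value or (AData[BitPos div 8 + 1] shl (8 - BitPos mod 8));
    Result := Result + Chr(Value and $7F);
    Inc(BitPos, 7);
  end;
end;

begin
  // user data that follows the 6-octet UDH 050003000302, i.e. 1 fill bit
  Writeln(UnpackSeptets(ShiftRightBits(FromHex('D06536FB0DBABFE56C32'), 1), 11));
end.
```

After the shift the data is the ordinary packing E8329BFD06DDDF723619, and the program prints "hello world".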
I hope this sets you on the right track.
Solution 2
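In your case, the data is D06536FB0DBABFE56C32.
The first byte is D0 => 'h'. The character is in the upper 7 bits; the lowest bit is the fill bit and is not used ($D0 shr 1 = $68 = 'h'). Because there is exactly one fill bit here, the first character fills the rest of the first byte.
The rest is 6536FB0DBABFE56C32, which starts on a byte boundary and can be decoded normally.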
In binary:
(01100101) 00110110 11111011 00001101 10111010 10111111 11100101 01101100 00110010
Now write the bytes in reverse order (right to left). Reading from the right, each group of 7 bits is one character:
00 1100100 1101100 1110010 1101111 1110111 0100000 1101111 1101100 1101100 1100101
You can already read the string from the line above, but to make it easier to see, here are the groups listed from right to left (the leftover 00 is unused):
(1100101)(1101100)(1101100)(1101111)(0100000)(1110111)(1101111)(1110010)(1101100)(1100100)00
Those are e, l, l, o, space, w, o, r, l, d, so the string is "ello world".
Combined with the first character, you get "hello world".
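This shortcut relies on there being exactly one fill bit (a 6-octet UDH), so the first septet occupies bits 1..7 of the first byte and everything after it is byte-aligned. A sketch of that reasoning (DecodeOneFillBit, FromHex and UnpackSeptets are hypothetical helpers; plain-ASCII text assumed, no GSM alphabet conversion):

```pascal
program ManualDecode;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  TOctets = array of Byte;

function FromHex(const AHex: string): TOctets;
var
  i: Integer;
begin
  SetLength(Result, Length(AHex) div 2);
  for i := 0 to High(Result) do
    Result[i] := StrToInt('$' + Copy(AHex, i * 2 + 1, 2));
end;

// Standard GSM 7-bit unpacking (no fill bits), returning raw septet codes
function UnpackSeptets(const AData: TOctets; ANumSeptets: Integer): string;
var
  i, BitPos, Value: Integer;
begin
  Result := '';
  BitPos := 0;
  for i := 1 to ANumSeptets do
  begin
    Value := AData[BitPos div 8] shr (BitPos mod 8);
    if (BitPos mod 8) > 1 then
      Value := Value or (AData[BitPos div 8 + 1] shl (8 - BitPos mod 8));
    Result := Result + Chr(Value and $7F);
    Inc(BitPos, 7);
  end;
end;

// Only valid when the UDH leaves exactly one fill bit
function DecodeOneFillBit(const AHex: string): string;
var
  Data, Rest: TOctets;
  i: Integer;
begin
  Data := FromHex(AHex);
  // first septet sits in bits 1..7 of the first octet (bit 0 is the fill bit)
  Result := Chr(Data[0] shr 1);
  // the remaining octets start on an octet boundary: unpack them normally
  SetLength(Rest, Length(Data) - 1);
  for i := 1 to High(Data) do
    Rest[i - 1] := Data[i];
  // floor(bits / 7) septets; exact here (72 bits = 10 septets + 2 unused bits)
  Result := Result + UnpackSeptets(Rest, (Length(Rest) * 8) div 7);
end;

begin
  Writeln(DecodeOneFillBit('D06536FB0DBABFE56C32'));  // hello world
end.
```

With any other number of fill bits the first septet would not line up with the first byte, which is why the general approach shifts the whole data first.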
Doug
Updated on July 10, 2022
While I can successfully encode and decode the user data part of an SMS message when a UDH is not present, I'm having trouble doing so when a UDH is present (in this case, for concatenated SMS).
When I decode or encode the user data, do I need to prepend the UDH to the text before doing so?
This article provides an encoding routine sample that compensates for the UDH with padding bits (which I still don't completely understand), but it doesn't give an example of data being passed to the routine, so I don't have a clear use case (and I could not find a decoding sample on the site): http://mobiletidings.com/2009/07/06/how-to-pack-gsm7-into-septets/.
So far, I have been able to get some results if I prepend the UDH to the user data before decoding it, but I suspect this is just a coincidence.
As an example (using values from https://en.wikipedia.org/wiki/Concatenated_SMS):
UDH := '050003000302';
ENCODED_USER_DATA_PART := 'D06536FB0DBABFE56C32'; // with padding, evidently
DecodedUserData := Decode7Bit(UDH + ENCODED_USER_DATA_PART);
Writeln(DecodedUserData);
Output: "ß@ø¿Æ @hello world"
EncodedUserData := Encode7Bit(DecodedUserData);
DecodedUserData := Decode7Bit(EncodedUserData);
Writeln(DecodedUserData);
Same Output: "ß@ø¿Æ @hello world"
Without prepending the UDH I get garbage:
DecodedUserData := Decode7Bit(ENCODED_USER_DATA_PART);
Writeln(DecodedUserData);
Output: "PKYY§An§eYI"
What is correct way of handling this?
Am I supposed to include the UDH with the text when encoding the user data?
Am I supposed to strip off the garbage characters after decoding, or am I (as I suspect) completely off base with this assumption?
The decoding algorithm here seems to work without a UDH, but it doesn't seem to take any UDH information into account: Looking for GSM 7bit encode/decode algorithm.
I would be eternally grateful if someone could set me straight on the correct way to proceed. Any clear examples/code samples would be very much appreciated. ;-)
I will also provide a small sample application that includes the algorithms if anyone feels it will help solve the riddle.
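For reference, when the UDH and the packed text arrive together in one TP-UD hex string, the first octet (the UDH length byte) says how many header octets follow, so the two parts can be separated before decoding: only the text part goes through the 7-bit decoder, and the UDH length (length byte + 1) is what determines the fill bits. A sketch, assuming the caller already knows a UDH is present (TP-UDHI set); UdhLenOf, UdhPartOf and TextPartOf are hypothetical names:

```pascal
program SplitUserData;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Length of the UDH in octets, including the UDH length byte itself
function UdhLenOf(const AUserDataHex: string): Integer;
begin
  // first octet = number of header octets that follow it
  Result := StrToInt('$' + Copy(AUserDataHex, 1, 2)) + 1;
end;

// The UDH as hex, including its length byte
function UdhPartOf(const AUserDataHex: string): string;
begin
  Result := Copy(AUserDataHex, 1, UdhLenOf(AUserDataHex) * 2);
end;

// The packed 7-bit text (still containing the fill bits) as hex
function TextPartOf(const AUserDataHex: string): string;
begin
  Result := Copy(AUserDataHex, UdhLenOf(AUserDataHex) * 2 + 1, MaxInt);
end;

const
  UserData = '050003000302D06536FB0DBABFE56C32';
begin
  Writeln('UDH       = ', UdhPartOf(UserData), ' (', UdhLenOf(UserData), ' octets)');
  Writeln('Text part = ', TextPartOf(UserData));
end.
```

For the Wikipedia example this gives the UDH 050003000302 (6 octets) and the text part D06536FB0DBABFE56C32, which correspond to the APduData and AUdhLen parameters of Decode7Bit in the code at the end of this page.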
EDIT 1:
I'm using Delphi XE2 Update 4 Hotfix 1
EDIT 2:
Thanks to help from @whosrdaddy, I was able to successfully get my encoding/decoding routines to work.
As a side note, I was curious as to why the user data needed to be on a 7-bit boundary when the UDH wasn't encoded with it, but the last sentence in the paragraph from the ETSI specification quoted by @whosrdaddy answered that:
If 7 bit data is used and the TP-UD-Header does not finish on a septet boundary then fill bits are inserted after the last Information Element Data octet so that there is an integral number of septets for the entire TP-UD header. This is to ensure that the SM itself starts on an octet boundary so that an earlier phase mobile will be capable of displaying the SM itself although the TP-UD Header in the TP-UD field may not be understood
My code is based in part on examples from the following resources:
Looking for GSM 7bit encode/decode algorithm
https://en.wikipedia.org/wiki/Concatenated_SMS
http://mobiletidings.com/2009/02/18/combining-sms-messages/
http://mobiletidings.com/2009/07/06/how-to-pack-gsm7-into-septets/
http://mobileforensics.files.wordpress.com/2007/06/understanding_sms.pdf
http://www.dreamfabric.com/sms/
http://www.mediaburst.co.uk/blog/concatenated-sms/
Here's the code for anyone else who's had trouble with SMS encoding/decoding. I'm sure it can be simplified/optimized (and comments are welcome), but I've tested it with several different permutations and UDH lengths with success. I hope it helps.
unit SmsUtils;

interface

uses
  Windows, Classes, Math;

function Encode7Bit(const AText: string; AUdhLen: Byte; out ATextLen: Byte): string;
function Decode7Bit(const APduData: string; AUdhLen: Integer): string;

implementation

const
  // GSM 03.38 default alphabet -> Latin-1 character code
  // (0 = no Latin-1 equivalent, e.g. the Greek capitals at 16..27)
  g7BitToAsciiTable: array [0 .. 127] of Byte = (
     64, 163,  36, 165, 232, 233, 249, 236, 242, 199,  10, 216, 248,  13, 197, 229,
      0,  95,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 198, 230, 223, 201,
     32,  33,  34,  35, 164,  37,  38,  39,  40,  41,  42,  43,  44,  45,  46,  47,
     48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,
    161,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,  79,
     80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90, 196, 214, 209, 220, 167,
    191,  97,  98,  99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
    112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 228, 246, 241, 252, 224);

var
  gAsciiTo7BitTable: array [0 .. 255] of Byte;

procedure InitializeTables;
var
  i: Integer;
begin
  // build the reverse table (Latin-1 -> GSM 7-bit); characters without a
  // GSM equivalent map to 0 ('@')
  ZeroMemory(@gAsciiTo7BitTable, SizeOf(gAsciiTo7BitTable));
  for i := 0 to High(g7BitToAsciiTable) do
    if g7BitToAsciiTable[i] <> 0 then
      gAsciiTo7BitTable[g7BitToAsciiTable[i]] := i;
end;

function ConvertAsciiTo7Bit(const AText: string; AUdhLen: Byte): AnsiString;
const
  ESC = #27;
  // characters that live in the GSM extension table (sent as ESC + code);
  // #164 is treated as the euro sign (ISO-8859-15), extension code 101
  ESCAPED_ASCII_CODES = [#12, #94, #123, #125, #92, #91, #126, #93, #124, #164];
var
  Septet: Byte;
  Ch: AnsiChar;
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(AText) do
  begin
    Ch := AnsiChar(AText[i]);
    if not (Ch in ESCAPED_ASCII_CODES) then
      Septet := gAsciiTo7BitTable[Byte(Ch)]
    else
    begin
      Result := Result + ESC;
      case Ch of
        #12: Septet := 10;
        #94: Septet := 20;
        #123: Septet := 40;
        #125: Septet := 41;
        #92: Septet := 47;
        #91: Septet := 60;
        #126: Septet := 61;
        #93: Septet := 62;
        #124: Septet := 64;
        #164: Septet := 101;
      else
        Septet := 0;
      end;
    end;
    Result := Result + AnsiChar(Septet);
  end;
end;

function Convert7BitToAscii(const AText: AnsiString): string;
const
  ESC = #27;
var
  TextLen: Integer;
  Ch: Char;
  i: Integer;
begin
  Result := '';
  TextLen := Length(AText);
  i := 1;
  while i <= TextLen do
  begin
    Ch := Char(AText[i]);
    if Ch <> ESC then
      Result := Result + Char(g7BitToAsciiTable[Ord(Ch)])
    else
    begin
      Inc(i); // skip ESC
      if i <= TextLen then
      begin
        Ch := Char(AText[i]);
        case Ch of
          #10: Ch := #12;
          #20: Ch := #94;
          #40: Ch := #123;
          #41: Ch := #125;
          #47: Ch := #92;
          #60: Ch := #91;
          #61: Ch := #126;
          #62: Ch := #93;
          #64: Ch := #124;
          #101: Ch := #164;
        end;
        Result := Result + Ch;
      end;
    end;
    Inc(i);
  end;
end;

function StrToHex(const AText: AnsiString): AnsiString; overload;
var
  TextLen: Integer;
begin
  TextLen := Length(AText);
  // each byte becomes two hex digits
  SetLength(Result, TextLen * 2);
  BinToHex(PAnsiChar(AText), PAnsiChar(Result), TextLen);
end;

function StrToHex(const AText: string): string; overload;
begin
  Result := string(StrToHex(AnsiString(AText)));
end;

function HexToStr(const AText: AnsiString): AnsiString; overload;
var
  ResultLen: Integer;
begin
  // two hex digits per byte
  ResultLen := Length(AText) div 2;
  SetLength(Result, ResultLen);
  if HexToBin(PAnsiChar(AText), PAnsiChar(Result), ResultLen) <> ResultLen then
    Result := 'Error Converting Hex To String: ' + AText;
end;

function HexToStr(const AText: string): string; overload;
begin
  Result := string(HexToStr(AnsiString(AText)));
end;

function Encode7Bit(const AText: string; AUdhLen: Byte; out ATextLen: Byte): string;
// AText: Ascii text
// AUdhLen: Length of UDH including UDH Len byte (e.g. '050003CC0101' = 6 bytes)
// ATextLen: returns length of text that was encoded. This can be different
//   than Length(AText) due to escape characters
// Returns text as encoded PDU hex string
var
  Text7Bit: AnsiString;
  Pdu: AnsiString;
  PduIdx: Integer;
  PduLen: Byte;
  PaddingBits: Byte;
  BitsToMove: Byte;
  Septet: Byte;
  Octet: Byte;
  PrevOctet: Byte;
  ShiftedOctet: Byte;
  i: Integer;
begin
  Result := '';
  Text7Bit := ConvertAsciiTo7Bit(AText, AUdhLen);
  ATextLen := Length(Text7Bit);
  BitsToMove := 0;

  // determine how many padding (fill) bits are needed after the UDH;
  // the final "mod 7" gives 0 when the UDH already ends on a septet boundary
  if AUdhLen > 0 then
    PaddingBits := (7 - ((AUdhLen * 8) mod 7)) mod 7
  else
    PaddingBits := 0;

  // calculate the number of bytes needed to store the 7-bit text
  // along with any padding bits that are required
  PduLen := Ceil(((ATextLen * 7) + PaddingBits) / 8);

  // reserve space for the PDU bytes
  Pdu := AnsiString(StringOfChar(#0, PduLen));
  PduIdx := 1;

  for i := 1 to ATextLen do
  begin
    if BitsToMove = 7 then
      BitsToMove := 0
    else
    begin
      // convert the current character to a septet (7 bits) and make room for
      // the bits from the next one
      Septet := Byte(Text7Bit[i]) shr BitsToMove;
      if i = ATextLen then
        Octet := Septet
      else
        // copy the low bits of the next septet into the high bits of this octet
        Octet := Septet or Byte(Byte(Text7Bit[i + 1]) shl Byte(7 - BitsToMove));
      Byte(Pdu[PduIdx]) := Octet;
      Inc(PduIdx);
      Inc(BitsToMove);
    end;
  end;

  // Pad the PDU on the *right* by shifting it to the *left* by <PaddingBits>:
  // the most significant <PaddingBits> of each byte move into the least
  // significant bits of the next byte. <PduLen> already includes the extra
  // byte that may be needed for the bits shifted out of the last byte.
  if PaddingBits > 0 then
  begin
    PrevOctet := 0;
    for PduIdx := 1 to PduLen do
    begin
      Octet := Byte(Pdu[PduIdx]);
      ShiftedOctet := Byte(Octet shl PaddingBits) or Byte(PrevOctet shr (8 - PaddingBits));
      Byte(Pdu[PduIdx]) := ShiftedOctet;
      PrevOctet := Octet;
    end;
  end;

  Result := string(StrToHex(Pdu));
end;

function Decode7Bit(const APduData: string; AUdhLen: Integer): string;
// APduData: Hex string representation of PDU data (the user data after the UDH)
// AUdhLen: Length of UDH including UDH Len (e.g. '050003CC0101' = 6 bytes)
// Returns decoded Ascii text
var
  Pdu: AnsiString;
  NumSeptets: Byte;
  Septets: AnsiString;
  PduIdx: Integer;
  PduLen: Integer;
  by: Byte;
  currBy: Byte;
  left: Byte;
  mask: Byte;
  nextBy: Byte;
  Octet: Byte;
  NextOctet: Byte;
  PaddingBits: Byte;
  ShiftedOctet: Byte;
  i: Integer;
begin
  Result := '';
  PaddingBits := 0;

  // convert hex string to bytes (hex digits are plain ASCII, so calling the
  // AnsiString overload directly avoids a round trip through UnicodeString)
  Pdu := HexToStr(AnsiString(APduData));
  PduLen := Length(Pdu);

  // Remove the padding by shifting the PDU *right* by <PaddingBits>: the
  // least significant <PaddingBits> of the following byte move into the most
  // significant bits of the current byte.
  if AUdhLen > 0 then
    PaddingBits := (7 - ((AUdhLen * 8) mod 7)) mod 7;
  if PaddingBits > 0 then
  begin
    for PduIdx := 1 to PduLen do
    begin
      Octet := Byte(Pdu[PduIdx]);
      if PduIdx = PduLen then
        ShiftedOctet := Byte(Octet shr PaddingBits)
      else
      begin
        NextOctet := Byte(Pdu[PduIdx + 1]);
        ShiftedOctet := Byte(Octet shr PaddingBits) or Byte(NextOctet shl (8 - PaddingBits));
      end;
      Byte(Pdu[PduIdx]) := ShiftedOctet;
    end;
  end;

  // decode: number of septets in PDU after excluding the padding bits
  NumSeptets := ((PduLen * 8) - PaddingBits) div 7;
  Septets := AnsiString(StringOfChar(#0, NumSeptets));
  left := 7;
  mask := $7F;
  nextBy := 0;
  PduIdx := 1;
  for i := 1 to NumSeptets do
  begin
    if mask = 0 then
    begin
      // every 8th septet consists only of bits carried over from the previous byte
      Septets[i] := AnsiChar(nextBy);
      left := 7;
      mask := $7F;
      nextBy := 0;
    end
    else
    begin
      if PduIdx > PduLen then
        Break;
      by := Byte(Pdu[PduIdx]);
      Inc(PduIdx);
      currBy := ((by and mask) shl (7 - left)) or nextBy;
      nextBy := (by and (not mask)) shr left;
      Septets[i] := AnsiChar(currBy);
      mask := mask shr 1;
      left := left - 1;
    end;
  end;

  // remove last character if unused
  // this is kind of a hack, but frankly I don't know how else to compensate
  // for it (a genuine trailing '@', septet 0, is removed as well)
  if (NumSeptets > 0) and (Septets[NumSeptets] = #0) then
    SetLength(Septets, NumSeptets - 1);

  // convert 7-bit alphabet to ascii
  Result := Convert7BitToAscii(Septets);
end;

initialization
  InitializeTables;

end.