Problem to Decode H264 video over RTP with ffmpeg (libavcodec)

Solution 1

In RTP, H264 I-Frames (IDRs) are usually fragmented, because any NAL unit larger than the MTU has to be split. When you receive an RTP packet you must first skip the header (usually the first 12 bytes; more if CSRC entries or a header extension are present) and then look at the first payload byte, whose low 5 bits give the NAL unit type. If that type is 28 (0x1C), the payload that follows is one FU-A fragment of a larger NAL unit, here an H264 IDR (I-Frame), and you need to collect all the fragments to reconstruct the IDR.
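
As a rough sketch of the header-skipping step in C: the payload offset can be computed from the RTP header fields instead of assuming 12 bytes. The helper names `rtp_payload_offset` and `h264_nal_type` are made up for illustration, and trailing padding (the P bit) is not handled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the RTP payload inside pkt, or 0 if the packet
   looks malformed. The fixed header is 12 bytes; CSRC entries and a
   header extension, when present, push the payload further back. */
size_t rtp_payload_offset(const uint8_t *pkt, size_t len)
{
    if (len < 12 || (pkt[0] >> 6) != 2)       /* too short, or not RTP version 2 */
        return 0;

    size_t off = 12 + 4u * (pkt[0] & 0x0F);   /* CSRC count is the low 4 bits */

    if (pkt[0] & 0x10) {                      /* X bit: header extension present */
        if (len < off + 4)
            return 0;
        /* extension length counts 32-bit words, excluding its own 4-byte header */
        size_t ext_words = ((size_t)pkt[off + 2] << 8) | pkt[off + 3];
        off += 4 + 4 * ext_words;
    }
    return off < len ? off : 0;
}

/* Low 5 bits of the first payload byte: the NAL unit type
   (or the packet structure type, e.g. 28 for FU-A). */
int h264_nal_type(const uint8_t *payload)
{
    return payload[0] & 0x1F;
}
```

If the offset comes back as something other than 12 (16, for example), the sender is including a CSRC entry or an extension, and a hard-coded 12 would misread the NAL byte.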

Fragmentation occurs because the MTU is limited and an IDR is usually much larger than it. One fragment can look like this:

Fragment that has START BIT = 1:

First byte:  [ 3 NAL UNIT BITS | 5 FRAGMENT TYPE BITS] 
Second byte: [ START BIT | END BIT | RESERVED BIT | 5 NAL UNIT BITS] 
Other bytes: [... IDR FRAGMENT DATA...]

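Other fragments (START BIT = 0; END BIT = 1 only in the last one):

First byte:  [ 3 NAL UNIT BITS | 5 FRAGMENT TYPE BITS]
Second byte: [ START BIT | END BIT | RESERVED BIT | 5 NAL UNIT BITS]
Other bytes: [... IDR FRAGMENT DATA...]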

To reconstruct IDR you must collect this info:

int fragment_type = Data[0] & 0x1F;
int nal_type = Data[1] & 0x1F;
int start_bit = Data[1] & 0x80;
int end_bit = Data[1] & 0x40;

If fragment_type == 28, the payload is one fragment of the IDR. Next, check whether start_bit is set; if it is, that fragment is the first one in the sequence. Use it to reconstruct the IDR's NAL byte: take the first 3 bits of the first payload byte (3 NAL UNIT BITS) and combine them with the last 5 bits of the second payload byte (5 NAL UNIT BITS), so you get a byte like this: [3 NAL UNIT BITS | 5 NAL UNIT BITS]. Write that NAL byte first into an empty buffer, followed by all the remaining bytes of that fragment. Remember to skip the first two payload bytes, since they are not part of the IDR and only describe the fragment.

If start_bit and end_bit are both 0, just append the payload (skipping the first two payload bytes that describe the fragment) to the buffer.

If start_bit is 0 and end_bit is 1, it is the last fragment: append its payload (again skipping the first two bytes that describe the fragment) to the buffer, and now you have your IDR reconstructed.
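
The three cases above can be sketched in C roughly as follows. This is only an illustration under some assumptions: `NalBuffer` and `fua_push` are hypothetical names, the fixed 64 KB capacity is arbitrary, the RTP header has already been stripped, and packet loss or reordering (which you would detect from RTP sequence numbers) is not handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical accumulator for one reassembled NAL unit. */
typedef struct {
    uint8_t data[65536];
    size_t  len;
} NalBuffer;

/* Feeds one FU-A payload into the buffer.
   Returns 1 when the end fragment completes the NAL unit,
   0 when more fragments are expected,
   -1 if the payload is not a usable FU-A fragment. */
int fua_push(NalBuffer *nb, const uint8_t *p, size_t n)
{
    if (n < 3 || (p[0] & 0x1F) != 28)
        return -1;

    int start_bit = p[1] & 0x80;
    int end_bit   = p[1] & 0x40;

    if (start_bit) {
        /* Rebuild the NAL byte: top 3 bits of byte 0, low 5 bits of byte 1. */
        nb->len = 0;
        nb->data[nb->len++] = (uint8_t)((p[0] & 0xE0) | (p[1] & 0x1F));
    } else if (nb->len == 0) {
        return -1;                      /* the start fragment was never seen */
    }

    if (n - 2 > sizeof nb->data - nb->len) {
        nb->len = 0;                    /* would overflow: drop this NAL unit */
        return -1;
    }

    /* Skip the two bytes that only describe the fragment. */
    memcpy(nb->data + nb->len, p + 2, n - 2);
    nb->len += n - 2;

    return end_bit ? 1 : 0;
}
```

When `fua_push` returns 1, `nb->data[0 .. nb->len)` holds the complete NAL unit, starting with its reconstructed NAL byte.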

If you need some code, just ask in comment, I'll post it, but I think this is pretty clear how to do... =)

CONCERNING THE DECODING

It crossed my mind today why you might be getting an error when decoding the IDR (I presume that you have reconstructed it correctly). How are you building your AVC Decoder Configuration Record? Does the lib that you use automate that? If not, and you haven't heard of this, continue reading...

The AVCDCR is specified so that decoders can quickly parse all the data they need to decode an H264 (AVC) video stream. That data is the following:

  • ProfileIDC
  • ProfileIOP
  • LevelIDC
  • SPS (Sequence Parameter Sets)
  • PPS (Picture Parameter Sets)

All this data is sent in RTSP session in SDP under the fields: profile-level-id and sprop-parameter-sets.

DECODING PROFILE-LEVEL-ID

The profile-level-id string is divided into 3 substrings, each 2 characters long:

[PROFILE IDC][PROFILE IOP][LEVEL IDC]

Each substring represents one byte in base16! So, if Profile IDC is 28, it is actually 40 in base10. Later you will use these byte values to construct the AVC Decoder Configuration Record.
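
A minimal sketch of that conversion in C, assuming the attribute value has already been cut out of the SDP as a 6-character string (the function name `parse_profile_level_id` is just for illustration):

```c
#include <stdint.h>
#include <string.h>

/* Value of one hex digit, or -1 if ch is not a hex digit. */
static int hex_val(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/* Parses a profile-level-id such as "42e01f" into
   out[0] = Profile IDC, out[1] = Profile IOP, out[2] = Level IDC.
   Returns 0 on success, -1 on malformed input. */
int parse_profile_level_id(const char *s, uint8_t out[3])
{
    if (strlen(s) != 6)
        return -1;

    for (int i = 0; i < 3; i++) {
        int hi = hex_val(s[2 * i]);
        int lo = hex_val(s[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    return 0;
}
```

For example, "42e01f" gives Profile IDC 0x42 (66), Profile IOP 0xE0 (224) and Level IDC 0x1F (31).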

DECODING SPROP-PARAMETER-SETS

Sprops are usually 2 strings (there could be more) that are comma separated and base64 encoded! You could parse their contents, but there is no need to. Your job here is just to convert each of them from a base64 string into a byte array for later use. Now you have 2 byte arrays: the first array is the SPS, the second one is the PPS.
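
C has no base64 decoder in its standard library, so here is a small hand-rolled sketch. It assumes exactly two comma-separated entries (SPS first, PPS second); the names `b64_decode` and `decode_sprop` are invented for this example, and a real project might prefer an existing base64 routine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of one base64 character, or -1 if it is not in the alphabet. */
static int b64_val(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decodes n base64 characters into out (capacity cap), stopping at '='.
   Returns the number of bytes written, or -1 on error. */
long b64_decode(const char *in, size_t n, uint8_t *out, size_t cap)
{
    uint32_t acc = 0;   /* bit accumulator */
    int bits = 0;       /* number of not yet emitted bits in acc */
    size_t w = 0;

    for (size_t i = 0; i < n && in[i] != '='; i++) {
        int v = b64_val(in[i]);
        if (v < 0)
            return -1;
        acc = (acc << 6) | (uint32_t)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (w >= cap)
                return -1;
            out[w++] = (uint8_t)(acc >> bits);
        }
    }
    return (long)w;
}

/* Splits "SPS,PPS" at the first comma and decodes both halves.
   Returns 0 on success, -1 on error. */
int decode_sprop(const char *sprop,
                 uint8_t *sps, size_t sps_cap, long *sps_len,
                 uint8_t *pps, size_t pps_cap, long *pps_len)
{
    const char *comma = strchr(sprop, ',');
    if (comma == NULL)
        return -1;

    *sps_len = b64_decode(sprop, (size_t)(comma - sprop), sps, sps_cap);
    *pps_len = b64_decode(comma + 1, strlen(comma + 1), pps, pps_cap);
    return (*sps_len > 0 && *pps_len > 0) ? 0 : -1;
}
```

A quick sanity check on the result: the first decoded byte of the SPS should have NAL type 7 (`sps[0] & 0x1F`), and the first byte of the PPS should have NAL type 8.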

BUILDING THE AVCDCR

Now you have all you need to build the AVCDCR. Start by making a new, empty buffer, then write these things into it in the order given here:

1 - Byte that has value 1 and represents version

2 - Profile IDC byte

3 - Profile IOP byte

4 - Level IDC byte

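5 - Byte with value 0xFF (six reserved bits set to 1, then lengthSizeMinusOne = 3, meaning NAL unit lengths are stored in 4 bytes)

6 - Byte with value 0xE1 (three reserved bits set to 1, then the number of SPS arrays, here 1)

7 - Short (2 bytes, big-endian) with value of the SPS array length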

8 - SPS byte array

9 - Byte with the number of PPS arrays (you could have more of them in sprop-parameter-set)

10 - Short (2 bytes, big-endian) with the length of following PPS array

11 - PPS array
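
The eleven steps map onto C fairly directly. The following is a sketch that assumes exactly one SPS and one PPS; `build_avcdcr` is a hypothetical helper name, not part of ffmpeg.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes an AVC Decoder Configuration Record holding one SPS and one PPS
   into out. Returns the record length, or 0 if out is too small or a
   parameter set does not fit a 16-bit length field. */
size_t build_avcdcr(uint8_t *out, size_t cap,
                    uint8_t profile_idc, uint8_t profile_iop, uint8_t level_idc,
                    const uint8_t *sps, size_t sps_len,
                    const uint8_t *pps, size_t pps_len)
{
    if (sps_len == 0 || sps_len > 0xFFFF || pps_len == 0 || pps_len > 0xFFFF)
        return 0;

    /* 6 header bytes + 2 (SPS length) + 1 (PPS count) + 2 (PPS length) */
    size_t need = 11 + sps_len + pps_len;
    if (cap < need)
        return 0;

    size_t i = 0;
    out[i++] = 1;                            /* 1: version */
    out[i++] = profile_idc;                  /* 2 */
    out[i++] = profile_iop;                  /* 3 */
    out[i++] = level_idc;                    /* 4 */
    out[i++] = 0xFF;                         /* 5: reserved bits + 4-byte NAL lengths */
    out[i++] = 0xE1;                         /* 6: reserved bits + one SPS */
    out[i++] = (uint8_t)(sps_len >> 8);      /* 7: SPS length, big-endian */
    out[i++] = (uint8_t)(sps_len & 0xFF);
    memcpy(out + i, sps, sps_len);           /* 8 */
    i += sps_len;
    out[i++] = 1;                            /* 9: one PPS */
    out[i++] = (uint8_t)(pps_len >> 8);      /* 10: PPS length, big-endian */
    out[i++] = (uint8_t)(pps_len & 0xFF);
    memcpy(out + i, pps, pps_len);           /* 11 */
    i += pps_len;

    return i;
}
```

With a 4-byte SPS and a 4-byte PPS the record comes out at 19 bytes.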

DECODING VIDEO STREAM

Now you have a byte array that tells the decoder how to decode the H264 video stream. I believe that you need this if your lib doesn't build it itself from the SDP...

Solution 2

I don't know about the rest of your implementation, but it seems likely the 'fragments' you are receiving are NAL units. Therefore, each one may need the NALU start code (00 00 01 or 00 00 00 01) prepended when you reconstruct the bitstream before sending it to ffmpeg.
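
A small C sketch of that idea, assuming each complete NAL unit (NAL byte first) is appended to a caller-owned output buffer; `annexb_append` is a made-up helper name. Note that libavcodec documents that input buffers need some extra zeroed padding bytes at the end, which this sketch leaves to the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Appends a 4-byte start code followed by one NAL unit to out.
   *used is the number of bytes already in out and is advanced on success.
   Returns 0 on success, -1 if the result would not fit in cap bytes. */
int annexb_append(uint8_t *out, size_t cap, size_t *used,
                  const uint8_t *nal, size_t nal_len)
{
    static const uint8_t start_code[4] = {0x00, 0x00, 0x00, 0x01};

    if (*used > cap || cap - *used < sizeof start_code ||
        cap - *used - sizeof start_code < nal_len)
        return -1;

    memcpy(out + *used, start_code, sizeof start_code);
    memcpy(out + *used + sizeof start_code, nal, nal_len);
    *used += sizeof start_code + nal_len;
    return 0;
}
```

Calling it once per NAL unit (SPS, PPS, then the slice) yields a byte stream where every unit is preceded by 00 00 00 01.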

At any rate, you might find the RFC for H264 RTP packetization useful:

http://www.rfc-editor.org/rfc/rfc3984.txt

Hope this helps!

Solution 3

I have an implementation of this at https://net7mma.codeplex.com/ for C#, but the process is the same everywhere.

Here is the relevant code

/// <summary>
    /// Implements Packetization and Depacketization of packets defined in <see href="https://tools.ietf.org/html/rfc6184">RFC6184</see>.
    /// </summary>
    public class RFC6184Frame : Rtp.RtpFrame
    {
        /// <summary>
        /// Annex B start code prefix written before each NAL unit
        /// </summary>
        static byte[] NalStart = { 0x00, 0x00, 0x01 };

        public RFC6184Frame(byte payloadType) : base(payloadType) { }

        public RFC6184Frame(Rtp.RtpFrame existing) : base(existing) { }

        public RFC6184Frame(RFC6184Frame f) : this((Rtp.RtpFrame)f) { Buffer = f.Buffer; }

        public System.IO.MemoryStream Buffer { get; set; }

        /// <summary>
        /// Creates any <see cref="Rtp.RtpPacket"/>'s required for the given nal
        /// </summary>
        /// <param name="nal">The nal</param>
        /// <param name="mtu">The mtu</param>
        public virtual void Packetize(byte[] nal, int mtu = 1500)
        {
            if (nal == null) return;

            int nalLength = nal.Length;

            int offset = 0;

            if (nalLength > mtu)
            {
                //FU indicator: F and NRI bits of the nal header, type 28 (FU-A)
                //FU header: start bit plus the original nal unit type
                byte[] FUI = new byte[] { (byte)((nal[0] & 0xE0) | 28), (byte)((1 << 7) | (nal[0] & 0x1F)) };

                bool marker = false;

                //The nal header is carried by the FU indicator and FU header, so skip it
                offset = 1;

                //Leave room for the two FU bytes in every packet
                int chunk = mtu - 2;

                while (offset < nalLength)
                {
                    //Clear the start bit for packets other than the first
                    if (offset > 1) FUI[1] &= 0x7F;

                    //Set the end bit if no more data remains
                    if (offset + chunk >= nalLength)
                    {
                        FUI[1] |= (byte)(1 << 6);
                        marker = true;
                    }

                    //Add the packet
                    Add(new Rtp.RtpPacket(2, false, false, marker, PayloadTypeByte, 0, SynchronizationSourceIdentifier, HighestSequenceNumber + 1, 0, FUI.Concat(nal.Skip(offset).Take(chunk)).ToArray()));

                    //Move the offset
                    offset += chunk;
                }
            } //Should check for first byte to be 1 - 23?
            else Add(new Rtp.RtpPacket(2, false, false, true, PayloadTypeByte, 0, SynchronizationSourceIdentifier, HighestSequenceNumber + 1, 0, nal));
        }

        /// <summary>
        /// Creates <see cref="Buffer"/> with a H.264 RBSP from the contained packets
        /// </summary>
        public virtual void Depacketize() { bool sps, pps, sei, slice, idr; Depacketize(out sps, out pps, out sei, out slice, out idr); }

        /// <summary>
        /// Parses all contained packets and writes any contained Nal Units in the RBSP to <see cref="Buffer"/>.
        /// </summary>
        /// <param name="containsSps">Indicates if a Sequence Parameter Set was found</param>
        /// <param name="containsPps">Indicates if a Picture Parameter Set was found</param>
        /// <param name="containsSei">Indicates if Supplemental Enhancement Information was found</param>
        /// <param name="containsSlice">Indicates if a Slice was found</param>
        /// <param name="isIdr">Indicates if a IDR Slice was found</param>
        public virtual void Depacketize(out bool containsSps, out bool containsPps, out bool containsSei, out bool containsSlice, out bool isIdr)
        {
            containsSps = containsPps = containsSei = containsSlice = isIdr = false;

            DisposeBuffer();

            this.Buffer = new MemoryStream();

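            //Get all packets in the frame, accumulating the flags across packets
            foreach (Rtp.RtpPacket packet in m_Packets.Values.Distinct())
            {
                bool sps, pps, sei, slice, idr;
                ProcessPacket(packet, out sps, out pps, out sei, out slice, out idr);
                containsSps |= sps; containsPps |= pps; containsSei |= sei; containsSlice |= slice; isIdr |= idr;
            }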

            //Order by DON?
            this.Buffer.Position = 0;
        }

        /// <summary>
        /// Depacketizes a single packet.
        /// </summary>
        /// <param name="packet"></param>
        /// <param name="containsSps"></param>
        /// <param name="containsPps"></param>
        /// <param name="containsSei"></param>
        /// <param name="containsSlice"></param>
        /// <param name="isIdr"></param>
        internal protected virtual void ProcessPacket(Rtp.RtpPacket packet, out bool containsSps, out bool containsPps, out bool containsSei, out bool containsSlice, out bool isIdr)
        {
            containsSps = containsPps = containsSei = containsSlice = isIdr = false;

            //Starting at offset 0
            int offset = 0;

            //Obtain the data of the packet (without source list or padding)
            byte[] packetData = packet.Coefficients.ToArray();

            //Cache the length
            int count = packetData.Length;

            //Must have at least 2 bytes
            if (count <= 2) return;

            //Determine if the forbidden bit is set and the type of nal from the first byte
            byte firstByte = packetData[offset];

            //bool forbiddenZeroBit = ((firstByte & 0x80) >> 7) != 0;

            byte nalUnitType = (byte)(firstByte & Common.Binary.FiveBitMaxValue);

            //o  The F bit MUST be cleared if all F bits of the aggregated NAL units are zero; otherwise, it MUST be set.
            //if (forbiddenZeroBit && nalUnitType <= 23 && nalUnitType > 29) throw new InvalidOperationException("Forbidden Zero Bit is Set.");

            //Determine what to do
            switch (nalUnitType)
            {
                //Reserved - Ignore
                case 0:
                case 30:
                case 31:
                    {
                        return;
                    }
                case 24: //STAP - A
                case 25: //STAP - B
                case 26: //MTAP - 16
                case 27: //MTAP - 24
                    {
                        //Move to Nal Data
                        ++offset;

                        //Todo Determine if need to Order by DON first.
                        //EAT DON for ALL BUT STAP - A
                        if (nalUnitType != 24) offset += 2;

                        //Consume the rest of the data from the packet
                        while (offset < count)
                        {
                            //Determine the nal unit size, which includes the nal header
                            int tmp_nal_size = Common.Binary.Read16(packetData, offset, BitConverter.IsLittleEndian);
                            offset += 2;

                            //If the nal had data then write it
                            if (tmp_nal_size > 0)
                            {
                                //For DOND and TSOFFSET
                                switch (nalUnitType)
                                {
                                    case 26:// MTAP - 16
                                        {
                                            //SKIP DOND (1 byte) and TSOFFSET (2 bytes)
                                            offset += 3;
                                            goto default;
                                        }
                                    case 27:// MTAP - 24
                                        {
                                            //SKIP DOND (1 byte) and TSOFFSET (3 bytes)
                                            offset += 4;
                                            goto default;
                                        }
                                    default:
                                        {
                                            //Read the nal header but don't move the offset
                                            byte nalHeader = (byte)(packetData[offset] & Common.Binary.FiveBitMaxValue);

                                            if (nalHeader > 5)
                                            {
                                                if (nalHeader == 6)
                                                {
                                                    Buffer.WriteByte(0);
                                                    containsSei = true;
                                                }
                                                else if (nalHeader == 7) //SPS
                                                {
                                                    Buffer.WriteByte(0);
                                                    containsSps = true;
                                                }
                                                else if (nalHeader == 8) //PPS
                                                {
                                                    Buffer.WriteByte(0);
                                                    containsPps = true;
                                                }
                                            }

                                            if (nalHeader == 1) containsSlice = true;

                                            if (nalHeader == 5) isIdr = true;

                                            //Done reading
                                            break;
                                        }
                                }

                                //Write the start code
                                Buffer.Write(NalStart, 0, 3);

                                //Write the nal header and data
                                Buffer.Write(packetData, offset, tmp_nal_size);

                                //Move the offset past the nal
                                offset += tmp_nal_size;
                            }
                        }

                        return;
                    }
                case 28: //FU - A
                case 29: //FU - B
                    {
                        /*
                         Informative note: When an FU-A occurs in interleaved mode, it
                         always follows an FU-B, which sets its DON.
                         * Informative note: If a transmitter wants to encapsulate a single
                          NAL unit per packet and transmit packets out of their decoding
                          order, STAP-B packet type can be used.
                         */
                        //Need 2 bytes
                        if (count > 2)
                        {
                            //Read the Header
                            byte FUHeader = packetData[++offset];

                            bool Start = ((FUHeader & 0x80) >> 7) > 0;

                            //bool End = ((FUHeader & 0x40) >> 6) > 0;

                            //bool Receiver = (FUHeader & 0x20) != 0;

                            //if (Receiver) throw new InvalidOperationException("Receiver Bit Set");

                            //Move to data
                            ++offset;

                            //Todo Determine if need to Order by DON first.
                            //DON Present in FU - B
                            if (nalUnitType == 29) offset += 2;

                            //Determine the fragment size
                            int fragment_size = count - offset;

                            //If the size was valid
                            if (fragment_size > 0)
                            {
                                //If the start bit was set
                                if (Start)
                                {
                                    //Reconstruct the nal header
                                    //Use the first 3 bits of the first byte and last 5 bits of the FU Header
                                    byte nalHeader = (byte)((firstByte & 0xE0) | (FUHeader & Common.Binary.FiveBitMaxValue));

                                    //The nal unit type is only the low 5 bits of the reconstructed header
                                    byte nalType = (byte)(nalHeader & Common.Binary.FiveBitMaxValue);

                                    //Could have been SPS / PPS / SEI
                                    if (nalType > 5)
                                    {
                                        if (nalType == 6)
                                        {
                                            Buffer.WriteByte(0);
                                            containsSei = true;
                                        }
                                        else if (nalType == 7) //SPS
                                        {
                                            Buffer.WriteByte(0);
                                            containsSps = true;
                                        }
                                        else if (nalType == 8) //PPS
                                        {
                                            Buffer.WriteByte(0);
                                            containsPps = true;
                                        }
                                    }

                                    if (nalType == 1) containsSlice = true;

                                    if (nalType == 5) isIdr = true;

                                    //Write the start code
                                    Buffer.Write(NalStart, 0, 3);

                                    //Write the reconstructed header
                                    Buffer.WriteByte(nalHeader);
                                }

                                //Write the data of the fragment.
                                Buffer.Write(packetData, offset, fragment_size);
                            }
                        }
                        return;
                    }
                default:
                    {
                        // 6 SEI, 7 and 8 are SPS and PPS
                        if (nalUnitType > 5)
                        {
                            if (nalUnitType == 6)
                            {
                                Buffer.WriteByte(0);
                                containsSei = true;
                            }
                            else if (nalUnitType == 7) //SPS
                            {
                                Buffer.WriteByte(0);
                                containsSps = true;
                            }
                            else if (nalUnitType == 8) //PPS
                            {
                                Buffer.WriteByte(0);
                                containsPps = true;
                            }
                        }

                        if (nalUnitType == 1) containsSlice = true;

                        if (nalUnitType == 5) isIdr = true;

                        //Write the start code
                        Buffer.Write(NalStart, 0, 3);

                        //Write the nal header and data
                        Buffer.Write(packetData, offset, count - offset);

                        return;
                    }
            }
        }

        internal void DisposeBuffer()
        {
            if (Buffer != null)
            {
                Buffer.Dispose();
                Buffer = null;
            }
        }

        public override void Dispose()
        {
            if (Disposed) return;
            base.Dispose();
            DisposeBuffer();
        }

        //To go to an Image...
        //Look for a SliceHeader in the Buffer
        //Decode Macroblocks in Slice
        //Convert Yuv to Rgb
    }

There are also implementations of various other RFCs, which help with getting the media to play in a MediaElement or in other software, or with just saving it to disk.

Writing to a container format is underway.

Author by

Admin

Updated on July 11, 2022

Comments

  • Admin
    Admin almost 2 years

    I set profile_idc, level_idc, extradata and extradata_size of AVCodecContext from the profile-level-id and sprop-parameter-sets of the SDP.

    I decode the Coded Slice, SPS, PPS and NAL_IDR_SLICE packets separately:

    Init:

    uint8_t start_sequence[]= {0, 0, 1};
    int size= recv(id_de_la_socket,(char*) rtpReceive,65535,0);

    Coded Slice :

    char *z = new char[size-16+sizeof(start_sequence)];
        memcpy(z,&start_sequence,sizeof(start_sequence));
        memcpy(z+sizeof(start_sequence),rtpReceive+16,size-16);
        ConsumedBytes = avcodec_decode_video(codecContext,pFrame,&GotPicture,(uint8_t*)z,size-16+sizeof(start_sequence));
        delete[] z;
    

    Result: ConsumedBytes >0 and GotPicture >0 (often)

    SPS and PPS :

    identical code. Result: ConsumedBytes >0 and GotPicture =0

    It's normal I think

    When I find a new SPS/PPS pair, I update extradata and extradata_size with the payloads of these packets and their size.

    NAL_IDR_SLICE :

    The NAL unit type is 28 => IDR frames are fragmented, therefore I tried two methods to decode them:

    1) I prefix the first fragment (without RTP header) with the sequence 0x000001 and send it to avcodec_decode_video. Then I send the rest of fragments to this function.

    2) I prefix the first fragment (without RTP header) with the sequence 0x000001 and concatenate the rest of fragments to it. I send this buffer to decoder.

    In both cases, I have no error (ConsumedBytes >0) but I detect no frame (GotPicture = 0) ...

    What is the problem ?

  • Scott
    Scott almost 14 years
    I don't have enough karma to comment on your question or answer below, but are you appending the NALU startcode before to EACH 'fragment'?
  • Cipi
    Cipi almost 14 years
    You don't need to do that... Fragments are parts of one IDR. NALU is transmitted only in first fragment, not each one. To decode it, you totally don't need to add no start code, because NAL unit defines the H264 payload that follows it (lower 5 bits do that).
  • Admin
    Admin over 13 years
    This library can build it itself, but I build it myself. With ffmpeg, these parameters are stored in a structure (AVCodecContext). I will try building the AVCDCR with your method. thx
  • Cipi
    Cipi over 13 years
    Ok, then you are not reconstructing IDR like you should... check once more the process. Hope I helped... =)
  • Admin
    Admin over 13 years
    It's good: ACDR is recognized by the decoder and parameters are set. Decoder does not decode the rest but it is due to another parameters of ffmpeg I think. I thank you for your help : I have already made significant progress.
  • Alexander Olsson
    Alexander Olsson almost 12 years
    This is a really good answer, unfortunately you wrote the second byte of the FU-A incorrect. It should be [ START | END | RESERVED | TYPE ] that is, END and RESERVED should change places. See RFC3984 (ietf.org/rfc/rfc3984.txt).
  • Cipi
    Cipi almost 12 years
    Yes I see, thank you for the comment! I did get the start_bit and end_bit bit masking right though... :P
  • Frank
    Frank over 8 years
    @Cipi, I read that "If you need some code, just ask in comment, I'll post it, but I think this is pretty clear how to do..." May I ask you to post the code for the reconstruction of the IDR? Thank you very much.