The Composition Time (CTS) when wrapping H.264 NALUs


Solution 1

For MPEG-4 H.264 transcoders that deliver I-frame, P-frame, and B-frame NALUs inside an MPEG-2 transport stream, the resulting packetized elementary stream (PES) packets are timestamped with presentation time stamps (PTS) and decoding time stamps (DTS) in units of 1/90000 of a second (a 90 kHz clock).
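A minimal Python sketch of how a 33-bit PTS or DTS can be pulled out of a PES header, assuming the 5-byte field layout from ISO 13818-1 (a 4-bit prefix, then the timestamp split into 3, 15 and 15 bits, each followed by a marker bit). The function names are hypothetical, the `0b0010` prefix is assumed for a PTS-only header, and locating the field inside the PES header (via the PTS_DTS_flags) is left out.

```python
TICKS_PER_SECOND = 90_000  # MPEG-2 system clock used for PTS/DTS


def decode_pes_timestamp(field: bytes) -> int:
    """Decode a 33-bit PTS or DTS from its 5-byte PES header field.

    Layout: prefix(4) TS[32..30](3) marker(1) | TS[29..15](15) marker(1)
            | TS[14..0](15) marker(1)
    """
    if len(field) != 5:
        raise ValueError("a PTS/DTS field is exactly 5 bytes")
    return ((((field[0] >> 1) & 0x07) << 30)
            | (field[1] << 22)
            | (((field[2] >> 1) & 0x7F) << 15)
            | (field[3] << 7)
            | ((field[4] >> 1) & 0x7F))


def encode_pes_timestamp(ts: int, prefix: int = 0b0010) -> bytes:
    """Inverse of decode_pes_timestamp; all marker bits are set to 1.

    prefix 0b0010 is assumed here (PTS-only header); hypothetical helper.
    """
    ts &= (1 << 33) - 1  # PTS/DTS are 33-bit values
    return bytes([
        (prefix << 4) | (((ts >> 30) & 0x07) << 1) | 1,
        (ts >> 22) & 0xFF,
        (((ts >> 15) & 0x7F) << 1) | 1,
        (ts >> 7) & 0xFF,
        ((ts & 0x7F) << 1) | 1,
    ])


field = encode_pes_timestamp(900_000)
print(decode_pes_timestamp(field) / TICKS_PER_SECOND)  # prints 10.0
```

Dividing the decoded tick count by 90000 gives seconds; dividing by 90 gives the milliseconds that FLV uses.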

The NALUs arrive in DTS (decoding) order, in a repeating pattern like

I P B B B P B B B ...  

where the intended playback rendering is

I B B B P B B B P ... 

(This transport strategy ensures that both frames that the B-frame bridges are in the decoder before the B-frame is processed.)
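The reordering can be illustrated with a small Python sketch: sorting the decode-order frames by PTS yields the rendering order. The 30 fps frame duration (3000 ticks) and the one-frame PTS offset are assumptions chosen only so the example has concrete numbers with PTS >= DTS; `Frame` and `presentation_order` are hypothetical names.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    kind: str  # "I", "P" or "B"
    dts: int   # decoding time stamp, 90 kHz ticks
    pts: int   # presentation time stamp, 90 kHz ticks


def presentation_order(frames_in_decode_order: list[Frame]) -> list[Frame]:
    """Return the frames sorted into rendering order (by PTS)."""
    return sorted(frames_in_decode_order, key=lambda f: f.pts)


FRAME = 3000  # assumed: 90000 / 30 fps
# (kind, display index) listed in the decode order I P B B B P B B B
decode_order = [("I", 0), ("P", 4), ("B", 1), ("B", 2), ("B", 3),
                ("P", 8), ("B", 5), ("B", 6), ("B", 7)]
frames = [Frame(kind, dts=i * FRAME, pts=(disp + 1) * FRAME)
          for i, (kind, disp) in enumerate(decode_order)]

print("".join(f.kind for f in presentation_order(frames)))  # IBBBPBBBP
```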

For FLV, the Timestamp (FLV spec p.69) tells when the frame should be fed to the decoder in milliseconds, which is

timestamp = DTS / 90.0

The CompositionTime (FLV spec p.72) tells the renderer when to present ("compose") the video frame on the display device, as an offset in milliseconds after the frame enters the decoder; thus it is

compositionTime = (PTS - DTS) / 90.0 

(Because the PTS >= DTS, this delta is never negative.)
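The two formulas can be wrapped in a small helper. This is a sketch, not the spec's method: FLV stores both fields as whole milliseconds, so it rounds PTS and DTS separately and takes the difference, which keeps Timestamp + CompositionTime equal to the rounded PTS (rounding (PTS - DTS) / 90.0 directly can drift by 1 ms). The function name `flv_times` is hypothetical, and 33-bit PTS/DTS wraparound is not handled.

```python
def flv_times(pts: int, dts: int) -> tuple[int, int]:
    """Map 90 kHz PTS/DTS to FLV (Timestamp, CompositionTime) in ms."""
    timestamp = round(dts / 90.0)                     # when to feed the decoder
    composition_time = round(pts / 90.0) - timestamp  # offset until display
    return timestamp, composition_time


# hypothetical P-frame: decoded at 3000 ticks, displayed at 15000 ticks
print(flv_times(pts=15000, dts=3000))  # (33, 134)
```

Here (15000 - 3000) / 90.0 is 133.3, but 33 + 133 would land 1 ms before the rounded PTS of 167 ms, which is why the sketch subtracts the rounded values.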

Solution 2

I think I have understood the CTS. It is only for B-frames. Because B-frames may depend on forward frames to be decoded, the CTS means when this B-frame can be decoded, which usually means all the frames it depends on have been received.

Author: Mr.Wang from Next Door

Updated on June 04, 2022

Comments

  • Mr.Wang from Next Door (about 2 years)

    The H.264 hardware compression card produces NALUs from captured video.

    I am trying to wrap the NALUs into FLV, and I have almost succeeded.

    I don't know how to fill in the CompositionTime field in FLV for each NALU.

    According to the FLV spec (http://download.macromedia.com/f4v/video_file_format_spec_v10_1.pdf), section E.4.3.1:

    CompositionTime Composition time offset

    See ISO 14496-12, 8.15.3 for an explanation of composition times. The offset in an FLV file is always in milliseconds

    Then, looking at ISO 14496-12, 8.15.3 (pages 24 and 26):

    provides the offset between decoding time and composition time. Since decoding time must be less than the composition time, the offsets are expressed as unsigned numbers such that CT(n) = DT(n) + CTTS(n) where CTTS(n) is the (uncompressed) table entry for sample n.

    How can I know the DT and CTTS for each NALU? Or how can I calculate the CT without DT and CTTS?

    Thank you

    • Mr.Wang from Next Door
      When storing a video stream with B-frames, the PTS (presentation timestamp) may be larger than the DTS (decoding timestamp). This happens because a B-frame requires frames that follow it to be decoded first.
  • melih (over 11 years)
    What about the frame type? Are you also setting the frame type to 2 while streaming B-frames?
  • TOP (over 8 years)
    How can we calculate the DTS of a frame in a NAL stream?
  • kippsoftware (over 8 years)
    Both PTS and DTS are encoded in the PES packet as defined in Table 2-21 of ISO 13818-1 (MPEG-2 specification).
  • ZijingWu (almost 8 years)
    The compositionTime is an unsigned int in MP4 but a signed int in FLV, so in FLV it is possible that PTS <= DTS (a zero or negative offset).
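Because FLV stores CompositionTime as a signed 24-bit big-endian field (SI24) in the AVC video packet header, a negative offset has to be written in two's complement. A minimal Python sketch, assuming the FLV spec's values FrameType 1 (keyframe) / 2 (inter frame), CodecID 7 (AVC) and AVCPacketType 1 (NALU); the helper names are hypothetical, and the length-prefixed NALU payload that follows the header is not shown.

```python
def si24(value: int) -> bytes:
    """Encode a signed 24-bit big-endian integer (FLV SI24)."""
    if not -(1 << 23) <= value < (1 << 23):
        raise ValueError("CompositionTime out of SI24 range")
    return (value & 0xFFFFFF).to_bytes(3, "big")  # two's complement


def avc_video_tag_header(keyframe: bool, composition_time: int) -> bytes:
    """First 5 bytes of an FLV VIDEODATA body carrying AVC NALUs."""
    frame_type = 1 if keyframe else 2  # 1 = keyframe, 2 = inter frame
    codec_id = 7                       # 7 = AVC
    avc_packet_type = 1                # 1 = AVC NALU
    return bytes([(frame_type << 4) | codec_id, avc_packet_type]) + si24(composition_time)


print(avc_video_tag_header(False, 134).hex())  # 2701000086
print(si24(-40).hex())                         # ffffd8
```

Reading the field back with `int.from_bytes(data, "big", signed=True)` recovers the negative value.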