Using cURL to download a web stream

> how I can force cURL to attempt to "reconnect" and keep parsing the stream into the same file, even though it thinks it's finished

A general way (not specific to curl) is:

while true; do curl -o - … ; done >file

The point is that curl writes to stdout. We redirect the output of the entire loop to a file, no matter how many restarts it takes. This way the output from multiple consecutive curl processes is concatenated and goes into a single file.
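
For example, a minimal sketch of the idea (the placeholder URL, the -sS flag and the one-second pause between restarts are additions for illustration, not part of the original one-liner):

while true; do
   curl -sS -o - 'http://example.com/stream'   # write the stream to stdout; the URL is a placeholder
   sleep 1                                     # short pause before reconnecting, so a dead server is not hammered
done >audio_feed.aac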

To run this for 12 hours:

timeout 12h sh -c 'while true; do curl -o - … ; done >file'
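
If the capture is to run unattended (e.g. started over SSH and left alone for the full 12 hours), the same command can be detached from the terminal; nohup and the trailing & are additions here, not part of the original answer:

nohup timeout 12h sh -c 'while true; do curl -o - … ; done >file' &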

To run non-stop and create a new file every 12 hours:

while true; do
   timeout 12h sh -c 'while true; do curl -o - … ; done >"audio_feed_$(date +%Y-%m-%d_%T)"'
done

To start a new file on demand, just kill the current timeout process or the sh that is its child.
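
For instance, something along these lines should work (a sketch; pkill -f matches against the full command line, so adjust the pattern to whatever command you actually run):

pkill -TERM -f 'timeout 12h sh -c'   # ends the current 12-hour capture; the outer loop then starts a new, date-stamped file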


Note: I don't know if a stream with a few seconds missing from time to time, yet concatenated, will result in a playable file. I expect protocols/containers/codecs designed to be streamed over a network in real time to be able to re-synchronize after a missing fragment; this should work regardless of whether the data comes from a server or from a file. If the stream you want to capture is like this, then you will experience "missing content" during later playback (at timestamps where one curl exited and the next one had not yet taken over), but this "hiccup" should not stop a player that really knows how to handle such a stream.
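
If a player does choke on the gaps, remuxing the capture into a clean container may help. A hedged sketch, assuming the capture is a raw AAC (ADTS) stream and that ffmpeg is available (neither is stated above):

ffmpeg -i audio_feed.aac -c:a copy audio_feed.m4a   # repackage the AAC data into an MP4 container without re-encoding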

Comments

  • gunter
    gunter over 1 year

    I am trying to download a streaming audio feed from an online radio station. The station used to operate an MP3 Shoutcast feed, but has now upgraded to an AAC HTTP audio feed.

    I used to use "streamripper" in the terminal to rip the station for my car rides, but now streamripper fails to rip the new stream. I'm pretty sure the station is now using HTTP chunked transfer encoding for its stream, which streamripper does not support.

    I have come up with a new solution: I isolated the audio feed from the station's web player, and I am using cURL to rip the feed into an audio file I can take with me.

    However, I constantly get "completions" on my cURL transfer when it should record endlessly. I have even set the max-time parameter to 43200 s (12 hours), but I just end up with files of varying sizes. Usually each resulting file is no longer than 1 hour, though sometimes it is longer; the file sizes and durations differ every time. The file "breaks/completes" after a short period and I have to use a script to restart the cURL recording, so I end up with a large folder of partial recordings when I should have just 2 recordings per day (one every 12 hours). When I look at the verbose output of the cURL transfer, it just ends with "connection left intact". There is no error in the cURL log, so I am not sure how I can force cURL to attempt to "reconnect" and keep parsing the stream into the same file, even though it thinks it's finished.

    I have also tried "wget" and "JDownloader"; they both have the same result, finishing after a short amount of time.

    I am not sure what I can do to essentially force a reconnect and keep downloading into the same file without overwriting it.

    What can I do to make sure my recordings don't "break"? Is there anything I can do to force a reconnect? Or perhaps there is some way to tell cURL to wait even if the transfer speed drops to 0?

    Any thoughts would be highly appreciated.

    Thank you

    • Spiff
      Spiff over 4 years
      Can you post a link to the main page or the main stream URL for this station? Several modern HTTP-based streaming formats, such as HLS and DASH, are published as a series of chunks rather than as a single continuous file containing the whole program.
    • gunter
      gunter over 4 years
      (zas4.ndx.co.za/proxy/bushradio?mp=/stream) is the URL of the direct audio feed. My recording durations vary: sometimes it is 10 minutes, sometimes 3 hours, but I can never get a 12-hour file. If there is any information you can provide regarding the stream, that would be great! I would love to get a continuous feed without breaks.
  • gunter
    gunter over 4 years
    I currently use "cat" in the terminal to join all my file parts into one file to take with me, and I just try to ignore the gaps in the audio. Would the process you described not result in something similar? I already have a script that throws cURL into a loop and makes it run endlessly when it finishes. I do like your method because it removes the need for "cat", which is still very useful. But is there no way to salvage the connection in cURL somehow? Establishing a new connection creates too much of a gap. Perhaps I can make it download the stream in chunks instead, allowing more of a buffer?
  • Kamil Maciorowski
    Kamil Maciorowski over 4 years
    @gunter Yes, my method is like cat, but it works on the fly. This way you don't waste time, disk space and disk operations reading several files and writing one big file. At the moment I cannot help you with curl itself.
  • gunter
    gunter over 4 years
    Thank you for your suggestion