Flutter http response.body bad utf8 encoding


Solution 1

I just do:

utf8.decode(response.bodyBytes);

This also works when the response is JSON:

jsonDecode(utf8.decode(response.bodyBytes))
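To see why decoding the raw bytes matters, here is a minimal sketch that uses only dart:convert. The helper name decodeJsonBytes and the sample payload are illustrative assumptions and not part of the http package; bodyBytes stands in for response.bodyBytes.

```dart
import 'dart:convert';

/// Hypothetical helper: decodes raw response bytes as UTF-8, then parses JSON.
Object? decodeJsonBytes(List<int> bodyBytes) =>
    jsonDecode(utf8.decode(bodyBytes));

void main() {
  // Stand-in for response.bodyBytes: a UTF-8 encoded JSON document.
  final bodyBytes = utf8.encode('{"title":"mangá"}');

  // Decoding the same bytes as Latin-1 garbles the non-ASCII character.
  print(latin1.decode(bodyBytes));

  // Decoding as UTF-8 first gives the intended text.
  final data = decodeJsonBytes(bodyBytes) as Map<String, dynamic>;
  print(data['title']);
}
```

The point of the sketch is that the bytes are the same in both cases; only the codec chosen to interpret them changes the result.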

Solution 2

Solution 1

In the absence of a declared charset, an HTTP text body is assumed to be encoded in ISO-8859-1 (Latin-1), and the documented behaviour of body is consistent with this. If the server sets the Content-Type header to application/json; charset=utf-8, body should decode as expected.

The problem, of course, is that some servers do not set a charset for JSON (which is valid), and this falls into a grey area between the two specs:

JSON is always supposed to be UTF-8 and for that reason does not require a charset, but HTTP text defaults to ISO-8859-1 unless a charset is explicitly set. A "smart" HTTP client could follow the JSON definition more closely than the HTTP one and treat any application/json body as UTF-8 by default, technically violating the HTTP standard. The most robust solution, however, is for the server to state the charset explicitly, which is valid under both standards.
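The "smart" client behaviour described above could be sketched roughly as follows. The function name pickEncoding is hypothetical, and the fallback rules are an assumption that mirrors the paragraph rather than the behaviour of any particular HTTP library; ContentType comes from dart:io, so this sketch does not run on the web.

```dart
import 'dart:convert';
import 'dart:io';

/// Hypothetical helper: chooses a codec for a response body from the value
/// of its Content-Type header.
Encoding pickEncoding(String? contentTypeHeader) {
  // No header at all: fall back to the HTTP default for text.
  if (contentTypeHeader == null) return latin1;

  final contentType = ContentType.parse(contentTypeHeader);

  // An explicit, recognised charset always wins.
  final declared = Encoding.getByName(contentType.charset);
  if (declared != null) return declared;

  // JSON without a charset: follow the JSON spec and assume UTF-8.
  if (contentType.mimeType == 'application/json') return utf8;

  // Anything else without a charset: HTTP default.
  return latin1;
}

void main() {
  print(pickEncoding('application/json').name);
  print(pickEncoding('text/html').name);
  print(pickEncoding('text/html; charset=utf-8').name);
}
```

With a helper like this, a client would call pickEncoding(...).decode(response.bodyBytes) instead of relying on response.body.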

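  // Requires: import 'dart:convert'; import 'dart:io';
  // _host, path and jsonData are defined elsewhere by the caller.
  HttpClientRequest request = await HttpClient().post(_host, 4049, path)
    ..headers.contentType = ContentType.json // application/json; charset=utf-8
    ..write(jsonEncode(jsonData));
  HttpClientResponse response = await request.close();
  // Decode the byte stream as UTF-8 regardless of the response headers.
  await response.transform(utf8.decoder).forEach(print);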
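The reason for transforming the stream with utf8.decoder, rather than decoding each chunk on its own, is that a multi-byte character can be split across two network chunks. A minimal sketch, with a hypothetical helper decodeChunks and hand-made chunks standing in for the HttpClientResponse stream:

```dart
import 'dart:convert';

/// Hypothetical helper: decodes a chunked byte stream as UTF-8 text.
Future<String> decodeChunks(Stream<List<int>> chunks) =>
    chunks.transform(utf8.decoder).join();

Future<void> main() async {
  // 'má' in UTF-8 is 0x6D 0xC3 0xA1; the two-byte 'á' is split across chunks.
  final chunks = Stream<List<int>>.fromIterable([
    [0x6D, 0xC3],
    [0xA1],
  ]);
  print(await decodeChunks(chunks));
}
```

The stream decoder carries the incomplete sequence over to the next chunk, which a per-chunk utf8.decode call could not do.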

Solution 2 (flutter)

Use replaceAll to strip the � character from the decoded string:

newString.replaceAll('�', '');
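Since the FormatException in the question is raised while response.body is being decoded, replaceAll on response.body never gets a chance to run. One option, which is not part of the original answer, is to decode bodyBytes with allowMalformed: true so that invalid sequences become U+FFFD, and only then strip them. The helper name decodeLenient and the sample bytes below are assumptions made for illustration.

```dart
import 'dart:convert';

/// Hypothetical helper: decodes bytes as UTF-8 without throwing on invalid
/// sequences, then strips the U+FFFD replacement characters.
String decodeLenient(List<int> bodyBytes) =>
    utf8.decode(bodyBytes, allowMalformed: true).replaceAll('\uFFFD', '');

void main() {
  // 0xC3 starts a two-byte sequence, but the next byte is 0x22 ("),
  // which is not a valid continuation byte.
  final bodyBytes = [...utf8.encode('que '), 0xC3, 0x22, 0x3E];

  try {
    utf8.decode(bodyBytes); // strict decoding rejects the input
  } on FormatException catch (e) {
    print('strict decode failed: ${e.message}');
  }

  print(decodeLenient(bodyBytes));
}
```

In an app this would be called as decodeLenient(response.bodyBytes), with the result passed to parse. The trade-off is that the damaged character is silently dropped.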

Solution 3 (php)

Fetch the content through a PHP file first: pass it your URL, strip the character with str_replace, and have the app read the output of that PHP file instead.

        $curlSession = curl_init();
        curl_setopt($curlSession, CURLOPT_URL, 'YOUR-URL');
        curl_setopt($curlSession, CURLOPT_BINARYTRANSFER, true);
        curl_setopt($curlSession, CURLOPT_RETURNTRANSFER, true);

        $jsonData = curl_exec($curlSession);
        echo $bodytag = str_replace("�", "", $jsonData);

        curl_close($curlSession);

Hope it helps.

Author: Guilherme Salomao

Updated on December 12, 2022

Comments

  • Guilherme Salomao
    Guilherme Salomao over 1 year

    I'm starting to learn Flutter and I'm doing it by making my own manga reading app, in which I scrape all the data from the website I use the most.

    My problem is that there is one manga I read whose data I can't scrape, because of this error:

    FormatException (FormatException: Bad UTF-8 encoding 0x22 (at offset 369))
    

    My scraper code:

        Future<Manga> getMangaInfo(source) async {
          final response = await _client.get(source);
          var manga;
          print(response.body); // error occurs here
          final document = parse(response.body);

          final mangaInfo = document.getElementsByClassName('tamanho-bloco-perfil');
          for (Element infos in mangaInfo) {
            final infoCont = infos.getElementsByClassName('row');

            // get the title
            Element tituloCont = infoCont[0];
            final tituloH = tituloCont.getElementsByTagName('h2');
            Element tituloCont2 = tituloH[0];
            String titulo = '[' + tituloCont2.text + ']';
            //print(titulo);

            // get the cover
            Element capaCont = infoCont[2];
            final capaImg = capaCont.getElementsByTagName('img');
            Element capaCont2 = capaImg[0];
            final capaUrl = capaCont2.attributes['src'];

            // get the most recent chapter
            final capsPorNumero = document.getElementsByClassName('row lancamento-linha');
            final caps = capsPorNumero[0].getElementsByTagName('a');
            Element info = caps[0];
            final numero = info.text.split(' ')[1];
            final capRecenteUrl = info.attributes['href'];

            manga = Manga(null, source, titulo, capaUrl, numero, capRecenteUrl);
          }
          return manga;
        }
    

    The response.body that gives the error

    I also tried using response.bodyBytes and decoding but still can't fix it

    Here's the link to the page: https://unionleitor.top/perfil-manga/kimetsu-no-yaiba

    What I guess is the problem is the � character on the following meta tag on the html head

    <meta name="description" content="Kimetsu no Yaiba - Novo mangá sobrenatural da Shonen Jump. O mangá conta a história de Tanjiro, o filho mais velho de uma família que �">
    

    I haven't found a solution yet; maybe I just looked in the wrong places. Can anyone help me solve this issue?
    Thanks!

  • Guilherme Salomao
    Guilherme Salomao about 4 years
    Thanks for the reply, but they have set the charset to utf-8 (my fault for not including it here). This is where they set it: <meta charset="utf-8">
  • Guilherme Salomao
    Guilherme Salomao about 4 years
    I tried that earlier and it still gave me an error code, but I will try to store it in a new String and then pass it to parse
  • Mikel Tawfik
    Mikel Tawfik about 4 years
    @GuilhermeSalomao then Solution 3
  • Guilherme Salomao
    Guilherme Salomao about 4 years
    Thanks for the patience. I'm not familiar with PHP yet, so I'll give it a try later. I just tried replaceAll() again and it still gave the error.
  • Gilson Araujo
    Gilson Araujo over 2 years
    Works like a charm. OP should mark this as the accepted answer.
  • Shayan
    Shayan about 2 years
    What about http.read()? It returns a String and it's not showing the utf-8 chars properly.