Flutter http response.body bad utf8 encoding
Solution 1
I just do:
utf8.decode(response.bodyBytes);
Even if you are getting JSON:
jsonDecode(utf8.decode(response.bodyBytes))
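A minimal, network-free sketch of why this works, using only dart:convert. The same UTF-8 bytes turn into mojibake when read as Latin-1 (the default response.body uses when the server sends no charset, as the next answer explains) but come out right when decoded as UTF-8. The helper names are hypothetical; in a Flutter app you would pass response.bodyBytes from package:http.

```dart
import 'dart:convert';

/// Hypothetical helper: decode raw response bytes as UTF-8, then parse JSON.
/// In an app, pass `response.bodyBytes` here instead of using `response.body`.
dynamic decodeJsonBody(List<int> bodyBytes) =>
    jsonDecode(utf8.decode(bodyBytes));

/// Hypothetical helper showing what happens when UTF-8 bytes are read as Latin-1.
String decodeAsLatin1(List<int> bodyBytes) => latin1.decode(bodyBytes);
```

For example, decodeAsLatin1(utf8.encode('mangá')) yields 'mangÃ¡', because the two UTF-8 bytes of á (0xC3 0xA1) are read as two separate Latin-1 characters, while decodeJsonBody(utf8.encode('{"title":"mangá"}'))['title'] yields 'mangá'.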
Solution 2
Solution 1
In the absence of a defined charset, HTTP content is assumed to be encoded in ISO-8859-1 (Latin-1), as per RFC 2616, and response.body, according to its documentation, is consistent with this behaviour. If the server response sets the Content-Type header to application/json; charset=utf-8, body should work as expected.
The problem, of course, is that some servers do not set a charset for JSON (which is valid), and this falls into a grey area between the two specs:
JSON is always supposed to be UTF-8, and for that reason its spec says you don't need to set a charset, but HTTP defaults to ISO-8859-1 unless the charset is explicitly set. A "smart" HTTP client could choose to follow the JSON definition more closely than the HTTP definition and simply treat any application/json body as UTF-8 by default, technically violating the HTTP standard. However, the most robust solution is ultimately for the server to state the charset explicitly, which is valid according to both standards.
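The "smart" client behaviour described above can be sketched with dart:io's ContentType parser. This is an illustration under stated assumptions, not a description of what package:http does: the function names pickEncoding and decodeBody are hypothetical, and the policy (explicit charset first, UTF-8 for application/json, Latin-1 otherwise) is the one the paragraph describes.

```dart
import 'dart:convert';
import 'dart:io';

/// Hypothetical "smart" charset choice (assumed policy, not package:http's):
/// 1. honour an explicit charset parameter,
/// 2. otherwise treat application/json as UTF-8 (the JSON rule),
/// 3. otherwise fall back to Latin-1 (the RFC 2616 HTTP default).
Encoding pickEncoding(String? contentTypeHeader) {
  if (contentTypeHeader == null) return latin1;
  final type = ContentType.parse(contentTypeHeader);
  final charset = type.charset;
  if (charset != null) {
    // Unknown charset names also fall back to Latin-1.
    return Encoding.getByName(charset) ?? latin1;
  }
  if (type.primaryType == 'application' && type.subType == 'json') {
    return utf8;
  }
  return latin1;
}

/// Hypothetical helper: decode body bytes with the encoding chosen above.
String decodeBody(List<int> bodyBytes, String? contentTypeHeader) =>
    pickEncoding(contentTypeHeader).decode(bodyBytes);
```

With this policy, a JSON response without a charset decodes correctly, while a text/html response without a charset still follows the HTTP default.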
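import 'dart:convert';
import 'dart:io';
// host, path and jsonData are placeholders for your own server and payload.
Future<void> postJson(String host, String path, Object jsonData) async {
  final request = await HttpClient().post(host, 4049, path)
    ..headers.contentType = ContentType.json // application/json; charset=utf-8
    ..write(jsonEncode(jsonData));
  final response = await request.close();
  await response.transform(utf8.decoder).forEach(print); // decode the body as UTF-8
}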
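The transform(utf8.decoder) step matters because the response arrives as a stream of byte chunks, and a multi-byte character such as á can be split across two chunks. A minimal sketch that collects the whole body into one String; readUtf8 is a hypothetical name, and since HttpClientResponse is a Stream<List<int>>, it can be passed in directly.

```dart
import 'dart:convert';

/// Hypothetical helper: decode a byte stream (e.g. an HttpClientResponse)
/// as UTF-8 and join the decoded pieces into a single String.
/// utf8.decoder keeps state between chunks, so a character whose bytes
/// are split across two chunks is still decoded correctly.
Future<String> readUtf8(Stream<List<int>> byteStream) =>
    byteStream.transform(utf8.decoder).join();
```

Decoding each chunk separately with utf8.decode would instead fail on the chunk that ends halfway through a character.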
Solution 2 (Flutter)
Use replaceAll on response.body to strip the � character (replaceAll returns a new String):
final newString = response.body.replaceAll('�', '');
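Note that replaceAll only runs once a String exists; if reading response.body itself throws the FormatException, as the asker reports in the comments, this line never gets that far. One alternative not in the original answer is to decode response.bodyBytes with dart:convert's allowMalformed flag, which substitutes U+FFFD (�) for invalid byte sequences instead of throwing, and then strip those characters. A sketch; decodeLenient is a hypothetical name.

```dart
import 'dart:convert';

/// Hypothetical helper: decode bytes that may contain invalid UTF-8.
/// allowMalformed: true makes the decoder emit U+FFFD (shown as �)
/// for bad sequences instead of throwing a FormatException.
String decodeLenient(List<int> bodyBytes, {bool stripReplacement = true}) {
  final text = utf8.decode(bodyBytes, allowMalformed: true);
  // Optionally drop the replacement characters, as the replaceAll answer does.
  return stripReplacement ? text.replaceAll('\uFFFD', '') : text;
}
```

In the question's getMangaInfo, this would mean calling parse(decodeLenient(response.bodyBytes)) instead of parse(response.body), assuming the rest of the scraper stays the same.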
Solution 3 (PHP)
Use a PHP file to fetch the content from your URL first, then remove the character with str_replace:
<?php
$curlSession = curl_init();
curl_setopt($curlSession, CURLOPT_URL, 'YOUR-URL');
curl_setopt($curlSession, CURLOPT_RETURNTRANSFER, true); // return the body as a string
$jsonData = curl_exec($curlSession);
curl_close($curlSession);
// Strip the � (U+FFFD) replacement character before passing the content on.
echo str_replace("�", "", $jsonData);
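Whichever solution you choose, it can help to look at the bytes the FormatException points to (offset 369 in the question) before deciding to strip them. A hedged sketch using only dart:convert and dart:math; describeUtf8Error is a hypothetical helper, and it relies on the exception's offset field, which may be null, in which case it falls back to 0.

```dart
import 'dart:convert';
import 'dart:math';

/// Hypothetical diagnostic: strictly decode [bytes] as UTF-8 and, if that
/// fails, return the bytes around the reported offset as hex.
/// Returns null when the bytes are valid UTF-8.
String? describeUtf8Error(List<int> bytes, {int context = 3}) {
  try {
    utf8.decode(bytes);
    return null;
  } on FormatException catch (e) {
    final offset = e.offset ?? 0;
    final start = max(0, offset - context);
    final end = min(bytes.length, offset + context + 1);
    final hex = bytes
        .sublist(start, end)
        .map((b) => b.toRadixString(16).padLeft(2, '0'))
        .join(' ');
    return 'invalid UTF-8 near offset $offset: $hex';
  }
}
```

A dump containing something like c3 22 would suggest that a multi-byte character was cut off right before a double quote, which would fit the truncated meta description shown in the question.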
Hope it helps.
Guilherme Salomao
Updated on December 12, 2022
Comments
-
Guilherme Salomao over 1 year
I'm starting to learn Flutter by making my own manga reading app, which scrapes all its data from the website I use the most. My problem is that for one of the mangas I read, I can't scrape the data because of this error: FormatException (FormatException: Bad UTF-8 encoding 0x22 (at offset 369))
My scraper code:
Future<Manga> getMangaInfo(source) async {
  final response = await _client.get(source);
  var manga;
  print(response.body); // error occurs here
  final document = parse(response.body);
  final mangaInfo = document.getElementsByClassName('tamanho-bloco-perfil');
  for (Element infos in mangaInfo) {
    final infoCont = infos.getElementsByClassName('row');
    // get the title (titulo)
    Element tituloCont = infoCont[0];
    final tituloH = tituloCont.getElementsByTagName('h2');
    Element tituloCont2 = tituloH[0];
    String titulo = '[' + tituloCont2.text + ']';
    //print(titulo);
    // get the cover (capa)
    Element capaCont = infoCont[2];
    final capaImg = capaCont.getElementsByTagName('img');
    Element capaCont2 = capaImg[0];
    final capaUrl = capaCont2.attributes['src'];
    // get the most recent chapter (caprecente)
    final capsPorNumero = document.getElementsByClassName('row lancamento-linha');
    final caps = capsPorNumero[0].getElementsByTagName('a');
    Element info = caps[0];
    final numero = info.text.split(' ')[1];
    final capRecenteUrl = info.attributes['href'];
    manga = Manga(null, source, titulo, capaUrl, numero, capRecenteUrl);
  }
  return manga;
}
It is response.body that throws the error. I also tried using response.bodyBytes and decoding it, but I still can't fix it. Here's the link to the page: https://unionleitor.top/perfil-manga/kimetsu-no-yaiba
My guess is that the problem is the � character in the following meta tag in the HTML head:
<meta name="description" content="Kimetsu no Yaiba - Novo mangá sobrenatural da Shonen Jump. O mangá conta a história de Tanjiro, o filho mais velho de uma família que �">
I haven't found a solution yet; maybe I've just been looking in the wrong places. Can anyone help me solve this issue?
Thanks!
-
Guilherme Salomao about 4 years
Thanks for the reply, but they have set the charset to UTF-8 (my fault for not mentioning it here). Here is where they set it: <meta charset="utf-8">
-
Guilherme Salomao about 4 years
I tried that earlier and it still gave me an error, but I will try storing it in a new String and then passing it to parse.
-
Mikel Tawfik about 4 years
@GuilhermeSalomao Then try Solution 3.
-
Guilherme Salomao about 4 years
Thanks for your patience. I'm not familiar with PHP yet, so I'll give it a try later. I just tried replaceAll() again and it still gave the error.
-
Gilson Araujo over 2 years
Works like a charm. OP should mark this as the accepted answer.
-
Shayan about 2 years
What about http.read()? It returns a String, and it's not showing the UTF-8 characters properly.