regex: file_get_contents() and cyrillic

Saturday, 9 May 2015

file_get_contents() and cyrillic

I am trying to fetch the source of a website with file_get_contents(), find something in it with a regex, and display the result on the screen.

The problem is that the data I want to extract is Cyrillic, and the output is just a string of strange characters:

[6]=> array(1) { [0]=> string(83) "ĐĄĐ ĐĐĐĐĐ Đ˘ĐĐ¨ĐĐĐĄĐĐĐ ĐĐĐ ĐĐŁĐĐĐ 28 " } }

I also tried converting the source to UTF-8 (all my files are UTF-8), but I have a lot of these websites, and each of them could use a different encoding.

$source = @file_get_contents($url, false, $context);
$source = iconv(mb_detect_encoding($source), 'UTF-8', $source);

This is what I tried, but it doesn't work.

Setting the source encoding manually in iconv() just changes the characters; the result is still not Cyrillic.

How can I solve this?
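
One possible direction, as a minimal sketch rather than a definitive fix: instead of guessing with mb_detect_encoding(), read the charset each page declares (in its HTTP Content-Type header or a meta tag) and pass that to iconv(). The helper names declared_charset() and to_utf8() are hypothetical, and the sketch assumes the sites declare their charset correctly. The sample output above also looks consistent with UTF-8 bytes being rendered under a Central European single-byte charset, so it may be worth making sure the page that prints the result declares UTF-8 as well.

```php
<?php
// Hypothetical helper: return the charset a page declares, or null if none is found.
// $headers is a list of raw header lines, e.g. the $http_response_header array
// that PHP fills in after file_get_contents() on an HTTP URL.
function declared_charset(array $headers, string $html): ?string
{
    $charset = null;

    // 1. HTTP header, e.g. "Content-Type: text/html; charset=windows-1251".
    //    After redirects there may be several; the last one wins.
    foreach ($headers as $header) {
        if (preg_match('/^Content-Type:.*?charset=["\']?([\w.:-]+)/i', $header, $m)) {
            $charset = strtoupper($m[1]);
        }
    }
    if ($charset !== null) {
        return $charset;
    }

    // 2. <meta charset="..."> or <meta http-equiv=... content="...; charset=...">.
    if (preg_match('/<meta[^>]+charset=["\']?([\w.:-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }

    return null;
}

// Hypothetical helper: convert $html to UTF-8 using the declared charset.
// If nothing is declared, or the conversion fails, the input is returned unchanged.
function to_utf8(string $html, ?string $charset): string
{
    if ($charset === null || $charset === 'UTF-8') {
        return $html;
    }
    $converted = @iconv($charset, 'UTF-8', $html);
    return $converted === false ? $html : $converted;
}

// Demo with an in-memory page instead of a real request:
// the bytes below are the word "Привет" encoded as Windows-1251.
$headers = ['HTTP/1.1 200 OK', 'Content-Type: text/html; charset=windows-1251'];
$source  = "<p>\xCF\xF0\xE8\xE2\xE5\xF2</p>";

$utf8 = to_utf8($source, declared_charset($headers, $source));

// The /u modifier makes the regex treat pattern and subject as UTF-8.
preg_match('/<p>(\p{Cyrillic}+)<\/p>/u', $utf8, $match);
var_dump($match[1] ?? null);
```

With a real request, the same two calls would go right after file_get_contents(), passing $http_response_header as the header list, and the script would send header('Content-Type: text/html; charset=utf-8') before printing anything.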
