string - PHP: remove "question marks" (�) from badly encoded text
I'm extracting text from a web link with file_get_contents. I have no influence on the text, and some bits of the source code of the page I fetched are malformed, something like:
/$%§&fdsgfkgfd � fdsfdsfs � � --> <h1>m�lll</h1> <h1>m�lll</h1> <h1>m�lll</h1> <h1>m�lll</h1> <h1>m�lll</h1> <h1>m�lll</h1>
or
<<<!-- � födns
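My PHP file is not meant to "be" an HTML file; the page content is just a string I'm dealing with.
I searched the internet, but it is difficult to search for this icon.
I want to remove these characters because they are not necessary. How can I remove them?
PS: I'm not looking at the result through a browser; I var_dump the text in the console.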
Solution:
I use this function first to cast the string to a UTF-8 string:
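    function convtoutf8($str) {
        // mb_detect_encoding() returns the canonical name, e.g. "UTF-8"
        if (mb_detect_encoding($str, "UTF-8, ISO-8859-1, GBK") != "UTF-8") {
            // Note: anything not detected as UTF-8 is assumed to be GBK here
            return iconv("GBK", "UTF-8", $str);
        } else {
            return $str;
        }
    }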
You can discard characters that are not supported by the target encoding with iconv():

    $converted = iconv($input_encoding, $output_encoding . '//IGNORE', $original);
There are 2 drawbacks:
- you need to know the input encoding, and
- as you can read in a user comment in the manual, iconv() has a bug: '//IGNORE' does not work with recent versions of the iconv library.

The suggested workaround (here for UTF-8):

    ini_set('mbstring.substitute_character', 'none');
    $text = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
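To make the effect of that workaround concrete, here is a minimal sketch. The function name strip_invalid_utf8() and the sample string are made up for illustration and are not part of the original answer. It converts UTF-8 to UTF-8 with the substitute character set to 'none', so bytes that are not valid UTF-8 are dropped rather than replaced.

```php
<?php
// Hypothetical helper (name is an assumption): drops bytes that are not valid UTF-8.
function strip_invalid_utf8(string $text): string
{
    $previous = mb_substitute_character();   // remember the current setting
    mb_substitute_character('none');         // drop invalid input instead of emitting "?"
    $clean = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);      // restore the previous setting
    return $clean;
}

// Sample input (assumption): "m", a lone byte 0xFC (ISO-8859-1 "ü"), then "lll".
$text = "m\xFClll";
var_dump(strip_invalid_utf8($text));
```

Unlike the ini_set() form, this sketch restores the previous substitute character afterwards, so other mbstring calls in the same script are not affected.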
However, it is better to attempt to detect the input encoding and then convert the input to the output encoding. This leads to:
    function recode($input, $output_encoding) {
        $input_encoding = mb_detect_encoding($input);
        if ($input_encoding === false) {
            // Detection failed: strip characters that are invalid in the output encoding
            $old_substitute = mb_substitute_character();
            mb_substitute_character('none');
            $converted = mb_convert_encoding($input, $output_encoding, $output_encoding);
            mb_substitute_character($old_substitute);
        } else {
            // Detection succeeded: convert only if the encodings differ
            $converted = ($output_encoding !== $input_encoding)
                ? iconv($input_encoding, $output_encoding, $input)
                : $input;
        }
        return $converted;
    }
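Encoding detection is only a guess. If the fetched pages are known to be either UTF-8 or one specific legacy encoding, a deterministic check may be simpler. The sketch below is an alternative, not the function above. The name to_utf8() and the ISO-8859-1 fallback are assumptions for illustration. It keeps valid UTF-8 untouched and otherwise converts from the assumed fallback encoding, so a character such as "ü" is preserved instead of removed.

```php
<?php
// Hypothetical helper (name and default fallback encoding are assumptions).
function to_utf8(string $input, string $fallback_encoding = 'ISO-8859-1'): string
{
    // Already valid UTF-8: return unchanged.
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }
    // Otherwise treat the bytes as the assumed legacy encoding and convert.
    return mb_convert_encoding($input, 'UTF-8', $fallback_encoding);
}

// Sample input (assumption): "mülll" encoded as ISO-8859-1, i.e. with the byte 0xFC.
var_dump(to_utf8("m\xFClll") === "m\xC3\xBClll");
```

This only gives sensible results when the fallback encoding really matches the source; with the wrong fallback the output is valid UTF-8 but shows the wrong characters.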