C# - Why are ASCII values of a byte different when cast as Int32?
I'm in the process of creating a program to scrub extended ASCII characters from text documents. While trying to understand how C# interprets different character sets and codes, I'm noticing some oddities.

Consider the following:
    using System;
    using System.Text;

    namespace ASCIITest
    {
        class Program
        {
            static void Main(string[] args)
            {
                string value = "Slide™1½”C4®";
                byte[] asciiValue = Encoding.ASCII.GetBytes(value); // byte array
                char[] array = value.ToCharArray();                 // char array
                Console.WriteLine("CHAR\tBYTE\tINT32");
                for (int i = 0; i < array.Length; i++)
                {
                    char letter = array[i];
                    byte byteValue = asciiValue[i];
                    Int32 int32Value = array[i];
                    Console.WriteLine("{0}\t{1}\t{2}", letter, byteValue, int32Value);
                }
                Console.ReadLine();
            }
        }
    }
Output of the program:
    CHAR    BYTE    INT32
    S       83      83
    l       108     108
    i       105     105
    d       100     100
    e       101     101
    ™       63      8482   <- trademark symbol
    1       49      49
    ½       63      189    <- fraction
    ”       63      8221   <- smart quote
    C       67      67
    4       52      52
    ®       63      174    <- registered trademark symbol
In particular, I'm trying to understand why the extended ASCII characters (the ones with notes added to the right of the third column) show the correct value when cast to Int32, but show 63 when cast to byte. What's going on here?
The Encoding.ASCII.GetBytes conversion replaces all characters outside of the ASCII range (0-127) with a question mark (code 63).

So, since the string contains characters outside of that range, asciiValue holds ? instead of the interesting symbols: the char (Unicode) representation of ™, 8482, is indeed outside of the 0-127 range.

Converting the string to a char array does not modify the values of the characters, so they still have their original Unicode codes (a char is a 16-bit UTF-16 code unit); casting to a longer integer type like Int32 does not change the value.
Below are the possible conversions of a character to bytes/integers:

    var value = "™";
    var ascii = Encoding.ASCII.GetBytes(value)[0]; // 63 ('?') - outside 0-127 range
    var castToByte = (byte)(value[0]);             // 34 = 8482 % 256
    var int16 = (Int16)value[0];                   // 8482
    var int32 = (Int32)value[0];                   // 8482
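Note that the raw (byte) cast simply keeps the low 8 bits of the UTF-16 code unit: ™ is U+2122, and 0x2122 & 0xFF = 0x22, which is 34, the code for the plain double quote character. That is why the byte cast produces a seemingly unrelated value rather than a question mark.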
More details are available in the ASCIIEncoding class documentation:

ASCIIEncoding corresponds to the Windows code page 20127. Because ASCII is a 7-bit encoding, ASCII characters are limited to the lowest 128 Unicode characters, from U+0000 to U+007F. If you use the default encoder returned by the Encoding.ASCII property or the ASCIIEncoding constructor, characters outside that range are replaced with a question mark (?) before the encoding operation is performed.
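Since the original goal is to scrub extended characters from text documents, here is a minimal sketch of one way to do that, assuming "scrub" means dropping every character above code 127 entirely (the helper name ScrubToAscii is made up for illustration):

    using System;
    using System.Linq;

    class AsciiScrubber
    {
        // Hypothetical helper: keeps only characters in the 7-bit ASCII
        // range (0-127) and drops everything else, instead of letting
        // Encoding.ASCII replace them with '?'.
        static string ScrubToAscii(string input)
        {
            return new string(input.Where(c => c <= 127).ToArray());
        }

        static void Main()
        {
            Console.WriteLine(ScrubToAscii("Slide™1½”C4®")); // prints Slide1C4
        }
    }

If you would rather fail loudly than silently drop or replace characters, Encoding.GetEncoding("us-ascii", new EncoderExceptionFallback(), new DecoderExceptionFallback()) returns an ASCII encoding that throws on any out-of-range character instead.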