C# - Why are ASCII values of a byte different when cast as Int32?


I'm in the process of creating a program to scrub extended ASCII characters from text documents. I'm trying to understand how C# is interpreting the different character sets and codes, and I'm noticing some oddities.

Consider:

    using System;
    using System.Text;

    namespace AsciiTest
    {
        class Program
        {
            static void Main(string[] args)
            {
                string value = "Slide™1½”C4®";
                byte[] asciiValue = Encoding.ASCII.GetBytes(value);   // byte array
                char[] array = value.ToCharArray();                   // char array
                Console.WriteLine("char\tbyte\tint32");

                for (int i = 0; i < array.Length; i++)
                {
                    char  letter     = array[i];
                    byte  byteValue  = asciiValue[i];   // ASCII-encoded byte for this position
                    Int32 int32Value = array[i];        // UTF-16 code of the char, widened to Int32

                    Console.WriteLine("{0}\t{1}\t{2}", letter, byteValue, int32Value);
                }
                Console.ReadLine();
            }
        }
    }

Output of the program:

    char    byte    int32
    S       83      83
    l       108     108
    i       105     105
    d       100     100
    e       101     101
    t       63      8482      <- trademark symbol
    1       49      49
    ½       63      189       <- fraction
    "       63      8221      <- smartquotes
    C       67      67
    4       52      52
    r       63      174       <- registered trademark symbol

In particular, I'm trying to understand why the extended ASCII characters (the ones with notes added to the right of the third column) show the correct value when cast as Int32, but all show 63 when cast as the byte value. What's going on here?

The ASCII.GetBytes conversion replaces all characters outside of the ASCII range (0-127) with a question mark (code 63).

So since your string contains characters outside of that range, your asciiValue will have ? in place of the interesting symbols such as ™: its char (Unicode) representation is 8482, which is indeed outside of the 0-127 range.
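To see the replacement in isolation, here is a minimal sketch. The class and method names (AsciiReplacementDemo, ToAsciiBytes) are made up for illustration; the only library call is Encoding.ASCII.GetBytes.

```csharp
using System;
using System.Text;

public static class AsciiReplacementDemo
{
    // Hypothetical helper: encodes text with the default ASCII encoder.
    public static byte[] ToAsciiBytes(string text)
    {
        return Encoding.ASCII.GetBytes(text);
    }

    public static void Main()
    {
        // 'a' is inside 0-127 and keeps its code; the other two become '?'.
        foreach (byte b in ToAsciiBytes("a™½"))
        {
            Console.WriteLine(b);   // prints 97, then 63, then 63
        }
    }
}
```

Once the bytes are produced, the original characters are gone: every replaced character looks the same (63), so the byte array cannot tell you which symbol used to be there.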

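Converting the string to a char array does not modify the values of the characters, so you still have the original Unicode codes (a char is a 16-bit value, essentially an unsigned 16-bit integer). Casting it to the longer integer type Int32 does not change the value.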
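A small sketch of that widening, using the four symbols from the question. CharCodeDemo and ToCodeUnits are hypothetical names chosen for this example.

```csharp
using System;

public static class CharCodeDemo
{
    // Hypothetical helper: returns each char's UTF-16 code unit as an int.
    public static int[] ToCodeUnits(string text)
    {
        char[] chars = text.ToCharArray();
        int[] codes = new int[chars.Length];
        for (int i = 0; i < chars.Length; i++)
        {
            codes[i] = chars[i];   // implicit char -> int conversion, value unchanged
        }
        return codes;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ToCodeUnits("™½”®")));
        // prints "8482, 189, 8221, 174"
    }
}
```

No encoding is involved here, which is why the int32 column in the question shows the real codes while the byte column does not.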

Below are the possible conversions of that character into bytes/integers:

    var value = "™";
    var ascii = Encoding.ASCII.GetBytes(value)[0]; // 63 (`?`) - outside the 0-127 range
    var castToByte = (byte)(value[0]);             // 34 = 8482 % 256
    var int16 = (Int16)value[0];                   // 8482
    var int32 = (Int32)value[0];                   // 8482

Details are available in the documentation for the ASCIIEncoding class:

ASCIIEncoding corresponds to the Windows code page 20127. Because ASCII is a 7-bit encoding, ASCII characters are limited to the lowest 128 Unicode characters, from U+0000 to U+007F. If you use the default encoder returned by the Encoding.ASCII property or the ASCIIEncoding constructor, characters outside that range are replaced with a question mark (?) before the encoding operation is performed.
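Since the silent replacement is a property of the default encoder, one way to detect non-ASCII input instead of losing it is to ask for an ASCII encoding with an exception fallback. This is only a sketch under that assumption: StrictAsciiDemo and IsPureAscii are names invented here, while Encoding.GetEncoding, EncoderFallback.ExceptionFallback and EncoderFallbackException are part of System.Text.

```csharp
using System;
using System.Text;

public static class StrictAsciiDemo
{
    // Hypothetical helper: true when every character survives ASCII encoding.
    public static bool IsPureAscii(string text)
    {
        // Same ASCII encoding, but it throws instead of substituting '?'.
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strictAscii.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // At least one character was outside U+0000 to U+007F.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsPureAscii("Slide1C4"));   // True
        Console.WriteLine(IsPureAscii("Slide™"));     // False
    }
}
```

For a scrubbing tool, a check like this could flag which documents need attention before any characters are replaced.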

