c++ - C Tokenize String


I'm trying to tokenize a string by delimiters while keeping the delimiters. I tried using the strtok(string, delimiters) function, but it doesn't keep the delimiters. For example, if my string is:

"my name < is|john >hi" 

I want it to split whenever it sees one of the symbols "space", "<", ">" or "|".

The tokens would be:

my, space, name, space, <, space, is, |, john, space, >, hi

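At first, I tried to read the string char by char until I saw a delimiter symbol. If I didn't see a symbol, I would append the char I had just read to the string before it. For example, take the string "hi|bye". I would read "h", then read the next char. It is "i", so I append it to "h". Then I read the next symbol; it is a delimiter, so I put "hi" into an array, and "|" into the array as well. Repeat until done. I ran into issues doing this.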

Here's my code that doesn't work:

int main() {
  char *line = "command1 | command2 command3 > command4 < command5";
  do_tokenize(line);
  return 0;
}

void do_tokenize(char *line) {
  char *tokenized[100];
  char token[100];
  int tokencounter = 0;
  int tokenlength = 0;
  int i;
  int newtoken = 1;
  int tokennum = 0;

  for(i=0; line[i] !='\0'; i++)
    {
      if(line[i] != ' ' && line[i] != '<' && line[i] != '>' && line[i] != '|')
        {
          token[tokenlength] = line[i];
          tokenlength++;
          newtoken = 1;
        }
      else
        {
          if(newtoken == 1)
            {
              token[tokenlength] = '\0';
              tokenized[tokennum] = token;
              tokenlength = 0;
              tokennum++;
              newtoken = 0;

              token[tokenlength] = line[i];
              token[tokenlength+1] = '\0';
              tokenized[tokennum] = token;
              tokenlength = 0;
              tokennum++;
            }
          else
            {
              token[tokenlength] = line[i];
              token[tokenlength+1] = '\0';
              tokenized[tokennum] = token;
              tokenlength = 0;
              tokennum++;
              newtoken = 0;
            }
        }//end else
    }//end for

  token[tokenlength] = '\0';
  tokenized[tokennum] = token;
  tokennum++;

  //print all of tokenized[j]; every line comes out as the last token, i.e. command5
  int j=0;
  for(j; j<tokennum; j++)
    printf("%s\n", tokenized[j]);
}

When I try to print out the entire array (tokenized[j]), it keeps printing only the last token, "command5". This is being done in C.

It appears that you want to have the elements of the tokenized array point to each token found in 'line'. The code faithfully copies each character of a token into the char array 'token'. After the entire token is loaded into 'token', it is 0-terminated.


All of that is good; however, the next step is where the code seems to be flawed. The next pointer in the tokenized array is set to point at 'token'. The problem is that 'token' is working storage, and the content of 'token' is rebuilt for each new token.


At the end, all of the affected pointers in the 'tokenized' array point to the same place; specifically, they point to 'token'. Of course, the content of 'token' is the last parsed token.


Hence, when printing out the 'tokenized' array, with every element pointing to 'token', and the content of 'token' being the last parsed token ("command5")...

