c++ - C Tokenize String
I'm trying to tokenize a string by delimiters while keeping the delimiters. I tried using the strtok(string, delimiters) function, but it doesn't keep the delimiters. For example, if my string is:
"my name < is|john >hi"
I want to split it when I see the symbols "space", "<", ">" or "|".
The tokens should be:
my, space, name, space, <, space, is, |, john, space, >, hi
At first, I tried to read char by char until I saw a delimiter symbol. If I didn't see a symbol, I would append the char I had read to the string before it. For example, take the string "hi|bye". I read "h", then read the next char, "i", and append it to "h". I read the next symbol; it is a delimiter, so I put "hi" into the array and then "|" into the array. I repeat this until done. I ran into issues doing this.
Here's my code that doesn't work:
    #include <stdio.h>

    void do_tokenize(char *line);

    int main()
    {
        char *line = "command1 | command2 command3 > command4 < command5";
        do_tokenize(line);
        return 0;
    }

    void do_tokenize(char *line)
    {
        char *tokenized[100];
        char token[100];
        int tokencounter = 0;
        int tokenlength = 0;
        int i;
        int newtoken = 1;
        int tokennum = 0;

        for(i=0; line[i] !='\0'; i++)
        {
            if(line[i] != ' ' && line[i] != '<' && line[i] != '>' && line[i] != '|')
            {
                token[tokenlength] = line[i];
                tokenlength++;
                newtoken = 1;
            }
            else
            {
                if(newtoken == 1)
                {
                    token[tokenlength] = '\0';
                    tokenized[tokennum] = token;
                    tokenlength = 0;
                    tokennum++;
                    newtoken = 0;
                    token[tokenlength] = line[i];
                    token[tokenlength+1] = '\0';
                    tokenized[tokennum] = token;
                    tokenlength = 0;
                    tokennum++;
                }
                else
                {
                    token[tokenlength] = line[i];
                    token[tokenlength+1] = '\0';
                    tokenized[tokennum] = token;
                    tokenlength = 0;
                    tokennum++;
                    newtoken = 0;
                }
            }//end else
        }//end for
        token[tokenlength] = '\0';
        tokenized[tokennum] = token;
        tokennum++;

        //this prints the last token (i.e. command5) for every tokenized[j]
        int j=0;
        for(j; j<tokennum; j++)
            printf("%s\n", tokenized[j]);
    }

When I try to print out the entire array (tokenized[j]), every entry shows only the last token, "command5". This is done in C.
It appears that you want the elements of the tokenized array to point to each token found in 'line'. The code faithfully copies each character of a token into the char array 'token'. After the entire token is loaded into 'token', it is 0-terminated.
- All of that is good; the next step is where the code seems flawed. The next pointer in the tokenized array is set to point at 'token'. The problem is that 'token' is working storage, and the content of 'token' is rebuilt for each new token.
- At the end, all the affected pointers in the 'tokenized' array point to the same place; specifically, they all point at 'token'. Of course, the content of 'token' is by then the last parsed token.
- Hence, when printing out the 'tokenized' array, every entry points at 'token', whose content is the last parsed token ("command5"), so "command5" is printed for every entry.
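One possible way around this is to give every token its own storage instead of pointing each array entry at the shared 'token' buffer. The following is only a minimal sketch under that assumption, not the asker's code: the names is_delim, copy_token, tokenize_keep_delims, free_tokens and MAX_TOKENS are made up for illustration, and the delimiter set (space, '<', '>', '|') is taken from the question.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_TOKENS 100   /* hypothetical limit, mirrors tokenized[100] in the question */

/* Returns 1 if c is one of the delimiters used in the question. */
static int is_delim(char c)
{
    return c == ' ' || c == '<' || c == '>' || c == '|';
}

/* Copies len characters starting at src into freshly allocated,
   NUL-terminated storage. Returns NULL if malloc fails. */
static char *copy_token(const char *src, size_t len)
{
    char *out = malloc(len + 1);
    if (out != NULL) {
        memcpy(out, src, len);
        out[len] = '\0';
    }
    return out;
}

/* Splits line into words and single-character delimiter tokens.
   Each entry of tokens gets its own allocation, so no two entries
   share storage. Returns the number of tokens stored (at most max);
   the caller releases them with free_tokens(). */
static int tokenize_keep_delims(const char *line, char *tokens[], int max)
{
    int count = 0;
    size_t i = 0;

    while (line[i] != '\0' && count < max) {
        size_t start = i;
        if (is_delim(line[i])) {
            i++;                      /* a delimiter is a one-char token */
        } else {
            while (line[i] != '\0' && !is_delim(line[i]))
                i++;                  /* consume a whole word */
        }
        tokens[count] = copy_token(line + start, i - start);
        if (tokens[count] == NULL)
            break;                    /* out of memory: stop early */
        count++;
    }
    return count;
}

/* Frees the count tokens produced by tokenize_keep_delims(). */
static void free_tokens(char *tokens[], int count)
{
    for (int j = 0; j < count; j++)
        free(tokens[j]);
}
```

Called on the line from the question with a char *tokens[MAX_TOKENS] array, this should yield 15 separate tokens, starting with "command1", " ", "|" and ending with "command5". Because each pointer refers to its own copy, printing the array in a loop shows every token rather than the last one repeated. Remember to call free_tokens() when done.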