c++ - C Tokenize String
I'm trying to tokenize a string on delimiters while keeping the delimiters. I tried using strtok(string, delimiters),
but that function doesn't keep the delimiters. For example, if the string is:
"my name < is|john >hi"
I want to split whenever I see the symbols "space", "<", ">" or "|".
The tokens should be:
my, space, name, space, <, space, is, |, john, space, >, hi
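At first, I tried to read char by char until I saw a delimiter symbol. If I didn't see a symbol, I would append the char just read to the string before it. For example, take the string "hi|bye". I read "h", then read the next char, "i", and append it to "h". Then I read the next symbol, which is a delimiter, so I put "hi" into the array and then "|" into the array. I repeat this until done. I ran into issues doing this.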
Here's my code, which doesn't work:

    int main()
    {
        char *line = "command1 | command2 command3 > command4 < command5";
        do_tokenize(line);
        return 0;
    }

    void do_tokenize(char *line)
    {
        char *tokenized[100];
        char token[100];
        int tokencounter = 0;
        int tokenlength = 0;
        int i;
        int newtoken = 1;
        int tokennum = 0;

        for(i=0; line[i] !='\0'; i++)
        {
            if(line[i] != ' ' && line[i] != '<' && line[i] != '>' && line[i] != '|')
            {
                token[tokenlength] = line[i];
                tokenlength++;
                newtoken = 1;
            }
            else
            {
                if(newtoken == 1)
                {
                    token[tokenlength] = '\0';
                    tokenized[tokennum] = token;
                    tokenlength = 0;
                    tokennum++;
                    newtoken = 0;
                    token[tokenlength] = line[i];
                    token[tokenlength+1] = '\0';
                    tokenized[tokennum] = token;
                    tokenlength = 0;
                    tokennum++;
                }
                else
                {
                    token[tokenlength] = line[i];
                    token[tokenlength+1] = '\0';
                    tokenized[tokennum] = token;
                    tokenlength = 0;
                    tokennum++;
                    newtoken = 0;
                }
            }//end else
        }//end for
        token[tokenlength] = '\0';
        tokenized[tokennum] = token;
        tokennum++;

        //every tokenized[j] prints as the last token, i.e. command5
        int j=0;
        for(j; j<tokennum; j++)
            printf("%s\n", tokenized[j]);
    }
When I try to print out the entire array (tokenized[j]), every entry prints as the last token, "command5".
This needs to be done in C.
It appears that you want the elements of the 'tokenized' array to each point to a token found in 'line'. The code faithfully copies each character of a token into the char array 'token'. After the entire token is loaded into 'token', it is 0-terminated.
All of that is good; however, the next step is where the code seems flawed. The next pointer in the 'tokenized' array is set to point at 'token'. The problem is that 'token' is only working storage, and the content of 'token' is rebuilt for each new token.
At the end, all of the affected pointers in the 'tokenized' array point to the same place; specifically, they all point to 'token'. And of course, the content of 'token' is the last parsed token.
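The aliasing described here can be reproduced in isolation. The following is only a minimal sketch: the function name `all_slots_alias` and the sample words are made up for illustration and do not come from the question's code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original code): stores the address
   of one buffer in every slot while the buffer's content keeps being
   overwritten. Returns 1 if all slots point at the same buffer. */
int all_slots_alias(void)
{
    char buffer[16];
    const char *slots[3];
    const char *words[3] = { "one", "two", "three" };
    int i;

    for (i = 0; i < 3; i++) {
        strcpy(buffer, words[i]);   /* overwrite the working storage */
        slots[i] = buffer;          /* stores the address, not a copy */
    }

    /* Every slot reads whatever buffer holds now, i.e. the last word. */
    for (i = 0; i < 3; i++)
        printf("%s\n", slots[i]);

    return slots[0] == slots[1] && slots[1] == slots[2];
}
```

Assigning `slots[i] = buffer` plays the same role as `tokenized[tokennum] = token` in the question: it records where the buffer lives, not what it contained at that moment.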
Hence, when printing out the 'tokenized' array, every entry, being a pointer to 'token', prints the content of 'token', which is the last parsed token ("command5").
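One possible way around this is to give every token its own storage instead of pointing each entry at the shared 'token' buffer. The sketch below assumes a caller-provided two-dimensional array; the names `tokenize_keep_delims`, `is_delim`, `MAX_TOKENS` and `MAX_TOKEN_LEN` are hypothetical and not part of the question's code, and overlong words are simply truncated.

```c
#include <stddef.h>

#define MAX_TOKENS    100   /* assumed limit, mirrors tokenized[100] */
#define MAX_TOKEN_LEN 100   /* assumed limit, mirrors token[100] */

/* Returns nonzero if c is one of the delimiters tested in the question. */
static int is_delim(char c)
{
    return c == ' ' || c == '<' || c == '>' || c == '|';
}

/* Splits line into words and single-character delimiter tokens.
   Each token is copied into its own row of tokens[], so later tokens
   never overwrite earlier ones. Returns the number of tokens stored. */
int tokenize_keep_delims(const char *line,
                         char tokens[][MAX_TOKEN_LEN], int max_tokens)
{
    int count = 0;
    size_t i = 0;

    while (line[i] != '\0' && count < max_tokens) {
        size_t len = 0;

        if (is_delim(line[i])) {
            /* A delimiter becomes a one-character token of its own. */
            tokens[count][len++] = line[i++];
        } else {
            /* Copy a run of non-delimiter characters, dropping any
               characters that would not fit in the row. */
            while (line[i] != '\0' && !is_delim(line[i])) {
                if (len < MAX_TOKEN_LEN - 1)
                    tokens[count][len++] = line[i];
                i++;
            }
        }
        tokens[count][len] = '\0';
        count++;
    }
    return count;
}
```

Called with a `char tokens[MAX_TOKENS][MAX_TOKEN_LEN]` array and the string "hi|bye", this stores three separate tokens: "hi", "|" and "bye". Allocating each token with malloc and keeping the `char *tokenized[100]` layout would work as well; the fixed-size rows are used here only to keep the sketch short.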