A program to determine the number of words and average word length is given in Program. It uses the strtok function to separate the words in a given string. The program given below uses the same technique to separate the words in a given string and determine and print the frequency of these words.
The program uses a structure named word to store a word and its count. The word string is stored in an array of 20 characters, which is adequate for most words in the English language. The main function uses an array of this structure, named words, to store the distinct words in the given string and their counts. It is assumed that the given string may have at most 100 distinct words.
The main function first initializes character array str with a string literal. It then displays this string and calls the word_freq function, which stores the distinct words in string str and their counts in array words. The contents of this array are then printed.
The word_freq function accepts two parameters: str (the string to be processed) and words (an array of struct word). It returns an integer value representing the count of distinct words in the given string. Since the strtok function modifies the string being processed, the word_freq function creates a duplicate of string str using the strdup function and operates on this copy. A character pointer tmp_str is used to point to the copy, which is released with free before the function returns.
A character array punct_str holds the punctuation characters (including the space) and is initialized with the string literal " .,;:!?'\"". This string is used by the strtok function to separate the words in string tmp_str.
The word_freq function uses a local variable nword as a counter of distinct words in string str. This counter is initialized to zero, and the strtok function is used to separate the first word in string tmp_str. A character pointer wptr is used to point to this word. Then a while loop is set up to process the entire string.
In each iteration of the loop, first the current word pointed to by wptr is searched for in the words array. If it is found, its count is incremented; otherwise, it is added to the words array as a new distinct word, its count is set to 1 and the nword counter is incremented. Then, the strtok function is called to separate the next word in string tmp_str.
/* Determine word frequency in a given string */
#include <stdio.h>
#include <stdlib.h>   /* free */
#include <string.h>   /* strtok, strcmp, strcpy, strdup */

struct word {
    char str[20];   /* word string: assume max 19 characters */
    int count;      /* word count */
};

int word_freq(const char *str, struct word words[]);

int main()
{
    char str[] = "Alexander said, \"I came, I saw, I conquered!\"";
    struct word words[100];   /* assume max. 100 distinct words */
    int nword;                /* no of words */
    int i;

    printf("Given string:\n%s\n", str);
    nword = word_freq(str, words);

    puts("\nWord frequency:");
    for (i = 0; i < nword; i++)
        printf(" %s: %d\n", words[i].str, words[i].count);
    return 0;
}

/* calculate frequency of words in a given string */
int word_freq(const char *str, struct word words[])
{
    char punct_str[] = " .,;:!?'\"";   /* punctuator list */
    char *tmp_str;                     /* pointer to a copy of given string */
    char *wptr;                        /* pointer to a word */
    int nword;                         /* number of distinct words */
    int i;

    nword = 0;
    tmp_str = strdup(str);               /* copy of given string */
    wptr = strtok(tmp_str, punct_str);   /* get ptr to first word */
    while (wptr != NULL) {
        /* search current word in 'words' array */
        for (i = 0; i < nword; i++) {
            if (strcmp(wptr, words[i].str) == 0)
                break;   /* current word found, stop search */
        }

        /* update the count if found; otherwise add the word at loc nword */
        if (i < nword)   /* current word already in 'words' array */
            words[i].count++;   /* increment its count */
        else {           /* current word not in 'words' array */
            strcpy(words[nword].str, wptr);   /* add word at pos. nword */
            words[nword].count = 1;           /* set freq count to 1 */
            ++nword;                          /* increment words count */
        }
        wptr = strtok(NULL, punct_str);   /* get ptr to next word */
    }
    free(tmp_str);   /* release memory allocated to tmp_str */
    return nword;
}
The program output is given below.