2

I'm trying to create a program that reads a dictionary and then stores the words into the hash table, then read another file checks every word of that file if it is in the hash table if it is not then it will be outputted as a misspelled word. I'm first trying to check if I can load the dictionary file into my hash table and then output the words in the hash table yet my code seems to crash whenever I try to run it. The hash function I use was taken from the Internet. I'm also still very new with data structures, and having a hard time understanding.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// file to read
#define dictionary "dictionary.txt"
// No. of buckets
const unsigned int N = 10;

typedef struct node
{
    char* word;
    struct node *next;
}
node;

node *table[10];

// hash function
unsigned int hash(char *word)
{
// TODO
    unsigned int hash = 5381;
    int c = 0;

    while (c == *word++)
        hash = ((hash << 5) + hash) + c;

    return hash % 10;
}

int main(void)
{
    // initialize array heads to NULL
    for (int i = 0; i < N; i++)
    {
        table[i] = NULL;
    }

    // Open file to read
    FILE *indata = fopen(dictionary, "r");   
    if (indata == NULL)
    {
        printf("cant open\n");
        return 1;
    }

    // variable to store words read from the file
    char *words = malloc(sizeof(char) * 20);
    if (words == NULL)
    {
        printf("no memory\n");
        return 1;
    }

    // While loop to read through the file
    while (fgets(words, 20, indata))
    {
        // get the index of the word using hash function
        int index = hash(words);

        // create new node
        node *newNode = malloc(sizeof(node));
        if (newNode == NULL)
        {
            printf("here\n");
            return 1;
        }

        // make the new node the new head of the list
        strcpy(newNode->word, words);
        newNode->next = table[index];
        table[index] = newNode;

        // free memory
        free(newNode);
    }
    // free memory
    free(words);
    // loop to print out the values of the hash table
    for (int i = 0; i < N; i++)
    {
        node *tmp = table[i];
        while (tmp->next != NULL)
        {
            printf("%s\n", tmp->word);
            tmp = tmp->next;
        }
    }

    // loop to free all memory of the hash table
    for (int i = 0; i < N; i++)
    {
        if (table[i] != NULL)
        {
            node *tmp = table[i]->next;
            free(table[i]);
            table[i] = tmp;
        }
    }

    // close the file
    fclose(indata);
}
Ojou Nii
  • 244
  • 4
  • 11
  • You may want to read these two links: [How to debug small programs?](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/) and [How to create a minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) – Andreas Wenzel Apr 15 '20 at 14:59

2 Answers2

0

From a cursory glance I can see two problems:

  1. You don't allocate space for your word in the node; you simply strcopy the word into an undefined pointer. You might want to use strdup instead.

  2. You free the memory of the node after you added it to the list. The table is an array of pointers, so you store the point in the table and then throw away the memory that it points to.

Oh, three: and in the final loop you free the unallocated memory again...

Oliver Mason
  • 2,240
  • 2
  • 15
  • 23
  • `strcpy(newNode->word, words);` Are you talking about this line of code in 1.?? I thought when you use `node *newNode = malloc(sizeof(node));` newNode will be allocated memory for both `char *word; and struct node *next;` – Ojou Nii Apr 15 '20 at 15:14
  • I tried using print statements to locate the error, my program seems to crash after taking 1 loop of `while (fgets(words, 20, indata))` and then seems to crash on the next loop then after the getting the index for the next word.. – Ojou Nii Apr 15 '20 at 15:18
  • @OjouNii The `malloc` gets memory for the **pointers**, not for the content. Your second comment is consistent with the second error. – Oliver Mason Apr 15 '20 at 15:23
0

At least three bugs that independently caused a segfault:

First, newNode->word is used unitialized, so it points to random memory, so the strcpy would segfault. Better to use strdup

Also, after you put newNode in the table, you do free(newNode) making what it points to invalid. This causes the second loop to segfault

Third, in the second loop, if table[i] is null, the while (tmp->next != NULL) will segfault

I've annotated and corrected your code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// file to read
#define dictionary "dictionary.txt"

// No. of buckets
const unsigned int N = 10;

typedef struct node {
    char *word;
    struct node *next;
} node;

node *table[10];

// hash function
unsigned int
hash(char *word)
{
// TODO
    unsigned int hash = 5381;
    int c = 0;

    while (c == *word++)
        hash = ((hash << 5) + hash) + c;

// NOTE: not a bug but probably better
#if 0
    return hash % 10;
#else
    return hash % N;
#endif
}

int
main(void)
{
    // initialize array heads to NULL
    for (int i = 0; i < N; i++) {
        table[i] = NULL;
    }

    // Open file to read
    FILE *indata = fopen(dictionary, "r");

    if (indata == NULL) {
        printf("cant open\n");
        return 1;
    }

    // variable to store words read from the file
    char *words = malloc(sizeof(char) * 20);

    if (words == NULL) {
        printf("no memory\n");
        return 1;
    }

    // While loop to read through the file
    while (fgets(words, 20, indata)) {
        // get the index of the word using hash function
        int index = hash(words);

        // create new node
        node *newNode = malloc(sizeof(node));

        if (newNode == NULL) {
            printf("here\n");
            return 1;
        }

        // make the new node the new head of the list
// NOTE/BUG: word is never set to anything valid -- possible segfault here
#if 0
        strcpy(newNode->word, words);
#else
        newNode->word = strdup(words);
#endif
        newNode->next = table[index];
        table[index] = newNode;

        // free memory
// NOTE/BUG: this will cause the _next_ loop to segfault -- don't deallocate
// the node you just added to the table
#if 0
        free(newNode);
#endif
    }

    // free memory
    free(words);

    // loop to print out the values of the hash table
    for (int i = 0; i < N; i++) {
        node *tmp = table[i];
// NOTE/BUG: this test fails if the tmp is originally NULL (i.e. no entries
// in the given hash index)
#if 0
        while (tmp->next != NULL) {
#else
        while (tmp != NULL) {
#endif
            printf("%s\n", tmp->word);
            tmp = tmp->next;
        }
    }

    // loop to free all memory of the hash table
    for (int i = 0; i < N; i++) {
        if (table[i] != NULL) {
            node *tmp = table[i]->next;

            free(table[i]);
            table[i] = tmp;
        }
    }

    // close the file
    fclose(indata);
}

UPDATE:

I made a linked list program before that stores an integer in the list, int number; struct node *next; and I used newNode->number = 5; and it worked, why is it in this case it doesn't?? Is it because I am working with strings here??

The difference is that word is a pointer. It must be assigned a value before it can be used. strcpy does not assign a value to word. It tries to use the contents of word as the destination address of the copy.

But, the other two bugs happen regardless of word being a char * vs number being int.

If you had defined word not as a pointer, but as a fixed array [not as good in this usage], the strcpy would have worked. That is, instead of char *word;, if you had done (e.g.) char word[5];

But, what you did is better [with the strdup change] unless you can guarantee that the length of word can hold the input. strdup will guarantee that.

But, notice that I [deliberately] made word have only five chars to illustrate the problem. It means that the word to be added can only be 4 characters in length [we need one extra byte for the nul terminator character]. You'd need to use strncpy instead of strcpy but strncpy has issues [it does not guarantee to add the nul char at the end if the source length is too large].

Conincidentally, there is another question today that has an answer that may help shed some more light on the differences of your word struct member: Difference between memory allocations of struct member (pointer vs. array) in C

Craig Estey
  • 30,627
  • 4
  • 24
  • 48
  • I made a linked list program before that stores an integer in the list, `int number; struct node *next;` and I used `newNode->number = 5;` and it worked, why is it in this case it doesn't?? Is it because I am working with strings here?? – Ojou Nii Apr 15 '20 at 15:45
  • Also I am actually taking CS50x course, and this is for a problem set, I used strdup() function on the online ide and I got an error ` implicit declaration of function 'strdup' is invalid in C99` I have included tho... – Ojou Nii Apr 15 '20 at 15:57
  • I've added an update to my answer to help clarify your first comment. But, in a way, I'm surprised that you could use `strcpy` _without_ the `#include` but needed it for `strdup`. Both functions are defined in `string.h`. My best guess: I'm _slightly_ familiar with `cs50` because of other posters here. It has you include a custom `.h` file. That file _may_ define `strcpy` but _not_ `strdup`. I had been looking for a downloadable version of the `cs50` source files, but it appears that they're only doing web based development now? – Craig Estey Apr 15 '20 at 16:28