0

I'm working on hash table in C++ language and I need a hash function for string data. One hash function that I have tried is add ascii code and use modulo (%100).

My actual requirement is to find the words which exactly matches or started with a given pattern.

Ex: Given pattern is "comp". Then I want get all the words starting with comp. (Ex: company, computer, comp etc) Can I do this using a hash because the tried hash function can find only exact matches. So can anyone suggest me a hash function suitable for this requirement.

Anupama Pathirage
  • 631
  • 2
  • 10
  • 22

2 Answers2

1

Prefix matched is better handled with a trie.

Basically this is a tree structure that holds on each node one character from the key. The concatenating the characters from the different nodes in the path from the root to a given node will produce the key for that node.

Searching is a matter of descending the trie comparing each character of the searched key with the child nodes. Once you consumed all the characters, the remaining subtree are all the keys that have as prefix the searched key.

jsantander
  • 4,972
  • 16
  • 27
  • Thanks a lot for the reply. I already tried with the trie. It matches my requirement of searching well. But the problem is my data is very large and it is frequently updating. So I need to rebuild the tree for each update. Updates are very slow due to that. That is why I thought to move for hash like structure. Do you have any idea on fast updating data structure which matches my requirement. – Anupama Pathirage Apr 17 '15 at 05:20
  • @user1308004, I believe *updating* the tree should not be expensive (with delete/add/query operations [complexity](http://stackoverflow.com/questions/13032116/trie-complexity-and-searching) depending on the length of the key, not on the number of elements). Now if your problem is that you don't really know what data changed and need to *rebuild* the tree from scratch on every update, then that's a different problem: perhaps you can add a side hash-map that can help you determine what's new and what's old. – jsantander Apr 17 '15 at 07:42
0

Sounds like what you really need is lexicographical sorting. You can do that by using a sorted data structure, like a std set or map, or by using vector and the std::sort algorithm. Note that C++ sort is faster than std C qsort.

Erik Alapää
  • 2,585
  • 1
  • 14
  • 25
  • Thanks a lot. I need to search for data in very fast manner. So I'm not sure whether the performance of std map or vector is enough with large set of data – Anupama Pathirage Apr 17 '15 at 05:22
  • Well, std::sort internally uses the qsort algorithm (but with C++ optimizations like compile-time binding of comparison function instead of function pointers like in C), so it is pretty much as fast as it gets. And the data structures based on balanced trees, i.e. map and set, insert and search in logarithmic time. Also, you have no problem of re-hashing, the performance is guaranteed 100% of the time, not 99% like a hash table. – Erik Alapää Apr 18 '15 at 10:16