1

Goal of the Container

I am building a container that is meant to store sorted unsigned integer values in a very RAM efficient manner. The idea is to group values by common radix. Instead of

std::vector<unsigned int> V = {1234,1254,1264,1265,1267,1268,1271,1819,1832,1856,
                                  1867,1892,3210,3214,3256,3289};

I would have something like

MyContainer V =
{
   {12..
      ..34, ..54, ..64, ..65, ..67, ..68, ..71
   },
   {18..
      ..19, ..32, ..56, ..67, ..92
   }
   {32..
      ..10, ..14, ..56, ..89
   }
};

In the above example, I grouped the values by blocks of a 100. It would be more logical to group data by a group of 2^16. Assuming each unsigned int is 4 bytes and each unsigned short is 2 bytes, the first vector of 16 elements would take at least 16*4 = 64 bytes and the second vector would take 19*2=38 bytes of RAM.

Brief container description

Briefly, here is the organization of this container...

class MyContainer
{
   // attribute
   std::vector<Block> data;

   // methods
   // ..
};

class Block
{
   // attributes
   unsigned short radix;
   std::vector<unsigned short> suffixs; // contains all the suffix associated with this radix

   // methods
   // ..
};

While advice about this data structure would be welcome, the core of my question is about the implementation of the iterator.

Iterator

I am having issues when building the iterator. My iterator needs to allow forward exploration and random access as well. It is my first time building a classic iterator design pattern and I am likely making mistakes. Here are the attributes of my iterator

class Iterator
{
   // attributes
   std::vector<Block>::iterator bigP;             // points to a Block
   std::vector<unsigned short>::iterator smallP;  // points to an unsigned int within the Block pointed by bigP
   std::vector<Block>::iterator bigPEnd;

  // methods
  // ...
};

Q1: Should MyContainer::iterator contains iterators (as it is currently the case) or should it contain pointers? Why?

I thought that when the iterator points to the last unsigned int of a Block, then the operator++() should push bigP to the next Block and push smallP to the first element of the

It seems wrong to me that I have to include an iterator to the data.end() (called bigPEnd) but I ended up adding it when I realize that when the operator++() is called while the MyContainer::iterator points to the last unsigned int of the last Block, I had to know that I can't set smallP to bigP->begin() as this would lead to a segmentation fault as *bigP does not exist.

Q2: Do I need a pointer to the last element of data? How to avoid it?

I am also facing a similar issue when building an MyContainer::iterator for an empty vector. Typically, I would construct an iterator with

MyContainer::iterator MyContainer::begin()
{
   return iterator(data.begin(), data.front().suffixs.begin(), data.end());
        //            ^^                 ^^                       ^^
        //           bigP              smallP                    bigPEnd
}

However, data.front() will lead to a segmentation fault when data is empty. If I used pointers I could set smallP to nullptr when data is empty and when bigP == data.end() whether or not the data is empty.

Q3: How can I deal with smallP when there is nothing to point to?

Can you please give me some advice on the implementation of this iterator?

Remi.b
  • 17,389
  • 28
  • 87
  • 168
  • 1
    Your implementation does not meet the requirements for a container. This may or may not be a problem. `std::vector` is not a container either, for exactly the same reason (neither data structure can provide true references to stored data). – n. m. could be an AI May 19 '19 at 05:28
  • `[container] to store sorted [integers] in a very RAM efficient manner` If this is the main goal (and not the mechanics of handling the two-level design presented), thinking *sparse set* may help. (*Data compression* is not as likely to be helpful as there still is little re `random access`.) Try to code The Simplest Thing That Could Possibly Work (and some harness to show it at work): you can try to sort out issues you have with that approach on [Code Review](https://codereview.stackexchange.com/help/on-topic). – greybeard May 19 '19 at 06:43
  • It may be useful to sketch the requirements for the `container` in addition to those for the `iterator`. – greybeard May 19 '19 at 06:46

1 Answers1

1

Q1

Should MyContainer::iterator contain iterators (as it is currently the case) or should it contain pointers? Why?

Performance and memory-wise it doesn't matter whether pointers or iterators are used. Iterators are just object-oriented equivalents of pointer logic. Like everything object-oriented, they're meant to make the code easier to read and maintain, i.e. decrease the chances that bugs are introduced.

If you use STL, it makes sense to also use iterators. They are two concepts that come from the same basic idea - extending C with object-oriented programming concepts. It's also an industry standard to prefer iterators over pointers.

Q2

Do I need a pointer to the last element of data? How to avoid it?

You can hold a reference to the underlying vector of Blocks and get the end iterator through the reference. See the code below. This is preferable to holding a constant end value, because the iteration will not break if new elements are pushed back to the vector of Blocks after the iterator is constructed. I have not found any guarantees in the C++ documentation that the value of vector::end() does not change when the vector changes. It's also worth considering what happens if the vector is changed to a different underlying container type in future versions of your code.

Q3

How can I deal with smallP when there is nothing to point to?

You can leave it as it is and call it an undefined value. This approach is consistent with the standard library's approach to uninitialized iterators - they also have an undefined value that should not be dereferenced.


You can play with a working minimal example below, you might find it helpful:

Playground: https://ideone.com/eeAKYN

#include<initializer_list>
#include<iostream>
#include<vector>


// Declarations:

// Stores a collection of numbers in the form radix*RADIX_MULTIPLIER + suffix,
// where 0 <= suffix < RADIX_MULTIPLIER.
class Block {
public:
    using SUFFIX_VECTOR_T = std::vector<unsigned short>;

    constexpr static unsigned int RADIX_MULTIPLIER = 100;

private:
    const unsigned short radix;

    // Contains all the suffixes associated with this radix.
    SUFFIX_VECTOR_T suffixes;

public:
    Block(unsigned short radix);
    unsigned short getRadix() const;
    void pushSuffix(const unsigned short suffix);
    std::size_t size() const;
    unsigned int operator[](std::size_t idx) const;
    SUFFIX_VECTOR_T::const_iterator begin() const;
    SUFFIX_VECTOR_T::const_iterator cbegin() const;
    SUFFIX_VECTOR_T::const_iterator end() const;
    SUFFIX_VECTOR_T::const_iterator cend() const;
};

using DATA_VECTOR_T = std::vector<Block>;

class MyIterator :
    public std::iterator<std::input_iterator_tag, unsigned int> {
    const DATA_VECTOR_T& data;
    DATA_VECTOR_T::const_iterator block_it;

    Block::SUFFIX_VECTOR_T::const_iterator suffix_it;

public:
    MyIterator(
        const DATA_VECTOR_T& data,
        const DATA_VECTOR_T::const_iterator start_block_it);
    MyIterator& operator++();
    MyIterator operator++(int);
    bool operator==(const MyIterator& rhs) const;
    bool operator!=(const MyIterator& rhs) const;
    unsigned int operator*() const;
};

// Read-only container which stores a sorted collection of numbers
// memory-efficiently in Blocks.
class MyContainer {
public:
    using const_iterator = MyIterator;

private:
    DATA_VECTOR_T data;

public:
    // The initializer list must be sorted in non-descending order.
    MyContainer(std::initializer_list<unsigned int> il);
    unsigned int operator[](std::size_t idx) const;
    MyContainer::const_iterator begin() const;
    MyContainer::const_iterator cbegin() const;
    MyContainer::const_iterator end() const;
    MyContainer::const_iterator cend() const;
};


// Definitions:

// class Block
Block::Block(unsigned short radix): radix(radix) {}

unsigned short Block::getRadix() const {
    return radix;
}

void Block::pushSuffix(const unsigned short suffix) {
    suffixes.push_back(suffix);
}

std::size_t Block::size() const {
    return suffixes.size();
}

unsigned int Block::operator[](std::size_t idx) const {
    return
        (unsigned int)(radix)*RADIX_MULTIPLIER +
        (unsigned int)(suffixes[idx]);
}

Block::SUFFIX_VECTOR_T::const_iterator Block::begin() const {
    return suffixes.begin();
}

Block::SUFFIX_VECTOR_T::const_iterator Block::cbegin() const {
    return begin();
}

Block::SUFFIX_VECTOR_T::const_iterator Block::end() const {
    return suffixes.end();
}

Block::SUFFIX_VECTOR_T::const_iterator Block::cend() const {
    return end();
}

// class MyContainer
// The initializer list must be sorted in non-descending order.
MyContainer::MyContainer(std::initializer_list<unsigned int> il) {
    if (il.size() == 0) {
        return;
    }

    unsigned short radix = *il.begin() / Block::RADIX_MULTIPLIER;
    data.push_back(Block(radix));
    for (const auto x : il) {
        radix = x / Block::RADIX_MULTIPLIER;
        if (data.back().getRadix() != radix) {
            data.push_back(Block(radix));
        }

        unsigned short suffix = x % Block::RADIX_MULTIPLIER;
        data.back().pushSuffix(suffix);
    }
}

unsigned int MyContainer::operator[](std::size_t idx) const {
    auto data_it = data.begin();

    // Similarly to std::vector::operator[], if idx is out of bounds of the
    // container, the behavior is undefined.
    while (idx >= data_it->size()) {
        idx -= data_it->size();
        ++data_it;
    }

    return (*data_it)[idx];
}

MyContainer::const_iterator MyContainer::begin() const {
    return MyIterator(data, data.cbegin());
}

MyContainer::const_iterator MyContainer::cbegin() const {
    return begin();
}

MyContainer::const_iterator MyContainer::end() const {
    return MyIterator(data, data.end());
}

MyContainer::const_iterator MyContainer::cend() const {
    return end();
}

// class MyIterator
MyIterator::MyIterator(
        const DATA_VECTOR_T& data,
        const DATA_VECTOR_T::const_iterator start_block_it):
        data(data), block_it(start_block_it) {
    if (data.cend() != block_it) {
        suffix_it = block_it->cbegin();
    }
}

MyIterator& MyIterator::operator++() {
    if (data.cend() == block_it) {
        return *this;
    }

    ++suffix_it;

    if (block_it->cend() == suffix_it) {
        ++block_it;

        if (data.cend() != block_it) {
            suffix_it = block_it->cbegin();
        }
    }

    return *this;
}

MyIterator MyIterator::operator++(int) {
    MyIterator tmp = *this;
    operator++();
    return tmp;
}

bool MyIterator::operator==(const MyIterator& rhs) const {
    // If this iterator has reached the end:
    if (data.cend() == block_it) {
        // Only return true if both iterators point to the end of the same
        // object.
        return data.cend() == rhs.block_it;
    }

    return block_it == rhs.block_it
        && suffix_it == rhs.suffix_it;
}

bool MyIterator::operator!=(const MyIterator& rhs) const {
    return !(*this == rhs);
}

unsigned int MyIterator::operator*() const {
    const std::size_t idx = suffix_it - block_it->cbegin();
    return (*block_it)[idx];
}


// Entry point:

int main() {
    std::vector<unsigned int> v = {
        1234, 1254, 1264, 1265, 1267, 1268, 1271, 1819, 1832, 1856, 1867, 1892,
        3210, 3214, 3256, 3289
    };

    // Print the entire vector.
    for (const auto x: v) {
        std::cout << x << "\t";
    }
    std::cout << std::endl;

    // Print a randomly accessed element from the vector.
    std::cout << v[10] << std::endl;


    MyContainer c = {
        1234, 1254, 1264, 1265, 1267, 1268, 1271, 1819, 1832, 1856, 1867, 1892,
        3210, 3214, 3256, 3289
    };

    // Print the entire container.
    for (const auto x: c) {
        std::cout << x << "\t";
    }
    std::cout << std::endl;

    // Print a randomly accessed element from the container.
    std::cout << c[10] << std::endl;

    return 0;
}

Resources:

Miłosz Łakomy
  • 996
  • 5
  • 12