3

I am trying to check input validity. I am given a file, where there are two columns, first one of strings, second one of ints. The separator is any number of spaces. The problem is, I cannot figure out how to properly check if the second item really is an int. Consider this code:

string line; // parameter with a string to be parsed
string name;
int num;
istringstream lineStream(line);
lineStream >> name;
lineStream >> num;

By calling lineStream.good() I can detect inputs such as "abcd g", but when I pass in something like "abcd 12a", it is parsed as "abcd" and 12. The same happens with "abcd 12.2", etc.

In Java I could use String.split() to get an array and then parse each whole element without ignoring any characters, but I do not know what the correct approach is here.

Thanks for hints

WhozCraig
Martin Melka
  • You can effect the same "split" strategy in C++ assuming your split is based on whitespace into strings, then validate the individual entities, but ultimately the question is a fair point. It seems you're asking how to validate the *element* being extracted is, in fact, complete and concise, and perhaps how to glean that information via stream properties. Good question. I'll be interested in different approaches to this. – WhozCraig Mar 11 '13 at 17:42
  • I was wondering if there was a way to do this simply, i.e. using streams. One could hassle with creating an array of char*, then tokenize them and check validity, that's true. – Martin Melka Mar 11 '13 at 17:46

2 Answers

1

You could simply check whether you have reached the end of the line's std::stringstream after extracting the int. If the int extraction fails, there was no integer at all. If it succeeds but there is still content in the stream, the line contains too much. I would approach the problem like so:

std::string line, name;
int num;
while (std::getline(file, line)) {
  std::stringstream line_stream(line);
  if (line_stream >> name >> num &&
      line_stream.eof()) {
    // Line was of the correct format
  }
}

Basically, there are two checks:

  1. line_stream >> name >> num will fail if it can't extract either the string or int.
  2. After these extractions, the eof bit will be set if they reached the end of the stream. So line_stream.eof() will return false if there is still content left in the stream.

Obviously this is a pretty fragile method that only works for the particular situation and would need adapting to pretty much any change in line format.
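As one illustration of such an adaptation, here is a minimal sketch that wraps the check above in a function and adds a variant that tolerates trailing whitespace. The function names `parse_strict` and `parse_allow_trailing_space` are made up for this example. The lenient variant tries to extract one more token and treats the line as valid only if that extraction fails.

```cpp
#include <sstream>
#include <string>

// Strict form of the check above: a name, an int, and nothing after the int.
// (Hypothetical helper name, used for illustration only.)
bool parse_strict(const std::string& line, std::string& name, int& num) {
    std::istringstream line_stream(line);
    return (line_stream >> name >> num) && line_stream.eof();
}

// Lenient form: trailing whitespace is accepted, any other trailing
// content is rejected. (Hypothetical helper name as well.)
bool parse_allow_trailing_space(const std::string& line, std::string& name, int& num) {
    std::istringstream line_stream(line);
    if (!(line_stream >> name >> num)) {
        return false;  // missing name, or the second field does not start with an int
    }
    std::string extra;
    // operator>> skips leading whitespace, so this extraction succeeds only
    // if some non-whitespace character is left over after the int.
    return !(line_stream >> extra);
}
```

With these, "abcd 12" passes both checks, "abcd 12 " passes only the lenient one, and "abcd 12a", "abcd 12.2" and "abcd g" are rejected by both.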

Joseph Mansfield
  • 108,238
  • 20
  • 242
  • 324
  • @JerryCoffin That's a [common misconception](http://stackoverflow.com/q/14615671/150634) (almost as common as thinking `!eof()` is a good `while` condition). `eof` is set when the extraction reaches the end - it's just not a good condition to use before reading. – Joseph Mansfield Mar 11 '13 at 17:48
  • This won't work if there are any whitespaces after the second field, will it? – Martin Melka Mar 11 '13 at 17:48
  • @MartinMelka Indeed it won't. It'll only allow specifically ` `. Do you want to allow whitespace at the end? – Joseph Mansfield Mar 11 '13 at 17:48
  • I think I do. Maybe if I called `line_stream >> somestring` at the end, it would consume the whitespace and therefore move to eof? (I could then check whether `somestring.length()==0` to rule out any other content in the stream.) – Martin Melka Mar 11 '13 at 17:51
  • @MartinMelka That sounds plausible. – Joseph Mansfield Mar 11 '13 at 17:53
  • @JerryCoffin That's a different situation. Because you're extracting a `char`, it only reads a single character and doesn't hit the end. When extracting an `int`, it keeps reading until it hits the end of file and then sets the eof bit. Using `!eof()` fails when you haven't hit the end yet your next extraction will fail anyway - such as with a text file (which has a hidden extra `\n` at the end). – Joseph Mansfield Mar 11 '13 at 17:59
  • @JerryCoffin I agree. I've rephrased the last sentence to make that a bit clearer. – Joseph Mansfield Mar 11 '13 at 18:23
0

You can read two strings and then check whether the second string really is a number. Something like this:

std::string line; // parameter with a string to be parsed
std::string name;
std::string strNum;
int num;
std::istringstream lineStream(line);
lineStream >> name;
lineStream >> strNum;

if (isNumber(strNum))
{
   num = atoi(strNum.c_str());
}

// isNumber() must be declared before the code above; it needs <cctype> and <string>.

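bool isNumber(const std::string& strNum)
{
   // An empty string (e.g. after a failed extraction) is not a number.
   if (strNum.empty())
      return false;
   for (std::string::const_iterator it = strNum.begin(); it != strNum.end(); ++it)
   {
      // Cast to unsigned char: std::isdigit is undefined for negative char values.
      if (!std::isdigit(static_cast<unsigned char>(*it)))
         return false;
   }
   // Only unsigned decimal digits were found; a leading '-' or '+' is rejected.
   return true;
}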
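
A possible variation on the same idea, assuming C++11 is available: let `std::stoi` do the conversion and use its second parameter to confirm that the whole token was consumed. The helper name `parse_whole_int` is hypothetical. Unlike the digit-by-digit check, this sketch also accepts a leading sign and rejects values that do not fit in an `int`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts `token` to an int only if every character
// of the token is part of the number. Returns false otherwise.
// Note: std::stoi skips leading whitespace, which cannot occur in a token
// extracted with operator>>.
bool parse_whole_int(const std::string& token, int& out) {
    try {
        std::size_t consumed = 0;
        const int value = std::stoi(token, &consumed);
        if (consumed != token.size()) {
            return false;  // trailing characters, e.g. "12a" or "12.2"
        }
        out = value;
        return true;
    } catch (const std::invalid_argument&) {
        return false;  // no digits at all, e.g. "g" or an empty string
    } catch (const std::out_of_range&) {
        return false;  // value does not fit in an int
    }
}
```

In the snippet above, this could stand in for the `isNumber` plus `atoi` pair: `if (parse_whole_int(strNum, num)) { ... }`.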
WormholeWizard
  • I suppose that should work. It won't take the rest of the line into consideration, but that can be easily done. Why haven't I thought of this? :O – Martin Melka Mar 11 '13 at 17:55