This is an HTML snippet:

<input value="1" style="width: 90%;" type="text"></th><th style="cursor: default;" class=" filter">
<input value="2" class="tooltip" style="font-size: 0.8em; width: 90%;" placeholder="Datum" title="Datumsfilter 
Mögliche Operatoren: < | > | <= | >= | <> < | > | <= | >= | <> | .. 
Beispiele: 
2016 | 2016-03 | 2016-03-24 (nur Jahr[-Monat[-Tag]]) 
>2015-02 | <=2016-09-15 (ab/bis angegebenem Jahr[-Monat[-Tag]] inkl./exkl.)
2016-03..2016-04-15 (angegebener Bereich) 
<>2016-03 (ungleich)" type="text">

I need to extract the value attribute of each text input. The attributes, including type, can appear in any order.

/(?=<input.*?type="text"[^>]*?>).*?value="([^"]*)/

Works fine for

<input value="1" style="width: 90%;" type="text"></th><th style="cursor: default;" class=" filter">

But it breaks on the ">" inside the title attribute of the second input. How can I fix this?

Frank
  • Don't parse HTML with a regex. – melpomene Sep 25 '15 at 12:10
  • There is no other option. – Frank Sep 25 '15 at 12:22
  • Why not? What language are you using? – melpomene Sep 25 '15 at 12:22
  • Frank - bad idea, and it's annoying to ask this on this forum without at least SOME explanation of "WHY" it must use regex. That said... and let it sink in a bit... just a bit longer... it's hard to do what you ask with regex... let that sink in a bit... anything you produce in regex that does what you want is anywhere from fragile to criminally fragile... but it can be done – Code Jockey Sep 25 '15 at 13:31

2 Answers

Why not use

/\<input.{1,}?value="(.{1,}?)"/

It's less complex: just take the first value attribute after each input tag and capture its content.

Note: the quantifier {1,}? is lazy (non-greedy), which makes sure that the nearest following occurrence of value is matched and not a later one.
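To make the behaviour concrete, here is a minimal Python sketch of the pattern above, run against a shortened copy of the markup from the question. The variable names and the extra "tricky" tag are illustrative assumptions, and the pattern still does not check for type="text".

```python
import re

# Shortened copy of the question's markup; the title attribute contains ">" and line breaks.
html = '''<input value="1" style="width: 90%;" type="text"></th><th style="cursor: default;" class=" filter">
<input value="2" class="tooltip" placeholder="Datum" title="Datumsfilter
>2015-02 | <=2016-09-15
<>2016-03 (ungleich)" type="text">'''

# Same pattern as above, without the /.../ delimiters and the redundant backslash.
pattern = re.compile(r'<input.{1,}?value="(.{1,}?)"')

values = pattern.findall(html)
print(values)  # ['1', '2']

# Hypothetical tag where another attribute contains the text value="..." first:
# the pattern picks up that text instead of the real value attribute.
tricky = '<input title=\'value="10"\' value="3" type="text">'
tricky_values = pattern.findall(tricky)
print(tricky_values)  # ['10']
```

The second case shows how easily this kind of pattern is misled once the markup deviates from the sample.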

But as mentioned, there may be better solutions than a regex here, for example if you are using PHP (I don't know whether you are). Depending on the language, you may have to adjust the regex slightly.

https://regex101.com/r/kB3wR2/1
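For comparison, this is roughly what the non-regex route could look like. The question never states the language, so Python and its standard html.parser module are assumptions here, and the class name TextInputValues is made up for illustration.

```python
from html.parser import HTMLParser


class TextInputValues(HTMLParser):
    """Collects the value attribute of every <input type="text"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, so attribute order does not matter.
        attributes = dict(attrs)
        if tag == "input" and attributes.get("type") == "text":
            self.values.append(attributes.get("value"))


# Shortened copy of the question's markup; the title attribute contains ">" and line breaks.
html = '''<input value="1" style="width: 90%;" type="text"></th><th style="cursor: default;" class=" filter">
<input value="2" class="tooltip" placeholder="Datum" title="Datumsfilter
>2015-02 | <=2016-09-15
<>2016-03 (ungleich)" type="text">'''

parser = TextInputValues()
parser.feed(html)
parser.close()
print(parser.values)  # ['1', '2']
```

The parser handles the quoting itself, so a ">" inside title="..." is not mistaken for the end of the tag.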

fjellfly
  • Why did you replace `*` by `+` and then `+` by `{1,}`? – melpomene Sep 25 '15 at 13:05
  • If someone were to be stupid, careless or mean enough to put `value="10"` into a different attribute (that is parsed before the desired attribute), this breaks. If a document being parsed uses single quotes (`'`) around the value of the value attribute, this breaks... ahh, using regex to parse HTML is fun, isn't it? – Code Jockey Sep 25 '15 at 15:05
  • @fjellfly My question had nothing to do with greediness. The original regex was already non-greedy. – melpomene Sep 25 '15 at 15:16
  • @CodeJockey: This is true. But for the given example this expression does its job. And since no more background information is provided, I think this is enough. To make the whole thing more flexible, a different approach should be chosen; that's what I mentioned in the answer as well. Melpomene: I know that the original regex was non-greedy - but my approach is. I explained why I used a greedy quantifier. I don't know what you want to know, please ask more precisely. – fjellfly Sep 26 '15 at 13:06

This is Fragile...

It can be broken, circumvented and violated...

It is dangerous to use it...

Good luck to all who ignore these warnings...

There are problems galore to encounter and ways to pick nits out the ying yang, but if you have well-formed HTML... ... ...

you can try: <input(?: (?:value=(['"])(?<value>(?:(?!\1).)*)\1|\w+=(['"])(?:(?!\3).)*\3))*>

And you can mess around with it here, if you dare.
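If you want to try it outside a regex tester, here is a hedged Python port. Python's re module spells named groups (?P<value>...), and the DOTALL flag is my assumption so that "." can cross the line breaks inside the title attribute; the flags used in the original demo are not known. The names INPUT_TAG and html are illustrative only.

```python
import re

# The pattern from above, with (?<value>...) rewritten as (?P<value>...) for Python.
INPUT_TAG = re.compile(
    r'''<input(?: (?:value=(['"])(?P<value>(?:(?!\1).)*)\1|\w+=(['"])(?:(?!\3).)*\3))*>''',
    re.DOTALL,  # assumption: lets "." match the newlines inside title="..."
)

# Shortened copy of the question's markup; the title attribute contains ">" and line breaks.
html = '''<input value="1" style="width: 90%;" type="text"></th><th style="cursor: default;" class=" filter">
<input value="2" class="tooltip" placeholder="Datum" title="Datumsfilter
>2015-02 | <=2016-09-15
<>2016-03 (ungleich)" type="text">'''

# Each match is one complete <input ...> tag; the named group holds its value attribute.
values = [match.group("value") for match in INPUT_TAG.finditer(html)]
print(values)  # ['1', '2']
```

Because every attribute is consumed as a quoted unit, a ">" inside title="..." no longer ends the match early. The pattern still expects exactly one space between attributes and attribute names made only of word characters, and it does not check for type="text".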

Code Jockey