
I want to detect whether a long text string (input from "somewhere") contains mathematical expressions encoded in LaTeX. This means searching for substrings (denoted ... in what follows) enclosed in any of the following:

  1. $...$
  2. \[...\]
  3. \(...\)
  4. \begin{displaymath} ... \end{displaymath}

There are some variations of item 4 with keywords other than displaymath, and there may be whitespace inside the braces, etc., but I suppose I can figure out the rest once I get these four working.
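A minimal sketch of how those environment variations might be handled, assuming the set of environment names is known in advance. `ENV_NAMES` and `find_environments` are hypothetical names chosen for illustration; the idea is to build an alternation of escaped names and use a named backreference so that `\end{...}` must close the same environment that `\begin{...}` opened, with `\s*` tolerating optional whitespace inside the braces.

```python
import re

# Hypothetical list of environment names; extend it as needed.
ENV_NAMES = ["displaymath", "equation", "equation*", "align", "align*"]

# Escape each name (e.g. "equation*" -> "equation\*") and join with "|".
env_alt = "|".join(re.escape(name) for name in ENV_NAMES)

# \s* allows optional whitespace inside the braces;
# (?P=env) requires \end{...} to name the same environment as \begin{...}.
ENV_RE = re.compile(
    r"\\begin\{\s*(?P<env>" + env_alt + r")\s*\}"
    r"(?P<body>.+?)"
    r"\\end\{\s*(?P=env)\s*\}",
    re.DOTALL,  # let the body span several lines
)

def find_environments(text):
    """Return (environment name, body) pairs for every match in text."""
    return [(m.group("env"), m.group("body")) for m in ENV_RE.finditer(text)]

sample = "Before \\begin{equation*} a = b \\end{equation*} after."
print(find_environments(sample))  # [('equation*', ' a = b ')]
```

Because the alternation backtracks, `equation` and `equation*` can both be listed without the shorter name shadowing the longer one.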

For (1), I suppose I can do the following:

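```python
import re

text = "Let $x + y$ be positive."  # example input
# $ is a regex metacharacter (end of string), so it must be escaped as \$
if re.search(r"\$(.+?)\$", text):
    pass  # (do something)
```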

But I am having problems with the others, especially because they contain a backslash (\). Help would be appreciated.

I am using Python 2.7.12, but code that works on both 2.x and 3.x would be preferred.

  • You need to catch `$$` pairs too, and whitespace in braces is a syntax error by default. Is this a duplicate of http://stackoverflow.com/questions/14182879/regex-to-match-latex-equations ? – percusse Sep 06 '16 at 06:27
  • Oh, I did not know that it is a syntax error. Yes, the question looks similar; I could not easily find it in search. Thanks. Yes, `$$` pairs need to be caught, but they would be caught by the `$...$` search anyway. – Notions and Notes Sep 06 '16 at 06:42
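On the point that `$$` pairs "would anyway be caught": a plain `\$(.+?)\$` does find something in `$$x$$`, but the captured body is `$x`, not `x`. A minimal sketch of one way to keep the two apart, assuming only unescaped `$` delimiters matter: put the `$$` alternative first, since Python's `re` tries alternatives left to right. `find_dollar_math` is a hypothetical helper name.

```python
import re

# Alternatives are tried left to right, so $$...$$ is preferred over $...$.
DOLLAR_RE = re.compile(r"\$\$(.+?)\$\$|\$(.+?)\$", re.DOTALL)

def find_dollar_math(text):
    """Return (kind, body) pairs: 'display' for $$...$$, 'inline' for $...$."""
    found = []
    for m in DOLLAR_RE.finditer(text):
        if m.group(1) is not None:
            found.append(("display", m.group(1)))
        else:
            found.append(("inline", m.group(2)))
    return found

print(find_dollar_math("Let $x$ be real and $$x^2 \\geq 0.$$"))
```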

1 Answer


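You need to escape `\`, `[`, `]`, `{`, `}`, `(` and `)`, as they have special meanings in regular expressions. The same goes for `$`, which otherwise matches the end of the string, so pattern (1) needs `\$` as well.

So you need to add an extra `\` before each of them when you want to match them literally.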
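If hand-escaping gets error-prone, `re.escape()` from the standard library can add the backslashes for you. A minimal sketch, where `delimited` is a hypothetical helper: it compiles a pattern matching text between two literal delimiters. Note also that in Python these patterns belong in raw strings (`r"..."`); otherwise every backslash would have to be doubled once more for the string literal itself.

```python
import re

def delimited(open_delim, close_delim):
    """Compile a regex matching the text between two literal delimiters.

    re.escape() inserts the needed backslashes, so the delimiters can be
    written exactly as they appear in the LaTeX source.
    """
    return re.compile(
        re.escape(open_delim) + r"(.+?)" + re.escape(close_delim),
        re.DOTALL,  # allow the expression to span several lines
    )

bracket_re = delimited("\\[", "\\]")  # matches \[ ... \]
paren_re = delimited("\\(", "\\)")    # matches \( ... \)

# The generated pattern is equivalent to the hand-written r"\\\[(.+?)\\\]".
print(bracket_re.pattern)
print(bracket_re.search("see \\[ a+b \\] here").group(1))
```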

For your second pattern, use:

\\\[(.+?)\\\]

For the third pattern, use:

\\\((.+?)\\\)

For the fourth pattern, use:

\\begin\{displaymath\}(.+?)\\end\{displaymath\}

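You can see the demo for the fourth pattern here. Note that `.` does not match a newline by default, so compile these patterns with `re.DOTALL` (or `re.S`) if an expression can span several lines, as display math often does.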
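The patterns can be combined into a single regex with alternation. A minimal sketch, assuming the goal is only to detect delimited math and collect the bodies; it adds the question's `$...$` case (with `$$...$$` tried first) to the three patterns from this answer, and `contains_latex_math` and `math_bodies` are hypothetical helper names. It runs unchanged on Python 2.7 and 3.x.

```python
import re

# re.DOTALL lets "." match newlines, since display math often spans lines.
LATEX_MATH_RE = re.compile(
    r"\$\$(.+?)\$\$"                                      # $$...$$
    r"|\$(.+?)\$"                                         # $...$
    r"|\\\[(.+?)\\\]"                                     # \[...\]
    r"|\\\((.+?)\\\)"                                     # \(...\)
    r"|\\begin\{displaymath\}(.+?)\\end\{displaymath\}",  # displaymath
    re.DOTALL,
)

def contains_latex_math(text):
    """Return True if text contains at least one delimited math expression."""
    return LATEX_MATH_RE.search(text) is not None

def math_bodies(text):
    """Return the captured body of every match, in order of appearance."""
    bodies = []
    for m in LATEX_MATH_RE.finditer(text):
        # Exactly one group takes part in each match; pick that one.
        bodies.append(next(g for g in m.groups() if g is not None))
    return bodies

sample = "Text with \\(a\\) and\n\\begin{displaymath}\nx^2\n\\end{displaymath}"
print(contains_latex_math(sample))  # True
print(math_bodies(sample))
```

Keeping everything in one compiled pattern means the input string is scanned once; the trade-off is that the group numbering must be unpacked afterwards, which `math_bodies` does by taking the first non-`None` group.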

Ahsanul Haque