I think no compiler actually does that specific optimization at the moment. In many situations it is unlikely that the programmer would want this behavior because they expect stores from other threads to become visible as quickly as possible.
I think that is part of the reason that compilers are very conservative in applying optimizations to atomics, especially ones that would eliminate loads/stores, although I don't see any problem with applying the optimization in your specific example.
However, from the perspective of the ISO C++ standard, the compiler would be allowed to cache the read in a register and reuse it in general.
The compiler only has to make sure that a write to the atomic variable from another thread becomes visible to this thread within a finite amount of time, i.e. an infinite loop reading the atomic must not keep returning a cached register value forever.
It should also make them visible in "a reasonable amount of time" ([atomics.order]/11). To what degree such caching would be OK depends on how exactly you interpret "reasonable".
Also note that atomics are much more than just a guarantee that reads and writes happen atomically for all threads. Even with only std::memory_order_relaxed operations (the weakest variant), an atomic guarantees that all threads agree on a single global modification order in which writes to it happen, that this order is consistent with the sequencing in each individual thread, and that read-modify-write operations happen atomically and consistently with that order. None of this is guaranteed for non-atomic objects.
And with the other std::memory_order_* options (std::memory_order_seq_cst being the default), the operations additionally provide memory ordering guarantees with respect to objects other than the single atomic object itself.