Is it possible to construct a regular expression over bytes that matches specific ranges of Unicode code points encoded as UTF-8? This comes up when you're stuck with a lexer generator that only understands byte sequences.
I assume it must be possible, because at worst you can just union together the UTF-8 byte sequences for every code point in the range. Given how large some ranges are, though, that would probably be very inefficient. The better question is whether there is a smarter way of constructing the regular expression or DFA. Right now I'm not inclined to think about it too deeply: if it really comes down to it, I know I can do it the blindingly stupid way and write some DFA minimization code.
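For what it's worth, here is a rough sketch of one smarter construction, assuming the range is given as two integer code points. The idea is to split the range until both endpoints encode to the same number of bytes and every continuation byte position covers one contiguous byte range; each piece then becomes a short sequence of byte classes. The names `utf8_byte_ranges` and `to_byte_regex` are made up for illustration, and the surrogate block U+D800 to U+DFFF is skipped because it has no valid UTF-8 encoding.

```python
import re


def utf8_byte_ranges(lo, hi):
    """Yield sequences of (low_byte, high_byte) pairs covering code points lo..hi.

    Each yielded list describes one alternative: byte 0 must fall in the first
    pair, byte 1 in the second pair, and so on.
    """
    if lo > hi:
        return
    # Surrogates cannot be encoded in UTF-8, so cut them out of the range.
    if lo <= 0xDFFF and hi >= 0xD800:
        yield from utf8_byte_ranges(lo, 0xD7FF)
        yield from utf8_byte_ranges(0xE000, hi)
        return
    # Split where the encoded length changes (1, 2, 3, 4 bytes).
    for max_cp in (0x7F, 0x7FF, 0xFFFF):
        if lo <= max_cp < hi:
            yield from utf8_byte_ranges(lo, max_cp)
            yield from utf8_byte_ranges(max_cp + 1, hi)
            return
    # A pure ASCII range is a single byte class.
    if hi <= 0x7F:
        yield [(lo, hi)]
        return
    # Each continuation byte carries 6 bits. If the endpoints differ above the
    # low 6*i bits, those low bits must span the full 0..m range; otherwise
    # split so that they do.
    for i in range(1, 4):
        m = (1 << (6 * i)) - 1
        if (lo & ~m) != (hi & ~m):
            if (lo & m) != 0:
                yield from utf8_byte_ranges(lo, lo | m)
                yield from utf8_byte_ranges((lo | m) + 1, hi)
                return
            if (hi & m) != m:
                yield from utf8_byte_ranges(lo, (hi & ~m) - 1)
                yield from utf8_byte_ranges(hi & ~m, hi)
                return
    # Now the two encodings line up byte by byte.
    first = chr(lo).encode("utf-8")
    last = chr(hi).encode("utf-8")
    yield list(zip(first, last))


def to_byte_regex(lo, hi):
    """Render the byte-range sequences as a regex alternation (as a str)."""
    def byte_class(a, b):
        return f"\\x{a:02X}" if a == b else f"[\\x{a:02X}-\\x{b:02X}]"

    return "|".join(
        "".join(byte_class(a, b) for a, b in seq)
        for seq in utf8_byte_ranges(lo, hi)
    )


if __name__ == "__main__":
    # Greek and Coptic block, U+0370..U+03FF
    pattern = to_byte_regex(0x370, 0x3FF)
    print(pattern)  # \xCD[\xB0-\xBF]|[\xCE-\xCF][\x80-\xBF]
    compiled = re.compile(pattern.encode("ascii"))
    print(bool(compiled.fullmatch("λ".encode("utf-8"))))  # True
```

With this splitting, the whole of Unicode (U+0000 to U+10FFFF) comes out as nine alternatives rather than over a million, so the resulting DFA stays small without needing a separate minimization pass. I haven't checked that it always produces the fewest possible alternatives, only that the pieces are correct and few.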