Answer by warren for Regex breaks with “/” character instead of newline

Instead of using rex, this can all be done with eval and mvexpand.

A run-anywhere example:

| makeresults
| eval urls="https://www.example.org/|http://example.com/|ca.gov|http://blade.example.com/bikes/airplane.php|http://alarm.example.com/|smugmug.com|shop-pro.jp|https://example.org/|qq.com|pcworld.com|symantec.com|360.cn|http://example.com/?brother=bike|http://www.example.com/behavior/bead.php|army.mil|https://example.com/boy/bedroom.php|https://example.com/|https://www.example.com/brother?activity=believe|https://www.example.net/achiever/bottle.html|http://believe.example.com/bit?bait=base&bone=ball|aboutads.info|http://www.example.com/|http://www.example.edu/afternoon|livejournal.com|http://border.example.com/box/afterthought|oaic.gov.au|https://www.example.edu/base.php|house.gov|smh.com.au|http://www.example.edu/|https://www.example.org/|lycos.com|https://border.example.com/?bridge=basket&blood=animal|hibu.com|http://example.com/"
| eval urls=split(urls,"|")
| mvexpand urls
| eval busted=split(urls,":")
| eval busted=mvindex(trim(split(trim(replace(mvfilter(match(busted,"\.")),"\/"," "))," ")),0)

I combined the last several steps into one line, but this is what it’s doing:

  • break the URL list based on the pipe ("|") character
  • mvexpand the multivalue field
  • split each individual URL on the : character (if it’s not there, there’s nothing to split)
  • use mvfilter to keep only the pieces that contain a period ("."), then:
    • replace the slashes ("/") with spaces (" ") and trim the result
    • split what is left on the space (" ")
    • select the 0th (first) element with mvindex
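
For readers who prefer to see the one-liner unrolled, here is a sketch of the same steps with one eval per step. It uses only the functions from the search above; the intermediate field names (pieces, with_dot, spaced, parts) are made up for illustration, and the URL list is shortened to three entries from the original:

| makeresults
| eval urls="https://www.example.org/|http://example.com/?brother=bike|ca.gov"
| eval urls=split(urls,"|")
| mvexpand urls
| eval pieces=split(urls,":")
| eval with_dot=mvfilter(match(pieces,"\."))
| eval spaced=trim(replace(with_dot,"\/"," "))
| eval parts=split(spaced," ")
| eval busted=mvindex(parts,0)
| table urls pieces with_dot spaced parts busted

The table at the end is only there to make each intermediate value visible; drop it once the result looks right.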

Your desired FQDN is now in busted.

Extracting the TLD (here, the last two dot-separated labels, such as example.com) is now trivial. Append the following:

| rex field=busted "(?<tld>[0-9a-zA-Z][0-9a-zA-Z_\-]+?\.[0-9a-zA-Z]+)$"

Or, to keep with just an eval, skipping rex entirely, do this:

| eval tld=mvindex(split(busted,"."),-2) +"."+ mvindex(split(busted,"."),-1)
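
As a quick check of the eval variant, this sketch runs it against three hosts taken from the list above. The comma-separated test string is only a convenient way to build sample events and is not part of the original search:

| makeresults
| eval busted=split("www.example.org,oaic.gov.au,smh.com.au",",")
| mvexpand busted
| eval tld=mvindex(split(busted,"."),-2) +"."+ mvindex(split(busted,"."),-1)
| table busted tld

This should give example.org, gov.au and com.au. Since only the last two labels are kept, hosts under two-label suffixes such as .com.au lose the registered name, so treat those separately if you need smh.com.au rather than com.au.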

from User warren – Stack Overflow https://stackoverflow.com/questions/76162456/regex-breaks-with-character-instead-of-newline/76163721#76163721