
Splunk Field Extractions for Symantec Messaging Gateway (A.K.A. Brightmail) Syslogs

The Symantec Messaging Gateway (SMG), formerly known as Brightmail, is a spam-filtering appliance; you can read more about it on Symantec's website. The appliance appears to run on Linux, and it has both a web interface and a command-line interface accessible via SSH. It can also send both system-level and application-level logs via syslog.

The system-level logs include processes such as sshd, crond and sudo; the application-level mail logs come from two processes: ecelerity and bmserver. In this post I focus on the application-level logs, which are the ones beginning with the <142> prefix. Symantec has some not-so-helpful documentation on this appliance's log formats here: https://support.symantec.com/en_US/article.HOWTO15282.html
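The <142> prefix isn't specific to SMG. It is the standard syslog priority (PRI) value, which encodes facility * 8 + severity. This minimal Python sketch decodes it: 142 works out to facility 17 (local1) with severity 6 (informational), so matching on that prefix effectively selects local1.info messages. The `decode_pri` helper is my own, written only to illustrate the arithmetic.

```python
# Syslog PRI: the number in <...> is facility * 8 + severity (RFC 3164 / RFC 5424).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def decode_pri(pri: int) -> tuple:
    """Split a syslog PRI value into (facility number, severity name)."""
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity]


# Facilities 16-23 are local0-local7, so facility 17 is local1.
print(decode_pri(142))  # (17, 'info')
```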

From what I see in Splunk, the logs are in the format: <identifier>date time server-name process[process-number]: process-id|message-id|event|variable-log-format. There appear to be 18 different application-level log events, each with its own format. Those events are: IRCPTACTION, ACCEPT, VERDICT, TRACKERID, UNTESTED, FIRED, SENDER, LOGICAL_IP, EHLO, MSG_SIZE, MSGID, SOURCE, SUBJECT, ORCPTS, DELIVER, ATTACH, UNSCANNABLE and VIRUS. Because the format varies from event to event, Splunk's built-in field extraction interface can't be used; the alternative is to write a custom regex extraction.
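To make that layout concrete, here is a minimal Python sketch that matches only the common header and splits the event-specific remainder on pipes, leaving per-event parsing for a later step. It assumes the header looks exactly as described above; unlike my Splunk extraction, it allows any non-space characters (such as hyphens) in the server name. The `HEADER` pattern and `parse_header` helper are hypothetical, and the sample line is made up to follow the layout; it is not a real SMG log entry.

```python
import re
from typing import Optional

# Header layout, as described above:
# <identifier>date time server-name process[process-number]: process-id|message-id|event|rest
HEADER = re.compile(
    r"^<(?P<pri>\d+)>"
    r"(?P<date>\w+\s+\d+)\s+"
    r"(?P<time>\S+)\s+"
    r"(?P<server>\S+)\s+"
    r"(?P<process_name>[A-Za-z]+)\[(?P<process_number>\d+)\]:\s+"
    r"(?P<process_id>[^|]+)\|(?P<message_id>[^|]+)\|(?P<event>[A-Z_]+)"
    r"(?:\|(?P<rest>.*))?$"
)


def parse_header(line: str) -> Optional[dict]:
    """Return the header fields plus the pipe-separated remainder, or None."""
    m = HEADER.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # The event-specific part varies by event, so only split it on pipes here.
    rest = fields.pop("rest")
    fields["payload"] = rest.split("|") if rest else []
    return fields


# Made-up line following the layout above (not real SMG output).
sample = "<142>Jan  5 10:15:42 smg-host bmserver[1234]: proc-1|msg-1|SENDER|sender@example.com"
print(parse_header(sample))
```

Splitting the work this way keeps the header pattern identical for every event, so only the payload handling has to know about the 18 different formats.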

Lacking complete documentation, I had to reverse-engineer a regex extraction from the logs being sent to my Splunk server, so be warned that my final extraction may contain errors. Also be aware that the field names in the extraction are ones I assigned; they are not official names, since I could not find complete documentation of this system's log format.

I used Regex101.com to help me craft and test my regular expression extraction. You can view my saved regex and test string with anonymized syslog entries at: https://regex101.com/r/kR0iS8/1. If you visit this link and are familiar with regular expressions, you will notice that I used multiple positive look-behinds. This is the best solution I could come up with to deal with the variable log formats produced by the SMG. My skill with regular expressions is intermediate at best, so there may very well be better solutions out there. If someone with more knowledge of regular expressions reads this article and cares to correct me, feel free to leave a comment.
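A note on portability: Splunk uses PCRE, as does Regex101's default flavor, and PCRE accepts a look-behind whose alternatives have different fixed lengths, such as `(?<=ACCEPT|DELIVER|LOGICAL_IP)`. Python's standard `re` module does not; every look-behind must be a single fixed width. If you want to parse these logs outside Splunk, one alternative is to drop the look-behinds and pick a small pattern by event name. This is a minimal sketch of that idea, not a replacement for the extraction: `EVENT_PATTERNS` and `parse_event` are hypothetical names, only a few events are covered, and the payload layouts are my guesses based on the extraction, not documented formats.

```python
import re


def compiles(pattern: str) -> bool:
    """True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Hypothetical alternative to look-behind-guarded branches: one small pattern
# per event name. Payload layouts are guesses, not documented SMG formats.
EVENT_PATTERNS = {
    "SENDER": re.compile(r"(?P<from>\S+)"),
    "MSG_SIZE": re.compile(r"(?P<msg_size>\w+)"),
    "SUBJECT": re.compile(r"(?P<subject>.*)"),
    "EHLO": re.compile(r"(?P<fqdn>.+)"),
    # src excludes "|" so an address without a port doesn't swallow the next field.
    "ACCEPT": re.compile(r"(?P<src>[^:\s|]+)(?::(?P<port>\d+))?(?:\|(?P<to>\S+))?"),
}


def parse_event(event: str, payload: str) -> dict:
    """Parse an event's payload (the text after 'event|') into named fields."""
    pattern = EVENT_PATTERNS.get(event)
    match = pattern.fullmatch(payload) if pattern else None
    if match is None:
        return {}
    # Drop optional groups that did not take part in the match.
    return {name: value for name, value in match.groupdict().items() if value is not None}


print(compiles(r"(?<=ACCEPT|DELIVER|LOGICAL_IP)\|"))  # False
print(compiles(r"(?<=SENDER)\|"))                     # True
print(parse_event("ACCEPT", "10.0.0.1:25"))           # {'src': '10.0.0.1', 'port': '25'}
```

The trade-off is an extra dictionary lookup per line, but each per-event pattern stays short and can be tested on its own.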

Below is the final regular expression that I came up with. This seems to work for the majority of the logs that Splunk processes from Symantec Messaging Gateway, but may need further tweaking.

^<142>(?P<date>\w+\s+\d+)\s+(?P<time>[^ ]+)\s+(?P<server>\w+)\s+(?P<process_name>[a-z]+)\[(?P<process_number>\d+)[^ \n]* (?P<process_id>[^\|]+)\|(?P<message_id>[^\|]+)\|(?P<action>IRCPTACTION|VERDICT|UNTESTED|FIRED|SENDER|LOGICAL_IP|EHLO|MSG_SIZE|MSGID|SOURCE|SUBJECT|ORCPTS|TRACKERID|ATTACH|UNSCANNABLE|VIRUS|DELIVER|ACCEPT)(?:(?:(?<=ACCEPT|DELIVER|LOGICAL_IP)\|(?P<src>[^:\s]+)(?::(?P<port>[0-9]+))?(?:\|(?P<to>[^\s]+))?)|(?:(?<=FIRED|IRCPTACTION|ORCPTS|TRACKERID|UNTESTED|VERDICT)\|(?P<recipient>[^\s\|]+)(?:\|)?(?P<result>[a-z][^\|\s]+)?(?:\|(?P<result_2>[a-z][^\|]+))?(?:\|(?P<result_3>.+))?)|(?:(?<=SENDER)\|(?P<from>[^\s]+))|(?:(?<=MSG_SIZE)\|(?P<msg_size>\w+))|(?:(?<=SUBJECT)\|(?P<subject>.*))|(?:(?<=ATTACH)\|(?P<attachment>.+))|(?:(?<=UNSCANNABLE)\|(?P<reason>.+))|(?:(?<=VIRUS)\|(?P<virus_name>.+))|(?:(?<=EHLO)\|(?P<fqdn>.+)))?
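One place where the extraction may need that further tweaking: the `action` group accepts all 18 events, but the optional payload section only has look-behind branches for some of them. The Python sketch below treats the extraction as plain text (it never compiles it) and lists the events that match `action` without any payload branch. The helper names are my own.

```python
import re

# The extraction from this post, copied verbatim and inspected only as text.
EXTRACTION = r"^<142>(?P<date>\w+\s+\d+)\s+(?P<time>[^ ]+)\s+(?P<server>\w+)\s+(?P<process_name>[a-z]+)\[(?P<process_number>\d+)[^ \n]* (?P<process_id>[^\|]+)\|(?P<message_id>[^\|]+)\|(?P<action>IRCPTACTION|VERDICT|UNTESTED|FIRED|SENDER|LOGICAL_IP|EHLO|MSG_SIZE|MSGID|SOURCE|SUBJECT|ORCPTS|TRACKERID|ATTACH|UNSCANNABLE|VIRUS|DELIVER|ACCEPT)(?:(?:(?<=ACCEPT|DELIVER|LOGICAL_IP)\|(?P<src>[^:\s]+)(?::(?P<port>[0-9]+))?(?:\|(?P<to>[^\s]+))?)|(?:(?<=FIRED|IRCPTACTION|ORCPTS|TRACKERID|UNTESTED|VERDICT)\|(?P<recipient>[^\s\|]+)(?:\|)?(?P<result>[a-z][^\|\s]+)?(?:\|(?P<result_2>[a-z][^\|]+))?(?:\|(?P<result_3>.+))?)|(?:(?<=SENDER)\|(?P<from>[^\s]+))|(?:(?<=MSG_SIZE)\|(?P<msg_size>\w+))|(?:(?<=SUBJECT)\|(?P<subject>.*))|(?:(?<=ATTACH)\|(?P<attachment>.+))|(?:(?<=UNSCANNABLE)\|(?P<reason>.+))|(?:(?<=VIRUS)\|(?P<virus_name>.+))|(?:(?<=EHLO)\|(?P<fqdn>.+)))?"


def action_events(pattern: str) -> set:
    """Event names listed in the (?P<action>...) alternation."""
    group = re.search(r"\(\?P<action>([^)]*)\)", pattern).group(1)
    return set(group.split("|"))


def payload_events(pattern: str) -> set:
    """Event names that appear in a (?<=...) look-behind guarding a payload branch."""
    names = set()
    for alternation in re.findall(r"\(\?<=([^)]*)\)", pattern):
        names.update(alternation.split("|"))
    return names


unparsed = sorted(action_events(EXTRACTION) - payload_events(EXTRACTION))
print(len(action_events(EXTRACTION)), unparsed)  # 18 ['MSGID', 'SOURCE']
```

For MSGID and SOURCE events the extraction still matches, because the whole payload section is optional, but it fills in only the header fields and `action`.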

If readers have any questions or comments about this extraction, feel free to leave a comment and I will try to respond in a timely manner.