The Symantec Messaging Gateway formally known as Brightmail is a spam filtering appliance, you can read more about it from Symantec here. The appliance appears to run on Linux and it has both a web-interface and a command line interface accessible via SSH. It also has the ability to log system and application level logs via syslog.
The system level logs include processes such as sshd, crond and sudo; the application application level mail logs consist of two processes: ecelerity and bmserver. In this post I focus on the application level logs, those beginning with the <142> prefix. Symantec has some not so helpful documentation on this appliance’s log formats here: https://support.symantec.com/en_US/article.HOWTO15282.html
From what I see in Splunk the logs are in the format: <identifier>date time server-name process[process-number]: process-id|message-id|event|variable-log-format. There appears to be 18 different application level log events all with a different format. Those events are: IRCPTACTION, ACCEPT, VERDICT, TRACKERID, UNTESTED, FIRED, SENDER, LOGICAL_IP, EHLO, MSG_SIZE, MSGID, SOURCE, SUBJECT, ORCPTS, DELIVER, ATTACH, UNSCANNABLE and VIRUS. These different formats make it impossible to use Splunk’s built in field extraction interface. An alternative solution is to write a custom regex extraction.
Lacking complete documentation I had to reverse engineer a regex extraction from the logs being sent to my Splunk server. With this in mind be warned that my final regex extraction may contain errors. Also be aware that the fields in the extraction are names that I assigned and are not the official field names as I could not find complete documentation on this system’s log format.
I used Regex101.com to help me craft and test my regular expression extraction. You can view my saved regex and test string with anonymized syslog entries at: https://regex101.com/r/kR0iS8/1. If you visit this link, and are familiar with regular expressions, you will notice that I used multiple positive look-behinds. This is the best solution I could come up with to deal with the variable log formats produced by the SMG. My skill with regular expressions is intermediate at best, so there may very well be better solutions out there. If someone with more knowledge of regular expressions reads this article and cares to correct me, feel free to leave a comment.
Below is the final regular expression that I came up with. This seems to work for the majority of the logs that Splunk processes from Symantec Messaging Gateway, but may need further tweaking.
^<142>(?P<date>\w+\s+\d+)\s+(?P<time>[^ ]+)\s+(?P<server>\w+)\s+(?P<process_name>[a-z]+)\[(?P<process_number>\d+)[^ \n]* (?P<process_id>[^\|]+)\|(?P<message_id>[^\|]+)\|(?P<action>IRCPTACTION|VERDICT|UNTESTED|FIRED|SENDER|LOGICAL_IP|EHLO|MSG_SIZE|MSGID|SOURCE|SUBJECT|ORCPTS|TRACKERID|ATTACH|UNSCANNABLE|VIRUS|DELIVER|ACCEPT)(?:(?:(?<=ACCEPT|DELIVER|LOGICAL_IP)\|(?P<src>[^:\s]+)(?::(?P<port>[0-9]+))?(?:\|(?P<to>[^\s]+))?)|(?:(?<=FIRED|IRCPTACTION|ORCPTS|TRACKERID|UNTESTED|VERDICT)\|(?P<recipient>[^\s\|]+)(?:\|)?(?P<result>[a-z][^\|\s]+)?(?:\|(?P<result_2>[a-z][^\|]+))?(?:\|(?P<result_3>.+))?)|(?:(?<=SENDER)\|(?P<from>[^\s]+))|(?:(?<=MSG_SIZE)\|(?P<msg_size>\w+))|(?:(?<=SUBJECT)\|(?P<subject>.*))|(?:(?<=ATTACH)\|(?P<attachment>.+))|(?:(?<=UNSCANNABLE)\|(?P<reason>.+))|(?:(?<=VIRUS)\|(?P<virus_name>.+))|(?:(?<=EHLO)\|(?P<fqdn>.+)))?
If readers have any questions of comments about this extraction, feel free to leave a comment and I will try to respond in a timely manner.
Hi
This is great!
We have been struggling with this for some time.
How did you set up the initial inputs.conf for getting the data into Splunk?
Which sourcetype do you assign to the application level logs you receive from the SMGs?
Best
Soren
Soren, we are importing logs from the SMG via syslog on a unique port and it was setup via the Splunk GUI. Being configured this way there was no need for us to directly edit the inputs.conf file. The source type chosen was purely arbitrary and it’s the same for all the syslogs from the SMG device. I haven’t found a need to differentiate sourcetypes for the system level and application level logs. I have been writing searched based off of the process field that I extracted.
Here is mine which is based on slightly different looking logs :
source=”local1.log” index=”test” sourcetype=”sbg” | rex “(?P\w+\s+\d+)\s+(?P[^ ]+)\s+(?P\w+)\s+(?P[a-z]+)\[(?P\d+)[^ \n]*\s\[Brightmail\]\s\((?P\w+)\:(?P[^\)]+)\)\:\s+\[(?P
[^\]]+)\]\s+\[\s+\<(?P[^ \>]+)\>\](?P.*)\.$"
log format (masked information) :
Jun 12 15:12:30 bmgp7 bmserver[27775]: [Brightmail] (WARNING:27775.950114048): [2886] [ ] Feature "Symantec DNS URI Server: smg.ultra.brightmail.com" restored.