Friday, September 1, 2017

Using SQL date_trunc to group log records for troubleshooting

Did you ever notice a recurring error message in your log files, only to wonder if the problem has recently become more severe, or has actually been happening -- unnoticed -- for quite a while? If your log data is in SQL format, or any format you can query using SQL, a good technique is to group the records by time period to see if the error has become more prevalent over time.

Here's a real-world example. In a SQL table named log, which includes columns named timestamp (the time the record was written) and message (text giving details of what was logged), I stumbled across many records where message  had the value 'ConnectionError in application_cache'. I wanted to learn whether this error had suddenly become more frequent. This query gave me the answer:

select date_trunc('day', timestamp) as d, count(*) from log where message = 'ConnectionError in application_cache' group by d order by d

The date_trunc function takes a timestamp and "rounds it down" to the nearest specified unit -- the nearest day in my example. Instead of 'day', you can specify any unit from 'microseconds' all the way up to 'millennium'.

The results of my query, a count of the number of matching messages per day, looked like this:

That told me what I needed to know: This message happens many times per day, and the frequency hasn't increased recently.

My database is PostgreSQL 9.x, and the PostgreSQL date_trunc function is documented here. Whichever SQL database your using, a similar function is likely available, and may come in very handy.

No comments:

Post a Comment