StarRocks Best Practice Guide 5/5: Monitoring

StarRocks Engineering
Jun 13, 2024


Over my years as a DBA and StarRocks contributor, I’ve gained a lot of experience working alongside a diverse array of community members and picked up plenty of best practices. In this time, I’ve found five specific areas that stand out as absolutely critical: deployment, data modeling, data ingestion, querying, and monitoring.

In my previous article I shared some tips on StarRocks querying. In this final installment of my five-part series I’ll discuss monitoring.

Monitoring — What You Need to Know

First, here’s what I’d say is required, or at least recommended, when we’re talking about monitoring with StarRocks:

  • Required: Use the Audit Loader plugin to import fe.audit.log data into a table so you can analyze slow queries with SQL.
  • Required: Deploy Prometheus + Grafana for monitoring and alerting, following https://docs.starrocks.io/zh/docs/2.5/administration/Monitor_and_Alert/
  • Recommended: Use resource isolation to circuit-break large queries and guarantee a resource baseline for small queries.
-- shortquery_group: guarantees resources for core business queries
CREATE RESOURCE GROUP shortquery_group
TO (user='rg1_user1', role='rg1_role1', db='db1', query_type in ('select'), source_ip='192.168.x.x/24')
WITH (
    'type' = 'short_query',
    'cpu_core_limit' = '10',
    'mem_limit' = '20%'
);

-- bigquery_group: circuit-breaks large queries so they cannot saturate cluster resources
CREATE RESOURCE GROUP bigquery_group
TO (user='rg1_user2', role='rg1_role1', query_type in ('select'))
WITH (
    'type' = 'normal',
    'cpu_core_limit' = '10',
    'mem_limit' = '20%',
    'big_query_cpu_second_limit' = '100',
    'big_query_scan_rows_limit' = '100000',
    'big_query_mem_limit' = '1073741824'
);
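Once the groups are created, it’s worth verifying that they and their classifiers actually took effect. SHOW RESOURCE GROUPS does this (the ALL keyword lists every group in the cluster, not just the ones matching the current user), and ALTER RESOURCE GROUP lets you tune a limit in place; the 30% value below is just an illustration, not a recommendation:

```sql
-- List every resource group in the cluster, with classifiers and limits
SHOW RESOURCE GROUPS ALL;

-- Relax the memory limit if the circuit breaker proves too aggressive
ALTER RESOURCE GROUP bigquery_group WITH ('mem_limit' = '30%');
```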

Locating Large Queries with StarRocks

Next, let’s talk specifically about locating larger queries. Start with the following:

View currently running SQL queries on FE:

SHOW PROC '/current_queries'

Your results should include several columns such as:

  • QueryId
  • ConnectionId
  • Database (The DB of the current query)
  • User (The user who submitted the query)
  • ScanBytes (The amount of data scanned so far, in Bytes)
  • ProcessRows (The number of data rows scanned so far)
  • CPUCostSeconds (The CPU time used by the current query, in seconds. This is the cumulative CPU time of multiple threads. For example, if two threads occupy 1 second and 2 seconds of CPU time respectively, then the cumulative CPU time is 3 seconds).
  • MemoryUsageBytes (The memory currently occupied. If the query involves multiple BE nodes, this value is the sum of the memory occupied by the query on all BE nodes).
  • ExecTime (The duration of the query from initiation to now, in milliseconds).
mysql> show proc '/current_queries';
+--------------------------------------+--------------+------------+------+-----------+----------------+----------------+------------------+----------+
| QueryId                              | ConnectionId | Database   | User | ScanBytes | ProcessRows    | CPUCostSeconds | MemoryUsageBytes | ExecTime |
+--------------------------------------+--------------+------------+------+-----------+----------------+----------------+------------------+----------+
| 7c56495f-ae8b-11ed-8ebf-00163e00accc | 4            | tpcds_100g | root | 37.88 MB  | 1075769 Rows   | 11.13 Seconds  | 146.70 MB        | 3804     |
| 7d543160-ae8b-11ed-8ebf-00163e00accc | 6            | tpcds_100g | root | 13.02 GB  | 487873176 Rows | 81.23 Seconds  | 6.37 GB          | 2090     |
+--------------------------------------+--------------+------------+------+-----------+----------------+----------------+------------------+----------+
2 rows in set (0.01 sec)
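If a query like the second one here turns out to be a runaway, the ConnectionId column gives you what you need to stop it. StarRocks accepts the MySQL-style KILL statement; connection id 6 below is taken from the example output above:

```sql
-- Stop the running statement on connection 6, keeping the session alive
KILL QUERY 6;

-- Or terminate the connection entirely
KILL CONNECTION 6;
```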

The SQL command to view the resource consumption of a specific query on each BE node is:

SHOW PROC '/current_queries/${query_id}/hosts'

The result contains one row per BE node, each describing the query’s execution on that node. It should include the following columns:

  • Host (BE node information)
  • ScanBytes (Amount of data scanned, in bytes)
  • ScanRows (Number of data rows scanned)
  • CPUCostSeconds (CPU time used)
  • MemUsageBytes (Current memory usage)

mysql> show proc '/current_queries/7c56495f-ae8b-11ed-8ebf-00163e00accc/hosts';
+--------------------+-----------+-------------+----------------+---------------+
| Host               | ScanBytes | ScanRows    | CpuCostSeconds | MemUsageBytes |
+--------------------+-----------+-------------+----------------+---------------+
| 172.26.34.185:8060 | 11.61 MB  | 356252 Rows | 52.93 Seconds  | 51.14 MB      |
| 172.26.34.186:8060 | 14.66 MB  | 362646 Rows | 52.89 Seconds  | 50.44 MB      |
| 172.26.34.187:8060 | 11.60 MB  | 356871 Rows | 52.91 Seconds  | 48.95 MB      |
+--------------------+-----------+-------------+----------------+---------------+
3 rows in set (0.00 sec)
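Tying this back to the Audit Loader requirement from earlier: once fe.audit.log is flowing into a table, finding slow queries is a single statement. The database, table, and column names below are the Audit Loader defaults as I know them; treat them as an assumption and adjust to match your plugin configuration:

```sql
-- Top 10 slowest queries in the last 24 hours
-- (starrocks_audit_db__.starrocks_audit_tbl__ and these column names are
--  the Audit Loader defaults; queryTime is in milliseconds)
SELECT `timestamp`, `user`, `db`, queryTime, stmt
FROM starrocks_audit_db__.starrocks_audit_tbl__
WHERE isQuery = 1
  AND `timestamp` >= NOW() - INTERVAL 1 DAY
ORDER BY queryTime DESC
LIMIT 10;
```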

When it comes to monitoring, a few key commands can go a long way. Now that you know what to keep in mind, monitoring with StarRocks should be a breeze.

This marks the end of the fifth and final installment in my best practices series. Working with StarRocks continues to be a real joy and I’m excited about what the future holds for the project. In the meantime, join me on StarRocks Slack and let’s connect.
