In my previous post I shared some SQL scripts which would give you the number of undistributed commands by publication, by publication and article, and by distribution database. So you can measure undistributed commands at a very high granularity (articles), or at a very low granularity (the entire distribution database.) How granular you go depends entirely on where your pain is: you could of course filter out to only a few articles, or a particular publication.
Let's say you choose to monitor at the publication level. The best thing to do is to have a SQL Agent job run immediately prior to your “Distribution Clean Up” job so that you get the count of undistributed commands. You could even choose to create a step in the clean up job yourself. The step should be something as simple as the one below:
--create table in tempdb first INSERT INTO tempdb.dbo.undistributedcommands SELECT p.publisher_db ,p.publication ,COUNT (1) ,GetDate() FROM MSrepl_commands cmds INNER JOIN MSarticles a ON a.article_id = cmds.article_id INNER JOIN MSrepl_transactions t ON cmds.xact_seqno = t.xact_seqno INNER JOIN MSpublications p on p.publication_id = a.publication_id GROUP BY p.publisher_db, p.publication
Leave the job to run for as long as you want to get a baseline for. I left it for 10 days. At this point, I got a good history of undistributed commands and where a DBA had to take action to keep replication working, and when it was under heavy load but still working OK, and when replication was under no heavy load.
After this you can then decide how you wish to monitor this, like either on a NOC wall or via a job and alert when thresholds are hit.