Rajasekhar Reddy asked:
Could you please let me know the best way to get alarms before tempdb gets full ?
A method of dealing with
tempdb filling up, and analysing the causes (thus preventing them in the future) is provided here. This is a second generation method, useful to all ASE servers, with unknown, untamed
tempdb content; the first generation rendition is quite adequate for tamed and controlled databases. This has been implemented and tested in large Production systems. The approach is important, and different from other approaches that have been evaluated, in two requirements:
[bxA]- monitoring and analysis of the conditions that lead to
tempdb full (rather than merely reacting to a full
tempdb)
- administering Sybase from external scripts (this should be quite ordinary for test and production systems; scripting for Sybase is not discussed here)
Cause & PreventionHandling the symptom without understanding and addressing the cause is a never-ending story of fighting fires, quite unfulfilling. The two common causes and prevention are identified here, so that the firebugs has be hunted down.
tempdb is the busiest database on any Sybase server. Any effort that is made to affect these two areas will substantially reduce tempdb usage, and therefore affect overall server performance.
1 A large
tempdb, and the event of it filling up, is caused by the over-use of temp tables; particularly re-creating large result sets as an intermediate step in processing. As the main database grows,
tempdb has to grow as well. This is caused by the
lack of Normalisation; in fact a large and active
tempdb is a classic hallmark of this condition.
2 For Normalised databases, the second cause is poor SQL coding that fails to take advantage of the power of Relational paradigm (joins, subqueries, set-processing, conditionals), and instead reverts to creating intermediate worktables and uses procedural processing (row-processing; cursors or cursor substitutes). That is, the coder has a Relational Database but has coded in non-relational style.
User
tempdbs provide a containment factor, however, if the cause is not addressed, these simply spread the pain and present additional administrative tasks to the DBA.
Synopsis & MethodTwo thresholds are identified:
1 A Report Threshold for space used by an object in
tempdb, below which the object is of no concern, and above which the object is reported (eliminates the multitude of small objects from the report)
2 An Alarm Threshold, for space used by an object in
tempdb, which when exceeded, triggers an alarm (e.g. mail being sent to the DBAs).
Both thresholds are expressed as a percentage of the allocated tempdb
default segment space, with set defaults of e.g. 2% for objects worth reporting and 20% for objects which cause alarm. If tempdb
default and
logsegments are not separated, then the percentages should be slightly lower, to compensate for that.
The method consists of three components:
1 A table
tempdb..sg_object (not a hash table) which holds a series of snapshots of objects which exceed the Report Threshold. This is placed in
model, so that it is created in
tempdb on server boot.
2 A stored proc
sp_helptempdb_sg which can be executed from anywhere, on any [user] tempdb:
- both Report and Alarm Thresholds have defaults (e.g 1% and 20% respectively) which can be overridden with execution parameters
- identify the objects residing in
tempdb, from
sysobjects and
sysindexes, which are above the Report Threshold
- store them in
tempdb..sg_object- report such objects for the current execution [unless executing Silently]
- report "ALARM" (an additional column) for objects that exceed the Alarm Threshold
3 A shell script
sg_monitor_tempdb, within the structure for normal DBA scripts:
- invoked by
crontab every minute or every five minutes
- execute
sp_helptempdb_sg- capture the output_report
-
grep the output_report. If "ALARM" is found:
---
mail output_report to
admin_mail_list (e.g. same list as for "dump failed" condition)
--- [if
login exists in a kill_list, kill spid]
Analysis of CauseNote that
tempdb..sg_object remains for inspection of specific causes, via ordinary simple SQL, logins, objects, and applications can be readily identified. Over time, patterns and trends can be analysed. It is simple enough to connect a spreadsheet application to it and analyse the content. This provides the basis for identification and thus prevention of the cause of large objects in
tempdb; the ability to approach the culprits with evidence; charge-back for resource usage on a user-pays basis; etc.
The table may be filtered and archived into a permanent table in a user database, where longer term trend analysis is relevant.
RelatedAlways set
sp_dboption 'abort tran on log full' for [user]
tempdbs. This eliminates the condition, where many users are hung in
log suspend, and waiting for the DBA to clear it; they get aborted and cleared right away.
Considerations1 User
tempdbs may be monitored as well. For sanity reasons, all reports for all
tempdbs monitored are placed in the main
tempdb.
2 Once the structure is running, the thresholds have been modulated to suit the particular site and
tempdb use, and confidence has been established, the enhancement to
kill the relevant
spid may be implemented.
3 Using resource limits is possible in some but not all circumstances, but the facility itself is limited, and therefore I do not provide recommendations for it.
4 MDA tables could be used. However there are many problems with them, not the least of which is inability to join tables and substantial overhead (as per documentation). In order to use them effectively, one must set up a capture-and-store mechanism. They are thus not worthy of consideration for code that must execute correctly under duress.
Limitations1 This method captures all forms of
tempdb objects except internal worktables (there is currently no method of identifying worktables in Sybase). In any case, worktables are proportionate to temp tables, thus considerate modulation of the thresholds will cover worktables.
(Some might suggest that MDA tables provide information re worktables, but that is false, only the number of tempdb tables and worktables per process is provided.)2 This method does not capture the condition where cartesian products which exceed
tempdb size
within any single iteration of the monitoring script (ie. one minute). Note that cartesian products s are unintended, and controlled by other means; they are not the same as large intermediate result sets that keep growing.
First GenerationIn order to report any segment full, and hopefully do something meaningful if it is
logsegment,
sp_thresholdaction has to exist in any case.
In the first implementation of this method, I modified the threshold proc, to populate tempdb..sg_object, if it were triggered for tempdb default segment at 70%, 80%, 90% and 95% full. That worked well for well-behaved databases and servers, where
tempdb is
not running riot, where the main issue is spikes in
tempdb usage. Those events are usually the result of new code (by new coders, refer Cause [2]), or report tools running without configuration limits. That method stores a snapshot at those specific trigger events only, and is recommended for standard-compliant, quality databases.
For sub-standard databases and anti-social applications, where
tempdb does run riot, something more, in terms of capturing the build-up to tempdb full, is required.
CodeSoftware Gems provides all relevant code for all objects identified here, at a commercial quality level, as a fully integrated part of a
Sybase Server & Database Administration suite. That includes housekeeping tasks suh as trimming and archiving
sg_object. The code required uses ordinary Sybase ASE facilities, it is simple enough to write (ask questions), and it is presented here for those who wish to write their own. The superiority of this method over others, lies in the approach.
Copyright © 2009 Software Gems Pty Ltd