Sybase Performance & Relational Database: Data Distribution

Welcome!

It appears that it is time I created my own blog, and served you in a couple of areas, which happen to be my favourite subjects, my life-long pursuits. If you are used to my attendance at other sites, then you can expect at least the same service levels here.

Article I plan on two types of articles: your worthy questions, which will be the same as a thread on any newsgroup, with accepted comments and responses; and full technical articles which may be of value to you, submitted by me or others. Look for the New Question or Feedback post below. Over time, I expect you to provide you with an understanding of Performance and Standards.

Protocol This is my first attempt at this, so please be patient, the format has not settled in yet. Rather than the free-for-all available in public sites (or private, censored, vendor-hosted sites labelled as "public") , what I would like to do ala ExpertsExchange, is to have many viewers but fewer invited writers. Village idiots, ADHD and OCD sufferers are welcome to visit, but should note that they enjoy no special status: spraying graffiti on the site will result in the same consequences for all people. At this stage, to Post Comments or New Questions, you need to a Google Account. I can be reached at first.lastname@gmail.com.

Relevance The web is flooded with rubbish. Let's try to keep this site brief and meaningful. As always, we will place principles above personalities. Everybody has one, there is nothing we can do about it, I'll ignore yours, and you ignore mine. This will let us get on with the job; the technical facts; the reason you visited. Be professional, but express yourself freely, the process moves faster.

The web is full of misinformation. I have a particular interest in helping people to operate at their full capacity, which means providing the relevant information so that they can destroy the myths and fallacies that hinder them. Unfortunately the proponents of these myths are often vendors, due to their commercial interests, so that may place me in conflict with them. Standards and technical facts are above, and independent of, vendors; as an ardent proponent of same, I remain vendor-independent.

Enjoy, and let me know if you don't.

Copyright © 2009 Software Gems Pty Ltd

2009-03-15

Data Distribution - Mario Ascione

Distribution of data over different disks for better disk utilization

Without posting the entire thread, which has a somewhat different context, I have picked it up at the point where the unanswered part starts. Mario Ascione posted:

The table has a unique clustered index on the primary key "test_id" [identity] with lock datarows and some non-clustered index.

That is not a relational key, it is a physical, non-relational key which will cause a problem everywhere it is used. Let's look at one instance ...
[bxA]

Filling the table happens what [person] said (first reply):
"Note: Once you place a clustered index on your roundrobin partitioned table, future inserts will be placed sequentially (not randomly) on the various segments on which the partitions reside; in other words, the partitions will become unevenly populated with new inserts until the clustered index is re-created (which will equally redistribute the data, in index order, over the partitions/segments)."

I do not agree with the person's wording, but in the main ... with a view to continuing this post without going back and forth ... ok.

So only the first partition is populated while the others remain empty.

Well, that is what you have told ASE to do. The "clustered" index tells ASE to PLACE the rows in that physical order [it doesn't really but that is a different point, outside the scope of your problem]. The partitioning tells ASE to use RR, to place the rows in the heap (not the "clustered" index) in the four partitions in RR sequence. The Identity "key" is always going to be next-sequential: that means the "clustered" index remains a B-tree with true next-sequential leaf ordering (the space usage is probably not visible to use); but the data rows are RR across 4 partitions. BUT you ran your test from one session, one spid: that does not provide RR; the RR is for each sequential insert from DIFFERENT spids; all inserts from one spid go into one partition.

Basically you have a few contradictions, and ASE it doing "what you told it to do".

There is another issue, it is hard to learn by yourself, making mistakes and correcting them one by one. It is easier to learn from examining good implementations.

I have to drop and recreate the clustered index in order to distribute data evently over the four partitions.

Yes. In this type of situation only. In other situations, definitely not.

AND this is invoking the characteristics of the "clustered" index. But right afterward, as you insert new rows, it goes back to uneven distribution.

Some questions:
How can I avoid drop/create clustered index?

You can't in this case. If the test was more real (multiple inserting spids) the data would be distributed ... but that conflicts with the CI definition you have given it, so "suck it and see".

I've read that It's wrong use CI on a datarows table.

It is not "wrong" to use a CI on DPL/DRL. First, the CI is not a CI ala APL, so do not expect that operation. Second it is PLACEment only; it remains a separate detached B-Tree, but on the same segment as the data heap. Third an Identity or next-sequential number or datetime is the wrong key for a CI. Basically, 90% of the time, people choose the wrong key for the CI on DPL/DRL, and they have incorrect ideas re placement/distribution. Therefore I suggest not having a CI on DPL/DRL, just have NCIs, you lose nothing except the confusion.

Should I use a non-clustered index instead of clustered one?

In this case, that would be a simple way of overcoming the confusion and getting what you intend. Drop the CI; create an NCI on test_id; run your inserts from four spids, and you will get even distribution of data across the partitions.

Thanks Derek for the clarification.....

just one doubt:
what do you mean with "four spids"?
number of worker process? I suppose not (the worker process is an ASE internal thread; I've configured 12 wp).
multiple threads ("session") performing insert operation? yes means that partition table is good "only" in multithread environment.
Others?

Correct. Worker processes are internal to ASE and support parallelism, I wouldn't express it as "only good for multi-threaded environments", the server is beautifully multi-threaded; parallelism needs a multi-processor environment. Each spid/connection will have many WPs.

By "four spids" I mean four sessions, four separate invocations of isql, which will simulate four separate user/client connections in a live system. The inserts of each separate spid/connection will "Round Robin" across the heap partitions in a DPL/DRL table.

1 comment:

quelli della 5F said...: Hi all,

just for close the issue; I've done a test:
dropping the clustered index on a partitioned table and leaving the table only with non-clustered index, data insertion is distributed evenly across partition.

Thanks
Mario; 23 March, 2009 20:43

Sybase Performance & Relational Database

Welcome!

About Me

Labels

About Me - Personal

Blog Archive

The Irredeemable Geek

Geek Code

Alternate Location

For the Last 13 Years

2009-03-15

Data Distribution - Mario Ascione

1 comment: