As The Crow Flies

« Tomcat 5.5.20 bites... | Main | CSS text-transform »
Wednesday Mar 07, 2007

Purging a large set of data from Clarify

One of my customers actually had 2 reasons to purge information from Clarify.

The first instance was when they sold off a product line and as a result they actually needed to purge all customers, cases, cr's, etc that was solely linked to the divested product.  We actaully handled this with the baseline purge/archive features of dataex.  We only had to write a few scripts to set the arch_ind flag and modify the *.arc files for some exentions we had added to the schema.  Overall it went well.  We had a dataset of about 50000 cases + cr's and we managed to purge about 2500/night.  We also were able to keep the system online while this was running.  So in a short while we had it done.

In the second instance, we had much, Much, MUCH more data to deal with.  i'm talking about ~17 million sites + ~17 million contacts + ~10 million opportunites + all the related activity logs, notes, etc.  No way was this going to be handled using dataex purge/archive.  We looked at some of the commercial solutions, but all of them were priced around $50k.  This is was a one off deal, so we figured we could get it done more cheaply using consulting $'s.  Our first design was to write a series of pl/sql scripts to mark the records in the objects to be purged.  Then either run a series of parallel deletes on the individual objects or run a delete for each business object (and it's related objects) as a transaction.  After some discussion we opted for the transaction delete approach.  After some testing, we concluded (my memory is a bit fuzzy here) we could probably get 100k records deleted per hour and it would require the system to be offline during the purge.  That left us with a couple of all weekend outage windows.  Unacceptable.  We then hundled around the room again and the Oracle DBA's came up with a solution (see sometimes they actually do earn their keep!).  We would flag the records as originally designed, but then they would partition the tables based on our flags.  Once the tables had been partitioned, they would then drop the partitions.  This worked flawlessly.  We were able to keep the application online and do all the work in the background with minimal impact to the users.  As a bonus, even though this need was a one-time purge, we can use these scripts in the future for further purges, or in some cases, we have used them to migrate data to other applications.


Comments:

Post a Comment:
  • HTML Syntax: Allowed