Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Can MS SQL*Server do this?

Status
Not open for further replies.

klamerus

Programmer
Jun 23, 2003
71
US
All,

We are considering migrating from a document management procduct (Documentum) for which we've created an internal web interface to simply putting the documents in a database. Documentum really acts as a middle man. It stores the documents and manages the full-text index, but relies upon a commercial database for the metadata (either Oracle or SQL*Server). We don't use full-text indexing on the documents in question, just metadata, but we also want to store the documents in "image" fields in SQL*Server too.

Another motivator for our doing this is that we're on Windows NT (still) and using Oracle 7.3.4. So, we have to move.

The current DB server standard is SQL*Server 2000 on Windows 2003. Server H/W is new (3 GHz and faster) and we use SANS. The number of concurrent users is almost always < 50 combined for the 3 document management systems we want to migrate.

Our concern is can SQL*Server scale to have a single DB store ~ 800-900 GB of files (~25,000,000 files), which is the size of our largest system. In fact, to be save, we'd want to know we could storage ~ 1 TB comprised of ~ 40,000,000 files in a single database. Each of the other two systems are about 1/4th this size, but we'd also like to put all three databases on a single database server (4 CPU, 3+ GHz, 4 GB RAM or more, etc.).

Does anyone know if what I'm describing is possible and has anyone actually seen it done? I've read the MS literature and it all seems very feasible, but the real world is always a different matter from marketing.

Are there any "gotchas" to consider (such as breaking up the DB files into # GB chunks) to ensure performance or reliability?
 
It is possible, but as this is entering the area of warehouse dimensions I do not have experience on it, so my comments are not probably of great help. I am sure others here will give you further help and point you to some usefull links (please do!!!)


My advise, based on some customers with the same amount of data, also on SANS with Oracle, is to split the database into several database files, and eventually into more than one server instance (even if on the same physical machine).

How you do this will depend on your database design (e.g. relationships between tables) and how your code works with them, and on how frequently each table is accessed.



Regards

Frederico Fonseca
SysSoft Integrated Ltd
 
Thanks,

We had planned to split the DB at about 100 GB / Database file to make recoveries easier if one were to need restore (for whatever reason).

The situation now is that each file is in the 50-500 KB range and so backups take > 1 day (the files are directly on the file system and managed by Documentum).

The additional question would be any experience in using SQL*Server to manage files stored in image fields and any resulting impact on data delivery. In other words, does storing images slow down all other traffic because there's a single thread doing the actual file i/o?
 
By the way, there is no * in SQL Server.

-SQLBill

Posting advice: FAQ481-4875
 
The company I work for has developed a number of medium scale document management systems with external web interfaces and for the situation you describe, while I'm quite sure that SQL Server can do what you want, I think that you're making a mistake to put the files inside the database.

You don't say how frequently the documents change as this will have a huge impact on the backup process. If, as in our applications, only a very small number of documents are added or replaced per week then a simple incremental backup of the files to tape on daily basis will probably suffice.

The SQL Server DB will be much smaller as it will only contain the metadata so it may be possible to back this up daily or better still just replicate it to another server.

Other advantages if you keep the files outside the database are that you can store them as widely distributed as you wish and can hold multiple copies so that if a given file server fails you can easily go to an alternative file server. You can even implement an interface to a CDROM jukebox storage device if that's appropriate or as an ultimate fallback retrieve a file from tape!

Could you provide a bit more background as to the frequency of change and the nature of the internal web application? Also is there any 'quiet' time, say overnight, when backups will have minimal impact on the users?


Bob Boffin
 
SQLBill

Thanks. Somewhere along the way I started putting in the * and I don't recall where. Old habits die hard.

Wo far as the the backups go, the situation is that we currently have 15,000,000+ documents.

The company requirement is that for an enterprise system (such as this one), a full backup is required weekly with daily differentials. At this time, those weekly backups can take > 50 hours. This is with state of the art TSM software and a dedicated backup network. The backups don't take anywhere near this long with fewer larger files (as tested with our fileshares).

The issue is that the time needed to traverse directories, check date time/stamps, and copy and checksum files that need to be copied.

We have also created a number of medium sized systems (with < 500,000 files) as you've described, but the issue here is the sheer volume of files and the length of time to back them up.

While we were looking at what to do, we concluded why not just store the files in another field and leverage the storage management built into the database. Clearly, MS added it (and uses it in SharePoint) for a reason, and so has Oracle.

I've been doing Doc Mgmt for about 18 years and frankly this seems to just be a matter of technical evolution. These specialized systems originally evolved to handle non-magnetic storage (opticals), and hierarchical, and other systems before magnetics could handle it all. Now days you can get 1 TB for your home system.

That said, who knows what all issues there are with doing this. So far, we've spoken with MS (we have MS premier support), and they say it's reasonable - but we like to get second (and third, fourth, etc.) opinions and do testing before doing anything.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top