×
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Contact US

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Students Click Here

General DB question - looking for best option

General DB question - looking for best option

General DB question - looking for best option

(OP)
I need some suggestions/advice.
I've been asked to write an application for a new venture on which a friend is about to embark.

We're looking at no more than a 1 year turn around.

In the Microsoft arena, I've done C++ (6.0 and some .net) for the last 8 years, VB and VBA programming...off and on...for about 7 years, as well as experience with many other languages.

This application would need to I/F with a database of hundreds of thousands of clients. We'd need the ability to archive client data when expired (without removing it) as well as generate reports & statistical charts. I've never written an application like this before (using a databse) and trying to determine the best language and database to use...one that places the least amount of restrictions on the customers.

So I guess the question is...what are the options...what could provide the easiest turn around...work best with C++ or VB...how can I find out my options??  I'm a little overwhelmed not understanding my choices.

Thanks in advance.

RE: General DB question - looking for best option

The database platform can be just about anything.  You can use MySQL, Microsoft SQL Server, Oracle, etc.  SQL Server and Oracle both have free versions, but they have limits.  To get larger versions of the databases you'll need the paid for version of the databases.

Personally I'm a SQL Server guy myself.  It can easily handle the load you are talking about.  As for the client application C#.NET or VB.NET will probably be the easiest.  C++ will also work, but will require more work.

The other nice thing about using Microsoft SQL Server as the database is that you don't have to install or configure any database components on the client workstations as Windows comes with all the SQL Server drives preinstalled.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

My Blog

RE: General DB question - looking for best option

As an alternative if you're familiar with classic VB there is no particular reason to change. It will happily interface via ADO onto any of the standard databases. For modest database needs you can use MSSQL free edition. For heavy duty use I would go with MySQL. It's open source, heavy duty and free. You only need to buy the support you need.

___________________________________________________________
If you want the best response to a question, please check out FAQ222-2244: How to get the best answers first.
'If we're supposed to work in Hex, why have we only got A fingers?'
Drive a Steam Roller
Steam Engine Prints

RE: General DB question - looking for best option

(OP)
Thank you so much! That helps tons and gives me the direction I need.

RE: General DB question - looking for best option

Hundred of thousands of clients.. Database performance will be a significant issue. Don't do any designing until you read up on that for the database you have chosen. The database design is a key element in how well it will perform and the hardest thing to change after the go live date when you find out performance isn't good enough.

Database programming and design is a specialty in it's own self. This sounds like a large project. If you can hire another developer to help with this end of things (since you have little database experience from what you said above), it will help the project move faster. If not then one thing you need to start doing is learning to think in terms of sets not in terms of manipulating one record ata time. Also int erms of general database understanding and knowledge herw are some helpful articles. Some are SQL Server specific, but reading thenm will give you an idea of what kinds of things you need to think about in doing the designing and programming

http://www.deeptraining.com/litwin/dbdesign/FundamentalsOfRelationalDatabaseDesign.aspx.
http://www.tek-tips.com/faqs.cfm?fid=4785
http://www.sommarskog.se/dynamic_sql.html

You will need to learn about indexing as well (sorry dont; havea good article handy onthat one) and a good perfromance Tuning book for your selected database backend is a must as well as a good book on the flavor of SQL it uses.
  

"NOTHING is more important in a database than integrity." ESquared
 

RE: General DB question - looking for best option

Quote (SQLSister):

Hundred of thousands of clients..

Just goes to show that the original post was ambiguous.  SQLSister, I don't think the OP meant that there would be 100,000 client users, rather that this would be a database of 100,000 records of client data.

OK, mom, which is it?

RE: General DB question - looking for best option

That's a very good point.  The answer to that question will determine a lot about how much horse power your database need will need to be able to handle.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

My Blog

RE: General DB question - looking for best option

(OP)

Yes...hundreds of thousands of clients meant

The business will have hundreds of thousands of records of client data.

I'll take a look at those articles.
I feel like such a newbie! All advice is greatly appreciated!

RE: General DB question - looking for best option

hundreds of thousands of client records actually counts as a small scale database.

Just about any RDBMS can handle that sort of numbers with ease.

As others have said before, familiarise yourself with general database principals first. Database platforms will most likely be decided by cost and ease of deployment.

--------------------
Procrastinate Now!

RE: General DB question - looking for best option

For something of this scale, have a look at MS Access. Nice to have built-in frontend forms, reports, and queries all in one place.

"Don't be irreplaceable. If you can't be replaced, you can't be promoted."
 

RE: General DB question - looking for best option

Quote:


We'd need the ability to archive client data when expired (without removing it) as well as generate reports & statistical charts.

If reporting is to be the primary role of the data base then a star schema may be a better option.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
                           - Daffy Duck

RE: General DB question - looking for best option

(OP)

Those articles bought back what I learned in college but haven't touched since.

And really...hundreds of thousands of data records is considered a small database?

Thank you all.  

RE: General DB question - looking for best option

yes, hundreds of thousands of records is a small database.

a better measure would be the disk size of a database, since that takes into account structural complexity, i.e. lots of tables.

anything under a gigabyte I'd consider to be a small database...

--------------------
Procrastinate Now!

RE: General DB question - looking for best option

I consider anything under 500GB as small.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
                           - Daffy Duck

RE: General DB question - looking for best option

Database size isn't the only measure of how large a database is.  Transactions per second also need to be included.  A 10 Gig database with 50k tps would probably be a large database.

A 500 Gig database with 50 tps probably isn't such a large database.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

My Blog

RE: General DB question - looking for best option

I don't know the size but we have tables that are storing case information going back 25 years with tables that contain millions of records...defendants, charges, hearings, sentences.

Leslie

Have you met Hardy Heron?

RE: General DB question - looking for best option

(OP)
I'm sorry...

"A 10 Gig database with 50k tps would probably be a large database."


"a better measure would be the disk size of a database, since that takes into account structural complexity, i.e. lots of tables."

Could someone expand or point me to articles directly related to this? How would you measure tps? Obviously this needs to be a consideration during development - what sort of analysis would you do to help you determine this?


"I don't know the size but we have tables that are storing case information going back 25 years with tables that contain millions of records...defendants, charges, hearings, sentences."

What kind of database are you using?

To All...sincerest appreciation for helping me come up to speed on this.
 

RE: General DB question - looking for best option

The transactions per second is a perfmon counter that you can look at.  It's under MSSQLServer:Databases > Transactions / sec.

I'm not aware of any specific articles which document any guidelines as there really aren't any.  There are a lot of variables that need to be taken into account when deciding if it's a large database.  There are no hard and fast rules.

Some of the variables will be:
Space used
Data acquisition rate
data deletion rate
Transactions per second
Uptime requirements

What makes this harder to figure out is one of the other variables is the skill of the DBA managing the database.  For some DBAs a large database would be something that is 50 Gigs with 100 Tps that has a 9-5 uptime requrement, for others 1TB with 30k tps with a 24x7 uptime is a large database.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

My Blog

RE: General DB question - looking for best option

All that being said, I'm in the middle of a couple of performance tuning articles, and this will probably be something which I cover within the articles.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

My Blog

RE: General DB question - looking for best option

TPS is a valid measure of an OLTP system but if it is an Data Warehouse/OLAP system then table size is a better measure. Dimensions with 100s of thousands and fact tables with 100s of millions to billions to me is a VLDB, but they may have a low TPS.

The original post states

Quote (WorkingMom3):


This application would need to I/F with a database of hundreds of thousands of clients. We'd need the ability to archive client data when expired (without removing it) as well as generate reports & statistical charts.
In my world this is a Data Warehouse based in a star schema, as the need is to archive records and report against them.  TPS for processing the data in should be less of an issue than the query time to satisfy reporting needs.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
                           - Daffy Duck

RE: General DB question - looking for best option

This is very true, TPS is basically worthless when dealing with OLAP.

Can the op tell us if it's OLAP or OLTP?

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

My Blog

RE: General DB question - looking for best option

Quote:

"I don't know the size but we have tables that are storing case information going back 25 years with tables that contain millions of records...defendants, charges, hearings, sentences."

What kind of database are you using?

It's an IBM iSeries (previously the AS400) that uses a DB2 database.   

Leslie

Have you met Hardy Heron?

RE: General DB question - looking for best option

(OP)
You guys are great.

The data is what I would call critical data, however, for this initial release, I'm finding the data is rather simple, if that makes sense. I'm finding that most of my data fits this pattern:

tblClientTable
  clientId
  OtherClientFields

tblClientReportLink
  clientId
  reportId

tblClientReports
  reportId
  otherReportData
  Other Fields

with clientId being the "centric" ID, if you will. (Yeah...no...I'm not yet there on DB lingo. I have to look up practically everything you guys say...but I'll get there)

Pulled the following off a google search on OLTP vs OLAP. My "answers" are in bold and annotated with ***

Purpose
OLTP: To support business processes.
OLAP: To assist with making better business decisions.
** both intentionally selected

Where the data comes from
OLTP: Online entry, point of sale devices, data loggers etc.
OLAP: OLTP databases, flat files.

Criticality
OLTP: Generally considered missioncritical. Regularly back up to protect from data loss.
OLAP: Important to decision support; however; organizations can continue to operate without it. Back up less often than OLTP databases.

Database Size
This was the original question. Unsure of how to answer this as of yet
OLTP: Relatively small database size if the history information is regularly archived. Can quickly bloat if history is not removed.
OLAP: OLAP database can be very large. Contains historical information generated by various OLTP databases within an organization.

Concurrent Users
OLTP: Potentially very high numbers of users. Some OLTP databases support thousands of concurrent users.
OLAP: OLAP databases support a much lower number of users than their OLTP counterparts. ** expect anywhere from 10 to 50 concurrent users

Response Time
OLTP: All application queries should complete in less than a second.
OLAP: Queries commonly take seconds, minutes, and hours.

Data Changes
OLTP: High number of insert, update, and delete operations are generate by user transactions.   ** no deletes – inserts and updates only – corrections handled by annotated update
OLAP: Very low number of data changes generated by users. Data is updated by bulk import batches.

Ad hoc querying
Unsure of how to answer this. Queries will be "predefined" but tools will be created to allow creation of limited "user defined" queries.
OLTP: Poor ad hoc query performance. Indexes are only created to support defined application queries.
OLAP: Very good ad hoc query performance. Indexes are created to support queries possible queries.

Querying complexity
OLTP: Data is highly normalized requiring many joins to retrieve information.
OLAP: Data is denormalized requiring few joins to retrieve information.

If I'm correct in my understanding of normalized vs denormalized data, the tables are completely normalized (which I do understand could possibly have issues but none have arisen yet)

It would appear that OLTP would be the answer to the question.

I guess I need a better understanding of TPS.

 

RE: General DB question - looking for best option

Actually, you need both OLTP and OLAP.  If your application supports business processes, this suggests that transactions are involved (orders, workflows, whatever).  Therefore, the business process components should be supported by an OLTP database.

The reporting component should be underpinned by an OLAP database.  This database will of course be fed by the OLTP.

One thing to remember when you design a system that supports transactional business processes:  NEVER run reports directly against the transactional database.  Have a process in place that extracts the OLTP data, transforms it to the proper cubed format, and loads it into your OLAP reporting database.  This ensures that transactional performance will remain stable and that you have the right schema in place for a flexible reporting solution without rejiggering the OLTP schema.

My US$.02

Phil H.
Some Bank
-----------
Time's fun when you're having flies.

RE: General DB question - looking for best option

(OP)
Phil, thanks for your 2 cents. Any chance you could expand just a little. My impression of OLTP vs OLAP was that they are database design paradigms. Maybe you could recommend a good book.

And, back somewhat to the original question...does it seem like I can use Access?

RE: General DB question - looking for best option

I am an Access fan, unlike many of the people above :), yet I would have to say that I do not think that this project is suitable for Access unless the database will only be used by one or two people for statistics, reporting and related items.

RE: General DB question - looking for best option

Based on the information provided, I'm thinking that Access won't be the correct solution for you.  I'd probably recommend SQL Server Express Edition for this project.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

My Blog

RE: General DB question - looking for best option

The original posts leads me to believe that the transactional systems already exist so a OLTP system probably isn't needed.  As for OLAP versus OLTP they are design methodologies for specific types of systems.

The briefest comparisons are:
OLTP: Designed to quickly insert data.
OLAP: Designed to quickly retrieve data.

Granted that is like the most basic explination but the fact that the 2 are support the opposite of each other, which means that most everything in the design will be different.

I believe there are better things to worry about at this point than the platform.  Some points to consider first are.

Primary purpose?  To me it sounds like it is data archival and reporting.

User Scalability? How many users will you need to support for the reporting piece.  This can have an impact as in a 3NF system the overhead of mutiple users can seriously impat performance. Where a Star Schema has much less overhead.

Data Scalability: How much data will you begin with, how much data do anticipate in 6 months? 1 year what is the anticipated growth in volume and how many years of history will you retain and load initially.

User Ability: how tech savy are yuor users, do they require simple reports that are delivered at the click of a button, or do they require parameterized reports or maybe ad-hoc capabilities.  This is important because this is where the users will spend their time and why reinvent the wheel when there are dozens of applications already built to do this, but some applications target specific backends.

Good luck!

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
                           - Daffy Duck

RE: General DB question - looking for best option

(OP)
I'm trying but it really seems as if all this is going around in circles.

Would anyone be able to give me example answers to Paul's questions?  Why are there better things to worry about than the platform? Isn't that the reason for asking the questions he's asking...to determine the platform?  

Primary Purpose: His statement is correct.
User Scalability: Between 20 and 40 users at one time
Data Scalability: While still trying to grasp that myself because the application is new to me, my strong feeling based on what I've completed so far is that data will not grow at a huge rate. That is, input during a single day, month and year is somewhat constrained by the nature of the business. After a certain period of time, some customer accounts will be archived (just put "in a box on a shelf" just in case with no real active intent of accessing them).
User Ability No, I do not expect users to be tech savy. I will need to automate report generation for them. In addition to the push-a-button-reports, I will need to create a mechanism for them to "create" their own reports (that is, modify a push-a-button-report based on dialog box selections)

And though I said it before...thank you so much for all your time!





 

RE: General DB question - looking for best option

My reasoning for saying other things are more important than platform is that the design to some degree should be platform independent.  I don't design a DW system for SQL Server I design the DW system according to the business needs.  If it is to be a fairly small system with minimal users then a smaller DB engine could probably handle it. IF it is a large system with many users then other requirements come into play.  30-40 concurrent users on even a medium size database doing statistical/analytic reporting will have an impact on what technologies I use.  The technical abilities of my audience will also impact my decisions.  Now wouldn't you look good if you went to them and said we can do this in MySQL to only later realize that intial scope was really lacking?  Then you realize that users need response times measures in 0-3 seconds? that dashboarding tools need to be implemented?  Tools in your chosen technologies don't easily lend themselves to report creation by the non technical savy?

Archiving and Reporting is what a Data Warehouse and Data Marts are designed for.

Reporting well you have a ton of tools designed for presentation.  Cognos, ProClarity, Sql Server Reporting Services, Performance Point, AlphaBlox, Business Objects and Micro Strategies are just some of the bigger names. Each with its pros and cons and some specific to a db platform.

Scalability becomes an issue more on response times to user requests and the amount of data. Most users want a reporting system to spit the reports out with little to no lag which becomes an even greater challenge when you increase concurrent user count.  A change in data volume may increase your ETL times if your data increases 10% should it take 10% more time to load and can you afford that time.  Again these answers may impact your platform decision.

 

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
                           - Daffy Duck

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members! Already a Member? Login

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close