Options for Distributed/Grid Databases

(OP)
I'm researching desktop grids and wondering what options there are for running a database across many nodes. The sorts of applications you normally read about (e.g. SETI) don't follow a typical corporate transactional model.

I've used Oracle 10g in the past, which I think is a good candidate, but it's expensive of course. MySQL seems to run its cluster in memory, which might be limiting. However, maybe replication achieves a similar effect. Do you know of other packages that could be spread across a number of nodes?

RE: Options for Distributed/Grid Databases

Don't forget MSDE... It's the 'portable' version of MS SQL Server. Oh... and it's free as long as you stay within a 2GB database size (if I am remembering correctly).

However, SETI and Rosetta read flat files containing the segment data, process it and write the results to another file. Once finished, the client connects to the central server and uploads the results file.
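
As a rough illustration of that flat-file pattern (not how SETI or Rosetta actually implement it), a client could look something like the minimal Python sketch below. The file names, the one-number-per-line format and the `process_work_unit` function are all made up for the example, and the upload to the central server is left out.

```python
import os
import tempfile

def process_work_unit(in_path, out_path):
    """Read one sample per line, crunch locally, write results to another flat file."""
    with open(in_path) as src:
        samples = [float(line) for line in src if line.strip()]
    # Stand-in for the real number crunching: summarise the segment.
    result = {"count": len(samples), "total": sum(samples), "peak": max(samples)}
    with open(out_path, "w") as dst:
        for key, value in result.items():
            dst.write(f"{key}={value}\n")
    return result

def demo():
    """Create a tiny hypothetical work unit, process it, return the result and its path."""
    workdir = tempfile.mkdtemp()
    in_path = os.path.join(workdir, "segment.txt")
    out_path = os.path.join(workdir, "result.txt")
    with open(in_path, "w") as f:
        f.write("1.5\n2.5\n4.0\n")
    return process_work_unit(in_path, out_path), out_path
```

The point is that no database is involved on the client at all: each node works on its own private files and only talks to the server at the start and end of a work unit.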

--------------------------------------------------
"...and did we give up when the Germans bombed Pearl Harbor? NO!"

"Don't stop him. He's roll'n."
--------------------------------------------------

RE: Options for Distributed/Grid Databases

MSDE is being phased out and replaced by the lighter, free editions of SQL Server 2005 and 2008: Express and Compact.

MS SQL 2008 Versions

RE: Options for Distributed/Grid Databases

(OP)
For a grid application you need a database which is spread across several nodes, maybe anything from 4 to 32. It needs, however, to behave as if it were on one server. I don't know much about SQL Server. Does it work in a grid manner?

RE: Options for Distributed/Grid Databases

It depends very much on what you want and need to do. Just spreading the load across many desktop stations will only work if each one has the full data, and then the problem is keeping them in sync.

If each station only has partial data and you send a query into "the cloud", you will need some central server that knows where to find which data, or you need a p2p protocol that floods the net with the query, which can make it slow again.
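
To make those two lookup styles concrete, here is a small in-memory Python sketch. Everything in it is hypothetical (the `Node`, `Directory` and `flood_get` names, and the placement rule); it only models the message counts, not a real network.

```python
class Node:
    """A desktop station holding part of the data."""
    def __init__(self, name):
        self.name = name
        self.rows = {}

class Directory:
    """Central server that only knows which node holds which key."""
    def __init__(self, nodes):
        self.nodes = nodes
        self.location = {}

    def put(self, key, value):
        # Placement rule is an assumption: pick a node from the key's character codes.
        node = self.nodes[sum(map(ord, key)) % len(self.nodes)]
        node.rows[key] = value
        self.location[key] = node

    def get(self, key):
        # One directory lookup, then one request to the node that has the row.
        return self.location[key].rows[key]

def flood_get(nodes, key):
    """Peer-to-peer alternative: every node is asked. Returns (value, nodes_asked)."""
    hits = [node.rows[key] for node in nodes if key in node.rows]
    return (hits[0] if hits else None), len(nodes)
```

With the directory, a read costs two hops no matter how many stations there are, but the directory is a single point of failure. With flooding, the cost grows with the number of stations, which is the slowdown mentioned above.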

If you distribute data to be crunched locally and then commit a final result (like SETI), any file-based database might be a choice, besides the small SQL Server editions.

Bye, Olaf.

RE: Options for Distributed/Grid Databases

(OP)
You're right Olaf.

I'm thinking you need either some sort of replication arrangement, where each node constantly propagates its updates (using some kind of optimistic locking scheme), or an arrangement where every node calls a single database. If that database has to run on expensive dedicated servers then you're not getting the full benefit of a workstation grid. Oracle can appear as a single database but run on many low-cost servers. Whether it would work on workstations connected only by an office network I'm not sure. It may not be able to deal with the extra latency, at least in its current form.
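
For anyone unfamiliar with optimistic locking, this is roughly the idea, as a minimal Python sketch. The `VersionedStore` class and its version-number scheme are assumptions for illustration, not any particular product's replication protocol: a node remembers the version it read, and its update is rejected if another node's update got there first.

```python
class StaleUpdate(Exception):
    """Raised when a write is based on a version that is no longer current."""

class VersionedStore:
    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        # Unknown keys are reported as version 0 with no value.
        return self._rows.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            # Someone else updated the row since this node read it.
            raise StaleUpdate(f"{key}: expected v{expected_version}, found v{current_version}")
        self._rows[key] = (current_version + 1, value)
        return current_version + 1
```

No lock is held while the node works, which suits a high-latency office network, but a rejected node has to re-read and retry, so it only pays off when conflicting updates are rare.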

The management processes for distributing work and checking that it is being serviced already exist, but so far I've not spotted a solution for the conventional transactional database element.

RE: Options for Distributed/Grid Databases

Quote:

The management processes for distributing work and checking that it is being serviced already exist

So I assume it's more like SETI or distributed calculations of weather, raytracing, or something comparable to that?

Is it designed in such a way that clients "check out" data for their processing and then "check in" the data again after they are finished?

Or do other clients need access to the same data too, i.e. is access non-exclusive?

Bye, Olaf.

RE: Options for Distributed/Grid Databases

(OP)
Oracle works by dividing up all the blocks equally amongst all the nodes: not the data itself, but lock control over it. Incoming requests are directed to a specific node depending on the load messages the nodes send to each other. That node 'knows' which other node owns the block(s) it wants, and asks for control. It then behaves as normal and returns control to the owner node when it has finished its transaction. In the meantime the data is locked as in a single-CPU situation, i.e. any other node asking for access to the blocks will be denied by the owning node.
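
A toy model of the scheme as I've described it, in Python. This is only my reading of the behaviour squeezed into a single process, not Oracle's actual implementation; the `LockDirectory` class and the block-to-node assignment rule are assumptions.

```python
class LockDirectory:
    """Models which node owns lock control for each block and who holds it now."""
    def __init__(self, node_names, block_count):
        # Assumed rule: spread lock ownership evenly, block b belongs to node b mod N.
        self.owner = {b: node_names[b % len(node_names)] for b in range(block_count)}
        self.holder = {}  # block -> node currently holding control

    def request(self, node, block):
        """Ask the block's owner for control; denied while another node holds it."""
        if self.holder.get(block) not in (None, node):
            return False
        self.holder[block] = node
        return True

    def release(self, node, block):
        """Hand control back to the owner once the transaction is finished."""
        if self.holder.get(block) == node:
            del self.holder[block]
```

Every `request` and `release` here would be a network round trip between peers in a real cluster, which is why latency between the nodes matters so much.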

The data would be typically on a SAN but I guess you could virtualise that as well.

For Oracle Grid there is no 'master' node. It's a peer arrangement. I think this would work if:
1) the heartbeats could be configured to tolerate sufficient latency (it expects a dedicated LAN between the servers)
2) you don't get killed on per-CPU pricing.
