Geoff,
Thanks for the reply. I realise it could take a bunch of coding to properly parse a formula, which is why I was asking in the original post if any of the clever chaps here knew of any way to access Excel's internal representation of the formula structure. However, the problem is not quite as hard as having to do a full parse of all possible excel formulae - see my reply to Skip to see why.
Skip.
I am considering creating a simple model development tool in Excel. It is partly a "business case" and partly an intellectual exercise.
I recently put together a feasibility / tradeoff model for a particular optical sensor. It started off as a simple list of inputs and calculated values, but the logic was quite complex and a little hard to understand - especially when discussing the results with colleagues. So I tried assigning the values to text boxes which could be connected via connector arrows and moved to make the logical flow and internal dependencies more obvious.
This actually proved very useful, but it was a tedious p.i.t.a. to do the assigning of the data to the textboxes. It also occurred to me that if the process was automated it could do more than simply display the logic flow and would also be a very quick way of building a model in the first place.
Therefore, I have been thinking about putting together such a tool, mostly in my own time, partly because I think it would actually be quite useful at work, but also as an interesting exercise.
The idea is based around a set of interfaces as follows.
1 Variable Parameter Input
This would enable the user to specify a set of parameter values associated with each variable, e.g.
Name: Integrated_Photons
Description: Number of photons captured per frame by each pixel
Units: Photons
Default Value: 1000000
Use Default: no
Formula: Source_photance*pixel_footprint*sensor_subtense*staretime
The purpose of the default values would be to allow the user to run the model while it was being developed (e.g. before the derivation of a given parameter had been defined) or to determine desired values to achieve specific downstream results.
2 Variables Listing
This would be a list of all the variables currently in the model – including not just the ones for which the definition had already been given as above, but also those whose existence is implied by their inclusion in the formulae. For example, the above variable implies the existence of the additional variables “Source_photance”, pixel_footprint”, sensor_subtense”, and staretime”. Clicking on a defined variable would activate the input form for that variable, allowing it to be edited. Clicking on a currently undefined variable (which would be displayed in a different colour) would activate a new input form to prompt the user to define it.
3 Tabular Data
This would show all of the parameters of all the currently recognised variables. Essentially this is the actual Excel model.
4 Graphical Display
This would show a text box per variable, laid out in a logical flow given by the dependence hierarchy, with connecting arrows showing dependence. The user would be able to arrange these by hand or to select “auto-placement”. Types of variable would be colour coded – e.g. final value, calculated intermediate value, user-input, physical constant. Any clash of units would also be highlighted here (and probably in the Tabular Data section too). For example, if the variable “velocity” is given by the formula “=distance/time” and “distance” is in units of m and “time” is in units of s, then the units of “velocity” should be m/s.
I realise that I might be better off using Labview or Simulink for something like this, but I was developing the original model at home over Christmas, where I don’t have access to either of those, and I do frequently work at home, so being able to do this kind of thing in Excel is quite handy. Also, both of the above are pretty much terra incognita as far as I’m concerned. In addition, I thought it might be a fun exercise (which is why I’ll mainly be doing this in my own time – except for stuff like this message!).
I hope the above answers your question.
Anyway, I’m sorry if that was a bit long winded, but from the above (very superficial) description, it can be seen why I want to be able to parse the formulae:
a) To determine the existence of implied variables
b) To determine variable dependence
c) To check the consistency of units
However, given the way in which this will be used, I can’t see any reason why I would need to explicitly specify any actual cell addresses – all variables would be referenced by name alone. Consequently, I at least don’t need the parser to handle addresses, which might make the job a little easier. In fact, as I write this, I think I might have the answer. All functions must contain brackets, so any text which falls between an operator and an open bracket “(“ must be a function and can thus be ignored. Hmm, I’ll have to think about that.
Tony