
Chapter 7  Technical Notes

7.1  Architecture

7.1.1  File organization

Gnucap source files are organized into groups by the name prefix as follows:
ap
``Argparse''. Generic parser and lexical analysis library.
bm
Behavioral modeling.
c
Commands.
d
Devices and models.
e
Device and model base classes. (``e'' comes from ``electrical'' and is retained because of inertia.)
io
Input and output library, raw, generic.
l
Library. General purpose functions and classes that do not fit elsewhere.
m
Math library.
plot
Obsolete plotting that should be replaced.
s
Simulation engine.
u
Utility functions and classes. Gnucap specific.
The files ap_, io_, l_, m_ are not Gnucap specific. Although they were created for Gnucap, they are public domain and may be used by anyone for any purpose.

The remaining files bm_, c_, d_, e_, s_, u_ are Gnucap specific, and reuse is subject to the GNU General Public License (GPL).

Some of the d_ files are automatically generated during compilation. Do not change them, because your changes may be lost in a recompile. For licensing and distribution purposes, these files are considered to be ``object'' code, even though they are readable C++.

The files d_.model, where present, contain the actual model descriptions as input for modelgen, the model compiler. These files are the source that is used to generate the corresponding .cc and .h files. All changes should be done to the .model file. For GPL purposes, these files are considered to be ``source''.

7.1.2  Building, Makefiles

Gnucap uses a four-part Makefile, designed for simultaneous builds on several systems. A true Makefile is built by selecting and catenating the four pieces. A master Makefile switches to a subdirectory and builds a specialized Makefile there.
Make1
The file list. Specific to this program.
Make2
Compiler and system dependencies. Specific to the compiler. In some cases, hardware dependencies are here. There are several provided. Choose the one that matches your system.
Make3
Basic ``make'' targets. Generic.
Make.depend
List of dependencies.

7.1.3  Program flow

It all starts at ``main'', in main.cc. The function ``main'' has a loop that gets input and calls ``CMD::cmdproc'' to dispatch the command.

Batch mode is handled in ``process_cmd_line'', which forms ``get'' or ``<'' commands and passes them to ``CMD::cmdproc'' as text.

The function ``CMD::cmdproc'' dispatches the command to its handler. The handlers are located in the ``CMD'' namespace, in the ``c_'' files.
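As a rough sketch of that flow (the input loop body and the dispatch stub here are hypothetical stand-ins; only ``main'', ``process_cmd_line'', and ``CMD::cmdproc'' are named in the text):

  #include <iostream>
  #include <string>

  namespace CMD {
    void cmdproc(const std::string& cmd) {  // dispatches to a handler in a c_ file
      std::cout << "dispatch: " << cmd << '\n';
    }
  }

  static void process_cmd_line(int argc, char* argv[]) {
    // batch mode: hand "get" / "<" commands to CMD::cmdproc as text
    for (int i = 1; i < argc; ++i) {
      CMD::cmdproc(std::string("get ") + argv[i]);
    }
  }

  int main(int argc, char* argv[]) {
    process_cmd_line(argc, argv);
    std::string line;
    while (std::getline(std::cin, line) && line != "quit") {
      CMD::cmdproc(line);                   // the input loop in main.cc
    }
    return 0;
  }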

7.2  Transient analysis

7.2.1  The ``CPOLY'' and ``FPOLY'' classes

Before beginning a discussion of the evaluation and stamp methods, it is necessary to understand the ``CPOLY'' and ``FPOLY'' classes.

These classes represent polynomials. At present, only the first order versions are used, but consider that they could be extended to any order.

When evaluating a function f(x), there are several possible representations for the result. The ``CPOLY'' and ``FPOLY'' represent two of them.

The ``CPOLY'' classes represent the result in a traditional polynomial form. Consider a series of terms, c0, c1, c2, ... These represent the coefficients of a Taylor series of the function expanded about 0 (a Maclaurin series). Thus f(x) = c0 + c1 x + c2 x^2 + c3 x^3 + ... In most cases, only the c0 and c1 terms are used, hence the ``CPOLY1'' class. The series is truncated, so it is exact only at one point. The value ``x'' in the ``CPOLY'' class is the point at which the truncated series is exact, so it is not truly a series expanded about 0.

The other ``FPOLY'' classes represent the same polynomial as a Taylor series expanded about a point ``x''. Again, consider a series of terms, f0, f1, f2, ... This time the terms represent the function evaluated at x and its derivatives: f0 is f(x), f1 is the first derivative, f2 is the second derivative, and so on. To evaluate this for some t near x, f(t) = f0 + f1 (t-x) + (f2/2!) (t-x)^2 + (f3/3!) (t-x)^3 + ... Again, in most cases, only the f0 and f1 terms are used, hence the ``FPOLY1'' class.

Both of these are equivalent in the sense that they represent the same data, and there are functions (constructors) that convert between them. The ``FPOLY'' form is usually most convenient for function evaluation used in behavioral modeling and device modeling. The ``CPOLY'' form is most suitable for stamping into the admittance matrix and current vector for the final solution.
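As a minimal sketch of the first-order forms and the conversion between them (the member names follow the text; the real classes carry more machinery):

  struct FPOLY1 {              // Taylor form, expanded about the point x
    double x;                  // the expansion point
    double f0;                 // f(x)
    double f1;                 // f'(x), the first derivative
  };

  struct CPOLY1 {              // conventional polynomial form
    double x;                  // the point where the truncated series is exact
    double c0;                 // constant term
    double c1;                 // linear coefficient

    explicit CPOLY1(const FPOLY1& f)
      : x(f.x),
        c0(f.f0 - f.f1 * f.x), // so that c0 + c1*x == f(x) holds exactly at x
        c1(f.f1) {}            // the slope carries over unchanged
  };

The c0 = f0 - f1 x form is what makes ``CPOLY'' convenient for the final solution: roughly, c1 is what goes into the admittance matrix and c0 into the current vector.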

7.2.2  The basic solution algorithm

In simplified form, the algorithm looks like this ...

before doing anything ....
  expand()
  precalc()
on issuing the ``tran'' command ..
  tr_begin() // get ready
  for (each time step) {
    tr_advance() // precalculate and propagate
    for (each iteration) {
      tr_queue_eval() // decide which models need evaluating
      do_tr() // evaluate models
      tr_load() // build the matrix of equations
      solve the resulting system of equations
    }
    if (converged) {
      tr_review() // how are we doing? suggest time step
    }
    if (no problems) {
      tr_accept() // postcalculate and accept data
    }
  }
The functions referred to above are actually loops that call the corresponding function for every device in the circuit.

Any of these functions may be skipped on a given pass. If there is evidence that the result will not change from the last time it was called, it probably will not be called. Since this test is not perfect, it is also possible for a particular function to be called twice, so they are written so that calling more than once is equivalent to calling once.
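Conceptually, each pass is a loop over a polymorphic device list, something like this (the ``CARD'' stub and the list are illustrative, not the actual classes):

  #include <list>

  struct CARD {
    virtual ~CARD() {}
    virtual void do_tr() {}    // written to tolerate a second call
  };

  void do_tr_all(std::list<CARD*>& cards) {
    for (CARD* c : cards) {
      c->do_tr();              // in the real code, skipped when the result
    }                          // is known not to change
  }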

expand

The expand functions expand subcircuits and models, as needed. Unlike Spice, Gnucap does not flatten the circuit. Expansion allocates space for the additional storage, attaches models, and performs related tasks. It does not compute any values. It is called once after reading the circuit, and possibly again later when the topology of the circuit is changed.

Most simple elements do not have expand functions. Most advanced components do.

Expanding a subcircuit makes a copy of it, and remaps the nodes. Most components use a shallow copy. That is, if something is attached through a pointer, the value of the pointer is copied, not the attachment. Commons are never copied when the owner components are copied.

It is ok to expand a component more than once. Either it frees then re-expands, or it keeps what it can and checks to make sure it is correct.
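A hedged sketch of the two idempotent patterns, with hypothetical members (not the literal Gnucap code):

  struct DEV {
    int* _state = nullptr;
    int  _size  = 0;

    void expand(int size) {
      if (_state && _size != size) {  // pattern 1: free, then re-expand
        delete [] _state;
        _state = nullptr;
      }
      if (!_state) {                  // pattern 2: keep what is still correct
        _size  = size;
        _state = new int[_size]();    // allocate storage, attach models, ...
      }
    }
    ~DEV() { delete [] _state; }
  };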

precalc

The precalc functions attempt to pre-calculate anything that will remain constant during a simulation run. This includes size dependent transistor parameters and the stamp values for linear elements.

The actual evaluation of constant linear elements is done here. For nonlinear elements, it computes a first guess.

dc_begin, tr_begin, tr_restore

These functions are called once on issuing a simulation command. The dc_begin functions are called on starting a DC or OP analysis. The tr_begin functions are called on starting a transient analysis from time = 0, or the first time. The tr_restore functions are called on starting a transient analysis in such a way that the analysis continues from where a previous transient analysis left off.

The purpose is to make sure that the initial guesses and numbers for prior iterations that don't exist are properly set up. For linear elements, the values are set up here and are not computed later.

dc_advance, tr_advance

These functions are called before beginning a new time or voltage step.

For basic storage elements like capacitors, they store the data from the previous step. They may also attempt to predict a new value, in hopes of speeding up the real solution.

For delay elements like logic devices and transmission lines, this function does the real work. It takes previous results and applies them, generating data that will be later loaded into the matrix.

tr_needs_eval

This function returns true if the component needs to be evaluated on this iteration. It should return false if it has already been queued, but some do not do this.

tr_queue_eval

This function queues the component to be evaluated, if it needs it. If tr_queue_eval is not called, it will not be evaluated.
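A sketch of the handshake between the two functions (the queue and the flag are illustrative stand-ins):

  #include <vector>

  struct CARD;
  static std::vector<CARD*> eval_queue;    // hypothetical evaluation queue

  struct CARD {
    bool _queued = false;
    virtual ~CARD() {}
    virtual bool tr_needs_eval() const { return !_queued; }
    void tr_queue_eval() {
      if (tr_needs_eval()) {               // never queued => never evaluated
        _queued = true;                    // guard against double queueing
        eval_queue.push_back(this);
      }
    }
  };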

do_tr

In most cases, the do_tr functions do the real work, or call the tr_eval function to do it. It evaluates the model, checks convergence, and queues it for loading. Calling this function more than once on an iteration is harmless, except for the waste of time.

Usually, it calculates the function and derivative. It may also do integration, interpolation, iteration, or whatever is required. The result is a set of values ready to stamp into the admittance matrix and current vector.

There are several distinct steps within this function; a hedged sketch for a capacitor-like element follows the list.
  1. The first step is to gather the information necessary to make the computations. Usually, this is the node voltages, but it could be currents, temperature, charge, or something else.

  2. The next step is to evaluate any attached function. This could be done in line, or by a call to tr_eval. The result of this evaluation is stored in _y0 (of type FPOLY1). The tr_eval function reads the value of x from _y0, fills in f0 with the result of the function evaluation, and fills in f1 with its derivative. The tr_eval function must also check for convergence by comparing the new _y0 with the old value, _y1. This attached function is generic in the sense that it is the same for all device types. This is the y = f(x) that is referred to in the behavioral modeling documentation.

  3. These values are stored for convergence checking and probing.

  4. After that, it must be converted to a current and admittance so it can be used in the system of nodal equations. This step is dependent on what type of device it is. For a conductance element, tr_eval directly returns the correct information, so nothing needs to be done here. For a capacitor, this step does numerical integration. Capacitors store this in _i0. Most other elements do not store this result directly.

  5. Then, it must be converted into CPOLY form to meet the requirements of the system of equations.

  6. The device is queued for loading. Unlike Spice, Gnucap does not actually load the matrix here.
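As promised above, here is a sketch of those steps for a capacitor-like element. FPOLY1 and CPOLY1 are as in section 7.2.1; _m0, q_load, the convergence test, and the integration formula are simplified stand-ins, not the actual Gnucap code.

  struct FPOLY1 { double x, f0, f1; };
  struct CPOLY1 { double x, c0, c1; };

  struct CAPACITOR {
    FPOLY1 _y0, _y1;  // this and the previous evaluation: q = f(v)
    FPOLY1 _i0;       // current, after numerical integration
    CPOLY1 _m0;       // values ready to stamp
    bool   _converged = false;

    void do_tr(double v, double dt) {
      _y0.x = v;                         // 1. gather inputs (node voltage)
      tr_eval();                         // 2. evaluate: q(v) and dq/dv into _y0
      double dq = _y0.f0 - _y1.f0;       // 3. store, check convergence vs _y1
      _converged = (dq > -1e-9 && dq < 1e-9);
      _i0.x  = _y0.x;                    // 4. integrate: i = dq/dt
      _i0.f0 = dq / dt;                  //    (backward Euler, for brevity)
      _i0.f1 = _y0.f1 / dt;              //    the equivalent admittance
      _m0.x  = _i0.x;                    // 5. convert to CPOLY form
      _m0.c1 = _i0.f1;
      _m0.c0 = _i0.f0 - _i0.f1 * _i0.x;
      q_load();                          // 6. queue for loading; do not load
    }
    void tr_eval() {                     // a linear 1pF stand-in for the
      _y0.f0 = 1e-12 * _y0.x;            // generic attached function
      _y0.f1 = 1e-12;
    }
    void q_load() { /* put this device on the load queue */ }
  };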

tr_load

This function gives the appearance of loading the admittance matrix and current vector with the values calculated in do_tr.

Actually, it does much more. In most cases, it actually loads a correction factor, assuming the old values are already loaded. To do this, it keeps track of what values are actually loaded. Whether it loads a correction or the actual value is determined first by the option incmode, then by status information about the solution. If it is suspected that correcting would cause too much roundoff error, it loads the actual value. The decision of whether to do a full load or an update is global.

In addition, it may apply damping in hopes of improving convergence. This means to load a value somewhere between the new and old values, in effect taking a partial step. The decision to damp is semi-global. Groups of elements are adjusted together.
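Conceptually, the per-cell update looks something like this (the names are illustrative; the ``incmode'' and damping decisions are made globally, as described above):

  void load_point(double& a, double new_value, double& loaded_value,
                  bool incmode, double damp /* 1.0 means a full step */) {
    double damped = loaded_value + damp * (new_value - loaded_value);
    if (incmode) {
      a += damped - loaded_value;  // correction: the old value is already in
    } else {
      a += damped;                 // full load: the matrix was cleared first
    }
    loaded_value = damped;         // track what is actually loaded
  }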

The actual loading is done by one or more of a small group of general functions, depending on whether the element is active, passive, poly, or a source. Only certain patterns can be stamped. Complex devices use a combination of these patterns.

WARNING to model developers: DO NOT stamp the matrix directly!

tr_review

The tr_review function checks errors and signal conditions after a time step has converged. It makes entries into the event queue, makes mode decisions for mixed-mode simulation, and evaluates time step dependent errors. It returns an approximate time that the element wants for the next step. The actual next time step will probably be sooner than the value returned.

tr_accept

This function is called after the solution at a time step has been accepted. For most devices, it does nothing. For devices having storage and delayed propagation, it evaluates what signal will be propagated. For a transmission line, it calculates and sends on the reflections.

tr_unload

This function removes the component from the matrix, possibly by subtracting off what was loaded. Usually, it sets the current values to 0 and calls tr_load.

7.2.3  Step control

The basic algorithm

The core of it is in the files ``s_tr_swp.cc'' and ``s_tr_rev.cc''.

The function TRANSIENT::review sets two variables: ``approxtime'' and ``control''.

The variable ``approxtime'' is a suggestion of what the next time should be. Note that this is a time, not a difference, and that the simulator may override this suggestion. The variable ``control'' is an enum that records how the time was selected. You can probe control(0) to find this code, or control(1) to see how many steps (not iterations) it calculated internally.

This time may be in the future, past, or again at the present time, depending on conditions. A time in the future means all is well, and the simulation can proceed as expected. A time in the past indicates something is wrong, such as convergence failure, excessive truncation error, or a missed event. In this case, the step is rejected, and time is backed up. A repeat at the present time usually means a latency check failed: a portion of the circuit that was thought to be latent was found to be active. This usually indicates a model problem.

First, it attempts to suggest a time ``rtime'' based on iteration count and the several ``options'' that control the stepping.

Second, it makes another suggestion ``tetime'' based on truncation error. It does this by calling the ``review'' function for all components and taking the minimum. Any component can suggest a time for its next evaluation with its review function. Most components return a very large number, letting the capacitors and inductors dominate, but that is not required. This time should be in the future, but errors could produce a time in the past.

Then, the earliest time of the above two methods is selected. A time in the past means to reject the most recent time step and back up, but note that this time is only a suggestion that may not be used.

The function ``TRANSIENT::sweep'' essentially processes the loop ``for (first(); notpastend; next())''. The function ``TRANSIENT::next()'' actually advances (hopefully) to the next step. It may go backwards.

The actual time step depends on the suggestion by the review function (approxtime), the event queue (which includes what Spice calls ``breakpoints''), the user step size (nexttick), and some tricks to minimize changes.

So, with those considerations in mind, here is the procedure ...

Assume we want the step at the time the user requested. If the event queue says to do it sooner, take that time; otherwise take the user time. Note that this time is needed exactly, either now or later. If the ``approxtime'' is sooner than the exact time, inject a time step as follows: if the step is less than half of the time remaining until an exact time is needed, take the approxtime; otherwise take half of the exact interval, in hopes that the next step will use up the other half.
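A sketch of that rule (all names here are illustrative, not the actual TRANSIENT members):

  double choose_time(double now, double user_time,
                     double event_time, double approxtime) {
    // the next "exact" time: the user's tick, unless an event is sooner
    double exact = (event_time < user_time) ? event_time : user_time;
    if (approxtime >= exact) {
      return exact;                     // no injected step needed
    } else if (approxtime - now < (exact - now) / 2.) {
      return approxtime;                // plenty of room: take the suggestion
    } else {
      return now + (exact - now) / 2.;  // take half, hoping the next step
    }                                   // uses up the other half
  }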

After that, there are some checks ....

``Very backward time step'' means that the suggested new step is earlier than the PREVIOUS step, meaning that both the current step and its predecessor are rejected, thus it should back up two steps. Since it can back up only one step, it rejects the most recent step and tries again at the minimum step size. This usually means there is a bug in the software.

``Backwards time step'' means to reject the most recent step, but the one before that is ok. It will reject the step and try again at the smaller interval. This happens fairly often, usually due to slow convergence.

``Zero time step'' means that the new time is the same as the previous time, which usually means there is a bug in the software. Something is requesting a re-evaluation at the same time.

The combination of ``zero time step'' and ``very backward time step'' means that the re-evaluation didn't work.

Now, accept the new time and proceed.

The ``review'' function

Every component can have a ``review'' function, in which it can determine whether to accept or reject the solution. It will accept by suggesting a time in the future, or reject by suggesting a time in the past. It returns the suggested time. It can call new_event to request an exact time.

For capacitors and inductors, the review function attempts to estimate truncation error using a divided difference method, and it suggests a time for the next solution that will result in meeting the error requirement. Occasionally, it will discover that the step just computed fails to meet the requirement, so it will reject it.

Truncation error is related to the third derivative of charge or flux. Since current is the first derivative of charge, it would seem that the second derivative of current should produce the same results faster. In practice, the current-based method tends to estimate high, leading to smaller steps, and the charge-based method tends to estimate low, leading to larger steps. The conservative approach would suggest using the current-based method, but that sometimes led to unreasonably small steps and slow simulations, so I chose (as Spice did) the charge-based method.

Either method is acceptable when the step size used is close to being reasonable, but when the trial step is unreasonably large, either approach gives a very poor estimate. Taking a step much too small makes the simulator run much slower: it takes many steps, and the step size is only allowed to grow slowly, so the cost is both the many unnecessary steps and the many adjustments. Taking a step much too large results in a choice that is better than the first trial, which then produces a better estimate and is rejected. It is rare to get more than one rejection based on truncation error.
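A heavily hedged sketch of the charge-based suggestion, assuming equal spacing and an error term proportional to the cube of the step (the real code uses divided differences over unequal steps):

  #include <cmath>

  double suggest_step(double q0, double q1, double q2, double q3,
                      double dt, double tolerance) {
    // crude third divided difference of charge, equal spacing assumed
    double d3q = (q0 - 3.*q1 + 3.*q2 - q3) / (dt*dt*dt);
    double error = std::fabs(d3q) * dt*dt*dt;  // ~ local truncation error
    if (error <= 0.) {
      return dt * 2.;                          // no information: grow slowly
    }
    return dt * std::cbrt(tolerance / error);  // error scales roughly as dt^3
  }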

Conclusion

Gnucap will usually take more time steps than Spice, due to two factors: Gnucap forces calculations at print points and Fourier points, and it can reject a bad step. The count is usually only a little higher, but could be as much as twice as many steps.

7.3  Data Structures

7.3.1  Parts list


Main parts list

The primary data storage is in a list of ``cards''. A card is anything that can appear in a net list. Cards live here, primarily, but there are some other auxiliary lists that also contain pointers to cards.

The list stores pointers, rather than actual objects, because there are many types of cards. All are derived from the ``card'', through several levels of inheritance.

Usually, they are stored in the order they are read from the file, except for subcircuits, which are stored in separate lists to preserve the hierarchy.

As of release 0.24, the main list is in static storage, so there can be only one. This will change. New cards can be inserted anywhere in the list, but usually they are inserted at the end. The mechanism for marking the location is a hybrid of STL and a 15-year-old pointer scheme, which will also change someday.
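In outline (the names are illustrative, not the actual classes):

  #include <list>

  struct CARD { virtual ~CARD() {} };    // the common base class
  struct RESISTOR : CARD { /* ... */ };
  struct SUBCKT   : CARD {
    std::list<CARD*> parts;              // subcircuit kept in its own list,
  };                                     // preserving the hierarchy

  std::list<CARD*> card_list;            // the main parts list (one, for now)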

The ``Common'' and ``Eval'' classes

The ``common'' serves two distinct purposes. The first is to share storage for similar devices. The second is to attach ``evaluators'' to otherwise simple components for special behavior.

Most circuits have many identical elements. The ``common'' enables them to share storage. One ``common'' can be attached to many devices. When a new device is created, even if it is parsed separately, an attempt is made to find an appropriate device to share with.

Simple elements like resistors and capacitors can have ``evaluators'' attached as commons. These evaluators calculate a function and its derivative, and return it in a standard form. Some evaluators are used internally, such as in the diode and mosfet models. Some are used explicitly, such as in behavioral modeling.
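A sketch of the sharing idea: many devices point at one reference-counted ``common'', and attaching looks for an equal one to share. The names here are illustrative, not the actual Gnucap API.

  #include <vector>

  struct COMMON {
    int    _refs = 0;
    double value = 0.;
    bool operator==(const COMMON& o) const { return value == o.value; }
  };

  static std::vector<COMMON*> common_pool;   // hypothetical

  COMMON* attach_common(COMMON* proto) {
    for (COMMON* c : common_pool) {
      if (*c == *proto) {       // an equal common already exists:
        ++c->_refs;             // share it instead of keeping a copy
        delete proto;
        return c;
      }
    }
    proto->_refs = 1;           // first of its kind: keep it
    common_pool.push_back(proto);
    return proto;
  }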

7.4  Performance

This section gives some notes on some of the performance issues in simulators. It is not intended to be complete or well organized.

7.4.1  Virtual functions

There is a question of the impact on speed from the use of virtual functions. The experiment here uses the circuit eq4-2305.ckt from the examples directory and tries several modified versions of the program. I used a 100 point dc sweep and a version between 0.20 and 0.21, and made several modifications for testing purposes. I chose this circuit because it has little to mask the effect, and is therefore close to a worst case.

I added an int foo to the element class and made the function il_trload_source call a virtual function virtual_test, storing the result. The base version's body has a print call, which should never appear in the output, confirming that the derived version is the one called. These functions simply return a constant, determined by which version of the function is called. Run time is compared with and without this change.
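A hedged reconstruction of that modification (not the literal test code):

  #include <cstdio>

  struct ELEMENT {
    int foo = 0;
    virtual int virtual_test() {      // base version: should never run;
      std::printf("base called\n");   // its absence in the output proves
      return 0;                       // the derived override is dispatched
    }
    virtual ~ELEMENT() {}
  };

  struct SOURCE : ELEMENT {
    int virtual_test() override { return 1; }  // the version actually timed
  };

  void il_trload_source(ELEMENT* e) {
    e->foo = e->virtual_test();       // one extra virtual call per load
    // ... the real loading work ...
  }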

With 1 virtual function call (included in load)
               user      sys    total
  evaluate    13.45     0.11    13.56
      load    13.40     0.06    13.47
        lu     1.91     0.09     2.00
      back    22.35     0.27    22.61
    review     0.00     0.00     0.00
    output     0.11     0.11     0.22
  overhead     0.23     0.19     0.42
     total    51.45     0.83    52.28
With 10 virtual function calls (included in load)
               user      sys    total
  evaluate    13.47     0.09    13.57
      load    24.69     0.17    24.87
        lu     2.09     0.02     2.11
      back    22.17     0.35    22.51
    review     0.00     0.00     0.00
    output     0.14     0.11     0.25
  overhead     0.25     0.25     0.50
     total    62.82     0.99    63.81
No extra function calls (included in load)
               user      sys    total
  evaluate    13.41     0.09    13.50
      load    11.75     0.05    11.79
        lu     2.04     0.03     2.07
      back    22.51     0.33    22.84
    review     0.00     0.00     0.00
    output     0.08     0.11     0.19
  overhead     0.31     0.25     0.56
     total    50.10     0.86    50.96
My conclusion is that in this context, even a single virtual function call is significant (10-15% of the load time), but not so significant as to prohibit their use. The load loop here makes one virtual function call per element, and that virtual function calls an ordinary member function. Therefore, about 30% of the load time is function call overhead.

The impact should be less significant for complex models like transistors because the calculation time is much higher and would serve to hide this better.

Spice uses a different architecture, where a single function evaluates and loads all elements of a given type. This avoids these two calls.

7.4.2  Inline functions

For this test, il_trload_source is not inline. Contrast with ``No extra function calls'' and ``1 virtual function'' above, in which this function is inline.
               user      sys    total
  evaluate    13.44     0.15    13.60
      load    13.85     0.14    13.99
        lu     1.73     0.02     1.75
      back    22.89     0.35    23.24
    review     0.00     0.00     0.00
  overhead     0.45     0.17     0.63
     total    52.50     0.94    53.44
This shows (crudely) that the overhead of an ordinary private member function call (called from another member function in the same class) is significant here. The cost of a virtual function call is comparable to the cost of an ordinary private member function call.

