T-state counting #145

Open
holub opened this issue May 2, 2021 · 3 comments

Comments

@holub

holub commented May 2, 2021

T-state counting similar to zmac.
Example of usage here
Example:

code:   ld      a,5
        ld      (hl),a
        inc     a
        inc     hl
        ld      (hl),a
cost    equ     t($)-t(code)
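
For straight-line code like the snippet above, the requested `t()` amounts to a running T-state counter sampled at labels. A minimal Python sketch of that idea, assuming the documented Z80 timings for these instructions and ignoring contention and wait states; `TSTATES` and `t_at_labels` are hypothetical names for illustration, not zmac or sjasmplus APIs:

```python
# Documented Z80 T-state counts, only for the instructions used in the example.
TSTATES = {
    "ld a,n": 7,
    "ld (hl),a": 7,
    "inc a": 4,
    "inc hl": 6,
}

def t_at_labels(program):
    """Walk (label, mnemonic) pairs and record the running T-state total at each label."""
    total = 0
    marks = {}
    for label, mnemonic in program:
        if label is not None:
            marks[label] = total
        if mnemonic is not None:
            total += TSTATES[mnemonic]
    return marks

program = [
    ("code", "ld a,n"),
    (None, "ld (hl),a"),
    (None, "inc a"),
    (None, "inc hl"),
    (None, "ld (hl),a"),
    ("end", None),  # stands in for `$` on the `cost equ` line
]

marks = t_at_labels(program)
cost = marks["end"] - marks["code"]  # plays the role of t($)-t(code)
print(cost)  # 31
```

This only works because the snippet has no branches and the timings are taken as fixed.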
holub changed the title from "T-state caunting" to "T-state counting" on May 2, 2021
@specke

specke commented Feb 12, 2023

One of the first things I did when I returned to writing assembly code in 2013 was to create a T-state counting library (it was very basic, but it could handle a task like the one above). Since then, unfortunately, I have come to recognize that a feature like this does not belong in an assembler or, to be more precise, would be mostly useless in one, especially in an assembler that aims to be multiplatform, as sjasmplus does. The reasons are simple:

  • How do we deal with conditional execution? For any code that branches, "the execution time" is not uniquely defined.
  • Depending on the platform, the number of T-states for a given instruction can vary. E.g. ZX Spectrum clones with M1 delays effectively round up the execution time of instructions with an odd number of T-states to an even number. On the Amstrad CPC, execution times are all rounded up to the nearest multiple of 4 T-states (I do not know the details, so apologies if I am not stating this entirely correctly). My point is: if your assembler promises to compute something like this, you open a real can of worms, where people on different platforms will require different timing profiles.
  • Depending on the platform, the number of T-states for a given instruction can depend on exactly when it executes (I am thinking of ULA delays on the ZX Spectrum, but I am sure there are other situations too).
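
The first two points can be made concrete with a small Python sketch. It applies the rounding rules exactly as described above (a simplification: real CPC timings do not always follow a plain round-up of the Z80 figure) and shows that a single conditional jump turns "the cost" into a range. The names `PROFILES`, `round_up` and `cost_range` are hypothetical:

```python
def round_up(t, multiple):
    """Round t up to the nearest multiple of `multiple`."""
    return -(-t // multiple) * multiple

# Simplified per-platform rounding rules, as described in the comment above.
PROFILES = {
    "plain Z80": 1,
    "odd rounded up to even (M1 delays)": 2,
    "multiple of 4 (Amstrad CPC)": 4,
}

# (not taken, taken) T-states per instruction; equal when there is no branch.
snippet = [(7, 7), (7, 7), (4, 4), (6, 6), (7, 7)]  # the five instructions from the first post
with_branch = snippet + [(7, 12)]                    # plus a `jr nz,e`: 7 not taken, 12 taken

def cost_range(instrs, multiple):
    """Return (minimum, maximum) T-states under one rounding profile."""
    lo = sum(round_up(min(pair), multiple) for pair in instrs)
    hi = sum(round_up(max(pair), multiple) for pair in instrs)
    return lo, hi

for name, multiple in PROFILES.items():
    print(name, cost_range(snippet, multiple), cost_range(with_branch, multiple))
```

The same six instructions give three different answers across the profiles, and each answer is a range rather than a number as soon as the branch is included.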

All in all, if you want accurate timings, this is a job for a good emulator for your platform. There are assemblers that can invoke emulators as part of their workflow (I know of at least one), but this would be a very substantial commitment and redesign, which is difficult to justify here.

@ClaireCheshireCat

I recently wrote a piece of code like that, but the durations are given in "NOPs" for the Amstrad CPC. By updating the arrays in the source you could get T-states instead, I suppose:

By the way, any advice on this code is welcome ;-)

;==============================================================================
; TIMINGS_TICKER
;------------------------------------------------------------------------------
; Measures the duration of a snippet of code
; Conditional jumps are counted as if they were false (i.e., no jump)
;------------------------------------------------------------------------------
;------------------------------------------------------------------------------
; INPUT:
;  Start of the code to measure
;  End of the code to measure
;------------------------------------------------------------------------------
; OUTPUT :
;  A label TICKER is set (or overwritten), containing the duration of the
;  code, in NOPs
;==============================================================================
    macro TIMINGS_TICKER start,stop
        LUA PASS3
            nops={3,2,2,1,1,2,1,1,3,2,2,1,1,2,1,3,3,2,2,1,1,2,1,3,3,2,2,1,1,2,1,2,3,5,2,1,1,2,1,2,3,5,2,1,1,2,1,2,3,4,2,3,3,3,1,2,3,4,2,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,2,2,2,2,2,2,1,2,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,2,3,3,3,3,4,2,4,2,3,3,0,3,5,2,4,2,3,3,3,3,4,2,4,2,1,3,3,3,0,2,4,2,3,3,6,3,4,2,4,2,1,3,1,3,0,2,4,2,3,3,1,3,4,2,4,2,2,3,1,3,0,2,4}
            nops[0]=1
            bytes={3,1,1,1,1,2,1,1,1,1,1,1,1,2,1,2,3,1,1,1,1,2,1,2,1,1,1,1,1,2,1,2,3,3,1,1,1,2,1,2,1,3,1,1,1,2,1,2,3,3,1,1,1,2,1,2,1,3,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,3,3,1,2,1,1,1,3,0,3,3,2,1,1,1,3,2,3,1,2,1,1,1,3,2,3,0,2,1,1,1,3,1,3,1,2,1,1,1,3,1,3,0,2,1,1,1,3,1,3,1,2,1,1,1,3,1,3,0,2,1}
            bytes[0]=1
            ednops={2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,4,4,4,6,2,2,2,3,4,4,4,6,2,4,2,3,4,4,4,6,2,2,2,3,4,4,4,6,2,4,2,3,4,4,4,6,2,4,2,5,4,4,4,6,2,4,2,5,4,4,4,6,2,4,2,4,4,4,6,2,4,2,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2}
            ednops[0]=2
            edbytes={2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,4,2,2,2,2,2,2,2,4,2,2,2,2,2,2,2,4,2,2,2,2,2,2,2,4,2,2,2,2,2,2,2,4,2,2,2,2,2,2,2,4,2,2,2,2,2,2,2,4,2,2,2,2,2,2,4,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2}
            edbytes[0]=2

            timings_debug=_c("TIMINGS_DEBUG")

            start=_c("start") % 65536
            stop=_c("stop") % 65536
            if(stop<start) then
                stop=stop+65536
            end

            ptr=start
            cptnops=0

            while(ptr<stop) do
                byte=sj.get_byte(ptr)
                if(nops[byte]==0) then
                    if(byte==0x0ED) then
                        edbyte=sj.get_byte(ptr+1)
                        cptnops=cptnops+ednops[edbyte]
                        ptr=ptr+edbytes[edbyte]
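                    else
                        sj.warning(string.format("TIMINGS_TICKER : This extended opcode is not yet managed : #%02x #%02x",byte,sj.get_byte(ptr+1)))
                        -- the opcode length is unknown, so stop scanning here;
                        -- otherwise ptr never advances and the loop spins forever
                        break
                    end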
                else
                    cptnops=cptnops+nops[byte]
                    ptr=ptr+bytes[byte]
                end
            end

            if(ptr~=stop) then
                sj.warning("TIMINGS_TICKER : Stop value doesn't point to the byte following an opcode")
            end

            sj.insert_define("TICKER",cptnops)
        ENDLUA
    endm
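
The scanning loop of the macro can be sketched outside the assembler as follows. This is a Python illustration only, not a replacement for the Lua code: `OPCODES` and `count_nops` are hypothetical names, and the table holds just a handful of unprefixed opcodes whose values are taken from the `nops` and `bytes` arrays above. It raises an error on any opcode missing from its table, rather than issuing a warning:

```python
# (NOPs, length in bytes) for a few unprefixed opcodes, Amstrad CPC timings.
OPCODES = {
    0x00: (1, 1),  # nop
    0x23: (2, 1),  # inc hl
    0x3C: (1, 1),  # inc a
    0x3E: (2, 2),  # ld a,n
    0x77: (2, 1),  # ld (hl),a
}

def count_nops(memory, start, stop):
    """Sum the NOP timings of the instructions in memory[start:stop].

    Raises ValueError for an opcode missing from the table, or when `stop`
    falls in the middle of an instruction.
    """
    ptr = start
    total = 0
    while ptr < stop:
        opcode = memory[ptr]
        if opcode not in OPCODES:
            raise ValueError(f"unhandled opcode #{opcode:02X} at offset {ptr}")
        nops, length = OPCODES[opcode]
        total += nops
        ptr += length
    if ptr != stop:
        raise ValueError("stop does not point to the byte following an opcode")
    return total

# ld a,5 / ld (hl),a / inc a / inc hl / ld (hl),a
code = bytes([0x3E, 0x05, 0x77, 0x3C, 0x23, 0x77])
print(count_nops(code, 0, len(code)))  # 9
```

As in the macro, conditional jumps would need a policy (for instance "count as not taken") before they could be added to the table.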

There's a problem with this code: you can't use this value later in the source without raising warnings. For example:

startA:
  ld a,2
stopA:
  TIMINGS_TICKER startA,stopA
  dup (64-TICKER) ; This code should ensure my routine lasts for exactly 64 NOPs
    nop
  edup

This code produces the right opcodes (#3E #02 followed by 62 #00 bytes) but raises a lot of "warning: Label has different value in pass 3" messages.

@ped7g
Collaborator

ped7g commented Jun 16, 2023

The Lua script looks quite good (just incomplete with regard to IX/IY and bit instructions, if I understand it correctly from a quick read). I would probably define it as a function during PASS1 and then just call it in PASS3 whenever needed, to avoid all of this being defined and processed again for each block.

The problem with later use is unfortunately a hard limit of the sjasmplus design: the code being assembled, and thus the addresses, should be the same in the second and third passes, so reading the code in the third pass and adjusting the output based on it breaks this principle. If you really know what you are doing and you don't care about the warnings, you can still sometimes get what you asked for in the binary, but the "correct" way to do this would be to assemble these independently: first assemble the inner part, which will be aligned, to a binary blob, then incbin that in a new asm file and a new assembling process... oh wait, that still reads the actual bytes only in pass 3... hmm... so you would have to read the file with Lua io in each pass (or in the first pass), calculate the T-states, and then have the DUP padding fixed to the same result in every pass.

Or produce the inner code in a first assembly and, during its third pass, generate the timing info and export it to a small include file for a second assembly, which can then just incbin + include these two files and use the values without running the timing count at all. That's probably the most "sjasmplus" way, fitting the assembler architecture well.
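
The hand-off between the two assemblies is just a tiny generated file. A Python sketch of its shape, assuming a hypothetical file name `inner_timing.inc` and a hypothetical helper `write_timing_include`; in the workflow described above the first assembly would write this file itself, and Python is used here only to show what gets exported:

```python
from pathlib import Path
import tempfile

def write_timing_include(path, symbol, value):
    """Write a one-line include file that defines `symbol` as a constant."""
    Path(path).write_text(f"{symbol} equ {value}\n")

with tempfile.TemporaryDirectory() as tmp:
    inc = Path(tmp) / "inner_timing.inc"   # hypothetical file name
    write_timing_include(inc, "TICKER", 2)  # 2 NOPs, e.g. the cost of `ld a,2` on the CPC
    content = inc.read_text()

print(content, end="")  # TICKER equ 2
```

Because the second assembly sees `TICKER` as an ordinary constant from its first pass on, the padding has the same length in every pass.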

In other words, this type of task is not a good fit for sjasmplus, and there's no simple fix or improvement to get there.

If you don't want to split the assembling into two, you can avoid some of those warnings by hard-padding after the DUP block, e.g. with ORG (or an anti-dup emitting the remaining NOPs to reach 64 in total), so that no matter how long the DUP block is, it is followed only by a bit of code that defines no new labels and does the important stuff, probably ending with a jump to the next code. The ORG/anti-dup then makes sure that each pass continues at the same address after this piece of code, whatever the timing results are, which avoids address changes between pass 2 and pass 3.
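
Why the hard padding helps can be modelled with plain address arithmetic. The Python sketch below rests on a stated assumption: a pass that has not yet run the Lua block sees `TICKER` as 0, while pass 3 sees the real value. `BUDGET` and `next_address` are hypothetical names, and the numbers match the `ld a,2` example (2 bytes, 2 NOPs):

```python
BUDGET = 64  # target duration of the routine, in NOPs

def next_address(start, code_len, ticker, hard_pad):
    """Address of the first byte after the padded block, as seen in one pass.

    ticker:   timing value visible in that pass (ASSUMPTION: 0 before the Lua
              block has run, the real value in pass 3)
    hard_pad: True models an `ORG start + fixed size` placed after the block
    """
    padding = BUDGET - ticker            # dup (64-TICKER) / nop / edup
    end = start + code_len + padding
    if hard_pad:
        # A fixed upper bound on the block size, independent of ticker.
        end = start + code_len + BUDGET
    return end

START, CODE_LEN, REAL_TICKER = 0x8000, 2, 2

early = next_address(START, CODE_LEN, 0, hard_pad=False)
pass3 = next_address(START, CODE_LEN, REAL_TICKER, hard_pad=False)
print(hex(early), hex(pass3))            # the addresses differ by 2 bytes

early_fixed = next_address(START, CODE_LEN, 0, hard_pad=True)
pass3_fixed = next_address(START, CODE_LEN, REAL_TICKER, hard_pad=True)
print(hex(early_fixed), hex(pass3_fixed))  # identical in both passes
```

Without the fixed end address, every label after the block shifts between passes, which is the kind of mismatch the "Label has different value in pass 3" warning reports.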

I think this nicely illustrates why it would be tricky to add this to sjasmplus and to satisfy all possible use cases, so I'm still not even considering it. I'm somewhat open to the idea of printing T-states in the listing file (under some option), but I'm not planning to do that in the near future, and it would not help use cases like this; it would just make manual counting a bit simpler. In my ideal world I would rather see more tools each doing their part of the work, like z88dk ticks emulating the code and counting T-states including loops etc. (though it simulates only Z80 timings, not ZX contention), and then some IDE integrating all this and more: using AI to generate possible unit tests for newly written code, using something like ticks to measure its performance, running the unit tests and producing hints about the code's results while you type, etc. Sadly there's no such dream IDE right now, and it's unlikely I will ever write it. :)
