SPI and ISRs -- Why? I have the answer for you. #478
I see the utility of attachInterrupt() and detachInterrupt() (which are entirely undocumented), but I'm curious how you would write an ISR that could form a coherent system with the tools that SPI.h provides. I mean, there's never been an ISR provided for it, so as things stood, attachInterrupt() would cause the next SPI transfer to jump to the undefined vector, and from there to badISR, which jumps to 0. Prior to 1.4-ish that was an ungraceful crash into a broken state where no interrupts would run and peripherals were in different states than initialization code assumed; since whatever version the dirty reset detection went in, that is caught and a software reset is fired.

So the way you're expected to use those member functions is that you write your own ISR and... how exactly does that interact with the SPI library? That seems un-Arduino-like in terms of the depth of code it would require, and I'm not sure how to make that into something coherent with the SPI.h API, nor where in any of those massive libraries I'd look for that part. I also wonder how doing it that way makes things any easier: the ISR will have to be specific to the architecture, so putting the attach/detach functions (which don't work without advanced programming) into SPI.h fails to serve the purpose of SPI.h - that being to bundle the architecture-dependent SPI-related stuff up so SPI-using libraries have a standard, portable API - since architecture-specific ISR code is still needed. When I made that change to the library, I had no idea what the intended use case was. (Are those documented anywhere? If so, where? They're not documented on the Arduino website's SPI.h documentation.)

As for usingInterrupt(), the problem here comes about for the same reason that attachInterrupt() sucks so much more on the modern parts.
On classic AVRs there were two kinds of pin interrupts, INTn and PCINTn. A part would have a PCINT on all or most pins, but only a few INTn interrupts (sometimes only one). attachInterrupt() and SPI.usingInterrupt() only worked on INTn interrupts, which kept things simple and let attachInterrupt() be used without blocking off all pin interrupts. Modern AVRs dispense with the two-tiered pin interrupt system and merge the two into one, by making the all-pin, PCINT-like functionality trigger on rising/falling/low/change instead of just change. But this leaves no coherent way to implement attachInterrupt() without blocking off every pin interrupt.

I rewrote attachInterrupt() with great difficulty, in assembly written with aggressive optimizations to prevent most of the code duplication that would come from keeping a separate copy of the ISR for each port, splitting off each port's stub ISR (which bypasses calling conventions to reach the initial dispatch function). The eventual dream was that, using things like __builtin_constant_p(pin) and so on, I would - as long as only constant-foldable values were passed as the pin - only "reference" the file for that port, and thus allow user interrupts on the other ports. One of the most compelling of those is the one-line do-nothing ISR that just clears the port's INTFLAGS, which is the natural choice for waking from sleep when you have no other interrupts on that port (which is the majority of the time, even if you're not trying to design for this, and it's almost always trivial to arrange if you are).

Anyway, it turned out that this broke under the new attachInterrupt() logic (and in fact it may have never worked on mTC). As I looked at it - an inherited-from-stock-core function at that point - I grew increasingly disgusted with the implementation, and as a fix was needed urgently, I just made all pin interrupts do the global interrupt thing. I intended to come back to this, but had kinda forgotten about it.
I see better ways to implement it now. While nothing I'd call performant, they would be more than twice as fast as the horror that was originally implemented.
So, I agree on usingInterrupt(), though I still disapprove of that sort of thing. I may make it conditional on having at least 8k of flash (this library is the same one that mTC gets, and the 2k/4k people would give me hell), because the overhead will be non-negligible. As for SPI.attachInterrupt() and SPI.detachInterrupt(), I'd like to get a confirmation from you about what I described above. I will note that you are the first one to mention this in well over a year.
I am currently working on a project that (if I can get the events, timers, and LUTs to cooperate) will not only revert the "undocumented" attach/detach, but include an extra parameter to enable client mode.
Explanation: communication is such that I only need 7 bits, not the full 8 bits of SPI, and it's synchronous. NOTE: I'm using an AVR64DD14; if I end up needing more pins, I'll go with the 20, though I shouldn't have to. So far the issue is that LUT1 isn't "seeing" the WO from TCB0 - instead it emits a simple pulse - so I might have to route events to the LUTs and use an S/R latch instead? Haven't gotten that far yet, but WO totally doesn't track in the LUT. Here's the basic code, minus the SPI stuff; it just outputs to pins (/SS and SCK have to do that anyway).
PC1 carries the 1 MHz clock, which would be connected to SCK.
Periodically print out the value of TCB0.CNT - you should get a range of values from 0 to 7. If you're only getting 0's, then the timer isn't ticking. I suspect that either TCBs only clock from events in some modes, or WO is not defined in the compare modes. I think it's the latter. If ya don't need millis (delay() and delayMicroseconds() still work when millis is disabled), you could use a pair of TCBs; or on a DD28 (see below for why you might want that) there's an extra TCB, so you would be able to keep millis. I've looked at similar methods.

I do not believe that you can run in slave mode without SS being driven low during transactions and going high otherwise - it doesn't just disable master mode, it also resets the slave hardware when it goes high (clearing incomplete transfers and so on).

You do realize that there aren't any AVR64DD20's in the nice little QFN, only the brick-sized SOIC-20, so if size is a priority, that's not going to be a winner. And it costs more than the AVR64DD28 in a slightly larger QFN (and only pennies less than the AVR64DD28 in SOIC packaging). The die wouldn't fit in the 3x3 QFN - confirmed by filing the top off of an AVR32DD20 and an AVR64DD14 (I am not going to sit there filing for an hour to get into an SOIC-20). The 14- and 20-pin parts share a die, as do the 28- and 32-pin parts, so the 14 and 20 should be the same; the 16k and 32k ones in each class share a die too - they do that pretty often, so there are 4 DD dies and 12 DDs. My guess is that the factory "calibration" includes telling the chip what it is. The 32DD is about as large a die as fits in the 3x3 QFN, and the 64k one is a little bit larger. (My understanding is that this is also why there's no 3x3 QFN 3217 - they were using a less advanced process node then, so a 32k modern AVR with peripherals wasn't able to fit.)

The difference between the two is smaller than you might expect based on the premium they charge on the 64k ones, but the fact that they use the same die for the two smallest sizes on many products is consistent with the really disappointing…
Timer stops at 7; the "running" bit claims it's not running after counting to 7. The real question is where the pulse is coming from... Note also that the only timekeeping I need is the 32 kHz clock; delay won't be used in the final design, and neither will serial. There are plenty of pins, so I could have done it all bitbang style, but the problem with that is the IRQ storm, hence the attraction of the LUTs/CCL. There's also a bug someplace (possibly a missing SEI? bad vector?) where if I run this with millis/micros disabled, it wedges. Could also be the optimizer messing things up, as usual... I've tripped over a few other bugs too, but I'm guessing the features are so rarely used that nobody has noticed. Shall I just attach my findings here, or add a new report?
CNT == CCMP == stop
I've managed to get closer to a solution, but unfortunately Microchip really screwed up on timers and docs, as usual.
Also updated the EB-part-specific docs with info from the header, as it becomes clearer what these will do.
There's one last detail needed: automatic switching to slave mode. There are several different, and all incompatible, ways to do this (there's no real standard yet). Here's what I've seen in the wild: a separate SPISlave library - bad, because of code duplication. Note that when in slave mode, transactions aren't used anymore; transfer takes data from a queue (a pointer to data with a length, reset to 0 on the /SS rising edge), and if the queue is empty, either the output is disabled and reflects the idle state, or a default value such as 0xFF or 0x00 is preloaded as the idle value. Ideally RX is queued too, spanning an entire /SS transaction. Thoughts?
Also found the issue with the trigger for TCD. Basically it boils down to this: the chip won't clear if the start event is already low; to start, it wants a simple pulse. The fix is to pass the event through an edge-detecting LUT :-D
So, I've been working on my project, and here's where the need for the ISR comes in. My process on entering the ISR is as follows. The whole trick is that while it is in an ISR, it is expected to stall the user's code, but allow other stuff to keep happening, such as serial, millis, etc.
*blink* Uhhhhh..... What you describe doesn't do what you say it does.
Upon entering the ISR, and until a reti instruction is executed, the LVL0EX bit in CPUINT.STATUS will be set. As long as LVL0EX is set, only the level-1 interrupt vector, if configured, can be triggered - and only a single LVL1 interrupt can be specified. Therefore, the second half of your final sentence does not reflect the documented behavior of interrupts.

The interrupt system does not work like classic AVRs. The global interrupt flag is not cleared on entering an interrupt; instead, the CPUINT.STATUS ___EX bit appropriate for the interrupt's priority is set, and on a RETI, cleared. I don't know what the exact result is when RETIs and interrupts are not matched with each other, but I suspect that when a RETI is executed, it clears the highest bit of CPUINT.STATUS that is set. It doesn't have any way to know which RETI is associated with which vector, so all it can do is rely on the fact that if any LVLnEX bit is set, it is in an interrupt of that level, and should clear that bit as it executes RETI; allow interrupts to trigger only when STATUS is 0 or the interrupt is elevated; and, when entering an ISR, set the LVLnEX bit based on its priority. (I think the xmegas did it like this too, only, being xmegas, it had to be overly complicated - they had at least 4 priorities, and I think the same less-than-useful NMI.) The more I think about it, the more I am convinced that other approaches are implausible.

Also, because of this, lines 1, 2, 3, 6, 8, and 9 have no effect on the behavior of the code other than slowing it down, unless you have an elevated interrupt. This core does not use elevated interrupts, and their use is very rare in Arduino circles. Unfortunately, the stock core (and MegaCoreX, as far as I know) still has the original implementation for Serial, which does use one in some cases - when the TX buffer is full and you try to write more to it, and when you call Serial.flush() - even though there isn't a need, and even though there are both hangs in the field (at very high baud rates) and theoretical concerns in multiple scenarios (mostly at low bauds).
Considering it's "secret sauce," the steps are fairly close to what is done, but not exact. For modern AVR it's slightly different: we need to change the code path instead. Basically, we split the ISR into a top half and a bottom half. The top half services the actual interrupt and its hardware. The bottom half runs in the normal context until it is done, then returns back to the top half, which returns with RET instead of RETI. Yeah, a bit convoluted, but that's probably a better description than the terse one above. :-)
If you are at all curious how I accomplish this on classic AVR, ARM, and MIPS, I have some code on GitHub that shows how it's done across those platforms. Here's a classic AVR one; it does task switching, but needs extra hardware: https://github.com/xxxajk/xmem2 ...and the universal one (classic AVR / NVIC on MIPS and ARM): https://github.com/felis/UHS30/tree/master/libraries/dyn_SWI At some point I will add modern AVR, but perhaps it's easier to understand what's going on by looking at actual production code.
That answers my technical reaction to your scheme. Considering how convoluted a situation that is, it does sort of circle back to my original line of comments, which is: that's a plausible, if convoluted, solution that effectively simulates reentrant code. But how does this involve SPI.h? The point of the standard libraries is to push architecture-dependent code behind the standard library so the app code isn't architecture-dependent... yet this library would have to be architecture-dependent.
It applies in two cases: either monitoring of /SS, or RX/TX via ISR for something gawd-awful slow. Edit: Yeah, SBI/CBI, forgot about those; that's going to be my "fix" ;-)
CBI/SBI only work on the low I/O space, and there's nothing special about using them anymore (there sort of was on classic AVR). If, in C, you set a single constant bit on a low I/O register - say GPIOR3 |= (1 << n) - it will compile to sbi 0x1F, n, while a plain assignment like GPIOR3 = (1 << 3) compiles to ldi (some reg), (1 << 3) (a constant, so the math is optimized away) followed by out 0x1F, somereg.

Yet when two bits are set in one line, you get the worst result: ldi, in, ori, out - which is NOT INTERRUPT SAFE. If an interrupt also touches that register, you must disable interrupts while you do it (not the case with the others), so you may need to wrap it in a cli/sei pair. If the register isn't in the low I/O - and that includes every register that enables or disables a peripheral-specific interrupt on a modern AVR - all of these will instead produce the even slower lds (2 words, 3 clocks), ori, sts (2 words, 2 clocks), and be similarly interrupt-unsafe. On the modern AVRs, the low I/O space holds only the VPORTs and the four GPIO general-purpose registers.
This explains the paucity of registers with bit-level access on AVR, as well as other painful limitations, like only having std/ldd on Y and Z (it would have taken another 4096 opcodes to do it for X too). Classic AVRs frequently put what they expected to be used often in the low I/O once they'd filled it with the pin-related registers, but like most manufacturers, they're not great at predicting which features we use. On modern AVRs they effectively admitted this and chucked all the riffraff out of the low and high I/O spaces.
This is why I always disable global interrupts, then do the change, and re-enable them.
First off, thank you for continuing the development to support these chips. You saved me a ton of development time.
So, I can explain why SPI and ISRs are actually important, and why the common belief that using SPI within an ISR is "bad" is largely a misconception.
SPI.attachInterrupt() and SPI.detachInterrupt(): These allow one to write a library that basically avoids polling and instead simply reads the result, storing it for use elsewhere. This has been very handy when using SPI in client mode, since you are expecting to handle the data elsewhere, you may have a long-running section of code, and you don't wish to lose any data. You do that by capturing the information and processing it elsewhere, much the same way Serial does. It also allows you to send data in host mode with queuing. Remember, SPI doesn't actually require the use of the /SS line, and the communication can be one-way.

SPI.usingInterrupt()
: This is super important when you don't want anything to disturb the sequence of the data stream, be it a collision or something else. Consider that most neopixel-type addressable LEDs can be quite finicky if a delay lands in the wrong place; you don't want an ISR to mess with that. Secondly, it allows multiple devices to operate whether or not they are in an ISR. I can give you several cases where this becomes quite important: UHS2 used under multitasking (such as my XMEM library), UHS30, various WiFi chips, etc. The actual list is pretty large, and I realize that most MCU coders either don't have the experience to avoid ISR deadlocks or don't have the correct way of thinking about code flow. That's where developers like me handle all of the messy parts, hidden away in an ISR. Polling is crap: it can be a huge cause of missed data, of not meeting the timing requirements of a device on SPI, or of collisions, when some code uses an ISR to handle a specific device on an SPI multidrop while other code does the stupid thing of polling and loses the event and the data. I have a similar argument over heap, and yes, I do use new/delete/malloc/free in ISRs; I even include a replacement, or tie into the libc functionality. RTOSes on MCUs are pretty popular as well, so it makes sense to do these things.

Here is a sample of very popular libraries that use this convention, some of which I either wrote or helped to write:
xmem2
UHS2.0
UHS3.0
There are others, such as Ethernet, too.
So I suggest that you keep the above functionality. It's kinda important, and you aren't the first one to whine and question the rationale. The fact remains it's useful, and in a lot of cases a requirement.
To get up to speed on the "controversy", there's a complete discussion of the topic, with the relevant links to the Arduino mailing list; go read this:
SPI sharing between interrupts and main program