FastSPI_LED2 is a ground-up rebuild made of multiple layers of components that, it turns out, may be useful outside of just the LED library!  The lowest of these layers is the Pin access library.  It lets me write higher-level code that accesses pins using the fastest mechanism available on known platforms, falling back to workable methods on Arduino platforms that don’t yet have all the information in place for Mostest Speed[tm].  The goal of the Pin class is to make pin access as easy as possible, and also as portable as possible.  For example, here’s a simple bit of code to blink a pin:

#include "FastSPI_LED2.h"

void setup() { Pin<13>::setOutput(); }
void loop() { Pin<13>::hi(); delay(200); Pin<13>::lo(); delay(200); }
// even shorter: void loop() { Pin<13>::toggle(); delay(200); }

and that’s it!  What you don’t see, though, is the mechanism under the hood.  Most introductory Arduino code recommends functions like digitalWrite, which carries a lot of overhead for turning a pin on or off.  That may not matter for a simple example like this, but for something like FastSPI_LED2, high performance is part of the name and the game!  Under the hood, the Pin class tries to use the fastest method it can for twiddling the pin.

For example, on an AVR, the hi() call compiles down to a single AVR instruction (sbi, which takes two clock cycles on classic ATmega parts), as seen in the disassembly below:

000000aa <loop>:
  aa:   2d 9a           sbi     0x05, 5 ; 5
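To get a feel for why a template pin can collapse to one instruction, here is a minimal host-side sketch. It is not the library's actual implementation: `StaticPin` and `fakePortB` are made-up names, and an ordinary variable stands in for the memory-mapped port register. The point is only that the bit mask is a compile-time constant and the register is a fixed object, so nothing has to be looked up at run time.

```cpp
#include <cassert>   // for assert() in quick host-side checks
#include <cstdint>

// Stand-in for a memory-mapped output register (hypothetical; on a real
// AVR this would be a fixed I/O address rather than a variable).
static volatile uint8_t fakePortB = 0;

// Hypothetical compile-time pin: the bit position is a template parameter,
// so the mask is a constant the compiler can fold into the instruction.
template <uint8_t BIT>
struct StaticPin {
    static constexpr uint8_t mask = static_cast<uint8_t>(1u << BIT);

    // Set the pin's bit in the register.
    static void hi() { fakePortB = static_cast<uint8_t>(fakePortB | mask); }
    // Clear the pin's bit in the register.
    static void lo() { fakePortB = static_cast<uint8_t>(fakePortB & ~mask); }
    // Flip the pin's bit in the register.
    static void toggle() { fakePortB = static_cast<uint8_t>(fakePortB ^ mask); }
};
```

With this sketch, `StaticPin<5>::hi()` sets bit 5 of the stand-in register, which is the same bit the `sbi 0x05, 5` instruction sets in the listing above.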

On an Arduino where I haven’t yet defined pin mappings, this code looks more like:

00000100 <loop>:
 100:   e0 91 0a 01     lds     r30, 0x010A
 104:   f0 91 0b 01     lds     r31, 0x010B
 108:   80 81           ld      r24, Z
 10a:   90 91 09 01     lds     r25, 0x0109
 10e:   89 2b           or      r24, r25
 110:   80 83           st      Z, r24
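The loads in that listing look like a stored port pointer and a stored bit mask being fetched before the read-modify-write. A rough host-side sketch of that kind of fallback follows, assuming the port and mask are resolved once at setup time and kept in variables. `RuntimePin`, `fakePorts`, and the 8-pins-per-port mapping are all hypothetical, chosen only to make the example self-contained.

```cpp
#include <cassert>   // for assert() in quick host-side checks
#include <cstdint>

// Stand-ins for three memory-mapped output registers (hypothetical).
static volatile uint8_t fakePorts[3] = {0, 0, 0};

// Hypothetical run-time pin: the port pointer and mask live in memory,
// so every write has to load them before touching the register.
struct RuntimePin {
    volatile uint8_t* port;  // which register this pin lives in
    uint8_t mask;            // which bit of that register

    // Made-up mapping for illustration: 8 pins per port.
    explicit RuntimePin(uint8_t pin)
        : port(&fakePorts[pin / 8]),
          mask(static_cast<uint8_t>(1u << (pin % 8))) {}

    // Load pointer, load register, OR in the mask, store back.
    void hi() { *port = static_cast<uint8_t>(*port | mask); }
    // Load pointer, load register, AND out the mask, store back.
    void lo() { *port = static_cast<uint8_t>(*port & ~mask); }
};
```

In this sketch, `RuntimePin led(13); led.hi();` touches only bit 5 of the second stand-in register, but it pays for the extra loads every time it runs.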

The fallback costs a few more instructions, which is part of why I’m working on getting pin definitions in for as many platforms as possible.  Still, that’s better than using digitalWrite, which becomes:

00000100 <loop>:
 100:   8d e0           ldi     r24, 0x0D       ; 13
 102:   61 e0           ldi     r22, 0x01       ; 1
 104:   0e 94 b5 01     call    0x36a   ; 0x36a <digitalWrite>

And that digitalWrite code? Well, let’s take a look at the disassembly of digitalWrite:

0000036a <digitalWrite>:
 36a:   48 2f           mov     r20, r24
 36c:   50 e0           ldi     r21, 0x00       ; 0
 36e:   ca 01           movw    r24, r20
 370:   82 55           subi    r24, 0x52       ; 82
 372:   9f 4f           sbci    r25, 0xFF       ; 255
 374:   fc 01           movw    r30, r24
 376:   24 91           lpm     r18, Z+
 378:   ca 01           movw    r24, r20
 37a:   86 56           subi    r24, 0x66       ; 102
 37c:   9f 4f           sbci    r25, 0xFF       ; 255
 37e:   fc 01           movw    r30, r24
 380:   94 91           lpm     r25, Z+
 382:   4a 57           subi    r20, 0x7A       ; 122
 384:   5f 4f           sbci    r21, 0xFF       ; 255
 386:   fa 01           movw    r30, r20
 388:   34 91           lpm     r19, Z+
 38a:   33 23           and     r19, r19
 38c:   09 f4           brne    .+2             ; 0x390 <digitalWrite+0x26>
 38e:   40 c0           rjmp    .+128           ; 0x410 <digitalWrite+0xa6>
 390:   22 23           and     r18, r18
 392:   51 f1           breq    .+84            ; 0x3e8 <digitalWrite+0x7e>
 394:   23 30           cpi     r18, 0x03       ; 3
 396:   71 f0           breq    .+28            ; 0x3b4 <digitalWrite+0x4a>
 398:   24 30           cpi     r18, 0x04       ; 4
 39a:   28 f4           brcc    .+10            ; 0x3a6 <digitalWrite+0x3c>
 39c:   21 30           cpi     r18, 0x01       ; 1
 39e:   a1 f0           breq    .+40            ; 0x3c8 <digitalWrite+0x5e>
 3a0:   22 30           cpi     r18, 0x02       ; 2
 3a2:   11 f5           brne    .+68            ; 0x3e8 <digitalWrite+0x7e>
 3a4:   14 c0           rjmp    .+40            ; 0x3ce <digitalWrite+0x64>
 3a6:   26 30           cpi     r18, 0x06       ; 6
 3a8:   b1 f0           breq    .+44            ; 0x3d6 <digitalWrite+0x6c>
 3aa:   27 30           cpi     r18, 0x07       ; 7
 3ac:   c1 f0           breq    .+48            ; 0x3de <digitalWrite+0x74>
 3ae:   24 30           cpi     r18, 0x04       ; 4
 3b0:   d9 f4           brne    .+54            ; 0x3e8 <digitalWrite+0x7e>
 3b2:   04 c0           rjmp    .+8             ; 0x3bc <digitalWrite+0x52>
 3b4:   80 91 80 00     lds     r24, 0x0080
 3b8:   8f 77           andi    r24, 0x7F       ; 127
 3ba:   03 c0           rjmp    .+6             ; 0x3c2 <digitalWrite+0x58>
 3bc:   80 91 80 00     lds     r24, 0x0080
 3c0:   8f 7d           andi    r24, 0xDF       ; 223
 3c2:   80 93 80 00     sts     0x0080, r24
 3c6:   10 c0           rjmp    .+32            ; 0x3e8 <digitalWrite+0x7e>
 3c8:   84 b5           in      r24, 0x24       ; 36
 3ca:   8f 77           andi    r24, 0x7F       ; 127
 3cc:   02 c0           rjmp    .+4             ; 0x3d2 <digitalWrite+0x68>
 3ce:   84 b5           in      r24, 0x24       ; 36
 3d0:   8f 7d           andi    r24, 0xDF       ; 223
 3d2:   84 bd           out     0x24, r24       ; 36
 3d4:   09 c0           rjmp    .+18            ; 0x3e8 <digitalWrite+0x7e>
 3d6:   80 91 b0 00     lds     r24, 0x00B0
 3da:   8f 77           andi    r24, 0x7F       ; 127
 3dc:   03 c0           rjmp    .+6             ; 0x3e4 <digitalWrite+0x7a>
 3de:   80 91 b0 00     lds     r24, 0x00B0
 3e2:   8f 7d           andi    r24, 0xDF       ; 223
 3e4:   80 93 b0 00     sts     0x00B0, r24
 3e8:   e3 2f           mov     r30, r19
 3ea:   f0 e0           ldi     r31, 0x00       ; 0
 3ec:   ee 0f           add     r30, r30
 3ee:   ff 1f           adc     r31, r31
 3f0:   ee 58           subi    r30, 0x8E       ; 142
 3f2:   ff 4f           sbci    r31, 0xFF       ; 255
 3f4:   a5 91           lpm     r26, Z+
 3f6:   b4 91           lpm     r27, Z+
 3f8:   2f b7           in      r18, 0x3f       ; 63
 3fa:   f8 94           cli
 3fc:   66 23           and     r22, r22
 3fe:   21 f4           brne    .+8             ; 0x408 <digitalWrite+0x9e>
 400:   8c 91           ld      r24, X
 402:   90 95           com     r25
 404:   89 23           and     r24, r25
 406:   02 c0           rjmp    .+4             ; 0x40c <digitalWrite+0xa2>
 408:   8c 91           ld      r24, X
 40a:   89 2b           or      r24, r25
 40c:   8c 93           st      X, r24
 40e:   2f bf           out     0x3f, r18       ; 63
 410:   08 95           ret
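Reading through that listing, the work seems to break down into three table lookups, an early exit for invalid pins, a PWM shutoff, and an interrupt-protected read-modify-write. The sketch below is a simplified host-side model of those steps, not the Arduino core source: every name in it (`modelDigitalWrite`, `pinToPort`, `modelPorts`, and so on) is made up, and the pin mappings are invented for illustration.

```cpp
#include <cassert>   // for assert() in quick host-side checks
#include <cstdint>

// Everything here is a hypothetical model; the mappings are invented.
static const uint8_t NOT_A_PORT = 0;
static const uint8_t NO_TIMER = 0;

static volatile uint8_t modelPorts[4] = {0, 0, 0, 0};  // index 0 unused
static bool modelPwmActive[20] = {false};
static bool modelInterruptsEnabled = true;

// Each helper stands in for one lookup in a table stored in flash.
static uint8_t pinToTimer(uint8_t pin) {
    return (pin == 3 || pin == 11) ? uint8_t(1) : NO_TIMER;
}
static uint8_t pinToMask(uint8_t pin) {
    return static_cast<uint8_t>(1u << (pin % 8));
}
static uint8_t pinToPort(uint8_t pin) {
    return pin < 20 ? static_cast<uint8_t>(1 + pin / 8) : NOT_A_PORT;
}

void modelDigitalWrite(uint8_t pin, bool high) {
    uint8_t timer = pinToTimer(pin);  // lookup 1
    uint8_t mask = pinToMask(pin);    // lookup 2
    uint8_t port = pinToPort(pin);    // lookup 3
    if (port == NOT_A_PORT) return;   // not a valid pin, nothing to do

    // A pin driven by a timer must have its PWM output switched off first.
    if (timer != NO_TIMER) modelPwmActive[pin] = false;

    bool saved = modelInterruptsEnabled;  // save the interrupt state
    modelInterruptsEnabled = false;       // block interrupts during the update
    volatile uint8_t& out = modelPorts[port];
    out = high ? static_cast<uint8_t>(out | mask)
               : static_cast<uint8_t>(out & ~mask);
    modelInterruptsEnabled = saved;       // restore the interrupt state
}
```

All of that bookkeeping happens on every call, even when the pin number is a constant the compiler already knows.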

Quite a bit of difference in generated code output, no?  The Pin library is for those times when you absolutely have to bitbang (either you can’t use the hardware SPI port, or you’re doing something with pins that doesn’t involve SPI or anything SPI-like at all, I’m looking at you WS2811) but still want it to be as fast as possible.  This library also works on the Teensy 3.0 ARM platform, reducing the hi/lo calls to just a load and a write.  (The load is required because the GPIO registers live in a high block of memory whose addresses need a full 32 bits, so the address of a pin’s GPIO register has to be loaded into a register before the pin value can be written to it.)

Right now, the Pin class is written and tuned for high-performance output: setting pins and toggling pins, with a variety of support functions to help achieve higher performance even in environments where the pin->GPIO port mapping can’t happen at compile time.  A future post will detail how to use some of these other methods in the Pin class to squeeze the most performance out of your bit-twiddling code (for example, when bitbanging SPI output, the inner loop of the fast SPI code can push out a bit every 4 cycles: 2 to determine whether the current bit is hi or lo, and 2 to set the data line appropriately and strobe the clock)!  In addition, a future revision of the FastSPI_LED2 library will update the Pin class to support reading values as well as writing them.
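For readers who haven’t bitbanged SPI before, the shape of that inner loop can be sketched on the host like this. This is only an illustration of "set the data line, then strobe the clock" and makes no attempt at the 4-cycle timing; `FakeLine`, `writeByte`, and `sampledBits` are made-up names, and sending the most significant bit first with sampling on the rising clock edge is an assumption.

```cpp
#include <cassert>   // for assert() in quick host-side checks
#include <cstdint>
#include <vector>

// Host-side stand-in for an output pin (hypothetical).
struct FakeLine {
    bool level = false;
    void hi() { level = true; }
    void lo() { level = false; }
};

static FakeLine dataLine;
static FakeLine clockLine;
// What a receiver would latch, one entry per rising clock edge.
static std::vector<bool> sampledBits;

// Pulse the clock; the pretend receiver samples data on the rising edge.
static void strobeClock() {
    clockLine.hi();
    sampledBits.push_back(dataLine.level);
    clockLine.lo();
}

// Shift one byte out, most significant bit first (an assumption).
void writeByte(uint8_t value) {
    for (uint8_t bit = 0x80; bit != 0; bit >>= 1) {
        // Step 1: decide whether the current bit is hi or lo.
        if (value & bit) dataLine.hi(); else dataLine.lo();
        // Step 2: strobe the clock so the receiver latches it.
        strobeClock();
    }
}
```

Calling `writeByte(0xA5)` in this sketch leaves eight entries in `sampledBits`, matching the bit pattern 1010 0101.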
