Discussion: Multi-Master SPI Arbitration


Wulffy
Sep 10, 2007, 01:02 AM
I am sure that this thread will prove to be a bit esoteric, but I am struggling with something that I'd like to throw out to the group for review and consideration.

My issue is this: I am looking at implementing common-memory on a Multi-Master SPI bus. I am looking at having two or three masters. I'd like to provision for a minimum of four, or more, to facilitate future growth.

I have read a lot and understand that, in its native implementation and by its very nature, SPI and a Multi-Master topology are somewhat mutually exclusive.

I plan on having 3+ SBCs in my system. Without having the luxury of interrupt driven events, and a strong need to facilitate High-Speed Asynchronous Transfer of data to and from one module to the other, I have decided to try and tackle what some texts have described as "rare and awkward, and are usually limited to a single slave (http://en.wikipedia.org/wiki/Serial_Peripheral_Interface#Disadvantages)".

The hardware-layer of the mechanism that I am looking at using to effect this is SPI-Based FRAM (http://www.ramtron.com/doc/AboutFRAM/Technology.asp). The primary reasons for this choice are three-fold:

1. Longevity. First and foremost, due to the very high number of reads/writes that I am looking at, having 10^12 to "unlimited" write cycles will serve to ensure that once deployed, I won't run into the reliability issues that Flash-based devices would face, possibly failing after as few as 10^5 or 10^6 write cycles.
2. Zero-Wait state. The data is stored as fast as I can possibly shovel it down the device's throat, without having to stall while waiting for the memory device to retain the data - no refreshes, no battery backup, no delays related to writing to slow flash.
3. Speed. I2C, by design, facilitates multiple masters, but I2C is slow. SPI, conversely, is darn quick, and I feel it is the better choice given what I want to use this for, especially with the speed of SPI coupled with the Zero-Wait memory - should prove to be brutally fast, if it can be made to work...
Accordingly, I have decided that I want to jump this high hurdle...
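
To put rough numbers on the longevity point, here is a minimal sketch. The rate of 100 writes per second to a single memory location is an assumed figure for illustration only, and the calculation assumes no wear leveling; the endurance values are the low-end figures quoted above.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def lifetime_seconds(endurance_cycles, writes_per_second):
    """Time until one memory cell reaches its rated write endurance."""
    return endurance_cycles / writes_per_second

# Assumption for illustration: one location rewritten 100 times per second.
WRITE_RATE_HZ = 100.0

flash_seconds = lifetime_seconds(1e5, WRITE_RATE_HZ)   # low end quoted for Flash
fram_seconds = lifetime_seconds(1e12, WRITE_RATE_HZ)   # low end quoted for FRAM
fram_years = fram_seconds / SECONDS_PER_YEAR

print(f"Flash cell worn out after about {flash_seconds:.0f} s")
print(f"FRAM cell worn out after about {fram_years:.0f} years")
```

Under those assumptions a single Flash cell could reach its rated limit in minutes, while the FRAM cell would outlast the airframe by a wide margin.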

I am going to have one of my MCUs acting as a Telemetry Host - interfaced to a MaxStream Xtend transceiver. It will be responsible for querying the common-memory to retrieve the parameters that are to be transmitted. Additionally, it will also receive commands from the ground control station for the airborne subsystems and store them in the common-memory. The Telemetry Host MCU could also be the Flight Data Recorder, if I determine that it has enough 'idle' time when I do my work-load studies on the ship's systems.

I will have a 2nd MCU acting as a Navigation Computer Unit (NCU). By virtue of its name, I am sure that you can probably extrapolate its role in life - GPS/IMU interface-based 3D-Waypoint navigation.

I initially planned on also having a third MCU acting as a flight data recorder and also as a movable surfaces controller, listening to the Rx's ppm and demuxing it to the servos/actuators, and I may very well proceed in this fashion. The one thing that may cause me to reconsider this is the fact that there are some pretty simple and robust COTS solutions showing up out there with PIC-based dedicated servo controllers, or the new ASIS? that MX has a lead on (I haven't yet ping'd MX for the vitals on that - I'll let him get comfortable with the hardware first, before I go bugging him about it...). If these COTS solutions prove to be sufficient and reliable enough, then I won't re-invent the wheel and I'll try to push the flight data recording back onto the Telemetry Host MCU, and implement the COTS solution for Servo Demuxing, Control, and Failsafe. This is all going to be driven by the loop iteration rates that I can get out of each of the subsystems...

That brings me full circle to the root reason for the implementation of the common-memory device. My hardware choice and language selection don't yield a combination where I feel that I will have the resources to have coordinated synchronous communication directly between the various subsystems without substantive overhead penalties. I feel that this overhead would be unacceptable.

Giving each subsystem a control loop that is NOT predicated on a separate subsystem's commands or data communications will yield the most efficient and desirable results - i.e. I feel that:

- Having the NCU collect data from the GPS/IMU, do the 3D or 4D navigation calculations, store the needed reactions to the high-speed common-memory and move on to do it all over again will serve to yield a very high loop rate for this subsystem.
- Having the Telemetry Host able to go to the high-speed common-memory to immediately store any received commands, and then retrieve the various parameters that require transmission, without having to wait for other systems to get to the point in their loop to directly provide the data to it, will serve to ensure that the control loop for the Telemetry Host runs at an adequately high rate.
- Having the Movable Surfaces controller able to fetch the required corrections, independent of the source of said corrections, will surely serve to ensure that the control loop iteration rate is sufficiently high.
Basically, the above serves to illustrate that the loops of the various subsystems can become autonomous and independent of the other subsystems, with the storage/extraction of commands and information to/from the FRAM SPI slave device.

...

Utopia would be the availability of a SoC device that has three or four+ SPI Slave ports with multi-port FRAM. Again, I have searched and have yet to find such a device... Granted I could implement something with FPGA systems, but that has a whole set of penalties that I don't think I am willing to pay.

With the zero-latency of FRAM, with the high speed of SPI, and with the inter-subsystem independence that this approach provides, I feel that this is a very compelling reason to try to implement what I am considering.

...

So, now that I have over-bloodied the frigging horse :), can those of you who may have some first-hand knowledge of Multi-Master SPI (MMSPI/SPI-MM?) arbitration implementation please reply with some suggestions as to how the successful arbitration of 3+ masters with a single slave device might be able to be achieved?

Some have mentioned that a Prop (Parallax Propeller) might be the way to go, others have suggested FPGAs with embedded processor cores and memory. I'd like to think that there might be a way to successfully implement this without having to rely on any additional external components. It may be possible to avoid SPI bus collisions entirely. Unrecoverable system crashes are not acceptable. It seems to me, ignorance admitted, that the use of one or more GPIO lines in between the Masters, and some simple check and balance code on each Master, should be a means with which to successfully implement said arbitration with a very high degree of reliability and a (very-?)small performance penalty...

Thanks for reading my diatribe. Please review and advise with any viable suggestions that you might have, or have experienced success with previously.

Again, thank you!

-t

p.s. In addition to the SPI FRAM device, I also have an I2C FRAM device coming, just in case I can not make the MMSPI implementation work as needed/intended...

poynting
Sep 10, 2007, 03:06 AM
Sorry, but I can't lend any help on the MM-SPI. However, I would pose the question of what processors you're planning on using and why you suspect that you need so much processing power as to split it amongst 2-3 processors. An ARM9 or maybe even an ARM7 should have ample MIPS to do everything you need... using multiple processors may cause more trouble than it's worth in the long run, unless there is a system spec that's forcing you to use multiple less-capable micro-controllers. There are a lot of good ARM dev boards out there, and the tools are free if you're willing to put a little setup time into gcc. There are also good tutorials into getting it all up and running.

zik
Sep 10, 2007, 03:29 AM
I agree with poynting. It seems like instead of running three processors you could run it all on the one processor. Having said that one of my autopilot designs does have two processors but the second one is only for manual override if the software in the main processor crashes.

SPI is designed as a single-master bus. I can envisage ways it could be made to work as multi-master, say by using a CSMA-type "it's my bus now" approach. But it seems like a lot of extra complexity when there might be easier ways to get the same outcome.
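
One way such an "it's my bus now" scheme might look is sketched below as a small simulation. This is only an illustration under stated assumptions, not a tested design: each master is assumed to have one GPIO request line that every other master can read, priority is fixed by index (lower index wins), and each call to `step` is treated as one atomic GPIO action. The names `step` and `simulate` are made up for this sketch. On real hardware the request lines would need time to settle between the write and the read, and any master that does not own the bus would have to tri-state its SCK and MOSI pins.

```python
import random

def step(i, state, req, transfers):
    """Advance master i by one GPIO action. Lower index = higher priority (assumption)."""
    if state[i] == "idle":
        req[i] = True                    # 1. assert own request line first
        state[i] = "requested"
    elif state[i] == "requested":
        others = [j for j, r in enumerate(req) if r and j != i]
        if not others:
            state[i] = "owner"           # 2. nobody else is asking: take the bus
        elif min(others) < i:
            req[i] = False               # 3. a higher-priority master is asking: back off
            state[i] = "idle"
        # otherwise only lower-priority lines are set: keep ours set and poll again
    else:  # "owner"
        transfers[i] += 1                # the SPI transaction would run here
        req[i] = False                   # 4. release the bus
        state[i] = "idle"

def simulate(n_masters=3, steps=3000, seed=1):
    """Interleave the masters in a random order and record the worst case seen."""
    rng = random.Random(seed)
    state = ["idle"] * n_masters
    req = [False] * n_masters
    transfers = [0] * n_masters
    max_owners = 0
    for _ in range(steps):
        step(rng.randrange(n_masters), state, req, transfers)
        max_owners = max(max_owners, state.count("owner"))
    return max_owners, transfers

max_owners, transfers = simulate()
print("most masters on the bus at once:", max_owners)
print("completed transfers per master:", transfers)
```

Because every master sets its own line before reading the others, two contenders always see each other, and the fixed priority breaks the tie. The price is that the lowest-priority master can be kept waiting for a long time when the bus is busy.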

Still, I like the fact that lots of people are trying different and interesting things. Good luck with your project Wulffy!

_helitron_
Sep 10, 2007, 05:11 AM
Agree fully with poynting and zik, have done a lot of multiprocessor applications in the past but don't like it anymore. I vote also for the ARM or the propeller. Only my 2ct.

//Erwin

XJet
Sep 10, 2007, 05:42 AM
Although a single-processor implementation can be commercially sensible when development time allows and production volumes are high enough to recoup the somewhat higher development costs, there are still advantages to multi-processor designs that can't be overlooked in applications such as UAV control systems.

For a start they can be more stable and fault-tolerant.

If you're relying on a single CPU to do all the work then (at a microcontroller level) a bug in any one task can effectively bring the whole thing down around your ears -- unless you're getting into really sophisticated chips capable of supporting a RTOS that has code/data protection and protected modes of execution.

It can also be a whole lot simpler to design and debug a system where each functional block is nicely compartmentalized into a block that runs on its own CPU.

The UAV control systems I've developed are multi-processor, which also allows for great flexibility by way of simple plug-and-play with the various modules (GPS, barometric, inertial, magnetic, telemetry, etc).

And, since the number of bugs generally increases at a rate that is decidedly non-linear as the code-size increases, it's much easier to "prove" the code before commissioning the final system.

toxicmouse
Sep 10, 2007, 07:21 AM
Wulffy, i can't help on the multi master SPI, but i am faced with a similar problem. a one chip solution would be great but there are advantages to breaking down the task into little modules. evidently your modules don't work independently, which is why you need comms. so i see your problem as one of inter-chip comms rather than memory, using memory is your solution.

maybe it is worth reviewing the tasks that each chip performs and try to reshuffle the tasks so that each chip works as independently as possible, and thus fewer comms are needed.

perhaps it is worth making the chip interfacing with the Maxstream a communications chip for the rest of the system.

failing the above i think it will be easier to get comms working between chips.

i think i am going to reread your post again.

[edit] in my system i had the sensors for measuring attitude read by the navigation chip- same as you. due to the high update rate needed, the comms rate between the nav chip and the control surfaces chip was very high. this caused problems, so i just connect the sensor to the control surfaces chip now. this vastly reduces the amount of comms needed between chips, to around 1 message per second!!

dmgoedde
Sep 10, 2007, 07:36 AM
I faced similar issues, and I solved it rather successfully by using the 8-core Parallax Propeller IC. 32bit w/ floating point support, 20 MIPS/core for up to 160 MIPS total for $13, but most importantly the RAM is shared between all 8 cores in a clever way where no conflict is possible when dealing with 32 bit longs... you can easily tag-team cores on tasks like this: core#1 intercepts and processes all GPS strings then updates RAM locations for lat, lon, alt, etc... while core#2 periodically accesses those same RAM locations and does some other value-added task such as navigation calculations. Those two cores are cooperating 100% w/o conflict. Same applies to intensive tasks such as IMU/Kalman calcs... one core reads the firehose of raw data from IMU via A2D converter, and another core does Kalman estimations, then core#3 takes polished Kalman state estimates and does lower level control calcs, while core #4 feeds servos a continuous uninterrupted stream of 50Hz pulses.

I hope I didn't go off on a tangent too much, and I don't want to disturb your current project with suggestions of switching MCU platforms/programming language. I have to pipe up though, because I feel like many people would benefit from the Prop IC, and not enough people know about it yet. In my opinion it is almost too perfect for these intensive apps. And no, I'm not on the payroll of Parallax inc.

As for concern of a single processor crashing and bringing whole system down around your ears, I can say two things about the Prop: 1) in my 1+ year playing/learning about prop I have seen ZERO crashes/lockups...when I did a goofy programming thing I just didn't get the result I wanted but the prop itself has NEVER crashed or locked up on me EVER. 2) Parallax designed the Prop for ultra-stable embedded apps, and they did this by not using dynamic run-time thingies (it's 4:41am on my all-nighter programming binge and I can't think of a better word) ... the compiled program is static with all objects and RAM locations written in stone.

Another thing I need to say is that learning "spin" language is pretty easy, and I was writing hard-core autopilot code in a matter of 1 month and I'm really not much of a programmer.

toxicmouse
Sep 10, 2007, 02:37 PM
can spot a new-born parallaxer from a mile away :)

dmgoedde
Sep 10, 2007, 02:46 PM
can spot a new-born parallaxer from a mile away :)

Ha!! I have to admit I have limited experience on other micros, and I don't know C++ or Java, etc... I'm running the danger of being more enthusiastic than knowledgeable on broader points of this subject. I hope my enthusiasm helps Mr Wulffy.

XJet
Sep 10, 2007, 11:38 PM
I faced similar issues, and I solved it rather successfully by using the 8-core Parallax Propeller IC. 32bit w/ floating point support,

I had a quick look at the Propeller and found *zero* support for floating point. Even a search of the support forums for "floating point" turned up nothing.

How are you going to squeeze any reasonable floating point routines into 2Kbytes of "cog" memory anyway?

It's a nice chip (effectively just a multiprocessor array with a shared bus and round-robin time-slicer) but I see significant limitations for UAV autopilot/guidance applications.

toxicmouse
Sep 11, 2007, 07:28 AM
is floating point really necessary? i can get all calculation within tolerance on an 8 bit MCU, with a little difficulty- but it works. it looks like dmgoedde is taking advantage of the rapid development times of the propellor.

suppergenus
Sep 11, 2007, 12:01 PM
others have suggested FPGAs with embedded processor cores and memory. I'd like to think that there might be a way to successfully implement this without having to rely on any additional external components.

Hello,

Let me repeat what I think I'm hearing:

You want multiple computing modules to share a common memory so that they don't have to talk to each other directly, they can each update their allocated section of the memory, and then the other units can read the updated memory and use it accordingly.

The FRAM device you are using is an SPI slave, so in order to have multiple uCs writing to it they must each be able to "master" the device. The complication here is that two devices may wish to write to or read from the FRAM at the same time which would screw everything up.

so.. this isn't perfectly correct, but I think this is what you were hoping for (note, this has problems of its own, but could be made to work):

You can do what you're looking for but it takes a good deal of handshaking and overhead. I would use another processor to supervise the process. Set the SPI module on each of the masters to not use CS in hardware. Connect CLOCK, MISO, MOSI from each master to the slave (FRAM). Now you have a 3-wire bus connecting them all together. Instead of connecting CS from all the masters to the one slave, connect each master's CS to the supervisor.

When a master wants to talk to the slave it now signals that to the supervisor by its CS signal.

The supervisor will notice the event and if no other master is contending for the slave the supervisor will do two things: Enable CS (chip select) on the slave and signal BACK to the master (CS_RETURN? another port pin) that it is clear to go.

what if one master (A) tries to write to the FRAM while another (B) is reading from it? Well, the A master will enable CS to the supervisor. The supervisor will notice that, but since it already knows that master B isn't done reading it (its CS line is still enabled) it doesn't signal back the "ok, go ahead" CS_RETURN signal to Master A until master B is all done. When master A sees "all clear" on its CS_RETURN (made that name up) it is free to begin its transaction with the slave.
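
The handshake described above can be modelled in a few lines. This is only a sketch of the supervisor's logic, assuming three masters, a polling loop on the supervisor, and a fixed lowest-index-first grant order (the order is my assumption; the description does not specify one). `Supervisor`, `cs_request`, `cs_return` and `fram_cs` are names invented for the illustration. It also assumes that every master keeps its CLOCK and MOSI pins tri-stated until it sees its grant, since otherwise the outputs on the shared wires would fight each other.

```python
class Supervisor:
    """Models the supervising processor: one request line in, one grant line out per master."""

    def __init__(self, n_masters):
        self.cs_request = [False] * n_masters   # each master's CS output, wired to the supervisor
        self.cs_return = [False] * n_masters    # hypothetical "clear to go" line back to each master
        self.fram_cs = False                    # the only CS actually wired to the FRAM
        self.owner = None                       # index of the master currently granted the bus

    def poll(self):
        """One pass of the supervisor loop."""
        # Release the bus once the current owner has dropped its CS request.
        if self.owner is not None and not self.cs_request[self.owner]:
            self.cs_return[self.owner] = False
            self.fram_cs = False
            self.owner = None
        # Grant the bus to the first waiting master (fixed order is an assumption).
        if self.owner is None:
            for master, wants_bus in enumerate(self.cs_request):
                if wants_bus:
                    self.owner = master
                    self.fram_cs = True
                    self.cs_return[master] = True
                    break


sup = Supervisor(3)
sup.cs_request[1] = True      # master B starts a read
sup.poll()
sup.cs_request[0] = True      # master A asks while B is still busy
sup.poll()
a_waiting = not sup.cs_return[0] and sup.cs_return[1]
sup.cs_request[1] = False     # B finishes and drops its CS
sup.poll()
a_granted = sup.cs_return[0] and sup.fram_cs
print("A held off while B was busy:", a_waiting)
print("A granted after B finished:", a_granted)
```

The cost of this approach is the extra part and the polling latency: a master cannot start until the supervisor has noticed its request and raised the return line.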

All written out like this I would say: For this kind of application FPGAs ARE WORTH THE LEARNING CURVE. You could easily implement an internal or external shared memory across multiple processor cores and access them using parallel bus lines instead of a serial protocol. In fact, that's probably the direction I would take it.

All that to say, what is your maximum needed data rate? I2C isn't THAT SLOW for an application whose update rate is 5Hz. For instance, your INU will need to be fast, but how often does it really need to provide that data to the system? Even if it's 1kHz, I2C would be fast enough.
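
A rough budget makes the question concrete. The sketch below assumes the two common I2C bus speeds (100 kHz standard mode and 400 kHz fast mode), nine clock periods per byte (eight data bits plus the acknowledge bit), and three bytes of addressing overhead per transfer (one device address plus two memory address bytes, which is an assumption about the memory part). It ignores start/stop conditions, clock stretching and software overhead, so real figures would be somewhat lower.

```python
CLOCKS_PER_BYTE = 9   # 8 data bits + 1 ACK bit

def i2c_payload_bytes_per_update(bus_hz, update_hz, overhead_bytes):
    """Upper bound on payload bytes that fit into one update period."""
    total_bytes = bus_hz // (CLOCKS_PER_BYTE * update_hz)
    return max(total_bytes - overhead_bytes, 0)

OVERHEAD = 3  # assumed: device address + two memory address bytes

for bus_hz in (100_000, 400_000):
    for update_hz in (5, 125, 1000):
        budget = i2c_payload_bytes_per_update(bus_hz, update_hz, OVERHEAD)
        print(f"{bus_hz // 1000} kHz bus, {update_hz} Hz updates: {budget} payload bytes")
```

With these assumptions a 1 kHz update leaves room for roughly 41 payload bytes per update at 400 kHz and only 8 at 100 kHz, so whether I2C is "fast enough" depends mostly on how much data each update has to carry and how many devices share the bus.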

dmgoedde
Sep 11, 2007, 12:43 PM
I had a quick look at the Propeller and found *zero* support for floating point. Even a search of the support forums for "floating point" turned up nothing.

How are you going to squeeze any reasonable floating point routines into 2Kbytes of "cog" memory anyway?

It's a nice chip (effectively just a multiprocessor array with a shared bus and round-robin time-slicer) but I see significant limitations for UAV autopilot/guidance applications.

Wulffy - I spoke up because Prop does EXACTLY what you want. It automatically handles sharing of data in shared RAM across 8 cores. Just buy a cheapo demo-board and play with it. After a 1 month learning curve you'll be very tickled. Who cares about floating point math if you don't need it? I just started using FP math in prop, and only brought it up to show how great prop is.

XJet, Please read the forums again (and the manual). Prop does IEEE-754 compliant FP. Check out http://obex.parallax.com/objects/category/6/. Directly, the Prop supports FP constants and variables. To do FP math, you use one of the objects shown in the link. I made the claim of FP because it is real and I use it to calc least squares fit on large data sets via scheme shown in attachment (that's just slope, I also calc R-square and other parameters). My routine doesn't use 2kB cog RAM, it is running from the 32kB of MAIN RAM. 2kB "cog" memory is mostly only used when doing assembly code in a cog, otherwise the main 32kB RAM holds code for all threads running in the various cores (cogs).
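
For readers without the attachment, the textbook least-squares slope can be computed from running sums, as in the sketch below. This is the standard formula and not necessarily the exact scheme in the attachment; the function name is made up for the illustration. It shows why floating point (or very wide integers) helps: the sums of x*y and x^2 grow quickly on large data sets.

```python
def least_squares_slope(xs, ys):
    """Slope of the best-fit line y = slope * x + intercept, from running sums."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denominator = n * sum_xx - sum_x * sum_x
    if denominator == 0:
        raise ValueError("need at least two distinct x values")
    return (n * sum_xy - sum_x * sum_y) / denominator

# Points that lie exactly on y = 2x + 1.
slope = least_squares_slope([0, 1, 2, 3], [1, 3, 5, 7])
print(slope)
```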

I have proven Prop is very suited to UAV work. I'm using the Prop on my "AttoPilot" rather successfully! Maybe I'm ignorant of how an autopilot "should" work, but mine works well, and I'm juggling half a dozen intensive tasks simultaneously, most of which prepare data for the "mom" thread to use, so data sharing between cores is an integral reason I'm using the prop. 32kB doesn't sound like a lot, but I'm running a rather complex and extensive application with many contingencies and functionality built in, and only using about 20kB of the RAM at the moment.

suppergenus
Sep 11, 2007, 01:32 PM
Wulffy - I spoke up because Prop does EXACTLY what you want. It automatically handles sharing of data in shared RAM across 8 cores. Just buy a cheapo demo-board and play with it. After a 1 month learning curve you'll be very tickled. Who cares about floating point math if you don't need it? I just started using FP math in prop, and only brought it up to show how great prop is.

I have proven Prop is very suited to UAV work. I'm using the Prop on my "AttoPilot" rather successfully! Maybe I'm ignorant of how an autopilot "should" work, but mine works well, and I'm juggling half a dozen intensive tasks simultaneously, most of which prepare data for the "mom" thread to use, so data sharing between cores is an integral reason I'm using the prop. 32kB doesn't sound like a lot, but I'm running a rather complex and extensive application with many contingencies and functionality built in, and only using about 20kB of the RAM at the moment.

The propellor is an interesting device for sure, I giggle reading the datasheet. (laughing with it, not at it) The only downside I see is using the SPIN language instead of C with extensions. Probably not that big of a deal. That said, it does look like it would do great stuff for UAV work.

10,000 ways to skin this cat:)

and you're right, it would make Wulffy's life a lot easier than the multi master design.

XJet
Sep 11, 2007, 04:57 PM
is floating point really necessary? i can get all calculation within tolerance on an 8 bit MCU, with a little difficulty- but it works. it looks like dmgoedde is taking advantage of the rapid development times of the propellor.

So you're using look-up tables for your transcendental values then?

I suspect that our accuracy requirements are a little better than yours -- our system also needs to cope with things such as maintaining a constant (on course) track despite the effect of things such as strong (cross) winds, etc.

Floating point just makes this stuff so much easier.

dmgoedde
Sep 11, 2007, 05:22 PM
So you're using look-up tables for your transcendental values then?

I suspect that our accuracy requirements are a little better than yours -- our system also needs to cope with things such as maintaining a constant (on course) track despite the effect of things such as strong (cross) winds, etc.

Floating point just makes this stuff so much easier.

Prop has 32kB of ROM lookup tables built in, including Sine table of 0.04 degree precision, and Log/Anti-log tables. I made a quick algorithm that does reverse-lookup to support Arc-sine as well (it takes 1.01ms to run, which is fine for me).
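
A reverse lookup of this kind might look like the sketch below. It is an illustration, not the routine described above: it assumes a quarter-wave table of 2049 entries covering 0 to 90 degrees (about 0.044 degrees per step) with 16-bit values, builds that table in software rather than reading it from ROM, and uses a binary search, which works because the table never decreases over that range.

```python
import bisect
import math

TABLE_SIZE = 2049      # assumed: 0..90 degrees inclusive, about 0.044 degrees per entry
SCALE = 0xFFFF         # assumed: full scale of a 16-bit table entry

# Stand-in for the ROM table: sine of each table angle, scaled to 16 bits.
SINE_TABLE = [
    round(math.sin(math.radians(90.0 * i / (TABLE_SIZE - 1))) * SCALE)
    for i in range(TABLE_SIZE)
]

def arcsine_degrees(value):
    """Reverse lookup: scaled sine value (0..SCALE) to an angle in degrees (0..90)."""
    if not 0 <= value <= SCALE:
        raise ValueError("value out of table range")
    # First entry that is >= value; the table is sorted, so bisect applies.
    index = bisect.bisect_left(SINE_TABLE, value)
    index = min(index, TABLE_SIZE - 1)
    return 90.0 * index / (TABLE_SIZE - 1)

half_scale_angle = arcsine_degrees(round(0.5 * SCALE))
print(f"arcsin(0.5) is roughly {half_scale_angle:.2f} degrees")
```

A binary search needs at most about a dozen probes for a table of this size. The resolution gets worse close to 90 degrees, where many neighbouring table entries round to the same value.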

As far as accuracy, I use floating point in Prop when I need extreme accuracy or don't want to run risk of rolling over the 32 bit integer limit, such as summation of all X or X^2 in a large data set.
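
The rollover risk is easy to quantify. The sketch below assumes 12-bit samples (0 to 4095) and asks how many worst-case terms a signed 32-bit accumulator can hold; the helper name is made up for the illustration.

```python
INT32_MAX = 2**31 - 1

def max_safe_terms(max_term):
    """How many worst-case terms fit in a signed 32-bit total without rolling over."""
    return INT32_MAX // max_term

ADC_FULL_SCALE = 2**12 - 1   # assumption: 12-bit samples

terms_sum_x = max_safe_terms(ADC_FULL_SCALE)         # plain sum of X
terms_sum_x2 = max_safe_terms(ADC_FULL_SCALE ** 2)   # sum of X^2
print("worst-case samples before sum(X) overflows:  ", terms_sum_x)
print("worst-case samples before sum(X^2) overflows:", terms_sum_x2)
```

With full-scale 12-bit readings the sum of squares is only safe for 128 samples, so a regression over a large data set runs out of 32-bit integer headroom quickly.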

As far as library extensions, that is what the prop is all about... Spin is object-oriented, and my current AttoPilot software calls about 20 extensions, many of which are pre-packaged from Parallax or others.

My $0.02 worth!!

dmgoedde
Sep 11, 2007, 06:58 PM
is floating point really necessary? i can get all calculation within tolerance on an 8 bit MCU, with a little difficulty- but it works. it looks like dmgoedde is taking advantage of the rapid development times of the propellor.

Yes, taking advantage for VERY quick development (it is shocking me in fact now that I got over initial learning curve). Also, doing integer math in 32 bits is great because of the headroom for +/-2 Billion integers, plus on Prop it automatically handles the sign bit during math operations, you just write the equations. The only time I use FP math is during the least-squares regression fitting of data, all other times I use signed integer math in 32 bits.

dmgoedde
Sep 11, 2007, 07:12 PM
The only downside I see is using the SPIN language instead of C with extensions. ...
and you're right, it would make Wulffy's life a lot easier than the multi master design.

I know from experience it would make Wulffy's life easier! I don't see SPIN language as a downside other than it is new, however Parallax essentially 'had' to make a new language tailored for the multi-core environment. The processor and language go hand-in-hand and work beautifully together.

What do you think Wulffy?

Wulffy
Sep 11, 2007, 11:26 PM
...What do you think Wulffy?
I think that the response herein has been the best I have seen yet, and serves to reinforce why the Internet, and specifically forums, are the neatest thing since sliced bread. But I digress...

Some of my ignorance is going to shine brightly here: I had understood that the RAM (beyond the intra-prop 2K ram [EDIT - I think I am mixing up devices mentally.?.]) was implemented via an i2c device. I figured that I could create some SPIN/ASM to arbitrate, and rely on the SPI hardware to maintain the data throughput.

One of the insightful replies, from SG, hit the nail on the head in describing my need. The platform that I am programming in is Coridium's ARMbasic - it exists on a family of devices that are ARM7 based and running at 60MHz. There is virtually zero learning curve for me when it comes to the development environment. Not having to progress thru a curve for the environment, I can focus on the functionality. I have done just a bit of ASM many (MANY) moons ago (on the 6800 family).

I am looking for iteration rates of ~125 Hz+. The reason for this is that, in addition to the autonomous functionality of the platform, I am wanting very high accuracy and very high data rates. Why? Because I do... :)

Seriously, this post (http://www.rcgroups.com/forums/showpost.php?p=8059471&postcount=121) from this thread (http://www.rcgroups.com/forums/showthread.php?t=579335) serves to demonstrate one of the reasons that I am looking for these rates and accuracy. My goals are pretty well defined in this thread (http://www.rcgroups.com/forums/showthread.php?t=644892). Scripting support is also a very attractive feature set that I will strive to implement when base-line functionality is established.

EDIT: One other reason for high iteration rates is described in this post (http://www.rcgroups.com/forums/showpost.php?p=7796082&postcount=101). Having an SPI bus with multiple masters and multiple slaves may present some benefits that I may not be able to realize otherwise. The AOIMU IS a concept that I will eventually pursue, even though I now know that it was not a unique epiphany as there has been a thorough study on the proposed system topology...

This thread's responses have been top-notch!

I invite, encourage, and plead for the dialogue to continue. As I continue to digest the content of what has already been posted, as well as future submissions, I will work at turning over rocks, expanding on the various sub-threads that will surely surface...

Thanks for all of the dialogue thus far.
Keep 'em coming!!!

-t

dmgoedde
Sep 12, 2007, 12:59 AM
Some of my ignorance is going to shine brightly here: I had understood that the RAM (beyond the intra-prop 2K ram [EDIT - I think I am mixing up devices mentally.?.]) was implemented via an i2c device. I figured that I could create some SPIN/ASM to arbitrate, and rely on the SPI hardware to maintain the data throughput.


If you're referring to Propeller main 32kB RAM, no it is 100% seamless, no i2c, just need one instance of one core passing RAM location of specified variables to the other core. Both cores can read/write to that location at will.


I am looking for iteration rates of ~125 Hz+. The reason for this is that, in addition to the autonomous functionality of the platform, I am wanting very high accuracy and very high data rates. Why? Because I do... :)


I can't let this go (I realize your attraction to stick with what you know and zero learning curve so you can focus on issue of flight and not programming for its own sake)... I have done Kalman-ish iterative loops at 145+ Hz on the prop, and that is a data firehose of 7 channels from a 12 bit A/D, 30 samples per channel per iteration (over 30,000 channel reads/second). One core accesses the Microchip MCP3208 A/D via SPI and updates RAM at 145Hz for all 7 channels I need, while another core is unencumbered to take the data and run some modelling at the same 145Hz rate.
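
The throughput figure follows directly from the numbers quoted above; the small sketch below just does the arithmetic and derives the time available per A/D read. The per-read time budget is a derived figure, not something measured.

```python
CHANNELS = 7               # A/D channels in use
SAMPLES_PER_CHANNEL = 30   # samples per channel per loop iteration
LOOP_HZ = 145              # loop iterations per second

reads_per_second = CHANNELS * SAMPLES_PER_CHANNEL * LOOP_HZ
microseconds_per_read = 1_000_000 / reads_per_second

print("channel reads per second:", reads_per_second)
print(f"time budget per read: {microseconds_per_read:.1f} microseconds")
```

That leaves a little under 33 microseconds for each SPI conversion, which is why dedicating a whole core to the A/D makes sense.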

Please don't take my comments as disrespectful of your wishes! Since I learned SPIN (I don't even know a shred of assembly for the prop) I have been able to 100% forget about tough programming issues and focus 100% on implementing things like linear quadratic regulator control loops, background data fitting during flight, etc...

Wulffy
Sep 12, 2007, 06:12 PM
...Please don't take my comments as disrespectful of your wishes!...
My friend, no offense taken. Conversely, I sincerely appreciate the comments. I will give them due consideration as my plan develops.

Thank you for your time.

-t

p.s. I've been bidding on some prop eval boards - maybe I can get one, if nothing else, to explore the device with...