Shouldn't be too tough, and I wouldn't be surprised if someone can come up with something better than this, but here is one idea.
Description:
Connect S1 to Vcc, but through a capacitor (C1) and a resistor (R1) in series. R1 would be small enough that S1 is high when the cap is uncharged (on power-up). Include a resistor to ground (R2) after that, so that once the cap fills up, S1 goes low and stays there. A third resistor (R3) bypasses the cap, so that the cap can drain when the power is off.
See attachment. (C1 is labeled, but ignore that. I forgot to delete the label when I made the pic.)
As far as the sizes of R1, R2 and R3 go, I'm not sure where to start, but here are the constraints.
R3 should be large compared to R2. (10x larger should be fine, maybe 5x)
R2 should be large compared to R1. (5x to 10x again)
The R1, C1 time constant should be way more than the length of one clock cycle.
The R3, C1 time constant will be approximately the amount of time you have to leave the device off to reset the data bits to HLLL.
For instance, if R3 is 100 times the value of R1, then the reset time will be about 100 times the R1, C1 time constant, which itself has to be at least one clock cycle. If your clock cycle is .001 sec, then you will have to stay powered down for at least .1 sec to reset. There should be a very wide range of values that work.
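To put numbers on those rules of thumb, here is a rough Python sketch. The function name and the example values (1k, 10k, 100k, 10 uF, 1 ms clock) are made up for illustration, not taken from the attachment; it just evaluates the ratios and the two time constants described above.

```python
def size_check(r1, r2, r3, c1, clock_period):
    """Evaluate the rule-of-thumb sizing constraints from the post.

    Resistances in ohms, capacitance in farads, clock period in seconds.
    All example values below are hypothetical.
    """
    return {
        "r2_over_r1": r2 / r1,                          # want roughly 5 to 10
        "r3_over_r2": r3 / r2,                          # want roughly 5 to 10
        "charge_tau_s": r1 * c1,                        # R1, C1 time constant
        "charge_tau_in_clocks": r1 * c1 / clock_period, # want well above 1
        "reset_tau_s": r3 * c1,                         # approx. power-off time
    }


if __name__ == "__main__":
    result = size_check(r1=1e3, r2=10e3, r3=100e3, c1=10e-6, clock_period=1e-3)
    for name, value in result.items():
        print(f"{name}: {value:g}")
```

With these example values both ratios come out at 10, the R1, C1 time constant is about ten clock cycles, and the R3, C1 reset time is about one second.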
I suppose I'm ignoring any current draw into S1 itself, but if it's quite small compared to the current through R1 on power-up, then it can also be ignored. If it's not small enough, then it might further constrain the value of R1. (R1 would have to be small enough that the current through R1 greatly exceeds the current draw into S1.)
(At the risk of being redundant, here's the full strategy.)
On power-up, C1 can be assumed empty (due to R3), and S1 sees close to Vcc. (R2 and R3 are large compared to R1, so they are ignored for that.)
Then after one or more clock cycles, C1 fills up enough that R2 controls what is going on. (R3 is large compared to R2, so it can be ignored for that.)
So S1 goes low and stays there as long as the power is on.
When you power down, no more current flows through R1 or R2, so R3 is the only thing that matters, and the cap discharges so it's ready for next time.
Basically I like to ignore things when possible, don't I?
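For anyone who wants to check the power-up behavior without the approximations, here is a small Python sketch of the S1 voltage over time. It assumes the topology as I read the description (Vcc to C1 to R1 to S1, R2 from S1 to ground, R3 directly across C1), an ideal step on Vcc, and zero current into S1; the function name and values are hypothetical. Under those assumptions the cap charges through R1 and R2 in series, so the high pulse lasts on the order of (R1 + R2) x C1, and S1 settles to a small divider voltage rather than exactly zero.

```python
import math


def s1_voltage(t, vcc, r1, r2, r3, c1):
    """Voltage at S1, t seconds after Vcc steps from 0 to vcc.

    Assumed circuit: Vcc -> (C1 in parallel with R3) -> R1 -> S1 -> R2 -> GND,
    with C1 fully discharged at t = 0 and no current drawn by S1.
    """
    series = r1 + r2                          # path from the cap to ground
    tau = c1 * (r3 * series) / (r3 + series)  # C1 sees R3 parallel with R1+R2
    vc_final = vcc * r3 / (r3 + series)       # steady-state voltage across C1
    vc = vc_final * (1.0 - math.exp(-t / tau))
    return (vcc - vc) * r2 / series           # R1/R2 divider below the cap


if __name__ == "__main__":
    # Hypothetical values: 5 V supply, 1k / 10k / 100k, 10 uF.
    for t in (0.0, 0.01, 0.1, 0.5, 2.0):
        v = s1_voltage(t, 5.0, 1e3, 10e3, 100e3, 10e-6)
        print(f"t = {t:5.2f} s  S1 = {v:.3f} V")
```

With those example values S1 starts at about 4.5 V and settles toward about 0.45 V, which is why R3 needs to be large compared to R2: the bigger that ratio, the closer the resting level gets to ground.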
If something doesn't make sense, let me know, but if you get the general idea then just play around with the values until you find what works. I think you'll have A LOT of leeway in value selection with this design.
--tim
(hope the attachment worked)