Maker Pro

EEPROM checksum error

Dummy

What are the possible causes of an EEPROM checksum error?
Could a magnetic field corrupt the EEPROM data? Are there any design
guidelines to prevent this potential failure?
 
Luhan Monat

Dummy said:
What are the possible causes of an EEPROM checksum error?
Could a magnetic field corrupt the EEPROM data? Are there any design
guidelines to prevent this potential failure?

EEPROMs, as far as I know, do not have this feature. Something else
(the programming device) must be generating one and storing it in the
EEPROM, somewhere.

You will need to provide more specific information.
 
Ken Taylor

Dummy said:
What are the possible causes of an EEPROM checksum error?
Could a magnetic field corrupt the EEPROM data? Are there any design
guidelines to prevent this potential failure?

When did this occur? After a long time in circuit? After programming?

Ken
 
Dummy

Ken Taylor said:
When did this occur? After a long time in circuit? After programming?

Ken

The failures occurred after 6 months to 2 years in service. The EEPROM
is used in mobile radios, all installed in cars or trucks. We suspected
that noise on the supply line caused the EEPROM checksum error, but
experiments showed that noise injected on the supply line is filtered
out by the regulator circuit. The supply line to the EEPROM was
confirmed to be clean regardless of the amount of noise present on the
main supply. What could have caused the checksum error?
 
Robert Monsen

Dummy said:
The failures occurred after 6 months to 2 years in service. The EEPROM
is used in mobile radios, all installed in cars or trucks. We suspected
that noise on the supply line caused the EEPROM checksum error, but
experiments showed that noise injected on the supply line is filtered
out by the regulator circuit. The supply line to the EEPROM was
confirmed to be clean regardless of the amount of noise present on the
main supply. What could have caused the checksum error?

Make sure you follow all the recommendations of the manufacturer of the
eeprom.

Is the eeprom programmed in the field, or is it just programmed once at
the factory and then used from then on? Is it possible there is a
software error causing this? EEPROMs usually use keyed programming
sequences to prevent inadvertent corruption.

Make sure you lock out interrupts while programming the thing.
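Robert's two points, keyed programming sequences and masking interrupts during a write, can be sketched together in C. This is a simulation, not driver code: the unlock cycles (0xAA at 0x5555, 0x55 at 0x2AAA, 0xA0 at 0x5555) are modelled on the software data protection sequence of 28C-series parallel EEPROMs, and `ee_bus_write`, `irq_save_and_disable` and `irq_restore` are hypothetical stand-ins. The thread never says which part the radios use, so check its datasheet for the real sequence; serial parts such as the 24Cxx family protect writes differently.

```c
#include <stdbool.h>
#include <stdint.h>

#define EE_SIZE 0x8000u   /* hypothetical 32 KB parallel part */

/* ---- Simulated EEPROM: stands in for real bus cycles ---- */
static uint8_t ee_mem[EE_SIZE];
static int ee_key_state;   /* number of correct unlock cycles seen so far */

static void ee_bus_write(uint16_t addr, uint8_t val)
{
    static const uint16_t key_addr[3] = {0x5555, 0x2AAA, 0x5555};
    static const uint8_t key_val[3] = {0xAA, 0x55, 0xA0};

    if (ee_key_state < 3) {
        /* Still locked: a correct key cycle advances the state;
           anything else (e.g. a stray write) is ignored and resets it. */
        if (addr == key_addr[ee_key_state] && val == key_val[ee_key_state])
            ee_key_state++;
        else
            ee_key_state = 0;
        return;
    }
    /* Unlocked: this simplified model accepts one data byte, then relocks. */
    ee_mem[addr % EE_SIZE] = val;
    ee_key_state = 0;
}

/* ---- Hypothetical interrupt-mask helpers (replace with the target's) ---- */
static bool irq_enabled = true;

static bool irq_save_and_disable(void)
{
    bool was_enabled = irq_enabled;
    irq_enabled = false;
    return was_enabled;
}

static void irq_restore(bool was_enabled)
{
    irq_enabled = was_enabled;
}

/* Write one byte: key sequence plus data with interrupts masked, so an
   ISR cannot slip a bus access into the middle of the sequence. */
static void ee_write_byte(uint16_t addr, uint8_t val)
{
    bool was_enabled = irq_save_and_disable();
    ee_bus_write(0x5555, 0xAA);
    ee_bus_write(0x2AAA, 0x55);
    ee_bus_write(0x5555, 0xA0);
    ee_bus_write(addr, val);
    irq_restore(was_enabled);
}
```

A lone `ee_bus_write(0x0100, 0x42)` leaves the location unchanged, while `ee_write_byte(0x0100, 0x42)` stores the byte. That is the point of the key: a runaway program has to hit the whole sequence, not just one write, to corrupt the part.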

--
Regards,
Robert Monsen

"Your Highness, I have no need of this hypothesis."
- Pierre Laplace (1749-1827), to Napoleon,
on why his works on celestial mechanics make no mention of God.
 
Ken Taylor

Robert Monsen said:
Make sure you follow all the recommendations of the manufacturer of the
eeprom.

Is the eeprom programmed in the field, or is it just programmed once at
the factory and then used from then on? Is it possible there is a software
error causing this? EEPROMs usually use keyed programming sequences to
prevent inadvertent corruption.

Make sure you lock out interrupts while programming the thing.

All that good stuff...

Also, are these two-way radios? Do the EEPROMs get altered during normal
use? If so, is it possible RF is causing problems?

Ken
 
Pooh Bear

Dummy said:
What are the possible causes of an EEPROM checksum error?
Could a magnetic field corrupt the EEPROM data? Are there any design
guidelines to prevent this potential failure?

*What* checksum? How do you calculate 'your' checksum?

What type of EEPROM? The 24Cxx family, for example?

Far too little info supplied to meaningfully respond.


Graham
 
Pooh Bear

Robert said:
Make sure you follow all the recommendations of the manufacturer of the
eeprom.

Is the eeprom programmed in the field, or is it just programmed once at
the factory and then used from then on? Is it possible there is a
software error causing this? EEPROMs usually use keyed programming
sequences to prevent inadvertent corruption.

Make sure you lock out interrupts while programming the thing.

The serial interface is timing-tolerant, IME. I've never seen false data as a
result of background interrupts.


Graham
 
Harold Ryan

A checksum is just the sum of all the data bytes. At the end of the file,
another byte or word is added that makes the total of all the bytes come
to zero. If any byte is corrupted, the total will no longer be zero. A
loose wire or a strong magnetic field may cause this problem.
Harold
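Harold's scheme can be sketched in C. The thread doesn't say whether the radios use a byte or a word checksum, so this assumes an 8-bit sum with a single trailing byte; the function names are hypothetical. Any single corrupted byte is always caught, but addition ignores order, so two swapped bytes (or errors that happen to cancel) go unnoticed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of all bytes, modulo 256. */
static uint8_t sum8(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + buf[i]);
    return sum;
}

/* Byte to append so the whole image sums to zero: the two's complement
   of the data sum. */
static uint8_t checksum_byte(const uint8_t *data, size_t len)
{
    return (uint8_t)(0u - sum8(data, len));
}

/* Power-up check: data plus trailing checksum byte must sum to zero. */
static bool image_ok(const uint8_t *image, size_t len_with_checksum)
{
    return sum8(image, len_with_checksum) == 0;
}
```

For the data bytes 0x10, 0x20, 0x30, 0x40 the sum is 0xA0, so the appended byte is 0x60 and the five-byte image totals 0x100, i.e. zero modulo 256.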
 
Spehro Pefhany

Dummy said:
The failures occurred after 6 months to 2 years in service. The EEPROM
is used in mobile radios, all installed in cars or trucks. We suspected
that noise on the supply line caused the EEPROM checksum error, but
experiments showed that noise injected on the supply line is filtered
out by the regulator circuit. The supply line to the EEPROM was
confirmed to be clean regardless of the amount of noise present on the
main supply. What could have caused the checksum error?

What kind of EEPROM? Data corruption in EEPROMs is not uncommon,
caused either directly by electrical noise or by faulty design of the
controlling microprocessor system, with respect to EMI or power-supply
supervision. Redesigning to decrease EMI susceptibility and power-supply
issues, and then (and ONLY then) adding redundancy to the non-volatile
storage, can reduce the problem to insignificance even for large
quantities of units in challenging applications.
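One way to add the redundancy Spehro mentions, sketched in C under assumptions not taken from the thread: an 8-byte payload, two copies stored at different EEPROM locations, each with its own two's-complement checksum, and the EEPROM modelled as plain RAM. At power-up both copies are checked and a bad copy is rewritten from the good one. `record_t`, `record_seal` and `load_redundant` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_LEN 8   /* hypothetical payload size */

/* One stored copy: payload plus an 8-bit two's-complement checksum. */
typedef struct {
    uint8_t data[REC_LEN];
    uint8_t check;
} record_t;

static uint8_t sum8(const uint8_t *p, size_t n)
{
    uint8_t s = 0;
    while (n--)
        s = (uint8_t)(s + *p++);
    return s;
}

/* Set the checksum so payload + check sums to zero (mod 256). */
static void record_seal(record_t *r)
{
    r->check = (uint8_t)(0u - sum8(r->data, REC_LEN));
}

static bool record_valid(const record_t *r)
{
    return (uint8_t)(sum8(r->data, REC_LEN) + r->check) == 0;
}

/* Boot-time check of copies A and B. Returns false only if both are bad.
   Otherwise repairs a bad copy from the good one and returns the data in
   *out. If both are valid, A is trusted. */
static bool load_redundant(record_t *a, record_t *b, record_t *out)
{
    bool a_ok = record_valid(a);
    bool b_ok = record_valid(b);

    if (a_ok && !b_ok)
        *b = *a;             /* in a real design: rewrite copy B in EEPROM */
    else if (!a_ok && b_ok)
        *a = *b;             /* ...or copy A */
    else if (!a_ok && !b_ok)
        return false;        /* both corrupt: fall back to safe defaults */

    *out = *a;
    return true;
}
```

For factory-programmed data the two copies are identical, so "trust A" is enough. A design that updates the copies in the field would also need to decide which of two valid copies is newer; a sequence counter in each record is a common way to do that.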


Best regards,
Spehro Pefhany
 
Pooh Bear

Harold said:
A checksum is just the sum of all the data bytes. At the end of the file,
another byte or word is added that makes the total of all the bytes come
to zero. If any byte is corrupted, the total will no longer be zero. A
loose wire or a strong magnetic field may cause this problem.
Harold

I'm broadly familiar with this, thanks. I'm less familiar with why EPROM
programmers of old seemed to produce different checksums depending on the
manufacturer.

The OP still hasn't explained *what checksum* he's talking about, or under
what conditions.

Can he even validate the file?


Graham
 
Lord Garth

Pooh Bear said:
I'm broadly familiar with this, thanks. I'm less familiar with why EPROM
programmers of old seemed to produce different checksums depending on the
manufacturer.

The OP still hasn't explained *what checksum* he's talking about, or under
what conditions.

Can he even validate the file?


Graham

Maybe a CRC was used rather than a checksum. How old is the code in the
EPROM?
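To illustrate Lord Garth's point, here is a bitwise CRC in C next to a plain 8-bit sum. CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF, no reflection, no final XOR) is an assumption chosen for illustration; nothing in the thread says which CRC, if any, the radios use. For this variant the standard test input "123456789" has the check value 0x29B1.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE, computed bit by bit, MSB first. */
static uint16_t crc16_ccitt_false(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);   /* bring next byte into the top */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Plain 8-bit additive checksum, for comparison. */
static uint8_t sum8(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + buf[i]);
    return sum;
}
```

Unlike the sum, the CRC depends on byte order: {0x12, 0x34} and {0x34, 0x12} have the same 8-bit sum but different CRCs.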
 
Dummy

Pooh Bear said:
I'm broadly familiar with this, thanks. I'm less familiar with why EPROM
programmers of old seemed to produce different checksums depending on the
manufacturer.

The OP still hasn't explained *what checksum* he's talking about, or under
what conditions.

Can he even validate the file?


Graham


The EEPROM is programmed in the factory before the radio is shipped to
the customer.
Every time the radio is turned on, the checksum is verified. A checksum
error occurs when any bytes in the EEPROM are corrupted. If data is
corrupted while the radio is on, the checksum error won't be detected
until the radio is next turned off and on again.

The corrupted bytes are at random EEPROM addresses.
Some of the parts could be recovered by reprogramming, while others
could not. For the permanently damaged parts, failure analysis showed
that cells had been overwritten. In tests, injecting noise onto the
EEPROM data or supply lines during a write operation could cause a
checksum error. But all the voltages supplied to the EEPROM are clean
in normal use. The filter and regulator take care of the noise. So
it's not right to point to noise as the culprit.

Most of the radios failed after being in the field for 6 months to 2
years.
 
Robert Monsen

Dummy said:
The EEPROM is programmed in the factory before the radio is shipped to
the customer.
Every time the radio is turned on, the checksum is verified. A checksum
error occurs when any bytes in the EEPROM are corrupted. If data is
corrupted while the radio is on, the checksum error won't be detected
until the radio is next turned off and on again.

The corrupted bytes are at random EEPROM addresses.
Some of the parts could be recovered by reprogramming, while others
could not. For the permanently damaged parts, failure analysis showed
that cells had been overwritten. In tests, injecting noise onto the
EEPROM data or supply lines during a write operation could cause a
checksum error.

You say they are preprogrammed, but this implies that you are writing
them during normal operation. Which is it?
But all the voltages supplied to the EEPROM are clean in
normal use.

^ Famous last words. :)

The filter and regulator take care of the noise. So
it's not right to point to noise as the culprit.

Most of the radios failed after being in the field for 6 months to 2
years.

If the eeproms aren't being reprogrammed in the field during normal use,
then a software error is unlikely, unless the magic write sequence is
stumbled upon during a freak crash. If they *are* being reprogrammed
(i.e., you are saving some value when the user retunes the radio) then
I'll again say software. I'm telling you, lock out those interrupts!

The other possibility is a bad batch of eeproms. This is fairly
unlikely, but not without precedent*. Attempt to correlate the bad ones
with some lot. Talk to the manufacturer, and ensure that they don't have
a 'known' problem. Also, I wouldn't reuse the corrupted ones just
because you managed to program them. I'd swap them out as soon as practical.

* A company I used to work for decided to save 10 cents a ram chip and
forgo individual testing of the chips by the manufacturer. Sadly, it
turned out that those chips were bad 5 to 10 percent of the time. They
were selling high availability purple ethernet switches for hundreds of
thousands of dollars each. The engineer responsible was of course
promoted to VP, and given vast new responsibilities.

--
Regards,
Robert Monsen
 
Spehro Pefhany

Dummy said:
The EEPROM is programmed in the factory before the radio is shipped to
the customer.
Every time the radio is turned on, the checksum is verified. A checksum
error occurs when any bytes in the EEPROM are corrupted. If data is
corrupted while the radio is on, the checksum error won't be detected
until the radio is next turned off and on again.

The corrupted bytes are at random EEPROM addresses.

Okay. As I suspected.
Some of the parts could be recovered by reprogramming, while others
could not. For the permanently damaged parts, failure analysis showed
that cells had been overwritten. In tests, injecting noise onto the
EEPROM data or supply lines during a write operation could cause a
checksum error. But all the voltages supplied to the EEPROM are clean
in normal use.

Well, what about "abnormal" use, say something that might happen only
rarely? Are you claiming that the supply voltage on these parts was
maintained at 5.0V +/- 5% constantly, never straying lower or higher,
from factory to failure? And noise injected from the supply or other
pins could cause the micro's PC to point to random bits of code.
The filter and regulator take care of the noise. So
it's not right to point to noise as the culprit.

I sure don't think you can conclude that.
Most of the radios failed after being in the field for 6 months to 2
years.

My original comments definitely apply to this situation. Can you post
a link to the schematic of the power supply, micro and EEPROM?


Best regards,
Spehro Pefhany
 
Charles Edmondson

Dummy said:
The EEPROM is programmed in the factory before the radio is shipped to
the customer.
Every time the radio is turned on, the checksum is verified. A checksum
error occurs when any bytes in the EEPROM are corrupted. If data is
corrupted while the radio is on, the checksum error won't be detected
until the radio is next turned off and on again.

The corrupted bytes are at random EEPROM addresses.
Some of the parts could be recovered by reprogramming, while others
could not. For the permanently damaged parts, failure analysis showed
that cells had been overwritten. In tests, injecting noise onto the
EEPROM data or supply lines during a write operation could cause a
checksum error. But all the voltages supplied to the EEPROM are clean
in normal use. The filter and regulator take care of the noise. So
it's not right to point to noise as the culprit.

Most of the radios failed after being in the field for 6 months to 2
years.

Take a GOOD look at power-up and power-down sequences. A few years ago,
a vendor of mine was having problems with a similar situation, where an
EEPROM kept getting programmed with random bits here and there. It
seemed that on startup (this was on a parallel port) there were voltage
glitches that JUST HAPPENED to mimic the programming sequence on the
device, which was not supposed to be field programmable! Since this was
a security dongle, and the bits were sometimes the security ID codes,
this was considered a very bad thing!

So, take a look at what occurs during startups and shutdowns, and see
if there are any glitches then that can cause you problems!
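Charles's advice about startup can be illustrated with a software model of what a power-on reset (POR) supervisor does: keep the EEPROM interface disabled until the supply has stayed above a threshold for a run of consecutive samples, and restart the count on any dip. The 4.75 V threshold (Spehro's 5.0 V minus 5%) and the 20-sample hold-off are assumptions, and `por_update` is a hypothetical name. In a real design this job usually belongs to a reset-supervisor IC or the micro's own brown-out detector, because code cannot be trusted to run correctly while Vcc is still ramping.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical figures: enable above 4.75 V (5.0 V minus 5%) and require
   20 consecutive good samples before trusting the rail. */
#define VCC_GOOD_MV  4750u
#define GOOD_SAMPLES 20u

typedef struct {
    uint32_t good_count;  /* consecutive samples at or above VCC_GOOD_MV */
    bool power_good;      /* true once good_count reaches GOOD_SAMPLES */
} por_t;

/* Feed one supply sample in millivolts. Returns true once the rail has
   stayed above the threshold long enough; any dip restarts the count. */
static bool por_update(por_t *p, uint32_t vcc_mv)
{
    if (vcc_mv >= VCC_GOOD_MV) {
        if (p->good_count < GOOD_SAMPLES)
            p->good_count++;   /* saturate instead of wrapping */
    } else {
        p->good_count = 0;
    }
    p->power_good = (p->good_count >= GOOD_SAMPLES);
    return p->power_good;
}
```

Only after `por_update` returns true would the firmware start driving the EEPROM lines; until then they stay in a safe, idle state.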
 
Robert Monsen

Charles said:
Take a GOOD look at power-up and power-down sequences. A few years ago,
a vendor of mine was having problems with a similar situation, where an
EEPROM kept getting programmed with random bits here and there. It
seemed that on startup (this was on a parallel port) there were voltage
glitches that JUST HAPPENED to mimic the programming sequence on the
device, which was not supposed to be field programmable! Since this was
a security dongle, and the bits were sometimes the security ID codes,
this was considered a very bad thing!

Were they actually able to observe this, or was it assumed? Dealing with
hardware/software interfaces, it is quite common for programmers to
blame software bugs on hardware 'glitches'. I've seen this again and
again. It is usually a bug that just seems to come and go, possibly due
to some unrelated change in the software that changes the timing or
place in memory where a random pointer is hitting. I have made a living
out of consulting on these kinds of issues.

So, take a look at what occurs during startups and shutdowns, and see
if there are any glitches then that can cause you problems!

Yet another goblin to beware of. Thanks.

--
Regards,
Robert Monsen
 
Charles Edmondson

Robert said:
Were they actually able to observe this, or was it assumed? Dealing with
hardware/software interfaces, it is quite common for programmers to
blame software bugs on hardware 'glitches'. I've seen this again and
again. It is usually a bug that just seems to come and go, possibly due
to some unrelated change in the software that changes the timing or
place in memory where a random pointer is hitting. I have made a living
out of consulting on these kinds of issues.


Yet another goblin to beware of. Thanks.

From their rep, it had definitely been observed. The device had worked
for years, then they came out with a new package. The new package also
came at the same time as they moved to a new fab, which had different
processes. The new processes made the programming sequence MUCH MORE
sensitive, so that random glitches now caused random bits to be
programmed in their devices. They needed to replace a whole lot of
devices in the field, and got a lot of bad will because of the random
failures. We are still replacing these as they go bad...

You see, we only use one small field in the whole EEPROM. The problem
doesn't happen every time, and may be worse on some systems and less so
on others. Also, some people just don't use the things often enough to
break them!
 
Dummy

Robert Monsen said:
Were they actually able to observe this, or was it assumed? Dealing with
hardware/software interfaces, it is quite common for programmers to
blame software bugs on hardware 'glitches'. I've seen this again and
again. It is usually a bug that just seems to come and go, possibly due
to some unrelated change in the software that changes the timing or
place in memory where a random pointer is hitting. I have made a living
out of consulting on these kinds of issues.


Yet another goblin to beware of. Thanks.

Injecting random noise on the supply line doesn't cause any glitches on
the EEPROM lines, because the noise rides on the supply. However, when
glitches are introduced on the main supply line by creating a temporary
voltage dip at certain times, the glitches can pass through to the
EEPROM lines.

Previously, we were able to reproduce the EEPROM checksum error by
injecting noise into the EEPROM directly, bypassing the regulator. So I
reckon that if glitches get through the regulator, a checksum error
will most probably occur. We are checking on that. We had considered
noise, but overlooked the glitches.

If that's the root cause, is there any method to prevent the glitches?
I guess the regulator can only filter out noise that rides on Vcc. A
sudden dip in voltage can't be recovered.
 
Charles Edmondson

Dummy said:
Injecting random noise on the supply line doesn't cause any glitches on
the EEPROM lines, because the noise rides on the supply. However, when
glitches are introduced on the main supply line by creating a temporary
voltage dip at certain times, the glitches can pass through to the
EEPROM lines.

Previously, we were able to reproduce the EEPROM checksum error by
injecting noise into the EEPROM directly, bypassing the regulator. So I
reckon that if glitches get through the regulator, a checksum error
will most probably occur. We are checking on that. We had considered
noise, but overlooked the glitches.

If that's the root cause, is there any method to prevent the glitches?
I guess the regulator can only filter out noise that rides on Vcc. A
sudden dip in voltage can't be recovered.

For you, this might be a design issue. As power ramps up or down,
different components react differently. Some have internal caps that
make them hold state a little longer than others, or are just more
sensitive to power supply levels. Think about the programming sequence.
What could produce it in your circuit? What could PREVENT it in your
circuit?

I have often found that startup conditions are not fully considered in
design. You just assume that the power comes up all at once, smoothly.
In reality, different voltage rails come up differently. Filter caps
take time to charge up to voltage. Good design takes that into account,
sometimes adding POR (power-on reset) circuits to make sure that power
is steady before starting things up, and quick shutdown sequences to
turn everything off before the power falls below limits. It's like
preventing race conditions and logic glitches. Sometimes, you just have
to take a good look at the failure modes...
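For the ramp-down side of Charles's answer, and Dummy's question about sudden dips, one software-side measure is to gate every EEPROM write on a supply monitor with hysteresis, so writes stop as soon as the rail starts to sag and don't resume until it has clearly recovered. This is a sketch: the thresholds, `supply_monitor`, `guarded_write` and the `ee_raw_write` stub are all hypothetical, and it complements a hardware supervisor rather than replacing one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thresholds in millivolts for a 5 V rail. The 200 mV gap is
   hysteresis, so a supply hovering near one level does not chatter. */
#define VCC_ENABLE_MV  4700u
#define VCC_DISABLE_MV 4500u

static bool writes_enabled = false;

/* Hypothetical stand-in for the real low-level EEPROM write; it only
   records what it was asked to do. */
static unsigned write_count;
static uint16_t last_addr;
static uint8_t last_val;

static void ee_raw_write(uint16_t addr, uint8_t val)
{
    last_addr = addr;
    last_val = val;
    write_count++;
}

/* Call on every supply sample (e.g. from an ADC reading or a
   comparator interrupt). */
static void supply_monitor(uint32_t vcc_mv)
{
    if (writes_enabled && vcc_mv < VCC_DISABLE_MV)
        writes_enabled = false;   /* rail sagging: stop starting new writes */
    else if (!writes_enabled && vcc_mv >= VCC_ENABLE_MV)
        writes_enabled = true;    /* rail clearly recovered */
}

/* Route every EEPROM write through here. Refusing a write is safer than
   starting one while the supply is collapsing. */
static bool guarded_write(uint16_t addr, uint8_t val)
{
    if (!writes_enabled)
        return false;
    ee_raw_write(addr, val);
    return true;
}
```

The lower threshold and the hold-up capacitance together need to leave enough time for a write that has already started to finish before the EEPROM drops out of spec; the write-cycle time is given in the part's datasheet.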
 