durusmail: durus-users: Interesting BTree failure
Interesting BTree failure
David Binger
2011-08-15
On Aug 15, 2011, at 10:12 AM, David Hess wrote:

>
> On Aug 15, 2011, at 8:50 AM, David Binger wrote:
>
>> This sounds like the offset mapping in memory is somehow out of sync with the file.
>> We've never seen this condition, which is obviously bad.  I'm concerned
>> about this and trying to figure out how it could occur.
>
> Yes, if the packing implementation was an issue, I would have thought we would
> just have missing oids (or is that a durus_id now?) - but I also have evidence
> of an object whose class is BNode trying to load the state of a pickle that was
> clearly associated with an application object.

Yes, the oid is now called "durus_id".

I just meant that the BNode class doesn't have any special state management code
that could be causing the problem.  I think BNode is involved only because
that is where your actions happen to be.

>
> Based on what I did to repair the database, it appears this corruption only
> affected the items that were touched when the modification to the BTree was made
> during the pack (interior BNodes and the application objects). It's almost as if
> there is a race problem during packing having to do with either durus_id
> allocation or file writing.

I would be interested in the details of the repair.

I don't remember the details of how you are using FileStorage.
Do you have just one thread/process using the FileStorage?  I hope so.
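The race suspicion above can be probed outside Durus with a standalone consistency check: append records while keeping an in-memory offset index, then verify that the record stored at each indexed offset really carries the id the index claims. This is only an illustrative sketch; the record framing (length prefix plus 8-byte id) is invented here and is not the FileStorage format.

```python
import io
import struct

def append_record(f, index, durus_id, payload):
    """Append a length-prefixed record and remember its offset."""
    f.seek(0, io.SEEK_END)    # a stale end-of-file here is the suspected failure
    offset = f.tell()
    body = struct.pack('>Q', durus_id) + payload
    f.write(struct.pack('>I', len(body)) + body)
    f.flush()
    index[durus_id] = offset

def check_index(f, index):
    """Return ids whose indexed offset points at a record with a different id."""
    bad = []
    for durus_id, offset in index.items():
        f.seek(offset + 4)    # skip the length prefix
        (stored_id,) = struct.unpack('>Q', f.read(8))
        if stored_id != durus_id:
            bad.append(durus_id)
    return bad
```

If a pack interleaves writes with a stale notion of the file's end, the symptom would show up here as ids resolving to the wrong records, which matches the jumbled-object behavior described in this thread.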

>
>> It seems unlikely to have anything in particular to do with BTree or BNodes:
>> those structures are implemented in the same way as other persistent objects.
>> I think, from your other message, that you are looking closely at the packing
>> code, and that seems like a good idea.  There was a time when you talked about
>> overriding a '_p_' method.  Does the code you are running involve
>> any such low-level modifications?
>
> That's on one particular and isolated class that was not involved in this
> situation. All it does is ignore the ghosting request from the cache manager
> (which is safe because we never have conflicts).

That does seem safe.  Is there any code in the cache manager, though, that
calls the ghosting function and then takes some action based on the assumption
that the ghosting worked?
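The override pattern being discussed can be sketched without touching Durus at all. The classes below are toy stand-ins, not the real Durus `Persistent` base class, and the `_p_set_status_ghost` name is modeled on Durus but should be treated as an assumption here:

```python
class Persistent:
    """Toy stand-in for a persistent base class (not the real Durus one)."""
    def __init__(self):
        self._status = 'saved'
        self.state = {}

    def _p_set_status_ghost(self):
        # Normal behavior: drop loaded state so it must be reloaded on access.
        self._status = 'ghost'
        self.state = {}

class UnghostablePersistent(Persistent):
    """The pattern described above: silently ignore the ghosting request."""
    def _p_set_status_ghost(self):
        pass  # keep state in memory; safe only if conflicts never occur
```

David Binger's caution is the interesting part: if the cache manager calls the override and then acts as though memory was actually released, its bookkeeping and the real cache contents drift apart.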

>
>> In the current code tree, File.write() ends with self.file.flush() under all
>> conditions.  Is that the case in the code you are running?
>
> Here's the implementation we are running, from Durus 3.7:
>
>     def write(self, s):
>         self.obtain_lock()
>         self.file.write(s)

The new code adds self.file.flush().  Without it, certain file systems
lose track of the correct end of the file, so a subsequent seek to the end
of the file ends up in an incorrect place.
In a pack, an offset is written into the header of the file and then
there is a seek to the end of the file.  It is important for
this to work correctly.  It could possibly be the reason
for the trouble you have seen.

>
> I noticed that DurusWorks 1.0 has been released but the CHANGES.txt file is
> gone. Should we upgrade to DurusWorks 1.0 instead of Durus 3.8? It looks like
> Durus is no longer distributed separately?

Yes, DurusWorks is a new distribution that includes Durus.
I have not yet sent out an announcement.  Sorry.
I'll start a new CHANGES.txt file in the next release.
I'd recommend downloading it and diffing the durus package
against the one you have.  This code is not changed
much, so I hope that is not too much trouble.
I would upgrade.

>
> Thanks.
>
> Dave
>
>> On Aug 14, 2011, at 12:44 PM, David Hess wrote:
>>
>>> Afterwards, the oids seem to be jumbled up in this BTree (at least - maybe
>>> elsewhere in the database too). Unpickled Persistent objects are not what they
>>> should be - interior BNodes are sometimes application classes and stored values
>>> are sometimes BNodes rather than application classes.
>
>
