durusmail: quixote-users: Re: timestamp in upload.py prone to inaccuracy (patch)
timestamp in upload.py prone to inaccuracy (patch)
2003-03-16
timestamp in upload.py prone to inaccuracy (patch)
2003-03-17
Re: timestamp in upload.py prone to inaccuracy (patch)
2003-03-17
Re: timestamp in upload.py prone to inaccuracy (patch)
2003-03-17
Re: timestamp in upload.py prone to inaccuracy (patch)
2003-03-17
Re: timestamp in upload.py prone to inaccuracy (patch)
2003-03-17
2003-03-17
Re: timestamp in upload.py prone to inaccuracy (patch)
2003-03-17
2003-03-17
Re: timestamp in upload.py prone to inaccuracy (patch)
Graham Fawcett
2003-03-17
Greg Ward wrote:
> On 15 March 2003, Graham Fawcett said:
>
>>I'm writing an application that takes multiple file uploads on a page. I
>>discovered that the naming of uploaded files is based on a timestamp,
>>and can only ensure unique filenames if no more than one file is
>>uploaded per second.
>
> Not true on Unix at least -- the files would have to come in at more
> than one per *millisecond*, since the timestamp uses the floating-point
> part of time.time() as well.  But maybe time.time() doesn't act like
> that on all systems.  Can you fire up an interpreter and try this:
>

You're right, of course -- that was a typo on my part, the "milli" key
on my keyboard must have gotten stuck. ;-) But of course it's possible
for more than one file upload to occur within that period of time (small
files, fast hardware).

> Please don't use absolute diffs -- I prefer unified ("diff -u") myself.
> Easier to read, and much more likely to work if the baseline code has
> changed since you prepared your patch.

I agree... I guess my "-u" key was stuck as well...

> Anyways, I think a better approach would be to tack on a random number,
> and keep trying until the chosen filename does not exist.  Here's a
> first crack; this is completely untested, not threadsafe (because the
> standalone functions in random.py are not threadsafe), and it's subject
> to a fairly obvious race condition:

As long as the randint range is sufficiently large, there shouldn't be
too many collisions. I'd go for randint(0, 10**9) unless there's a good
reason not to.

How about letting the OS do the collision detection -- skip the "if
doesn't exist, then create it" idiom, just try to create it and let the
OS pick the winner? That removes the thread safety issue, I think, since
we're leaving the locking to the OS.

Code below is untested...

--- \projects\Quixote-0.6b5\upload.py   Sat Mar 15 22:23:56 2003
+++ \temp\upload.py     Mon Mar 17 10:49:18 2003
@@ -12,4 +12,5 @@

  import os, string
+import random
  from cgi import parse_header
  from rfc822 import Message
@@ -152,9 +153,19 @@
      def receive (self, file, boundary, dir):
          now = time()
-        tstamp = (strftime("%Y%m%d.%H%M%S", localtime(now)) +
-                  ("%.3f" % (now % 1))[1:])
-        filename = "upload.%s.%s" % (tstamp, os.getpid())
-        filename = os.path.join(dir, filename)
-        ofile = open(filename, "wb")
+        ofile = None
+        last_exc = None
+        for i in range(1000):
+            filename = '%d.%d.%d' % (now, os.getpid(),
+                                     random.randint(0, 10**9))
+            filename = os.path.join(dir, filename)
+            try:
+                ofile = open(filename, "wb")
+                break
+            except IOError, ioe:
+                # I would like to see the last exception if this
+                # really did fail...
+                last_exc = ioe
+        else:
+            raise last_exc
          done = read_mime_part(file, boundary, ofile=ofile)
          ofile.close()



-- Graham



reply