The time required to sort a file with the Postman's sort can be significantly reduced by careful specification of the sorting keys. This distinguishes the Postman's sort from other programs. To maximize the speed of the sort it is useful to have a basic idea of how it works. The following illustrates the operation of the sort by analogy with the way the post office sorts letters.
Example: Suppose we are to sort a year's worth of accounting transactions by date, where the date field is in MMDD format. We could simply use:
Note that the amount of time required to sort the file will be strictly proportional to the file size.
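To make the post office analogy concrete, here is a minimal sketch of a distribution sort on a four-character MMDD key. It is an illustration only and not PSORT's actual code; the function name postman_sort, its parameters, and the sample records are all hypothetical. It assumes every record is at least key_len characters long and that the whole file fits in memory.

```python
def postman_sort(records, key_len=4, pos=0):
    """Sort strings on their first key_len characters, one character at a time.

    Illustrative sketch only; names and layout are assumptions, not PSORT internals.
    """
    # A list of zero or one record, or a fully consumed key, needs no more work.
    if len(records) <= 1 or pos == key_len:
        return list(records)

    # Distribute the records into one "pigeonhole" per character value,
    # the way a clerk drops letters into bins by the first part of the address.
    bins = {}
    for rec in records:
        bins.setdefault(rec[pos], []).append(rec)

    # Visit the pigeonholes in character order and sort each one on the
    # next key position. (A real implementation would index a fixed array
    # by byte value instead of sorting the handful of bin labels.)
    result = []
    for ch in sorted(bins):
        result.extend(postman_sort(bins[ch], key_len, pos + 1))
    return result


transactions = ["1231 rent", "0115 payroll", "0704 supplies", "0102 refund"]
ordered = postman_sort(transactions)
```

Each record is handled at most once per key position, so for a fixed key length the work grows in step with the number of records rather than with the number of comparisons between them.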
Suppose that all the sorting keys have the same data in the first bytes. If the data does not fit in memory, the whole file could be distributed to one list and paged out to disk. This would cost a lot of time with no contribution made to the sorting task. If records have many equal bytes at the beginning of the keys, the file could be copied many times before the sorting process actually starts to take place. PSORT deals with this by recognizing when a list starts to contain "too many" records and subdividing the list before writing it out to disk. The number of times a list can be subdivided is determined by the segment size set by the -l switch. Large files with large numbers of long keys with equal bytes will be sorted much faster if the -l switch is used to increase the segment size. This can be increased to 63K for 16-bit versions and up to about 20 percent of available memory for 32-bit versions. Using the -k switch to specify the type of data expected in the fields will permit PSORT to allocate memory more efficiently and increase the probability that the sort will complete in record time.
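The cost of equal leading bytes can be estimated with a small sketch. The hypothetical helper below counts the leading key positions at which every key holds the same value. In a naive distribution sort with no subdivision, each such position is a pass that sends the whole file to a single list without ordering anything. This is an illustration of the problem only; it does not model how PSORT subdivides lists.

```python
def wasted_passes(keys):
    """Count the leading positions at which every key holds the same value.

    Hypothetical helper for illustration; not part of PSORT.
    """
    if not keys:
        return 0
    first = keys[0]
    for pos in range(len(first)):
        for key in keys:
            # Stop at the first position where a key ends or differs.
            if pos >= len(key) or key[pos] != first[pos]:
                return pos
    return len(first)


keys = ["ACCT-2024-0001", "ACCT-2024-0002", "ACCT-2024-0173"]
passes = wasted_passes(keys)  # the keys share the prefix "ACCT-2024-0"
```

With these sample keys the first eleven positions separate nothing, which is the situation in which a larger segment size or a more precise key specification is most likely to pay off.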
The lesson of all this is that giving PSORT more information about the sorting keys will almost always speed up the sort.
A good disk caching program permits overlapping asynchronous I/O when writing to work files, which will speed up the sort when the whole file does not fit in memory.
Beware of virtual memory systems. PSORT detects true installed memory on all platforms tested except for Win32s. PSORT limits its allocation of memory to an amount less than the available physical memory. However, there may be some situations where the operating system does not report the true available memory and/or will satisfy a request for memory from virtual memory. This will result in excruciatingly slow sorting times. This might occur with some DOS memory management programs, some UNIX platforms, Win32s, OS/2 when MEMMAN is set to SWAP, as well as others. If sorting large files seems to take disproportionately long and seems to require an excessive amount of disk activity, try using the -m switch to specify the maximum amount of memory that PSORT should use.