File Naming

From Daily Data


Filenames are tricky. In many cases you can use anything you can type on the keyboard (especially under Apple's OS X), but other computers may be unable to read the file simply because its name contains characters they cannot handle. The following guidelines will help you form the habit of choosing good filenames.

Summary

For maximum safety and portability, follow these guidelines.

  1. Use only alphabetic (a-z), numeric (0-9) and underscore (_) characters. Use at most one period in the file name, and only to separate the file name from the file type (for Windows and OS X). You can use dashes (-), but not as the first character of a file name.
  2. File names should not exceed 31 characters, including extension.
  3. Remember that Windows is case insensitive (as is Mac OS X on its default file system), whereas Unix systems (including web and FTP sites hosted on them) are case sensitive, so on those systems A.txt and a.TXT are two different file names.
  4. All other characters, including spaces, require additional processing to be recognized and can interfere with scripts on all three operating systems.
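
As a rough illustration, the guidelines above can be checked mechanically. The sketch below is only one possible reading of them: the function name is_portable_name and the constant MAX_LENGTH are made up for this example, and upper case letters are accepted alongside a-z on the assumption that guideline 3 is handled by consistent naming.

```python
import re

# Hypothetical helper, not part of any library.
# Base name: letters, digits, underscore or dash, but never a leading dash.
# Extension: optional, introduced by the one permitted period.
_PORTABLE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]*(\.[A-Za-z0-9_-]+)?")

MAX_LENGTH = 31  # guideline 2: 31 characters, extension included


def is_portable_name(name: str) -> bool:
    """Return True if name follows the summary guidelines."""
    return len(name) <= MAX_LENGTH and _PORTABLE.fullmatch(name) is not None
```

For example, is_portable_name("report_2019-final.txt") passes, while "my file.txt" (space), "-draft.txt" (leading dash) and "archive.tar.gz" (two periods) are rejected.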

If you can follow these rules, the following read is not necessary.

Invalid Characters in File Names

The following is a list of invalid characters for each of the operating systems. Note that these are invalid; they can not be used. However, there are additional characters that can cause you problems, which we'll go into later on.

Windows

* < > [ ] = + " \ / , . : ;
(asterisk, less than, greater than, square brackets, equals sign, plus sign, double quotes, backslash, forward slash, comma, period, colon, semicolon)

Apple OSX

None exist, other than the colon (:) noted in the detailed rules below.

Unix

 /
 (forward slash)

Web

 /\:*?"<>|
 (forward slash, back slash, colon, asterisk, question mark, double quotes, less than, greater than, "pipe")

Combined

The following are illegal on one or more of the mainstay personal operating systems:

 ? < > | * [ ] = + " \ / , . : ;
(question mark, less than, greater than, pipe, asterisk, left and right square brackets, equals, plus, double quote, backslash and forward slash, comma, period except as a separator, colon, semicolon)
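
If you already have files with such names, the combined list lends itself to a simple clean-up routine. The following is a minimal sketch, not a definitive tool: the names ILLEGAL and sanitize are invented for this example, and replacing each offending character with an underscore is just one possible policy. The right-most period is kept as the name/type separator; any other period is replaced.

```python
# Characters from the combined list above. The period is handled separately
# because it is allowed once, as the separator before the file type.
ILLEGAL = set('?<>|*[]=+"\\/,:;')


def sanitize(name: str, replacement: str = "_") -> str:
    """Replace combined-list characters; keep only the right-most period."""
    head, dot, tail = name.rpartition(".")
    if not dot:
        # No period at all: rpartition puts the whole name in the last slot.
        head, tail = tail, ""

    def clean(part: str) -> str:
        return "".join(
            replacement if ch in ILLEGAL or ch == "." else ch for ch in part
        )

    return clean(head) + dot + clean(tail)
```

With this sketch, sanitize("what?.txt") gives "what_.txt" and sanitize("v1.2.notes.txt") gives "v1_2_notes.txt".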

Detailed File Naming Rules

The following list is taken from [http://www.portfoliofaq.com/pfaq/FAQ00352.htm], a fairly exhaustive study of file name characters to avoid for various operating systems. See the above for additional information.

  1. Do not use illegal filename characters (e.g. : or ?). (All OSs).
    • Illegal filename characters: \ (backslash), / (forward slash), : (colon), * (asterisk), ? (question mark), " (double quotes), < (left angle bracket), > (right angle bracket), | (pipe). Most of these are Windows OS constraints; Mac allows all except a colon (though a forward slash, /, can cause issues for POSIX paths). The aim here is to allow problem-free cross-platform use. An all-Windows or all-Mac organisation may need to interact with others using different OSs, so the safe method is to observe both OS' limitations, even if you mostly/always work on only one OS.
  2. Do not use deprecated filename characters (; and ,). (All OSs).
    • Avoid %, #, and $ as these are commonly used as variable name prefixes, so it can get messy if automating anything with filenames that include these characters. If networking cross platform (e.g. Samba, SMB, CIFS) consider effects of !+{}&[] on path and filename translation.
    • Where possible avoid spaces in filenames (though not strictly necessary, they can complicate things, especially if scripting). You are best advised to stick to alphanumerics, underscores, hyphens and periods. Do not use a hyphen or a period as the first or last character of a filename as this can have special meaning on some OSs (e.g. a starting period often indicates that the file is a hidden or system file that is not displayed to non-admin user accounts).
    • File/folder delimiters: Mac Classic uses a colon, Mac OS X uses either forward slash (POSIX paths) or colon ('Mac' paths), Unix uses a forward slash and Windows a backslash (plus colon for drive letter).
    • Deprecated filename & path characters: , (comma), ; (semi-colon), (space), • (bullet = character #149 in the Windows-1252 code page), % (percent), & (ampersand). The 'bullet' character has no special significance and seems popular as a form of name punctuator amongst some Mac users, but it can cause unreadable filenames in a cross-platform environment.
    • Unix has few limitations. Filenames may typically be up to 255 characters. A forward slash (/) is a folder delimiter and a leading period (.) makes that file a hidden file.
  3. Keep file names shorter than 32 characters (i.e. 31 or fewer) including extension. (Mac Classic).
    • Windows, Unix and Mac OS X all support long filenames. References conflict as to whether exactly 256 or 255 characters are allowed (in Windows this includes the extension). However, as no current OS will allow you to create a filename exceeding this and all have the same top limit, this is one constraint you are unlikely to have to test for!
    • Consider the impact of long filenames on their display in OS and program dialogs, web pages etc. For instance Firefox won't wrap long strings of alphanumeric characters (nor can you force it to do so via code) so that long filenames may 'break' web layouts not designed with this issue in mind.
  4. Some systems have problems with file names over 64 characters, including extension. (Windows: ISO9660+Joliet CD or Hybrid CD partition).
    • The 'Joliet' standard for Windows CD supporting long filenames has a limit of 64 characters for the total path (folders & filename). If you need deep nesting, use short folder and file names! Also watch for limitations on the depth of folder nesting that can occur with strict observance of ISO9660 (see first bullet in next list). This is something that is rarely explained in CD/DVD burning software and is more likely to bite when using older systems/software. If path length limitations are a likely issue then make sure you test before starting any large body of work.
  5. No extension - extensions are mandatory for Windows and the only means for Portfolio to tell file type. (Windows, Mac OS X).
    • Whilst modern 32-bit Windows (post Win 95) will tolerate files without extensions, it does not know how to handle them. In addition some apps use the OS's 'knowledge' of file types to help them when finding/opening files - i.e. if the OS is confused, so too will be some apps. Bottom line: if Windows use is likely, always have file extensions. OS X usage seems to be drifting towards the extension model now that resource fork info is becoming a legacy issue.
  6. Filenames should not have more than one period - Portfolio may misinterpret extension. (Windows, Mac OS X).
    • Mac, making less deliberate use of extensions, allows periods in names. Portfolio will assume the set of characters after the right-most period in the name is the Windows-compatible extension. So if you must have, for example, "myname.txt.stuff" for a text file, better to call it "myname.stuff.txt"; Portfolio will read the first as having extension ".stuff", whilst the second will be read as ".txt".
    • If you intend to transmit files over the internet (especially as an email attachment), multiple periods such as "filename.tif.zip" will often get the file killed in transit. This is because IT systems see files with "multiple extensions" as a risk, since the true nature of the file may be "masked" so that a worm/virus can inflict damage. See also the notes on #9 below - more than one period in a path (whether folder or filename) may be interpreted by a well-secured web server as an illegal path (possible hacking attempt) and the file at the path won't be served.
  7. Extension may be wrong, i.e. not 3 characters. (Windows, Mac OS X).
    • Windows convention is for three letter extensions but more or fewer characters are not unknown, e.g. "logo.ai" (Illustrator), "page.html" (some HTML editors' output). The point of this rule is not to insist on extensions of 3 characters but to flag up odd names, normally Mac-created, that might have problems on Windows computers.
  8. Illegal characters in path to file - same issue as #1 but for path. (All OSs).
    • Illegal path characters: as above minus backslash and colon.
  9. Deprecated characters in path to file - same issue as #2 but for path. (All OSs).
    • Periods, though allowed in filenames, are deprecated as they aren't supported in stricter ISO9660 versions (without Joliet) and on some older systems such as VMS.
    • For web use, periods in filenames are deprecated as some web servers (especially IIS), when fully security patched, will not serve content if the URL has any folder names containing a period (effectively the underlying server 'rule' being applied is to check the path for more than one period, assuming the filename at the end of the path contains a single period).
  10. Filename may not begin with a period. (Windows not allowed, Mac treats as a hidden file)
    • You are well advised not to use commas as a starting or finishing character - they are likely to get missed when reading by eye.
  11. Filename may not end in a period. (Windows not allowed - OS 'throws away' the trailing period when naming/reading so incorrect matching vs. Mac name)
  12. Names conflicting with some of Win OS' old DOS functions (Not allowed in either upper or lowercase and with or without a file extension or as a file extension: COM1 to COM9 inclusive, LPT1 to LPT9 inclusive, CON, PRN, AUX, CLOCK$ and NUL)
    • See also MSDN. "Null.txt" is allowed but "nul.txt", "NUL" or "nul" are not.
  13. Case sensitivity. Windows OSs (and IIS web servers) aren't case sensitive. Most other OSs (and web servers) are.
    • Consider being case-insensitive when naming. In a large archive it can be tempting fate to rely on case to distinguish a file from another; thus ideally "Filename.jpg" and "FILENAME.JPG" should resolve to be the same file. In addition, "Filename.jpg" is arguably not unique. So regardless of whether your primary OS is case-sensitive, it is a good idea to (a) treat all case variations of a name as one for uniqueness of naming but (b) only ever use one variant of the name (as if the OS were case-sensitive).
  14. Filenames ought not to begin with a hyphen (Unix systems may interpret the filename as a flag to a command line call).
    • Unix system command line calls usually involve a program command with a series of hyphen-prefixed letter(s), or 'flags', that alter that program's behaviour. Starting a filename with a hyphen on such systems could cause the filename to be misinterpreted in command lines as a concatenated set of flags and thus not processed as a filename as intended.
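
Rules 10 to 14 above are easy to test for in a script. The sketch below is illustrative only: the names RESERVED, name_warnings and case_collisions are invented for this example, and the reserved-name check is deliberately conservative in that it flags a device name appearing in any period-separated part of the filename.

```python
# Rule 12: old DOS device names, reserved in either case, with or without
# an extension, or as an extension.
RESERVED = {"CON", "PRN", "AUX", "NUL", "CLOCK$"}
RESERVED |= {f"COM{i}" for i in range(1, 10)}
RESERVED |= {f"LPT{i}" for i in range(1, 10)}


def name_warnings(name: str) -> list:
    """Return a list of warnings for a single filename (rules 10-12, 14)."""
    warnings = []
    if name.startswith("."):
        warnings.append("begins with a period")      # rule 10
    if name.endswith("."):
        warnings.append("ends with a period")        # rule 11
    if name.startswith("-"):
        warnings.append("begins with a hyphen")      # rule 14
    if any(part in RESERVED for part in name.upper().split(".")):
        warnings.append("uses a reserved DOS device name")  # rule 12
    return warnings


def case_collisions(names) -> list:
    """Group names that differ only by case (rule 13)."""
    groups = {}
    for name in names:
        groups.setdefault(name.lower(), []).append(name)
    return [group for group in groups.values() if len(group) > 1]
```

As in the MSDN example, name_warnings("Null.txt") reports nothing while name_warnings("nul.txt") reports a reserved name, and case_collisions(["Filename.jpg", "FILENAME.JPG", "other.png"]) returns the first two names as one group.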