When you open a file that already exists on the disk using the "w" mode, are the contents of the existing file erased?

How to write to files

The process of writing to a file is very similar to the process of reading from a file (which is covered in a separate lesson), in that both require opening a file object for access. The difference is in the second argument to open(), in which the string "w" – short for write – is passed.

newfile = open("hello.txt", "w")

When a file object is instantiated in write mode, it has access to the write() method. Closing a file object works the same way as it does when reading from the file:

newfile.write("hello world")
newfile.write("goodbye world")
newfile.write("wokka wokka")
newfile.close()
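As a minimal sketch of the same steps, the version below uses a with-statement, which closes the file automatically, and adds explicit "\n" characters, since write() does not insert line breaks on its own. The temporary directory and the names workdir, path, and lines are assumptions made for illustration; they are not part of the lesson's code.

```python
import os
import tempfile

# Assumption: work in a throwaway directory so nothing real gets overwritten.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "hello.txt")

# The with-statement closes the file when the indented block ends,
# even if an error occurs inside it.
with open(path, "w") as newfile:
    # write() does not add line breaks, so each "\n" is explicit.
    newfile.write("hello world\n")
    newfile.write("goodbye world\n")
    newfile.write("wokka wokka\n")

# Read the file back to confirm that three separate lines were stored.
with open(path, "r") as infile:
    lines = infile.read().splitlines()

print(lines)
```

Without the "\n" characters, the three strings would run together on a single line in the file.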

Writing a binary file

The above example works when writing text to a file. However, when writing binary data, i.e. bytes, the string "wb" must be passed into the open() function:

newfile = open("somebinaryfile.zip","wb")

Most of the kinds of files we write will be in text mode, but occasionally, we'll download a binary file – such as a zip or image file – and save it to disk.
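To illustrate the difference between the two modes, here is a small sketch that writes a few bytes in "wb" mode and reads them back. The payload is a made-up stand-in for downloaded data, and the temporary directory and variable names are assumptions for illustration only.

```python
import os
import tempfile

# Assumption: a throwaway directory keeps this experiment away from real files.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "somebinaryfile.bin")

# Hypothetical payload standing in for bytes fetched from somewhere else.
payload = bytes([0x50, 0x4B, 0x03, 0x04])

rejected_str = False
with open(path, "wb") as newfile:
    # In binary mode, write() takes bytes and returns how many were written.
    written = newfile.write(payload)
    try:
        # A text string is not accepted by a file opened in binary mode.
        newfile.write("text")
    except TypeError:
        rejected_str = True

# Read the file back in binary mode to confirm the round trip.
with open(path, "rb") as infile:
    roundtrip = infile.read()

print(written, roundtrip == payload, rejected_str)
```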

How to accidentally wreck your data

When trying to open a file for reading, but passing in a non-existent filename, Python will raise a FileNotFoundError. Try this at the interactive Python shell:

>>> myfile = open("blaskdfjsadklfjdfsadflkj", "r")
FileNotFoundError: [Errno 2] No such file or directory: 'blaskdfjsadklfjdfsadflkj'

What happens if you try to open a file in write-mode with an equally nonsensical name?

>>> myfile = open("blaskdfjsadklfjdfsadflkj", "w")

Nothing, at least error-wise. Instead, a file of the name blaskdfjsadklfjdfsadflkj will be created wherever your code is running. If you ran it from your ~/Desktop directory, for instance, a new, empty file named blaskdfjsadklfjdfsadflkj would appear on your Desktop.

OK, but what happens when you try to open a file for writing using a filename that already exists? Nothing, error-wise. But whatever file that existing filename pointed to is wiped out: its contents are truncated to zero length as soon as open() succeeds, before you have written anything. You may get an error message if you attempt to write to a path that points to a directory or some kind of protected file. But for every other kind of file, the old contents are just gone and there is no confirmation message.
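The silent wipe-out can be observed safely with a short experiment. This is only a sketch: the file precious.txt, the temporary directory, and the variable names are assumptions chosen for the demonstration.

```python
import os
import tempfile

# Assumption: a throwaway directory, so the only file at risk is our own.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "precious.txt")

# Create a file that has some contents.
with open(path, "w") as f:
    f.write("irreplaceable data")
size_before = os.path.getsize(path)

# Re-open the same path in write mode and close it without writing anything.
open(path, "w").close()
size_after = os.path.getsize(path)

# No error and no warning, yet the file is now empty.
print(size_before, size_after)
```

The second number printed is 0: merely opening the file in "w" mode was enough to discard what it held.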

This is why in each of the assignments, I have you create a new tempdata subdirectory and stash things into it, to reduce the likelihood that you end up overwriting existing files in your other file directories. But you should still be careful – i.e. take a few seconds and think about what you're doing before hitting Enter – whenever you pass "w" or "wb" into the open() function.

Why write to files?

A good portion of this chapter is spent warning you about how writing files can lead to catastrophic, accidental data loss, so it's worth asking: why do we even want to write files in the first place?

The answer is pretty easy: so that the data we've collected/created can live on after our program finishes its work – or, as is frequently the case, dies unexpectedly.

Consider the following code which downloads the HTML contents of the current New York Times homepage into a variable named nyttext:

import requests
resp = requests.get("https://www.nytimes.com")
nyttext = resp.text

If my program ends there, whatever was stored in the variables resp and nyttext is gone. For many situations, that's probably what we want. But if we want to examine how the NYT homepage changes over time, then we would need to save copies of it that persisted from one Python session to the next. This means saving files to our hard drive:

from os.path import join
import requests
resp = requests.get("https://www.nytimes.com")
nyttext = resp.text
outfname = join("tempdata", "nytimes.com.html")
outfile = open(outfname, "w")
outfile.write(nyttext)
outfile.close()

Strategies for not overwriting your files

Of course, if we re-run this script in the next hour, day, or even the next second, whatever was at "tempdata/nytimes.com.html" will get overwritten.

Use the current time in the filename

One strategy is to incorporate the current timestamp into the filename to be saved. Here, I create a subdirectory named nytimes.com, and every file in it is given a name like 1453688120.431147.html – with the numbers being the result of the time.time() function, which returns the "current time in seconds since the Epoch":

from os.path import join
from os import makedirs
import requests
import time
# Set up the storage area
STORAGE_DIR = join("tempdata", "nytimes.com")
makedirs(STORAGE_DIR, exist_ok=True)
# Download the page
resp = requests.get("https://www.nytimes.com")
# Set up the new file
current_time = str(time.time())
print("The time in seconds since epoch is now:", current_time)
outfname = join(STORAGE_DIR, current_time + '.html')
outfile = open(outfname, "w")
outfile.write(resp.text)
outfile.close()

If you were to save that code into a script named nytdownload.py and then repeatedly run it via the command-line interpreter:

$ python nytdownload.py
The time in seconds since epoch is now: 1453689209.676369
$ python nytdownload.py
The time in seconds since epoch is now: 1453689210.85706
$ python nytdownload.py
The time in seconds since epoch is now: 1453689212.452021
$ python nytdownload.py
The time in seconds since epoch is now: 1453689213.67095

You would have a tempdata/nytimes.com subdirectory full of files:

.
├── nytdownload.py
└── tempdata
    └── nytimes.com
        ├── 1453689209.676369.html
        ├── 1453689210.85706.html
        ├── 1453689212.452021.html
        └── 1453689213.67095.html
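Epoch seconds are unique enough but hard to read at a glance. As a possible variation (not the approach used in the script above), the sketch below builds a human-readable, sortable filename with time.strftime(). The helper name readable_timestamp_path and its parameters are hypothetical.

```python
import os
import time

def readable_timestamp_path(directory, extension=".html", seconds=None):
    """Hypothetical helper: return a path like directory/2016-01-25_023320.html."""
    if seconds is None:
        # Default to the current time, in seconds since the Epoch.
        seconds = time.time()
    # Format the moment as UTC so names sort in chronological order.
    stamp = time.strftime("%Y-%m-%d_%H%M%S", time.gmtime(seconds))
    return os.path.join(directory, stamp + extension)

# Passing seconds=0 (the Epoch itself) makes the result predictable.
example = readable_timestamp_path(os.path.join("tempdata", "nytimes.com"), seconds=0)
print(example)
```

One trade-off of this variation: the names have one-second resolution, so two runs within the same second would produce the same filename and the second would overwrite the first.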

Check for the existence of a file

Sometimes, you only want to download a file once. For example, the works of Shakespeare are unlikely to change in the near future, so we'd want to download the file only if we've never downloaded it before.

We can use the exists() function from the os.path module, which returns True if the path passed into it currently exists and False if it does not:

from os.path import join
from os.path import exists
import requests
SHAKE_URL = "http://stash.compciv.org/scrapespeare/matty.shakespeare.tar.gz"
SHAKE_LOCAL_PATH = join("tempdata", "shakespeare.tar.gz")
if exists(SHAKE_LOCAL_PATH):
    print("Skipping download;", SHAKE_LOCAL_PATH, 'already exists')
else:
    print("Downloading", SHAKE_URL)
    resp = requests.get(SHAKE_URL)
    outfile = open(SHAKE_LOCAL_PATH, 'wb')
    # remember that Requests Response objects have the `content`
    # attribute when dealing with the contents of binary files
    outfile.write(resp.content)
    print("Saved file to:", SHAKE_LOCAL_PATH)
    outfile.close()

Save that code into a file, e.g. shakeydownload.py, and run it from the command-line. Assuming you don't have anything at the path tempdata/shakespeare.tar.gz, and the download successfully completes, you should see this output after a few seconds, or however long it takes your Internet connection to download all of Shakespeare's work:

$ python shakeydownload.py
Downloading http://stash.compciv.org/scrapespeare/matty.shakespeare.tar.gz
Saved file to: tempdata/shakespeare.tar.gz

Try re-running the script. The script should finish near-instantaneously since it doesn't have to download the file:

$ python shakeydownload.py
Skipping download; tempdata/shakespeare.tar.gz already exists
$ python shakeydownload.py
Skipping download; tempdata/shakespeare.tar.gz already exists

If you delete (or rename) tempdata/shakespeare.tar.gz, re-running shakeydownload.py will operate as if you had never downloaded the file before.
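A related option, offered here as an alternative to checking with exists(), is Python's exclusive-creation mode: passing "x" (or "xb" for binary data) to open() raises FileExistsError instead of overwriting an existing file. The sketch below uses no network access; the helper write_once and the sample bytes are hypothetical stand-ins for a real download.

```python
import os
import tempfile

def write_once(path, data):
    """Hypothetical helper: write bytes to path only if no file is there yet."""
    try:
        # "xb" creates a new binary file and refuses to touch an existing one.
        with open(path, "xb") as outfile:
            outfile.write(data)
        return True
    except FileExistsError:
        # Something already lives at this path, so leave it alone.
        return False

# Assumption: a throwaway directory stands in for the tempdata folder.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "shakespeare.tar.gz")

first = write_once(target, b"first download")
second = write_once(target, b"second download")

with open(target, "rb") as infile:
    kept = infile.read()

print(first, second, kept)
```

In a download script, you would still want the exists() check before making the request, so that the download itself is skipped; the "x" mode only guards the moment of writing.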

When you open a file for writing, if the file already exists, will it be destroyed?

In Basic, in place of mode you write one of three modes: OUTPUT, APPEND, or INPUT. The OUTPUT mode permits Basic to write information to the file. If the file already exists, Basic will overwrite the old file with the new information, destroying all previous contents of the file.

What happens if you open an output file and the file already exists?

In most languages, when you open an output file and that file already exists on the disk, the contents of the existing file will be erased. When an input file is opened, its read position is initially set to the first item in the file.
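In Python, these behaviors correspond to the mode strings "w" (output), "a" (append), and "r" (input). The following sketch walks through all three on one small file; the filename log.txt, the temporary directory, and the variable names are assumptions for illustration.

```python
import os
import tempfile

# Assumption: a throwaway directory for the demonstration file.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "log.txt")

# "w": create the file, or erase it if it already exists.
with open(path, "w") as f:
    f.write("first\n")

# "a": keep the existing contents and add to the end.
with open(path, "a") as f:
    f.write("second\n")

# "r": open for reading; the read position starts at the beginning.
with open(path, "r") as f:
    start_position = f.tell()
    contents = f.read()

print(start_position)
print(contents)
```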

What type of file are you working with when opening a file with the "w" mode?

The "w" mode opens an output file: one that your program writes to, in text mode. When you open a file that already exists on the disk using the "w" mode, the contents of the existing file will be erased. Opening is necessary for output files as well as input files; an output file is not opened automatically when data is written to it.

What will happen when a program opens a file in write mode if the file doesn't exist?

If you open a file for writing and the file doesn't exist, then the file is created with 0 length.
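This is easy to confirm with a quick check. The sketch assumes a temporary directory and a made-up filename, brand_new.txt, so that the path is guaranteed not to exist beforehand.

```python
import os
import tempfile

# Assumption: a fresh temporary directory, so the path cannot already exist.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "brand_new.txt")

existed_before = os.path.exists(path)

# Open in write mode and close immediately, without writing anything.
open(path, "w").close()

existed_after = os.path.exists(path)
size = os.path.getsize(path)

print(existed_before, existed_after, size)
```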