Useful Script for Backing Up Pis

Hi, I made a little utility script that folks here might find useful (or might have MUCH BETTER VERSIONS OF! and if so, let me know!)

Basically, we needed a script that monitors a certain folder on the desktop, checks it for files, and, if it finds any, backs them up to external storage. I had seen a couple of scripts other folks used, but they weren’t as general (you might have to mount your external storage under a certain name or something).

The script is just in our Mothbox Github:

This script should let you just plug in any old thumb drive or external hard drive, and it will make a folder called “backup photos” and it will backup your data there.

The script will also work with MULTIPLE external storages. It will seek out the one with the most available space first, and load that up, and if it gets full, it will find the next most available.

I also have it set to clear out the files in the internal storage if the device is below a certain amount of internal storage (Currently 4GB)
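For anyone curious, the low-space check can be sketched in a few lines. This is a minimal sketch, not the Mothbox code itself: the name `is_low_on_space` and the `min_free_gb` parameter are made up for illustration, and it leans on `shutil.disk_usage` from the standard library.

```python
import shutil


def is_low_on_space(path, min_free_gb=4):
    """Return True if the filesystem holding `path` has less than min_free_gb free.

    Hypothetical helper for illustration; 4 GB mirrors the threshold mentioned above.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes < min_free_gb * 1024**3


if __name__ == "__main__":
    # "." is just an example path; point it at wherever the photos live
    print("Low on space:", is_low_on_space("."))
```

Only when this returns True would the already backed-up files be cleared from internal storage.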

Anyway, hopefully this script is

a) helpful to you

b) reminds you of a much better, thorough, robust script you use for automated backing up, and you can send that to me, and we can use that instead :slight_smile:

Thanks Andy!

If you are working on a Raspberry Pi or anything with a Unix OS you can use rsync. Or on Windows you can use rsync from WSL.

Being an old-school Linux command, it has a lot of functionality packed into somewhat confusing command-line flags. But it has the option to back up to any drive you are connected to, or over network protocols like SSH.

It can be set to do archiving, where it copies any new or updated files to the backup dir, or full mirroring (which also deletes files from the backup directory once they no longer exist in the source). You can also include only certain files, or give it patterns of files to exclude.

Possibly the nicest feature when getting started is you can add --dry-run and it will list all the things that it would have done, but not actually do them. This way you can be sure that you have your script set up correctly.
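If you are calling rsync from Python, one way to keep the dry run handy is to build the argument list in a small function. This is only a sketch: `build_rsync_command`, `run_rsync`, and the example paths are hypothetical, while `-a`, `-v`, `--dry-run`, `--delete`, and `--exclude` are standard rsync options.

```python
import subprocess


def build_rsync_command(source_dir, dest_dir, dry_run=True, mirror=False, excludes=()):
    """Build an rsync argument list (hypothetical helper, names are illustrative)."""
    cmd = ["rsync", "-av"]  # -a: archive mode, -v: verbose
    if dry_run:
        cmd.append("--dry-run")  # list what would happen without copying anything
    if mirror:
        cmd.append("--delete")  # also remove files from dest that are gone from source
    for pattern in excludes:
        cmd.append(f"--exclude={pattern}")
    # Trailing slash: copy the contents of source_dir, not the folder itself
    cmd += [str(source_dir).rstrip("/") + "/", str(dest_dir)]
    return cmd


def run_rsync(cmd):
    """Run the command; raises subprocess.CalledProcessError if rsync fails."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Example paths only; replace with your own folders
    cmd = build_rsync_command("/home/pi/photos", "/media/pi/usb/photos_backup", excludes=["*.tmp"])
    print(" ".join(cmd))
```

Once the printed dry-run command looks right, pass `dry_run=False` and hand the result to `run_rsync`.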

There are many user tutorials:
https://www.hostinger.co.uk/tutorials/how-to-use-rsync

As a further thought, while rsync is really good at fast, efficient backups, it doesn’t do the other checking you might want, such as deleting extra files from the source or working out which of your external storage devices it should write to.

If you wanted to, you could use subprocess to do both with something a bit like this (I just bammed this into the forum, I didn’t test the block of code!).

import subprocess

# Andy's python code to choose which external drive is best

try:
    ret = subprocess.run(['rsync', '-avz', photos_folder, backup_folder], check=True)
except subprocess.CalledProcessError as err:
    raise RuntimeError(f"Oh no! Mothbox couldn't backup your files!") from err

# Andy's delete code

However, with rsync, filling up one external drive at a time would be best, as it will only copy the files that are new. I think with the current script, if you attached 3 external drives of the same size, it would take turns backing up to each of them, and any files it hadn’t deleted would be mirrored on the backups?

This may be good! Redundant backups are good, but they may also be problematic if your externals start running out of space.

This is so fantastic, exactly what I love about this forum! Thanks so much julian! I’ll try to incorporate it and report back!

You are totally right! I think it would be annoying to have 3 partial backups.

Also, the way I have it currently set up, the internal storage folder keeps getting bigger and bigger, and it copies over ALL the files at once. I think a better organization would be something like

├── Internal Storage
│   ├── photos
│   │   └── "mothbox_boxnum_timestamp_8.png"
│   └── photos_backedup
│       ├── "mothbox_boxnum_timestamp_7.png"
│       ├── "mothbox_boxnum_timestamp_6.png"
│       └── ...
│
└── External Storage
    └── photos_backedup
        ├── "mothbox_boxnum_timestamp_7.png"
        ├── "mothbox_boxnum_timestamp_6.png"
        └── ...

Where fresh photos are always saved to the same folder
but then moved once they have been backed up to an external storage

that way it can always just look in the “fresh” folder for new stuff to backup
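A rough sketch of that "move once backed up" step, run in a throwaway temp directory so it doesn't touch real photos. `move_backed_up_files` is a hypothetical name, and this assumes the copy to external storage has already succeeded before it is called.

```python
import shutil
import tempfile
from pathlib import Path


def move_backed_up_files(fresh_dir, backedup_dir):
    """Move every file from fresh_dir into backedup_dir; return the names moved.

    Hypothetical helper: call it only after the external copy has succeeded.
    """
    fresh_dir, backedup_dir = Path(fresh_dir), Path(backedup_dir)
    backedup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for item in sorted(fresh_dir.iterdir()):
        if item.is_file():
            shutil.move(str(item), str(backedup_dir / item.name))
            moved.append(item.name)
    return moved


if __name__ == "__main__":
    # Demo in a temporary directory so nothing real is touched
    with tempfile.TemporaryDirectory() as tmp:
        fresh = Path(tmp) / "photos"
        fresh.mkdir()
        (fresh / "mothbox_boxnum_timestamp_8.png").write_bytes(b"fake image data")
        print(move_backed_up_files(fresh, Path(tmp) / "photos_backedup"))
```

After it runs, the "fresh" folder is empty again, so the next backup only has new photos to look at.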

as for how to organize having multiple external storages, you are correct, i was naive in my thinking of “just choose largest” because soon it will no longer be the largest!

so i need to think about what would be a good way to load up those storages!

here’s the current update i cobbled together

'''
Backupper Script
This script is for folks collecting lots of data automatically that needs to get backed up at certain intervals

for instance saving a bunch of files to a folder, but then automatically copying them to larger external devices

This script first defines paths for the desktop, photos folder, and backup folder name. Then, it defines functions to:

    Get the storage information (total and available space) of a path.
    Find the largest external storage device connected.
    Copy all files from one folder to another while preserving file metadata.

Finally, the script checks if the photos folder exists and then finds the largest external storage. It compares the total space and available space on both the desktop and the external storage to determine if the external storage has enough space for the backup. If so, it creates a backup folder on the external storage and copies the photos. Otherwise, it informs the user about insufficient space.

Note:

    This script assumes the user running the script has read and write permissions to the desktop and any external storage devices.
    You might need to adjust the user name in desktop_path depending on your Raspberry Pi setup.


'''

import os
import subprocess
import shutil
from pathlib import Path

# Define paths
desktop_path = Path("/home/pi/Desktop/Mothbox")  # Assuming user is "pi" on your Raspberry Pi
photos_folder = desktop_path / "photos"
backedup_photos_folder = desktop_path / "photos_backedup"

backup_folder_name = "photos_backup"
internal_storage_minimum = 1  # Minimum free internal storage, in GB

def get_storage_info(path):
  """
  Gets the total and available storage space of a path.

  Args:
      path: The path to the storage device.

  Returns:
      A tuple containing the total and available storage in bytes.
  """
  try:
    stat = os.statvfs(path)
    return stat.f_blocks * stat.f_bsize, stat.f_bavail * stat.f_bsize
  except OSError:
    return 0, 0  # Handle non-existent or inaccessible storages

def find_largest_external_storage():
  """
  Finds the largest external storage device connected to the Raspberry Pi.

  Returns:
      The path to the largest storage device or None if none is found.
  """
  largest_storage = None
  largest_size = 0
  
  for mount_point in os.listdir("/media/pi"):
    path = Path(f"/media/pi/{mount_point}")
    if path.is_dir():
      total_size, available_size = get_storage_info(path)
      print(path)
      print(available_size)
      if available_size > largest_size:
          largest_storage=path
          largest_size = available_size
      '''
      if total_size > largest_size:
        largest_storage = path
        largest_size = total_size
      '''
  print("Largest Storage: "+str(largest_storage))
  print(largest_size)
  return largest_storage

def rsync_photos_to_backup(source_dir, dest_dir):
    if not os.path.exists(dest_dir):
        os.makedirs(dest_dir)
    
    # Build the rsync command: archive mode (recursive, preserves metadata) with verbose output.
    # The trailing slash on the source copies the folder's contents rather than the folder itself.
    #rsync_cmd = ["rsync", "-avz", "--delete", source_dir, dest_dir] # copies the whole folder, not just files inside
    rsync_cmd = ["rsync", "-av",  str(source_dir) + "/", dest_dir]
    
    # Call rsync using subprocess
    try:
      process = subprocess.run(rsync_cmd, check=True)
    except subprocess.CalledProcessError as err:
      raise RuntimeError(f"Oh no! Mothbox couldn't backup your files!") from err
    
def rsync_copy_and_delete_files(source_dir, dest_dir):
  """
  This function uses rsync to copy files from source_dir to dest_dir and then deletes the originals from source_dir if successful.

  Args:
    source_dir: The source directory containing the files to copy.
    dest_dir: The destination directory to copy the files to.

  Raises:
    subprocess.CalledProcessError: If the rsync command fails.
  """
  if not os.path.exists(dest_dir):
    os.makedirs(dest_dir)
    

  # Build the rsync command: archive mode (recursive, preserves metadata) with verbose output.
  # rsync only copies here; the originals are deleted by the Python loop further down.
  #rsync_cmd = ["rsync", "-avz", "--delete", source_dir, dest_dir] # copies the whole folder, not just files inside
  rsync_cmd = ["rsync", "-av",  str(source_dir) + "/", dest_dir]
  # Call rsync using subprocess
  process = subprocess.run(rsync_cmd, check=True)
  
  # If successful, iterate through copied files and delete them individually
  if process.returncode == 0:
    for root, _, files in os.walk(source_dir):
      for filename in files:
        source_file = os.path.join(root, filename)
        # Keep any subfolder structure when locating the copied file
        dest_file = os.path.join(dest_dir, os.path.relpath(source_file, source_dir))
        # Check if the file was successfully copied (exists in destination)
        if os.path.isfile(dest_file):
          try:
            os.remove(source_file)
            print(f"Deleted: {source_file}")
          except OSError as e:
            print(f"Error deleting {source_file}: {e}")
  
  return process.returncode

#older way of just copying items in the folder
def copy_photos_to_backup(source_folder, target_folder):
  """
  Copies all files from the source folder to the target folder.

  Args:
      source_folder: The path to the source folder.
      target_folder: The path to the target folder.
  """
  if not os.path.exists(target_folder):
    os.makedirs(target_folder)
  for filename in os.listdir(source_folder):
    source_path = os.path.join(source_folder, filename)
    target_path = os.path.join(target_folder, filename)
    shutil.copy2(source_path, target_path)  # Preserves file metadata
    
def delete_original_photos(source_folder):
  """
  Deletes all files from the source folder.

  Args:
      source_folder: The path to the source folder.
  """
  for filename in os.listdir(source_folder):
    file_path = os.path.join(source_folder, filename)
    try:
      if os.path.isfile(file_path):
        os.remove(file_path)
    except OSError as e:
      print(f"Error deleting file {file_path}: {e}")  


if __name__ == "__main__":
  # Check if "photos" folder exists
  if not os.path.exists(photos_folder):
    print("Photos folder not found, exiting.")
    exit(1)

  # Find largest external storage
  largest_storage = find_largest_external_storage()

  if not largest_storage:
    print("No external storage found with enough space, exiting.")
    exit(1)

  # Get total and available space on desktop and external storage
  desktop_total, desktop_available = get_storage_info(desktop_path)
  external_total,external_available = get_storage_info(largest_storage)
  print("Desktop Total    Storage: \t"+str(desktop_total))
  print("Desktop Available Storage: \t"+str(desktop_available))
  print("External Total Storage: \t"+str(external_total))
  print("External Available Storage: \t"+str(external_available))

  # Check that the external storage has enough free space for the photos waiting to be backed up
  if external_available > sum(os.path.getsize(f) for f in photos_folder.iterdir() if f.is_file()):
    # Create backup folder on external storage
    external_backup_folder = largest_storage / backup_folder_name
    
    rsync_photos_to_backup(photos_folder,external_backup_folder)
    #todo add a verification check!
    print(f"Photos successfully copied to backup folder: {external_backup_folder}")

    #move backed up images here
    rsync_copy_and_delete_files(photos_folder,backedup_photos_folder)
    print(f"Photos successfully copied to backed_up folder and deleted from fresh folder: {backedup_photos_folder}")

    #copy_photos_to_backup(photos_folder, backup_folder)
    
    # Check if internal storage has less than X GB left
    x= internal_storage_minimum
    if desktop_available < x * 1024**3:  # x GB in bytes
      delete_original_photos(backedup_photos_folder)
      print("Original photos deleted after being backed up due to low internal storage.")
    else:
        print("More than "+str(x)+ "GB remain so original files are kept in desktop after backing up")
  else:
    print("External storage doesn't have enough space for backup.")

Ok, i got it!

this script (which is also updated on the github)

works like this


    Get the storage information (total and available space) of a path.
    Find the sizes of all external devices in terms of total storage capacity (not available because this will change)
    Rank them in order  of their total storage capacity
    Check if the first option has space available to copy the new files
      if not, choose the next option in terms of total storage
    Copy all the files from the internal storage to the external storage
    Move the files from the directory of "fresh" files to the internal "backedup" folder
    if the internal storage gets too small, delete the internal "backedup" folder

This means it will choose a single external, copy files to it until it gets too full, and then start copying to another.

and it will hold onto backups as long as the internal storage allows.
I ran it through a bunch of tests with different memory sticks and it seems to be working good!
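The drive-picking steps listed above boil down to "sort by total capacity, then take the first drive with room". Here is a stripped-down sketch of just that logic, with made-up drive names and sizes; `choose_backup_drive` is a hypothetical helper, not a function from the script.

```python
def choose_backup_drive(drives, needed_bytes):
    """Pick the drive with the largest total capacity that still has room.

    drives: dict mapping a drive path to (total_bytes, available_bytes).
    Returns the chosen path, or None if no drive has enough free space.
    Hypothetical helper mirroring the ranking steps listed above.
    """
    ranked = sorted(drives.items(), key=lambda item: item[1][0], reverse=True)
    for path, (_total, available) in ranked:
        if available > needed_bytes:
            return path
    return None


if __name__ == "__main__":
    GB = 1024**3
    # Made-up drives: the biggest stick is nearly full, so the next one is used
    drives = {
        "/media/pi/BIG": (64 * GB, 1 * GB),
        "/media/pi/MEDIUM": (32 * GB, 20 * GB),
        "/media/pi/SMALL": (8 * GB, 7 * GB),
    }
    print(choose_backup_drive(drives, needed_bytes=5 * GB))  # prints /media/pi/MEDIUM
```

Because the ranking uses total capacity, which doesn't change as files are copied, the same drive keeps getting picked until it runs out of room.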

full code here!

thanks for the help @julianstirling

"""
Backupper Script
This script is for folks collecting lots of data automatically that needs to get backed up at certain intervals

for instance saving a bunch of files to a folder, but then automatically copying them to larger external devices

This script first defines paths for the desktop, photos folder, and backup folder name. Then, it defines functions to:

    Get the storage information (total and available space) of a path.
    Find the sizes of all external devices in terms of total storage capacity (not available because this will change)
    Rank them in order  of their total storage capacity
    Check if the first option has space available to copy the new files
      if not, choose the next option in terms of total storage
    Copy all the files from the internal storage to the external storage
    Move the files from the directory of "fresh" files to the internal "backedup" folder
    if the internal storage gets too small, delete the internal "backedup" folder

Finally, the script checks if the photos folder exists, ranks the external storage devices by total capacity, and picks the first one with enough available space for the backup. If it finds one, it creates a backup folder on that external storage and copies the photos. Otherwise, it informs the user about insufficient space.

Note:

    This script assumes the user running the script has read and write permissions to the desktop and any external storage devices.
    You might need to adjust the user name in desktop_path depending on your Raspberry Pi setup.


"""

import os
import subprocess
import shutil
from pathlib import Path

# Define paths
desktop_path = Path(
    "/home/pi/Desktop/Mothbox"
)  # Assuming user is "pi" on your Raspberry Pi
photos_folder = desktop_path / "photos"
backedup_photos_folder = desktop_path / "photos_backedup"

backup_folder_name = "photos_backup"
internal_storage_minimum = 1  # Minimum free internal storage, in GB


def get_storage_info(path):
    """
    Gets the total and available storage space of a path.

    Args:
        path: The path to the storage device.

    Returns:
        A tuple containing the total and available storage in bytes.
    """

    try:
        stat = os.statvfs(path)
        return stat.f_blocks * stat.f_bsize, stat.f_bavail * stat.f_bsize
    except OSError:
        return 0, 0  # Handle non-existent or inaccessible storages


def find_largest_external_storage():
    """
    Finds the largest external storage device connected to the Raspberry Pi.

    Returns:
        The path to the largest storage device or None if none is found.
    """
    largest_storage = None
    largest_size = 0

    for mount_point in os.listdir("/media/pi"):
        path = Path(f"/media/pi/{mount_point}")
        if path.is_dir():
            total_size, available_size = get_storage_info(path)
            print(path)
            print(available_size)
            if available_size > largest_size:
                largest_storage = path
                largest_size = available_size
            """
      if total_size > largest_size:
        largest_storage = path
        largest_size = total_size
      """
    print("Largest Storage: " + str(largest_storage))
    print(largest_size)
    return largest_storage


def rsync_photos_to_backup(source_dir, dest_dir):
    if not os.path.exists(dest_dir):
        os.makedirs(dest_dir)

    # Build the rsync command: archive mode (recursive, preserves metadata) with verbose output.
    # The trailing slash on the source copies the folder's contents rather than the folder itself.
    # rsync_cmd = ["rsync", "-avz", "--delete", source_dir, dest_dir] # copies the whole folder, not just files inside
    rsync_cmd = ["rsync", "-av", str(source_dir) + "/", dest_dir]

    # Call rsync using subprocess
    try:
        process = subprocess.run(rsync_cmd, check=True)
    except subprocess.CalledProcessError as err:
        raise RuntimeError(f"Oh no! Mothbox couldn't backup your files!") from err


def rsync_copy_and_delete_files(source_dir, dest_dir):
    """
    This function uses rsync to copy files from source_dir to dest_dir and then deletes the originals from source_dir if successful.

    Args:
      source_dir: The source directory containing the files to copy.
      dest_dir: The destination directory to copy the files to.

    Raises:
      subprocess.CalledProcessError: If the rsync command fails.
    """
    if not os.path.exists(dest_dir):
        os.makedirs(dest_dir)

    # Build the rsync command: archive mode (recursive, preserves metadata) with verbose output.
    # rsync only copies here; the originals are deleted by the Python loop further down.
    # rsync_cmd = ["rsync", "-avz", "--delete", source_dir, dest_dir] # copies the whole folder, not just files inside
    rsync_cmd = ["rsync", "-av", str(source_dir) + "/", dest_dir]
    # Call rsync using subprocess
    process = subprocess.run(rsync_cmd, check=True)

    # If successful, iterate through copied files and delete them individually
    if process.returncode == 0:
        for root, _, files in os.walk(source_dir):
            for filename in files:
                source_file = os.path.join(root, filename)
                # Keep any subfolder structure when locating the copied file
                dest_file = os.path.join(dest_dir, os.path.relpath(source_file, source_dir))
                # Check if the file was successfully copied (exists in destination)
                if os.path.isfile(dest_file):
                    try:
                        os.remove(source_file)
                        print(f"Deleted: {source_file}")
                    except OSError as e:
                        print(f"Error deleting {source_file}: {e}")

    return process.returncode


# older way of just copying items in the folder
def copy_photos_to_backup(source_folder, target_folder):
    """
    Copies all files from the source folder to the target folder.

    Args:
        source_folder: The path to the source folder.
        target_folder: The path to the target folder.
    """
    if not os.path.exists(target_folder):
        os.makedirs(target_folder)
    for filename in os.listdir(source_folder):
        source_path = os.path.join(source_folder, filename)
        target_path = os.path.join(target_folder, filename)
        shutil.copy2(source_path, target_path)  # Preserves file metadata


def delete_original_photos(source_folder):
    """
    Deletes all files from the source folder.

    Args:
        source_folder: The path to the source folder.
    """
    for filename in os.listdir(source_folder):
        file_path = os.path.join(source_folder, filename)
        try:
            if os.path.isfile(file_path):
                os.remove(file_path)
        except OSError as e:
            print(f"Error deleting file {file_path}: {e}")


if __name__ == "__main__":
    # Check if "photos" folder exists
    if not os.path.exists(photos_folder):
        print("Photos folder not found, exiting.")
        exit(1)

    # Find largest external storage
    # largest_storage = find_largest_external_storage()

    # Get total and available space on desktop and external storage
    desktop_total, desktop_available = get_storage_info(desktop_path)
    # external_total,external_available = get_storage_info(largest_storage)
    print("Desktop Total    Storage: \t" + str(desktop_total))
    print("Desktop Available Storage: \t" + str(desktop_available))

    # Find the storage capacity of all external drives and rank them by total size.
    disks = {}  # Dictionary to store disk name and capacity

    # Check potential mount points for external drives (adjust based on your system)
    for mount_point in os.listdir("/media/pi"):
        path = Path(f"/media/pi/{mount_point}")
        if path.is_dir():
            total_size, available_size = get_storage_info(path)
            disks[path] = total_size, available_size

    # Sort disks by capacity (descending)
    # Check if any disks were found before sorting and printing
    print("~~~sorting disks~~~~~~")
    if disks:
        sorted_disks = sorted(disks.items(), key=lambda item: item[1][0], reverse=True)
        print("External Drives (Ranked by Total Size - Descending):")
        for disk_name, capacity in sorted_disks:
            # get_storage_info returns bytes, so convert to GB before printing
            print(
                f"{disk_name}: total size {capacity[0] / 1024**3:.2f} GB - available size {capacity[1] / 1024**3:.2f} GB"
            )
    else:
        print("No external drives found.")
        print(
            "stuff never worked out with this backup, your files are not properly backedup"
        )

        exit(1)

    print("~~~sorted~~~~~~")

    thingsworkedok = False
    # this is the loop where we make stuff happen
    # iterate through the disks, starting with the largest
    # see if it has enough available space, if not, choose the next largest
    for disk_name, capacity in sorted_disks:
        external_total, external_available = capacity
        # Check that this drive has enough free space for the photos waiting to be backed up
        if external_available > sum(
            os.path.getsize(f) for f in photos_folder.iterdir() if f.is_file()
        ):

            # Create backup folder on external storage
            external_backup_folder = disk_name / backup_folder_name

            try:
                rsync_photos_to_backup(photos_folder, external_backup_folder)
                # Proceed to the next step if successful
                print(
                    f"Photos successfully copied to backup folder: {external_backup_folder}"
                )

                # since we were successful, move backed up images here!
                rsync_copy_and_delete_files(photos_folder, backedup_photos_folder)
                print(
                    f"Photos successfully copied to backed_up folder and deleted from fresh folder: {backedup_photos_folder}"
                )
                thingsworkedok = True

                # After we backed up, we can check on our internal storage and see if we need to clean up
                # Check if internal storage has less than X GB left
                x = internal_storage_minimum
                if desktop_available < x * 1024**3:  # x GB in bytes
                    delete_original_photos(backedup_photos_folder)
                    print(
                        "Original photos deleted after being backed up due to low internal storage."
                    )
                else:
                    print(
                        "More than "
                        + str(x)
                        + "GB remain so original files are also kept in internal storage after backing up to external storage"
                    )
                print("we have finished backing up! yay!")
                break
            # rsync_photos_to_backup wraps rsync failures in RuntimeError, so catch both
            except (RuntimeError, subprocess.CalledProcessError) as err:
                print(
                    "Error during backup, moving to next available storage if there is one!:",
                    err,
                )
                thingsworkedok = False
        else:
            print(
                "This External storage doesn't have enough space for backup.\n Trying next available storage if there is one "
            )

    if not thingsworkedok:
        print(
            "stuff never worked out with this backup, your files are not properly backedup"
        )
    else:
        print("BACKUP COMPLETE")

There’s one extra challenge. I call this backup script before the device shuts down, but if there are enough files, it ends up with a LOT of 0-byte files.

I think the problem is that my script just isn’t waiting for all the copying to finish before shutting down, and thus some files get corrupted.
But I’m not smart enough at Linux and Python to tell the scripts:

HEY make sure you finished copying all the files, THEN shutdown

I try to do stuff like below, but it still ends up with corrupted files. If I just run the backupper script itself, it goes fine. This isn’t a big deal, because I’ll probably just call the backupper like 5 mins before shutting down or something, but it is weird and annoying.

import subprocess
import sys

print("Backing up files before shutting down")
# script_path is the path to the backupper script (defined elsewhere in the calling script)
process = subprocess.Popen([sys.executable, script_path], stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# communicate() waits for the child process to finish while draining its output;
# polling without reading the pipes can stall once the pipe buffers fill up
stdout, stderr = process.communicate()

# Check for errors in the child process
if process.returncode != 0:
    raise subprocess.CalledProcessError(process.returncode, process.args, output=stdout, stderr=stderr)

but it still results in corrupted files

Generally I’d avoid copying files around on your main drive as it makes it somewhat harder to trace what is happening if you have issues.

One strategy I like is to make a folder for each day (or each hour if you can’t fit a day’s worth of data on the drive). You can then have a machine-readable file that tells you which directories are backed up. You can use this file to record not only which directory is backed up, but also how many files were copied, to which drive, and when. This way, when you eventually delete the directory, you still have a record of where to look for what you need.
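As a rough idea of what that machine-readable file could look like, here is a sketch that keeps a JSON list of backup records. The format, the field names, and the functions `record_backup` and `is_backed_up` are all made up for illustration, as are the example directory names.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_backup(manifest_path, directory, file_count, drive):
    """Append one backup record to a JSON manifest and return all records.

    Hypothetical format: a JSON list of dicts; field names are illustrative.
    """
    manifest_path = Path(manifest_path)
    records = json.loads(manifest_path.read_text()) if manifest_path.exists() else []
    records.append({
        "directory": str(directory),
        "file_count": file_count,
        "drive": str(drive),
        "backed_up_at": datetime.now(timezone.utc).isoformat(),
    })
    manifest_path.write_text(json.dumps(records, indent=2))
    return records


def is_backed_up(manifest_path, directory):
    """Return True if `directory` already appears in the manifest."""
    manifest_path = Path(manifest_path)
    if not manifest_path.exists():
        return False
    records = json.loads(manifest_path.read_text())
    return any(r["directory"] == str(directory) for r in records)


if __name__ == "__main__":
    # Demo in a temporary directory; the folder and drive names are examples only
    with tempfile.TemporaryDirectory() as tmp:
        manifest = Path(tmp) / "backups.json"
        record_backup(manifest, "photos/2024-05-01", 120, "/media/pi/usb")
        print(is_backed_up(manifest, "photos/2024-05-01"))
        print(is_backed_up(manifest, "photos/2024-05-02"))
```

The manifest would live on the internal storage, so it survives after a day's directory has been deleted.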

The shutdown thing is annoying. How are you shutting down? I don’t see a shutdown command in your script.
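On the 0-byte files: I can't tell the cause from here, but one thing that may be worth trying is to flush the filesystem buffers after the backup finishes and before shutting down. This is only a guess: it assumes the empty files come from data still sitting in the write cache when power goes away. In the sketch below `run_backup_then_flush` is a hypothetical name, and a tiny Python one-liner stands in for the real backup script.

```python
import os
import subprocess
import sys


def run_backup_then_flush(command):
    """Run `command` to completion, then flush filesystem buffers.

    Hypothetical helper. subprocess.run() blocks until the child exits,
    and os.sync() asks the kernel to write cached data out to disk.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if hasattr(os, "sync"):  # os.sync() is Unix-only
        os.sync()
    if result.returncode != 0:
        raise RuntimeError(f"Backup failed: {result.stderr}")
    return result.stdout


if __name__ == "__main__":
    # Stand-in for the backup script so the example runs anywhere
    output = run_backup_then_flush([sys.executable, "-c", "print('backup done')"])
    print(output.strip())
    # Only after this point would the calling script issue its shutdown command
```

`subprocess.run` already waits for the child, so no polling loop is needed; the `os.sync()` call is the extra step.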

it’s a different script that calls the backup scrip