Coding/Development Resources¶

This is a page that contains little tidbits that in our lab we found useful for getting work done. A lot of random small things that we always get hung-up on so we compiled it on one place.

VM Linux Management¶

Allocating Additional Disk Space¶

Allocate additional space in VM Hypervisor
Follow this tutorial to repartition the VM disk

Changing Linux Hostname in Ubuntu¶

hostnamectl set-hostname new-hostname

Python¶

Profiling and creating a visualzaiton¶

Link.

Worked well for me. Specifically cProfile + gprof2dot + graphViz.

python -m cProfile -o myLog.profile ./test.py
gprof2dot -f pstats myLog.profile -o callingGraph.dot
dot -Tsvg callingGraph.dot -o callingGraph.svg

Creating chunks of big list¶

# Yield successive n-sized
# chunks from l.
def divide_chunks(l, n):
    # looping till length l
    for i in range(0, len(l), n): 
        yield l[i:i + n]

Building your Python package with¶

Check the new pyproject.toml file example data/pyproject_template.toml to configure the building process with poetry.

python3 -m build

Uploading a package to Pypi¶

It is recommended to test everything in TestPypiy first

Create an account in Pypi and/or TestPypi
Create your API token in Pypi or TestPypi using a 2FA application. Save the token in a file .pypirc in your home directory (~). This is the config file used for pypi data/.pypirc
Install twine:

python3 -m pip install --upgrade twine

Upload to testpypi or pypi

python3 -m twine upload --repository testpypi dist/*

python3 -m twine upload --repository pypi dist/*

4.1. Install from testpypi:

python3 -m pip install --index-url https://test.pypi.org/simple/ your-package

4.2. Install from pypi:

python3 -m pip install your-package

NextFlow¶

Flags for Command Line¶

script:
def flag = params.ms2_flag == true ? "--ms2_flag" : ''
"""
$flag
"""

Enable docker in NextFlow¶

Add the following lines in the nextflow.config file:

process.container = 'nextflow/examples:latest'
docker.enabled = true

Different Docker image for each process:

process foo {
  container 'image_name_1'

  '''
  do this
  '''
}

process bar {
  container 'image_name_2'

  '''
  do that
  '''

Execute a task over n files in parallel¶

Channel.fromPath( '<path>*.<file_format_to_process>' ) # The path is usually read from the command line argument.

Mitigate Input File Name Collisions¶

process create_file {
    input:
    each dummy

    output:
    path 'test.txt', emit: x

    """
    touch test.txt
    """
}

process mergeResults {
    conda "$TOOL_FOLDER/conda_env.yml"

    input:
    path tests, stageAs: "./test/test*.csv"

    output:
    path 'all_tests.csv'

    """
    cat test/*.csv > all_tests.csv
    """
}

workflow {
    // Run Create File a Bunch
    ch = Channel.from([1,2,3,4,5,6])
    create_file(ch) 
    mergeResults(create_file.out.x.collect())
}

Merge Large Number of CSV/TSV files¶

// Merge results in chunks of size params.merge_batch_size
process chunkResults {
    conda "$TOOL_FOLDER/conda_env.yml"

    input:
    path to_merge, stageAs: './results/*'  // A directory of files, "results/*"

    output:
    path "batched_results.tsv"

    """
    python $TOOL_FOLDER/tsv_merger.py // Your Merge Script Here
    """
}

// Use a separate process to merge all the batched results
process mergeResults {
    conda "$TOOL_FOLDER/conda_env.yml"

    input:
    path to_merge, stageAs: './results/batched_results*.tsv' // Will automatically number inputs to avoid name collisions
    // Note to_merge can also be replcaed by an single path (e.g., path 'test_results.csv', stageAs: './results/batched_results*.tsv')

    output:
    path 'merged_results.tsv'

    """
    python $TOOL_FOLDER/tsv_merger.py // Your Merge Script Here
    """
}

workflow {
    // Perform Computation
    results = doSomething()

    // Split outputs into chunks and merge
    chunked_results = chunkResults(results.buffer(size: params.merge_batch_size, remainder: true))
    // Collect all the batched results and merge them at the end
    merged_results = mergeResults(chunked_results.collect())
}

Git¶

Cleaning out already merged branches¶

git branch --merged | grep -v "\*" | xargs -n 1 git branch -d

Updating all submodules to the latest commit¶

git submodule update --remote --merge

Command Line¶

Stop all running python tasks¶

ps -u $(whoami) | grep '[p]ython' | awk '{print $1}' | xargs kill -9

For a dry run, use:

ps -u $(whoami) | grep '[p]ython' | awk '{print $1}'

Adding users¶

Add the user:

sudo useradd -m -s /bin/bash <username>

Create a passowrd for the user:

sudo passwd <username>

Add the user to the sudo group (if applicable)

sudo usermod -aG sudo <username>

Add user to any other group

sudo usermod -aG <groupname> <username>

Make sure folder permissions are correct

chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys

Docker¶

Start an interactive terminal in a container

docker exec -it <container_name_or_id> /bin/bash

Share a docker volume in different images in a docker-compose cluster

services:
  servie1:
    image: <image_name>
    volumes:
      - <name_of_docker_volume>: <path_of_the_image>
...
# In all images where the docker volume wants to be shared the volume should be added
volumes:
    <name_of_docker_volume>

Mount a host file system in different images in a docker-compose cluster

# In all images where the file system needs to be mount
services:
  servie1:
    image: <image_name>
    volumes:
    - <host_local_path>:<docker_path>

Killing all docker containers

docker kill $(docker ps -q)

Cleaning Up Images/Containers

docker system prune

SLURM¶

Since we utilize SLURM internally here are some things that help us figure out whats going on with the clusters:

Show allocations

scontrol show node

Last update: September 21, 2024 00:05:03