
Coding/Development Resources

This page collects the little tidbits our lab has found useful for getting work done — lots of small random things we always got hung up on, compiled in one place.

VM Linux Management

Allocating Additional Disk Space

  • Allocate additional space in VM Hypervisor
  • Follow this tutorial to repartition the VM disk

Changing Linux Hostname in Ubuntu

hostnamectl set-hostname new-hostname


Profiling and creating a visualization

The combination of cProfile + gprof2dot + Graphviz worked well for us:

python -m cProfile -o myLog.profile <your_script.py>
gprof2dot -f pstats myLog.profile -o myLog.dot
dot -Tsvg -o callingGraph.svg myLog.dot
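If you would rather collect the profile from inside Python (for example, around one specific function), the standard-library cProfile and pstats modules can produce the same myLog.profile file for gprof2dot. `slow_sum` here is just a stand-in for your own code:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately naive loop, to give the profiler something to measure
    total = 0
    for i in range(n):
        total += i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Write stats in the format gprof2dot reads with -f pstats
profiler.dump_stats("myLog.profile")

# Or print the top entries sorted by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

From there, the gprof2dot and dot commands above turn myLog.profile into an SVG call graph.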

Creating chunks of a big list

# Yield successive n-sized chunks from l
def divide_chunks(l, n):
    # step through l in strides of n
    for i in range(0, len(l), n):
        yield l[i:i + n]
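For example, splitting the numbers 0–9 into chunks of three (the function is repeated here so the snippet runs on its own):

```python
def divide_chunks(l, n):
    # Yield successive n-sized chunks from l
    for i in range(0, len(l), n):
        yield l[i:i + n]

chunks = list(divide_chunks(list(range(10)), 3))
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

The last chunk is simply whatever remains, so it can be shorter than n.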

Building your Python package

See the example pyproject.toml file, data/pyproject_template.toml, for configuring the build process with poetry.

python3 -m build
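For reference, a minimal poetry-based pyproject.toml might look like the sketch below — all field values are placeholders, and data/pyproject_template.toml remains the authoritative template:

```toml
[tool.poetry]
name = "your-package"
version = "0.1.0"
description = "Short description of the package"
authors = ["Your Name <you@example.com>"]

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```

With this in place, `python3 -m build` produces the sdist and wheel under dist/.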

Uploading a package to PyPI

It is recommended to test everything on TestPyPI first.

  1. Create an account on PyPI and/or TestPyPI
  2. Create your API token on PyPI or TestPyPI using a 2FA application. Save the token in a .pypirc file in your home directory (~). This is the pypi config file: data/.pypirc
  3. Install twine:
python3 -m pip install --upgrade twine
  4. Upload to TestPyPI or PyPI:

python3 -m twine upload --repository testpypi dist/*

python3 -m twine upload --repository pypi dist/*

4.1. Install from TestPyPI:

python3 -m pip install --index-url https://test.pypi.org/simple/ your-package

4.2. Install from PyPI:

python3 -m pip install your-package


Flags for the Command Line

In Nextflow, turn a boolean parameter into an optional command-line flag:

def flag = params.ms2_flag == true ? "--ms2_flag" : ''

Enable docker in NextFlow

Add the following lines in the nextflow.config file:

process.container = 'nextflow/examples:latest'
docker.enabled = true

Different Docker image for each process:

process foo {
  container 'image_name_1'

  script:
  """
  do this
  """
}

process bar {
  container 'image_name_2'

  script:
  """
  do that
  """
}

Execute a task over n files in parallel

Channel.fromPath( '<path>*.<file_format_to_process>' ) // The path is usually read from a command-line argument

Mitigate Input File Name Collisions

process create_file {
    input:
    each dummy

    output:
    path 'test.txt', emit: x

    script:
    """
    touch test.txt
    """
}

process mergeResults {
    conda "$TOOL_FOLDER/conda_env.yml"

    input:
    path tests, stageAs: "./test/test*.csv"

    output:
    path 'all_tests.csv'

    script:
    """
    cat test/*.csv > all_tests.csv
    """
}

workflow {
    // Run create_file once per channel element
    ch = Channel.from([1,2,3,4,5,6])
    create_file(ch)

    // stageAs renames the staged copies, so the identically named outputs don't collide
    mergeResults(create_file.out.x.collect())
}

Merge Large Number of CSV/TSV files

// Merge results in chunks of size params.merge_batch_size
process chunkResults {
    conda "$TOOL_FOLDER/conda_env.yml"

    input:
    path to_merge, stageAs: './results/*'  // A directory of files, "results/*"

    output:
    path "batched_results.tsv"

    script:
    """
    python $TOOL_FOLDER/ # Your merge script here
    """
}

// Use a separate process to merge all the batched results
process mergeResults {
    conda "$TOOL_FOLDER/conda_env.yml"

    input:
    path to_merge, stageAs: './results/batched_results*.tsv' // Inputs are numbered automatically to avoid name collisions
    // Note: to_merge can also be replaced by a single path (e.g., path 'test_results.csv', stageAs: './results/batched_results*.tsv')

    output:
    path 'merged_results.tsv'

    script:
    """
    python $TOOL_FOLDER/ # Your merge script here
    """
}

workflow {
    // Perform Computation
    results = doSomething()

    // Split outputs into chunks and merge
    chunked_results = chunkResults(results.buffer(size: params.merge_batch_size, remainder: true))
    // Collect all the batched results and merge them at the end
    merged_results = mergeResults(chunked_results.collect())
}
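The placeholder merge script invoked by the processes above could be as simple as the following sketch, which concatenates TSV files while keeping only the first header. The function and file names are illustrative, not part of the workflow above:

```python
import csv
import glob
import sys

def merge_tsv(input_glob, output_path):
    """Concatenate TSV files matching input_glob, keeping one header row."""
    header_written = False
    with open(output_path, "w", newline="") as out_f:
        writer = csv.writer(out_f, delimiter="\t")
        for path in sorted(glob.glob(input_glob)):
            with open(path, newline="") as in_f:
                reader = csv.reader(in_f, delimiter="\t")
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)

if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python merge_tsv.py './results/*.tsv' merged_results.tsv
    merge_tsv(sys.argv[1], sys.argv[2])
```

For very wide tables or typed columns, pandas' `concat` would work too, but the stdlib version keeps the conda environment minimal.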


Cleaning out already merged branches

git branch --merged | grep -v "\*" | xargs -n 1 git branch -d

Updating all submodules to the latest commit

git submodule update --remote --merge

Command Line

Stop all running python tasks

ps -u $(whoami) | grep '[p]ython' | awk '{print $1}' | xargs kill -9

For a dry run, use:

ps -u $(whoami) | grep '[p]ython' | awk '{print $1}'

Adding users

Add the user:

sudo useradd -m -s /bin/bash <username>

Create a password for the user:

sudo passwd <username>

Add the user to the sudo group (if applicable)

sudo usermod -aG sudo <username>

Add user to any other group

sudo usermod -aG <groupname> <username>


Start an interactive terminal in a container

docker exec -it <container_name_or_id> /bin/bash

Share a docker volume between services in a docker-compose cluster

services:
  <service_name>:
    image: <image_name>
    volumes:
      - <name_of_docker_volume>:<path_in_container>
# Add the volume under every service that should share it,
# and declare it once under the top-level volumes: key
volumes:
  <name_of_docker_volume>:

Mount a host file system in different services in a docker-compose cluster

# Repeat under every service that needs the host path mounted
services:
  <service_name>:
    image: <image_name>
    volumes:
      - <host_local_path>:<docker_path>

Killing all docker containers

docker kill $(docker ps -q)

Cleaning Up Images/Containers

docker system prune


Since we use SLURM internally, here are some commands that help us figure out what's going on with the clusters:

Show allocations

scontrol show node

Last update: November 2, 2023 23:49:40