2024-05-14 09:13:18 +00:00
3 changed files with 91 additions and 1 deletions
--- a/src/sandbox/res/sandbox-ci-repos.png
+++ b/src/sandbox/res/sandbox-ci-repos.png
--- a/src/sandbox/res/sandbox-ci-settings.png
+++ b/src/sandbox/res/sandbox-ci-settings.png
--- a/src/sandbox/training.md
+++ b/src/sandbox/training.md
@@ -1,4 +1,94 @@
 # Training Environment 

-Currently under heavy development, coming soon....stayed tuned!
+This documentation is intended for advanced users who are familiar with git, Python or R, CUDA, PyTorch or TensorFlow, and who have basic knowledge of containers.

+## Overview
+Two worker agents are available, each with:
+- 12 CPUs
+- 40 GB memory
+- 20 GB NVIDIA GPU memory
+- 100 GB HDD disk space
+
+Only two pipelines can run in parallel, which ensures that each pipeline gets the hardware resources listed above. Additional jobs are queued and started in first-in, first-out (FIFO) order.
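To illustrate how queued jobs are scheduled, the following Python sketch simulates two workers taking jobs from a FIFO queue. It assumes that all jobs are submitted at the same time and that each job occupies one worker for its whole duration; the job durations are made up, and this is not the CI server's actual scheduler code.

```python
import heapq


def simulate_fifo(durations, workers=2):
    """Return (start, end) times for jobs submitted at t=0, in queue order."""
    # Min-heap holding the time at which each worker becomes free
    free_at = [0] * workers
    heapq.heapify(free_at)
    schedule = []
    for duration in durations:
        # The next job in the queue goes to the earliest available worker
        start = heapq.heappop(free_at)
        end = start + duration
        heapq.heappush(free_at, end)
        schedule.append((start, end))
    return schedule


if __name__ == "__main__":
    # Four hypothetical training jobs, durations in minutes
    for i, (start, end) in enumerate(simulate_fifo([60, 30, 45, 10])):
        print(f"job {i}: starts at {start} min, ends at {end} min")
```

With these durations, the third job has to wait 30 minutes and the fourth 60 minutes before a worker becomes free.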
+
+
+## Development
+
+### Git
+Create a new Git repository at https://git.sandbox.iuk.hdm-stuttgart.de/ and push your latest code to it.
+
+Repositories can be private or public, depending on your use case.
+
+
+### CI
+Connect your newly created repository to the CI server at https://ci.sandbox.iuk.hdm-stuttgart.de/:
+1. After logging in, click "+ Add repository".
+![repos](./res/sandbox-ci-repos.png)
+2. Enable the repository you want to build.
+
+3. Open the repository [overview page](https://ci.sandbox.iuk.hdm-stuttgart.de/repos) and select the repository you enabled.
+4. Open the settings by clicking the settings icon.
+![settings](./res/sandbox-ci-settings.png)
+5. Set a reasonable timeout in minutes (e.g. 360 minutes for 6 hours) so that a training run that crashes or hangs is eventually stopped.
+6. Optionally, add further settings such as secrets or container registries; see the official [documentation](https://woodpecker-ci.org/docs/usage/project-settings) for details.
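Secrets configured in the repository settings are passed to pipeline steps as environment variables (see the Woodpecker documentation for how to reference them in the pipeline file). A training script could read such a secret as sketched below. The secret name `MY_API_TOKEN` and the helper `read_secret` are hypothetical examples, not part of the sandbox setup.

```python
import os


def read_secret(name):
    """Return the value of a secret exposed to the step as an environment variable."""
    value = os.environ.get(name)
    if not value:
        # Fail early with a clear message instead of failing later during training
        raise RuntimeError(
            f"environment variable {name} is not set; "
            "check the secret configuration of the repository"
        )
    return value
```

Usage: `token = read_secret("MY_API_TOKEN")`. Avoid printing the value itself, so that it does not end up in the pipeline logs.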
+
+
+### Pipeline File
+An example repository is available at:
+
+https://git.sandbox.iuk.hdm-stuttgart.de/grosse/test-ci
+
+
+1. Create a file named `.woodpecker.yml` in the root of your repository (or at a different path, if you configured one in the repository settings above).
+2. Its content could look like the following:
+
+```yaml
+steps:
+  "train":
+    image: nvcr.io/nvidia/tensorflow:23.10-tf2-py3
+    commands:
+      - echo "starting python script"
+      - python run.py
+  "compress and upload":
+    image: alpine:3
+    commands:
+      - apk --no-cache add zip curl
+      - zip mymodel.zip mymodel.keras
+      - curl -F fileUpload=@mymodel.zip https://share.storage.sandbox.iuk.hdm-stuttgart.de/upload
+```
+See the official [documentation](https://woodpecker-ci.org/docs/usage/workflow-syntax) for the syntax.
+
+A pipeline consists of one or more steps, and each step can run in a different container image. In the example above, the first step uses an official NVIDIA TensorFlow container with Python 3 to run the training script. The second step compresses the model and uploads it to the temporary sandbox storage.
+3. Commit and push the file.
+4. Follow the state of your pipelines on the [overview page](https://ci.sandbox.iuk.hdm-stuttgart.de/repos).
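The compress step of the example pipeline runs `zip` in a separate Alpine container. If you prefer to do this inside the training container, for example at the end of `run.py`, Python's standard `zipfile` module can create the same archive. This is a sketch; the helper name `compress_model` is made up, and the file names simply follow the example pipeline.

```python
import zipfile
from pathlib import Path


def compress_model(model_path, archive_path):
    """Pack a single model file into a deflate-compressed ZIP archive."""
    model_path = Path(model_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store the file under its bare name, like `zip mymodel.zip mymodel.keras`
        zf.write(model_path, arcname=model_path.name)
    return Path(archive_path)
```

Usage: `compress_model("mymodel.keras", "mymodel.zip")`, after which the upload can proceed as in the pipeline's second step.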
+
+Hints: 
+- The first time an external container image is pulled, the download can take quite a while depending on the image size, since some registries (such as Docker Hub) limit download rates. The sandbox Git server also supports hosting container images.
+- Log training progress at a reasonable rate (e.g. once per epoch instead of once per batch) so that the pipeline logs stay readable.
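One way to keep the logs readable is to print progress only every N steps and to flush each line immediately. When Python's standard output is not attached to a terminal, as in a CI container, it is block-buffered, so messages can otherwise appear late (setting `PYTHONUNBUFFERED=1` has a similar effect). The sketch below assumes a custom training loop; the class name and the `log_every` value are illustrative. With Keras' `model.fit`, passing `verbose=2` prints one line per epoch instead of a progress bar.

```python
import time


class ProgressLogger:
    """Print training progress at most once every `log_every` steps."""

    def __init__(self, total_steps, log_every=100):
        self.total_steps = total_steps
        self.log_every = log_every
        self.start = time.monotonic()
        self.lines_written = 0

    def step(self, step, loss):
        # Skip most steps, but always log the final one
        if step % self.log_every != 0 and step != self.total_steps:
            return
        elapsed = time.monotonic() - self.start
        # flush=True makes the line show up in the CI log right away
        print(f"step {step}/{self.total_steps}  loss={loss:.4f}  "
              f"elapsed={elapsed:.0f}s", flush=True)
        self.lines_written += 1


if __name__ == "__main__":
    logger = ProgressLogger(total_steps=1000, log_every=250)
    for step in range(1, 1001):
        fake_loss = 1.0 / step  # stand-in for a real training loss
        logger.step(step, fake_loss)
```

The demo loop writes four lines (steps 250, 500, 750 and 1000) instead of one thousand.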
+
+### Exporting the trained model
+We provide disposable internal storage that keeps uploaded files for three months.
+To upload a file, you can either use a simple curl command, `curl -F fileUpload=@mymodel.zip https://share.storage.sandbox.iuk.hdm-stuttgart.de/upload`, or a short Python script:
+
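+```python
+import requests
+
+# Upload endpoint of the temporary sandbox storage
+UPLOAD_URL = 'https://share.storage.sandbox.iuk.hdm-stuttgart.de/upload'
+
+print("uploading file")
+# Open the model in binary mode; the form field must be named "fileUpload"
+with open('mymodel.keras', 'rb') as f:
+    files = {'fileUpload': ('mymodel.keras', f, 'application/octet-stream')}
+    response = requests.post(UPLOAD_URL, files=files)
+
+# Raise an error (and fail the pipeline step) if the upload was rejected
+response.raise_for_status()
+print(response.status_code, response.text)
+```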
+
+The service responds with JSON such as:
+
+```json
+{"PublicUrl":"https://storage.sandbox.iuk.hdm-stuttgart.de/upload/49676006-94e4-4da6-be3f-466u786768979/mymodel.keras","Size":97865925,"Expiration":"2024-03-30T00:00:00Z"}
+
+```
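In a pipeline you will usually want to keep the download link and know when it expires. The sketch below parses a response of the form shown above using only the standard library; the field names `PublicUrl`, `Size` and `Expiration` are taken from the example response, and the helper name `parse_upload_response` is made up.

```python
import json
from datetime import datetime


def parse_upload_response(text):
    """Extract the download URL, size in bytes and expiry time from the JSON reply."""
    data = json.loads(text)
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11
    expires = datetime.fromisoformat(data["Expiration"].replace("Z", "+00:00"))
    return data["PublicUrl"], data["Size"], expires


if __name__ == "__main__":
    example = ('{"PublicUrl":"https://storage.sandbox.iuk.hdm-stuttgart.de/upload/'
               '49676006-94e4-4da6-be3f-466u786768979/mymodel.keras",'
               '"Size":97865925,"Expiration":"2024-03-30T00:00:00Z"}')
    url, size, expires = parse_upload_response(example)
    print(f"{url} ({size / 1e6:.1f} MB), available until {expires:%Y-%m-%d}")
```

Combined with the upload script, `url, size, expires = parse_upload_response(response.text)` gives you the link to store or print at the end of the pipeline.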