grosse #1

Merged
grosse merged 4 commits from grosse into master 2024-05-14 09:13:18 +00:00
2 changed files with 22 additions and 6 deletions
Showing only changes of commit 49afe43ad7

@@ -1,7 +1,7 @@
# Training Environment
This documentation is for advanced users who are familiar with the following tools: git, Python/R, CUDA, PyTorch/TensorFlow, and basic container concepts.
![repos](./res/training.svg)
## Overview
Two worker agents are available with:
- 12 physical CPUs
@@ -39,7 +39,7 @@ An example script can be found here:
https://git.sandbox.iuk.hdm-stuttgart.de/grosse/test-ci
1. Create a new file `.woodpecker.yml` in your repository (or a different file name, if configured in the repository settings above)
2. The content can look like the following:
```
@@ -62,9 +62,7 @@ Generally, the pipeline is based on different steps, and in each step, another c
3. Commit and push
4. Check the current state of your pipelines on the [overview site](https://ci.sandbox.iuk.hdm-stuttgart.de/repos)
### Exporting trained model
We provide disposable internal storage with a retention period of 3 months.
@@ -91,4 +89,18 @@ which returns a json with the download url of your uploaded file.
```
{"PublicUrl":"https://storage.sandbox.iuk.hdm-stuttgart.de/upload/49676006-94e4-4da6-be3f-466u786768979/mymodel.keras","Size":97865925,"Expiration":"2024-03-30T00:00:00Z"}
```
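If you want to use this response in a script (for example to print the download link at the end of a pipeline step), a minimal Python sketch could parse it as follows. It assumes you have captured the JSON text as a string (how the upload is performed is not shown here) and that `Size` is given in bytes; `parse_upload_response` and `is_expired` are hypothetical helper names, not part of the storage service.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Example response as shown above; in practice, capture this from the upload step's output.
EXAMPLE_RESPONSE = (
    '{"PublicUrl":"https://storage.sandbox.iuk.hdm-stuttgart.de/upload/'
    '49676006-94e4-4da6-be3f-466u786768979/mymodel.keras",'
    '"Size":97865925,"Expiration":"2024-03-30T00:00:00Z"}'
)


def parse_upload_response(text: str) -> dict:
    """Hypothetical helper: extract download URL, size and expiration from the JSON reply."""
    data = json.loads(text)
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset for older interpreters.
    expiration = datetime.fromisoformat(data["Expiration"].replace("Z", "+00:00"))
    return {
        "url": data["PublicUrl"],
        "size_mib": data["Size"] / (1024 * 1024),  # assumption: "Size" is in bytes
        "expiration": expiration,
    }


def is_expired(info: dict, now: Optional[datetime] = None) -> bool:
    """Return True once the storage expiration time has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= info["expiration"]


if __name__ == "__main__":
    info = parse_upload_response(EXAMPLE_RESPONSE)
    print(f"Download: {info['url']}")
    print(f"Size: {info['size_mib']:.1f} MiB, expires {info['expiration'].isoformat()}")
    print("expired" if is_expired(info) else "still available")
```

Checking `Expiration` is mainly useful as a reminder to download the model before the storage period ends.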
## Troubleshooting
- The first time an external container image is pulled, it can take quite a while depending on its size, since some registries (like Docker Hub) limit the download speed. The Sandbox git also supports hosting container images...
- Log at a reasonable frequency during training so that your output does not flood the pipeline logs.
- Training exits after 60 minutes: increase the maximum duration in the CI repository settings.
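One way to keep training logs readable is to log only every N steps. The following is a minimal Python sketch, assuming your training script is written in Python and its standard output/error ends up in the pipeline step log, as is usual for CI runners. `should_log` and `train` are hypothetical names, and the loss is a placeholder for your real training loop.

```python
import logging

# logging writes to stderr by default and flushes after every record,
# so lines appear promptly in the pipeline log.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("train")


def should_log(step: int, total_steps: int, every_n: int = 100) -> bool:
    """Log on every `every_n`-th step and always on the final step."""
    return step % every_n == 0 or step == total_steps - 1


def train(total_steps: int = 1000, every_n: int = 100) -> int:
    """Dummy training loop standing in for a real one; returns the number of log lines written."""
    logged = 0
    loss_sum = 0.0
    for step in range(total_steps):
        loss = 1.0 / (step + 1)  # placeholder for the real loss of this step
        loss_sum += loss
        if should_log(step, total_steps, every_n):
            log.info("step %d/%d mean loss %.4f", step + 1, total_steps, loss_sum / (step + 1))
            logged += 1
    return logged


if __name__ == "__main__":
    # 1000 steps, logging every 100 steps: 11 log lines instead of 1000
    train(total_steps=1000, every_n=100)
```

Reporting a mean over the steps since the last log line, rather than every individual value, is a common alternative; the step interval is a trade-off between log size and how closely you can follow progress.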
## Useful Links
- [Sandbox GIT](https://git.sandbox.iuk.hdm-stuttgart.de/)
- [Sandbox CI](https://ci.sandbox.iuk.hdm-stuttgart.de)
- [Git](https://git-scm.com/docs/gittutorial)
- [Woodpecker Syntax](https://woodpecker-ci.org/docs/2.3/usage/workflow-syntax)
- [PyTorch](https://pytorch.org/docs/stable/index.html)
- [TensorFlow](https://www.tensorflow.org/versions/r2.15/api_docs/python/tf)
- [NVIDIA PyTorch Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
- [NVIDIA TensorFlow Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow)