From 49afe43ad7ee08c180ff2f2f2fb971202485d1c7 Mon Sep 17 00:00:00 2001
From: Malte Grosse
Date: Tue, 14 May 2024 18:09:26 +0900
Subject: [PATCH] changed training

---
 src/sandbox/res/training.svg |  4 ++++
 src/sandbox/training.md      | 24 ++++++++++++++++++------
 2 files changed, 22 insertions(+), 6 deletions(-)
 create mode 100644 src/sandbox/res/training.svg

diff --git a/src/sandbox/res/training.svg b/src/sandbox/res/training.svg
new file mode 100644
index 0000000..9c70de9
--- /dev/null
+++ b/src/sandbox/res/training.svg
@@ -0,0 +1,4 @@
+
+
+
+SandboxScientific UserAccess SandboxLogin to Gitcreate Repositoryclone repositorycreate model & set parametercreate woodpecker.ymlgit commit & pushci/cd add new repositoryWoodpecker start trainingafter training upload model to datapoolprovide linkDatapoolaccess modelShut Down ServerGIT
\ No newline at end of file
diff --git a/src/sandbox/training.md b/src/sandbox/training.md
index af7927e..39f1692 100644
--- a/src/sandbox/training.md
+++ b/src/sandbox/training.md
@@ -1,7 +1,7 @@
 # Training Environment
 
 This documentation is for advanced users which are aware of following tools: git, python/R, cuda, pytorch/tensorflow and basic container knowledge.
-
+![repos](./res/training.svg)
 ## Overview
 Available are two worker agents with
 - 12 physical CPUs
@@ -39,7 +39,7 @@ An example script can be found here:
 
 https://git.sandbox.iuk.hdm-stuttgart.de/grosse/test-ci
 
-1. Create a new file in your repository `.woodpecker.yml` (of different regarding repository settings above)
+1. Create a new file in your repository `.woodpecker.yml` (or different regarding repository settings above)
 2. The content can look like following:
 
 ```
@@ -62,9 +62,7 @@ Generally, the pipeline is based on different steps, and in each step, another c
 
 3. Commit and push
 4. See current state of the pipelines at the [overview site](https://ci.sandbox.iuk.hdm-stuttgart.de/repos)
-Hints:
-- The first time an external container is pulled, depending on the size, container images can take quite a while as different organization (like dockerhub) limit the download speed. The Sandbox git also supports hosting container images...
-- Choose a proper way to output some reasonable logs during your training, so it wont spam the logs too heavily
+
 ### Exporting trained model
 We provide a 3-months disposal internal storage.
 
@@ -91,4 +89,18 @@ which returns a json with the download url of your uploaded file.
 
 ```
 {"PublicUrl":"https://storage.sandbox.iuk.hdm-stuttgart.de/upload/49676006-94e4-4da6-be3f-466u786768979/mymodel.keras","Size":97865925,"Expiration":"2024-03-30T00:00:00Z"}
-```
\ No newline at end of file
+```
+## Troubleshooting:
+- The first time an external container is pulled, depending on the size, container images can take quite a while as different organization (like dockerhub) limit the download speed. The Sandbox git also supports hosting container images...
+- Choose a proper way to output some reasonable logs during your training, so it wont spam the logs too heavily
+- training exists after 60 minutes: increase maximum duration in the ci repository settings
+
+## Useful Links
+- [Sandbox GIT](https://git.sandbox.iuk.hdm-stuttgart.de/)
+- [Sandbox CI](https://ci.sandbox.iuk.hdm-stuttgart.de)
+- [Git](https://git-scm.com/docs/gittutorial)
+- [Woodpecker Syntax](https://woodpecker-ci.org/docs/2.3/usage/workflow-syntax)
+- [PyTorch](https://pytorch.org/docs/stable/index.html)
+- [TensorFlow](https://www.tensorflow.org/versions/r2.15/api_docs/python/tf)
+- [NVIDIA PyTorch Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
+- [NVIDIA Tensorflow Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow)
\ No newline at end of file
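
Note on the patched documentation: the upload step in `training.md` returns a JSON object with the fields `PublicUrl`, `Size`, and `Expiration`, as shown in the example response. A minimal Python sketch of how a training script might read that response follows. The function name `parse_upload_response` and the returned dictionary keys are hypothetical, the sample URL is a placeholder, and the sketch assumes only the three fields visible in the documented example.

```python
import json
from datetime import datetime


def parse_upload_response(body: str) -> dict:
    """Extract download URL, size, and expiry from the storage response.

    Assumes the response is a JSON object with the keys "PublicUrl",
    "Size" (bytes), and "Expiration" (ISO 8601 timestamp ending in "Z"),
    as in the documented example. Other fields are ignored.
    """
    data = json.loads(body)
    # Replace the trailing "Z" so fromisoformat also works on Python < 3.11.
    expires = datetime.fromisoformat(data["Expiration"].replace("Z", "+00:00"))
    return {
        "url": data["PublicUrl"],
        "size_bytes": data["Size"],
        "expires": expires,
    }


# Placeholder response in the same shape as the documented example.
sample = (
    '{"PublicUrl":"https://storage.example.invalid/upload/abc/mymodel.keras",'
    '"Size":97865925,"Expiration":"2024-03-30T00:00:00Z"}'
)
info = parse_upload_response(sample)
print(info["url"], info["size_bytes"], info["expires"].date())
```

Logging the parsed URL and expiry date at the end of a pipeline step keeps the download link easy to find in the CI logs without printing the full response.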