dev_environment #2
			
				
			
		
		
		
	
										
											Binary file not shown.
										
									
								
							| 
		 Before Width: | Height: | Size: 20 KiB After Width: | Height: | Size: 20 KiB  | 
										
											Binary file not shown.
										
									
								
							| 
		 Before Width: | Height: | Size: 10 KiB After Width: | Height: | Size: 10 KiB  | 
										
											
												File diff suppressed because one or more lines are too long
											
										
									
								
							| 
		 Before Width: | Height: | Size: 38 KiB After Width: | Height: | Size: 38 KiB  | 
| 
						 | 
				
			
			@ -1,4 +1,106 @@
 | 
			
		|||
# Training Environment 
 | 
			
		||||
 | 
			
		||||
Currently under heavy development, coming soon....stayed tuned!
 | 
			
		||||
This documentation is for advanced users which are aware of following tools: git, python/R, cuda, pytorch/tensorflow and basic container knowledge.
 | 
			
		||||

 | 
			
		||||
## Overview
 | 
			
		||||
Available are two worker agents with 
 | 
			
		||||
- 12 physical CPUs 
 | 
			
		||||
- 40 GB memory
 | 
			
		||||
- 20 GB Nvidia GPU memory
 | 
			
		||||
- 100 GB Hdd Diskspace
 | 
			
		||||
 | 
			
		||||
Only two pipelines can run in parallel to ensure having the promised hardware resources. If more jobs occur, they will be stored in a queue and released after the fifo principle. Storage is not persistent - every build/training job needs to saved somewhere external.
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
## Development
 | 
			
		||||
 | 
			
		||||
### Git
 | 
			
		||||
Create a new git repository and commit your latest code here: https://git.sandbox.iuk.hdm-stuttgart.de/
 | 
			
		||||
 | 
			
		||||
Repositories can be private or public - depends on your use case.
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
### CI
 | 
			
		||||
Connect your newly created repository here: https://ci.sandbox.iuk.hdm-stuttgart.de/ 
 | 
			
		||||
1. After login, click on "+ Add repository"
 | 
			
		||||

 | 
			
		||||
2. Enable the specific repository
 | 
			
		||||
 | 
			
		||||
3. Go to the repositories [overview site](https://ci.sandbox.iuk.hdm-stuttgart.de/repos) and select your enabled repository
 | 
			
		||||
4. Go to settings (clicking the settings icon)
 | 
			
		||||

 | 
			
		||||
5. Set a reasonable timeout in minutes (e.g. 360 minutes for 6hours) if some training crashes/hangs
 | 
			
		||||
6. Add additional settings like secrets or container registries, see the official [documentation](https://woodpecker-ci.org/docs/usage/project-settings) for additional settings
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
### Pipeline File
 | 
			
		||||
An example script can be found here:
 | 
			
		||||
 | 
			
		||||
 https://git.sandbox.iuk.hdm-stuttgart.de/grosse/test-ci
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
1. Create a new file in your repository `.woodpecker.yml` (or different regarding repository settings above)
 | 
			
		||||
2. The content can look like following:
 | 
			
		||||
 | 
			
		||||
```
 | 
			
		||||
steps:
 | 
			
		||||
  "train":
 | 
			
		||||
    image: nvcr.io/nvidia/tensorflow:23.10-tf2-py3
 | 
			
		||||
    commands:
 | 
			
		||||
      - echo "starting python script"
 | 
			
		||||
      - python run.py
 | 
			
		||||
  "compress and upload":
 | 
			
		||||
    image: alpine:3
 | 
			
		||||
    commands:
 | 
			
		||||
      - apk --no-cache add zip curl
 | 
			
		||||
      - zip mymodel.zip mymodel.keras
 | 
			
		||||
      - curl -F fileUpload=@mymodel.zip https://share.storage.sandbox.iuk.hdm-stuttgart.de/upload
 | 
			
		||||
```
 | 
			
		||||
See the official [documentation](https://woodpecker-ci.org/docs/usage/workflow-syntax) for the syntax.
 | 
			
		||||
 | 
			
		||||
Generally, the pipeline is based on different steps, and in each step, another container environment can be chosen. In the example above, first an official tensorflow container with python 3 is used to run the training python script. In the second step, the model gets compressed and pushed on the temp. sandbox storage.
 | 
			
		||||
3. Commit and push
 | 
			
		||||
4. See current state of the pipelines at the [overview site](https://ci.sandbox.iuk.hdm-stuttgart.de/repos)
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
### Exporting trained model
 | 
			
		||||
We provide a 3-months disposal internal storage.
 | 
			
		||||
You can either use the a simple curl command `curl -F fileUpload=@mymodel.zip https://share.storage.sandbox.iuk.hdm-stuttgart.de/upload` to upload a file or a simple python script 
 | 
			
		||||
 | 
			
		||||
```
 | 
			
		||||
import requests
 | 
			
		||||
import os
 | 
			
		||||
 | 
			
		||||
myurl = 'https://share.storage.sandbox.iuk.hdm-stuttgart.de/upload'
 | 
			
		||||
 | 
			
		||||
print("uploading file")
 | 
			
		||||
files = {
 | 
			
		||||
    'fileUpload':('mymodel.keras',  open('mymodel.keras', 'rb'),'application/octet-stream')
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
response = requests.post(myurl, files=files)
 | 
			
		||||
print(response,response.text)
 | 
			
		||||
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
which returns a json with the download url of your uploaded file.
 | 
			
		||||
 | 
			
		||||
```
 | 
			
		||||
{"PublicUrl":"https://storage.sandbox.iuk.hdm-stuttgart.de/upload/49676006-94e4-4da6-be3f-466u786768979/mymodel.keras","Size":97865925,"Expiration":"2024-03-30T00:00:00Z"}
 | 
			
		||||
 | 
			
		||||
```
 | 
			
		||||
## Troubleshooting: 
 | 
			
		||||
- The first time an external container is pulled, depending on the size, container images can take quite a while as different organization (like dockerhub) limit the download speed. The Sandbox git also supports hosting container images...
 | 
			
		||||
- Choose a proper way to output some reasonable logs during your training, so it wont spam the logs too heavily
 | 
			
		||||
- training exists after 60 minutes: increase maximum duration in the ci repository settings
 | 
			
		||||
 | 
			
		||||
## Useful Links
 | 
			
		||||
- [Sandbox GIT](https://git.sandbox.iuk.hdm-stuttgart.de/)
 | 
			
		||||
- [Sandbox CI](https://ci.sandbox.iuk.hdm-stuttgart.de)
 | 
			
		||||
- [Git](https://git-scm.com/docs/gittutorial)
 | 
			
		||||
- [Woodpecker Syntax](https://woodpecker-ci.org/docs/2.3/usage/workflow-syntax)
 | 
			
		||||
- [PyTorch](https://pytorch.org/docs/stable/index.html)
 | 
			
		||||
- [TensorFlow](https://www.tensorflow.org/versions/r2.15/api_docs/python/tf)
 | 
			
		||||
- [NVIDIA PyTorch Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
 | 
			
		||||
- [NVIDIA Tensorflow Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow)
 | 
			
		||||
		Loading…
	
		Reference in New Issue