Backend for the Autocomplete platform, an AI-assisted coding platform.

Last update: Jan 31, 2022

Overview

Introduction

A custom predictor allows you to deploy your own prediction implementation, which is useful when the existing serving implementations don't fit your needs. If you are migrating from Cortex, the custom predictor works exactly the same way as PythonPredictor does in Cortex. Most PythonPredictors can be converted to a custom predictor by copying the code over and renaming a few variables.
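
To make the conversion concrete, here is a minimal sketch of the shape such a predictor tends to take. Everything in it is an assumption for illustration: the class name SentimentPredictor, the load and predict method names, and the keyword-based stand-in model are hypothetical and are not taken from this repository. The request and response shapes follow the TensorFlow V1 HTTP API convention (instances in, predictions out).

```python
from typing import Any, Dict, List


class SentimentPredictor:
    """Hypothetical predictor; names are illustrative, not the repository's."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.ready = False
        self.positive_words: set = set()

    def load(self) -> None:
        # A real predictor would load model weights from the mounted volume
        # here. This stand-in only builds a tiny keyword set.
        self.positive_words = {"good", "great", "love", "excellent"}
        self.ready = True

    def predict(self, request: Dict[str, Any]) -> Dict[str, List[str]]:
        # In Cortex, the equivalent logic would live in PythonPredictor.predict.
        if not self.ready:
            raise RuntimeError("call load() before predict()")
        predictions = []
        for text in request["instances"]:
            words = set(text.lower().split())
            label = "positive" if words & self.positive_words else "negative"
            predictions.append(label)
        return {"predictions": predictions}


predictor = SentimentPredictor("sentiment")
predictor.load()
print(predictor.predict({"instances": ["I love this editor"]}))
```

Keeping model loading in a separate load step means the expensive work happens once at startup rather than on every request.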

The custom predictor is packaged as a Docker container. It is recommended, but not required, to keep large model files outside of the container image itself and to load them from a storage volume. This example follows that pattern. You will need somewhere to publish your Docker image once it is built. This example uses Docker Hub, where storing public images is free and private images are cheap. Google Container Registry and other registries can also be used.

Make sure you use a GPU-enabled Docker image as a base, and that you enable GPU support when loading the model.
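
As a rough illustration of enabling GPU support at load time, the sketch below picks a device name before the model is loaded. It assumes the model is a PyTorch model, which this guide does not state; the helper name pick_device is hypothetical. It falls back to the CPU when PyTorch or a GPU is not available.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch sees a GPU, otherwise "cpu"."""
    try:
        import torch  # assumption: the model is a PyTorch model
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"loading model on: {device}")
# With PyTorch installed, a loaded model would then be moved with model.to(device).
```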

Getting Started

After installing kubectl and adding your CoreWeave Cloud access credentials, the following steps will deploy the Inference Service. Clone this repository, change into its folder, and run all commands from there. All of the files in it are used.

Sign up for a Docker Hub account, or use a different container registry if you already have one. The free plan works perfectly fine, but your container images will be accessible to anyone. This guide assumes a private registry that requires authentication. Once signed up, create a new repository. For the rest of the guide, we'll assume that the name of the new repository is gpt-6b.

Build the Docker image

Enter the custom-predictor directory, then build and push the Docker image. No modifications to any of the files are needed to follow along. The default Docker tag is latest. We strongly discourage using it, as container images are cached on the nodes and in other parts of the CoreWeave stack. Once you have pushed to a tag, do not push to that tag again. Below, we use simple versioning, with tag v1alpha1 for the first iteration of the image.
```
export DOCKER_USER=thotailtd
docker build -t $DOCKER_USER/gpt-6b:v1alpha1 .
docker push $DOCKER_USER/gpt-6b:v1alpha1
```
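
The tagging rules above can be enforced before pushing. The following is a minimal sketch, not part of this repository: the helper name image_ref and the already_pushed argument are hypothetical, and the caller is assumed to keep track of which tags were pushed before.

```python
import re

# Docker tags: up to 128 characters of letters, digits, "_", "." and "-",
# and they may not start with "." or "-".
_TAG_RE = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$")


def image_ref(user: str, repo: str, tag: str, already_pushed=()) -> str:
    """Build user/repo:tag, rejecting tags this guide advises against."""
    if tag == "latest":
        raise ValueError("avoid 'latest': nodes cache images by tag")
    if not _TAG_RE.match(tag):
        raise ValueError(f"invalid Docker tag: {tag!r}")
    if tag in already_pushed:
        raise ValueError(f"tag {tag!r} was already pushed; pick a new one")
    return f"{user}/{repo}:{tag}"


print(image_ref("thotailtd", "gpt-6b", "v1alpha1"))
```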

Set up repository access

Create a Secret with the Docker Hub credentials. The secret will be named docker-hub. This will be used by nodes to pull your private image. Refer to the Kubernetes Documentation for more details.
```
kubectl create secret docker-registry docker-hub --docker-server=https://index.docker.io/v1/ --docker-username=<your-name> --docker-password=<your-pword> --docker-email=<your-email>
```
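
For reference, a docker-registry Secret stores a Docker config JSON document under the .dockerconfigjson key. The sketch below rebuilds that document to show what the nodes receive; the function name and the example credentials are hypothetical placeholders, not real values.

```python
import base64
import json


def docker_config_json(server: str, username: str, password: str, email: str) -> str:
    """Return the JSON document held in the Secret's .dockerconfigjson key."""
    # The "auth" field is base64("username:password").
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    entry = {"username": username, "password": password, "email": email, "auth": auth}
    return json.dumps({"auths": {server: entry}})


# Placeholder credentials for illustration only.
print(docker_config_json("https://index.docker.io/v1/", "example-user",
                         "example-password", "user@example.com"))
```
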
Tell Kubernetes to use the newly created Secret by patching the ServiceAccount for your namespace to reference this Secret.
```
kubectl patch serviceaccounts default --patch "$(cat image-secrets-serviceaccount.patch.yaml)"
```
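
The contents of image-secrets-serviceaccount.patch.yaml are not reproduced in this guide. As an assumption about what such a patch needs to contain, the sketch below generates an equivalent document that adds the Secret to the ServiceAccount's imagePullSecrets list; kubectl patch accepts JSON as well as YAML. The function name is hypothetical.

```python
import json


def image_pull_secret_patch(secret_name: str) -> str:
    """Return a patch that references the Secret from a ServiceAccount."""
    return json.dumps({"imagePullSecrets": [{"name": secret_name}]})


print(image_pull_secret_patch("docker-hub"))
```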

Download the model

As we don't want to bundle the model in the Docker image for performance reasons, a storage volume needs to be set up and the pre-trained model downloaded to it. Storage volumes are allocated using a Kubernetes PersistentVolumeClaim. We'll also deploy a simple container that we can use to copy files to our newly created volume.

Apply the PersistentVolumeClaim and the manifest for the sleep container.

```
$ kubectl apply -f model-storage-pvc.yaml
persistentvolumeclaim/model-storage created
$ kubectl apply -f sleep-deployment.yaml
deployment.apps/sleep created
```
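
For orientation, a PersistentVolumeClaim manifest has roughly the shape generated below. This is a sketch under stated assumptions, not the contents of model-storage-pvc.yaml: the 30Gi size and the ReadWriteMany access mode are hypothetical values, and no storage class is set, so the cluster default would apply.

```python
import json


def pvc_manifest(name: str, size: str, access_mode: str = "ReadWriteMany") -> dict:
    """Build a minimal PersistentVolumeClaim manifest as a dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],  # assumed; pick what your storage supports
            "resources": {"requests": {"storage": size}},  # assumed size
        },
    }


print(json.dumps(pvc_manifest("model-storage", "30Gi"), indent=2))
```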

The volume is mounted to /models inside the sleep container. Download the pre-trained model locally, create a directory for it in the shared volume and upload it there. The name of the sleep Pod is assigned to a variable using kubectl. You can also get the name with kubectl get pods.

The model will be uploaded to Amazon S3 soon. For now, it has been uploaded directly to CoreWeave.

```
export SLEEP_POD=$(kubectl get pod -l "app.kubernetes.io/name=sleep" -o jsonpath='{.items[0].metadata.name}')
kubectl exec -it $SLEEP_POD -- sh -c 'mkdir /models/sentiment'
kubectl cp ./sleep_383500 $SLEEP_POD:/models/sentiment/
```

(Optional) Instead of copying the model from the local filesystem, the model can be downloaded from Amazon S3. The AWS CLI is already installed in the sleep container.

```
$ export SLEEP_POD=$(kubectl get pod -l "app.kubernetes.io/name=sleep" -o jsonpath='{.items[0].metadata.name}')
$ kubectl exec -it $SLEEP_POD -- sh
# aws configure
# mkdir /models/sentiment
# aws s3 sync s3://thot-ai-models /models/sentiment/
```

Deploy the model

Modify sentiment-inferenceservice.yaml to reference your Docker image.

Apply the resources. The same command both creates new resources and updates existing ones.

```
$ kubectl apply -f sentiment-inferenceservice.yaml
inferenceservice.serving.kubeflow.org/sentiment configured
```

List pods to see that the Predictor has launched successfully. This can take a minute; wait until the READY column shows 2/2.
```
$ kubectl get pods
NAME                                                           READY   STATUS    RESTARTS   AGE
sentiment-predictor-default-px8xk-deployment-85bb6787d7-h42xk  2/2     Running   0          34s
```
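
The readiness check can also be scripted. The following is a minimal sketch, assuming the default column layout of kubectl get pods shown above; the function name all_pods_ready is hypothetical.

```python
def all_pods_ready(kubectl_output: str) -> bool:
    """Return True when every listed pod reports n/n in the READY column."""
    lines = [ln for ln in kubectl_output.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        return False  # header only, or no output at all
    ready_col = lines[0].split().index("READY")
    for row in lines[1:]:
        ready, total = row.split()[ready_col].split("/")
        if int(ready) != int(total):
            return False
    return True


sample = """
NAME                                                           READY   STATUS    RESTARTS   AGE
sentiment-predictor-default-px8xk-deployment-85bb6787d7-h42xk  2/2     Running   0          34s
"""
print(all_pods_ready(sample))
```
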
If the predictor fails to initialize, look in the logs for clues: kubectl logs sentiment-predictor-default-px8xk-deployment-85bb6787d7-h42xk kfserving-container.
Once all the Pods are running, we can get the API endpoint for our model. The API endpoints follow the TensorFlow V1 HTTP API.
```
$ kubectl get inferenceservices
NAME        URL                                                                          READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
sentiment   http://sentiment.tenant-test.knative.chi.coreweave.com/v1/models/sentiment   True    100                                23h
```
The URL in the output is the public API URL for your newly deployed model. An HTTPS endpoint is also available; however, it bypasses any canary deployments. Retrieve it with kubectl get ksvc.

Run a test prediction on the URL from above. Remember to add the :predict suffix.

```
$ curl -d @sample.json http://sentiment.tenant-test.knative.chi.coreweave.com/v1/models/sentiment:predict
{"predictions": ["positive"]}
```
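
The same request can be built from Python using only the standard library. This is a sketch under assumptions: the contents of sample.json are not shown in this guide, so the payload below uses a hypothetical instances list in the TensorFlow V1 style, and the function name build_predict_request is made up for illustration. The request is constructed but not sent.

```python
import json
import urllib.request


def build_predict_request(model_url: str, instances: list) -> urllib.request.Request:
    """Build a POST request against the model's :predict endpoint."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url=f"{model_url}:predict",  # the :predict suffix is required
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_predict_request(
    "http://sentiment.tenant-test.knative.chi.coreweave.com/v1/models/sentiment",
    ["I love this editor"],  # hypothetical input text
)
print(req.get_method(), req.full_url)
# To send it against a live deployment: urllib.request.urlopen(req).read()
```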

Remove the InferenceService. This will delete all the associated resources, except for your model storage and sleep Deployment.

```
$ kubectl delete inferenceservices sentiment
inferenceservice.serving.kubeflow.org "sentiment" deleted
```

thot.ai-Back-End

Owner

Tatenda Christopher Chinyamakobvu
