# Overview of KerasCV and KerasNLP

Applied machine learning with KerasCV and KerasNLP. My name is Wei, and I'm a developer advocate from the Google ML team. In this new video series, we're going to explore how to leverage the two powerful Keras domain libraries to tackle common computer vision and natural language processing tasks. Since these two libraries are based on Keras, we're going to assume you are familiar with the basic Keras APIs and have a solid understanding of fundamental machine learning concepts.

Before diving into KerasCV and KerasNLP, I want to introduce you to a very exciting new feature from Keras: Keras Core. This fantastic feature has a modular backend architecture and allows you to run Keras code on top of arbitrary frameworks, including TensorFlow, JAX, and PyTorch. It also enables you to seamlessly integrate Keras components like layers, models, or metrics as part of low-level TensorFlow, JAX, and PyTorch workflows. We'll be releasing some videos dedicated to Keras Core in the future. For now, you can think of Keras Core as an abstraction layer that allows you to transparently swap execution backends without changing your modeling code, like what's being shown here. If you want to learn more about Keras Core, feel free to check out the announcement link below.

You may be asking: why are we even talking about Keras Core in this video series, since this is the video series on KerasCV and KerasNLP, right? Good question. The reason is that KerasCV and KerasNLP, which of course use Keras as a core, are already compatible with Keras Core, so you can try different backends in your CV and NLP applications today. By default, KerasCV and KerasNLP use the TensorFlow backend. There are two ways to make the change. One way is to use the KERAS_BACKEND environment variable. For example, on the left we are changing the backend to JAX by setting the environment variable in the shell or in Python code. Another way is to set the flag in config files, as shown on the right. In this case, you need to change both the backend flag to jax in the keras.json file and the multi-backend flag to true in the keras_cv.json or keras_nlp.json file, depending on which library you are using.

Since the default backend in Keras Core is TensorFlow, I want to highlight that KerasCV and KerasNLP models are still compatible with all of your favorite TensorFlow features, including TensorFlow Lite and TensorFlow.js for inference on mobile devices and in the browser, XLA compilation and TPU acceleration for fast inference, and DTensor for multi-accelerator inference, which is important for the latest language models with weights as large as tens of gigabytes.

Now that you understand how cool Keras Core is, let's take a look at what KerasCV and KerasNLP can do for you. The last few years of machine learning innovations have greatly expanded our understanding of what's possible with machine learning, from breakthrough performance on classical benchmarks to the exciting new field of generative AI. It's critical to stay up to date with the latest models to be competitive. At the same time, state-of-the-art models have gotten increasingly complicated and often require expensive pre-training to be useful. That's why we have open sourced KerasCV and KerasNLP, the easiest way to deploy state-of-the-art pre-trained models for text and image classification, object detection, generative AI, and more. With these libraries, you will have access to cutting-edge models with a simplified and consistent API. Take a look at these two examples: you can fine-tune a BERT classifier or generate images from a text prompt with just a few lines of code.

There are a lot of machine learning tasks KerasCV and KerasNLP can help you with. In this first video, we will take a quick tour of the unified KerasCV and KerasNLP modeling APIs so that you can appreciate how easy KerasCV and KerasNLP are in these common use cases.

Let's start with KerasCV. In this example, we would like to classify images as cat, a negative label, or dog, a positive label. The highest-level model in KerasCV and KerasNLP is a task. A task like ImageClassifier is a Keras model consisting of a backbone submodel and the task-specific layers required for solving a specific problem. Backbone models, in turn, are a set of reusable layers, generally pre-trained on a separate task to extract highly informative features from the input data, greatly reducing the amount of labeled data and compute resources required to get competitive performance on your task.

In this example, we use a ResNet architecture for our backbone. To load models with pre-trained weights in KerasCV and KerasNLP, use a from_preset constructor with the name of a preset. In this case, resnet50_imagenet is a 50-layer ResNet model pre-trained on the ImageNet dataset. You can find a list of all available presets in the class docstring or on the keras.io website. Once we have created our backbone model, we pass it to the ImageClassifier constructor along with the number of classes we would like to predict. After that, compile and fit the model like any other Keras workflow, and you're ready to go.

KerasCV also supports object detection. This is a significantly more complex task than image classification, because the model can detect any number of objects and must predict a class and a bounding box for each one of them. In this example, the blue boxes and classes are ground truth labels for objects in the image, while the orange boxes and classes are inferred by a KerasCV RetinaNet model. Despite this added complexity, creating an object detection model with a RetinaNet architecture is very similar to image classification. We again choose a pre-trained ResNet50 backbone using the from_preset constructor. The main difference is requiring labeled bounding boxes in the training set and specifying the bounding box format in the task constructor. After that, compile and fit the model like any other Keras workflow, and you're ready to go.

Data augmentation is a critical preprocessing step necessary to maximize accuracy in vision tasks. In order to avoid overfitting to the lighting, cropping, and other particularities of the training set, it is important to rotate, add noise to, and even mix together the original images to increase the robustness of the training objective. For example, the flowers in the upper left panel are original images from the training set, and the flowers in the lower right panel are versions augmented by KerasCV. KerasCV offers a wide variety of augmentation layers, all in a simple API that can transform classification labels and object detection bounding boxes along with the original images. This includes RandomFlip for rotations, RandAugment for intensity perturbations, CutMix and MixUp for creating composite images, and many more. Simply combine your desired augmentation layers in a Keras model and map over your dataset before training the model.

Are you excited about generative AI? Sure we are. KerasCV provides simple interfaces to the latest text-to-image models, such as Stable Diffusion, to generate novel images from your text prompts. Instantiate a StableDiffusion task with a desired output image size, and then call the model's text_to_image method with a prompt and the number of output images. Here we tried "a photograph of an astronaut riding a horse", and we can't argue with the results.

We're constantly adding more generative AI capabilities to KerasCV. Here are some advanced features we already support. Textual inversion is a method to teach Stable Diffusion models specific visual concepts from real examples. For example, we can provide a set of input images to teach the model about our teammate's cat, and then ask for the cat to be visualized as a fantasy character. Prompt-to-prompt is a method to modify the prompt to Stable Diffusion while keeping the image visually consistent. For example, after asking Stable Diffusion to generate a photo of a dog with sunglasses, we can replace "dog" with "cat" and get a visually similar image of a cat wearing sunglasses instead. Feel free to check out all our generative AI guides on the KerasCV website.

Next, let's talk about KerasNLP, which unlocks natural language processing workflows with the same easy-to-use API. Let's start by training a sentiment analysis classifier to predict whether movie reviews are positive or negative. We instantiate a BertClassifier task model using the from_preset constructor. One difference from KerasCV is that KerasNLP task models like BertClassifier include preprocessing by default. This means you can pass raw strings at both training and serving time without worrying about using the correct tokenization and packing method. For this reason, it is best to call from_preset on the task model itself rather than passing an explicit backbone. This will automatically give you a matching preprocessor class, which will tokenize and pad the input text to match the expectations of the backbone class.

In this case, we choose the bert_base_en_uncased preset, which is one of the largest architectures trained on lowercase English data. This backbone has been pre-trained on gigabytes of text data to understand the meaning of words in context and extract more information from our labeled examples. You can find a list of all available presets in the class docstring or on the keras.io website. After fine-tuning our model on a dataset of sentiment-labeled movie reviews from IMDb, we can predict the sentiment of two new movie reviews. We can see that our first review, "What an amazing movie", is given a 99.6% probability of having a positive sentiment, while our second review is given almost the same probability of having a negative sentiment.

KerasNLP also offers text generation using popular models such as GPT-2 and OPT. Fine-tuning a text generation model is as easy as classification: simply pass a dataset of text that you want the model to imitate, and preprocessing will be handled automatically. CausalLM is a task model that predicts each token in an input sequence given all the preceding tokens. It is a canonical approach to training generative text models that can predict new tokens given a user prompt. In this example, we load a pre-trained base GPT-2 model using the from_preset constructor and fine-tune it on 300,000 news articles from the CNN/Daily Mail dataset. After that, we compile and fit the model like any other Keras workflow. CausalLM tasks come with a generate method, allowing you to specify a prompt and a maximum output length to generate new text. Here we generate the start of a news article about snowfall in Buffalo, New York. Wow, that's a lot of snow.

As you can see, we have now seen a number of high-level APIs to build state-of-the-art models with just a few lines of code. But what if we want more customization, or to build something entirely new? The Keras ecosystem of libraries is built around the principle of progressive disclosure of complexity. This means that simple use cases should be simple, and advanced use cases should be possible. We accomplish this by building up our highest-level APIs from a set of lower-level modules that are as well documented and easy to use as all of the models we have shown you so far.

For example, suppose that your dataset includes relatively short text segments and training requires many passes through the data. In that case, the built-in preprocessor in BertClassifier may not be a good fit for you, since it will pad all sequences to the max length of 512 tokens and recompute the preprocessing in every epoch of training. For these situations, you don't have to fork our code or start from scratch. Our BertClassifier is built from customizable BertBackbone, BertPreprocessor, and BertTokenizer classes that each have their own from_preset methods. To access the preprocessor directly, simply call from_preset on our BertPreprocessor class with the same preset name as the classifier and any custom parameters you want to specify, such as a shorter sequence length. You can then apply the preprocessing yourself, in this example including caching so that the tokenization is not recomputed in each epoch. To avoid invoking the preprocessor twice in your workflow, simply set preprocessor to None in the task constructor. Compile and fit the model like any other Keras workflow, and you are good to go.

Need even more flexibility? Want to build your own custom Transformer-based classifier from scratch? KerasNLP offers all the necessary primitives to build a unique NLP model in just a few lines of code. These primitives are the very same ones we use to build the state-of-the-art models you have already seen. Using the Keras functional API, the first step to creating a new model is to declare an input tensor. In this case, our input is a variable-length sequence of token IDs. We then pass this sequence to an embedding layer that learns a unique vector representation for each token ID and sequence position, returning their sum for each token in the sequence. We then pass the embedding output to a stack of configurable TransformerEncoder layers that apply a sequence of multi-head attention and feedforward layers to the input. The output of this stack is our final sequence representation. To produce a single classification output for the token sequence, the common practice is to put a placeholder token at the beginning of each sequence and pass this token's representation as the input to a feedforward layer with the same number of outputs as classes to predict. As with any Keras model, we pass the functional inputs and outputs to the constructor to get a model instance. Compile and fit the model like any other Keras workflow, and you're good to go.

So that's it for our overview of KerasCV and KerasNLP. Here are some additional resources you can check out. In our next video, we're going to dive deeper into image classification with KerasCV.
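
As a companion to the backend discussion earlier in this overview, here is a minimal sketch of the environment-variable route. It assumes the backend is read from the KERAS_BACKEND variable when Keras is first imported, so the variable has to be set before that import. The commented-out import is only a placeholder for whatever library you use next; nothing here loads a model.

```python
import os

# Assumption: the variable must be set BEFORE the first Keras import,
# because the backend is chosen at import time.
os.environ["KERAS_BACKEND"] = "jax"  # other commonly used values: "tensorflow", "torch"

# import keras_cv  # placeholder for the import that would follow; not executed here

selected_backend = os.environ["KERAS_BACKEND"]
print("Backend requested for the next Keras import:", selected_backend)
```

The shell route mentioned in the video is the same idea: export the variable in the shell before starting Python.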


# How to leverage KerasCV for image classification

Hi there, welcome back. My name is Wei, and I'm a developer advocate from the Google ML team. Today we're going to take a closer look at KerasCV in one of the most common machine learning use cases: image classification. We will start with basic inference with a pre-trained classifier, then we'll fine-tune a pre-trained backbone, and finally we're going to take on the more advanced task of training an image classifier from scratch.

First, basic inference. KerasCV offers a number of pre-trained backbone models, including EfficientNet, MobileNet, ResNet, and so on. You can find the complete list of backbone models in the GitHub link below. In this case, we just load the EfficientNetV2 model pre-trained on the ImageNet dataset using the simple from_preset API. Let's take a cat image as a test. Now that we have the classifier and the test image, we can just call the predict method and we will get the prediction results. Finally, we look up the top two predicted classes from the label file, and we get the right class: Egyptian cat. As you can see, with just a few lines of code we have used a pre-trained model to classify an image. It's super simple.

Moving on to fine-tuning a pre-trained backbone model. While we can directly use a pre-trained EfficientNet model to make predictions, we can improve accuracy by fine-tuning a custom classifier. As an example, we're going to fine-tune a model for classifying cats and dogs. We first load the cats and dogs dataset, and we resize the training data as preprocessing, using a helper function defined before. We can take a peek at some preprocessed sample images. Now we load the EfficientNet model again and compile the model. Note that this time we specify the number of classes as two, since there are only two labels, cat and dog. Then we call the familiar fit method to kick off fine-tuning. After training is finished, we can run the prediction and get the result: cat. That's it for fine-tuning.

Our next topic is more advanced: training an image classifier from scratch. This time we're going to use the Caltech 101 dataset. We split and shuffle the data here, then we batch the dataset and visualize a few sample images. The Caltech 101 dataset has different sizes for every image, so we use the ragged batch API to batch them together while maintaining each individual image's shape information. We may also perform some data augmentation here, but in the interest of time we're skipping that part. Our next video will focus on data augmentation, so make sure to watch that one next.

Next, we define the optimizer with a warmup cosine decay schedule. The specific detail of this schedule is not very important here. The next step is to build our model by stacking a few layers on top of the EfficientNetV2 backbone. We also define the categorical cross-entropy loss. Finally, we compile and fit the model, and KerasCV will train a powerful classifier for you.

So in conclusion, from what we have talked about today, you can see how easy it is to leverage KerasCV and reuse the built-in models and modules to build powerful image classifiers. In our next video, we're going to cover another important topic: data augmentation.
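
The warmup cosine decay schedule mentioned above is skipped over in the video, so here is a rough plain-Python sketch of the general idea: the learning rate ramps up linearly for a few steps, then follows half a cosine wave down to zero. This is a generic illustration and not the exact schedule used in the video; the function name `warmup_cosine_lr` and every number below are made up for the example.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps, peak_lr, start_lr=0.0):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from start_lr up to peak_lr.
        return start_lr + (peak_lr - start_lr) * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Half cosine: peak_lr at progress 0, zero at progress 1.
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# Illustrative values only: 100 steps, 10 of them warmup, peak rate 0.1.
schedule = [
    warmup_cosine_lr(s, total_steps=100, warmup_steps=10, peak_lr=0.1)
    for s in range(101)
]
print(schedule[0], schedule[5], schedule[10], schedule[100])
```

The warmup phase keeps early updates small while the freshly initialized layers on top of the backbone settle, and the cosine phase lowers the rate smoothly toward the end of training.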

# Augmenting image data with KerasCV

Hi there, welcome back to our video series on applied machine learning with KerasCV and KerasNLP. My name is Wei, and I'm a developer advocate from the Google ML team. Today we're going to cover a very important topic: data augmentation. KerasCV offers a wide suite of preprocessing layers implementing common data augmentation techniques, and makes it easy to assemble state-of-the-art, industry-grade data augmentation pipelines for image classification and object detection tasks.

To demonstrate how to perform data augmentation, let's first load up the Oxford Flowers 102 dataset, which includes all kinds of flowers. We further preprocess the dataset here by shuffling and batching. Let's take a look at some sample flowers. They look beautiful.

Now our data is ready, and we can talk about data augmentation. KerasCV offers a large number of data augmentation layers, but three of the most useful layers are probably RandAugment, CutMix, and MixUp. These three layers are used in nearly all state-of-the-art image classification pipelines, so let me show you how they work.

RandAugment, as you can probably guess from the name, selects a random operation from a list of operations, then samples a random number. If that number is smaller than the rate parameter, it will apply the random operation to the given image. Other than the rate parameter, the value_range parameter specifies the range of values the images have, augmentations_per_image specifies the number of operations to apply, and magnitude and magnitude_stddev basically determine the normal distribution used to sample the strength of each data augmentation. You can see some sample results after applying RandAugment.

CutMix and MixUp are the other two important operations. CutMix randomly cuts out portions of one image and places them over another, and MixUp interpolates the pixel values between two images. Feel free to learn more about them in the links below. On the right, you can also see the results after applying CutMix and MixUp.

Coming back to RandAugment: while the default RandAugment is pretty powerful, there may be some occasions where you want to customize it. For example, you may want to exclude an augmentation or add another augmentation. In this case, you can use RandomAugmentationPipeline. RandomAugmentationPipeline is a layer that works similarly to RandAugment, but it gives you the flexibility to customize your augmentation pipeline. For example, here we are removing the random rotation layer and adding the GridMask layer to the pipeline. Now we can apply the customized pipeline and see the result on the right. Here's another example of building a pipeline with GridMask and Grayscale layers.

Finally, let's train a convolutional neural network with augmentation as an exercise. We use CutMix, MixUp, and RandAugment. In this example, we use EfficientNetV2 as a backbone and compile the model. Finally, we train the model.

So to conclude, today we have learned how to augment image data with some of the most popular augmentation operations: RandAugment, CutMix, and MixUp. They're built into KerasCV, and you can easily leverage them in your own use cases. You can also customize the augmentation pipelines to suit your own needs.
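
To make the CutMix and MixUp descriptions above a bit more concrete, here is a small plain-Python sketch of the two ideas on tiny grayscale "images" stored as nested lists. It is only an illustration and not the KerasCV implementation: the function names `mix_up` and `cut_mix` are hypothetical, and the mixing ratio and patch position are fixed by hand here, whereas real pipelines sample them at random.

```python
def mix_up(image_a, label_a, image_b, label_b, lam):
    """Blend two images and their one-hot labels with weight `lam` on image_a."""
    mixed_image = [
        [lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
    mixed_label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_image, mixed_label


def cut_mix(image_a, label_a, image_b, label_b, box):
    """Paste a (top, left, height, width) patch of image_b onto a copy of image_a."""
    top, left, height, width = box
    mixed_image = [row[:] for row in image_a]  # copy so image_a is untouched
    for y in range(top, top + height):
        for x in range(left, left + width):
            mixed_image[y][x] = image_b[y][x]
    # Labels are mixed in proportion to the area each image contributes.
    total_pixels = len(image_a) * len(image_a[0])
    lam = 1 - (height * width) / total_pixels
    mixed_label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_image, mixed_label


# Two 4x4 toy images: all zeros labeled [1, 0], all ones labeled [0, 1].
dark = [[0.0] * 4 for _ in range(4)]
bright = [[1.0] * 4 for _ in range(4)]

mixup_image, mixup_label = mix_up(dark, [1.0, 0.0], bright, [0.0, 1.0], lam=0.75)
cutmix_image, cutmix_label = cut_mix(dark, [1.0, 0.0], bright, [0.0, 1.0], box=(0, 0, 2, 2))
print(mixup_label, cutmix_label)
```

Both operations change the labels along with the pixels, which is why the transcript describes augmentation layers that transform labels together with the images.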

# How to perform object detection with KerasCV

Welcome back to our video series on applied machine learning with KerasCV and KerasNLP. In our last video, we showed you how to use KerasCV to perform image data augmentation. Object detection is the process of identifying, classifying, and localizing objects within a given image.
Typically, your inputs are images, and your labels are bounding boxes with optional class labels.
Here's an example.
In this image, the object detection model has not only classified the object as a TV monitor but also identified the four corner points of the bounding boxes around the TV object. On the right, you can see a sample of model output, a class and a bounding box.
In reality, the input image may be more complicated and have more objects in it. In this case, object detection is done by generating many anchor boxes of varying shapes and sizes across the input image and assigning each of them a class label, as well as four data points to pinpoint the bounding box.
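
As a rough illustration of what "many anchor boxes of varying shapes and sizes" can look like, the sketch below tiles a square image with a grid of centers and places one box per scale and aspect ratio at each center. This is a simplified, generic scheme and not the anchor generator of any particular model; the function name `generate_anchors` and all parameter values are assumptions made for the example.

```python
def generate_anchors(image_size, stride, scales, aspect_ratios):
    """Return anchors as (x1, y1, x2, y2) tuples tiled over a square image."""
    anchors = []
    # Anchor centers sit in the middle of each stride x stride cell.
    for cy in range(stride // 2, image_size, stride):
        for cx in range(stride // 2, image_size, stride):
            for scale in scales:
                for ratio in aspect_ratios:
                    # ratio is width / height; the box area stays scale ** 2.
                    width = scale * ratio ** 0.5
                    height = scale / ratio ** 0.5
                    anchors.append(
                        (cx - width / 2, cy - height / 2,
                         cx + width / 2, cy + height / 2)
                    )
    return anchors


# Illustrative values: a 128x128 image, one center every 32 pixels,
# two box sizes and three shapes per center.
anchors = generate_anchors(
    image_size=128, stride=32, scales=[32, 64], aspect_ratios=[0.5, 1.0, 2.0]
)
print(len(anchors), anchors[1])
```

Even this toy configuration yields 96 candidate boxes, which hints at why a detector produces overlapping predictions that need to be filtered afterwards.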
So let's learn how to do this with KerasCV. We will start with basic object detection with a pre-trained model.
Then we will train a custom object detection model.
It's pretty straightforward to perform detections with a pre-trained model.
Here we use the RetinaNet model, pre-trained on the PASCAL VOC dataset. Other than RetinaNet, KerasCV also offers YOLOv8 and YOLOX models for object detection.
Note the xywh bounding box format.
xywh stands for xy, width, and height, which basically gives you the top left corner, plus the width and height of the bounding box.
Another format, xyxy, gives you the top left and bottom right corners to pinpoint the bounding box.
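
The relationship between the two formats is simple arithmetic, sketched below in plain Python. The helper names `xywh_to_xyxy` and `xyxy_to_xywh` are hypothetical and written just for this illustration; they are not presented as library functions.

```python
def xywh_to_xyxy(box):
    """(left, top, width, height) -> (left, top, right, bottom)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def xyxy_to_xywh(box):
    """(left, top, right, bottom) -> (left, top, width, height)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


box_xywh = (10, 20, 30, 40)          # top left corner at (10, 20), 30 wide, 40 tall
box_xyxy = xywh_to_xyxy(box_xywh)    # corners at (10, 20) and (40, 60)
print(box_xyxy, xyxy_to_xywh(box_xyxy))
```

Mixing up the two formats tends to produce boxes that are silently wrong rather than an error, which is why the format is worth stating explicitly.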
To use the RetinaNet model with the ResNet50 backbone, we need to make sure the input image size is divisible by 64.
We resize the test image to 640 by 640 here.
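
If your own images have other sizes, one way to pick a valid target size is to round each side up to the next multiple of 64. The helper below is a small sketch of that arithmetic; the name `round_up_to_multiple` is made up for this example, and the actual resizing would still be done by your image pipeline.

```python
def round_up_to_multiple(size, multiple=64):
    """Smallest multiple of `multiple` that is greater than or equal to `size`."""
    return ((size + multiple - 1) // multiple) * multiple


for side in (600, 640, 641):
    print(side, "->", round_up_to_multiple(side))
```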
We map the class IDs to class names so that the KerasCV visualization tool can print out the class names for each object, along with the bounding boxes.

As you can see, the model is able to detect three objects and predict their respective classes.
But it's arguably a little hard to say for sure what the object on the right is.
If you're familiar with object detection algorithms, you probably remember something called NMS, non-max suppression.
NMS is a traditional algorithm that solves the problem of a model detecting multiple boxes for the same object.
It's quite easy to set up an NMS layer and pass it into the pre-trained RetinaNet model in KerasCV.
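
For readers who have not seen NMS before, here is a minimal plain-Python sketch of the classic greedy version: keep the highest-scoring box, drop every remaining box that overlaps it too much, and repeat. It illustrates the algorithm only and is not the KerasCV layer; the function names, the 0.5 threshold, and the sample boxes are all assumptions for the example. Boxes are in xyxy format.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping detections of one object, plus a separate object.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

In this toy case the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box survives.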

# Generate images with stable diffusion using KerasCV

Applied machine learning with KerasCV and KerasNLP. My name is Wei, and I'm a developer advocate from the Google ML team. In our last episode, we showed you how to use KerasCV to perform object detection, and if you recall, at the end we used Stable Diffusion to generate several artificial images and then conducted detection on those images. So today we're going to take a closer look at image generation with Stable Diffusion using KerasCV.

Stable Diffusion is a powerful text-to-image model open sourced by Stability AI. While there exist multiple open source implementations that allow you to easily create images from textual prompts, KerasCV offers a few advantages. These include XLA compilation and mixed precision support, which together achieve state-of-the-art generation speed.

It's super easy to invoke Stable Diffusion with KerasCV. As you can see here, we're passing in a string, which is often called a prompt, and a batch size of three, and the model is able to generate three stunning images, which are exactly what the prompt is describing. It is that easy. Let's try a different prompt. Another three amazing images.

This just looks like magic, so how does it work? At its core, it's a diffusion process that takes a learned denoising model to generate images from random noise. There's a great Keras tutorial that walks you through how to implement this diffusion model with Keras step by step. I highly recommend you check out the link below. Other than the diffusion model, there's a text encoder, which turns your prompt string into a latent vector. This vector will be concatenated to a randomly generated noise patch. The new vector will be repeatedly denoised by the diffusion model. Finally, the latent image goes through the decoder, which turns the 64 by 64 image patch into a higher resolution 512 by 512 image. For a more detailed explanation, I highly recommend the video from our colleague Diva. You can find the link below.

There are several open source Stable Diffusion models available, but KerasCV enjoys several unique advantages that make it run much faster. To see how, let's start by benchmarking the unoptimized model. At the baseline, it takes about 8 seconds to generate three images on an A100 GPU.

Now let's turn on mixed precision, which basically means performing computations using float16 precision while storing float32 weights. This is faster because NVIDIA GPUs have specialized FP16 operation kernels that run faster than their FP32 counterparts. This time it takes only 6 seconds to finish.

Next, let's try out XLA compilation. We can do this by setting the jit_compile flag to True while constructing the model again. Now let's benchmark the XLA model. It takes about 6.3 seconds.

Finally, we can put everything together and turn on both mixed precision and XLA compilation. This time it takes almost half of the time it takes for the unoptimized model at the very beginning. We can compare all the results here. I should point out that the benchmark was done on an A100 GPU, so your results may vary.

As you can see, it's super easy to run the Stable Diffusion model with KerasCV, and the TensorFlow tooling, mixed precision and XLA compilation, gives you even more performance and allows you to experiment much faster.

Okay, so let's wrap up for today. So far we have covered using KerasCV to perform image classification, data augmentation, object detection
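
If you want to repeat the kind of comparison described in this episode on your own hardware, a small timing helper is enough. The sketch below is a generic harness and not the code used in the video: the name `benchmark` is hypothetical, and `fake_generate` is a stand-in workload so the example runs anywhere; you would replace it with your own image generation call. The warmup run matters because the first call to a compiled model typically includes one-time compilation cost.

```python
import time


def benchmark(fn, warmup_runs=1, timed_runs=3):
    """Average wall-clock seconds per call of `fn`, after untimed warmup calls."""
    for _ in range(warmup_runs):
        fn()  # not timed: absorbs one-time setup such as compilation
    start = time.perf_counter()
    for _ in range(timed_runs):
        fn()
    return (time.perf_counter() - start) / timed_runs


def fake_generate():
    """Stand-in workload; swap in your own generation call."""
    return sum(i * i for i in range(10_000))


seconds_per_call = benchmark(fake_generate)
print(f"{seconds_per_call:.6f} seconds per call")
```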