<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Technically Speaking]]></title><description><![CDATA[Technically speaking is a blog that aims to spark an interest in the coolest of things in development! Written by a college student so you know it's about to get *fresh* in here]]></description><link>https://blog.arygarg.me</link><generator>RSS for Node</generator><lastBuildDate>Wed, 29 Apr 2026 05:38:19 GMT</lastBuildDate><atom:link href="https://blog.arygarg.me/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[LLM Classifiers: Don't Just Classify, Conquer]]></title><description><![CDATA[Here is an everyday ask: you need to categorize data. Customer tickets, product reviews, medical images, financial transactions, or something else that’s just boring. The classic playbook says to grab a battle-tested algorithm like Logistic Regressio...]]></description><link>https://blog.arygarg.me/llm-classifiers</link><guid isPermaLink="true">https://blog.arygarg.me/llm-classifiers</guid><category><![CDATA[llm]]></category><category><![CDATA[AI]]></category><category><![CDATA[chatgpt]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[RAG ]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Sun, 20 Jul 2025 08:41:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1752991953039/9c0dc6a7-5252-4b38-9b27-3abc260c77e2.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Here is an everyday ask: you <em>need</em> to categorize data. Customer tickets, product reviews, medical images, financial transactions, or something else that’s just boring. The classic playbook says to grab a battle-tested algorithm like Logistic Regression or Gradient Boosting, feed it labeled data, and call it a day. It's safe, reliable, and ... completely unimaginative.</p>
<p>What if, instead, you “handed” the job to a Large Language Model? It sounds like using a sledgehammer to crack a nut. A slow, expensive, and notoriously unpredictable sledgehammer. It’s the kind of idea that gets you laughed out of a planning meeting. And yet, it might be the smartest move you’ll make all year.</p>
<h2 id="heading-the-paradox-why-are-llms-such-awkward-classifiers">The Paradox: Why Are LLMs Such Awkward Classifiers?</h2>
<p>On paper, LLMs are the worst possible candidates for a classification job. The two concepts are fundamentally at odds.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<th><strong>Feature</strong></th><th><strong>Classic Classification</strong></th><th><strong>Large Language Models</strong></th></tr>
</thead>
<tbody>
<tr>
<td><strong>Task</strong></td><td><strong>Narrow &amp; Specific:</strong> Is this email spam or not?</td><td><strong>Broad &amp; Generative:</strong> Write a sonnet about spam.</td></tr>
<tr>
<td><strong>Output</strong></td><td><strong>Deterministic:</strong> A single, predictable label.</td><td><strong>Stochastic:</strong> Creative, varied, sometimes nonsensical text.</td></tr>
<tr>
<td><strong>Speed</strong></td><td><strong>Milliseconds:</strong> Built for high-throughput systems.</td><td><strong>Seconds (or more):</strong> Notoriously slow.</td></tr>
<tr>
<td><strong>Interpretability</strong></td><td><strong>High:</strong> We can often see <em>why</em> a decision was made.</td><td><strong>Zero:</strong> A black box wrapped in an enigma.</td></tr>
</tbody>
</table>
</div><h2 id="heading-when-to-stick-with-the-classics-the-case-for-traditional-ml">When to Stick with the Classics: The Case for Traditional ML</h2>
<p>Before we go further, let's be clear: LLMs are not a silver bullet. In many scenarios, using a classic text classifier isn't just a good option; it's the <em>best</em> option. These traditional models are fast, cheap, and highly effective for well-defined problems. You should absolutely stick with a classic model like Naive Bayes, SVM, or Gradient Boosting when:</p>
<ul>
<li><p><strong>Speed is Critical:</strong> If your application requires near-instantaneous, high-throughput classification (e.g., real-time ad bidding or initial spam filtering), the latency of a large LLM is a non-starter.</p>
</li>
<li><p><strong>The Problem is Simple:</strong> If your classes are clearly distinct and you have a good amount of labeled data, a traditional model will likely achieve high accuracy without the overhead and cost of an LLM.</p>
</li>
<li><p><strong>Budgets are Tight:</strong> Training and hosting a classic model is orders of magnitude cheaper than paying for every single classification via an LLM API call. For routine tasks at scale, these costs add up quickly.</p>
</li>
<li><p><strong>Interpretability is Non-Negotiable:</strong> In regulated industries like finance or healthcare, you <em>must</em> be able to explain why your model made a specific decision. Classic models offer this transparency. LLMs <strong>do not</strong>.</p>
</li>
</ul>
<h2 id="heading-and-why-you-should-use-them-anyway">...And Why You Should Use Them Anyway</h2>
<p>So, if classic classifiers are so effective, why entertain the LLM madness at all? Because for a certain class of chaotic, real-world problems, the benefits aren't just incremental; they're transformative.</p>
<ul>
<li><p><strong>Zero-to-One Speed:</strong> Forget data collection and training cycles. You can build a prototype classifier as fast as you can write your first prompt. This lets you validate ideas with users <em>before</em> you commit a single line of production code.</p>
</li>
<li><p><strong>The End of "Retraining":</strong> Adding a new category? With a classic model, you're back to the data labeling mines. With an LLM, you often just need to update the prompt. This isn't a minor convenience; it's a fundamental shift in operational agility.</p>
</li>
<li><p><strong>Embracing the Mess:</strong> Real-world data is a disaster. It's filled with typos, slang, sarcasm, and missing information. Traditional models choke on this. LLMs, trained on the messy entirety of the internet, often handle it. Multi-modal models can even classify based on a combination of text, images, and audio in a single pass.</p>
</li>
<li><p><strong>The Language Barrier Dissolves:</strong> Need to classify user feedback in Hindi, then English, and finally French? A single, well-designed LLM system can handle it without needing separate models for each language. This is a game-changer for global products.</p>
</li>
</ul>
<h2 id="heading-example-the-chaos-of-customer-intent">Example: The Chaos of Customer Intent</h2>
<ul>
<li><p><strong>Human Ambiguity:</strong> A customer might say, "My internet is broken," which sounds technical. But one of the possible <em>reasons</em> it's broken is an unpaid bill. The true intent could be <code>Billing</code>, not <code>Technical Support</code>.</p>
</li>
<li><p><strong>Evolving Dialogue:</strong> The conversation is a moving target.</p>
<ul>
<li><p><strong>Bot:</strong> "How can I help you?"</p>
</li>
<li><p><strong>Customer:</strong> "I have a problem with my plan."</p>
</li>
<li><p><strong>Bot:</strong> "Is it your mobile plan or your home internet plan?"</p>
</li>
<li><p><strong>Customer:</strong> "The second one."</p>
</li>
<li><p>"<strong>The second one</strong>" is meaningless in isolation. The classifier needs the full context.</p>
</li>
</ul>
</li>
<li><p><strong>Organizational Mismatch:</strong> The customer thinks they want to "cancel their service" (<code>Cancellation</code> intent). But what they really need is to pause it for a month while they travel, a process handled by the <code>Sales</code> team. The team structure doesn't match the customer's mental model.</p>
</li>
<li><p><strong>Noisy Data:</strong> Speech-to-text errors, background noise, regional dialects, it's all part of the noise.</p>
</li>
</ul>
<h2 id="heading-the-modern-architectures-beyond-simple-prompting">The Modern Architectures: Beyond Simple Prompting</h2>
<p>If you think LLM classification is just about few-shot prompting, you're living in 2023. SOTA techniques aren’t so SOTA anymore.</p>
<h4 id="heading-1-the-semantic-searchlight-rag-for-classification">1. The Semantic Searchlight (RAG for Classification)</h4>
<p>This is our go-to. Instead of one giant prompt, we treat our intent descriptions as a database (a minimal code sketch follows the list below).</p>
<ul>
<li><p><strong>Setup:</strong> Each of our intents (<code>Billing</code>, <code>Sales</code>, <code>Tech Support</code>) has a detailed description, including edge cases and examples. We embed these descriptions into a vector space.</p>
</li>
<li><p><strong>Inference:</strong></p>
<ol>
<li><p>Take the incoming customer query (e.g., "My bill is wrong") and embed it.</p>
</li>
<li><p>Perform a vector search to find the top 3-5 most similar intent descriptions.</p>
</li>
<li><p>Inject <em>only these candidates</em> into a prompt for the LLM.</p>
</li>
<li><p>The LLM's task is now much simpler: "Given the query, which of these 3 options is the best fit?"</p>
</li>
</ol>
</li>
<li><p><strong>Pros:</strong> Dramatically smaller prompts (lower cost/latency), higher accuracy because you're filtering out irrelevant options, and it's interpretable (you know which candidates were considered).</p>
</li>
<li><p><strong>Cons:</strong> Your retrieval quality is paramount. If the right intent isn't in the top 5, the LLM can't pick it.</p>
</li>
</ul>
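<p>As a rough illustration, here is a minimal sketch of this setup in Python. The <code>embed()</code> and <code>call_llm()</code> functions are hypothetical stand-ins for whichever embedding model and LLM client you use, and the intent descriptions are illustrative only.</p>
<pre><code class="lang-python">import numpy as np

# Illustrative intent descriptions; in practice these would live in a vector database.
INTENTS = {
    "Billing": "Invoices, charges, refunds, overdue or unpaid bills.",
    "Sales": "Plan changes, upgrades, pausing service, new subscriptions.",
    "Tech Support": "Connectivity issues, outages, device troubleshooting.",
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(query, embed, call_llm, top_k=3):
    # 1. Embed every intent description (done once up front in a real system).
    intent_vecs = {name: embed(desc) for name, desc in INTENTS.items()}
    # 2. Embed the query and retrieve the top-k most similar intents.
    q_vec = embed(query)
    candidates = sorted(intent_vecs, key=lambda n: cosine(q_vec, intent_vecs[n]), reverse=True)[:top_k]
    # 3. Ask the LLM to choose only among the retrieved candidates.
    prompt = (
        f"Customer query: {query}\n"
        f"Choose the single best-fitting intent from: {', '.join(candidates)}.\n"
        "Answer with the intent name only."
    )
    return call_llm(prompt).strip()
</code></pre>
<p>Because the LLM only ever sees a handful of candidates, the prompt stays small and the decision is easy to audit.</p>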
<h4 id="heading-2-finetune-the-hell-out-of-it-finetuning">2. Finetune the Hell out of it (Finetuning)</h4>
<p>Here, we modify the LLM itself to be a classification expert. Instead of generating text, we want it to output probabilities for our specific labels (see the sketch after the list below).</p>
<ul>
<li><p><strong>Setup:</strong> Take a base open-source model (like Llama 3 or a future equivalent). Add a small "classification head" to its final layer—this can be as simple as a single linear layer.</p>
</li>
<li><p><strong>Training:</strong> Fine-tune this modified model on your labeled dataset. The model learns to map its vast internal understanding of language directly to your specific set of intents.</p>
</li>
<li><p><strong>Pros:</strong> Unmatched accuracy and speed for your specific domain. You get the LLM's world knowledge baked into a highly specialized tool.</p>
</li>
<li><p><strong>Cons:</strong> This is the most complex approach, requiring ML engineering expertise for training and deployment.</p>
</li>
</ul>
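<p>As a minimal sketch of the setup step, Hugging Face Transformers can attach a linear classification head for you via <code>AutoModelForSequenceClassification</code>; the model ID and label set below are placeholders, not recommendations.</p>
<pre><code class="lang-python">from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["Billing", "Sales", "Tech Support"]
model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder: any open base model you have access to

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token  # decoder-only models often lack a pad token

model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),  # map class indices back to intent names
)
model.config.pad_token_id = tokenizer.pad_token_id

# From here, fine-tune on your labeled (text, intent) pairs with the standard Trainer API.
</code></pre>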
<h4 id="heading-3-agent-47-but-with-water-pistols-agents-with-guardrails">3. Agent 47 but with Water Pistols (Agents with Guardrails)</h4>
<p>This is a hybrid approach that balances automation with safety; a short code sketch follows the list below.</p>
<ul>
<li><p><strong>Setup:</strong> An LLM acts as the primary classifier, but with a crucial safety rail.</p>
</li>
<li><p><strong>Inference:</strong></p>
<ol>
<li><p>The LLM makes an initial classification (e.g., predicts <code>Cancellation</code>).</p>
</li>
<li><p>Before executing, a second, simpler model or a set of business rules <em>verifies</em> the decision. For example, a rule might check: "Does the user's account history show recent travel bookings? If so, flag for human review, as they might want to pause, not cancel."</p>
</li>
<li><p>Only verified classifications are passed through to the next stage.</p>
</li>
</ol>
</li>
<li><p><strong>Pros:</strong> Gives you the flexibility of an LLM with the safety of a rules-based system.</p>
</li>
<li><p><strong>Cons:</strong> Can add latency and requires careful design of the verification step.</p>
</li>
</ul>
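<p>As a minimal illustration of the verification step, consider the sketch below; <code>llm_classify()</code> and the account fields are hypothetical, and real guardrails would encode your actual business rules.</p>
<pre><code class="lang-python">KNOWN_INTENTS = {"Billing", "Sales", "Tech Support", "Cancellation"}

def guarded_classify(query, account, llm_classify):
    intent = llm_classify(query)  # 1. initial LLM prediction

    # 2. Business-rule verification before anything is executed.
    if intent not in KNOWN_INTENTS:
        return "HUMAN_REVIEW"  # never act on a label we don't recognize
    if intent == "Cancellation" and account.get("recent_travel_booking"):
        return "HUMAN_REVIEW"  # the customer may want to pause, not cancel

    # 3. Only verified classifications pass through.
    return intent
</code></pre>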
<h2 id="heading-your-action-plan-how-to-get-started-today">Your Action Plan: How to Get Started <strong>Today</strong></h2>
<ol>
<li><p><strong>Create a "Consensus Corpus":</strong> Before you write a single prompt, grab 100 real data points which match your use case, maybe from production (ideal) or synthetically generated. Sit down and label them. This exercise is invaluable for aligning yourself with the task and exposing ambiguities in your categories. This becomes your "golden set" for testing.</p>
</li>
<li><p><strong>Benchmark the Basics:</strong> Define your baseline. If 40% of your tickets are <code>Billing</code>, then any model must be better than 40% accurate. Better yet, run your golden set through a classic ML model. This gives you a real performance target to beat.</p>
</li>
<li><p><strong>Prototype with RAG:</strong> This is the sweet spot of power and practicality. Use a vector database service and a powerful API model (like GPT-4 or Gemini) to quickly test the architecture. Measure its performance against your golden set.</p>
</li>
<li><p><strong>Analyze the Errors, Not Just the Accuracy:</strong> Don't just look at the final score. Where is it failing? Is it confusing <code>Sales</code> with <code>Cancellations</code>? This tells you where your intent descriptions need more detail or where your retrieval is weak. The business impact of a misclassification is not uniform; failing to detect a <code>Customer Complaint</code> is far worse than missing a <code>General Inquiry</code>. A confusion matrix over your golden set makes these patterns obvious (see the sketch after this list).</p>
</li>
</ol>
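<p>For reference, here is a minimal way to score a prototype against the golden set and inspect its confusion patterns with scikit-learn; the label lists are dummy data standing in for your own results.</p>
<pre><code class="lang-python">from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

labels = ["Billing", "Sales", "Tech Support", "Cancellation"]
y_true = ["Billing", "Sales", "Cancellation", "Billing", "Tech Support"]    # golden-set labels
y_pred = ["Billing", "Cancellation", "Cancellation", "Billing", "Billing"]  # model predictions

print("Accuracy:", accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred, labels=labels))  # rows = true labels, columns = predictions
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
</code></pre>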
<h3 id="heading-why-not-just-build-a-fully-autonomous-llm-agent-to-handle-everything">Why not just build a fully autonomous LLM agent to handle everything?</h3>
<p>Valid question, but that's a recipe for disaster. We don't need the LLM's creativity to solve a routine billing issue. We need speed, accuracy, and control. Letting an agent run wild could lead to it incorrectly modifying a user's account or giving out confidential information. The goal is to use the LLM's intelligence as a scalpel, not a wrecking ball.</p>
<h2 id="heading-final-thoughts-its-a-culture-shift">Final Thoughts: It's a Culture Shift</h2>
<p>Adopting LLMs for classification isn't just a technical change; it's a change in mindset. You move from being a "model trainer" to a "system designer." Your skills in prompt engineering, system architecture, and critical analysis of model outputs become more important than your ability to tune hyperparameters.</p>
<p>It's a challenging path, with unexpected behavior and new failure modes at every turn. But the reward is a system that is more flexible, scalable, and intelligent than anything that has come before. Stop thinking of classification as just putting things in boxes. Start thinking of it as understanding.</p>
<p><em>Peace Love and Plants 🪴</em></p>
]]></content:encoded></item><item><title><![CDATA[Friendship over with Attention. Now FNet is my best friend]]></title><description><![CDATA[The Transformer architecture, introduced in the seminal paper "Attention Is All You Need", has revolutionized AI, particularly in Natural Language Processing (NLP). Its success hinges on the self-attention mechanism, which allows the model to dynamic...]]></description><link>https://blog.arygarg.me/fnet-vs-attention</link><guid isPermaLink="true">https://blog.arygarg.me/fnet-vs-attention</guid><category><![CDATA[Deep Learning]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[technology]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Wed, 07 May 2025 03:50:17 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1746589095849/00880682-1dd0-457c-9012-0cb34e8840a8.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The Transformer architecture, introduced in the seminal paper "<a target="_blank" href="https://arxiv.org/abs/1706.03762?hl=en-US">Attention Is All You Need</a>", has revolutionized AI, particularly in Natural Language Processing (NLP). Its success hinges on the self-attention mechanism, which allows the model to dynamically weigh the importance of different words or tokens in an input sequence relative to each other, capturing context and long-range dependencies effectively.</p>
<h2 id="heading-the-attention-bottleneck-power-vs-price">The Attention Bottleneck: Power vs. Price</h2>
<p>Think of self-attention like this: for every word in a sentence, the model calculates an "attention score" indicating how relevant every other word is to understanding that specific word's meaning in context. This is done using Query (Q), Key (K), and Value (V) projections derived from the input embeddings. The scores determine how much information from other words (Values) should be blended into the current word's representation.</p>
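<p>To make that concrete, here is a minimal single-head, scaled dot-product attention sketch in PyTorch, with no masking or multi-head machinery; it is only meant to show where the pairwise score matrix comes from.</p>
<pre><code class="lang-python">import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projection matrices
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / (k.shape[-1] ** 0.5)  # (seq_len, seq_len) pairwise scores: the quadratic step
    weights = F.softmax(scores, dim=-1)      # how strongly each token attends to every other token
    return weights @ v                       # blend the Values according to those weights

d_model, d_k, seq_len = 8, 8, 5
x = torch.randn(seq_len, d_model)
out = self_attention(x, *(torch.randn(d_model, d_k) for _ in range(3)))
print(out.shape)  # torch.Size([5, 8])
</code></pre>
<p>The <code>scores</code> matrix has one entry per pair of tokens, which is exactly where the quadratic cost discussed next comes from.</p>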
<p>While incredibly powerful, this pairwise comparison is computationally intensive. The complexity scales quadratically (O(N²)) with the sequence length (N) in terms of both computation and memory. This means doubling the input length (e.g., from a paragraph to a full document) quadruples the resources needed for the attention layers. This quadratic scaling becomes a major bottleneck for:</p>
<ul>
<li><p>Training: Making it expensive and time-consuming to train models on large datasets or long sequences.</p>
</li>
<li><p>Inference: Limiting the speed at which models can process long inputs in real-time applications.</p>
</li>
<li><p>Deployment: Making it challenging to run large Transformers on resource-constrained hardware like smartphones or embedded systems (AI at the Edge).</p>
</li>
</ul>
<h2 id="heading-enter-fnet-computing-with-fourier-transforms">Enter FNet: Computing with Fourier Transforms</h2>
<p>Researchers at Google AI proposed a startlingly efficient alternative in their paper "FNet: Mixing Tokens with Fourier Transforms". Their model, FNet, completely replaces the computationally heavy self-attention layers within the Transformer encoder block with a standard, parameter-free Fourier Transform.</p>
<p>What's a Fourier Transform? Originating in signal processing, the Discrete Fourier Transform (DFT) decomposes a sequence (like the sequence of token embeddings) into its constituent frequencies. It essentially reveals the underlying periodic patterns or "resonances" within the data. The Fast Fourier Transform (FFT) is simply a highly efficient algorithm for computing the DFT, reducing its complexity from O(N²) to O(N log N).</p>
<p>In FNet, the FFT is applied to mix information across the token sequence. The intuition is that analyzing the frequency components provides a form of global context, capturing interactions between tokens without the need for explicit pairwise comparisons. It's a shift from learning adaptive relationships (attention) to leveraging a fixed, mathematically defined structure (FFT) for mixing information globally. FNet applies the FFT along both the sequence and hidden dimensions, taking only the real part of the complex-valued output for simplicity and empirical effectiveness.</p>
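<p>For intuition, the token-mixing step described in the paper can be written in a few lines of PyTorch: a parameter-free FFT applied along the hidden and sequence dimensions, keeping only the real part. This sketch covers the mixing sublayer only, not the full FNet block.</p>
<pre><code class="lang-python">import torch

def fourier_mixing(x):
    # x: (batch, seq_len, hidden_dim)
    # FFT along the hidden dimension, then along the sequence dimension; keep the real part.
    return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real

x = torch.randn(2, 512, 768)    # a batch of 512-token sequences
print(fourier_mixing(x).shape)  # torch.Size([2, 512, 768]); cost grows as O(N log N) in sequence length
</code></pre>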
<h2 id="heading-scalable-performance-huge-speedups">Scalable Performance, Huge Speedups</h2>
<p>The results published in the FNet paper were compelling:</p>
<ul>
<li><p>Speed: FNet trains significantly faster than standard BERT-Base models – up to 80% faster on GPUs and 70% faster on TPUs for typical sequence lengths (512 tokens). O(N log N) scaling makes it increasingly advantageous for longer sequences.</p>
</li>
<li><p>Accuracy: Despite its simplicity and lack of parameters in the mixing layer, FNet achieves <strong>92-97%</strong> of BERT's accuracy on the diverse tasks within the GLUE benchmark. While there's a slight accuracy trade-off compared to the best attention-based models, its performance demonstrates remarkable viability.</p>
</li>
<li><p>Hybrid Approach: An "FNet-Hybrid" model, strategically replacing only some attention layers with FFT layers (specifically, keeping attention only in the final two layers), recovered most of the performance gap, reaching 99% of BERT's accuracy. This highlights a potential synergy, using FFT for efficient global mixing and attention for finer-grained, adaptive refinement.</p>
</li>
<li><p>Long Sequences (LRA): When tested on the Long Range Arena (LRA) benchmark, designed specifically for evaluating models on long-context tasks, FNet matched the accuracy of top-performing "efficient Transformer" variants (like Reformer, Performer) while being significantly faster and more memory-efficient than all competitors on GPUs across the tested sequence lengths.</p>
</li>
</ul>
<p>Experiments confirmed key design choices: mixing is crucial (removing it entirely fails), the FFT's structure provides a useful inductive bias (random mixing underperforms), and adding learnable parameters to the FFT itself didn't help, reinforcing the value of the fixed transform.</p>
<h2 id="heading-market-implications">Market Implications</h2>
<p>FNet's efficiency isn't just an academic curiosity; it has tangible implications:</p>
<ul>
<li><p>Short-Term: The dramatic reduction in computational cost immediately benefits latency-sensitive applications (real-time translation, content moderation) and enables more powerful AI at the Edge. Running complex NLP models directly on devices becomes more feasible, improving privacy and reducing reliance on cloud connectivity. Companies offering AI infrastructure (like Clika) can leverage such architectures to provide faster, cheaper inference solutions.</p>
</li>
<li><p>Long-Term: FNet and similar research signal a potential shift away from monolithic, attention-heavy models towards more diverse, efficient architectures. This could foster innovation in areas like low-power LLMs and truly ubiquitous on-device intelligence. It challenges the economic moat of large cloud AI providers whose value relies partly on managing attention's complexity at scale. Furthermore, it underscores that architectural innovation can sometimes bypass the need for purely brute-force compute scaling (i.e., ever-larger GPU clusters), potentially impacting hardware markets. This trend aligns with a move towards more distributed and decentralized AI systems.</p>
</li>
<li><p>Jevons' Paradox: The idea that increasing efficiency in resource use can sometimes lead to an overall increase in resource consumption because the lower cost spurs greater demand. In this context, making Transformer-like models much cheaper and faster could dramatically increase their adoption and the variety of applications they're used for, potentially leading to more overall AI computation, even if individual tasks are more efficient.</p>
</li>
</ul>
<h2 id="heading-the-bigger-picture-democratizing-ai">The Bigger Picture: Democratizing AI</h2>
<p>FNet exemplifies how revisiting fundamental mathematical tools can lead to breakthroughs in efficiency. By demonstrating that a parameter-free FFT can effectively replace complex self-attention for many tasks, it challenges core assumptions in model design. This focus on efficiency is crucial for democratizing AI, making powerful models more accessible, affordable, and deployable in a wider range of environments, especially beyond large data centers. While attention remains state-of-the-art for peak performance, FNet proves that highly efficient alternatives are viable, paving the way for a future with <em>smarter, leaner AI</em>.</p>
]]></content:encoded></item><item><title><![CDATA[PaliGemma 2 - VLMs made easy]]></title><description><![CDATA[Introduction
The evolution of vision-language models has been nothing short of remarkable. From their early stages of independently handling images and text to their current ability to seamlessly integrate the two, these models have reached new heigh...]]></description><link>https://blog.arygarg.me/paligemma-2</link><guid isPermaLink="true">https://blog.arygarg.me/paligemma-2</guid><category><![CDATA[Google]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[huggingface]]></category><category><![CDATA[finetuning]]></category><category><![CDATA[VLM]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Sun, 08 Dec 2024 13:05:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1733661916659/5268b3ad-67c5-4364-90aa-692989f51f43.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>The evolution of vision-language models has been nothing short of remarkable. From their early stages of independently handling images and text to their current ability to seamlessly integrate the two, these models have reached new heights. Imagine describing the content of a photo, answering detailed questions about it, or creating vivid images from mere text — these are the feats made possible by modern vision-language models.</p>
<p>Fine-tuning these models is the key to unlocking their full potential. While pre-trained models like PaliGemma 2 offer impressive capabilities out of the box, adapting them to specific datasets or tasks can significantly boost their performance. Fine-tuning ensures the model not only generalizes well but also excels in understanding the context and nuances of your application, whether it's medical imaging, e-commerce, or creative content generation.</p>
<hr />
<h3 id="heading-meet-paligemma-2">Meet PaliGemma 2</h3>
<p>PaliGemma 2, the latest open-source vision-language model released by Google, is a testament to how far these technologies have come. This sophisticated system takes images and text as inputs and generates textual outputs. Whether you’re creating captions for photos or answering intricate visual questions, PaliGemma 2 is designed to handle it all.</p>
<h4 id="heading-key-components">Key Components</h4>
<ul>
<li><p><strong>SigLIP-So400m</strong>: The image encoder, built with a philosophy similar to CLIP, excels at jointly understanding images and text. It processes visual data with remarkable accuracy, making it a robust foundation for multimodal tasks.</p>
</li>
<li><p><strong>Gemma 2</strong>: The text decoder (the 2B variant for the 3B PaliGemma 2 model), a powerhouse explicitly crafted for generating coherent and contextually rich text.</p>
</li>
</ul>
<p>By connecting <strong>SigLIP</strong>'s capabilities with <strong>Gemma</strong> via a simple linear adapter, <strong>PaliGemma 2</strong> emerges as a comprehensive solution. Pre-trained on image-text datasets, it is versatile enough to tackle various tasks, such as:</p>
<ul>
<li><p><strong>Image Captioning</strong>: Generating detailed descriptions of images.</p>
</li>
<li><p><strong>Segmentation</strong>: Identifying and labeling objects in images.</p>
</li>
<li><p><strong>Question Answering</strong>: Given an image and a related question as multimodal input, the model can answer the question for us.</p>
</li>
</ul>
<hr />
<h1 id="heading-lets-get-started">Let’s get started</h1>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🖥</div>
<div data-node-type="callout-text">Before diving into fine-tuning PaliGemma 2, it's crucial to be prepared for the resource demands. This process will require <strong>a TON of GPU memory</strong>. If you're planning to experiment with Kaggle's free-tier environment, note that its <strong>2x T4 GPUs</strong> were not powerful enough.</div>
</div>

<div data-node-type="callout">
<div data-node-type="callout-emoji">🦾</div>
<div data-node-type="callout-text">However, You can try using <strong>Google Cloud Platform</strong> with <strong>AI Notebooks</strong> and opt for a <strong>NVIDIA A100 GPU</strong>, which provides significantly more memory and computational power. This setup should offer a smoother experience for fine-tuning the model effectively.</div>
</div>

<h3 id="heading-installing-packages">Installing Packages</h3>
<pre><code class="lang-python">!pip install -q -U git+https://github.com/huggingface/transformers.git datasets accelerate peft
!pip install -U bitsandbytes  <span class="hljs-comment"># for QLoRA and LoRA</span>
</code></pre>
<h3 id="heading-loading-our-authentication-keys-from-huggingface">Loading our Authentication Keys from HuggingFace</h3>
<p>To fine-tune <strong>PaliGemma 2</strong> or work with any Hugging Face tools, you'll need to authenticate using an access token. Follow these steps to generate and export it:</p>
<ol>
<li><p><strong>Get Your Access Token</strong><br /> Log in to your Hugging Face account and navigate to the Access Tokens page.</p>
<ul>
<li><p>If you don’t already have a token, create one by clicking "New Token".</p>
</li>
<li><p>Assign the necessary scope (e.g., <code>write</code> access for fine-tuning tasks).</p>
</li>
</ul>
</li>
<li><p><strong>Load your Token into your code</strong></p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> kaggle_secrets <span class="hljs-keyword">import</span> UserSecretsClient
<span class="hljs-keyword">import</span> os
user_secrets = UserSecretsClient()
hf_secret = user_secrets.get_secret(<span class="hljs-string">"HF General"</span>)
os.environ[<span class="hljs-string">"HF_General"</span>] = hf_secret
</code></pre>
<p><strong>OR</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os  
os.environ[<span class="hljs-string">"HF_General"</span>] = <span class="hljs-string">"&lt;your_access_token&gt;"</span>
</code></pre>
<h3 id="heading-authenticate-using-the-hugging-face">Authenticate using the Hugging Face</h3>
<pre><code class="lang-python">!huggingface-cli login --token $HF_General
print(<span class="hljs-string">"Done Authentication"</span>)
</code></pre>
<h3 id="heading-loading-our-data">Loading our Data</h3>
<p>To fine-tune <strong>PaliGemma 2</strong>, we’ll use a <a target="_blank" href="https://huggingface.co/datasets/HuggingFaceM4/ChartQA"><strong>Chart Question Answering</strong></a> <strong>(ChartQA) dataset</strong> available on Hugging Face's <code>datasets</code> library. This dataset includes pairs of images and questions about them, along with corresponding answers, making it perfect for multimodal fine-tuning tasks.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> load_dataset
print(<span class="hljs-string">"Started to Load Dataset"</span>)
train_ds = load_dataset(<span class="hljs-string">'HuggingFaceM4/ChartQA'</span>, split=<span class="hljs-string">"train+val"</span>)
print(<span class="hljs-string">"Done Loading Dataset"</span>)
</code></pre>
<pre><code class="lang-python">cols_remove = [<span class="hljs-string">"human_or_machine"</span>]
train_ds = train_ds.remove_columns(cols_remove)
</code></pre>
<pre><code class="lang-python">test_ds = load_dataset(<span class="hljs-string">'HuggingFaceM4/ChartQA'</span>, split=<span class="hljs-string">"test"</span>) 
test_ds = test_ds.remove_columns(cols_remove)
</code></pre>
<h3 id="heading-loading-the-pre-processor">Loading the (Pre) Processor</h3>
<p>To prepare our dataset for <strong>Paligemma 2</strong>, we’ll use the <code>PaliGemmaProcessor</code>. This processor handles both image processing and text tokenization, simplifying the workflow for fine-tuning vision-language models.</p>
<h4 id="heading-loading-the-processor">Loading the Processor</h4>
<p>First, load the processor for the 224x224 version of <strong>PaliGemma 2</strong>, which is more memory-efficient and suitable for general-purpose tasks:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> PaliGemmaProcessor
model_id = <span class="hljs-string">"google/paligemma2-3b-pt-224"</span>
processor = PaliGemmaProcessor.from_pretrained(model_id)
print(<span class="hljs-string">"Done Loading Model"</span>)
</code></pre>
<p>There are higher-resolution versions available (448x448 and 896x896) as well as models with a larger number of parameters (10B, 28B) for tasks requiring more precision, like OCR or detailed segmentation. However, these demand more GPU memory and computational power.</p>
<p>Next, set the device to ‘cuda’ to use the GPU and load the model, as shown below. We will specify that the model should use <code>bfloat16</code> (Brain Float 16) precision for its parameters. <code>bfloat16</code> is a 16-bit floating point format that helps speed up computation and reduces memory usage while maintaining a similar range to <code>float32</code>.</p>
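<p>A minimal sketch of that loading step, assuming the <code>model_id</code> defined earlier:</p>
<pre><code class="lang-python">import torch
from transformers import PaliGemmaForConditionalGeneration

device = "cuda"
# Load PaliGemma 2 with bfloat16 weights and move it to the GPU.
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to(device)
</code></pre>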
<h3 id="heading-preparing-the-model-layers">Preparing the model layers</h3>
<p>To prepare <strong>PaliGemma 2</strong> for fine-tuning, we freeze the vision tower by setting <code>requires_grad=False</code> for its parameters, preserving its pre-trained visual features, while enabling training for the multi-modal projector by setting <code>requires_grad=True</code>, allowing it to adapt image-text alignment to the task. This setup ensures efficient use of pre-trained features while optimizing task-specific components.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Vision Tower Parameters (Image Encoder)</span>
<span class="hljs-keyword">for</span> param <span class="hljs-keyword">in</span> model.vision_tower.parameters():
    param.requires_grad = <span class="hljs-literal">False</span>

<span class="hljs-comment"># Multi-Modal Projector Parameters (Fine-Tuning the Decoder)</span>
<span class="hljs-keyword">for</span> param <span class="hljs-keyword">in</span> model.multi_modal_projector.parameters():
    param.requires_grad = <span class="hljs-literal">True</span>
</code></pre>
<blockquote>
<p>We will load the model, and freeze the image encoder and the projector, and only fine-tune the decoder. If your images are within a particular domain, which might not be in the dataset the model was pre-trained with, you might want to skip freezing the image encoder. —Hugging Face Blog.</p>
</blockquote>
<h3 id="heading-why-freeze-the-image-encoder-and-projector">Why Freeze the Image Encoder and Projector?</h3>
<p>Freezing the <strong>image encoder</strong> and <strong>multi-modal projector</strong> in a pre-trained model offers several benefits:</p>
<ul>
<li><p><strong>General Features</strong>: The image encoder, often trained on large datasets like ImageNet, has learned to extract universal visual features that are widely applicable.</p>
</li>
<li><p><strong>Pre-Trained Integration</strong>: The multi-modal projector is already designed to align image and text features effectively, minimizing the need for additional fine-tuning.</p>
</li>
<li><p><strong>Resource Efficiency</strong>: By reducing the number of trainable parameters, freezing these components speeds up training and lowers computational demands, making the process more efficient.</p>
</li>
</ul>
<p>This strategy allows the model to leverage pre-trained strengths while focusing training resources on task-specific components.</p>
<hr />
<h2 id="heading-why-fine-tune-the-decoder">Why Fine-Tune the Decoder?</h2>
<p><strong>Task Specificity:</strong> The decoder must be fine-tuned for the specific task. Fine-tuning allows it to learn how to generate the appropriate output based on the particular types of input it will receive in your application.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Define a <code>collate_fn</code> function. The function returns the final batch of tokens containing the tokenized text, images, and labels, all converted to the appropriate format and moved to the right device for efficient computation.</div>
</div>

<pre><code class="lang-python"><span class="hljs-keyword">import</span> torch
device = <span class="hljs-string">"cuda"</span>

image_token = processor.tokenizer.convert_tokens_to_ids(<span class="hljs-string">"&lt;image&gt;"</span>)
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">collate_fn</span>(<span class="hljs-params">examples</span>):</span>
  texts = [<span class="hljs-string">"Answer the following Question: "</span> + example[<span class="hljs-string">"query"</span>] <span class="hljs-keyword">for</span> example <span class="hljs-keyword">in</span> examples]
  labels= [example[<span class="hljs-string">'label'</span>][<span class="hljs-number">0</span>] <span class="hljs-keyword">for</span> example <span class="hljs-keyword">in</span> examples]
  images = [example[<span class="hljs-string">"image"</span>].convert(<span class="hljs-string">"RGB"</span>) <span class="hljs-keyword">for</span> example <span class="hljs-keyword">in</span> examples]
  tokens = processor(text=texts, images=images, suffix=labels,
                    return_tensors=<span class="hljs-string">"pt"</span>, padding=<span class="hljs-string">"longest"</span>,
                    tokenize_newline_separately=<span class="hljs-literal">False</span>)

  tokens = tokens.to(torch.bfloat16).to(device)
  <span class="hljs-keyword">return</span> tokens
</code></pre>
<h3 id="heading-defining-the-trainer">Defining the Trainer</h3>
<p>Hugging Face makes it really easy to fine-tune models, either through its GUI-based AutoTrain or through its Trainer module.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> TrainingArguments
args = TrainingArguments(
    num_train_epochs=<span class="hljs-number">2</span>,
    remove_unused_columns=<span class="hljs-literal">False</span>,
    per_device_train_batch_size=<span class="hljs-number">2</span>,
    gradient_accumulation_steps=<span class="hljs-number">4</span>,
    warmup_steps=<span class="hljs-number">1</span>,
    learning_rate=<span class="hljs-number">2e-5</span>,
    weight_decay=<span class="hljs-number">1e-6</span>,
    adam_beta2=<span class="hljs-number">0.999</span>,
    logging_steps=<span class="hljs-number">100</span>,
    optim=<span class="hljs-string">"adamw_hf"</span>,
    save_strategy=<span class="hljs-string">"epoch"</span>,
    save_steps=<span class="hljs-number">5000</span>,
    push_to_hub=<span class="hljs-literal">True</span>,
    save_total_limit=<span class="hljs-number">1</span>,
    output_dir=<span class="hljs-string">"paligemma2-3b-pt-224_HuggingFaceM4_ChartQA"</span>,
    bf16=<span class="hljs-literal">True</span>,
    report_to=[<span class="hljs-string">"tensorboard"</span>],
    dataloader_pin_memory=<span class="hljs-literal">False</span>,
    gradient_checkpointing=<span class="hljs-literal">True</span>,
    dataloader_drop_last=<span class="hljs-literal">True</span>,
)
</code></pre>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> Trainer

trainer = Trainer(
        model=model,
        train_dataset=train_ds ,
        eval_dataset = test_ds,
        data_collator=collate_fn,
        args=args
        )
</code></pre>
<pre><code class="lang-python">trainer.train()
</code></pre>
<h3 id="heading-and-thats-it">And that’s it</h3>
<p>Your model should be training now. Give it an hour or so and you’ll have your very own fine-tuned version of PaliGemma 2.</p>
<p>You can run inference with the model using the code below:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoProcessor, PaliGemmaForConditionalGeneration

model_id = <span class="hljs-string">"YourUserID/paligemma2-3b-pt-224_HuggingFaceM4_ChartQA"</span>
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(<span class="hljs-string">"google/paligemma2-3b-pt-224"</span>)
</code></pre>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
<span class="hljs-keyword">import</span> requests


prompt = <span class="hljs-string">"Question"</span>
image_file = <span class="hljs-string">"Link to Image"</span>
raw_image = Image.open(requests.get(image_file, stream=<span class="hljs-literal">True</span>).raw)
</code></pre>
<pre><code class="lang-python">inputs = processor(prompt, raw_image.convert(<span class="hljs-string">"RGB"</span>), return_tensors=<span class="hljs-string">"pt"</span>)
output = model.generate(**inputs, max_new_tokens=<span class="hljs-number">20</span>)

print(processor.decode(output[<span class="hljs-number">0</span>], skip_special_tokens=<span class="hljs-literal">True</span>)[len(prompt):])
</code></pre>
<h1 id="heading-conclusion">Conclusion</h1>
<p>Fine-tuning PaliGemma 2 marks a significant step in leveraging advanced vision-language models for specialized tasks. By customizing the model to your specific dataset, you enhance its ability to perform with greater accuracy and relevance in applications like image captioning and visual question answering. Freezing the image encoder while training the decoder efficiently utilizes computational resources, allowing the model to focus on generating precise textual outputs. Setting up the appropriate environmental resources, such as using a GPU with sufficient memory, ensures a smoother fine-tuning process. As you finalize your model, you're not just adapting a powerful tool to your needs—you're expanding the possibilities of multimodal AI in your field. Embrace this opportunity to push the boundaries and see how fine-tuned models can revolutionize your projects.</p>
]]></content:encoded></item><item><title><![CDATA[An honest Guide to Optimize LLMs for upto 10x Inference]]></title><description><![CDATA[Introduction
The AI revolution has officially gone mainstream. From crafting the perfect 'Good Morning' message with Chat-GPT to generating human-like responses, Large Language Models (LLMs) have taken the world by storm. But behind the scenes, these...]]></description><link>https://blog.arygarg.me/an-honest-guide-to-optimize-llms-for-upto-10x-inference</link><guid isPermaLink="true">https://blog.arygarg.me/an-honest-guide-to-optimize-llms-for-upto-10x-inference</guid><category><![CDATA[AI]]></category><category><![CDATA[optimization]]></category><category><![CDATA[huggingface]]></category><category><![CDATA[llm]]></category><category><![CDATA[Python]]></category><category><![CDATA[nlp]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Tue, 23 Apr 2024 18:32:02 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1713896656614/5a565c29-ab49-4478-b70c-fc1c4307d36a.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>The AI revolution has officially gone mainstream. From crafting the perfect 'Good Morning' message with ChatGPT to generating human-like responses, Large Language Models (LLMs) have taken the world by storm. But behind the scenes, these behemoths of AI require staggering amounts of compute power and energy to train. The latest example is Llama 3, Meta AI's massive model trained on two super clusters of 24,000+ Nvidia H100 GPUs each. As the scale of these models continues to grow, so do the costs of building and maintaining them. In fact, <a target="_blank" href="https://www.scientificamerican.com/article/the-ai-boom-could-use-a-shocking-amount-of-electricity/">some</a> projections suggest that the compute and electrical power needed to train such models could soon surpass the requirements of small countries.</p>
<h2 id="heading-inference-time-optimizations">Inference Time Optimizations.</h2>
<p>In this landscape, optimizing inference time has become crucial. While model parameter count gets most of the attention, inference time - the time it takes for a model to make a prediction from a given input - is a critical metric that can make or break the usability of an AI system. In the context of language models, inference time is often measured in tokens per second (tk/s). Reducing inference time can significantly lower operational costs, making AI more accessible and sustainable in the future.</p>
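<p>As a rough illustration, here is a minimal helper for measuring tokens per second with a decoder-only Hugging Face model; <code>model</code> and <code>tokenizer</code> are assumed to be loaded already, and seq2seq models would need a slightly different token count.</p>
<pre><code class="lang-python">import time

def tokens_per_second(model, tokenizer, prompt, **generate_kwargs):
    # Tokenize the prompt and move it to the same device as the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.time()
    output = model.generate(**inputs, **generate_kwargs)
    elapsed = time.time() - start
    # Decoder-only models return the prompt plus the new tokens, so subtract the prompt length.
    new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
    return new_tokens / elapsed
</code></pre>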
<p>In the image below, training time is the time needed to train the neural network to identify images of cats, while inference time is the time it takes the trained network to return a confidence value indicating whether a cat is in the image.</p>
<p><img src="https://backblazeprod.wpenginepowered.com/wp-content/uploads/2023/11/bb-bh-Training-vs-Inference_Final-1536x875.png" alt class="image--center mx-auto" /></p>
<p>In this discussion, we'll delve into the world of inference time optimizations, exploring techniques and strategies to speed up your PyTorch models without sacrificing the final output.</p>
<h3 id="heading-quantization-using-pytorch">Quantization using PyTorch</h3>
<p>Quantization is a technique used to reduce the precision of model weights from floating-point numbers to integers. This process, also known as weight quantization, aims to decrease the memory footprint and computational requirements of LLMs, making them more efficient and deployable on resource-constrained devices. By representing model parameters with fewer bits, quantization can lead to significant reductions in model size, inference time, and energy consumption, while maintaining acceptable accuracy. However, quantization can also introduce accuracy degradation, and careful tuning of quantization parameters is necessary to balance the trade-off between model efficiency and accuracy.</p>
<p><img src="https://www.allaboutcircuits.com/uploads/articles/qc-tech_quantization_gif-2_final.jpg" alt class="image--center mx-auto" /></p>
<p>Let's code out a simple example using the <code>facebook/mbart-large-50-many-to-many-mmt</code> model. This model, developed by Facebook, can translate directly between any pair of its 50 supported languages. It has over 611 million parameters. To magnify the effects of each of the following optimizations, we will be running them on the CPU, while also sharing statistics for their GPU counterparts.</p>
<p>We can easily initiate the model by using the HuggingFace Transformers Library.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained(<span class="hljs-string">"facebook/mbart-large-50-many-to-many-mmt"</span>)
model = AutoModelForSeq2SeqLM.from_pretrained(<span class="hljs-string">"facebook/mbart-large-50-many-to-many-mmt"</span>)
</code></pre>
<p>As well as other imports we may need from PyTorch</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torch.quantization
</code></pre>
<p>We have a choice of how far we want to quantize the model. This can range from float32 (basically no change, since models are usually stored in float32) to int1 (which is not available in PyTorch quantization, but was discussed extensively in <a target="_blank" href="https://arxiv.org/abs/2402.17764">this</a> paper by Microsoft Research). The options available in PyTorch are:</p>
<ul>
<li><p><code>torch.quint8</code></p>
</li>
<li><p><code>torch.qint8</code></p>
</li>
<li><p><code>torch.qint32</code></p>
</li>
<li><p><code>torch.float16</code></p>
</li>
</ul>
<p>Out of these, <code>torch.float16</code> would perform the 'worst' while <code>torch.quint8</code> will anecdotally perform the best. Let us translate from Chinese to English with a relatively complex phrase picked up from <a target="_blank" href="https://byjus.com/english/paragraph-on-india/">Byjus</a>.</p>
<pre><code class="lang-python">article_ch = <span class="hljs-string">'印度是一片美麗的土地，擁有多種野生動物和豐富的文化多樣性。孟加拉虎被認為是印度的國獸。印度每年 8 月 15 日慶祝獨立紀念日。人們慶祝這個節日是為了紀念印度從英國統治下獲得自由。三色國旗稱為“Tiranga”，由藏紅花、白色和綠色設計，國旗中央為海軍藍色的阿肖克脈輪。 「阿育王獅都」是該國的國徽。國家座右銘是 "Satyameva Jayate"，意思是只有真理才能獲勝。 為了順利管理國家，並使其成為一個獨立的國家，需要一部於1950年1月26日生效的憲法。 印度是一個擁有多種不同語言和多種宗教的國家，如佛教、耆那教、伊斯蘭教、印度教等。 。'</span>
</code></pre>
<p>After running the quantized and the non-optimized models, we see the following differences in translation, with their approximate execution times below.</p>
<pre><code class="lang-python">quantized_model = torch.quantization.quantize_dynamic(
    model, dtype=torch.qint8
)
tokenizer.src_lang = <span class="hljs-string">"zh_CN"</span>

encoded_ch = tokenizer(article_ch, return_tensors=<span class="hljs-string">"pt"</span>)

generated_tokens = quantized_model.generate(
    **encoded_ch,
    forced_bos_token_id=tokenizer.lang_code_to_id[<span class="hljs-string">"en_XX"</span>]
)
tokenizer.batch_decode(generated_tokens, skip_special_tokens=<span class="hljs-literal">True</span>)
</code></pre>
<p><strong>OUTPUT:</strong></p>
<p>India is a beautiful land with a wide variety of wildlife and rich cultural diversity. The Bengal tiger is considered to be India's national beast. India celebrates Independence Day on August 15, every year. It is celebrated to commemorate India's liberation from British rule. The three-coloured flag is called "Tiranga", designed with Tibetan red flowers, white and green, and the flag is centered on the Navy's blue Ashok Ring. The lion is the national emblem of the country. The right wing of the flag is "Satyameva Jayate", meaning that only truth can prevail. In order to successfully govern the country and make it an independent country, it is necessary to have a constitution in force on January 26, 1950. India is a country with a wide variety of languages and religions, such as Buddhism, Jainism, Islam, Hinduism,</p>
<p>Inference Time: 26.802 sec</p>
<pre><code class="lang-plaintext">encoded_ch = tokenizer(article_ch, return_tensors="pt")
generated_tokens = model.generate(
    **encoded_ch,
    forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"]
)
tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
</code></pre>
<p><strong>OUTPUT:</strong></p>
<p>India is a beautiful land with a wide variety of wildlife and rich cultural diversity. The Bengal tiger is considered to be India's national animal. India celebrates Independence Day on August 15 every year. It is celebrated to commemorate India's liberation from British rule. The three-coloured flag is called "Tiranga", designed with Tibetan red flowers, white and green, with the flag centered on the Navy's blue Ashok Ring. "Ayurveda Lion" is the national emblem of the country. The country's right-hand inscription is "Satyameva Jayate", meaning that only truth can prevail. In order to successfully govern the country and make it an independent country, it is necessary to have a constitution that entered into force on January 26, 1950. India is a country with a wide variety of languages and religions, such as Buddhism, Jainism,</p>
<p>Inference Time: 99.025 sec</p>
<p>We can see that with just two additional lines of code, we get a speedup of 3.69x (nice!) with little loss in the end result. The final outputs of the two models also differ slightly from the original English paragraph, but we can chalk that up to Google Translate (used to produce the Chinese input) not being the best at what it does.</p>
<h2 id="heading-optimum-by-hugging-face">Optimum by Hugging Face</h2>
<p>Optimum is an open-source library developed by Hugging Face. It leverages various optimization techniques, such as quantization, pruning, and knowledge distillation. Optimum enables developers to reduce the computational requirements and memory usage of their models, making them more efficient and deployable on resource-constrained devices. Since its release, Optimum has gained immense popularity within the machine learning community, with thousands of stars on GitHub and widespread adoption in industries such as computer vision, natural language processing, and autonomous driving. Its popularity can be attributed to its ease of use, flexibility, and the significant performance improvements it offers, making it an essential tool for anyone looking to deploy AI models in real-world applications. By providing a simple and standardized way to optimize models, Optimum has enabled developers to focus on building innovative applications rather than worrying about the underlying infrastructure associated with machine learning tasks.</p>
<p>A key part of using Optimum is converting the model to ONNX (Open Neural Network Exchange). ONNX is an open format used to represent deep learning models, allowing them to be exchanged and executed across different frameworks and platforms. Originally developed by Facebook and Microsoft (and later backed by companies such as Amazon), ONNX provides a common language for AI models, enabling seamless interoperability between deep learning frameworks such as TensorFlow and PyTorch. This open standard enables developers to train models in one framework and deploy them in another, without the need for retraining or rewriting the model.</p>
<p>Out of the gate, Optimum allows us to either use its interface programmatically or work through its CLI. We will be using the CLI in this example, attempting to reduce the inference time of a well-known summarization model, <code>t5-small</code>, developed by Google AI.</p>
<p>Start by downloading the required libraries</p>
<pre><code class="lang-bash">pip install optimum[onnxruntime-gpu]
pip install optimum[onnxruntime]
</code></pre>
<p>Now using the <code>optimum-cli</code> we can optimize the model on 4 levels:</p>
<ul>
<li><p>O1 basic general optimizations</p>
</li>
<li><p>O2 basic and extended general optimizations, transformers-specific fusions</p>
</li>
<li><p>O3 same as O2 with GELU approximation</p>
</li>
<li><p>O4 same as O3 with mixed precision (fp16, GPU-only)</p>
</li>
</ul>
<pre><code class="lang-bash">optimum-cli <span class="hljs-built_in">export</span> onnx --model t5-small --optimize O3 t5_onnx/ --device cuda
</code></pre>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
<span class="hljs-keyword">from</span> optimum.onnxruntime <span class="hljs-keyword">import</span> ORTModelForSeq2SeqLM
<span class="hljs-keyword">import</span> torch

tokenizer = AutoTokenizer.from_pretrained(<span class="hljs-string">"t5-small"</span>)
model = AutoModelForSeq2SeqLM.from_pretrained(<span class="hljs-string">'t5-small'</span>)
onnx_model = ORTModelForSeq2SeqLM.from_pretrained(<span class="hljs-string">"t5_onnx"</span>)

device = torch.device(<span class="hljs-string">"cuda"</span> <span class="hljs-keyword">if</span> torch.cuda.is_available() <span class="hljs-keyword">else</span> <span class="hljs-string">"cpu"</span>)
_ = model.to(device).eval()
</code></pre>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> random
<span class="hljs-keyword">import</span> time

sentences = [
    <span class="hljs-string">"In recent years, advancements in artificial intelligence (AI) have revolutionized various industries, from healthcare to finance and beyond. AI technologies such as machine learning and natural language processing have enabled computers to perform tasks that were once thought to be exclusive to human intelligence. For instance, AI-powered systems can now diagnose diseases, predict stock market trends, and even generate creative content like music and art. These developments have sparked both excitement and concern among experts and the general public. While AI offers immense potential for improving efficiency and solving complex problems, there are also fears about its impact on jobs, privacy, and ethical considerations surrounding its use."</span>,
    <span class="hljs-string">"The rise of renewable energy sources, such as solar and wind power, has gained significant momentum in recent years as the world seeks to address climate change and reduce reliance on fossil fuels. Governments, businesses, and individuals are increasingly investing in renewable energy infrastructure and technologies to transition towards a more sustainable energy system. Solar photovoltaic (PV) panels and wind turbines have become common sights in many parts of the world, harnessing the power of sunlight and wind to generate electricity. This shift towards renewable energy is not only driven by environmental concerns but also by economic factors, as the cost of renewable energy technologies continues to decline, making them increasingly competitive with traditional energy sources."</span>,
    <span class="hljs-string">"The internet has transformed the way we communicate, access information, and conduct business on a global scale. With the proliferation of smartphones and high-speed internet connections, people are more connected than ever before, allowing for instant communication and collaboration across geographical boundaries. Social media platforms have become central hubs for sharing ideas, connecting with friends and family, and consuming news and entertainment. E-commerce has also experienced exponential growth, with online shopping becoming a convenient and preferred method for many consumers. However, along with the benefits of connectivity come challenges such as cybersecurity threats, online privacy concerns, and the spread of misinformation. As the internet continues to evolve, it remains crucial for individuals, businesses, and policymakers to address these issues while harnessing the full potential of digital technology."</span>,
]

len_dataset = <span class="hljs-number">1</span>

texts = []
<span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(len_dataset):
    n_times = random.randint(<span class="hljs-number">1</span>, <span class="hljs-number">5</span>)
    texts.append(<span class="hljs-string">" "</span>.join(random.choice(sentences) <span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(n_times)))
</code></pre>
<pre><code class="lang-python">summarization = pipeline(<span class="hljs-string">"summarization"</span>, model=model, tokenizer=tokenizer, max_length= <span class="hljs-number">100</span>)
start = time.time()
print(summarization(texts))
end = time.time()
print(<span class="hljs-string">f"Average response time for original T5: <span class="hljs-subst">{(end-start)/len_dataset}</span> ms"</span>)
</code></pre>
<p><strong>OUTPUT:</strong></p>
<p>'in recent years, advancements in artificial intelligence (AI) have revolutionized various industries, from healthcare to finance and beyond . AI-powered systems can now diagnose diseases, predict stock market trends, and even generate creative content like music and art .'</p>
<p>Inference Time: 3.17 s</p>
<pre><code class="lang-python">onnx_summarization = pipeline(<span class="hljs-string">"summarization"</span>, model=onnx_model, tokenizer=tokenizer, max_length=<span class="hljs-number">100</span>)
start = time.time()
print(onnx_summarization(texts))
end = time.time()
print(<span class="hljs-string">f"Average response time for optimized onnx T5: <span class="hljs-subst">{(end-start)/len_dataset}</span> ms"</span>)
</code></pre>
<p>'in recent years, advancements in artificial intelligence (AI) have revolutionized various industries, from healthcare to finance and beyond . AI-powered systems can now diagnose diseases, predict stock market trends, and even generate creative content like music and art .'</p>
<p>Inference Time: 1.17 s</p>
<p>The same output at roughly 2.71 times the speed. Imagine what would happen if we quantized the model too. That is a bit out of scope for this article, but combining the two optimizations could give a theoretical speedup of 9.99, almost 10x the base speed. Keep in mind, though, that the original PyTorch model and the ONNX export are not identical artifacts, so this single informal run is an indication rather than a rigorous benchmark of the model's speed.</p>
<h2 id="heading-static-vs-dynamic-quantization">Static vs Dynamic Quantization</h2>
<p>As discussed previously, quantization is a process in machine learning and deep learning that reduces the precision of a model's weights and activations from floating-point numbers to integers. This is done to reduce the memory footprint and computational requirements of the model, making it more efficient and suitable for deployment on resource-constrained devices.</p>
<p>Quantization schemes are usually described along two axes: how the quantization parameters are chosen (static vs. dynamic) and when quantization is applied relative to training (during training vs. after it):</p>
<h3 id="heading-static-quantization">Static Quantization</h3>
<p>In static quantization, the quantization parameters (such as the scale and zero-point) are determined during the training process or during a separate calibration step. The model is then quantized using these fixed parameters, and the resulting quantized model is used for inference.</p>
<ul>
<li><p><strong>Faster inference</strong>: Since the quantization parameters are fixed, the inference process is faster and more efficient.</p>
</li>
<li><p><strong>Lower memory usage</strong>: The quantized model requires less memory, making it suitable for deployment on devices with limited memory.</p>
</li>
</ul>
<h3 id="heading-dynamic-quantization">Dynamic Quantization</h3>
<p>In dynamic quantization, the quantization parameters are determined dynamically during inference, based on the input data. This means that the model adapts to the input distribution and adjusts the quantization parameters accordingly.</p>
<ul>
<li><p><strong>Improved accuracy</strong>: Dynamic quantization can adapt to changing input distributions, which usually means a smaller accuracy drop.</p>
</li>
<li><p><strong>Flexibility</strong>: Dynamic quantization can be used on different hardware platforms and with different input distributions, without requiring retraining or recalibration.</p>
</li>
</ul>
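<p>To make the dynamic flavour concrete, here is a minimal sketch using PyTorch's built-in <code>torch.quantization.quantize_dynamic</code> helper. The tiny two-layer network is just a stand-in for whatever model you actually want to shrink, and the exact size and speed gains will depend on your model and hardware.</p>
<pre><code class="lang-python">import torch
import torch.nn as nn

# A stand-in model; in practice this would be your trained network.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
).eval()

# Dynamically quantize the Linear layers: weights are stored as int8,
# activations are quantized on the fly at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized_model(x).shape)  # torch.Size([1, 10])
</code></pre>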
<h3 id="heading-pre-training-quantization">Pre Training Quantization</h3>
<p>Pre-training quantization, also known as quantization-aware training, involves quantizing the model's weights and activations during the training process. This means that the model is trained using quantized values, rather than full-precision floating-point numbers.</p>
<p><strong>Advantages:</strong></p>
<ol>
<li><p><strong>Improved accuracy</strong>: Pre-training quantization can lead to improved accuracy, as the model is trained to adapt to the quantization noise and errors.</p>
</li>
<li><p><strong>Better optimization</strong>: The model is optimized for the quantized precision, which can lead to better convergence and optimization.</p>
</li>
<li><p><strong>Faster deployment</strong>: Since the model is already quantized, it can be deployed directly on hardware that supports quantized inference, without the need for additional quantization steps.</p>
</li>
</ol>
<p><strong>Challenges:</strong></p>
<ol>
<li><p><strong>Training complexity</strong>: Pre-training quantization can increase the training complexity, as the model needs to adapt to the quantization noise and errors.</p>
</li>
<li><p><strong>Hyperparameter tuning</strong>: Hyperparameter tuning can be more challenging, as the optimal hyperparameters may vary depending on the quantization precision.</p>
</li>
</ol>
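<p>For intuition, here is a minimal, purely illustrative quantization-aware training loop using PyTorch's eager-mode API. The toy model, random data, and training schedule are placeholders; the point is the prepare-train-convert flow, in which fake-quantization observers learn scales while the model trains.</p>
<pre><code class="lang-python">import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float input enters the quantized domain here
        self.fc = nn.Linear(16, 2)
        self.dequant = torch.quantization.DeQuantStub()  # back to float at the output

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = TinyNet().train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
torch.quantization.prepare_qat(model, inplace=True)      # inserts fake-quant observers

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
for _ in range(100):                                     # placeholder training loop
    x, y = torch.randn(32, 16), torch.randint(0, 2, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

quantized = torch.quantization.convert(model.eval())     # the actual int8 model
</code></pre>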
<h3 id="heading-post-training-quantization">Post Training Quantization</h3>
<p>Post-training quantization, also known as quantization after training, involves quantizing a pre-trained model's weights and activations after the training process is complete. This is a more common approach, as it allows for the use of pre-trained models and fine-tuning them for specific hardware platforms.</p>
<p><strong>Advantages:</strong></p>
<ol>
<li><p><strong>Flexibility</strong>: Post-training quantization allows for the use of pre-trained models, which can be fine-tuned for specific hardware platforms.</p>
</li>
<li><p><strong>Simpler deployment</strong>: Post-training quantization is a simpler process, as it only requires quantizing the pre-trained model's weights and activations.</p>
</li>
<li><p><strong>Wider applicability</strong>: Post-training quantization can be applied to a wide range of models and hardware platforms.</p>
</li>
</ol>
<p><strong>Challenges:</strong></p>
<ol>
<li><p><strong>Accuracy loss</strong>: Post-training quantization can result in accuracy loss, as the model is not optimized for the quantized precision.</p>
</li>
<li><p><strong>Calibration required</strong>: Post-training quantization often requires calibration to determine the optimal quantization parameters, which can be time-consuming.</p>
</li>
</ol>
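<p>Since we already have an ONNX export of T5 on disk, one convenient route for post-training quantization is ONNX Runtime's <code>quantize_dynamic</code> helper, which rewrites an exported <code>.onnx</code> file with int8 weights. The file paths below are placeholders; point them at wherever your exported encoder/decoder graphs actually live.</p>
<pre><code class="lang-python">from onnxruntime.quantization import QuantType, quantize_dynamic

# Placeholder paths: adjust to the .onnx files produced by your export step.
quantize_dynamic(
    model_input="t5_onnx/encoder_model.onnx",
    model_output="t5_onnx/encoder_model_int8.onnx",
    weight_type=QuantType.QInt8,  # store the weights as signed 8-bit integers
)
</code></pre>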
<p>In general, pre-training quantization can lead to better accuracy, with models like MobileNetV2 achieving an accuracy of 72.0% on the ImageNet benchmark while reducing the model size by 75%. Post-training quantization, on the other hand, offers significant space savings: a post-training quantized ResNet-50 requires only about 7.5MB of storage, a reduction of 90% compared to the full-precision model, while achieving an accuracy of 69.5% on the same benchmark. Despite the small accuracy drop, post-training quantization can still be a viable option for many applications, especially those where memory constraints are a major concern.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In conclusion, optimizing inference time is crucial for making AI systems more accessible and sustainable. We explored techniques to speed up PyTorch models without sacrificing accuracy, including quantization, Optimum, and static vs dynamic quantization, demonstrating significant reductions in model size, inference time, and energy consumption. As AI continues to evolve, optimizing inference time will become increasingly important, and by leveraging these techniques, developers can build more efficient and deployable AI models, making AI more accessible and sustainable for a wider range of applications.</p>
]]></content:encoded></item><item><title><![CDATA[My Journey as a Developer - DevRetro 2023]]></title><description><![CDATA[Hello World 🤖
Hey folks, I've been putting off this article for what feels like an eternity—blame it on the whirlwind of University exams and then a generous sprinkle of holiday season lethargy. Another year of college is officially in the rearview ...]]></description><link>https://blog.arygarg.me/devretro-2023</link><guid isPermaLink="true">https://blog.arygarg.me/devretro-2023</guid><category><![CDATA[DevRetro]]></category><category><![CDATA[2023]]></category><category><![CDATA[Python]]></category><category><![CDATA[internships]]></category><category><![CDATA[SIH]]></category><category><![CDATA[Open Source]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Sat, 23 Dec 2023 07:14:06 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/RCpEWDyC5sQ/upload/b0cd9ba168b3c783e1c3c53c1e8d4c39.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-hello-world">Hello World 🤖</h2>
<p>Hey folks, I've been putting off this article for what feels like an eternity—blame it on the whirlwind of University exams and then a generous sprinkle of holiday season lethargy. Another year of college is officially in the rearview mirror, and boy, do I have many more tales to spin compared to last year's Dev Retro. Let's kick things off by dissecting the grand plans I had envisioned for this year, contrasting them with the reality that unfolded. I'll then take you on a rollercoaster ride through the major events that peppered my journey in the tech realm this past year, capping it all off with some crystal ball gazing into what I predict awaits in the upcoming year. Buckle up, enjoy the read, and catch you on the flip side!</p>
<h2 id="heading-dissecting-my-predictions-from-last-year">Dissecting My Predictions from Last Year</h2>
<p>I'm going to be taking excerpts from last year's Dev Retro and trying to justify them a bit. Reading through my old Dev Retro did make me cringe a bit, but it's nice to see that I've grown both in my Technical skills as well as my Vocabulary skills (*wink* <em>wink</em> ChatGPT)</p>
<p>I thought I'd write last year's Dev Retro the way a bride prepares for an English wedding: "Something Borrowed; Something Blue; Something Old; Something New." Looking back, though, that wasn't it; the titles now feel outdated and inconsistent with the blog's core message.</p>
<blockquote>
<p>Technical, this year was the first year I started actual development, from tiny HTML Pages using the Basics of CSS to Dynamically Loaded Pages using the popular Framework Django</p>
</blockquote>
<p>I quit Web Development probably 5 seconds after that blog went out. It wasn't for me. I get so bored writing out the most basic logic for no reason.</p>
<blockquote>
<p>Starting with Data Science and understanding the math behind it, this year has allowed me to start growing and exploring</p>
</blockquote>
<p>There we go, that's more like it! Let's Go Data <s>Science. </s> I fell in love with data this year: starting with Data Science and the ETL process, moving into Data Engineering, where I was able to intern at one of India's biggest FinTech companies, and finally settling on a more mixed role across Data Engineering and Machine Learning (with a bit of DL).</p>
<blockquote>
<p>Developing simple Dart apps using Flutter and understanding how DApps work in the Solidity Framework made me realize what the future of Web3 might look like</p>
</blockquote>
<p>Quit after 5 seconds. NEXT</p>
<blockquote>
<p>.. solving some (easy to medium difficulty) Data Structures and Algorithm problems.</p>
</blockquote>
<p>I wish I'd continued to do that. I wouldn't be scrambling for a summer Internship for the summer of 2023 (<a target="_blank" href="https://aryann.tech/resume">Resume Plug</a>, just in case you're a recruiter)</p>
<blockquote>
<p>Maybe not the most significant flex, but I also got featured on Hashnode's Twitter account!</p>
</blockquote>
<p>Yup, it's the biggest flex of the 2020s to date. Allow me to embed it again ;)</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://twitter.com/hashnode/status/1561362487014002692">https://twitter.com/hashnode/status/1561362487014002692</a></div>
<p> </p>
<p>Yup, that's enough of a diss of 2022 Aryan. <a target="_blank" href="https://www.youtube.com/watch?v=_Yhyp-_hX2s">Snapping back to reality</a>, the following section focuses on what I accomplished this year.</p>
<h2 id="heading-hop-into-the-way-back-machine">Hop into the Way Back Machine</h2>
<h3 id="heading-makeathon-5-organising-it-all">Makeathon 5: Organising it all</h3>
<p>If you don't know, I'm the Joint Secretary of the <a target="_blank" href="https://mlsctiet.com">Microsoft Learn Student Chapter</a> present at my university. Still, this story predates that, back when I was just an ordinary "Core Member" (basically a glorified grunt worker). Anyway, we plan the most elaborate hackathon in Punjab, where we invite special guests and speakers and end it all with a 24-hour hackathon. What made this year special? I got to interview Mr. Mr. Mr. Richard Stallman (I hope someone gets that joke). It was amazing; a few friends and I got the unbelievable opportunity of a lifetime to interview the man who created GNU. It was a total fanboy moment, and I don't have words to express how amazing that one hour was for me. Unfortunately, Mr. Stallman was diagnosed with cancer this year, and you can see it on his face in the image below. His contributions to Free and Open Source software shaped the world we live in today.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1703280433949/16d322a1-3e9c-4f00-84b9-60d5e22cb6d7.jpeg" alt class="image--center mx-auto" /></p>
<p>Here are a few more pictures from Makeathon, just because (sorry for the huge photo)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1703280765664/c59004b7-5517-4299-a297-642a4a133d7a.jpeg" alt class="image--center mx-auto" /></p>
<h3 id="heading-interning-at-paytm">Interning at Paytm</h3>
<p>You read that right; I got the unholy opportunity to work at Paytm over the summer. I worked on an in-house SQL Optimizer for the Paytm Business Analysts in the Data Engineering Department. It was a stint for two months, and I learned a lot about the underlying processing of SQL because of the Internship. (Thanks, Vikash and Anand, for making it a fantastic learning opportunity)</p>
<h3 id="heading-visiting-pydelhi-and-almost-giving-a-lightning-talk">Visiting PyDelhi and (almost giving a Lightning Talk)</h3>
<p>PyDelhi was super fun. I got in super cheap because of the student discount and spent the two days in hostels to save on Hotels. I learned a lot and got decent Swags from the Sponsors ;). I also got to explore Delhi for the first time in years.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1703281307327/c8fe3641-d425-4519-96f6-d3d65bc4ad88.jpeg" alt class="image--center mx-auto" /></p>
<p>Notice how my card doesn't have my name on it? They forgot to get my card printed on the first day, and I got it on Day 2. Still, it was a fantastic opportunity to learn from amazing people. (Plug to <a class="user-mention" href="https://hashnode.com/@vipulgupta2048">Vipul Gupta</a> (<a target="_blank" href="https://twitter.com/vipulgupta2048">X</a>); he had a fantastic talk)</p>
<h3 id="heading-internship-no-2-researching-for-dst-haryana">Internship No. 2: Researching for DST, Haryana</h3>
<p>I jumped when I saw a call for students for a research role for a project sponsored by the Department of Science and Technology, Haryana. I gave the interview a day early (that's how anxious I was) and got shortlisted. We're working on a way to regulate type 1 diabetes in patients using ML, but apart from that, I can't dive into the details just yet.</p>
<h3 id="heading-sih-so-close-yet-so-far">SIH: So close, yet so far</h3>
<p>A classmate, her friend, three of our seniors, and I teamed up for the Smart India Hackathon. The problem statement we chose to ideate on was about providing feedback to the government on its actions via social media, newspapers, e-newspapers, and articles (including YouTube) from the web. We worked on a prototype that used nontraditional Machine Learning applications like BERT and Whisper and presented our pitch deck. Unfortunately, we were put on the waitlist for our problem statement, and no other team backed out. It was a quick month, but we worked a lot. Cheers to SPAARS :)</p>
<h2 id="heading-working-on-next-year">Working on Next Year</h2>
<p>As for what's next, considering my love affair with data, diving into an open-source project in data science or machine learning sounds like a plan, and maybe, just maybe, organizing a tech event or workshop at my university to share the knowledge. Here's to more tech adventures in the coming year!</p>
]]></content:encoded></item><item><title><![CDATA[Why Postgres should be the last database you'll ever need]]></title><description><![CDATA[Being a sucker for reading unnecessary books in fields I have no experience in got me into flipping through the Google Site Reliability Engineering book, where I had read the most elegant concept that seems obvious at first but isn't applied in the r...]]></description><link>https://blog.arygarg.me/postgress-does-everything</link><guid isPermaLink="true">https://blog.arygarg.me/postgress-does-everything</guid><category><![CDATA[PostgreSQL]]></category><category><![CDATA[Databases]]></category><category><![CDATA[Redis]]></category><category><![CDATA[airflow]]></category><category><![CDATA[kafka]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Thu, 12 Oct 2023 11:02:46 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1696998690158/943b3e65-f011-4fb9-a723-2168ac3133e2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Being a sucker for reading unnecessary books in fields I have no experience in got me into flipping through the Google Site Reliability Engineering book, where I had read the most elegant concept that seems obvious at first but isn't applied in the real world very often. It read</p>
<blockquote>
<p>Simple software breaks less often and is easier and faster to fix when it does break</p>
<p>- Google's Site Reliability Engineering Book</p>
</blockquote>
<p>Thinking about this concept and about how companies stack up a vast number of technologies to build their data-driven infrastructure, and with my (admittedly limited) experience using <s>the best</s> SQL-based database platform, I got the gears turning in my head. In a world where everyone uses Redis as an in-memory cache, where Apache Kafka is the de facto real-time message queue, and where time-series data gets its own database, the stack starts to feel redundant: even in this limited example, we still need to be experts in all three. In this article, leaning on Postgres's ACID-compliant nature, I'll focus on replacing all three of these dependencies with simple Postgres-based tables.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1696998554354/0a1becbf-81cf-47bc-8e5b-5d398aead26d.png" alt="Sample architecture for a multi-dependency project" class="image--center mx-auto" /></p>
<h2 id="heading-terminating-timescale">Terminating Timescale</h2>
<p>Replacing Timescale, the time-series database, with PostgreSQL involves leveraging the powerful features of PostgreSQL to handle time-series data efficiently. In PostgreSQL, the table layout is crucial in achieving optimal performance. Instead of relying on Timescale's hypertables, specialized tables for time-series data, you can create a regular PostgreSQL table with a timestamp column. Indexing this timestamp column is essential for quick data retrieval.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Step 1: Creating a Table</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> time_series_data (
    <span class="hljs-keyword">id</span> <span class="hljs-built_in">SERIAL</span> PRIMARY <span class="hljs-keyword">KEY</span>,
    event_timestamp TIMESTAMPTZ <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    data_value <span class="hljs-keyword">DOUBLE</span> <span class="hljs-keyword">PRECISION</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>
    <span class="hljs-comment">-- Add other columns as needed (each preceded by a comma)</span>
);

<span class="hljs-comment">-- Step 2: Indexing on Time Stamp</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">INDEX</span> idx_event_timestamp <span class="hljs-keyword">ON</span> time_series_data (event_timestamp);
</code></pre>
<p>To keep the partitioning advantages that Timescale's hypertables provide, you can use PostgreSQL's native declarative partitioning: the parent table is declared with <code>PARTITION BY RANGE (event_timestamp)</code>, and child partitions are then created for specific time ranges. Proper indexing on these child tables ensures that queries for a particular period are executed swiftly.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Step 3: Create a partition for one time range</span>
<span class="hljs-comment">-- (the master table must be declared with PARTITION BY RANGE (event_timestamp),</span>
<span class="hljs-comment">--  and its primary key must then include the partition column)</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> time_series_data_2023 <span class="hljs-keyword">PARTITION</span> <span class="hljs-keyword">OF</span> time_series_data
    <span class="hljs-keyword">FOR</span> <span class="hljs-keyword">VALUES</span> <span class="hljs-keyword">FROM</span> (<span class="hljs-string">'2023-01-01'</span>) <span class="hljs-keyword">TO</span> (<span class="hljs-string">'2024-01-01'</span>);
</code></pre>
<p>Similarly, you might want to tune PostgreSQL's configuration so that its behaviour on time-series workloads comes closer to what Timescale provides out of the box. This might mean changing some of the defaults listed below:</p>
<ol>
<li><p><code>shared_buffers</code>: 25% to 30% of available memory</p>
</li>
<li><p><code>effective_cache_size</code>: 50% to 75% of available memory</p>
</li>
<li><p><code>work_mem</code>: Adjust based on the complexity of queries and available memory</p>
</li>
<li><p><code>maintenance_work_mem</code>: Enough for maintenance operations like index creation</p>
</li>
<li><p><code>wal_level</code>: Set to replica (or logical if you need logical replication)</p>
</li>
<li><p><code>max_wal_size</code> / <code>min_wal_size</code>: Adjust based on write intensity and available disk space</p>
</li>
<li><p><code>checkpoint_completion_target</code>: Aim to balance write performance and checkpoint duration</p>
</li>
<li><p><code>autovacuum</code>: Enable and tune the autovacuum settings for automatic maintenance</p>
    <p> You can alter any of these with an <code>ALTER SYSTEM</code> command, for example:</p>
<pre><code class="lang-sql"> <span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">SYSTEM</span> <span class="hljs-keyword">SET</span> shared_buffers = <span class="hljs-string">'2GB'</span>;
</code></pre>
</li>
</ol>
<h2 id="heading-killing-kafka-with-a-message-queue">Killing Kafka with a Message Queue</h2>
<p>A queue in its most basic essence is <strong>JUST A QUEUE</strong>, a data structure that follows the FIFO (First-in, First-Out) rule. That means that new data is added to the tail of the queue, and data is read from the head. So, a straightforward implementation will only deal with enqueuing messages to the tail and dequeuing from the head. The table would have the following schema:</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Step 1: Create the Schema</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> queue_table (
    <span class="hljs-keyword">id</span> <span class="hljs-keyword">UUID</span> PRIMARY <span class="hljs-keyword">KEY</span>,
    inserted_at <span class="hljs-built_in">TIMESTAMP</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span> <span class="hljs-keyword">DEFAULT</span> <span class="hljs-keyword">NOW</span>(),
    message_payload <span class="hljs-built_in">BLOB</span>
);

<span class="hljs-comment">-- Step 2: Index the inserted_at column on its sorted values</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">INDEX</span> inserted_at_idx
    <span class="hljs-keyword">ON</span> queue_table (inserted_at <span class="hljs-keyword">ASC</span>);
</code></pre>
<p>Inserting data into the queue is now as simple as inserting a row into the table, and consuming a message is a single statement that deletes the oldest row and returns it.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Adding to Queue</span>
<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> queue_table (<span class="hljs-keyword">id</span>, inserted_at, message_payload)
    <span class="hljs-keyword">VALUES</span> (gen_random_uuid(), <span class="hljs-keyword">NOW</span>(), RAWTOHEX(<span class="hljs-string">'top secret information'</span>));

<span class="hljs-comment">-- Returning the Data</span>
<span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">FROM</span> queue_table qt
<span class="hljs-keyword">WHERE</span> qt.id =
    (<span class="hljs-keyword">SELECT</span> qt_inner.id
    <span class="hljs-keyword">FROM</span> queue_table qt_inner
    <span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> qt_inner.inserted_at <span class="hljs-keyword">ASC</span>
    <span class="hljs-keyword">FOR</span> <span class="hljs-keyword">UPDATE</span> <span class="hljs-keyword">SKIP</span> <span class="hljs-keyword">LOCKED</span>
    <span class="hljs-keyword">LIMIT</span> <span class="hljs-number">1</span>)
<span class="hljs-keyword">RETURNING</span> qt.id, qt.inserted_at, qt.message_payload;
</code></pre>
<p>Although not the most optimal, it can still comfortably do a few thousand transactions per second.</p>
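<p>To see how an application would talk to this queue, here is a rough Python sketch using <code>psycopg2</code>. The connection string is a placeholder, the table is the one defined above, and a real worker would loop, back off when the queue is empty, and handle failures.</p>
<pre><code class="lang-python">import uuid
import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # placeholder DSN

def enqueue(payload: bytes):
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO queue_table (id, inserted_at, message_payload) "
            "VALUES (%s, NOW(), %s)",
            (str(uuid.uuid4()), payload),
        )

def dequeue():
    # Same trick as the SQL above: SKIP LOCKED lets many workers pull safely.
    with conn, conn.cursor() as cur:
        cur.execute(
            "DELETE FROM queue_table qt WHERE qt.id = ("
            "  SELECT qt_inner.id FROM queue_table qt_inner"
            "  ORDER BY qt_inner.inserted_at ASC"
            "  LIMIT 1 FOR UPDATE SKIP LOCKED)"
            " RETURNING qt.id, qt.inserted_at, qt.message_payload"
        )
        return cur.fetchone()  # None when the queue is empty

enqueue(b"top secret information")
print(dequeue())
</code></pre>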
<h2 id="heading-redundant-redis">Redundant Redis</h2>
<p>In PostgreSQL, you can create a cache table with a key column and a corresponding value column to store the cached data, plus a timestamp that records when each entry was added. Indexing the key column ensures fast retrieval of cached values. To get closer to Redis's in-memory performance, consider adjusting PostgreSQL's configuration: increase shared_buffers to allocate more memory for caching, adjust effective_cache_size accordingly, and tune work_mem to optimize memory usage during query execution. While PostgreSQL may not match Redis in pure in-memory caching speed, its versatility and integration capabilities make it a compelling alternative. Implementing this in Postgres involves the following table creation and indexing commands.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Step 1: Create table to cache results</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> redis_type_cache (
    _key <span class="hljs-built_in">TEXT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    _value <span class="hljs-built_in">TEXT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    inserted_at <span class="hljs-built_in">TIMESTAMP</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span> <span class="hljs-keyword">DEFAULT</span> <span class="hljs-keyword">NOW</span>()
);

<span class="hljs-comment">-- Step 2: Create an index on _KEY</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">INDEX</span> redis_type_cache_key <span class="hljs-keyword">ON</span> redis_type_cache <span class="hljs-keyword">USING</span> <span class="hljs-keyword">HASH</span> (_key);
</code></pre>
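<p>As a rough sketch of how application code might use this table as a cache (again with <code>psycopg2</code> and a placeholder connection string), a lookup simply falls back to computing and storing the value on a miss. The <code>expensive_lookup</code> function is a hypothetical stand-in for whatever slow query or API call you are caching.</p>
<pre><code class="lang-python">import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # placeholder DSN

def expensive_lookup(user_id):
    # Stand-in for the slow operation whose result we want to cache.
    return f"profile-for-{user_id}"

def cache_get(key):
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT _value FROM redis_type_cache "
            "WHERE _key = %s ORDER BY inserted_at DESC LIMIT 1",
            (key,),
        )
        row = cur.fetchone()
        return row[0] if row else None

def cache_set(key, value):
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO redis_type_cache (_key, _value) VALUES (%s, %s)",
            (key, value),
        )

def get_user_profile(user_id):
    key = f"user:{user_id}"
    cached = cache_get(key)
    if cached is not None:
        return cached
    profile = expensive_lookup(user_id)
    cache_set(key, profile)
    return profile
</code></pre>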
<p>Note that this will keep writing to the cache indefinitely, which will eventually eat up your storage. To solve this, we can use a cron job to remove old records by following the steps given below:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> <span class="hljs-keyword">FUNCTION</span> delete_old_rows()
<span class="hljs-keyword">RETURNS</span> <span class="hljs-built_in">VOID</span> <span class="hljs-keyword">AS</span> $$
<span class="hljs-keyword">BEGIN</span>
    <span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">FROM</span> redis_type_cache
    <span class="hljs-keyword">WHERE</span> inserted_at &lt; <span class="hljs-keyword">NOW</span>() - <span class="hljs-built_in">INTERVAL</span> <span class="hljs-string">'36 hours'</span>;
<span class="hljs-keyword">END</span>;
$$ LANGUAGE plpgsql;
</code></pre>
<pre><code class="lang-bash">
PG_HOST=<span class="hljs-string">"your_host"</span>
PG_DATABASE=<span class="hljs-string">"your_database"</span>
PG_USER=<span class="hljs-string">"your_user"</span>
PG_PASSWORD=<span class="hljs-string">"your_password"</span>  <span class="hljs-comment"># Consider using a more secure method for password handling</span>

psql -h <span class="hljs-variable">$PG_HOST</span> -d <span class="hljs-variable">$PG_DATABASE</span> -U <span class="hljs-variable">$PG_USER</span> -c <span class="hljs-string">"SELECT delete_old_rows();"</span> -W <span class="hljs-variable">$PG_PASSWORD</span>
<span class="hljs-comment"># save file as delete_row.sh</span>
</code></pre>
<pre><code class="lang-bash">
&gt; chmod +x delete_old_rows.sh  <span class="hljs-comment"># make the file executable</span>
&gt; crontab -e

<span class="hljs-comment"># add the following to crontab so that the file runs everyday at 6pm</span>
0 18 * * * delete_row.sh
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In conclusion, the idea of simplifying data infrastructure by leveraging the power of a versatile and reliable database like PostgreSQL is compelling. By replacing specialized tools like Timescale, Kafka, and Redis with well-designed PostgreSQL tables, you can achieve simplicity, robustness, and ease of maintenance.</p>
<p>For time-series data, the approach involves creating regular PostgreSQL tables with proper indexing and partitioning for efficient data retrieval. In the realm of message queues, a basic FIFO queue can be implemented using a simple table structure. This approach eliminates the need for Apache Kafka, offering a straightforward and efficient way to handle message queuing directly within PostgreSQL. Even for in-memory caching, PostgreSQL can serve as a capable alternative to Redis. By creating a cache table with appropriate indexing and periodic cleanup processes, you can achieve caching functionality within the same database that handles other aspects of your data.</p>
<p>The key takeaway is that simplicity often leads to reliability. A consolidated approach using PostgreSQL not only simplifies the technology stack but also makes it easier to manage and maintain the entire data infrastructure. That being said, Postgres is not a true drop-in replacement for the dependencies mentioned in this article, given how widely they are adopted in industry, but it was fun thinking that in an alternative universe, everything could just be Postgres. No doubt, if you need to scale your infrastructure to the level of Google or Microsoft, where you deal in petabytes of data each week, these specialised technologies are optimised for results, but if you are just starting out or building a hobby project, a simple Postgres database isn't all that bad.</p>
]]></content:encoded></item><item><title><![CDATA[The Subtle Art of Story-Telling Using Tableau]]></title><description><![CDATA[Data tells us a story no author could ever compose. It shows us never before observed patterns that may slip through the crack. The power of data analytics is more important than ever in the rapid-paced market, where the slightest difference is enoug...]]></description><link>https://blog.arygarg.me/story-telling-using-tableau</link><guid isPermaLink="true">https://blog.arygarg.me/story-telling-using-tableau</guid><category><![CDATA[tableau]]></category><category><![CDATA[#data visualisation]]></category><category><![CDATA[Story]]></category><category><![CDATA[Tutorial]]></category><category><![CDATA[data]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Thu, 13 Jul 2023 03:55:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1689192690402/b84eb0bf-3e93-4f11-8f6b-b1e73264218d.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Data tells us a story no author could ever compose. It shows us never before observed patterns that may slip through the crack. The power of data analytics is more important than ever in the rapid-paced market, where the slightest difference is enough to save or cost a company millions of dollars in revenue. This high level of data analytics was often hidden behind layers of complex programming languages and frameworks. Still, since Tableau released its product in 2003, it has helped thousands of companies visualize billions of rows worth of data.</p>
<p>Tableau is a powerful data visualization and business intelligence tool that allows users to analyze and present data visually, engaging, and interactively. With a user-friendly interface, Tableau enables individuals and organizations to easily connect to various data sources, whether spreadsheets, databases, or cloud services. It offers a wide range of visualization options, including charts, graphs, maps, and dashboards, which enables users to explore data from different angles and gain valuable insights. Tableau and its lesser-known associate, Tableau Prep, provide a low-code application to import, clean, and optimize your data sources from a central data lake before using them for visualizations. In this article, we will discuss an example dataset I cleaned using Tableau Prep and then visualized using various KPIs and graphs on Tableau Desktop.</p>
<h2 id="heading-getting-to-know-the-data">Getting to know the data</h2>
<p>This specific dataset is a collection of datasets I found on <a target="_blank" href="https://www.data.gov.in">data.gov.in</a>. It contains data about the percent distribution and absolute number of foreign individuals that entered the country in various years (2001 - 2020), the amount of money (USD and INR) spent by foreign visitors, and area-specific domestic and foreign foot traffic. Samples of the data have been provided below, but you can download the data from these sources <a target="_blank" href="https://data.gov.in/files/ogdpv2dms/s3fs-public/India-Tourism-Statistics-2021-Table-2.7.1.csv">[1][2][3]</a>.</p>
<table><tbody><tr><td><p>Year</p></td><td><p>FTAs</p></td><td><p>% distribution by Age- Group (in years) - 0-14</p></td><td><p>% distribution by Age- Group (in years) - 15-24</p></td><td><p>% distribution by Age- Group (in years) - 25-34</p></td><td><p>% distribution by Age- Group (in years) - 35-44</p></td><td><p>% distribution by Age- Group (in years) - 45-54</p></td><td><p>% distribution by Age- Group (in years) - 55-64</p></td><td><p>% distribution by Age- Group (in years) - 65 &amp; above</p></td><td><p>% distribution by Age- Group (in years) - Not Reported</p></td></tr><tr><td><p>2001</p></td><td><p>2537282</p></td><td><p>7</p></td><td><p>10.8</p></td><td><p>20.1</p></td><td><p>21.1</p></td><td><p>19.4</p></td><td><p>11.9</p></td><td><p>6.7</p></td><td><p>3</p></td></tr><tr><td><p>2002</p></td><td><p>2384364</p></td><td><p>9.2</p></td><td><p>10</p></td><td><p>19.4</p></td><td><p>21.6</p></td><td><p>19.4</p></td><td><p>11.5</p></td><td><p>7.7</p></td><td><p>1.2</p></td></tr><tr><td><p>2003</p></td><td><p>2726214</p></td><td><p>7.2</p></td><td><p>10</p></td><td><p>19.5</p></td><td><p>21.6</p></td><td><p>19.4</p></td><td><p>11.5</p></td><td><p>7.7</p></td><td><p>3.1</p></td></tr><tr><td><p>2004</p></td><td><p>3457477</p></td><td><p>8.5</p></td><td><p>9.8</p></td><td><p>18.8</p></td><td><p>21.3</p></td><td><p>19.4</p></td><td><p>12.8</p></td><td><p>8.2</p></td><td><p>0.2</p></td></tr><tr><td><p>2005</p></td><td><p>3918610</p></td><td><p>8.6</p></td><td><p>9.6</p></td><td><p>18.8</p></td><td><p>21.3</p></td><td><p>19.5</p></td><td><p>13</p></td><td><p>8.7</p></td><td><p>0.5</p></td></tr></tbody></table>

<table><tbody><tr><td><p>Circle</p></td><td><p>Name of the Monument </p></td><td><p>Domestic-2019-20</p></td><td><p>Foreign-2019-20</p></td><td><p>Domestic-2020-21</p></td><td><p>Foreign-2020-21</p></td><td><p>% Growth 2021-21/2019-20-Domestic</p></td><td><p>% Growth 2021-21/2019-20-Foreign</p></td></tr><tr><td><p>Agra</p></td><td><p>Taj Mahal</p></td><td><p>4429710</p></td><td><p>645415</p></td><td><p>1259892</p></td><td><p>9034</p></td><td><p>-71.56</p></td><td><p>-98.6</p></td></tr><tr><td><p>Agra</p></td><td><p>Agra Fort</p></td><td><p>1627154</p></td><td><p>386522</p></td><td><p>371242</p></td><td><p>2810</p></td><td><p>-77.18</p></td><td><p>-99.27</p></td></tr><tr><td><p>Agra</p></td><td><p>Fatehpur Sikri</p></td><td><p>454376</p></td><td><p>184751</p></td><td><p>107835</p></td><td><p>574</p></td><td><p>-76.27</p></td><td><p>-99.69</p></td></tr><tr><td><p>Agra</p></td><td><p>Akbar Tomb Sikandra</p></td><td><p>229270</p></td><td><p>19625</p></td><td><p>99509</p></td><td><p>321</p></td><td><p>-56.6</p></td><td><p>-98.36</p></td></tr><tr><td><p>Agra</p></td><td><p>Mariam tomb Sikandra</p></td><td><p>22517</p></td><td><p>414</p></td><td><p>9765</p></td><td><p>31</p></td><td><p>-56.63</p></td><td><p>-92.51</p></td></tr></tbody></table>

<table><tbody><tr><td><p>Year</p></td><td><p>FEE in <code>terms -</code>Crore</p></td><td><p>FEE in ` terms - % Change over previous year</p></td><td><p>FEE in US$ terms - US $ Million</p></td><td><p>FEE in US$ terms - % Change over previous year</p></td></tr><tr><td><p>1991</p></td><td><p>4318</p></td><td><p>NA</p></td><td><p>1861</p></td><td><p>NA</p></td></tr><tr><td><p>2001</p></td><td><p>15083</p></td><td><p>-3.5</p></td><td><p>3198</p></td><td><p>-7.6</p></td></tr><tr><td><p>2002</p></td><td><p>15064</p></td><td><p>-0.1</p></td><td><p>3103</p></td><td><p>-3</p></td></tr><tr><td><p>2003</p></td><td><p>20729</p></td><td><p>37.6</p></td><td><p>4463</p></td><td><p>43.8</p></td></tr><tr><td><p>2004</p></td><td><p>27944</p></td><td><p>34.8</p></td><td><p>6170</p></td><td><p>38.2</p></td></tr></tbody></table>

<h2 id="heading-data-cleaning">Data Cleaning 🧹</h2>
<p>After loading the data, the first step of any Data Visualisation Project is to clean it so that your visualizations can be neat and convey all the relevant information you extract. Of course, this is possible using Python and accessory modules like Pandas and Numpy, but Tableau Prep provides a low/no-code experience. The most you'll ever code is when writing basic SQL queries. Our complete "Data Cleaning Pipeline" is strictly no-code and, in its entirety, can be seen below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689183541128/2d784fb5-808b-41e8-a3a5-f53fdd1bf91c.png" alt class="image--center mx-auto" /></p>
<p>Here, I've labeled each step to make it easier to see what it does. The basic gist: rename columns so they portray their meaning more accurately, alter and regroup those columns to smooth out outliers in the data, and then join the two data sources (via an Inner Join) to get our final output. Below we can see one of the two final data sources.</p>
<table><tbody><tr><td><p>Year</p></td><td><p>FTAs</p></td><td><p>Age % 0-14</p></td><td><p>Age % 15-24</p></td><td><p>Age % 25-34</p></td><td><p>Age % 35-44</p></td><td><p>Age % 45-54</p></td><td><p>Age % 55-64</p></td><td><p>Age % 65+</p></td><td><p>Age % Not Reported</p></td><td><p>FEE in INR Crore</p></td><td><p>FEE in % Change over previous year (INR)</p></td><td><p>FEE in US $ Million</p></td><td><p>FEE in % Change over previous year (US$)</p></td></tr><tr><td><p>1/1/2001</p></td><td><p>2537282</p></td><td><p>0.07</p></td><td><p>0.108</p></td><td><p>0.201</p></td><td><p>0.211</p></td><td><p>0.194</p></td><td><p>0.119</p></td><td><p>0.067</p></td><td><p>0.03</p></td><td><p>15083</p></td><td><p>-0.035</p></td><td><p>3198</p></td><td><p>-0.076</p></td></tr><tr><td><p>1/1/2002</p></td><td><p>2384364</p></td><td><p>0.092</p></td><td><p>0.1</p></td><td><p>0.194</p></td><td><p>0.216</p></td><td><p>0.194</p></td><td><p>0.115</p></td><td><p>0.077</p></td><td><p>0.012</p></td><td><p>15064</p></td><td><p>-0.001</p></td><td><p>3103</p></td><td><p>-0.03</p></td></tr><tr><td><p>1/1/2003</p></td><td><p>2726214</p></td><td><p>0.072</p></td><td><p>0.1</p></td><td><p>0.195</p></td><td><p>0.216</p></td><td><p>0.194</p></td><td><p>0.115</p></td><td><p>0.077</p></td><td><p>0.031</p></td><td><p>20729</p></td><td><p>0.376</p></td><td><p>4463</p></td><td><p>0.438</p></td></tr><tr><td><p>1/1/2004</p></td><td><p>3457477</p></td><td><p>0.085</p></td><td><p>0.098</p></td><td><p>0.188</p></td><td><p>0.213</p></td><td><p>0.194</p></td><td><p>0.128</p></td><td><p>0.082</p></td><td><p>0.002</p></td><td><p>27944</p></td><td><p>0.348</p></td><td><p>6170</p></td><td><p>0.382</p></td></tr><tr><td><p>1/1/2005</p></td><td><p>3918610</p></td><td><p>0.086</p></td><td><p>0.096</p></td><td><p>0.188</p></td><td><p>0.213</p></td><td><p>0.195</p></td><td><p>0.13</p></td><td><p>0.087</p></td><td><p>0.005</p></td><td><p>33123</p></td><td><p>0.185</p></td><td><p>7493</p></td><td><p>0.214</p></td></tr></tbody></table>
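<p>For readers who would rather stay in code, a rough pandas equivalent of the cleaning flow above might look like the sketch below. The file names and the handful of column renames are placeholders standing in for the actual datasets linked earlier; the point is just the rename, rescale, and inner-join steps that the no-code pipeline performs.</p>
<pre><code class="lang-python">import pandas as pd

# Placeholder file names for the two source datasets linked above.
ftas = pd.read_csv("india-tourism-age-distribution.csv")
fee = pd.read_csv("india-tourism-foreign-exchange-earnings.csv")

# Rename columns so they portray their meaning more accurately
# (the remaining age columns would be renamed the same way).
ftas = ftas.rename(columns={
    "% distribution by Age- Group (in years) - 0-14": "Age % 0-14",
    "% distribution by Age- Group (in years) - 15-24": "Age % 15-24",
})

# Convert the renamed percentage columns to fractions, mirroring the Prep output.
age_cols = [c for c in ftas.columns if c.startswith("Age %")]
ftas[age_cols] = ftas[age_cols] / 100

# Join the two data sources on Year (an inner join, as in the Prep flow).
final = ftas.merge(fee, on="Year", how="inner")
print(final.head())
</code></pre>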

<h2 id="heading-time-to-visualize-visualize-visualize">Time to Visualize, Visualize, Visualize 📊</h2>
<p>Quoting <a target="_blank" href="https://www.linkedin.com/in/mrdbourke">Daniel Bourke</a>, a personal hero, let's begin visualizing the data we just created. Luckily, Tableau Prep extracts can be opened directly into Tableau Desktop as a <code>.hyper</code>, <code>.csv</code> or a <code>.xlsx</code> file. Here we will also use our second data source, available as download file 2. Getting straight to the point, we see all our data sources and relevant column names on the left-hand pane after we import our data sources.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689184533276/746490a1-0b3e-4d0c-bbe4-fc96d745624b.png" alt class="image--center mx-auto" /></p>
<p>The names in blue are known as discrete values, while the ones in green are known as continuous values. More information can be found in <a target="_blank" href="https://help.tableau.com/current/pro/desktop/en-us/datafields_typesandroles.htm">this</a> article by Tableau, but to explain with a table:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>Blue Fields</strong></td><td><strong>Green Fields</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Data type</strong></td><td>Discrete</td><td>Continuous</td></tr>
<tr>
<td><strong>How data is displayed</strong></td><td>Headers</td><td>Axes</td></tr>
<tr>
<td><strong>Examples</strong></td><td>State, Country, Product Name</td><td>Sales, Profit, Weight</td></tr>
</tbody>
</table>
</div><p>On the right of the pane, we see our workspace, where we can drag and drop our columns to create KPIs, graphs, and dashboards. I won't be going through how to make every KPI or visualization on Tableau, but we'll construct basic graphs based on the available measures and dimensions. Below are some of the more interesting plots.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689188143529/89714e3c-62e1-4de3-87b2-f20f7684eeab.png" alt class="image--center mx-auto" /></p>
<p>Something interesting I found was the steady year-on-year growth from 2001 to 2019; but because of the COVID-19 pandemic, the money spent in 2020 fell back to roughly 2008 levels, effectively a 12-year setback.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689188187001/5342cba2-ce9e-475c-aa22-341181d5756a.png" alt class="image--center mx-auto" /></p>
<p>Even though Agra is 4th in terms of the number of monuments, it is the city where the most foreign income is generated (because of the Taj Mahal and the surrounding monuments).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689188268319/8d67c942-7d74-4529-9bd2-2347e1d57db8.png" alt class="image--center mx-auto" /></p>
<p>It's shocking that even though Mumbai has the highest number of monuments, its gross income from foreign and domestic tourists places it close to the middle of the total rankings.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689188271924/e79d2585-c8aa-4b53-99a9-aef4bcf80e7f.png" alt class="image--center mx-auto" /></p>
<p>It isn't surprising to see how strong a hold the Taj Mahal has compared to other monuments in terms of International and Domestic earnings. It is about 20% of the international income from tourism.</p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>Tableau is a fine piece of software that makes data visualizations easy to make and, with its interactive menus, ensures that little to no code is required to complete the toughest visualizations. From simple bar graphs to parsing GeoData via coordinates or location names, Tableau can speed up the data analyzing task. It even provides ways of importing your data from Google BigQuery or Amazon Redshift. But it does lack the satisfaction of coding, which I severely missed while working on this project. The complete data visualization can be found <a target="_blank" href="https://public.tableau.com/app/profile/aryan.garg8873/viz/Tourism-India_16838194207330/Dashboard">here</a> on Tableau Public.</p>
]]></content:encoded></item><item><title><![CDATA[Mojo Programming Language: The Future of Data Science?]]></title><description><![CDATA[In recent years, the field of data science has exploded in popularity. With the ever-increasing amount of data being generated, there is a growing demand for professionals who can collect, analyze, and interpret this data. However, one of data scient...]]></description><link>https://blog.arygarg.me/mojo</link><guid isPermaLink="true">https://blog.arygarg.me/mojo</guid><category><![CDATA[Data Science]]></category><category><![CDATA[Programming Blogs]]></category><category><![CDATA[datascience]]></category><category><![CDATA[Machine Learning]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Sun, 14 May 2023 02:30:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/ieic5Tq8YMk/upload/0c6105b57edd9a2f7dfc9476d332628c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In recent years, the field of data science has exploded in popularity. With the ever-increasing amount of data being generated, there is a growing demand for professionals who can collect, analyze, and interpret this data. However, one of data scientists' most significant challenges is the lack of a suitable programming language.</p>
<p>Python is the most popular programming language for data science, but it has some significant limitations. For example, pure Python is slow compared with compiled languages, which makes training large machine-learning models directly in it impractical. Additionally, Python is poorly suited to writing the low-level code that is often necessary for working with AI hardware.</p>
<p>This is where Mojo comes in. Mojo is a new programming language that was designed specifically for data science. It combines Python's usability with C's performance, making it the ideal language for developing and deploying AI applications.</p>
<h2 id="heading-features-of-mojo">Features of Mojo</h2>
<ul>
<li><p><strong>Performance:</strong> Mojo is up to <strong>35,000</strong> times faster than Python, making it possible to train large machine-learning models in a fraction of the time.</p>
</li>
<li><p><strong>Flexibility:</strong> Mojo is a general-purpose programming language that can be used for various tasks.</p>
</li>
<li><p><strong>Ease of use:</strong> Mojo has a clean syntax that is easy to learn and use.</p>
</li>
<li><p><strong>Community support:</strong> Mojo has a strong community of developers constantly adding new features and improvements.</p>
</li>
</ul>
<p><img src="https://forums.fast.ai/uploads/default/optimized/3X/3/4/3469e0355fff434928bb1134b7b572d9b3b0033c_2_690x329.jpeg" alt class="image--center mx-auto" /></p>
<h2 id="heading-why-mojo-might-be-a-paradigm-shift-in-data-science">Why Mojo Might be a paradigm shift in data science</h2>
<p>Mojo has the potential to revolutionize the field of data science by providing a powerful and flexible programming language that is well-suited for developing and deploying AI applications. With its speed, efficiency, and ease of use, Mojo can help data scientists to be more productive and to create more powerful AI models.</p>
<h2 id="heading-current-drawbacks">Current Drawbacks</h2>
<ul>
<li><p><strong>Mojo is still under development:</strong> Mojo is a relatively new programming language, and it is still under development. This means there may be some bugs or limitations that have not yet been addressed.</p>
</li>
<li><p><strong>Mojo is not as widely adopted as Python:</strong> Mojo is not as widely adopted as Python, which means that fewer resources may be available for learning and using the language.</p>
</li>
<li><p><strong>Mojo is not as well-suited for some tasks as Python:</strong> Mojo is a general-purpose programming language but not as well-suited for some tasks as Python. For example, Mojo is not as good at writing web applications as Python.</p>
</li>
</ul>
<h2 id="heading-where-you-can-check-out-the-language">Where you can check out the language</h2>
<p>The Mojo programming language is still under development but is available for preview on the Modular website. To learn more about Mojo, visit the Modular website or join the Mojo community on Discord.</p>
]]></content:encoded></item><item><title><![CDATA[Dev Retro 2022: Beginning my Journey into Development]]></title><description><![CDATA[Introduction
What do Google, YouTube, Spotify, Reddit, Apple, and Snapchat have in Common? Except for the fact that they are multi-billion dollar tech giants whose algorithms know us better than our own family. They all gave us personal "stat cards" ...]]></description><link>https://blog.arygarg.me/dev-retro-2022</link><guid isPermaLink="true">https://blog.arygarg.me/dev-retro-2022</guid><category><![CDATA[#DevRetro2022]]></category><category><![CDATA[#DevRetro2022 #hashnode]]></category><category><![CDATA[development]]></category><category><![CDATA[General Programming]]></category><category><![CDATA[reflection]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Sat, 07 Jan 2023 10:23:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/TgGipdWWDuA/upload/bdb2a8ec1dc30a3c38ba9f608d512df9.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>What do Google, YouTube, Spotify, Reddit, Apple, and Snapchat have in Common? Except for the fact that they are multi-billion dollar tech giants whose algorithms know us better than our own family. They all gave us personal "stat cards" about the year, from <a target="_blank" href="https://www.youtube.com/watch?v=4WXs3sKu41I">This Year in Search</a> by Google to <a target="_blank" href="https://newsroom.spotify.com/2022-wrapped/">Wrapped</a> by Spotify. Since this is my first year of blogging, I could only publish 13 articles. Next year, I hope to at least double that, if not triple the number, in the mid-30s! So here is my attempt at HashNode's Iconic New (soon-to-be) Ritual of Dev Retro 2022!</p>
<h1 id="heading-learning-something-new">Learning Something New 💡</h1>
<p>The year started unlike any year whatsoever. After almost a year of using protective gear and sanitizing my hands, COVID finally got to me, and I was stuck in COVID Isolation for the first couple of days of the New Year. That's when I learned the art of taking things slow; instead of rushing into it, I took it slow and processed every step of the journey, from medication to recovery to even post-COVID symptoms.</p>
<p>Moving on to be more Technical, this year was the first year I started actual development, from tiny HTML Pages using the Basics of CSS to Dynamically Loaded Pages using the popular Framework Django for the backend, from creating a Discord bot for fun to creating a Discord Bot for a server with over 12,000 members. Starting with Data Science and understanding the math behind it, this year has allowed me to start growing and exploring. It allowed me to branch out and get a general lay of the land. Developing simple Dart apps using Flutter and understanding how DApps work in the Solidity Framework made me realize what the future of Web3 might look like.</p>
<p>I completed my first year of college, which was a well-welcomed change. Meeting new people allowed me to broaden my horizons and ultimately learn even more than ever before. Maybe not the most significant flex, but I also got featured on Hashnode's Twitter account!</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://twitter.com/hashnode/status/1561362487014002692">https://twitter.com/hashnode/status/1561362487014002692</a></div>
<p> </p>
<h1 id="heading-improving-on-something-old">Improving on Something Old 💹</h1>
<p>"What skills did I bring into 2022?" was the first real question I had while writing this part of the article. Thinking about it, I came into the year with no fundamental skills. Knew a bit of Python from the Good Ol' School days. But The first year of college helped broaden my horizons on what domains I've now been able to discover. From the first Hello World in C to now solving some (easy to medium difficulty) Data Structures and Algorithm problems. Although I haven't started doing Leetcode, understanding the steps necessary to solve a problem is always the first step. I understood the Dev cycle and ideated on projects that could solve the future's problems.</p>
<h1 id="heading-ending-it-on-a-good-note">Ending it on a Good Note 🙌🏻</h1>
<p>Of course, this wouldn't have been possible if I hadn't discovered the concept of Technical Writing initially to apply for Microsoft Student Ambassador Program. Still, I ultimately decided not to apply. Hopefully, another year filled with tech and, this time, improving my skills to make the best out of myself! 2022 was full of learning, while 2023 will be filled with understanding and deploying ;)</p>
]]></content:encoded></item><item><title><![CDATA[Penalties and The World Cup ⚽]]></title><description><![CDATA[Introduction and Inspiration 💡
First of all, Welcome back to Technical Speaking. It's been a while (three months, to be exact), and while I was busy with Uni and didn't get a chance to write, I'm still writing this blog while my End Semester Exams a...]]></description><link>https://blog.arygarg.me/penalties-and-the-world-cup</link><guid isPermaLink="true">https://blog.arygarg.me/penalties-and-the-world-cup</guid><category><![CDATA[Data Science]]></category><category><![CDATA[Python]]></category><category><![CDATA[Python 3]]></category><category><![CDATA[Tutorial]]></category><category><![CDATA[football]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 19 Dec 2022 05:58:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1670614786064/Xd_YPYgF2.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction-and-inspiration">Introduction and Inspiration 💡</h1>
<p>First of all, Welcome back to Technical Speaking. It's been a while (three months, to be exact), and while I was busy with Uni and didn't get a chance to write, I'm still writing this blog while my End Semester Exams are going on. Not the brightest idea, but It's also World Cup season, and rules are meant to be broken this Holiday.</p>
<p>Getting into the real tech here, I found this excellent video by <a target="_blank" href="https://www.youtube.com/watch?v=HAuwPue57Vs">Vox Media</a>, where they used a dataset of all the penalty shootouts in modern World Cup History (1982-Present). The dataset contained relevant data from the 1982 World cup in Spain to the 2018 World Cup in Russia. I thought of taking it a step further by adding the data for the 2022 World Cup in Qatar. Currently, the Quarter Finals are underway, and I'm watching Argentina decimate the Dutch 1-0 (watch me regret these words later).</p>
<h1 id="heading-tech-stack-and-code">Tech Stack and Code 👨🏻‍💻</h1>
<p>I tried using <code>plotly</code> for my graphs, this time since Matplotlib graphs are a bit stale. Partially since a few Kaggle Graphs were already pre-written in plotly, and well *Hippady Hoppady, your code is now my property*.</p>
<p>Enough chatter; let's dive deep into the code ;)</p>
<p>Note: Even though this article may appear to be written in the past (because it was), all the graphs are up to date as of the 18th of December 2022, after the final.</p>
<pre><code class="lang-py"><span class="hljs-keyword">import</span> pandas
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> plotly.express <span class="hljs-keyword">as</span> px
<span class="hljs-keyword">from</span> plotly.offline <span class="hljs-keyword">import</span> init_notebook_mode, iplot
<span class="hljs-keyword">import</span> base64
init_notebook_mode()

df = pd.read_csv(<span class="hljs-string">'WorldCupShootouts.csv'</span>)
print(<span class="hljs-string">"CSV file Loaded"</span>)
</code></pre>
<p>Just your standard importing of libraries, as well as creating the initial DataFrame from the CSV.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1670615505926/pdLv9OuTq.png" alt="image.png" /></p>
<h2 id="heading-bar-graphs">Bar Graphs 🤓</h2>
<p>Displaying the DataFrame, we can see over 330 Spot Kicks spread over 40 years, starting from Germany vs. France 2(5)-2(4) in the 1982 Spain World Cup Semi-Finals.</p>
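<p>If you want to reproduce that count yourself, here's a quick optional check (a small sketch, assuming the same <code>df</code> loaded above):</p>
<pre><code class="lang-py">print(df.shape[0])           # total kicks recorded in the CSV
print(df.dropna().shape[0])  # kicks with complete data
df.head()                    # a peek at the first few rows
</code></pre>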
<pre><code class="lang-py">df_country_count = pd.DataFrame(df.dropna().groupby([<span class="hljs-string">'Team'</span>]).size()).sort_values(by=<span class="hljs-number">0</span>,ascending=<span class="hljs-literal">False</span>).reset_index().rename(columns={<span class="hljs-number">0</span>:<span class="hljs-string">"Total Penalty Kicks"</span>})
px.bar(df_country_count, x=<span class="hljs-string">'Team'</span>, y=<span class="hljs-string">"Total Penalty Kicks"</span>).show()
</code></pre>
<p>Now, by plotting each team against its corresponding number of penalty kicks, we can see that <strong>Argentina</strong> has had the most penalty opportunities.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671389726411/Q3KoJTLDb.png" alt class="image--center mx-auto" /></p>
<p>And by plotting a similar graph, we can see that Argentina has also scored the most penalties.</p>
<pre><code class="lang-py">df_most_goals = pd.DataFrame(df[df.Goal==<span class="hljs-number">1</span>].groupby([<span class="hljs-string">'Team'</span>]).size()).sort_values(by=<span class="hljs-number">0</span>, ascending=<span class="hljs-literal">False</span>).reset_index().rename(
    columns={<span class="hljs-number">0</span>: <span class="hljs-string">"Total Penalties Scored"</span>})
px.bar(df_most_goals, x=<span class="hljs-string">'Team'</span>, y=<span class="hljs-string">"Total Penalties Scored"</span>).show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671390012553/vvAEYt38v.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-goals-zones">Goals Zones 🥅</h2>
<p>Now we're going to look at goal zones and where shots were aimed; for this, you will need to understand how the zoning works. Here is a small infographic to help ;)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1670617736037/34ShZdxGm.png" alt="goal.png" /></p>
<p>Now that you have a general sense of how the graphs will look, let's define the function that will help create these beautiful plots.</p>
<pre><code class="lang-py"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">show_shots</span>(<span class="hljs-params">df: pandas.DataFrame, x, y, size, size_max, hover_name, hover_data, color, title, image_filename=<span class="hljs-string">"goal.jpg"</span></span>):</span>
    init_notebook_mode()
    fig = px.scatter(df,
                 x=x,
                 y=y,  
                 size= size,
                 size_max = size_max,
                 color = color,
                 hover_name = hover_name,
                 hover_data = hover_data,
                 range_x = (<span class="hljs-number">0</span>,<span class="hljs-number">900</span>),
                 range_y = (<span class="hljs-number">581</span>,<span class="hljs-number">0</span>),
                 width = <span class="hljs-number">900</span>,
                 height = <span class="hljs-number">581</span>,
                 labels = {x:<span class="hljs-string">''</span>, y:<span class="hljs-string">''</span>})
    plotly_logo = base64.b64encode(open(image_filename, <span class="hljs-string">'rb'</span>).read())
    fig.update_layout(xaxis_showgrid=<span class="hljs-literal">False</span>, 
                    yaxis_showgrid=<span class="hljs-literal">False</span>,
                    xaxis_showticklabels=<span class="hljs-literal">False</span>,
                    yaxis_showticklabels=<span class="hljs-literal">False</span>,
                    title= title,
                    images= [dict(
                    source=<span class="hljs-string">'data:image/jpg;base64,{}'</span>.format(plotly_logo.decode()),
                    xref=<span class="hljs-string">"paper"</span>, yref=<span class="hljs-string">"paper"</span>,
                    x=<span class="hljs-number">0</span>, y=<span class="hljs-number">1</span>,
                    sizex=<span class="hljs-number">1</span>, sizey=<span class="hljs-number">1</span>,
                    xanchor=<span class="hljs-string">"left"</span>,
                    yanchor=<span class="hljs-string">"top"</span>,
                    sizing = <span class="hljs-string">'stretch'</span>,
                    layer=<span class="hljs-string">"below"</span>)])
    iplot(fig)
</code></pre>
<p>Now, we will try to determine which zone had the most "On Target" shots using a simple Group By Query in our DataFrame.</p>
<pre><code class="lang-py">shot_coords = {
    <span class="hljs-number">1</span>:[<span class="hljs-number">216</span>,<span class="hljs-number">150</span>],
    <span class="hljs-number">2</span>:[<span class="hljs-number">448</span>,<span class="hljs-number">150</span>],
    <span class="hljs-number">3</span>:[<span class="hljs-number">680</span>,<span class="hljs-number">150</span>],
    <span class="hljs-number">4</span>:[<span class="hljs-number">216</span>,<span class="hljs-number">250</span>],
    <span class="hljs-number">5</span>:[<span class="hljs-number">448</span>,<span class="hljs-number">250</span>],
    <span class="hljs-number">6</span>:[<span class="hljs-number">680</span>,<span class="hljs-number">250</span>],
    <span class="hljs-number">7</span>:[<span class="hljs-number">216</span>,<span class="hljs-number">350</span>],
    <span class="hljs-number">8</span>:[<span class="hljs-number">448</span>,<span class="hljs-number">350</span>],
    <span class="hljs-number">9</span>:[<span class="hljs-number">680</span>,<span class="hljs-number">350</span>]
}

df_target = df[df.OnTarget == <span class="hljs-number">1</span>].copy()  <span class="hljs-comment"># copy so the new columns below don't trigger SettingWithCopyWarning</span>

df_target[<span class="hljs-string">'Zone_x'</span>] = df_target[<span class="hljs-string">'Zone'</span>].apply(<span class="hljs-keyword">lambda</span> x: shot_coords[int(x)][<span class="hljs-number">0</span>])
df_target[<span class="hljs-string">'Zone_y'</span>] = df_target[<span class="hljs-string">'Zone'</span>].apply(<span class="hljs-keyword">lambda</span> x: shot_coords[int(x)][<span class="hljs-number">1</span>])

df_zone = pd.DataFrame(df_target.groupby([<span class="hljs-string">'Zone'</span>,<span class="hljs-string">'Zone_x'</span>, <span class="hljs-string">'Zone_y'</span>]).size()).reset_index()
df_zone.rename(columns = {<span class="hljs-number">0</span>:<span class="hljs-string">'Number of Shots'</span>}, inplace= <span class="hljs-literal">True</span>)

show_shots(df_zone, <span class="hljs-string">'Zone_x'</span>, <span class="hljs-string">'Zone_y'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-number">70</span>, <span class="hljs-string">'Zone'</span>, [<span class="hljs-string">'Zone'</span>, <span class="hljs-string">'Number of Shots'</span>], <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-string">'Shot Location (On Target Shots)'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671390325966/7-W1S_o6X.png" alt class="image--center mx-auto" /></p>
<p>On-target shots can't be the whole picture; let's now look at the other extreme: the zones where the ball never even made it on target.</p>
<pre><code class="lang-python">df_Offtarget = df[df.OnTarget == <span class="hljs-number">0</span>]

df_Offtarget[<span class="hljs-string">'Zone_x'</span>] = df_Offtarget[<span class="hljs-string">'Zone'</span>].apply(<span class="hljs-keyword">lambda</span> x: shot_coords[int(x)][<span class="hljs-number">0</span>])
df_Offtarget[<span class="hljs-string">'Zone_y'</span>] = df_Offtarget[<span class="hljs-string">'Zone'</span>].apply(<span class="hljs-keyword">lambda</span> x: shot_coords[int(x)][<span class="hljs-number">1</span>])

df_zone = pd.DataFrame(df_Offtarget.groupby([<span class="hljs-string">'Zone'</span>,<span class="hljs-string">'Zone_x'</span>, <span class="hljs-string">'Zone_y'</span>]).size()).reset_index()
df_zone.rename(columns = {<span class="hljs-number">0</span>:<span class="hljs-string">'Number of Shots'</span>}, inplace= <span class="hljs-literal">True</span>)

show_shots(df_zone, <span class="hljs-string">'Zone_x'</span>, <span class="hljs-string">'Zone_y'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-number">70</span>, <span class="hljs-string">'Zone'</span>, [<span class="hljs-string">'Zone'</span>, <span class="hljs-string">'Number of Shots'</span>], <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-string">'Intended Shot Location (Off Target Shots)'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671390393119/jw9eaoJnL.png" alt class="image--center mx-auto" /></p>
<p>Oddly enough, most missed shots were aimed towards zone 7 (shown as a yellow circle here). We can also see that no shots aimed at zones 5 and 8 were ever off target, probably because those zones carry the least risk of missing the net altogether.</p>
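<p>To double-check that claim without squinting at bubble sizes, here's a one-line sketch counting off-target attempts per intended zone (zones 5 and 8 should simply not appear in the result):</p>
<pre><code class="lang-py">print(df[df.OnTarget == 0].groupby('Zone').size().sort_values(ascending=False))  # off-target shots per intended zone
</code></pre>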
<p>Now, looking at goals in absolute numbers, let's figure out which zone was lucky! Which zone had the most goals, and which had the fewest successful attempts?</p>
<pre><code class="lang-py">df_zone = pd.DataFrame(df_target.groupby([<span class="hljs-string">'Zone'</span>,<span class="hljs-string">'Zone_x'</span>, <span class="hljs-string">'Zone_y'</span>, <span class="hljs-string">'Goal'</span>]).size()).reset_index()
df_zone.rename(columns = {<span class="hljs-number">0</span>:<span class="hljs-string">'Number of Shots'</span>}, inplace= <span class="hljs-literal">True</span>)

show_shots(df_zone, <span class="hljs-string">'Zone_x'</span>, <span class="hljs-string">'Zone_y'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-number">70</span>, <span class="hljs-string">'Zone'</span>, [<span class="hljs-string">'Zone'</span>, <span class="hljs-string">'Number of Shots'</span>], <span class="hljs-string">'Goal'</span>, <span class="hljs-string">'Shot Success by Zone (On Target Shots)'</span>)

<span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(df_zone.shape[<span class="hljs-number">0</span>]):
    zone = df_zone.loc[i, <span class="hljs-string">'Zone'</span>]
    df_goal = df_zone[df_zone.Zone == zone]
    tot = df_goal[<span class="hljs-string">'Number of Shots'</span>].sum()
    goal = df_goal[df_goal.Goal == <span class="hljs-number">1.0</span>][<span class="hljs-string">'Number of Shots'</span>].sum()
    df_zone.loc[i, <span class="hljs-string">'Success Percentage'</span>] = goal/tot

df_zone = df_zone[df_zone.Goal == <span class="hljs-number">1.0</span>]
show_shots(df_zone, <span class="hljs-string">'Zone_x'</span>, <span class="hljs-string">'Zone_y'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-number">70</span>, <span class="hljs-string">'Zone'</span>, [<span class="hljs-string">'Zone'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-string">'Success Percentage'</span>], <span class="hljs-string">'Success Percentage'</span>, <span class="hljs-string">'Shot Success by Zone (On Target Shots)'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671390664272/HYmunr6wR.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671390869453/vkTN6MXB6.png" alt class="image--center mx-auto" /></p>
<p>We notice what seemed obvious: the highest success rates are in the corners, with the upper-right corner having a success rate of 100% (talk about beating the house), while zones 5 (center) and 8 (lower middle) have the lowest. Since zone 7 (bottom left) had the most shots, it also had the most saves and goals. Now you know: if you're ever in a World Cup match, ALWAYS aim for the corners. The top third is your best chance to net one in.</p>
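<p>If you'd rather read the raw numbers than compare bubble colours, the success percentages computed above can be printed directly:</p>
<pre><code class="lang-py"># df_zone is the goals-only frame built above, one row per zone,
# including the 'Success Percentage' column we just calculated.
print(df_zone[['Zone', 'Number of Shots', 'Success Percentage']].sort_values('Success Percentage', ascending=False))
</code></pre>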
<p>Enough talk about the shooter. Let's talk keeper statistics. How do we determine the number of times a keeper has chosen a specific side? The answer might be closer than you might think. Just scroll on down!</p>
<pre><code class="lang-py">keeper_coords = {
    <span class="hljs-string">'L'</span>:[<span class="hljs-number">216</span>,<span class="hljs-number">250</span>],
    <span class="hljs-string">'C'</span>:[<span class="hljs-number">448</span>,<span class="hljs-number">250</span>],
    <span class="hljs-string">'R'</span>:[<span class="hljs-number">680</span>,<span class="hljs-number">250</span>],
}

df.dropna(inplace=<span class="hljs-literal">True</span>)

df.replace(<span class="hljs-string">'l'</span>, <span class="hljs-string">'L'</span>, inplace=<span class="hljs-literal">True</span>)
df[<span class="hljs-string">'Keeper_x'</span>] = df[<span class="hljs-string">'Keeper'</span>].apply(<span class="hljs-keyword">lambda</span> x: keeper_coords[x][<span class="hljs-number">0</span>])
df[<span class="hljs-string">'Keeper_y'</span>] = df[<span class="hljs-string">'Keeper'</span>].apply(<span class="hljs-keyword">lambda</span> x: keeper_coords[x][<span class="hljs-number">1</span>])

df_keeper = pd.DataFrame(df.groupby([<span class="hljs-string">'Keeper'</span>,<span class="hljs-string">'Keeper_x'</span>, <span class="hljs-string">'Keeper_y'</span>]).size()).reset_index()
df_keeper.rename(columns = {<span class="hljs-number">0</span>:<span class="hljs-string">'Number of Shots'</span>}, inplace= <span class="hljs-literal">True</span>)

show_shots(df_keeper, <span class="hljs-string">'Keeper_x'</span>, <span class="hljs-string">'Keeper_y'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-number">70</span>, <span class="hljs-string">'Keeper'</span>, [<span class="hljs-string">'Keeper'</span>, <span class="hljs-string">'Number of Shots'</span>], <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-string">'Keeper Location'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671391126692/rxRRXB9Q5.png" alt class="image--center mx-auto" /></p>
<p>Similarly, we can plot where the keeper landed most of the time and how many of those were goals vs. no goals.</p>
<pre><code class="lang-py">keeper_coords = {
    <span class="hljs-string">'L'</span>:[<span class="hljs-number">216</span>,<span class="hljs-number">250</span>],
    <span class="hljs-string">'C'</span>:[<span class="hljs-number">448</span>,<span class="hljs-number">250</span>],
    <span class="hljs-string">'R'</span>:[<span class="hljs-number">680</span>,<span class="hljs-number">250</span>],
}

df.dropna(inplace=<span class="hljs-literal">True</span>)
df_no_goal = df[df.Goal==<span class="hljs-number">0</span>].copy()  <span class="hljs-comment"># copy to avoid SettingWithCopyWarning when adding columns</span>
df_no_goal.replace(<span class="hljs-string">'l'</span>, <span class="hljs-string">'L'</span>, inplace=<span class="hljs-literal">True</span>)
df_no_goal[<span class="hljs-string">'Keeper_x'</span>] = df_no_goal[<span class="hljs-string">'Keeper'</span>].apply(<span class="hljs-keyword">lambda</span> x: keeper_coords[x][<span class="hljs-number">0</span>])
df_no_goal[<span class="hljs-string">'Keeper_y'</span>] = df_no_goal[<span class="hljs-string">'Keeper'</span>].apply(<span class="hljs-keyword">lambda</span> x: keeper_coords[x][<span class="hljs-number">1</span>])

df_no_goal_keeper = pd.DataFrame(df_no_goal.groupby([<span class="hljs-string">'Keeper'</span>,<span class="hljs-string">'Keeper_x'</span>, <span class="hljs-string">'Keeper_y'</span>]).size()).reset_index()
df_no_goal_keeper.rename(columns = {<span class="hljs-number">0</span>:<span class="hljs-string">'Number of Shots'</span>}, inplace= <span class="hljs-literal">True</span>)
print(df_no_goal_keeper)

show_shots(df_no_goal_keeper, <span class="hljs-string">'Keeper_x'</span>, <span class="hljs-string">'Keeper_y'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-number">70</span>, <span class="hljs-string">'Keeper'</span>, [<span class="hljs-string">'Keeper'</span>, <span class="hljs-string">'Number of Shots'</span>], <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-string">'Keeper Location (No Goal)'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671391129548/Ef3am21dk.png" alt class="image--center mx-auto" /></p>
<pre><code class="lang-python">keeper_coords = {
    <span class="hljs-string">'L'</span>:[<span class="hljs-number">216</span>,<span class="hljs-number">250</span>],
    <span class="hljs-string">'C'</span>:[<span class="hljs-number">448</span>,<span class="hljs-number">250</span>],
    <span class="hljs-string">'R'</span>:[<span class="hljs-number">680</span>,<span class="hljs-number">250</span>],
}

df.dropna(inplace=<span class="hljs-literal">True</span>)

df.replace(<span class="hljs-string">'l'</span>, <span class="hljs-string">'L'</span>, inplace=<span class="hljs-literal">True</span>)
df = df[df.Goal == <span class="hljs-number">1</span>]
df[<span class="hljs-string">'Keeper_x'</span>] = df[<span class="hljs-string">'Keeper'</span>].apply(<span class="hljs-keyword">lambda</span> x: keeper_coords[x][<span class="hljs-number">0</span>])
df[<span class="hljs-string">'Keeper_y'</span>] = df[<span class="hljs-string">'Keeper'</span>].apply(<span class="hljs-keyword">lambda</span> x: keeper_coords[x][<span class="hljs-number">1</span>])

df_keeper = pd.DataFrame(df.groupby([<span class="hljs-string">'Keeper'</span>,<span class="hljs-string">'Keeper_x'</span>, <span class="hljs-string">'Keeper_y'</span>]).size()).reset_index()
df_keeper.rename(columns = {<span class="hljs-number">0</span>:<span class="hljs-string">'Number of Shots'</span>}, inplace= <span class="hljs-literal">True</span>)

show_shots(df_keeper, <span class="hljs-string">'Keeper_x'</span>, <span class="hljs-string">'Keeper_y'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-number">70</span>, <span class="hljs-string">'Keeper'</span>, [<span class="hljs-string">'Keeper'</span>, <span class="hljs-string">'Number of Shots'</span>], <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-string">'Keeper Location (Goal)'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671391712316/eULh3dkNs.png" alt class="image--center mx-auto" /></p>
<p>We can see that the keeper stays in the middle the least often (take the hint), while he saves the most shots on the left side. Top right is looking enticing now.</p>
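<p>As a sanity check on those two plots, here's a small sketch that computes the keeper's save rate per dive direction from a fresh copy of the data (re-reading the CSV so the in-place filtering above doesn't interfere, and treating any on-target shot that wasn't a goal as a save):</p>
<pre><code class="lang-py">raw = pd.read_csv('WorldCupShootouts.csv').dropna()
raw['Keeper'] = raw['Keeper'].replace('l', 'L')  # fix the stray lowercase 'l'
# Share of on-target shots that did NOT end in a goal, per side the keeper chose.
save_rate = raw[raw.OnTarget == 1].groupby('Keeper')['Goal'].apply(lambda g: 1 - g.mean())
print(save_rate.sort_values(ascending=False))
</code></pre>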
<h2 id="heading-foot-preference">Foot Preference 👟</h2>
<pre><code class="lang-py">foot_coords = {
    <span class="hljs-string">'L'</span>:[<span class="hljs-number">270</span>,<span class="hljs-number">520</span>],
    <span class="hljs-string">'R'</span>:[<span class="hljs-number">600</span>,<span class="hljs-number">520</span>],
}

df.dropna(inplace=<span class="hljs-literal">True</span>)

df.replace(<span class="hljs-string">'l'</span>, <span class="hljs-string">'L'</span>, inplace=<span class="hljs-literal">True</span>)
df[<span class="hljs-string">'Foot_x'</span>] = df[<span class="hljs-string">'Foot'</span>].apply(<span class="hljs-keyword">lambda</span> x: foot_coords[x][<span class="hljs-number">0</span>])
df[<span class="hljs-string">'Foot_y'</span>] = df[<span class="hljs-string">'Foot'</span>].apply(<span class="hljs-keyword">lambda</span> x: foot_coords[x][<span class="hljs-number">1</span>])

df_feet = pd.DataFrame(df.groupby([<span class="hljs-string">'Foot'</span>,<span class="hljs-string">'Foot_x'</span>, <span class="hljs-string">'Foot_y'</span>]).size()).reset_index()
df_feet.rename(columns = {<span class="hljs-number">0</span>:<span class="hljs-string">'Number of Shots'</span>}, inplace= <span class="hljs-literal">True</span>)

show_shots(df_feet, <span class="hljs-string">'Foot_x'</span>, <span class="hljs-string">'Foot_y'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-number">70</span>, <span class="hljs-string">'Foot'</span>, [<span class="hljs-string">'Foot'</span>, <span class="hljs-string">'Number of Shots'</span>], <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-string">'Left or Right Footed'</span>, <span class="hljs-string">'stance.jpg'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671392108976/dcAQ8fAGl.png" alt class="image--center mx-auto" /></p>
<p>Most players shoot with their right foot, which makes it far more likely for the ball to end up on the left side of the goal (zones 1, 4, and 7). This trend can be seen in our data and the corresponding plots. Left-footed players scoring in the top right will make or break this World Cup!</p>
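<p>To see that trend in numbers rather than bubbles, here's a quick cross-tab sketch (again from a fresh copy of the data), treating zones 1, 4, and 7 as the left side of the goal:</p>
<pre><code class="lang-py">raw = pd.read_csv('WorldCupShootouts.csv').dropna()
raw['LeftSide'] = raw['Zone'].astype(int).isin([1, 4, 7])            # zones 1, 4, 7 = left side of the goal
print(pd.crosstab(raw['Foot'], raw['LeftSide'], normalize='index'))  # share of each foot's shots that went left
</code></pre>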
<h1 id="heading-conclusion-post-world-cup-final-results">Conclusion [Post World Cup Final Results]</h1>
<p>Vamos Argentina! Messi and his team deserved the win after that heart-pounding 120-minute match, which ultimately ended in penalties. Don't worry, the data was updated and the graphs were redrawn. For all my data, as well as the Jupyter Notebook, you can check out the <a target="_blank" href="https://github.com/Aryan-401/WorldCupPenalties">GitHub link</a> to the repository. Cheers, and see you in LA, 2026!</p>
]]></content:encoded></item><item><title><![CDATA[Python in the Browser? What in the world is PyScript]]></title><description><![CDATA[I don't like Web Development, especially Front-end Development with HTML and CSS, so when I heard about Pyscript, my mind started wandering into a new false reality. A reality where we didn't need javascript (I never got the hang of Javascript). Toda...]]></description><link>https://blog.arygarg.me/python-in-the-browser-what-in-the-world-is-pyscript</link><guid isPermaLink="true">https://blog.arygarg.me/python-in-the-browser-what-in-the-world-is-pyscript</guid><category><![CDATA[Web Development]]></category><category><![CDATA[Python]]></category><category><![CDATA[HTML5]]></category><category><![CDATA[Frontend Development]]></category><category><![CDATA[backend]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 12 Sep 2022 11:30:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/ItGgnEXi48c/upload/1fb638f3fe5160ed2bb90aea47a541f2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I don't like Web Development, especially Front-end Development with HTML and CSS, so when I heard about Pyscript, my mind started wandering into a new false reality. A reality where we didn't need javascript (I never got the hang of Javascript). Today, we aim to understand the basics of PyScript and try to understand why it was even developed.</p>
<p>JavaScript was created in 1995 at Netscape, whose Navigator browser was then competing with the likes of NCSA Mosaic. After a very successful launch by the then browser giant, the language gained the trust of the developer community and, the next year, was handed off to an international standards organization called ECMA (the European Computer Manufacturers Association), which is responsible for the development and upkeep of the language to this day.</p>
<p>Brendan Eich created JavaScript to fill the need for a “glue language” that informal programmers and designers could use to wire components together and automate interactions. At this point in our JavaScript history, there were two dominant web browsers: Netscape Navigator (with JavaScript) and Internet Explorer (with JScript). By the time the browser world shifted and Internet Explorer became the dominant browser, JavaScript had evolved into the endorsed standard for writing interactive code that runs in a web browser, to the point where it is practically required for developing web apps today.</p>
<p>Now, almost 30 years into the active development of JavaScript, a challenger approaches. Created by Anaconda, PyScript aims to <a target="_blank" href="https://pyscript.net">bring programming to the 99%</a> by allowing users to create rich Python applications in the browser using HTML's interface and the power of Pyodide, WebAssembly (WASM), and modern web technologies. PyScript is still in heavy development, so it isn't advisable to use it in a production environment. All warnings aside, let's dive into Python in the browser.</p>
<h2 id="heading-installing-the-pyscript-framework">Installing the Pyscript Framework</h2>
<ol>
<li><p><a target="_blank" href="https://github.com/pyscript/pyscript/archive/refs/heads/main.zip">Click here</a> to download the <code>zip</code> file.</p>
</li>
<li><p>Copy and paste the following into your <code>&lt;head&gt;</code> tag</p>
</li>
</ol>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">link</span> <span class="hljs-attr">rel</span>=<span class="hljs-string">"stylesheet"</span> <span class="hljs-attr">href</span>=<span class="hljs-string">"path/to/pyscript.css"</span> /&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">defer</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"path/to/pyscript.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
</code></pre>
<p>OR</p>
<ol>
<li>Copy and paste the commands into your <code>&lt;head&gt;</code> tag</li>
</ol>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">link</span> <span class="hljs-attr">rel</span>=<span class="hljs-string">"stylesheet"</span> <span class="hljs-attr">href</span>=<span class="hljs-string">"https://pyscript.net/alpha/pyscript.css"</span> /&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">defer</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"https://pyscript.net/alpha/pyscript.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
</code></pre>
<p>Yup, it's as easy as that. You can now write Python inside your HTML file!</p>
<h2 id="heading-writing-your-first-program-in-html">Writing your first "Program" in HTML</h2>
<p>Yes, I just proved <a target="_blank" href="https://ischool.syr.edu/why-html-is-not-a-programming-language">you</a> wrong. We're gonna be programming in HTML. Let's write the iconic "Hello World" program into our IDE of choice.</p>
<p>Every IDE has a different shortcut to create the boilerplate code, so you don't end up typing it out every time. For JetBrains IDEs, we can use <code>Ctrl+J</code> to open the Template Menu and click on one of the many HTML formats available.</p>
<p>In the <code>&lt;body&gt;</code> tag, insert this little snippet of text. If you know anything about python, you'll be able to decode what this one-liner will do.</p>
<pre><code class="lang-python">&lt;py-script&gt; print(<span class="hljs-string">'Hello, World!'</span>) &lt;/py-script&gt;
</code></pre>
<p>If it wasn't clear to you, we're "printing" hello world onto the screen. Open this file using a modern browser, and after a couple of seconds of loading, you'll finally have written a program in HTML.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1662214204048/8wauOQUWW.png" alt="image.png" class="image--center mx-auto" /></p>
<h2 id="heading-using-packages-in-pyscript">Using Packages in PyScript</h2>
<p>To use non-standard packages in PyScript, we have to declare them within <code>&lt;py-env&gt;</code> tags, separating them with new lines. The simple format for declaring numpy and pandas in your HTML code would be:</p>
<pre><code class="lang-html">      <span class="hljs-tag">&lt;<span class="hljs-name">py-env</span>&gt;</span>
        - numpy
        - pandas
      <span class="hljs-tag">&lt;/<span class="hljs-name">py-env</span>&gt;</span>
</code></pre>
<p>Remember to declare this in the head tag, just below the <code>&lt;script&gt;</code> tag we used.</p>
<p>Pyscript has a lot to offer, and if you look at the GitHub repository for the project, you'll see a bunch of cool examples. One of my favorites was Mario natively in the browser using PyScript. It has the entire first level (1-1) complete with sound, the iconic Mario soundtrack, and even cool fireworks at the end.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1662293579612/TkEY_WhHA.png" alt="image.png" class="image--center mx-auto" /></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>PyScript is a cool web utility that puts the core strengths of Python into the hands of web developers without needing a dedicated framework like Flask or Django. In my experience, though, it was a tad slow, maybe because it is in the alpha stage, or maybe because it isn't optimized to run on Chromium-based browsers (tested on Brave). The latter seems unlikely, because Chromium-based browsers are the most common ones nowadays.</p>
]]></content:encoded></item><item><title><![CDATA[Visualizing Data — Women's Fashion Catalog]]></title><description><![CDATA[An Introduction to the dataset
The data set can be found here on Kaggle. It consists of 7 columns and 30758 rows. The data type of all columns are strings and contains no NULL values. The dataset does contain strings labeled as Nan, which are placeho...]]></description><link>https://blog.arygarg.me/visualizing-data-womens-fashion-catalog</link><guid isPermaLink="true">https://blog.arygarg.me/visualizing-data-womens-fashion-catalog</guid><category><![CDATA[Data Science]]></category><category><![CDATA[Python]]></category><category><![CDATA[data]]></category><category><![CDATA[Google]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 05 Sep 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/TS--uNw-JqE/upload/17ad26d58b3169ec1481e09841143f6c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-an-introduction-to-the-dataset">An Introduction to the dataset</h2>
<p>The dataset can be found <a target="_blank" href="https://www.kaggle.com/code/mohamedaminesoltani/eda-e-commerce-women-fashion/data">here</a> on Kaggle. It consists of 7 columns and 30758 rows. Every column is loaded as a string, and there are no true NULL values; instead, the dataset contains strings labeled <code>Nan</code>, which act as placeholders for <code>np.nan</code>.</p>
<p>Make sure you download the <code>.csv</code> file and move it to the working directory of your Jupyter Notebook.</p>
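<p>Before diving in, here's a quick optional sketch (not part of the original notebook) to confirm the shape, the string dtypes, and the <code>Nan</code> placeholder strings described above:</p>
<pre><code class="lang-py">import pandas as pd

raw = pd.read_csv('FashionDataset.csv')
print(raw.shape)              # (30758, 7) rows and columns
print(raw.dtypes)             # every column loads as object (i.e. strings)
print((raw == 'Nan').sum())   # count of literal "Nan" placeholder strings per column
</code></pre>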
<h1 id="heading-filtering-and-pre-processing-the-data">Filtering and Pre-processing the data</h1>
<p>Preprocessing data is always the first step in building any usable dataset. It helps eliminate all the unnecessary data that we won't be using. By preprocessing the data, we are essentially helping the program only look at relevant data.</p>
<h3 id="heading-importing-the-required-libraries">Importing the Required Libraries</h3>
<pre><code class="lang-py"><span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> seaborn <span class="hljs-keyword">as</span> sns
<span class="hljs-keyword">from</span> wordcloud <span class="hljs-keyword">import</span> WordCloud
</code></pre>
<h3 id="heading-loading-data-as-a-pandas-dataframe">Loading data as a Pandas DataFrame</h3>
<pre><code class="lang-py">dataset = pd.read_csv(<span class="hljs-string">'FashionDataset.csv'</span>)  <span class="hljs-comment"># Importing CSV file</span>
print(dataset.head())
print(<span class="hljs-string">f"Size: <span class="hljs-subst">{dataset.shape[<span class="hljs-number">0</span>]}</span>"</span>)  <span class="hljs-comment"># Number of Rows</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660380397264/oWHQwXAu4.png" alt="image.png" /></p>
<h3 id="heading-dropping-unnecessary-columns">Dropping Unnecessary Columns</h3>
<pre><code class="lang-py">data_set_trim_1 = dataset.drop([<span class="hljs-string">'Deatils'</span>, <span class="hljs-string">"Sizes"</span>, <span class="hljs-string">'Unnamed: 0'</span>], axis=<span class="hljs-number">1</span>)  <span class="hljs-comment"># Since we only care about numeric data, for now, we can remove all the data we don't need (also, yes details is spelled like that in the dataset)</span>
data_set_trim_1.head()
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660282673318/rlB-hu3fJ.png" alt="image.png" /></p>
<h3 id="heading-pre-processing-the-data">Pre-Processing the data</h3>
<pre><code class="lang-py">data_set_trim_1[<span class="hljs-string">"MRP"</span>] = data_set_trim_1[<span class="hljs-string">"MRP"</span>].str.replace(<span class="hljs-string">"Rs\n"</span>, <span class="hljs-string">""</span>)  <span class="hljs-comment"># 'Rs\n420' -&gt; '420'</span>
data_set_trim_1[<span class="hljs-string">"Category"</span>] = data_set_trim_1[<span class="hljs-string">"Category"</span>].str.replace(<span class="hljs-string">"-Women"</span>, <span class="hljs-string">""</span>)  <span class="hljs-comment"># 'Watch-Women' -&gt; 'Watch'</span>
data_set_trim_1[<span class="hljs-string">"Discount"</span>] = data_set_trim_1[<span class="hljs-string">"Discount"</span>].str.replace(<span class="hljs-string">"% off"</span>, <span class="hljs-string">""</span>)  <span class="hljs-comment"># '50% off' -&gt; '50'</span>
data_set_trim_1.head()
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660282864929/XFvy_qkvz.png" alt="image.png" /></p>
<p>At this point, I thought the data would have been converted to integers, but I was wrong: all the data was still strings. To verify this, I ran the following one-liner.</p>
<pre><code class="lang-py">type(data_set_trim_1.iloc[<span class="hljs-number">22</span>][<span class="hljs-string">"SellPrice"</span>])  <span class="hljs-comment">#Noticing that SellPrice as well as MRP and Discount are Strings and not text</span>
</code></pre>
<p>Output:</p>
<pre><code class="lang-python">str
</code></pre>
<p>Now we'll convert those string columns into integers using the <code>.apply()</code> method (with <code>pd.to_numeric</code>) and replace the string <code>Nan</code> values with actual <code>np.nan</code> values.</p>
<pre><code class="lang-py">data_set_trim_2 = data_set_trim_1.replace(<span class="hljs-string">"Nan"</span>,np.nan)  <span class="hljs-comment"># Replacing all "Nan" strings with np.NAN</span>
data_set_trim_2.dropna(inplace=<span class="hljs-literal">True</span>, subset=[<span class="hljs-string">'MRP'</span>, <span class="hljs-string">"BrandName"</span>, <span class="hljs-string">"SellPrice"</span>])
print(data_set_trim_2.dtypes)  <span class="hljs-comment"># All Columns are String DataType</span>
data_set_trim_2[[<span class="hljs-string">'MRP'</span>, <span class="hljs-string">'SellPrice'</span>, <span class="hljs-string">"Discount"</span>]] = data_set_trim_2[[<span class="hljs-string">'MRP'</span>, <span class="hljs-string">'SellPrice'</span>, <span class="hljs-string">"Discount"</span>]].apply(pd.to_numeric)  <span class="hljs-comment"># Changing Required Columns to integer</span>
print(data_set_trim_2.dtypes)
</code></pre>
<p>Output</p>
<pre><code class="lang-python">BrandName    object
MRP          object
SellPrice    object
Discount     object
Category     object
dtype: object
BrandName    object
MRP           int64
SellPrice     int64
Discount      int64
Category     object
dtype: object
</code></pre>
<h3 id="heading-finalising-the-data">Finalising the data</h3>
<pre><code class="lang-py">f_data = data_set_trim_2  <span class="hljs-comment"># Setting Final Data</span>
print(<span class="hljs-string">f"Size: <span class="hljs-subst">{f_data.shape[<span class="hljs-number">0</span>]}</span>"</span>)
</code></pre>
<p>We're left with <code>22550</code> rows' worth of complete data. Now we can start visualizing it ;)</p>
<h2 id="heading-accessing-basic-information">Accessing Basic Information</h2>
<p>Let's start with something Simple. How many unique brands do we have in our dataset?</p>
<pre><code class="lang-py">f_data.nunique()[<span class="hljs-string">"BrandName"</span>]  <span class="hljs-comment"># Number of Brands in Dataset</span>
</code></pre>
<p>Output:</p>
<pre><code class="lang-python"><span class="hljs-number">177</span>
</code></pre>
<p>That's a lot of brands! Let's get a deeper look at the data using the <code>.describe()</code> command</p>
<pre><code class="lang-py">f_data.describe().T
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660284465754/8pJbpuu_d.png" alt="image.png" /></p>
<p>What about the most expensive Items?</p>
<pre><code class="lang-py">f_data[f_data.SellPrice == f_data.SellPrice.max()]  <span class="hljs-comment"># Most Expensive Item(s)</span>
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660284132392/IB33YWtrH.png" alt="image.png" /></p>
<p>(Yikes! Those are some expensive watches)</p>
<p>What about the most expensive brands? Let's figure out the average price of any article from a brand</p>
<pre><code class="lang-py">f_data.groupby(<span class="hljs-string">'BrandName'</span>)[<span class="hljs-string">'SellPrice'</span>].mean().sort_values(ascending=<span class="hljs-literal">False</span>).head()  <span class="hljs-comment"># Mean Price of every Brand</span>
</code></pre>
<p>Output:</p>
<pre><code class="lang-python">BrandName
just cavalli      <span class="hljs-number">18309.600000</span>
coach             <span class="hljs-number">12616.769231</span>
versus            <span class="hljs-number">11555.600000</span>
ted baker         <span class="hljs-number">10031.111111</span>
emporio armani     <span class="hljs-number">9423.509804</span>
Name: SellPrice, dtype: float64
</code></pre>
<p>You can similarly find the cheapest brands by changing <code>ascending=False</code> to <code>True</code>.</p>
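<p>For instance, a quick sketch of that flipped version, showing the most affordable brands on average:</p>
<pre><code class="lang-py">f_data.groupby('BrandName')['SellPrice'].mean().sort_values(ascending=True).head()  # cheapest brands by mean price
</code></pre>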
<h2 id="heading-real-visualizations-using-matplotlib-and-seaborn">Real Visualizations using Matplotlib and Seaborn</h2>
<p>Let's start by plotting a heatmap of our data.</p>
<pre><code class="lang-py">sns.heatmap(f_data.corr(),annot=<span class="hljs-literal">True</span>,cmap=<span class="hljs-string">'coolwarm'</span>,linewidths=<span class="hljs-number">0.2</span>)
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660284776400/cDLJfnPVD.png" alt="image.png" /></p>
<p>What about a plot that shows us the number of items in each category?</p>
<pre><code class="lang-py">plt.figure(figsize=(<span class="hljs-number">20</span>,<span class="hljs-number">7</span>))  <span class="hljs-comment">#setting the plot size</span>
sns.countplot(f_data[<span class="hljs-string">"Category"</span>])
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660284847338/m6IKw-jwm.png" alt="image.png" /></p>
<p>Here is a scatter plot of all the prices per category</p>
<pre><code class="lang-py">plt.figure(figsize=(<span class="hljs-number">20</span>,<span class="hljs-number">7</span>))
sns.scatterplot(f_data[<span class="hljs-string">'MRP'</span>],f_data[<span class="hljs-string">'SellPrice'</span>],hue=f_data[<span class="hljs-string">'Category'</span>])
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660285468095/7anRo55AH.png" alt="image.png" /></p>
<p>Let's figure out which category is the most discounted</p>
<pre><code class="lang-py"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">modify</span>(<span class="hljs-params">d</span>):</span>
    <span class="hljs-keyword">if</span> int(d) <span class="hljs-keyword">in</span> range(<span class="hljs-number">0</span>,<span class="hljs-number">41</span>):
        <span class="hljs-keyword">return</span> <span class="hljs-string">'0-40%'</span>
    <span class="hljs-keyword">elif</span> int(d) <span class="hljs-keyword">in</span> range(<span class="hljs-number">41</span>,<span class="hljs-number">71</span>):
        <span class="hljs-keyword">return</span> <span class="hljs-string">'40-70%'</span>
    <span class="hljs-keyword">elif</span> int(d) <span class="hljs-keyword">in</span> range(<span class="hljs-number">71</span>,<span class="hljs-number">101</span>):
        <span class="hljs-keyword">return</span> <span class="hljs-string">'&lt;70%'</span>
f_data[<span class="hljs-string">'D_range'</span>] = f_data[<span class="hljs-string">'Discount'</span>].apply(modify)  <span class="hljs-comment"># adding a new column</span>
f_data.head()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660285572692/Y1UZ5JhBX.png" alt="image.png" /></p>
<pre><code class="lang-py">plt.figure(figsize=(<span class="hljs-number">20</span>,<span class="hljs-number">5</span>))
sns.countplot(f_data[<span class="hljs-string">'Category'</span>],hue=f_data[<span class="hljs-string">'D_range'</span>])
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660285630734/YcSj7qNRA.png" alt="image.png" /></p>
<h1 id="heading-fun-with-wordclouds">Fun with WordClouds</h1>
<p>We've had some interesting plots, but now let's have some fun with ✨word clouds✨.</p>
<p>Since the WordCloud module takes a single string as input, we are going to join the various values together.</p>
<pre><code class="lang-py">textCategory = <span class="hljs-string">" "</span>.join(i <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> f_data[<span class="hljs-string">'Category'</span>])
textCompany = <span class="hljs-string">" "</span>.join(i <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> f_data[<span class="hljs-string">'BrandName'</span>])
</code></pre>
<pre><code class="lang-py">word_cloud_company = WordCloud(collocations=<span class="hljs-literal">False</span>, background_color=<span class="hljs-string">'black'</span>, width=<span class="hljs-number">1920</span>, height=<span class="hljs-number">1080</span>).generate(textCompany)
word_cloud_company.to_file(<span class="hljs-string">'company.png'</span>)

word_cloud_category = WordCloud(collocations=<span class="hljs-literal">False</span>, background_color=<span class="hljs-string">'black'</span>, width=<span class="hljs-number">1920</span>, height=<span class="hljs-number">1080</span>).generate(textCategory)
word_cloud_category.to_file(<span class="hljs-string">'category.png'</span>)
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660380939668/MkN9I9ojR.png" alt="company.png" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660380945084/lE5KBHDHE.png" alt="category.png" /></p>
<p>How about using the Sizes column we dropped while preprocessing? Let's make word clouds out of the Sizes and Details columns!</p>
<p>First, processing the data:</p>
<pre><code class="lang-py">only_size = dataset[<span class="hljs-string">"Sizes"</span>]
only_size = only_size.replace(<span class="hljs-string">"Nan"</span>,np.nan)  <span class="hljs-comment"># Replacing all "Nan" strings with np.NAN</span>
only_size.dropna(inplace=<span class="hljs-literal">True</span>)
only_size = only_size.str.replace(<span class="hljs-string">"Size:"</span>, <span class="hljs-string">""</span>)
only_size = only_size.str.replace(<span class="hljs-string">","</span>, <span class="hljs-string">" "</span>)
only_size.head()
</code></pre>
<pre><code class="lang-py">textSize = <span class="hljs-string">" "</span>.join(i <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> only_size)
word_cloud_size = WordCloud(collocations=<span class="hljs-literal">False</span>, background_color=<span class="hljs-string">'black'</span>, width=<span class="hljs-number">1920</span>, height=<span class="hljs-number">1080</span>).generate(textSize)
word_cloud_size.to_file(<span class="hljs-string">'size.png'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660381595693/C1kfWTJKn.png" alt="size.png" /></p>
<p>Similarly, we can do the same thing with the Details column.</p>
<pre><code class="lang-py">only_details = dataset[<span class="hljs-string">"Deatils"</span>]
only_details = only_details.replace(<span class="hljs-string">"Nan"</span>,np.nan)  <span class="hljs-comment"># Replacing all "Nan" strings with np.NAN</span>
only_details.dropna(inplace=<span class="hljs-literal">True</span>)
only_details = only_details.str.replace(<span class="hljs-string">"Size:"</span>, <span class="hljs-string">""</span>)
only_details = only_details.str.replace(<span class="hljs-string">","</span>, <span class="hljs-string">" "</span>)
only_details.head()
</code></pre>
<pre><code class="lang-py">textDetail = <span class="hljs-string">" "</span>.join(i <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> only_details)
word_cloud_detail = WordCloud(collocations=<span class="hljs-literal">False</span>, background_color=<span class="hljs-string">'black'</span>, width=<span class="hljs-number">1920</span>, height=<span class="hljs-number">1080</span>).generate(textDetail)
word_cloud_detail.to_file(<span class="hljs-string">'detail.png'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660381647839/9sTkhiukM.png" alt="detail.png" /></p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>Well, that's all, folks! I don't really have a conclusion, but DATA IS SO COOL!</p>
<p>You can find the Google Colab Link <a target="_blank" href="https://colab.research.google.com/drive/1GOPw9ZLgMxADKj2D3nDdnxzEf115D6jD?usp=sharing">here</a></p>
]]></content:encoded></item><item><title><![CDATA[Getting Started with Machine Learning]]></title><description><![CDATA[Machine Learning! Artificial Intelligence! Computational Analysis! All these buzz words surround the world of tech. These buzz words have created a hardened cast over the computer science community. As part of the hype, I learnt the basics of Machine...]]></description><link>https://blog.arygarg.me/getting-started-with-machine-learning</link><guid isPermaLink="true">https://blog.arygarg.me/getting-started-with-machine-learning</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Python]]></category><category><![CDATA[pandas]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[learning]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 29 Aug 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1658259844225/dwDbS4Orx.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Machine Learning</strong>! <strong>Artificial Intelligence</strong>! <strong>Computational Analysis</strong>! All these buzz words surround the world of tech. These buzz words have created a hardened cast over the computer science community. As part of the hype, I learnt the basics of Machine Learning using <code>pandas</code> and <code>sklearn</code>. We would also use <a target="_blank" href="https://www.kaggle.com/datasets/dansbecker/melbourne-housing-snapshot/download?datasetVersionNumber=5">this</a> dataset from Kaggle to interpret and understand our data. Enough small-talk. Let's get coding!</p>
<h1 id="heading-step-1-create-a-virtual-environment-in-python">Step 1: Create a Virtual Environment in Python</h1>
<p>Data scientists usually use Jupyter Notebooks for compiling and interpreting data, so make sure you have Jupyter installed on your local machine.</p>
<p>Or, in case you want to work in a standard Python Environment:</p>
<p>Follow the steps in <a target="_blank" href="https://aryan401.hashnode.dev/virtual-environments-youre-gonna-need-them">this</a> article, and then continue with this tutorial.</p>
<h1 id="heading-step-2-download-the-required-dependencies">Step 2: Download the Required Dependencies</h1>
<p>For this initial tutorial, we will be starting with Pandas. Pandas is the primary tool data scientists use for exploring and manipulating data. Most people abbreviate pandas in their code as <code>pd</code>. We do this with the following command:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd  <span class="hljs-comment">#pip install pandas</span>
</code></pre>
<p>The most important part of the Pandas library is the DataFrame. A DataFrame holds the type of data you might think of as a table. This is similar to a sheet in Excel.</p>
<p>Pandas has powerful methods for most things you'll want to do with this type of data and could be a mini-series on its own.</p>
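<p>If you've never seen one, here is a tiny illustrative sketch of a DataFrame built by hand (the values below are made up, not from the Melbourne dataset):</p>
<pre><code class="lang-python">import pandas as pd

# A DataFrame is a table: named columns, one row per record.
toy = pd.DataFrame({
    'Suburb': ['Abbotsford', 'Airport West'],
    'Rooms': [2, 3],
    'Price': [1000000.0, 850000.0],
})
print(toy)
print(toy.describe())  # the same kind of summary we'll run on the real dataset below
</code></pre>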
<p>We would also be working with sklearn further, so it wouldn't hurt to install it now.</p>
<pre><code class="lang-bash">pip install scikit-learn
</code></pre>
<h1 id="heading-the-basics-of-pandas">The Basics of Pandas</h1>
<p>Pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool built on top of the Python programming language. It is used by millions of data scientists and is entirely open source!</p>
<h2 id="heading-reading-a-csv-file">Reading a CSV file</h2>
<p>Unzipping <a target="_blank" href="https://www.kaggle.com/datasets/dansbecker/melbourne-housing-snapshot/download?datasetVersionNumber=5">this</a> file, we should get a file named <code>melb_data.csv</code>. Make sure it is in the same folder as your working file, then attempt to read it with the following lines of code.</p>
<pre><code class="lang-python">file_path = <span class="hljs-string">'melb_data.csv'</span>
melbourne_data = pd.read_csv(file_path)
melbourne_data.describe() <span class="hljs-comment">#should give information about independent columns in the dataset</span>
</code></pre>
<h2 id="heading-some-basic-pandas-function">Some basic Panda's Function</h2>
<p>To get all columns, we use <code>columns</code></p>
<pre><code class="lang-python">print(melbourne_data.columns)
</code></pre>
<p>Output:</p>
<pre><code class="lang-python">Index([<span class="hljs-string">'Suburb'</span>, <span class="hljs-string">'Address'</span>, <span class="hljs-string">'Rooms'</span>, <span class="hljs-string">'Type'</span>, <span class="hljs-string">'Price'</span>, <span class="hljs-string">'Method'</span>, <span class="hljs-string">'SellerG'</span>,
       <span class="hljs-string">'Date'</span>, <span class="hljs-string">'Distance'</span>, <span class="hljs-string">'Postcode'</span>, <span class="hljs-string">'Bedroom2'</span>, <span class="hljs-string">'Bathroom'</span>, <span class="hljs-string">'Car'</span>,
       <span class="hljs-string">'Landsize'</span>, <span class="hljs-string">'BuildingArea'</span>, <span class="hljs-string">'YearBuilt'</span>, <span class="hljs-string">'CouncilArea'</span>, <span class="hljs-string">'Lattitude'</span>,
       <span class="hljs-string">'Longtitude'</span>, <span class="hljs-string">'Regionname'</span>, <span class="hljs-string">'Propertycount'</span>],
      dtype=<span class="hljs-string">'object'</span>)
</code></pre>
<p>The current dataset has missing values (some houses for which some variables weren't recorded). We will learn to handle missing values properly in a later tutorial, so for now we will take the simplest option and drop those houses from our data. Don't worry about this much for now; the code is:</p>
<pre><code class="lang-python">melbourne_data = melbourne_data.dropna(axis=<span class="hljs-number">0</span>)
</code></pre>
<p>Most datasets have thousands of rows and tens of columns, so it is not practical to print the entire DataFrame just to get a glimpse of the data it contains. We use the <code>head(x)</code> and <code>tail(x)</code> functions to get the first (or last) <code>x</code> records from the DataFrame. By default, they return five rows each.</p>
<pre><code class="lang-python">melbourne_data.head()  <span class="hljs-comment"># First 5 Rows</span>
melbourne_data.tail()  <span class="hljs-comment"># Last 5 Rows</span>
</code></pre>
<h2 id="heading-choosing-features">Choosing "Features"</h2>
<p>The columns fed into our model are called "features". In our case, those are the columns used to determine the home price.</p>
<pre><code class="lang-python">melbourne_features = [<span class="hljs-string">'Rooms'</span>, <span class="hljs-string">'Bathroom'</span>, <span class="hljs-string">'Landsize'</span>, <span class="hljs-string">'Car'</span>, <span class="hljs-string">'Postcode'</span>]
</code></pre>
<p>By convention, this variable is called <code>X</code>.</p>
<pre><code class="lang-python">X = melbourne_data[melbourne_features]
</code></pre>
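<p>The model also needs something to predict. By convention, the prediction target is called <code>y</code>; for us, that's the home price. A minimal line, using the <code>Price</code> column we saw earlier:</p>
<pre><code class="lang-python">y = melbourne_data.Price  # the prediction target: the value we want the model to learn
</code></pre>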
<p>There are four major steps to making and using a model: </p>
<ul>
<li>Defining what kind of model you are going to be using</li>
<li>Fitting your data into the model</li>
<li>Predicting data using your model</li>
<li>Determining how accurate the model is using error analysis techniques</li>
</ul>
<h2 id="heading-writing-our-first-model">Writing our First Model</h2>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.tree <span class="hljs-keyword">import</span> DecisionTreeRegressor

melbourne_model = DecisionTreeRegressor(random_state=<span class="hljs-number">1</span>)  <span class="hljs-comment"># Defining what type of model we are going to be using</span>
</code></pre>
<p><code>random_state</code> is used to ensure the same results can be found on each run since most models allow randomness in their training. The number does not meaningfully change the results of the model.</p>
<pre><code class="lang-python">melbourne_model.fit(X, y)  <span class="hljs-comment"># Fitting your data into the model</span>
</code></pre>
<p>Now to predict some data, we can use the <code>.predict()</code> attribute</p>
<pre><code class="lang-python">print(melbourne_model.predict(X.head()))
print(y.head())  <span class="hljs-comment"># the actual prices, for comparison</span>
</code></pre>
<p>You'd notice that the predicted prices match the actual ones, because the model is being evaluated on the very same data it was trained on; these are called in-sample predictions. The danger is that the model can latch onto patterns that only exist in the training data. Imagine that, in our sample, houses with green doors happened to sell for more: the model would learn that green doors mean higher prices, even though, in a large enough market, door color is unrelated to the retail value of a property. Two otherwise similar houses would then get different predicted prices purely because of a detail that is hardly relevant.</p>
<p>To combat this, we split our data into two parts, training data and validation data, and calculate the mean absolute error on the validation set.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state = <span class="hljs-number">0</span>)

melbourne_model = DecisionTreeRegressor()  <span class="hljs-comment"># Defining our model</span>

melbourne_model.fit(train_X, train_y)  <span class="hljs-comment"># Fitting our model</span>

val_predictions = melbourne_model.predict(val_X)  <span class="hljs-comment"># predictions!</span>
print(mean_absolute_error(val_y, val_predictions))  <span class="hljs-comment"># The greater the mean_absolute_error, the worse the model performance is.</span>
</code></pre>
<h2 id="heading-and-there-you-go">And there you go</h2>
<p>You've created your first data model. Try experimenting with the data and finding ways to reduce the <code>mean_absolute_error</code>; the lower it is, the better the model.</p>
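<p>As one possible starting point (a sketch, not something covered above), you can limit the size of the decision tree and compare validation errors, reusing the train/validation split from the previous section:</p>
<pre><code class="lang-python">from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

# Try a few tree sizes and see which gives the lowest validation error.
for max_leaf_nodes in (5, 50, 500, 5000):
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    preds = model.predict(val_X)
    print(max_leaf_nodes, mean_absolute_error(val_y, preds))
</code></pre>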
]]></content:encoded></item><item><title><![CDATA[Arrays — Beginning with Data Structues]]></title><description><![CDATA[Ask a Computer Major what they need to focus on the most, and 9/10 of them would say that they need a firmer grasp of Data Structures and Algorithms. As a sophomore in college, I've also started the tedious task of understanding and implementing vari...]]></description><link>https://blog.arygarg.me/arrays-beginning-with-data-structues</link><guid isPermaLink="true">https://blog.arygarg.me/arrays-beginning-with-data-structues</guid><category><![CDATA[data structures]]></category><category><![CDATA[array]]></category><category><![CDATA[C++]]></category><category><![CDATA[General Programming]]></category><category><![CDATA[fundamentals]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 22 Aug 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/jLwVAUtLOAQ/upload/v1660043227062/21_NlIY_h.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Ask a Computer Major what they need to focus on the most, and 9/10 of them would say that they need a firmer grasp of Data Structures and Algorithms. As a sophomore in college, I've also started the tedious task of understanding and implementing various Data Structures. I'm going to be using C++ for my implementations but would also be providing pseudo-code for other languages!</p>
<p>A quick reminder that I'm just starting out with this, so my solutions might not be the most optimized. I would love to learn more in the comments!</p>
<h1 id="heading-arrays">Arrays</h1>
<p>To sum it up in as few lines as possible, an array is a collection of objects of the same data type. In C++, once an array's size has been defined, it cannot be changed unless we use the concept of dynamic memory allocation. This article will focus on the insertion, deletion, and rotation of arrays.</p>
<h2 id="heading-traversing-an-array">Traversing an Array</h2>
<p>Traversing an Array refers to iterating through the elements of the array. Its time complexity would be <code>O(n)</code> for an array of size <code>n</code>.</p>
<h3 id="heading-c">C++</h3>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;iostream&gt;</span></span>
<span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span>;

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span>{
<span class="hljs-keyword">int</span> size = <span class="hljs-number">5</span>;
<span class="hljs-keyword">int</span> arr[size] = {<span class="hljs-number">7</span>,<span class="hljs-number">3</span>,<span class="hljs-number">2</span>,<span class="hljs-number">5</span>,<span class="hljs-number">1</span>};
<span class="hljs-keyword">for</span>(<span class="hljs-keyword">int</span> i = <span class="hljs-number">0</span>; i &lt; size; i++){
    <span class="hljs-built_in">cout</span> &lt;&lt; i &lt;&lt; <span class="hljs-string">"th Element is: "</span> &lt;&lt; arr[i] &lt;&lt; <span class="hljs-built_in">endl</span>;
    }

<span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h3 id="heading-psuedo-code">Psuedo-Code</h3>
<pre><code class="lang-python">INT size = <span class="hljs-number">5</span>
declare INT array of Length size
For i = <span class="hljs-number">1</span> to <span class="hljs-number">5</span>
    OUTPUT i-th element of array
EndFor
</code></pre>
<h3 id="heading-output">Output:</h3>
<pre><code class="lang-python"><span class="hljs-number">0</span>th Element <span class="hljs-keyword">is</span>: <span class="hljs-number">7</span>
<span class="hljs-number">1</span>th Element <span class="hljs-keyword">is</span>: <span class="hljs-number">3</span>
<span class="hljs-number">2</span>th Element <span class="hljs-keyword">is</span>: <span class="hljs-number">2</span>
<span class="hljs-number">3</span>th Element <span class="hljs-keyword">is</span>: <span class="hljs-number">5</span>
<span class="hljs-number">4</span>th Element <span class="hljs-keyword">is</span>: <span class="hljs-number">1</span>
</code></pre>
<h2 id="heading-inserting-an-element-into-an-array">Inserting an element into an Array</h2>
<p>Before we start, we will assume that an array of sufficient length has been created in advance, so we do not need to use pointers or dynamic memory allocation. This section will cover:</p>
<ul>
<li><p>Inserting at the End of an Array</p>
</li>
<li><p>Inserting at the beginning of the Array</p>
</li>
<li><p>Inserting at ANY position of the Array</p>
</li>
</ul>
<p>Also, some constants we would be declaring are given below:</p>
<pre><code class="lang-md">n -&gt; total number of elements in the array
item -&gt; element to be added
position -&gt; position at which item is supposed to be inserted
a -&gt; array with sufficient space available for the new item
</code></pre>
<h3 id="heading-c-1">C++</h3>
<h4 id="heading-inserting-at-the-end-of-an-array">Inserting at the End of an Array</h4>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;iostream&gt;</span></span>
<span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span>;

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span>{
    a[n] = item;
    n += <span class="hljs-number">1</span>;
    <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h4 id="heading-inserting-at-the-beginning-of-an-array">Inserting at the Beginning of an Array</h4>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;iostream&gt;</span></span>
<span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span>;

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span>{
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> i = n<span class="hljs-number">-1</span>; i &gt; <span class="hljs-number">0</span>; i--){
        a[i+<span class="hljs-number">1</span>] = a[i];
    }
    a[pos<span class="hljs-number">-1</span>] = item;
<span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h4 id="heading-inserting-at-any-position-of-an-array">Inserting at ANY position of an Array</h4>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;iostream&gt;</span></span>
<span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span>;

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span>{
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> i = n<span class="hljs-number">-1</span>; i &gt;= pos; i--){
        a[i+<span class="hljs-number">1</span>] = a[i];
    }
    a[pos<span class="hljs-number">-1</span>] = item;
<span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h3 id="heading-pseudo-code">Pseudo-Code</h3>
<h4 id="heading-inserting-at-the-end-of-the-array">Inserting at the End of the Array</h4>
<pre><code class="lang-python">SET the n+<span class="hljs-number">1</span>th element of the array = item
</code></pre>
<h4 id="heading-inserting-at-the-beginning-of-the-array">Inserting at the beginning of the Array</h4>
<pre><code class="lang-python">For int i = last element index to first element decreasing by <span class="hljs-number">1</span>
    SET (i + <span class="hljs-number">1</span>)-th element of Array = i-th element of the Array
EndFor
SET first element of Array = item
</code></pre>
<h4 id="heading-inserting-at-any-position-of-the-array">Inserting at ANY position of the Array</h4>
<pre><code class="lang-python">For int i = last element index to pos-th element decreasing by <span class="hljs-number">1</span>
    SET (i + <span class="hljs-number">1</span>)-th element of Array = i-th element of the Array
EndFor
SET (pos<span class="hljs-number">-1</span>)-th element of Array = item
</code></pre>
<h4 id="heading-interpretation">Interpretation</h4>
<p>The Best Case for this algorithm would be when we need to add an element to the end of the array, as it can be achieved in <code>O(1)</code>. Meanwhile, the Worst Case Scenario would be when we need to insert an element to the first index, which would have a time complexity of <code>O(n)</code>.</p>
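<p>If you want something you can actually run and poke at, here is a small Python rendition of the "insert at any position" logic above (Python purely to keep the sketch short; it is not part of the C++ snippets, and <code>pos</code> is treated as a 1-based position as in the pseudo-code):</p>
<pre><code class="lang-python">def insert_at(a, n, pos, item):
    # a holds n elements and has at least one spare slot at the end;
    # pos is a 1-based position, so the new item lands at index pos - 1
    for i in range(n - 1, pos - 2, -1):  # shift a[pos-1..n-1] one slot to the right
        a[i + 1] = a[i]
    a[pos - 1] = item
    return n + 1  # new element count

a = [7, 3, 2, 5, 1, None]   # one spare slot at the end
n = insert_at(a, 5, 1, 9)   # worst case: inserting at the beginning costs O(n) shifts
print(a)                    # [9, 7, 3, 2, 5, 1]
</code></pre>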
<h2 id="heading-deleting-elements-from-an-array">Deleting Elements from an Array</h2>
<p>As important as inserting elements directly into an array, we should also be able to remove them. We would be using the same constants as in the above section and have three sub-sections here.</p>
<ul>
<li><p>Deleting the Last Element of the Array</p>
</li>
<li><p>Deleting the First Element of the Array</p>
</li>
<li><p>Deleting ANY position element of the Array</p>
</li>
</ul>
<h3 id="heading-c-2">C++</h3>
<h4 id="heading-deleting-the-last-element-of-the-array">Deleting the Last Element of the Array</h4>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;iostream&gt;</span></span>
<span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span>;

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span>{
    a[n<span class="hljs-number">-1</span>] = <span class="hljs-number">0</span>;
    n = n<span class="hljs-number">-1</span>;
<span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h4 id="heading-deleting-the-first-element-of-the-array">Deleting the First Element of the Array</h4>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;iostream&gt;</span></span>
<span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span>;

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span>{
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> i = <span class="hljs-number">1</span>; i &lt; n; i++){
        a[i<span class="hljs-number">-1</span>] = a[i];
    }
    n=n<span class="hljs-number">-1</span>;
<span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h4 id="heading-deleting-any-position-element-from-the-array">Deleting ANY position element from the Array</h4>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;iostream&gt;</span></span>
<span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span>;

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span>{
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> i = pos; i &lt; n; i++){
        a[i<span class="hljs-number">-1</span>] = a[i];
    }
    n=n<span class="hljs-number">-1</span>;
<span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h3 id="heading-pseudo-code-1">Pseudo-Code</h3>
<h4 id="heading-deleting-the-last-element-of-the-array-1">Deleting the Last Element of the Array</h4>
<pre><code class="lang-md">SET the last element of the Array to Zero
</code></pre>
<h4 id="heading-deleting-the-first-element-of-the-array-1">Deleting the First Element of the Array</h4>
<pre><code class="lang-md">For i = to n
  Array's i-1-th element = Array's i-th element
EndFor
SET n = n - 1
</code></pre>
<h4 id="heading-deleting-any-position-element-from-the-array-1">Deleting ANY position element from the Array</h4>
<pre><code class="lang-md">For i =pos to n
  Array's i-1-th element = Array's i-th element
EndFor
SET n = n - 1
</code></pre>
<h4 id="heading-interpretation-1">Interpretation</h4>
<p>The Best Case for this algorithm would be when we need to remove an element from the last index of the array, as it can be achieved in <code>O(1)</code>. Meanwhile, the Worst Case Scenario would be when we need to remove an element from the beginning, which would have a time complexity of <code>O(n)</code>.</p>
<h2 id="heading-rotation-of-an-array">Rotation of an Array</h2>
<p>Rotation of an array refers to changing the order of an array by shifting the elements by <code>m</code> spaces either on the left or the right.</p>
<p>Example via: <a target="_blank" href="https://www.geeksforgeeks.org/array-rotation/">GeeksForGeeks</a></p>
<pre><code class="lang-python">Input: arr[] = {<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>, <span class="hljs-number">7</span>}, d = <span class="hljs-number">2</span>
Output: <span class="hljs-number">3</span> <span class="hljs-number">4</span> <span class="hljs-number">5</span> <span class="hljs-number">6</span> <span class="hljs-number">7</span> <span class="hljs-number">1</span> <span class="hljs-number">2</span>
</code></pre>
<pre><code class="lang-python">Input: arr[] = {<span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>, <span class="hljs-number">7</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span>}, d=<span class="hljs-number">2</span>
Output: <span class="hljs-number">5</span> <span class="hljs-number">6</span> <span class="hljs-number">7</span> <span class="hljs-number">1</span> <span class="hljs-number">2</span> <span class="hljs-number">3</span> <span class="hljs-number">4</span>
</code></pre>
<h3 id="heading-c-3">C++</h3>
<pre><code class="lang-cpp"><span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span>;

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">gcd</span><span class="hljs-params">(<span class="hljs-keyword">int</span> a, <span class="hljs-keyword">int</span> b)</span></span>{
   <span class="hljs-keyword">if</span> (b == <span class="hljs-number">0</span>)
     <span class="hljs-keyword">return</span> a;
   <span class="hljs-keyword">else</span>
     <span class="hljs-keyword">return</span> gcd(b, a % b);
}

<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">array_left_rotate</span><span class="hljs-params">(<span class="hljs-keyword">int</span> arr[], <span class="hljs-keyword">int</span> d, <span class="hljs-keyword">int</span> n)</span></span>{
   <span class="hljs-keyword">int</span> i, j, k, temp;
   <span class="hljs-keyword">for</span> (i = <span class="hljs-number">0</span>; i &lt; gcd(d, n); i++){
     temp = arr[i];
     j = i;
     <span class="hljs-keyword">while</span> (<span class="hljs-number">1</span>) {
       k = j + d;
       <span class="hljs-keyword">if</span> (k &gt;= n)
         k = k - n;
       <span class="hljs-keyword">if</span> (k == i)
         <span class="hljs-keyword">break</span>;
       arr[j] = arr[k];
       j = k;
 }
   arr[j] = temp;
   }
}

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span>{
 <span class="hljs-keyword">int</span> arr[] = { <span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>, <span class="hljs-number">7</span> };
 <span class="hljs-keyword">int</span> n = <span class="hljs-keyword">sizeof</span>(arr) / <span class="hljs-keyword">sizeof</span>(arr[<span class="hljs-number">0</span>]);
 <span class="hljs-built_in">cout</span>&lt;&lt;<span class="hljs-string">"\nArray elements before rotating : \n"</span>;
 <span class="hljs-keyword">for</span>(<span class="hljs-keyword">int</span> i = <span class="hljs-number">0</span>; i &lt; n; i++){
   <span class="hljs-built_in">cout</span>&lt;&lt;arr[i]&lt;&lt;<span class="hljs-string">"\t"</span>;
 }
 <span class="hljs-keyword">int</span> no_of_rotations = <span class="hljs-number">1</span>;
 array_left_rotate(arr, no_of_rotations, n);
 <span class="hljs-built_in">cout</span>&lt;&lt;<span class="hljs-string">"\n\nArray elements after rotating : \n"</span>;
 <span class="hljs-keyword">for</span>(<span class="hljs-keyword">int</span> i = <span class="hljs-number">0</span>; i &lt; n; i++)
 {
 <span class="hljs-built_in">cout</span>&lt;&lt;arr[i]&lt;&lt;<span class="hljs-string">"\t"</span>; <span class="hljs-comment">// Printing the array elements after rotation of elements</span>
 }
 <span class="hljs-built_in">cout</span>&lt;&lt;<span class="hljs-string">"\n"</span>;
 <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h3 id="heading-pseudo-code-2">Pseudo-Code</h3>
<pre><code class="lang-md"><span class="hljs-bullet">1.</span> divide the array into M sets, where M = GCD (numElements, rotationNumber), and then rotate the elements in each set.
<span class="hljs-bullet">2.</span> The number of numElements of the array and rotationNumber to be made to the array, the GCD (numElements, rotationNumber) number of blocks are made.
<span class="hljs-bullet">3.</span> In each block, shifting will occur to the block's corresponding elements.
<span class="hljs-bullet">4.</span> After all the blocks' elements are shifted, the array will be rotated for the given number of times.
</code></pre>
<h4 id="heading-interpretation-2">Interpretation</h4>
<p>This juggling method is not the most optimized way of rotating an array, and we have only rotated the array to the left here. There are other approaches, such as the well-known reversal algorithm or a recursive solution; a short sketch of the reversal idea follows below.</p>
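<p>For reference, here is a minimal sketch of that reversal idea (written in Python just to keep it short; it is not part of the C++ program above): reverse the first <code>d</code> elements, reverse the rest, then reverse the whole array, giving an in-place rotation in O(n) time with O(1) extra space.</p>
<pre><code class="lang-python">def rotate_left(arr, d):
    # Left-rotate arr by d positions using the reversal algorithm
    n = len(arr)
    d %= n  # handle d larger than n

    def reverse(lo, hi):
        # Reverse arr[lo..hi] in place
        while lo &lt; hi:
            arr[lo], arr[hi] = arr[hi], arr[lo]
            lo += 1
            hi -= 1

    reverse(0, d - 1)   # reverse the first d elements
    reverse(d, n - 1)   # reverse the remaining elements
    reverse(0, n - 1)   # reverse the whole array
    return arr

print(rotate_left([1, 2, 3, 4, 5, 6, 7], 2))  # [3, 4, 5, 6, 7, 1, 2]
</code></pre>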
]]></content:encoded></item><item><title><![CDATA[So I tried Carbon...]]></title><description><![CDATA[First of all, Happy 75th Independence Day, India!!! 🇮🇳🇮🇳🇮🇳

Carbon (/ˈkɑːb(ə)n/) noun.
The chemical element of atomic number 6, a non-metal with two main forms (diamond and graphite), occurs in impure form in charcoal, soot, and coal.

Carbon i...]]></description><link>https://blog.arygarg.me/carbon</link><guid isPermaLink="true">https://blog.arygarg.me/carbon</guid><category><![CDATA[programming languages]]></category><category><![CDATA[C++]]></category><category><![CDATA[newbie]]></category><category><![CDATA[code]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 15 Aug 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/BR6lrzCPYPk/upload/v1660201167556/je7Wtf-QT.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>First of all, Happy 75th Independence Day, India!!! 🇮🇳🇮🇳🇮🇳</p>
<blockquote>
<p>Carbon <em>(/ˈkɑːb(ə)n/)</em> <em>noun.</em></p>
<p>The chemical element of atomic number 6, a non-metal with two main forms (diamond and graphite), occurs in impure form in charcoal, soot, and coal.</p>
</blockquote>
<p>Carbon isn't just atomic number 6, haunting us in General Organic Chemistry, but is now also a programming language developed by Google to eventually be used over the legendary (and age-old) language C++ <em>(dramatic music intensifies)</em>. With the help of a few straightforward programs written in this article, we will try to understand and learn the language. We would also try to understand why Google (did I forget to mention Google made the language?) even needed to create Carbon and why it might be time for our old friend C++ to bid farewell.</p>
<h2 id="heading-why-carbon">Why Carbon?</h2>
<p>C++ is a monster of a programming language. First released in 1985, it has grown to the point that even its creator, Bjarne Stroustrup, has stated, "Within C++, there is a much smaller and cleaner language struggling to get out." The problem with changing or evolving C++ today is that it is such an essential part of so many code bases that it doesn't make sense to dramatically change how the language functions for the sake of changing technology. For this reason, C++ focuses more on standardization than on design functionality. Carbon, in turn, is not an attempt to incrementally evolve C++; it is designed as a successor, built around interoperability with C++ and around large-scale adoption and migration for existing C++ codebases and developers.</p>
<p>As part of the documentation of Carbon, it states that this new, more dynamic language would need to have certain features, which include:</p>
<ul>
<li><p><strong>Performance matching C++</strong>, an essential property for our developers.</p>
</li>
<li><p><strong>Seamless, bidirectional interoperability with C++</strong>, such that a library anywhere in an existing C++ stack can adopt Carbon without porting the rest.</p>
</li>
<li><p><strong>A gentle learning curve</strong> with reasonable familiarity for C++ developers.</p>
</li>
<li><p><strong>Comparable expressivity</strong> and support for existing software's design and architecture.</p>
</li>
<li><p><strong>Scalable migration</strong>, with some level of source-to-source translation for idiomatic C++ code.</p>
</li>
</ul>
<p>We can also observe how specific newer languages like Kotlin and Typescript have been a medium for change for their older counterparts (Java and JavaScript, respectively). Google wants to do that with C++ by Introducing Carbon.</p>
<p>Carbon is still an experimental language and by no means ready to be used in real-world applications, and it won't be for a reasonably long time, but it doesn't hurt to try to use it and add another language you know how to "Hello World" in.</p>
<h2 id="heading-installing-carbon-and-its-compiler">Installing Carbon and its Compiler</h2>
<p>Since Carbon is in such early stages of development, it isn't easy to use it on your local machine. For Windows, it's an even longer procedure since we have to install brew through WSL (Windows Subsystem for Linux). Assuming you have WSL installed with Ubuntu as your primary Linux Operating System, we can now move forward with installing brew and then ultimately installing Carbon.</p>
<h3 id="heading-installing-homebrew">Installing Homebrew</h3>
<ol>
<li>Click <a target="_blank" href="https://brew.sh/">here</a> and copy the installation command on the webpage. Open your Ubuntu environment in WSL and paste the command there. [Takes 5-10 mins]</li>
</ol>
<p>You might be required to enter your password during the previous step. I had to run this command twice as I got multiple fatal errors during this step.</p>
<ol start="2">
<li>After the command runs successfully, scroll until you see a section called "Next Steps." Copy and paste the commands in the order they are given into the terminal. They will look like the image below:</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660208306403/jWqO0Yoe2.png" alt="image.png" /></p>
<ol start="3">
<li><p>Next in line is to install the build tools Homebrew needs, which can be done with <code>sudo apt-get install build-essential</code> [Takes 3-5 mins]</p>
</li>
<li><p>Now, we will install gcc. Install it using the <code>brew install gcc</code> command [Takes 5-10 mins]</p>
</li>
</ol>
<p>I ran into another error where my homebrew-core was "not tapped correctly," which I figured out by running <code>brew doctor</code>. It gave me the necessary commands to fix the error, after which <code>gcc</code> installation was easy as pie.</p>
<h2 id="heading-install-the-carbon-compiler">Install the Carbon Compiler</h2>
<ol>
<li><p>Install the Bazelisk Launcher using <code>brew install bazelisk</code></p>
</li>
<li><p>Install LLVM (Low Level Virtual Machine) using <code>brew install llvm</code> [Takes 5-15 mins]</p>
</li>
<li><p>Add this to your path variable using <code>export PATH="$(brew --prefix llvm)/bin:${PATH}"</code></p>
</li>
<li><p>Since we are doing this installation on Ubuntu, we will also need <code>zlib1g-dev</code>, which we can install using <code>sudo apt install zlib1g-dev</code></p>
</li>
<li><p>We now have all the dependencies installed for the compiler. Time to clone the Github Repository using <code>git clone https://github.com/carbon-language/carbon-lang</code></p>
</li>
</ol>
<h2 id="heading-coding-carbon-in-windows-using-vscode">Coding Carbon in Windows using VSCode</h2>
<p>And there we have it: we've successfully installed Carbon in our Linux environment. Time to hook it up to VS Code on Windows so we don't have to code in the terminal again.</p>
<ol>
<li>Open VS Code and install the Remote Development Extension</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660209485466/avdIWRmWA.png" alt="image.png" /></p>
<ol start="2">
<li>Open Remote Explorer on the left-hand pane and click on the drop-down at the top of the screen. Select "WSL Targets" from the list. Then click on the button beside Ubuntu, as shown in the image.</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660209646742/4wrazKWhC.png" alt="Screenshot 2022-08-11 144918.png" /></p>
<ol start="3">
<li><p>Wait for the home screen to install all required dependencies and then go to <code>File &gt; Open Folder &gt; /home/{user}/carbon-lang/</code></p>
</li>
<li><p>Open <code>explorer/testdata/print/format_only.carbon</code> in the VS Code Window. A Hello World Program is displayed!</p>
</li>
<li><p>To see its output run <code>bazel run //explorer -- ./explorer/testdata/print/format_only.carbon</code> and wait a couple minutes. Since Carbon isn't ready to be adopted by the masses, its compilation time is painfully slow, and it took me almost 2 minutes to compile.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660210143338/Il8xuDedq.png" alt="image.png" /></p>
<h2 id="heading-writing-fizz-buzz-in-carbon">Writing Fizz Buzz in Carbon</h2>
<p>Everyone has heard of Fizz Buzz. If the number is divisible by 2, print "Fizz"; If it's divisible by 3, print "Buzz". If it's divisible by both, print "FizzBuzz"; else, print the number. Here is the implementation in Carbon:</p>
<pre><code class="lang-c++">package ExplorerTest api;

<span class="hljs-function">fn <span class="hljs-title">Main</span><span class="hljs-params">()</span> -&gt; i32</span>{
    var i: <span class="hljs-keyword">auto</span> = <span class="hljs-number">0</span>;
    <span class="hljs-keyword">while</span>(i &lt;= <span class="hljs-number">100</span>){
        <span class="hljs-keyword">if</span> (i % <span class="hljs-number">6</span> == <span class="hljs-number">0</span>){
            Print(<span class="hljs-string">"{0} FizzBuzz"</span>, i);
        }

        <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (i % <span class="hljs-number">2</span> == <span class="hljs-number">0</span>){
            Print(<span class="hljs-string">"{0} Fizz"</span>, i);
        }

        <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (i % <span class="hljs-number">3</span> == <span class="hljs-number">0</span>){
            Print(<span class="hljs-string">"{0} Buzz"</span>, i);
        }

        <span class="hljs-keyword">else</span>{
            Print(<span class="hljs-string">"{0}"</span>, i);
        }
        i = i + <span class="hljs-number">1</span>;
    }

    <span class="hljs-keyword">return</span> <span class="hljs-number">1</span>;
}
</code></pre>
<p>Let us discuss what every line in this program does:</p>
<ul>
<li><code>package ExplorerTest api;</code></li>
</ul>
<p>Base package and acts like <code>#include &lt;iostream&gt;</code> in C++</p>
<ul>
<li><code>fn Main() -&gt; i32</code></li>
</ul>
<p>Our main function which has a return type of Integer 32bit</p>
<ul>
<li><code>var i: auto = 0</code></li>
</ul>
<p>Creates a variable <code>i</code> with the value <code>0</code></p>
<ul>
<li><code>while(i &lt;= 100)</code></li>
</ul>
<p>While loop to be executed while <code>i &lt;= 100</code>. I had originally tried to create a for loop like in C++ [<code>for (int i =0; i &lt;= 10; i++)</code>] but got a syntax error</p>
<p>[COMPILATION ERROR: foo_bar_baz.carbon:4: syntax error, unexpected identifier, expecting COLON]</p>
<ul>
<li><code>if</code> statements</li>
</ul>
<p>They execute their block only if the condition in parenthesis is True.</p>
<ul>
<li><code>Print</code> statements</li>
</ul>
<p>Print the value inside the parenthesis. If variables need to be printed, we can use <code>{num}</code> where <code>num</code> is a number starting from 0. Eg:</p>
<pre><code class="lang-c++">var a: <span class="hljs-keyword">auto</span> = <span class="hljs-number">1</span>;
var b: <span class="hljs-keyword">auto</span> = <span class="hljs-number">5</span>;
var c: <span class="hljs-keyword">auto</span> = <span class="hljs-number">2</span>;
var d: <span class="hljs-keyword">auto</span> = <span class="hljs-number">6</span>;

Print(<span class="hljs-string">"These are Numbers: {0}, {1}, {2}, {3}"</span>, a,b,c,d);
</code></pre>
<p>Will give the output:</p>
<pre><code class="lang-python">These are Numbers: <span class="hljs-number">1</span>, <span class="hljs-number">5</span>, <span class="hljs-number">2</span>, <span class="hljs-number">6</span>
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Carbon is a fantastic language. It's still in its early stages of development, and it'll hopefully get better over time. One obvious flaw is the lack of documentation. I want to remind you that Carbon is still experimental, and Google does not recommend using it for any scaled-up applications. I hope you have as much fun as I did trying to learn Carbon!</p>
]]></content:encoded></item><item><title><![CDATA[What is AI — Working with OpenAI's Models]]></title><description><![CDATA[Artificial Intelligence is often understood as a complicated forte only for those indulged in the field. OpenAI aims to change that with their AI models, which they have made available to the public. In this article, we would go through the setup pro...]]></description><link>https://blog.arygarg.me/what-is-ai-working-with-openais-models</link><guid isPermaLink="true">https://blog.arygarg.me/what-is-ai-working-with-openais-models</guid><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Python]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[APIs]]></category><category><![CDATA[#codenewbies]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 08 Aug 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/0E_vhMVqL9g/upload/v1657786773977/EJcAN010C.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Artificial Intelligence is often understood as a complicated forte only for those indulged in the field. OpenAI aims to change that with their AI models, which they have made available to the public. In this article, we would go through the setup process and implement a few simple applications in a few lines of code!</p>
<h2 id="heading-signing-up-for-openai">Signing up for OpenAI</h2>
<p>The first step to any API service is to <a target="_blank" href="https://beta.openai.com/signup">sign up</a> for their service. After signing up, log back in, view your API key using <a target="_blank" href="https://beta.openai.com/account/api-keys">this</a> link, then copy the key and paste it into your <code>.env</code> file in the following format:</p>
<pre><code class="lang-python">OPENAI_API_KEY=key
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1658050607078/78ncAxC63.png" alt="openai.png" /></p>
<h2 id="heading-writing-your-first-script">Writing your first script</h2>
<p>Now that we have our own API keys, we can work with OpenAI's models. There are a bunch to work with, so I would suggest going through their engine list using the following function:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> openai
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv
<span class="hljs-keyword">from</span> os <span class="hljs-keyword">import</span> getenv
load_dotenv()
openai.api_key = getenv(<span class="hljs-string">"OPENAI_API_KEY"</span>)
print(openai.Engine.list())  <span class="hljs-comment"># print the list of available engines</span>
</code></pre>
<p>Output:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"data"</span>: [
    {
      <span class="hljs-attr">"created"</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">"id"</span>: <span class="hljs-string">"text-davinci-002"</span>,
      <span class="hljs-attr">"object"</span>: <span class="hljs-string">"engine"</span>,
      <span class="hljs-attr">"owner"</span>: <span class="hljs-string">"openai"</span>,
      <span class="hljs-attr">"permissions"</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">"ready"</span>: <span class="hljs-literal">true</span>
    },
    {
      <span class="hljs-attr">"created"</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">"id"</span>: <span class="hljs-string">"text-ada-001"</span>,
      <span class="hljs-attr">"object"</span>: <span class="hljs-string">"engine"</span>,
      <span class="hljs-attr">"owner"</span>: <span class="hljs-string">"openai"</span>,
      <span class="hljs-attr">"permissions"</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">"ready"</span>: <span class="hljs-literal">true</span>
    },

    {
      <span class="hljs-attr">"created"</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">"id"</span>: <span class="hljs-string">"babbage-search-document"</span>,
      <span class="hljs-attr">"object"</span>: <span class="hljs-string">"engine"</span>,
      <span class="hljs-attr">"owner"</span>: <span class="hljs-string">"openai-dev"</span>,
      <span class="hljs-attr">"permissions"</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">"ready"</span>: <span class="hljs-literal">true</span>
    },
...
...
...
    {
      <span class="hljs-attr">"created"</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">"id"</span>: <span class="hljs-string">"text-babbage-001"</span>,
      <span class="hljs-attr">"object"</span>: <span class="hljs-string">"engine"</span>,
      <span class="hljs-attr">"owner"</span>: <span class="hljs-string">"openai"</span>,
      <span class="hljs-attr">"permissions"</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">"ready"</span>: <span class="hljs-literal">true</span>
    },

  ],
  <span class="hljs-attr">"object"</span>: <span class="hljs-string">"list"</span>
}
</code></pre>
<p>Although there are about 50 models trained for all your needs, we will be using only a few in this tutorial.</p>
<p><strong>Note: These APIs are not a free service, and you will be charged for each API call depending on the service and how many tokens are utilized. Open AI does provide $18.00 of credit for trials.</strong></p>
<h2 id="heading-completing-a-prompt">Completing a prompt:</h2>
<p>Write or copy the following code into your script</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> openai
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv
<span class="hljs-keyword">from</span> os <span class="hljs-keyword">import</span> getenv

prompt = <span class="hljs-string">"Maybe I just need sleep."</span>

load_dotenv()
openai.api_key = getenv(<span class="hljs-string">"OPENAI_API_KEY"</span>)

response = openai.Completion.create(
  model=<span class="hljs-string">"text-davinci-002"</span>,
  prompt=prompt,
  temperature=<span class="hljs-number">0.3</span>,
  max_tokens=<span class="hljs-number">30</span>,
  top_p=<span class="hljs-number">1.0</span>,
  frequency_penalty=<span class="hljs-number">0.5</span>,
  presence_penalty=<span class="hljs-number">0.2</span>
)
print(response)

print(prompt + <span class="hljs-string">"..."</span> + response.choices[<span class="hljs-number">0</span>].text)
</code></pre>
<p>The code above utilizes OpenAI's DaVinci engine on the prompt "Maybe I just need sleep." and automatically generates whatever completion it deems appropriate.</p>
<p>Output:</p>
<pre><code class="lang-python">{
  <span class="hljs-string">"choices"</span>: [
    {
      <span class="hljs-string">"finish_reason"</span>: <span class="hljs-string">"stop"</span>,
      <span class="hljs-string">"index"</span>: <span class="hljs-number">0</span>,
      <span class="hljs-string">"logprobs"</span>: null,
      <span class="hljs-string">"text"</span>: <span class="hljs-string">".\n\n\n\nI'm not sure.\n\n\n\nI'm not sure.\n\n\n\nMaybe I just need sleep."</span>
    }
  ],
  <span class="hljs-string">"created"</span>: <span class="hljs-number">1658052132</span>,
  <span class="hljs-string">"id"</span>: <span class="hljs-string">"cmpl-5UvOGBy3igAZfFuyXofhCoWrQE40w"</span>,
  <span class="hljs-string">"model"</span>: <span class="hljs-string">"text-davinci-002"</span>,
  <span class="hljs-string">"object"</span>: <span class="hljs-string">"text_completion"</span>,
  <span class="hljs-string">"usage"</span>: {
    <span class="hljs-string">"completion_tokens"</span>: <span class="hljs-number">30</span>,
    <span class="hljs-string">"prompt_tokens"</span>: <span class="hljs-number">5</span>,
    <span class="hljs-string">"total_tokens"</span>: <span class="hljs-number">35</span>
  }
}
Maybe I just need sleep....



I<span class="hljs-string">'m not sure.



I'</span>m <span class="hljs-keyword">not</span> sure.



Maybe I just need sleep.
</code></pre>
<p>We can see that this prompt took up 35 tokens from our credit; this can also be verified on the OpenAI dashboard <a target="_blank" href="https://beta.openai.com/account/usage">here</a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1658052426915/l65MrU5Wb.png" alt="image.png" /></p>
<h2 id="heading-playing-with-the-api">Playing with the API</h2>
<p>OpenAI can also write unique ads. From here on, we will only showcase the prompt along with the response it generates.</p>
<blockquote>
<p>Prompt: "Create an ad campaign targeted at orphans from the following prompt:\nPrompt: A Phone where you can only call your mother"</p>
<p>Response: "A phone where you can only call your mother"</p>
<p>Orphans need love too. And with this new phone, they can stay connected to the one person who loves them the most: their mother. With this new phone, they can call their mother anytime, anywhere. So don't forget about the orphans when you're shopping for your new phone. They need your love too.</p>
</blockquote>
<p>Here is OpenAI trying to tell a joke:</p>
<blockquote>
<p>Prompt: "Tell me a programming joke"</p>
<p>Response: Why do programmers always have to write code? Because without code, there would be nothing to debug!</p>
</blockquote>
<p>So AI jokes might not be my cup of tea, but at least I can enjoy an AI-generated poem.</p>
<blockquote>
<p>Prompt: "Write a poem on cotton in 30 words or less: "</p>
<p>Response: Cotton is a soft, fluffy material that is often used to make clothing. It is also used to stuff pillows and plush toys.</p>
</blockquote>
<p>I don't know how I feel about AI writing totally unique pieces of text, but it is something special.</p>
<p>Let's make a simple chatbot (a sketch of the actual API call follows the example below):</p>
<blockquote>
<p>Prompt: "I am a highly intelligent question answering bot. If you ask me a question that is rooted in truth, I will give you the answer. If you ask me a question that is nonsense, trickery, or has no clear answer, I will respond with "Unknown".</p>
<p>Q: What is human life expectancy in the United States? A: Human life expectancy in the United States is 78 years.</p>
<p>Q: Who was president of the United States in 1955? A: Dwight D. Eisenhower was president of the United States in 1955.</p>
<p>Q: Which party did he belong to? A: He belonged to the Republican Party.</p>
<p>Q: What is the square root of banana? A: Unknown</p>
<p>Q: How does a telescope work? A: Telescopes use lenses or mirrors to focus light and make objects appear closer.</p>
<p>Q: How many squigs are in a bonk? A: Unknown</p>
<p>Q: Where is the Taj Mahal? A:"</p>
<p>Response:</p>
<p>The Taj Mahal is located in Agra, India.</p>
</blockquote>
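<p>For reference, a call along the lines of the sketch below (assuming the same imports and <code>openai.api_key</code> setup as earlier) is roughly how you would send that question-answering prompt; the <code>stop</code> sequence keeps the model from rambling past a single answer line:</p>
<pre><code class="lang-python">qa_prompt = "..."  # paste the full question-answering prompt from the block quote above, ending with "A:"

response = openai.Completion.create(
    model="text-davinci-002",
    prompt=qa_prompt,
    temperature=0,    # keep factual answers deterministic
    max_tokens=60,
    stop=["\n"],      # stop once the single-line answer is complete
)
print(response.choices[0].text.strip())
</code></pre>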
<p>Convert Python code into C++ code</p>
<blockquote>
<p>Prompt: "Convert the following python code into C++\narr = [1,7,5,3]\narr.sort()\nprint(arr)"</p>
</blockquote>
<p>arr = [1,7,5,3] arr.sort() print(arr)</p>
<pre><code class="lang-python">
&gt; Response: 
&gt;```c++
std::vector&lt;int&gt; arr = {<span class="hljs-number">1</span>,<span class="hljs-number">7</span>,<span class="hljs-number">5</span>,<span class="hljs-number">3</span>};
std::sort(arr.begin(), arr.end());
<span class="hljs-keyword">for</span>(int i : arr) {
    std::cout &lt;&lt; i &lt;&lt; <span class="hljs-string">" "</span>;
}
</code></pre>
<p>Your imagination is the limit for this revolutionary piece of tech. Try out different prompts and be sure to explore what the API can do, and, more importantly, what it can't do. Make sure to look at OpenAI's other projects, such as Dall-e, a next-generation AI image generator!</p>
]]></content:encoded></item><item><title><![CDATA[Scraping Reddit — One Subreddit at a Time]]></title><description><![CDATA[Continuing our tradition of using APIs to solve problems no one ever had, we've come to Reddit. Launched more than 17 years ago, Reddit is where everyone goes to discuss. It's a mega-forum, and today we're going to be getting data just because we can...]]></description><link>https://blog.arygarg.me/scraping-reddit-one-subreddit-at-a-time</link><guid isPermaLink="true">https://blog.arygarg.me/scraping-reddit-one-subreddit-at-a-time</guid><category><![CDATA[APIs]]></category><category><![CDATA[Python]]></category><category><![CDATA[newbie]]></category><category><![CDATA[reddit]]></category><category><![CDATA[python projects]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 01 Aug 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1656943266101/eeQndl1TA.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Continuing our tradition of using APIs to solve problems no one ever had, we've come to Reddit. Launched more than 17 years ago, Reddit is where everyone goes to discuss. It's a mega-forum, and today we're going to be getting data just because we can!</p>
<p>We will use PRAW (Python Reddit API Wrapper) to scrape our way through Reddit. Traditionally, we would have used a scraping package like Selenium or BeautifulSoup, but PRAW has simplified getting information from Reddit.</p>
<h2 id="heading-step-1-getting-our-credentials">Step 1: Getting our Credentials</h2>
<p>First, we would need credentials to help us access the Reddit API. Log-in to <a target="_blank" href="https://www.reddit.com/prefs/apps">this</a> link, and at the bottom, you will see a button labeled "create another app", click on it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656943894950/OdtgZot1-.png" alt="reddit-2.jpg.png" /></p>
<p>Next, you'll have to give your script a name and fill out a description. Once that's done, make sure to select the "script" option and then make sure to put the following into the redirect URI box: http://localhost:8080. This is suggested by the PRAW docs but is necessary because Reddit requires a redirect URI even if our application doesn't use it. Your "personal use script" can be found near the top left corner, directly under where it says "personal use script." The next thing you'll need is the "secret." It should be listed below the "personal use script." With these two seemingly random strings, we can start using PRAW. Please note that PRAW can also be used for posting to Reddit, but I would not be covering that in this article.</p>
<h2 id="heading-step-2-creating-a-virtual-environment">Step 2: Creating a Virtual Environment</h2>
<p>Follow the steps given in <a target="_blank" href="https://aryan401.hashnode.dev/api-avalanche-all-about-apis">this</a> article, and then continue with this tutorial</p>
<h2 id="heading-step-3-time-to-code">Step 3: Time to Code</h2>
<p>The first step in any project is to install our dependencies, this time, we're going to be installing PRAW as a package in Python. To install the required package enter the following pip command into the terminal</p>
<pre><code class="lang-bash">pip install praw
</code></pre>
<p>Next, we will be setting up our .env file, which will house all the credentials for the bot. Ensure its name is <code>.env</code> and is in the working directory. It should also have this format.</p>
<pre><code class="lang-python">CLIENT_ID=client id here
CLIENT_SECRET=client secret here
USER_AGENT=&lt;platform&gt;:&lt;app ID&gt;:&lt;version string&gt; (by u/&lt;Reddit username&gt;)
</code></pre>
<p>You can also do this using a <code>praw.ini</code> file, but I prefer to leave everything in my <code>.env</code></p>
<p>Now that we have our credentials stored, let's start with our <code>main.py</code></p>
<pre><code class="lang-Python"><span class="hljs-keyword">from</span> os <span class="hljs-keyword">import</span> getenv
<span class="hljs-keyword">import</span> praw  <span class="hljs-comment"># pip install praw</span>
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv  <span class="hljs-comment"># pip install python-dotenv</span>

reddit = praw.Reddit(
    client_id=getenv(<span class="hljs-string">'CLIENT_ID'</span>),
    client_secret=getenv(<span class="hljs-string">'CLIENT_SECRET'</span>),
    user_agent=getenv(<span class="hljs-string">"USER_AGENT"</span>),
)
</code></pre>
<p>This boilerplate code would be present in every PRAW file for a read-only instance.</p>
<p>Now let's get into the magical aspects of this API Wrapper.</p>
<h2 id="heading-step-4-explore-the-api">Step 4: Explore the API</h2>
<p>Let's first check if we're in read-only mode, so we don't cause any unwanted errors</p>
<pre><code class="lang-python">reddit.read_only()
<span class="hljs-comment">#returns True</span>
</code></pre>
<p>25 "Hottest" Submissions from r/python</p>
<pre><code class="lang-python"><span class="hljs-keyword">for</span> submission <span class="hljs-keyword">in</span> reddit.subreddit(<span class="hljs-string">"python"</span>).hot(limit=<span class="hljs-number">25</span>):
    print(submission.title)

<span class="hljs-comment"># Output: 25 submissions</span>
</code></pre>
<p>We can also pass a string in the format <code>subreddit1+subreddit2+subreddit3...</code> Its syntax would look like this:</p>
<pre><code class="lang-python">subreddit = reddit.subreddit(<span class="hljs-string">"python+reddit+java+discord"</span>)
<span class="hljs-keyword">for</span> submission <span class="hljs-keyword">in</span> subreddit.top(limit=<span class="hljs-number">25</span>, time_filter=<span class="hljs-string">"day"</span>):
    print(submission.title)
<span class="hljs-comment"># Output: 25 submissions according to score</span>
</code></pre>
<h3 id="heading-working-with-comments">Working with Comments</h3>
<p>Just as submissions belong to a subreddit, comments belong to a submission. To access comments, we first need a submission object in memory.</p>
<pre><code class="lang-python">submission = reddit.submission(<span class="hljs-string">"vnsq8s"</span>)  <span class="hljs-comment"># Every Reddit Submission has its own ID</span>
submission.comment_sort = <span class="hljs-string">"new"</span>
top_level_comments = list(submission.comments)[:<span class="hljs-number">25</span>]
<span class="hljs-keyword">for</span> comment <span class="hljs-keyword">in</span> top_level_comments:
    print(<span class="hljs-string">f"<span class="hljs-subst">{comment.author}</span> wrote: <span class="hljs-subst">{comment.body}</span> at <span class="hljs-subst">{comment.created_utc}</span>"</span>)
</code></pre>
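<p>The loop above only touches top-level comments. If you want every comment in the thread, including nested replies, PRAW can flatten the whole comment forest; here is a short sketch using the same <code>submission</code> as above:</p>
<pre><code class="lang-python">submission.comments.replace_more(limit=0)   # resolve the "load more comments" placeholders
all_comments = submission.comments.list()   # flatten the comment tree, replies included
print(f"Total comments fetched: {len(all_comments)}")
for comment in all_comments[:5]:
    print(comment.author, ":", comment.body[:80])
</code></pre>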
<p>As with any API, you only get better by using it. <a target="_blank" href="https://praw.readthedocs.io/en/stable/index.html">Here</a> is the link to the documentation, which you can refer to for more attributes and to find out about posting using the wrapper.</p>
<p>Cheers, and keep API-ing!</p>
]]></content:encoded></item><item><title><![CDATA[Understanding Big-O Notation]]></title><description><![CDATA[Let's start with a fundamental question. What is Big-O notation? What was the first thing that popped up in your head?

If you're like me and are just now diving deep into the world of Computer Science, you'd be surprised by how fast a computer can c...]]></description><link>https://blog.arygarg.me/understanding-big-o-notation</link><guid isPermaLink="true">https://blog.arygarg.me/understanding-big-o-notation</guid><category><![CDATA[algorithms]]></category><category><![CDATA[2Articles1Week]]></category><category><![CDATA[Python]]></category><category><![CDATA[basics]]></category><category><![CDATA[newbie]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 25 Jul 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/h7FMJugpcfs/upload/v1657016496987/Qk7aSgoAY.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>Let's start with a fundamental question. What is Big-O notation? What was the first thing that popped up in your head?</p>
</blockquote>
<p>If you're like me and are just now diving deep into the world of Computer Science, you'd be surprised by how fast a computer can <em>compute</em> thousands, if not millions of lines in a single second.</p>
<p>When you write an algorithm, even a simple one that sorts data, you are, in a way, already using the concept of Big-O notation. In simple terms, Big-O is how hard the computer has to work to complete the set of instructions issued by you, the programmer.</p>
<p>Formally defining Big-O notation in mathematical terms would look like</p>
<blockquote>
<p>f(N) = O(g(N)) if there exist positive constants c and N* such that f(N) &lt;= c * g(N) for all N &gt;= N*, where N is the number of inputs.</p>
</blockquote>
<p>As we increase the complexity of the program, we will see an increase in time taken to finish (Time Complexity), an increase in Memory Utilisation (Space Complexity), or both.</p>
<h1 id="heading-understanding-big-o-notation">Understanding Big-O Notation</h1>
<h3 id="heading-o1-constant-time">O(1) — Constant Time</h3>
<p>Often considered the best complexity, since the running time does not vary with the number of inputs. Constant time algorithms will always take the same amount of time to be executed. Accessing a value in an indexable array is the best example.</p>
<pre><code class="lang-python">arr = [<span class="hljs-number">1</span>,<span class="hljs-number">5</span>,<span class="hljs-number">2</span>,<span class="hljs-number">6</span>,<span class="hljs-number">8</span>,<span class="hljs-number">3</span>,<span class="hljs-number">9</span>]
val = arr[<span class="hljs-number">3</span>]
print(val)

<span class="hljs-comment">#Output: 6 - O(1)</span>
</code></pre>
<h3 id="heading-on-linear-time">O(n) — Linear Time</h3>
<p>An algorithm has linear complexity if the execution time varies linearly with the number of inputs. As the number of inputs increases, so will the time taken to complete. Scanning through an array element by element is a classic example: the time taken grows linearly with the array's length.</p>
<pre><code class="lang-python">arr = [<span class="hljs-number">1</span>,<span class="hljs-number">5</span>,<span class="hljs-number">2</span>,<span class="hljs-number">6</span>,<span class="hljs-number">8</span>,<span class="hljs-number">3</span>,<span class="hljs-number">9</span>]
find_num = <span class="hljs-number">3</span>
<span class="hljs-keyword">for</span> num <span class="hljs-keyword">in</span> arr:
    <span class="hljs-keyword">if</span> find_num == num:
        print(<span class="hljs-string">f"Found the number <span class="hljs-subst">{find_num}</span> in the array"</span>)

<span class="hljs-comment"># Output: Found the number 3 in the array - O(n)</span>
</code></pre>
<h3 id="heading-olog-n-logarithmic-time">O(log n) — Logarithmic Time</h3>
<p>An algorithm has logarithmic time complexity if the time it takes to run the algorithm is proportional to the logarithm of the input size n. An example is binary search.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">binarySearch</span>(<span class="hljs-params">array, item</span>):</span>
    first = <span class="hljs-number">0</span>
    last = len(array)<span class="hljs-number">-1</span>
    found = <span class="hljs-literal">False</span>

    <span class="hljs-keyword">while</span> first &lt;= last <span class="hljs-keyword">and</span> <span class="hljs-keyword">not</span> found:
        midpoint = (first + last)//<span class="hljs-number">2</span>
        <span class="hljs-keyword">if</span> array[midpoint] == item:
            found = <span class="hljs-literal">True</span>
        <span class="hljs-keyword">else</span>:
            <span class="hljs-keyword">if</span> item &lt; array[midpoint]:
                last = midpoint<span class="hljs-number">-1</span>
            <span class="hljs-keyword">else</span>:
                first = midpoint+<span class="hljs-number">1</span>

    <span class="hljs-keyword">return</span> found
</code></pre>
<h3 id="heading-on2-quadratic-time">O(n^2) — Quadratic Time</h3>
<p>An algorithm has quadratic time complexity if the time to execute is proportional to the square of the input size.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">quadratic</span>(<span class="hljs-params">items</span>):</span>
    <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> items:
        <span class="hljs-keyword">for</span> item2 <span class="hljs-keyword">in</span> items:
            print(item, <span class="hljs-string">' '</span> ,item2)

quadratic([<span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>, <span class="hljs-number">8</span>])

<span class="hljs-comment"># O(2^n)</span>
</code></pre>
<h1 id="heading-simplifying-big-o">Simplifying Big-O</h1>
<p>Often, our algorithms cannot be described by a single simple expression like O(n*log n) and are instead a combination of several terms added together. To simplify the expression, we only keep the dominant term. For example, if a function has a complexity of <code>O(2^n) + O(n)</code>, then with 10 inputs we get <code>2^10 + 10 = 1034</code>; the <code>O(2^n)</code> term contributes the overwhelming majority, so we can simplify the whole expression to just <code>O(2^n)</code>.</p>
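<p>If you want to convince yourself of that, a tiny loop (just the arithmetic above, repeated for a few input sizes) makes the gap obvious:</p>
<pre><code class="lang-python">for n in (10, 20, 30):
    exponential, linear = 2 ** n, n
    # the 2^n term quickly dwarfs the n term, so O(2^n) + O(n) simplifies to O(2^n)
    print(n, exponential + linear, exponential)
</code></pre>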
<h1 id="heading-terms-related-to-big-o">Terms related to Big-O</h1>
<p>Here is a list of terms that are often associated with Big-O</p>
<ul>
<li><em>Big O</em> : (O) describes the upper bound of the complexity.</li>
<li><em>Omega</em> : (Ω) describes the lower bound of the complexity.</li>
<li><em>Theta</em>: (Θ) describes the exact bound of the complexity.</li>
<li><em>Little O</em> : (o) describes the upper bound excluding the exact one.</li>
</ul>
<p>This was just a brief introduction to Big-O. There is so much more to Big-O than making well-optimized code. Click <a target="_blank" href="https://www.bigocheatsheet.com/">here</a> for a Big-O Cheat-Sheet.</p>
]]></content:encoded></item><item><title><![CDATA[Learning to Use the Twitter API v2.0 [2022]]]></title><description><![CDATA[An Introduction
In this article, I will show you how you can get started quickly with the new Twitter API v2. It includes new features like:

Improvements to the response objects

Support for getting Twitter polls data in the API

Tweet annotations a...]]></description><link>https://blog.arygarg.me/learning-to-use-the-twitter-api-v20-2022</link><guid isPermaLink="true">https://blog.arygarg.me/learning-to-use-the-twitter-api-v20-2022</guid><category><![CDATA[Twitter]]></category><category><![CDATA[Python]]></category><category><![CDATA[APIs]]></category><category><![CDATA[newbie]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 18 Jul 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/Jm1YUfYjpHI/upload/v1656931162358/iDkTlv4MR.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-an-introduction">An Introduction</h1>
<p>In this article, I will show you how you can get started quickly with the new <code>Twitter API v2</code>. It includes new features like:</p>
<ul>
<li><p>Improvements to the response objects</p>
</li>
<li><p>Support for getting Twitter polls data in the API</p>
</li>
<li><p>Tweet annotations and Conversation Threads</p>
</li>
</ul>
<h2 id="heading-step-1-creating-a-developer-account-on-twitter">Step #1: Creating a Developer Account on Twitter</h2>
<p>You need a developer account to get started with the new Twitter API. If you do not have one, you can sign up for one <a target="_blank" href="https://developer.twitter.com/en">here</a>.</p>
<h2 id="heading-step-2-creating-a-project-and-app">Step #2: Creating a Project and App</h2>
<p>Next, go to your <a target="_blank" href="https://developer.twitter.com/en/portal/dashboard">dashboard</a>, and under Projects &amp; Apps &lt; Overview Click on "Add App".</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656932013127/-s-4VJ-7n.png" alt="dev_tw_start.png" /></p>
<p>On the next page, select "Development" and click Next. This choice isn't critical, since you're allowed to create an App for each option.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656932138869/gfqUcJkx-.png" alt="dev_tw_start-2.png" /></p>
<p>Select an App name on the next screen and click Next.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656932226149/mPbxGn-5_.png" alt="dev_tw_start-3.png" /></p>
<p>On the next page, copy all the credentials into a text file for future use.</p>
<p>You'll have to apply for Elevated Access <a target="_blank" href="https://developer.twitter.com/en/portal/products/elevated">here</a>.</p>
<p>Next, on your Project page, scroll down and click on "Edit" under "User Authentication Settings." Toggle on the "OAuth 2.0" setting, set the Type of App to "Automated App or Bot", and click Save.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656937556120/uq4e67Qvh.png" alt="dev_tw_start-5.png" /></p>
<h1 id="heading-and-youre-done-with-setting-up-your-developer-account-for-twitter">And you're done with setting up your Developer Account for Twitter!</h1>
<p>The next thing to do is to ask Twitter for some data. We'll be doing the rest in Python, but you can follow along in basically any modern language.</p>
<h2 id="heading-step-3-create-a-virtual-environment-in-python">Step 3: Create a Virtual Environment in Python</h2>
<p>Follow the steps given in <a target="_blank" href="https://aryan401.hashnode.dev/virtual-environments-youre-gonna-need-them">this</a> article, and then come back and continue with this tutorial.</p>
<h2 id="heading-step-4-time-to-code">Step 4: Time To Code</h2>
<p>Before we start coding, let's load some dependencies into our project. We will be using <code>tweepy</code> as a wrapper between Twitter and our code. A wrapper is simply an extra layer between two pieces of code that helps them communicate with each other. Feel free to use any IDE you like; I prefer PyCharm.</p>
<pre><code class="lang-bash">pip install tweepy==4.10.0
</code></pre>
<pre><code class="lang-Python"><span class="hljs-keyword">import</span> tweepy
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv  <span class="hljs-comment"># pip install python-dotenv</span>
<span class="hljs-keyword">from</span> os <span class="hljs-keyword">import</span> getenv

load_dotenv()

client = tweepy.Client(consumer_key=getenv(<span class="hljs-string">'CONSUMER_KEY'</span>),
                       consumer_secret=getenv(<span class="hljs-string">'CONSUMER_SECRET'</span>),
                       access_token=getenv(<span class="hljs-string">"ACCESS_TOKEN"</span>),
                       access_token_secret=getenv(<span class="hljs-string">"ACCESS_SECRET"</span>))
</code></pre>
<p>Make sure you keep your credentials in a .env file in the following format and place it in the same directory as your code:</p>
<pre><code class="lang-python">CONSUMER_KEY=twitter consumer key here
CONSUMER_SECRET=twitter consumer secret here
ACCESS_TOKEN=twitter access token here
ACCESS_SECRET=twitter access secret here
</code></pre>
<p>Let's try printing out tweets from your home timeline:</p>
<pre><code class="lang-python">home_tweet = client.get_home_tweet()
<span class="hljs-keyword">for</span> tweet <span class="hljs-keyword">in</span> home_tweet.data:
    print(str(tweet).encode(<span class="hljs-string">'utf-8'</span>))
</code></pre>
<p>This gives the following output (forgive my Twitter feed):</p>
<pre><code class="lang-python"><span class="hljs-string">b'Pentagon finds concerning vulnerabilities on blockchain (18835)\nVia:https://t.co/dHK6HMbGpm'</span>
...
<span class="hljs-string">b'NEW VIDEO - A first look and hands-on with the Nothing Phone, which looks\xe2\x80\xa6 pretty neat, actually\n\nhttps://t.co/Rbo9Fxvqk6 https://t.co/FcUB3jTEYK'</span>
</code></pre>
<p>In my case, this returned 88 tweets.</p>
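<p>By default you only get the first page of results. If you want more, tweepy also ships a <code>Paginator</code> helper; here is a minimal sketch (the <code>max_results</code> and <code>limit</code> values below are just illustrative):</p>
<pre><code class="lang-python"># Walk through multiple pages of the home timeline, up to ~300 tweets
for tweet in tweepy.Paginator(client.get_home_timeline,
                              max_results=100).flatten(limit=300):
    print(str(tweet).encode('utf-8'))
</code></pre>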
<p>Likewise, we can also tweet through the API:</p>
<pre><code class="lang-python">tweet = client.create_tweet(text=<span class="hljs-string">"Hello World, I am using Tweepy"</span>)
<span class="hljs-comment">#tweet is a dictionary with tweet id and metadata</span>
</code></pre>
<p>For the full range of API calls, you can check the documentation <a target="_blank" href="https://docs.tweepy.org/en/stable/client.html#tweets">here</a>.</p>
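<p>As a quick taste of what else is in there, here's a small sketch that reuses the <code>client</code> and <code>tweet</code> objects from above. It is purely illustrative (it likes and then deletes the tweet we just posted):</p>
<pre><code class="lang-python">me = client.get_me()                    # profile of the authenticated account
print(me.data.username, me.data.id)

client.like(tweet.data["id"])           # like the tweet we created earlier
client.delete_tweet(tweet.data["id"])   # ...and then remove it again
</code></pre>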
<p>A side bonus of tweeting through the API is that your tweets get a custom source label, like this tweet here:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://twitter.com/Aryan_401/status/1543931162036703232">https://twitter.com/Aryan_401/status/1543931162036703232</a></div>
<p>And don't forget to comment if you have any questions! See you next time on API-city.</p>
]]></content:encoded></item><item><title><![CDATA[Virtual Environments — You're Gonna need em]]></title><description><![CDATA[Virtual Environments are a crucial aspect of python, which allows you to isolate various instances of the language into their container to be used independently. This article will be referenced a lot so keep it handy.
Installing virtualenv
pip instal...]]></description><link>https://blog.arygarg.me/virtual-environments-youre-gonna-need-em</link><guid isPermaLink="true">https://blog.arygarg.me/virtual-environments-youre-gonna-need-em</guid><category><![CDATA[Python]]></category><category><![CDATA[basics]]></category><category><![CDATA[newbie]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 11 Jul 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/npxXWgQ33ZQ/upload/v1656963491079/ZYgW9E7VS.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Virtual environments are a crucial part of Python: they let you isolate each project's interpreter and packages in their own container so that projects can be managed independently. This article will be referenced a lot, so keep it handy.</p>
<h5 id="heading-installing-virtualenv">Installing virtualenv</h5>
<pre><code><span class="hljs-attribute">pip</span> install virtualenv
</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656927981450/qBVKGBI2p.png" alt="install_ve.png" /></p>
<p>Test your installation:</p>
<pre><code>virtualenv <span class="hljs-operator">-</span><span class="hljs-operator">-</span>version
</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656927960524/ePr_YP6h-.png" alt="verify_ve.png" />
To create a virtualenv, we can use the following command.</p>
<pre><code><span class="hljs-attribute">virtualenv</span> name_of_project
</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656927948971/lmIWlUCvD.png" alt="create_ve.png" /></p>
<p>After running this command, a directory named <code>name_of_project</code> will be created. It contains its own Python and pip executables, plus everything needed to install and use the packages your project will need.</p>
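<p>Roughly speaking (the exact contents vary a little by platform and virtualenv version), the new directory looks something like this:</p>
<pre><code>name_of_project/
├── Scripts/      # python, pip and the activate scripts (named bin/ on macOS and Linux)
├── Lib/          # packages you install end up in here
└── pyvenv.cfg    # configuration pointing back at your base Python installation
</code></pre>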
<p>Now, to activate the virtual environment, we can run the following command (this is the Windows form; on macOS/Linux the equivalent is <code>source name_of_project/bin/activate</code>). Remember to re-activate the environment whenever you come back to the project after working on something else.</p>
<pre><code>name_of_project\Scripts\activate
</code></pre><p>Once the virtual environment is activated, its name will appear on the left side of your terminal prompt, letting you know that it is currently active.
Now you can install the project's dependencies inside this environment. For example, if you use asyncio for a project, you can install it like any other package (asyncio actually ships with modern Python; it is just a convenient example here).</p>
<pre><code><span class="hljs-attribute">pip</span> install asyncio
</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656927996738/mzFVitqr7.png" alt="ve.png" />
To Deactivate the virtual environment, we can run:</p>
<pre><code>deactivate
</code></pre><p>which will switch you back to the default Python installation you had been using.</p>
]]></content:encoded></item></channel></rss>