<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Technically Speaking]]></title><description><![CDATA[Technically speaking is a blog that aims to spark an interest in the coolest of things in development! Written by a college student so you know it's about to get *fresh* in here]]></description><link>https://blog.arygarg.me</link><generator>RSS for Node</generator><lastBuildDate>Wed, 29 Apr 2026 05:38:19 GMT</lastBuildDate><atom:link href="https://blog.arygarg.me/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[LLM Classifiers: Don't Just Classify, Conquer]]></title><description><![CDATA[Here is an everyday ask: you need to categorize data. Customer tickets, product reviews, medical images, financial transactions, or something else that’s just boring. The classic playbook says to grab a battle-tested algorithm like Logistic Regressio...]]></description><link>https://blog.arygarg.me/llm-classifiers</link><guid isPermaLink="true">https://blog.arygarg.me/llm-classifiers</guid><category><![CDATA[llm]]></category><category><![CDATA[AI]]></category><category><![CDATA[chatgpt]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[RAG ]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Sun, 20 Jul 2025 08:41:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1752991953039/9c0dc6a7-5252-4b38-9b27-3abc260c77e2.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Here is an everyday ask: you <em>need</em> to categorize data. Customer tickets, product reviews, medical images, financial transactions, or something else that’s just boring. The classic playbook says to grab a battle-tested algorithm like Logistic Regression or Gradient Boosting, feed it labeled data, and call it a day. It's safe, reliable, and ... completely unimaginative.</p>
<p>What if, instead, you “handed” the job to a Large Language Model? It sounds like using a sledgehammer to crack a nut. A slow, expensive, and notoriously unpredictable sledgehammer. It’s the kind of idea that gets you laughed out of a planning meeting. And yet, it might be the smartest move you’ll make all year.</p>
<h2 id="heading-the-paradox-why-are-llms-such-awkward-classifiers">The Paradox: Why Are LLMs Such Awkward Classifiers?</h2>
<p>On paper, LLMs are the worst possible candidates for a classification job. The two concepts are fundamentally at odds.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<th><strong>Feature</strong></th><th><strong>Classic Classification</strong></th><th><strong>Large Language Models</strong></th></tr>
</thead>
<tbody>
<tr>
<td><strong>Task</strong></td><td><strong>Narrow &amp; Specific:</strong> Is this email spam or not?</td><td><strong>Broad &amp; Generative:</strong> Write a sonnet about spam.</td></tr>
<tr>
<td><strong>Output</strong></td><td><strong>Deterministic:</strong> A single, predictable label.</td><td><strong>Stochastic:</strong> Creative, varied, sometimes nonsensical text.</td></tr>
<tr>
<td><strong>Speed</strong></td><td><strong>Milliseconds:</strong> Built for high-throughput systems.</td><td><strong>Seconds (or more):</strong> Notoriously slow.</td></tr>
<tr>
<td><strong>Interpretability</strong></td><td><strong>High:</strong> We can often see <em>why</em> a decision was made.</td><td><strong>Zero:</strong> A black box wrapped in an enigma.</td></tr>
</tbody>
</table>
</div><h2 id="heading-when-to-stick-with-the-classics-the-case-for-traditional-ml">When to Stick with the Classics: The Case for Traditional ML</h2>
<p>Before we go further, let's be clear: LLMs are not a silver bullet. In many scenarios, using a classic text classifier isn't just a good option; it's the <em>best</em> option. These traditional models are fast, cheap, and highly effective for well-defined problems. You should absolutely stick with a classic model like Naive Bayes, SVM, or Gradient Boosting when:</p>
<ul>
<li><p><strong>Speed is Critical:</strong> If your application requires near-instantaneous, high-throughput classification (e.g., real-time ad bidding or initial spam filtering), the latency of a large LLM is a non-starter.</p>
</li>
<li><p><strong>The Problem is Simple:</strong> If your classes are clearly distinct and you have a good amount of labeled data, a traditional model will likely achieve high accuracy without the overhead and cost of an LLM.</p>
</li>
<li><p><strong>Budgets are Tight:</strong> Training and hosting a classic model is orders of magnitude cheaper than paying for every single classification via an LLM API call. For routine tasks at scale, these costs add up quickly.</p>
</li>
<li><p><strong>Interpretability is Non-Negotiable:</strong> In regulated industries like finance or healthcare, you <em>must</em> be able to explain why your model made a specific decision. Classic models offer this transparency. LLMs <strong>do not</strong>.</p>
</li>
</ul>
<h2 id="heading-and-why-you-should-use-them-anyway">...And Why You Should Use Them Anyway</h2>
<p>So, if classic classifiers are so effective, why entertain the LLM madness at all? Because for a certain class of chaotic, real-world problems, the benefits aren't just incremental; they're transformative.</p>
<ul>
<li><p><strong>Zero-to-One Speed:</strong> Forget data collection and training cycles. You can build a prototype classifier as fast as you can write your first prompt. This lets you validate ideas with users <em>before</em> you commit a single line of production code.</p>
</li>
<li><p><strong>The End of "Retraining":</strong> Adding a new category? With a classic model, you're back to the data labeling mines. With an LLM, you often just need to update the prompt. This isn't a minor convenience; it's a fundamental shift in operational agility.</p>
</li>
<li><p><strong>Embracing the Mess:</strong> Real-world data is a disaster. It's filled with typos, slang, sarcasm, and missing information. Traditional models choke on this. LLMs, trained on the messy entirety of the internet, often handle it. Multi-modal models can even classify based on a combination of text, images, and audio in a single pass.</p>
</li>
<li><p><strong>The Language Barrier Dissolves:</strong> Need to classify user feedback in Hindi, then English, and finally French? A single, well-designed LLM system can handle it without needing separate models for each language. This is a game-changer for global products.</p>
</li>
</ul>
<h2 id="heading-example-the-chaos-of-customer-intent">Example: The Chaos of Customer Intent</h2>
<ul>
<li><p><strong>Human Ambiguity:</strong> A customer might say, "My internet is broken," which sounds technical. But one of the possible <em>reasons</em> it's broken is an unpaid bill. The true intent could be <code>Billing</code>, not <code>Technical Support</code>.</p>
</li>
<li><p><strong>Evolving Dialogue:</strong> The conversation is a moving target.</p>
<ul>
<li><p><strong>Bot:</strong> "How can I help you?"</p>
</li>
<li><p><strong>Customer:</strong> "I have a problem with my plan."</p>
</li>
<li><p><strong>Bot:</strong> "Is it your mobile plan or your home internet plan?"</p>
</li>
<li><p><strong>Customer:</strong> "The second one."</p>
</li>
<li><p>"<strong>The second one</strong>" is meaningless in isolation. The classifier needs the full context.</p>
</li>
</ul>
</li>
<li><p><strong>Organizational Mismatch:</strong> The customer thinks they want to "cancel their service" (<code>Cancellation</code> intent). But what they really need is to pause it for a month while they travel, a process handled by the <code>Sales</code> team. The team structure doesn't match the customer's mental model.</p>
</li>
<li><p><strong>Noisy Data:</strong> Speech-to-text errors, background noise, regional dialects, it's all part of the noise.</p>
</li>
</ul>
<h2 id="heading-the-modern-architectures-beyond-simple-prompting">The Modern Architectures: Beyond Simple Prompting</h2>
<p>If you think LLM classification is just about few-shot prompting, you're living in 2023. SOTA techniques aren’t so SOTA anymore.</p>
<h4 id="heading-1-the-semantic-searchlight-rag-for-classification">1. The Semantic Searchlight (RAG for Classification)</h4>
<p>This is our go-to. Instead of one giant prompt, we treat our intent descriptions as a database (a minimal code sketch follows the list below).</p>
<ul>
<li><p><strong>Setup:</strong> Each of our intents (<code>Billing</code>, <code>Sales</code>, <code>Tech Support</code>) has a detailed description, including edge cases and examples. We embed these descriptions into a vector space.</p>
</li>
<li><p><strong>Inference:</strong></p>
<ol>
<li><p>Take the incoming customer query (e.g., "My bill is wrong") and embed it.</p>
</li>
<li><p>Perform a vector search to find the top 3-5 most similar intent descriptions.</p>
</li>
<li><p>Inject <em>only these candidates</em> into a prompt for the LLM.</p>
</li>
<li><p>The LLM's task is now much simpler: "Given the query, which of these 3 options is the best fit?"</p>
</li>
</ol>
</li>
<li><p><strong>Pros:</strong> Dramatically smaller prompts (lower cost/latency), higher accuracy because you're filtering out irrelevant options, and it's interpretable (you know which candidates were considered).</p>
</li>
<li><p><strong>Cons:</strong> Your retrieval quality is paramount. If the right intent isn't in the top 5, the LLM can't pick it.</p>
</li>
</ul>
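<p>As a rough illustration, here is a minimal sketch of this setup in Python. The <code>embed()</code> and <code>call_llm()</code> functions are hypothetical stand-ins for whichever embedding model and LLM client you use, and the intent descriptions are illustrative only.</p>
<pre><code class="lang-python">import numpy as np

# Illustrative intent descriptions; in practice these would live in a vector database.
INTENTS = {
    "Billing": "Invoices, charges, refunds, overdue or unpaid bills.",
    "Sales": "Plan changes, upgrades, pausing service, new subscriptions.",
    "Tech Support": "Connectivity issues, outages, device troubleshooting.",
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(query, embed, call_llm, top_k=3):
    # 1. Embed every intent description (done once up front in a real system).
    intent_vecs = {name: embed(desc) for name, desc in INTENTS.items()}
    # 2. Embed the query and retrieve the top-k most similar intents.
    q_vec = embed(query)
    candidates = sorted(intent_vecs, key=lambda n: cosine(q_vec, intent_vecs[n]), reverse=True)[:top_k]
    # 3. Ask the LLM to choose only among the retrieved candidates.
    prompt = (
        f"Customer query: {query}\n"
        f"Choose the single best-fitting intent from: {', '.join(candidates)}.\n"
        "Answer with the intent name only."
    )
    return call_llm(prompt).strip()
</code></pre>
<p>Because the LLM only ever sees a handful of candidates, the prompt stays small and the decision is easy to audit.</p>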
<h4 id="heading-2-finetune-the-hell-out-of-it-finetuning">2. Finetune the Hell out of it (Finetuning)</h4>
<p>Here, we modify the LLM itself to be a classification expert. Instead of generating text, we want it to output probabilities for our specific labels (see the sketch after the list below).</p>
<ul>
<li><p><strong>Setup:</strong> Take a base open-source model (like Llama 3 or a future equivalent). Add a small "classification head" to its final layer—this can be as simple as a single linear layer.</p>
</li>
<li><p><strong>Training:</strong> Fine-tune this modified model on your labeled dataset. The model learns to map its vast internal understanding of language directly to your specific set of intents.</p>
</li>
<li><p><strong>Pros:</strong> Unmatched accuracy and speed for your specific domain. You get the LLM's world knowledge baked into a highly specialized tool.</p>
</li>
<li><p><strong>Cons:</strong> This is the most complex approach, requiring ML engineering expertise for training and deployment.</p>
</li>
</ul>
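<p>As a minimal sketch of the setup step, Hugging Face Transformers can attach a linear classification head for you via <code>AutoModelForSequenceClassification</code>; the model ID and label set below are placeholders, not recommendations.</p>
<pre><code class="lang-python">from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["Billing", "Sales", "Tech Support"]
model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder: any open base model you have access to

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token  # decoder-only models often lack a pad token

model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),  # map class indices back to intent names
)
model.config.pad_token_id = tokenizer.pad_token_id

# From here, fine-tune on your labeled (text, intent) pairs with the standard Trainer API.
</code></pre>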
<h4 id="heading-3-agent-47-but-with-water-pistols-agents-with-guardrails">3. Agent 47 but with Water Pistols (Agents with Guardrails)</h4>
<p>This is a hybrid approach that balances automation with safety; a short code sketch follows the list below.</p>
<ul>
<li><p><strong>Setup:</strong> An LLM acts as the primary classifier, but with a crucial safety rail.</p>
</li>
<li><p><strong>Inference:</strong></p>
<ol>
<li><p>The LLM makes an initial classification (e.g., predicts <code>Cancellation</code>).</p>
</li>
<li><p>Before executing, a second, simpler model or a set of business rules <em>verifies</em> the decision. For example, a rule might check: "Does the user's account history show recent travel bookings? If so, flag for human review, as they might want to pause, not cancel."</p>
</li>
<li><p>Only verified classifications are passed through to the next stage.</p>
</li>
</ol>
</li>
<li><p><strong>Pros:</strong> Gives you the flexibility of an LLM with the safety of a rules-based system.</p>
</li>
<li><p><strong>Cons:</strong> Can add latency and requires careful design of the verification step.</p>
</li>
</ul>
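<p>As a minimal illustration of the verification step, consider the sketch below; <code>llm_classify()</code> and the account fields are hypothetical, and real guardrails would encode your actual business rules.</p>
<pre><code class="lang-python">KNOWN_INTENTS = {"Billing", "Sales", "Tech Support", "Cancellation"}

def guarded_classify(query, account, llm_classify):
    intent = llm_classify(query)  # 1. initial LLM prediction

    # 2. Business-rule verification before anything is executed.
    if intent not in KNOWN_INTENTS:
        return "HUMAN_REVIEW"  # never act on a label we don't recognize
    if intent == "Cancellation" and account.get("recent_travel_booking"):
        return "HUMAN_REVIEW"  # the customer may want to pause, not cancel

    # 3. Only verified classifications pass through.
    return intent
</code></pre>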
<h2 id="heading-your-action-plan-how-to-get-started-today">Your Action Plan: How to Get Started <strong>Today</strong></h2>
<ol>
<li><p><strong>Create a "Consensus Corpus":</strong> Before you write a single prompt, grab 100 real data points which match your use case, maybe from production (ideal) or synthetically generated. Sit down and label them. This exercise is invaluable for aligning yourself with the task and exposing ambiguities in your categories. This becomes your "golden set" for testing.</p>
</li>
<li><p><strong>Benchmark the Basics:</strong> Define your baseline. If 40% of your tickets are <code>Billing</code>, then any model must be better than 40% accurate. Better yet, run your golden set through a classic ML model. This gives you a real performance target to beat.</p>
</li>
<li><p><strong>Prototype with RAG:</strong> This is the sweet spot of power and practicality. Use a vector database service and a powerful API model (like GPT-4 or Gemini) to quickly test the architecture. Measure its performance against your golden set.</p>
</li>
<li><p><strong>Analyze the Errors, Not Just the Accuracy:</strong> Don't just look at the final score. Where is it failing? Is it confusing <code>Sales</code> with <code>Cancellations</code>? This tells you where your intent descriptions need more detail or where your retrieval is weak. The business impact of a misclassification is not uniform; failing to detect a <code>Customer Complaint</code> is far worse than missing a <code>General Inquiry</code>. A confusion matrix over your golden set makes these patterns obvious (see the sketch after this list).</p>
</li>
</ol>
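<p>For reference, here is a minimal way to score a prototype against the golden set and inspect its confusion patterns with scikit-learn; the label lists are dummy data standing in for your own results.</p>
<pre><code class="lang-python">from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

labels = ["Billing", "Sales", "Tech Support", "Cancellation"]
y_true = ["Billing", "Sales", "Cancellation", "Billing", "Tech Support"]    # golden-set labels
y_pred = ["Billing", "Cancellation", "Cancellation", "Billing", "Billing"]  # model predictions

print("Accuracy:", accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred, labels=labels))  # rows = true labels, columns = predictions
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
</code></pre>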
<h3 id="heading-why-not-just-build-a-fully-autonomous-llm-agent-to-handle-everything">Why not just build a fully autonomous LLM agent to handle everything?</h3>
<p>Valid question, but that's a recipe for disaster. We don't need the LLM's creativity to solve a routine billing issue. We need speed, accuracy, and control. Letting an agent run wild could lead to it incorrectly modifying a user's account or giving out confidential information. The goal is to use the LLM's intelligence as a scalpel, not a wrecking ball.</p>
<h2 id="heading-final-thoughts-its-a-culture-shift">Final Thoughts: It's a Culture Shift</h2>
<p>Adopting LLMs for classification isn't just a technical change; it's a change in mindset. You move from being a "model trainer" to a "system designer." Your skills in prompt engineering, system architecture, and critical analysis of model outputs become more important than your ability to tune hyperparameters.</p>
<p>It's a challenging path, with unexpected behavior and new failure modes at every turn. But the reward is a system that is more flexible, scalable, and intelligent than anything that has come before. Stop thinking of classification as just putting things in boxes. Start thinking of it as understanding.</p>
<p><em>Peace Love and Plants 🪴</em></p>
]]></content:encoded></item><item><title><![CDATA[Friendship over with Attention. Now FNet is my best friend]]></title><description><![CDATA[The Transformer architecture, introduced in the seminal paper "Attention Is All You Need", has revolutionized AI, particularly in Natural Language Processing (NLP). Its success hinges on the self-attention mechanism, which allows the model to dynamic...]]></description><link>https://blog.arygarg.me/fnet-vs-attention</link><guid isPermaLink="true">https://blog.arygarg.me/fnet-vs-attention</guid><category><![CDATA[Deep Learning]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[technology]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Wed, 07 May 2025 03:50:17 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1746589095849/00880682-1dd0-457c-9012-0cb34e8840a8.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The Transformer architecture, introduced in the seminal paper "<a target="_blank" href="https://arxiv.org/abs/1706.03762?hl=en-US">Attention Is All You Need</a>", has revolutionized AI, particularly in Natural Language Processing (NLP). Its success hinges on the self-attention mechanism, which allows the model to dynamically weigh the importance of different words or tokens in an input sequence relative to each other, capturing context and long-range dependencies effectively.</p>
<h2 id="heading-the-attention-bottleneck-power-vs-price">The Attention Bottleneck: Power vs. Price</h2>
<p>Think of self-attention like this: for every word in a sentence, the model calculates an "attention score" indicating how relevant every other word is to understanding that specific word's meaning in context. This is done using Query (Q), Key (K), and Value (V) projections derived from the input embeddings. The scores determine how much information from other words (Values) should be blended into the current word's representation.</p>
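<p>To make that concrete, here is a minimal single-head, scaled dot-product attention sketch in PyTorch, with no masking or multi-head machinery; it is only meant to show where the pairwise score matrix comes from.</p>
<pre><code class="lang-python">import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projection matrices
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / (k.shape[-1] ** 0.5)  # (seq_len, seq_len) pairwise scores: the quadratic step
    weights = F.softmax(scores, dim=-1)      # how strongly each token attends to every other token
    return weights @ v                       # blend the Values according to those weights

d_model, d_k, seq_len = 8, 8, 5
x = torch.randn(seq_len, d_model)
out = self_attention(x, *(torch.randn(d_model, d_k) for _ in range(3)))
print(out.shape)  # torch.Size([5, 8])
</code></pre>
<p>The <code>scores</code> matrix has one entry per pair of tokens, which is exactly where the quadratic cost discussed next comes from.</p>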
<p>While incredibly powerful, this pairwise comparison is computationally intensive. The complexity scales quadratically (O(N²)) with the sequence length (N) in terms of both computation and memory. This means doubling the input length (e.g., from a paragraph to a full document) quadruples the resources needed for the attention layers. This quadratic scaling becomes a major bottleneck for:</p>
<ul>
<li><p>Training: Making it expensive and time-consuming to train models on large datasets or long sequences.</p>
</li>
<li><p>Inference: Limiting the speed at which models can process long inputs in real-time applications.</p>
</li>
<li><p>Deployment: Making it challenging to run large Transformers on resource-constrained hardware like smartphones or embedded systems (AI at the Edge).</p>
</li>
</ul>
<h2 id="heading-enter-fnet-computing-with-fourier-transforms">Enter FNet: Computing with Fourier Transforms</h2>
<p>Researchers at Google AI proposed a startlingly efficient alternative in their paper "FNet: Mixing Tokens with Fourier Transforms". Their model, FNet, completely replaces the computationally heavy self-attention layers within the Transformer encoder block with a standard, parameter-free Fourier Transform.</p>
<p>What's a Fourier Transform? Originating in signal processing, the Discrete Fourier Transform (DFT) decomposes a sequence (like the sequence of token embeddings) into its constituent frequencies. It essentially reveals the underlying periodic patterns or "resonances" within the data. The Fast Fourier Transform (FFT) is simply a highly efficient algorithm for computing the DFT, reducing its complexity from O(N²) to O(N log N).</p>
<p>In FNet, the FFT is applied to mix information across the token sequence. The intuition is that analyzing the frequency components provides a form of global context, capturing interactions between tokens without the need for explicit pairwise comparisons. It's a shift from learning adaptive relationships (attention) to leveraging a fixed, mathematically defined structure (FFT) for mixing information globally. FNet applies the FFT along both the sequence and hidden dimensions, taking only the real part of the complex-valued output for simplicity and empirical effectiveness.</p>
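<p>For intuition, the token-mixing step described in the paper can be written in a few lines of PyTorch: a parameter-free FFT applied along the hidden and sequence dimensions, keeping only the real part. This sketch covers the mixing sublayer only, not the full FNet block.</p>
<pre><code class="lang-python">import torch

def fourier_mixing(x):
    # x: (batch, seq_len, hidden_dim)
    # FFT along the hidden dimension, then along the sequence dimension; keep the real part.
    return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real

x = torch.randn(2, 512, 768)    # a batch of 512-token sequences
print(fourier_mixing(x).shape)  # torch.Size([2, 512, 768]); cost grows as O(N log N) in sequence length
</code></pre>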
<h2 id="heading-scalable-performance-huge-speedups">Scalable Performance, Huge Speedups</h2>
<p>The results published in the FNet paper were compelling:</p>
<ul>
<li><p>Speed: FNet trains significantly faster than standard BERT-Base models – up to 80% faster on GPUs and 70% faster on TPUs for typical sequence lengths (512 tokens). O(N log N) scaling makes it increasingly advantageous for longer sequences.</p>
</li>
<li><p>Accuracy: Despite its simplicity and lack of parameters in the mixing layer, FNet achieves <strong>92-97%</strong> of BERT's accuracy on the diverse tasks within the GLUE benchmark. While there's a slight accuracy trade-off compared to the best attention-based models, its performance demonstrates remarkable viability.</p>
</li>
<li><p>Hybrid Approach: An "FNet-Hybrid" model, strategically replacing only some attention layers with FFT layers (specifically, keeping attention only in the final two layers), recovered most of the performance gap, reaching 99% of BERT's accuracy. This highlights a potential synergy, using FFT for efficient global mixing and attention for finer-grained, adaptive refinement.</p>
</li>
<li><p>Long Sequences (LRA): When tested on the Long Range Arena (LRA) benchmark, designed specifically for evaluating models on long-context tasks, FNet matched the accuracy of top-performing "efficient Transformer" variants (like Reformer, Performer) while being significantly faster and more memory-efficient than all competitors on GPUs across the tested sequence lengths.</p>
</li>
</ul>
<p>Experiments confirmed key design choices: mixing is crucial (removing it entirely fails), the FFT's structure provides a useful inductive bias (random mixing underperforms), and adding learnable parameters to the FFT itself didn't help, reinforcing the value of the fixed transform.</p>
<h2 id="heading-market-implications">Market Implications</h2>
<p>FNet's efficiency isn't just an academic curiosity; it has tangible implications:</p>
<ul>
<li><p>Short-Term: The dramatic reduction in computational cost immediately benefits latency-sensitive applications (real-time translation, content moderation) and enables more powerful AI at the Edge. Running complex NLP models directly on devices becomes more feasible, improving privacy and reducing reliance on cloud connectivity. Companies offering AI infrastructure (like Clika) can leverage such architectures to provide faster, cheaper inference solutions.</p>
</li>
<li><p>Long-Term: FNet and similar research signal a potential shift away from monolithic, attention-heavy models towards more diverse, efficient architectures. This could foster innovation in areas like low-power LLMs and truly ubiquitous on-device intelligence. It challenges the economic moat of large cloud AI providers whose value relies partly on managing attention's complexity at scale. Furthermore, it underscores that architectural innovation can sometimes bypass the need for purely brute-force compute scaling (i.e., ever-larger GPU clusters), potentially impacting hardware markets. This trend aligns with a move towards more distributed and decentralized AI systems.</p>
</li>
<li><p>Jevons' Paradox: The idea that increasing efficiency in resource use can sometimes lead to an overall increase in resource consumption because the lower cost spurs greater demand. In this context, making Transformer-like models much cheaper and faster could dramatically increase their adoption and the variety of applications they're used for, potentially leading to more overall AI computation, even if individual tasks are more efficient.</p>
</li>
</ul>
<h2 id="heading-the-bigger-picture-democratizing-ai">The Bigger Picture: Democratizing AI</h2>
<p>FNet exemplifies how revisiting fundamental mathematical tools can lead to breakthroughs in efficiency. By demonstrating that a parameter-free FFT can effectively replace complex self-attention for many tasks, it challenges core assumptions in model design. This focus on efficiency is crucial for democratizing AI, making powerful models more accessible, affordable, and deployable in a wider range of environments, especially beyond large data centers. While attention remains state-of-the-art for peak performance, FNet proves that highly efficient alternatives are viable, paving the way for a future with <em>smarter, leaner AI</em>.</p>
]]></content:encoded></item><item><title><![CDATA[PaliGemma 2 - VLMs made easy]]></title><description><![CDATA[Introduction
The evolution of vision-language models has been nothing short of remarkable. From their early stages of independently handling images and text to their current ability to seamlessly integrate the two, these models have reached new heigh...]]></description><link>https://blog.arygarg.me/paligemma-2</link><guid isPermaLink="true">https://blog.arygarg.me/paligemma-2</guid><category><![CDATA[Google]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[huggingface]]></category><category><![CDATA[finetuning]]></category><category><![CDATA[VLM]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Sun, 08 Dec 2024 13:05:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1733661916659/5268b3ad-67c5-4364-90aa-692989f51f43.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>The evolution of vision-language models has been nothing short of remarkable. From their early stages of independently handling images and text to their current ability to seamlessly integrate the two, these models have reached new heights. Imagine describing the content of a photo, answering detailed questions about it, or creating vivid images from mere text — these are the feats made possible by modern vision-language models.</p>
<p>Fine-tuning these models is the key to unlocking their full potential. While pre-trained models like PaliGemma 2 offer impressive capabilities out of the box, adapting them to specific datasets or tasks can significantly boost their performance. Fine-tuning ensures the model not only generalizes well but also excels in understanding the context and nuances of your application, whether it's medical imaging, e-commerce, or creative content generation.</p>
<hr />
<h3 id="heading-meet-paligemma-2">Meet PaliGemma 2</h3>
<p>PaliGemma 2, the latest open-source vision-language model released by Google, is a testament to how far these technologies have come. This sophisticated system takes images and text as inputs and generates textual outputs. Whether you’re creating captions for photos or answering intricate visual questions, PaliGemma 2 is designed to handle it all.</p>
<h4 id="heading-key-components">Key Components</h4>
<ul>
<li><p><strong>SigLIP-So400m</strong>: The image encoder, built with a philosophy similar to CLIP, excels at jointly understanding images and text. It processes visual data with remarkable accuracy, making it a robust foundation for multimodal tasks.</p>
</li>
<li><p><strong>Gemma 2</strong>: The text decoder (the 2B variant for the 3B PaliGemma 2 model), a powerhouse explicitly crafted for generating coherent and contextually rich text.</p>
</li>
</ul>
<p>By connecting <strong>SigLIP</strong>'s capabilities with <strong>Gemma</strong> via a simple linear adapter, <strong>PaliGemma 2</strong> emerges as a comprehensive solution. Pre-trained on image-text datasets, it is versatile enough to tackle various tasks, such as:</p>
<ul>
<li><p><strong>Image Captioning</strong>: Generating detailed descriptions of images.</p>
</li>
<li><p><strong>Segmentation</strong>: Identifying and labeling objects in images.</p>
</li>
<li><p><strong>Question Answering</strong>: Given an image and a related question as multimodal input, the model can answer the question for us.</p>
</li>
</ul>
<hr />
<h1 id="heading-lets-get-started">Let’s get started</h1>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🖥</div>
<div data-node-type="callout-text">Before diving into fine-tuning PaliGemma 2, it's crucial to be prepared for the resource demands. This process will require <strong>a TON of GPU memory</strong>. If you're planning to experiment with Kaggle's free-tier environment, note that its <strong>2x T4 GPUs</strong> were not powerful enough.</div>
</div>

<div data-node-type="callout">
<div data-node-type="callout-emoji">🦾</div>
<div data-node-type="callout-text">However, You can try using <strong>Google Cloud Platform</strong> with <strong>AI Notebooks</strong> and opt for a <strong>NVIDIA A100 GPU</strong>, which provides significantly more memory and computational power. This setup should offer a smoother experience for fine-tuning the model effectively.</div>
</div>

<h3 id="heading-installing-packages">Installing Packages</h3>
<pre><code class="lang-python">!pip install -q -U git+https://github.com/huggingface/transformers.git datasets accelerate peft
!pip install -U bitsandbytes  <span class="hljs-comment"># for QLoRA and LoRA</span>
</code></pre>
<h3 id="heading-loading-our-authentication-keys-from-huggingface">Loading our Authentication Keys from HuggingFace</h3>
<p>To fine-tune <strong>PaliGemma 2</strong> or work with any Hugging Face tools, you'll need to authenticate using an access token. Follow these steps to generate and export it:</p>
<ol>
<li><p><strong>Get Your Access Token</strong><br /> Log in to your Hugging Face account and navigate to the Access Tokens page.</p>
<ul>
<li><p>If you don’t already have a token, create one by clicking "New Token".</p>
</li>
<li><p>Assign the necessary scope (e.g., <code>write</code> access for fine-tuning tasks).</p>
</li>
</ul>
</li>
<li><p><strong>Load your Token into your code</strong></p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> kaggle_secrets <span class="hljs-keyword">import</span> UserSecretsClient
<span class="hljs-keyword">import</span> os
user_secrets = UserSecretsClient()
hf_secret = user_secrets.get_secret(<span class="hljs-string">"HF General"</span>)
os.environ[<span class="hljs-string">"HF_General"</span>] = hf_secret
</code></pre>
<p><strong>OR</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os  
os.environ[<span class="hljs-string">"HF_General"</span>] = <span class="hljs-string">"&lt;your_access_token&gt;"</span>
</code></pre>
<h3 id="heading-authenticate-using-the-hugging-face">Authenticate using the Hugging Face</h3>
<pre><code class="lang-python">!huggingface-cli login --token $HF_General
print(<span class="hljs-string">"Done Authentication"</span>)
</code></pre>
<h3 id="heading-loading-our-data">Loading our Data</h3>
<p>To fine-tune <strong>PaliGemma 2</strong>, we’ll use a <a target="_blank" href="https://huggingface.co/datasets/HuggingFaceM4/ChartQA"><strong>Chart Question Answering</strong></a> <strong>(ChartQA) dataset</strong> available on Hugging Face's <code>datasets</code> library. This dataset includes pairs of images and questions about them, along with corresponding answers, making it perfect for multimodal fine-tuning tasks.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> load_dataset
print(<span class="hljs-string">"Started to Load Dataset"</span>)
train_ds = load_dataset(<span class="hljs-string">'HuggingFaceM4/ChartQA'</span>, split=<span class="hljs-string">"train+val"</span>)
print(<span class="hljs-string">"Done Loading Dataset"</span>)
</code></pre>
<pre><code class="lang-python">cols_remove = [<span class="hljs-string">"human_or_machine"</span>]
train_ds = train_ds.remove_columns(cols_remove)
</code></pre>
<pre><code class="lang-python">test_ds = load_dataset(<span class="hljs-string">'HuggingFaceM4/ChartQA'</span>, split=<span class="hljs-string">"test"</span>) 
test_ds = test_ds.remove_columns(cols_remove)
</code></pre>
<h3 id="heading-loading-the-pre-processor">Loading the (Pre) Processor</h3>
<p>To prepare our dataset for <strong>Paligemma 2</strong>, we’ll use the <code>PaliGemmaProcessor</code>. This processor handles both image processing and text tokenization, simplifying the workflow for fine-tuning vision-language models.</p>
<h4 id="heading-loading-the-processor">Loading the Processor</h4>
<p>First, load the processor for the 224x224 version of <strong>PaliGemma 2</strong>, which is more memory-efficient and suitable for general-purpose tasks:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> PaliGemmaProcessor
model_id = <span class="hljs-string">"google/paligemma2-3b-pt-224"</span>
processor = PaliGemmaProcessor.from_pretrained(model_id)
print(<span class="hljs-string">"Done Loading Model"</span>)
</code></pre>
<p>There are higher-resolution versions available (448x448 and 896x896) as well as models with a larger number of parameters (10B, 28B) for tasks requiring more precision, like OCR or detailed segmentation. However, these demand more GPU memory and computational power.</p>
<p>Next, set the device to ‘cuda’ to use the GPU and load the model, as shown below. We will specify that the model should use <code>bfloat16</code> (Brain Float 16) precision for its parameters. <code>bfloat16</code> is a 16-bit floating point format that helps speed up computation and reduces memory usage while maintaining a similar range to <code>float32</code>.</p>
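<p>A minimal sketch of that loading step, assuming the <code>model_id</code> defined earlier:</p>
<pre><code class="lang-python">import torch
from transformers import PaliGemmaForConditionalGeneration

device = "cuda"
# Load PaliGemma 2 with bfloat16 weights and move it to the GPU.
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to(device)
</code></pre>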
<h3 id="heading-preparing-the-model-layers">Preparing the model layers</h3>
<p>To prepare <strong>PaliGemma 2</strong> for fine-tuning, we freeze the vision tower by setting <code>requires_grad=False</code> for its parameters, preserving its pre-trained visual features, while enabling training for the multi-modal projector by setting <code>requires_grad=True</code>, allowing it to adapt image-text alignment to the task. This setup ensures efficient use of pre-trained features while optimizing task-specific components.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Vision Tower Parameters (Image Encoder)</span>
<span class="hljs-keyword">for</span> param <span class="hljs-keyword">in</span> model.vision_tower.parameters():
    param.requires_grad = <span class="hljs-literal">False</span>

<span class="hljs-comment"># Multi-Modal Projector Parameters (Fine-Tuning the Decoder)</span>
<span class="hljs-keyword">for</span> param <span class="hljs-keyword">in</span> model.multi_modal_projector.parameters():
    param.requires_grad = <span class="hljs-literal">True</span>
</code></pre>
<blockquote>
<p>We will load the model, and freeze the image encoder and the projector, and only fine-tune the decoder. If your images are within a particular domain, which might not be in the dataset the model was pre-trained with, you might want to skip freezing the image encoder. —Hugging Face Blog.</p>
</blockquote>
<h3 id="heading-why-freeze-the-image-encoder-and-projector">Why Freeze the Image Encoder and Projector?</h3>
<p>Freezing the <strong>image encoder</strong> and <strong>multi-modal projector</strong> in a pre-trained model offers several benefits:</p>
<ul>
<li><p><strong>General Features</strong>: The image encoder, often trained on large datasets like ImageNet, has learned to extract universal visual features that are widely applicable.</p>
</li>
<li><p><strong>Pre-Trained Integration</strong>: The multi-modal projector is already designed to align image and text features effectively, minimizing the need for additional fine-tuning.</p>
</li>
<li><p><strong>Resource Efficiency</strong>: By reducing the number of trainable parameters, freezing these components speeds up training and lowers computational demands, making the process more efficient.</p>
</li>
</ul>
<p>This strategy allows the model to leverage pre-trained strengths while focusing training resources on task-specific components.</p>
<hr />
<h2 id="heading-why-fine-tune-the-decoder">Why Fine-Tune the Decoder?</h2>
<p><strong>Task Specificity:</strong> The decoder must be fine-tuned for the specific task. Fine-tuning allows it to learn how to generate the appropriate output based on the particular types of input it will receive in your application.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Define a <code>collate_fn</code> function. The function returns the final batch of tokens containing the tokenized text, images, and labels, all converted to the appropriate format and moved to the right device for efficient computation.</div>
</div>

<pre><code class="lang-python"><span class="hljs-keyword">import</span> torch
device = <span class="hljs-string">"cuda"</span>

image_token = processor.tokenizer.convert_tokens_to_ids(<span class="hljs-string">"&lt;image&gt;"</span>)
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">collate_fn</span>(<span class="hljs-params">examples</span>):</span>
  texts = [<span class="hljs-string">"Answer the following Question: "</span> + example[<span class="hljs-string">"query"</span>] <span class="hljs-keyword">for</span> example <span class="hljs-keyword">in</span> examples]
  labels= [example[<span class="hljs-string">'label'</span>][<span class="hljs-number">0</span>] <span class="hljs-keyword">for</span> example <span class="hljs-keyword">in</span> examples]
  images = [example[<span class="hljs-string">"image"</span>].convert(<span class="hljs-string">"RGB"</span>) <span class="hljs-keyword">for</span> example <span class="hljs-keyword">in</span> examples]
  tokens = processor(text=texts, images=images, suffix=labels,
                    return_tensors=<span class="hljs-string">"pt"</span>, padding=<span class="hljs-string">"longest"</span>,
                    tokenize_newline_separately=<span class="hljs-literal">False</span>)

  tokens = tokens.to(torch.bfloat16).to(device)
  <span class="hljs-keyword">return</span> tokens
</code></pre>
<h3 id="heading-defining-the-trainer">Defining the Trainer</h3>
<p>Hugging Face makes it really easy to fine-tune models, either through its GUI-based AutoTrain or through its Trainer module.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> TrainingArguments
args = TrainingArguments(
    num_train_epochs=<span class="hljs-number">2</span>,
    remove_unused_columns=<span class="hljs-literal">False</span>,
    per_device_train_batch_size=<span class="hljs-number">2</span>,
    gradient_accumulation_steps=<span class="hljs-number">4</span>,
    warmup_steps=<span class="hljs-number">1</span>,
    learning_rate=<span class="hljs-number">2e-5</span>,
    weight_decay=<span class="hljs-number">1e-6</span>,
    adam_beta2=<span class="hljs-number">0.999</span>,
    logging_steps=<span class="hljs-number">100</span>,
    optim=<span class="hljs-string">"adamw_hf"</span>,
    save_strategy=<span class="hljs-string">"epoch"</span>,
    save_steps=<span class="hljs-number">5000</span>,
    push_to_hub=<span class="hljs-literal">True</span>,
    save_total_limit=<span class="hljs-number">1</span>,
    output_dir=<span class="hljs-string">"paligemma2-3b-pt-224_HuggingFaceM4_ChartQA"</span>,
    bf16=<span class="hljs-literal">True</span>,
    report_to=[<span class="hljs-string">"tensorboard"</span>],
    dataloader_pin_memory=<span class="hljs-literal">False</span>,
    gradient_checkpointing=<span class="hljs-literal">True</span>,
    dataloader_drop_last=<span class="hljs-literal">True</span>,
)
</code></pre>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> Trainer

trainer = Trainer(
        model=model,
        train_dataset=train_ds ,
        eval_dataset = test_ds,
        data_collator=collate_fn,
        args=args
        )
</code></pre>
<pre><code class="lang-python">trainer.train()
</code></pre>
<h3 id="heading-and-thats-it">And that’s it</h3>
<p>Your model should be training now. Give it an hour or so and you’ll have your very own fine-tuned version of PaliGemma 2.</p>
<p>You can run inference with the model using the code below:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoProcessor, PaliGemmaForConditionalGeneration

model_id = <span class="hljs-string">"YourUserID/paligemma2-3b-pt-224_HuggingFaceM4_ChartQA"</span>
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(<span class="hljs-string">"google/paligemma2-3b-pt-224"</span>)
</code></pre>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
<span class="hljs-keyword">import</span> requests


prompt = <span class="hljs-string">"Question"</span>
image_file = <span class="hljs-string">"Link to Image"</span>
raw_image = Image.open(requests.get(image_file, stream=<span class="hljs-literal">True</span>).raw)
</code></pre>
<pre><code class="lang-python">inputs = processor(prompt, raw_image.convert(<span class="hljs-string">"RGB"</span>), return_tensors=<span class="hljs-string">"pt"</span>)
output = model.generate(**inputs, max_new_tokens=<span class="hljs-number">20</span>)

print(processor.decode(output[<span class="hljs-number">0</span>], skip_special_tokens=<span class="hljs-literal">True</span>)[len(prompt):])
</code></pre>
<h1 id="heading-conclusion">Conclusion</h1>
<p>Fine-tuning PaliGemma 2 marks a significant step in leveraging advanced vision-language models for specialized tasks. By customizing the model to your specific dataset, you enhance its ability to perform with greater accuracy and relevance in applications like image captioning and visual question answering. Freezing the image encoder while training the decoder efficiently utilizes computational resources, allowing the model to focus on generating precise textual outputs. Setting up the appropriate environmental resources, such as using a GPU with sufficient memory, ensures a smoother fine-tuning process. As you finalize your model, you're not just adapting a powerful tool to your needs—you're expanding the possibilities of multimodal AI in your field. Embrace this opportunity to push the boundaries and see how fine-tuned models can revolutionize your projects.</p>
]]></content:encoded></item><item><title><![CDATA[An honest Guide to Optimize LLMs for upto 10x Inference]]></title><description><![CDATA[Introduction
The AI revolution has officially gone mainstream. From crafting the perfect 'Good Morning' message with Chat-GPT to generating human-like responses, Large Language Models (LLMs) have taken the world by storm. But behind the scenes, these...]]></description><link>https://blog.arygarg.me/an-honest-guide-to-optimize-llms-for-upto-10x-inference</link><guid isPermaLink="true">https://blog.arygarg.me/an-honest-guide-to-optimize-llms-for-upto-10x-inference</guid><category><![CDATA[AI]]></category><category><![CDATA[optimization]]></category><category><![CDATA[huggingface]]></category><category><![CDATA[llm]]></category><category><![CDATA[Python]]></category><category><![CDATA[nlp]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Tue, 23 Apr 2024 18:32:02 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1713896656614/5a565c29-ab49-4478-b70c-fc1c4307d36a.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>The AI revolution has officially gone mainstream. From crafting the perfect 'Good Morning' message with ChatGPT to generating human-like responses, Large Language Models (LLMs) have taken the world by storm. But behind the scenes, these behemoths of AI require staggering amounts of compute power and energy to train. The latest example is Llama 3, Meta AI's massive model trained on two super clusters of 24,000+ Nvidia H100 GPUs each. As the scale of these models continues to grow, so do the costs of building and maintaining them. In fact, <a target="_blank" href="https://www.scientificamerican.com/article/the-ai-boom-could-use-a-shocking-amount-of-electricity/">some</a> projections suggest that the compute and electrical power needed to train such models could soon surpass the requirements of small countries.</p>
<h2 id="heading-inference-time-optimizations">Inference Time Optimizations.</h2>
<p>In this landscape, optimizing inference time has become crucial. While model parameter count gets most of the attention, inference time - the time it takes for a model to make a prediction from a given input - is a critical metric that can make or break the usability of an AI system. In the context of language models, inference time is often measured in tokens per second (tk/s). Reducing inference time can significantly lower operational costs, making AI more accessible and sustainable in the future.</p>
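<p>As a rough illustration, here is a minimal helper for measuring tokens per second with a decoder-only Hugging Face model; <code>model</code> and <code>tokenizer</code> are assumed to be loaded already, and seq2seq models would need a slightly different token count.</p>
<pre><code class="lang-python">import time

def tokens_per_second(model, tokenizer, prompt, **generate_kwargs):
    # Tokenize the prompt and move it to the same device as the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.time()
    output = model.generate(**inputs, **generate_kwargs)
    elapsed = time.time() - start
    # Decoder-only models return the prompt plus the new tokens, so subtract the prompt length.
    new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
    return new_tokens / elapsed
</code></pre>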
<p>In the image below, training time is the time needed to train the neural network to identify images of cats, while inference time is the time it takes the trained network to return a confidence value indicating whether a cat is in the image.</p>
<p><img src="https://backblazeprod.wpenginepowered.com/wp-content/uploads/2023/11/bb-bh-Training-vs-Inference_Final-1536x875.png" alt class="image--center mx-auto" /></p>
<p>In this discussion, we'll delve into the world of inference time optimizations, exploring techniques and strategies to speed up your PyTorch models without sacrificing the final output.</p>
<h3 id="heading-quantization-using-pytorch">Quantization using PyTorch</h3>
<p>Quantization is a technique used to reduce the precision of model weights from floating-point numbers to integers. This process, also known as weight quantization, aims to decrease the memory footprint and computational requirements of LLMs, making them more efficient and deployable on resource-constrained devices. By representing model parameters with fewer bits, quantization can lead to significant reductions in model size, inference time, and energy consumption, while maintaining acceptable accuracy. However, quantization can also introduce accuracy degradation, and careful tuning of quantization parameters is necessary to balance the trade-off between model efficiency and accuracy.</p>
<p><img src="https://www.allaboutcircuits.com/uploads/articles/qc-tech_quantization_gif-2_final.jpg" alt class="image--center mx-auto" /></p>
<p>Let's code out a simple example using the <code>facebook/mbart-large-50-many-to-many-mmt</code> model. This model, developed by Facebook, can translate directly between any pair of its 50 supported languages. It has over 611 million parameters. To magnify the effects of each of the following optimizations, we will be running them on the CPU, while also sharing statistics for their GPU counterparts.</p>
<p>We can easily initiate the model by using the HuggingFace Transformers Library.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained(<span class="hljs-string">"facebook/mbart-large-50-many-to-many-mmt"</span>)
model = AutoModelForSeq2SeqLM.from_pretrained(<span class="hljs-string">"facebook/mbart-large-50-many-to-many-mmt"</span>)
</code></pre>
<p>As well as other imports we may need from PyTorch</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torch.quantization
</code></pre>
<p>We have a choice of how far we want to quantize the model. This can range from float32 (basically no change, since models are usually stored in float32) to int1 (which is not available in PyTorch quantization, but was discussed extensively in <a target="_blank" href="https://arxiv.org/abs/2402.17764">this</a> paper by Microsoft Research). The options available in PyTorch are:</p>
<ul>
<li><p><code>torch.quint8</code></p>
</li>
<li><p><code>torch.qint8</code></p>
</li>
<li><p><code>torch.qint32</code></p>
</li>
<li><p><code>torch.float16</code></p>
</li>
</ul>
<p>Out of these, <code>torch.float16</code> would perform the 'worst' while <code>torch.quint8</code> will anecdotally perform the best. Let us translate from Chinese to English with a relatively complex phrase picked up from <a target="_blank" href="https://byjus.com/english/paragraph-on-india/">Byjus</a>.</p>
<pre><code class="lang-python">article_ch = <span class="hljs-string">'印度是一片美麗的土地，擁有多種野生動物和豐富的文化多樣性。孟加拉虎被認為是印度的國獸。印度每年 8 月 15 日慶祝獨立紀念日。人們慶祝這個節日是為了紀念印度從英國統治下獲得自由。三色國旗稱為“Tiranga”，由藏紅花、白色和綠色設計，國旗中央為海軍藍色的阿肖克脈輪。 「阿育王獅都」是該國的國徽。國家座右銘是 "Satyameva Jayate"，意思是只有真理才能獲勝。 為了順利管理國家，並使其成為一個獨立的國家，需要一部於1950年1月26日生效的憲法。 印度是一個擁有多種不同語言和多種宗教的國家，如佛教、耆那教、伊斯蘭教、印度教等。 。'</span>
</code></pre>
<p>After running the quantized and the non-optimized models, we see the following differences in translation, with their approximate execution times below.</p>
<pre><code class="lang-python">quantized_model = torch.quantization.quantize_dynamic(
    model, dtype=torch.qint8
)
tokenizer.src_lang = <span class="hljs-string">"zh_CN"</span>

encoded_ch = tokenizer(article_ch, return_tensors=<span class="hljs-string">"pt"</span>)

generated_tokens = quantized_model.generate(
    **encoded_ch,
    forced_bos_token_id=tokenizer.lang_code_to_id[<span class="hljs-string">"en_XX"</span>]
)
tokenizer.batch_decode(generated_tokens, skip_special_tokens=<span class="hljs-literal">True</span>)
</code></pre>
<p><strong>OUTPUT:</strong></p>
<p>India is a beautiful land with a wide variety of wildlife and rich cultural diversity. The Bengal tiger is considered to be India's national beast. India celebrates Independence Day on August 15, every year. It is celebrated to commemorate India's liberation from British rule. The three-coloured flag is called "Tiranga", designed with Tibetan red flowers, white and green, and the flag is centered on the Navy's blue Ashok Ring. The lion is the national emblem of the country. The right wing of the flag is "Satyameva Jayate", meaning that only truth can prevail. In order to successfully govern the country and make it an independent country, it is necessary to have a constitution in force on January 26, 1950. India is a country with a wide variety of languages and religions, such as Buddhism, Jainism, Islam, Hinduism,</p>
<p>Inference Time: 26.802 sec</p>
<pre><code class="lang-plaintext">encoded_ch = tokenizer(article_ch, return_tensors="pt")
generated_tokens = model.generate(
    **encoded_ch,
    forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"]
)
tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
</code></pre>
<p><strong>OUTPUT:</strong></p>
<p>India is a beautiful land with a wide variety of wildlife and rich cultural diversity. The Bengal tiger is considered to be India's national animal. India celebrates Independence Day on August 15 every year. It is celebrated to commemorate India's liberation from British rule. The three-coloured flag is called "Tiranga", designed with Tibetan red flowers, white and green, with the flag centered on the Navy's blue Ashok Ring. "Ayurveda Lion" is the national emblem of the country. The country's right-hand inscription is "Satyameva Jayate", meaning that only truth can prevail. In order to successfully govern the country and make it an independent country, it is necessary to have a constitution that entered into force on January 26, 1950. India is a country with a wide variety of languages and religions, such as Buddhism, Jainism,</p>
<p>Inference Time: 99.025 sec</p>
<p>We can see that with just two additional lines of code, we get a speedup of 3.69x (nice!) with little loss in the end result. The final outputs of the two models also differ slightly from the original English paragraph, but we can chalk that up to Google Translate (used to produce the Chinese input) not being the best at what it does.</p>
<h2 id="heading-optimum-by-hugging-face">Optimum by Hugging Face</h2>
<p>Optimum is an open-source library developed by Hugging Face. It leverages various optimization techniques, such as quantization, pruning, and knowledge distillation. Optimum enables developers to reduce the computational requirements and memory usage of their models, making them more efficient and deployable on resource-constrained devices. Since its release, Optimum has gained immense popularity within the machine learning community, with thousands of stars on GitHub and widespread adoption in industries such as computer vision, natural language processing, and autonomous driving. Its popularity can be attributed to its ease of use, flexibility, and the significant performance improvements it offers, making it an essential tool for anyone looking to deploy AI models in real-world applications. By providing a simple and standardized way to optimize models, Optimum has enabled developers to focus on building innovative applications rather than worrying about the underlying infrastructure associated with machine learning tasks.</p>
<p>A key part of using Optimum is converting the model to ONNX (Open Neural Network Exchange). ONNX is an open format used to represent deep learning models, allowing them to be exchanged and executed across different frameworks and platforms. Originally developed by Facebook and Microsoft (and later backed by companies such as Amazon), ONNX provides a common language for AI models, enabling seamless interoperability between deep learning frameworks such as TensorFlow and PyTorch. This open standard enables developers to train models in one framework and deploy them in another, without the need for retraining or rewriting the model.</p>
<p>Out of the gate, Optimum allows us to either use its interface programmatically or work through its CLI. We will be using the CLI in this example, attempting to reduce the inference time of a well-known summarization model, <code>t5-small</code>, developed by Google AI.</p>
<p>Start by downloading the required libraries</p>
<pre><code class="lang-bash">pip install optimum[onnxruntime-gpu]
pip install optimum[onnxruntime]
</code></pre>
<p>Now using the <code>optimum-cli</code> we can optimize the model on 4 levels:</p>
<ul>
<li><p>O1 basic general optimizations</p>
</li>
<li><p>O2 basic and extended general optimizations, transformers-specific fusions</p>
</li>
<li><p>O3 same as O2 with GELU approximation</p>
</li>
<li><p>O4 same as O3 with mixed precision (fp16, GPU-only)</p>
</li>
</ul>
<pre><code class="lang-bash">optimum-cli <span class="hljs-built_in">export</span> onnx --model t5-small --optimize O3 t5_onnx/ --device cuda
</code></pre>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
<span class="hljs-keyword">from</span> optimum.onnxruntime <span class="hljs-keyword">import</span> ORTModelForSeq2SeqLM
<span class="hljs-keyword">import</span> torch

tokenizer = AutoTokenizer.from_pretrained(<span class="hljs-string">"t5-small"</span>)
model = AutoModelForSeq2SeqLM.from_pretrained(<span class="hljs-string">'t5-small'</span>)
onnx_model = ORTModelForSeq2SeqLM.from_pretrained(<span class="hljs-string">"t5_onnx"</span>)

device = torch.device(<span class="hljs-string">"cuda"</span> <span class="hljs-keyword">if</span> torch.cuda.is_available() <span class="hljs-keyword">else</span> <span class="hljs-string">"cpu"</span>)
_ = model.to(device).eval()
</code></pre>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> random
<span class="hljs-keyword">import</span> time

sentences = [
    <span class="hljs-string">"In recent years, advancements in artificial intelligence (AI) have revolutionized various industries, from healthcare to finance and beyond. AI technologies such as machine learning and natural language processing have enabled computers to perform tasks that were once thought to be exclusive to human intelligence. For instance, AI-powered systems can now diagnose diseases, predict stock market trends, and even generate creative content like music and art. These developments have sparked both excitement and concern among experts and the general public. While AI offers immense potential for improving efficiency and solving complex problems, there are also fears about its impact on jobs, privacy, and ethical considerations surrounding its use."</span>,
    <span class="hljs-string">"The rise of renewable energy sources, such as solar and wind power, has gained significant momentum in recent years as the world seeks to address climate change and reduce reliance on fossil fuels. Governments, businesses, and individuals are increasingly investing in renewable energy infrastructure and technologies to transition towards a more sustainable energy system. Solar photovoltaic (PV) panels and wind turbines have become common sights in many parts of the world, harnessing the power of sunlight and wind to generate electricity. This shift towards renewable energy is not only driven by environmental concerns but also by economic factors, as the cost of renewable energy technologies continues to decline, making them increasingly competitive with traditional energy sources."</span>,
    <span class="hljs-string">"The internet has transformed the way we communicate, access information, and conduct business on a global scale. With the proliferation of smartphones and high-speed internet connections, people are more connected than ever before, allowing for instant communication and collaboration across geographical boundaries. Social media platforms have become central hubs for sharing ideas, connecting with friends and family, and consuming news and entertainment. E-commerce has also experienced exponential growth, with online shopping becoming a convenient and preferred method for many consumers. However, along with the benefits of connectivity come challenges such as cybersecurity threats, online privacy concerns, and the spread of misinformation. As the internet continues to evolve, it remains crucial for individuals, businesses, and policymakers to address these issues while harnessing the full potential of digital technology."</span>,
]

len_dataset = <span class="hljs-number">1</span>

texts = []
<span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(len_dataset):
    n_times = random.randint(<span class="hljs-number">1</span>, <span class="hljs-number">5</span>)
    texts.append(<span class="hljs-string">" "</span>.join(random.choice(sentences) <span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(n_times)))
</code></pre>
<pre><code class="lang-python">summarization = pipeline(<span class="hljs-string">"summarization"</span>, model=model, tokenizer=tokenizer, max_length= <span class="hljs-number">100</span>)
start = time.time()
print(summarization(texts))
end = time.time()
print(<span class="hljs-string">f"Average response time for original T5: <span class="hljs-subst">{(end-start)/len_dataset}</span> ms"</span>)
</code></pre>
<p><strong>OUTPUT:</strong></p>
<p>'in recent years, advancements in artificial intelligence (AI) have revolutionized various industries, from healthcare to finance and beyond . AI-powered systems can now diagnose diseases, predict stock market trends, and even generate creative content like music and art .'</p>
<p>Inference Time: 3.17 s</p>
<pre><code class="lang-python">onnx_summarization = pipeline(<span class="hljs-string">"summarization"</span>, model=onnx_model, tokenizer=tokenizer, max_length=<span class="hljs-number">100</span>)
start = time.time()
print(onnx_summarization(texts))
end = time.time()
print(<span class="hljs-string">f"Average response time for optimized onnx T5: <span class="hljs-subst">{(end-start)/len_dataset}</span> ms"</span>)
</code></pre>
<p>'in recent years, advancements in artificial intelligence (AI) have revolutionized various industries, from healthcare to finance and beyond . AI-powered systems can now diagnose diseases, predict stock market trends, and even generate creative content like music and art .'</p>
<p>Inference Time: 1.17 s</p>
<p>The same output at roughly 2.71 times the speed. Imagine what would happen if we quantized the model too. That is a bit out of scope for this article, but combining the two optimizations could give a theoretical speedup of 9.99, almost 10x the base speed. Keep in mind, though, that the original PyTorch model and the ONNX export are not identical artifacts, so this single informal run is an indication rather than a rigorous benchmark of the model's speed.</p>
<h2 id="heading-static-vs-dynamic-quantization">Static vs Dynamic Quantization</h2>
<p>As discussed previously, quantization is a process in machine learning and deep learning that reduces the precision of a model's weights and activations from floating-point numbers to integers. This is done to reduce the memory footprint and computational requirements of the model, making it more efficient and suitable for deployment on resource-constrained devices.</p>
<p>Quantization schemes are usually described along two axes: how the quantization parameters are chosen (static vs. dynamic) and when quantization is applied relative to training (during training vs. after it):</p>
<h3 id="heading-static-quantization">Static Quantization</h3>
<p>In static quantization, the quantization parameters (such as the scale and zero-point) are determined during the training process or during a separate calibration step. The model is then quantized using these fixed parameters, and the resulting quantized model is used for inference.</p>
<ul>
<li><p><strong>Faster inference</strong>: Since the quantization parameters are fixed, the inference process is faster and more efficient.</p>
</li>
<li><p><strong>Lower memory usage</strong>: The quantized model requires less memory, making it suitable for deployment on devices with limited memory.</p>
</li>
</ul>
<h3 id="heading-dynamic-quantization">Dynamic Quantization</h3>
<p>In dynamic quantization, the quantization parameters are determined dynamically during inference, based on the input data. This means that the model adapts to the input distribution and adjusts the quantization parameters accordingly.</p>
<ul>
<li><p><strong>Improved accuracy</strong>: Dynamic quantization can adapt to changing input distributions, which usually means a smaller accuracy drop.</p>
</li>
<li><p><strong>Flexibility</strong>: Dynamic quantization can be used on different hardware platforms and with different input distributions, without requiring retraining or recalibration.</p>
</li>
</ul>
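<p>To make the dynamic flavour concrete, here is a minimal sketch using PyTorch's built-in <code>torch.quantization.quantize_dynamic</code> helper. The tiny two-layer network is just a stand-in for whatever model you actually want to shrink, and the exact size and speed gains will depend on your model and hardware.</p>
<pre><code class="lang-python">import torch
import torch.nn as nn

# A stand-in model; in practice this would be your trained network.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
).eval()

# Dynamically quantize the Linear layers: weights are stored as int8,
# activations are quantized on the fly at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized_model(x).shape)  # torch.Size([1, 10])
</code></pre>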
<h3 id="heading-pre-training-quantization">Pre Training Quantization</h3>
<p>Pre-training quantization, also known as quantization-aware training, involves quantizing the model's weights and activations during the training process. This means that the model is trained using quantized values, rather than full-precision floating-point numbers.</p>
<p><strong>Advantages:</strong></p>
<ol>
<li><p><strong>Improved accuracy</strong>: Pre-training quantization can lead to improved accuracy, as the model is trained to adapt to the quantization noise and errors.</p>
</li>
<li><p><strong>Better optimization</strong>: The model is optimized for the quantized precision, which can lead to better convergence and optimization.</p>
</li>
<li><p><strong>Faster deployment</strong>: Since the model is already quantized, it can be deployed directly on hardware that supports quantized inference, without the need for additional quantization steps.</p>
</li>
</ol>
<p><strong>Challenges:</strong></p>
<ol>
<li><p><strong>Training complexity</strong>: Pre-training quantization can increase the training complexity, as the model needs to adapt to the quantization noise and errors.</p>
</li>
<li><p><strong>Hyperparameter tuning</strong>: Hyperparameter tuning can be more challenging, as the optimal hyperparameters may vary depending on the quantization precision.</p>
</li>
</ol>
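<p>For intuition, here is a minimal, purely illustrative quantization-aware training loop using PyTorch's eager-mode API. The toy model, random data, and training schedule are placeholders; the point is the prepare-train-convert flow, in which fake-quantization observers learn scales while the model trains.</p>
<pre><code class="lang-python">import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float input enters the quantized domain here
        self.fc = nn.Linear(16, 2)
        self.dequant = torch.quantization.DeQuantStub()  # back to float at the output

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = TinyNet().train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
torch.quantization.prepare_qat(model, inplace=True)      # inserts fake-quant observers

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
for _ in range(100):                                     # placeholder training loop
    x, y = torch.randn(32, 16), torch.randint(0, 2, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

quantized = torch.quantization.convert(model.eval())     # the actual int8 model
</code></pre>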
<h3 id="heading-post-training-quantization">Post Training Quantization</h3>
<p>Post-training quantization, also known as quantization after training, involves quantizing a pre-trained model's weights and activations after the training process is complete. This is a more common approach, as it allows for the use of pre-trained models and fine-tuning them for specific hardware platforms.</p>
<p><strong>Advantages:</strong></p>
<ol>
<li><p><strong>Flexibility</strong>: Post-training quantization allows for the use of pre-trained models, which can be fine-tuned for specific hardware platforms.</p>
</li>
<li><p><strong>Simpler deployment</strong>: Post-training quantization is a simpler process, as it only requires quantizing the pre-trained model's weights and activations.</p>
</li>
<li><p><strong>Wider applicability</strong>: Post-training quantization can be applied to a wide range of models and hardware platforms.</p>
</li>
</ol>
<p><strong>Challenges:</strong></p>
<ol>
<li><p><strong>Accuracy loss</strong>: Post-training quantization can result in accuracy loss, as the model is not optimized for the quantized precision.</p>
</li>
<li><p><strong>Calibration required</strong>: Post-training quantization often requires calibration to determine the optimal quantization parameters, which can be time-consuming.</p>
</li>
</ol>
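<p>Since we already have an ONNX export of T5 on disk, one convenient route for post-training quantization is ONNX Runtime's <code>quantize_dynamic</code> helper, which rewrites an exported <code>.onnx</code> file with int8 weights. The file paths below are placeholders; point them at wherever your exported encoder/decoder graphs actually live.</p>
<pre><code class="lang-python">from onnxruntime.quantization import QuantType, quantize_dynamic

# Placeholder paths: adjust to the .onnx files produced by your export step.
quantize_dynamic(
    model_input="t5_onnx/encoder_model.onnx",
    model_output="t5_onnx/encoder_model_int8.onnx",
    weight_type=QuantType.QInt8,  # store the weights as signed 8-bit integers
)
</code></pre>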
<p>In general, pre-training quantization can lead to better accuracy, with models like MobileNetV2 achieving an accuracy of 72.0% on the ImageNet benchmark while reducing the model size by 75%. Post-training quantization, on the other hand, offers significant space savings: a post-training quantized ResNet-50 requires only about 7.5MB of storage, a reduction of 90% compared to the full-precision model, while achieving an accuracy of 69.5% on the same benchmark. Despite the small accuracy drop, post-training quantization can still be a viable option for many applications, especially those where memory constraints are a major concern.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In conclusion, optimizing inference time is crucial for making AI systems more accessible and sustainable. We explored techniques to speed up PyTorch models without sacrificing accuracy, including quantization, Optimum, and static vs dynamic quantization, demonstrating significant reductions in model size, inference time, and energy consumption. As AI continues to evolve, optimizing inference time will become increasingly important, and by leveraging these techniques, developers can build more efficient and deployable AI models, making AI more accessible and sustainable for a wider range of applications.</p>
]]></content:encoded></item><item><title><![CDATA[My Journey as a Developer - DevRetro 2023]]></title><description><![CDATA[Hello World 🤖
Hey folks, I've been putting off this article for what feels like an eternity—blame it on the whirlwind of University exams and then a generous sprinkle of holiday season lethargy. Another year of college is officially in the rearview ...]]></description><link>https://blog.arygarg.me/devretro-2023</link><guid isPermaLink="true">https://blog.arygarg.me/devretro-2023</guid><category><![CDATA[DevRetro]]></category><category><![CDATA[2023]]></category><category><![CDATA[Python]]></category><category><![CDATA[internships]]></category><category><![CDATA[SIH]]></category><category><![CDATA[Open Source]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Sat, 23 Dec 2023 07:14:06 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/RCpEWDyC5sQ/upload/b0cd9ba168b3c783e1c3c53c1e8d4c39.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-hello-world">Hello World 🤖</h2>
<p>Hey folks, I've been putting off this article for what feels like an eternity—blame it on the whirlwind of University exams and then a generous sprinkle of holiday season lethargy. Another year of college is officially in the rearview mirror, and boy, do I have many more tales to spin compared to last year's Dev Retro. Let's kick things off by dissecting the grand plans I had envisioned for this year, contrasting them with the reality that unfolded. I'll then take you on a rollercoaster ride through the major events that peppered my journey in the tech realm this past year, capping it all off with some crystal ball gazing into what I predict awaits in the upcoming year. Buckle up, enjoy the read, and catch you on the flip side!</p>
<h2 id="heading-dissecting-my-predictions-from-last-year">Dissecting My Predictions from Last Year</h2>
<p>I'm going to be taking excerpts from last year's Dev Retro and trying to justify them a bit. Reading through my old Dev Retro did make me cringe a bit, but it's nice to see that I've grown both in my Technical skills as well as my Vocabulary skills (*wink* <em>wink</em> ChatGPT)</p>
<p>I thought I'd write last year's Dev Retro the way a bride prepares for an English wedding: "Something Borrowed; Something Blue; Something Old; Something New." Looking back, though, that wasn't it; the titles now feel outdated and inconsistent with the blog's core message.</p>
<blockquote>
<p>Technical, this year was the first year I started actual development, from tiny HTML Pages using the Basics of CSS to Dynamically Loaded Pages using the popular Framework Django</p>
</blockquote>
<p>I quit Web Development probably 5 seconds after that blog went out. It wasn't for me. I get so bored writing out the most basic logic for no reason.</p>
<blockquote>
<p>Starting with Data Science and understanding the math behind it, this year has allowed me to start growing and exploring</p>
</blockquote>
<p>There we go, that's more like it! Let's Go Data <s>Science. </s> I fell in love with data this year: starting with Data Science and the ETL process, moving into Data Engineering, where I was able to intern at one of India's biggest FinTech companies, and finally settling on a more mixed role across Data Engineering and Machine Learning (with a bit of DL).</p>
<blockquote>
<p>Developing simple Dart apps using Flutter and understanding how DApps work in the Solidity Framework made me realize what the future of Web3 might look like</p>
</blockquote>
<p>Quit after 5 seconds. NEXT</p>
<blockquote>
<p>.. solving some (easy to medium difficulty) Data Structures and Algorithm problems.</p>
</blockquote>
<p>I wish I'd continued to do that. I wouldn't be scrambling for a summer Internship for the summer of 2023 (<a target="_blank" href="https://aryann.tech/resume">Resume Plug</a>, just in case you're a recruiter)</p>
<blockquote>
<p>Maybe not the most significant flex, but I also got featured on Hashnode's Twitter account!</p>
</blockquote>
<p>Yup, it's the biggest flex of the 2020s to date. Allow me to embed it again ;)</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://twitter.com/hashnode/status/1561362487014002692">https://twitter.com/hashnode/status/1561362487014002692</a></div>
<p> </p>
<p>Yup, that's enough of a diss of 2022 Aryan. <a target="_blank" href="https://www.youtube.com/watch?v=_Yhyp-_hX2s">Snapping back to reality</a>, the following section focuses on what I accomplished this year.</p>
<h2 id="heading-hop-into-the-way-back-machine">Hop into the Way Back Machine</h2>
<h3 id="heading-makeathon-5-organising-it-all">Makeathon 5: Organising it all</h3>
<p>If you don't know, I'm the Joint Secretary of the <a target="_blank" href="https://mlsctiet.com">Microsoft Learn Student Chapter</a> present at my university. Still, this story predates that, back when I was just an ordinary "Core Member" (basically a glorified grunt worker). Anyway, we plan the most elaborate hackathon in Punjab, where we invite special guests and speakers and end it all with a 24-hour hackathon. What made this year special? I got to interview Mr. Mr. Mr. Richard Stallman (I hope someone gets that joke). It was amazing; a few friends and I got the unbelievable opportunity of a lifetime to interview the man who created GNU. It was a total fanboy moment, and I don't have words to express how amazing that one hour was for me. Unfortunately, Mr. Stallman was diagnosed with cancer this year, and you can see it on his face in the image below. His contributions to Free and Open Source software shaped the world we live in today.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1703280433949/16d322a1-3e9c-4f00-84b9-60d5e22cb6d7.jpeg" alt class="image--center mx-auto" /></p>
<p>Here are a few more pictures from Makeathon, just because (sorry for the huge photo)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1703280765664/c59004b7-5517-4299-a297-642a4a133d7a.jpeg" alt class="image--center mx-auto" /></p>
<h3 id="heading-interning-at-paytm">Interning at Paytm</h3>
<p>You read that right; I got the unholy opportunity to work at Paytm over the summer. I worked on an in-house SQL Optimizer for the Paytm Business Analysts in the Data Engineering Department. It was a stint for two months, and I learned a lot about the underlying processing of SQL because of the Internship. (Thanks, Vikash and Anand, for making it a fantastic learning opportunity)</p>
<h3 id="heading-visiting-pydelhi-and-almost-giving-a-lightning-talk">Visiting PyDelhi and (almost giving a Lightning Talk)</h3>
<p>PyDelhi was super fun. I got in super cheap because of the student discount and spent the two days in hostels to save on Hotels. I learned a lot and got decent Swags from the Sponsors ;). I also got to explore Delhi for the first time in years.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1703281307327/c8fe3641-d425-4519-96f6-d3d65bc4ad88.jpeg" alt class="image--center mx-auto" /></p>
<p>Notice how my card doesn't have my name on it? They forgot to get my card printed on the first day, and I got it on Day 2. Still, it was a fantastic opportunity to learn from amazing people. (Plug to <a class="user-mention" href="https://hashnode.com/@vipulgupta2048">Vipul Gupta</a> (<a target="_blank" href="https://twitter.com/vipulgupta2048">X</a>); he had a fantastic talk)</p>
<h3 id="heading-internship-no-2-researching-for-dst-haryana">Internship No. 2: Researching for DST, Haryana</h3>
<p>I jumped when I saw a call for students for a research role for a project sponsored by the Department of Science and Technology, Haryana. I gave the interview a day early (that's how anxious I was) and got shortlisted. We're working on a way to regulate type 1 diabetes in patients using ML, but apart from that, I can't dive into the details just yet.</p>
<h3 id="heading-sih-so-close-yet-so-far">SIH: So close, yet so far</h3>
<p>A classmate, her friend, three of our seniors, and I teamed up for the Smart India Hackathon. The problem statement we chose to ideate on was about providing feedback to the government on its actions via social media, newspapers, e-newspapers, and articles (including YouTube) from the web. We worked on a prototype that used nontraditional Machine Learning applications like BERT and Whisper and presented our pitch deck. Unfortunately, we were put on the waitlist for our problem statement, and no other team backed out. It was a quick month, but we worked a lot. Cheers to SPAARS :)</p>
<h2 id="heading-working-on-next-year">Working on Next Year</h2>
<p>As for what's next, considering my love affair with data, diving into an open-source project in data science or machine learning sounds like a plan, and maybe, just maybe, organizing a tech event or workshop at my university to share the knowledge. Here's to more tech adventures in the coming year!</p>
]]></content:encoded></item><item><title><![CDATA[Why Postgres should be the last database you'll ever need]]></title><description><![CDATA[Being a sucker for reading unnecessary books in fields I have no experience in got me into flipping through the Google Site Reliability Engineering book, where I had read the most elegant concept that seems obvious at first but isn't applied in the r...]]></description><link>https://blog.arygarg.me/postgress-does-everything</link><guid isPermaLink="true">https://blog.arygarg.me/postgress-does-everything</guid><category><![CDATA[PostgreSQL]]></category><category><![CDATA[Databases]]></category><category><![CDATA[Redis]]></category><category><![CDATA[airflow]]></category><category><![CDATA[kafka]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Thu, 12 Oct 2023 11:02:46 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1696998690158/943b3e65-f011-4fb9-a723-2168ac3133e2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Being a sucker for reading unnecessary books in fields I have no experience in got me into flipping through the Google Site Reliability Engineering book, where I had read the most elegant concept that seems obvious at first but isn't applied in the real world very often. It read</p>
<blockquote>
<p>Simple software breaks less often and is easier and faster to fix when it does break</p>
<p>- Google's Site Reliability Engineering Book</p>
</blockquote>
<p>Thinking about this concept and about how companies stack up a vast number of technologies to build their data-driven infrastructure, and with my (admittedly limited) experience using <s>the best</s> SQL-based database platform, I got the gears turning in my head. In a world where everyone uses Redis as an in-memory cache, where Apache Kafka is the de facto real-time message queue, and where time-series data gets its own database, the stack starts to feel redundant: even in this limited example, we still need to be experts in all three. In this article, leaning on Postgres's ACID-compliant nature, I'll focus on replacing all three of these dependencies with simple Postgres-based tables.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1696998554354/0a1becbf-81cf-47bc-8e5b-5d398aead26d.png" alt="Sample architecture for a multi-dependency project" class="image--center mx-auto" /></p>
<h2 id="heading-terminating-timescale">Terminating Timescale</h2>
<p>Replacing Timescale, the time-series database, with PostgreSQL involves leveraging the powerful features of PostgreSQL to handle time-series data efficiently. In PostgreSQL, the table layout is crucial in achieving optimal performance. Instead of relying on Timescale's hypertables, specialized tables for time-series data, you can create a regular PostgreSQL table with a timestamp column. Indexing this timestamp column is essential for quick data retrieval.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Step 1: Creating a Table</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> time_series_data (
    <span class="hljs-keyword">id</span> <span class="hljs-built_in">SERIAL</span> PRIMARY <span class="hljs-keyword">KEY</span>,
    event_timestamp TIMESTAMPTZ <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    data_value <span class="hljs-keyword">DOUBLE</span> <span class="hljs-keyword">PRECISION</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>
    <span class="hljs-comment">-- Add other columns as needed (each preceded by a comma)</span>
);

<span class="hljs-comment">-- Step 2: Indexing on Time Stamp</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">INDEX</span> idx_event_timestamp <span class="hljs-keyword">ON</span> time_series_data (event_timestamp);
</code></pre>
<p>To keep the partitioning advantages that Timescale's hypertables provide, you can use PostgreSQL's native declarative partitioning: the parent table is declared with <code>PARTITION BY RANGE (event_timestamp)</code>, and child partitions are then created for specific time ranges. Proper indexing on these child tables ensures that queries for a particular period are executed swiftly.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Step 3: Create a partition for one time range</span>
<span class="hljs-comment">-- (the master table must be declared with PARTITION BY RANGE (event_timestamp),</span>
<span class="hljs-comment">--  and its primary key must then include the partition column)</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> time_series_data_2023 <span class="hljs-keyword">PARTITION</span> <span class="hljs-keyword">OF</span> time_series_data
    <span class="hljs-keyword">FOR</span> <span class="hljs-keyword">VALUES</span> <span class="hljs-keyword">FROM</span> (<span class="hljs-string">'2023-01-01'</span>) <span class="hljs-keyword">TO</span> (<span class="hljs-string">'2024-01-01'</span>);
</code></pre>
<p>Similarly, you might want to tune PostgreSQL's configuration so that its behaviour on time-series workloads comes closer to what Timescale provides out of the box. This might mean changing some of the defaults listed below:</p>
<ol>
<li><p><code>shared_buffers</code>: 25% to 30% of available memory</p>
</li>
<li><p><code>effective_cache_size</code>: 50% to 75% of available memory</p>
</li>
<li><p><code>work_mem</code>: Adjust based on the complexity of queries and available memory</p>
</li>
<li><p><code>maintenance_work_mem</code>: Enough for maintenance operations like index creation</p>
</li>
<li><p><code>wal_level</code>: Set to replica (or logical if you need logical replication)</p>
</li>
<li><p><code>max_wal_size</code> / <code>min_wal_size</code>: Adjust based on write intensity and available disk space</p>
</li>
<li><p><code>checkpoint_completion_target</code>: Aim to balance write performance and checkpoint duration</p>
</li>
<li><p><code>autovacuum</code>: Enable and tune the autovacuum settings for automatic maintenance</p>
    <p> You can alter any of these with an <code>ALTER SYSTEM</code> command, for example:</p>
<pre><code class="lang-sql"> <span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">SYSTEM</span> <span class="hljs-keyword">SET</span> shared_buffers = <span class="hljs-string">'2GB'</span>;
</code></pre>
</li>
</ol>
<h2 id="heading-killing-kafka-with-a-message-queue">Killing Kafka with a Message Queue</h2>
<p>A queue in its most basic essence is <strong>JUST A QUEUE</strong>, a data structure that follows the FIFO (First-in, First-Out) rule. That means that new data is added to the tail of the queue, and data is read from the head. So, a straightforward implementation will only deal with enqueuing messages to the tail and dequeuing from the head. The table would have the following schema:</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Step 1: Create the Schema</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> queue_table (
    <span class="hljs-keyword">id</span> <span class="hljs-keyword">UUID</span> PRIMARY <span class="hljs-keyword">KEY</span>,
    inserted_at <span class="hljs-built_in">TIMESTAMP</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span> <span class="hljs-keyword">DEFAULT</span> <span class="hljs-keyword">NOW</span>(),
    message_payload <span class="hljs-built_in">BLOB</span>
);

<span class="hljs-comment">-- Step 2: Index the inserted_at column on its sorted values</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">INDEX</span> inserted_at_idx
    <span class="hljs-keyword">ON</span> queue_table (inserted_at <span class="hljs-keyword">ASC</span>);
</code></pre>
<p>Inserting data into the queue is now as simple as inserting a row into the table, and consuming a message is a single statement that deletes the oldest row and returns it.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Adding to Queue</span>
<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> queue_table (<span class="hljs-keyword">id</span>, inserted_at, message_payload)
    <span class="hljs-keyword">VALUES</span> (gen_random_uuid(), <span class="hljs-keyword">NOW</span>(), RAWTOHEX(<span class="hljs-string">'top secret information'</span>));

<span class="hljs-comment">-- Returning the Data</span>
<span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">FROM</span> queue_table qt
<span class="hljs-keyword">WHERE</span> qt.id =
    (<span class="hljs-keyword">SELECT</span> qt_inner.id
    <span class="hljs-keyword">FROM</span> queue_table qt_inner
    <span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> qt_inner.inserted_at <span class="hljs-keyword">ASC</span>
    <span class="hljs-keyword">FOR</span> <span class="hljs-keyword">UPDATE</span> <span class="hljs-keyword">SKIP</span> <span class="hljs-keyword">LOCKED</span>
    <span class="hljs-keyword">LIMIT</span> <span class="hljs-number">1</span>)
<span class="hljs-keyword">RETURNING</span> qt.id, qt.inserted_at, qt.message_payload;
</code></pre>
<p>Although not the most optimal, it can still comfortably do a few thousand transactions per second.</p>
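<p>To see how an application would talk to this queue, here is a rough Python sketch using <code>psycopg2</code>. The connection string is a placeholder, the table is the one defined above, and a real worker would loop, back off when the queue is empty, and handle failures.</p>
<pre><code class="lang-python">import uuid
import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # placeholder DSN

def enqueue(payload: bytes):
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO queue_table (id, inserted_at, message_payload) "
            "VALUES (%s, NOW(), %s)",
            (str(uuid.uuid4()), payload),
        )

def dequeue():
    # Same trick as the SQL above: SKIP LOCKED lets many workers pull safely.
    with conn, conn.cursor() as cur:
        cur.execute(
            "DELETE FROM queue_table qt WHERE qt.id = ("
            "  SELECT qt_inner.id FROM queue_table qt_inner"
            "  ORDER BY qt_inner.inserted_at ASC"
            "  LIMIT 1 FOR UPDATE SKIP LOCKED)"
            " RETURNING qt.id, qt.inserted_at, qt.message_payload"
        )
        return cur.fetchone()  # None when the queue is empty

enqueue(b"top secret information")
print(dequeue())
</code></pre>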
<h2 id="heading-redundant-redis">Redundant Redis</h2>
<p>In PostgreSQL, you can create a cache table with a key column and a corresponding value column to store the cached data, plus a timestamp that records when each entry was added. Indexing the key column ensures fast retrieval of cached values. To get closer to Redis's in-memory performance, consider adjusting PostgreSQL's configuration: increase shared_buffers to allocate more memory for caching, adjust effective_cache_size accordingly, and tune work_mem to optimize memory usage during query execution. While PostgreSQL may not match Redis in pure in-memory caching speed, its versatility and integration capabilities make it a compelling alternative. Implementing this in Postgres involves the following table creation and indexing commands.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Step 1: Create table to cache results</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> redis_type_cache (
    _key <span class="hljs-built_in">TEXT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    _value <span class="hljs-built_in">TEXT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    inserted_at <span class="hljs-built_in">TIMESTAMP</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span> <span class="hljs-keyword">DEFAULT</span> <span class="hljs-keyword">NOW</span>()
);

<span class="hljs-comment">-- Step 2: Create an index on _KEY</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">INDEX</span> redis_type_cache_key <span class="hljs-keyword">ON</span> redis_type_cache <span class="hljs-keyword">USING</span> <span class="hljs-keyword">HASH</span> (_key);
</code></pre>
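<p>As a rough sketch of how application code might use this table as a cache (again with <code>psycopg2</code> and a placeholder connection string), a lookup simply falls back to computing and storing the value on a miss. The <code>expensive_lookup</code> function is a hypothetical stand-in for whatever slow query or API call you are caching.</p>
<pre><code class="lang-python">import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # placeholder DSN

def expensive_lookup(user_id):
    # Stand-in for the slow operation whose result we want to cache.
    return f"profile-for-{user_id}"

def cache_get(key):
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT _value FROM redis_type_cache "
            "WHERE _key = %s ORDER BY inserted_at DESC LIMIT 1",
            (key,),
        )
        row = cur.fetchone()
        return row[0] if row else None

def cache_set(key, value):
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO redis_type_cache (_key, _value) VALUES (%s, %s)",
            (key, value),
        )

def get_user_profile(user_id):
    key = f"user:{user_id}"
    cached = cache_get(key)
    if cached is not None:
        return cached
    profile = expensive_lookup(user_id)
    cache_set(key, profile)
    return profile
</code></pre>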
<p>Note that this will keep writing to the cache indefinitely, which will eventually eat up your storage. To solve this, we can use a cron job to remove old records by following the steps given below:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> <span class="hljs-keyword">FUNCTION</span> delete_old_rows()
<span class="hljs-keyword">RETURNS</span> <span class="hljs-built_in">VOID</span> <span class="hljs-keyword">AS</span> $$
<span class="hljs-keyword">BEGIN</span>
    <span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">FROM</span> redis_type_cache
    <span class="hljs-keyword">WHERE</span> inserted_at &lt; <span class="hljs-keyword">NOW</span>() - <span class="hljs-built_in">INTERVAL</span> <span class="hljs-string">'36 hours'</span>;
<span class="hljs-keyword">END</span>;
$$ LANGUAGE plpgsql;
</code></pre>
<pre><code class="lang-bash">
PG_HOST=<span class="hljs-string">"your_host"</span>
PG_DATABASE=<span class="hljs-string">"your_database"</span>
PG_USER=<span class="hljs-string">"your_user"</span>
PG_PASSWORD=<span class="hljs-string">"your_password"</span>  <span class="hljs-comment"># Consider using a more secure method for password handling</span>

psql -h <span class="hljs-variable">$PG_HOST</span> -d <span class="hljs-variable">$PG_DATABASE</span> -U <span class="hljs-variable">$PG_USER</span> -c <span class="hljs-string">"SELECT delete_old_rows();"</span> -W <span class="hljs-variable">$PG_PASSWORD</span>
<span class="hljs-comment"># save file as delete_row.sh</span>
</code></pre>
<pre><code class="lang-bash">
&gt; chmod +x delete_old_rows.sh  <span class="hljs-comment"># make the file executable</span>
&gt; crontab -e

<span class="hljs-comment"># add the following to crontab so that the file runs everyday at 6pm</span>
0 18 * * * delete_row.sh
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In conclusion, the idea of simplifying data infrastructure by leveraging the power of a versatile and reliable database like PostgreSQL is compelling. By replacing specialized tools like Timescale, Kafka, and Redis with well-designed PostgreSQL tables, you can achieve simplicity, robustness, and ease of maintenance.</p>
<p>For time-series data, the approach involves creating regular PostgreSQL tables with proper indexing and partitioning for efficient data retrieval. In the realm of message queues, a basic FIFO queue can be implemented using a simple table structure. This approach eliminates the need for Apache Kafka, offering a straightforward and efficient way to handle message queuing directly within PostgreSQL. Even for in-memory caching, PostgreSQL can serve as a capable alternative to Redis. By creating a cache table with appropriate indexing and periodic cleanup processes, you can achieve caching functionality within the same database that handles other aspects of your data.</p>
<p>The key takeaway is that simplicity often leads to reliability. A consolidated approach using PostgreSQL not only simplifies the technology stack but also makes it easier to manage and maintain the entire data infrastructure. That being said, Postgres is not a true drop-in replacement for the dependencies mentioned in this article, given how widely they are adopted in industry, but it was fun thinking that in an alternative universe, everything could just be Postgres. No doubt, if you need to scale your infrastructure to the level of Google or Microsoft, where you deal in petabytes of data each week, these specialised technologies are optimised for results, but if you are just starting out or building a hobby project, a simple Postgres database isn't all that bad.</p>
]]></content:encoded></item><item><title><![CDATA[The Subtle Art of Story-Telling Using Tableau]]></title><description><![CDATA[Data tells us a story no author could ever compose. It shows us never before observed patterns that may slip through the crack. The power of data analytics is more important than ever in the rapid-paced market, where the slightest difference is enoug...]]></description><link>https://blog.arygarg.me/story-telling-using-tableau</link><guid isPermaLink="true">https://blog.arygarg.me/story-telling-using-tableau</guid><category><![CDATA[tableau]]></category><category><![CDATA[#data visualisation]]></category><category><![CDATA[Story]]></category><category><![CDATA[Tutorial]]></category><category><![CDATA[data]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Thu, 13 Jul 2023 03:55:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1689192690402/b84eb0bf-3e93-4f11-8f6b-b1e73264218d.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Data tells us a story no author could ever compose. It shows us never before observed patterns that may slip through the crack. The power of data analytics is more important than ever in the rapid-paced market, where the slightest difference is enough to save or cost a company millions of dollars in revenue. This high level of data analytics was often hidden behind layers of complex programming languages and frameworks. Still, since Tableau released its product in 2003, it has helped thousands of companies visualize billions of rows worth of data.</p>
<p>Tableau is a powerful data visualization and business intelligence tool that allows users to analyze and present data visually, engaging, and interactively. With a user-friendly interface, Tableau enables individuals and organizations to easily connect to various data sources, whether spreadsheets, databases, or cloud services. It offers a wide range of visualization options, including charts, graphs, maps, and dashboards, which enables users to explore data from different angles and gain valuable insights. Tableau and its lesser-known associate, Tableau Prep, provide a low-code application to import, clean, and optimize your data sources from a central data lake before using them for visualizations. In this article, we will discuss an example dataset I cleaned using Tableau Prep and then visualized using various KPIs and graphs on Tableau Desktop.</p>
<h2 id="heading-getting-to-know-the-data">Getting to know the data</h2>
<p>This specific dataset is a collection of datasets I found on <a target="_blank" href="https://www.data.gov.in">data.gov.in</a>. It contains data about the percent distribution and absolute number of foreign individuals that entered the country in various years (2001 - 2020), the amount of money (USD and INR) spent by foreign visitors, and area-specific domestic and foreign foot traffic. Samples of the data have been provided below, but you can download the data from these sources <a target="_blank" href="https://data.gov.in/files/ogdpv2dms/s3fs-public/India-Tourism-Statistics-2021-Table-2.7.1.csv">[1][2][3]</a>.</p>
<table><tbody><tr><td><p>Year</p></td><td><p>FTAs</p></td><td><p>% distribution by Age- Group (in years) - 0-14</p></td><td><p>% distribution by Age- Group (in years) - 15-24</p></td><td><p>% distribution by Age- Group (in years) - 25-34</p></td><td><p>% distribution by Age- Group (in years) - 35-44</p></td><td><p>% distribution by Age- Group (in years) - 45-54</p></td><td><p>% distribution by Age- Group (in years) - 55-64</p></td><td><p>% distribution by Age- Group (in years) - 65 &amp; above</p></td><td><p>% distribution by Age- Group (in years) - Not Reported</p></td></tr><tr><td><p>2001</p></td><td><p>2537282</p></td><td><p>7</p></td><td><p>10.8</p></td><td><p>20.1</p></td><td><p>21.1</p></td><td><p>19.4</p></td><td><p>11.9</p></td><td><p>6.7</p></td><td><p>3</p></td></tr><tr><td><p>2002</p></td><td><p>2384364</p></td><td><p>9.2</p></td><td><p>10</p></td><td><p>19.4</p></td><td><p>21.6</p></td><td><p>19.4</p></td><td><p>11.5</p></td><td><p>7.7</p></td><td><p>1.2</p></td></tr><tr><td><p>2003</p></td><td><p>2726214</p></td><td><p>7.2</p></td><td><p>10</p></td><td><p>19.5</p></td><td><p>21.6</p></td><td><p>19.4</p></td><td><p>11.5</p></td><td><p>7.7</p></td><td><p>3.1</p></td></tr><tr><td><p>2004</p></td><td><p>3457477</p></td><td><p>8.5</p></td><td><p>9.8</p></td><td><p>18.8</p></td><td><p>21.3</p></td><td><p>19.4</p></td><td><p>12.8</p></td><td><p>8.2</p></td><td><p>0.2</p></td></tr><tr><td><p>2005</p></td><td><p>3918610</p></td><td><p>8.6</p></td><td><p>9.6</p></td><td><p>18.8</p></td><td><p>21.3</p></td><td><p>19.5</p></td><td><p>13</p></td><td><p>8.7</p></td><td><p>0.5</p></td></tr></tbody></table>

<table><tbody><tr><td><p>Circle</p></td><td><p>Name of the Monument </p></td><td><p>Domestic-2019-20</p></td><td><p>Foreign-2019-20</p></td><td><p>Domestic-2020-21</p></td><td><p>Foreign-2020-21</p></td><td><p>% Growth 2021-21/2019-20-Domestic</p></td><td><p>% Growth 2021-21/2019-20-Foreign</p></td></tr><tr><td><p>Agra</p></td><td><p>Taj Mahal</p></td><td><p>4429710</p></td><td><p>645415</p></td><td><p>1259892</p></td><td><p>9034</p></td><td><p>-71.56</p></td><td><p>-98.6</p></td></tr><tr><td><p>Agra</p></td><td><p>Agra Fort</p></td><td><p>1627154</p></td><td><p>386522</p></td><td><p>371242</p></td><td><p>2810</p></td><td><p>-77.18</p></td><td><p>-99.27</p></td></tr><tr><td><p>Agra</p></td><td><p>Fatehpur Sikri</p></td><td><p>454376</p></td><td><p>184751</p></td><td><p>107835</p></td><td><p>574</p></td><td><p>-76.27</p></td><td><p>-99.69</p></td></tr><tr><td><p>Agra</p></td><td><p>Akbar Tomb Sikandra</p></td><td><p>229270</p></td><td><p>19625</p></td><td><p>99509</p></td><td><p>321</p></td><td><p>-56.6</p></td><td><p>-98.36</p></td></tr><tr><td><p>Agra</p></td><td><p>Mariam tomb Sikandra</p></td><td><p>22517</p></td><td><p>414</p></td><td><p>9765</p></td><td><p>31</p></td><td><p>-56.63</p></td><td><p>-92.51</p></td></tr></tbody></table>

<table><tbody><tr><td><p>Year</p></td><td><p>FEE in <code>terms -</code>Crore</p></td><td><p>FEE in ` terms - % Change over previous year</p></td><td><p>FEE in US$ terms - US $ Million</p></td><td><p>FEE in US$ terms - % Change over previous year</p></td></tr><tr><td><p>1991</p></td><td><p>4318</p></td><td><p>NA</p></td><td><p>1861</p></td><td><p>NA</p></td></tr><tr><td><p>2001</p></td><td><p>15083</p></td><td><p>-3.5</p></td><td><p>3198</p></td><td><p>-7.6</p></td></tr><tr><td><p>2002</p></td><td><p>15064</p></td><td><p>-0.1</p></td><td><p>3103</p></td><td><p>-3</p></td></tr><tr><td><p>2003</p></td><td><p>20729</p></td><td><p>37.6</p></td><td><p>4463</p></td><td><p>43.8</p></td></tr><tr><td><p>2004</p></td><td><p>27944</p></td><td><p>34.8</p></td><td><p>6170</p></td><td><p>38.2</p></td></tr></tbody></table>

<h2 id="heading-data-cleaning">Data Cleaning 🧹</h2>
<p>After loading the data, the first step of any Data Visualisation Project is to clean it so that your visualizations can be neat and convey all the relevant information you extract. Of course, this is possible using Python and accessory modules like Pandas and Numpy, but Tableau Prep provides a low/no-code experience. The most you'll ever code is when writing basic SQL queries. Our complete "Data Cleaning Pipeline" is strictly no-code and, in its entirety, can be seen below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689183541128/2d784fb5-808b-41e8-a3a5-f53fdd1bf91c.png" alt class="image--center mx-auto" /></p>
<p>Here, I've labeled each step to make it easier to see what it does. The basic gist: rename columns so they portray their meaning more accurately, alter and regroup those columns to smooth out outliers in the data, and then join the two data sources (via an Inner Join) to get our final output. Below we can see one of the two final data sources.</p>
<table><tbody><tr><td><p>Year</p></td><td><p>FTAs</p></td><td><p>Age % 0-14</p></td><td><p>Age % 15-24</p></td><td><p>Age % 25-34</p></td><td><p>Age % 35-44</p></td><td><p>Age % 45-54</p></td><td><p>Age % 55-64</p></td><td><p>Age % 65+</p></td><td><p>Age % Not Reported</p></td><td><p>FEE in INR Crore</p></td><td><p>FEE in % Change over previous year (INR)</p></td><td><p>FEE in US $ Million</p></td><td><p>FEE in % Change over previous year (US$)</p></td></tr><tr><td><p>1/1/2001</p></td><td><p>2537282</p></td><td><p>0.07</p></td><td><p>0.108</p></td><td><p>0.201</p></td><td><p>0.211</p></td><td><p>0.194</p></td><td><p>0.119</p></td><td><p>0.067</p></td><td><p>0.03</p></td><td><p>15083</p></td><td><p>-0.035</p></td><td><p>3198</p></td><td><p>-0.076</p></td></tr><tr><td><p>1/1/2002</p></td><td><p>2384364</p></td><td><p>0.092</p></td><td><p>0.1</p></td><td><p>0.194</p></td><td><p>0.216</p></td><td><p>0.194</p></td><td><p>0.115</p></td><td><p>0.077</p></td><td><p>0.012</p></td><td><p>15064</p></td><td><p>-0.001</p></td><td><p>3103</p></td><td><p>-0.03</p></td></tr><tr><td><p>1/1/2003</p></td><td><p>2726214</p></td><td><p>0.072</p></td><td><p>0.1</p></td><td><p>0.195</p></td><td><p>0.216</p></td><td><p>0.194</p></td><td><p>0.115</p></td><td><p>0.077</p></td><td><p>0.031</p></td><td><p>20729</p></td><td><p>0.376</p></td><td><p>4463</p></td><td><p>0.438</p></td></tr><tr><td><p>1/1/2004</p></td><td><p>3457477</p></td><td><p>0.085</p></td><td><p>0.098</p></td><td><p>0.188</p></td><td><p>0.213</p></td><td><p>0.194</p></td><td><p>0.128</p></td><td><p>0.082</p></td><td><p>0.002</p></td><td><p>27944</p></td><td><p>0.348</p></td><td><p>6170</p></td><td><p>0.382</p></td></tr><tr><td><p>1/1/2005</p></td><td><p>3918610</p></td><td><p>0.086</p></td><td><p>0.096</p></td><td><p>0.188</p></td><td><p>0.213</p></td><td><p>0.195</p></td><td><p>0.13</p></td><td><p>0.087</p></td><td><p>0.005</p></td><td><p>33123</p></td><td><p>0.185</p></td><td><p>7493</p></td><td><p>0.214</p></td></tr></tbody></table>
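<p>For readers who would rather stay in code, a rough pandas equivalent of the cleaning flow above might look like the sketch below. The file names and the handful of column renames are placeholders standing in for the actual datasets linked earlier; the point is just the rename, rescale, and inner-join steps that the no-code pipeline performs.</p>
<pre><code class="lang-python">import pandas as pd

# Placeholder file names for the two source datasets linked above.
ftas = pd.read_csv("india-tourism-age-distribution.csv")
fee = pd.read_csv("india-tourism-foreign-exchange-earnings.csv")

# Rename columns so they portray their meaning more accurately
# (the remaining age columns would be renamed the same way).
ftas = ftas.rename(columns={
    "% distribution by Age- Group (in years) - 0-14": "Age % 0-14",
    "% distribution by Age- Group (in years) - 15-24": "Age % 15-24",
})

# Convert the renamed percentage columns to fractions, mirroring the Prep output.
age_cols = [c for c in ftas.columns if c.startswith("Age %")]
ftas[age_cols] = ftas[age_cols] / 100

# Join the two data sources on Year (an inner join, as in the Prep flow).
final = ftas.merge(fee, on="Year", how="inner")
print(final.head())
</code></pre>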

<h2 id="heading-time-to-visualize-visualize-visualize">Time to Visualize, Visualize, Visualize 📊</h2>
<p>Quoting <a target="_blank" href="https://www.linkedin.com/in/mrdbourke">Daniel Bourke</a>, a personal hero, let's begin visualizing the data we just created. Luckily, Tableau Prep extracts can be opened directly into Tableau Desktop as a <code>.hyper</code>, <code>.csv</code> or a <code>.xlsx</code> file. Here we will also use our second data source, available as download file 2. Getting straight to the point, we see all our data sources and relevant column names on the left-hand pane after we import our data sources.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689184533276/746490a1-0b3e-4d0c-bbe4-fc96d745624b.png" alt class="image--center mx-auto" /></p>
<p>The names in blue are known as discrete values, while the ones in green are known as continuous values. More information can be found in <a target="_blank" href="https://help.tableau.com/current/pro/desktop/en-us/datafields_typesandroles.htm">this</a> article by Tableau, but to explain with a table:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>Blue Fields</strong></td><td><strong>Green Fields</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Data type</strong></td><td>Discrete</td><td>Continuous</td></tr>
<tr>
<td><strong>How data is displayed</strong></td><td>Headers</td><td>Axes</td></tr>
<tr>
<td><strong>Examples</strong></td><td>State, Country, Product Name</td><td>Sales, Profit, Weight</td></tr>
</tbody>
</table>
</div><p>On the right of the pane, we see our workspace, where we can drag and drop our columns to create KPIs, graphs, and dashboards. I won't be going through how to make every KPI or visualization on Tableau, but we'll construct basic graphs based on the available measures and dimensions. Below are some of the more interesting plots.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689188143529/89714e3c-62e1-4de3-87b2-f20f7684eeab.png" alt class="image--center mx-auto" /></p>
<p>Something interesting I found was the steady year-on-year growth from 2001 to 2019; but because of the COVID-19 pandemic, the money spent in 2020 fell back to roughly 2008 levels, effectively a 12-year setback.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689188187001/5342cba2-ce9e-475c-aa22-341181d5756a.png" alt class="image--center mx-auto" /></p>
<p>Even though Agra is 4th in terms of the number of monuments, it is the city where the most foreign income is generated (because of the Taj Mahal and the surrounding monuments).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689188268319/8d67c942-7d74-4529-9bd2-2347e1d57db8.png" alt class="image--center mx-auto" /></p>
<p>It's shocking that even though Mumbai has the highest number of monuments, its gross income from foreign and domestic tourists places it close to the middle of the total rankings.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689188271924/e79d2585-c8aa-4b53-99a9-aef4bcf80e7f.png" alt class="image--center mx-auto" /></p>
<p>It isn't surprising to see how strong a hold the Taj Mahal has compared to other monuments in terms of International and Domestic earnings. It is about 20% of the international income from tourism.</p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>Tableau is a fine piece of software that makes data visualizations easy to make and, with its interactive menus, ensures that little to no code is required to complete the toughest visualizations. From simple bar graphs to parsing GeoData via coordinates or location names, Tableau can speed up the data analyzing task. It even provides ways of importing your data from Google BigQuery or Amazon Redshift. But it does lack the satisfaction of coding, which I severely missed while working on this project. The complete data visualization can be found <a target="_blank" href="https://public.tableau.com/app/profile/aryan.garg8873/viz/Tourism-India_16838194207330/Dashboard">here</a> on Tableau Public.</p>
]]></content:encoded></item><item><title><![CDATA[Mojo Programming Language: The Future of Data Science?]]></title><description><![CDATA[In recent years, the field of data science has exploded in popularity. With the ever-increasing amount of data being generated, there is a growing demand for professionals who can collect, analyze, and interpret this data. However, one of data scient...]]></description><link>https://blog.arygarg.me/mojo</link><guid isPermaLink="true">https://blog.arygarg.me/mojo</guid><category><![CDATA[Data Science]]></category><category><![CDATA[Programming Blogs]]></category><category><![CDATA[datascience]]></category><category><![CDATA[Machine Learning]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Sun, 14 May 2023 02:30:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/ieic5Tq8YMk/upload/0c6105b57edd9a2f7dfc9476d332628c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In recent years, the field of data science has exploded in popularity. With the ever-increasing amount of data being generated, there is a growing demand for professionals who can collect, analyze, and interpret this data. However, one of data scientists' most significant challenges is the lack of a suitable programming language.</p>
<p>Python is the most popular programming language for data science, but it has some significant limitations. For example, pure Python is slow compared with compiled languages, which makes training large machine-learning models directly in it impractical. Additionally, Python is poorly suited to writing the low-level code that is often necessary for working with AI hardware.</p>
<p>This is where Mojo comes in. Mojo is a new programming language that was designed specifically for data science. It combines Python's usability with C's performance, making it the ideal language for developing and deploying AI applications.</p>
<h2 id="heading-features-of-mojo">Features of Mojo</h2>
<ul>
<li><p><strong>Performance:</strong> Mojo is up to <strong>35,000</strong> times faster than Python, making it possible to train large machine-learning models in a fraction of the time.</p>
</li>
<li><p><strong>Flexibility:</strong> Mojo is a general-purpose programming language that can be used for various tasks.</p>
</li>
<li><p><strong>Ease of use:</strong> Mojo has a clean syntax that is easy to learn and use.</p>
</li>
<li><p><strong>Community support:</strong> Mojo has a strong community of developers constantly adding new features and improvements.</p>
</li>
</ul>
<p><img src="https://forums.fast.ai/uploads/default/optimized/3X/3/4/3469e0355fff434928bb1134b7b572d9b3b0033c_2_690x329.jpeg" alt class="image--center mx-auto" /></p>
<h2 id="heading-why-mojo-might-be-a-paradigm-shift-in-data-science">Why Mojo Might be a paradigm shift in data science</h2>
<p>Mojo has the potential to revolutionize the field of data science by providing a powerful and flexible programming language that is well-suited for developing and deploying AI applications. With its speed, efficiency, and ease of use, Mojo can help data scientists to be more productive and to create more powerful AI models.</p>
<h2 id="heading-current-drawbacks">Current Drawbacks</h2>
<ul>
<li><p><strong>Mojo is still under development:</strong> Mojo is a relatively new programming language, and it is still under development. This means there may be some bugs or limitations that have not yet been addressed.</p>
</li>
<li><p><strong>Mojo is not as widely adopted as Python:</strong> Mojo is not as widely adopted as Python, which means that fewer resources may be available for learning and using the language.</p>
</li>
<li><p><strong>Mojo is not as well-suited for some tasks as Python:</strong> Mojo is a general-purpose programming language but not as well-suited for some tasks as Python. For example, Mojo is not as good at writing web applications as Python.</p>
</li>
</ul>
<h2 id="heading-where-you-can-check-out-the-language">Where you can check out the language</h2>
<p>The Mojo programming language is still under development but is available for preview on the Modular website. To learn more about Mojo, visit the Modular website or join the Mojo community on Discord.</p>
]]></content:encoded></item><item><title><![CDATA[Dev Retro 2022: Beginning my Journey into Development]]></title><description><![CDATA[Introduction
What do Google, YouTube, Spotify, Reddit, Apple, and Snapchat have in Common? Except for the fact that they are multi-billion dollar tech giants whose algorithms know us better than our own family. They all gave us personal "stat cards" ...]]></description><link>https://blog.arygarg.me/dev-retro-2022</link><guid isPermaLink="true">https://blog.arygarg.me/dev-retro-2022</guid><category><![CDATA[#DevRetro2022]]></category><category><![CDATA[#DevRetro2022 #hashnode]]></category><category><![CDATA[development]]></category><category><![CDATA[General Programming]]></category><category><![CDATA[reflection]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Sat, 07 Jan 2023 10:23:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/TgGipdWWDuA/upload/bdb2a8ec1dc30a3c38ba9f608d512df9.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>What do Google, YouTube, Spotify, Reddit, Apple, and Snapchat have in Common? Except for the fact that they are multi-billion dollar tech giants whose algorithms know us better than our own family. They all gave us personal "stat cards" about the year, from <a target="_blank" href="https://www.youtube.com/watch?v=4WXs3sKu41I">This Year in Search</a> by Google to <a target="_blank" href="https://newsroom.spotify.com/2022-wrapped/">Wrapped</a> by Spotify. Since this is my first year of blogging, I could only publish 13 articles. Next year, I hope to at least double that, if not triple the number, in the mid-30s! So here is my attempt at HashNode's Iconic New (soon-to-be) Ritual of Dev Retro 2022!</p>
<h1 id="heading-learning-something-new">Learning Something New 💡</h1>
<p>The year started unlike any year whatsoever. After almost a year of using protective gear and sanitizing my hands, COVID finally got to me, and I was stuck in COVID Isolation for the first couple of days of the New Year. That's when I learned the art of taking things slow; instead of rushing into it, I took it slow and processed every step of the journey, from medication to recovery to even post-COVID symptoms.</p>
<p>Moving on to be more Technical, this year was the first year I started actual development, from tiny HTML Pages using the Basics of CSS to Dynamically Loaded Pages using the popular Framework Django for the backend, from creating a Discord bot for fun to creating a Discord Bot for a server with over 12,000 members. Starting with Data Science and understanding the math behind it, this year has allowed me to start growing and exploring. It allowed me to branch out and get a general lay of the land. Developing simple Dart apps using Flutter and understanding how DApps work in the Solidity Framework made me realize what the future of Web3 might look like.</p>
<p>I completed my first year of college, which was a well-welcomed change. Meeting new people allowed me to broaden my horizons and ultimately learn even more than ever before. Maybe not the most significant flex, but I also got featured on Hashnode's Twitter account!</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://twitter.com/hashnode/status/1561362487014002692">https://twitter.com/hashnode/status/1561362487014002692</a></div>
<p> </p>
<h1 id="heading-improving-on-something-old">Improving on Something Old 💹</h1>
<p>"What skills did I bring into 2022?" was the first real question I had while writing this part of the article. Thinking about it, I came into the year with no fundamental skills. Knew a bit of Python from the Good Ol' School days. But The first year of college helped broaden my horizons on what domains I've now been able to discover. From the first Hello World in C to now solving some (easy to medium difficulty) Data Structures and Algorithm problems. Although I haven't started doing Leetcode, understanding the steps necessary to solve a problem is always the first step. I understood the Dev cycle and ideated on projects that could solve the future's problems.</p>
<h1 id="heading-ending-it-on-a-good-note">Ending it on a Good Note 🙌🏻</h1>
<p>Of course, this wouldn't have been possible if I hadn't discovered the concept of Technical Writing initially to apply for Microsoft Student Ambassador Program. Still, I ultimately decided not to apply. Hopefully, another year filled with tech and, this time, improving my skills to make the best out of myself! 2022 was full of learning, while 2023 will be filled with understanding and deploying ;)</p>
]]></content:encoded></item><item><title><![CDATA[Penalties and The World Cup ⚽]]></title><description><![CDATA[Introduction and Inspiration 💡
First of all, Welcome back to Technical Speaking. It's been a while (three months, to be exact), and while I was busy with Uni and didn't get a chance to write, I'm still writing this blog while my End Semester Exams a...]]></description><link>https://blog.arygarg.me/penalties-and-the-world-cup</link><guid isPermaLink="true">https://blog.arygarg.me/penalties-and-the-world-cup</guid><category><![CDATA[Data Science]]></category><category><![CDATA[Python]]></category><category><![CDATA[Python 3]]></category><category><![CDATA[Tutorial]]></category><category><![CDATA[football]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 19 Dec 2022 05:58:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1670614786064/Xd_YPYgF2.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction-and-inspiration">Introduction and Inspiration 💡</h1>
<p>First of all, Welcome back to Technical Speaking. It's been a while (three months, to be exact), and while I was busy with Uni and didn't get a chance to write, I'm still writing this blog while my End Semester Exams are going on. Not the brightest idea, but It's also World Cup season, and rules are meant to be broken this Holiday.</p>
<p>Getting into the real tech here, I found this excellent video by <a target="_blank" href="https://www.youtube.com/watch?v=HAuwPue57Vs">Vox Media</a>, where they used a dataset of all the penalty shootouts in modern World Cup History (1982-Present). The dataset contained relevant data from the 1982 World cup in Spain to the 2018 World Cup in Russia. I thought of taking it a step further by adding the data for the 2022 World Cup in Qatar. Currently, the Quarter Finals are underway, and I'm watching Argentina decimate the Dutch 1-0 (watch me regret these words later).</p>
<h1 id="heading-tech-stack-and-code">Tech Stack and Code 👨🏻‍💻</h1>
<p>I tried using <code>plotly</code> for my graphs, this time since Matplotlib graphs are a bit stale. Partially since a few Kaggle Graphs were already pre-written in plotly, and well *Hippady Hoppady, your code is now my property*.</p>
<p>Enough chatter; let's dive deep into the code ;)</p>
<p>Note: Even though this article may appear to be written in the past (because it was), all the graphs are up to date as of the 18th of December 2022, after the final.</p>
<pre><code class="lang-py"><span class="hljs-keyword">import</span> pandas
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> plotly.express <span class="hljs-keyword">as</span> px
<span class="hljs-keyword">from</span> plotly.offline <span class="hljs-keyword">import</span> init_notebook_mode, iplot
<span class="hljs-keyword">import</span> base64
init_notebook_mode()

df = pd.read_csv(<span class="hljs-string">'WorldCupShootouts.csv'</span>)
print(<span class="hljs-string">"CSV file Loaded"</span>)
</code></pre>
<p>Just your standard importing of libraries, as well as creating the initial DataFrame from the CSV.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1670615505926/pdLv9OuTq.png" alt="image.png" /></p>
<h2 id="heading-bar-graphs">Bar Graphs 🤓</h2>
<p>Displaying the DataFrame, we can see over 330 Spot Kicks spread over 40 years, starting from Germany vs. France 2(5)-2(4) in the 1982 Spain World Cup Semi-Finals.</p>
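<p>If you want to reproduce that count yourself, here's a quick optional check (a small sketch, assuming the same <code>df</code> loaded above):</p>
<pre><code class="lang-py">print(df.shape[0])           # total kicks recorded in the CSV
print(df.dropna().shape[0])  # kicks with complete data
df.head()                    # a peek at the first few rows
</code></pre>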
<pre><code class="lang-py">df_country_count = pd.DataFrame(df.dropna().groupby([<span class="hljs-string">'Team'</span>]).size()).sort_values(by=<span class="hljs-number">0</span>,ascending=<span class="hljs-literal">False</span>).reset_index().rename(columns={<span class="hljs-number">0</span>:<span class="hljs-string">"Total Penalty Kicks"</span>})
px.bar(df_country_count, x=<span class="hljs-string">'Team'</span>, y=<span class="hljs-string">"Total Penalty Kicks"</span>).show()
</code></pre>
<p>Now, by plotting each team against its corresponding number of penalty kicks, we can see that <strong>Argentina</strong> has had the most penalty opportunities.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671389726411/Q3KoJTLDb.png" alt class="image--center mx-auto" /></p>
<p>And by plotting a similar graph, we can see that Argentina has also scored the most penalties.</p>
<pre><code class="lang-py">df_most_goals = pd.DataFrame(df[df.Goal==<span class="hljs-number">1</span>].groupby([<span class="hljs-string">'Team'</span>]).size()).sort_values(by=<span class="hljs-number">0</span>, ascending=<span class="hljs-literal">False</span>).reset_index().rename(
    columns={<span class="hljs-number">0</span>: <span class="hljs-string">"Total Penalties Scored"</span>})
px.bar(df_most_goals, x=<span class="hljs-string">'Team'</span>, y=<span class="hljs-string">"Total Penalties Scored"</span>).show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671390012553/vvAEYt38v.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-goals-zones">Goals Zones 🥅</h2>
<p>Now we're going to look at goal zones and where shots were aimed; for this, you will need to understand how the zoning works. Here is a small infographic to help ;)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1670617736037/34ShZdxGm.png" alt="goal.png" /></p>
<p>Now that you have a general sense of how the graphs will look, let's define the function that will help create these beautiful plots.</p>
<pre><code class="lang-py"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">show_shots</span>(<span class="hljs-params">df: pandas.DataFrame, x, y, size, size_max, hover_name, hover_data, color, title, image_filename=<span class="hljs-string">"goal.jpg"</span></span>):</span>
    init_notebook_mode()
    fig = px.scatter(df,
                 x=x,
                 y=y,  
                 size= size,
                 size_max = size_max,
                 color = color,
                 hover_name = hover_name,
                 hover_data = hover_data,
                 range_x = (<span class="hljs-number">0</span>,<span class="hljs-number">900</span>),
                 range_y = (<span class="hljs-number">581</span>,<span class="hljs-number">0</span>),
                 width = <span class="hljs-number">900</span>,
                 height = <span class="hljs-number">581</span>,
                 labels = {x:<span class="hljs-string">''</span>, y:<span class="hljs-string">''</span>})
    plotly_logo = base64.b64encode(open(image_filename, <span class="hljs-string">'rb'</span>).read())
    fig.update_layout(xaxis_showgrid=<span class="hljs-literal">False</span>, 
                    yaxis_showgrid=<span class="hljs-literal">False</span>,
                    xaxis_showticklabels=<span class="hljs-literal">False</span>,
                    yaxis_showticklabels=<span class="hljs-literal">False</span>,
                    title= title,
                    images= [dict(
                    source=<span class="hljs-string">'data:image/jpg;base64,{}'</span>.format(plotly_logo.decode()),
                    xref=<span class="hljs-string">"paper"</span>, yref=<span class="hljs-string">"paper"</span>,
                    x=<span class="hljs-number">0</span>, y=<span class="hljs-number">1</span>,
                    sizex=<span class="hljs-number">1</span>, sizey=<span class="hljs-number">1</span>,
                    xanchor=<span class="hljs-string">"left"</span>,
                    yanchor=<span class="hljs-string">"top"</span>,
                    sizing = <span class="hljs-string">'stretch'</span>,
                    layer=<span class="hljs-string">"below"</span>)])
    iplot(fig)
</code></pre>
<p>Now, we will try to determine which zone had the most "On Target" shots using a simple Group By Query in our DataFrame.</p>
<pre><code class="lang-py">shot_coords = {
    <span class="hljs-number">1</span>:[<span class="hljs-number">216</span>,<span class="hljs-number">150</span>],
    <span class="hljs-number">2</span>:[<span class="hljs-number">448</span>,<span class="hljs-number">150</span>],
    <span class="hljs-number">3</span>:[<span class="hljs-number">680</span>,<span class="hljs-number">150</span>],
    <span class="hljs-number">4</span>:[<span class="hljs-number">216</span>,<span class="hljs-number">250</span>],
    <span class="hljs-number">5</span>:[<span class="hljs-number">448</span>,<span class="hljs-number">250</span>],
    <span class="hljs-number">6</span>:[<span class="hljs-number">680</span>,<span class="hljs-number">250</span>],
    <span class="hljs-number">7</span>:[<span class="hljs-number">216</span>,<span class="hljs-number">350</span>],
    <span class="hljs-number">8</span>:[<span class="hljs-number">448</span>,<span class="hljs-number">350</span>],
    <span class="hljs-number">9</span>:[<span class="hljs-number">680</span>,<span class="hljs-number">350</span>]
}

df_target = df[df.OnTarget == <span class="hljs-number">1</span>].copy()  <span class="hljs-comment"># copy so the new columns below don't trigger SettingWithCopyWarning</span>

df_target[<span class="hljs-string">'Zone_x'</span>] = df_target[<span class="hljs-string">'Zone'</span>].apply(<span class="hljs-keyword">lambda</span> x: shot_coords[int(x)][<span class="hljs-number">0</span>])
df_target[<span class="hljs-string">'Zone_y'</span>] = df_target[<span class="hljs-string">'Zone'</span>].apply(<span class="hljs-keyword">lambda</span> x: shot_coords[int(x)][<span class="hljs-number">1</span>])

df_zone = pd.DataFrame(df_target.groupby([<span class="hljs-string">'Zone'</span>,<span class="hljs-string">'Zone_x'</span>, <span class="hljs-string">'Zone_y'</span>]).size()).reset_index()
df_zone.rename(columns = {<span class="hljs-number">0</span>:<span class="hljs-string">'Number of Shots'</span>}, inplace= <span class="hljs-literal">True</span>)

show_shots(df_zone, <span class="hljs-string">'Zone_x'</span>, <span class="hljs-string">'Zone_y'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-number">70</span>, <span class="hljs-string">'Zone'</span>, [<span class="hljs-string">'Zone'</span>, <span class="hljs-string">'Number of Shots'</span>], <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-string">'Shot Location (On Target Shots)'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671390325966/7-W1S_o6X.png" alt class="image--center mx-auto" /></p>
<p>On-target shots can't be the whole picture; let's now look at the other extreme: the zones where the ball never even made it on target.</p>
<pre><code class="lang-python">df_Offtarget = df[df.OnTarget == <span class="hljs-number">0</span>]

df_Offtarget[<span class="hljs-string">'Zone_x'</span>] = df_Offtarget[<span class="hljs-string">'Zone'</span>].apply(<span class="hljs-keyword">lambda</span> x: shot_coords[int(x)][<span class="hljs-number">0</span>])
df_Offtarget[<span class="hljs-string">'Zone_y'</span>] = df_Offtarget[<span class="hljs-string">'Zone'</span>].apply(<span class="hljs-keyword">lambda</span> x: shot_coords[int(x)][<span class="hljs-number">1</span>])

df_zone = pd.DataFrame(df_Offtarget.groupby([<span class="hljs-string">'Zone'</span>,<span class="hljs-string">'Zone_x'</span>, <span class="hljs-string">'Zone_y'</span>]).size()).reset_index()
df_zone.rename(columns = {<span class="hljs-number">0</span>:<span class="hljs-string">'Number of Shots'</span>}, inplace= <span class="hljs-literal">True</span>)

show_shots(df_zone, <span class="hljs-string">'Zone_x'</span>, <span class="hljs-string">'Zone_y'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-number">70</span>, <span class="hljs-string">'Zone'</span>, [<span class="hljs-string">'Zone'</span>, <span class="hljs-string">'Number of Shots'</span>], <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-string">'Intended Shot Location (Off Target Shots)'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671390393119/jw9eaoJnL.png" alt class="image--center mx-auto" /></p>
<p>Oddly enough, most missed shots were aimed towards zone 7 (shown as a yellow circle here). We can also see that no shots aimed at zones 5 and 8 were ever off target, probably because those zones carry the least risk of missing the net altogether.</p>
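<p>To double-check that claim without squinting at bubble sizes, here's a one-line sketch counting off-target attempts per intended zone (zones 5 and 8 should simply not appear in the result):</p>
<pre><code class="lang-py">print(df[df.OnTarget == 0].groupby('Zone').size().sort_values(ascending=False))  # off-target shots per intended zone
</code></pre>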
<p>Now, looking at goals in absolute numbers, let's figure out which zone was lucky! Which zone had the most goals, and which had the fewest successful attempts?</p>
<pre><code class="lang-py">df_zone = pd.DataFrame(df_target.groupby([<span class="hljs-string">'Zone'</span>,<span class="hljs-string">'Zone_x'</span>, <span class="hljs-string">'Zone_y'</span>, <span class="hljs-string">'Goal'</span>]).size()).reset_index()
df_zone.rename(columns = {<span class="hljs-number">0</span>:<span class="hljs-string">'Number of Shots'</span>}, inplace= <span class="hljs-literal">True</span>)

show_shots(df_zone, <span class="hljs-string">'Zone_x'</span>, <span class="hljs-string">'Zone_y'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-number">70</span>, <span class="hljs-string">'Zone'</span>, [<span class="hljs-string">'Zone'</span>, <span class="hljs-string">'Number of Shots'</span>], <span class="hljs-string">'Goal'</span>, <span class="hljs-string">'Shot Success by Zone (On Target Shots)'</span>)

<span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(df_zone.shape[<span class="hljs-number">0</span>]):
    zone = df_zone.loc[i, <span class="hljs-string">'Zone'</span>]
    df_goal = df_zone[df_zone.Zone == zone]
    tot = df_goal[<span class="hljs-string">'Number of Shots'</span>].sum()
    goal = df_goal[df_goal.Goal == <span class="hljs-number">1.0</span>][<span class="hljs-string">'Number of Shots'</span>].sum()
    df_zone.loc[i, <span class="hljs-string">'Success Percentage'</span>] = goal/tot

df_zone = df_zone[df_zone.Goal == <span class="hljs-number">1.0</span>]
show_shots(df_zone, <span class="hljs-string">'Zone_x'</span>, <span class="hljs-string">'Zone_y'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-number">70</span>, <span class="hljs-string">'Zone'</span>, [<span class="hljs-string">'Zone'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-string">'Success Percentage'</span>], <span class="hljs-string">'Success Percentage'</span>, <span class="hljs-string">'Shot Success by Zone (On Target Shots)'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671390664272/HYmunr6wR.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671390869453/vkTN6MXB6.png" alt class="image--center mx-auto" /></p>
<p>We notice what seemed obvious: the highest success rates are in the corners, with the upper-right corner having a success rate of 100% (talk about beating the house), while zones 5 (center) and 8 (lower middle) have the lowest. Since zone 7 (bottom left) had the most shots, it also had the most saves and goals. Now you know: if you're ever in a World Cup match, ALWAYS aim for the corners. The top third is your best chance to net one in.</p>
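<p>If you'd rather read the raw numbers than compare bubble colours, the success percentages computed above can be printed directly:</p>
<pre><code class="lang-py"># df_zone is the goals-only frame built above, one row per zone,
# including the 'Success Percentage' column we just calculated.
print(df_zone[['Zone', 'Number of Shots', 'Success Percentage']].sort_values('Success Percentage', ascending=False))
</code></pre>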
<p>Enough talk about the shooter. Let's talk keeper statistics. How do we determine the number of times a keeper has chosen a specific side? The answer might be closer than you might think. Just scroll on down!</p>
<pre><code class="lang-py">keeper_coords = {
    <span class="hljs-string">'L'</span>:[<span class="hljs-number">216</span>,<span class="hljs-number">250</span>],
    <span class="hljs-string">'C'</span>:[<span class="hljs-number">448</span>,<span class="hljs-number">250</span>],
    <span class="hljs-string">'R'</span>:[<span class="hljs-number">680</span>,<span class="hljs-number">250</span>],
}

df.dropna(inplace=<span class="hljs-literal">True</span>)

df.replace(<span class="hljs-string">'l'</span>, <span class="hljs-string">'L'</span>, inplace=<span class="hljs-literal">True</span>)
df[<span class="hljs-string">'Keeper_x'</span>] = df[<span class="hljs-string">'Keeper'</span>].apply(<span class="hljs-keyword">lambda</span> x: keeper_coords[x][<span class="hljs-number">0</span>])
df[<span class="hljs-string">'Keeper_y'</span>] = df[<span class="hljs-string">'Keeper'</span>].apply(<span class="hljs-keyword">lambda</span> x: keeper_coords[x][<span class="hljs-number">1</span>])

df_keeper = pd.DataFrame(df.groupby([<span class="hljs-string">'Keeper'</span>,<span class="hljs-string">'Keeper_x'</span>, <span class="hljs-string">'Keeper_y'</span>]).size()).reset_index()
df_keeper.rename(columns = {<span class="hljs-number">0</span>:<span class="hljs-string">'Number of Shots'</span>}, inplace= <span class="hljs-literal">True</span>)

show_shots(df_keeper, <span class="hljs-string">'Keeper_x'</span>, <span class="hljs-string">'Keeper_y'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-number">70</span>, <span class="hljs-string">'Keeper'</span>, [<span class="hljs-string">'Keeper'</span>, <span class="hljs-string">'Number of Shots'</span>], <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-string">'Keeper Location'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671391126692/rxRRXB9Q5.png" alt class="image--center mx-auto" /></p>
<p>Similarly, we can plot where the keeper landed most of the time and how many of those were goals vs. no goals.</p>
<pre><code class="lang-py">keeper_coords = {
    <span class="hljs-string">'L'</span>:[<span class="hljs-number">216</span>,<span class="hljs-number">250</span>],
    <span class="hljs-string">'C'</span>:[<span class="hljs-number">448</span>,<span class="hljs-number">250</span>],
    <span class="hljs-string">'R'</span>:[<span class="hljs-number">680</span>,<span class="hljs-number">250</span>],
}

df.dropna(inplace=<span class="hljs-literal">True</span>)
df_no_goal = df[df.Goal==<span class="hljs-number">0</span>].copy()  <span class="hljs-comment"># copy to avoid SettingWithCopyWarning when adding columns</span>
df_no_goal.replace(<span class="hljs-string">'l'</span>, <span class="hljs-string">'L'</span>, inplace=<span class="hljs-literal">True</span>)
df_no_goal[<span class="hljs-string">'Keeper_x'</span>] = df_no_goal[<span class="hljs-string">'Keeper'</span>].apply(<span class="hljs-keyword">lambda</span> x: keeper_coords[x][<span class="hljs-number">0</span>])
df_no_goal[<span class="hljs-string">'Keeper_y'</span>] = df_no_goal[<span class="hljs-string">'Keeper'</span>].apply(<span class="hljs-keyword">lambda</span> x: keeper_coords[x][<span class="hljs-number">1</span>])

df_no_goal_keeper = pd.DataFrame(df_no_goal.groupby([<span class="hljs-string">'Keeper'</span>,<span class="hljs-string">'Keeper_x'</span>, <span class="hljs-string">'Keeper_y'</span>]).size()).reset_index()
df_no_goal_keeper.rename(columns = {<span class="hljs-number">0</span>:<span class="hljs-string">'Number of Shots'</span>}, inplace= <span class="hljs-literal">True</span>)
print(df_no_goal_keeper)

show_shots(df_no_goal_keeper, <span class="hljs-string">'Keeper_x'</span>, <span class="hljs-string">'Keeper_y'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-number">70</span>, <span class="hljs-string">'Keeper'</span>, [<span class="hljs-string">'Keeper'</span>, <span class="hljs-string">'Number of Shots'</span>], <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-string">'Keeper Location (No Goal)'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671391129548/Ef3am21dk.png" alt class="image--center mx-auto" /></p>
<pre><code class="lang-python">keeper_coords = {
    <span class="hljs-string">'L'</span>:[<span class="hljs-number">216</span>,<span class="hljs-number">250</span>],
    <span class="hljs-string">'C'</span>:[<span class="hljs-number">448</span>,<span class="hljs-number">250</span>],
    <span class="hljs-string">'R'</span>:[<span class="hljs-number">680</span>,<span class="hljs-number">250</span>],
}

df.dropna(inplace=<span class="hljs-literal">True</span>)

df.replace(<span class="hljs-string">'l'</span>, <span class="hljs-string">'L'</span>, inplace=<span class="hljs-literal">True</span>)
df = df[df.Goal == <span class="hljs-number">1</span>]
df[<span class="hljs-string">'Keeper_x'</span>] = df[<span class="hljs-string">'Keeper'</span>].apply(<span class="hljs-keyword">lambda</span> x: keeper_coords[x][<span class="hljs-number">0</span>])
df[<span class="hljs-string">'Keeper_y'</span>] = df[<span class="hljs-string">'Keeper'</span>].apply(<span class="hljs-keyword">lambda</span> x: keeper_coords[x][<span class="hljs-number">1</span>])

df_keeper = pd.DataFrame(df.groupby([<span class="hljs-string">'Keeper'</span>,<span class="hljs-string">'Keeper_x'</span>, <span class="hljs-string">'Keeper_y'</span>]).size()).reset_index()
df_keeper.rename(columns = {<span class="hljs-number">0</span>:<span class="hljs-string">'Number of Shots'</span>}, inplace= <span class="hljs-literal">True</span>)

show_shots(df_keeper, <span class="hljs-string">'Keeper_x'</span>, <span class="hljs-string">'Keeper_y'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-number">70</span>, <span class="hljs-string">'Keeper'</span>, [<span class="hljs-string">'Keeper'</span>, <span class="hljs-string">'Number of Shots'</span>], <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-string">'Keeper Location (Goal)'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671391712316/eULh3dkNs.png" alt class="image--center mx-auto" /></p>
<p>We can see that the keeper stays in the middle the least often (take the hint), while he saves the most shots on the left side. Top right is looking enticing now.</p>
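<p>As a sanity check on those two plots, here's a small sketch that computes the keeper's save rate per dive direction from a fresh copy of the data (re-reading the CSV so the in-place filtering above doesn't interfere, and treating any on-target shot that wasn't a goal as a save):</p>
<pre><code class="lang-py">raw = pd.read_csv('WorldCupShootouts.csv').dropna()
raw['Keeper'] = raw['Keeper'].replace('l', 'L')  # fix the stray lowercase 'l'
# Share of on-target shots that did NOT end in a goal, per side the keeper chose.
save_rate = raw[raw.OnTarget == 1].groupby('Keeper')['Goal'].apply(lambda g: 1 - g.mean())
print(save_rate.sort_values(ascending=False))
</code></pre>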
<h2 id="heading-foot-preference">Foot Preference 👟</h2>
<pre><code class="lang-py">foot_coords = {
    <span class="hljs-string">'L'</span>:[<span class="hljs-number">270</span>,<span class="hljs-number">520</span>],
    <span class="hljs-string">'R'</span>:[<span class="hljs-number">600</span>,<span class="hljs-number">520</span>],
}

df.dropna(inplace=<span class="hljs-literal">True</span>)

df.replace(<span class="hljs-string">'l'</span>, <span class="hljs-string">'L'</span>, inplace=<span class="hljs-literal">True</span>)
df[<span class="hljs-string">'Foot_x'</span>] = df[<span class="hljs-string">'Foot'</span>].apply(<span class="hljs-keyword">lambda</span> x: foot_coords[x][<span class="hljs-number">0</span>])
df[<span class="hljs-string">'Foot_y'</span>] = df[<span class="hljs-string">'Foot'</span>].apply(<span class="hljs-keyword">lambda</span> x: foot_coords[x][<span class="hljs-number">1</span>])

df_feet = pd.DataFrame(df.groupby([<span class="hljs-string">'Foot'</span>,<span class="hljs-string">'Foot_x'</span>, <span class="hljs-string">'Foot_y'</span>]).size()).reset_index()
df_feet.rename(columns = {<span class="hljs-number">0</span>:<span class="hljs-string">'Number of Shots'</span>}, inplace= <span class="hljs-literal">True</span>)

show_shots(df_feet, <span class="hljs-string">'Foot_x'</span>, <span class="hljs-string">'Foot_y'</span>, <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-number">70</span>, <span class="hljs-string">'Foot'</span>, [<span class="hljs-string">'Foot'</span>, <span class="hljs-string">'Number of Shots'</span>], <span class="hljs-string">'Number of Shots'</span>, <span class="hljs-string">'Left or Right Footed'</span>, <span class="hljs-string">'stance.jpg'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671392108976/dcAQ8fAGl.png" alt class="image--center mx-auto" /></p>
<p>Most players shoot with their right foot, which makes it far more likely for the ball to end up on the left side of the goal (zones 1, 4, and 7). This trend can be seen in our data and the corresponding plots. Left-footed players scoring in the top right will make or break this World Cup!</p>
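<p>To see that trend in numbers rather than bubbles, here's a quick cross-tab sketch (again from a fresh copy of the data), treating zones 1, 4, and 7 as the left side of the goal:</p>
<pre><code class="lang-py">raw = pd.read_csv('WorldCupShootouts.csv').dropna()
raw['LeftSide'] = raw['Zone'].astype(int).isin([1, 4, 7])            # zones 1, 4, 7 = left side of the goal
print(pd.crosstab(raw['Foot'], raw['LeftSide'], normalize='index'))  # share of each foot's shots that went left
</code></pre>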
<h1 id="heading-conclusion-post-world-cup-final-results">Conclusion [Post World Cup Final Results]</h1>
<p>Vamos Argentina! Messi and his team deserved the win after that heart-pounding 120-minute match, which ultimately ended in penalties. Don't worry, the data was updated and the graphs were redrawn. For all my data, as well as the Jupyter Notebook, you can check out the <a target="_blank" href="https://github.com/Aryan-401/WorldCupPenalties">GitHub link</a> to the repository. Cheers, and see you in LA, 2026!</p>
]]></content:encoded></item><item><title><![CDATA[Python in the Browser? What in the world is PyScript]]></title><description><![CDATA[I don't like Web Development, especially Front-end Development with HTML and CSS, so when I heard about Pyscript, my mind started wandering into a new false reality. A reality where we didn't need javascript (I never got the hang of Javascript). Toda...]]></description><link>https://blog.arygarg.me/python-in-the-browser-what-in-the-world-is-pyscript</link><guid isPermaLink="true">https://blog.arygarg.me/python-in-the-browser-what-in-the-world-is-pyscript</guid><category><![CDATA[Web Development]]></category><category><![CDATA[Python]]></category><category><![CDATA[HTML5]]></category><category><![CDATA[Frontend Development]]></category><category><![CDATA[backend]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 12 Sep 2022 11:30:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/ItGgnEXi48c/upload/1fb638f3fe5160ed2bb90aea47a541f2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I don't like Web Development, especially Front-end Development with HTML and CSS, so when I heard about Pyscript, my mind started wandering into a new false reality. A reality where we didn't need javascript (I never got the hang of Javascript). Today, we aim to understand the basics of PyScript and try to understand why it was even developed.</p>
<p>JavaScript was created in 1995 at Netscape, whose Navigator browser was then competing with the likes of NCSA Mosaic. After a very successful launch by the then browser giant, the language gained the trust of the developer community and, the next year, was handed off to an international standards organization called ECMA (the European Computer Manufacturers Association), which is responsible for the development and upkeep of the language to this day.</p>
<p>Brendan Eich created JavaScript to fill the need for a “glue language” that informal programmers and designers could use to wire components together and automate interactions. At this point in our JavaScript history, there were two dominant web browsers: Netscape Navigator (with JavaScript) and Internet Explorer (with JScript). By the time the browser world shifted and Internet Explorer became the dominant browser, JavaScript had evolved into the endorsed standard for writing interactive code that runs in a web browser, to the point where it is practically required for developing web apps today.</p>
<p>Now, almost 30 years into the active development of JavaScript, a challenger approaches. Created by Anaconda, PyScript aims to <a target="_blank" href="https://pyscript.net">bring programming to the 99%</a> by allowing users to create rich Python applications in the browser using HTML's interface and the power of Pyodide, WebAssembly (WASM), and modern web technologies. PyScript is still in heavy development, so it isn't advisable to use it in a production environment. All warnings aside, let's dive into Python in the browser.</p>
<h2 id="heading-installing-the-pyscript-framework">Installing the Pyscript Framework</h2>
<ol>
<li><p><a target="_blank" href="https://github.com/pyscript/pyscript/archive/refs/heads/main.zip">Click here</a> to download the <code>zip</code> file.</p>
</li>
<li><p>Copy and paste the following into your <code>&lt;head&gt;</code> tag</p>
</li>
</ol>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">link</span> <span class="hljs-attr">rel</span>=<span class="hljs-string">"stylesheet"</span> <span class="hljs-attr">href</span>=<span class="hljs-string">"path/to/pyscript.css"</span> /&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">defer</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"path/to/pyscript.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
</code></pre>
<p>OR</p>
<ol>
<li>Copy and paste the commands into your <code>&lt;head&gt;</code> tag</li>
</ol>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">link</span> <span class="hljs-attr">rel</span>=<span class="hljs-string">"stylesheet"</span> <span class="hljs-attr">href</span>=<span class="hljs-string">"https://pyscript.net/alpha/pyscript.css"</span> /&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">defer</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"https://pyscript.net/alpha/pyscript.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
</code></pre>
<p>Yup, it's as easy as that. You can now write Python inside your HTML file!</p>
<h2 id="heading-writing-your-first-program-in-html">Writing your first "Program" in HTML</h2>
<p>Yes, I just proved <a target="_blank" href="https://ischool.syr.edu/why-html-is-not-a-programming-language">you</a> wrong. We're gonna be programming in HTML. Let's write the iconic "Hello World" program into our IDE of choice.</p>
<p>Every IDE has a different shortcut to create the boilerplate code, so you don't end up typing it out every time. For JetBrains IDEs, we can use <code>Ctrl+J</code> to open the Template Menu and click on one of the many HTML formats available.</p>
<p>In the <code>&lt;body&gt;</code> tag, insert this little snippet of text. If you know anything about python, you'll be able to decode what this one-liner will do.</p>
<pre><code class="lang-python">&lt;py-script&gt; print(<span class="hljs-string">'Hello, World!'</span>) &lt;/py-script&gt;
</code></pre>
<p>If it wasn't clear to you, we're "printing" hello world onto the screen. Open this file using a modern browser, and after a couple of seconds of loading, you'll finally have written a program in HTML.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1662214204048/8wauOQUWW.png" alt="image.png" class="image--center mx-auto" /></p>
<h2 id="heading-using-packages-in-pyscript">Using Packages in PyScript</h2>
<p>To use non-standard packages in PyScript, we have to declare them within <code>&lt;py-env&gt;</code> tags, separating them with new lines. The simple format for declaring numpy and pandas in your HTML code would be:</p>
<pre><code class="lang-html">      <span class="hljs-tag">&lt;<span class="hljs-name">py-env</span>&gt;</span>
        - numpy
        - pandas
      <span class="hljs-tag">&lt;/<span class="hljs-name">py-env</span>&gt;</span>
</code></pre>
<p>Remember to declare this in the head tag, just below the <code>&lt;script&gt;</code> tag we used.</p>
<p>Pyscript has a lot to offer, and if you look at the GitHub repository for the project, you'll see a bunch of cool examples. One of my favorites was Mario natively in the browser using PyScript. It has the entire first level (1-1) complete with sound, the iconic Mario soundtrack, and even cool fireworks at the end.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1662293579612/TkEY_WhHA.png" alt="image.png" class="image--center mx-auto" /></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>PyScript is a cool web utility that puts the core strengths of Python into the hands of web developers without needing a dedicated framework like Flask or Django. In my experience, though, it was a tad slow, maybe because it is in the alpha stage, or maybe because it isn't optimized to run on Chromium-based browsers (tested on Brave). The latter seems unlikely, because Chromium-based browsers are the most common ones nowadays.</p>
]]></content:encoded></item><item><title><![CDATA[Visualizing Data — Women's Fashion Catalog]]></title><description><![CDATA[An Introduction to the dataset
The data set can be found here on Kaggle. It consists of 7 columns and 30758 rows. The data type of all columns are strings and contains no NULL values. The dataset does contain strings labeled as Nan, which are placeho...]]></description><link>https://blog.arygarg.me/visualizing-data-womens-fashion-catalog</link><guid isPermaLink="true">https://blog.arygarg.me/visualizing-data-womens-fashion-catalog</guid><category><![CDATA[Data Science]]></category><category><![CDATA[Python]]></category><category><![CDATA[data]]></category><category><![CDATA[Google]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 05 Sep 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/TS--uNw-JqE/upload/17ad26d58b3169ec1481e09841143f6c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-an-introduction-to-the-dataset">An Introduction to the dataset</h2>
<p>The dataset can be found <a target="_blank" href="https://www.kaggle.com/code/mohamedaminesoltani/eda-e-commerce-women-fashion/data">here</a> on Kaggle. It consists of 7 columns and 30758 rows. Every column is loaded as a string, and there are no true NULL values; instead, the dataset contains strings labeled <code>Nan</code>, which act as placeholders for <code>np.nan</code>.</p>
<p>Make sure you download the <code>.csv</code> file and move it to the working directory of your Jupyter Notebook.</p>
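<p>Before diving in, here's a quick optional sketch (not part of the original notebook) to confirm the shape, the string dtypes, and the <code>Nan</code> placeholder strings described above:</p>
<pre><code class="lang-py">import pandas as pd

raw = pd.read_csv('FashionDataset.csv')
print(raw.shape)              # (30758, 7) rows and columns
print(raw.dtypes)             # every column loads as object (i.e. strings)
print((raw == 'Nan').sum())   # count of literal "Nan" placeholder strings per column
</code></pre>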
<h1 id="heading-filtering-and-pre-processing-the-data">Filtering and Pre-processing the data</h1>
<p>Preprocessing data is always the first step in building any usable dataset. It helps eliminate all the unnecessary data that we won't be using. By preprocessing the data, we are essentially helping the program only look at relevant data.</p>
<h3 id="heading-importing-the-required-libraries">Importing the Required Libraries</h3>
<pre><code class="lang-py"><span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> seaborn <span class="hljs-keyword">as</span> sns
<span class="hljs-keyword">from</span> wordcloud <span class="hljs-keyword">import</span> WordCloud
</code></pre>
<h3 id="heading-loading-data-as-a-pandas-dataframe">Loading data as a Pandas DataFrame</h3>
<pre><code class="lang-py">dataset = pd.read_csv(<span class="hljs-string">'FashionDataset.csv'</span>)  <span class="hljs-comment"># Importing CSV file</span>
print(dataset.head())
print(<span class="hljs-string">f"Size: <span class="hljs-subst">{dataset.shape[<span class="hljs-number">0</span>]}</span>"</span>)  <span class="hljs-comment"># Number of Rows</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660380397264/oWHQwXAu4.png" alt="image.png" /></p>
<h3 id="heading-dropping-unnecessary-columns">Dropping Unnecessary Columns</h3>
<pre><code class="lang-py">data_set_trim_1 = dataset.drop([<span class="hljs-string">'Deatils'</span>, <span class="hljs-string">"Sizes"</span>, <span class="hljs-string">'Unnamed: 0'</span>], axis=<span class="hljs-number">1</span>)  <span class="hljs-comment"># Since we only care about numeric data, for now, we can remove all the data we don't need (also, yes details is spelled like that in the dataset)</span>
data_set_trim_1.head()
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660282673318/rlB-hu3fJ.png" alt="image.png" /></p>
<h3 id="heading-pre-processing-the-data">Pre-Processing the data</h3>
<pre><code class="lang-py">data_set_trim_1[<span class="hljs-string">"MRP"</span>] = data_set_trim_1[<span class="hljs-string">"MRP"</span>].str.replace(<span class="hljs-string">"Rs\n"</span>, <span class="hljs-string">""</span>)  <span class="hljs-comment"># 'Rs\n420' -&gt; '420'</span>
data_set_trim_1[<span class="hljs-string">"Category"</span>] = data_set_trim_1[<span class="hljs-string">"Category"</span>].str.replace(<span class="hljs-string">"-Women"</span>, <span class="hljs-string">""</span>)  <span class="hljs-comment"># 'Watch-Women' -&gt; 'Watch'</span>
data_set_trim_1[<span class="hljs-string">"Discount"</span>] = data_set_trim_1[<span class="hljs-string">"Discount"</span>].str.replace(<span class="hljs-string">"% off"</span>, <span class="hljs-string">""</span>)  <span class="hljs-comment"># '50% off' -&gt; '50'</span>
data_set_trim_1.head()
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660282864929/XFvy_qkvz.png" alt="image.png" /></p>
<p>At this point, I thought the data would have been converted to integers, but I was wrong: all the data was still strings. To verify this, I ran the following one-liner.</p>
<pre><code class="lang-py">type(data_set_trim_1.iloc[<span class="hljs-number">22</span>][<span class="hljs-string">"SellPrice"</span>])  <span class="hljs-comment">#Noticing that SellPrice as well as MRP and Discount are Strings and not text</span>
</code></pre>
<p>Output:</p>
<pre><code class="lang-python">str
</code></pre>
<p>Now we'll convert those string columns into integers using the <code>.apply()</code> method (with <code>pd.to_numeric</code>) and replace the string <code>Nan</code> values with actual <code>np.nan</code> values.</p>
<pre><code class="lang-py">data_set_trim_2 = data_set_trim_1.replace(<span class="hljs-string">"Nan"</span>,np.nan)  <span class="hljs-comment"># Replacing all "Nan" strings with np.NAN</span>
data_set_trim_2.dropna(inplace=<span class="hljs-literal">True</span>, subset=[<span class="hljs-string">'MRP'</span>, <span class="hljs-string">"BrandName"</span>, <span class="hljs-string">"SellPrice"</span>])
print(data_set_trim_2.dtypes)  <span class="hljs-comment"># All Columns are String DataType</span>
data_set_trim_2[[<span class="hljs-string">'MRP'</span>, <span class="hljs-string">'SellPrice'</span>, <span class="hljs-string">"Discount"</span>]] = data_set_trim_2[[<span class="hljs-string">'MRP'</span>, <span class="hljs-string">'SellPrice'</span>, <span class="hljs-string">"Discount"</span>]].apply(pd.to_numeric)  <span class="hljs-comment"># Changing Required Columns to integer</span>
print(data_set_trim_2.dtypes)
</code></pre>
<p>Output</p>
<pre><code class="lang-python">BrandName    object
MRP          object
SellPrice    object
Discount     object
Category     object
dtype: object
BrandName    object
MRP           int64
SellPrice     int64
Discount      int64
Category     object
dtype: object
</code></pre>
<h3 id="heading-finalising-the-data">Finalising the data</h3>
<pre><code class="lang-py">f_data = data_set_trim_2  <span class="hljs-comment"># Setting Final Data</span>
print(<span class="hljs-string">f"Size: <span class="hljs-subst">{f_data.shape[<span class="hljs-number">0</span>]}</span>"</span>)
</code></pre>
<p>We're left with <code>22550</code> rows' worth of complete data. Now we can start visualizing it ;)</p>
<h2 id="heading-accessing-basic-information">Accessing Basic Information</h2>
<p>Let's start with something Simple. How many unique brands do we have in our dataset?</p>
<pre><code class="lang-py">f_data.nunique()[<span class="hljs-string">"BrandName"</span>]  <span class="hljs-comment"># Number of Brands in Dataset</span>
</code></pre>
<p>Output:</p>
<pre><code class="lang-python"><span class="hljs-number">177</span>
</code></pre>
<p>That's a lot of brands! Let's get a deeper look at the data using the <code>.describe()</code> command</p>
<pre><code class="lang-py">f_data.describe().T
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660284465754/8pJbpuu_d.png" alt="image.png" /></p>
<p>What about the most expensive Items?</p>
<pre><code class="lang-py">f_data[f_data.SellPrice == f_data.SellPrice.max()]  <span class="hljs-comment"># Most Expensive Item(s)</span>
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660284132392/IB33YWtrH.png" alt="image.png" /></p>
<p>(Yikes! Those are some expensive watches)</p>
<p>What about the most expensive brands? Let's figure out the average price of any article from a brand</p>
<pre><code class="lang-py">f_data.groupby(<span class="hljs-string">'BrandName'</span>)[<span class="hljs-string">'SellPrice'</span>].mean().sort_values(ascending=<span class="hljs-literal">False</span>).head()  <span class="hljs-comment"># Mean Price of every Brand</span>
</code></pre>
<p>Output:</p>
<pre><code class="lang-python">BrandName
just cavalli      <span class="hljs-number">18309.600000</span>
coach             <span class="hljs-number">12616.769231</span>
versus            <span class="hljs-number">11555.600000</span>
ted baker         <span class="hljs-number">10031.111111</span>
emporio armani     <span class="hljs-number">9423.509804</span>
Name: SellPrice, dtype: float64
</code></pre>
<p>You can similarly find the cheapest brands by changing <code>ascending=False</code> to <code>True</code>.</p>
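<p>For instance, a quick sketch of that flipped version, showing the most affordable brands on average:</p>
<pre><code class="lang-py">f_data.groupby('BrandName')['SellPrice'].mean().sort_values(ascending=True).head()  # cheapest brands by mean price
</code></pre>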
<h2 id="heading-real-visualizations-using-matplotlib-and-seaborn">Real Visualizations using Matplotlib and Seaborn</h2>
<p>Let's start by plotting a heatmap of our data.</p>
<pre><code class="lang-py">sns.heatmap(f_data.corr(),annot=<span class="hljs-literal">True</span>,cmap=<span class="hljs-string">'coolwarm'</span>,linewidths=<span class="hljs-number">0.2</span>)
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660284776400/cDLJfnPVD.png" alt="image.png" /></p>
<p>What about a plot that shows us the number of items in each category?</p>
<pre><code class="lang-py">plt.figure(figsize=(<span class="hljs-number">20</span>,<span class="hljs-number">7</span>))  <span class="hljs-comment">#setting the plot size</span>
sns.countplot(f_data[<span class="hljs-string">"Category"</span>])
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660284847338/m6IKw-jwm.png" alt="image.png" /></p>
<p>Here is a scatter plot of all the prices per category</p>
<pre><code class="lang-py">plt.figure(figsize=(<span class="hljs-number">20</span>,<span class="hljs-number">7</span>))
sns.scatterplot(f_data[<span class="hljs-string">'MRP'</span>],f_data[<span class="hljs-string">'SellPrice'</span>],hue=f_data[<span class="hljs-string">'Category'</span>])
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660285468095/7anRo55AH.png" alt="image.png" /></p>
<p>Let's figure out which category is the most discounted</p>
<pre><code class="lang-py"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">modify</span>(<span class="hljs-params">d</span>):</span>
    <span class="hljs-keyword">if</span> int(d) <span class="hljs-keyword">in</span> range(<span class="hljs-number">0</span>,<span class="hljs-number">41</span>):
        <span class="hljs-keyword">return</span> <span class="hljs-string">'0-40%'</span>
    <span class="hljs-keyword">elif</span> int(d) <span class="hljs-keyword">in</span> range(<span class="hljs-number">41</span>,<span class="hljs-number">71</span>):
        <span class="hljs-keyword">return</span> <span class="hljs-string">'40-70%'</span>
    <span class="hljs-keyword">elif</span> int(d) <span class="hljs-keyword">in</span> range(<span class="hljs-number">71</span>,<span class="hljs-number">101</span>):
        <span class="hljs-keyword">return</span> <span class="hljs-string">'&lt;70%'</span>
f_data[<span class="hljs-string">'D_range'</span>] = f_data[<span class="hljs-string">'Discount'</span>].apply(modify)  <span class="hljs-comment"># adding a new column</span>
f_data.head()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660285572692/Y1UZ5JhBX.png" alt="image.png" /></p>
<pre><code class="lang-py">plt.figure(figsize=(<span class="hljs-number">20</span>,<span class="hljs-number">5</span>))
sns.countplot(f_data[<span class="hljs-string">'Category'</span>],hue=f_data[<span class="hljs-string">'D_range'</span>])
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660285630734/YcSj7qNRA.png" alt="image.png" /></p>
<h1 id="heading-fun-with-wordclouds">Fun with WordClouds</h1>
<p>We've had some interesting plots, but now let's have some fun with ✨word clouds✨.</p>
<p>Since the WordCloud module takes a single string as input, we are going to join the various values together.</p>
<pre><code class="lang-py">textCategory = <span class="hljs-string">" "</span>.join(i <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> f_data[<span class="hljs-string">'Category'</span>])
textCompany = <span class="hljs-string">" "</span>.join(i <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> f_data[<span class="hljs-string">'BrandName'</span>])
</code></pre>
<pre><code class="lang-py">word_cloud_company = WordCloud(collocations=<span class="hljs-literal">False</span>, background_color=<span class="hljs-string">'black'</span>, width=<span class="hljs-number">1920</span>, height=<span class="hljs-number">1080</span>).generate(textCompany)
word_cloud_company.to_file(<span class="hljs-string">'company.png'</span>)

word_cloud_category = WordCloud(collocations=<span class="hljs-literal">False</span>, background_color=<span class="hljs-string">'black'</span>, width=<span class="hljs-number">1920</span>, height=<span class="hljs-number">1080</span>).generate(textCategory)
word_cloud_category.to_file(<span class="hljs-string">'category.png'</span>)
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660380939668/MkN9I9ojR.png" alt="company.png" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660380945084/lE5KBHDHE.png" alt="category.png" /></p>
<p>How about using the Sizes column we dropped while preprocessing? Let's make word clouds out of the Sizes and Details columns!</p>
<p>First, processing the data:</p>
<pre><code class="lang-py">only_size = dataset[<span class="hljs-string">"Sizes"</span>]
only_size = only_size.replace(<span class="hljs-string">"Nan"</span>,np.nan)  <span class="hljs-comment"># Replacing all "Nan" strings with np.NAN</span>
only_size.dropna(inplace=<span class="hljs-literal">True</span>)
only_size = only_size.str.replace(<span class="hljs-string">"Size:"</span>, <span class="hljs-string">""</span>)
only_size = only_size.str.replace(<span class="hljs-string">","</span>, <span class="hljs-string">" "</span>)
only_size.head()
</code></pre>
<pre><code class="lang-py">textSize = <span class="hljs-string">" "</span>.join(i <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> only_size)
word_cloud_size = WordCloud(collocations=<span class="hljs-literal">False</span>, background_color=<span class="hljs-string">'black'</span>, width=<span class="hljs-number">1920</span>, height=<span class="hljs-number">1080</span>).generate(textSize)
word_cloud_size.to_file(<span class="hljs-string">'size.png'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660381595693/C1kfWTJKn.png" alt="size.png" /></p>
<p>Similarly, we can do the same thing with the Details column.</p>
<pre><code class="lang-py">only_details = dataset[<span class="hljs-string">"Deatils"</span>]
only_details = only_details.replace(<span class="hljs-string">"Nan"</span>,np.nan)  <span class="hljs-comment"># Replacing all "Nan" strings with np.NAN</span>
only_details.dropna(inplace=<span class="hljs-literal">True</span>)
only_details = only_details.str.replace(<span class="hljs-string">"Size:"</span>, <span class="hljs-string">""</span>)
only_details = only_details.str.replace(<span class="hljs-string">","</span>, <span class="hljs-string">" "</span>)
only_details.head()
</code></pre>
<pre><code class="lang-py">textDetail = <span class="hljs-string">" "</span>.join(i <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> only_details)
word_cloud_detail = WordCloud(collocations=<span class="hljs-literal">False</span>, background_color=<span class="hljs-string">'black'</span>, width=<span class="hljs-number">1920</span>, height=<span class="hljs-number">1080</span>).generate(textDetail)
word_cloud_detail.to_file(<span class="hljs-string">'detail.png'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660381647839/9sTkhiukM.png" alt="detail.png" /></p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>Well, that's all, folks! I don't really have a conclusion, but DATA IS SO COOL!</p>
<p>You can find the Google Colab Link <a target="_blank" href="https://colab.research.google.com/drive/1GOPw9ZLgMxADKj2D3nDdnxzEf115D6jD?usp=sharing">here</a></p>
]]></content:encoded></item><item><title><![CDATA[Getting Started with Machine Learning]]></title><description><![CDATA[Machine Learning! Artificial Intelligence! Computational Analysis! All these buzz words surround the world of tech. These buzz words have created a hardened cast over the computer science community. As part of the hype, I learnt the basics of Machine...]]></description><link>https://blog.arygarg.me/getting-started-with-machine-learning</link><guid isPermaLink="true">https://blog.arygarg.me/getting-started-with-machine-learning</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Python]]></category><category><![CDATA[pandas]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[learning]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 29 Aug 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1658259844225/dwDbS4Orx.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Machine Learning</strong>! <strong>Artificial Intelligence</strong>! <strong>Computational Analysis</strong>! All these buzz words surround the world of tech. These buzz words have created a hardened cast over the computer science community. As part of the hype, I learnt the basics of Machine Learning using <code>pandas</code> and <code>sklearn</code>. We would also use <a target="_blank" href="https://www.kaggle.com/datasets/dansbecker/melbourne-housing-snapshot/download?datasetVersionNumber=5">this</a> dataset from Kaggle to interpret and understand our data. Enough small-talk. Let's get coding!</p>
<h1 id="heading-step-1-create-a-virtual-environment-in-python">Step 1: Create a Virtual Environment in Python</h1>
<p>Data scientists usually use Jupyter Notebooks for compiling and interpreting data, so make sure you have Jupyter installed on your local machine.</p>
<p>Or, in case you want to work in a standard Python Environment:</p>
<p>Follow the steps in <a target="_blank" href="https://aryan401.hashnode.dev/virtual-environments-youre-gonna-need-them">this</a> article, and then continue with this tutorial.</p>
<h1 id="heading-step-2-download-the-required-dependencies">Step 2: Download the Required Dependencies</h1>
<p>For this initial tutorial, we will be starting with Pandas. Pandas is the primary tool data scientists use for exploring and manipulating data. Most people abbreviate pandas in their code as <code>pd</code>. We do this with the following command:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd  <span class="hljs-comment">#pip install pandas</span>
</code></pre>
<p>The most important part of the Pandas library is the DataFrame. A DataFrame holds the type of data you might think of as a table. This is similar to a sheet in Excel.</p>
<p>Pandas has powerful methods for most things you'll want to do with this type of data and could be a mini-series on its own.</p>
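<p>If you've never seen one, here is a tiny illustrative sketch of a DataFrame built by hand (the values below are made up, not from the Melbourne dataset):</p>
<pre><code class="lang-python">import pandas as pd

# A DataFrame is a table: named columns, one row per record.
toy = pd.DataFrame({
    'Suburb': ['Abbotsford', 'Airport West'],
    'Rooms': [2, 3],
    'Price': [1000000.0, 850000.0],
})
print(toy)
print(toy.describe())  # the same kind of summary we'll run on the real dataset below
</code></pre>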
<p>We would also be working with sklearn further, so it wouldn't hurt to install it now.</p>
<pre><code class="lang-bash">pip install scikit-learn
</code></pre>
<h1 id="heading-the-basics-of-pandas">The Basics of Pandas</h1>
<p>Pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool built on top of the Python programming language. It is used by millions of data scientists and is entirely open source!</p>
<h2 id="heading-reading-a-csv-file">Reading a CSV file</h2>
<p>Unzipping <a target="_blank" href="https://www.kaggle.com/datasets/dansbecker/melbourne-housing-snapshot/download?datasetVersionNumber=5">this</a> file, we should get a file named <code>melb_data.csv</code>. Make sure it is in the same folder as your working file, then attempt to read it with the following lines of code.</p>
<pre><code class="lang-python">file_path = <span class="hljs-string">'melb_data.csv'</span>
melbourne_data = pd.read_csv(file_path)
melbourne_data.describe() <span class="hljs-comment">#should give information about independent columns in the dataset</span>
</code></pre>
<h2 id="heading-some-basic-pandas-function">Some basic Panda's Function</h2>
<p>To get all columns, we use <code>columns</code></p>
<pre><code class="lang-python">print(melbourne_data.columns)
</code></pre>
<p>Output:</p>
<pre><code class="lang-python">Index([<span class="hljs-string">'Suburb'</span>, <span class="hljs-string">'Address'</span>, <span class="hljs-string">'Rooms'</span>, <span class="hljs-string">'Type'</span>, <span class="hljs-string">'Price'</span>, <span class="hljs-string">'Method'</span>, <span class="hljs-string">'SellerG'</span>,
       <span class="hljs-string">'Date'</span>, <span class="hljs-string">'Distance'</span>, <span class="hljs-string">'Postcode'</span>, <span class="hljs-string">'Bedroom2'</span>, <span class="hljs-string">'Bathroom'</span>, <span class="hljs-string">'Car'</span>,
       <span class="hljs-string">'Landsize'</span>, <span class="hljs-string">'BuildingArea'</span>, <span class="hljs-string">'YearBuilt'</span>, <span class="hljs-string">'CouncilArea'</span>, <span class="hljs-string">'Lattitude'</span>,
       <span class="hljs-string">'Longtitude'</span>, <span class="hljs-string">'Regionname'</span>, <span class="hljs-string">'Propertycount'</span>],
      dtype=<span class="hljs-string">'object'</span>)
</code></pre>
<p>The current dataset has missing values (some houses for which some variables weren't recorded). We will learn to handle missing values properly in a later tutorial, so for now we will take the simplest option and drop those houses from our data. Don't worry about this much for now; the code is:</p>
<pre><code class="lang-python">melbourne_data = melbourne_data.dropna(axis=<span class="hljs-number">0</span>)
</code></pre>
<p>Most datasets have thousands of rows and tens of columns, so it is not practical to print the entire DataFrame just to get a glimpse of the data it contains. We use the <code>head(x)</code> and <code>tail(x)</code> functions to get the first (or last) <code>x</code> records from the DataFrame. By default, they return five rows each.</p>
<pre><code class="lang-python">melbourne_data.head()  <span class="hljs-comment"># First 5 Rows</span>
melbourne_data.tail()  <span class="hljs-comment"># Last 5 Rows</span>
</code></pre>
<h2 id="heading-choosing-features">Choosing "Features"</h2>
<p>The columns fed into our model are called "features". In our case, those are the columns used to determine the home price.</p>
<pre><code class="lang-python">melbourne_features = [<span class="hljs-string">'Rooms'</span>, <span class="hljs-string">'Bathroom'</span>, <span class="hljs-string">'Landsize'</span>, <span class="hljs-string">'Car'</span>, <span class="hljs-string">'Postcode'</span>]
</code></pre>
<p>By convention, this variable is called <code>X</code>.</p>
<pre><code class="lang-python">X = melbourne_data[melbourne_features]
</code></pre>
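<p>The model also needs something to predict. By convention, the prediction target is called <code>y</code>; for us, that's the home price. A minimal line, using the <code>Price</code> column we saw earlier:</p>
<pre><code class="lang-python">y = melbourne_data.Price  # the prediction target: the value we want the model to learn
</code></pre>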
<p>There are four major steps to making and using a model: </p>
<ul>
<li>Defining what kind of model you are going to be using</li>
<li>Fitting your data into the model</li>
<li>Predicting data using your model</li>
<li>Determining how accurate the model is using error analysis techniques</li>
</ul>
<h2 id="heading-writing-our-first-model">Writing our First Model</h2>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.tree <span class="hljs-keyword">import</span> DecisionTreeRegressor

melbourne_model = DecisionTreeRegressor(random_state=<span class="hljs-number">1</span>)  <span class="hljs-comment"># Defining what type of model we are going to be using</span>
</code></pre>
<p><code>random_state</code> is used to ensure the same results can be found on each run since most models allow randomness in their training. The number does not meaningfully change the results of the model.</p>
<pre><code class="lang-python">melbourne_model.fit(X, y)  <span class="hljs-comment"># Fitting your data into the model</span>
</code></pre>
<p>Now to predict some data, we can use the <code>.predict()</code> attribute</p>
<pre><code class="lang-python">print(melbourne_model.predict(X.head()))
print(y.head())  <span class="hljs-comment"># the actual prices, for comparison</span>
</code></pre>
<p>You'd notice that the predicted prices match the actual ones, because the model is being evaluated on the very same data it was trained on; these are called in-sample predictions. The danger is that the model can latch onto patterns that only exist in the training data. Imagine that, in our sample, houses with green doors happened to sell for more: the model would learn that green doors mean higher prices, even though, in a large enough market, door color is unrelated to the retail value of a property. Two otherwise similar houses would then get different predicted prices purely because of a detail that is hardly relevant.</p>
<p>To combat this, we split our data into two parts, training data and validation data, and calculate the mean absolute error on the validation set.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state = <span class="hljs-number">0</span>)

melbourne_model = DecisionTreeRegressor()  <span class="hljs-comment"># Defining our model</span>

melbourne_model.fit(train_X, train_y)  <span class="hljs-comment"># Fitting our model</span>

val_predictions = melbourne_model.predict(val_X)  <span class="hljs-comment"># predictions!</span>
print(mean_absolute_error(val_y, val_predictions))  <span class="hljs-comment"># The greater the mean_absolute_error, the worse the model performance is.</span>
</code></pre>
<h2 id="heading-and-there-you-go">And there you go</h2>
<p>You've created your first data model. Try experimenting with the data and finding ways to reduce the <code>mean_absolute_error</code>; the lower it is, the better the model.</p>
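<p>As one possible starting point (a sketch, not something covered above), you can limit the size of the decision tree and compare validation errors, reusing the train/validation split from the previous section:</p>
<pre><code class="lang-python">from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

# Try a few tree sizes and see which gives the lowest validation error.
for max_leaf_nodes in (5, 50, 500, 5000):
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    preds = model.predict(val_X)
    print(max_leaf_nodes, mean_absolute_error(val_y, preds))
</code></pre>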
]]></content:encoded></item><item><title><![CDATA[Arrays — Beginning with Data Structues]]></title><description><![CDATA[Ask a Computer Major what they need to focus on the most, and 9/10 of them would say that they need a firmer grasp of Data Structures and Algorithms. As a sophomore in college, I've also started the tedious task of understanding and implementing vari...]]></description><link>https://blog.arygarg.me/arrays-beginning-with-data-structues</link><guid isPermaLink="true">https://blog.arygarg.me/arrays-beginning-with-data-structues</guid><category><![CDATA[data structures]]></category><category><![CDATA[array]]></category><category><![CDATA[C++]]></category><category><![CDATA[General Programming]]></category><category><![CDATA[fundamentals]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 22 Aug 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/jLwVAUtLOAQ/upload/v1660043227062/21_NlIY_h.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Ask a Computer Major what they need to focus on the most, and 9/10 of them would say that they need a firmer grasp of Data Structures and Algorithms. As a sophomore in college, I've also started the tedious task of understanding and implementing various Data Structures. I'm going to be using C++ for my implementations but would also be providing pseudo-code for other languages!</p>
<p>A quick reminder that I'm just starting out with this, so my solutions might not be the most optimized. I would love to learn more in the comments!</p>
<h1 id="heading-arrays">Arrays</h1>
<p>To sum it up in as few lines as possible, an array is a collection of objects of the same data type. In C++, once an array's size has been defined, it cannot be changed unless we use the concept of dynamic memory allocation. This article will focus on the insertion, deletion, and rotation of arrays.</p>
<h2 id="heading-traversing-an-array">Traversing an Array</h2>
<p>Traversing an Array refers to iterating through the elements of the array. Its time complexity would be <code>O(n)</code> for an array of size <code>n</code>.</p>
<h3 id="heading-c">C++</h3>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;iostream&gt;</span></span>
<span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span>;

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span>{
<span class="hljs-keyword">int</span> size = <span class="hljs-number">5</span>;
<span class="hljs-keyword">int</span> arr[size] = {<span class="hljs-number">7</span>,<span class="hljs-number">3</span>,<span class="hljs-number">2</span>,<span class="hljs-number">5</span>,<span class="hljs-number">1</span>};
<span class="hljs-keyword">for</span>(<span class="hljs-keyword">int</span> i = <span class="hljs-number">0</span>; i &lt; size; i++){
    <span class="hljs-built_in">cout</span> &lt;&lt; i &lt;&lt; <span class="hljs-string">"th Element is: "</span> &lt;&lt; arr[i] &lt;&lt; <span class="hljs-built_in">endl</span>;
    }

<span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h3 id="heading-psuedo-code">Psuedo-Code</h3>
<pre><code class="lang-python">INT size = <span class="hljs-number">5</span>
declare INT array of Length size
For i = <span class="hljs-number">1</span> to <span class="hljs-number">5</span>
    OUTPUT i-th element of array
EndFor
</code></pre>
<h3 id="heading-output">Output:</h3>
<pre><code class="lang-python"><span class="hljs-number">0</span>th Element <span class="hljs-keyword">is</span>: <span class="hljs-number">7</span>
<span class="hljs-number">1</span>th Element <span class="hljs-keyword">is</span>: <span class="hljs-number">3</span>
<span class="hljs-number">2</span>th Element <span class="hljs-keyword">is</span>: <span class="hljs-number">2</span>
<span class="hljs-number">3</span>th Element <span class="hljs-keyword">is</span>: <span class="hljs-number">5</span>
<span class="hljs-number">4</span>th Element <span class="hljs-keyword">is</span>: <span class="hljs-number">1</span>
</code></pre>
<h2 id="heading-inserting-an-element-into-an-array">Inserting an element into an Array</h2>
<p>Before we start, we will assume that an array of sufficient length has been created in advance, so we do not need to use pointers or dynamic memory allocation. This section will cover:</p>
<ul>
<li><p>Inserting at the End of an Array</p>
</li>
<li><p>Inserting at the beginning of the Array</p>
</li>
<li><p>Inserting at ANY position of the Array</p>
</li>
</ul>
<p>Also, some constants we would be declaring are given below:</p>
<pre><code class="lang-md">n -&gt; total number of elements in the array
item -&gt; element to be added
position -&gt; position at which item is supposed to be inserted
a -&gt; array with sufficient space available for the new item
</code></pre>
<h3 id="heading-c-1">C++</h3>
<h4 id="heading-inserting-at-the-end-of-an-array">Inserting at the End of an Array</h4>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;iostream&gt;</span></span>
<span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span>;

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span>{
    a[n] = item;
    n += <span class="hljs-number">1</span>;
    <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h4 id="heading-inserting-at-the-beginning-of-an-array">Inserting at the Beginning of an Array</h4>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;iostream&gt;</span></span>
<span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span>;

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span>{
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> i = n<span class="hljs-number">-1</span>; i &gt; <span class="hljs-number">0</span>; i--){
        a[i+<span class="hljs-number">1</span>] = a[i];
    }
    a[pos<span class="hljs-number">-1</span>] = item;
<span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h4 id="heading-inserting-at-any-position-of-an-array">Inserting at ANY position of an Array</h4>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;iostream&gt;</span></span>
<span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span>;

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span>{
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> i = n<span class="hljs-number">-1</span>; i &gt;= pos; i--){
        a[i+<span class="hljs-number">1</span>] = a[i];
    }
    a[pos<span class="hljs-number">-1</span>] = item;
<span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h3 id="heading-pseudo-code">Pseudo-Code</h3>
<h4 id="heading-inserting-at-the-end-of-the-array">Inserting at the End of the Array</h4>
<pre><code class="lang-python">SET the n+<span class="hljs-number">1</span>th element of the array = item
</code></pre>
<h4 id="heading-inserting-at-the-beginning-of-the-array">Inserting at the beginning of the Array</h4>
<pre><code class="lang-python">For int i = last element index to first element decreasing by <span class="hljs-number">1</span>
    SET (i + <span class="hljs-number">1</span>)-th element of Array = i-th element of the Array
EndFor
SET first element of Array = item
</code></pre>
<h4 id="heading-inserting-at-any-position-of-the-array">Inserting at ANY position of the Array</h4>
<pre><code class="lang-python">For int i = last element index to pos-th element decreasing by <span class="hljs-number">1</span>
    SET (i + <span class="hljs-number">1</span>)-th element of Array = i-th element of the Array
EndFor
SET (pos<span class="hljs-number">-1</span>)-th element of Array = item
</code></pre>
<h4 id="heading-interpretation">Interpretation</h4>
<p>The Best Case for this algorithm would be when we need to add an element to the end of the array, as it can be achieved in <code>O(1)</code>. Meanwhile, the Worst Case Scenario would be when we need to insert an element to the first index, which would have a time complexity of <code>O(n)</code>.</p>
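<p>If you want something you can actually run and poke at, here is a small Python rendition of the "insert at any position" logic above (Python purely to keep the sketch short; it is not part of the C++ snippets, and <code>pos</code> is treated as a 1-based position as in the pseudo-code):</p>
<pre><code class="lang-python">def insert_at(a, n, pos, item):
    # a holds n elements and has at least one spare slot at the end;
    # pos is a 1-based position, so the new item lands at index pos - 1
    for i in range(n - 1, pos - 2, -1):  # shift a[pos-1..n-1] one slot to the right
        a[i + 1] = a[i]
    a[pos - 1] = item
    return n + 1  # new element count

a = [7, 3, 2, 5, 1, None]   # one spare slot at the end
n = insert_at(a, 5, 1, 9)   # worst case: inserting at the beginning costs O(n) shifts
print(a)                    # [9, 7, 3, 2, 5, 1]
</code></pre>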
<h2 id="heading-deleting-elements-from-an-array">Deleting Elements from an Array</h2>
<p>As important as inserting elements directly into an array, we should also be able to remove them. We would be using the same constants as in the above section and have three sub-sections here.</p>
<ul>
<li><p>Deleting the Last Element of the Array</p>
</li>
<li><p>Deleting the First Element of the Array</p>
</li>
<li><p>Deleting ANY position element of the Array</p>
</li>
</ul>
<h3 id="heading-c-2">C++</h3>
<h4 id="heading-deleting-the-last-element-of-the-array">Deleting the Last Element of the Array</h4>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;iostream&gt;</span></span>
<span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span>;

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span>{
    a[n<span class="hljs-number">-1</span>] = <span class="hljs-number">0</span>;
    n = n<span class="hljs-number">-1</span>;
<span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h4 id="heading-deleting-the-first-element-of-the-array">Deleting the First Element of the Array</h4>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;iostream&gt;</span></span>
<span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span>;

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span>{
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> i = <span class="hljs-number">1</span>; i &lt; n; i++){
        a[i<span class="hljs-number">-1</span>] = a[i];
    }
    n=n<span class="hljs-number">-1</span>;
<span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h4 id="heading-deleting-any-position-element-from-the-array">Deleting ANY position element from the Array</h4>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;iostream&gt;</span></span>
<span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span>;

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span>{
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> i = pos; i &lt; n; i++){
        a[i<span class="hljs-number">-1</span>] = a[i];
    }
    n=n<span class="hljs-number">-1</span>;
<span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h3 id="heading-pseudo-code-1">Pseudo-Code</h3>
<h4 id="heading-deleting-the-last-element-of-the-array-1">Deleting the Last Element of the Array</h4>
<pre><code class="lang-md">SET the last element of the Array to Zero
</code></pre>
<h4 id="heading-deleting-the-first-element-of-the-array-1">Deleting the First Element of the Array</h4>
<pre><code class="lang-md">For i = to n
  Array's i-1-th element = Array's i-th element
EndFor
SET n = n - 1
</code></pre>
<h4 id="heading-deleting-any-position-element-from-the-array-1">Deleting ANY position element from the Array</h4>
<pre><code class="lang-md">For i =pos to n
  Array's i-1-th element = Array's i-th element
EndFor
SET n = n - 1
</code></pre>
<h4 id="heading-interpretation-1">Interpretation</h4>
<p>The Best Case for this algorithm would be when we need to remove an element from the last index of the array, as it can be achieved in <code>O(1)</code>. Meanwhile, the Worst Case Scenario would be when we need to remove an element from the beginning, which would have a time complexity of <code>O(n)</code>.</p>
<h2 id="heading-rotation-of-an-array">Rotation of an Array</h2>
<p>Rotation of an array refers to changing the order of an array by shifting the elements by <code>m</code> spaces either on the left or the right.</p>
<p>Example via: <a target="_blank" href="https://www.geeksforgeeks.org/array-rotation/">GeeksForGeeks</a></p>
<pre><code class="lang-python">Input: arr[] = {<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>, <span class="hljs-number">7</span>}, d = <span class="hljs-number">2</span>
Output: <span class="hljs-number">3</span> <span class="hljs-number">4</span> <span class="hljs-number">5</span> <span class="hljs-number">6</span> <span class="hljs-number">7</span> <span class="hljs-number">1</span> <span class="hljs-number">2</span>
</code></pre>
<pre><code class="lang-python">Input: arr[] = {<span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>, <span class="hljs-number">7</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span>}, d=<span class="hljs-number">2</span>
Output: <span class="hljs-number">5</span> <span class="hljs-number">6</span> <span class="hljs-number">7</span> <span class="hljs-number">1</span> <span class="hljs-number">2</span> <span class="hljs-number">3</span> <span class="hljs-number">4</span>
</code></pre>
<h3 id="heading-c-3">C++</h3>
<pre><code class="lang-cpp"><span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span>;

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">gcd</span><span class="hljs-params">(<span class="hljs-keyword">int</span> a, <span class="hljs-keyword">int</span> b)</span></span>{
   <span class="hljs-keyword">if</span> (b == <span class="hljs-number">0</span>)
     <span class="hljs-keyword">return</span> a;
   <span class="hljs-keyword">else</span>
     <span class="hljs-keyword">return</span> gcd(b, a % b);
}

<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">array_left_rotate</span><span class="hljs-params">(<span class="hljs-keyword">int</span> arr[], <span class="hljs-keyword">int</span> d, <span class="hljs-keyword">int</span> n)</span></span>{
   <span class="hljs-keyword">int</span> i, j, k, temp;
   <span class="hljs-keyword">for</span> (i = <span class="hljs-number">0</span>; i &lt; gcd(d, n); i++){
     temp = arr[i];
     j = i;
     <span class="hljs-keyword">while</span> (<span class="hljs-number">1</span>) {
       k = j + d;
       <span class="hljs-keyword">if</span> (k &gt;= n)
         k = k - n;
       <span class="hljs-keyword">if</span> (k == i)
         <span class="hljs-keyword">break</span>;
       arr[j] = arr[k];
       j = k;
 }
   arr[j] = temp;
   }
}

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span>{
 <span class="hljs-keyword">int</span> arr[] = { <span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>, <span class="hljs-number">7</span> };
 <span class="hljs-keyword">int</span> n = <span class="hljs-keyword">sizeof</span>(arr) / <span class="hljs-keyword">sizeof</span>(arr[<span class="hljs-number">0</span>]);
 <span class="hljs-built_in">cout</span>&lt;&lt;<span class="hljs-string">"\nArray elements before rotating : \n"</span>;
 <span class="hljs-keyword">for</span>(<span class="hljs-keyword">int</span> i = <span class="hljs-number">0</span>; i &lt; n; i++){
   <span class="hljs-built_in">cout</span>&lt;&lt;arr[i]&lt;&lt;<span class="hljs-string">"\t"</span>;
 }
 <span class="hljs-keyword">int</span> no_of_rotations = <span class="hljs-number">1</span>;
 array_left_rotate(arr, no_of_rotations, n);
 <span class="hljs-built_in">cout</span>&lt;&lt;<span class="hljs-string">"\n\nArray elements after rotating : \n"</span>;
 <span class="hljs-keyword">for</span>(<span class="hljs-keyword">int</span> i = <span class="hljs-number">0</span>; i &lt; n; i++)
 {
 <span class="hljs-built_in">cout</span>&lt;&lt;arr[i]&lt;&lt;<span class="hljs-string">"\t"</span>; <span class="hljs-comment">// Printing the array elements after rotation of elements</span>
 }
 <span class="hljs-built_in">cout</span>&lt;&lt;<span class="hljs-string">"\n"</span>;
 <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h3 id="heading-pseudo-code-2">Pseudo-Code</h3>
<pre><code class="lang-md"><span class="hljs-bullet">1.</span> divide the array into M sets, where M = GCD (numElements, rotationNumber), and then rotate the elements in each set.
<span class="hljs-bullet">2.</span> The number of numElements of the array and rotationNumber to be made to the array, the GCD (numElements, rotationNumber) number of blocks are made.
<span class="hljs-bullet">3.</span> In each block, shifting will occur to the block's corresponding elements.
<span class="hljs-bullet">4.</span> After all the blocks' elements are shifted, the array will be rotated for the given number of times.
</code></pre>
<h4 id="heading-interpretation-2">Interpretation</h4>
<p>This juggling method is not the most optimized way of rotating an array, and we have only rotated the array to the left here. There are other approaches, such as the well-known reversal algorithm or a recursive solution; a short sketch of the reversal idea follows below.</p>
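<p>For reference, here is a minimal sketch of that reversal idea (written in Python just to keep it short; it is not part of the C++ program above): reverse the first <code>d</code> elements, reverse the rest, then reverse the whole array, giving an in-place rotation in O(n) time with O(1) extra space.</p>
<pre><code class="lang-python">def rotate_left(arr, d):
    # Left-rotate arr by d positions using the reversal algorithm
    n = len(arr)
    d %= n  # handle d larger than n

    def reverse(lo, hi):
        # Reverse arr[lo..hi] in place
        while lo &lt; hi:
            arr[lo], arr[hi] = arr[hi], arr[lo]
            lo += 1
            hi -= 1

    reverse(0, d - 1)   # reverse the first d elements
    reverse(d, n - 1)   # reverse the remaining elements
    reverse(0, n - 1)   # reverse the whole array
    return arr

print(rotate_left([1, 2, 3, 4, 5, 6, 7], 2))  # [3, 4, 5, 6, 7, 1, 2]
</code></pre>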
]]></content:encoded></item><item><title><![CDATA[So I tried Carbon...]]></title><description><![CDATA[First of all, Happy 75th Independence Day, India!!! 🇮🇳🇮🇳🇮🇳

Carbon (/ˈkɑːb(ə)n/) noun.
The chemical element of atomic number 6, a non-metal with two main forms (diamond and graphite), occurs in impure form in charcoal, soot, and coal.

Carbon i...]]></description><link>https://blog.arygarg.me/carbon</link><guid isPermaLink="true">https://blog.arygarg.me/carbon</guid><category><![CDATA[programming languages]]></category><category><![CDATA[C++]]></category><category><![CDATA[newbie]]></category><category><![CDATA[code]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 15 Aug 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/BR6lrzCPYPk/upload/v1660201167556/je7Wtf-QT.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>First of all, Happy 75th Independence Day, India!!! 🇮🇳🇮🇳🇮🇳</p>
<blockquote>
<p>Carbon <em>(/ˈkɑːb(ə)n/)</em> <em>noun.</em></p>
<p>The chemical element of atomic number 6, a non-metal with two main forms (diamond and graphite), occurs in impure form in charcoal, soot, and coal.</p>
</blockquote>
<p>Carbon isn't just atomic number 6, haunting us in General Organic Chemistry, but is now also a programming language developed by Google to eventually be used over the legendary (and age-old) language C++ <em>(dramatic music intensifies)</em>. With the help of a few straightforward programs written in this article, we will try to understand and learn the language. We would also try to understand why Google (did I forget to mention Google made the language?) even needed to create Carbon and why it might be time for our old friend C++ to bid farewell.</p>
<h2 id="heading-why-carbon">Why Carbon?</h2>
<p>C++ is a monster of a programming language. First released in 1985, it has grown to the point that even its creator, Bjarne Stroustrup, has stated, "Within C++, there is a much smaller and cleaner language struggling to get out." The problem with changing or evolving C++ today is that it is such an essential part of so many code bases that it doesn't make sense to dramatically change how the language functions for the sake of changing technology. For this reason, C++ focuses more on standardization than on design functionality. Carbon, in turn, is not an attempt to incrementally evolve C++; it is designed as a successor, built around interoperability with C++ and around large-scale adoption and migration for existing C++ codebases and developers.</p>
<p>As part of the documentation of Carbon, it states that this new, more dynamic language would need to have certain features, which include:</p>
<ul>
<li><p><strong>Performance matching C++</strong>, an essential property for our developers.</p>
</li>
<li><p><strong>Seamless, bidirectional interoperability with C++</strong>, such that a library anywhere in an existing C++ stack can adopt Carbon without porting the rest.</p>
</li>
<li><p><strong>A gentle learning curve</strong> with reasonable familiarity for C++ developers.</p>
</li>
<li><p><strong>Comparable expressivity</strong> and support for existing software's design and architecture.</p>
</li>
<li><p><strong>Scalable migration</strong>, with some level of source-to-source translation for idiomatic C++ code.</p>
</li>
</ul>
<p>We can also observe how specific newer languages like Kotlin and Typescript have been a medium for change for their older counterparts (Java and JavaScript, respectively). Google wants to do that with C++ by Introducing Carbon.</p>
<p>Carbon is still an experimental language and by no means ready to be used in real-world applications, and it won't be for a reasonably long time, but it doesn't hurt to try to use it and add another language you know how to "Hello World" in.</p>
<h2 id="heading-installing-carbon-and-its-compiler">Installing Carbon and its Compiler</h2>
<p>Since Carbon is in such early stages of development, it isn't easy to use it on your local machine. For Windows, it's an even longer procedure since we have to install brew through WSL (Windows Subsystem for Linux). Assuming you have WSL installed with Ubuntu as your primary Linux Operating System, we can now move forward with installing brew and then ultimately installing Carbon.</p>
<h3 id="heading-installing-homebrew">Installing Homebrew</h3>
<ol>
<li>Click <a target="_blank" href="https://brew.sh/">here</a> and copy the installation command on the webpage. Open your Ubuntu environment in WSL and paste the command there. [Takes 5-10 mins]</li>
</ol>
<p>You might be required to enter your password during the previous step. I had to run this command twice as I got multiple fatal errors during this step.</p>
<ol start="2">
<li>After the command runs successfully, scroll until you see a section called "Next Steps." Copy and paste the commands in the order they are given into the terminal. They will look like the image below:</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660208306403/jWqO0Yoe2.png" alt="image.png" /></p>
<ol start="3">
<li><p>Next in line is to install the build tools Homebrew needs, which can be done with <code>sudo apt-get install build-essential</code> [Takes 3-5 mins]</p>
</li>
<li><p>Now, we will install gcc. Install it using the <code>brew install gcc</code> command [Takes 5-10 mins]</p>
</li>
</ol>
<p>I ran into another error where my homebrew-core was "not tapped correctly," which I figured out by running <code>brew doctor</code>. It gave me the necessary commands to fix the error, after which <code>gcc</code> installation was easy as pie.</p>
<h2 id="heading-install-the-carbon-compiler">Install the Carbon Compiler</h2>
<ol>
<li><p>Install the Bazelisk Launcher using <code>brew install bazelisk</code></p>
</li>
<li><p>Install LLVM (Low Level Virtual Machine) using <code>brew install llvm</code> [Takes 5-15 mins]</p>
</li>
<li><p>Add this to your path variable using <code>export PATH="$(brew --prefix llvm)/bin:${PATH}"</code></p>
</li>
<li><p>Since we are doing this installation on Ubuntu, we will also need <code>zlib1g-dev</code>, which we can install using <code>sudo apt install zlib1g-dev</code></p>
</li>
<li><p>We now have all the dependencies installed for the compiler. Time to clone the Github Repository using <code>git clone https://github.com/carbon-language/carbon-lang</code></p>
</li>
</ol>
<h2 id="heading-coding-carbon-in-windows-using-vscode">Coding Carbon in Windows using VSCode</h2>
<p>And there we have it: we've successfully installed Carbon in our Linux environment. Time to hook it up to VS Code on Windows so we don't have to code in the terminal again.</p>
<ol>
<li>Open VS Code and install the Remote Development Extension</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660209485466/avdIWRmWA.png" alt="image.png" /></p>
<ol start="2">
<li>Open Remote Explorer on the left-hand pane and click on the drop-down at the top of the screen. Select "WSL Targets" from the list. Then click on the button beside Ubuntu, as shown in the image.</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660209646742/4wrazKWhC.png" alt="Screenshot 2022-08-11 144918.png" /></p>
<ol start="3">
<li><p>Wait for the home screen to install all required dependencies and then go to <code>File &gt; Open Folder &gt; /home/{user}/carbon-lang/</code></p>
</li>
<li><p>Open <code>explorer/testdata/print/format_only.carbon</code> in the VS Code Window. A Hello World Program is displayed!</p>
</li>
<li><p>To see its output run <code>bazel run //explorer -- ./explorer/testdata/print/format_only.carbon</code> and wait a couple minutes. Since Carbon isn't ready to be adopted by the masses, its compilation time is painfully slow, and it took me almost 2 minutes to compile.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1660210143338/Il8xuDedq.png" alt="image.png" /></p>
<h2 id="heading-writing-fizz-buzz-in-carbon">Writing Fizz Buzz in Carbon</h2>
<p>Everyone has heard of Fizz Buzz. If the number is divisible by 2, print "Fizz"; If it's divisible by 3, print "Buzz". If it's divisible by both, print "FizzBuzz"; else, print the number. Here is the implementation in Carbon:</p>
<pre><code class="lang-c++">package ExplorerTest api;

<span class="hljs-function">fn <span class="hljs-title">Main</span><span class="hljs-params">()</span> -&gt; i32</span>{
    var i: <span class="hljs-keyword">auto</span> = <span class="hljs-number">0</span>;
    <span class="hljs-keyword">while</span>(i &lt;= <span class="hljs-number">100</span>){
        <span class="hljs-keyword">if</span> (i % <span class="hljs-number">6</span> == <span class="hljs-number">0</span>){
            Print(<span class="hljs-string">"{0} FizzBuzz"</span>, i);
        }

        <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (i % <span class="hljs-number">2</span> == <span class="hljs-number">0</span>){
            Print(<span class="hljs-string">"{0} Fizz"</span>, i);
        }

        <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (i % <span class="hljs-number">3</span> == <span class="hljs-number">0</span>){
            Print(<span class="hljs-string">"{0} Buzz"</span>, i);
        }

        <span class="hljs-keyword">else</span>{
            Print(<span class="hljs-string">"{0}"</span>, i);
        }
        i = i + <span class="hljs-number">1</span>;
    }

    <span class="hljs-keyword">return</span> <span class="hljs-number">1</span>;
}
</code></pre>
<p>Let us discuss what every line in this program does:</p>
<ul>
<li><code>package ExplorerTest api;</code></li>
</ul>
<p>Base package and acts like <code>#include &lt;iostream&gt;</code> in C++</p>
<ul>
<li><code>fn Main() -&gt; i32</code></li>
</ul>
<p>Our main function which has a return type of Integer 32bit</p>
<ul>
<li><code>var i: auto = 0</code></li>
</ul>
<p>Creates a variable <code>i</code> with the value <code>0</code></p>
<ul>
<li><code>while(i &lt;= 100)</code></li>
</ul>
<p>While loop to be executed while <code>i &lt;= 100</code>. I had originally tried to create a for loop like in C++ [<code>for (int i =0; i &lt;= 10; i++)</code>] but got a syntax error</p>
<p>[COMPILATION ERROR: foo_bar_baz.carbon:4: syntax error, unexpected identifier, expecting COLON]</p>
<ul>
<li><code>if</code> statements</li>
</ul>
<p>They execute their block only if the condition in parenthesis is True.</p>
<ul>
<li><code>Print</code> statements</li>
</ul>
<p>Print the value inside the parenthesis. If variables need to be printed, we can use <code>{num}</code> where <code>num</code> is a number starting from 0. Eg:</p>
<pre><code class="lang-c++">var a: <span class="hljs-keyword">auto</span> = <span class="hljs-number">1</span>;
var b: <span class="hljs-keyword">auto</span> = <span class="hljs-number">5</span>;
var c: <span class="hljs-keyword">auto</span> = <span class="hljs-number">2</span>;
var d: <span class="hljs-keyword">auto</span> = <span class="hljs-number">6</span>;

Print(<span class="hljs-string">"These are Numbers: {0}, {1}, {2}, {3}"</span>, a,b,c,d);
</code></pre>
<p>Will give the output:</p>
<pre><code class="lang-python">These are Numbers: <span class="hljs-number">1</span>, <span class="hljs-number">5</span>, <span class="hljs-number">2</span>, <span class="hljs-number">6</span>
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Carbon is a fantastic language. It's still in its early stages of development, and it'll hopefully get better over time. One obvious flaw is the lack of documentation. I want to remind you that Carbon is still experimental, and Google does not recommend using it for any scaled-up applications. I hope you have as much fun as I did trying to learn Carbon!</p>
]]></content:encoded></item><item><title><![CDATA[What is AI — Working with OpenAI's Models]]></title><description><![CDATA[Artificial Intelligence is often understood as a complicated forte only for those indulged in the field. OpenAI aims to change that with their AI models, which they have made available to the public. In this article, we would go through the setup pro...]]></description><link>https://blog.arygarg.me/what-is-ai-working-with-openais-models</link><guid isPermaLink="true">https://blog.arygarg.me/what-is-ai-working-with-openais-models</guid><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Python]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[APIs]]></category><category><![CDATA[#codenewbies]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 08 Aug 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/0E_vhMVqL9g/upload/v1657786773977/EJcAN010C.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Artificial Intelligence is often understood as a complicated forte only for those indulged in the field. OpenAI aims to change that with their AI models, which they have made available to the public. In this article, we would go through the setup process and implement a few simple applications in a few lines of code!</p>
<h2 id="heading-signing-up-for-openai">Signing up for OpenAI</h2>
<p>The first step to any API service is to <a target="_blank" href="https://beta.openai.com/signup">sign up</a> for their service. After signing up, log back in, view your API key using <a target="_blank" href="https://beta.openai.com/account/api-keys">this</a> link, then copy the key and paste it into your <code>.env</code> file in the following format:</p>
<pre><code class="lang-python">OPENAI_API_KEY=key
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1658050607078/78ncAxC63.png" alt="openai.png" /></p>
<h2 id="heading-writing-your-first-script">Writing your first script</h2>
<p>Now that we have our own API keys, we can work with OpenAI's models. There are a bunch to work with, so I would suggest going through their engine list using the following function:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> openai
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv
<span class="hljs-keyword">from</span> os <span class="hljs-keyword">import</span> getenv
load_dotenv()
openai.api_key = getenv(<span class="hljs-string">"OPENAI_API_KEY"</span>)
print(openai.Engine.list())  <span class="hljs-comment"># print the list of available engines</span>
</code></pre>
<p>Output:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"data"</span>: [
    {
      <span class="hljs-attr">"created"</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">"id"</span>: <span class="hljs-string">"text-davinci-002"</span>,
      <span class="hljs-attr">"object"</span>: <span class="hljs-string">"engine"</span>,
      <span class="hljs-attr">"owner"</span>: <span class="hljs-string">"openai"</span>,
      <span class="hljs-attr">"permissions"</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">"ready"</span>: <span class="hljs-literal">true</span>
    },
    {
      <span class="hljs-attr">"created"</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">"id"</span>: <span class="hljs-string">"text-ada-001"</span>,
      <span class="hljs-attr">"object"</span>: <span class="hljs-string">"engine"</span>,
      <span class="hljs-attr">"owner"</span>: <span class="hljs-string">"openai"</span>,
      <span class="hljs-attr">"permissions"</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">"ready"</span>: <span class="hljs-literal">true</span>
    },

    {
      <span class="hljs-attr">"created"</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">"id"</span>: <span class="hljs-string">"babbage-search-document"</span>,
      <span class="hljs-attr">"object"</span>: <span class="hljs-string">"engine"</span>,
      <span class="hljs-attr">"owner"</span>: <span class="hljs-string">"openai-dev"</span>,
      <span class="hljs-attr">"permissions"</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">"ready"</span>: <span class="hljs-literal">true</span>
    },
...
...
...
    {
      <span class="hljs-attr">"created"</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">"id"</span>: <span class="hljs-string">"text-babbage-001"</span>,
      <span class="hljs-attr">"object"</span>: <span class="hljs-string">"engine"</span>,
      <span class="hljs-attr">"owner"</span>: <span class="hljs-string">"openai"</span>,
      <span class="hljs-attr">"permissions"</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">"ready"</span>: <span class="hljs-literal">true</span>
    },

  ],
  <span class="hljs-attr">"object"</span>: <span class="hljs-string">"list"</span>
}
</code></pre>
<p>Although there are about 50 models trained for all your needs, we will be using only a few in this tutorial.</p>
<p><strong>Note: These APIs are not a free service, and you will be charged for each API call depending on the service and how many tokens are utilized. Open AI does provide $18.00 of credit for trials.</strong></p>
<h2 id="heading-completing-a-prompt">Completing a prompt:</h2>
<p>Write or copy the following code into your script</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> openai
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv
<span class="hljs-keyword">from</span> os <span class="hljs-keyword">import</span> getenv

prompt = <span class="hljs-string">"Maybe I just need sleep."</span>

load_dotenv()
openai.api_key = getenv(<span class="hljs-string">"OPENAI_API_KEY"</span>)

response = openai.Completion.create(
  model=<span class="hljs-string">"text-davinci-002"</span>,
  prompt=prompt,
  temperature=<span class="hljs-number">0.3</span>,
  max_tokens=<span class="hljs-number">30</span>,
  top_p=<span class="hljs-number">1.0</span>,
  frequency_penalty=<span class="hljs-number">0.5</span>,
  presence_penalty=<span class="hljs-number">0.2</span>
)
print(response)

print(prompt + <span class="hljs-string">"..."</span> + response.choices[<span class="hljs-number">0</span>].text)
</code></pre>
<p>The code above utilizes OpenAI's DaVinci engine on the prompt "Maybe I just need sleep." and automatically generates whatever completion it deems appropriate.</p>
<p>Output:</p>
<pre><code class="lang-python">{
  <span class="hljs-string">"choices"</span>: [
    {
      <span class="hljs-string">"finish_reason"</span>: <span class="hljs-string">"stop"</span>,
      <span class="hljs-string">"index"</span>: <span class="hljs-number">0</span>,
      <span class="hljs-string">"logprobs"</span>: null,
      <span class="hljs-string">"text"</span>: <span class="hljs-string">".\n\n\n\nI'm not sure.\n\n\n\nI'm not sure.\n\n\n\nMaybe I just need sleep."</span>
    }
  ],
  <span class="hljs-string">"created"</span>: <span class="hljs-number">1658052132</span>,
  <span class="hljs-string">"id"</span>: <span class="hljs-string">"cmpl-5UvOGBy3igAZfFuyXofhCoWrQE40w"</span>,
  <span class="hljs-string">"model"</span>: <span class="hljs-string">"text-davinci-002"</span>,
  <span class="hljs-string">"object"</span>: <span class="hljs-string">"text_completion"</span>,
  <span class="hljs-string">"usage"</span>: {
    <span class="hljs-string">"completion_tokens"</span>: <span class="hljs-number">30</span>,
    <span class="hljs-string">"prompt_tokens"</span>: <span class="hljs-number">5</span>,
    <span class="hljs-string">"total_tokens"</span>: <span class="hljs-number">35</span>
  }
}
Maybe I just need sleep....



I<span class="hljs-string">'m not sure.



I'</span>m <span class="hljs-keyword">not</span> sure.



Maybe I just need sleep.
</code></pre>
<p>We can see that this prompt took up 35 tokens from our credit; this can also be verified on the OpenAI dashboard <a target="_blank" href="https://beta.openai.com/account/usage">here</a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1658052426915/l65MrU5Wb.png" alt="image.png" /></p>
<h2 id="heading-playing-with-the-api">Playing with the API</h2>
<p>OpenAI can also write unique ads. From here on, we will only showcase the prompt along with the response it generates.</p>
<blockquote>
<p>Prompt: "Create an ad campaign targeted at orphans from the following prompt:\nPrompt: A Phone where you can only call your mother"</p>
<p>Response: "A phone where you can only call your mother"</p>
<p>Orphans need love too. And with this new phone, they can stay connected to the one person who loves them the most: their mother. With this new phone, they can call their mother anytime, anywhere. So don't forget about the orphans when you're shopping for your new phone. They need your love too.</p>
</blockquote>
<p>Here is OpenAI trying to tell a joke:</p>
<blockquote>
<p>Prompt: "Tell me a programming joke"</p>
<p>Response: Why do programmers always have to write code? Because without code, there would be nothing to debug!</p>
</blockquote>
<p>So AI jokes might not be my cup of tea, but at least I can enjoy an AI-generated poem.</p>
<blockquote>
<p>Prompt: "Write a poem on cotton in 30 words or less: "</p>
<p>Response: Cotton is a soft, fluffy material that is often used to make clothing. It is also used to stuff pillows and plush toys.</p>
</blockquote>
<p>I don't know how I feel about AI writing totally unique pieces of text, but it is something special.</p>
<p>Let's make a simple chatbot (a sketch of the actual API call follows the example below):</p>
<blockquote>
<p>Prompt: "I am a highly intelligent question answering bot. If you ask me a question that is rooted in truth, I will give you the answer. If you ask me a question that is nonsense, trickery, or has no clear answer, I will respond with "Unknown".</p>
<p>Q: What is human life expectancy in the United States? A: Human life expectancy in the United States is 78 years.</p>
<p>Q: Who was president of the United States in 1955? A: Dwight D. Eisenhower was president of the United States in 1955.</p>
<p>Q: Which party did he belong to? A: He belonged to the Republican Party.</p>
<p>Q: What is the square root of banana? A: Unknown</p>
<p>Q: How does a telescope work? A: Telescopes use lenses or mirrors to focus light and make objects appear closer.</p>
<p>Q: How many squigs are in a bonk? A: Unknown</p>
<p>Q: Where is the Taj Mahal? A:"</p>
<p>Response:</p>
<p>The Taj Mahal is located in Agra, India.</p>
</blockquote>
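<p>For reference, a call along the lines of the sketch below (assuming the same imports and <code>openai.api_key</code> setup as earlier) is roughly how you would send that question-answering prompt; the <code>stop</code> sequence keeps the model from rambling past a single answer line:</p>
<pre><code class="lang-python">qa_prompt = "..."  # paste the full question-answering prompt from the block quote above, ending with "A:"

response = openai.Completion.create(
    model="text-davinci-002",
    prompt=qa_prompt,
    temperature=0,    # keep factual answers deterministic
    max_tokens=60,
    stop=["\n"],      # stop once the single-line answer is complete
)
print(response.choices[0].text.strip())
</code></pre>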
<p>Convert Python code into C++ code</p>
<blockquote>
<p>Prompt: "Convert the following python code into C++\narr = [1,7,5,3]\narr.sort()\nprint(arr)"</p>
</blockquote>
<p>arr = [1,7,5,3] arr.sort() print(arr)</p>
<pre><code class="lang-python">
&gt; Response: 
&gt;```c++
std::vector&lt;int&gt; arr = {<span class="hljs-number">1</span>,<span class="hljs-number">7</span>,<span class="hljs-number">5</span>,<span class="hljs-number">3</span>};
std::sort(arr.begin(), arr.end());
<span class="hljs-keyword">for</span>(int i : arr) {
    std::cout &lt;&lt; i &lt;&lt; <span class="hljs-string">" "</span>;
}
</code></pre>
<p>Your imagination is the limit for this revolutionary piece of tech. Try out different prompts and be sure to explore what the API can do, and, more importantly, what it can't do. Make sure to look at OpenAI's other projects, such as Dall-e, a next-generation AI image generator!</p>
]]></content:encoded></item><item><title><![CDATA[Scraping Reddit — One Subreddit at a Time]]></title><description><![CDATA[Continuing our tradition of using APIs to solve problems no one ever had, we've come to Reddit. Launched more than 17 years ago, Reddit is where everyone goes to discuss. It's a mega-forum, and today we're going to be getting data just because we can...]]></description><link>https://blog.arygarg.me/scraping-reddit-one-subreddit-at-a-time</link><guid isPermaLink="true">https://blog.arygarg.me/scraping-reddit-one-subreddit-at-a-time</guid><category><![CDATA[APIs]]></category><category><![CDATA[Python]]></category><category><![CDATA[newbie]]></category><category><![CDATA[reddit]]></category><category><![CDATA[python projects]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 01 Aug 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1656943266101/eeQndl1TA.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Continuing our tradition of using APIs to solve problems no one ever had, we've come to Reddit. Launched more than 17 years ago, Reddit is where everyone goes to discuss. It's a mega-forum, and today we're going to be getting data just because we can!</p>
<p>We will use PRAW (Python Reddit API Wrapper) to scrape our way through Reddit. Traditionally, we would have used a scraping package like Selenium or BeautifulSoup, but PRAW has simplified getting information from Reddit.</p>
<h2 id="heading-step-1-getting-our-credentials">Step 1: Getting our Credentials</h2>
<p>First, we would need credentials to help us access the Reddit API. Log-in to <a target="_blank" href="https://www.reddit.com/prefs/apps">this</a> link, and at the bottom, you will see a button labeled "create another app", click on it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656943894950/OdtgZot1-.png" alt="reddit-2.jpg.png" /></p>
<p>Next, you'll have to give your script a name and fill out a description. Once that's done, make sure to select the "script" option and then make sure to put the following into the redirect URI box: http://localhost:8080. This is suggested by the PRAW docs but is necessary because Reddit requires a redirect URI even if our application doesn't use it. Your "personal use script" can be found near the top left corner, directly under where it says "personal use script." The next thing you'll need is the "secret." It should be listed below the "personal use script." With these two seemingly random strings, we can start using PRAW. Please note that PRAW can also be used for posting to Reddit, but I would not be covering that in this article.</p>
<h2 id="heading-step-2-creating-a-virtual-environment">Step 2: Creating a Virtual Environment</h2>
<p>Follow the steps given in <a target="_blank" href="https://aryan401.hashnode.dev/api-avalanche-all-about-apis">this</a> article, and then continue with this tutorial</p>
<h2 id="heading-step-3-time-to-code">Step 3: Time to Code</h2>
<p>The first step in any project is to install our dependencies, this time, we're going to be installing PRAW as a package in Python. To install the required package enter the following pip command into the terminal</p>
<pre><code class="lang-bash">pip install praw
</code></pre>
<p>Next, we will be setting up our .env file, which will house all the credentials for the bot. Ensure its name is <code>.env</code> and is in the working directory. It should also have this format.</p>
<pre><code class="lang-python">CLIENT_ID=client id here
CLIENT_SECRET=client secret here
USER_AGENT=&lt;platform&gt;:&lt;app ID&gt;:&lt;version string&gt; (by u/&lt;Reddit username&gt;)
</code></pre>
<p>You can also do this using a <code>praw.ini</code> file, but I prefer to leave everything in my <code>.env</code></p>
<p>Now that we have our credentials stored, let's start with our <code>main.py</code></p>
<pre><code class="lang-Python"><span class="hljs-keyword">from</span> os <span class="hljs-keyword">import</span> getenv
<span class="hljs-keyword">import</span> praw  <span class="hljs-comment"># pip install praw</span>
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv  <span class="hljs-comment"># pip install python-dotenv</span>

reddit = praw.Reddit(
    client_id=getenv(<span class="hljs-string">'CLIENT_ID'</span>),
    client_secret=getenv(<span class="hljs-string">'CLIENT_SECRET'</span>),
    user_agent=getenv(<span class="hljs-string">"USER_AGENT"</span>),
)
</code></pre>
<p>This boilerplate code would be present in every PRAW file for a read-only instance.</p>
<p>Now let's get into the magical aspects of this API Wrapper.</p>
<h2 id="heading-step-4-explore-the-api">Step 4: Explore the API</h2>
<p>Let's first check if we're in read-only mode, so we don't cause any unwanted errors</p>
<pre><code class="lang-python">reddit.read_only()
<span class="hljs-comment">#returns True</span>
</code></pre>
<p>25 "Hottest" Submissions from r/python</p>
<pre><code class="lang-python"><span class="hljs-keyword">for</span> submission <span class="hljs-keyword">in</span> reddit.subreddit(<span class="hljs-string">"python"</span>).hot(limit=<span class="hljs-number">25</span>):
    print(submission.title)

<span class="hljs-comment"># Output: 25 submissions</span>
</code></pre>
<p>We can also pass a string in the format <code>subreddit1+subreddit2+subreddit3...</code> Its syntax would look like this:</p>
<pre><code class="lang-python">subreddit = reddit.subreddit(<span class="hljs-string">"python+reddit+java+discord"</span>)
<span class="hljs-keyword">for</span> submission <span class="hljs-keyword">in</span> subreddit.top(limit=<span class="hljs-number">25</span>, time_filter=<span class="hljs-string">"day"</span>):
    print(submission.title)
<span class="hljs-comment"># Output: 25 submissions according to score</span>
</code></pre>
<h3 id="heading-working-with-comments">Working with Comments</h3>
<p>Just as submissions belong to a subreddit, comments belong to a submission. To access comments, we first need a submission object in memory.</p>
<pre><code class="lang-python">submission = reddit.submission(<span class="hljs-string">"vnsq8s"</span>)  <span class="hljs-comment"># Every Reddit Submission has its own ID</span>
submission.comment_sort = <span class="hljs-string">"new"</span>
top_level_comments = list(submission.comments)[:<span class="hljs-number">25</span>]
<span class="hljs-keyword">for</span> comment <span class="hljs-keyword">in</span> top_level_comments:
    print(<span class="hljs-string">f"<span class="hljs-subst">{comment.author}</span> wrote: <span class="hljs-subst">{comment.body}</span> at <span class="hljs-subst">{comment.created_utc}</span>"</span>)
</code></pre>
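<p>The loop above only touches top-level comments. If you want every comment in the thread, including nested replies, PRAW can flatten the whole comment forest; here is a short sketch using the same <code>submission</code> as above:</p>
<pre><code class="lang-python">submission.comments.replace_more(limit=0)   # resolve the "load more comments" placeholders
all_comments = submission.comments.list()   # flatten the comment tree, replies included
print(f"Total comments fetched: {len(all_comments)}")
for comment in all_comments[:5]:
    print(comment.author, ":", comment.body[:80])
</code></pre>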
<p>As with any API, you only get better by using it. <a target="_blank" href="https://praw.readthedocs.io/en/stable/index.html">Here</a> is the link to the documentation, which you can refer to for more attributes and to find out about posting using the wrapper.</p>
<p>Cheers, and keep API-ing!</p>
]]></content:encoded></item><item><title><![CDATA[Understanding Big-O Notation]]></title><description><![CDATA[Let's start with a fundamental question. What is Big-O notation? What was the first thing that popped up in your head?

If you're like me and are just now diving deep into the world of Computer Science, you'd be surprised by how fast a computer can c...]]></description><link>https://blog.arygarg.me/understanding-big-o-notation</link><guid isPermaLink="true">https://blog.arygarg.me/understanding-big-o-notation</guid><category><![CDATA[algorithms]]></category><category><![CDATA[2Articles1Week]]></category><category><![CDATA[Python]]></category><category><![CDATA[basics]]></category><category><![CDATA[newbie]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 25 Jul 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/h7FMJugpcfs/upload/v1657016496987/Qk7aSgoAY.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>Let's start with a fundamental question. What is Big-O notation? What was the first thing that popped up in your head?</p>
</blockquote>
<p>If you're like me and are just now diving deep into the world of Computer Science, you'd be surprised by how fast a computer can <em>compute</em> thousands, if not millions of lines in a single second.</p>
<p>When you write an algorithm, even a simple one that sorts data, you are, in a way, already using the concept of Big-O notation. In simple terms, Big-O is how hard the computer has to work to complete the set of instructions issued by you, the programmer.</p>
<p>Formally defining Big-O notation in mathematical terms would look like</p>
<blockquote>
<p>f(N) = O(g(N)) if there exist positive constants c and N* such that f(N) &lt;= c * g(N) for all N &gt;= N*, where N is the number of inputs.</p>
</blockquote>
<p>As we increase the complexity of the program, we will see an increase in time taken to finish (Time Complexity), an increase in Memory Utilisation (Space Complexity), or both.</p>
<h1 id="heading-understanding-big-o-notation">Understanding Big-O Notation</h1>
<h3 id="heading-o1-constant-time">O(1) — Constant Time</h3>
<p>Often considered the best complexity, since the running time does not vary with the number of inputs. Constant time algorithms will always take the same amount of time to be executed. Accessing a value in an indexable array is the best example.</p>
<pre><code class="lang-python">arr = [<span class="hljs-number">1</span>,<span class="hljs-number">5</span>,<span class="hljs-number">2</span>,<span class="hljs-number">6</span>,<span class="hljs-number">8</span>,<span class="hljs-number">3</span>,<span class="hljs-number">9</span>]
val = arr[<span class="hljs-number">3</span>]
print(val)

<span class="hljs-comment">#Output: 6 - O(1)</span>
</code></pre>
<h3 id="heading-on-linear-time">O(n) — Linear Time</h3>
<p>An algorithm has linear complexity if the execution time varies linearly with the number of inputs. As the number of inputs increases, so will the time taken to complete. Scanning through an array element by element is a classic example: the time taken grows linearly with the array's length.</p>
<pre><code class="lang-python">arr = [<span class="hljs-number">1</span>,<span class="hljs-number">5</span>,<span class="hljs-number">2</span>,<span class="hljs-number">6</span>,<span class="hljs-number">8</span>,<span class="hljs-number">3</span>,<span class="hljs-number">9</span>]
find_num = <span class="hljs-number">3</span>
<span class="hljs-keyword">for</span> num <span class="hljs-keyword">in</span> arr:
    <span class="hljs-keyword">if</span> find_num == num:
        print(<span class="hljs-string">f"Found the number <span class="hljs-subst">{find_num}</span> in the array"</span>)

<span class="hljs-comment"># Output: Found the number 3 in the array - O(n)</span>
</code></pre>
<h3 id="heading-olog-n-logarithmic-time">O(log n) — Logarithmic Time</h3>
<p>An algorithm has logarithmic time complexity if the time it takes to run the algorithm is proportional to the logarithm of the input size n. An example is binary search.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">binarySearch</span>(<span class="hljs-params">array, item</span>):</span>
    first = <span class="hljs-number">0</span>
    last = len(array)<span class="hljs-number">-1</span>
    found = <span class="hljs-literal">False</span>

    <span class="hljs-keyword">while</span> first &lt;= last <span class="hljs-keyword">and</span> <span class="hljs-keyword">not</span> found:
        midpoint = (first + last)//<span class="hljs-number">2</span>
        <span class="hljs-keyword">if</span> array[midpoint] == item:
            found = <span class="hljs-literal">True</span>
        <span class="hljs-keyword">else</span>:
            <span class="hljs-keyword">if</span> item &lt; array[midpoint]:
                last = midpoint<span class="hljs-number">-1</span>
            <span class="hljs-keyword">else</span>:
                first = midpoint+<span class="hljs-number">1</span>

    <span class="hljs-keyword">return</span> found
</code></pre>
<h3 id="heading-on2-quadratic-time">O(n^2) — Quadratic Time</h3>
<p>An algorithm has quadratic time complexity if the time to execute is proportional to the square of the input size.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">quadratic</span>(<span class="hljs-params">items</span>):</span>
    <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> items:
        <span class="hljs-keyword">for</span> item2 <span class="hljs-keyword">in</span> items:
            print(item, <span class="hljs-string">' '</span> ,item2)

quadratic([<span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>, <span class="hljs-number">8</span>])

<span class="hljs-comment"># O(2^n)</span>
</code></pre>
<h1 id="heading-simplifying-big-o">Simplifying Big-O</h1>
<p>Often, our algorithms cannot be described by a single simple expression like O(n*log n) and are instead a combination of several terms added together. To simplify the expression, we only keep the dominant term. For example, if a function has a complexity of <code>O(2^n) + O(n)</code>, then with 10 inputs we get <code>2^10 + 10 = 1034</code>; the <code>O(2^n)</code> term contributes the overwhelming majority, so we can simplify the whole expression to just <code>O(2^n)</code>.</p>
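<p>If you want to convince yourself of that, a tiny loop (just the arithmetic above, repeated for a few input sizes) makes the gap obvious:</p>
<pre><code class="lang-python">for n in (10, 20, 30):
    exponential, linear = 2 ** n, n
    # the 2^n term quickly dwarfs the n term, so O(2^n) + O(n) simplifies to O(2^n)
    print(n, exponential + linear, exponential)
</code></pre>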
<h1 id="heading-terms-related-to-big-o">Terms related to Big-O</h1>
<p>Here is a list of terms that are often associated with Big-O</p>
<ul>
<li><em>Big O</em> : (O) describes the upper bound of the complexity.</li>
<li><em>Omega</em> : (Ω) describes the lower bound of the complexity.</li>
<li><em>Theta</em>: (Θ) describes the exact bound of the complexity.</li>
<li><em>Little O</em> : (o) describes the upper bound excluding the exact one.</li>
</ul>
<p>This was just a brief introduction to Big-O. There is so much more to Big-O than making well-optimized code. Click <a target="_blank" href="https://www.bigocheatsheet.com/">here</a> for a Big-O Cheat-Sheet.</p>
]]></content:encoded></item><item><title><![CDATA[Learning to Use the Twitter API v2.0 [2022]]]></title><description><![CDATA[An Introduction
In this article, I will show you how you can get started quickly with the new Twitter API v2. It includes new features like:

Improvements to the response objects

Support for getting Twitter polls data in the API

Tweet annotations a...]]></description><link>https://blog.arygarg.me/learning-to-use-the-twitter-api-v20-2022</link><guid isPermaLink="true">https://blog.arygarg.me/learning-to-use-the-twitter-api-v20-2022</guid><category><![CDATA[Twitter]]></category><category><![CDATA[Python]]></category><category><![CDATA[APIs]]></category><category><![CDATA[newbie]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 18 Jul 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/Jm1YUfYjpHI/upload/v1656931162358/iDkTlv4MR.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-an-introduction">An Introduction</h1>
<p>In this article, I will show you how you can get started quickly with the new <code>Twitter API v2</code>. It includes new features like:</p>
<ul>
<li><p>Improvements to the response objects</p>
</li>
<li><p>Support for getting Twitter polls data in the API</p>
</li>
<li><p>Tweet annotations and Conversation Threads</p>
</li>
</ul>
<h2 id="heading-step-1-creating-a-developer-account-on-twitter">Step #1: Creating a Developer Account on Twitter</h2>
<p>You need a developer account to get started with the new Twitter API. If you do not have one, you can sign up for one <a target="_blank" href="https://developer.twitter.com/en">here</a>.</p>
<h2 id="heading-step-2-creating-a-project-and-app">Step #2: Creating a Project and App</h2>
<p>Next, go to your <a target="_blank" href="https://developer.twitter.com/en/portal/dashboard">dashboard</a>, and under Projects &amp; Apps &lt; Overview Click on "Add App".</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656932013127/-s-4VJ-7n.png" alt="dev_tw_start.png" /></p>
<p>On the next page, select "Development" and click Next. This choice isn't critical, since you're allowed to create an App for each option.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656932138869/gfqUcJkx-.png" alt="dev_tw_start-2.png" /></p>
<p>Select an App name on the next screen and click Next.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656932226149/mPbxGn-5_.png" alt="dev_tw_start-3.png" /></p>
<p>On the next page, copy all the credentials into a text file for future use.</p>
<p>You'll have to apply for Elevated Access <a target="_blank" href="https://developer.twitter.com/en/portal/products/elevated">here</a>.</p>
<p>Next, on your Project page, scroll down and click on "Edit" under "User Authentication Settings." Toggle on the "OAuth 2.0" setting, set the Type of App to "Automated App or Bot", and click Save.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656937556120/uq4e67Qvh.png" alt="dev_tw_start-5.png" /></p>
<h1 id="heading-and-youre-done-with-setting-up-your-developer-account-for-twitter">And you're done with setting up your Developer Account for Twitter!</h1>
<p>The next thing to do is to ask Twitter for some data. We'll be doing the rest in Python, but you can follow along in basically any modern language.</p>
<h2 id="heading-step-3-create-a-virtual-environment-in-python">Step 3: Create a Virtual Environment in Python</h2>
<p>Follow the steps given in <a target="_blank" href="https://aryan401.hashnode.dev/virtual-environments-youre-gonna-need-them">this</a> article, and then come back and continue with this tutorial.</p>
<h2 id="heading-step-4-time-to-code">Step 4: Time To Code</h2>
<p>Before we start coding, let's load some dependencies into our project. We will be using <code>tweepy</code> as a wrapper between Twitter and our code. A wrapper is simply an extra layer between two pieces of code that helps them communicate with each other. Feel free to use any IDE you like; I prefer PyCharm.</p>
<pre><code class="lang-bash">pip install tweepy==4.10.0
</code></pre>
<pre><code class="lang-Python"><span class="hljs-keyword">import</span> tweepy
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv  <span class="hljs-comment"># pip install python-dotenv</span>
<span class="hljs-keyword">from</span> os <span class="hljs-keyword">import</span> getenv

load_dotenv()

client = tweepy.Client(consumer_key=getenv(<span class="hljs-string">'CONSUMER_KEY'</span>),
                       consumer_secret=getenv(<span class="hljs-string">'CONSUMER_SECRET'</span>),
                       access_token=getenv(<span class="hljs-string">"ACCESS_TOKEN"</span>),
                       access_token_secret=getenv(<span class="hljs-string">"ACCESS_SECRET"</span>))
</code></pre>
<p>Make sure you keep your credentials in a .env file in the following format and place it in the same directory as your code:</p>
<pre><code class="lang-python">CONSUMER_KEY=twitter consumer key here
CONSUMER_SECRET=twitter consumer secret here
ACCESS_TOKEN=twitter access token here
ACCESS_SECRET=twitter access secret here
</code></pre>
<p>Let's try printing out tweets from your home timeline:</p>
<pre><code class="lang-python">home_tweet = client.get_home_tweet()
<span class="hljs-keyword">for</span> tweet <span class="hljs-keyword">in</span> home_tweet.data:
    print(str(tweet).encode(<span class="hljs-string">'utf-8'</span>))
</code></pre>
<p>This gives the following output (forgive my Twitter feed):</p>
<pre><code class="lang-python"><span class="hljs-string">b'Pentagon finds concerning vulnerabilities on blockchain (18835)\nVia:https://t.co/dHK6HMbGpm'</span>
...
<span class="hljs-string">b'NEW VIDEO - A first look and hands-on with the Nothing Phone, which looks\xe2\x80\xa6 pretty neat, actually\n\nhttps://t.co/Rbo9Fxvqk6 https://t.co/FcUB3jTEYK'</span>
</code></pre>
<p>In my case, this returned 88 tweets.</p>
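<p>By default you only get the first page of results. If you want more, tweepy also ships a <code>Paginator</code> helper; here is a minimal sketch (the <code>max_results</code> and <code>limit</code> values below are just illustrative):</p>
<pre><code class="lang-python"># Walk through multiple pages of the home timeline, up to ~300 tweets
for tweet in tweepy.Paginator(client.get_home_timeline,
                              max_results=100).flatten(limit=300):
    print(str(tweet).encode('utf-8'))
</code></pre>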
<p>Likewise, we can also tweet through the API:</p>
<pre><code class="lang-python">tweet = client.create_tweet(text=<span class="hljs-string">"Hello World, I am using Tweepy"</span>)
<span class="hljs-comment">#tweet is a dictionary with tweet id and metadata</span>
</code></pre>
<p>For the full range of API calls, you can check the documentation <a target="_blank" href="https://docs.tweepy.org/en/stable/client.html#tweets">here</a>.</p>
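<p>As a quick taste of what else is in there, here's a small sketch that reuses the <code>client</code> and <code>tweet</code> objects from above. It is purely illustrative (it likes and then deletes the tweet we just posted):</p>
<pre><code class="lang-python">me = client.get_me()                    # profile of the authenticated account
print(me.data.username, me.data.id)

client.like(tweet.data["id"])           # like the tweet we created earlier
client.delete_tweet(tweet.data["id"])   # ...and then remove it again
</code></pre>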
<p>A side bonus of tweeting through the API is that your tweets get a custom source label, like this tweet here:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://twitter.com/Aryan_401/status/1543931162036703232">https://twitter.com/Aryan_401/status/1543931162036703232</a></div>
<p>And don't forget to comment if you have any questions! See you next time on API-city.</p>
]]></content:encoded></item><item><title><![CDATA[Virtual Environments — You're Gonna need em]]></title><description><![CDATA[Virtual Environments are a crucial aspect of python, which allows you to isolate various instances of the language into their container to be used independently. This article will be referenced a lot so keep it handy.
Installing virtualenv
pip instal...]]></description><link>https://blog.arygarg.me/virtual-environments-youre-gonna-need-em</link><guid isPermaLink="true">https://blog.arygarg.me/virtual-environments-youre-gonna-need-em</guid><category><![CDATA[Python]]></category><category><![CDATA[basics]]></category><category><![CDATA[newbie]]></category><dc:creator><![CDATA[Aryan Garg]]></dc:creator><pubDate>Mon, 11 Jul 2022 11:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/npxXWgQ33ZQ/upload/v1656963491079/ZYgW9E7VS.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Virtual environments are a crucial part of Python: they let you isolate each project's interpreter and packages in their own container so that projects can be managed independently. This article will be referenced a lot, so keep it handy.</p>
<h5 id="heading-installing-virtualenv">Installing virtualenv</h5>
<pre><code><span class="hljs-attribute">pip</span> install virtualenv
</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656927981450/qBVKGBI2p.png" alt="install_ve.png" /></p>
<p>Test your installation:</p>
<pre><code>virtualenv <span class="hljs-operator">-</span><span class="hljs-operator">-</span>version
</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656927960524/ePr_YP6h-.png" alt="verify_ve.png" />
To create a virtualenv, we can use the following command.</p>
<pre><code><span class="hljs-attribute">virtualenv</span> name_of_project
</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656927948971/lmIWlUCvD.png" alt="create_ve.png" /></p>
<p>After running this command, a directory named <code>name_of_project</code> will be created. It contains its own Python and pip executables, plus everything needed to install and use the packages your project will need.</p>
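<p>Roughly speaking (the exact contents vary a little by platform and virtualenv version), the new directory looks something like this:</p>
<pre><code>name_of_project/
├── Scripts/      # python, pip and the activate scripts (named bin/ on macOS and Linux)
├── Lib/          # packages you install end up in here
└── pyvenv.cfg    # configuration pointing back at your base Python installation
</code></pre>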
<p>Now, to activate the virtual environment, we can run the following command (this is the Windows form; on macOS/Linux the equivalent is <code>source name_of_project/bin/activate</code>). Remember to re-activate the environment whenever you come back to the project after working on something else.</p>
<pre><code>name_of_project\Scripts\activate
</code></pre><p>Once the virtual environment is activated, its name will appear on the left side of your terminal prompt, letting you know that it is currently active.
Now you can install the project's dependencies inside this environment. For example, if you use asyncio for a project, you can install it like any other package (asyncio actually ships with modern Python; it is just a convenient example here).</p>
<pre><code><span class="hljs-attribute">pip</span> install asyncio
</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656927996738/mzFVitqr7.png" alt="ve.png" />
To Deactivate the virtual environment, we can run:</p>
<pre><code>deactivate
</code></pre><p>which will switch you back to the default Python installation you had been using.</p>
]]></content:encoded></item></channel></rss>