Django Project 2
- Contextual advertising is a technology that finds relevant ads for a given blog article in order to maximise a blog's or website's revenue.
- Contextual advertising is the reason you see an ad for Nike shoes on a fitness-related article.
- In this project I have built a contextual advertising platform that reads data from any blog whose URL we pass in, finds relevant keywords on that blog, and finds ads relevant to those keywords, all automatically.
- First we create a basic Django app that accepts a blog URL, reads all the data on that blog page using the requests library, and parses the data using BeautifulSoup.
- We then feed the parsed data to the RAKE library (rake-nltk), which finds the most relevant and prominent keywords in the blog article; these keywords are then saved.
- The saved keywords are then matched against the ads in our database, which gives us back the ads most relevant to the blog post.
- Tailwind is also used to style the web app.
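The source does not say how keywords are matched against the stored ads, so the following is only a minimal sketch of one possible approach: rank ads by how many keywords they share with the blog post. The Ad class, the rank_ads function, and the sample ads are all hypothetical stand-ins for the project's Django model and database query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    """Hypothetical stand-in for an ad row stored in the database."""
    title: str
    keywords: frozenset


def rank_ads(blog_keywords, ads, limit=3):
    """Return up to `limit` ads, ordered by keyword overlap with the blog post."""
    wanted = {kw.lower() for kw in blog_keywords}
    scored = []
    for ad in ads:
        overlap = len(wanted & {kw.lower() for kw in ad.keywords})
        if overlap:  # skip ads that share no keyword with the blog post
            scored.append((overlap, ad))
    # Highest overlap first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ad for _, ad in scored[:limit]]


# Assumed sample data, for illustration only.
ads = [
    Ad("Running shoes", frozenset({"running", "shoes", "fitness"})),
    Ad("Protein powder", frozenset({"protein", "fitness"})),
    Ad("Office chairs", frozenset({"office", "furniture"})),
]
blog_keywords = ["fitness", "running", "marathon training"]
print([ad.title for ad in rank_ads(blog_keywords, ads)])
```

In a real Django app the overlap would more likely be computed with a database query than in Python, but the ranking idea stays the same.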
- Python: Programming language.
- Django: For the web app.
- Requests: For making HTTP requests to blog pages.
- BeautifulSoup: To parse web pages.
- RakeNLTK: To find relevant keywords.
- Python x.x < 3.9
- Django
pip install django==3.2
- requests (Python library for making HTTP requests)
pip install requests
- BeautifulSoup (Python library for extracting data from HTML and XML documents; beautifulsoup4 is the latest version)
pip install beautifulsoup4
- RAKE, Rapid Automatic Keyword Extraction (an NLP tool); the rake-nltk package implements RAKE on top of the Python Natural Language Toolkit (NLTK) library.
pip install rake-nltk
- Node.js and npm (needed to set up and install Tailwind)
--> a) Download the LTS version from https://nodejs.org/en/. After installation, check that it is installed by running the command: node --version
--> b) Install Tailwind:
npm install tailwindcss@2.2.16
- Homepage
- Providing a link for the demo and clicking on submit.
- The relevant keywords are then matched against the ads in our database, which gives us back the ads most relevant to the blog post.
- requests is a Python library for making HTTP requests. It provides a simple and easy-to-use interface for sending HTTP requests and receiving HTTP responses.
- With requests, you can send various types of HTTP requests, including GET, POST, PUT, DELETE, and more.
- You can also send requests with different HTTP headers and parameters, and handle responses with different HTTP status codes.
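As a rough illustration of the points above, here is a minimal sketch of fetching a blog page with requests. The fetch_blog_html helper, the User-Agent value, and the example.com URL are assumptions made for this example, not part of the project. The last lines only prepare a request without sending it, so they run without network access.

```python
import requests

# Hypothetical header value, used for illustration only.
HEADERS = {"User-Agent": "contextual-ads-demo/0.1"}


def fetch_blog_html(url, timeout=10):
    """Send a GET request and return the page HTML, raising on 4xx/5xx responses."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # turn HTTP error status codes into exceptions
    return response.text


# Offline illustration: prepare a request (without sending it) to see how
# query parameters and headers end up in the final request.
prepared = requests.Request(
    "GET",
    "https://example.com/blog",
    params={"page": 2},
    headers=HEADERS,
).prepare()
print(prepared.method, prepared.url)
```

Setting a timeout and calling raise_for_status() keeps a slow or failing blog URL from hanging the view or passing an error page on to the parser.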
- Beautiful Soup is a Python library for extracting data from HTML and XML documents.
- It provides a simple and efficient way to parse and navigate documents, and to extract data from them.
- Beautiful Soup converts an HTML or XML document into a tree-like structure, with each element in the document represented as a node in the tree.
- We can use the various methods provided by Beautiful Soup to navigate and search the tree, extract data, and modify the document.
Some of the features of Beautiful Soup include:
--> Support for parsing HTML and XML documents.
--> Support for searching and navigating the document tree using various search criteria.
--> Support for extracting data from the document using tag names, attribute values, and CSS classes.
--> Support for modifying the document tree and writing the modified document back to a file.
- Important functions of the BeautifulSoup library used in the project:
- find_all()
--> The find_all() method of the BeautifulSoup object lets you search for all occurrences of a particular HTML or XML element in the document.
--> It returns a list of the elements that match the search criteria.
- get_text()
--> The get_text() method of the Tag object in Beautiful Soup lets you extract the text contents of a tag.
--> It returns a string containing the text of the tag, including any text contained within child tags.
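The two methods above can be sketched on a small inline HTML snippet. The snippet and variable names are made up for this example; the project itself parses HTML fetched from a blog URL, and which tags it selects is not stated in the source.

```python
from bs4 import BeautifulSoup

# Assumed sample markup standing in for a fetched blog page.
html = """
<html><body>
  <h1>Training for a marathon</h1>
  <p>Good <b>running shoes</b> matter.</p>
  <p>So does a steady training plan.</p>
</body></html>
"""

# "html.parser" is Python's built-in parser, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")

# find_all() collects every <p> element in the document.
paragraphs = soup.find_all("p")

# get_text() returns each tag's text, including text inside child tags like <b>.
texts = [p.get_text() for p in paragraphs]
article_text = " ".join(texts)
print(article_text)
```

Joining the paragraph texts into one string gives a plain-text article that can be handed to the keyword extraction step.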
- RAKE is a natural language processing (NLP) tool that is used to extract keywords and phrases from a text document.
- It is based on the idea of identifying candidate keywords by analyzing the co-occurrence of words in the document and scoring them based on their frequency and distinctiveness.
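- In Python, RAKE is available through the rake-nltk package, which is built on the Natural Language Toolkit (NLTK) library.
- To use RAKE in a Python project, install rake-nltk (which pulls in NLTK) and import the Rake class from the rake_nltk module. NLTK's stopword and tokenizer data may also need to be downloaded.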
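To make the scoring idea concrete, here is a simplified, standard-library-only sketch of RAKE-style keyword extraction. It is not the rake-nltk implementation: the tiny stopword list, the function names, and the sample text are assumptions for illustration. Candidate phrases are runs of words between stopwords or punctuation, each word is scored as degree divided by frequency, and a phrase's score is the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; real implementations use a much larger one.
STOPWORDS = {"a", "an", "and", "are", "for", "is", "of", "on", "the", "to", "with"}


def candidate_phrases(text):
    """Split text into runs of non-stopwords; stopwords and punctuation are delimiters."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rank_phrases(text):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences in the phrase, itself included
    scores = {
        phrase: sum(degree[w] / freq[w] for w in phrase) for phrase in set(phrases)
    }
    # Sort by descending score, then alphabetically to make ties deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(" ".join(phrase), score) for phrase, score in ranked]


text = (
    "Running shoes are essential for marathon training. "
    "The knees are protected with running shoes."
)
for phrase, score in rank_phrases(text):
    print(f"{score:.1f}  {phrase}")
```

With rake-nltk itself, the equivalent steps are creating a Rake object, calling extract_keywords_from_text() on the article text, and reading the result from get_ranked_phrases().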