Sunday, March 12, 2023

Conversion tools and difference checkers

Conversion tools:
TBX convert: On this page, you can convert between several glossary file types: UTX-Simple, GlossML, TBXGlossary, OLIF. TBX (TermBase eXchange) is a family of XML-based languages for the interchange of terminological information (called TMLs, for Terminological Markup Language; also informally called “dialects” of TBX). All TBX dialects share a core structure, in which information is represented on one of three structural levels: concept, language, and term.
UTF-16 to UTF-8 Converter
Glossary Converter lets you convert between MultiTerm termbases and other terminology formats by simple drag and drop, with minimal user interaction. It supports xls, xlsx, csv, txt, tbx, utx, MultiTerm export files and tmx.
TBX Utilities: This is a collection of tools for working with TermBase eXchange (TBX), an open, XML-based standard for exchanging structured terminological data, standardized as ISO 30042 by ISO Technical Committee 37.
TBX Resources: TBX Resources is dedicated to helping you use the industry-standard TBX format with your terminological data. Here you’ll find tutorials and tools for using and converting to and from TBX.
Other TBX downloads and tools
Converting TBX files to XLS/CSV format
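To make the three structural levels (concept, language, term) concrete, here is a minimal Python sketch that turns (source term, target term) pairs into TBX-style XML: one entry per concept, one language section per language, and one term element inside each. It is an illustration under stated assumptions, not a validated converter: the element names (martif, termEntry, langSet, tig, term) are assumed to follow the older TBX-Basic layout, the header that real TBX files carry is omitted, and the function name glossary_to_tbx is hypothetical.

```python
import xml.etree.ElementTree as ET

# Clark notation for the predefined xml:lang attribute; ElementTree serializes it as xml:lang
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def glossary_to_tbx(rows, src_lang="en", tgt_lang="de"):
    """Build a minimal TBX-style string from (source, target) term pairs.

    Hypothetical helper: element names assume the older TBX-Basic layout,
    and the martifHeader of a real TBX file is left out.
    """
    martif = ET.Element("martif", {"type": "TBX", XML_LANG: src_lang})
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")
    for n, (src, tgt) in enumerate(rows, start=1):
        # concept level: one entry per concept
        entry = ET.SubElement(body, "termEntry", {"id": f"c{n}"})
        for lang, term in ((src_lang, src), (tgt_lang, tgt)):
            # language level: one section per language
            lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
            # term level: the term itself, wrapped in a term information group
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return ET.tostring(martif, encoding="unicode")

print(glossary_to_tbx([("translation", "Übersetzung")]))
```

Check the output against the TBX dialect your target tool expects before relying on it; TBX tools such as those listed above are the better choice for real conversions.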
TXT
AntFile Converter: A freeware tool to convert PDF and Word (DOCX) files into plain text for use in corpus tools like AntConc.
EncodeAnt is a freeware character encoding detection and conversion tool. EncodeAnt takes an input list of text files (e.g. .txt) and attempts to auto-detect the character encoding that the files use. The character encoding can also be set manually. EncodeAnt also has an option to auto-convert the character encoding of the files to UTF-8, which is a standard used in most corpus research. The converted files are saved in a separate folder leaving the original files untouched.
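The conversion step that EncodeAnt and the UTF-16 to UTF-8 converter perform can be approximated in a few lines of Python. This sketch is not EncodeAnt: it assumes you already know the source encoding (there is no auto-detection), out_dir is assumed to be a different folder from the inputs so the originals stay untouched, and the function name convert_to_utf8 is hypothetical.

```python
import os

def convert_to_utf8(paths, out_dir, source_encoding="utf-16"):
    """Re-encode text files as UTF-8 into out_dir; the original files are not modified.

    Hypothetical helper: the source encoding must be given, nothing is auto-detected.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for path in paths:
        # newline="" disables newline translation so line endings are kept as-is
        with open(path, encoding=source_encoding, newline="") as src:
            text = src.read()
        target = os.path.join(out_dir, os.path.basename(path))
        with open(target, "w", encoding="utf-8", newline="") as dst:
            dst.write(text)
        written.append(target)
    return written
```

For example, convert_to_utf8(["corpus1.txt", "corpus2.txt"], "utf8") would write UTF-8 copies into a folder named utf8 (both file names are placeholders).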
Difference checkers:
Winmerge.org: WinMerge is an Open Source differencing and merging tool for Windows. WinMerge can compare both folders and files, presenting differences in a visual text format that is easy to understand and handle.
DiffEngineX is a fast and scalable compare utility that finds the differences between the formulae, constants, defined names, cell comments and Visual Basic VBA code contained in either two whole Excel workbooks or selected worksheets on Windows. It can align similar rows and columns across two different Excel spreadsheets. It works with xls, xlsx, xlsm and xlsb files. xla and xlam add-ins need to be converted first into xls and xlsm files before DiffEngineX can compare them. Excel 2003, 2007, 2010 or 2013 is required for this spreadsheet comparison tool to work.
ExcelDiff analyzes multiple Microsoft Excel files (.csv, .xls, .xlsx, .xlsm, .xlsb) and shows their differences graphically, down to the cell level.
KDiff3
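WinMerge and KDiff3 are GUI tools, but the basic idea of a folder comparison can be sketched with Python's standard library: list the files that exist on only one side, and produce a unified diff for files whose contents differ. This is a rough sketch, not how those tools work internally: it only looks at the top level of each folder (no recursion), assumes the files are UTF-8 text, and the function name compare_folders is hypothetical.

```python
import difflib
import filecmp
from pathlib import Path

def compare_folders(left, right):
    """Compare the top-level files of two folders (hypothetical helper, no recursion)."""
    left, right = Path(left), Path(right)
    left_names = {p.name for p in left.iterdir() if p.is_file()}
    right_names = {p.name for p in right.iterdir() if p.is_file()}
    report = {
        "left_only": sorted(left_names - right_names),
        "right_only": sorted(right_names - left_names),
        "diffs": {},
    }
    for name in sorted(left_names & right_names):
        a, b = left / name, right / name
        # shallow=False forces a byte-by-byte comparison instead of trusting os.stat()
        if not filecmp.cmp(a, b, shallow=False):
            report["diffs"][name] = list(difflib.unified_diff(
                a.read_text(encoding="utf-8").splitlines(keepends=True),
                b.read_text(encoding="utf-8").splitlines(keepends=True),
                fromfile=str(a), tofile=str(b)))
    return report
```

Each entry in report["diffs"] is a list of diff lines that can be printed with "".join(...).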

Source: inmyownterms.com

Flask Resources

Framework

Connexion - Swagger/OpenAPI First framework for Python on top of Flask with automatic endpoint validation and OAuth2 support
Flask-MongoRest - Restful API framework wrapped around MongoEngine
Eve - REST API framework powered by Flask, MongoDB and good intentions
Flask-Restless - A Flask extension for creating simple ReSTful APIs from SQLAlchemy models
Flask-RESTful - Simple framework for creating REST APIs
Flask-RestPlus - syntactic sugar, helpers and automatically generated Swagger documentation.
Flask-Potion - RESTful API framework for Flask and SQLAlchemy
Zappa - Build and deploy server-less Flask applications on AWS Lambda and API Gateway

Admin interface

Flask-Admin - Simple and extensible administrative interface framework for Flask

Analytics

Flask-Analytics - Analytics snippets generator extension for the Flask framework
Flask-Matomo - Track requests to your Flask website with Matomo

Authentication

Flask-Security - Quick and simple security for Flask applications
Flask-Login - Flask user session management
Flask-User - Customizable user account management for Flask
Flask-HTTPAuth - Simple extension that provides Basic and Digest HTTP authentication for Flask routes
Flask-Praetorian - Strong, Simple, and Precise security for Flask APIs (using jwt)

Authorization

Authlib - Authlib is an ambitious authentication library for OAuth 1, OAuth 2, OpenID clients, servers and more.
Authomatic - Authomatic provides out of the box support for a number of providers using OAuth 1.0a (Twitter, Tumblr and more) and OAuth 2.0 (Facebook, Foursquare, GitHub, Google, LinkedIn, PayPal and more)
Flask-Pundit - Extension based on Rails' Pundit gem that provides an easy way to organize access control for your models
Flask-Dance - OAuth consumer extension for Flask, shipped with pre-set support for Facebook, GitHub, Google, etc.

Database

Flask-MongoEngine - MongoEngine flask extension with WTF model forms support
Flask-SQLAlchemy - Adds SQLAlchemy support to Flask

Database Migrations

Flask-Migrate - SQLAlchemy database migrations for Flask applications using Alembic

Session

Flask-Session - Server side session extension for Flask

Cache

Flask-Caching - Adds easy cache support to Flask
flask-heroku-cacheify - Automatic Flask cache configuration on Heroku

Data Validation

Flask-WTF - Simple integration of Flask and WTForms, including CSRF, file upload and reCAPTCHA integration.

Email

Flask-Mail - Flask-Mail adds SMTP mail sending to your Flask applications

i18n

flask-babel - i18n and l10n support for Flask based on Babel and pytz

Full-text searching

SQLAlchemy-Searchable - Full-text searching for Flask-SQLAlchemy (Postgres only)
flask_msearch - Full-text search for Flask with Whoosh

Rate Limiting

Flask-Limiter - Flask-Limiter provides rate limiting features to flask routes

Task Queue

Flask-Dramatiq - dramatiq integration for Flask applications.
huey - a little task queue for python
Flask-RQ - RQ (Redis Queue) integration for Flask applications
celery - Distributed Task Queue

Exception tracking

sentry-sdk - Python client for Sentry.
airbrake-python - Python client for Airbrake

Tracing

flask-zipkin - Distributed tracing with Zipkin.
Flask-OpenTracing - Distributed tracing with OpenTracing.

APM

elastic-apm - Elastic APM agent for Python

Other SDK

Flask-GoogleMaps - Build and embed Google Maps in your Flask templates
Flask-Gravatar - Small and simple Gravatar usage in Flask
Flask-Pusher - Pusher integration for Flask
Flask-Azure-Storage - Flask extension that provides integration with Azure Storage

Frontend

Flask-CORS - A Flask extension for handling Cross Origin Resource Sharing (CORS), making cross-origin AJAX possible
flask-assets - Flask webassets integration
flask-s3 - Seamlessly serve the static assets of your Flask app from Amazon S3
Flask-SSLify - Force SSL on your Flask app
Flask-HTMLmin - Flask html minifier

Development (Debugging/Testing/Documentation)

Flasgger - Create API documentation for Flask views using Swagger 2.0 specs
flask-apispec - simple self-documenting APIs with flask
flask2postman - Generate a Postman collection from your Flask application
flask_profiler - endpoint analyzer/profiler for Flask
Flask-DebugToolbar - A port of the Django Debug Toolbar to Flask
flask-debug-toolbar-mongo - MongoDB panel for the Flask Debug Toolbar
Flask-Testing - Unittest extensions for Flask
pytest-flask - A set of pytest fixtures to test Flask applications
Flask-MonitoringDashboard - Automatically monitor the evolving performance of Flask/Python web services.
nplusone - Auto-detect n+1 queries with Flask and SQLAlchemy
connexion - Swagger/OpenAPI First framework for Python on top of Flask with automatic endpoint validation & OAuth2 support.

Utils

flask-marshmallow - Flask + marshmallow for beautiful APIs
flask-jsonrpc - A basic JSON-RPC implementation for your Flask-powered sites
Flask-Bcrypt - Flask-Bcrypt is a Flask extension that provides bcrypt hashing utilities for your application
Mixer - Mixer is an application to generate instances of Django or SQLAlchemy models
Flask-FeatureFlags - A Flask extension that enables or disables features based on configuration
Flask-Reggie - Regex Converter for Flask URL Routes
Flask-SocketIO - Socket.IO integration for Flask applications
Flask-Moment - Formatting of dates and times in Flask templates using moment.js
Flask-Paginate - Pagination support for Flask
Flask-graphql - Adds GraphQL support to your Flask application

Resources

Tutorials

Courses

Books

Slides

Videos

Built with Flask

zmusic-ng - ZX2C4 Music provides a web interface for playing and downloading music files using metadata.
GuitarFan - guitar tab
June - ~~python-china.org~~
Zerqu - ZERQU is a content-focused API-based platform, e.g. Python-China
motiky
missing - a list service called missing
thenewsmeme.com
overholt - Example Flask application illustrating common practices
pypress - flask team blog
thepast.me
redispapa - another Redis monitor using Flask, Angular and Socket.IO
flaskblog - a simple blog system based on flask
cleanblog - a clean blog system based on flask and mongoengine
Quokka CMS - CMS made with Flask and MongoDB
chat - a live chat built with python (flask + gevent + apscheduler) + redis
chatapp - Flask and Angular.js Chat Application using Socket.io
Frozen-Flask - Freezes a Flask application into a set of static files
mcflyin - A small timeseries transformation API built on Flask and Pandas
Skylines - Live tracking, flight database and competition framework
airflow - Airflow is a system to programmatically author, schedule and monitor data pipelines.
timesketch - Collaborative forensics timeline analysis
changes - A dashboard for your code. A build system.
security_monkey - monitors policy changes and alerts on insecure configurations in an AWS account.
securedrop - an open-source whistleblower submission system that media organizations can use to securely accept documents from and communicate with anonymous sources.
sync_engine - IMAP/SMTP sync system with modern APIs
cleansweep - Volunteer & Campaign Management System
indico - a general-purpose, web-based event management solution. It includes a full-blown conference organization workflow as well as tools for meeting management and room booking, and it also integrates with video-conferencing solutions.
flaskbb - A classic forum software in Python using Flask.
PythonBuddy - Online Python Editor With Live Syntax Checking and Execution (https://github.com/ethanchewy/PythonBuddy)

Boilerplate

fbone
cookiecutter-flask
Flask-Foundation
flask-rest-template
gae-init - Flask boilerplate running on Google App Engine
Flask-AppBuilder - Simple and rapid application builder framework, built on top of Flask. Includes detailed security, auto form generation, Google Charts and much more

Source: https://github.com/humiaozuzu/awesome-flask

Regex in Word

By default, Word's VBA editor does not reference the regular expression library. To enable it:

1. Press ALT+F11 in Word to open the Visual Basic Editor.

2. Go to Tools > References.

3. Tick the "Microsoft VBScript Regular Expressions 5.5" option, then press OK.

4. You can now create a RegExp object in your VBA code. To verify that the library is available, open View > Object Browser (or press F2) and search for the RegExp object.

5. The RegExp object uses regular expressions to match a pattern. The following properties are provided by RegExp. These properties set the pattern to compare the strings that are passed to the RegExp instance:

a. Pattern: A string that defines the regular expression.

b. IgnoreCase: A Boolean property that indicates whether the pattern matching is case-insensitive.

c. Global: Sets or returns a Boolean value that indicates whether the pattern should match all occurrences in the whole search string, or only the first occurrence.

RegExp provides the following methods to determine whether a string matches a particular pattern of a regular expression:

d. Test: Returns a Boolean value that indicates whether the regular expression can successfully be matched against the string.

e. Execute: Returns a MatchCollection object that contains a Match object for each successful match.

Sample code:

Function TestRegExp(myPattern As String, myString As String) As String
    ' Declare objects (early binding: requires the VBScript Regular Expressions 5.5 reference).
    Dim objRegExp As RegExp
    Dim objMatch As Match
    Dim colMatches As MatchCollection
    Dim RetStr As String

    ' Create a regular expression object.
    Set objRegExp = New RegExp

    ' Set the pattern by using the Pattern property.
    objRegExp.Pattern = myPattern

    ' Make the match case-insensitive.
    objRegExp.IgnoreCase = True

    ' Match all occurrences, not just the first one.
    objRegExp.Global = True

    ' Test whether the pattern matches the string.
    If objRegExp.Test(myString) Then
        ' Get the matches.
        Set colMatches = objRegExp.Execute(myString)   ' Execute search.

        For Each objMatch In colMatches   ' Iterate over the Matches collection.
            RetStr = RetStr & "Match found at position "
            RetStr = RetStr & objMatch.FirstIndex & ". Match Value is '"
            RetStr = RetStr & objMatch.Value & "'." & vbCrLf
        Next
    Else
        RetStr = "String Matching Failed"
    End If
    TestRegExp = RetStr
End Function

Another example, which replaces every e-mail address in all story ranges of the document (main text, headers, footers, footnotes and so on):

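Sub RegexFindAndReplace()
    Dim re As Object
    Dim rng As Range

    ' Late binding: this macro works without the VBScript Regular Expressions reference
    Set re = CreateObject("VBScript.RegExp")

    With re
        .Pattern = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b" 'regex pattern for an e-mail address
        .Global = True 'match all occurrences
        .IgnoreCase = True 'make the match case-insensitive
    End With

    For Each rng In ActiveDocument.StoryRanges 'loop through all story ranges in the document
        Do
            ' Caution: assigning .Text rewrites the whole story, which can discard
            ' character formatting, fields and inline objects; test on a copy first.
            If re.Test(rng.Text) Then
                rng.Text = re.Replace(rng.Text, "[email removed]") 'replace e-mail addresses with dummy text
            End If
            Set rng = rng.NextStoryRange 'linked stories, e.g. headers of later sections
        Loop Until rng Is Nothing
    Next rng
End Sub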
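The same e-mail pattern can be tried on plain text with Python's re module before running a macro on a real document. A minimal sketch; the function name mask_emails and the default placeholder text are hypothetical. re.sub() replaces every match, which corresponds to .Global = True, and re.IGNORECASE corresponds to .IgnoreCase = True.

```python
import re

# Same pattern as the VBA macro; IGNORECASE lets [A-Z] match lowercase letters too
EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.IGNORECASE)

def mask_emails(text: str, placeholder: str = "[email removed]") -> str:
    """Replace every e-mail address in text with a placeholder (hypothetical helper)."""
    return EMAIL.sub(placeholder, text)

print(mask_emails("Write to Jane.Doe@Example.COM or x@y.io today."))
```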

Friday, March 10, 2023

Free Cloud Instances (VPS)

Google Cloud Platform

- App Engine - 28 frontend instance hours per day, 9 backend instance hours per day
- Cloud Firestore - 1GB storage, 50,000 reads, 20,000 writes, 20,000 deletes per day
- Compute Engine - 1 non-preemptible e2-micro, 30GB HDD, 5GB snapshot storage (restricted to certain regions), 1 GB network egress from North America to all region destinations (excluding China and Australia) per month
- Cloud Storage - 5GB, 1GB network egress
- Cloud Shell - Web-based Linux shell/basic IDE with 5GB of persistent storage, limited to 60 hours per week
- Cloud Pub/Sub - 10GB of messages per month
- Cloud Functions - 2 million invocations per month (includes both background and HTTP invocations)
- Cloud Run - 2 million requests per month, 360,000 GB-seconds memory, 180,000 vCPU-seconds of compute time, 1 GB network egress from North America per month
- Google Kubernetes Engine - No cluster management fee for one zonal cluster. Each user node is charged at standard Compute Engine pricing
- BigQuery - 1 TB of querying per month, 10 GB of storage each month
- Cloud Build - 120 build-minutes per day
- Cloud Source Repositories - Up to 5 Users, 50 GB Storage, 50 GB Egress
- Google Colab - Free Jupyter Notebooks development environment.
- Full, detailed list - Free Trial and Free Tier | Google Cloud

Amazon Web Services
- CloudFront - 1TB egress per month
- CloudWatch - 10 custom metrics and 10 alarms
- CodeBuild - 100 minutes of build time per month
- CodeCommit - 5 active users, 50GB storage and 10,000 requests per month
- CodePipeline - 1 active pipeline per month
- DynamoDB - 25GB NoSQL DB
- EC2 - 750 hours per month of t2.micro or t3.micro (12mo)
- EBS - 30GB per month of General Purpose (SSD) or Magnetic (12mo)
- Elastic Load Balancing - 750 hours per month (12mo)
- Glacier - 10GB long-term object storage
- Lambda - 1 million requests per month
- SNS - 1 million publishes per month
- SES - 62,000 messages per month
- SQS - 1 million messaging queue requests
- Full, detailed list - Free Cloud Computing Services - AWS Free Tier

Microsoft Azure
- Virtual Machines - 1 B1S Linux VM, 1 B1S Windows VM (12mo)
- App Service - 10 web, mobile or API apps (60 CPU minutes / day)
- Functions - 1 million requests per month
- DevTest Labs - Enable fast, easy, and lean dev-test environments
- Active Directory - 500,000 objects
- Active Directory B2C - 50,000 monthly stored users
- Azure DevOps - 5 active users, unlimited private Git repos
- Azure Pipelines - 10 free parallel jobs with unlimited minutes for open source for Linux, macOS, and Windows
- Microsoft IoT Hub - 8,000 messages per day
- Load Balancer - 1 free public load balanced IP (VIP)
- Notification Hubs - 1 million push notifications
- Bandwidth - 15GB inbound (12mo) & 5GB egress per month
- Cosmos DB - 5GB storage and 400 RUs of provisioned throughput
- Static Web Apps - Build, deploy and host static apps and serverless functions, with free SSL, authentication/authorization and custom domains
- Storage - 5GB LRS File or Blob storage (12mo)
- Cognitive Services - AI/ML APIs (Computer Vision, Translator, Face detection, Bots...) with free tier including limited transactions
- Cognitive Search - AI-based search and indexation service, free for 10,000 documents
- Azure Kubernetes Service - Managed Kubernetes service, free cluster management
- Event Grid - 100K ops/month
- Full, detailed list - Create Your Azure Free Account Today | Microsoft Azure

Oracle Cloud
- Compute - 2 x64-based VMs with 1 GB RAM each, plus 4 Arm-based Ampere A1 cores and 24 GB of memory usable as one VM or up to 4 VMs
- Block Volume - 2 volumes, 200 GB total (used for compute)
- Object Storage - 10 GB
- Load balancer - 1 instance with 10 Mbps
- Databases - 2 DBs, 20 GB each
- Monitoring - 500 million ingestion datapoints, 1 billion retrieval datapoints
- Bandwidth - 10 TB egress per month, speed limited to 50 Mbps on x64 based VM, 500 Mbps * core count on ARM based VM
- Public IP - 2 IPv4 for VMs, 1 IPv4 for load balancer
- Notifications - 1 million delivery options per month, 1000 emails sent per month
- Full, detailed list - https://www.oracle.com/cloud/free/

IBM Cloud
- Cloud Functions - 5 million executions per month
- Object Storage - 25GB per month
- Cloudant database - 1 GB of data storage
- Db2 database - 100MB of data storage
- API Connect - 50,000 API calls per month
- Availability Monitoring - 3 million data points per month
- Log Analysis - 500MB of daily log
- Full, detailed list - IBM Cloud Free Tier
Source: gigarocket.net

Saturday, March 4, 2023

Wiktionary Entry Template in Romanian

[[Fișier:<name>|thumb|<description>]]

=={{limba|ron}}==
{{-etimologie-}}
{{-etim-lipsă-|ron}}

{{-pronunție-}}
{{-pron-lipsă-|ron}}

{{-substantiv-|ron}}
{{substantiv-ron
|gen=
|nom-sg=
|nom-pl=
|art-sg=
|art-pl=
|dat-sg=
|dat-pl=
|voc-sg=
|voc-pl=
}}
# <definition>
#: <example>
{{-sin-}}
* <synonyms>
{{-ant-}}
* <antonyms>
{{-deriv-}}
* <derived words>
{{-comp-}}
* <compound words>
{{-apr-}}
* <related words>
{{-omo-}}
* <homonyms>
{{-omof-}}
* <homophones>
{{-paro-}}
* <paronyms>
{{-loc-}}
* <locutions>
{{-expr-}}
* <expressions>
{{-trans-}}
{{(}}
* English: {{trad|en|translation|m}}
* German: {{trad|de|Übersetzung|f}}, {{trad|de|Andere Übersetzung|f}}
{{)}}
{{-anagr-}}
* <anagrams>

Monday, February 27, 2023

Word Macro to Remove Highlight and Shading from Text

Remove highlighting and shading (for instance, to clean up the highlighting left in text produced by FineReader OCR):

Sub RemoveShadingandHighlights()
  Selection.Font.Shading.Texture = wdTextureNone
  ' wdColorAutomatic means "no color"; wdColorWhite would apply white shading instead of removing it
  Selection.Shading.BackgroundPatternColor = wdColorAutomatic
  Selection.Shading.ForegroundPatternColor = wdColorAutomatic
  Selection.Range.HighlightColorIndex = wdNoHighlight
End Sub

Remove highlighting:

Sub RemoveAllHighlights()
  Selection.Range.HighlightColorIndex = wdNoHighlight
End Sub

Remove shading:

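Sub RemoveShading()
  Selection.Font.Shading.Texture = wdTextureNone
  ' wdColorAutomatic means "no color"; wdColorWhite would apply white shading instead of removing it
  Selection.Shading.BackgroundPatternColor = wdColorAutomatic
  Selection.Shading.ForegroundPatternColor = wdColorAutomatic
End Sub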

Friday, February 17, 2023

Difference between r, r+, w, w+, a and a+ in Python

Differences between the open modes r, r+, w, w+, a and a+ of the open() function:

                   r    r+   w    w+   a    a+
read               *    *         *         *
write                   *    *    *    *    *
create                       *    *    *    *
truncate                     *    *
position at start  *    *    *    *
position at end                        *    *

In this context, truncate means delete the content of the file.

Definition of open modes r, r+, w, w+, a, a+:

r: opens an existing file for reading without truncating it; raises an error if the file does not exist. The file pointer is positioned at the beginning of the file.
r+: opens an existing file for reading and writing without truncating it; raises an error if the file does not exist. The file pointer is positioned at the beginning of the file.
w: creates a new file or truncates an existing one, then opens it for writing. The file pointer is positioned at the beginning of the file.
w+: creates a new file or truncates an existing one, then opens it for reading and writing. The file pointer is positioned at the beginning of the file.
a: creates a new file or opens an existing one for writing. The file pointer is positioned at the end of the file.
a+: creates a new file or opens an existing one for reading and writing. The file pointer is positioned at the end of the file.
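The table and definitions above can be checked with a small experiment: reset a temporary file to "hello", open it in r+, w+ and a+ mode, note where the file pointer starts, write one character and read the whole file back. A minimal sketch; the function name demo_modes is hypothetical.

```python
import os
import tempfile

def demo_modes():
    """Open the same file in r+, w+ and a+ mode and report (start position, final content)."""
    path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    results = {}
    for mode in ("r+", "w+", "a+"):
        with open(path, "w", encoding="utf-8") as f:   # reset the file to a known state
            f.write("hello")
        with open(path, mode, encoding="utf-8") as f:
            start = f.tell()        # 0 for r+/w+, end of file for a+
            f.write("!")            # r+ overwrites the first character; a+ appends
            f.seek(0)               # rewind so the whole file can be read back
            results[mode] = (start, f.read())
    return results

print(demo_modes())
```

This prints {'r+': (0, '!ello'), 'w+': (0, '!'), 'a+': (5, 'hello!')}: r+ keeps the old content and overwrites from the start, w+ truncates first, and a+ starts at the end of the file.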

File Operations in Python

File operations are the actions a program performs on a file, such as opening, reading, writing and closing it. In Python, as in most programming languages, they are carried out through built-in functions and file-object methods.

A few fundamental file operations are listed below:

Open: The first operation on any file is to open it; you must open a file (including one you are creating) before you can process it further. Python's built-in open() function opens a file and returns a file object, also known as a handle, which you use for all further operations.
Read: As the name suggests, this operation reads the content of a file. Python provides several methods for reading, the most common being read(). To read a file, you must open it in a mode that allows reading, such as 'r'.
Write: This operation writes information into a file. Several modes allow writing; they are described below.
Close: When you are done, close the file with its close() method. Closing flushes any buffered data to disk and frees the resources the file was using.
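The operations above can be combined in a short sketch. It uses the with statement, which closes the file automatically when the block ends (even if an exception is raised), so close() does not need to be called explicitly. The function name write_then_read is hypothetical.

```python
import os
import tempfile

def write_then_read(path: str) -> str:
    """Write, append and read back a file; each with block closes its file automatically."""
    # "w": create the file (or truncate it if it exists), then write to it
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\n")
    # "a": open for appending; existing content is kept
    with open(path, "a", encoding="utf-8") as f:
        f.write("second line\n")
    # "r" is the default mode: read the whole file back
    with open(path, encoding="utf-8") as f:
        return f.read()

path = os.path.join(tempfile.mkdtemp(), "example.txt")
print(write_then_read(path))
```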

The six most commonly used file access modes are listed below. (Python also accepts 'x' for exclusive creation and the 'b' and 't' suffixes for binary and text mode.)

Sr. No.	Access Mode	Description
1.	Read Only ('r')	Default mode. Opens a file for reading. Raises FileNotFoundError (an OSError) if the file does not exist.
2.	Read & Write ('r+')	Opens an existing file for reading and writing without truncating it. Raises FileNotFoundError if the file does not exist.
3.	Write Only ('w')	Opens a file for writing. Creates the file if it does not exist and truncates (overwrites) it if it does.
4.	Write & Read ('w+')	Like 'w' (creates or truncates the file), but also allows reading.
5.	Append Only ('a')	Opens a file for writing at the end. Creates the file if it does not exist; existing data is not truncated.
6.	Append & Read ('a+')	Like 'a', but also allows reading.

The access mode is passed to open() as the second argument, after the file path.

The syntax to open a file is:

f = open("FilePath", "access mode")

Using the seek() and truncate() methods

This approach can be used to overwrite a file, either entirely or from a given position onward. It involves two methods:

The seek() method: seek(offset) moves the file pointer to the given position. Calling seek(0) moves it back to the beginning of the file, so that the following write() starts there.
The truncate() method: resizes the file to the current pointer position (or to a size you pass in), discarding any data after it.

""" File Content: 
Program: To Overwrite a File in Python
Overwriting a File : Replacing old contents of the file """

# open the file using write only mode
handle = open("file.txt", "w")

# seek out the line you want to overwrite
handle.seek(0)
handle.write("File Overwritten.")
handle.truncate()

# close the file
handle.close()

# To read the contains of the file
# open the file in read mode
f = open("file.txt", "r")
print(f.read())
f.close()

OR

# Program: Overwriting a File in Python
""" File Content:
Program: To Overwrite a File in Python
Overwriting a File : Replacing old contents of the file """

# open the file in read-only mode and store its data in content
with open("file.txt", "r") as handle:
    content = handle.read()

# replace the data using str.replace()
content = content.replace("File", "Data")

# reopen the file in write mode (which truncates it) and write the new content
with open("file.txt", "w") as handle:
    handle.write(content)

# read the contents of the file back
with open("file.txt", "r") as f:
    print(f.read())

Source: favtutor.com

Python Difflib Module - An Algorithm for Fuzzy Matches

Sequence Matcher

The SequenceMatcher class compares two sequences, such as two strings, and returns data that describes how similar they are. Let's try this out with the ratio() method, which returns the similarity as a float.

>>> import difflib
>>> from difflib import SequenceMatcher
>>> str1 = 'I like pizza'
>>> str2 = 'I like tacos'
>>> seq = SequenceMatcher(a=str1, b=str2)
>>> print(seq.ratio())
0.6666666666666666

We create a new variable that holds a SequenceMatcher instance built from our two strings. The constructor actually accepts four parameters: isjunk, a, b, and autojunk. Because isjunk comes first (it defaults to None), we pass our strings as keyword arguments, SequenceMatcher(a=str1, b=str2), so that they are assigned to a and b.

Once the SequenceMatcher has been created, we can print the result of its ratio() method. ratio() returns 2.0*M/T, where M is the number of matching characters and T is the total number of characters in both strings, so the result is a float between 0 and 1. It is one of several methods of the SequenceMatcher class; others include quick_ratio(), get_matching_blocks() and get_opcodes().
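The formula can be checked by hand with get_matching_blocks(), which lists the matching runs of characters that ratio() counts. A small sketch using the strings from the example above:

```python
from difflib import SequenceMatcher

seq = SequenceMatcher(a="I like pizza", b="I like tacos")

# get_matching_blocks() returns Match(a, b, size) triples; the last one is a dummy of size 0
blocks = seq.get_matching_blocks()
matched = sum(block.size for block in blocks)   # M
total = len(seq.a) + len(seq.b)                 # T
print(blocks)
print(2.0 * matched / total, seq.ratio())       # both values are the same
```

Here the matching blocks are "I like " (7 characters) and the "a" shared by "pizza" and "tacos", so M = 8, T = 24 and the ratio is 16/24, about 0.667.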

Differ

The Differ class compares sequences of lines of text and produces a human-readable delta. It uses SequenceMatcher internally, both to compare the sequences of lines and to compare characters within similar lines. Each line of the delta starts with a two-character code:

'- ' the line appears only in the first sequence.
'+ ' the line appears only in the second sequence.
'  ' the line is common to both sequences.
'? ' the line is not present in either input; it is a hint line that marks intraline differences with ^, + and - characters.

Let's see the Differ class in action.

>>> import difflib
>>> from difflib import Differ
>>> str1 = "I would like to order a pepperoni pizza"
>>> str2 = "I would like to order a veggie burger"
>>> str1_lines = str1.splitlines()
>>> str2_lines = str2.splitlines()
>>> d = difflib.Differ()
>>> diff = d.compare(str1_lines, str2_lines)
>>> print('\n'.join(diff))
- I would like to order a pepperoni pizza
+ I would like to order a veggie burger

In the example above, we begin by importing the module and the Differ class. Once we have defined the two strings that we want to compare, we call the splitlines() method on each of them.

>>> str1_lines = str1.splitlines()
>>> str2_lines = str2.splitlines()

This will allow us to compare the strings by each line rather than by each individual character.

After creating a Differ instance, we call its compare() method with the two lists of lines. It returns a generator that yields the delta lines.

>>> diff = d.compare(str1_lines, str2_lines)

Finally, we join the delta lines with newline characters and print the result so that it is easy to read.
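With multi-line input, Differ also pairs up lines that are similar and adds '? ' hint lines under them. This sketch reuses the small example from the Python documentation for ndiff(); the exact layout of the hint lines may vary between Python versions, so treat the printed result as illustrative.

```python
from difflib import Differ

before = "one\ntwo\nthree\n".splitlines(keepends=True)
after = "ore\ntree\nemu\n".splitlines(keepends=True)

# keepends=True keeps the "\n" on each line, so the delta can be joined with ""
result = list(Differ().compare(before, after))
print("".join(result), end="")
```

Lines such as "two" and "emu" have no counterpart and appear only as '- ' and '+ ' lines, while the similar pairs get '? ' hints pointing at the changed characters.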

get_close_matches

Another simple yet powerful tool in difflib is the get_close_matches() function. It does exactly what it sounds like: given a target word and a list of candidates, it returns the candidates that are the closest matches to the target. Its signature is:

get_close_matches(word, possibilities, n=3, cutoff=0.6)

As we can see above, get_close_matches takes 4 arguments but only requires the first 2 in order to return results.

The first parameter is the target word, the word we want to find similar matches for. The second parameter is a list of strings (or a variable that refers to one). The third parameter, n, limits the number of results that are returned. The last parameter, cutoff, determines how similar a candidate must be to the target in order to be returned as a result.

With only the first two parameters, the function uses the default cutoff of 0.6 (in the range 0 to 1) and the default result limit of 3. Take a look at a couple of examples to see how this function works.

>>> import difflib
>>> from difflib import get_close_matches
>>> get_close_matches('bat', ['baton', 'chess', 'bat', 'bats', 'fireflies', 'batter'])

['bat', 'bats', 'baton']

Notice how the example above only returns three results even though a fourth term, 'batter', is also similar to 'bat'. This is because we did not pass a result limit, so the default limit of 3 applies. Let's try that again, but this time we will define a result limit and a cutoff.

>>> get_close_matches('bat', ['baton', 'chess', 'bat', 'bats', 'fireflies', 'batter'], n=4, cutoff=0.6)

['bat', 'bats', 'baton', 'batter']

This time we get all four results that are at least 60% similar to the word, 'bat'. The cutoff is equivalent to the original because we just defined the same value as the default, 0.6. However, this can be changed to make the results more or less strict. The closer to 1, the more strict the constraints will be. In the example below, the constraint has been changed to 0.9. This means that the results will need to be at least 90% similar to the word 'bat'.

>>> get_close_matches('bat', ['baton', 'chess', 'bat', 'bats', 'fireflies', 'batter'], n=4, cutoff=0.9)

['bat']
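To see why 'batter' came fourth and why only 'bat' survives a cutoff of 0.9, you can compute the score get_close_matches() gives each candidate: it scores candidates with SequenceMatcher.ratio(), using the candidate as the first sequence and the target word as the second, then keeps the n best scores that reach the cutoff. A short sketch for the word list used above:

```python
from difflib import SequenceMatcher

target = "bat"
candidates = ['baton', 'chess', 'bat', 'bats', 'fireflies', 'batter']

# Same scoring as get_close_matches(): candidate as sequence a, target as sequence b
scores = {word: SequenceMatcher(None, word, target).ratio() for word in candidates}

for word, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{word:10} {score:.3f}")
```

'bats' scores 2*3/7 (about 0.857), 'baton' 2*3/8 = 0.75 and 'batter' 2*3/9 (about 0.667), so all three pass the default cutoff of 0.6 but none reaches 0.9.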

unified_diff & context_diff

There are two functions in difflib that operate in a very similar fashion: unified_diff and context_diff. The main difference between the two is the format of their output.

unified_diff takes two lists of lines and returns a generator of delta lines in the unified diff format, marking each line that was removed from the first list or added in the second. The best way to understand this concept is by seeing it in practice:

>>> import sys
>>> import difflib
>>> from difflib import unified_diff
>>> str1 = ['dog\n', 'cat\n', 'frog\n', 'bear\n', 'animals\n']
>>> str2 = ['puppy\n', 'kitten\n', 'tadpole\n', 'cub\n', 'animals\n']
>>> sys.stdout.writelines(unified_diff(str1, str2))
---
+++
@@ -1,5 +1,5 @@
-dog
-cat
-frog
-bear
+puppy
+kitten
+tadpole
+cub
 animals

As the results show, unified_diff prefixes removed lines with - and added lines with +. The final line, 'animals', is prefixed with a space because it is present in both lists.
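In practice you usually label the two inputs and control how much context surrounds each change: unified_diff() accepts fromfile and tofile for the header lines and n for the number of context lines (3 by default). In this sketch the file names are only labels; no files are read.

```python
import sys
from difflib import unified_diff

before = ['one\n', 'two\n', 'three\n', 'four\n', 'five\n']
after = ['one\n', 'two\n', 'THREE\n', 'four\n', 'five\n']

# fromfile/tofile label the header lines; n=1 keeps one line of context around the change
diff = list(unified_diff(before, after, fromfile='before.txt', tofile='after.txt', n=1))
sys.stdout.writelines(diff)
```

This prints the two labelled header lines, the hunk header @@ -2,3 +2,3 @@, and only one unchanged line on each side of the changed line.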

context_diff works in a similar way, but it uses the context diff format: it shows the affected lines from each input in a separate section and marks lines that changed with '!' (lines that were only inserted or only deleted are marked with '+' or '-').

>>> from difflib import context_diff
>>> str1 = ['dog\n', 'cat\n', 'frog\n', 'bear\n', 'animals\n']
>>> str2 = ['puppy\n', 'kitten\n', 'tadpole\n', 'cub\n', 'animals\n']
>>> sys.stdout.writelines(context_diff(str1, str2))
***
---
***************
*** 1,5 ****
! dog
! cat
! frog
! bear
  animals
--- 1,5 ----
! puppy
! kitten
! tadpole
! cub
  animals

Within these examples, we can see that many of the functions and classes of the difflib module resemble one another. Each has its own benefits, and it's important to consider which will work best for your project. Comparing sets of data becomes effortless when leveraging the difflib module, but your results can be even better when your program returns results in the most readable format possible for your data.

Source: iq.opengenus.org

Python's standard library includes the difflib module, which provides the function get_close_matches().

get_close_matches(word, possibilities, n, cutoff) accepts four parameters:

word - the word to find close matches for in our list
possibilities - the list in which to search for close matches of word
n (optional) - the maximum number of close matches to return. Must be > 0. Default is 3.
cutoff (optional) - a float in the range [0, 1] that a possibility must score in order to be considered similar to word. 0 is very lenient, 1 is very strict. Default is 0.6.

>>> from difflib import get_close_matches
>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
['apple', 'ape']

Parameters

This function accepts four parameters:

word: This is the string for which we need the close matches.
possibilities: This is usually a list of string values with which the word is matched.
n: This is an optional parameter with a default value of 3. It specifies the maximum number of close matches required.
cutoff: This is also an optional parameter with a default value of 0.6. Only possibilities whose similarity score is greater than or equal to the cutoff are returned.

word = "learning"
possibilities = ["love", "learn", "lean", "moving", "hearing"]
n = 3
cutoff = 0.7
close_matches = difflib.get_close_matches(word,
possibilities, n, cutoff)

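You can also write the differences to an HTML table. The last line assumes you are running the code in Jupyter or IPython:

import difflib
from IPython.display import HTML, display  # only needed to render the table in a notebook

# read both versions as lists of lines
with open("original.txt", "r") as f:
    a = f.readlines()
with open("modified.txt", "r") as f:
    b = f.readlines()

difference = difflib.HtmlDiff(tabsize=2)

# build a side-by-side comparison page and save it
html = difference.make_file(fromlines=a, tolines=b, fromdesc="Original", todesc="Modified")
with open("compare.html", "w") as fp:
    fp.write(html)

# show the table inline
display(HTML(html))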
