Sunday, March 12, 2023

Conversion tools and difference checkers

Conversion tools:
TBX convert: On this page, you can convert between several glossary filetypes: UTX-Simple, GlossML, TBXGlossary, OLIF. TBX (TermBase eXchange) is a family of XML-based languages for the interchange of terminological information (called TMLs, for Terminological Markup Language; also informally called “dialects” of TBX). All of TBX shares a core structure, in which information is represented on one of three structural levels: concept, language, and term.
UTF-16 to UTF-8 Converter
Glossary converter converts between MultiTerm termbases and other terminology formats by simple drag and drop, with minimal user interaction. It supports xls, xlsx, csv, txt, tbx, utx, MultiTerm export files and tmx.
TBX Utilities: This is a collection of tools for working with TermBase eXchange (TBX), an open, XML-based standard for exchanging structured terminological data, submitted for adoption as ISO 30042 under ISO Technical Committee 37.
TBX Resources: TBX Resources is dedicated to helping you use the industry-standard TBX format with your terminological data. Here you’ll find tutorials and tools for using and converting to and from TBX.
Other TBX downloads and tools
Converting TBX files to XLS/CSV format
TXT
AntFile Converter: A freeware tool to convert PDF and Word (DOCX) files into plain text for use in corpus tools like AntConc.
EncodeAnt is a freeware character encoding detection and conversion tool. EncodeAnt takes an input list of text files (e.g. .txt) and attempts to auto-detect the character encoding that the files use. The character encoding can also be set manually. EncodeAnt also has an option to auto-convert the character encoding of the files to UTF-8, which is a standard used in most corpus research. The converted files are saved in a separate folder leaving the original files untouched.
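
The "convert to UTF-8 and save into a separate folder" workflow that EncodeAnt describes can be approximated in a few lines of Python. This is only a sketch and not EncodeAnt's implementation: it assumes the source encoding is already known (there is no auto-detection), and the helper name convert_to_utf8 and the folder names are hypothetical.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src_dir, dst_dir, source_encoding="utf-16"):
    """Re-encode every .txt file in src_dir as UTF-8 inside dst_dir.

    Hypothetical helper: the source encoding must be known in advance.
    The original files are only read, never modified.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(src_dir.glob("*.txt")):
        text = src.read_text(encoding=source_encoding)
        target = dst_dir / src.name
        target.write_text(text, encoding="utf-8")
        converted.append(target)
    return converted


# Demo in a throwaway directory so no real files are touched.
work = Path(tempfile.mkdtemp())
(work / "in").mkdir()
sample_text = "glosar: ș, ț, ă"
(work / "in" / "sample.txt").write_text(sample_text, encoding="utf-16")

results = convert_to_utf8(work / "in", work / "out")
print([p.name for p in results])
```

For files of unknown encoding, a detection step would have to come first; that part is deliberately left out here.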
Difference checkers:
Winmerge.org: WinMerge is an Open Source differencing and merging tool for Windows. WinMerge can compare both folders and files, presenting differences in a visual text format that is easy to understand and handle.
DiffEngineX is a fast and scalable compare utility that finds the differences between the formulae, constants, defined names, cell comments and Visual Basic VBA code contained in either two whole Excel workbooks or selected worksheets on Windows. It can align similar rows and columns across two different Excel spreadsheets. It works with xls, xlsx, xlsm and xlsb files. xla and xlam add-ins need to be converted first into xls and xlsm files before DiffEngineX can compare them. Excel 2003, 2007, 2010 or 2013 is required for this spreadsheet comparison tool to work.
ExcelDiff analyzes multiple Microsoft Excel (.csv, .xls, .xlsx, .xlsm, .xlsb) files and shows their differences graphically, down to the cell level.
KDiff3

Source: inmyownterms.com

Flask Resources

Framework

  • Connexion - Swagger/OpenAPI First framework for Python on top of Flask with automatic endpoint validation and OAuth2 support
  • Flask-MongoRest - Restful API framework wrapped around MongoEngine
  • Eve - REST API framework powered by Flask, MongoDB and good intentions
  • Flask-Restless - A Flask extension for creating simple ReSTful APIs from SQLAlchemy models
  • Flask-RESTful - Simple framework for creating REST APIs
  • Flask-RestPlus - Syntactic sugar, helpers and automatically generated Swagger documentation.
  • Flask-Potion - RESTful API framework for Flask and SQLAlchemy
  • Zappa - Build and deploy server-less Flask applications on AWS Lambda and API Gateway

Admin interface

  • Flask-Admin - Simple and extensible administrative interface framework for Flask

Analytics

  • Flask-Analytics - Analytics snippets generator extension for the Flask framework
  • Flask-Matomo - Track requests to your Flask website with Matomo

Authentication

  • Flask-Security - Quick and simple security for Flask applications
  • Flask-Login - Flask user session management
  • Flask-User - Customizable user account management for Flask
  • Flask-HTTPAuth - Simple extension that provides Basic and Digest HTTP authentication for Flask routes
  • Flask-Praetorian - Strong, Simple, and Precise security for Flask APIs (using jwt)

Authorization

  • Authlib - Authlib is an ambitious authentication library for OAuth 1, OAuth 2, OpenID clients, servers and more.
  • Authomatic - Authomatic provides out of the box support for a number of providers using OAuth 1.0a (Twitter, Tumblr and more) and OAuth 2.0 (Facebook, Foursquare, GitHub, Google, LinkedIn, PayPal and more)
  • Flask-Pundit - Extension based on Rails' Pundit gem that provides easy way to organize access control for your models
  • Flask-Dance - OAuth consumer extension for Flask, shipped with pre-set support for Facebook, GitHub, Google, etc.

Database

Database Migrations

  • Flask-Migrate - SQLAlchemy database migrations for Flask applications using Alembic

Session

Cache

Data Validation

  • Flask-WTF - Simple integration of Flask and WTForms, including CSRF, file upload and Recaptcha integration.

Email

  • Flask-Mail - Flask-Mail adds SMTP mail sending to your Flask applications

i18n

  • flask-babel - i18n and l10n support for Flask based on Babel and pytz

Full-text searching

Rate Limiting

  • Flask-Limiter - Flask-Limiter provides rate limiting features to flask routes

Task Queue

Exception tracking

Tracing

APM

Other SDK

Frontend

  • Flask-CORS - A Flask extension for handling Cross Origin Resource Sharing (CORS), making cross-origin AJAX possible
  • flask-assets - Flask webassets integration
  • flask-s3 - Seamlessly serve your static assets of your Flask app from Amazon S3
  • Flask-SSLify - Force SSL on your Flask app
  • Flask-HTMLmin - Flask html minifier

Development (Debugging/Testing/Documentation)

Utils

  • flask-marshmallow - Flask + marshmallow for beautiful APIs
  • flask-jsonrpc - A basic JSON-RPC implementation for your Flask-powered sites
  • Flask-Bcrypt - Flask-Bcrypt is a Flask extension that provides bcrypt hashing utilities for your application
  • Mixer - Mixer is an application to generate instances of Django or SQLAlchemy models
  • Flask-FeatureFlags - A Flask extension that enables or disables features based on configuration
  • Flask-Reggie - Regex Converter for Flask URL Routes
  • Flask-SocketIO - Socket.IO integration for Flask applications
  • Flask-Moment - Formatting of dates and times in Flask templates using moment.js
  • Flask-Paginate - Pagination support for Flask
  • Flask-graphql - Adds GraphQL support to your Flask application

Resources

Tutorials

Courses

Books

Slides

Videos

Built with Flask

  • zmusic-ng - ZX2C4 Music provides a web interface for playing and downloading music files using metadata.
  • GuitarFan - guitar tab
  • June - python-china.org
  • Zerqu - ZERQU is a content-focused API-based platform. eg: Python-China
  • motiky
  • missing - a list service called missing
  • thenewsmeme.com
  • overholt - Example Flask application illustrating common practices
  • pypress - flask team blog
  • thepast.me
  • redispapa - another redis monitor by using flask, angular, socket.io
  • flaskblog - a simple blog system based on flask
  • cleanblog - a clean blog system based on flask and mongoengine
  • Quokka CMS - CMS made with Flask and MongoDB
  • chat - a live chat built with python (flask + gevent + apscheduler) + redis
  • chatapp - Flask and Angular.js Chat Application using Socket.io
  • Frozen-Flask - Freezes a Flask application into a set of static files
  • mcflyin - A small timeseries transformation API built on Flask and Pandas
  • Skylines - Live tracking, flight database and competition framework
  • airflow - Airflow is a system to programmatically author, schedule and monitor data pipelines.
  • timesketch - Collaborative forensics timeline analysis
  • changes - A dashboard for your code. A build system.
  • security_monkey - monitors policy changes and alerts on insecure configurations in an AWS account.
  • securedrop - an open-source whistleblower submission system that media organizations can use to securely accept documents from and communicate with anonymous sources.
  • sync_engine - IMAP/SMTP sync system with modern APIs
  • cleansweep - Volunteer & Campaign Management System
  • indico - a general-purpose event management web-based solution. It includes a full-blown conference organization workflow as well as tools for meeting management and room booking. It provides as well integration with video-conferencing solutions.
  • flaskbb - A classic Forum Software in Python using Flask.
  • PythonBuddy (https://github.com/ethanchewy/PythonBuddy) - Online Python Editor With Live Syntax Checking and Execution

Boilerplate

Source: https://github.com/humiaozuzu/awesome-flask

Regex in Word

By default, the Regular Expression option is disabled in Word. To enable it:

1. Press ALT+F11 in Word to open the VBA editor.

2. Go to Tools > References.

3. Tick the "Microsoft VBScript Regular Expressions 5.5" option, then press OK.

4. From now on you can create a RegExp object in your VBA script. You can verify this by searching the object library: go to View > Object Browser (or press F2) and search for the RegExp object.

5. The RegExp object uses regular expressions to match a pattern. The following properties are provided by RegExp. These properties set the pattern to compare the strings that are passed to the RegExp instance:

a. Pattern: A string that defines the regular expression.

b. IgnoreCase: A Boolean property that indicates whether the match is case-insensitive (True) or case-sensitive (False, the default).

c. Global: Sets a Boolean value or returns a Boolean value that indicates whether a pattern must match all the occurrences in a whole search string, or whether a pattern must match just the first occurrence.

RegExp provides the following methods to determine whether a string matches a particular pattern of a regular expression:

d. Test: Returns a Boolean value that indicates whether the regular expression can successfully be matched against the string.

e. Execute: Returns a MatchCollection object that contains a Match object for each successful match.

Sample code:

Function TestRegExp(myPattern As String, myString As String) As String
   'Create objects.
   Dim objRegExp As RegExp
   Dim objMatch As Match
   Dim colMatches   As MatchCollection
   Dim RetStr As String

   ' Create a regular expression object.
   Set objRegExp = New RegExp

   'Set the pattern by using the Pattern property.
   objRegExp.Pattern = myPattern

   ' Set Case Insensitivity.
   objRegExp.IgnoreCase = True

   'Set global applicability.
   objRegExp.Global = True

   'Test whether the pattern matches the string at all.
   If (objRegExp.Test(myString) = True) Then

   'Get the matches.
    Set colMatches = objRegExp.Execute(myString)   ' Execute search.

    For Each objMatch In colMatches   ' Iterate Matches collection.
      RetStr = RetStr & "Match found at position "
      RetStr = RetStr & objMatch.FirstIndex & ". Match Value is '"
      RetStr = RetStr & objMatch.Value & "'." & vbCrLf
    Next
   Else
    RetStr = "String Matching Failed"
   End If
   TestRegExp = RetStr
End Function

Another sample, which replaces all e-mail addresses in all story ranges of the document:

Sub RegexFindAndReplace()
    Dim regExp As Object
    Dim doc As Document
    Dim rng As Range
    
    Set regExp = CreateObject("VBScript.RegExp")
    Set doc = ActiveDocument
    
    With regExp
        .Pattern = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b" 'regex pattern for email address
        .Global = True 'match all occurrences
        .IgnoreCase = True 'ignore case sensitivity
    End With
    
    For Each rng In doc.StoryRanges 'loop through all story ranges in the document
        rng.Text = regExp.Replace(rng.Text, "[email protected]") 'replace email address with dummy text
    Next rng
    
End Sub
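
Before running the macro on a real document, it can help to check the pattern on plain strings. The sketch below does that outside Word with Python's re module, which is an assumption of this example and not part of the macro; the function name mask_emails, the replacement text and the sample addresses are hypothetical.

```python
import re

# Same pattern as in the macro; re.IGNORECASE plays the role of IgnoreCase = True.
EMAIL_PATTERN = re.compile(
    r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.IGNORECASE
)


def mask_emails(text, replacement="[email removed]"):
    """Replace every match, like Global = True combined with Replace in VBA."""
    return EMAIL_PATTERN.sub(replacement, text)


sample = "Contact jane.doe@example.com or ADMIN@EXAMPLE.ORG for details."
masked = mask_emails(sample)
print(masked)
```

Python's re and VBScript's RegExp are different engines, so more exotic patterns may behave differently; for this simple pattern the syntax used is common to both.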

Friday, March 10, 2023

Free Cloud Instances (VPS)

  • Google Cloud Platform

    • App Engine - 28 frontend instance hours per day, 9 backend instance hours per day
    • Cloud Firestore - 1GB storage, 50,000 reads, 20,000 writes, 20,000 deletes per day
    • Compute Engine - 1 non-preemptible e2-micro, 30GB HDD, 5GB snapshot storage (restricted to certain regions), 1 GB network egress from North America to all region destinations (excluding China and Australia) per month
    • Cloud Storage - 5GB, 1GB network egress
    • Cloud Shell - Web-based Linux shell/basic IDE with 5GB of persistent storage. 60 hours limit per week
    • Cloud Pub/Sub - 10GB of messages per month
    • Cloud Functions - 2 million invocations per month (includes both background and HTTP invocations)
    • Cloud Run - 2 million requests per month, 360,000 GB-seconds memory, 180,000 vCPU-seconds of compute time, 1 GB network egress from North America per month
    • Google Kubernetes Engine - No cluster management fee for one zonal cluster. Each user node is charged at standard Compute Engine pricing
    • BigQuery - 1 TB of querying per month, 10 GB of storage each month
    • Cloud Build - 120 build-minutes per day
    • Cloud Source Repositories - Up to 5 Users, 50 GB Storage, 50 GB Egress
    • Google Colab - Free Jupyter Notebooks development environment.
    • Full, detailed list - Free Trial and Free Tier | Google Cloud
  • Oracle Cloud
    • Compute - 2 x64-based with 1 GB RAM each, 4 Arm-based Ampere A1 cores and 24 GB of memory usable as one VM or up to 4 VMs
    • Block Volume - 2 volumes, 200 GB total (used for compute)
    • Object Storage - 10 GB
    • Load balancer - 1 instance with 10 Mbps
    • Databases - 2 DBs, 20 GB each
    • Monitoring - 500 million ingestion datapoints, 1 billion retrieval datapoints
    • Bandwidth - 10 TB egress per month, speed limited to 50 Mbps on x64 based VM, 500 Mbps * core count on ARM based VM
    • Public IP - 2 IPv4 for VMs, 1 IPv4 for load balancer
    • Notifications - 1 million delivery options per month, 1000 emails sent per month
    • Full, detailed list - https://www.oracle.com/cloud/free/
  • IBM Cloud
    • Cloud Functions - 5 million executions per month
    • Object Storage - 25GB per month
    • Cloudant database - 1 GB of data storage
    • Db2 database - 100MB of data storage
    • API Connect - 50,000 API calls per month
    • Availability Monitoring - 3 million data points per month
    • Log Analysis - 500MB of daily log
    • Full, detailed list - IBM Cloud Free Tier

    Source: gigarocket.net

Saturday, March 4, 2023

Wiktionary entry template in Romanian

 [[Fișier:<nume>|thumb|<descriere>]]
<!-- ștergeți acest comentariu. ștergeți și linia anterioară dacă nu introduceți o imagine. -->
=={{limba|ron}}==
{{-etimologie-}}
{{-etim-lipsă-|ron}}
<!-- dacă introduceți etimologia, ștergeți linia de mai sus și acest comentariu -->
{{-pronunție-}}
{{-pron-lipsă-|ron}}
<!-- dacă introduceți pronunția, ștergeți linia de mai sus și acest comentariu -->
{{-substantiv-|ron}}
{{substantiv-ron
|gen=
|nom-sg=
|nom-pl=
|art-sg=
|art-pl=
|dat-sg=
|dat-pl=
|voc-sg=
|voc-pl=
}}
# <definiție>
#: <exemplu>
{{-sin-}}
* <sinonime>
{{-ant-}}
* <antonime>
{{-deriv-}}
* <cuvinte derivate>
{{-comp-}}
* <cuvinte compuse>
{{-apr-}}
* <cuvinte apropiate>
{{-omo-}}
* <omonime>
{{-omof-}}
* <omofone>
{{-paro-}}
* <paronime>
{{-loc-}}
* <locuțiuni>
{{-expr-}}
* <expresii>
{{-trans-}}
{{(}}
* English: {{trad|en|translation|m}}
* German: {{trad|de|Übersetzung|f}}, {{trad|de|Andere Übersetzung|f}}
{{)}}
{{-anagr-}}
* <anagrame>

Monday, February 27, 2023

Word Macro to Remove Highlight and Shading from Text

 Remove highlighting and shading (for instance, to remove highlighting from Finereader OCR):

Sub RemoveShadingandHighlights()
  Selection.Font.Shading.Texture = wdTextureNone
  Selection.Shading.BackgroundPatternColor = wdColorWhite
  Selection.Shading.ForegroundPatternColor = wdColorWhite
  Selection.Range.HighlightColorIndex = wdNoHighlight
End Sub

Remove highlighting:

Sub RemoveAllHighlights()
  Selection.Range.HighlightColorIndex = wdNoHighlight
End Sub

Remove shading:

Sub RemoveShading()
  Selection.Font.Shading.Texture = wdTextureNone
  Selection.Shading.BackgroundPatternColor = wdColorWhite
  Selection.Shading.ForegroundPatternColor = wdColorWhite
End Sub 

Friday, February 17, 2023

Difference between r, r+, w, w+, a and a+ in Python

 

Differences between the open modes r, r+, w, w+, a and a+ of the open() function.


                    r    r+   w    w+   a    a+
read                *    *         *         *
write                    *    *    *    *    *
create                        *    *    *    *
truncate                      *    *
position at start   *    *    *    *
position at end                         *    *

In this context, truncate means delete the content of the file.

Definition of open modes r, r+, w, w+, a, a+:

  • r opens an existing file for reading without truncating it and raises an error if the file does not exist; the file pointer is positioned at the beginning of the file.
  • r+ opens an existing file for reading and writing without truncating it and raises an error if the file does not exist; the file pointer is positioned at the beginning of the file.
  • w creates a new file or truncates an existing file, then opens it for writing; the file pointer is positioned at the beginning of the file.
  • w+ creates a new file or truncates an existing file, then opens it for reading and writing; the file pointer is positioned at the beginning of the file.
  • a creates a new file or opens an existing file for writing without truncating it; the file pointer is positioned at the end of the file.
  • a+ creates a new file or opens an existing file for reading and writing without truncating it; the file pointer is positioned at the end of the file.
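
The definitions above can be checked empirically. The following is a minimal sketch, assuming a throwaway temporary directory; the helper names probe_mode and creates_missing_file are hypothetical. For each mode it reports where the file pointer sits right after opening, how large the file is afterwards (0 means it was truncated), and whether a missing file gets created.

```python
import os
import tempfile


def probe_mode(mode, initial="hello"):
    """Return (position after open, file size after open) for `mode`."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        with open(path, "w") as f:
            f.write(initial)          # the file starts with 5 characters
        with open(path, mode) as f:
            position = f.tell()       # where the pointer sits right after open
        size = os.path.getsize(path)  # 0 if the mode truncated the file
    return position, size


def creates_missing_file(mode):
    """Return True if opening a non-existent file in `mode` creates it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "missing.txt")
        try:
            open(path, mode).close()
        except FileNotFoundError:
            return False
        return os.path.exists(path)


for mode in ("r", "r+", "w", "w+", "a", "a+"):
    print(mode, probe_mode(mode), creates_missing_file(mode))
```

Each printed row should line up with the corresponding column of the table: a size of 0 corresponds to "truncate", a position of 0 to "position at start", and True in the last field to "create".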

File Operations in Python

File operations are the operations that can be performed on a file. These include operations carried out by the user using Python commands (or any other programming language).

A few fundamental file operations are listed below:

  1. Open: The first and most important operation on a file is to open it. Whether a file is new or already exists, you must open it before doing any further processing. Python offers a built-in open() function for this; it returns a file object, also known as a handle, on which the other operations are performed.
  2. Read: As the name suggests, this operation reads the content of a file. Python provides several methods for reading a file, the most common being read(). Note that in order to read a file you need to open it in a mode that allows reading.
  3. Write: This operation writes information into a file. Several modes can be used for writing (we'll discuss the different modes shortly).
  4. Close: After completing all operations, the file must be closed so that any buffered data is written to disk. Closing also frees the resources used by the file while processing. Python has a close() method to close the file.
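
The four operations can be sketched together as follows. The example assumes a temporary file named notes.txt (a hypothetical name) and uses the with statement, which calls close() automatically when the block ends, even if an error occurs inside it.

```python
import os
import tempfile

# Hypothetical file inside a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# Open + write: "w" creates the file; close() happens when the block ends.
with open(path, "w", encoding="utf-8") as handle:
    handle.write("first line\n")

# Open + append: existing content is kept, new data goes to the end.
with open(path, "a", encoding="utf-8") as handle:
    handle.write("second line\n")

# Open + read: read() returns the whole content as one string.
with open(path, "r", encoding="utf-8") as handle:
    content = handle.read()

print(handle.closed)  # the handle is already closed here
print(content, end="")
```

Calling open() and close() explicitly works as well; the with form is just harder to get wrong, because the file cannot be left open by accident.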

Python has six File Access Modes:

Sr. No. | Access Mode (File Operations) | Description
1. | Read Only ('r') | Default mode. Opens a file for reading. (Raises an I/O error, FileNotFoundError, if the file does not exist.)
2. | Read & Write ('r+') | With this, you can read as well as write in the file. (The file must already exist.)
3. | Write Only ('w') | Used to write to a file. Creates a new file if the file doesn't exist; otherwise overwrites the existing file.
4. | Write & Read ('w+') | Used for writing as well as reading. Like 'w', it creates the file or overwrites an existing one.
5. | Append Only ('a') | Used to insert data at the end of a file. The existing data is not truncated.
6. | Append & Read ('a+') | Used to open a file for writing (at the end) and reading.

The access mode is passed along with the file name in the open() function.

The syntax to open a file is:

f = open("FilePath", "access mode")

Using the seek() and truncate() functions

This method can be used to overwrite a file (completely or a particular line) in Python. It involves two functions:

  1. The seek() function: This function moves the file handle (pointer) to a given offset. Calling seek(0) places it at the beginning of the file, so that the next write starts there.
  2. The truncate() function: This function cuts the file off at the current position (or at a given size), removing any data after it.
""" File Content: 
Program: To Overwrite a File in Python
Overwriting a File : Replacing old contents of the file """

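# open the existing file for reading and writing without truncating it
# ("w" would already empty the file on open, making seek() and truncate() redundant)
handle = open("file.txt", "r+")

# move to the start of the file, write the new text, then cut off
# whatever is left of the old content after the current position
handle.seek(0)
handle.write("File Overwritten.")
handle.truncate()

# close the file
handle.close()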

# To read the contents of the file
# open the file in read mode
f = open("file.txt", "r")
print(f.read())
f.close()
 
OR
 
# Program: Overwriting a File in Python
""" File Content: 
Program: To Overwrite a File in Python
Overwriting a File : Replacing old contents of the file """

# open the file using read only mode
handle = open("file.txt", "r")

# reading the file and storing the data in content
content = handle.read()
# replacing the data using replace()
content = content.replace("File", "Data")

# close the file
handle.close()

handle = open("file.txt", "w")
handle.write(content)
handle.close()

# To read the contents of the file
# open the file in read mode
f = open("file.txt", "r")
print(f.read())
f.close() 
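
For the "particular line" case mentioned above, one possible approach combines both ideas: read all lines, change one, then seek(0), rewrite and truncate(). This is a sketch under assumptions, not the source's method; the helper name overwrite_line and the sample file are hypothetical, and the whole file is held in memory, so it suits small files only.

```python
import os
import tempfile


def overwrite_line(path, line_number, new_text):
    """Replace one line (1-based) in place using r+, seek() and truncate()."""
    with open(path, "r+", encoding="utf-8") as handle:
        lines = handle.readlines()
        lines[line_number - 1] = new_text + "\n"
        handle.seek(0)            # go back to the start of the file
        handle.writelines(lines)  # rewrite the content with the changed line
        handle.truncate()         # drop leftover bytes if the file got shorter


# Demo on a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("alpha\nbeta-with-a-long-tail\ngamma\n")

overwrite_line(path, 2, "b")

with open(path, "r", encoding="utf-8") as f:
    result = f.read()
print(result, end="")
```

Without the truncate() call, the tail of the old, longer content would remain at the end of the file.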

 Source: favtutor.com

 

Python Difflib Module - An Algorithm for Fuzzy Matches

 Sequence Matcher

 The SequenceMatcher class compares two given strings and returns data that shows how similar they are. Let's try this out together using the ratio() method, which returns the similarity as a decimal between 0 and 1.

>>> import difflib
>>> from difflib import SequenceMatcher
>>> str1 = 'I like pizza'
>>> str2 = 'I like tacos'
>>> seq = SequenceMatcher(a=str1, b=str2)
>>> print(seq.ratio())
0.6666666666666666

We create a new variable that holds a SequenceMatcher instance built with two keyword arguments, a and b. The constructor actually accepts more parameters: its first positional parameter is isjunk (None by default), followed by a and b. To make sure our two strings end up in the right parameters, we pass them by keyword: SequenceMatcher(a=str1, b=str2).

Once the SequenceMatcher has been given both strings, we can print the value returned by the ratio() method that we mentioned earlier. It computes 2*M/T, where M is the number of matching characters and T is the total number of characters in both strings, and returns the result as a decimal between 0 and 1. The ratio() method is one of several that belong to the SequenceMatcher class.
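
To make the 2*M/T calculation concrete, the sketch below recomputes the ratio by hand from get_matching_blocks(), another SequenceMatcher method. The variable names are illustrative only; the strings are passed positionally here, so the first argument is the isjunk value None.

```python
from difflib import SequenceMatcher

str1 = "I like pizza"
str2 = "I like tacos"

# Positional form: isjunk comes first, then the two sequences.
matcher = SequenceMatcher(None, str1, str2)

# get_matching_blocks() ends with a zero-length sentinel block; skip it.
blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]
for block in blocks:
    print(repr(str1[block.a:block.a + block.size]))

# M = total matched characters, T = combined length of both strings.
matched = sum(b.size for b in blocks)
manual_ratio = 2.0 * matched / (len(str1) + len(str2))
print(manual_ratio, matcher.ratio())
```

Here the matching blocks are "I like " (7 characters) and the single "a", so M = 8 and T = 24, which gives the same value that ratio() returns.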

Differ

The Differ class is the counterpart of SequenceMatcher: it takes in sequences of lines of text and reports the differences between them. Its output is a human-readable delta in which every line carries a two-character prefix, which makes the differences easy to spot.

A line that appears only in the second sequence is prefixed with '+ '.

A line that appears only in the first sequence is prefixed with '- '.

If a line is the same in both sequences, it is prefixed with '  ' (two spaces). Lines prefixed with '? ' are not present in either sequence; they are hint lines that point at the characters that differ within a changed line. Let's see the Differ class in action.

>>> import difflib
>>> from difflib import Differ
>>> str1 = "I would like to order a pepperoni pizza"
>>> str2 = "I would like to order a veggie burger"
>>> str1_lines = str1.splitlines()
>>> str2_lines = str2.splitlines()
>>> d = difflib.Differ()
>>> diff = d.compare(str1_lines, str2_lines)
>>> print('\n'.join(diff))
# output
- I would like to order a pepperoni pizza
+ I would like to order a veggie burger

In the example above, we begin by importing the module and the Differ class. Once we have defined the two strings that we want to compare, we call the splitlines() method on each of them. Because each string consists of a single line, Differ reports the whole line as removed and added.

>>> str1_lines = str1.splitlines()
>>> str2_lines = str2.splitlines()

This will allow us to compare the strings by each line rather than by each individual character.

Once we have created a Differ instance, we call its compare() method, passing in the two lists of lines as arguments.

>>> diff = d.compare(str1_lines, str2_lines)

We call the print function and join the lines of the diff with a newline so that the result is formatted in a more readable way.
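
With several lines per input, all four prefixes can show up in one delta. The following sketch uses made-up order lists (hypothetical data) and groups the delta lines by prefix. In difflib, lines starting with '? ' are hint lines that mark the characters that differ within a changed line.

```python
from difflib import Differ

order_v1 = ["pepperoni pizza\n", "garlic bread\n", "cola\n"]
order_v2 = ["pepperoni pizzas\n", "garlic bread\n", "lemonade\n"]

# compare() yields one delta line per input line, plus optional hint lines.
delta = list(Differ().compare(order_v1, order_v2))
print("".join(delta), end="")

# Group the delta lines by their two-character prefix.
by_prefix = {}
for line in delta:
    by_prefix.setdefault(line[:2], []).append(line[2:].rstrip("\n"))
```

The two pizza lines are similar enough to be paired up with a hint line, while "cola" and "lemonade" are too different and appear as a plain removal and addition.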

get_close_matches

Another simple yet powerful tool in difflib is its get_close_matches function. It's exactly what it sounds like: a tool that will take in arguments and return the closest matches to the target string. In pseudocode, the function works like this:

get_close_matches(target_word, list_of_possibilities, n=result_limit, cutoff=similarity_threshold)

As we can see above, get_close_matches can take in 4 arguments but only requires the first 2 in order to return results.

The first parameter is the word that we are targeting; what we want the function to return similarities to. The second parameter can be a list of terms, or a variable that points to a list of strings. The third parameter allows the user to define a limit to the number of results that are returned. The last parameter determines how similar two words need to be in order to be returned as a result.

With the first two parameters, alone, the method will return results based on the default cutoff of 0.6 (in the range of 0 - 1) and a default result limit of 3. Take a look at a couple of examples in order to see how this function really works.

>>> import difflib
>>> from difflib import get_close_matches
>>> get_close_matches('bat', ['baton', 'chess', 'bat', 'bats', 'fireflies', 'batter'])

['bat', 'bats', 'baton']

Notice how the example above only returns three results even though there is a fourth term that is similar to 'bat': 'batter'. This is because we did not specify a result limit as our third parameter, so the default limit of 3 applies. Let's try that again, but this time we will define a result_limit and a cutoff.

>>> get_close_matches('bat', ['baton', 'chess', 'bat', 'bats', 'fireflies', 'batter'], n=4, cutoff=0.6)

['bat', 'bats', 'baton', 'batter']

This time we get all four results that are at least 60% similar to the word, 'bat'. The cutoff is equivalent to the original because we just defined the same value as the default, 0.6. However, this can be changed to make the results more or less strict. The closer to 1, the more strict the constraints will be. In the example below, the constraint has been changed to 0.9. This means that the results will need to be at least 90% similar to the word 'bat'.

>>> get_close_matches('bat', ['baton', 'chess', 'bat', 'bats', 'fireflies', 'batter'], n=4, cutoff=0.9)

['bat']
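
To see where the cutoff bites, it helps to look at the individual similarity scores. The sketch below computes them with SequenceMatcher.ratio(), which is the measure get_close_matches is based on, and then repeats the two calls. The list contains 'bat' itself; the variable names are illustrative only.

```python
from difflib import SequenceMatcher, get_close_matches

word = "bat"
candidates = ["baton", "chess", "bat", "bats", "fireflies", "batter"]

# Score every candidate against the target word.
scores = {c: SequenceMatcher(None, c, word).ratio() for c in candidates}
for candidate, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{candidate:10s} {score:.3f}")

# Only candidates scoring at least `cutoff` are returned, best first.
loose = get_close_matches(word, candidates, n=4, cutoff=0.6)
strict = get_close_matches(word, candidates, n=4, cutoff=0.9)
print(loose)
print(strict)
```

'batter' scores about 0.667, which is why it passes the 0.6 cutoff but only appears once n is raised to 4, and why raising the cutoff to 0.9 leaves just the exact match.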

unified_diff & context_diff

There are two functions in difflib which operate in a very similar fashion: unified_diff and context_diff. The only major difference between the two is the format of the result.

The unified_diff function takes in two lists of lines and then returns each line that was either added to or removed from the first. The best way to understand this concept is by seeing it in practice:

>>> import sys
>>> import difflib
>>> from difflib import unified_diff
>>> str1 = ['dog\n', 'cat\n', 'frog\n', 'bear\n', 'animals\n']
>>> str2 = ['puppy\n', 'kitten\n', 'tadpole\n', 'cub\n', 'animals\n']
>>> sys.stdout.writelines(unified_diff(str1, str2))
---
+++
@@ -1,5 +1,5 @@
-dog
-cat
-frog
-bear
+puppy
+kitten
+tadpole
+cub
 animals

As evidenced by the results, unified_diff returns the removed lines prefixed with - and the added lines prefixed with +. The final line, 'animals', is prefixed only with a space because it was present in both lists.

The context_diff works in the same way as the unified_diff. However, instead of revealing what was added and removed from the original string, it simply returns what lines have changed by returning the changed lines with a prefix of '!'.

>>> from difflib import context_diff
>>> str1 = ['dog\n', 'cat\n', 'frog\n', 'bear\n', 'animals\n']
>>> str2 = ['puppy\n', 'kitten\n', 'tadpole\n', 'cub\n', 'animals\n']
>>> sys.stdout.writelines(context_diff(str1, str2))
***
---
***************
*** 1,5 ****
! dog
! cat
! frog
! bear
  animals
--- 1,5 ----
! puppy
! kitten
! tadpole
! cub
  animals
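
Both functions return generators of lines, so the result can also be collected and processed instead of printed. The sketch below assumes the optional fromfile and tofile arguments of unified_diff to label the two sides; the file names before.txt and after.txt are hypothetical labels, not real files.

```python
from difflib import unified_diff

before = ["dog\n", "cat\n", "frog\n", "bear\n", "animals\n"]
after = ["puppy\n", "kitten\n", "tadpole\n", "cub\n", "animals\n"]

# Collect the diff instead of writing it straight to stdout.
diff_lines = list(unified_diff(before, after, fromfile="before.txt", tofile="after.txt"))
print("".join(diff_lines), end="")

# Separate removed and added lines, skipping the '---' and '+++' header lines.
# (This simple check assumes no content line itself starts with '--' or '++'.)
removed = [l[1:].strip() for l in diff_lines if l.startswith("-") and not l.startswith("---")]
added = [l[1:].strip() for l in diff_lines if l.startswith("+") and not l.startswith("+++")]
print(removed)
print(added)
```

context_diff accepts the same fromfile and tofile arguments, so the same labelling works for the '!' style output.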

Within these examples, we can see that many of the functions and classes of the difflib module resemble one another. Each has its own set of benefits, and it's important to analyze which will work best for your project. Comparing sets of data becomes effortless when leveraging the difflib module, but your results can be even better when your program returns results in the most readable format possible for your data.

Source: iq.opengenus.org 

Python has a built-in package called difflib with the function get_close_matches()

get_close_matches(word, possibilities, n, cutoff) accepts four parameters:

  • word - the word to find close matches for in our list
  • possibilities - the list in which to search for close matches of word
  • n (optional) - the maximum number of close matches to return. Must be > 0. Default is 3.
  • cutoff (optional) - a float in the range [0, 1] that a possibility must score in order to be considered similar to word. 0 is very lenient, 1 is very strict. Default is 0.6.

>>> from difflib import get_close_matches
>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
['apple', 'ape']

Parameters

This function accepts four parameters:

  • word: This is the string for which we need the close matches.
  • possibilities: This is usually a list of string values with which the word is matched.
  • n: This is an optional parameter with a default value of 3. It specifies the maximum number of close matches required.
  • cutoff: This is also an optional parameter with a default value of 0.6. It specifies that the close matches should have a score of at least the cutoff.

import difflib

word = "learning"
possibilities = ["love", "learn", "lean", "moving", "hearing"]
n = 3
cutoff = 0.7
close_matches = difflib.get_close_matches(word,
                possibilities, n, cutoff)
print(close_matches)  # ['hearing', 'learn']

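Or print differences to an HTML table:

import difflib
from IPython import display

# read both versions as lists of lines; the files are closed automatically
with open("original.txt", "r") as f:
    a = f.readlines()
with open("modified.txt", "r") as f:
    b = f.readlines()

difference = difflib.HtmlDiff(tabsize=2)

# build a complete HTML page containing the side-by-side comparison table
html = difference.make_file(fromlines=a, tolines=b, fromdesc="Original", todesc="Modified")
with open("compare.html", "w") as fp:
    fp.write(html)

# render the result inline (in a Jupyter/IPython notebook)
display.HTML(html)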