Monday, January 23, 2023

Convert ImmutableMultiDict with duplicate keys to list of lists in Python

 

Using Werkzeug in Python

Werkzeug is a WSGI utility library. WSGI is a convention that ensures a web application can communicate with a web server and that web applications can work together. To convert an ImmutableMultiDict with duplicate keys to a list of lists, we'll use Werkzeug's ImmutableMultiDict class to build the data structure and its lists() method to extract the grouped values.

We can install the werkzeug library with pip by opening a command line and typing the following command:

pip install werkzeug

Example 1:

First, we create an ImmutableMultiDict with duplicate keys by importing the ImmutableMultiDict class from Werkzeug's datastructures module and passing it a list of (key, value) tuples. Then we call its lists() method, which returns an iterator of (key, values) pairs, and convert that iterator into a list with the built-in list() function.

Python3

from werkzeug.datastructures import ImmutableMultiDict

d = ImmutableMultiDict([('Subject', 'Chemistry'),
                        ('Period', '1st'),
                        ('Period', '4th')])

print(list(d.lists()))

Output:

[('Subject', ['Chemistry']), ('Period', ['1st', '4th'])]

Example 2:

Python3

from werkzeug.datastructures import ImmutableMultiDict

d = ImmutableMultiDict([('Gadget', 'DSLR'),
                        ('Accessories', 'Lens 18-105mm'),
                        ('Accessories', 'Lens 70-200mm'),
                        ('Accessories', 'Tripod stand')])

print(list(d.lists()))

Output:

[('Gadget', ['DSLR']), ('Accessories', ['Lens 18-105mm', 'Lens 70-200mm', 'Tripod stand'])]

  • In the first line, we import the ImmutableMultiDict class from werkzeug.datastructures; we can create an ImmutableMultiDict directly from this class. Remember that MultiDict is a subclass of dict that, unlike a standard dictionary, can hold several values for the same key; ImmutableMultiDict is its read-only variant.
  • In the next statement, we create an ImmutableMultiDict and store it in the variable d.
  • In Example 1, we pass two pairs with the same key, 'Period'. With a normal dictionary, the later value would overwrite the earlier one, so only one of them would be stored, not both.
  • We could print the ImmutableMultiDict directly, but our goal is a list, so we call its .lists() method, which groups all the values for each key into a list and yields (key, values) pairs. To inspect those pairs one at a time, we could iterate over them with a for loop.
  • Finally, we pass the iterator to list() to collect the pairs into a list, and print the result.
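Strictly speaking, list(d.lists()) returns a list of (key, values) tuples rather than a list of lists. If literal nested lists are needed, a list comprehension can unpack each pair. This is a small sketch built on the Example 1 data, not a separate Werkzeug API:

```python
from werkzeug.datastructures import ImmutableMultiDict

d = ImmutableMultiDict([('Subject', 'Chemistry'),
                        ('Period', '1st'),
                        ('Period', '4th')])

# lists() yields (key, [values...]) tuples; turn each tuple into a two-item list
list_of_lists = [[key, values] for key, values in d.lists()]
print(list_of_lists)
# [['Subject', ['Chemistry']], ['Period', ['1st', '4th']]]

# Alternative: a plain dict mapping each key to the list of all its values
print(d.to_dict(flat=False))
# {'Subject': ['Chemistry'], 'Period': ['1st', '4th']}
```

The comprehension keeps the original key order; to_dict(flat=False) is handy when lookups by key matter more than a list shape.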

Source: www.geeksforgeeks.org

From Werkzeug docs:

class werkzeug.datastructures.TypeConversionDict

Works like a regular dict but the get() method can perform type conversions. MultiDict and CombinedMultiDict are subclasses of this class and provide the same feature.

get(key, default=None, type=None)

Return the default value if the requested data doesn’t exist. If type is provided and is a callable it should convert the value, return it or raise a ValueError if that is not possible. In this case the function will return the default as if the value was not found:

>>> from werkzeug.datastructures import TypeConversionDict
>>> d = TypeConversionDict(foo='42', bar='blub')
>>> d.get('foo', type=int)
42
>>> d.get('bar', -1, type=int)
-1
Parameters:
  • key – The key to be looked up.

  • default – The default value to be returned if the key can’t be looked up. If not further specified None is returned.

  • type – A callable that is used to cast the value in the MultiDict. If a ValueError is raised by this callable the default value is returned.

class werkzeug.datastructures.ImmutableTypeConversionDict

Works like a TypeConversionDict but does not support modifications.

copy()

Return a shallow mutable copy of this object. Keep in mind that the standard library’s copy() function is a no-op for this class, as for any other immutable Python type (e.g. tuple).

class werkzeug.datastructures.MultiDict(mapping=None)

A MultiDict is a dictionary subclass customized to deal with multiple values for the same key which is for example used by the parsing functions in the wrappers. This is necessary because some HTML form elements pass multiple values for the same key.

MultiDict implements all standard dictionary methods. Internally, it saves all values for a key as a list, but the standard dict access methods will only return the first value for a key. If you want to gain access to the other values, too, you have to use the list methods as explained below.

Basic Usage:

>>> from werkzeug.datastructures import MultiDict
>>> d = MultiDict([('a', 'b'), ('a', 'c')])
>>> d
MultiDict([('a', 'b'), ('a', 'c')])
>>> d['a']
'b'
>>> d.getlist('a')
['b', 'c']
>>> 'a' in d
True

It behaves like a normal dict, so all dict functions return only the first value when a key has multiple values.

From Werkzeug 0.3 onwards, the KeyError raised by this class is also a subclass of the BadRequest HTTP exception and will render a page for a 400 BAD REQUEST if caught in a catch-all for HTTP exceptions.

A MultiDict can be constructed from an iterable of (key, value) tuples, a dict, a MultiDict or from Werkzeug 0.2 onwards some keyword parameters.

Parameters:

mapping – the initial value for the MultiDict. Either a regular dict, an iterable of (key, value) tuples or None.

add(key, value)

Adds a new value for the key.

Parameters:
  • key – the key for the value.

  • value – the value to add.

clear()

Remove all items from D.

copy()

Return a shallow copy of this object.

deepcopy(memo=None)

Return a deep copy of this object.

fromkeys(iterable, value=None, /)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, type=None)

Return the default value if the requested data doesn’t exist. If type is provided and is a callable it should convert the value, return it or raise a ValueError if that is not possible. In this case the function will return the default as if the value was not found:

>>> d = TypeConversionDict(foo='42', bar='blub')
>>> d.get('foo', type=int)
42
>>> d.get('bar', -1, type=int)
-1
Parameters:
  • key – The key to be looked up.

  • default – The default value to be returned if the key can’t be looked up. If not further specified None is returned.

  • type – A callable that is used to cast the value in the MultiDict. If a ValueError is raised by this callable the default value is returned.

getlist(key, type=None)

Return the list of items for a given key. If that key is not in the MultiDict, the return value will be an empty list. Just like get, getlist accepts a type parameter. All items will be converted with the callable defined there.

Parameters:
  • key – The key to be looked up.

  • type – A callable that is used to cast the value in the MultiDict. If a ValueError is raised by this callable the value will be removed from the list.

Returns:

a list of all the values for the key.
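A minimal sketch of the type parameter, using made-up query-string values and assuming a current Werkzeug release. As described above, a value the callable cannot convert (int('x') raises ValueError) is dropped from the result instead of raising:

```python
from werkzeug.datastructures import MultiDict

# Hypothetical query-string data: every value arrives as a string
d = MultiDict([('page', '1'), ('page', 'x'), ('page', '3')])

print(d.getlist('page'))            # ['1', 'x', '3']
print(d.getlist('page', type=int))  # [1, 3]; 'x' could not be converted and is skipped
print(d.getlist('missing'))         # []; unknown keys give an empty list, not KeyError
```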

items(multi=False)

Return an iterator of (key, value) pairs.

Parameters:

multi – If set to True the iterator returned will have a pair for each value of each key. Otherwise it will only contain pairs for the first value of each key.
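To see the difference multi makes, here is a short sketch with illustrative data:

```python
from werkzeug.datastructures import MultiDict

d = MultiDict([('a', 'b'), ('a', 'c'), ('x', 'y')])

# Default: one pair per key, using only the first value
print(list(d.items()))            # [('a', 'b'), ('x', 'y')]
# multi=True: one pair per stored value
print(list(d.items(multi=True)))  # [('a', 'b'), ('a', 'c'), ('x', 'y')]
```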

keys()

Return a set-like object providing a view on D's keys.

lists()

Return an iterator of (key, values) pairs, where values is the list of all values associated with the key.

listvalues()

Return an iterator of all values associated with a key. Zipping keys() and this is the same as calling lists():

>>> d = MultiDict({"foo": [1, 2, 3]})
>>> list(zip(d.keys(), d.listvalues())) == list(d.lists())
True

pop(key, default=no value)

Pop the first item for a list on the dict. Afterwards the key is removed from the dict, so additional values are discarded:

>>> d = MultiDict({"foo": [1, 2, 3]})
>>> d.pop("foo")
1
>>> "foo" in d
False
Parameters:
  • key – the key to pop.

  • default – if provided the value to return if the key was not in the dictionary.

popitem()

Pop an item from the dict.

popitemlist()

Pop a (key, list) tuple from the dict.

poplist(key)

Pop the list for a key from the dict. If the key is not in the dict an empty list is returned.

setdefault(key, default=None)

Returns the value for the key if it is in the dict, otherwise it returns default and sets that value for key.

Parameters:
  • key – The key to be looked up.

  • default – The default value to be returned if the key is not in the dict. If not further specified it’s None.

setlist(key, new_list)

Remove the old values for a key and add new ones. Note that the list you pass the values in will be shallow-copied before it is inserted in the dictionary.

>>> d = MultiDict()
>>> d.setlist('foo', ['1', '2'])
>>> d['foo']
'1'
>>> d.getlist('foo')
['1', '2']
Parameters:
  • key – The key for which the values are set.

  • new_list – An iterable with the new values for the key. Old values are removed first.

setlistdefault(key, default_list=None)

Like setdefault but sets multiple values. The list returned is not a copy, but the list that is actually used internally. This means that you can put new values into the dict by appending items to the list:

>>> d = MultiDict({"foo": 1})
>>> d.setlistdefault("foo").extend([2, 3])
>>> d.getlist("foo")
[1, 2, 3]
Parameters:
  • key – The key to be looked up.

  • default_list – An iterable of default values. It is either copied (in case it was a list) or converted into a list before returned.

Returns:

a list

to_dict(flat=True)

Return the contents as regular dict. If flat is True the returned dict will only have the first item present, if flat is False all values will be returned as lists.

Parameters:

flat – If set to False the dict returned will have lists with all the values in it. Otherwise it will only contain the first value for each key.

Returns:

a dict
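A short illustration of the two modes, with hypothetical form data:

```python
from werkzeug.datastructures import MultiDict

d = MultiDict([('tag', 'python'), ('tag', 'web'), ('page', '2')])

# flat=True (default): only the first value of each key survives
print(d.to_dict())            # {'tag': 'python', 'page': '2'}
# flat=False: every key maps to the list of all its values
print(d.to_dict(flat=False))  # {'tag': ['python', 'web'], 'page': ['2']}
```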

update(mapping)

update() extends rather than replaces existing key lists:

>>> a = MultiDict({'x': 1})
>>> b = MultiDict({'x': 2, 'y': 3})
>>> a.update(b)
>>> a
MultiDict([('x', 1), ('x', 2), ('y', 3)])

If the value list for a key in mapping is empty, no new values will be added to the dict and the key will not be created:

>>> x = {'empty_list': []}
>>> y = MultiDict()
>>> y.update(x)
>>> y
MultiDict([])

values()

Returns an iterator of the first value on every key’s value list.

class werkzeug.datastructures.OrderedMultiDict(mapping=None)

Works like a regular MultiDict but preserves the order of the fields. To convert the ordered multi dict into a list you can use the items() method and pass it multi=True.

In general an OrderedMultiDict is an order of magnitude slower than a MultiDict.

Note

Due to a limitation in Python you cannot convert an ordered multi dict into a regular dict by using dict(multidict). Instead you have to use the to_dict() method, otherwise the internal bucket objects are exposed.

class werkzeug.datastructures.ImmutableMultiDict(mapping=None)

An immutable MultiDict.

copy()

Return a shallow mutable copy of this object. Keep in mind that the standard library’s copy() function is a no-op for this class, as for any other immutable Python type (e.g. tuple).
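The following sketch, with illustrative data, checks the three behaviours described for ImmutableMultiDict: modifications raise TypeError, copy() returns a mutable MultiDict, and copy.copy() hands back the same object. It assumes a current Werkzeug release; the exact error message may vary between versions, so it is printed rather than relied on.

```python
import copy

from werkzeug.datastructures import ImmutableMultiDict, MultiDict

frozen = ImmutableMultiDict([('a', '1'), ('a', '2')])

# Any modifying method raises TypeError on the immutable variant
raised = False
try:
    frozen.add('a', '3')
except TypeError as exc:
    raised = True
    print('TypeError:', exc)

# copy() returns an ordinary, mutable MultiDict that can be changed freely
editable = frozen.copy()
editable.add('a', '3')
print(type(editable).__name__, editable.getlist('a'))  # MultiDict ['1', '2', '3']

# copy.copy() is a no-op for immutable types and returns the same object
print(copy.copy(frozen) is frozen)  # True
```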

class werkzeug.datastructures.ImmutableOrderedMultiDict(mapping=None)

An immutable OrderedMultiDict.

copy()

Return a shallow mutable copy of this object. Keep in mind that the standard library’s copy() function is a no-op for this class, as for any other immutable Python type (e.g. tuple).

class werkzeug.datastructures.CombinedMultiDict(dicts=None)

A read-only MultiDict to which you can pass a sequence of MultiDict instances; it combines the return values of all wrapped dicts:

>>> from werkzeug.datastructures import CombinedMultiDict, MultiDict
>>> post = MultiDict([('foo', 'bar')])
>>> get = MultiDict([('blub', 'blah')])
>>> combined = CombinedMultiDict([get, post])
>>> combined['foo']
'bar'
>>> combined['blub']
'blah'

This works for all read operations; methods that would normally change data raise a TypeError, since changes aren't possible.

From Werkzeug 0.3 onwards, the KeyError raised by this class is also a subclass of the BadRequest HTTP exception and will render a page for a 400 BAD REQUEST if caught in a catch-all for HTTP exceptions.
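A sketch with hypothetical args and form dicts showing how lookups are combined: plain indexing returns the value from the first wrapped dict that has the key, getlist() concatenates the values from all of them in order, and a write attempt raises TypeError.

```python
from werkzeug.datastructures import CombinedMultiDict, MultiDict

# Hypothetical request data: both sources define 'lang'
args = MultiDict([('lang', 'ro')])
form = MultiDict([('lang', 'en'), ('q', 'arbore')])
combined = CombinedMultiDict([args, form])

print(combined['lang'])          # 'ro': the first dict that has the key wins
print(combined.getlist('lang'))  # ['ro', 'en']: values from every wrapped dict
print(combined['q'])             # 'arbore'

# Writes are rejected because the combined view is read-only
read_only = False
try:
    combined.add('lang', 'fr')
except TypeError:
    read_only = True
print('read-only:', read_only)
```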

class werkzeug.datastructures.ImmutableDict

An immutable dict.

copy()

Return a shallow mutable copy of this object. Keep in mind that the standard library’s copy() function is a no-op for this class, as for any other immutable Python type (e.g. tuple).

class werkzeug.datastructures.ImmutableList(iterable=(), /)

An immutable list.


class werkzeug.datastructures.FileMultiDict(mapping=None)

A special MultiDict that has convenience methods to add files to it. This is used for EnvironBuilder and is generally useful for unit testing.

add_file(name, file, filename=None, content_type=None)

Adds a new file to the dict. file can be a file name or a file-like or a FileStorage object.

Parameters:
  • name – the name of the field.

  • file – a filename or file-like object

  • filename – an optional filename

  • content_type – an optional content type
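A minimal sketch of add_file() with an in-memory file, assuming a current Werkzeug release; the field name, filename and content are illustrative. A file-like object passed in is wrapped in a FileStorage:

```python
import io

from werkzeug.datastructures import FileMultiDict, FileStorage

files = FileMultiDict()
# Add an in-memory file under the form field 'upload'
files.add_file('upload', io.BytesIO(b'hello'), filename='hello.txt',
               content_type='text/plain')

stored = files['upload']
print(type(stored).__name__, stored.filename, stored.content_type)
print(stored.read())  # b'hello'
```

This is the kind of object a test can hand to code that expects uploaded files, without touching the file system.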


PFA: CAS and CASS thresholds introduced by OG 16/2022, applicable from 2024 to income earned in 2023

OG 16/2022 made several changes to the Fiscal Code, and PFAs (authorized natural persons, i.e. self-employed individuals) were among those affected. The ceiling for taxation under income norms (norma de venit) was lowered from 100,000 euros to 25,000 euros, and new rules were introduced for paying CAS and CASS. Note that the new CAS and CASS thresholds apply from 2024 to income earned in 2023.

According to the ordinance, the ceiling for income from independent activities earned by PFAs and by individual and family enterprises taxed under income norms fell from 100,000 euros to 25,000 euros.

Starting with the next fiscal year, taxpayers whose net income is determined on the basis of income norms and who recorded, in the previous fiscal year, annual gross income above the lei equivalent of 25,000 euros must determine their annual net income under the real system (based on actual income and expenses).

CAS threshold

Individuals who earn income from independent activities and/or income from intellectual property rights, from one or more sources and/or categories of income, owe the social insurance contribution (CAS) on a new calculation base if they estimate, for the current year, net income whose cumulative value is at least 12 gross national minimum wages, as in force at the deadline for filing the Single Tax Return (Declaratia Unica):

The annual calculation base is the income chosen by the taxpayer, which cannot be lower than:

a) 12 gross national minimum wages, for income between 12 and 24 gross national minimum wages;

Income between 12 and 24 minimum wages: CAS = 25% * 3,000 lei * 12 = 9,000 lei

b) 24 gross national minimum wages, for income above 24 gross national minimum wages.

Income above 24 minimum wages: CAS = 25% * 3,000 lei * 24 = 18,000 lei
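The banding rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the worked examples: it assumes the 25% rate and the 3,000 lei monthly minimum wage they use, and it places an income exactly on a band boundary in the higher band, a point the text does not settle. Income below 12 minimum wages falls outside the rule as described, so the sketch returns 0 for it.

```python
MIN_WAGE = 3_000   # lei per month, the figure used in the worked examples
CAS_RATE_PCT = 25  # CAS rate implied by the examples


def cas_due(annual_net_income: int, min_wage: int = MIN_WAGE) -> int:
    """Annual CAS in lei under the two-band rule described above (a sketch)."""
    # Assumption: a value exactly on a boundary is placed in the higher band
    if annual_net_income >= 24 * min_wage:
        base = 24 * min_wage   # income above 24 minimum wages
    elif annual_net_income >= 12 * min_wage:
        base = 12 * min_wage   # income between 12 and 24 minimum wages
    else:
        return 0               # under 12 minimum wages: not covered by this rule
    return base * CAS_RATE_PCT // 100


print(cas_due(50_000))   # 9000
print(cas_due(100_000))  # 18000
```

Integer arithmetic keeps the results in whole lei and matches the 9,000 and 18,000 lei figures above.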

CASS threshold

Individuals who earn income from independent activities, income from intellectual property rights, or income from association with a legal entity, from one or more sources and/or categories of income, owe the health insurance contribution (CASS) on a new calculation base if they estimate, for the current year, income whose cumulative value is at least 6 gross national minimum wages, as in force at the deadline for filing the Single Tax Return (Declaratia Unica):

The annual calculation base for the health insurance contribution is:

a) 6 gross national minimum wages, for income between 6 and 12 gross national minimum wages;

Income between 6 and 12 minimum wages: CASS = 10% * 3,000 lei * 6 = 1,800 lei

b) 12 gross national minimum wages, for income between 12 and 24 gross national minimum wages;

Income between 12 and 24 minimum wages: CASS = 10% * 3,000 lei * 12 = 3,600 lei

c) 24 gross national minimum wages, for income above 24 gross national minimum wages.

Income above 24 minimum wages: CASS = 10% * 3,000 lei * 24 = 7,200 lei

If income is below 6 minimum wages, PFAs are not required to pay CASS.
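The three CASS bands can be sketched the same way. Again this only reproduces the article's worked arithmetic: it assumes the 10% rate and the 3,000 lei minimum wage from the examples, and it places an income exactly on a boundary (6, 12 or 24 minimum wages) in the higher band, which the text leaves ambiguous.

```python
MIN_WAGE = 3_000    # lei per month, the figure used in the worked examples
CASS_RATE_PCT = 10  # CASS rate implied by the examples


def cass_due(annual_income: int, min_wage: int = MIN_WAGE) -> int:
    """Annual CASS in lei under the three-band rule described above (a sketch)."""
    # Check the highest band first; assumption: boundary values go to the higher band
    for wages in (24, 12, 6):
        if annual_income >= wages * min_wage:
            return wages * min_wage * CASS_RATE_PCT // 100
    return 0  # below 6 minimum wages: no CASS


print(cass_due(10_000))  # 0
print(cass_due(20_000))  # 1800
print(cass_due(50_000))  # 3600
print(cass_due(80_000))  # 7200
```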

Attention! The CAS and CASS thresholds for PFAs apply from 2024 to income earned in 2023.

Source: www.fiscalitatea.ro

 

Saturday, January 21, 2023

Romanian-NLP-tools

 

Stemmer

Use the package manager pip to install nltk.

pip install nltk

Usage

from nltk.stem.snowball import SnowballStemmer
stemmer = SnowballStemmer("romanian")
print(stemmer.stem("alergare"))
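Stemming is usually applied to every token of a text rather than to a single word. A small sketch that splits a sample sentence on whitespace; that is a simplification, and the tokenisers described below handle punctuation properly. No expected output is shown, since the exact stems depend on the NLTK Snowball rules.

```python
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("romanian")
sentence = "Copiii alergau prin parcurile orașului"

# Naive whitespace tokenisation, just for illustration
stems = [stemmer.stem(word) for word in sentence.split()]
print(stems)
```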

Tokeniser, Lemmatiser and POS (Part-Of-Speech)

Use the package manager pip to install spacy and spacy-stanza.

pip install spacy spacy-stanza

Usage

import stanza
import spacy_stanza

# download the Romanian Stanza models (needed once)
stanza.download("ro")
# wrap the Stanza pipeline in a spaCy Language object
nlp = spacy_stanza.load_pipeline("ro")

doc = nlp("Această propoziție este în limba română.")
for token in doc:
    print(token.text, token.lemma_, token.pos_)

For more info visit https://spacy.io/universe/project/spacy-stanza.

spaCy

Create Doc objects and play with their tokens:

from spacy.lang.ro import Romanian
nlp = Romanian()
doc = nlp("Aceasta este propoziția mea: eu am 7 mere, ce să fac cu ele?")
print("Index: ", [token.i for token in doc])
print("Text: ", [token.text for token in doc])
print("is alpha: ", [token.is_alpha for token in doc])
print("is punctuation: ", [token.is_punct for token in doc])
print("is like_num: ", [token.like_num for token in doc])

Output:

Index:  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
Text:  ['Aceasta', 'este', 'propoziția', 'mea', ':', 'eu', 'am', '7', 'mere', ',', 'ce', 'să', 'fac', 'cu', 'ele', '?']
is alpha:  [True, True, True, True, False, True, True, False, True, False, True, True, True, True, True, False]
is punctuation:  [False, False, False, False, True, False, False, False, False, True, False, False, False, False, False, True]
is like_num:  [False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False]

Search for POS and dependencies (install the pre-trained model first with python -m spacy download ro_core_news_sm):

import spacy
# load the pre-trained Romanian model
nlp = spacy.load("ro_core_news_sm")
doc = nlp("Ea a mâncat pizza")
for token in doc:
    print('{:<12}{:<10}{:<10}{:<10}'.format(token.text, token.pos_, token.dep_, token.head.text))

Output:

Ea          PRON      nsubj     mâncat    
a           AUX       aux       mâncat    
mâncat      VERB      ROOT      mâncat    
pizza       ADV       obj       mâncat 

Predict Named Entities:

import spacy

nlp = spacy.load("ro_core_news_sm")

doc = nlp("Iulia Popescu, cea din Constanta, s-a dus la Lidl să cumpere pâine. Pe drum și-a dat seama că are nevoie de 50 de lei așa că a trecut și pe la bancomat înainte.")

for ent in doc.ents:
    print(ent.text, ent.label_)

Output:

Iulia Popescu PERSON
Constanta GPE
Lidl LOC
50 de lei MONEY

Rule-based Matching

Patterns can match on token attributes such as LEMMA, POS, TEXT, IS_DIGIT, IS_PUNCT, LOWER and UPPER, and the OP key controls how many times a token pattern may match.
OP can take the following values:

  • '!' = negation: match 0 times
  • '?' = optional: match 0 or 1 times
  • '+' = match 1 or more times
  • '*' = match 0 or more times

import spacy
from spacy.matcher import Matcher
# load the pre-trained Romanian model
nlp = spacy.load('ro_core_news_sm')
# create the matcher
matcher = Matcher(nlp.vocab)
# create the Doc object
doc = nlp("Caracteristicile aplicației includ un design frumos, căutare inteligentă, etichete automate și răspunsuri vocale opționale.")
# pattern: a noun followed by one or two adjectives
pattern = [{'POS': 'NOUN'}, {'POS': 'ADJ'}, {'POS': 'ADJ', 'OP': '?'}]
# add the pattern; greedy='LONGEST' keeps only the longest of overlapping matches
matcher.add('QUALITIES', [pattern], greedy='LONGEST')
# apply the matcher to the doc
matches = matcher(doc)
# sort by start position, since greedy filtering may change the order
for match_id, start, end in sorted(matches, key=lambda m: m[1]):
    matched_span = doc[start:end]
    print(matched_span.text)

Output:

design frumos
căutare inteligentă
etichete automate
răspunsuri vocale opționale

RoWordnet

Use the package manager pip to install rowordnet.

pip install rowordnet

Usage

import rowordnet

wordnet = rowordnet.RoWordNet()
word = 'arbore'
synset_ids = wordnet.synsets(literal=word)
wordnet.print_synset(synset_ids[0])

For more info visit https://github.com/dumitrescustefan/RoWordNet.

BERT for Romanian

import torch
from transformers import AutoModel, AutoTokenizer
# load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("dumitrescustefan/bert-base-romanian-cased-v1")
model = AutoModel.from_pretrained("dumitrescustefan/bert-base-romanian-cased-v1")
# tokenize a sentence and run it through the model
input_ids = torch.tensor(tokenizer.encode("Acesta este un test.", add_special_tokens=True)).unsqueeze(0)  # batch size 1
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(input_ids)
# get encoding
last_hidden_states = outputs[0]  # the last hidden state is the first element of the output

For more info visit https://huggingface.co/dumitrescustefan/bert-base-romanian-cased-v1.

Word Vectors

fastText

import fasttext.util
# download cc.ro.300.bin into the current directory (skipped if it already exists)
fasttext.util.download_model('ro', if_exists='ignore')
ft = fasttext.load_model('cc.ro.300.bin')

or download from here.
More info on usage here: https://fasttext.cc/docs/en/crawl-vectors.html.

word2vec

Available from https://github.com/senisioi/ro_resources.

Other linguistic resources

  • List of (all, I hope) Romanian words - from here
  • List of prefixes - from here
  • List of suffixes - from here
  • RoSentiwordnet - download from here

RoSentiWordNet is a lexical resource in which each RoWordNet synset is associated with three numerical scores, Obj(s), Pos(s) and Neg(s), describing how objective, positive and negative the terms contained in the synset are. It was created by translating SentiWordNet into Romanian using the googletrans Python library.

Source: https://github.com/Alegzandra/Romanian-NLP-tools