Wednesday, April 30, 2025
Polars Cheat Sheet
Installation
pip install polars
# Install Polars with all optional dependencies:
pip install 'polars[all]'
# You can also install a subset of all optional dependencies:
pip install 'polars[numpy,pandas,pyarrow]'
# We also have a conda package (however pip is the preferred way):
conda install -c conda-forge polars
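A quick sanity check that the install worked (prints the installed version):
python -c "import polars as pl; print(pl.__version__)"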
Usage:
- Importing Polars:
import polars as pl
- Creating DataFrames:
# From dictionary
df = pl.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
# From list of dictionaries
df = pl.DataFrame([{'A': 1, 'B': 'a'}, {'A': 2, 'B': 'b'}])
# From CSV
df = pl.read_csv('file.csv')
# From Pandas DataFrame
import pandas as pd
pandas_df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
df = pl.from_pandas(pandas_df)
- Basic DataFrame Operations:
# Display DataFrame
print(df)
# Get DataFrame info
df.schema
# Select columns
df.select(['A', 'B'])
# Filter rows
df.filter(pl.col('A') > 2)
# Sort DataFrame
df.sort('A', descending=True)
# Add new column
df.with_columns(pl.lit('new_col').alias('C'))
# Rename columns
df.rename({'A': 'X', 'B': 'Y'})
# Drop columns
df.drop(['A', 'B'])
# Group by and aggregate
df.group_by('A').agg(pl.sum('B'))
- Data Manipulation:
# Apply a Python function element-wise to a column
df.with_columns(pl.col('A').map_elements(lambda x: x * 2, return_dtype=pl.Int64).alias('A_doubled'))
# Fill null values
df.fill_null(strategy='forward')
# Replace values
df.with_columns(pl.col('A').replace({1: 10, 2: 20}))
# Unpivot (melt) DataFrame from wide to long
df.unpivot(index=['A'], on=['B', 'C'])
# Pivot DataFrame from long to wide
df.pivot(on='variable', index='A', values='value')
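A minimal worked example of the pivot above, with hypothetical data matching the call:
long = pl.DataFrame({
    'A': [1, 1, 2],
    'variable': ['x', 'y', 'x'],
    'value': [10, 20, 30],
})
# 'x' and 'y' become columns keyed by 'A'; missing cells are null
print(long.pivot(on='variable', index='A', values='value'))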
- String Operations:
# Convert to uppercase
df.with_columns(pl.col('B').str.to_uppercase())
# String contains
df.filter(pl.col('B').str.contains('pattern'))
# String replace
df.with_columns(pl.col('B').str.replace('old', 'new'))
# String length (in characters)
df.with_columns(pl.col('B').str.len_chars().alias('B_length'))
- DateTime Operations:
# Parse strings to datetime
df.with_columns(pl.col('date').str.strptime(pl.Datetime, '%Y-%m-%d'))
# Extract components
df.with_columns(pl.col('date').dt.year().alias('year'))
# Date arithmetic
df.with_columns((pl.col('date') + pl.duration(days=1)).alias('next_day'))
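A short end-to-end sketch of the parsing step, with hypothetical data:
df = pl.DataFrame({'date': ['2021-01-01', '2021-06-15']})
df = df.with_columns(pl.col('date').str.strptime(pl.Date, '%Y-%m-%d'))
# The column is now a date type, so dt accessors and date arithmetic work
print(df.with_columns(pl.col('date').dt.year().alias('year')))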
- Joining DataFrames:
# Inner join
df1.join(df2, on='key', how='inner')
# Left join
df1.join(df2, on='key', how='left')
# Full (outer) join
df1.join(df2, on='key', how='full')
- Window Functions:
# Cumulative sum
df.with_columns(pl.col('A').cum_sum().over('B'))
# Rolling average
df.with_columns(pl.col('A').rolling_mean(window_size=3).over('B'))
# Rank
df.with_columns(pl.col('A').rank().over('B'))
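over() broadcasts each group's aggregate back onto the group's rows; a tiny example with made-up data shows the semantics:
df = pl.DataFrame({'B': ['x', 'x', 'y'], 'A': [1, 2, 5]})
# Rows in group 'x' both receive 3; the row in group 'y' receives 5
print(df.with_columns(pl.col('A').sum().over('B').alias('A_sum_per_B')))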
- IO Operations:
# Write to CSV
df.write_csv('output.csv')
# Write to Parquet
df.write_parquet('output.parquet')
# Read Parquet
pl.read_parquet('file.parquet')
- Lazy Execution:
# Create lazy DataFrame
lazy_df = pl.scan_csv('large_file.csv')
# Define operations
result = lazy_df.filter(pl.col('A') > 0).group_by('B').agg(pl.sum('C'))
# Execute lazy computation
result.collect()
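Before collecting, the optimized query plan can be inspected to confirm that the filter is pushed into the CSV scan (output format varies by version):
print(result.explain())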
- Advanced Filtering:
# Multiple conditions
df.filter((pl.col('A') > 5) & (pl.col('B') < 10))
# Is in list
df.filter(pl.col('A').is_in([1, 3, 5]))
# Is null
df.filter(pl.col('A').is_null())
# Is between
df.filter(pl.col('A').is_between(5, 10))
- Sampling:
# Random sample
df.sample(n=10)
# Stratified sample (5 rows per group)
df.group_by('category').map_groups(lambda g: g.sample(n=5))
- Set Operations:
# Union
df1.vstack(df2)
# Intersection
df1.join(df2, on='key', how='inner')
# Difference
df1.join(df2, on='key', how='anti')
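Note that vstack requires matching schemas and keeps duplicates; a set-style union that deduplicates can be sketched with concat plus unique:
pl.concat([df1, df2]).unique()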
- Advanced Aggregations:
# Multiple aggregations
df.group_by('A').agg([
pl.sum('B').alias('B_sum'),
pl.mean('C').alias('C_mean'),
pl.n_unique('D').alias('D_unique_count')
])
# Custom aggregation built from expressions
df.group_by('A').agg((pl.col('B').sum() / pl.col('B').count()).alias('B_mean'))
- Reshaping Data:
# Explode a list column into one row per element
df.explode('list_col')
# Concatenate string columns
df.with_columns(pl.concat_str(['A', 'B'], separator='-').alias('A_B'))
- Time Series Operations:
# Resample time series
df.group_by_dynamic('timestamp', every='1h').agg(pl.sum('value'))
# Shift values
df.with_columns(pl.col('A').shift(1).alias('A_lagged'))
# Difference between consecutive rows
df.with_columns((pl.col('A') - pl.col('A').shift(1)).alias('A_diff'))
- Missing Data Handling:
# Drop rows with any null values
df.drop_nulls()
# Drop rows where specific columns have null values
df.drop_nulls(subset=['A', 'B'])
# Interpolate missing values
df.with_columns(pl.col('A').interpolate())
- Data Type Operations:
# Cast column to different type
df.with_columns(pl.col('A').cast(pl.Float64))
# Get unique values
df.select(pl.col('A').unique())
# Count unique values
df.select(pl.col('A').n_unique())
- Advanced String Operations:
# Extract using regex
df.with_columns(pl.col('text').str.extract(r'(\d+)', group_index=1))
# Split string into multiple columns
df.with_columns([
pl.col('full_name').str.split(' ').list.get(0).alias('first_name'),
pl.col('full_name').str.split(' ').list.get(1).alias('last_name')
])
- Window Functions with Custom Sorting:
# Cumulative sum computed in date order within each group
df.sort('date').with_columns(
    pl.col('value')
    .cum_sum()
    .over(['category', 'subcategory'])
)
- Conditional Expressions:
# When-Then-Otherwise
df.with_columns(
pl.when(pl.col('A') > 5)
.then(pl.lit('High'))
.when(pl.col('A') < 2)
.then(pl.lit('Low'))
.otherwise(pl.lit('Medium'))
.alias('A_category')
)
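A small worked example of the chained when/then above, with hypothetical values:
df = pl.DataFrame({'A': [1, 4, 7]})
# Yields 'Low' for 1, 'Medium' for 4, 'High' for 7
print(df.with_columns(
    pl.when(pl.col('A') > 5).then(pl.lit('High'))
    .when(pl.col('A') < 2).then(pl.lit('Low'))
    .otherwise(pl.lit('Medium'))
    .alias('A_category')
))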
- Advanced IO Operations:
# Read JSON
pl.read_json('file.json')
# Read from database via a connection URI
pl.read_database_uri(query='SELECT * FROM table', uri='postgresql://user:pass@host/db')
# Write to database
df.write_database(table_name='my_table', connection='postgresql://user:pass@host/db')
- Performance Optimization:
# Thread pool size is controlled by the POLARS_MAX_THREADS
# environment variable, set before polars is imported:
import os
os.environ['POLARS_MAX_THREADS'] = '4'
# Batched reading of large CSV files
reader = pl.read_csv_batched('large_file.csv', batch_size=10000)
while (batches := reader.next_batches(10)) is not None:
    for batch in batches:
        process_batch(batch)
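Alternatively, assuming a reasonably recent Polars, the lazy streaming engine handles larger-than-memory files without manual batching:
pl.scan_csv('large_file.csv').filter(pl.col('A') > 0).collect(streaming=True)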
- Expressions and Custom Functions:
# Custom function
def custom_func(x):
return x * 2 + 1
# Apply custom function to a whole Series at once
df.with_columns(pl.col('A').map_batches(custom_func).alias('A_custom'))
# Complex expressions
df.with_columns(
((pl.col('A') * 2 + pl.col('B')) / pl.col('C')).alias('complex_calc')
)
- List Operations:
# Get list length
df.with_columns(pl.col('list_col').list.len().alias('list_length'))
# Get nth element from list
df.with_columns(pl.col('list_col').list.get(1).alias('second_element'))
# Join list elements
df.with_columns(pl.col('list_col').list.join(',').alias('joined_list'))
# Slice list
df.with_columns(pl.col('list_col').list.slice(0, 3).alias('first_three'))
- Struct Operations:
# Create struct column
df.with_columns(pl.struct(['A', 'B']).alias('AB_struct'))
# Access struct field
df.with_columns(pl.col('AB_struct').struct.field('A').alias('A_from_struct'))
# Unnest struct
df.unnest('AB_struct')
- Advanced Groupby Operations:
# Rolling window grouping over a date column (column must be sorted)
df.rolling(index_column='date', period='7d').agg(pl.sum('value'))
# Dynamic groupby
df.group_by_dynamic('timestamp', every='1h', offset='30m').agg(pl.mean('value'))
# Groupby with exclusions
df.group_by('category', maintain_order=True).agg(
pl.all().exclude(['category', 'id'])
)
- Vectorized User-Defined Functions (UDFs):
import numpy as np
# NumPy ufuncs can be applied directly to Polars expressions
df.with_columns(np.log(pl.col('A')).alias('A_log'))
# The same via an explicit Series-level UDF
df.with_columns(pl.col('A').map_batches(lambda s: np.log(s)).alias('A_log'))
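As a rule of thumb, native expressions and NumPy ufuncs stay in vectorized code, while map_elements calls back into Python per value; prefer an expression when one exists:
# Expression equivalent of the earlier custom_func (x * 2 + 1), no Python callback
df.with_columns((pl.col('A') * 2 + 1).alias('A_custom'))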
- Meta Operations:
# Get column names
df.columns
# Get dtypes
df.dtypes
# Get shape
df.shape
# Memory usage
df.estimated_size()
- Advanced Joining:
# Asof join
df1.join_asof(df2, left_on='date', right_on='date', by='id')
# Cross join
df1.join(df2, how='cross')
# Fuzzy joins are not built into Polars; a common workaround is a cross
# join followed by a string-similarity filter computed with an external library
- Polars-specific Optimizations:
# Predicate pushdown
(df.lazy()
.filter(pl.col('A') > 0)
.group_by('B')
.agg(pl.sum('C'))
.collect())
# Projection pushdown
(df.lazy()
.select(['A', 'B'])
.filter(pl.col('A') > 0)
.collect())
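To see what the optimizer actually did, a query can also be profiled; profile() executes it and returns the result together with per-node timings (a quick sketch):
result_df, timings = df.lazy().filter(pl.col('A') > 0).select(['A', 'B']).profile()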
- Working with Missing Data:
# Fill null with different values based on condition
df.with_columns(
pl.when(pl.col('A').is_null())
.then(pl.col('B'))
.otherwise(pl.col('A'))
.alias('A_filled')
)
# Fill null with forward fill and a limit
df.with_columns(pl.col('A').fill_null(strategy='forward', limit=2))
- Advanced DateTime Operations:
# Truncate to specific time unit
df.with_columns(pl.col('datetime').dt.truncate('1d').alias('day_start'))
# Get day of week
df.with_columns(pl.col('date').dt.weekday().alias('weekday'))
# Date range (expects date objects; eager=True returns a Series)
from datetime import date
pl.date_range(date(2021, 1, 1), date(2021, 12, 31), interval='1d', eager=True)
- Statistical Functions:
# Covariance
df.select(pl.cov('A', 'B'))
# Correlation
df.select(pl.corr('A', 'B'))
# Quantile
df.select(pl.col('A').quantile(0.75))
- Advanced String Matching:
# Polars has no built-in fuzzy matcher; regex matching via str.contains
# is the closest built-in
df.with_columns(pl.col('text').str.contains(r'(?i)pattern').alias('pattern_match'))
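For genuine fuzzy scoring, one pattern is an element-wise UDF backed by an external scorer; a sketch assuming the third-party rapidfuzz package:
from rapidfuzz import fuzz
df.with_columns(
    pl.col('text').map_elements(
        lambda s: fuzz.partial_ratio(s, 'pattern') >= 80,
        return_dtype=pl.Boolean,
    ).alias('fuzzy_match')
)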
This cheat sheet covers many common Polars functions and operations. Polars offers many more features and capabilities, including advanced filtering, reshaping, time series operations, struct operations, vectorized UDFs, meta operations, and performance and Polars-specific optimizations. Please refer to the official documentation for the most up-to-date and comprehensive information on available functions and best practices.
Source: https://gist.github.com
Monday, March 24, 2025
Clear formatting from selected text using keyboard shortcuts in Word
To clear formatting from selected text using keyboard shortcuts:
- Press Ctrl + Spacebar to clear character formatting only (such as bold, font and font size) from selected text.
- Press Ctrl + Q to clear paragraph formatting only (such as indents and line spacing) from selected text.
- Press Ctrl + Shift + N to reapply the Normal style to selected text.
Sub ResetParagraphFormat()
'
' Reset Selection Paragraph Formatting
'
'
With Selection.ParagraphFormat
    .Reset
    .LeftIndent = CentimetersToPoints(0)
    .RightIndent = CentimetersToPoints(0)
    .SpaceBefore = 0
    .SpaceBeforeAuto = False
    .SpaceAfter = 0
    .SpaceAfterAuto = False
    .LineSpacingRule = wdLineSpaceSingle
    .Alignment = wdAlignParagraphJustify
    .WidowControl = False
    .KeepWithNext = False
    .KeepTogether = False
    .PageBreakBefore = False
    .NoLineNumber = False
    .Hyphenation = True
    .FirstLineIndent = CentimetersToPoints(0)
    .OutlineLevel = wdOutlineLevelBodyText
    .CharacterUnitLeftIndent = 0
    .CharacterUnitRightIndent = 0
    .CharacterUnitFirstLineIndent = 0
    .LineUnitBefore = 0
    .LineUnitAfter = 0
    .MirrorIndents = False
    .TextboxTightWrap = wdTightNone
    .CollapsedByDefault = False
End With
End Sub
Sub SelectionClearFormatting()
'
' Clear All Formatting
'
With Selection
    .ClearFormatting
End With
End Sub
Sub ResetParagraph()
' Removes manual paragraph formatting (formatting not applied using a style).
' If you manually right align a paragraph and the underlying style has a different alignment,
' the Reset method changes the alignment to match the formatting of the underlying style.
Selection.Paragraphs.Reset
End Sub
Saturday, March 22, 2025
Economic Terminology German - English - French - ihk.de
| German | American English | British English | French |
|---|---|---|---|
| Aktiengesellschaft (AG) | Stock Corporation (Corp. or Inc.) | Public Limited Company (Plc) | Société Anonyme (S.A.) |
| Mitglied des Vorstandes | Member of the Executive Board, Member of the Board of Management | Member of the Board of Management | Membre du Directoire |
| Stv. Mitglied des Vorstandes | Deputy Member of the Executive Board, Deputy Member of the Board of Management | Deputy Member of the Board of Management | Membre Suppléant du Directoire |
| Vorsitzender des Vorstandes | President and Chief Executive Officer, Chief Executive Officer, President, Chairman of the Executive Board, Chairman of the Board of Management | Managing Director, Chief Executive Officer, Chairman of the Board of Management | Président du Directoire |
| Stv. Vorsitzender des Vorstandes | Deputy Chairman of the Executive Board, Deputy Chairman of the Board of Management | Vice Chairman of the Board of Management | Vice-Président du Directoire |
| Generalbevollmächtigter | General Manager | General Manager | Directeur Général |
| Arbeitsdirektor | Executive for Labor Relations | Director of Labour Relations | Directeur des Affaires Sociales |
| Prokurist | Authorized Officer | Authorised Officer | Fondé de Pouvoir |
| Handlungsbevollmächtigter | Assistant Manager | Assistant Manager | Mandataire, Fondé de Pouvoir |
| Aufsichtsrat | Supervisory Board | Supervisory Board | Conseil de Surveillance |
| Mitglied des Aufsichtsrates | Member of the Supervisory Board | Member of the Supervisory Board | Membre du Conseil de Surveillance |
| Vorsitzender des Aufsichtsrates | Chairman of the Supervisory Board | Chairman of the Supervisory Board | Président du Conseil de Surveillance |
| Stv. Vorsitzender des Aufsichtsrates | Deputy Chairman of the Supervisory Board | Vice Chairman of the Supervisory Board | Vice-Président du Conseil de Surveillance |
| Verwaltungsrat | Administrative Board | Administrative Board | Conseil d'Administration |
| Vorsitzender des Verwaltungsrates | Chairman of the Administrative Board | Chairman of the Administrative Board | Président Directeur Général |
| Beirat | Advisory Board | Advisory Board | Comité Consultatif |
| Aktionär | Stockholder, Shareholder | Shareholder | Actionnaire |
| Hauptversammlung | Stockholders Meeting, Shareholders Meeting | Shareholders' Meeting, General Meeting | Assemblée Générale des Actionnaires |
| GmbH | Closed Corporation, Privately-Held Corporation | Private Limited Company (Ltd.) | Société à responsabilité limitée (S.A.R.L.) |
| Geschäftsführer | General Manager, Managing Director | Director | Gérant |
| Vorsitzender der Geschäftsführung | Chief Executive Officer, President, Chairman of the Board of Management | Chairman of the Board of Directors | Président Directeur Général |
| Prokurist | Authorized Officer | Authorised Officer | Fondé de Pouvoir Supérieur |
| Handlungsbevollmächtigter | Assistant Manager | Assistant Manager | Fondé de Pouvoir, Mandataire |

| German | American English | British English | French |
|---|---|---|---|
| Aufsichtsrat | Supervisory Board | Supervisory Board | Conseil de Surveillance |
| Beirat | Advisory Board | Advisory Board | Comité Consultatif |
| Gesellschafter | Stockholder, Shareholder | Shareholder, Member | Associé |
| Gesellschafterversammlung | Stockholders Meeting, Shareholders Meeting | Shareholders' Meeting, General Meeting | Assemblée Générale des Associés |
| OHG | Partnership | Partnership | Société en nom collectif |
| Gesellschafter | Partner | Partner | Associé |
| Geschäftsführender Gesellschafter | Managing Partner | Managing Partner | Associé Gérant |
| KG | Limited Partnership | Limited Partnership | Société en commandite simple |
| Komplementär | General Partner | General Partner | Commandité |
| Persönlich haftender Gesellschafter | General Partner | General Partner | Commandité |
| Kommanditist | Limited Partner | Limited Partner | Commanditaire |
| Geschäftsführender Gesellschafter | Managing Partner | Managing Partner | Associé Gérant |
| GmbH & Co. KG | Limited Partnership with Limited Company as General Partner | Limited Partnership with Limited Company as General Partner | Société à responsabilité limitée et Co., Société en commandite |
| Einzelunternehmen / Einzelkaufmann | Sole Proprietorship | Sole Proprietorship, Sole Trader | Entreprise individuelle / Etablissement |
| Geschäftsinhaber | Proprietor | Proprietor | Propriétaire exploitant |
| Geschäftsteilhaber | Co-owner | Co-owner, Co-Proprietor | Co-Propriétaire |
| Alleininhaber | Sole Proprietor | Sole Proprietor | Propriétaire |
| Prokurist | Authorized Officer | Authorised Officer | Fondé de Pouvoir |
| Verband | Association | Association | Association |
| Geschäftsführer | Managing Director | Director | Directeur |
| Hauptgeschäftsführer | General Executive Manager | Managing Director | Secrétaire Général |
| Präsident | President | President | Président |
| Vorstand / Präsidium | Board of Directors, Executive Board | Board of Directors, Executive Board | Conseil d'Administration |
| Ehrenvorsitzender | Honorary Chairman of the Board of Directors | Honorary Chairman of the Board of Directors | Président d'Honneur |
| Vorsitzender | Chairman of the Board of Directors | Chairman of the Executive Board, Chairman of the Board of Directors | Président du Conseil d'Administration |
| Hauptausschuss | Executive Committee | Executive Committee | Comité Exécutif |
| Sonstige Titel (other titles) | | | |
| Abteilungsdirektor | Division Manager | Division Manager | Chef de Division/Département |
| Handlungsbevollmächtigter | Assistant Manager | Assistant Manager | Fondé de Pouvoir |
| Bevollmächtigter | Authorized Representative | Authorised Representative | Mandataire |

| German | American English | British English | French |
|---|---|---|---|
| Leiter der Rechtsabteilung | Head of Legal Department, General Counsel | Head of the Legal Department, General Counsel | Chef du Département juridique |
| Leiter der Personalabteilung | Head of Personnel Department, Head of Human Resources Department | Head of the Personnel Department, Director of Personnel | Chef du Personnel, Directeur du Personnel |
| Betriebsdirektor | Production Manager | Production Manager | Directeur Technique |
| Werksleiter | Plant Manager | Works Manager | Directeur d'Usine |
| Hauptabteilungsleiter | Head of Division | Head of Division | Directeur de Division |
| Bereichsleiter | Head of Department | Head of Department | Directeur de Département |
| Betriebsleiter | Production Manager | Production Manager | Chef de Production |
Tuesday, March 11, 2025
Romanian NLP
Table of contents
- Unlabeled text Corpora
- Semantic Textual Similarity / Paraphrasing
- Natural Language Inference
- Summarization
- Dialect and regional speech identification
- Named Entity Recognition (NER)
- Authorship Attribution
- Sentiment Analysis
- Dependency Parsing
- Diacritics Restoration / Grammar Correction
- Fake News / Clickbait / Satirical News
- Offensive Language
- Question Answering
- Spelling, Dictionaries and Grammatical Errors
Unlabeled text Corpora
The FuLG dataset is a comprehensive Romanian language corpus comprising 150 billion tokens, carefully extracted from Common Crawl.
Part of a large multilanguage corpus derived from Common Crawl. It is a raw, unannotated corpus with roughly 50 GB of Romanian text in 4.5 million documents. For details check its homepage and the paper.
Similar to OSCAR, part of a multilanguage corpus also based on Common Crawl from 2018. The Romanian portion is about 16 GB.
Romanian language wikipedia dump.
A collection of various unannotated corpora collected around 2018-2019. Includes books, scraped newspapers and juridical documents.
A collection of written and spoken text from various sources: Articles, Fairy tales, Fiction, History, Theatre, News
Romanian national legislation from 1881 to 2021. The corpus mainly includes governmental decisions, ministerial orders, decisions, decrees and laws. Automatically annotated for named entities.
Mega-COV is a billion-scale dataset from Twitter for studying COVID-19. It is available in over 100 languages, Romanian being one of them. Tweets need to be rehydrated.
A corpus of Romanian tweets related to COVID and vaccination against COVID, created and collected between January 2021 and February 2022. It contains 19,319 tweets.
Minutes of the Sittings of the Chamber of Deputies of Romania (2016-2018). Unannotated corpus.
Contains 500k+ instances of speech from the parliament podium from 1996 to 2018. Sentence splitting and sentence-level deduplication were applied as processing steps. Unannotated corpus.
Romanian presidential discourses (1990-2020), split into 4 files, one for each president. Unannotated corpus.
Monolingual Romanian corpus, including content from public websites related to culture
Monolingual Romanian (ron) corpus, containing 38,063,991 tokens and 854,096 lexical types in the law domain.
Monolingual Romanian corpus, containing 360,833 sentences (9,064,764 words) in the public administration domain.
The New Civil Procedure Code in Romanian (monolingual), comprising 297,888 words.
The updated Romanian criminal code: text with law content.
News articles dataset from Romanian news sites, with title, summary and article text.
A multi-language corpus from publicly available online news sources. It also contains 43 million words of Romanian from Twitter, blogs and newspapers.
The Romanian novel collection for ELTeC, the European Literary Text Collection. Sources: Biblioteca Metropolitana din Bucuresti, Biblioteca Universitara "Mihai Eminescu" din Iasi, Biblioteca Judeteana din Botosani, and personal micro-collections uploaded on Zenodo under the labels "Hajduks Library", "RomanianNovel Library", "CityMysteries Library" and "BibliotecaDHL_Iasi".
Public dataset of 1,447 manually annotated Romanian business-oriented emails. The corpus is annotated with 5 token-related labels, as well as 5 sequence-related classes.
The corpus consists of texts written by Romanian authors between the 19th century and the present, representing stories, short stories, fairy tales and sketches. The current version contains 19 authors, 1,263 full texts and 12,516 paragraphs of around 200 words each, preserving paragraph integrity.
A dataset containing 400 Romanian texts written by 10 authors. It contains stories, short stories, fairy tales, novels, articles, and sketches written by Ion Creangă, Barbu Ştefănescu Delavrancea, Mihai Eminescu, Nicolae Filimon, Emil Gârleanu, Petre Ispirescu, Mihai Oltean, Emilia Plugaru, Liviu Rebreanu and Ioan Slavici.
891 Cooking Recipes in Romanian Language
Semantic Textual Similarity / Paraphrasing
A Semantic Textual Similarity dataset for the Romanian language. RO-STS contains 8,628 sentence pairs with their similarity scores.
A paraphrase corpus created from 10 different Romanian-language Bible versions. The final dataset contains 904,815 similar records and 218,977 non-matching records, totaling 1,123,927.
Around 100k examples of paraphrases. There is no clear explanation of how the dataset was built.
A multi-language paraphrase corpus for 73 languages extracted from the Tatoeba database. It has ~2,000 Romanian phrases totaling 941 paraphrase groups.
Natural Language Inference
RoNLI, the first Romanian NLI corpus, comprises 58K training sentence pairs obtained via distant supervision, plus 6K validation and test sentence pairs manually annotated with the correct labels.
The repository appears to be only an initial attempt at building a dataset.
Summarization
~72k full texts and their summaries. The source appears to be news websites; no further description or explanation is available.
Dialect and regional speech identification
A varied compilation of speech samples from five distinct regions of Romania, covering both urban and rural environments. Around 2,800 records labeled with age, gender and dialect type.
MOROCO: The Moldavian and Romanian Dialectal Corpus. It contains Moldavian and Romanian text samples collected from the news domain. The samples belong to one of six topics: culture, finance, politics, science, sports, tech, totaling over 32,000 labeled records.
Named Entity Recognition (NER)
Authorship Attribution
Sentiment Analysis
Dependency Parsing
Diacritics Restoration / Grammar Correction
Fake News / Clickbait / Satirical News
Offensive Language
Manually annotated 4,052 comments on a Romanian local news website into one of the following classes: non-offensive, targeted insults, racist, homophobic, and sexist.
4,455 organic comments from Facebook live broadcasts, annotated both for binary offensive language detection and for fine-grained offensive language detection.
4,800 Romanian comments annotated with offensive text spans (offensive span detection).
3,860 labeled hate speech records.
A dataset of 5,000 tweets, of which 924 were labeled as offensive (18.48%) and 4,076 as non-offensive.
A corpus of 39,245 tweets, annotated by multiple annotators, following the sexist label set of a recent study.
Question Answering
This dataset is a translation of the GSM8K dataset. GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. There is no information on the quality of the translation.
RoCode, a competitive programming dataset, consisting of 2,642 problems written in Romanian, 11k solutions in C, C++ and Python and comprehensive testing suites for each problem. The purpose of RoCode is to provide a benchmark for evaluating the code intelligence of language models trained on Romanian / multilingual text as well as a fine-tuning set for pretrained Romanian models.
Spelling, Dictionaries and Grammatical Errors
Synthetic dataset with ~1.9M records, with altered and correct statements as columns.
Romanian Archaisms Regionalisms Lexicon containing ~1,940 word definitions.
Romanian Rules for Dialects - 1,940 regionalisms, their meanings and the region of provenance.