---
title: "Uber Mock Interview Retrospective"
date: 2022-07-01T12:55:00+03:00
slug: uber-mock-interview-retrospective
---

As mentioned in [the previous post]({{< ref "log/2022/big-tech-hiring" >}}),
I did a public mock coding interview. A reminder of what that was:

- The goal was to explain how some bits of Uber's tech recruiting work.
- The [meetup page][meetup-page] had 602 attendees as of writing. We expected
  quite a few participants in the event.

The mock interview consisted of:

- Introduction by Uber's EMEA recruiter Courtney Cox.
- Myself doing a coding challenge with a 50 minute cap:
  - I did not know the exercise upfront.
  - Although my job did not depend on it, the ticking timer and the ~260 people
    watching me work made it quite stressful.
  - I did not complete the exercise. According to my interviewer, I failed the
    "phone screen". The good part is that I still get to keep my job. :)
- Half-hour Q&A session.

## TLDR: Highlights

- Lots of fun for everyone: myself, the interviewer and the spectators.
- Folks seemed to be engaged: the chat room was active throughout, and we had
  more questions than time to answer them.
- Even though I have been coding Zig for only the last few months, I felt like
  I had a strong enough grip on it; it was the algorithm that tripped me up.

## TLDR: Lowlights

Most importantly, I did not complete the exercise. Worse, I did not even come
up with the correct algorithm, so the interview was an obvious failure.
I can re-apply in 6 months though!

Biggest mistake? **I made the same mistake that interviewees make all the time:
starting to code without knowing the full algorithm.** This is a recipe for
failure. It is always, always better to spend 10-15 minutes with hands off the
keyboard and come up with the solution, and only then start coding.

I figured out the solution later the same day and implemented it the next
morning. You can find it below. If you want to show off your own version in
your favorite programming language, see the
[challenge](#optional-challenge-for-you) section below.

## The Exercise and Solution

I have been coding in Zig since last February (so ~5 months now). [Loris
Cro][loris] keeps telling me and The Internet that Zig is not suitable for
coding challenges. Well, after a couple of months of working with him, I can
finally say he is wrong! Even though my colleagues tell me Zig was tripping me
up (e.g. memory leaks in the unit tests, for which I had to add
`defer hash_map.deinit()`), I think this was due to my lack of experience in
this particular "coding challenge" setting. Next time I will construct an
[arena][arena] and be done with memory management.
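
The leak mentioned above is the kind that `std.testing.allocator` reports when a test ends with memory still allocated. Here is a minimal sketch of both fixes, assuming a Zig version where `ArenaAllocator.allocator()` exists (0.9 or later). The map contents and test names are made up for illustration and are not the code from the interview:

```zig
const std = @import("std");
const testing = std.testing;

test "fix 1: free the map explicitly" {
    var counts = std.StringHashMap(u32).init(testing.allocator);
    // Without this defer, testing.allocator fails the test with a leak report.
    defer counts.deinit();

    try counts.put("John", 15);
    try testing.expectEqual(@as(u32, 15), counts.get("John").?);
}

test "fix 2: allocate from an arena and free everything at once" {
    var arena = std.heap.ArenaAllocator.init(testing.allocator);
    // A single deinit releases every allocation made through the arena.
    defer arena.deinit();

    var counts = std.StringHashMap(u32).init(arena.allocator());
    try counts.put("Jon", 12);
    try testing.expectEqual(@as(u32, 12), counts.get("Jon").?);
}
```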

The exercise was taken from [Cracking the Coding Interview][cracking]:

```
// Each year, the government releases a list of the 10000 most common baby
// names and their frequencies (the number of babies with that name). The only
// problem with this is that some names have multiple spellings. For example,
// "John" and "Jon" are essentially the same name but would be listed
// separately in the list. Given two lists, one of names/frequencies and the
// other of pairs of equivalent names, write an algorithm to print a new list
// of the true frequency of each name. Note that if John and Jon are synonyms,
// and Jon and Johnny are synonyms, then John and Johnny are synonyms. (It is
// both transitive and symmetric.) In the final list, any name can be used as
// the "real" name.
//
// Example:
// Names: John (15), Jon (12), Chris (13), Kris (4), Christopher (19)
// Synonyms: (Jon, John), (John, Johnny), (Chris, Kris), (Chris, Christopher)
// Output: John (27), Kris (36)
```
### My solution

Timeline:

- 50 minutes during the interview. I ended up using almost none of that code,
  except for small parsing bits and the unit test.
- 30 minutes after cycling home right after the meetup: thinking about the
  problem. At this point I realized this problem reduces to finding the
  connected components of a graph (the disconnected subgraphs).
- 2 hours 15 minutes: coding. The result of that is below.

{{< highlight zig "linenos=table" >}}
const std = @import("std");
const mem = std.mem;
const fmt = std.fmt;
const Order = std.math.Order;
const Allocator = std.mem.Allocator;
const ArrayList = std.ArrayList;
const ArrayListUnmanaged = std.ArrayListUnmanaged;
const StringHashMap = std.StringHashMap;
const PriorityQueue = std.PriorityQueue;

// for priority queue
fn lessThan(_: void, a: u32, b: u32) Order {
    return std.math.order(a, b);
}

pub fn solution(
    allocator: Allocator,
    names: []const u8,
    synonyms: []const u8,
) error{OutOfMemory}![]const u8 {
    var arena1 = std.heap.ArenaAllocator.init(allocator);
    defer arena1.deinit();
    var arena = arena1.allocator();

    var name2id = StringHashMap(u32).init(arena);
    var pairs = ArrayList([2]u32).init(arena);

    // populate name2id and pairs
    const total_members = blk: {
        var it = mem.tokenize(u8, synonyms, ", ()");
        var idx: u32 = 0;
        while (true) {
            const left = it.next() orelse break;
            const right = it.next().?;
            var pair: [2]u32 = undefined;
            var i: u2 = 0;
            for (&[_][]const u8{ left, right }) |val| {
                const result = try name2id.getOrPut(val);
                if (!result.found_existing) {
                    result.value_ptr.* = idx;
                    pair[i] = idx;
                    idx += 1;
                } else pair[i] = result.value_ptr.*;
                i += 1;
            }
            try pairs.append(pair);
        }

        // now add all "lone" names that do not have aliases
        var it2 = mem.tokenize(u8, names, "(), 0123456789");
        while (it2.next()) |name| {
            const result = try name2id.getOrPut(name);
            if (!result.found_existing) {
                result.value_ptr.* = idx;
                idx += 1;
            }
        }
        break :blk idx;
    };

    // create id2name for printing the results
    var id2name = try arena.alloc([]const u8, total_members);
    {
        var it = name2id.iterator();
        while (it.next()) |val|
            id2name[val.value_ptr.*] = val.key_ptr.*;
    }

    var graph = try arena.alloc(ArrayListUnmanaged(u32), total_members);
    mem.set(ArrayListUnmanaged(u32), graph, ArrayListUnmanaged(u32){});

    // populate graph
    for (pairs.items) |pair| {
        try graph[pair[0]].append(arena, pair[1]);
        try graph[pair[1]].append(arena, pair[0]);
    }

    // navigate through the graph to find its connected components. This is a
    // DFS-like traversal, except that the priority queues pop the smallest id
    // first:
    // - "visited" is a list of user ids that we should not go into.
    // - "unvisited" is a queue of user ids that we need to visit. This is
    //   the driver of the loop: work while this is non-empty.
    var visited = try arena.alloc(bool, total_members);
    mem.set(bool, visited, false);

    // everyone is unvisited now
    var unvisited = PriorityQueue(u32, void, lessThan).init(arena, {});
    try unvisited.ensureTotalCapacity(total_members);
    for (id2name) |_, i|
        try unvisited.add(@intCast(u32, i));

    // id2synonym is mapping from userid to synonym_id. It just so conveniently
    // happens that the synonym_id points to a user id: "unvisited" pops ids in
    // ascending order and synonym_id is incremented on every pop, so a group's
    // synonym_id equals the smallest user id in that group.
    var id2synonym = try arena.alloc(u32, total_members);

    // traverse the graph and populate id2synonym
    {
        var synonym_id: u32 = 0;
        // scratch is our temporary traversal storage: while traversing the
        // member list, which ones to go to when we're done with the current
        // one?
        var scratch = PriorityQueue(u32, void, lessThan).init(arena, {});
        while (unvisited.removeOrNull()) |i| : (synonym_id += 1) {
            if (visited[i]) continue;
            try scratch.add(i);
            while (scratch.removeOrNull()) |j| {
                visited[j] = true;
                id2synonym[j] = synonym_id;
                for (graph[j].items) |k|
                    if (!visited[k])
                        try scratch.add(k);
            }
        }
    }

    var id2count = try arena.alloc(u32, total_members);
    mem.set(u32, id2count, 0);

    // calculate id2count from names and id2synonym
    {
        var it = mem.tokenize(u8, names, ", ()");
        while (true) {
            const name = it.next() orelse break;
            const id = name2id.get(name).?;
            const count = fmt.parseInt(u32, it.next().?, 10) catch unreachable;
            id2count[id2synonym[id]] += count;
        }
    }

    // the result outlives the arena, so it uses the caller's allocator
    var result = ArrayList(u8).init(allocator);
    errdefer result.deinit();
    const wr = result.writer();
    for (id2count) |count, id| {
        if (count == 0) continue;
        // check the buffer rather than the id: this avoids a leading ", "
        // when group 0 has a zero count
        if (result.items.len != 0) try result.appendSlice(", ");
        try wr.print("{s} ({d})", .{ id2name[id], count });
    }
    return result.toOwnedSlice();
}

const tests = [_]struct {
    names: []const u8,
    synonyms: []const u8,
    want: []const u8,
}{
    .{
        .names = "John (15), Jon (12), Chris (13), Kris (4), Christopher (19), Žvangalas (10)",
        .synonyms = "(Jon, John), (John, Johnny), (Chris, Kris), (Chris, Christopher)",
        .want = "Jon (27), Chris (36), Žvangalas (10)",
    },
    .{
        .names = "John (15), Jon (12), Chris (13), Kris (4), Christopher (19)",
        .synonyms = "(Jon, John), (John, Johnny), (Chris, Kris), (Chris, Christopher)",
        .want = "Jon (27), Chris (36)",
    },
    .{
        .names = "John (15), Jon (12), Johnny (5), Johnn (4), Johnathan (3)",
        .synonyms = "(Jon, John), (Johnn, Johnny), (Johnathan, Jon), (John, Johnny)",
        .want = "Jon (39)",
    },
};

const testing = std.testing;

test "example" {
    for (tests) |tt| {
        const got = try solution(testing.allocator, tt.names, tt.synonyms);
        defer testing.allocator.free(got);
        try testing.expectEqualStrings(tt.want, got);
    }
}
{{< / highlight >}}

[Jakub Konka][jakub] was watching the interview too! His comments on the
solution above:

- Your solution looks good to me and I think I'd be oscillating roughly around
the same solution too.
- You've used arena in an interesting way to init string sets: I think I'd use
an unmanaged version and initialize on first use.
- But it's fine either way.
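
The reduction to connected components does not require an explicit graph traversal. As a point of comparison with the solution above, here is a minimal union-find sketch. It is an alternative technique, not the code from the interview, and it assumes the names have already been mapped to integer ids the way `name2id` does above. The helper names `find` and `unite` and the hardcoded ids are illustrative only:

```zig
const std = @import("std");

// Return the representative (root) of x, halving the path along the way.
fn find(parent: []u32, x: u32) u32 {
    var cur = x;
    while (parent[cur] != cur) {
        parent[cur] = parent[parent[cur]];
        cur = parent[cur];
    }
    return cur;
}

// Merge the groups of a and b; the smaller root id stays the representative.
fn unite(parent: []u32, a: u32, b: u32) void {
    const ra = find(parent, a);
    const rb = find(parent, b);
    if (ra < rb) {
        parent[rb] = ra;
    } else {
        parent[ra] = rb;
    }
}

test "synonym groups via union-find" {
    // Assumed ids: 0=Jon, 1=John, 2=Johnny, 3=Chris, 4=Kris, 5=Christopher
    var parent = [_]u32{ 0, 1, 2, 3, 4, 5 };
    const pairs = [_][2]u32{
        [2]u32{ 0, 1 }, // (Jon, John)
        [2]u32{ 1, 2 }, // (John, Johnny)
        [2]u32{ 3, 4 }, // (Chris, Kris)
        [2]u32{ 3, 5 }, // (Chris, Christopher)
    };
    for (pairs) |pair| unite(&parent, pair[0], pair[1]);

    // Frequencies indexed by id; Johnny is not in the names list, hence 0.
    const counts = [_]u32{ 12, 15, 0, 13, 4, 19 };
    var totals = [_]u32{0} ** 6;
    var i: u32 = 0;
    while (i < counts.len) : (i += 1) {
        totals[find(&parent, i)] += counts[i];
    }

    try std.testing.expectEqual(@as(u32, 27), totals[0]); // Jon's group
    try std.testing.expectEqual(@as(u32, 36), totals[3]); // Chris's group
}
```

Keeping the smaller id as the root mirrors the solution above, where a group's id is also its smallest member id.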
## Optional: challenge for you

Inclined to show off your solution in Zig or your favorite programming
language? Post it as a comment on the [meetup page][meetup-page] (preferably
use a [public pastebin][pastebin] to keep the comment size reasonable), and I
will paste my favorite ones here with your name. Please include the time it
took you to code it. The main criterion is, of course, lines of code. :)

## Thanks

Many thanks to Brigita Žemgulytė, Courtney Cox, Ignas Kaziukėnas and Mantas
Mikšys for making this happen. I would do it again.

[loris]: https://kristoff.it/
[meetup-page]: https://www.meetup.com/uber-engineering-events-vilnius/events/286542203/
[jakub]: https://www.jakubkonka.com/
[pastebin]: https://paste.mozilla.org/
[cracking]: https://www.crackingthecodinginterview.com/
[arena]: https://github.com/ziglang/zig/blob/0.9.1/lib/std/heap/arena_allocator.zig#L6-L7